Data labeling as every employee's task

The amount of data keeps growing. According to research by International Data Corporation (IDC), an international research company, the volume of information stored in electronic form grows by about 40% per year. At the same time, there is still no clear answer as to which security tools and data protection methods should be applied.

Where did the classification problem come from?

Typically, a large organization does not know exactly what data it has, where that data is stored, or how it should be protected. The problem is that up to 90% of corporate data is unstructured: documents, reports of all kinds, presentations and so on, stored in network folders or on users' local machines. Since the security staff are responsible for the company's information security, they are the ones who choose the tools. A DLP (Data Leak Prevention) system is often used: it monitors the protected perimeter for outgoing messages, files being moved to external media or web resources, and similar events. In this way, corporate information in the form of documents is protected, and the transfer of personal data archives, program source code, design documentation and the like is prevented. To minimize risks, information security (IS) officers therefore typically face the following tasks:

– Classification of information in the company. This is needed to understand what data the company has and which of it should be protected.
– Differentiation of access to information. You need to know who creates the data, with what tools, and where it is stored.
– Keeping the threat model up to date and compiling a list of information leakage channels. Modern leakage channels must be understood in order to counter them in time.

For the first two tasks, the content of the files employees work with is analyzed, and the results are then used to train the DLP system. Data classification systems are also used; they are quite popular abroad but less common in Russia.

If you don't know what information you have, you don't know how to protect it adequately but not excessively.

How to classify data

At first glance, the solution is purely organizational. What could be simpler than specifying, when a document is created, the class of information it contains? This is a fairly widespread approach: some financial organizations and large industrial companies, including international ones, follow this path. An internal regulation is drawn up that describes the rules for assigning a particular label to a document when it is created or modified, and employees are then trained in the basics of handling sensitive information. But if these rules are not enforced by technical means, data still ends up in the wrong hands through leaks, most often from local machines.

Another obvious solution is automatic classification. This functionality is often built into DLP-class products, and there are offerings on the market that label documents available on the network according to particular attributes of their content. But documents are also scattered across local machines, and they can be copied and modified.
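To illustrate what content-based automatic classification does, and why it only covers what it can scan, here is a minimal Python sketch. The label names, the `RULES` patterns and `classify_text` are hypothetical and not taken from any DLP product; production systems typically use far richer detectors than a few regular expressions.

```python
import re
from enum import IntEnum


class Label(IntEnum):
    """Hypothetical confidentiality levels; a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical rules: (compiled pattern, label assigned when the pattern matches).
RULES = [
    (re.compile(r"\b(confidential|trade secret)\b", re.IGNORECASE), Label.CONFIDENTIAL),
    # Card-like 16-digit numbers, optionally grouped by spaces or hyphens.
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), Label.CONFIDENTIAL),
    (re.compile(r"\binternal use only\b", re.IGNORECASE), Label.INTERNAL),
]


def classify_text(text: str) -> Label:
    """Return the most sensitive label among all rules that match the text."""
    result = Label.PUBLIC
    for pattern, label in RULES:
        if pattern.search(text):
            result = max(result, label)
    return result
```

A scanner like this can only label files it actually reaches; a copy edited on a laptop that never passes through it stays unlabeled, which is the gap described above.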

Below, we look at one technical tool, Docs Security Suite (DSS), a platform for labeling and classifying electronic documents.

Classification as the responsibility of each employee

Let us return to the first approach, in which employees classify documents themselves. If we add technical means that require the user to assign a document to a category every time it is created, modified, copied or printed, the task of classifying information becomes much simpler. The selected label should be written both visibly into the document and into the file structure.
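The enforcement idea can be sketched as a guard that runs before each of these operations. This is a minimal Python illustration under stated assumptions: `guard_operation`, `Operation` and the `ask_user` callback (standing in for the classification dialog) are hypothetical names, not the DSS API, and the sketch prompts only when the document is still unlabeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Operation(Enum):
    CREATE = "create"
    MODIFY = "modify"
    COPY = "copy"
    PRINT = "print"


@dataclass
class Document:
    path: str
    label: Optional[str] = None  # None means the document has not been classified yet


class LabelRequiredError(Exception):
    """Raised when the user declines to classify a document."""


def guard_operation(doc: Document, op: Operation,
                    ask_user: Callable[[Document, Operation], Optional[str]]) -> Document:
    """Allow `op` only once the document carries a label.

    `ask_user` is a hypothetical stand-in for the classification dialog;
    it returns the chosen label, or None if the user cancels.
    """
    if doc.label is None:
        chosen = ask_user(doc, op)
        if not chosen:
            raise LabelRequiredError(f"{op.value} blocked: {doc.path} has no label")
        # A real agent would also write the visual mark and the file tag here.
        doc.label = chosen
    return doc
```

Whether to re-prompt on every operation or only for unlabeled files is a policy choice; the sketch takes the less intrusive option.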

Thus, the following is achieved:

– Mandatory labeling of the document by its author or editor (the information owner);

– Technical accounting of the organization's electronic documents;

– Expert classification of electronic documents at the moment they are created;

– Access control for users based on the documents' confidentiality labels;

– Logging and registration of user actions on labeled documents;

– Tracking how documents evolve by building a tree of parent and child documents;

– Involvement of the organization's employees in information security and, as a result, a stronger information security culture in the company.
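Two of the points above, label-based access control and logging of user actions, can be illustrated together. The sketch below assumes a simple hierarchical rule (a user may act on a document if their clearance is at least the document's label), which is one common model and not necessarily the one DSS uses; `Level`, `AuditLog` and `check_access` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from typing import List


class Level(IntEnum):
    """Hypothetical label / clearance levels."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, user: str, doc_id: str, action: str, allowed: bool) -> None:
        """Append one timestamped entry describing an access attempt."""
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "doc_id": doc_id,
            "action": action,
            "allowed": allowed,
        })


def check_access(user: str, clearance: Level, doc_id: str, doc_level: Level,
                 action: str, log: AuditLog) -> bool:
    """Permit the action if clearance >= document label; log every attempt."""
    allowed = clearance >= doc_level
    log.record(user, doc_id, action, allowed)
    return allowed
```

Logging denied attempts as well as permitted ones is what makes such a log usable as evidence later.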

In this way, a complete usage history is collected for each document, and with it an evidence base for investigating information security incidents, which makes investigations more effective and reduces their cost in both money and time.
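The parent/child tree is what makes this evidence useful: given the parent ID stored with each document, an investigator can trace a leaked copy back to its original and list every copy derived from it. A minimal Python sketch, assuming the tags have already been collected into a hypothetical `{doc_id: parent_id}` map:

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_children(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert a {doc_id: parent_id} map into {parent_id: [child ids]}."""
    children: Dict[str, List[str]] = defaultdict(list)
    for doc_id, parent_id in parents.items():
        if parent_id is not None:
            children[parent_id].append(doc_id)
    return children


def ancestry(doc_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent links from a document back to its original."""
    chain = [doc_id]
    seen = {doc_id}  # guards against corrupted data forming a cycle
    while (parent := parents.get(chain[-1])) is not None and parent not in seen:
        chain.append(parent)
        seen.add(parent)
    return chain


def descendants(doc_id: str, children: Dict[str, List[str]]) -> List[str]:
    """Return every copy derived from a document (depth-first)."""
    result: List[str] = []
    seen = {doc_id}
    stack = list(children.get(doc_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        stack.extend(children.get(current, []))
    return result
```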

What is a label

The label appears visually in the document: if the document is confidential, a corresponding inscription is placed, for example, in the background behind the text. It is placed automatically, and the user cannot delete it. Accordingly, each label should be centrally configurable, with settings for font, color, size, position on the page and so on. An image or a barcode can also serve as the visual mark.
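A centrally managed label style might be loaded from a configuration like the one below. The field names (`font`, `color`, `size_pt`, `position`, `kind`), the allowed positions and the JSON format are assumptions for illustration; the text above only says that font, color, size and position should be configurable and that an image or barcode may be used.

```python
import json
import re
from dataclasses import dataclass
from typing import Dict

ALLOWED_POSITIONS = {"background", "header", "footer"}  # hypothetical placements
ALLOWED_KINDS = {"text", "image", "barcode"}


@dataclass(frozen=True)
class VisualMark:
    text: str
    font: str
    color: str          # hex RGB, e.g. "#C00000"
    size_pt: int
    position: str
    kind: str = "text"

    def __post_init__(self) -> None:
        # Reject bad settings when the central config is loaded, not at render time.
        if self.position not in ALLOWED_POSITIONS:
            raise ValueError(f"unknown position: {self.position}")
        if not re.fullmatch(r"#[0-9A-Fa-f]{6}", self.color):
            raise ValueError(f"color must be #RRGGBB: {self.color}")
        if self.size_pt <= 0:
            raise ValueError("size_pt must be positive")
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown kind: {self.kind}")


def load_marks(config_json: str) -> Dict[str, VisualMark]:
    """Parse a central configuration mapping label name -> visual mark settings."""
    raw = json.loads(config_json)
    return {label: VisualMark(**settings) for label, settings in raw.items()}
```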

However, other data must be stored in the electronic document as well. If the tool is to be integrated with DLP, the visual image of the label is clearly not enough. For this purpose, an electronic tag is created: additional data stored somewhere in the structure of the document file, for example in its metadata. The tag should contain the following information:

– Description of the tag. This is the key information in our system; it is what is used for DLP integration, access control and so on.

– Document ID. We keep track of documents, so each of them must be assigned an identifier.

– ID of the parent document. Copying a file or using “Save as…” produces a new copy, so it is worth knowing where it came from.

– Date and time the document was last opened.

– Author of the document: user ID or username.

– Last editor of the document: user ID or login. These last two fields show who is responsible for assigning a particular label.

– The machine on which the last changes were made.

– The path where the document is stored.

– File hash, which allows the integrity of the document's data to be verified.

– Plus some other service information.
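As a sketch of what such a tag could contain, here is a Python data class with the fields listed above, serialized to JSON so it could be written into a metadata field. The field names, the JSON encoding, SHA-256 as the hash and the behavior of `derive_copy` are all assumptions; the text does not specify the DSS storage format, and writing the JSON into a particular file format's metadata is not shown.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def sha256_of(content: bytes) -> str:
    """Hex SHA-256 digest of the document body (hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class ElectronicTag:
    label: str                  # description of the tag, e.g. "Confidential"
    doc_id: str
    parent_id: Optional[str]    # None for an original document
    last_opened: str            # ISO 8601 timestamp
    author: str
    last_editor: str
    machine: str
    path: str
    file_hash: str
    extra: dict = field(default_factory=dict)  # other service information

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, data: str) -> "ElectronicTag":
        return cls(**json.loads(data))

    def verify(self, content: bytes) -> bool:
        """Integrity check: does the content still match the stored hash?"""
        return sha256_of(content) == self.file_hash

    def derive_copy(self, content: bytes, editor: str, machine: str,
                    path: str) -> "ElectronicTag":
        """Tag for a copy ("Save as..."): new ID, link to parent, label inherited."""
        return ElectronicTag(
            label=self.label,
            doc_id=str(uuid.uuid4()),
            parent_id=self.doc_id,
            last_opened=datetime.now(timezone.utc).isoformat(),
            author=self.author,  # assumption: the copy keeps the original author
            last_editor=editor,
            machine=machine,
            path=path,
            file_hash=sha256_of(content),
            extra=dict(self.extra),
        )
```

`derive_copy` models the “Save as…” case: the new file gets its own ID and a `parent_id` pointing at the original, which is what allows the parent/child tree to be rebuilt later.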

This set of data makes it possible to control data flows in the organization more closely, build a knowledge base about electronic documents and, as a result, prevent incidents, making life easier for information security staff.
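On the DLP side, integration can be as simple as reading the label from the tag and checking it against a per-channel policy. The channel names, the `POLICY` table, `label_from_tag` and the fail-closed handling of untagged files are hypothetical, shown only to illustrate the idea:

```python
import json
from typing import Dict, FrozenSet, Optional

# Hypothetical policy: outbound channels each label is allowed to use.
POLICY: Dict[str, FrozenSet[str]] = {
    "Public": frozenset({"email_internal", "email_external", "usb", "web_upload", "print"}),
    "Internal": frozenset({"email_internal", "print"}),
    "Confidential": frozenset({"email_internal"}),
}


def label_from_tag(tag_json: Optional[str]) -> Optional[str]:
    """Extract the label from a JSON-encoded tag; None if absent or unreadable."""
    if not tag_json:
        return None
    try:
        return json.loads(tag_json).get("label")
    except (json.JSONDecodeError, AttributeError):
        return None


def dlp_decision(label: Optional[str], channel: str) -> str:
    """Return 'allow' or 'block' for sending a document over a channel.

    Untagged documents and labels missing from the policy are blocked (fail closed).
    """
    if label is None or label not in POLICY:
        return "block"
    return "allow" if channel in POLICY[label] else "block"
```

Blocking untagged files follows the same principle as the rest of the approach: no document should circulate without a label.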

It should be emphasized, however, that the main idea of such automation is to involve every employee who works with documents in protecting the company's information. With a data classification system (DSS in my example), sooner or later every file within the security perimeter will carry a label: the user simply will not be allowed to close or print a document without one. This means employees are obliged to classify any document that, for whatever reason, still lacks a label.