Hiding sensitive data when logging

Hello everyone! My name is Sergey Solovykh, and I am a Java developer on the MTS Digital team. In this article I will show you how to hide users' personal data when setting up logging.

Logs are needed for tracing requests, analyzing errors, and diagnosing problems. However, when an application processes users' personal data (passport details, TIN, SNILS, and other identity documents), that data must not be disclosed. This is a serious matter that touches the company's reputation, consumer trust, and legal compliance. So the developer's task is not only to tie the entire request chain together with logs, but also to keep protected data out of them.

Today we will not delve too deeply into the details of how this or that technology works, but will simply look at several available solutions.

But first, a little theory

Logging is the process of capturing events and storing them in a log. A log event is a customizable text entry that typically contains the severity level of the event, a timestamp, the source, and most importantly, the main message.

The message is what the developer writes at the relevant point in the code: the stack trace of a caught exception, method parameters, or a note that some process has started. The message string then goes to the logger, which formats it, adds the metadata listed above, and publishes it to the log. Let's look at several points where we can intervene in this process and prevent a data leak.

Example

We will run our experiments in a Spring Boot project on the workhorse of programming examples: the User class. We will treat the surname, password, mobile number, and even age as confidential data:

@AllArgsConstructor
@Data
public class User {

   private String name;

   private String surname;

   private String password;

   private Long mobileNumber;

   private int age;
}

Overriding the toString() method

The simplest and most obvious way is to intervene in the creation of the log at the stage of message formation. We immediately discard the option of specifying each field manually for obvious reasons, so we will log the entire object at once:

log.info("User = {}", user);

For the user object, the toString() method is called under the hood, which returns a string representation of the object. So the first way to avoid data leakage is to write your own implementation of this method:

@Override
public String toString() {
   return "User{" +
           "name='" + name + '\'' +
           ", surname='*****'" +
           ", password='*****'" +
           ", mobileNumber=#####" +
           ", age=##" +
           '}';
}

After running the code, we will see this line in the console:

2024-04-16 12:02:11.454  INFO 48615 --- [main] dev.riccio.LogProcessor: User = User{name='Alex', surname='*****', password='*****', mobileNumber=#####, age=##}

The drawbacks of this solution include a lack of flexibility: you have to give up the toString() implementation generated by Lombok annotations and update the method by hand whenever the class changes. It also hurts readability, especially in classes with many fields. In general, hardcoding is not our way, especially when we are drawn to something light and declarative.

Light and declarative

Why not put an annotation above the class field so that when the toString() method is called, the value of this field automatically changes to the specified template? Let's do just that. Let's create an annotation that we will use to mark confidential data:

@Target({ElementType.FIELD})
@Retention(RetentionPolicy.RUNTIME)
public @interface Confidentially {
}

We mark the class fields:

@AllArgsConstructor
@Data
public class User {

   private String name;

   @Confidentially
   private String surname;

   @Confidentially
   private String password;

   @Confidentially
   private Long mobileNumber;

   @Confidentially
   private int age;
}

Let's see what the toString() method generated by Lombok looks like:

public String toString() {
   return "User(name=" + this.getName() +
           ", surname=" + this.getSurname() +
           ", password=" + this.getPassword() +
           ", mobileNumber=" + this.getMobileNumber() +
           ", age=" + this.getAge() +
           ")";
}

By default, it uses getters, so we only need an aspect that steps in when a getter reads a field during the execution of toString() and, if that field carries the annotation we created, returns a placeholder value. This requires a pointcut of the cflow type.

Cflow (control flow) is an AspectJ feature that lets you define join points based on the control flow. However, as the Spring documentation states, Spring AOP does not support it, and there is no rush to add it.

Using AspectJ

Well, let's use ancient magic. Add a Gradle plugin that runs ajc after the Java compiler:

id "io.freefair.aspectj.post-compile-weaving" version "8.6"

And the dependency:

implementation "org.aspectj:aspectjrt:1.9.21.1"

And we create our aspect:

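import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class ConfidentialDataAspect {

   // Matches reads of @Confidentially fields that happen while any toString() is executing
   @Around("cflow(execution(public String *.toString(..))) && get(@Confidentially * *)")
   public Object processConfidentialData(ProceedingJoinPoint jp) throws Throwable {
       final var obj = jp.proceed();
       final Object result;

       if (obj instanceof String) {
           // A string value can be swapped for a mask of the same type
           result = "*****";
       } else {
           // Other types cannot carry a string mask: wrappers become null, primitives become 0
           result = null;
       }

       return result;
   }
}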

Let's run the code:

2024-04-16 12:12:01.454  INFO 48615 --- [main] dev.riccio.LogProcessor: User = User(name=Alex, surname=*****, password=*****, mobileNumber=null, age=0)

Not as pretty as before. The intercepted field reads must return values of the same type as the fields themselves, so we cannot replace numeric values with pretty strings like “######”. Only string data can actually be masked: wrapper types come back as null, and primitives come back as zero. You can see how primitives are handled, using int as the example, in the org.aspectj.runtime.internal.Conversions#intValue method:

public static int intValue(Object o) {
   if (o == null) {
       return 0;
   } else if (o instanceof Number) {
       return ((Number)o).intValue();
   } else {
       throw new ClassCastException(o.getClass().getName() + " can not be converted to int");
   }
}

Replacing real values with default values hides user data, but it can also cause confusion when analyzing an incident. Say we hid a phone number this way: the log shows null or zero, and questions immediately arise. “Was the mobile number transmitted? Or was it missing from the request, and that caused the failure? Maybe we should look for the problem elsewhere?”

You can refine the logic and mask only part of a numeric value. For example, in the mobile number 71112223344 you could replace only the digits after the mobile operator code with zeros, giving 71110000044. But this approach is no longer universal and ties the code to the subject area.
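Partial masking of this kind can be sketched as a small helper. This is only an illustration: the class PhoneMasker and its method maskMiddle are hypothetical names that do not exist in the original project, and the number of leading and trailing digits to keep is an assumption you would take from your own domain rules.

```java
// Hypothetical helper for illustration; not part of the original project.
final class PhoneMasker {

    private PhoneMasker() {
    }

    /**
     * Replaces the digits between the first keepStart digits and the last
     * keepEnd digits with zeros. Numbers too short to be masked partially
     * are hidden completely (0 is returned).
     */
    static long maskMiddle(long number, int keepStart, int keepEnd) {
        if (number < 0 || keepStart < 0 || keepEnd < 0) {
            throw new IllegalArgumentException("number and digit counts must be non-negative");
        }

        final String digits = Long.toString(number);
        if (digits.length() <= keepStart + keepEnd) {
            return 0L; // nothing would be left to mask, so hide the value entirely
        }

        final StringBuilder masked = new StringBuilder(digits);
        for (int i = keepStart; i < digits.length() - keepEnd; i++) {
            masked.setCharAt(i, '0');
        }
        return Long.parseLong(masked.toString());
    }
}
```

With the number from the text, PhoneMasker.maskMiddle(71112223344L, 4, 2) keeps the prefix 7111 and the last two digits and yields 71110000044. The result is still a long, so it fits the type restriction described above, but the ambiguity between “masked” and “really zero” remains for the hidden digits.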

The same AspectJ setup can also be reproduced in Maven using aspectj-maven-plugin.

Adjusting the message at the logger level

Another option is to implement your own converter at the logger level. There is no magic here: before publication, the message string is passed to a class we write and analyzed there for certain markers.

Let's define the markers in the most direct way, by adding the Confidetial suffix to each confidential field:

@AllArgsConstructor
@Data
public class User {

   private String name;

   private String surnameConfidetial;

   private String passwordConfidetial;

   private Long mobileNumberConfidetial;

   private int ageConfidetial;
}

In this project I am using Logback, so I will create the following configuration in logback-spring.xml, specifying the converter class:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <contextName>logback</contextName>
   <conversionRule conversionWord="mask" converterClass="config.logging.converter.LogConverter"/>
   <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
       <encoder>
           <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{3}: %mask(%msg%n)</pattern>
           <charset>utf-8</charset>
       </encoder>
   </appender>
   <root level="info">
       <appender-ref ref="console"/>
   </root>
</configuration>

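And here is the converter itself:

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.pattern.CompositeConverter;

import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

public class LogConverter extends CompositeConverter<ILoggingEvent> {

   // Receives the already formatted content of %mask(...) and returns what is published
   @Override
   public String transform(ILoggingEvent event, String in) {
       final String result;

       if (Objects.nonNull(in) && in.contains("Confidetial")) {
           // Lombok separates fields with ", ", so each chunk holds one "field=value" pair
           result = Arrays.stream(in.split(", "))
                          .map(it -> {
                              if (it.contains("Confidetial")) {
                                  // Keep the field name without the suffix and drop the value
                                  final var start = it.substring(0, it.lastIndexOf("Confidetial"));
                                  return start + ": \"***\"";
                              } else {
                                  return it;
                              }
                          })
                          // Restores the ")" and line break cut off with the last value;
                          // this assumes the last field in the message is a confidential one
                          .collect(Collectors.joining(", ", "", ")" + System.lineSeparator()));
       } else {
           result = in;
       }

       return result;
   }
}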

Let's run the code:

2024-04-16 12:34:35.199 [main] INFO  d.r.LogProcessor: User = User(name=Alex, surname: "***", password: "***", mobileNumber: "***", age: "***")

Before publication, each log line is checked for the keyword and, if necessary, split, modified, and reassembled. It looks terrible, but it works.
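The same rewrite can also be expressed as a single regular expression. The sketch below is not the converter from this project: ConfidentialMasker is a hypothetical class with no Logback dependency, so the logic can be tried in isolation. It assumes that the message follows Lombok's name=value format and that values contain neither commas nor closing parentheses.

```java
import java.util.regex.Pattern;

// Hypothetical standalone masker for illustration; the suffix matches the one used in the article.
final class ConfidentialMasker {

    // Group 1 is the field name without the suffix; the value runs up to the next "," or ")"
    private static final Pattern CONFIDENTIAL_FIELD =
            Pattern.compile("(\\w+)Confidetial=[^,)]*");

    private ConfidentialMasker() {
    }

    static String mask(String message) {
        if (message == null) {
            return null;
        }
        // Keep the field name, replace the value with a fixed placeholder
        return CONFIDENTIAL_FIELD.matcher(message).replaceAll("$1: \"***\"");
    }
}
```

Unlike the split-and-join version, this variant does not depend on where the confidential fields sit in the message, and it leaves the closing parenthesis and the line break untouched. A value that itself contains a comma would be masked only up to that comma, so this is a sketch rather than a complete solution.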

This method adds overhead, but sometimes it is the only way out, for example if you use protobuf or Avro and the incoming request must be logged immediately. Unless, of course, you turn to the dark side and use the Reflection API, JavaParser, or ASM. Those tools deserve a separate article, since “with great power comes great responsibility”.
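As a small taste of the Reflection API route, here is a minimal sketch that builds a masked string representation from an annotation. It is not a method used in this project: MaskingToString and SampleUser are hypothetical names, and the Confidentially annotation is redeclared here only so that the example compiles on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

// Redeclared for self-containment; mirrors the annotation from the article
@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
@interface Confidentially {
}

// Hypothetical utility: prints fields, replacing annotated ones with a mask
final class MaskingToString {

    private MaskingToString() {
    }

    static String describe(Object target) {
        final Class<?> type = target.getClass();
        final StringJoiner joiner = new StringJoiner(", ", type.getSimpleName() + "(", ")");

        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // skip compiler-generated and static fields
            }
            Object value;
            if (field.isAnnotationPresent(Confidentially.class)) {
                value = "*****"; // the real value is never read
            } else {
                try {
                    field.setAccessible(true);
                    value = field.get(target);
                } catch (IllegalAccessException e) {
                    value = "<inaccessible>";
                }
            }
            joiner.add(field.getName() + "=" + value);
        }
        return joiner.toString();
    }
}

// Hypothetical sample class used only to try the utility
final class SampleUser {
    private final String name;
    @Confidentially
    private final String surname;
    @Confidentially
    private final int age;

    SampleUser(String name, String surname, int age) {
        this.name = name;
        this.surname = surname;
        this.age = age;
    }
}
```

Calling MaskingToString.describe(new SampleUser("Alex", "Smith", 33)) prints name=Alex and masks the other two fields, including the numeric one, because the result is assembled as a string rather than returned through a typed getter. The price is reflective access on every call, and the order of fields follows getDeclaredFields(), which the JDK does not formally guarantee.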

Conclusion

I have covered several options: the basic one, which requires managing message generation by hand; replacing values with aspects; and editing the line at the logger level. None of them is a “silver bullet”, a single solution that suits every case encountered in practice. Managing logs by hand is tedious and carries the risk of human error. Processing annotations with aspects looks good, but it is not suitable for generated code. Analyzing the finished log line consumes extra resources, and the more logs you have, the more it costs. Each situation needs to be analyzed and a solution chosen for it; I have described the most obvious ones that I have run into myself. If you have solved a similar problem, write in the comments and we will discuss it together.
