The story of a personal data leak on GitHub
A story about one careless workshop participant from GeekBrains.
As usual, I was searching GitHub for information about a company I was interested in.
This time, the road took me in a completely different direction.
I found a direct link to a document containing personal data that should not have been shared with anyone. The document was hosted on the website of the company I was interested in.
At first I thought it was yet another SMS bomber, but it turned out to be something else.
When I opened the file, I immediately realized that it contained the login and password for the site highlighted in blue. This site is not affiliated with the company I was interested in.
The commit history made things clear: in the latest commit, a certain user had added 754 files containing personal data and other confidential information.
Example of data among files:
How did it happen?
So the story no longer had anything to do with the company I was interested in. How did it happen? I asked myself the same question. I opened the user’s Test repository, and the comments immediately made it clear why this had happened.
Exploring the user’s other repositories confirmed the connection with GeekBrains.
The student had learned to fork the teacher’s repository, create repositories, and upload files to them, without understanding what exactly was being uploaded.
Let’s take a look at this workshop:
Correspondence of groups and topics for the workshop.
What is a version control system?
What is a version control system for?
Installing git on your PC (depending on the system)
Installing VSCode on your PC
What is a repository and instructions for creating local repositories.
Basic work with a local repository
What are branches and what are they for when working with a version control system.
Basic branching in git.
What is a remote repository and what is it for?
Basic work with remote GitHub repositories
How collaboration is organized in version control systems and why it is needed
Instructions for creating a pull request
Books and useful links on learning git.
Alternative version control systems.
Isn’t a section on auditing repositories and on the rules for deleting them missing?
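One small part of such an audit can be sketched in Python: before uploading a folder, walk it and flag files whose names or extensions suggest personal or confidential data. This is only an illustration under stated assumptions. The suffix and keyword lists are made up for the example and are not taken from the workshop or from GitHub, and `find_suspicious_files` is a hypothetical helper name.

```python
from pathlib import Path

# Illustrative assumptions: adjust both lists to your own environment.
SUSPICIOUS_SUFFIXES = {".xls", ".xlsx", ".doc", ".docx", ".pdf", ".csv", ".pem", ".key"}
SUSPICIOUS_WORDS = ("password", "passport", "secret", "token", "credentials")


def find_suspicious_files(root):
    """Return sorted relative paths under `root` that look sensitive by name."""
    root = Path(root)
    flagged = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Skip git's own bookkeeping directory.
        if ".git" in relative.parts:
            continue
        name = path.name.lower()
        by_suffix = path.suffix.lower() in SUSPICIOUS_SUFFIXES
        by_word = any(word in name for word in SUSPICIOUS_WORDS)
        if by_suffix or by_word:
            flagged.append(relative.as_posix())
    return sorted(flagged)
```

Calling `find_suspicious_files(".")` in the folder you are about to upload returns the files worth opening and checking by hand. A name-based check like this misses plenty, so it supplements looking at what you upload rather than replacing it.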
All the files discussed earlier came from the desktop and are arranged in folders in the repository.
After also studying the history of the user’s actions, I finally understood how it all happened.
A certain Anastasia, a GeekBrains student, was doing the course assignments on a shared computer that other family members also used, including for work that involved processing drivers’ personal data. As a result, during the training, the work folder on the student’s desktop was mistakenly uploaded to GitHub along with the other files in the Test repository.
A user who has access to other people’s personal data, presumably without any legal right to it, is trying to get into IT;
GeekBrains tried to train this user;
Nothing came of it;
The student did not even understand what they had done;
The teacher had little control over what the student was doing;
The company that the user’s relative works for as a contractor, and the relative, have both been notified, with no response so far;
During training, study everything you are about to do in detail, twice if necessary, and if something is unclear, do not hesitate to ask the lecturer or friends;
Lecturers do not look closely enough at what their students do.
If you do not want to delete the repository, you need to remove the confidential data by following the instructions in "Removing sensitive data from a repository" (GitHub Docs).
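At the time of writing, that GitHub Docs page describes rewriting history with the `git filter-repo` tool. As a minimal sketch, assuming `git-filter-repo` is installed separately and you are working in a fresh clone, the Python helper below builds the command that drops the given paths from every commit and runs it only when explicitly asked. The names `build_filter_repo_command` and `purge_paths` are hypothetical.

```python
import subprocess


def build_filter_repo_command(paths):
    """Build a `git filter-repo` call that removes `paths` from all history."""
    if not paths:
        raise ValueError("at least one path is required")
    command = ["git", "filter-repo", "--invert-paths"]
    for path in paths:
        command += ["--path", path]
    return command


def purge_paths(repo_dir, paths, dry_run=True):
    """Print the command; run it inside `repo_dir` only if dry_run is False."""
    command = build_filter_repo_command(paths)
    print(" ".join(command))
    if not dry_run:
        # Rewrites history in place, so use a fresh clone you can throw away.
        subprocess.run(command, cwd=repo_dir, check=True)
    return command
```

Rewriting history does not undo the leak. Any passwords or keys that were exposed should still be changed, and the rewritten history has to be pushed to GitHub again as the documentation describes.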
Even today, in one of my own repositories, I found a Google API key that I had leaked back in 2019.
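A check for this kind of leak can be sketched in a few lines of Python. Secret scanners commonly match Google API keys as `AIza` followed by 35 letters, digits, underscores, or hyphens. That pattern is an assumption here, not an official specification, and `find_google_api_keys` is a hypothetical helper name.

```python
import re

# Assumption: a pattern widely used by secret scanners, not an official format spec.
GOOGLE_API_KEY_RE = re.compile(r"AIza[0-9A-Za-z_\-]{35}")


def find_google_api_keys(text):
    """Return (line_number, masked_key) pairs for every match in `text`."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in GOOGLE_API_KEY_RE.finditer(line):
            key = match.group(0)
            # Mask the key so the report itself does not leak it again.
            hits.append((number, key[:8] + "..."))
    return hits
```

To cover old commits as well as the current files, the same function can be run over the output of `git log -p --all`. Any key found this way should be treated as compromised and revoked.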