Reading the source code in a repository lets you assess how secure an application is. But if no one looks at the code, problems only accumulate. Fortunately, GitHub has its own security experts, who recently discovered a Trojan in several Git repositories. For some reason, the owners of those repositories had not noticed it themselves. We cannot dictate how other people manage their repositories, but we can learn from their mistakes. In this article, we will look at useful techniques for working with repositories.
Explore your repository
This is perhaps the most important recommendation. Whether you created the repository yourself or it was handed to you, it is important to know what it contains. At a minimum, you need to know the basic components of the code base you manage. Then, if a stray file appears after dozens of merges, you will spot it easily because it will raise questions. You can inspect it, work out what it is, and then decide its fate.
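When an unfamiliar file does turn up, the history can usually tell you where it came from. The following is a minimal sketch, assuming a POSIX shell; the helper name who_added and the path in the usage comment are made up for illustration.

```shell
# who_added PATH: list the commit(s) in which PATH was added to the history.
# --diff-filter=A keeps only commits where the file appears as "added".
who_added() {
  git log --diff-filter=A --date=short \
    --format='%h %an %ad %s' -- "$1"
}

# Example (hypothetical path):
#   who_added scripts/update.sh
```

Each output line shows the abbreviated commit hash, the author, the date and the commit subject, which is usually enough to decide whom to ask about the file.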
Try not to add binaries
Git was designed for text files, whether that is source code in C, Python, or Java, or formats such as JSON, YAML, XML, Markdown, and HTML:
$ cat hello.txt
This is plain text.
It's readable by humans and machines alike.
Git knows how to version this.
$ git diff hello.txt
diff --git a/hello.txt b/hello.txt
index f227cc3..0d85b44 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1,2 +1,3 @@
 This is plain text.
+It's readable by humans and machines alike.
 Git knows how to version this.
Git doesn’t like binary files:
$ git diff pixel.png
diff --git a/pixel.png b/pixel.png
index 563235a..7aab7bc 100644
Binary files a/pixel.png and b/pixel.png differ
$ cat pixel.png
�PNG ▒ IHDR7n�$gAMA�� �abKGD݊�tIME� -2R�� IDA�c`�!�3%tEXtdate:create2020-06-11T11:45:04+12:00��r.%tEXtdate:modify2020-06-11T11:45:0
Data in a binary file cannot be parsed the way plain text can, so when anything in a binary file changes, the entire file has to be rewritten.
Worse still, you cannot review (read and analyze) binary data yourself.
In addition to the usual POSIX tools, you can find binaries using git diff. When you run git diff with the --numstat option, Git prints dashes instead of added and deleted line counts for a binary file:
$ git diff --numstat /dev/null pixel.png | tee
-	-	/dev/null => pixel.png
$ git diff --numstat /dev/null file.txt | tee
5788	0	/dev/null => file.txt
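The same check can be applied to a whole set of changes at once. The sketch below assumes a POSIX shell with awk; the helper name binary_paths is invented for this example. It relies only on the tab-separated layout of --numstat output, where a binary entry has "-" in both count columns.

```shell
# binary_paths: read `git diff --numstat` output on stdin and print the
# path column of every entry Git reported as binary ("-" in both counters).
binary_paths() {
  awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}

# Example: list binaries among the changes staged for the next commit.
#   git diff --cached --numstat | binary_paths
```

Running this before a commit gives you a short list of files that you will not be able to review as text.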
If you are still considering adding binary files to your repository, stop and think. If a binary is generated during the build process, why add it to your repo at all? If you nevertheless decide that it makes sense, document in the README file or a similar place why you store the binaries and what the procedure for updating them is. Update them sparingly, because every change to a binary object effectively stores another full copy of it.
Third-party libraries must remain third-party
Although one of the many benefits of open source is that you can freely use and redistribute code you did not write, there are good reasons not to keep a third-party library in your own repository. First, you have to vet all of that code, and every later update to it, yourself to be sure the library can be trusted. Second, copying third-party libraries into your Git repository pulls attention away from the main project.
To manage external dependencies, use Git submodules.
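As a rough illustration of the submodule workflow, here is a minimal sketch assuming a POSIX shell. The helper names add_vendor_lib and fetch_vendor_libs, the URL and the third_party/ path are all hypothetical; only the git subcommands themselves are standard.

```shell
# add_vendor_lib URL DEST: reference a library as a submodule instead of
# copying its files. Git records the URL in .gitmodules and pins the
# library to one exact commit.
add_vendor_lib() {
  url=$1
  dest=$2
  git submodule add "$url" "$dest" &&
    git commit -m "Add $dest as a submodule"
}

# fetch_vendor_libs: after a plain `git clone`, check out the pinned
# commits of all submodules.
fetch_vendor_libs() {
  git submodule update --init --recursive
}

# Example (hypothetical URL and path):
#   add_vendor_lib https://example.com/vendor/libfoo.git third_party/libfoo
```

Because the main repository stores only a pointer to a specific commit, an update to the library shows up as a one-line change that you have to make deliberately.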
Do not use git add “blindly”
If your project has compiled successfully, resist the urge to run git add . (where "." is the current directory, for example). This is especially important if you do not build the project by hand but let an IDE manage it. When an IDE manages your project, it can be extremely difficult to keep track of what has been added to the repository. So add only what you created and prepared for addition yourself, not every new object that has mysteriously appeared in your project folder.
So before you run git add, see what will be added to the repository. If you see an unfamiliar object, find out where it comes from and why it is still in your project directory after running the make clean command (or the equivalent command).
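One way to look before you add is sketched below, assuming a POSIX shell. The helper names untracked_files and preview_add and the example path are made up for illustration; the git options used are standard ones.

```shell
# untracked_files: list files Git does not track yet, skipping anything
# matched by .gitignore. These are the new files a blanket `git add .`
# would pick up.
untracked_files() {
  git ls-files --others --exclude-standard
}

# preview_add [PATH...]: show what `git add` would stage, without staging it.
preview_add() {
  [ "$#" -gt 0 ] || set -- .
  git add --dry-run -- "$@"
}

# Typical sequence: look first, then add only the files you recognize.
#   untracked_files
#   preview_add .
#   git add src/main.c      # hypothetical path
```

Neither helper changes the index, so they are safe to run at any point.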
Use .gitignore
A typical project directory contains many hidden files, metadata, and unneeded artifacts. You are better off ignoring them: the more of them there are, the more likely it is that this "garbage" will get in your way and you will miss something important or dangerous.
A .gitignore file lets you filter out the excess. Github.com/github/gitignore offers several purpose-built gitignore templates that you can download and place in your project. Gitlab.com, for example, began offering such templates a few years ago.
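To give a feel for what such a file contains, here is a small sketch assuming a POSIX shell. The patterns are illustrative guesses at common clutter, not taken from any particular template, and the helper names write_starter_gitignore and why_ignored are invented for this example.

```shell
# write_starter_gitignore: create a small .gitignore in the current
# directory. Refuses to overwrite an existing file.
write_starter_gitignore() {
  if [ -e .gitignore ]; then
    echo ".gitignore already exists; not overwriting" >&2
    return 1
  fi
  cat > .gitignore <<'EOF'
# Build output
build/
*.o

# Editor and OS metadata
.idea/
.vscode/
*.swp
.DS_Store
EOF
}

# why_ignored PATH...: show which pattern, if any, makes Git ignore a path.
why_ignored() {
  git check-ignore -v -- "$@"
}
```

The second helper is useful when a file you expected to commit does not show up in git status: it prints the ignore file, line number and pattern responsible.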
Moderate code base changes
When you receive a merge or pull request, or a patch by e-mail, make sure it is sound before you accept it. Your task is to study the new code coming into your code base and understand what it does. If you disagree with the implementation or, worse, do not understand it, write to the sender and ask for clarification. There is nothing wrong with studying new code that is asking to become part of your project. Moreover, you do this for the benefit of your users: they will then know exactly what changes you accept and why.
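A review of this kind can start from the command line. The sketch below assumes a POSIX shell; the helper name review_incoming and the remote, branch and patch file names in the comments are hypothetical.

```shell
# review_incoming BASE TOPIC: summarize what TOPIC would bring into BASE.
review_incoming() {
  base=$1
  topic=$2
  # Commits reachable from TOPIC but not from BASE.
  git log --oneline "$base..$topic"
  # Per-file summary of the changes TOPIC made since it forked from BASE.
  git diff --stat "$base...$topic"
}

# Example (hypothetical remote and branch names):
#   git fetch origin
#   review_incoming main origin/feature-x
#   git diff main...origin/feature-x     # then read the full diff
#
# For a patch received by e-mail (hypothetical file name), a similar
# summary is available before applying anything:
#   git apply --stat change.patch
```

The summary only tells you where to look; the actual review is reading the full diff and asking about anything you do not understand.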
Securing open source software is a community effort. Get to know your code base, discourage clutter, and do not ignore potential security threats in the repositories you clone. Git is a powerful tool, but it is just a computer program, so the responsibility for managing your repositories ultimately rests with you.