Programmers should not trust anyone, not even themselves
Programmers are supposed to be paranoid.
- “I double checked the code”
- “The code passed the tests”
- “The reviewer approved my code”
“Is my code correct?”
Writing correct code is difficult, and checking its correctness is impossible. Here are some reasons why:
- Generality: Even if your code works correctly once, will it work correctly in all cases, on all machines, in all situations?
- False positive results: Failed tests indicate the presence of bugs, but passed tests do not guarantee their absence.
- Lack of certainty: You could write a formal proof that your code is correct, but then you have to ask whether the proof itself is correct. You would have to verify the proof, and then verify that verification. This chain never ends.
It's foolish to be absolutely sure that your code is correct. A bug may be hiding in a dependency where you will never find it. However, don't despair. We can still reduce the risk of errors by developing a deep understanding of the code and working with it conscientiously.
Abstractions
What is “deep understanding”? Let's focus on one aspect of code understanding that is relevant to programmers: abstractions.
Abstractions are…
- mental models of how the code actually works
- what we use when we treat entity A as if it were entity B
- metaphors…
- a way to fit information into our heads as compactly as possible
- what lets us see the forest for the trees
- something we use constantly in everyday life
The word “abstraction” has many meanings. In programming, we mean layers of code that hide complexity. This post will only discuss abstractions in the cognitive sense.
Examples of abstraction:
- We treat our bank deposits as if the bank simply holds this money for us.
  - In reality, the bank doesn't just hold the money we deposit. It lends out or invests most of it. Our money does not sit idle in a big pile in a vault.
  - The abstraction works because banks still keep enough cash on hand to handle most withdrawals.
- We treat time as if it flows equally fast for everyone.
  - Relativistic time dilation slightly alters the passage of time for each person or object, depending on its speed and the gravity it is subject to.
  - GPS satellites orbiting the Earth automatically adjust their clocks by ~38 microseconds per day to account for time dilation (Source)
  - The abstraction works because the effect of time dilation is too small to notice unless you're doing extremely precise engineering.
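The ~38 microsecond figure can be roughly reproduced with first-order formulas. The sketch below is only an illustration under assumptions that are not in the text above: a circular orbit with a radius of about 26,560 km, a non-rotating Earth, and textbook constants. It adds the slowdown caused by orbital speed to the speedup caused by weaker gravity at altitude.

```python
import math

# Assumed constants (not from the article), SI units.
C = 299_792_458.0          # speed of light, m/s
GM_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6          # mean Earth radius, m
R_ORBIT = 2.656e7          # approximate GPS orbital radius, m
SECONDS_PER_DAY = 86_400


def gps_clock_drift_us_per_day():
    """Return (velocity_term, gravity_term, net) in microseconds per day.

    Positive values mean the satellite clock runs fast relative to a
    clock on the ground. Uses first-order approximations only.
    """
    orbital_speed = math.sqrt(GM_EARTH / R_ORBIT)  # circular-orbit speed
    # Special relativity: a moving clock runs slow.
    velocity_rate = -orbital_speed**2 / (2 * C**2)
    # General relativity: a clock higher in the gravity well runs fast.
    gravity_rate = GM_EARTH / C**2 * (1 / R_EARTH - 1 / R_ORBIT)
    to_us_per_day = SECONDS_PER_DAY * 1e6
    velocity_us = velocity_rate * to_us_per_day
    gravity_us = gravity_rate * to_us_per_day
    return velocity_us, gravity_us, velocity_us + gravity_us


velocity_us, gravity_us, net_us = gps_clock_drift_us_per_day()
print(f"velocity: {velocity_us:.1f} us/day")
print(f"gravity:  {gravity_us:.1f} us/day")
print(f"net:      {net_us:.1f} us/day")
```

Under these assumptions the two effects partly cancel, and the net drift lands between 38 and 39 microseconds per day.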
___________________________________________________________________________________
One way to form abstractions is to remove detail (create a simplified representation of something complex). For example, most people who drive a car don't know much about the inside of their car. Their representation of the car can be summarized as follows:
- The ignition starts the car
- The accelerator makes the car move
- The brake makes the car stop
- The steering wheel makes the car turn
- The car requires petrol/diesel
Knowing this abstraction, there is no need to understand the internal structure of car engines. Most drivers have only this working knowledge of cars, and they can still get where they need to go.
When we use a programming language, it provides abstractions that allow us to control computers without fully understanding their internal workings.
- Core language features (such as loops, if-conditions, functions, operators, and expressions) are all abstractions that hide:
  - Hardware-level details: processor instructions, registers, flags, and other details specific to the processor architecture, …
  - OS-level details: call stack management, memory management, …
- Portability: Languages spare us from having to worry about differences between machines.
  - Any compiled Java program (e.g. a jar file) should run on any machine that has the Java Runtime Environment (i.e. the JVM) installed.
  - A Python script should run on any machine with a Python interpreter.
  - A C program should compile and run on any machine that has a C compiler.
Abstractions fail
Unfortunately, abstractions don't always work.
- Language abstractions are not enough if you want to improve the performance of your code. To make your code run faster, you need to know the hardware and OS-level details.
- Porting programs that have external dependencies, such as dynamic libraries or network requirements, is not so easy. You can't just copy them to another machine and run them. Additional configuration is required, and this requires knowledge.
- Car owners who know only the bare minimum may find themselves in a situation where their car breaks down. If a driver does not change the lubricant/oil in their car regularly, they will shorten the life of the engine.
The driver's abstraction works well in the short term (for a single trip) but fails in the long term (over many years). Joel Spolsky called such failing abstractions “leaky” and formulated the Law of Leaky Abstractions:
All non-trivial abstractions are leaky to some degree.
This is analogous to the aphorism from statistics:
All models are wrong, but some of them are useful.
When we write code, we use leaky abstractions all the time. Here are some examples:
- Garbage collection frees us from having to worry about memory management (unless we are particularly concerned about latency)
- C++ smart pointers provide memory safety (provided you don't keep raw pointers to the objects they manage)
- Hash tables are fast because they perform O(1) operations (but arrays are faster at smaller sizes).
- Passing by reference is faster than passing by value (except when copy elision applies, and for values that fit in processor registers, such as integers)
Fortunately, many leaky abstractions crash your code when they fail, so the failure is immediately apparent and easy to fix. However, some cause undefined behavior or performance degradation, which is harder to detect and fix.
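Silent performance degradation can be made visible with a small experiment. The sketch below shows a different leak in the "hash tables are O(1)" abstraction than the small-size caveat listed above: when every key hashes to the same value, each insertion has to compare against the keys already stored. The `Key` class and `count_comparisons` helper are hypothetical names invented for this illustration, and the exact counts depend on the Python implementation.

```python
class Key:
    """A dictionary key that counts how often equality is checked."""

    eq_calls = 0  # shared counter, reset before each experiment

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        # With constant_hash=True every key collides with every other key.
        return 42 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def count_comparisons(n, constant_hash):
    """Insert n distinct keys into a dict and return the number of __eq__ calls."""
    Key.eq_calls = 0
    table = {Key(i, constant_hash): i for i in range(n)}
    assert len(table) == n  # all keys are distinct, so none is overwritten
    return Key.eq_calls


n = 200
distinct_hashes = count_comparisons(n, constant_hash=False)
colliding_hashes = count_comparisons(n, constant_hash=True)
print(f"comparisons with distinct hashes:  {distinct_hashes}")
print(f"comparisons with colliding hashes: {colliding_hashes}")
```

Nothing crashes in either run. The only symptom is that the colliding version does far more work, which is exactly the kind of failure that goes unnoticed until someone measures it.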
Press X to doubt
So if abstractions can be problematic, is it worth trying to understand a topic without abstractions (to get to know cars as they really are)? No. When you dig beneath abstractions, you just find more abstractions.
- At the top of our car example is an understanding of the purpose of each component.
- Behind this abstraction lies the chemistry of combustion and the mechanics of the engine.
- Below that is mathematics/physics, modeling the forces of nature.
These layers of abstractions deepen until we reach our most basic axioms about logic and reality.
As programmers, we must view knowledge as a house of cards, made up of leaky abstractions and assumptions. We must exercise a healthy skepticism toward everything and everyone, including ourselves.
Trust but verify
A programmer should adhere to a “trust but verify” policy.
Here are some examples:
- Trust the information you are given, but check it against the documentation.
- Test your beliefs by trying to disprove them.
- You've written tests for the code you changed, and they pass on the first try. Try running the tests without your changes and see if they still pass. There may be a bug in them that causes them to always pass.
- You've refactored your code, which shouldn't be a problem. All tests still pass. Check if there are actually tests that run the code you refactored.
- You've optimized your service and are seeing the expected reduction in resource usage. Make sure your service isn't just processing fewer requests right now.
- You released your code changes, and the next day you found no issues with the service. Make sure the rollout actually happened that day and that it included your code.
- When optimizing code, always measure what you're getting. Code changes that seem “theoretically” faster may end up being slower due to factors exposed at lower levels of abstraction.
Beware of the unknown
The most terrifying epistemological problem for programmers is the “unknown unknown.”
There are…
- things you know (“knowns”)
- things you know you don't know (“known unknowns”)
- things you don't even realize you don't know (“unknown unknowns”)
These unknowns are the root of abstraction failure (and the reason why programmers can never accurately predict how long a project will take).
You may have never heard of…
- Sanitizing user input
  - If you put user-supplied strings directly into SQL queries, your service can be hacked through SQL injection attacks.
- Character encodings
  - Any text data that your code processes must use the character encoding (e.g. ASCII, UTF-8, UTF-32, etc.) that your code expects/supports.
  - Random access to a character in a text buffer can take constant time (for ASCII) or linear time (for UTF-8) depending on the character encoding…
  - You may get strange characters if you try to read text data using the wrong encoding.
- Java heap size
  - Your program may slow down due to a lack of heap memory.
  - You could solve this problem if you knew to configure a larger maximum heap size for your Java program (for example, with the `-Xmx` option).
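To make the first trap concrete, here is a minimal sketch using Python's standard `sqlite3` module with an in-memory database. The `users` table, its contents, and both lookup functions are hypothetical, made up for this illustration. One function pastes user input into the SQL text, and the other passes it as a query parameter.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: the user-supplied string becomes part of the SQL text.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn, name):
    # Parameterized: the value is passed separately from the SQL text.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
conn.commit()

# Input crafted so the WHERE clause of the unsafe query is always true.
malicious = "nobody' OR '1'='1"

leaked = find_user_unsafe(conn, malicious)
blocked = find_user_safe(conn, malicious)
print("unsafe query returned:", leaked)
print("safe query returned:  ", blocked)
```

With the unsafe version the crafted input matches every row in the table, while the parameterized version treats the same input as an ordinary, non-matching name.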
If you haven't heard of these topics before, you may not even realize that you've fallen into traps.
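The character encoding trap is just as easy to reproduce. This short sketch (the sample string is arbitrary) writes text as UTF-8 and then reads the same bytes back with the wrong encoding:

```python
text = "naïve café"

# Encode as UTF-8: the two accented characters take two bytes each.
utf8_bytes = text.encode("utf-8")

# Reading those bytes as Latin-1 does not fail; it silently produces garbage.
garbled = utf8_bytes.decode("latin-1")

# Reading them as ASCII fails loudly instead.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(f"characters: {len(text)}, UTF-8 bytes: {len(utf8_bytes)}")
print(f"decoded with the wrong encoding: {garbled!r}")
print(f"ASCII decoding raised an error: {ascii_failed}")
```

The character count and the byte count differ, which is also why indexing into a UTF-8 buffer by byte offset does not land on the character you expect.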
There is no foolproof way to spot unknown unknowns, even when they are close by, but we can look for them at least one layer of abstraction below the one we work at. Whenever a project requires learning something new, learn more than is strictly necessary. This reduces the risk of being caught out by a leaking abstraction.
When working with or learning an unfamiliar platform/language/tool/library/technology:
- Read more documentation than the bare minimum
- Watch videos
  - Conference talks tend to be of the highest quality.
- Read blog posts
- Read the source code
- Expand your understanding of the abstractions you will have to work with
  - Learn about features recently added to the programming language you are working with.
  - Explore all of a library's public features, not just the ones you use.
  - Read about every flag in a CLI tool's man page.
- Learn abstractions at least one level lower than you need.
  - Learn about your compiler's optimizations.
  - If you run a service, learn about its orchestration platform (such as Kubernetes).
  - If you work in Java, learn the JVM.
  - If you work in Python, learn the Python interpreter.
Conclusion
Abstractions are necessary because they allow us to think efficiently, but they are insidious because they can make us feel that we “know enough.” Programmers who learn only superficially will not succeed in complex projects: those for which there are no known solutions and which span multiple subject areas.
However, the ideal picture presented in this article must be measured against reality. Obviously, in a rush, we cannot take the time to study every little detail. Moreover, beginners cannot be expected to be so meticulous. Ideals must be balanced against real circumstances.
However, real circumstances must also be balanced against ideals. We must be willing to incur some short-term costs to make time for careful learning and testing, not only to write correct code but also as part of our long-term professional development.