A simple code analyzer for a program written in 8051 Assembler
The main task was to reduce the size of the program: the microcontroller I used has a small amount of memory, while the product needed a lot of functionality. This gave me the idea of using a code analyzer. A search on the Internet turned up nothing, so I had to write one myself.
I decided to share my ideas because I think someone may be able to write a better program for analyzing 8051 assembler code.
In this article I describe the main stages of the resulting analyzer. Some of these stages can also be used to analyze programs written in other languages.
Stage 1. First, you need to convert the source code of the program to the simplest possible form. It is more convenient to work with the program code from which everything unnecessary has been removed.
Example:
Function_ADD:
mov A,Peremenay1
add A,#Constanta1
mov Peremenay2,A
is converted to
code_0001: mov A,020h
add A,#030h
mov 021h,A
To do this, I decided to disassemble the program from the HEX file. This seemed much more convenient than processing the source; besides, in the compiled file the code is already placed at its addresses in memory, which will be useful later.
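As an illustration of the first half of this stage, here is a minimal sketch of loading an Intel HEX file into an address-to-byte map, which a disassembler could then walk. The function name parse_intel_hex is my own, and only data records (type 00) and the end-of-file record (type 01) are handled; the disassembly itself is not shown.

```python
def parse_intel_hex(text):
    """Return {address: byte} for the data records of an Intel HEX text."""
    memory = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        raw = bytes.fromhex(line[1:])
        # A record is: length, address (2 bytes), type, data, checksum.
        if len(raw) < 5 or len(raw) != raw[0] + 5:
            raise ValueError(f"bad record length: {line!r}")
        # All bytes of a valid record, checksum included, sum to 0 mod 256.
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"bad checksum: {line!r}")
        length, address, record_type = raw[0], (raw[1] << 8) | raw[2], raw[3]
        if record_type == 0x00:      # data record
            for offset, value in enumerate(raw[4:4 + length]):
                memory[address + offset] = value
        elif record_type == 0x01:    # end-of-file record
            break
        # Other record types (extended addressing etc.) are ignored here.
    return memory


# The three instructions from the example above, assembled at address 0:
# E5 20 = mov A,020h   24 30 = add A,#030h   F5 21 = mov 021h,A
image = parse_intel_hex(":06000000E5202430F5218B\n:00000001FF\n")
```

Because every byte comes with its address, the jump targets found later can be compared directly against instruction addresses.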
Stage 2. I created a table containing the lines of the program together with the instruction type, the operand types, the address in memory, and so on.
Example:
Line number | Label (if any) | Operation name | Operand 1 | Operand 2 | Jump target | Operand 1 type | Operand 2 type |
1|code_0001|mov|A|020h| |ACC|DIR|
2| |add|A|030h| |ACC|CONST|
3| |mov|021h|A| |DIR|ACC|
The more columns are filled in with the different possible attributes, the easier the analysis will be.
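One possible way to build such a table row is sketched below. The names Row, operand_type and parse_line are hypothetical, and so are the type codes REG, IND and LABEL; only ACC, DIR and CONST come from the table above. The jump target column is left out, and only the first two operands are kept, so three-operand instructions such as CJNE would need extra handling.

```python
import re
from dataclasses import dataclass


@dataclass
class Row:
    line_no: int
    label: str
    mnemonic: str
    op1: str
    op2: str
    type1: str
    type2: str


def operand_type(operand):
    """Classify one operand of a simplified (disassembled) instruction."""
    op = operand.strip().upper()
    if op == "":
        return ""
    if op == "A":
        return "ACC"
    if op.startswith("#"):
        return "CONST"
    if op in ("@R0", "@R1"):
        return "IND"      # assumed code for indirect addressing
    if re.fullmatch(r"R[0-7]", op):
        return "REG"      # assumed code for registers R0..R7
    if re.fullmatch(r"[0-9][0-9A-F]*H", op):
        return "DIR"
    return "LABEL"        # assumed code for anything else


def parse_line(line_no, text):
    """Split 'label: mnemonic op1,op2' into a Row."""
    label = ""
    if ":" in text:
        label, text = text.split(":", 1)
    parts = text.strip().split(None, 1)
    mnemonic = parts[0].lower() if parts else ""
    operands = [o.strip() for o in parts[1].split(",")] if len(parts) > 1 else []
    operands += ["", ""]  # pad so that op1 and op2 always exist
    op1, op2 = operands[0], operands[1]
    return Row(line_no, label.strip(), mnemonic, op1, op2,
               operand_type(op1), operand_type(op2))


row = parse_line(1, "code_0001: mov A,020h")
```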
Stage 3. Now for the analysis itself. So far I have only analyzed pairs of adjacent instructions. The algorithm for such an analysis is quite simple: the instructions are checked against a set of rules.
Example:
1. Substitution of procedure call commands
LCALL code_0001
RET
can be replaced with the command
LJMP code_0001
This saves 1 byte: LCALL plus RET take 3 + 1 bytes, while LJMP takes 3.
2. Replacement of assignment commands
MOV A,#CONSTANTA
MOV @R0,A
can be replaced with the command
MOV @R0, #CONSTANTA
or
MOV A,@R0
MOV RAM,A
can be replaced with the command
MOV RAM,@R0
Both replacements save 1 byte, but they are valid only if the value left in A is not used afterwards.
3. Replacing jump instructions
JNZ label1
SJMP label2
label1:
.................
label2:
can be replaced with the command
JZ label2
label1:
..........
label2:
There are many such patterns. When a program is written, it is often written not to save space but so that it remains understandable later.
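Rules 1 and 3 above could be sketched roughly as follows. This is only an illustration under my own assumptions: the program is a list of (label, mnemonic, operands) tuples, an instruction is treated as a jump target only if it carries a label, and the JC/JNC pair is added next to the JZ/JNZ pair from the example. Rule 2 is left out because it also needs to know whether A is used later.

```python
# Conditional jumps and their opposites (JNZ -> JZ is the case from rule 3).
INVERSE = {"JNZ": "JZ", "JZ": "JNZ", "JC": "JNC", "JNC": "JC"}


def peephole(program):
    """Apply the two-instruction rules once and return the new program."""
    out = []
    i = 0
    while i < len(program):
        label, op, args = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        # The second instruction may be dropped only if nothing jumps to it,
        # approximated here as "it has no label".
        if nxt is not None and nxt[0] == "":
            # Rule 1: LCALL x / RET  ->  LJMP x
            if op == "LCALL" and nxt[1] == "RET":
                out.append((label, "LJMP", args))
                i += 2
                continue
            # Rule 3: Jcc l1 / SJMP l2 / l1:  ->  inverse Jcc l2 / l1:
            after = program[i + 2] if i + 2 < len(program) else None
            if (op in INVERSE and nxt[1] == "SJMP"
                    and after is not None and after[0] == args[0]):
                out.append((label, INVERSE[op], nxt[2]))
                i += 2
                continue
        out.append(program[i])
        i += 1
    return out


program = [
    ("", "JNZ", ["label1"]),
    ("", "SJMP", ["label2"]),
    ("label1", "NOP", []),
    ("label2", "LCALL", ["code_0001"]),
    ("", "RET", []),
]
shorter = peephole(program)
```

Here the five instructions shrink to three: JZ label2, the NOP at label1, and LJMP code_0001 at label2.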
Stage 4. Count how many references there are to each jump address; the task is to find addresses that are referenced only once. We also look at which instruction precedes the jump address in the program. Based on this, we add further data about the jumps to the table.
Example:
MOV A,030h
MOV 020h,A
ret
code_0001: MOV A,020h
This means it is possible to move the part of the program starting at the jump address “code_0001” closer to the instruction that jumps to it, or to remove the jump instruction altogether and move that part of the program into its place.
There are also addresses with zero references: the address can only be reached through a jump instruction (in the example, the instruction before “code_0001” is “RET”, so execution cannot fall through to it), but no such jump exists in the program.
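The counting in this stage might look like the sketch below. It again assumes a list of (label, mnemonic, operands) tuples and takes the last operand of a jump or call as its target. The two mnemonic sets are my own selection, and indirect jumps (JMP @A+DPTR) as well as the reset and interrupt vectors are not modelled.

```python
from collections import Counter

JUMPS = {"LJMP", "AJMP", "SJMP", "LCALL", "ACALL", "JZ", "JNZ",
         "JC", "JNC", "JB", "JNB", "JBC", "CJNE", "DJNZ"}
# After these instructions execution never continues with the next one.
NO_FALLTHROUGH = {"RET", "RETI", "LJMP", "AJMP", "SJMP", "JMP"}


def label_stats(program):
    """Return {label: (reference_count, reachable_by_fallthrough)}."""
    refs = Counter()
    for _, op, args in program:
        if op in JUMPS and args:
            refs[args[-1]] += 1          # the target is the last operand
    stats = {}
    prev_op = None                       # the first instruction has no predecessor
    for label, op, _ in program:
        if label:
            falls_in = prev_op is not None and prev_op not in NO_FALLTHROUGH
            stats[label] = (refs[label], falls_in)
        prev_op = op
    return stats


program = [
    ("", "MOV", ["A", "030h"]),
    ("", "MOV", ["020h", "A"]),
    ("", "RET", []),
    ("code_0001", "MOV", ["A", "020h"]),
    ("", "LJMP", ["code_0001"]),
]
stats = label_stats(program)
```

A label with one reference and no fall-through is a candidate for moving; a label with zero references and no fall-through is a candidate for removal, subject to the limitations listed above.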
Stage 5. Count how many times each cell of data memory (RAM) is used, and how it is used (as source data for an instruction or as a destination that the instruction modifies).
For example, it may turn out that a memory cell is only written and is never used anywhere as source data. The instructions that modify such a cell can be removed.
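A rough sketch of this count for directly addressed cells is given below. The read/write classification is deliberately simplified and is my own assumption: the first operand of MOV counts as a write, the first operand of a few arithmetic and logic instructions counts as both a read and a write, and everything else counts as a read. Addresses from 80h upwards are skipped because they are special function registers, where a write has side effects. Cells accessed indirectly through @R0 or @R1 are not seen at all, so the result is only a list of candidates.

```python
import re
from collections import defaultdict

DIRECT = re.compile(r"[0-9][0-9A-Fa-f]*[hH]")
READ_MODIFY_WRITE = {"ADD", "ADDC", "SUBB", "ANL", "ORL", "XRL", "INC", "DEC"}


def ram_usage(program):
    """Return {address: [reads, writes]} for direct addresses below 80h."""
    usage = defaultdict(lambda: [0, 0])
    for _, op, args in program:
        for position, arg in enumerate(args):
            if not DIRECT.fullmatch(arg):
                continue                  # not a direct address (A, #const, ...)
            address = int(arg[:-1], 16)
            if address >= 0x80:
                continue                  # special function register
            if position == 0 and op == "MOV":
                usage[address][1] += 1    # pure destination
            elif position == 0 and op in READ_MODIFY_WRITE:
                usage[address][0] += 1
                usage[address][1] += 1
            else:
                usage[address][0] += 1    # source, or unknown: assume a read
    return usage


def write_only(usage):
    """Addresses that are written at least once and never read."""
    return sorted(a for a, (reads, writes) in usage.items()
                  if reads == 0 and writes > 0)


program = [
    ("", "MOV", ["A", "030h"]),
    ("", "MOV", ["020h", "A"]),
    ("", "MOV", ["021h", "A"]),
    ("", "ADD", ["A", "021h"]),
    ("", "MOV", ["090h", "A"]),
]
candidates = write_only(ram_usage(program))
```

In this example only cell 20h is reported: it is written once and never read, while 21h is read by the ADD and 90h is skipped as a special function register.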
Stage 6. Assembling the source code. After assembly, it is possible to check the number of bytes between a jump instruction and its target address, for example in order to move part of the code closer to the jump instruction and replace LJMP with SJMP.
Using this analysis, I reduced an 8-kilobyte program by 300 bytes.
Future plans: analyze the code based on what it does, i.e. determine what a particular part of the program does and how it could be replaced.
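The distance check can be sketched as follows. SJMP is 2 bytes long and takes a signed 8-bit offset counted from the address of the following instruction, so the target must lie between -128 and +127 bytes from there. The function name can_shorten_ljmp is hypothetical, and the sketch assumes both addresses come from the current layout, in which the jump is still a 3-byte LJMP and nothing else between the jump and its target changes size.

```python
def can_shorten_ljmp(jump_address, target_address):
    """True if an LJMP at jump_address could be replaced with SJMP."""
    if target_address > jump_address:
        # The jump shrinks from 3 to 2 bytes, so code after it moves
        # one byte closer.
        target_address -= 1
    # The offset is relative to the instruction following the 2-byte SJMP.
    offset = target_address - (jump_address + 2)
    return -128 <= offset <= 127


forward_ok = can_shorten_ljmp(0x0100, 0x0182)   # offset becomes +127
backward_ok = can_shorten_ljmp(0x0100, 0x0082)  # offset becomes -128
```

Each successful replacement saves 1 byte and shifts the following code, so in practice the check would be repeated after every change.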