What an engineer can do when cornered (or how DisplayPort works in the Aspeed AST2600)

Hi all!

My name is Valentin Alferov, I am the head of the marketing department at E-Flops. In this article, I want to share with you the experience of our programmers, who told me this story in detail and demonstrated the results.

Given:
A server board with the latest chip from Aspeed. The board has two video connectors wired directly to the chip: VGA and DisplayPort. Video comes out beautifully through the ancient analog VGA connector, but the modern, fast, digital DP refuses to work. Not that it is completely dead: the picture just flickers on rarely and unstably.

Task:
1. Fix the image output on DisplayPort.
2. Have it done by yesterday.

You may ask, why did we need DP? All servers have always worked via VGA, and no one suffered from it.

The answer is simple: we decided to make things better and more convenient. VGA is ancient, and by now it is installed only because “it has always been installed, where would we be without it?” We are building a NEW server platform for years to come. To be fair, we have not abandoned VGA completely: technically it is still available simultaneously with DP (we will tell you how to use VGA another time). But monitors without a VGA connector are becoming more and more common, and in the future no monitor will have one. DP supports hot-plug. DP supports high resolutions and frame rates. And finally, DP is supported by the ASPEED chip. Since the hardware allows it, why not? And for old-timers, DP-to-VGA adapters are still available. The main thing is not to get carried away and end up with something like this:

Three minds gathered in the programmers' cozy little room:
Mikhalych, a seasoned systems engineer whose stern look alone makes the transistor switches inside the chips snap to the required levels.
Viktorych, a young, energetic engineer with a background in various fields, capable of powering a light bulb with his enthusiasm alone. He is not really old enough to go by his patronymic, but since the trend has set in, it would not be right to lag behind.
Kuzmich, our spiritual leader and mentor, who pushes our startup's cart with his belly and his authority straight off the cliff toward a bright future. “You can't refuse Kuzmich!” (c).

What we had, besides the above:
• A datasheet for the AST2600, found with great difficulty on the Internet and not the latest revision. It describes most registers, but without much detail (there was no errata, although there were references to it).
• The Graphics User Guide, which was relatively easy to find. On the one hand it adds some DP knobs that you can turn, but on the other hand it does not really explain their purpose (along the lines of: if your DP does not work, dump these memory addresses and contact our technical support).
• Technical support, which for obvious reasons will not want to talk to you right now.
• A representative of the hardware team, announcing to the entire laboratory in unprintable terms what he thought of all programmers, and that the DP port did not work only because of the “crooked” settings of our software.
It sounded like a challenge, and we accepted it.

Let's go!

Having analyzed the docs and the separate binary for DP, Mikhalych assumed that all processing happens in a separate core, with an architecture different from ARM and with access to the shared bus. The 32 registers mentioned in the datasheet, mapped into the shared address space, hinted that these are the registers of this particular third-party core.

“Viktorych, try opening this binary in a widely used disassembler under different architectures. Maybe it will recognize one of them?” Kuzmich set the task.

Viktorych was silent for half an hour, then an answer was heard:

“No, it looks like this is not an executable file. I tried a dozen common architectures, and it all comes out as gibberish. And judging by the hex editor, the values in large stretches of the file repeat very often or are very close to each other. Most likely it is some kind of config.”

“It is an executable binary, I am telling you for sure!” Mikhalych would not calm down. “I am ready to take it apart and write my own disassembler. I just need time and the go-ahead from the boss.”

After two days of other unsuccessful experiments, Kuzmich gave Mikhalych the go-ahead for the research. Literally a couple of days later, Mikhalych produced the first intermediate results. He had parsed about 30% of the bytecode, and it really did begin to take shape as a listing of executable code, in which about 70% of the commands initialize registers and memory areas (it was the repetition of practically the same commands that led Viktorych to mistake the file for a config).

“Mikhalych, how did you crack this puzzle? What gave you a foothold?” Viktorych asked casually.

“Aspeed has a built-in hardware debugger specifically for DP. You can give it commands directly from U-Boot. There is a single-step mode, a breakpoint, and the ability to view the contents of registers. The single-step mode did not work very well: it skipped commands and showed incorrect values. But the breakpoint worked reliably! I used it to pick the code apart.”

Viktorych hooted, jumped at the computer and began to vigorously click something with the mouse. Soon a joyful cry was heard:

“Aha!!! Mikhalych, throw away your disassembler, I think I found it.”

Everyone gathered around Viktorych's desk, but the joy was premature. Although the instructions looked very similar to the WDC 65816, alas, it was not an exact match.
We went on collecting information. Viktorych analyzed the intermediate assembly listing from Mikhalych, while Mikhalych continued his research on the remaining unclear commands.

We've arrived, or what Mikhalych found out

And he found out that:
• machine instructions all have the same size: 4 bytes;
• when commands are executed in single-step mode, changes in the registers are visible;
• there are 31 general-purpose registers in total (r1-r31); the registers are 32-bit, although the controller itself is 16-bit;
• there is a pseudo-register r0, which is always equal to zero; r31 is a link register; r29 is possibly a stack pointer, since it is initialized to the end of memory and is not explicitly used anywhere else;
• the program counter (PC) exists separately;
• the system uses interrupts;
• the address space of the control program (CP) of the DP controller is mapped to 0x18020000 of the main core and has a size of 16 KB;
• from the point of view of the DP controller, the CP lives in the address range [0x0000, 0x4000);
• the DP controller register space is mapped to address 0x18010000 and has a size of 4 KB;
• the address space of the CP data is mapped to address 0x18000000 and has a size of 4 KB;
• during the execution of the control program, r20 is always equal to 0x1e6eb000 (as a constant), which is the base address of the DP control registers;
• during the execution of the control program, r9 is always equal to 0x18000000, the base address for mapping DPCD and RAM for the control program.
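The address windows from this list can be collected into a small lookup table. The following Python sketch is only an illustration: the region names and the helpers `cp_local_to_host` and `region_of` are hypothetical, and only the base addresses and sizes come from the findings above.

```python
# Address windows of the DP controller as seen from the main core.
# Bases and sizes come from the list above; the names are made up for this sketch.
REGIONS = {
    "cp_code": (0x18020000, 16 * 1024),  # control program (CP) code
    "dp_regs": (0x18010000, 4 * 1024),   # DP controller register space
    "cp_data": (0x18000000, 4 * 1024),   # CP data (DPCD and RAM)
}


def cp_local_to_host(local_addr):
    """Translate a CP-local code address into the main core's address space."""
    base, size = REGIONS["cp_code"]
    if not 0 <= local_addr < size:
        raise ValueError(f"0x{local_addr:x} is outside the CP code window")
    return base + local_addr


def region_of(host_addr):
    """Return the name of the window containing host_addr, or None."""
    for name, (base, size) in REGIONS.items():
        if base <= host_addr < base + size:
            return name
    return None
```

For example, `cp_local_to_host(0x80)` gives 0x18020080, the place where the main core would see the instruction that the DP controller fetches from its local address 0x80.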

Mikhalych's further logic was as follows.

If you look at the contents of the DP binary dump, the beginning is like this:

0xa0000080 0xa0000080 0xa0000080 0xa00035dc
0xa000363c 0xa000369c 0xa0000080 0xa0000080
0xa0000080 0xa0000080 0xa0000080 0xa0000080
0xa0000080 0xa0000080 0xa0000080 0xa0000080
0xa0000080 0xa0000080 0xa0000080 0xa0000080
0xa0000080 0xa0000080 0xa0000080 0xa0000080
0xa0000080 0xa0000080 0xa0000080 0xa0000080
0xa0000080 0xa0000080 0xa0000080 0xa0000080
0x2a801e6e 0x5694b000 0x29201800 0x55290000
0xdc090afc 0x54c00104 0x54e00000 0x05093800
0xdc080000 0x0ce70004 0x90e6fff4 0xdc090200
0xdc090204 0xdc090600 0x54c0beef 0xdcc90604
0x54c009fc 0x54e00800 0x05093800 0xdc080000
0x0ce70004 0x90e6fff4 0x54c00d30 0x54e00d00

It is clear that the first 32 four-byte words differ from the following ones. It is logical to assume that the interrupt vector table is located at the beginning, and then the usual code.
It follows that the command 0xa0000080 is similar to an unconditional jump to address 0x80 – since the “normal” code begins at this address.
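This guess about the vector table is easy to try out on the 32 words shown in the dump. The Python sketch below rebuilds that table and lists the entries that do not jump to 0x80. It assumes that the jump target sits in the low 26 bits of the word, which is a guess that fits a 6-bit opcode; the helper name `custom_vectors` is hypothetical.

```python
# The first 32 words of the dump: every entry is 0xa0000080
# except for entries 3, 4 and 5.
VECTORS = [0xa0000080] * 32
VECTORS[3] = 0xa00035dc
VECTORS[4] = 0xa000363c
VECTORS[5] = 0xa000369c

TARGET_MASK = 0x03ffffff  # assumption: low 26 bits hold the jump target


def custom_vectors(vectors, default_target=0x80):
    """Return {vector index: target} for entries that do not jump to the default."""
    result = {}
    for index, word in enumerate(vectors):
        target = word & TARGET_MASK
        if target != default_target:
            result[index] = target
    return result


# The table occupies 32 words of 4 bytes, so the code after it starts at 0x80.
CODE_START = len(VECTORS) * 4
```

With these inputs, `custom_vectors(VECTORS)` returns three entries pointing at 0x35dc, 0x363c and 0x369c, and `CODE_START` is 0x80, which matches the address that all other vectors jump to.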

The first two instructions after the interrupt table have the values 0x2a801e6e and 0x5694b000. Look at the lower 16 bits of these instructions: together they make up 0x1e6eb000, and this is the address of the DP controller registers!!!
Let's assume that the lower 16 bits are allocated for data, and the upper 16 bits indicate the type of operation and the numbers of the registers it applies to. Since there are 32 registers, 5 bits are needed to store a register number.

Let's remember that r20 is always 0x1e6eb000 and look for the number 20 (binary 10100) in the high bits.

Let's look at the most significant bits in binary code:
0x2a80 = 0010 1010 1000 0000 = 001010 10100 00000 (op rD rS) = 0x0a 20 0
0x5694 = 0101 0110 1001 0100 = 010101 10100 10100 (op rD rS) = 0x15 20 20
op is the operation code (6 bits)
rD – target register (5 bits)
rS – source register (5 bits)
It turns out that instruction 10 (0x0a) is an OR (or an ADD) of the data into the upper 16 bits:
or.h r20, r0, 0x1e6e # r20=0x1e6e0000
Instruction 21 (0x15) is an OR (or an ADD) of the data into the lower 16 bits:
or.l r20, r20, 0xb000 # r20=0x1e6eb000
Instruction 40 (0x28) is the 0xa0000080 found in the interrupt table, a jump to the specified address:
jump 0x80
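The field split above can be replayed in a few lines of Python. This is a minimal sketch, not the real controller: it treats opcodes 10 and 21 as plain OR (the text leaves open whether they are OR or ADD, and for these operands both give the same result), and the function names `decode` and `run` are hypothetical.

```python
def decode(word):
    """Split a 32-bit instruction into (op, rD, rS, data) using the 6/5/5/16 layout."""
    op = (word >> 26) & 0x3f
    rd = (word >> 21) & 0x1f
    rs = (word >> 16) & 0x1f
    data = word & 0xffff
    return op, rd, rs, data


def run(words):
    """Execute a sequence of or.h / or.l instructions and return the register file."""
    regs = [0] * 32
    for word in words:
        op, rd, rs, data = decode(word)
        if op == 10:      # or.h: data goes into the upper 16 bits
            value = regs[rs] | (data << 16)
        elif op == 21:    # or.l: data goes into the lower 16 bits
            value = regs[rs] | data
        else:
            raise ValueError(f"opcode {op} is not handled in this sketch")
        if rd != 0:       # r0 is a pseudo-register that always reads as zero
            regs[rd] = value & 0xffffffff
    return regs


# The first four instructions after the vector table in the dump.
regs = run([0x2a801e6e, 0x5694b000, 0x29201800, 0x55290000])
```

After this run, `regs[20]` is 0x1e6eb000 and `regs[9]` is 0x18000000, the two constants that were observed in the hardware debugger.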

These assumptions are not difficult to check: write small tests directly in machine code and place them at the beginning of the regular code (right after the vector table). Just do not forget to end each test with an endless loop. Fortunately, we already have a command for unconditional jumps!!!
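Such a test can be assembled by hand, or with a tiny encoder like the Python sketch below. It assumes the 6/5/5/16 field layout described earlier and a jump target stored as a byte address in the low 26 bits; the helpers `encode_imm`, `encode_jump` and `build_test` are hypothetical names, not part of any Aspeed tool.

```python
CODE_START = 0x80  # first address after the 32-entry vector table

OP_OR_H = 10   # data goes into the upper 16 bits
OP_OR_L = 21   # data goes into the lower 16 bits
OP_JUMP = 40   # unconditional jump


def encode_imm(op, rd, rs, data):
    """Pack an instruction with a register pair and 16 bits of data."""
    return ((op & 0x3f) << 26) | ((rd & 0x1f) << 21) | ((rs & 0x1f) << 16) | (data & 0xffff)


def encode_jump(addr):
    """Pack an unconditional jump (assumption: byte address in the low 26 bits)."""
    return (OP_JUMP << 26) | (addr & 0x03ffffff)


def build_test(reg, value):
    """Load a 32-bit constant into reg, then spin forever on a jump to itself."""
    words = [
        encode_imm(OP_OR_H, reg, 0, value >> 16),
        encode_imm(OP_OR_L, reg, reg, value & 0xffff),
    ]
    loop_addr = CODE_START + 4 * len(words)  # address of the jump itself
    words.append(encode_jump(loop_addr))
    return words


test_words = build_test(20, 0x1e6eb000)
```

The first two words come out as 0x2a801e6e and 0x5694b000, the same values as in the dump, and the third is a jump to its own address 0x88. After loading such a test, the register can be inspected through the debugger breakpoint.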

Further investigation showed that the command system is simple:
• the operation code is always 6 bits;
• it is possible to execute subroutines: r31 is used as a link register, that is, the return address is automatically written there;
• there is a stack, and the push and pop commands are supported;
• there are ret and iret commands for returning from a subroutine and from an interrupt;
• there is most likely no flag register, because there is no direct reference to it in the command codes; the result is analyzed directly in the conditional commands.

The list of commands that Mikhalych eventually managed to decode (decimal codes, mnemonics – so that it would be clear):
1 add.r regD, regS1, regS2
2 add.c regD, regS, data
3 add.u regD, regS, data
6 sub.i regD, regS, data
7 sub.u regD, regS, data
10 or.h regD, regS, data
11 nop data
12 push rS
13 pop rD
16 and.r regD, regS1, regS2
17 and.u regD, regS, data
18 orn.r regD, regS1, regS2
20 or.r regD, regS1, regS2
21 or.l regD, regS, data
26 sl.d regD, regS, data
30 sr.d regD, regS, data
31 sr.r regD, regS1, regS2
32 je regD, regS, rel_offs
33 jz regD, regS, rel_offs
34 jme regD, regS, rel_offs
35 jm regD, regS, rel_offs
36 jle regD, regS, rel_offs
37 jl regD, regS, rel_offs
38 jne regD, regS, rel_offs
39 jnz regD, regS, rel_offs
40 jump addr
41 call addr
42 ret rS
44 iret
49 ldr.1 regD=[regS+offs] // accordingly, loading 1 byte
51 ldr.2 regD=[regS+offs] // loading 2 bytes
52 ldr.4 regD=[regS+offs] // loading 4 bytes
53 str.1 [regS+offs]=regD // accordingly, storing 1 byte
54 str.2 [regS+offs]=regD // storing 2 bytes
55 str.4 [regS+offs]=regD // storing 4 bytes
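With this table, a few words from the dump can be turned back into text. The Python sketch below handles only the opcodes that occur in the seven words starting at the fifth instruction after the vector table. Several details are assumptions of the sketch rather than findings from the article: the second source register of `add.r` is taken from bits 15..11, `rel_offs` is treated as a signed 16-bit value, and the store opcode 55 is printed as `str.4`.

```python
IMMEDIATE = {3: "add.u", 10: "or.h", 21: "or.l"}  # subset of the decoded table


def disasm(word):
    """Render one 32-bit instruction word as text (subset of opcodes only)."""
    op = (word >> 26) & 0x3f
    rd = (word >> 21) & 0x1f
    rs = (word >> 16) & 0x1f
    data = word & 0xffff
    if op == 1:    # add.r: assumption, second source register in bits 15..11
        return f"add.r r{rd}, r{rs}, r{(word >> 11) & 0x1f}"
    if op in IMMEDIATE:
        return f"{IMMEDIATE[op]} r{rd}, r{rs}, 0x{data:04x}"
    if op == 36:   # jle: assumption, offset is a signed 16-bit value
        offs = data - 0x10000 if data & 0x8000 else data
        return f"jle r{rd}, r{rs}, {offs}"
    if op == 40:
        return f"jump 0x{word & 0x03ffffff:x}"
    if op == 55:   # 4-byte store
        return f"str.4 [r{rs}+0x{data:x}]=r{rd}"
    return f".word 0x{word:08x}"  # anything this sketch does not know


WORDS = [0xdc090afc, 0x54c00104, 0x54e00000, 0x05093800,
         0xdc080000, 0x0ce70004, 0x90e6fff4]
listing = [disasm(w) for w in WORDS]
```

The resulting listing is:

```
str.4 [r9+0xafc]=r0
or.l r6, r0, 0x0104
or.l r7, r0, 0x0000
add.r r8, r9, r7
str.4 [r8+0x0]=r0
add.u r7, r7, 0x0004
jle r7, r6, -12
```

One possible reading is a loop that writes zeros (r0) into the data area starting at r9, four bytes at a time, with the jump offset of -12 pointing three instructions back to `add.r`. That interpretation is a guess based on this sketch, not something confirmed on hardware.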

Let's go again, only this time quickly!

Then, over the course of three days, only periodic shouts were heard from Viktorych:

“Oh, I found a function that changes registers by masks. It is used almost everywhere.”
“Yep, all three interrupt handlers deal with different variations of the HPD (Hot-Plug Detect) signal.”
“Oops, here comes the exchange over AUX... And here is the second one.”
“Ah, this is where link training happens. I won't go into detail, it's specific, but the meaning is clear.”
“Look, I found the setting for the number of lanes and the signal speed.” After that, Viktorych produced a patch to reduce the link rate from 2.7 to 1.62 Gbps, and Mikhalych built the firmware. After DP started, the oscilloscope really did show a 1.62 Gbps signal, and the picture began to flash up on the monitor more often.

As a result, after a few days, almost the entire firmware dump had been translated into digestible code. The meaning of the key values 0xdead, 0xcafe and 0xaced written to the configuration space finally became completely clear. So did the logic of the counters in the DPCD region, and many other interesting things. We attach the files, so you can admire them yourself.

Bloody results

We beat DisplayPort after all. And the problem was not with the programmers.

About the command system of the ASPEED DP MCU: we assume that this assembly language has its roots somewhere in the Motorola 68000 family, which also had 32-bit registers and a 16-bit data bus. But our knowledge is not limitless, so welcome to the comments. Meanwhile, we have already started working on other tasks.

THAT'S IT! Stay tuned for new stories.
