A real gaming router

We're playing GTA: Vice City on a TP-Link TL-WDR4900 wireless router.

What is it?

This is a TP-Link wireless router equipped with an external AMD Radeon GPU. The GPU connects via PCIe, the router runs Debian Linux, and you can play games on it.

What is the highlight of this router?

TP-Link's TL-WDR4900 v1 is a very interesting WiFi router: instead of the typical MIPS or ARM CPUs found in ordinary WiFi routers, the WDR4900 features a PowerPC CPU from NXP.

The NXP/Freescale QorIQ P1014 used in the WDR4900 is a 32-bit PowerPC processor. It provides a full 36-bit physical address space, is very fast (for a 2013 router), and has excellent PCIe controllers.

These routers quickly gained popularity in the OpenWrt and Freifunk communities, which is not surprising for a cheap device powered by such a high-performance CPU. The 2.4 GHz and 5 GHz WiFi chipsets (made by Qualcomm Atheros) are connected to the CPU via PCIe.

PCIe Issues in Embedded Systems

PCIe devices are transparently mapped into the host CPU's memory space. The PCIe controller on the host CPU then forwards every access that hits a specific memory region to the PCIe device responsible for that region.

Each PCIe card can provide several such mappings, called BARs (Base Address Registers). The maximum size of these mappings varies from CPU to CPU.

Until recently, even the Raspberry Pi CM4 could allocate only 64 MiB of its address space to a graphics card.
Many other devices (such as MIPS-based router CPUs) are limited to just 32 MiB (or less).

Practically all modern graphics cards require at least 128 MiB of BAR space on the host system; that is how much the driver needs to communicate with the card. Some very new cards, notably Intel Arc, even require "Resizable BAR", a marketing term for very large 64-bit BAR regions that let the host map the card's entire VRAM (12+ GiB) into its memory space.
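As a rough illustration of what a BAR looks like from software: on Linux, each BAR of a device shows up in sysfs, its file size equals the BAR size, and it can be mmap()ed. A minimal sketch, assuming the Radeon sits at 0000:03:00.0 (the bus address U-Boot reports later in this article):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* BAR0 of the GPU; the PCI address is an assumption for this sketch. */
    const char *bar0 = "/sys/bus/pci/devices/0000:03:00.0/resource0";
    int fd = open(bar0, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    printf("BAR0 size: %lld MiB\n", (long long)st.st_size >> 20);

    /* Map the BAR; loads/stores through p become PCIe transactions. */
    volatile uint32_t *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    printf("first 32-bit word: 0x%08x\n", (unsigned)p[0]);

    munmap((void *)p, st.st_size);
    close(fd);
    return 0;
}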

Even with sufficient BAR space, PCIe device memory may not behave quite the way it does on a conventional (for example, x86) CPU. This is why so many problems arose when people tried connecting GPUs to the Raspberry Pi.

Similar problems (related to memory ordering, caching, nGnRE mappings, and alignment) occur even on big Arm64 server processors, and because of them people have to patch the kernel and come up with workarounds.
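To illustrate the ordering part of the problem (a generic sketch, not code from this project): on weakly ordered CPUs such as PowerPC and Arm64, two plain stores may become visible to a PCIe device in the opposite order, so drivers have to fence between filling a DMA descriptor and ringing the doorbell register:

#include <stdint.h>

/* Hypothetical DMA descriptor; the names are ours, for illustration only. */
struct dma_desc {
    uint64_t addr;   /* physical address of the buffer */
    uint32_t len;    /* buffer length in bytes */
};

void submit(volatile struct dma_desc *desc, volatile uint32_t *doorbell,
            uint64_t buf_phys, uint32_t len)
{
    desc->addr = buf_phys;
    desc->len  = len;

    /* Without this full barrier (a kernel driver would use wmb()/eieio
       on PowerPC), the device could observe the doorbell write before
       the descriptor is complete and DMA from a stale descriptor. */
    __sync_synchronize();

    *doorbell = 1;   /* kick the device */
}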

Retrofitting a miniPCIe slot

Out of the box, the router does not expose PCIe to any external devices. To connect the graphics card, we developed our own miniPCIe adapter board, wired to the router with enameled copper wire.

The PCIe traces leading from the CPU to one of the Atheros chipsets had to be cut and rerouted to a miniPCIe slot.

U-Boot reports that the AMD Radeon HD 7470 graphics card is attached to PCIe2:

U-Boot 2010.12-svn19826 (Apr 24 2013 - 20:01:21)

CPU:   P1014, Version: 1.0, (0x80f10110)
Core:  E500, Version: 5.1, (0x80212151)
Clock Configuration:
       CPU0:800  MHz,
       CCB:400  MHz,
       DDR:333.333 MHz (666.667 MT/s data rate) (Asynchronous), IFC:100  MHz
L1:    D-cache 32 kB enabled
       I-cache 32 kB enabled
Board: P1014RDB
SPI:   ready
DRAM:  128 MiB
L2:    256 KB enabled
Using default environment

PCIe1: Root Complex of mini PCIe Slot, x1, regs @ 0xffe0a000
  01:00.0     - 168c:abcd - Network controller
PCIe1: Bus 00 - 01
PCIe2: Root Complex of PCIe Slot, x1, regs @ 0xffe09000
  03:00.0     - 1002:6778 - Display controller
  03:00.1     - 1002:aa98 - Multimedia device
PCIe2: Bus 02 - 03
In:    serial
Out:   serial
Err:   serial
Net:   initialization for Atheros AR8327/AR8328
eTSEC1
auto update firmware: is_auto_upload_firmware = 0!
Autobooting in 1 seconds
=>

Installing Debian Linux

Installing OpenWrt on the router immediately gives us a working kernel and userspace, but OpenWrt's userspace is quite limited (BusyBox, musl libc, no libraries for graphics/games, etc.).

The default OpenWrt kernel was also missing the AMD graphics drivers. We solved the driver problem by compiling our own OpenWrt tree with the additional modules enabled, then booted this kernel over TFTP directly from U-Boot:

setenv ipaddr 10.42.100.4
tftpboot 0x2000000 10.42.100.60:wdr4900-nfs-openwrt.bin
bootm 0x2000000

Fortunately, Debian provides an architecture for exactly this case: the "powerpcspe" port, designed for this type of CPU (e500/e500v2). On a system with statically compiled QEMU user-mode binaries and properly configured binfmt handlers, you can use Debian's debootstrap tool to bootstrap a userspace from a mirror:

sudo QEMU_CPU=e500v2 debootstrap --exclude=usr-is-merged --arch=powerpcspe --keyring ~/gamingrouter/debian-ports-archive-keyring-removed.gpg unstable "$TARGET" https://snapshot.debian.org/archive/debian-ports/20190518T205337Z/

debootstrap will chroot into the newly minted root filesystem and simply execute the binaries there (post-install hooks, etc.). This works transparently thanks to qemu-user-static, which takes care of executing the PowerPCSPE binaries on the amd64 host machine. The extra QEMU_CPU=e500v2 environment variable tells QEMU which CPU to emulate.

The amdgpu driver (modern AMD GPUs)

We carried out our first experiments on an AMD Radeon RX 570 GPU using the modern amdgpu graphics driver. The result was very strange artifacts and, so far, no usable image.

After a bit of debugging, and after finally installing 32-bit x86 (i386) Linux on another computer, we noticed that the same problem occurs on other 32-bit platforms too, even regular Intel PCs. Apparently, amdgpu has some incompatibility with 32-bit platforms.

We opened a bug report about this, but for some reason it is not being actively investigated yet.

The radeon driver (legacy AMD GPUs)

But with an AMD Radeon HD 7470 card and the older radeon driver, everything suddenly just worked.

Endianness problems

For this platform we compiled reVC (a reverse-engineered version of GTA: Vice City whose source code is publicly available). This required preparing our own builds of premake, glfw3, glew and reVC itself.

root@gaming-router:/home/user/GTAVC# ./reVC
Segmentation fault

Oops 🙂

So there was still work to do. It turns out that the game and its rendering engine (at least in the decompiled version) are not at all prepared for big-endian operation. When loading game resources, structures (containing offsets, sizes, counts, coordinates, etc.) are read directly into memory. The data in these structures is little-endian, but it lands on a big-endian platform. Because of this, the game tries to access memory at absurd offsets and crashes almost immediately.
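A minimal sketch of this failure mode (the struct layout and values here are invented for illustration, not taken from reVC):

#include <endian.h>   /* le32toh(); glibc */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A made-up resource header of the kind the game reads straight
   from disk into memory. */
struct chunk_header {
    uint32_t type;
    uint32_t size;    /* stored little-endian in the game files */
};

int main(void)
{
    /* Bytes as they sit in the file: type = 0x10, size = 1024. */
    unsigned char raw[8] = { 0x10, 0x00, 0x00, 0x00,
                             0x00, 0x04, 0x00, 0x00 };
    struct chunk_header h;
    memcpy(&h, raw, sizeof h);

    /* On the big-endian e500 this prints 262144, not 1024: the game
       then seeks/indexes with that absurd value and crashes. */
    printf("naive size: %u\n", (unsigned)h.size);

    /* The fix applied all over reVC/librw: byte-swap on load. */
    printf("fixed size: %u\n", (unsigned)le32toh(h.size));
    return 0;
}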

We spent several days patching the game and the rendering engine librw so that the code works correctly on big-endian machines. There were more than 100 places in the source code that needed patching; the patches looked something like this:

@@ -118,6 +136,7 @@ RwTexDictionaryGtaStreamRead1(RwStream *stream)
  assert(size == 4);
  if(RwStreamRead(stream, &numTextures, size) != size)
    return nil;
+  numTextures = le32toh(numTextures);

  texDict = RwTexDictionaryCreate();
  if(texDict == nil)
@@ -458,8 +477,8 @@ CreateTxdImageForVideoCard()
          RwStreamWrite(img, buf, num);
        }

-       dirInfo.offset = pos / CDSTREAM_SECTOR_SIZE;
-       dirInfo.size = size;
+       dirInfo.offset = htole32(pos / CDSTREAM_SECTOR_SIZE);
+       dirInfo.size = htole32(size);
        strncpy(dirInfo.name, filename, sizeof(dirInfo.name));
        pDir->AddItem(dirInfo);
        CStreaming::RemoveTxd(i);

After the game loads resources using RwStreamRead(), the data read into the structures has to be converted from little-endian to the host's big-endian byte order.

For operations such as saving games, settings, etc., the reverse mechanism was needed: the data must always be written out in little-endian order.
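A small sketch of that write-side pattern, mirroring the dirInfo hunk in the diff above (the struct and helper here are hypothetical, not reVC's actual code):

#include <endian.h>   /* htole32(); glibc */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical directory entry, modeled on the dirInfo fields above. */
struct dir_entry {
    uint32_t offset;
    uint32_t size;
    char     name[24];
};

/* Always write little-endian, whatever the host byte order, so the
   files stay compatible with the original (little-endian) game data. */
static int write_entry_le(FILE *f, struct dir_entry e)
{
    e.offset = htole32(e.offset);
    e.size   = htole32(e.size);
    return fwrite(&e, sizeof e, 1, f) == 1 ? 0 : -1;
}

int main(void)
{
    struct dir_entry e = { .offset = 12, .size = 4096, .name = "player.txd" };
    FILE *f = fopen("test.dir", "wb");
    if (!f) return 1;
    int rc = write_entry_le(f, e);
    fclose(f);
    return rc ? 1 : 0;
}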
With this in place we could load the game, explore the world, and drive cars. But as soon as a character model was displayed, very strange graphical glitches appeared.

Glitches with player model

Warning: heavy flickering ahead

The following video has a lot of flashing elements/glitches. If you have epilepsy or may experience seizures in response to light or other stimuli, please do not watch this video.

With all the character models disabled, there were no visible glitches. Everything worked fine and the game was quite playable (as much as you can play without any characters).

We spent a few more days looking for a bug in our code; surely we had made some mistake while implementing big-endian support. All the relevant variables, coordinates, vertices, and transformations were dumped as numbers and compared against the little-endian version of the game.

Everything matched, and we couldn't find any more problems.
The project stalled in this state for several months.

Wii U port

We then found another reVC port on the Internet: the Wii U port. The Wii U uses the IBM Espresso CPU, a PowerPC-based processor just like ours, and it is also big-endian.

We contacted Gary, the author of the Wii U port, and very, very politely asked whether we could take a look at his big-endian-patched source code. Gary, thanks again!

By transplanting Gary's patches into the regular reVC codebase (without all the Wii U-specific changes), we were able to run reVC on the TP-Link with his well-tested endianness fixes…
And got exactly the same graphical corruption as before. What?!

At this stage we started questioning every part of the system: the kernel, the GPU drivers, the compilers, the libraries.
PowerPC SPE is not the most common architecture (support for it was even removed in GCC 9), and its floating-point extensions are very unusual (quite different from regular PowerPC CPUs).

We disabled SPE (-mno-spe) and switched to soft-float, tried e500 and e500v2 as compilation targets, and so on. Nothing changed.

i386 test

To make sure the code itself wasn't broken, we connected the same GPU to an x86 machine (a trusty ThinkPad T430, via ExpressCard/34). We installed the same Debian 10, the same libraries, the same radeon driver and firmware, and compiled the same reVC source code for i386.

The game worked perfectly, no graphical defects were observed.

Modern LLVM kernel

At this stage we wanted to try a newer kernel (with a newer radeon driver). GCC dropped PowerPC SPE support, so a modern Linux 6.7 cannot be built with the old GCC 8. But LLVM/clang has recently gained PowerPC SPE support, and Linux can also be built with clang:

make LLVM=1 ARCH=powerpc OBJCOPY="~/binutils-2.42/build/binutils/objcopy" all -j 40 V=1
mkimage -C none -a 0x1200000 -e 0x1200000 -A powerpc -d arch/powerpc/boot/simpleImage.tl-wdr4900-v1 uImage12-nvme

We had to provide our own build of binutils (objcopy and ld with PowerPC support).

The changes needed to support the TP-Link WDR4900 on a mainline kernel turned out to be very small:

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 968aee202..5ce3eeb09 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -181,6 +181,7 @@ src-plat-$(CONFIG_PPC_PSERIES) += pseries-head.S
 src-plat-$(CONFIG_PPC_POWERNV) += pseries-head.S
 src-plat-$(CONFIG_PPC_IBM_CELL_BLADE) += pseries-head.S
 src-plat-$(CONFIG_MVME7100) += motload-head.S mvme7100.c
+src-plat-$(CONFIG_TL_WDR4900_V1) += simpleboot.c fixed-head.S

 src-plat-$(CONFIG_PPC_MICROWATT) += fixed-head.S microwatt.c

@@ -351,7 +352,7 @@ image-$(CONFIG_TQM8548)                     += cuImage.tqm8548
 image-$(CONFIG_TQM8555)                        += cuImage.tqm8555
 image-$(CONFIG_TQM8560)                        += cuImage.tqm8560
 image-$(CONFIG_KSI8560)                        += cuImage.ksi8560
-
+image-$(CONFIG_TL_WDR4900_V1)          += simpleImage.tl-wdr4900-v1
 # Board ports in arch/powerpc/platform/86xx/Kconfig
 image-$(CONFIG_MVME7100)                += dtbImage.mvme7100

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 352d7de24..414216454 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -345,6 +345,11 @@ adder875-redboot)
     platformo="$object/fixed-head.o $object/redboot-8xx.o"
     binary=y
     ;;
+simpleboot-tl-wdr4900-v1)
+    platformo="$object/fixed-head.o $object/simpleboot.o"
+    link_address="0x1000000"
+    binary=y
+    ;;
 simpleboot-*)
     platformo="$object/fixed-head.o $object/simpleboot.o"
     binary=y
diff --git a/arch/powerpc/kernel/head_85xx.S b/arch/powerpc/kernel/head_85xx.S
index 39724ff5a..80da35f85 100644
--- a/arch/powerpc/kernel/head_85xx.S
+++ b/arch/powerpc/kernel/head_85xx.S
@@ -968,7 +968,7 @@ _GLOBAL(__setup_ehv_ivors)
 _GLOBAL(__giveup_spe)
        addi    r3,r3,THREAD            /* want THREAD of task */
        lwz     r5,PT_REGS(r3)
-       cmpi    0,r5,0
+       PPC_LCMPI       0,r5,0
        SAVE_32EVRS(0, r4, r3, THREAD_EVR0)
        evxor   evr6, evr6, evr6        /* clear out evr6 */
        evmwumiaa evr6, evr6, evr6      /* evr6 <- ACC = 0 * 0 + ACC */
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 9315a3b69..86ba4b5e4 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -176,6 +176,18 @@ config STX_GP3
        select CPM2
        select DEFAULT_UIMAGE

+config TL_WDR4900_V1
+    bool "TP-Link TL-WDR4900 v1"
+    select DEFAULT_UIMAGE
+    select ARCH_REQUIRE_GPIOLIB
+    select GPIO_MPC8XXX
+    select SWIOTLB
+    help
+      This option enables support for the TP-Link TL-WDR4900 v1 board.
+
+      This board is a Concurrent Dual-Band wireless router with a
+      Freescale P1014 SoC.
+
 config TQM8540
        bool "TQ Components TQM8540"
        help
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 43c34f26f..55268278d 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
 obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
 obj-$(CONFIG_FB_FSL_DIU)       += t1042rdb_diu.o
 obj-$(CONFIG_STX_GP3)    += stx_gp3.o
+obj-$(CONFIG_TL_WDR4900_V1) += tl_wdr4900_v1.o
 obj-$(CONFIG_TQM85xx)    += tqm85xx.o
 obj-$(CONFIG_PPA8548)     += ppa8548.o
 obj-$(CONFIG_SOCRATES)    += socrates.o socrates_fpga_pic.o
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index b2d8c0da2..21bc5f06b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -272,7 +272,7 @@ config TARGET_CPU
        default "e300c2" if E300C2_CPU
        default "e300c3" if E300C3_CPU
        default "G4" if G4_CPU
-       default "8540" if E500_CPU
+       default "8548" if E500_CPU
        default "e500mc" if E500MC_CPU
        default "powerpc" if POWERPC_CPU

The result was a bootable kernel, but again no change in the graphical corruption. It was, however, very nice to be completely rid of the OpenWrt toolchain.

qemu-user-static with llvmpipe

To make debugging a little easier, we copied the root filesystem to a local amd64 machine (using qemu-user-static again) and configured the X server with a dummy/virtual monitor. Then we attached x11vnc so we could view this virtual display.

Section "Device"
    Identifier  "Configured Video Device"
    Driver      "dummy"
    VideoRam    256000
EndSection

Section "Monitor"
    Identifier  "Configured Monitor"
    HorizSync   60.0 - 1000.0
    VertRefresh 60.0 - 200.0
    ModeLine    "640x480"   23.75  640 664 720 800  480 483 487 500 -hsync +vsync
              # "1920x1080" 148.50 1920 2448 2492 2640 1080 1084 1089 1125 +Hsync +Vsync
EndSection

Section "Screen"
    Identifier  "Default Screen"
    Monitor     "Configured Monitor"
    Device      "Configured Video Device"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
        Modes "640x480"
    EndSubSection
EndSection

Inside the chroot (with QEMU_CPU=e500v2 set), we launched Xorg, x11vnc and finally reVC:

export LIBGL_ALWAYS_SOFTWARE=true
export GALLIUM_DRIVER=llvmpipe
export DISPLAY=:2

Xorg -config /etc/xorg.conf :2 &
x11vnc -display :2 &
xrandr --output default --mode "800x600"
/home/user/GTAVC/reVC

Despite this setup being absurdly slow (around 20 seconds per frame), everything worked. It even worked with player models, without any graphical problems. The main differences from the real router were:

  • QEMU emulates the CPU, not the actual hardware;
  • llvmpipe instead of radeon/r600.

Then we set GALLIUM_DRIVER=llvmpipe on the real hardware. Performance deteriorated even further (about one frame per minute!), but everything worked!

There were no noticeable graphical defects (although we had to wait almost an hour just to get into the game…).

Mesa update

Then we set about updating Mesa on the router, which also required updating a number of dependencies: cmake, libglvnd, meson, drm and finally Mesa itself had to be built from scratch, with the code taken either directly from git or from the latest release.

After installing the new libglvnd, drm and mesa, characters rendered correctly on real hardware (with acceleration!). We still haven't identified the root cause of the problem (or which library was at fault), but we were more than pleased to have finally solved it.

Result

