SuperRT – Super Nintendo ray tracing chip


Continuing the topic, we present a translation of the original articles by Ben Carter.


I finally have results from a project that I have been working on in my spare time for about a year.

The idea came up when I was trying to think of an interesting project for learning Verilog and FPGA design. I settled on building a simple ray tracer (partly inspired by the successes of my horribly clever friend who created his own GPU). A little later (probably because my brain hates me and enjoys coming up with stupid tasks) it all turned into a question: “Wouldn’t it be interesting to make the SNES do ray tracing?” This is how the idea for the SuperRT chip was born.

I wanted to try to create something that resembles the Super FX chip used in games like Star Fox. In those games, the SNES executes the game logic and passes a description of the scene to the chip in the cartridge, which generates the graphics. I deliberately limited myself to a single custom chip in the design, without using the ARM core on the DE10 board or any other external computing resources.

The final results look something like this:

I apologize for the poor quality of the screenshots. For some reason my capture card gives terrible results when capturing the SNES signal, so I had to fall back on the good old “photograph the screen with the lights off” method.

The Super Nintendo shown here (which is, strictly speaking, a Super Famicom) has its cover removed to make room for the wiring, but otherwise it is completely unmodified. Connected to it is the printed circuit board from a copy of a terrible Pachinko game that I bought for 100 yen at a local thrift store. The game's ROM was removed and replaced with a cable. The cable then runs through a set of level shifters that convert the SNES's 5V signals to 3.3V, and from there connects to a DE10-Nano FPGA development board with a Cyclone V FPGA. The level shifter boards are downright awful, and assembling them turned into a nightmare because the required ICs are only sold in surface-mount packages. However, they do their job.

The SuperRT chip builds the scene using a specialized command language, executed by one of three parallel execution units on the chip (in effect, specialized CISC processors) that perform the ray intersection tests. The scene description lets you create objects with a subset of CSG operations: spheres and planes are the basic building blocks, and they are combined using OR, AND, and subtraction to produce the desired geometry. AABBs are also supported and are mainly used for culling tests (they can also be rendered if desired, but they have lower positional accuracy than the other primitives, so they are not particularly useful outside of debugging).
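To make the CSG idea concrete, here is a minimal Python sketch (not the chip's actual logic, which is implemented in Verilog and not shown in this article). It assumes each solid can be described, along a single ray, as a list of (t_in, t_out) spans, and that OR, AND and subtraction are then just set operations on those spans. All function names and the plane convention dot(n, p) <= dist are assumptions made for illustration.

```python
import math

INF = math.inf

def sphere_span(origin, direction, center, radius):
    """Spans of t where origin + t*direction is inside the sphere."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []                      # ray misses the sphere
    root = math.sqrt(disc)
    return [((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))]

def plane_span(origin, direction, normal, dist):
    """Spans of t inside the half-space dot(normal, p) <= dist (assumed convention)."""
    denom = sum(n * d for n, d in zip(normal, direction))
    side = sum(n * o for n, o in zip(normal, origin)) - dist
    if denom == 0.0:                   # ray parallel to the plane
        return [(-INF, INF)] if side <= 0.0 else []
    t = -side / denom
    return [(t, INF)] if denom < 0.0 else [(-INF, t)]

def csg_and(a, b):
    """Intersection: keep only the overlap of the two span lists."""
    out = []
    for a0, a1 in a:
        for b0, b1 in b:
            lo, hi = max(a0, b0), min(a1, b1)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)

def csg_or(a, b):
    """Union: merge overlapping spans from both lists."""
    out = []
    for lo, hi in sorted(a + b):
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out

def csg_sub(a, b):
    """Subtraction: a AND (complement of b)."""
    complement, cursor = [], -INF
    for lo, hi in csg_or(b, []):
        if cursor < lo:
            complement.append((cursor, lo))
        cursor = max(cursor, hi)
    if cursor < INF:
        complement.append((cursor, INF))
    return csg_and(a, complement)

def first_hit(spans):
    """Nearest entry point in front of the ray origin, or None."""
    for lo, _hi in spans:
        if lo > 0.0:
            return lo
    return None
```

For example, subtracting a small sphere from the front of a larger one moves the first hit along the ray from the outer surface to the inside of the carved-out hollow, which is the effect a chain of subtractive spheres would have on a plane.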

The renderer casts up to four rays per screen pixel, calculating direct shadows from a directional light source and one specular reflection. Each surface has a diffuse color and a reflectivity property; modifiers can be applied to them based on CSG results or custom functions. This is how the checkerboard pattern on the floor is generated.
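One plausible reading of "up to four rays per pixel" is: a primary ray, a shadow ray from its hit point, one reflection ray, and a shadow ray from the reflection's hit point. The Python sketch below illustrates that budget on a toy scene (a unit sphere over a checkerboard floor). The scene, the blend formula, the light direction and all names are assumptions for illustration only; the two floor colors are borrowed from the command listing in this article.

```python
import math

LIGHT = (0.0, 1.0, 0.0)   # unit vector towards the light (assumed: straight up)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def along(origin, direction, t):
    return tuple(o + d * t for o, d in zip(origin, direction))

def checker(point, colour_a, colour_b):
    """Pick a colour from the parity of the floor cell under the point."""
    cell = math.floor(point[0]) + math.floor(point[2])
    return colour_a if cell % 2 == 0 else colour_b

def nearest_hit(origin, direction):
    """Toy scene; direction must be unit length.
    Returns (t, point, normal, colour, reflectiveness) or None."""
    best = None
    if direction[1] < 0.0:                      # floor plane y = 0
        t = -origin[1] / direction[1]
        if t > 1e-4:
            p = along(origin, direction, t)
            colour = checker(p, (48, 152, 48), (0, 248, 0))
            best = (t, p, (0.0, 1.0, 0.0), colour, 0.0)
    oc = (origin[0], origin[1] - 1.0, origin[2])  # unit sphere at (0, 1, 0)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - 1.0)
    if disc >= 0.0:
        t = -b - math.sqrt(disc)
        if t > 1e-4 and (best is None or t < best[0]):
            p = along(origin, direction, t)
            n = (p[0], p[1] - 1.0, p[2])
            best = (t, p, n, (0, 0, 248), 0.6)
    return best

def shade(origin, direction, counter, depth=0):
    """Trace one ray; counter[0] accumulates the number of rays cast."""
    counter[0] += 1                              # primary or reflection ray
    hit = nearest_hit(origin, direction)
    if hit is None:
        return (0.0, 0.0, 0.0)                   # background
    _t, p, n, colour, refl = hit
    counter[0] += 1                              # shadow ray towards the light
    lit = nearest_hit(p, LIGHT) is None
    diffuse = max(0.0, dot(n, LIGHT)) if lit else 0.0
    local = tuple(c * diffuse for c in colour)
    if refl > 0.0 and depth == 0:                # only one reflection bounce
        k = 2.0 * dot(direction, n)
        bounce = tuple(d - k * ni for d, ni in zip(direction, n))
        mirrored = shade(p, bounce, counter, depth + 1)
        local = tuple((1.0 - refl) * l + refl * m
                      for l, m in zip(local, mirrored))
    return local
```

A ray that lands on the matte floor costs two rays (primary plus shadow), while a ray that hits the reflective sphere and bounces onto the floor costs the full four.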

The ray color for each pixel is calculated by a ray engine that handles the entire ray life cycle; it uses an execution engine module to run the command program that describes the scene as many times as necessary to compute the ray's result. The command program itself is uploaded from the SNES and stored in a local 4 KB RAM buffer; animation is implemented by writing modified commands into this buffer. The disassembled command buffer looks like this:

0000 Start
0001 Plane 0, -1, 0, Dist=-2
0002 SphereSub OH 2, 1, 5, Rad=5
0003 SphereSub OH 4, 1, 4, Rad=4
0004 SphereSub OH 5, 1, 9, Rad=9
0005 SphereSub OH 2, 1, 2, Rad=2
0006 SphereSub OH -0.5, 1, 2, Rad=2
0007 RegisterHitNoReset 0, 248, 0, Reflectiveness=0
0008 Checkerboard ORH 48, 152, 48, Reflectiveness=0
0009 ResetHitState
0010 Plane 0, -1, 0, Dist=-2.150146
0011 RegisterHit 0, 0, 248, Reflectiveness=153
0012 AABB 4, -2.5, 11,    8, 3.5, 13
0013 ResetHitStateAndJump NH 44
0014 Origin 6, 2, 12
0015 Plane -0.2929688, 0, -0.9570313, Dist=0.2497559
0016 PlaneAnd OH 0.2919922, 0, 0.9560547, Dist=0.25
0017 PlaneAnd OH 0, 1, 0, Dist=1
0018 PlaneAnd OH 0, -1, 0, Dist=4
0019 PlaneAnd OH -0.9570313, 0, 0.2919922, Dist=-1
0020 PlaneAnd OH 0.9560547, 0, -0.2929688, Dist=1.499756
0021 RegisterHit 248, 0, 0, Reflectiveness=0

Each execution engine is a processor unit with a 14-cycle pipeline, and one instruction typically completes per clock cycle, so each execution unit can compute approximately 50 million sphere, plane, or AABB intersections per second. The exception is branching: a branch has to flush the entire pipeline and therefore costs 16 cycles (14 cycles to flush the pipeline plus 2 cycles of delay to fetch the command). To avoid this as far as possible, a branch prediction system is used; fortunately, the spatial coherence of adjacent rays usually gives a high prediction hit rate.
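A rough back-of-the-envelope model shows why prediction matters here. The sketch below assumes a 50 MHz clock (implied by one instruction per cycle and roughly 50 million intersections per second), and assumes a correctly predicted branch costs a single cycle while a mispredicted one costs the full 16. Neither assumption is confirmed in the article, and the function name is hypothetical.

```python
CLOCK_HZ = 50_000_000      # assumed clock rate
BRANCH_PENALTY = 16        # cycles for a branch that flushes the pipeline

def instructions_per_second(branch_fraction, mispredict_rate,
                            clock_hz=CLOCK_HZ):
    """Average throughput of one execution unit.

    branch_fraction: share of instructions that are branches (0..1)
    mispredict_rate: share of those branches that are mispredicted (0..1)
    """
    # Every instruction costs 1 cycle; a mispredicted branch costs 15 more.
    extra = branch_fraction * mispredict_rate * (BRANCH_PENALTY - 1)
    return clock_hz / (1.0 + extra)

if __name__ == "__main__":
    for miss in (0.0, 0.1, 0.5, 1.0):
        rate = instructions_per_second(0.1, miss)
        print(f"mispredict {miss:4.0%}: {rate / 1e6:5.1f} M instructions/s")
```

Under these assumptions, with one branch in every ten instructions, going from perfect prediction to no prediction at all would cut throughput from 50 million to 20 million instructions per second.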

Intersections in the execution engine are computed by two pipelines: one handles AABBs, the other handles spheres and planes. The system as a whole works exclusively with 32-bit integer math in 18.14 fixed-point format; where values are known to lie in the range ±1, a 16-bit (2.14) format is used instead. The sphere/plane intersection pipeline also has two additional specialized math blocks that compute reciprocals and square roots.
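For readers unfamiliar with fixed-point notation, 18.14 means 18 integer bits (including the sign) and 14 fractional bits packed into a 32-bit word, so 1.0 is stored as 16384. The Python sketch below shows conversion and multiplication in that format. The helper names are hypothetical, and the wrap-on-overflow behaviour is an assumption, since the article does not say how the hardware handles overflow.

```python
FRAC_BITS = 14
ONE = 1 << FRAC_BITS            # 1.0 in 18.14 fixed point = 16384

def wrap32(value):
    """Wrap an integer to signed 32 bits (assumed overflow behaviour)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def to_fixed(x):
    """Convert a float to the nearest 18.14 value."""
    return wrap32(int(round(x * ONE)))

def to_float(f):
    """Convert an 18.14 value back to a float."""
    return f / ONE

def fx_mul(a, b):
    """Multiply two 18.14 values: full-width product, then drop 14 bits."""
    return wrap32((a * b) >> FRAC_BITS)

if __name__ == "__main__":
    normal_x = to_fixed(-0.29296875)     # stored as -4800
    print(normal_x, to_float(normal_x))
    print(to_float(fx_mul(to_fixed(1.5), to_fixed(-2.0))))
```

This also explains the odd-looking constants in the disassembled command buffer: values such as -0.2929688 and 0.9570313 are exact multiples of 1/16384 (here -4800/16384 and 15680/16384) printed in decimal.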

When a frame has been rendered, the PPU conversion module converts the frame buffer into a format that can be transferred by DMA directly into SNES VRAM for display, reducing it to 256 colors and rearranging it into the bitplanes of character tiles. The screen has a resolution of 200×160, so a full frame takes up exactly 32000 bytes of image data, which, due to bandwidth limitations, is transferred to VRAM as two 16000-byte chunks in successive frames. The full image can therefore only be updated every two frames, which limits the maximum frame rate to 30FPS. The test scene, however, runs closer to 20FPS (mainly due to bottlenecks on the SNES side).

Many thanks to the participants of this thread on SNESdev for many helpful ideas about full-screen DMA from an expansion chip; they came up with the solution I described.

The chip also implements a number of other basic functions. There is an interface to the SNES cartridge bus, as well as a small program ROM holding 32 KB of code for the SNES. The size is limited because the interface board is currently connected only to the Address Bus A lines of the SNES console, so the available address space is only 64 KB, of which 32 KB is taken by the memory-mapped I/O registers used to communicate with the SuperRT chip. There is also a multiplication acceleration block that lets the SNES quickly perform 16×16-bit multiplications.
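The conversion step can be sketched in software. The Python example below assumes the frame buffer is a flat sequence of 8-bit palette indices, one byte per pixel, and packs it into the SNES 8-bits-per-pixel character format: 8×8 tiles of 64 bytes each, with bitplanes stored in interleaved pairs (0 and 1, then 2 and 3, and so on). The function names and the row-major tile order are assumptions; the real chip does this in hardware and may order tiles differently.

```python
def tile_to_bitplanes(tile):
    """Pack an 8x8 tile of palette indices (0..255) into 64 bytes
    in the SNES 8bpp layout: plane pairs (0,1), (2,3), (4,5), (6,7),
    each pair interleaved row by row."""
    out = bytearray()
    for pair in range(0, 8, 2):
        for row in tile:
            for plane in (pair, pair + 1):
                byte = 0
                for x, pixel in enumerate(row):
                    bit = (pixel >> plane) & 1
                    byte |= bit << (7 - x)      # leftmost pixel is the MSB
                out.append(byte)
    return bytes(out)

def frame_to_tiles(frame, width=200, height=160):
    """Convert a linear 8bpp frame into tile data, tiles in row-major order
    (the tile order is an assumption)."""
    tiles = bytearray()
    for ty in range(0, height, 8):
        for tx in range(0, width, 8):
            tile = [frame[(ty + y) * width + tx:(ty + y) * width + tx + 8]
                    for y in range(8)]
            tiles += tile_to_bitplanes(tile)
    return bytes(tiles)

if __name__ == "__main__":
    frame = bytes(200 * 160)                    # an all-black test frame
    data = frame_to_tiles(frame)
    halves = (data[:16000], data[16000:])       # one half per video frame
    print(len(data), [len(h) for h in halves])
```

The arithmetic matches the figures above: 200×160 pixels form 25×20 = 500 tiles of 64 bytes each, which gives 32000 bytes, split into two 16000-byte transfers.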

For debugging, I used the HDMI interface of the DE10 board to output data to a second monitor, and a Mega Drive gamepad connected to the GPIO pins to control the debugging system. However, because resources are limited, debugging has to be turned off when all three ray engine cores are enabled.

That was a brief overview of the system; in the future I plan to publish new articles describing the individual components in more detail. In the meantime, if you have questions or thoughts, please contact me and I’ll try to answer!

Many thanks to Matt, Jamine, Rick and everyone who helped with advice, inspiration and support!

