Building a network card from discrete logic

This post is a continuation of my project to create a complete computer system using discrete logic components. I already have a computer capable of running network applications, such as an HTTP server or a game over a LAN.

Last year I made a physical layer adapter that converts the 10BASE-T Ethernet signal to SPI and back. I then used an STM32 microcontroller to test it, and now I am implementing a MAC layer module to connect it to my homemade computer.

Both adapters are full duplex, with a separate transmitter and receiver.

The whole computer. The new module is on the bottom right

New module with physical layer shield removed

Receiver

Brief description of the receiver operation:

  • The FCS is not checked in hardware.

Data collection

First, the SPI serial data must be converted into a byte stream.

Serial data is shifted into the shift register (U32). U30 and U31 count bits and bytes. The SRAM write signal recv_buf_we is generated by the D flip-flop U29B. This signal goes low briefly after every 8 bits of input data.
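The byte assembly described above can be modeled in a few lines of Python. This is only a behavioral sketch, not a description of the actual circuit: the function name is made up, and the LSB-first bit order is an assumption based on how Ethernet serializes bytes on the wire.

```python
def deserialize(bits):
    """Behavioral model: a shift register plus a bit counter feeding a buffer."""
    shift_reg = 0
    bit_count = 0
    buffer = bytearray()          # stands in for the receive buffer RAM
    for bit in bits:
        # Assumption: LSB-first, so each new bit enters at the top and moves down.
        shift_reg = (shift_reg >> 1) | ((bit & 1) << 7)
        bit_count += 1
        if bit_count == 8:        # the moment recv_buf_we would pulse low
            buffer.append(shift_reg)
            bit_count = 0
    return bytes(buffer)          # trailing bits of an incomplete byte are dropped

print(deserialize([1, 0, 1, 0, 1, 0, 1, 1]).hex())  # d5
```

In the model, the write happens exactly once per eight input bits, which mirrors the role of the write pulse.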

The received bytes are written to a 2 kB buffer in a 6116 static RAM (U20). U13, U16 and U18 form the address multiplexer: it selects either the byte counter or the system address bus as the address input of the SRAM (U20). The tri-state buffer U21 passes the received byte to the RAM.

To provide access to the received data and its length, the RAM and the byte counter are connected to the system data bus through tri-state buffers. U25 connects the receiver RAM to the system data bus. After the frame completes, the byte counter is not reset, and its value stays on the recv_byte_cnt bus. This bus is connected to the system data bus via U26 and U27. They are enabled when the CPU reads from certain addresses. The second half of U27 forms a two-bit read-only register that is used to query the status of the receiver and the transmitter.

MAC Address Filtering

While analyzing Ethernet traffic, I noticed that frames usually arrived in small groups (3-4 frames separated by a short pause). Frames in the same group usually had different destination MAC addresses. This made me think that my computer was unable to filter received frames by MAC and re-enable the receiver fast enough to catch frames intended for it. I needed hardware MAC filtering.

Storing the MAC address and comparing the first six received bytes with it did not suit me: it was too complicated. I could also have made the address repeat a single byte (for example, FE:FE:FE:FE:FE:FE), but that's boring. To add some variety to my MAC, I made each byte a function of its index:

  • Bit 0 has a fixed value of 0;
  • Bit 1 has a fixed value of 1;
  • Bits 2-4 are the inverse of the byte index;
  • Bits 5-7 have a fixed value of 1.

With this rule, the MAC address takes the form FE:FA:F6:F2:EE:EA. To work with ARP, we also need to accept the broadcast MAC FF:FF:FF:FF:FF:FF.
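As a quick check of the rule, the address can be computed from the byte index. The function name here is just for illustration.

```python
def mac_byte(index):
    """Destination MAC byte at position `index` (0..5) under the rule above."""
    inverted_index = ~index & 0b111              # bits 2-4: inverse of the byte index
    return 0b11100010 | (inverted_index << 2)    # bits 5-7 and bit 1 set, bit 0 clear

mac = ":".join(f"{mac_byte(i):02X}" for i in range(6))
print(mac)  # FE:FA:F6:F2:EE:EA
```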

In this diagram, the bus a[0..3] carries the low-order 4 bits of the byte counter, and the bus d[0..7] carries the received byte. U33 compares data bits 0 and 2-4 with the desired values; if they match, the output of U34A is high. U35A implements the broadcast MAC check: its output is high when bits 0 and 2-4 are all ones. These two signals are combined with a logic OR (implemented using the diodes D7 and the resistor R6). The remaining bits are checked for equality to one by U35B.

This block only checks the validity of a single byte. To check all six, the result is accumulated in U10A. While no frame is being received, the ss signal (the SPI slave select input) is low and U10A holds a 1. During reception, this value is updated for each received byte. If the destination MAC address meets the criteria, U10A stays high. When the byte address reaches 5, the final value is latched into U36B. Its output is used to stop reception if the destination address does not match.
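The filter can be expressed as a small software model to see which addresses pass. This is a sketch of the logic as described in the text, not a gate-level simulation, and the function names are hypothetical.

```python
def byte_matches(index, data):
    """Per-byte check: own MAC byte or broadcast byte."""
    fixed_ok = (data & 0b11100010) == 0b11100010   # bits 1 and 5-7 must be ones
    variable = data & 0b00011101                   # bits 0 and 2-4
    own = variable == ((~index & 0b111) << 2)      # bit 0 clear, bits 2-4 = inverted index
    broadcast = variable == 0b00011101             # bits 0 and 2-4 all ones
    return fixed_ok and (own or broadcast)

def accept_frame(frame):
    """Accumulate the per-byte result over the six destination bytes."""
    match = True                                   # the accumulator starts high
    for index, data in enumerate(frame[:6]):
        match = match and byte_matches(index, data)
    return match

print(accept_frame(bytes.fromhex("FEFAF6F2EEEA")))  # True
print(accept_frame(bytes.fromhex("001122334455")))  # False
```

Because own and broadcast matches are ORed per byte, this model also accepts a destination that mixes bytes of the two addresses. That seems a reasonable price for such a small comparator.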

Transmitter

Like the receiver, the transmitter does not handle the FCS in hardware; it is generated in software. To simplify the transmitter further, I decided to support only fixed-length frames. This eliminates the need for a complex digital comparator, and the frame transmission logic depends on only one bit of the byte counter. I chose 1024 bytes as the frame length, which is reasonably close to the typical MTU of 1500 bytes. The frame preamble (a run of 0x55 bytes ending with 0xD5, required by 10BASE-T) is also part of these 1024 bytes and must be written there by software.

Fixing the frame length had no effect on higher-layer protocols because they encode the packet size in their headers and therefore do not rely on the true Ethernet frame length.
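To illustrate what the software has to prepare, here is a sketch of filling a 1024-byte transmit buffer. It rests on several assumptions that are not taken from the actual driver: the standard preamble of seven 0x55 bytes, zero padding after the payload, the FCS in the last four bytes, and a made-up function name. It also leaves out any hardware-specific adjustments.

```python
import zlib

FRAME_LEN = 1024                       # fixed length handled by the transmitter
PREAMBLE = bytes([0x55] * 7 + [0xD5])  # assumed: seven 0x55 bytes, then 0xD5

def build_tx_buffer(dst, src, ethertype, payload):
    """Lay out preamble, header, zero-padded payload and FCS in 1024 bytes."""
    header = dst + src + ethertype.to_bytes(2, "big")
    room = FRAME_LEN - len(PREAMBLE) - len(header) - 4   # 4 bytes for the FCS
    if len(payload) > room:
        raise ValueError("payload does not fit in a fixed-length frame")
    body = header + payload.ljust(room, b"\x00")
    fcs = zlib.crc32(body).to_bytes(4, "little")         # Ethernet uses CRC-32
    return PREAMBLE + body + fcs

buf = build_tx_buffer(b"\xff" * 6, bytes.fromhex("FEFAF6F2EEEA"), 0x0806, b"hello")
print(len(buf))  # 1024
```

The padding is covered by the FCS, so the receiving side sees a valid frame, and the upper-layer length fields tell it where the real data ends.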

Brief description of the transmitter operation:

  • Data is stored in static RAM;
  • The 20 MHz clock signal is fed to the 4-bit counter, whose overflow pin is used as the byte clock signal;
  • To transmit a frame, the user writes to a specific memory address, which turns on the counter;
  • Parallel byte data is serialized using a shift register.

Counters

As in the receiver, two counters are used to count the bits (U12) and the bytes (U14). The first counter is fed with a 20 MHz clock signal from an integrated oscillator. The 20 MHz clock is never used directly; it is always divided by at least 2, so the oscillator's duty cycle does not affect the output signal.
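The timing that follows from these numbers can be worked out directly. The constant names are only for illustration; the figures come from the 20 MHz clock, the 4-bit bit counter and the 1024-byte frame.

```python
CLOCK_HZ = 20_000_000
BIT_RATE = CLOCK_HZ // 2            # 10 Mbit/s, half the oscillator rate
CLOCKS_PER_BYTE = 2 * 8             # a 4-bit counter overflows every 16 clocks
FRAME_BYTES = 1024

byte_period_ns = CLOCKS_PER_BYTE * 1_000_000_000 // CLOCK_HZ
frame_time_us = FRAME_BYTES * 8 * 1_000_000 / BIT_RATE
print(byte_period_ns, frame_time_us)  # 800 819.2
```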

Data stream

As in the receiver, three 74HC157 multiplexers (not shown in the image) are used to select the input address for the RAM (U22). U23 is used to load data into the RAM. U24 serves as intermediate storage for the byte currently being transmitted. The principle here is similar to my VGA pipeline: the byte counter, a 74HC4040, is a ripple counter that settles slowly, and U24 provides a stable output while the RAM output is still invalid. This data is passed to the shift register U28, which serializes it one byte at a time.

After making the device, I noticed that I had mixed up the order of the bits coming from the RAM into the shift register. I had to programmatically change the order of the bits to eliminate this hardware bug. I couldn't test this in Verilog beforehand.
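A software workaround of this kind might look like the sketch below. It assumes the mix-up is a complete reversal of the bit order within each byte, which may not match the actual wiring; the names are hypothetical.

```python
def reverse_bits(value):
    """Reverse the bit order of one byte (bit 0 <-> bit 7, and so on)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

# A 256-entry table avoids running the loop for every transmitted byte.
REVERSED = bytes(reverse_bits(b) for b in range(256))

def fix_bit_order(frame):
    """Apply the table to every byte before it is written to the TX buffer."""
    return frame.translate(REVERSED)

print(fix_bit_order(b"\x55\xd5").hex())  # aaab
```

On a slow CPU a lookup table is usually the cheaper option, at the cost of 256 bytes of memory.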

To form a clean 10BASE-T signal (see my previous post), MOSI and SCK must be precisely synchronized. This is handled by U11A and U8B. tx_cnt0 (bit 0 of the bit counter, that is, the 20 MHz clock divided by 2) is used as the clock signal. U11A changes its output synchronously with this signal. U8B delays the clock signal to match the delay introduced by U11A. Since a D flip-flop is more complex than a simple AND gate and has a slightly higher (5 ns) latency, the faster 74LV74A is used here. Its propagation delay is the same as that of the 74HC08. This is the only chip on my board from a “fast” family.

CPU Interface

From a programmer's point of view, my Ethernet adapter has the following interface:

  • Both frame buffers are mapped to 0xF000.
  • There are two read-only registers:

    • 8-bit status register at 0xFB00 with two flags:

      • RX_FULL – a frame has been received,
      • TX_BUSY – a frame is being transmitted;
    • 16-bit received data length register at 0xFB02.
  • Writing any value to 0xFB00 re-enables the receiver.
  • Writing any value to 0xFB01 starts transmission.

There are no interrupts because my CPU does not support them.
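Without interrupts, the software has to poll the status register. The sketch below models that loop against a fake bus so it can run anywhere. The flag bit positions, the little-endian layout of the length register and the FakeBus class are all assumptions made for illustration, not details of the real adapter.

```python
BUF_BASE, REG_STATUS, REG_TX_START, REG_RX_LEN = 0xF000, 0xFB00, 0xFB01, 0xFB02
RX_FULL, TX_BUSY = 0x01, 0x02   # assumed bit positions, not taken from the post

class FakeBus:
    """Hypothetical stand-in for the CPU bus with one frame already received."""
    def __init__(self, frame):
        self.rx, self.status, self.writes = frame, RX_FULL, []
    def read(self, addr):
        if addr == REG_STATUS:
            return self.status
        if addr in (REG_RX_LEN, REG_RX_LEN + 1):          # assumed little-endian
            return (len(self.rx) >> (8 * (addr - REG_RX_LEN))) & 0xFF
        return self.rx[addr - BUF_BASE]
    def write(self, addr, value):
        self.writes.append(addr)
        if addr == REG_STATUS:                            # re-enables the receiver
            self.status &= ~RX_FULL

def poll_receive(bus):
    """Return a received frame, or None when nothing is waiting."""
    if not (bus.read(REG_STATUS) & RX_FULL):
        return None
    length = bus.read(REG_RX_LEN) | (bus.read(REG_RX_LEN + 1) << 8)
    frame = bytes(bus.read(BUF_BASE + i) for i in range(length))
    bus.write(REG_STATUS, 0)        # any value turns the receiver back on
    return frame

bus = FakeBus(b"abc")
print(poll_receive(bus), poll_receive(bus))  # b'abc' None
```

The frame is copied out before the receiver is re-enabled, since a new frame would overwrite the same buffer.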

All matching addresses start with F (the upper 4 bits are all ones). This condition is checked by U2A.

Bit 11 must be zero for a buffer address. This is checked by U1D, D2, R2 and U1E. The buffer select signal is then combined with the write enable or output enable signal to select writing to the TX buffer or reading from the RX buffer.

For the registers, U1B and U2B check that the second hexadecimal digit equals B (1011). Another block of diode logic (D1, R1, U1C) then combines this with the check of the first digit. The decoders U4A and U4B are used to select a specific function.
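The decoding rules can be summarized as a small function. This is a model of the conditions described in the text, with a hypothetical function name, and it does not cover the selection of individual registers by the decoders.

```python
def decode(addr):
    """Classify a 16-bit address according to the rules described above."""
    if (addr >> 12) != 0xF:                 # upper four bits must all be ones
        return "none"
    if not (addr & (1 << 11)):              # bit 11 clear: frame buffers
        return "buffer"
    if ((addr >> 8) & 0xF) == 0xB:          # second hex digit is B: registers
        return "register"
    return "none"

for a in (0xF000, 0xF7FF, 0xFB00, 0xF800, 0x1234):
    print(hex(a), decode(a))
```

Under these rules the buffer window spans 0xF000 to 0xF7FF, which matches the 2 kB size of the RAM chips.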

Two LEDs indicate access to buffers or registers.

Programming

I wanted to make my computer network-capable, but I was too lazy to implement the TCP/IP stack myself. I also wanted a decent C compiler, because my first compiler sucked and assembly programming is tedious. So I wrote a C compiler. It is advanced enough to compile uIP 1.0 (a tiny TCP/IP library). Even though my CPU has a horrendously low code density, uIP is small enough to fit in RAM and still leave room for the application.

The network speed is very low, but I am still very pleased with it, because everything was created without the use of commercial CPUs or special chips:

  • The ping round-trip time averages 85 ms;
  • The HTTP server delivers 2.6 kB/s (transferring static files from an SD card).

Project Repository

Models, schematic files and PCB drawings are posted on GitHub.
