# SDRAM controller implementation

In the article about the VGA interface I mentioned that I used external SDRAM as a frame buffer. I want to share its implementation, if only because I spent a lot of time developing this module: the standard IP cores do not support this chip. I hope it saves someone else that effort.

The debug board is the same as before, with an FPGA from the Spartan-6 family (XC6SLX16). The board also carries a 32 MB SDRAM chip (MT48LC16M16A2).

Here is a photo of the debug board:

## Input data

According to the datasheet, the MT48LC16M16A2 is organized as 4 banks of 4M cells, each 16 bits wide (4M cells x 16 bits x 4 banks). So far, everything is clear.

To understand how this memory is addressed, look at the following table in the datasheet; we are interested in the last column:

It shows 13 bits of row address, 9 bits of column address, and 2 bits of bank address: 24 bits in total to address the entire memory. Some simple math gives the same capacity that the datasheet promises: 16 * 2^24 = 268435456 bits (33554432 bytes).

If you look at the pinout of the chip, you will see only 13 address pins plus 2 bank address pins. The figure shows that signals A0-A12 form the address bus and BA0-BA1 form the bank select bus, and that is all. Where are the other 9 address bits?

If you study how SDRAM works more closely (or paid better attention in lectures than I did [I have, of course, forgotten everything]), it turns out that the address bus is used twice. The first time it selects a row, which in our case uses all 13 address bits; the second time it selects a column (only 9 address bits), essentially a cell inside that row. Once again: first select a row (clock cycle N), then select a column (clock cycle N+x); their intersection gives us a memory cell. And, of course, do not forget about the selected bank.
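The two-phase addressing can be sketched as a small multiplexer. This is only an illustration: the bank/row/column split follows the `m_addr` comment in the controller listing, while the module name, the `phase` input and its encoding are assumptions made for the example.

```verilog
// Sketch: how a flat 24-bit word address maps onto the multiplexed SDRAM pins.
// Module name and the `phase` input are hypothetical, not part of the controller.
module sdram_addr_mux (
    input  wire [23:0] addr,   // {bank[1:0], row[12:0], col[8:0]}
    input  wire        phase,  // 0: row phase (ACTIVE), 1: column phase (READ/WRITE)
    output wire [1:0]  ba,     // BA1..BA0, driven in both phases
    output wire [12:0] a       // A12..A0, shared by row and column
);
    wire [1:0]  bank = addr[23:22];
    wire [12:0] row  = addr[21:9];
    wire [8:0]  col  = addr[8:0];

    assign ba = bank;
    // The row phase uses all 13 pins; the column phase uses A8..A0 and keeps
    // the remaining pins at 0 (A10 = 0 means "no auto precharge").
    assign a  = phase ? {4'b0000, col} : row;
endmodule
```

The same 13 pins therefore carry the row in one clock cycle and the column in a later one, which is why the chip gets by without 9 extra address pins.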

The DQ0-DQ15 data bus also catches the eye. As you can see, there is only one, and it is used for both writing and reading. To me, as someone who often uses BRAM primitives in FPGAs, this architecture seemed extremely inconvenient. But looking at it from the other side, the chip is a physical device soldered onto a board, and giving every signal its own pins would be a real headache that nobody needs. Besides, there is only one address bus, and I do not know of cases where you need to read and write the same address at the same time.
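In plain Verilog a shared bus like this is usually modelled with a tri-state driver. The sketch below is a generic illustration with made-up names (`dq_oe`, `dq_out`, `dq_in`); the controller listing further down does the same job with the Xilinx `OBUFT` and `IBUF` primitives.

```verilog
// Sketch of a shared bidirectional data bus.
// All names are illustrative and not taken from the controller.
module dq_tristate (
    inout  wire [15:0] dq,      // shared DQ15..DQ0 pins
    input  wire        dq_oe,   // 1: drive the bus (write), 0: release it (read)
    input  wire [15:0] dq_out,  // data to write
    output wire [15:0] dq_in    // data seen on the pins
);
    assign dq    = dq_oe ? dq_out : 16'bz;  // high impedance when not writing
    assign dq_in = dq;
endmodule
```

Note that the `T` input of `OBUFT` has the opposite polarity to `dq_oe` here: a 1 on `T` releases the pin.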

## Operating algorithm

SDRAM is controlled through a set of commands that select the mode and the stages of operation. Here is the list of commands for the MT48LC16M16A2 chip:
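Each command is encoded on the CS#, RAS#, CAS# and WE# pins. As a hedged illustration, the simulation helper below decodes those pins back into a readable command name, using the standard SDR SDRAM truth table; the module and its port names are assumptions made for this example.

```verilog
// Sketch: decode {CS#, RAS#, CAS#, WE#} into a command name for waveforms or $display.
// Encodings follow the standard SDR SDRAM truth table; the module itself is hypothetical.
module sdram_cmd_decode (
    input  wire cs_n, ras_n, cas_n, we_n,
    output reg [8*10-1:0] name            // up to 10 ASCII characters
);
    always @* begin
        casez ({cs_n, ras_n, cas_n, we_n})
            4'b1???: name = "INHIBIT";    // chip deselected, other pins ignored
            4'b0111: name = "NOP";
            4'b0011: name = "ACTIVE";     // open a row: A[12:0] = row, BA = bank
            4'b0101: name = "READ";       // A[8:0] = column
            4'b0100: name = "WRITE";      // A[8:0] = column
            4'b0110: name = "BURST_TERM";
            4'b0010: name = "PRECHARGE";  // A10 = 1 closes the rows in all banks
            4'b0001: name = "REFRESH";    // AUTO REFRESH
            4'b0000: name = "LOAD_MODE";  // A[12:0] = mode register value
            default: name = "UNKNOWN";    // x or z on the pins
        endcase
    end
endmodule
```

Connected to the `sd_cs_n`, `sd_ras_n`, `sd_cas_n` and `sd_we_n` outputs of the controller in a testbench, this makes the command sequence much easier to follow than four separate traces.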

The chip itself has many different modes of operation, for example:

• Various burst lengths, i.e. how many words are read or written by a single command.

• Auto recharge of memory cells.

• The number of participating memory banks during a charge update.

• There are also various options for transitioning from state to state.
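Several of these options are selected through the mode register, which is written with the LOAD MODE REGISTER command. The sketch below assembles that 13-bit value, assuming the standard SDR SDRAM mode register layout; the module and port names are hypothetical.

```verilog
// Sketch: build the value placed on A[12:0] during LOAD MODE REGISTER.
// Field positions follow the standard SDR SDRAM mode register; names are illustrative.
module sdram_mode_word #(
    parameter [2:0] CL = 3'd3           // CAS latency in clock cycles
) (
    input  wire [2:0]  burst_len,       // A[2:0]: 000 = 1, 001 = 2, 010 = 4, 011 = 8
    input  wire        burst_type,      // A3: 0 = sequential, 1 = interleaved
    input  wire        single_write,    // A9: 1 = writes always access a single location
    output wire [12:0] mode
);
    assign mode = {3'b000,        // A[12:10]: reserved, keep at 0
                   single_write,  // A9: write burst mode
                   2'b00,         // A[8:7]: standard operation
                   CL,            // A[6:4]: CAS latency
                   burst_type,    // A3
                   burst_len};    // A[2:0]
endmodule
```

With `burst_len = 0`, `burst_type = 0`, `single_write = 1` and `CL = 3` the result is `13'h0230`.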

To simplify my life, I chose the mode that accesses one cell at a time. I also took advantage of the fact that after a read/write command you can issue the next read/write command right away, as long as the new address lies within the activated row. In my case I could write 512 words without stopping; after that the memory cells are updated and the controller's state machine returns to waiting for commands. As a result, I got the following module:

The m_* interface is the input for commands, plus the data when the command is a write. The s_* interface outputs the data read from memory. Read data arrives with a delay of 3 clock cycles.

The logic of the module is simple: read or write commands are accepted as long as the address changes only in its lower 9 bits. Commands also stop being accepted when the command type changes (read <=> write). The module watches the m_valid signal: when it falls, the memory controller closes the activated row and updates the charge in the cells.
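From the user's side this behaves like a valid/ready handshake. The sketch below is a hypothetical client, not part of the original project: it streams `COUNT` consecutive writes and assumes that the controller accepts one word in every clock cycle in which both `m_valid` and `m_ready` are high.

```verilog
// Sketch of a write client for the m_* interface (hypothetical module).
// Assumption: a word is accepted in each cycle where m_valid && m_ready.
module burst_writer #(
    parameter [23:0] BASE  = 24'd0,  // first word address
    parameter        COUNT = 16      // words to write; keep them inside one 512-word row
) (
    input  wire        clk,
    input  wire        rst,
    input  wire        start,
    input  wire        m_ready,      // from the controller
    output reg         m_valid,
    output wire        m_we,
    output reg  [23:0] m_addr,
    output wire [15:0] m_data,
    output reg         done
);
    reg [15:0] sent;                 // words accepted so far

    assign m_we   = 1'b1;            // write commands only
    assign m_data = m_addr[15:0];    // test pattern: low address bits as data

    always @(posedge clk) begin
        if (rst) begin
            m_valid <= 1'b0;
            m_addr  <= BASE;
            sent    <= 16'd0;
            done    <= 1'b0;
        end else if (!m_valid) begin
            if (start && !done) begin
                m_valid <= 1'b1;
                m_addr  <= BASE;
                sent    <= 16'd0;
            end
        end else if (m_ready) begin          // current word accepted
            if (sent == COUNT-1) begin
                m_valid <= 1'b0;             // dropping m_valid ends the burst
                done    <= 1'b1;
            end else begin
                m_addr <= m_addr + 1'b1;     // only the lower 9 bits change
                sent   <= sent + 1'b1;
            end
        end
    end
endmodule
```

While `m_ready` is low (row activation, precharge), the client simply holds its address and data.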

Although the datasheet gives 133 MHz as the maximum frequency for this chip, in my tests the module worked at 150 MHz. I did not tempt fate, though, and left the frequency at 100 MHz (which is also convenient for the rest of my design).

Here is the module code:
```verilog
`timescale 1ns / 1ps

module ctrl_sdram_v2
#(
parameter [2:0] CL = 'd3
)
(
input clk,
input rst,
//user interface
input [15:0] m_data, // valid if m_we==1
input [23:0] m_addr, // 2-bit BANK, 13-bit ROW, 9-bit COLUMN
input		 m_we	  , // 0 - read, 1 - write
input		 m_valid,
output		 m_ready, // assumed port: driven by "assign m_ready" below
output reg[15:0] s_data,
output reg    s_valid,

//SDRAM interface
output reg sd_cke,
output sd_clk,
output sd_dqml,
output sd_dqmh,
output reg  sd_cas_n,
output reg sd_ras_n,
output reg sd_we_n,
output reg sd_cs_n,
output reg [12:0] sd_addr, // assumed port: A[12:0], assigned as sd_addr[12:0] below
inout  [15:0] sd_data
);

reg [3:0] state_main = 'd0;
reg [15:0] state_tri ;
reg [15:0] sd_data_o ;
wire [15:0] sd_data_i ;

reg flg_first_cmd = 'd1;
reg [15:0] cnt_wait = 'd0;
reg [15:0] cnt_wait_buf = 'd0;
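reg [10:0] cnt_refresh_sdram = 'd0;

// Assumption: new_row_addr is used below but never declared in this listing.
// Following the description above, it is taken to mean "bank/row differs from
// the row opened by the last ACTIVATE".
reg [14:0] open_row = 'd0;
wire new_row_addr = (m_addr[23:9] != open_row);
always@(posedge clk)
if(state_main == 'd6 && cnt_wait == 'd0) open_row <= m_addr[23:9];
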
always@(posedge clk)
begin
if(rst) begin
state_main <= 'd0;
flg_first_cmd <= 'd1;
end else begin
case(state_main)
0: begin //wait 100 us
if(cnt_wait >= 8000) state_main<= 'd1;
else cnt_wait <= cnt_wait + 1;
end
1: begin //set NOP
if(cnt_wait >= 10000) begin
state_main<= 'd2;
cnt_wait <= 'd0;
end else cnt_wait <= cnt_wait + 1;
end
2: begin //cmd PRECHARGE ALL
if(cnt_wait >= 'd1) begin
cnt_wait <= 'd0;
state_main <= 'd3;
end else cnt_wait <= cnt_wait + 1'b1;
end
3: begin // AUTO REFRESH 0
if(cnt_wait[14:0] >= 'd6) begin
cnt_wait[14:0] <= 'd0;
if(cnt_wait[15]) begin
state_main <= 'd4;
cnt_wait[15] <= 'd0;
end else cnt_wait[15] <= 'd1;
end else cnt_wait <= cnt_wait + 1'b1;
end
4: begin //cmd LOAD MODE REGISTER
if(cnt_wait >= 'd1) begin
cnt_wait <= 'd0;
state_main <= 'd5;
end else cnt_wait <= cnt_wait + 1'b1;
end
5: begin //IDLE state
if(m_valid) begin
state_main <= 'd6;
cnt_refresh_sdram <= 'd0;
end else begin
if(&cnt_refresh_sdram) begin
state_main <= 'd8;
cnt_refresh_sdram <= 'd0;
end else cnt_refresh_sdram <= cnt_refresh_sdram + 1;
end
end
6: begin // cmd ACTIVATE row
if(cnt_wait >= CL) begin
cnt_wait <= 'd0;
if(m_we) begin //cmd WRITE
state_main <= 'd7;
end else begin //cmd READ
state_main <= 'd9;
end
flg_first_cmd <= 'd1;
end else cnt_wait <= cnt_wait + 1'b1;
end
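7: begin //WRITE
// Assumption: the guarding condition is missing from the listing; the burst is
// taken to end whenever no further WRITE can be issued.
if(m_valid == 'd0 || m_we == 'd0 || new_row_addr == 'd1) begin
if(flg_first_cmd) begin
flg_first_cmd <= 'd0;
end else begin
state_main <= 'd8;
end
end
end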
8: begin //cmd PRECHARGE after write
if(cnt_wait >= 'd3) begin
cnt_wait <= 'd0;
state_main <= 'd5;
end else cnt_wait <= cnt_wait + 1'b1;
end
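9: begin //READ
// Assumption: the guarding condition is missing from the listing; the burst is
// taken to end whenever no further READ can be issued.
if(m_valid == 'd0 || m_we == 'd1 || new_row_addr == 'd1) begin
if(flg_first_cmd) begin
flg_first_cmd <= 'd0;
end else begin
state_main <= 'd10;
cnt_wait_buf <= cnt_wait;
end
end
cnt_wait <= cnt_wait + 1'b1;
end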
10: begin //reading data from SDRAM
if(cnt_wait == cnt_wait_buf+CL) begin
state_main <= 'd11;
cnt_wait <= 'd0;
end else cnt_wait <= cnt_wait + 1;
end
11: begin // cmd PRECHARGE after read
if(cnt_wait >= 'd3) begin
cnt_wait <= 'd0;
state_main <= 'd5;
end else cnt_wait <= cnt_wait + 1'b1;
end
endcase
end
end

assign m_ready = (state_main == 'd7 && m_we == 'd1 && new_row_addr == 'd0) ? 'd1 :
(state_main == 'd9 && m_we == 'd0 && new_row_addr == 'd0) ? 'd1 :
'd0;

always@(posedge clk)
begin
s_data <= sd_data_i;
s_valid <= ((state_main == 'd9 || state_main == 'd10) && cnt_wait > CL) ? 'd1 : 'd0;
end

assign sd_dqml	=0;
assign sd_dqmh	=0;

always@(posedge clk)
begin

state_tri <= (state_main == 'd7) ? 16'd0 : 16'hFFFF;
sd_data_o <= (state_main == 'd7) ? m_data : 'd0;

sd_cke	<= (state_main == 'd0) ? 'd0 : 	'd1;

sd_cas_n<=			(state_main == 'd1) ? 'd1 : // INIT NOP
(state_main == 'd2 && cnt_wait==0) ? 'd1 : //PRECHARGE
(state_main == 'd2 && cnt_wait>0)  ? 'd1 :
(state_main == 'd3 && cnt_wait[14:0]==0)  ? 'd0 : //autorefresh
(state_main == 'd3 && cnt_wait[14:0]!=0)  ? 'd1 ://nop
(state_main == 'd4 && cnt_wait==0)  ? 'd0 : //load mode
(state_main == 'd4 && cnt_wait!=0)  ? 'd1 : //nop
(state_main == 'd5)  ? 'd1 : //nop
(state_main == 'd6 && cnt_wait==0)  ? 'd1 : //activate
(state_main == 'd6 && cnt_wait!=0)  ? 'd1 : //nop
(state_main == 'd7 && m_valid=='d1 && m_ready=='d1  ) ? 'd0  : //WRITE
(state_main == 'd7 && (m_valid=='d0 || m_ready=='d0)) ? 'd1  : //nop
(state_main == 'd8 && cnt_wait==0) ? 'd1 : //precharge after write
(state_main == 'd8 && cnt_wait!=0) ? 'd1 : //nop
(state_main == 'd9 && m_valid=='d1 && m_ready=='d1  ) ? 'd0  : //READ
((state_main == 'd9 || state_main == 'd10) && (m_valid=='d0 || m_ready=='d0)) ? 'd1 : // nop
(state_main == 'd11 && cnt_wait==0) ? 'd1: //'d0 : //auto REFRESH(1) //precharge after read
(state_main == 'd11 && cnt_wait!=0) ? 'd1 : // nop
'd1;
sd_ras_n<=	(state_main == 'd1) ? 'd1 :
(state_main == 'd2 && cnt_wait==0) ? 'd0 :
(state_main == 'd2 && cnt_wait>0)  ? 'd1 :
(state_main == 'd3 && cnt_wait[14:0]==0)  ? 'd0 :
(state_main == 'd3 && cnt_wait[14:0]!=0)  ? 'd1 :
(state_main == 'd4 && cnt_wait==0)  ? 'd0 :
(state_main == 'd4 && cnt_wait!=0)  ? 'd1 :
(state_main == 'd5)  ? 'd1 :
(state_main == 'd6 && cnt_wait==0)  ? 'd0 :
(state_main == 'd6 && cnt_wait!=0)  ? 'd1 :
(state_main == 'd7 && m_valid=='d1 && m_ready=='d1  ) ? 'd1  :
(state_main == 'd7 && (m_valid=='d0 || m_ready=='d0)) ? 'd1  :
(state_main == 'd8 && cnt_wait==0) ? 'd0 :
(state_main == 'd8 && cnt_wait!=0) ? 'd1 :
(state_main == 'd9 && m_valid=='d1 && m_ready=='d1  ) ? 'd1  :
((state_main == 'd9 || state_main == 'd10) && (m_valid=='d0 || m_ready=='d0)) ? 'd1 :
(state_main == 'd11 && cnt_wait==0) ? 'd0: //'d0 :
(state_main == 'd11 && cnt_wait!=0) ? 'd1 :
'd1;
sd_we_n	<=  (state_main == 'd1) ? 'd1 :
(state_main == 'd2 && cnt_wait==0) ? 'd0 :
(state_main == 'd2 && cnt_wait>0)  ? 'd1 :
(state_main == 'd3 && cnt_wait[14:0]==0)  ? 'd1 :
(state_main == 'd3 && cnt_wait[14:0]!=0)  ? 'd1 :
(state_main == 'd4 && cnt_wait==0)  ? 'd0 :
(state_main == 'd4 && cnt_wait!=0)  ? 'd1 :
(state_main == 'd5)  ? 'd1 :
(state_main == 'd6 && cnt_wait==0)  ? 'd1 :
(state_main == 'd6 && cnt_wait!=0)  ? 'd1 :
(state_main == 'd7 && m_valid=='d1 && m_ready=='d1  ) ? 'd0  :
(state_main == 'd7 && (m_valid=='d0 || m_ready=='d0)) ? 'd1  :
(state_main == 'd8 && cnt_wait==0) ? 'd0 :
(state_main == 'd8 && cnt_wait!=0) ? 'd1 :
(state_main == 'd9 && m_valid=='d1 && m_ready=='d1  ) ? 'd1  :
((state_main == 'd9 || state_main == 'd10) && (m_valid=='d0 || m_ready=='d0)) ? 'd1 :
(state_main == 'd11 && cnt_wait==0) ? 'd0 ://'d1 :
(state_main == 'd11 && cnt_wait!=0) ? 'd1 :
'd1;
sd_cs_n	<=			(rst == 'd1) ?  'd1 :
(state_main == 'd1) ? 'd0 :
(state_main == 'd2 && cnt_wait==0) ? 'd0 :
(state_main == 'd2 && cnt_wait>0)  ? 'd0 :
(state_main == 'd3 && cnt_wait[14:0]==0)  ? 'd0 :
(state_main == 'd3 && cnt_wait[14:0]!=0)  ? 'd0 :
(state_main == 'd4 && cnt_wait==0)  ? 'd0 :
(state_main == 'd4 && cnt_wait!=0)  ? 'd0 :
(state_main == 'd5)  ? 'd0 :
(state_main == 'd6 && cnt_wait==0)  ? 'd0 :
(state_main == 'd6 && cnt_wait!=0)  ? 'd0 :
(state_main == 'd7 && m_valid=='d1 && m_ready=='d1  ) ? 'd0  :
(state_main == 'd7 && (m_valid=='d0 || m_ready=='d0)) ? 'd0  :
(state_main == 'd8 && cnt_wait==0) ? 'd0 :
(state_main == 'd8 && cnt_wait!=0) ? 'd0 :
(state_main == 'd9 && m_valid=='d1 && m_ready=='d1  ) ? 'd0  :
((state_main == 'd9 || state_main == 'd10) && (m_valid=='d0 || m_ready=='d0)) ? 'd0 :
(state_main == 'd11 && cnt_wait==0) ? 'd0: //'d0 :
(state_main == 'd11 && cnt_wait!=0) ? 'd0 :
'd0;
sd_addr[12:0]	<=  (state_main == 'd2 && cnt_wait==0) ? {4'b0,1'b1,10'b0} :  //[10] = 1
(state_main == 'd4 && cnt_wait==0)  ? {2'b00,3'b000,1'b1,2'b00,CL[2:0],1'b0,3'b000} :  //BA[1:0]==0, A[12:10]==0, WRITE_BURST_MODE = 1 (single location), OP_MODE = 0, CL = parameter CL, TYPE_BURST = 0 (sequential), BURST_LENGTH = 1
(state_main == 'd6 && cnt_wait==0)  ? m_addr[21:9] :
(state_main == 'd7) ? {5'd0,m_addr[8:0]} :
(state_main == 'd8 && cnt_wait==0) ? {4'b0,1'b1,10'b0} :  //[10] = 1
(state_main == 'd9) ? {7'd0,m_addr[8:0]} :
(state_main == 'd11 && cnt_wait==0) ? {4'b0,1'b1,10'b0} :  //[10] = 1
'd0;

end

ODDR2 #(
.DDR_ALIGNMENT("NONE"), // Sets output alignment to "NONE", "C0" or "C1"
.INIT(1'b0),    // Sets initial state of the Q output to 1'b0 or 1'b1
.SRTYPE("SYNC") // Specifies "SYNC" or "ASYNC" set/reset
) ODDR2_inst (
.Q		(sd_clk),   // 1-bit DDR output data
.C0	(clk),   // 1-bit clock input
.C1	(!clk),   // 1-bit clock input
.CE	(!rst), // 1-bit clock enable input
.D0	(1), // 1-bit data input (associated with C0)
.D1	(0), // 1-bit data input (associated with C1)
.R		(0),   // 1-bit reset input
.S		(0)    // 1-bit set input
);

genvar i;
generate
for (i=0; i < 16; i=i+1)
begin: tri_state
OBUFT #(
.DRIVE(12),   // Specify the output drive strength
.IOSTANDARD("DEFAULT"), // Specify the output I/O standard
.SLEW("SLOW") // Specify the output slew rate
) OBUFT_inst (
.O(sd_data[i]),     // Buffer output (connect directly to top-level port)
.I(sd_data_o[i]),     // Buffer input
.T(state_tri[i])      // 3-state enable input
);

IBUF #(
.IOSTANDARD("DEFAULT")    // Specify the input I/O standard
)IBUF_inst (
.O(sd_data_i[i]),     // Buffer output
.I(sd_data[i])      // Buffer input (connect directly to top-level port)
);
end
endgenerate

endmodule
```

## Conclusion

The article did not turn out to be comprehensive or all-explaining, but there is a Russian version of the datasheet on this site, with comments from its author. I relied on it while learning the material.

P.S. How I connected the VGA module to the SDRAM controller.

The project had two clock domains, 100 MHz and 25 MHz. Because the memory controller ran at 100 MHz, it could theoretically write 3 new frames to memory in the time it takes to draw 1 frame on the monitor.

The state machine has two states: it either loads a new frame or reads out the existing frame for rendering. By default it writes a new frame to memory; when the FIFO signals that it is almost empty, the machine switches to reading from memory and reads out the required amount. The almost_empty signal rises when 100 values are left in the FIFO, so that the machine has time to switch to read mode and the Ctrl_SDRAM module has time to finish the previous command. The machine reads the next 900 pixel values from memory and switches back to write mode.
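A minimal sketch of such a two-state machine might look as follows. It is not the project's actual code: the module name, the `word_done` strobe (one read accepted by the controller, for example `m_valid && m_ready`) and the assumption that `fifo_almost_empty` is already synchronous to the 100 MHz clock are all made up for the illustration.

```verilog
// Sketch of the write/read arbiter described above (hypothetical names).
module fb_arbiter #(
    parameter READ_WORDS = 900       // words fetched per FIFO refill
) (
    input  wire clk,                 // 100 MHz domain
    input  wire rst,
    input  wire fifo_almost_empty,   // assumed synchronous to clk
    input  wire word_done,           // one read command accepted by the controller
    output reg  reading              // 0: write new frame data, 1: refill the FIFO
);
    reg [9:0] cnt;                   // counts 0..READ_WORDS-1

    always @(posedge clk) begin
        if (rst) begin
            reading <= 1'b0;
            cnt     <= 10'd0;
        end else if (!reading) begin
            if (fifo_almost_empty) begin   // default mode is writing
                reading <= 1'b1;
                cnt     <= 10'd0;
            end
        end else if (word_done) begin
            if (cnt == READ_WORDS-1) reading <= 1'b0;  // back to write mode
            else                     cnt <= cnt + 1'b1;
        end
    end
endmodule
```

The `reading` output would then select which requester (frame writer or FIFO refill) drives the controller's m_* interface.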

The FIFO is dual-clock, with a depth of about 1000 values. It is written at 100 MHz, and the VGA module reads it out at its own 25 MHz. The time until the machine must switch to reading again can be estimated as follows: 100 values plus 900 new values read from memory, minus the quarter of the values that the VGA module manages to read out in the meantime, leaves 750 values in the FIFO. The VGA module will therefore read for another 650 cycles before the almost_empty flag rises; translated to 100 MHz, that is 2600 cycles available for writing to memory, which is more than enough. Naturally, the VGA module does not read from the FIFO in the regions where no picture is drawn: about 160 cycles at the end of each row of pixels and 7200 cycles at the end of the frame at 25 MHz, for a total of 336,000 idle cycles at 100 MHz.
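Written out step by step, the estimate looks like this. The figure of 480 visible rows per frame is an assumption (it is not stated in the text, but it reproduces the quoted total exactly):

$$
\begin{aligned}
\text{FIFO level after a refill} &\approx 100 + 900 - \tfrac{1}{4}\cdot 1000 = 750 \\
\text{VGA cycles until almost\_empty} &= 750 - 100 = 650 \quad (25\ \text{MHz}) \\
\text{memory cycles available} &= 650 \cdot \frac{100\ \text{MHz}}{25\ \text{MHz}} = 2600 \\
\text{idle cycles per frame} &= 4 \cdot (160 \cdot 480 + 7200) = 336\,000 \quad (100\ \text{MHz})
\end{aligned}
$$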