Launching I2S Transceiver on Artery [Part 2] (DMA FSM, Pipeline)

Prologue

In this text you will learn what an I2S transceiver and pancakes have in common. Yes… That's right. Why does a microcontroller programmer need pipelines and digital filters?

This text describes how to emit sound using I2S DMA.

What's the problem?

In the previous text it was mentioned that running the I2S transceiver in interrupt mode puts a relatively high load on the processor core, because the interrupt handlers are called very often. If we transmit stereo sound with a sampling frequency of 48 kHz, an interrupt fires for every 16-bit word of each of the two channels, that is, with a frequency of 96 kHz.

Something needs to be done about this problem.

Statement of the problem

Write a firmware application that will both continuously record (R) and continuously transmit (T) data via I2S2 on the Artery microcontroller. At the same time, everything should work in real time. Implement the R->T scheme.

Translated into kitchen language, you need to make a fully digital echo through RAM.

As a bonus task, try to slightly modify the received data in real time before sending it back to I2S, for example by passing it through some digital filter. That is, implement the Read(R)->Proc(P)->Transmit(T) scheme.

Solution

So, let's start from scratch… The solution to the problem is to broadcast and record sound via direct memory access (DMA). DMA is a coprocessor that can only move data within the physical memory of the microprocessor. In essence, DMA is a hardware implementation of the memcpy() function. DMA is good because it generates a DONE interrupt when sending is complete. DMA can also generate an interrupt about the successful transfer of half of the planned data volume. That's it.

DMA has 3 modes.

No.	Src	Dst	Explanation
1	RAM	PHY	Transmit to a peripheral
2	PHY	RAM	Record from a peripheral
3	RAM	RAM	memcpy()
4	PHY	PHY	there is no such mode

Obviously, we can't record and play the same PCM data at the same time. We need to record it in pieces and play it in pieces too. For example, 1024 samples. One sample is 2 channels. Each channel has 2 bytes (int16_t).
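As a minimal sketch of that layout, one stereo sample and the whole temporary array could be declared as follows. The type name `I2sFrame_t` and the two helper functions are illustrative and are not taken from the original firmware.

```c
#include <stddef.h>
#include <stdint.h>

#define I2S_SAMPLE_CNT 1024u /* samples in the temporary array */

/* One sample = 2 channels, 2 bytes each (hypothetical type name). */
typedef struct {
    int16_t left;
    int16_t right;
} I2sFrame_t;

I2sFrame_t I2STempData[I2S_SAMPLE_CNT];

/* Size of the whole array in bytes. */
size_t i2s_temp_bytes(void) { return sizeof(I2STempData); }

/* Number of 16-bit transfers DMA needs for one pass over the array. */
size_t i2s_temp_half_words(void) { return sizeof(I2STempData) / sizeof(int16_t); }
```

With these numbers the array takes 4096 bytes of RAM, and one pass over it is 2048 half-word DMA transfers.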

The naive, intuitive way to solve the digital echo problem is to record a piece, then play it, then record the next piece, and so on.

However, this is a very bad idea, as it breaks the continuity of both the audio recording and the audio playback. This method will only work if the size of the recorded piece is just one sample! However, that does not solve the problem of too frequent interrupts.

What should we do?

To solve this engineering problem, we need to recall an old Russian folk logic puzzle.

How to fry 3 pancakes in 3 minutes if each side is fried for 1 minute, and only 2 pancakes fit in the frying pan?

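And the solution to the problem is this (T is top, B is bottom): in the first minute fry 1T and 2T, in the second minute 1B and 3T, in the third minute 2B and 3B.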

This is, in a way, the very first life experience of pipeline processing at the household level. A similar principle should be applied to I2S digital sound processing. Only here, instead of pancakes, there will be arrays of 512 samples, and instead of a frying pan, an I2S transceiver. That's all…

So it turns out that we need to divide the temporary array into two equal halves, Low (L) and Hi (H). While L is being written, transmit H; while H is being written, transmit L.
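The bookkeeping behind this ping-pong scheme is small. Here is a hedged sketch that tells, for a DMA event, which half the DMA has just finished with and which half it is working on now. The names `DmaEvent_t`, `BufHalf_t` and both functions are made up for illustration.

```c
#include <stddef.h>

typedef enum { DMA_EVT_HALF, DMA_EVT_DONE } DmaEvent_t;

typedef struct {
    size_t first; /* index of the first sample of the half */
    size_t count; /* number of samples in the half */
} BufHalf_t;

/* Half that DMA has just finished with when the event fires. */
BufHalf_t dma_finished_half(DmaEvent_t evt, size_t total) {
    BufHalf_t h;
    h.count = total / 2u;
    h.first = (evt == DMA_EVT_HALF) ? 0u : h.count;
    return h;
}

/* Half that DMA is working on right after the event (circular mode). */
BufHalf_t dma_active_half(DmaEvent_t evt, size_t total) {
    BufHalf_t h;
    h.count = total / 2u;
    h.first = (evt == DMA_EVT_HALF) ? h.count : 0u;
    return h;
}
```

For a 1024-sample array, the Half event means that samples 0..511 (L) are finished and samples 512..1023 (H) are in progress; the Done event means the opposite.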

Considering that DMA has a circular mode, such processing will work on its own indefinitely. It just needs to be started. How to start? In working-class language, I2S-DMA needs to be started “by pushing”. What does it look like?

We will have to implement a simple finite state machine with 4 states in the firmware code. The machine's task is, no more and no less, to ensure the correct start and the subsequent self-synchronization between the reception and sending of samples, so that the recording lags behind the playback in phase by half a period. This is necessary for a fully digital loopback. Here is the graph of this finite state machine.

You can allocate an array of samples I2STempData[1024], zero it, and send it via DMA to I2S. As soon as the half-transfer interrupt fires, enable DMA recording into the same I2STempData array. After that everything happens at the hardware level by itself, and the digital echo works.
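The start-up sequence can be sketched as a tiny state machine. Keep in mind that the state and input names below are my own guess, since only the number of states (four) is stated in the text, and the two boolean flags are stand-ins for enabling the real DMA channels.

```c
#include <stdbool.h>

/* State and input names are illustrative; the original graph may differ. */
typedef enum { ECHO_IDLE, ECHO_TX_ONLY, ECHO_DUPLEX, ECHO_STOPPED } EchoState_t;
typedef enum { ECHO_IN_START, ECHO_IN_TX_HALF, ECHO_IN_TX_DONE, ECHO_IN_STOP } EchoInput_t;

typedef struct {
    EchoState_t state;
    bool tx_dma_on; /* stand-in for "Tx DMA channel enabled" */
    bool rx_dma_on; /* stand-in for "Rx DMA channel enabled" */
} EchoFsm_t;

void echo_fsm_init(EchoFsm_t *fsm) {
    fsm->state = ECHO_IDLE;
    fsm->tx_dma_on = false;
    fsm->rx_dma_on = false;
}

void echo_fsm_step(EchoFsm_t *fsm, EchoInput_t in) {
    if (in == ECHO_IN_STOP) { /* stop is accepted from any state */
        fsm->tx_dma_on = false;
        fsm->rx_dma_on = false;
        fsm->state = ECHO_STOPPED;
        return;
    }
    switch (fsm->state) {
    case ECHO_IDLE:
    case ECHO_STOPPED:
        if (in == ECHO_IN_START) { /* the "push": start playing the zeroed array */
            fsm->tx_dma_on = true;
            fsm->state = ECHO_TX_ONLY;
        }
        break;
    case ECHO_TX_ONLY:
        if (in == ECHO_IN_TX_HALF) { /* half sent: start recording into the same array */
            fsm->rx_dma_on = true;
            fsm->state = ECHO_DUPLEX;
        }
        break;
    case ECHO_DUPLEX:
        /* circular DMA keeps both directions running on its own */
        break;
    }
}
```

In this sketch `echo_fsm_step()` would be called from the application to start or stop, and from the Tx DMA interrupt handler with the half and done events. Starting Rx exactly on the first Tx half event is what gives the half-period phase lag.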

It turns out that interrupts occur every 512 samples, that is, with a frequency of 93.75 Hz: one DMA interrupt every 10.67 ms. This is 512 times less often than one interrupt per 48 kHz sample without DMA. Moreover, the load can be reduced even further by increasing the size of the temporary array I2STempData.

Everything is in order here: continuous recording, continuous playback. The only minus is that the output signal is delayed by half the length of the temporary array. How big is this delay? Let's say we work at a frequency of 48 kHz. The duration of one sample is 20.8333 us. How much time is needed to play 512 samples? Answer: only 10.67 ms. In principle, a delay of less than about 15 ms is not noticeable to a person.
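The arithmetic above is easy to re-check for other array sizes. A small sketch follows; the function names are illustrative.

```c
#include <stdint.h>

/* Time in milliseconds needed to play or record `samples` samples. */
double block_time_ms(uint32_t samples, uint32_t fs_hz) {
    return 1000.0 * (double)samples / (double)fs_hz;
}

/* How many times per second a block of `samples` samples completes. */
double block_rate_hz(uint32_t samples, uint32_t fs_hz) {
    return (double)fs_hz / (double)samples;
}
```

For 512 samples at 48000 Hz this gives about 10.67 ms and 93.75 Hz, and for a single sample about 0.0208 ms.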

But what if we want to tweak the data a little before sending it? Then a new phase Proc (P) appears, and we get a three-stage pipeline like this.

However, a three-stage pipeline cannot be built, since DMA subsystems on ARM Cortex-M microcontrollers usually do not generate interrupts at 1/3 and 2/3 of the transfer. Yes… That's it… We have only one intermediate interrupt, at 1/2 of the transfer, at our disposal. We have to adapt to this.

So we'll have to make an idling pipeline, which does nothing 25% of the time. That is, a four-stage pipeline.

This is a palliative solution, purely for compatibility with the Artery MCU (STM has the same situation). Now we are already operating on quarters of the temporary array of samples. But who will give the order to start processing a quarter?

To do this, no matter how you look at it, you will have to run hardware timer #8 with a period of 21.3 ms. This is exactly the time it takes to transmit the temporary array of 1024 samples at a sampling frequency of 48 kHz. It turns out that the timer will overflow with a frequency of 46.875 Hz. Together with the compare interrupt at half the period, timer interrupts will occur with a frequency of 93.75 Hz.

You also need to set up a compare interrupt on the hardware timer at half the period, just like the half-transfer interrupt in DMA.
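To turn these times into timer register values, you need the tick frequency of the timer. The sketch below assumes a 1 MHz tick after the prescaler, which is my assumption and not a value from the original project. The function names are illustrative as well.

```c
#include <stdint.h>

/* Timer ticks in one pass over `samples` samples, rounded to nearest. */
uint32_t tmr_period_ticks(uint32_t samples, uint32_t fs_hz, uint32_t tick_hz) {
    uint64_t num = (uint64_t)samples * tick_hz + fs_hz / 2u;
    return (uint32_t)(num / fs_hz);
}

/* Compare value that fires in the middle of the period. */
uint32_t tmr_compare_half(uint32_t period_ticks) { return period_ticks / 2u; }
```

With a 1 MHz tick, 1024 samples at 48 kHz give a period of 21333 ticks and a compare value of 10666, which still fits a 16-bit counter. The period is not a whole number of ticks (21333.33…), so a free-running timer would slowly drift relative to the audio stream.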

On the lower graph All_Interrupts you can see that interrupts now occur every 256 samples, every 5.33 ms, that is, with a frequency of 187.5 Hz. This is 256 times less often than one interrupt per 48 kHz sample, which is what we had when we emitted I2S traffic by interrupts without using DMA at all. Success!

But who will give the go-ahead to the hardware timer to start counting? Here, purely combinatorially, no matter how you look at it, there are only 4 options.

No.	DMA direction	Event	Action
1	Tx	Half	Timer Counter = 75%
2	Tx	Done	Timer Counter = 25%
3	Rx	Half	Timer Counter = 25%
4	Rx	Done	Timer Counter = 75%

And as you can see, the answer is nobody. In the DMA interrupts, you simply set the hardware counter value to 25% or 75% of the counting period. This is even good, since I2S and the hardware timer will be constantly re-synchronized.
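The table above maps directly onto a lookup. A minimal sketch, with hypothetical names, that returns the value to load into the timer counter for each of the four DMA events:

```c
#include <stdint.h>

typedef enum { DMA_DIR_TX, DMA_DIR_RX } DmaDir_t;
typedef enum { DMA_EVT_HALF, DMA_EVT_DONE } DmaEvent_t;

/* Percent of the timer period from the table: rows Tx/Rx, columns Half/Done. */
static const uint8_t PresetPercent[2][2] = {
    /* Tx */ { 75u, 25u },
    /* Rx */ { 25u, 75u },
};

/* Counter value to load in the DMA interrupt handler for the given event. */
uint32_t tmr_resync_value(DmaDir_t dir, DmaEvent_t evt, uint32_t period_ticks) {
    return (uint32_t)(((uint64_t)period_ticks * PresetPercent[dir][evt]) / 100u);
}
```

In real firmware the returned value would be written to the counter register of the timer inside the corresponding DMA interrupt handler; that write is left out here because it depends on the vendor library.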

This way we get the necessary time grid to start the procedure of processing the received samples (for example, digital volume change or digital filtering). It remains only to understand who exactly will give the command to process each individual quarter of the received array. This table shows who will do the processing.

We've decided on the theory. Now we need to implement it all on an electronic board.

What's the plan?

–Apply clock to DMA1

–Define channel1 of DMA1 to send audio stream to I2S2

–Define channel2 DMA1 to receive audio stream from I2S2

–Start hardware timer 8

–Set up a comparator on timer 8 and generate an interrupt at half the period.

–Configure I2S2 to the following parameters: 16bit, Master I2S, AudioFreq: 48kHz.
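The I2S parameters from the last step of the plan can be kept in one place as a plain structure. This is only a sketch: `I2sConfig_t` is a hypothetical type, not a type from the Artery library, and the bit clock formula assumes that each channel slot is exactly as wide as the data (16 bits).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t audio_freq_hz;   /* sampling frequency */
    uint8_t bits_per_channel; /* data width of one channel */
    uint8_t channels;         /* 2 for stereo */
    bool master;              /* true if the MCU generates the clocks */
} I2sConfig_t;

const I2sConfig_t I2s2Config = { 48000u, 16u, 2u, true };

/* Bit clock (BCLK) frequency for a slot width equal to the data width. */
uint32_t i2s_bit_clock_hz(const I2sConfig_t *cfg) {
    return cfg->audio_freq_hz * cfg->bits_per_channel * cfg->channels;
}
```

For 48 kHz, 16 bit, stereo this gives a bit clock of 1.536 MHz, which is a handy number to have when looking at the I2S lines with an oscilloscope.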

Practical part

It's time to make a decisive experiment and test this idea on real hardware. As a prototype, I will use the AT-Start-F437 training board. Here is a two-tier assembly: AT-Start-F437 + WM8731.

I had to choose the WM8731 audio codec, since it is the only audio codec for which debug boards are sold. This is how the WM8731 audio codec module and the AT-START-F437 training board are connected.

Wire	GPIO	PIN MUX	Pull	MCU direction	WM8731 board
I2S2_CK	PC7	5	Down	out	BCLK
I2S2_WS	PB9	5	Up	out	DACLRC, ADCLRC
I2S2_SD	PC3	4	Up	out	DACDAT
I2S2_SDEXT	PC2	6	Up	in	ADCDAT
I2S2_MCK	PA3	5	None (floating)	out	-
I2C2_SDA	PF0	4	Up	in_out	SDA
I2C2_SCL	PF1	4	Up	out	SCL

Connection diagram of the module with the WM8731 audio codec.

Educational and training kit for recording and playing digital audio

At the level of the AT32F437 SoC, two transceivers must be enabled: SPI2 for transmission and I2S2EXT for reception. Plus, turn on TMR8.

Here it is necessary to explain that in Artery, for I2S to work in full duplex, it has to be composed of two separate synchronous I2S transceivers operating simultaneously.

#	Hardware module	Direction	Bus	Mode	Base address
1	SPI2	Transmitter	APB1	MASTER_TX	0x4000 3800
2	I2S2EXT	Receiver	APB2	SLAVE_RX	0x4001 7800

It is worth noting that resetting the SPI2 registers simultaneously resets the I2S2EXT registers. Such is the greeting from Artery.

You need to assign I2S2 functions to the DMA channels. Like this.

DMA	Channel	Request ID	Function	Direction	Data width	Transfer
1	1	13	SPI2_TX	out	HALF_WORD	MEM->PERIPH
1	2	110	I2S2_EXT_RX	in	HALF_WORD	PERIPH->MEM

Processing received samples

As we remember, with a sampling frequency of 48 kHz, we have only 20.83 us to process one sample. This means that in the pipeline for processing a continuous portion of 256 samples, we will have a maximum of 5.3 ms. For everything. Therefore, it is necessary for the C function in the firmware to have time to do its DSP processing in less than 5.3 ms.

As an example of simple audio processing, I tried to implement an artificial echo effect based on a digital IIR filter. Here is its digital circuit.

All calculations were for 16-bit samples. On ARM Cortex-M4F with a core frequency of 250 MHz, these 256 16-bit samples are calculated by the IIR filter in just 1.613 ms. It turns out that one sample is processed in 6.3 us. This is 3.3 times faster than the deadline allows. Success.
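For reference, here is a generic sketch of an echo built on an IIR structure, a feedback comb filter y[n] = x[n] + g·y[n−D]. It is not the author's exact circuit, which is not reproduced in this text. The structure and function names, the Q15 gain format and the saturation are my assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int16_t *line;    /* delay line: the last `delay` output samples */
    size_t delay;     /* echo delay D in samples */
    size_t pos;       /* index of the oldest element of the delay line */
    int16_t gain_q15; /* feedback gain g in Q15, 16384 = 0.5 */
} EchoFilter_t;

static int16_t saturate16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One step of y[n] = x[n] + g * y[n - D]. */
int16_t echo_filter_step(EchoFilter_t *f, int16_t x) {
    int32_t delayed = f->line[f->pos];
    int32_t y = (int32_t)x + (delayed * f->gain_q15) / 32768;
    int16_t out = saturate16(y);
    f->line[f->pos] = out; /* the output feeds back: this makes it IIR */
    f->pos = (f->pos + 1u) % f->delay;
    return out;
}

/* Process one channel of an interleaved stereo block in place. */
void echo_filter_block(EchoFilter_t *f, int16_t *data, size_t samples, size_t channel) {
    for (size_t i = 0; i < samples; i++) {
        data[2u * i + channel] = echo_filter_step(f, data[2u * i + channel]);
    }
}
```

With a delay of 2 samples and a gain of 0.5, a single impulse of 1000 comes out as 1000, 0, 500, 0, 250 and so on, which is the decaying series of repeats that is heard as an echo. For an audible echo the delay line has to be thousands of samples long, and each channel needs its own filter state.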

How to debug I2S?

Any development starts only when debugging tools appear. This is especially true for I2S, since it is a high-load interface: the bit rate here is in the megahertz range. How to keep track of everything? How to check everything? The answer is simple. The audio path has to be debugged in parts, similar to how integrals are taken by parts in mathematics.

Phase1: Test I2S on MCU side

I2S in full duplex mode can be tested in a very clever way. You can install a jumper from the output to the input and enable I2S data transmission. This is called loopback. It is expected that, after stopping, the recorded I2S stream will be the same as the one that was played. Bit for bit.

This way it will be possible to check the correctness of the I2S transceiver settings on the microcontroller side.
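The comparison itself can be sketched in a few lines. The function names and the ramp test pattern are my choice. The sketch also assumes that the received words are already aligned with the sent ones; in a real setup the recording may be shifted, and the arrays have to be aligned first.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit words that differ between what was sent and what came back. */
size_t loopback_mismatches(const int16_t *tx, const int16_t *rx, size_t words) {
    size_t bad = 0u;
    for (size_t i = 0; i < words; i++) {
        if (tx[i] != rx[i]) {
            bad++;
        }
    }
    return bad;
}

/* Fill the transmit array with a ramp so that shifted or swapped words are easy to spot. */
void loopback_fill_ramp(int16_t *tx, size_t words) {
    for (size_t i = 0; i < words; i++) {
        tx[i] = (int16_t)(i & 0x7FFFu);
    }
}
```

A result of zero mismatches means the stream came back bit for bit. A ramp is more useful than silence or a constant, because a constant pattern would also pass with wrong channel order or a wrong word offset.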

Phase2: I2S testing on the WM8731 audio codec side

Similarly, you can check the audio codec itself. You need to take a jumper and connect the ADC data and DAC data pins. This will result in a fully digital I2S echo. Bit for bit. The same loopback.

Then you can try saying something into the microphone and listening to the headphones at the same time.

++If the config in the codec is correct, then you will hear what you said in real time. Without delay.

–If there is no sound or some crackling appears, then look for an error in the register config written over I2C into the audio codec ASIC.

That's it. With just one wire it's quite possible to find an error in the configs, either in the MCU or in the codec.

When the I2S Full Duplex mode starts working, the oscilloscope should show a picture like this.

Phase3: Checking synchronization

You can also use GPIO to control and debug synchronization between DMA receive and send channels. You just need to do the following in the DMA interrupt handlers:

DMA	Channel	Interrupt	GPIO action	GPIO	Explanation
1	1	Half	Set 3.3V	PB6	Half sent
1	2	Half	Set 3.3V	PB7	Half received
1	1	Done	Set 0V	PB6	Sending completed
1	2	Done	Set 0V	PB7	Reception completed

Then, if everything is fine, you should get an oscillogram like this

Indeed, DMA sending and DMA receiving are offset from each other by half a period. This is how it should be.
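The GPIO table above can be sketched as four small hooks. Everything here is illustrative: the hook names are invented, and the `DebugPinLevel` array is a stand-in for the real GPIO output register, so that the logic can be checked on a PC.

```c
#include <stdbool.h>

typedef enum { DBG_PIN_PB6 = 0, DBG_PIN_PB7 = 1, DBG_PIN_CNT } DebugPin_t;

/* Stand-in for the GPIO output register; real firmware would drive the pins here. */
bool DebugPinLevel[DBG_PIN_CNT];

static void debug_pin_write(DebugPin_t pin, bool high) { DebugPinLevel[pin] = high; }

/* Hypothetical hooks to call from the DMA1 channel 1 (Tx) interrupt handler. */
void on_tx_half(void) { debug_pin_write(DBG_PIN_PB6, true); }
void on_tx_done(void) { debug_pin_write(DBG_PIN_PB6, false); }

/* Hypothetical hooks to call from the DMA1 channel 2 (Rx) interrupt handler. */
void on_rx_half(void) { debug_pin_write(DBG_PIN_PB7, true); }
void on_rx_done(void) { debug_pin_write(DBG_PIN_PB7, false); }
```

When reception lags sending by half a period, the Tx half event coincides with the Rx done event and vice versa, so PB6 and PB7 should look like two square waves in antiphase.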

Now you can safely launch Timer8 with compare channel #1 to generate signals that start the processing of the received audio samples. It is worth noting that not all timers in the Artery microcontroller have compare channels. Timer 6 does not have them at all, and different timers have a different number of compare channels. I made the interrupt in the middle of the count using the hardware comparator built into the timer, since these comparators can also generate interrupts.

As you can see, we managed to split the I2S 1024 sample reception into 4 equal time intervals and generate triggers for each quarter. Now all that remains is to add the sample processing code to the interrupt handlers of the timer 8 and DMA channels and we will get real-time sound processing.
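Which quarter may be processed in each time slot follows from the half-period offset between recording and playback. The sketch below is my reading of that scheme, not the author's table, and the function names are hypothetical. Quarters are numbered 0..3, and `rx_q` is the quarter that Rx DMA is writing right now.

```c
#include <stdint.h>

#define QUARTER_CNT 4u

/* Quarter that Tx DMA reads while Rx DMA writes quarter `rx_q` (half a period ahead). */
uint32_t tx_quarter(uint32_t rx_q) { return (rx_q + 2u) % QUARTER_CNT; }

/* Quarter that is safe to process: the one Rx has just finished,
 * which Tx will reach only in the next time slot. */
uint32_t proc_quarter(uint32_t rx_q) { return (rx_q + QUARTER_CNT - 1u) % QUARTER_CNT; }

/* Quarter nobody touches in this slot: already sent, not yet overwritten. */
uint32_t idle_quarter(uint32_t rx_q) { return (rx_q + 1u) % QUARTER_CNT; }
```

Every quarter thus goes through Read, Proc, Transmit and one idle slot, which is the 25% of idle time mentioned earlier. The processing function gets exactly one slot, because in the next slot the Tx DMA starts reading the same quarter.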

Advantages of I2S in DMA mode

++Low load on the microcontroller's central processor. At most 2 interrupts per DMA channel for each pass over the whole array. Sending and receiving are done entirely by hardware. This opens the way for processing I2S traffic in real time directly on the MCU.

Disadvantages of I2S in DMA mode

–DMA does not have interrupts at 1/3 and 2/3 of the transferred data. In my opinion, this is sad. Interrupts at 33% and 66% of the transfer would make it possible to optimize the performance of a Read-Proc-Transmit (RPT) pipeline.

Results

As you can see, running I2S in DMA mode is a real hassle. This is a high-load topic. Everything needs to be done extremely carefully. A whole pipeline needs to be debugged and launched. You need to understand finite state machines. You need to know how to debug.

But DMA is the only correct way to work with interfaces such as I2S, UART, SPI, etc.

This text is an example of how in microcontroller programming, competent documentation is more important than source code. Anyone can write code with good documentation. And having only the code, it is sometimes simply unrealistic to restore the documentation.

The results obtained open the way for developing all sorts of applications for processing audio data in real time. It is possible to impose an echo effect, distortion, perform digital filtering, etc.

I managed to implement a digital echo effect based on an IIR filter, which works completely in real time. Right on the microcontroller.

I hope that the text will help someone else to understand I2S and write an interesting sound application on a microcontroller.

Dictionary

Acronym	Expansion
DMA	Direct memory access
IIR	Infinite impulse response
I2S	Inter-IC Sound
RPT	Read-Process-Transmit
RT	Read-Transmit
GPIO	General-purpose input/output