Launching I2S Transceiver on Artery [Part 2] (DMA FSM, Pipeline)

Prologue

In this text you will learn what an I2S transceiver and pancakes have in common. Yes… That's right. You will also see why a microcontroller programmer needs pipelines and digital filters.

This text describes how to play and record sound over I2S using DMA.

What's the problem?

In the previous text I mentioned that running the I2S transceiver in interrupt mode puts a relatively high load on the processor core, because the interrupt handlers are called very often. If we transmit sound at 48 kHz, interrupts fire at 96 kHz: one per 16-bit word, and every stereo sample carries two words (left and right).

Something needs to be done about this problem.

Statement of the problem

Write a firmware application that continuously records (R) and continuously transmits (T) data via I2S2 on an Artery microcontroller. Everything must work in real time. Implement the R->T scheme.

In plain words, you need to build a fully digital echo (loopback) through RAM.

As a bonus task (marked with an asterisk), try to slightly modify the received data in real time before sending it back to I2S, for example by passing it through a digital filter. That is, implement the Read(R)->Proc(P)->Transmit(T) scheme.

Solution

So, let's start from scratch… The solution is to play and record sound via direct memory access (DMA). DMA is a coprocessor that can only move data within the microcontroller's address space (RAM and peripheral registers). In essence, DMA is a hardware implementation of the memcpy() function. What makes DMA useful is that it raises a DONE (transfer-complete) interrupt when the transfer is finished, and it can also raise an interrupt when half of the planned data volume has been transferred. That's it.

Transferring data from RAM to SPI

DMA has 3 modes (PHY = peripheral; the fourth combination does not exist):

No. | Source | Destination | Explanation
1 | RAM | PHY | Transmit to a peripheral
2 | PHY | RAM | Record from a peripheral
3 | RAM | RAM | memcpy()
4 | PHY | PHY | No such mode

Obviously, we cannot record and play the same PCM data at the same moment. We have to record it in chunks and play it in chunks too, for example 1024 samples at a time. One sample consists of 2 channels, and each channel takes 2 bytes (int16_t), so such a chunk occupies 4096 bytes.

The naive, intuitive way to solve the digital echo problem looks like this:

However, this is a very bad idea: it breaks the continuity of both the recording and the playback. The method only works if the recorded chunk is a single sample long, and then we are back to an interrupt on every sample, so the problem of too frequent interrupts is not solved.

What should we do?

To solve this engineering problem, we need to remember an old Russian folk logic puzzle.

How to fry 3 pancakes in 3 minutes if each side is fried for 1 minute, and only 2 pancakes fit in the frying pan?

The solution (T = top side, B = bottom side): minute 1, pancakes 1T and 2T; minute 2, pancakes 1B and 3T; minute 3, pancakes 2B and 3B.

This is, in a way, our very first everyday experience of pipeline processing. The same principle should be applied to digital I2S sound processing. Only here, instead of pancakes, there are half-buffers of 512 stereo samples, and instead of a frying pan, there is an I2S transceiver. That's all…

So we need to divide the temporary memory into two equal halves, Low (L) and Hi (H). While L is being written, H is transmitted; while H is being written, L is transmitted.
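To make the Low/Hi idea concrete, here is a minimal host-side sketch in C, assuming one circular DMA buffer of 1024 stereo frames of int16_t (2048 half-words). The names `FRAMES`, `buf`, `dma_event_t` and `completed_half()` are hypothetical, not Artery library API. The function only answers which half the DMA has just finished, i.e. which half the CPU may touch while the hardware works on the other one.

```c
#include <assert.h>
#include <stdint.h>

#define FRAMES   1024u              /* stereo samples in the whole buffer (hypothetical) */
#define CHANNELS 2u                 /* left + right, one int16_t each */
#define WORDS    (FRAMES * CHANNELS)

static int16_t buf[WORDS];          /* one circular DMA buffer: Low half + Hi half */

typedef enum { DMA_EVT_HALF, DMA_EVT_DONE } dma_event_t;

/* Returns the half the DMA has just finished; the DMA is now busy with the other one.
   HALF fires after the Low half [0, WORDS/2), DONE after the Hi half [WORDS/2, WORDS). */
static int16_t *completed_half(dma_event_t evt)
{
    assert(evt == DMA_EVT_HALF || evt == DMA_EVT_DONE);
    return (evt == DMA_EVT_HALF) ? &buf[0] : &buf[WORDS / 2u];
}
```

In firmware this would be called from the half-transfer and transfer-complete interrupt handlers of the DMA channel.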

Since DMA has a circular mode, such processing keeps running on its own indefinitely. It just needs to be started. How? In plain words, I2S-DMA has to be "push-started", like a car with a dead battery. What does that look like?

We have to implement a simple finite state machine with 4 states in the firmware. Its only task is to start everything correctly and then keep reception and transmission self-synchronized, so that recording lags playback by half a period. This is required for a fully digital loopback. Here is the graph of this finite state machine.

You allocate an array of samples, I2STempData[1024], clear it and send it to I2S via DMA. As soon as the half-transfer interrupt of the transmit channel fires, enable DMA recording into the same I2STempData array. From then on everything happens in hardware by itself, and the digital echo works.
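The original state graph survives only as a picture, so the four state names below are my own hypothetical reconstruction of the start-up sequence just described: clear the buffer, start TX DMA, arm RX DMA on the first TX half-transfer, then run. This is a host-testable sketch of the transition function only; the actual DMA start calls are left to the caller through the `start_*` flags and are not Artery library calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ST_IDLE, ST_TX_STARTED, ST_RX_STARTED, ST_RUNNING } fsm_state_t;
typedef enum { EV_START, EV_TX_HALF, EV_TX_DONE, EV_RX_HALF, EV_RX_DONE } fsm_event_t;

typedef struct {
    bool clear_buffer;   /* zero I2STempData before the first transmission */
    bool start_tx_dma;   /* start circular DMA: RAM -> SPI2 */
    bool start_rx_dma;   /* start circular DMA: I2S2EXT -> the same RAM buffer */
} fsm_actions_t;

/* Pure transition function: returns the next state and reports what to do. */
static fsm_state_t fsm_step(fsm_state_t s, fsm_event_t ev, fsm_actions_t *act)
{
    assert(act != NULL);
    *act = (fsm_actions_t){ false, false, false };
    switch (s) {
    case ST_IDLE:
        if (ev == EV_START) { act->clear_buffer = true; act->start_tx_dma = true; return ST_TX_STARTED; }
        break;
    case ST_TX_STARTED:
        /* "Push start": RX begins when TX has played half the buffer,
           so recording lags playback by exactly half a period. */
        if (ev == EV_TX_HALF) { act->start_rx_dma = true; return ST_RX_STARTED; }
        break;
    case ST_RX_STARTED:
        /* First RX half-transfer: the loop is closed and runs in hardware from now on. */
        if (ev == EV_RX_HALF) return ST_RUNNING;
        break;
    case ST_RUNNING:
        break;           /* circular DMA keeps itself going; nothing to do */
    }
    return s;            /* unexpected events leave the state unchanged */
}
```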

So interrupts occur every 512 samples, i.e. at 93.75 Hz: one DMA interrupt every 10.67 ms. That is 512 times fewer interrupts than one per sample without DMA. The load can be reduced even further by enlarging the temporary array I2STempData.

Everything is in order here: continuous recording and continuous playback. The only drawback is that the output signal is delayed by half the length of the temporary array. How big is this delay? Say we work at 48 kHz. One sample lasts 20.8333 us. How long does it take to play 512 samples? Only 10.67 ms. In practice, a delay below about 15 ms is not noticeable to a person.
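All of these figures come from one relation: an event that fires once every N samples at a sampling rate fs has a period of N/fs and a rate of fs/N. The two helper names below are hypothetical; the sketch simply reproduces the numbers used in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Period, in milliseconds, of an event that fires once every `frames` samples. */
static double period_ms(uint32_t frames, uint32_t fs_hz)
{
    assert(fs_hz > 0u);
    return 1000.0 * (double)frames / (double)fs_hz;
}

/* Rate, in hertz, of the same event. */
static double rate_hz(uint32_t frames, uint32_t fs_hz)
{
    assert(frames > 0u);
    return (double)fs_hz / (double)frames;
}
```

For example, `rate_hz(512, 48000)` gives the 93.75 Hz DMA interrupt rate, and `period_ms(512, 48000)` gives both the 10.67 ms interrupt period and the half-buffer latency.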

But what if we want to tweak the data a little before sending it? Then a new stage, Proc (P), appears, and we get a three-stage pipeline like this.

However, a three-stage pipeline is not possible, because the DMA controllers of ARM Cortex-M microcontrollers usually do not generate interrupts at 1/3 and 2/3 of a transfer. Yes… that's how it is. The only intermediate interrupt at our disposal is the half-transfer one, at 1/2. We have to adapt to this.

So we have to build an idling pipeline that does nothing 25% of the time. That's it: a four-stage pipeline.

This is a compromise, forced purely by the Artery MCU hardware (STM32 has the same situation). Now we already operate on quarters of the temporary sample array. But who gives the command to start processing each quarter?

For this, like it or not, you have to run hardware timer TMR8 with an overflow period of 21.33 ms, which is exactly the time needed to play the whole temporary array of 1024 samples at 48 kHz. The timer therefore overflows at 46.875 Hz, and together with the mid-period compare interrupt it produces interrupts at 93.75 Hz.

You also need to configure a compare interrupt at half of the timer period, just like the DMA half-transfer interrupt.
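A sketch of the timer arithmetic, assuming a 16-bit counter and a timer kernel clock of 250 MHz (the core frequency quoted later in this text; check your actual clock tree, because the APB prescalers may change it). It picks the smallest prescaler that lets one counter period span the whole 1024-sample buffer. The struct and function names are hypothetical; the values would go into the prescaler, period (auto-reload) and compare registers.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t prescaler;  /* prescaler register value: timer clock is divided by prescaler + 1 */
    uint32_t period;     /* period register value: the counter spans period + 1 ticks */
    uint32_t compare;    /* compare value for the mid-period interrupt */
} tmr_cfg_t;

/* One counter period = `frames` samples at fs_hz, for a timer clocked at tmr_clk_hz. */
static tmr_cfg_t tmr_cfg_for_buffer(uint32_t tmr_clk_hz, uint32_t fs_hz, uint32_t frames)
{
    assert(fs_hz > 0u && frames > 0u);
    uint64_t ticks = (uint64_t)tmr_clk_hz * frames / fs_hz;   /* truncated */
    uint64_t div = (ticks + 65535u) / 65536u;                 /* ceil(ticks / 65536) */
    if (div == 0u) div = 1u;
    uint64_t counts = ticks / div;                            /* now fits a 16-bit counter */
    assert(counts >= 2u && counts <= 65536u && div <= 65536u);
    tmr_cfg_t cfg = { (uint32_t)(div - 1u), (uint32_t)(counts - 1u), (uint32_t)(counts / 2u) };
    return cfg;
}
```

With 250 MHz, 48 kHz and 1024 samples this yields a divider of 82 and 65040 counts, i.e. 21.33312 ms instead of 21.33333 ms. The timer drifts by about 0.2 us per buffer, which is one more reason to re-sync its counter from the DMA interrupts.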

On the lower graph, All_Interrupts, you can see that interrupts now occur every 256 samples, i.e. every 5.3(3) ms, or at 187.5 Hz. That is 256 times less load on the CPU than one interrupt per sample, as when we drove I2S from interrupts without DMA at all. Success!

But who gives the hardware timer the go-ahead to start counting? Combinatorially, no matter how you look at it, there are only 4 options:

No. | DMA data direction | DMA event | Action
1 | Tx | Half | Timer counter = 75%
2 | Tx | Done | Timer counter = 25%
3 | Rx | Half | Timer counter = 25%
4 | Rx | Done | Timer counter = 75%

As you can see, the answer is: nobody. The timer simply runs, and in the DMA interrupts you set its counter to 25% or 75% of the counting period. This is actually good, because I2S and the hardware timer stay synchronized all the time.

This gives us the time grid needed to start processing the received samples (for example, digital volume control or digital filtering). It only remains to decide who exactly gives the command to process each individual quarter of the received array. This table shows who does the processing.
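The four rows of the table reduce to one small function. Here is a host-testable sketch; the enum and function names are hypothetical, and the actual write to the TMR8 counter register is left out because it is HAL-specific. `counts` is the number of timer ticks in one full buffer period (period register + 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { DIR_TX, DIR_RX } dma_dir_t;
typedef enum { EVT_HALF, EVT_DONE } dma_evt_t;

/* Counter value to load into TMR8 from a DMA interrupt (Tx Half and Rx Done -> 75%,
   Tx Done and Rx Half -> 25%), so the timer never drifts away from the I2S stream. */
static uint32_t tmr8_resync_value(dma_dir_t dir, dma_evt_t evt, uint32_t counts)
{
    assert(counts > 0u && counts <= 65536u);
    bool at_75 = (dir == DIR_TX && evt == EVT_HALF) || (dir == DIR_RX && evt == EVT_DONE);
    return at_75 ? counts * 3u / 4u : counts / 4u;
}
```

Tx Done and Rx Half fire at the same moment (recording lags playback by half a period), so in practice it is enough to re-sync from one DMA channel.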
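Since that table is only a picture, here is one consistent assignment derived from the timing described above; treat it as a hypothetical sketch, not a copy of the original table. Time is counted in quarter periods k (T/4 each), with k = 0 at a TX transfer-complete. TX plays quarter k mod 4; RX, started at the TX half-transfer (k = 2), records quarter (k - 2) mod 4. At every boundary the quarter that RX has just filled is processed, and TX reaches it exactly one quarter later. So every quarter goes through Record, Process, Transmit and one idle quarter, and processing gets a window of 256 samples (5.33 ms).

```c
#include <assert.h>

/* Trigger at quarter boundary k, indexed by k % 4 (hypothetical names). */
typedef enum {
    TRIG_TX_DONE_RX_HALF = 0,  /* DMA interrupts; TMR8 counter reloaded to 25% */
    TRIG_TMR8_COMPARE    = 1,  /* TMR8 reaches 50% of its period */
    TRIG_TX_HALF_RX_DONE = 2,  /* DMA interrupts; TMR8 counter reloaded to 75% */
    TRIG_TMR8_OVERFLOW   = 3   /* TMR8 wraps around (100%) */
} trigger_t;

static unsigned tx_quarter(unsigned k) { return k % 4u; }         /* played during [k, k+1) */
static unsigned rx_quarter(unsigned k) { return (k + 2u) % 4u; }  /* recorded: (k - 2) mod 4 */

/* Quarter that RX has just finished and that must be processed now. */
static unsigned quarter_to_process(trigger_t t) { return ((unsigned)t + 1u) % 4u; }

/* Index of the first int16_t of quarter q in an interleaved stereo buffer of 1024 frames. */
static unsigned quarter_offset(unsigned q)
{
    assert(q < 4u);
    return q * 256u * 2u;
}
```

At the two DMA boundaries both the TX and the RX handler fire, so only one of them should start the processing, otherwise the same quarter would be processed twice.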

We've decided on the theory. Now we need to implement it all on an electronic board.

What's the plan?

–Enable the clock of DMA1

–Configure channel 1 of DMA1 to send the audio stream to I2S2

–Configure channel 2 of DMA1 to receive the audio stream from I2S2

–Start hardware timer 8

–Set up a compare channel on timer 8 and generate an interrupt at half the period

–Configure I2S2 with the following parameters: 16 bit, Master I2S, AudioFreq: 48 kHz
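The clock rates implied by the last item are easy to check. This sketch assumes a 16-bit channel length and the common MCLK = 256 x fs ratio, which gives 12.288 MHz at 48 kHz, one of the master clocks the WM8731 accepts; the function names are hypothetical. If the transceiver is set to a 32-bit channel length, BCLK doubles to 3.072 MHz.

```c
#include <assert.h>
#include <stdint.h>

/* Bit clock: every frame carries `channels` words of `bits` bits each. */
static uint32_t i2s_bclk_hz(uint32_t fs_hz, uint32_t bits, uint32_t channels)
{
    assert(bits == 16u || bits == 24u || bits == 32u);
    return fs_hz * bits * channels;
}

/* Master clock, assuming the 256 x fs ratio. */
static uint32_t i2s_mclk_hz(uint32_t fs_hz)
{
    return 256u * fs_hz;
}
```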

Practical part

It's time for a decisive experiment to test this idea on real hardware. As a prototype I use the AT-START-F437 evaluation board. Here is the two-tier assembly: AT-START-F437 + WM8731.

here it is in real life

I had to choose the WM8731 audio codec because it is the only audio codec I could find for which breakout boards are sold. This is how the WM8731 audio codec module and the AT-START-F437 evaluation board are connected.

Wire | GPIO | Pin mux | Pull | MCU direction | WM8731 board
I2S2_CK | PC7 | 5 | Down | out | BCLK
I2S2_WS | PB9 | 5 | Up | out | DACLRC, ADCLRC
I2S2_SD | PC3 | 4 | Up | out | DACDAT
I2S2_SDEXT | PC2 | 6 | Up | in | ADCDAT
I2S2_MCK | PA3 | 5 | None | out |
I2C2_SDA | PF0 | 4 | Up | in/out | SDA
I2C2_SCL | PF1 | 4 | Up | out | SCL

Connection diagram of the module with the WM8731 audio codec.

Educational and training kit for recording and playing digital audio

At the AT32F437 SoC level, two transceivers must be enabled: SPI2 (in I2S mode) for transmission and I2S2EXT for reception. TMR8 must be turned on as well.

It should be explained here that on Artery, full-duplex I2S has to be assembled from two separate synchronous I2S transceivers operating simultaneously.

# | Hardware module | Direction | Bus | Mode | Base address
1 | SPI2 | Transmitter | APB1 | MASTER_TX | 0x4000 3800
2 | I2S2EXT | Receiver | APB2 | SLAVE_RX | 0x4001 7800

Note that resetting the SPI2 registers also resets the I2S2EXT registers, so SPI2 has to be reset before I2S2EXT is configured, never after. A little "greeting" from Artery.

The I2S2 functions must be assigned to DMA channels as follows.

DMA | Channel | Function ID | Function | Direction | Data width | Transfer direction
1 | 1 | 13 | SPI2_TX | out | HALF_WORD | MEM->PERIPH
1 | 2 | 110 | I2S2_EXT_RX | in | HALF_WORD | PERIPH->MEM

Processing received samples

As we remember, at a sampling rate of 48 kHz we have only 20.83 us to process one sample. This means that in the pipeline a continuous block of 256 samples gets at most 5.33 ms. For everything. Therefore the C function in the firmware must finish its DSP processing in less than 5.33 ms.

As an example of simple audio processing, I tried to implement an artificial echo effect based on a digital IIR filter. Here is its digital circuit.
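The circuit itself is only a picture, so as a stand-in here is a feedback comb filter, y[n] = x[n] + g * y[n - D], which is one common way to build an IIR echo; it is not necessarily the exact structure used here. The sketch works on int16_t samples with a Q15 gain and saturation, and processes one channel of an interleaved buffer (stride 2), so a stereo stream needs one state per channel. The names and the 4096-sample maximum delay (about 85 ms at 48 kHz, 8 KB per channel) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ECHO_MAX_DELAY 4096u        /* hypothetical maximum delay, in samples per channel */

typedef struct {
    int16_t line[ECHO_MAX_DELAY];   /* circular history of past outputs y[n - D] */
    size_t  delay;                  /* D, 1..ECHO_MAX_DELAY */
    size_t  pos;                    /* write index into line[] */
    int16_t gain_q15;               /* feedback gain g in Q15; |g| < 1 keeps the filter stable */
} echo_t;

static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

static void echo_init(echo_t *e, size_t delay, int16_t gain_q15)
{
    assert(delay >= 1u && delay <= ECHO_MAX_DELAY);
    for (size_t i = 0; i < ECHO_MAX_DELAY; i++) e->line[i] = 0;
    e->delay = delay;
    e->pos = 0;
    e->gain_q15 = gain_q15;
}

/* In-place y[n] = x[n] + g * y[n - D] over `frames` samples spaced `stride` apart.
   The >> 15 on a negative product is an arithmetic shift on GCC/ARM. */
static void echo_process(echo_t *e, int16_t *buf, size_t frames, size_t stride)
{
    for (size_t i = 0; i < frames; i++) {
        size_t rd = (e->pos + ECHO_MAX_DELAY - e->delay) % ECHO_MAX_DELAY;
        int32_t fb = ((int32_t)e->gain_q15 * e->line[rd]) >> 15;
        int16_t y = sat16((int32_t)buf[i * stride] + fb);
        e->line[e->pos] = y;
        e->pos = (e->pos + 1u) % ECHO_MAX_DELAY;
        buf[i * stride] = y;
    }
}
```

For one quarter of the stereo buffer you would call `echo_process(&left, q, 256, 2)` and `echo_process(&right, q + 1, 256, 2)`, where `q` points at the start of that quarter.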

All calculations use 16-bit samples. On an ARM Cortex-M4F core running at 250 MHz, the IIR filter processes these 256 16-bit samples in just 1.613 ms, i.e. 6.3 us per sample. That is 3.3 times faster than the deadline requires. Success.

How to debug I2S?

Any development really starts only when debugging tools appear. This is especially true for I2S, since it is a high-load interface with bit rates in the megahertz range (1.536 Mbit/s per data line for 16-bit stereo at 48 kHz). How do you keep track of everything? How do you check everything? The answer is simple: debug the audio path in parts, much like integration by parts in mathematics.

Phase 1: Testing I2S on the MCU side

I2S in full-duplex mode can be tested in a very clever way: put a jumper from the output (I2S2_SD) to the input (I2S2_SDEXT) and start I2S transmission. This is called loopback. After stopping, the recorded I2S stream is expected to be identical to the one that was played, bit for bit.

This way it will be possible to check the correctness of the I2S transceiver settings on the microcontroller side.
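A minimal way to automate the "bit for bit" check, assuming you transmit a known pattern whose values are all distinct (for example a ramp) and record into a separate buffer. Because recording starts at an arbitrary point of the stream, the captured data should be the pattern rotated by some offset. The helper name is hypothetical, and the search is O(n^2), which is fine for a one-off test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the rotation k (0 <= k < n) such that rx[i] == tx[(i + k) % n] for every i,
   or -1 if no rotation matches bit for bit. */
static long loopback_offset(const int16_t *tx, const int16_t *rx, size_t n)
{
    assert(tx != NULL && rx != NULL && n > 0u);
    for (size_t k = 0; k < n; k++) {
        size_t i = 0;
        while (i < n && rx[i] == tx[(i + k) % n]) i++;
        if (i == n) return (long)k;
    }
    return -1;
}
```

If no offset matches but the received values look like doubled or halved copies of the pattern, the data is probably shifted by one bit clock, which usually points to mismatched I2S standards (Philips vs. MSB-justified) on the two sides.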

Phase 2: Testing I2S on the WM8731 audio codec side

Similarly, you can check the audio codec itself. Take a jumper or a solder bridge and connect the ADCDAT and DACDAT pins. This gives a fully digital I2S echo, bit for bit: the same loopback.

Then try speaking into the microphone while listening through the headphones.

++If the codec configuration is correct, you will hear what you say in real time, without delay.

–If there is no sound or you hear crackling, look for an error in the codec's register configuration written over I2C.

That's it. With just one wire it's quite possible to find an error in the configs, either in the MCU or in the codec.

When the I2S Full Duplex mode starts working, the oscilloscope should show a picture like this.

Note: the trace labeled I2C2_SCL is a typo; it should read I2S2_CLK.

Phase 3: Checking synchronization

You can also use GPIO pins to check and debug the synchronization between the DMA receive and transmit channels. Just do the following in the DMA interrupt handlers:

DMA | Channel | Interrupt | GPIO action | GPIO | Explanation
1 | 1 | Half | Set 3.3 V | PB6 | Half of the data sent
1 | 2 | Half | Set 3.3 V | PB7 | Half of the data received
1 | 1 | Done | Set 0 V | PB6 | Sending completed
1 | 2 | Done | Set 0 V | PB7 | Reception completed

Then, if everything is fine, you should get an oscillogram like this

DMA transmission and DMA reception are offset from each other by exactly half a period, which is how it should be. Since both traces are square waves with a 50% duty cycle, a half-period offset means PB6 and PB7 are in antiphase.

Now you can safely start Timer 8 with compare channel 1 to generate the signals that start processing the received audio samples. Note that not all timers in Artery microcontrollers have compare channels: Timer 6 has none at all, and the timers differ in the number of compare channels. I generated the mid-count interrupt with the timer's hardware comparator, which can also raise interrupts.

As you can see, we managed to split the reception of 1024 I2S samples into 4 equal time intervals and generate a trigger for each quarter. Now all that remains is to add the sample processing code to the interrupt handlers of Timer 8 and the DMA channels, and we get real-time sound processing.

Advantages of I2S in DMA mode

++Low load on the microcontroller's CPU: at most 2 interrupts (half and complete) per buffer pass of each DMA channel. Sending and receiving are done entirely in hardware. This opens the way to processing I2S traffic in real time directly on the MCU.

Disadvantages of I2S in DMA mode

–DMA has no interrupts at 1/3 and 2/3 of the transferred data. In my opinion, this is sad: interrupts at 33% and 66% of the transfer would allow a better-optimized Read-Process-Transmit (RPT) pipeline.

Results

As you can see, running I2S in DMA mode is a real hassle. It is a high-load topic, and everything has to be done extremely carefully. A whole pipeline has to be debugged and launched. You need to understand finite state machines, and you need to know how to debug.

But DMA is the only correct way to work with interfaces such as I2S, UART, SPI, etc.

This text is an example of how, in microcontroller programming, good documentation matters more than source code. Anyone can write code from good documentation, but restoring the documentation from the code alone is sometimes simply unrealistic.

The results obtained open the way for developing all sorts of applications for processing audio data in real time. It is possible to impose an echo effect, distortion, perform digital filtering, etc.

I managed to implement a digital echo effect based on an IIR filter, which works completely in real time. Right on the microcontroller.

I hope that the text will help someone else to understand I2S and write an interesting sound application on a microcontroller.

Dictionary

Acronym | Expansion
DMA | Direct memory access
IIR | Infinite impulse response
I2S | Inter-IC Sound (Inter-Integrated Circuit Sound)
RPT | Read-Process-Transmit
RT | Read-Transmit
GPIO | General-purpose input/output
