“Sigma-delta”, or how to make a good sound card from an STM32F401

My wife is kept from watching the latest news on her phone and the TV by our grandchildren, who come over to eat (primarily?) and to play on the computer (secondarily?). She certainly loves them, but the sounds of their interaction with the computer annoy her greatly. I had to put headphones on the grandchildren. But the computer's audio output is in an inconvenient place, and everyone wants their own volume. So I had to develop an external USB sound card, and I wanted it beautiful and high quality. To be honest, the grandchildren are more likely just an excuse to be nostalgic for my old specialty as a radio designer. For the past twenty-odd years I have been away from it, writing image reconstruction programs (spells, really) for medical tomographs at large and small firms; that is, I am a programmer ("pogromist") engineer by profession. I wanted to write an article on that very interesting and important topic (computed tomography), but it turned out that the terms of my contract do not allow it …

So let’s get back to our sheep, that is, to our grandchildren and sound cards. We have several (well, a whole heap, bought while they were cheap) modules from China:

1 STM32F401CCU6 “Black Pill”: now $3 apiece

2 I2S DAC decoder GY-PCM5102: $3.5 apiece

3 SPI IPS display, 1.3 inch, 240×240 (ST7789 controller): $7 for two

First, let’s build the maximum configuration with two screens and the I2S GY-PCM5102.

We configure the project in Cube, dilute its crappy generated code with our own crappy code, and add a PLL (phase-locked loop) to match the rate of the data coming from the computer with the rate of the data going out to the external DAC over I2S. Hmm, it sounds very good, obviously better than most built-in sound cards. The PCM5102 is a very, very high-quality DAC for the price: a couple of dollars for a module with the chip. We add level indicators on a pair of nice ST7789 displays …

It took a little fiddling with them. First, these displays do not bring out CS (Chip Select, the chip selection pin), so each one gets its own SPI (SPI_1 and SPI_3). Second, transfers to them via DMA (direct memory access) are very slow: copying a full screen from memory takes 32 ms over SPI_1 and 51 ms over SPI_3.

Because of this, the needles of the meters and their shadows (!) are drawn and erased incrementally, which takes 8 milliseconds in total for both screens. The needle position is set from the signal maximum over approximately 20 ms, with a decay time constant of 300 ms (approximately like real VU meters).

wonderful and relevant song by V. Vysotsky “Storks” (video)

And then I remembered that a very long time ago, more than 30 years back, I was trained as a caster of blue smoke; that is, I know why an oscilloscope needs knobs and microcircuits need legs. Knobs are there to be twiddled, and legs are there to be wiggled! Could the sound be output by the STM itself?

There is no DAC on this chip (STM32F401). In its place, you can use a timer in PWM mode, or SPI.

Timers can be set up with two, three or four PWM channels. With N signal levels at Fbus = 84 MHz, the sampling rate is Fds = Fbus / N, or conversely N = Fbus / Fds. For example, a sampling rate of 44100 Hz gives 84,000,000 / 44100 = 1904 levels, which corresponds to approximately log2(1904) = 10.9 bits. At 96000 Hz we get 84 MHz / 96 kHz = 875 levels, which corresponds to 9.8 bits. That is not enough.

SPI can only output two levels, 0 or 1, though at up to 42 megabits per second.

Hmm.. 1 bit is not enough though…

So, PDM (pulse-density modulation; I don’t know the exact Russian term).

Let’s also formulate it in code (the input signal ranges from -1.0 to 1.0):

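struct sigmaDeltaStorage {
	float integral;  // running sum of (input - previous output)
	int   y;         // previous output: +1 or -1 (0 before the first call)
};

struct sigmaDeltaStorage left_chanel;
struct sigmaDeltaStorage right_chanel;

// First-order 1-bit sigma-delta modulator: returns +1 or -1
int sigma_delta(struct sigmaDeltaStorage* st, float x)
{
	st->integral += x - st->y;
	st->y = (st->integral > 0.0f) ? 1 : -1;
	return st->y;
}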

In a loop:

Read the left and right channel data from the USB receiver's circular buffer, with interpolation (preferably bilinear)

Call the sigma_delta function

Send the result to the output
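The first step of the loop, reading the circular buffer at a fractional position, might look like the sketch below. It makes several assumptions: the buffer holds one channel as floats, plain linear interpolation between two neighbours is used as the simplest option, and the names `ring_read_linear` and `ring_advance` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Read the ring buffer at a fractional position using linear
   interpolation between the two neighbouring samples.
   pos must lie in [0, len); the read wraps around at the end. */
static float ring_read_linear(const float *ring, size_t len, float pos)
{
    assert(len > 0 && pos >= 0.0f && pos < (float)len);
    size_t i0 = (size_t)pos;          /* sample to the left  */
    size_t i1 = (i0 + 1) % len;       /* sample to the right */
    float frac = pos - (float)i0;     /* 0..1 between them   */
    return ring[i0] + frac * (ring[i1] - ring[i0]);
}

/* Advance the read position by `step` input samples per output sample
   (step < 1 when the output rate is higher than the USB rate). */
static float ring_advance(float pos, float step, size_t len)
{
    pos += step;
    while (pos >= (float)len)
        pos -= (float)len;
    return pos;
}
```

Each interpolated value would then be passed to sigma_delta, and the read position advanced before the next output sample.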

The trick here is that if the input signal is band-limited to a certain frequency and the sampling frequency is high enough, then passing the output signal through a low-pass filter yields a signal close to the original one, and the higher the sampling frequency, the better!

Consider an example. The input signal is a constant, x = 0.31415:

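static float x[10000];
static int   y[10000];
static struct sigmaDeltaStorage Ch0;  // zero-initialized modulator state

void testFunc(int NumberSamples)      // NumberSamples <= 10000
{
	for (int t = 0; t < NumberSamples; t++)
		x[t] = 0.31415f;
	for (int t = 0; t < NumberSamples; t++)
		y[t] = sigma_delta(&Ch0, x[t]);
	float meanSumm = 0;
	for (int t = 0; t < NumberSamples; t++)
		meanSumm += y[t];
	meanSumm /= NumberSamples;        // turn the sum into the average
	printf("meanSum=%f \n", meanSumm);
}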

For 10 calls, the function gives the sequence [1, -1, 1, 1, -1, 1, 1, -1, 1, 1], whose average is 0.4; for 1000 calls the average is 0.314, for 10000 it is 0.3142, and so on.

That is, the higher the sampling frequency, the more accurate the approximation of the band-limited input signal. The trick was invented in the 1960s.

Unfortunately, I did not manage to implement this algorithm directly on the STM with bit generation and output over SPI. The required frequency is about 2.8224 MHz (https://en.wikipedia.org/wiki/Super_Audio_CD). Generating two channels, with interpolated resampling from the USB buffer, a sigma_delta call for every bit, and packing of the bits for SPI output, looks unrealistic at a processor frequency of only 84 MHz. That leaves only 15 clocks for the left channel and 15 for the right per bit, with a congested DMA bus. This is unrealistic …
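To make the per-bit cost concrete, here is a sketch of just the packing step: eight modulator outputs become one byte for an SPI transmit buffer. The MSB-first bit order, the held input sample and the name `pack_pdm_byte` are assumptions; interpolation from the USB buffer is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

struct sigmaDeltaStorage {
    float integral;
    int   y;
};

/* Same 1-bit modulator as in the article. */
static int sigma_delta(struct sigmaDeltaStorage *st, float x)
{
    st->integral += x - st->y;
    st->y = (st->integral > 0.0f) ? 1 : -1;
    return st->y;
}

/* Run the modulator 8 times on a held input sample and pack the
   results into one byte, first bit in the MSB (+1 -> 1, -1 -> 0). */
static uint8_t pack_pdm_byte(struct sigmaDeltaStorage *st, float x)
{
    assert(x >= -1.0f && x <= 1.0f);
    uint8_t byte = 0;
    for (int bit = 0; bit < 8; bit++) {
        byte = (uint8_t)(byte << 1);
        if (sigma_delta(st, x) > 0)
            byte |= 1u;
    }
    return byte;
}
```

Every pass of that inner loop, float arithmetic included, would have to fit into the budget of about 15 clocks per channel mentioned above, which is why the direct approach was abandoned.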


But what if we rewrite sigma_delta so that it outputs not just two signal levels [1, -1] (with a single comparator around zero), but more? For example, 4 levels [3, 1, -1, -3]? Or even more, for example N + 1 (here the input and output signals are also shifted to the range [0, N], as needed for feeding PWM to the timer):

int sigma_delta(struct sigmaDeltaStorage* st, float x)
{
	st->integral += x - st->y;
	st->y = floorf(st->integral + 0.5f);  // nearest integer
	if (st->y < 0) st->y = 0;             // clamp to the PWM range [0, N]
	if (st->y > N) st->y = N;

	return st->y;
}

Hooray, we have reinvented the wheel. If I had read the article https://en.wikipedia.org/wiki/Delta-sigma_modulation more carefully, I would have noticed that one bit is a special case, as it says there in black and white 🙁

Now we can choose a compromise between the sampling rate and the number of discrete levels that is convenient to implement. Plain quantization of the signal gives noise that is uniform across frequency, with a magnitude of half a quantization step. Sigma-delta gives noise that grows in proportion to frequency (to a first approximation). That is, the energy of the quantization noise spectrum is shifted upward, beyond the audible frequencies! The trick can be repeated with a double, triple, etc. integral:

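struct sigmaDeltaStorage2 {
	float integral0;  // first integrator
	float integral1;  // second integrator
	float y;          // previous output level
};
float sigma_delta2(struct sigmaDeltaStorage2* st, float x)
{
	st->integral0 += x             - st->y;
	st->integral1 += st->integral0 - st->y;
	st->y = floorf(st->integral1 + 0.5f);  // nearest integer
	if (st->y < 0) st->y = 0;
	if (st->y > MAX_VOL) st->y = MAX_VOL;  // MAX_VOL plays the role of N above

	return st->y;
}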
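Why the noise moves upward can be read off the difference equations of the two modulators. The derivation below is a sketch that assumes the clamps never engage and that all state starts at zero; the symbol q_t = y_t - I_t (the rounding error of the last integrator) and the order L are notation introduced here, not taken from the code.

```latex
\begin{align*}
\text{first order:}\quad
  & I_t = I_{t-1} + x_t - y_{t-1}
  && \Rightarrow\quad y_t = x_t + q_t - q_{t-1} \\
\text{second order:}\quad
  & I^{(0)}_t = I^{(0)}_{t-1} + x_t - y_{t-1},\quad
    I^{(1)}_t = I^{(1)}_{t-1} + I^{(0)}_t - y_{t-1}
  && \Rightarrow\quad y_t = x_t + q_t - 2q_{t-1} + q_{t-2} \\
\text{for } L = 1, 2:\quad
  & Y(z) = X(z) + (1 - z^{-1})^{L}\, Q(z),
  && \bigl|1 - e^{-j 2\pi f / f_s}\bigr|^{L} = \bigl(2 \sin(\pi f / f_s)\bigr)^{L}
\end{align*}
```

The input passes through unchanged, while the rounding error is differenced once or twice. For f much smaller than f_s the factor is approximately (2πf/f_s)^L, so the noise amplitude rises with frequency: linearly for the first-order loop, quadratically for the second.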

There is an analogy with high-order digital low-pass filters. However, we will leave the design of stable high-order filters to the professionals (filter design). In the code, we implement simple first- and second-order filters in fixed-point arithmetic (on our STM the fastest type for this is int32_t).
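A fixed-point second-order modulator might look like the sketch below. It is not the code from the repository: the Q16 format, the struct and function names, and the way negative values are handled are assumptions made for illustration, with MAX_VOL set to the 828 levels used in the article.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 16                        /* Q16 fixed point (assumed format) */
#define ONE_Q16   ((int32_t)1 << FRAC_BITS)
#define MAX_VOL   828                       /* highest PWM level, from the text */

struct sigmaDeltaFixed2 {
    int32_t integral0;   /* first integrator, Q16      */
    int32_t integral1;   /* second integrator, Q16     */
    int32_t y;           /* previous output, plain int */
};

/* Second-order modulator; x_q16 is the input level in Q16, 0..MAX_VOL. */
static int32_t sigma_delta2_fixed(struct sigmaDeltaFixed2 *st, int32_t x_q16)
{
    assert(x_q16 >= 0 && x_q16 <= MAX_VOL * ONE_Q16);
    int32_t y_q16 = st->y * ONE_Q16;
    st->integral0 += x_q16 - y_q16;
    st->integral1 += st->integral0 - y_q16;
    /* Round to the nearest level. Negative values clamp to 0 anyway,
       so the shift is only ever applied to a non-negative number. */
    if (st->integral1 < 0)
        st->y = 0;
    else
        st->y = (st->integral1 + ONE_Q16 / 2) >> FRAC_BITS;
    if (st->y > MAX_VOL)
        st->y = MAX_VOL;
    return st->y;
}
```

Fed a constant of 300.25 levels, the output hovers around 300 and 301, and its running sum never drifts more than one level away from the sum of the input, which is the second-order noise shaping at work.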

For the implementation I chose a frequency of about 101 kHz and accordingly 828 levels (84 MHz / 828 ≈ 101.4 kHz), with a second-order sigma-delta; if you wish, you can play with the options. For the sigma-delta configuration it is better to set the number of displays to 0, since the noise from them can be heard in the headphones; it is not very strong, though, and you can still listen. It is preferable to use an external DAC together with the level indicators, or to give the displays decoupled power. Without displays, the sigma-delta on the STM32F401 with an RC low-pass filter gives very good sound, better than most built-in sound cards on inexpensive computers and clearly better than cheap Chinese USB-to-3.5 mm headphone adapters.

PS. Pros in sound processing and radio design, please do not find fault. First, in more than 30 years I have forgotten a lot, and now I am just an amateur in this area. Second, I didn’t want to overcomplicate the article with mathematics; a detailed description really needs a lot of it.

More Video : Snow swirling groups of flames

Source code on github
