Simulation of the implementation of the Fast Fourier Transform (FFT) module and comparison with a similar kernel from Xilinx

When developing complex functional blocks (CFBs) of digital signal processing, an important step is modeling the operating algorithm. This stage can take significant time, delay the start of RTL writing and, as a result, increase the overall development time. Therefore, given limited time for development, many people prefer to skip this stage. But in vain.

Having a model allows you to solve several problems at once:

  • firstly, the model allows you to evaluate the quality characteristics of the selected algorithm even before you start writing the RTL, and, if necessary, you can easily make changes and quickly check them;

  • it becomes possible to quickly compare different implementations of algorithms, including with third-party implementations;

  • As a bonus, it is possible to create test vectors from anywhere in the model, which greatly facilitates RTL debugging and, as a result, reduces development time.

This article describes the results of modeling a proprietary FFT block with the following characteristics:

  • vector size – 1024 points;

  • input and output data are represented in complex number format, where the re/im components are represented as fixed-point values ​​in signed integer format (16 bits);

  • The FFT is performed by the RADIX-4 frequency-decimation algorithm;

  • the FFT implementation has a pipeline structure;

  • scaling/normalization of the results of each stage is set programmatically.

For comparison, we used Xilinx's FFT IP core with similar characteristics, generated using “Vivado v.2018.1 (win64) Build 2188600”.

Modeling environments

To simulate and study the characteristics, the GNU Octave, version 8.4.0 package was used.
“ModelSim – Intel FPGA Starter Edition 2020.1” was used to simulate the Xilinx FFT core.

Description of the studied characteristics

Terms:

  • SNR (Signal-to-Noise Ratio) – signal-to-noise ratio, calculated as the ratio of the useful signal power to the noise power. Typically expressed in dB and calculated:SNR_{dB} = 10 * log10(P_{signal}/P_{noisy});

  • Noise – deviation of the received signal from the ideal;

  • To calculate the power of a complex value, multiply this value by its complex conjugate.

To assess the qualitative characteristics of the studied nuclei, the following SNR values ​​were used:

  • minimum (worst case) SNR – the SNR value was calculated for each input test vector. After testing on a set of test vectors, the minimum value was selected.

  • averaged SNR – for each input test vector, the power values ​​of the useful signal and noise were calculated. After testing on a set of test vectors, the sums of the useful signal and noise powers were calculated, and then these values ​​were used to calculate the SNR.

Output Spectrum Study

Initially, the test vectors were a single-frequency signal. To calculate the SNR, only the result of the module's operation (output spectrum) was used. The maximum value at the FFT output was used as the useful signal, and all other non-zero values ​​were considered as noise.

But with this approach, the output FFT spectrum will contain not only the noise of the module itself, but also quantization errors (noise) of the input signal. If it is necessary to determine the characteristics of only the module under study, and not the entire system, including the test environment, then this approach is not suitable.
In addition, when using a multi-frequency input signal, there is uncertainty about what is considered a useful signal and what is noise.

Updated research structure

The main idea of ​​the updated scheme was to use the result of the function as a useful (reference) signal when calculating SNR fft() from the Octave package over the input signal, and use the deviation of the calculated spectrum from the reference one as noise.

To model and obtain the characteristics of the modules under study, two tests were used: the “running one” test and the “running zero” test.

Running unit

This test is somewhat analogous to the test of the same name for testing memory. The essence of the test:

  • A set of test vectors is supplied to the input of the module under study;

  • The generation of each test vector is performed as follows:

    • in the frequency domain, a vector with a length equal to the size of the FFT is created, filled with the value 0;

    • a non-zero (unit) value is set for a given frequency component.

    • To ensure that the test vectors are not purely real, each test vector is rotated in the complex plane using the following formula: dout(SEL_F + 1) = exp(2 * \pi * i * SEL_F / SIZE);

  • Each test vector is designed to test only one specified frequency;

  • Test vectors are generated to test all frequencies (one at a time) for a given FFT size.

The structure of the test is shown in Figure 1.

Figure 1: Test structure "running unit".

Figure 1: Structure of the running unit test.

Where:

  • “Test generator” – generates a test vector in the frequency domain in floating point format;

  • “Octave ifft()” – a function from the Octave package, performs an inverse FFT in floating point format to generate a test vector in the time domain. Since the power in the input vector is concentrated at only one point, and in the output vector it is distributed over the entire vector, the amplitude of the output signal will not exceed: (1.0/FFT size);

  • “Scaling up” – scales the test vector to 1.0 (multiply by the FFT size) to obtain the maximum amplitude of the test vector;

  • “FP2FIX” – converts the test vector from floating point format to fixed point format. To do this, the conversion is performed by multiplying the input vector by 32767. Then rounding with saturation is performed for each vector value.

  • “FFT model” – the model under study, forms a spectrum in the frequency domain;

  • “Octave fft()” – a function from the Octave package that generates a reference spectrum in the frequency domain (useful signal);

  • “Scaling down” – since the maximum value in the reference spectrum exceeds the maximum value determined by the fixed point bit depth, it is necessary to perform reverse scaling (dividing by the FFT size);

  • “SNR calculation” – calculation of SNR.

To simulate the Xilinx FFT core, the following steps were performed:

  • “Write input vector” – in the Octave package, the input test vector is written to a file;

  • “Xilinx IP sim” – simulation of the FFT core from Xilinx is performed in the ModelSim environment using test vectors from the previous step;

  • “Write output vector” – the simulation result is written to a file;

  • “SNR calculation” – calculation of SNR.

To avoid overflow when simulating the “running unit” test, for both models the output of each stage is programmed to shift = 2 (in the Xilinx FFT documentation this is called “Scaling Schedule”).

When calculating SNR for cases where noise is 0, the value of “infinity” is replaced by a fixed value of 400 dB.

results

The SNR distribution graph depending on the number of the test vector (tested frequency) for the developed FFT model is presented in Figure 2.

Figure 2: SNR distribution plot for the test "running unit"

Figure 2: SNR distribution plot for the running unit test

The following features can be seen in the graph:

  • for test vectors for frequency components with numbers 0, 256, 512 and 768, the SNR value = 400 dB, i.e. noise is 0;

  • SNR distribution over the interval [0:255] repeated 4 times. In other words, it is enough to analyze the distribution only for the first 256 test vectors.

Since Xilinx's FFT kernel takes a long time to simulate, and our own model simulation determined that it was sufficient to consider the SNR of only the first 256 test vectors, the Xilinx FFT kernel simulation was limited to the set of test vectors to the first 256 test vectors.

The results were as follows:

Min SNR of FFT model = 90.7772 dB
Min SNR of Xilinx FFT = 89.6613 dB
Average SNR of FFT model = 94.606 dB
Average SNR of Xilinx FFT = 94.4597 dB

Running zero

The main difference between this test and the “running unit” test is the formation of the test vector:

  • in the frequency domain, a vector of a given FFT size is created, filled with non-zero values ​​according to the following formula:

{angle} = {SIZE} - (1:{SIZE});
dout = exp(2 * \pi * i / {SIZE} * {angle});

The structure of the test is presented in Figure 3 and has minor differences from the “running unit” test:

  • “Scaling down” – the test vector in the time domain after calculating the inverse FFT is scaled down (multiplied by (fft_size - 4)/fft_size). This is due to the lack of saturation logic in the Xilinx implementation. When an arithmetic overflow occurs, the output spectrum of the Xilinx FFT core “falls apart.” To be able to compare the output spectrum of the FFT model under study with the results of the Xilinx FFT core, we had to reduce the amplitude of the input test vector. The coefficient was selected experimentally;

  • when generating the reference spectrum, there is no inverse scaling to obtain the maximum amplitude;

  • To obtain the maximum amplitude of the output spectra, the scaling factors (“Scaling Schedule”) for both models (the studied model and the Xilinx FFT kernel) were set to 0.

Figure 3: Test structure "running zero"

Figure 3: Structure of the running zero test

results

The SNR distribution graph depending on the number of the test vector (tested frequency) for the developed FFT model is presented in Figure 4.

Figure 4: SNR distribution plot for the test "running zero"

Figure 4: SNR distribution plot for the running zero test

The SNR distribution is different from the previous test, but you can also notice the 4-fold repetition of the first segment.

Xilinx's FFT kernel simulation was also run only on the first 256 test vectors.

The results were as follows:

Min SNR of FFT model = 78.5121 dB
Min SNR of Xilinx FFT = 78.5687 dB
Average SNR of FFT model = 80.2368 dB
Average SNR of Xilinx FFT = 80.2085 dB

conclusions

  • When using SNR to evaluate the performance characteristics of different implementations, it is important to use the same measurement techniques and the same input stimuli. The SNR values ​​themselves for different implementations, without taking into account how they were obtained, are “parrots” and “boa constrictors”.

  • If it is possible to compare the quality characteristics of the block being developed with the implementations of other developers, this must be done. In my case, the initial implementation of the module produced results that I thought were good until I compared the results with results from a Xilinx kernel simulation. The presence of the model made it possible to identify potential problem areas in the algorithm and quickly make the necessary changes. As a result, the characteristics turned out to be no worse.

  • If we talk directly about the FFT module, the output spectrum very much depends on the characteristics of the input signal. Therefore, to correctly configure the module, it is also necessary to carry out modeling using examples of real signals that you will have to work with.

  • When using the Xilinx FFT core, you must be aware of the lack of saturation logic in arithmetic operations:

When an overflow occurs in the core, the data is wrapped rather than saturated, resulting in the transformed data becoming unusable for most applications. (Product specification: LogiCORE IP, Fast Fourier Transform v8.0)

In other words, it is necessary to ensure that there is no overflow when the module is running. This is possible either due to strict control of the amplitude of the input signal, or due to the inclusion of an additional shift at the first (input) stage of the module. In both cases, the amplitude of the output signal (spectrum) will be lower than possible. The third option is to use a floating arithmetic FFT module.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *