Compilers benchmark

by Thomas Serafini

When writing super-optimized code, it is important to understand how the compiler handles the blocks of code that we use most frequently. That's why I wrote a simple benchmark application that I use to test the optimization capabilities of different compilers. The benchmark tests are specifically designed for DSP purposes: they exercise the operations that are frequently used in DSP algorithms. The source code of the application is available. The aim is to test the output of the compiler, not to find the best-optimized algorithm.
Here is a brief description of all the tests that the application performs:

Float matrix multiplication: this test performs 1000 multiplications of two 64 x 64 matrices with 32-bit float elements. It shows the multiply-and-add capabilities, the scheduling of float operations and the scheduling of memory accesses.
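As an illustration, one multiplication of this kind can be sketched as below. This is my own minimal version, not necessarily the code in the benchmark: the name matmul_f32, the row-major flat layout and the constant N are assumptions.

```c
#include <stddef.h>

#define N 64  /* matrix dimension quoted in the test description */

/* c = a * b for N x N matrices stored row-major in flat arrays.
   Deliberately a plain triple loop with no blocking or unrolling,
   so any speed-up has to come from the compiler. */
void matmul_f32(const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];  /* multiply and add */
            c[i * N + j] = sum;
        }
    }
}
```

A benchmark run would call this 1000 times on the same pair of matrices and time the whole loop.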

Integer conditional move: this test times the following code:
if (condition[i])
   result[i] = 1;
else
   result[i] = 0;
So, this test shows the speed of an integer compare and branch. The conditions are chosen randomly, so about 50% of the branches are predicted correctly and the other 50% are mispredicted.
This test also reveals another important feature of a compiler. Most of the CPUs now available have conditional move instructions (like CMOV on the Pentium II); with these instructions you can avoid the branch penalty in the case of a misprediction. Generally, however, compilers do not use these instructions when generating code. This kind of test lets you see whether a compiler uses them.
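To make the comparison concrete, here is a small sketch with the loop written in two ways. The function names are hypothetical. Whether either version compiles to a branch, a CMOV or a SETcc-style instruction depends entirely on the compiler and its options, so the only reliable check is to read the generated assembly (with GCC, for example, the -S option writes it to a file).

```c
#include <stddef.h>

/* The loop as described in the test: compare and branch. */
void select_branch(const int *condition, int *result, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (condition[i])
            result[i] = 1;
        else
            result[i] = 0;
    }
}

/* Same result written as a single expression. The comparison itself
   yields 1 or 0, so there is no branch in the source; the compiler
   is still free to generate one. */
void select_branchless(const int *condition, int *result, size_t n)
{
    for (size_t i = 0; i < n; i++)
        result[i] = (condition[i] != 0);
}
```

Filling condition[] with random zeros and ones reproduces the roughly 50% misprediction rate mentioned above.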

Float conditional move: the same as "integer conditional move", but the data type is 32-bit floating point.

Lookup table access: in this test, the index of the table is a 32-bit float number, so it is converted to an integer data type before being used as an index for the table; this is extremely common when the samples are managed as floats. Many compilers spend dozens of cycles performing an overflow test, so this code can be very slow on some compilers.
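A minimal sketch of this access pattern follows. The name lookup_f32 and its argument layout are assumptions of mine; the cast on the marked line is the operation whose cost the test exposes.

```c
#include <stddef.h>

/* Reads table[] at positions given as floats. The C cast truncates
   toward zero. Every index[i] is assumed to lie inside the table;
   no range check is done here. */
void lookup_f32(const float *table, const float *index, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int k = (int)index[i];  /* float -> int conversion under test */
        out[i] = table[k];
    }
}
```

If rounding to nearest is acceptable instead of truncation, the C99 function lrintf from <math.h> is one alternative worth timing, but note that it gives different results for fractional indices.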

Biquadratic IIR filter, float (and integer): this code implements a biquadratic IIR filter to test the FPU (and integer) instruction scheduling capabilities of the compiler.
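For reference, a float biquad can be sketched as follows. I am assuming the direct form I structure here, y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]; the benchmark may use a different topology, and the Biquad type and function name are hypothetical.

```c
#include <stddef.h>

typedef struct {
    float b0, b1, b2;  /* feed-forward coefficients */
    float a1, a2;      /* feedback coefficients, a0 normalized to 1 */
    float x1, x2;      /* previous two inputs */
    float y1, y2;      /* previous two outputs */
} Biquad;

/* Filters n samples from in[] to out[], updating the state in *f. */
void biquad_process(Biquad *f, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = in[i];
        float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
                - f->a1 * f->y1 - f->a2 * f->y2;
        f->x2 = f->x1;  /* shift the input history */
        f->x1 = x;
        f->y2 = f->y1;  /* shift the output history */
        f->y1 = y;
        out[i] = y;
    }
}
```

Each output depends on the two previous outputs, so the loop cannot simply be reordered; how well the compiler overlaps the five multiplications is what the timing reflects.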

Computing biquadratic filter coefficients: when you are managing a sweeping filter (like a VCF in a software synth) you need to recalculate the coefficients of the filter very often. This test shows the speed of this operation by calculating 25,000,000 sets of coefficients.

FFT: 500 blocks of 8192 samples. Doesn't need any other comments!

The table below contains the execution times (in seconds) of the benchmark for each compiler:

Test                           Visual C++ 6.0  Intel compiler 5.0  GCC   CW 5.0 PC  CW 5.0 Mac  Assembler (PC)
Float matrix multiplication    1.48            1.54                1.62  1.45       1.21        1.35
Integer conditional move       0.83            0.61                1.97  1.48       1.15        0.38
Float conditional move         1.15            0.89                2.16  1.84       1.03        0.47
Lookup table access            1.54            1.29                2.12  1.33       0.68        0.61
Float biquad filter            3.02            3.98                3.62  2.71       1.88        2.53
Computing biquad coefficients  1.26            1.76                1.39  1.95       1.64        n.a.
Int biquad filter              2.2             2.57                2.81  2.97       2.41        2.04
FFT                            1.76            1.86                2.14  2.72       1.43        n.a.

Notes:
- All the PC tests are performed on a Pentium III 800 MHz with 128 MB of RAM.
- The CodeWarrior 5.0 for Mac test is executed on a G4 733 MHz with 384 MB of RAM.
- The Visual C++ 6.0, Intel Compiler and CodeWarrior 5.0 PC tests are run on Windows ME.
- The GCC test is run under the SuSE 7.2 distribution.
- The CodeWarrior 5.0 Mac test is run on Mac OS 9.
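Execution times like those in the table can be collected with a very small harness. The sketch below uses the standard clock() function; it is only an illustration of the idea, not the timing code of the benchmark application, and both function names are hypothetical.

```c
#include <time.h>

/* Hypothetical workload standing in for one of the benchmark tests. */
static volatile float sink;

void example_workload(void)
{
    float acc = 0.0f;
    for (int i = 0; i < 100000; i++)
        acc += (float)i * 0.5f;
    sink = acc;  /* volatile store so the result is not discarded */
}

/* Calls test() `repeats` times and returns the elapsed time in
   seconds as reported by clock(). */
double time_seconds(void (*test)(void), int repeats)
{
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        test();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For example, time_seconds(example_workload, 1000) gives the total time for 1000 runs. The resolution of clock() is coarse on some systems, so each test should run long enough for the total to reach at least a few tenths of a second.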

Last update: 6 October 2002