There was (again) some discussion on #maemo about gcc options regarding VFP usage, so I decided to make a simple test with different gcc options using Maemo's cs2005q3.2-glibc2.5-arm toolchain. Nothing new here for people already familiar with the floating-point stuff.
In the test I compiled a very simple piece of test code with different -mfpu and -mfloat-abi options; -O2 and -march=armv6 were given in all cases. The asm code is linked in the table. The code is a simple for loop that could in principle be optimized down to a single iteration, but judging by the results gcc did not do that kind of trick.
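The original test source is not reproduced here. As a rough idea of what such a benchmark looks like, here is a hypothetical stand-in in C. The function name fp_loop and its constants are illustrative assumptions, not the code that produced the table.

```c
#include <assert.h>

/* Hypothetical stand-in for the benchmark loop (not the original test code).
 * Each iteration depends on the previous result, so without -ffast-math gcc
 * is not expected to fold the loop into a single step. */
double fp_loop(long iterations)
{
    double acc = 0.0;
    long i;   /* declared outside the loop for older gcc releases defaulting to C89 */

    assert(iterations >= 0);
    for (i = 0; i < iterations; i++)
        acc = acc * 0.5 + 1.0;   /* one double multiply and one add; converges to 2.0 */
    return acc;
}
```

To get measurements of the same form as the table, call it from a main() and run the binary under time, once per -mfpu/-mfloat-abi combination.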
Although you can query gcc for its default options, I decided to also test without giving the options at all. The test was run on an N810, and the results are in the table below.
Rows are -mfpu values, columns are -mfloat-abi values; "none" means the option was not given. Each cell lists real / user / sys time.

-mfpu \ -mfloat-abi | none | hard | soft | softfp |
none | 0m 31.11s / 0m 31.01s / 0m 0.00s (asm) | – (asm) | 0m 31.24s / 0m 31.00s / 0m 0.00s (asm) | 0m 4.38s / 0m 4.37s / 0m 0.00s (asm) |
fpa | 0m 30.64s / 0m 30.54s / 0m 0.00s (asm) | 5m 52.53s / 0m 21.25s / 5m 30.66s (asm) | 0m 30.57s / 0m 30.55s / 0m 0.00s (asm) | 5m 52.56s / 0m 21.06s / 5m 30.69s (asm) |
vfp | 0m 31.09s / 0m 30.99s / 0m 0.00s (asm) | – (asm) | 0m 31.08s / 0m 31.00s / 0m 0.00s (asm) | 0m 4.40s / 0m 4.39s / 0m 0.00s (asm) |
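It looks like the default for -mfpu is vfp and the default for -mfloat-abi is soft: the "none" row matches the vfp row (most clearly in the softfp column, where fpa is much slower), and the "none" column matches the soft column in every row.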
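Besides reading the defaults off the timings, the float ABI a compiler defaults to can be checked from its predefined macros (for example with gcc -dM -E - < /dev/null). The sketch below assumes gcc's ARM macros: __SOFTFP__ is defined for -mfloat-abi=soft, and newer gcc releases define __ARM_PCS_VFP for the hard-float calling convention. Older compilers may not define the latter, so there hard and softfp cannot be told apart. The function name float_abi_name is hypothetical.

```c
/* Hypothetical helper: report which float ABI this file was compiled for,
 * based on gcc's predefined ARM macros (an assumption, see the lead-in). */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    /* Hard-float calling convention: FP arguments passed in VFP registers. */
    return "hard";
#elif defined(__arm__) && defined(__SOFTFP__)
    /* -mfloat-abi=soft: no FP instructions, libgcc routines do the math. */
    return "soft";
#elif defined(__arm__)
    /* FP instructions enabled: softfp, or hard on a gcc too old to
     * define __ARM_PCS_VFP. */
    return "softfp (or hard on older gcc)";
#else
    return "not a 32-bit ARM target";
#endif
}
```

Building a small program that prints this result once without any float options shows the compiler's default ABI; the -mfpu default is still easiest to spot from the timings.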
The timings fall into three groups: about 4 seconds, about 31 seconds and almost 6 minutes, so the middle group is roughly 7 times and the slowest roughly 80 times slower than the fastest.
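This was expected, as there are three different ways to do floating-point math. In the slowest one (-mfpu=fpa with the hard or softfp float ABI), gcc emits FPA instructions, which the N810's CPU does not implement; the CPU raises an undefined-instruction exception and the kernel emulates the FPU. This is extremely slow because every floating-point instruction costs a trap into the kernel and back, which is also why almost all of that time shows up as sys time. A faster way to do floating-point math without the hardware is to let gcc compile it into the binary itself as integer code, i.e. calls to software floating-point routines in libgcc. The fastest way is to use the VFP hardware (-mfpu=vfp with -mfloat-abi=softfp).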
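To make the three cases concrete, here is a trivial, hypothetical function with comments sketching what gcc typically emits for it in each configuration. The exact instructions depend on the gcc version and ABI, so treat the comments as approximate, not as output captured from the Maemo toolchain.

```c
/* Hypothetical example function, not part of the original test. */
double fp_add(double a, double b)
{
    /* Roughly what gcc emits for the addition on 32-bit ARM:
     *   -mfloat-abi=soft             : a call to a libgcc soft-float routine
     *                                  (__adddf3, or __aeabi_dadd on EABI)
     *   -mfpu=vfp -mfloat-abi=softfp : fmdrr to move r0-r3 into VFP registers,
     *                                  faddd, then fmrrd back to r0/r1
     *   -mfpu=fpa -mfloat-abi=softfp : an FPA adfd instruction, which a CPU
     *                                  without FPA traps on, so the kernel
     *                                  emulates it
     */
    return a + b;
}
```

Compiling this with -S under each option set and comparing the output is a quick way to see which of the three paths a given build takes.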