On Dec 20, 2007, at 10:22 PM, Markus Weissmann wrote:
On Dec 20, 2007, at 5:03 PM, Vincent Lefevre wrote:
On 2007-12-20 17:59:25 +1100, Joshua Root wrote:
[1] <http://lixom.net/~olof/64bit-perf.pdf>
[2] <http://www.geekpatrol.ca/2006/09/32-bit-vs-64-bit-performance/>
I don't understand why they say that 5 instructions are needed for constants in 64-bit binaries. Can't the PowerPC load the constant from memory with a single instruction? This is the solution chosen on the ARM for complex constants (if they are in the cache, this should be fast enough). But many constants are simple enough to be loaded with a single instruction (on the ARM, these are 8-bit values rotated by an even number of positions), in particular after optimizing the code.
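To make the ARM rule mentioned above concrete, here is a small sketch in Python (not ARM code) that checks whether a 32-bit constant fits the "8-bit value rotated by an even number of positions" form. The function name is made up for illustration, and this only models the classic 32-bit ARM data-processing immediate, not Thumb or other encodings.

```python
def is_arm_immediate(value):
    """Return True if value is an 8-bit number rotated right by an even amount.

    Hypothetical helper for illustration; models only the classic
    32-bit ARM data-processing immediate encoding.
    """
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        undone = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if undone < 256:
            return True
    return False


# A few constants to try:
for c in (0xFF, 0xFF000000, 0x3FC, 0x1FE, 0x101, 0x12345678):
    print(hex(c), is_arm_immediate(c))
```

Constants such as 0xFF, 0xFF000000 and 0x3FC pass the check; 0x1FE (an odd shift) and 0x101 (nine bits wide) do not, so they would need more than one instruction or a load from memory.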
If I remember correctly, all PowerPC instructions are 32 bits long. Since some of those bits are needed for the opcode and register operands, only 16 bits remain to hold a constant value (for the load-high/add-immediate instructions). So, to load a 64-bit value, you need:
- 2x load high (2x high 16 bits)
- 2x add immediate (2x low 16 bits)
- 1x some combine statement (some shift operation or whatever)
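The counting above can be sketched as a small Python model (not PowerPC assembly): each step supplies at most 16 bits of immediate data, so four 16-bit pieces plus one shift add up to five steps. The mnemonics in the comments (lis, ori, rldicr, oris) are what I believe compilers typically emit for this; treat them as an assumption, and the function name as made up for illustration.

```python
MASK64 = (1 << 64) - 1


def build_const64(value):
    """Assemble a 64-bit constant from 16-bit immediates in five steps.

    Hypothetical model of the sequence described above, not real
    machine code; sign extension of the first step is ignored because
    the later shift discards those bits anyway.
    """
    hi_hi, hi_lo, lo_hi, lo_lo = [(value >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    reg = hi_hi << 16            # 1. load high 16 bits          (assumed: lis)
    reg |= hi_lo                 # 2. or in the next 16 bits     (assumed: ori)
    reg = (reg << 32) & MASK64   # 3. shift into the upper half  (assumed: rldicr)
    reg |= lo_hi << 16           # 4. or in bits 16..31          (assumed: oris)
    reg |= lo_lo                 # 5. or in the low 16 bits      (assumed: ori)
    return reg


print(hex(build_const64(0x123456789ABCDEF0)))  # prints 0x123456789abcdef0
```

A constant that fits in 32 bits only needs the equivalent of steps 4 and 5, which is why the full five-step cost shows up mainly for pointers.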
Keep in mind that these 64-bit constants only cost you for pointers. If you want a 32-bit integer, you don't need to load 64 bits, even in 64-bit mode.
Oh, and don't forget that 64-bit Intel code is actually most often faster than 32-bit code, thanks to twice as many registers and some other goodies. I've compiled a 32-bit/64-bit universal `bzcat' and ran both versions three times on my Core 2 Duo machine. For this randomly chosen (!) benchmark, I get an impressive edge of ~20% for 64-bit mode:

$ time ./bzcat-64 gcc-core-4.3-20071214.tar.bz2 >/dev/null
real 0m5.889s
user 0m5.168s
sys 0m0.096s

$ time ./bzcat-64 gcc-core-4.3-20071214.tar.bz2 >/dev/null
real 0m5.516s
user 0m5.120s
sys 0m0.088s

$ time ./bzcat-64 gcc-core-4.3-20071214.tar.bz2 >/dev/null
real 0m5.489s
user 0m5.137s
sys 0m0.085s

$ time ./bzcat-32 gcc-core-4.3-20071214.tar.bz2 >/dev/null
real 0m7.407s
user 0m6.707s
sys 0m0.107s

$ time ./bzcat-32 gcc43/gcc-core-4.3-20071214.tar.bz2 >/dev/null
real 0m6.966s
user 0m6.540s
sys 0m0.097s

$ time ./bzcat-32 gcc43/gcc-core-4.3-20071214.tar.bz2 >/dev/null
real 0m7.051s
user 0m6.583s
sys 0m0.103s

Regards,

-Markus

PS: To force the system to run `xy' in 32-bit or 64-bit mode (or even Rosetta), just make a copy of the executable containing only the arch you want, e.g. `lipo -extract i386 bzcat -output bzcat-32' to extract the 32-bit Intel code from bzcat and store it in bzcat-32.

-- 
Dipl. Inf. (FH) Markus W. Weissmann
http://www.macports.org/
http://www.mweissmann.de/