Conversion
Most of my benchmarks have been converted to run as 64 bit programs and have been tested via Windows XP Pro x64 and 64 bit Windows Vista. The first step was to download the Microsoft Platform SDK for Windows Server 2003 SP1. This includes a 64 bit compiler (cl), assembler (ml64) and linker. These can be used via the command line or in a .BAT file, and the package can be installed under Win32 or Win64. For Windows programs, .RC files can be converted (rc) to .RES files and the latter to .OBJ (cvtres). Library names used are the same as for Win32, such as GDI32.LIB.
The compiler does not accept asm type (inline) assembler functions, so these have to be converted to MASM format, but headers and .INC files are different from the 32 bit varieties. Under Win64, the old x87 floating point instructions and MMX instructions cannot be used. The former have to be converted to SSE1/2/3 instructions. MMX instruction names are the same as some provided in SSE2, but memory addresses have to be changed to suit 128 bit registers instead of 64 bit ones. 32 bit instructions can still be used, including CPUID and RDTSC. The only complication appears to be that push/pop should refer to a 64 bit register (push rdx instead of push edx). There appear to be complications in passing parameters to assembly code, but I have avoided this by using global variables.
The SDK includes a 32 bit compiler that checks for 64 bit compatibility and has options to use SSE or SSE2 instructions for floating point. In some cases, this produces identical code to the 64 bit version. In other cases, it restricts the number of registers used, for 32 bit compatibility. MASM type assembly requires an assembler, such as the one that comes with Microsoft Visual C++ 6.0 Pro. In order to compare 64 bit versus 32 bit speeds, some of the benchmarks have also been compiled using the SDK 32 bit compiler.
The C/C++ and Assembler source codes for these benchmarks are available in NewSource.zip.
The original versions can be obtained via the Main Page.
More 64 Bit Benchmarks
Windows, DirectDraw, OpenGL and Image Processing benchmarks have also been converted to run at 64 bits and a DirectX 9 benchmark has also been produced. See 64 Bit Graphics Tests.htm. Download benchmarks and C/C++ source codes via Video64.zip.
Then, there are benchmarks for disks, CD/DVD drives, networks and peripherals in More64bit.zip with results in 64 Bit Disk Tests.htm.
The latest conversions, including source code, are also in More64bit.zip. These are three versions of my Fast Fourier Transform benchmarks (see also FFTGraf.zip), SSE/SSE2 benchmark and burn-in/reliability tests (see also SSE3Dnow.zip) and BusSpd2K burn-in/reliability tests (see also BusSpd2K.zip).
The latter burn-in tests have been modified to demonstrate paging speeds more quickly. See Paging.htm for results via 64-Bit Vista and XP Pro x64.
Other Results
Results of 64 bit tests, descriptions and some comparisons are included in results reports for 32 bit versions.
System ID
Each benchmark includes a new system identification test. This is limited because Intel appear to make significant changes with each new CPU (it is now much too complicated to identify dual CPUs with HT enabled, or cache sizes, on a range of CPUs). Windows functions also lag considerably behind hardware capabilities. The following shows details provided by the 64 bit programs, then differences at 32 bits.
Note that on AMD and Intel CPUs, with 64 bit working, info.wProcessorArchitecture from GetSystemInfo(&info) indicates PROCESSOR_ARCHITECTURE_AMD64. With 32 bit operation PROCESSOR_ARCHITECTURE_INTEL is supplied.
AMD Windows XP Pro x64
CPUID and RDTSC Assembly Code
CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00020FB1
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz
Has MMX, Has SSE, Has SSE2, Has SSE3, Has 3DNow,
Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
AMD64 processor architecture, 2 CPUs
Windows NT Version 5.2, build 3790, Service Pack 1
Memory 1024 MB, Free 656 MB
User Virtual Space 8388608 MB, Free 8388557 MB
Intel Windows Vista 64-Bit
CPUID and RDTSC Assembly Code
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000006F6
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz
Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
AMD64 processor architecture, 2 CPUs
Windows NT Version 6.0, build 6000,
Memory 4094 MB, Free 3207 MB
User Virtual Space 8388608 MB, Free 8388547 MB
Differences Win32 and Win64 at 32 bits
Intel processor architecture, 2 CPUs
User Virtual Space 4096 MB, Free 4047 MB - Win64
User Virtual Space 2048 MB, Free 2022 MB - Win32
Memory 4095 MB, Free 3103 MB - Win64
Memory less than 3.5 GB - Win32
The C/C++ and Assembler source codes for these utilities are available in NewSource.zip.
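The hexadecimal codes in the listings above can be unpacked with a few shifts and masks. The following is a minimal sketch, not the code from NewSource.zip, and it assumes that "Features Code" is the EDX value and "Model Code" the EAX value returned by CPUID function 1, which is what the sample values appear to be. The names decode_model_code, has_feature and CpuSignature are made up for this illustration. SSE3 and 3DNow are reported in other CPUID registers, so they are not covered here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields unpacked from the assumed CPUID function 1 EAX value. */
typedef struct {
    unsigned family;    /* base family, plus extended family when base is 0xF */
    unsigned model;     /* base model, with extended model as the high digit  */
    unsigned stepping;
} CpuSignature;

static CpuSignature decode_model_code(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    CpuSignature sig;

    sig.stepping = eax & 0xF;
    sig.family = base_family;
    if (base_family == 0xF)
        sig.family += ext_family;
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        sig.model |= ext_model << 4;
    return sig;
}

/* Standard bit positions in the function 1 EDX feature flags. */
enum { BIT_MMX = 23, BIT_SSE = 25, BIT_SSE2 = 26 };

/* Returns 1 if the given bit is set in the assumed EDX value. */
static int has_feature(uint32_t edx, unsigned bit)
{
    assert(bit < 32);
    return (int)((edx >> bit) & 1u);
}

/* Prints one summary line, loosely following the listings above. */
static void print_report(const char *name, uint32_t edx, uint32_t eax)
{
    CpuSignature sig = decode_model_code(eax);
    printf("%s: family %u, model %u, stepping %u, %s MMX, %s SSE, %s SSE2\n",
           name, sig.family, sig.model, sig.stepping,
           has_feature(edx, BIT_MMX)  ? "Has" : "No",
           has_feature(edx, BIT_SSE)  ? "Has" : "No",
           has_feature(edx, BIT_SSE2) ? "Has" : "No");
}
```

For example, print_report("AMD", 0x178BFBFF, 0x00020FB1) uses the two codes from the Athlon 64 X2 listing, and print_report("Intel", 0xBFEBFBFF, 0x000006F6) those from the Core 2 listing.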
Maximum CPU Speed
This benchmark CPUID64 (In Win64.zip) is based on the original in Whatcpu.zip. The latter executes a long series of assembler coded add instructions to 1, 2, 3 and 4 registers to identify maximum speeds of integer, floating point and MMX instructions.
The 64 bit version has the same 32 bit integer test and an identical one using 64 bit mode. SSE/SSE2 32/64 bit floating point tests are the same. As indicated above, normal floating point and MMX instructions are invalid under Win64. Instead of MMX, SSE2 32 bit and 64 bit add speeds are measured.
A revised 32 bit version is included in Win64.zip to show SSE2 integer speeds.
In the following example of 64 bit results, 64 bit integer Millions of Instructions Per Second (MIPS) are similar to 32 bit speeds. That would be expected, as 32 bit operations use half of the real register size. With more pipelines, 64 bit normal integer MIPS can be faster than using integer SSE2 instructions. As usual, 64 bit floating point MFLOPS (Millions of FLoating point Operations Per Second) run at half speed compared with 32 bits (2 words versus 4 words in 128 bit registers). This is also the case with AMD on 32/64 bit SSE2 integers, but not with this Intel CPU.
CPU ID and Speed Test 64 bit Version - Windows XP Pro x64
Assembled with Microsoft ml64.exe Version 8.00.40310.39
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz
Speeds adding to      1 Register  2 Registers  3 Registers  4 Registers
32 bit Integer MIPS         2430         4864         5650         6080
64 bit Integer MIPS         2430         4864         6356         6485
32 bit SSE2 Int MIPS        4421         8895         8844         8844
64 bit SSE2 Int MIPS        2211         4419         4447         4422
32 bit SSE MFLOPS           2214         4421         4421         4434
64 bit SSE2 MFLOPS          1105         2210         2217         2217
CPU ID and Speed Test 64 bit Version - Windows Vista 64-Bit
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz
Speeds adding to      1 Register  2 Registers  3 Registers  4 Registers
32 bit Integer MIPS         2609         4081         5261         7044
64 bit Integer MIPS         2613         4177         5225         7044
32 bit SSE2 Int MIPS        9488        14638        17542        17466
64 bit SSE2 Int MIPS        2401         4575         4585         4575
32 bit SSE MFLOPS           3201         6405         9607         9607
64 bit SSE2 MFLOPS          1601         3202         4804         4804
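The figures above are easier to compare between the two CPUs when divided by the measured MHz, giving operations per clock cycle. The sketch below shows that arithmetic, along with the words per 128 bit register count behind the 2 versus 4 word comparison. The function names are invented for this example and are not part of CPUID64.

```c
#include <assert.h>

/* Operations per clock cycle: millions of operations per second / MHz. */
static double ops_per_clock(double mops, double mhz)
{
    assert(mhz > 0.0);
    return mops / mhz;
}

/* Number of 32 or 64 bit data words held in one 128 bit SSE register. */
static unsigned words_per_sse_register(unsigned word_bits)
{
    assert(word_bits == 32 || word_bits == 64);
    return 128u / word_bits;
}
```

Using the 4 register column, ops_per_clock(6080, 2211) gives about 2.75 integer instructions per clock for the AMD CPU, and ops_per_clock(9607, 2402) about 4.0 single precision floating point results per clock for the Intel CPU.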
Download Win64.zip
Maximum MP Speed
This benchmark CPUIDMP64 (In DualCore.zip) uses some of the instruction sequences from CPUID64. First an integer and an SSE floating point test are run separately. They are then run as two threads of equal priority, where both should run at full speed with 2 CPUs. Finally, an FP test is started with another and two integer tests at lower priority. With 2 CPUs, the FP test should run at full speed and the others at the whim of the OS. A 32 bit version is included in DualCore.zip using the same 32 bit instructions. Results of 64 bit and 32 bit tests can be expected to be the same except possibly for sharing with 4 threads.
When run on a single CPU, the floating point and integer tests are likely to run at half speed with two threads. With four threads, the lower priority tests might obtain a small amount of time.
CPU ID and MP Speed Test 64 bit Version - Windows XP Pro x64
Assembled with Microsoft ml64.exe Version 8.00.40310.39
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz
Speed adding to registers Pass 1 Pass 2 Pass 3
Separate Tests
32 bit SSE MFLOPS 4411 4411 4415
32 bit Integer MIPS 6068 6070 6070
Two Threads Equal Priority
32 bit SSE MFLOPS 4405 4409 4408
32 bit Integer MIPS 6067 6053 5992
Four Threads, First Normal Priority, Others Normal - 1
32 bit SSE MFLOPS 4401 4411 4410
32 bit Integer MIPS 2903 2053 2898
32 bit SSE MFLOPS 0 1433 0
32 bit Integer MIPS 3454 2227 3455
CPU ID and MP Speed Test 64 bit Version - Windows Vista 64-Bit
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz
Speed adding to registers Pass 1 Pass 2 Pass 3
Separate Tests
32 bit SSE MFLOPS 9582 9595 9600
32 bit Integer MIPS 6934 6936 6950
Two Threads Equal Priority
32 bit SSE MFLOPS 9501 9600 9600
32 bit Integer MIPS 7002 7006 7013
Four Threads, First Normal Priority, Others Normal - 1
32 bit SSE MFLOPS 9592 9575 9576
32 bit Integer MIPS 3447 3414 3329
32 bit SSE MFLOPS 4844 0 0
32 bit Integer MIPS 0 3337 3366
Download DualCore.zip
Classic Benchmarks
The Classic Benchmarks are the first programs that set standards of performance for computers. Details are available from Classic.htm and benchmark programs and results obtained via BenchNT.zip. The Linpack, Livermore Loops and Whetstone Benchmarks have been compiled for 64 bit systems and for 32 bit PCs, with the compiler generating SSE or SSE2 instructions automatically.
The benchmarks and sample results can be obtained from Win64.zip or DualCore.zip and source codes from NewSource.zip.
Linpack and Livermore Loops benchmarks use double precision floating point, so are compiled with SSE2 instructions. The compilers are not as efficient as they could be, producing instructions using one 64 bit word in the 128 bit registers, rather than two for Single Instruction Multiple Data (SIMD) operation.
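The difference between the two forms can be illustrated with SSE2 intrinsics. This is only a sketch of the idea, not code from the benchmarks: sum_scalar adds one double at a time in the low half of the register, as the compilers are described as doing, while sum_packed adds two at a time. Both function names are hypothetical, and a plain C fallback is included for builds where SSE2 intrinsics are not available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define USE_SSE2 1
#else
#define USE_SSE2 0
#endif

/* Scalar form: one 64 bit word per 128 bit register (addsd with SSE2). */
static double sum_scalar(const double *x, size_t n)
{
    size_t i;
#if USE_SSE2
    __m128d acc = _mm_setzero_pd();
    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++)
        acc = _mm_add_sd(acc, _mm_load_sd(&x[i]));
    return _mm_cvtsd_f64(acc);
#else
    double acc = 0.0;
    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++)
        acc += x[i];
    return acc;
#endif
}

/* Packed (SIMD) form: two 64 bit words per register (addpd with SSE2). */
static double sum_packed(const double *x, size_t n)
{
    size_t i = 0;
    double lanes[2] = { 0.0, 0.0 };
#if USE_SSE2
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(&x[i]));
    _mm_storeu_pd(lanes, acc);
#else
    for (; i + 2 <= n; i += 2) {
        lanes[0] += x[i];
        lanes[1] += x[i + 1];
    }
#endif
    if (i < n)                 /* odd element left over */
        lanes[0] += x[i];
    return lanes[0] + lanes[1];
}
```

The packed version issues half as many add instructions for the same data, which is where the potential gain lies. The two versions add in a different order, so results can differ in the last bits for general floating point data.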
The 64 bit Linpack results on Core 2 Duo are disappointing.
Linpack Benchmark Results
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
Measured 2211 MHz and XP Pro x64
Original SSE2 Win32 SSE2 Win64
838 MFLOPS 1014 MFLOPS 1044 MFLOPS
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Measured 2402 MHz and Vista 64-Bit
Original SSE2 Win32 SSE2 Win64
1315 MFLOPS 1480 MFLOPS 823 MFLOPS
There are 24 Livermore Loops, with the performance of each measured in MFLOPS, along with average results, Geometric Mean being the official average quoted. Following are results for the original Watcom version, 32 bits with SSE2 and 64 bits with SSE2. The 64 bit compilation can use up to 16 registers to speed up processing. However, some 32 bit SSE2 results are faster, as are a few from the original Watcom version. The Intel Core 2 Duo results, compiled for 64 bits, are more frequently slower than at 32 bits. See below for Whetstone Benchmark.
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz and XP Pro x64
********************************************************
Livermore Loops Benchmark Original Optimised via C/C++
MFLOPS for 24 loops
2032.4 1312.4 345.7 1031.6 275.6 334.9 2565.9 2288.0 2337.2 121.9 183.7 550.5
49.8 131.6 393.5 350.4 217.7 1474.2 309.3 290.9 612.8 458.5 751.6 294.6
Overall Ratings
Maximum Average Geomean Harmean Minimum
2565.9 740.9 460.5 285.7 48.4
********************************************************
Livermore Loops Benchmark 32 Bit Version
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
MFLOPS for 24 loops
1619.1 1187.1 717.9 1068.3 244.9 606.3 1815.6 1727.1 1907.6 670.2 200.4 549.6
169.4 317.6 737.7 654.7 684.2 1452.0 455.0 762.5 1031.2 406.1 590.3 219.2
Overall Ratings
Maximum Average Geomean Harmean Minimum
1907.6 798.3 640.1 501.3 162.3
********************************************************
Livermore Loops Benchmark 64 Bit Version
Via 64 Bit Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
MFLOPS for 24 loops
1927.4 1118.3 1096.7 1054.2 252.2 320.3 2284.1 2099.2 1756.2 632.4 183.6 731.0
173.1 306.3 552.9 732.4 922.6 1441.3 500.8 881.3 328.6 351.8 758.0 397.8
Overall Ratings
Maximum Average Geomean Harmean Minimum
2284.1 843.7 660.7 509.1 165.7
######################################################
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz and Vista 64-Bit
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
MFLOPS for 24 loops
1960.1 1357.0 788.5 1471.0 341.2 891.9 2526.4 2044.9 2153.0 860.1 265.8 1181.5
458.5 555.0 444.0 1018.2 824.4 1073.6 505.2 632.3 1235.3 194.7 772.0 278.2
Overall Ratings
Maximum Average Geomean Harmean Minimum
2526.4 990.3 803.8 639.0 194.7
********************************************************
Via Microsoft 64 Bit C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
MFLOPS for 24 loops
626.3 835.2 594.5 589.0 341.3 406.3 886.7 1040.6 1098.2 391.0 239.0 398.3
349.7 397.8 320.7 857.1 1038.9 714.0 639.2 429.5 418.0 227.3 838.1 673.5
Overall Ratings
Maximum Average Geomean Harmean Minimum
1175.0 592.9 537.0 484.9 227.2
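The five Overall Ratings are standard summary statistics of a set of MFLOPS values. The sketch below shows how such figures can be derived; it is not the benchmark's own code, and the names Ratings, geometric_mean and overall_ratings are invented here. The geometric mean is found by bisection rather than with log and exp, purely so that the example needs no maths library. The printed ratings appear to cover more measurements than the 24 shown per listing (some maxima and minima differ from the loop values), so feeding in one line of 24 results need not reproduce them exactly.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double maximum, average, geomean, harmean, minimum;
} Ratings;

/* Geometric mean by bisection: the value g between lo and hi for which
   the product of all x[i] / g equals 1. */
static double geometric_mean(const double *x, size_t n, double lo, double hi)
{
    int iter;
    size_t i;
    for (iter = 0; iter < 100; iter++) {
        double g = 0.5 * (lo + hi);
        double prod = 1.0;
        for (i = 0; i < n; i++)
            prod *= x[i] / g;
        if (prod > 1.0)
            lo = g;            /* guess too small */
        else
            hi = g;            /* guess too large or exact */
    }
    return 0.5 * (lo + hi);
}

/* Maximum, arithmetic, geometric and harmonic means, and minimum. */
static Ratings overall_ratings(const double *x, size_t n)
{
    Ratings r;
    double sum = 0.0, recip = 0.0;
    size_t i;

    assert(n > 0);
    r.maximum = r.minimum = x[0];
    for (i = 0; i < n; i++) {
        assert(x[i] > 0.0);
        if (x[i] > r.maximum) r.maximum = x[i];
        if (x[i] < r.minimum) r.minimum = x[i];
        sum += x[i];
        recip += 1.0 / x[i];
    }
    r.average = sum / (double)n;
    r.harmean = (double)n / recip;
    r.geomean = geometric_mean(x, n, r.minimum, r.maximum);
    return r;
}
```

The geometric and harmonic means are pulled down by the slowest loops much more than the arithmetic average, which is why they are lower in every listing.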
Whetstone Benchmark
The Whetstone Benchmark produces an overall rating in terms of Millions of Whetstone Instructions Per Second (MWIPS). The version used also produces speeds in MFLOPS and MOPS for the eight test loops, three with straight floating point, two with intrinsic functions and three with integer type operations. An overall average (Geometric) is produced for the first three and equivalent VAX MIPS for the last three. The single precision version of the benchmark was compiled to use SSE instructions.
This is quite a bit faster than the original version, but VAX MIPS are over-inflated due to excessive optimisation.
MP Version
The program was modified to use a second thread to execute some of the code and demonstrate the use of two CPUs. The second thread is run at THREAD_PRIORITY_BELOW_NORMAL, which sees little time on a single CPU. With dual CPUs, both threads should demonstrate full speed. One complication is that the compiler refused to produce the same code for the second thread, so there is some variation in speeds.
This MP version was also compiled for 64 bit operation and results are shown below for this, 32 bit MP and 32 bit SSE versions. Floating point speed is similar on both MP versions (and around double that of a single processor) but the 64 bit variety runs one of the integer tests faster by using more registers.
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz and XP Pro x64
Whetstone Single Precision SSE benchmark - single CPU version
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
          MFLOPS     Vax   MWIPS  MFLOPS  MFLOPS  MFLOPS     Cos     Exp   Fixpt      If   Equal
           Gmean    MIPS               1       2       3    MOPS    MOPS    MOPS    MOPS    MOPS
             583   12197    2313     655     656     461    51.0    36.3    1988    2210    3305
********************************************************************************
Whetstone Single Precision MP SSE Benchmark Wed Aug 10 12:38:03 2005
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
          MFLOPS     Vax   MWIPS  MFLOPS  MFLOPS  MFLOPS     Cos     Exp   Fixpt      If   Equal
           Gmean    MIPS               1       2       3    MOPS    MOPS    MOPS    MOPS    MOPS
            1164   19030    4506    1310    1308     920     102    69.7    3598    4139    3702
Thread 1                             642     642     452    50.7    34.8    1796    2062    2690
Thread 2                             668     666     467    50.8    34.9    1802    2078    1013
********************************************************************************
Whetstone Single Precision MP SSE Benchmark Fri Aug 05 12:18:12 2005
Via Microsoft 64 bit C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
          MFLOPS     Vax   MWIPS  MFLOPS  MFLOPS  MFLOPS     Cos     Exp   Fixpt      If   Equal
           Gmean    MIPS               1       2       3    MOPS    MOPS    MOPS    MOPS    MOPS
            1086   25950    4983    1325    1145     845     151    67.1    3610    4204    9210
Thread 1                             661     572     468    75.2    33.5    1804    2099    8067
Thread 2                             663     573     377    76.0    33.6    1806    2105    1143
#################################################################################
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz and Vista 64-Bit
Whetstone Single Precision SSE Benchmark Fri Jul 20 17:06:25 2007
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
          MFLOPS     Vax   MWIPS  MFLOPS  MFLOPS  MFLOPS     Cos     Exp   Fixpt      If   Equal
           Gmean    MIPS               1       2       3    MOPS    MOPS    MOPS    MOPS    MOPS
             728   18421    2419     851     855     530    57.2    29.7    1994    1747   14352
********************************************************************************
Whetstone Single Precision MP SSE Benchmark Fri Jul 20 17:06:08 2007
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
          MFLOPS     Vax   MWIPS  MFLOPS  MFLOPS  MFLOPS     Cos     Exp   Fixpt      If   Equal
           Gmean    MIPS               1       2       3    MOPS    MOPS    MOPS    MOPS    MOPS
            1439   23554    4704    1700    1689    1037     113    58.1    3720    3738    7518
Thread 1                             845     826     517    56.5    28.9    1871    1797    6439
Thread 2                             855     863     520    56.8    29.3    1848    1941    1079
********************************************************************************
Whetstone Single Precision MP SSE Benchmark Fri Jul 20 17:06:45 2007
Via Microsoft 64 bit C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
          MFLOPS     Vax   MWIPS  MFLOPS  MFLOPS  MFLOPS     Cos     Exp   Fixpt      If   Equal
           Gmean    MIPS               1       2       3    MOPS    MOPS    MOPS    MOPS    MOPS
            1417   26543    5661    1723    1608    1026     157    77.4    3645    3096   13257
Thread 1                             862     805     530    78.1    38.5    1809    1535   12268
Thread 2                             861     803     496    78.4    39.0    1837    1560     989
BusSpeed MP Benchmark
This MP benchmark uses variable amounts of memory to measure speed via caches and RAM, first as a single thread, then as two threads to demonstrate the impact of two CPUs. It is based on BusSpd2K (BusSpd2K.zip), using integer AND instructions to a single register, streaming data from caches or RAM. The first test reads one word with a 32 word address increment for the next word. That is 128 bytes with 32 bit words and 256 bytes with 64 bit words. The address increment reduces for following tests to one word (ReadAll). The last test reads all the data using 16 byte SSE2 instructions. With two threads, each reads all the data, with the total number of passes the same as with one thread. BusSpd2K can produce some faster results, as streaming to two registers is used for some tests. Except for SSE2, C compiler code is used for the tests, as this is similar to assembly code in BusSpd2K. Results of benchmarks compiled for 32 and 64 bit systems are shown below. The benchmarks are in DualCore.zip.
Looking at RAM speed, the system reads data in 64 byte bursts, that is 16 word address increments at 32 bits and 8 word increments at 64 bits. This is demonstrated by little or no performance gain with larger address increments. Speed at 64 bits will appear to be twice as fast as at 32 bits, as twice as much data is being used out of the burst. Typical burst speed at 32 bits is 319 MB/sec and maximum speed can be assumed to be 16 times this, or 5104 MB/sec (maximum theoretical 2 x 3200 MB/sec). In this case, the memory buses appear to be saturated and there is no gain with 2 CPUs. As the address increment is reduced, speed increases to around 3000 MB/sec using one thread or 4700 MB/sec with two threads. These are similar speeds to a BusSpd2K two program test (see DualCore.htm), indicating a performance limitation with a single CPU.
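The strided read and the burst arithmetic above can be sketched in C as follows. This is an illustration only, with invented function names, not the BusSpdMP source: the real inner loops are unrolled to 64 AND instructions and timed over many passes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AND every inc-th 32 bit word into a single accumulator, in the manner
   of the address increment tests (inc = 1 corresponds to ReadAll). */
static uint32_t and_strided(const uint32_t *data, size_t words, size_t inc)
{
    uint32_t acc = 0xFFFFFFFFu;
    size_t i;
    assert(inc > 0);
    for (i = 0; i < words; i += inc)
        acc &= data[i];
    return acc;
}

/* Distance in bytes between successive reads. */
static size_t stride_bytes(size_t inc_words, size_t word_bytes)
{
    return inc_words * word_bytes;
}

/* Estimated bus speed when only one word of each burst is used:
   measured MB/sec multiplied by the number of words per burst. */
static double burst_scaled_mb_per_sec(double measured, size_t burst_bytes,
                                      size_t word_bytes)
{
    assert(word_bytes > 0 && burst_bytes % word_bytes == 0);
    return measured * (double)(burst_bytes / word_bytes);
}
```

With these, stride_bytes(32, 4) is 128 and stride_bytes(32, 8) is 256, matching the first test at 32 and 64 bits, and burst_scaled_mb_per_sec(319, 64, 4) gives the 5104 MB/sec quoted above.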
Results via caches are strange. A sample from 32 bit BusSpd2K is included below to explain possible reasons. Firstly, BusSpd2K uses just MOV instructions for the burst tests. It shows halving of speed from caches from 32 byte (8 word) increments to 64 byte (16 word) and BusSpdMP goes one step further to 32 word address increments. BusSpd2K also shows half speed from L1 cache when ANDing to 1 register instead of 2. With BusMP, the compiler refused to translate code for two registers as hoped for.
Most cache based results do not show expected performance gains on using 2 CPUs but, at least, it is better at 64 bits. Inner loops of the tests have 64 AND instructions and an outer loop runs this for around 0.5 seconds (a long time, and there is little difference at 0.1 seconds). Maybe the cause is cache flushing with some data coming from RAM.
The above comments relate to the tests on the PC with an AMD CPU, using Windows XP Pro x64. Results for a Core 2 Duo PC, using 64-Bit Windows Vista, are given later. This has faster RAM, larger and faster L2 cache and faster operation on SSE2 instructions.
AMD Athlon 64(tm) X2 Dual Core Processor 4200+ Measured 2211 MHz, XP Pro x64
##############################################################################
Old BusSpd2K Performance Test MBytes/Second
16wds 8wds
MovI MovI MovI MovI MovI MovI AndI AndI MovM MovM
Memory Reg2 Reg2 Reg2 Reg2 Reg1 Reg2 Reg1 Reg2 Reg1 Reg8
KBytes Inc64 Inc32 Inc16 Inc8 Inc4 Inc4 Inc4 Inc4 Inc8 Inc8
4 8070 15711 16498 17247 16538 16763 8670 16454 34291 34254
8 8437 16391 16544 17044 16838 17064 8765 16787 34148 35264
128 639 1281 2437 4400 7782 7780 6539 6694 8882 8684
256 651 1285 2411 4418 7786 7776 6448 6688 8936 8718
65536 315 609 1009 1478 2789 2792 2656 2842 2940 2940
131072 315 610 1007 1457 2793 2791 2704 2803 2940 2941
#####################################################################
MP Bus Speed Test 32 bit Version 1.1 Sun Aug 21 17:05:26 2005
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
SSE2 Assembled with Microsoft ml.exe Version 6.15.8803
Part 1 - Single Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 6209 6371 9953 10084 9887 9962 17458
24 8386 8566 9929 10083 10184 10101 17486
96 842 660 1259 2353 4865 6416 8959
384 356 314 569 887 1435 2788 2962
768 358 314 568 887 1435 2784 2955
1536 360 314 568 886 1434 2785 2955
16384 353 318 565 879 1416 2751 2909
131072 350 318 563 877 1414 2749 2915
Part 2 - Two Threads Total MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 8292 8403 14317 16184 17352 18528 34743
24 12043 13645 17357 18663 19500 19626 34841
96 1789 1317 2489 4693 9679 12783 17819
384 318 329 667 1307 2374 4812 4741
768 318 330 666 1305 2372 4791 4736
1536 317 331 665 1307 2394 4819 4754
16384 323 336 673 1303 2349 4778 4711
131072 322 336 671 1303 2358 4786 4710
##############################################################################
MP Bus Speed Test 64 bit Version 1.1 Sun Aug 21 16:42:21 2005
Via Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
SSE2 Assembled with Microsoft ml64.exe Version 8.00.40310.39
Part 1 - Single Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 5883 11782 12352 17826 17769 17482 17492
24 12355 16539 16566 17786 17693 17810 17473
96 1941 1536 1344 2414 4532 9624 8869
384 656 719 596 995 1515 2919 2970
768 657 719 593 991 1506 2912 2958
1536 669 733 606 1015 1545 2985 3041
16384 650 710 597 987 1485 2878 2932
131072 644 706 598 985 1483 2874 2927
Part 2 - Two Threads Total MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 8051 16115 16456 25937 28993 31119 29997
24 16474 23432 26224 31526 33113 34146 33258
96 3149 3003 2688 4783 8604 19130 17722
384 597 644 662 1334 2411 4888 4754
768 596 646 662 1332 2414 4882 4745
1536 622 673 689 1379 2497 4693 4914
16384 609 647 673 1332 2392 4866 4718
131072 606 646 674 1336 2398 4863 4727
##############################################################################
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz and Vista 64-Bit
##############################################################################
MP Bus Speed Test 32 bit Version 1.11 Fri Jul 20 14:47:06 2007
Part 1 - Single Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 6751 6808 9050 9198 9270 9258 37375
24 9197 8536 8839 8902 9007 9126 37813
96 2094 2028 3328 4529 6670 7949 19089
384 2099 2027 3332 4525 6657 7953 19115
768 2105 2027 3345 4523 6669 7925 19113
1536 2037 1997 3294 4457 6558 7886 18806
16384 316 387 788 1449 2661 4893 5766
131072 318 385 792 1436 2620 4902 5747
Part 2 - Two Threads Total MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 7897 9372 13674 15367 16514 17224 71915
24 13704 14445 15787 16761 16779 17428 72535
96 3157 3005 5369 7756 12129 14592 31545
384 3224 3046 5440 7628 12059 14629 31558
768 3215 3050 5449 7753 11989 14884 31186
1536 3151 2954 5330 7549 11810 14653 30595
16384 317 433 887 1792 3140 5868 7165
131072 316 431 890 1787 3084 5765 7161
##############################################################################
MP Bus Speed Test 64 bit Version 1.11 Fri Jul 20 14:53:01 2007
Part 1 - Single Thread MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 6304 12699 12904 17342 17446 17443 37201
24 12971 17349 16962 17284 17380 17502 37853
96 4215 4225 4072 6452 9539 13629 19077
384 4228 4228 4070 6460 9554 13679 19060
768 4210 4214 4076 6461 9579 13627 19122
1536 3930 3973 4048 6398 9421 13424 18797
16384 597 634 774 1596 2898 5245 5702
131072 600 636 774 1585 2897 5127 5748
Part 2 - Two Threads Total MBytes/Second
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 5916 11822 12642 17265 19151 20362 72137
24 12410 17491 18825 20290 20947 21166 72328
96 6007 6089 5723 9960 13424 17715 31282
384 6050 6124 5881 10123 13557 17751 31424
768 6109 6137 5840 9991 13459 17586 31079
1536 5952 5978 5755 9846 13108 17207 30502
16384 613 629 846 1694 3219 5317 7037
131072 611 629 840 1690 3217 5260 7172
Another version of the 64 bit benchmark was produced. This just uses the single thread test, with command line options to select memory size used, running time and log file name. More than one copy can then be run at the same time. Results are shown below for two programs running concurrently to test L1 cache, L2 cache and RAM. Speed from caches is seen to double (each program runs at about the single thread speed), unlike the same tests using two threads. Results are for the AMD based PC.
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
6 Prog 1 5706 11458 11868 17738 17742 17452 17431
6 Prog 2 5731 11349 11882 17839 17803 17505 17473
96 Prog 1 1926 1498 1347 2449 4495 9619 8796
96 Prog 2 1936 1505 1349 2455 4506 9642 8813
1536 Prog 1 295 319 328 642 1174 2366 2444
1536 Prog 2 299 328 347 694 1240 2334 2809
RandMP Benchmark
This MP benchmark uses variable amounts of memory to measure speed via caches and RAM, first as a single thread, then as two threads to demonstrate the impact of two CPUs. It is based on RandMem (RandMem.zip), with serial and random read and read/write tests. Serial and random tests use the same code via indexing to read and write 4 byte words e.g. a sequence such as tot = tot & xi[xi[i+ 0]] | xi[xi[i+ 2]] & --- for reading and xi[xi[i+ 0]] = xi[xi[i+ 2]]; for read/write. The inner loops have more than 600 CPU instructions. RandMP64 and RandMP32 versions, to run via Win64 and Win32, can be found in DualCore.zip.
The benchmark has four tests, Serial Read (RD), Serial Read/Write (RW), Random Read and Random Read/Write. With two threads, each has its own code and uses the same data, but the second thread starts at the half way point. Each has the same number of repeat passes, so variations in the time taken are reflected in the relative speeds of the two threads.
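The indexed read and the half way starting point can be sketched as below. This is a cut-down illustration under stated assumptions, not the RandMP source: the real expression chains many more terms, the threads and timing are left out, and fill_serial and indexed_read are names made up for the example. Parentheses are added to make the C operator precedence explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serial fill: xi[i] = i, so indexed accesses walk memory in order.
   A random test would store a shuffled set of the same indices. */
static void fill_serial(uint32_t *xi, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        xi[i] = (uint32_t)i;
}

/* One read pass over n words (n a multiple of 4), beginning at word
   start and wrapping round at the end of the array. A second thread
   would call this with start = n / 2. */
static uint32_t indexed_read(const uint32_t *xi, size_t n, size_t start)
{
    uint32_t tot = 0xFFFFFFFFu;
    size_t done;
    assert(n > 0 && n % 4 == 0 && start % 4 == 0 && start < n);
    for (done = 0; done < n; done += 4) {
        size_t i = (start + done) % n;
        tot = (tot & xi[xi[i + 0]]) | xi[xi[i + 2]];
    }
    return tot;
}
```

Because the same function serves both serial and random tests, any speed difference between them comes only from the order in which the index array sends the reads through memory.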
Below are example results of the 32 bit version on a single CPU using Windows XP and 64 bit version on a dual core CPU via Windows XP x64 (32 bit version produces very similar results). Using one thread, RW speed is slower than RD and speed reduces more using larger data size with random access. Running two threads on a single CPU produces the same sort of total speed as the single thread. With two CPUs, the speed of read only is mainly around double that of a single thread but speed via caches with read/write can be worse than for a single thread (or single CPU).
Looking at dual core results, with Serial RW and Random RW at 6 KB, the CPU is executing at around 1360 Million Instructions Per Second (MIPS) or 0.62 MIPS/MHz with a single thread. With two threads, each CPU runs at 340 MIPS (0.15 MIPS/MHz) with Serial RW and 154 MIPS (0.07 MIPS/MHz) with Random RW. This can be put down to cache data being flushed to maintain data coherency between the two CPUs.
Modifying the benchmark, so that each thread accesses its own data array, enables RW cache tests to run at 1360 MIPS on each CPU.
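The MIPS figures quoted above follow from the note in the listings that MBytes/Second divided by 3.2 gives approximate MIPS. A small sketch of the arithmetic, with hypothetical function names:

```c
#include <assert.h>

/* Approximate MIPS from MBytes/Second, using the divisor of 3.2 given
   in the result listings. */
static double mips_from_mbytes(double mbytes_per_sec)
{
    return mbytes_per_sec / 3.2;
}

/* MIPS per MHz of measured CPU clock speed. */
static double mips_per_mhz(double mips, double mhz)
{
    assert(mhz > 0.0);
    return mips / mhz;
}
```

For the dual core AMD results at 6 KB, mips_from_mbytes(4346) is about 1358 for single thread Serial RW, while 1090 and 494 MB/sec for the first thread of the two thread Serial RW and Random RW tests come to about 341 and 154 MIPS. Dividing by 2211 MHz gives roughly 0.61, 0.15 and 0.07 MIPS/MHz.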
The above comments relate to results on the PC with an AMD CPU and using Windows XP Pro x64.
Later results are for a Core 2 Duo system using 64-Bit Windows Vista. RAM on this is nearly twice as fast but the tests show up to 4 times faster.
Measured L1 cache speeds are much faster on the Read/Write tests, as are speeds via L2 cache.
AMD Athlon(tm) XP 2600+ Measured 2088 MHz
#####################################################################
RandMP Write/Read Test 32 bit Version 1.0 Sat Aug 27 19:33:14 2005
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 7773 7748 3616 895 896 889 891 892
Serial RW 3655 3657 2193 660 663 661 663 658
Random RD 7527 7599 2165 628 313 240 192 57
Random RW 3686 3693 2034 439 190 141 116 44
2 Threads
Serial RD1 4510 4522 2043 444 448 466 447 534
Serial RD2 3911 3906 1813 443 444 442 442 442
Serial RW1 1890 2133 1153 346 328 348 349 340
Serial RW2 1832 1828 1097 327 342 328 328 326
Random RD1 4429 4297 1134 311 169 115 103 31
Random RD2 3781 3803 1067 302 151 116 92 28
Random RW1 1928 1941 1050 219 95 75 61 24
Random RW2 1837 1849 1012 220 92 71 58 22
For approximate speed in MIPS divide MBytes/Second by 3.2
AMD Athlon 64 X2 Dual Core Processor 4200+ Measured 2211 MHz, XP Pro x64
#####################################################################
RandMP Write/Read Test 64 bit Version 1.0 Sat Aug 27 19:17:58 2005
Via Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 8552 8518 5115 5132 2369 2353 2344 2305
Serial RW 4346 4340 2702 2697 1349 1352 1354 1351
Random RD 8176 8244 3733 1620 872 389 255 170
Random RW 4384 4332 2865 1483 563 236 161 136
2 Threads
Serial RD1 8374 8532 5064 5010 2075 2096 2021 2026
Serial RD2 8532 8394 5176 5108 2111 2062 2049 2054
Serial RW1 1090 1172 1110 1096 1041 867 864 866
Serial RW2 1083 1136 1089 1076 1049 866 855 824
Random RD1 8147 8024 3683 1638 485 193 126 100
Random RD2 8154 8158 3701 1637 485 195 125 101
Random RW1 494 489 448 406 352 152 86 75
Random RW2 495 490 449 406 343 152 87 75
For approximate speed in MIPS divide MBytes/Second by 3.2
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz and Vista 64-Bit
#####################################################################
RandMP Write/Read Test 64 bit Version 1.0 Sat Aug 27 19:17:58 2005
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 8742 9128 7498 7468 7486 7429 4417 4391
Serial RW 8428 9332 7665 7663 7662 7165 2442 2397
Random RD 8918 9404 4244 3304 3183 2790 638 458
Random RW 8014 8523 3390 2752 2656 2462 418 289
2 Threads
Serial RD1 8435 9094 7334 7336 7365 7238 4024 2817
Serial RD2 8460 8943 7183 7168 7201 7159 3962 2764
Serial RW1 2007 2181 6931 6995 6984 6738 1643 1521
Serial RW2 2010 2174 6789 6801 6806 6651 1568 1433
Random RD1 8576 9392 3530 2695 2604 2292 450 443
Random RD2 8598 9180 3478 2666 2553 2256 455 443
Random RW1 730 759 1409 1984 1991 1923 282 292
Random RW2 733 759 1398 1955 1961 1897 277 292
Roy Longbottom October 2007
At the time of writing, Virgin FreeSpace Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection