PC Benchmarks For 64 Bit Windows

Index

     
Conversion
System ID
Maximum CPU Speed
Maximum MP Speed
Classic Benchmarks
Whetstone MP
BusSpeed MP
Rand MP
More 64 Bit Tests
     
       
Main Page
Other Results
Download Win64.zip
Download DualCore.zip

This page was set up as 770 pixels wide and accommodates preformatted text <PRE> results tables. Some browsers
produce monospaced font of an unexpected size but this might be adjustable via browser Preferences.

Conversion

Most of my benchmarks have been converted to run as 64 bit programs and have been tested via Windows XP Pro x64 and 64 bit Windows Vista. The first step was to download Microsoft Platform SDK for Windows Server 2003 SP1. This includes a 64 bit compiler (cl), assembler (ml64) and linker. These can be used via the command line or in a .BAT file and the package can be installed using Win32 or Win64. For Windows programs, .RC files can be converted (rc) to .RES files and the latter to .OBJ (cvtres). Library names used are the same as Win32, like GDI32.LIB.

The 64 bit compiler does not accept inline (asm type) assembler functions, so these have to be converted to MASM format, where headers and .INC files differ from the 32 bit varieties. The 64 bit compiler also does not generate the old x87 floating point instructions and does not support MMX, so I have treated both as unavailable to 64 bit programs. The former have to be converted to SSE1/2/3 instructions. MMX instruction names are the same as some provided in SSE2, but memory addresses have to be changed to suit 128 bit registers instead of 64 bits. 32 bit instructions can still be used, including CPUID and RDTSC. The only complication appears to be that push/pop should refer to a 64 bit register (push rdx instead of push edx). There also appear to be complications in passing parameters to assembly code, but I have avoided these by using global variables.

The SDK includes a 32 bit compiler that checks for 64 bit compatibility and has options to use SSE or SSE2 instructions for floating point. In some cases, this produces identical code to the 64 bit version. In other cases, it restricts the number of registers used, for 32 bit compatibility. MASM type assembly at 32 bits requires an assembler such as the one that comes with Microsoft Visual C++ 6.0 Pro. In order to compare 64 bit and 32 bit speeds, some of the benchmarks have also been compiled using the SDK 32 bit compiler.

The C/C++ and Assembler source codes for these benchmarks are available in NewSource.zip. The original versions can be obtained via the Main Page.

More 64 Bit Benchmarks

Windows, DirectDraw, OpenGL and Image Processing benchmarks have also been converted to run at 64 bits, and a DirectX 9 benchmark has been produced. See 64 Bit Graphics Tests.htm. Benchmarks and C/C++ source codes can be downloaded via Video64.zip. Then, there are benchmarks for disks, CD/DVD drives, networks and peripherals in More64bit.zip, with results in 64 Bit Disk Tests.htm.

The latest conversions, including source code, are also in More64bit.zip. These are three versions of my Fast Fourier Transform benchmarks (see also FFTGraf.zip), SSE/SSE2 benchmark and burn-in/reliability tests (see also SSE3Dnow.zip) and BusSpd2K burn-in/reliability tests (see also BusSpd2K.zip). The latter burn-in tests have been modified to demonstrate paging speeds more quickly. See Paging.htm for results via 64-Bit Vista and XP Pro x64.

Other Results

Results of 64 bit tests, descriptions and some comparisons are included in results reports for 32 bit versions.

Whetstone Results.htm Linpack Results.htm
Livermore Loops Results.htm WhatCPU Results.htm
BusSpd2K Results.htm SSE3Dnow Results.htm
Randmem Results.htm FFTgraf Results.htm
BMPspeed Results.htm 64 Bit Graphics Tests.htm
BurnIn64.htm DualCore.htm
DiskGraf Results.htm CDDVDSpd Results.htm
VideoWin Results.htm DirectDraw Results.htm
Direct3D Results.htm OpenGL Results.htm


System ID

Each benchmark includes a new system identification test. This is limited because Intel appear to make significant changes with each new CPU (it is now much too complicated to identify dual CPUs with HT enabled, or cache sizes, on a range of CPUs). Windows functions also lag considerably behind hardware capabilities. The following shows details provided by the 64 bit programs, then differences at 32 bits. Note that on both AMD and Intel CPUs, in a 64 bit program, info.wProcessorArchitecture from GetSystemInfo(&info) indicates PROCESSOR_ARCHITECTURE_AMD64. In a 32 bit program, PROCESSOR_ARCHITECTURE_INTEL is supplied.


  AMD Windows XP Pro x64 

 CPUID and RDTSC Assembly Code
 CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00020FB1
 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz 
 Has MMX, Has SSE, Has SSE2, Has SSE3, Has 3DNow, 
 Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
 AMD64 processor architecture, 2 CPUs 
 Windows NT  Version 5.2, build 3790, Service Pack 1
 Memory 1024 MB, Free 656 MB
 User Virtual Space 8388608 MB, Free 8388557 MB

  Intel Windows Vista 64-Bit 

  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000006F6
  Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz Measured 2402 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  AMD64 processor architecture, 2 CPUs 
  Windows NT  Version 6.0, build 6000, 
  Memory 4094 MB, Free 3207 MB
  User Virtual Space 8388608 MB, Free 8388547 MB

Differences Win32 and Win64 at 32 bits


 Intel processor architecture, 2 CPUs 
 User Virtual Space 4096 MB, Free 4047 MB - Win64
 User Virtual Space 2048 MB, Free 2022 MB - Win32
 Memory 4095 MB, Free 3103 MB             - Win64
 Memory less than 3.5 GB                  - Win32
    

The C/C++ and Assembler source codes for these utilities are available in NewSource.zip.
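
The architecture check described above can be sketched roughly as follows. This is not the code from NewSource.zip; arch_name and print_system_id are hypothetical names used for illustration only. The two constant values are those defined in the Windows SDK headers, and are repeated here only so that the helper also compiles on other systems.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Values as defined in the Windows SDK headers, repeated so the
   helper below compiles on non-Windows systems. */
#define PROCESSOR_ARCHITECTURE_INTEL 0
#define PROCESSOR_ARCHITECTURE_AMD64 9
#endif

/* Hypothetical helper: map wProcessorArchitecture to the wording
   seen in the system ID reports. */
const char *arch_name(unsigned arch)
{
    switch (arch) {
    case PROCESSOR_ARCHITECTURE_AMD64: return "AMD64 processor architecture";
    case PROCESSOR_ARCHITECTURE_INTEL: return "Intel processor architecture";
    default:                           return "Unknown processor architecture";
    }
}

#ifdef _WIN32
/* Print the architecture and CPU count reported by GetSystemInfo. */
void print_system_id(void)
{
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    assert(info.dwNumberOfProcessors >= 1);
    printf(" %s, %lu CPUs\n",
           arch_name(info.wProcessorArchitecture),
           (unsigned long)info.dwNumberOfProcessors);
}
#endif
```

Compiled as a 64 bit program, print_system_id would be expected to report the AMD64 wording on either make of CPU, and the Intel wording when compiled for 32 bits, in line with the listings above.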

Maximum CPU Speed

This benchmark, CPUID64 (in Win64.zip), is based on the original in Whatcpu.zip. The latter executes a long series of assembler coded add instructions to 1, 2, 3 and 4 registers to identify maximum speeds of integer, floating point and MMX instructions. The 64 bit version has the same 32 bit integer test and an identical one using 64 bit mode. SSE/SSE2 32/64 bit floating point tests are the same. As indicated above, normal (x87) floating point and MMX instructions are not used under Win64. Instead of MMX, SSE2 32 bit and 64 bit integer add speeds are measured. A revised 32 bit version is included in Win64.zip to show SSE2 integer speeds.

In the following example of 64 bit results, 64 bit integer Millions of Instructions Per Second (MIPS) are similar to 32 bit speeds. That would be expected, as 32 bit operations use half of the real register size. With more pipelines, 64 bit normal integer MIPS can be faster than using integer SSE2 instructions. As usual, 64 bit floating point MFLOPS (Millions of FLoating point Operations Per Second) run at half speed compared with 32 bits (2 words versus 4 words in 128 bit registers). This is also the case with AMD on 32/64 bit SSE2 integers, but not with this Intel CPU, where the 32 bit SSE2 integer speed is nearly four times the 64 bit speed.


     CPU ID and Speed Test 64 bit Version - Windows XP Pro x64
 
       Assembled with Microsoft ml64.exe Version 8.00.40310.39

  AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz

 Speeds adding to     1 Register  2 Registers  3 Registers  4 Registers

 32 bit Integer MIPS     2430         4864         5650         6080
 64 bit Integer MIPS     2430         4864         6356         6485
 32 bit SSE2 Int MIPS    4421         8895         8844         8844
 64 bit SSE2 Int MIPS    2211         4419         4447         4422
 32 bit SSE MFLOPS       2214         4421         4421         4434
 64 bit SSE2 MFLOPS      1105         2210         2217         2217


     CPU ID and Speed Test 64 bit Version - Windows Vista 64-Bit

      Intel(R) Core(TM)2 CPU 6600  @ 2.40GHz Measured 2402 MHz

 Speeds adding to     1 Register  2 Registers  3 Registers  4 Registers

 32 bit Integer MIPS     2609         4081         5261         7044
 64 bit Integer MIPS     2613         4177         5225         7044
 32 bit SSE2 Int MIPS    9488        14638        17542        17466
 64 bit SSE2 Int MIPS    2401         4575         4585         4575
 32 bit SSE MFLOPS       3201         6405         9607         9607
 64 bit SSE2 MFLOPS      1601         3202         4804         4804
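
The "2 words versus 4 words" point can be checked with a little arithmetic. The sketch below assumes (this is my reading of the tables, not something stated in the benchmark source) that each SSE instruction is counted as one operation per word held in the 128 bit register. The function names are hypothetical.

```c
#include <assert.h>

/* Number of data words that fit in one 128 bit SSE register. */
int words_per_register(int word_bits)
{
    assert(word_bits > 0 && 128 % word_bits == 0);
    return 128 / word_bits;
}

/* Implied SSE instructions per clock cycle, assuming each instruction
   is counted as one operation per word in the register. */
double instructions_per_clock(double mops, double mhz, int word_bits)
{
    assert(mhz > 0.0);
    return mops / (mhz * words_per_register(word_bits));
}
```

Using the 4 register floating point figures above, instructions_per_clock(4434, 2211, 32) and instructions_per_clock(2217, 2211, 64) both come out at about 0.5 for the AMD CPU, and the Intel figures (9607 and 4804 at 2402 MHz) both come out at about 1.0. On that reading, the instruction rate is the same at both word sizes and the halving of MFLOPS comes purely from the number of words per register.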

Download Win64.zip

Maximum MP Speed

This benchmark, CPUIDMP64 (in DualCore.zip), uses some of the instruction sequences from CPUID64. First, an integer and an SSE floating point test are run separately. They are then run as two threads of equal priority, where both should run at full speed with 2 CPUs. Finally, four threads are run: an FP test at normal priority, plus another FP test and two integer tests at lower priority. With 2 CPUs, the first FP test should run at full speed and the others at the whim of the OS. A 32 bit version is included in DualCore.zip, using the same 32 bit instructions. Results of 64 bit and 32 bit tests can be expected to be the same, except possibly for sharing with 4 threads.

When run on a single CPU, the floating point and integer tests are likely to run at half speed with two threads. With four threads, the lower priority tests might obtain a small amount of time.


   CPU ID and MP Speed Test 64 bit Version - Windows XP Pro x64
 
     Assembled with Microsoft ml64.exe Version 8.00.40310.39

  AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz

  Speed adding to registers   Pass 1   Pass 2   Pass 3

  Separate Tests
  32 bit SSE   MFLOPS          4411     4411     4415
  32 bit Integer MIPS          6068     6070     6070

  Two Threads Equal Priority
  32 bit SSE   MFLOPS          4405     4409     4408
  32 bit Integer MIPS          6067     6053     5992

  Four Threads, First Normal Priority, Others Normal - 1
  32 bit SSE   MFLOPS          4401     4411     4410
  32 bit Integer MIPS          2903     2053     2898
  32 bit SSE   MFLOPS             0     1433        0
  32 bit Integer MIPS          3454     2227     3455


  CPU ID and MP Speed Test 64 bit Version - Windows Vista 64-Bit

      Intel(R) Core(TM)2 CPU 6600  @ 2.40GHz Measured 2402 MHz

  Speed adding to registers   Pass 1   Pass 2   Pass 3

  Separate Tests
  32 bit SSE   MFLOPS          9582     9595     9600
  32 bit Integer MIPS          6934     6936     6950

  Two Threads Equal Priority
  32 bit SSE   MFLOPS          9501     9600     9600
  32 bit Integer MIPS          7002     7006     7013

  Four Threads, First Normal Priority, Others Normal - 1
  32 bit SSE   MFLOPS          9592     9575     9576
  32 bit Integer MIPS          3447     3414     3329
  32 bit SSE   MFLOPS          4844        0        0
  32 bit Integer MIPS             0     3337     3366

Download DualCore.zip

Classic Benchmarks

The Classic Benchmarks are the first programs that set standards of performance for computers. Details are available from Classic.htm, and benchmark programs and results can be obtained via BenchNT.zip. The Linpack, Livermore Loops and Whetstone Benchmarks have been compiled for 64 bit systems and for 32 bit PCs, in both cases with the compiler generating SSE or SSE2 instructions automatically. The benchmarks and sample results can be obtained from Win64.zip or DualCore.zip and source codes from NewSource.zip.

Linpack and Livermore Loops benchmarks use double precision floating point, so are compiled with SSE2 instructions. The compilers are not as efficient as they could be, producing instructions that use one 64 bit word in the 128 bit registers, rather than two for Single Instruction Multiple Data (SIMD) operation. The 64 bit Linpack result on Core 2 Duo is disappointing, at 823 MFLOPS against 1480 MFLOPS for the 32 bit SSE2 version.


           Linpack Benchmark Results

  AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ 
      Measured 2211 MHz and XP Pro x64

    Original       SSE2 Win32         SSE2 Win64

  838 MFLOPS      1014 MFLOPS        1044 MFLOPS

    Intel(R) Core(TM)2 CPU 6600  @ 2.40GHz 
      Measured 2402 MHz and Vista 64-Bit

    Original       SSE2 Win32         SSE2 Win64

 1315 MFLOPS      1480 MFLOPS         823 MFLOPS
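
To illustrate the difference between using one and two 64 bit words per 128 bit register, here is a minimal sketch of a y = y + a * x loop (the form of inner loop that Linpack spends most of its time in), written once a word at a time and once with SSE2 intrinsics handling two doubles per instruction. It is not taken from the benchmark source; the function names are hypothetical, and the intrinsics path is only compiled where SSE2 is known to be available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* One double per step. Whether the compiler packs this loop itself
   is up to the compiler. */
void add_scaled_scalar(size_t n, double a, const double *x, double *y)
{
    size_t i;
    assert(x != NULL && y != NULL);
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Two doubles per SSE2 instruction, with a scalar loop for any
   leftover element (or for everything if SSE2 is not available). */
void add_scaled_packed(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;
    assert(x != NULL && y != NULL);
#ifdef SKETCH_HAVE_SSE2
    {
        __m128d va = _mm_set1_pd(a);
        for (; i + 2 <= n; i += 2) {
            __m128d vx = _mm_loadu_pd(x + i);
            __m128d vy = _mm_loadu_pd(y + i);
            _mm_storeu_pd(y + i, _mm_add_pd(vy, _mm_mul_pd(va, vx)));
        }
    }
#endif
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

Both functions produce the same results; the packed version simply issues about half as many floating point instructions for the same work.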

There are 24 Livermore Loops, with the performance of each measured in MFLOPS. Overall averages are also produced, the Geometric Mean being the official average quoted. Following are results for the original Watcom version, 32 bits with SSE2 and 64 bits with SSE2. The 64 bit compilation can use up to 16 registers to speed up processing. However, some 32 bit SSE2 results are faster, as are a few from the original Watcom version. The Intel Core 2 Duo results compiled for 64 bits are more frequently slower than at 32 bits. See below for Whetstone Benchmark.


  AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz and XP Pro x64

 ********************************************************

 Livermore Loops Benchmark Original Optimised via C/C++

 MFLOPS for 24 loops

 2032.4 1312.4  345.7 1031.6  275.6  334.9 2565.9 2288.0 2337.2  121.9  183.7  550.5
   49.8  131.6  393.5  350.4  217.7 1474.2  309.3  290.9  612.8  458.5  751.6  294.6

 Overall Ratings
 Maximum Average Geomean Harmean Minimum

  2565.9   740.9   460.5   285.7    48.4
 
 ********************************************************

 Livermore Loops Benchmark 32 Bit Version

 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86

 MFLOPS for 24 loops

 1619.1 1187.1  717.9 1068.3  244.9  606.3 1815.6 1727.1 1907.6  670.2  200.4  549.6
  169.4  317.6  737.7  654.7  684.2 1452.0  455.0  762.5 1031.2  406.1  590.3  219.2

 Overall Ratings
 Maximum Average Geomean Harmean Minimum

  1907.6   798.3   640.1   501.3   162.3

 ********************************************************

 Livermore Loops Benchmark 64 Bit Version

 Via 64 Bit Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64

 MFLOPS for 24 loops

 1927.4 1118.3 1096.7 1054.2  252.2  320.3 2284.1 2099.2 1756.2  632.4  183.6  731.0
  173.1  306.3  552.9  732.4  922.6 1441.3  500.8  881.3  328.6  351.8  758.0  397.8

 Overall Ratings
 Maximum Average Geomean Harmean Minimum

  2284.1   843.7   660.7   509.1   165.7

 ######################################################

 Intel(R) Core(TM)2 CPU 6600  @ 2.40GHz Measured 2402 MHz and Vista 64-Bit

 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86

 MFLOPS for 24 loops

 1960.1 1357.0  788.5 1471.0  341.2  891.9 2526.4 2044.9 2153.0  860.1  265.8 1181.5
  458.5  555.0  444.0 1018.2  824.4 1073.6  505.2  632.3 1235.3  194.7  772.0  278.2

 Overall Ratings
 Maximum Average Geomean Harmean Minimum
  2526.4   990.3   803.8   639.0   194.7

 ********************************************************

 Via Microsoft 64 Bit C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64

 MFLOPS for 24 loops
  626.3  835.2  594.5  589.0  341.3  406.3  886.7 1040.6 1098.2  391.0  239.0  398.3
  349.7  397.8  320.7  857.1 1038.9  714.0  639.2  429.5  418.0  227.3  838.1  673.5

 Overall Ratings
 Maximum Average Geomean Harmean Minimum
  1175.0   592.9   537.0   484.9   227.2

Whetstone Benchmark

The Whetstone Benchmark produces an overall rating in terms of Millions of Whetstone Instructions Per Second (MWIPS). The version used also produces speeds in MFLOPS and MOPS for the eight test loops, three with straight floating point, two with intrinsic functions and three with integer type operations. An overall average (Geometric) is produced for the first three and equivalent VAX MIPS for the last three. The single precision version of the benchmark was compiled to use SSE instructions. This is quite a bit faster than the original version, but VAX MIPS are over-inflated due to excessive optimisation.

MP Version

The program was modified to use a second thread to execute some of the code and demonstrate the use of two CPUs. The second thread is run at THREAD_PRIORITY_BELOW_NORMAL, which sees little time on a single CPU. With dual CPUs, both threads should demonstrate full speed. One complication is that the compiler refused to produce the same code for the second thread as for the first, so there is some variation in speeds. This MP version was also compiled for 64 bit operation and results are shown below for this, 32 bit MP and 32 bit SSE versions. Floating point speed is similar on both MP versions (and around double that of a single processor), but the 64 bit variety runs one of the integer tests faster by using more registers.


 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz and XP Pro x64


 Whetstone Single Precision SSE benchmark - single CPU version

 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS

    583  12197   2313    655    656    461   51.0   36.3   1988   2210   3305

 ********************************************************************************

 Whetstone Single Precision MP SSE Benchmark Wed Aug 10 12:38:03 2005

 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS

   1164  19030   4506   1310   1308    920    102   69.7   3598   4139   3702
  Thread 1               642    642    452   50.7   34.8   1796   2062   2690
  Thread 2               668    666    467   50.8   34.9   1802   2078   1013

 ********************************************************************************

 Whetstone Single Precision MP SSE Benchmark Fri Aug 05 12:18:12 2005

 Via Microsoft 64 bit C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS

   1086  25950   4983   1325   1145    845    151   67.1   3610   4204   9210
  Thread 1               661    572    468   75.2   33.5   1804   2099   8067
  Thread 2               663    573    377   76.0   33.6   1806   2105   1143

 #################################################################################

 Intel(R) Core(TM)2 CPU 6600  @ 2.40GHz Measured 2402 MHz and Vista 64-Bit

 Whetstone Single Precision SSE Benchmark Fri Jul 20 17:06:25 2007

 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS
    728  18421   2419    851    855    530   57.2   29.7   1994   1747  14352

 ********************************************************************************

 Whetstone Single Precision MP SSE Benchmark Fri Jul 20 17:06:08 2007

 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS
   1439  23554   4704   1700   1689   1037    113   58.1   3720   3738   7518
  Thread 1               845    826    517   56.5   28.9   1871   1797   6439
  Thread 2               855    863    520   56.8   29.3   1848   1941   1079

 ********************************************************************************

 Whetstone Single Precision MP SSE Benchmark Fri Jul 20 17:06:45 2007

 Via Microsoft 64 bit C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS
   1417  26543   5661   1723   1608   1026    157   77.4   3645   3096  13257
  Thread 1               862    805    530   78.1   38.5   1809   1535  12268
  Thread 2               861    803    496   78.4   39.0   1837   1560    989

BusSpeed MP Benchmark

This MP benchmark uses variable amounts of memory to measure speed via caches and RAM, first as a single thread, then as two threads to demonstrate the impact of two CPUs. It is based on BusSpd2K (BusSpd2K.zip), using integer AND instructions to a single register, streaming data from caches or RAM. The first test reads one word, with a 32 word address increment to the next word. That is 128 bytes with 32 bit words and 256 bytes with 64 bit words. The address increment reduces for following tests, down to one word (ReadAll). The last test reads all the data as 16 byte SSE2 words. With two threads, each reads all the data, with the total number of passes the same as with one thread. BusSpd2K can produce some faster results, as streaming to two registers is used for some tests. Except for SSE2, C compiler code is used for the tests, as this is similar to assembly code in BusSpd2K. Results of benchmarks compiled for 32 and 64 bit systems are shown below. The benchmarks are in DualCore.zip.
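
The strided AND read described above can be sketched as below. This is an illustration under my own assumptions, not the code in DualCore.zip: the function names are hypothetical, and the real inner loops are unrolled rather than written as a simple for loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes between successive reads for a given increment in words. */
size_t stride_bytes(size_t inc_words, size_t word_bytes)
{
    return inc_words * word_bytes;
}

/* AND every inc_words-th 32 bit word into a single accumulator. */
uint32_t and_read_32(const uint32_t *data, size_t words, size_t inc_words)
{
    uint32_t acc = 0xFFFFFFFFu;
    size_t i;
    assert(data != NULL && inc_words > 0);
    for (i = 0; i < words; i += inc_words)
        acc &= data[i];
    return acc;
}

/* The same loop on 64 bit words, as a 64 bit build might use. */
uint64_t and_read_64(const uint64_t *data, size_t words, size_t inc_words)
{
    uint64_t acc = ~(uint64_t)0;
    size_t i;
    assert(data != NULL && inc_words > 0);
    for (i = 0; i < words; i += inc_words)
        acc &= data[i];
    return acc;
}
```

Here stride_bytes(32, 4) gives the 128 bytes quoted for 32 bit words and stride_bytes(32, 8) the 256 bytes for 64 bit words. Returning the accumulator gives the compiler a reason not to optimise the reads away.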

Looking at RAM speed, the system reads data in 64 byte bursts, that is 16 word address increments at 32 bits and 8 word increments at 64 bits. This is demonstrated by little or no performance gain with larger address increments. Speed at 64 bits will appear to be twice as fast as 32 bits, as twice as much data is being used out of the burst. Typical burst speed at 32 bits is 319 MB/sec and maximum speed can be assumed to be 16 times this, or 5104 MB/sec (maximum theoretical 2 x 3200). In this case, the memory buses appear to be saturated and there is no gain with 2 CPUs. As the address increment is reduced, speed increases to around 3000 MB/sec using one thread or 4700 MB/sec with two threads. These are similar speeds to a BusSpd2K two program test (see DualCore.htm), indicating a performance limitation with a single CPU.
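
The burst arithmetic above is simple enough to write down. The helper names below are hypothetical; the 64 byte burst size and the 319 MB/sec figure are those quoted in the text.

```c
#include <assert.h>

/* Number of words of a given size contained in one memory burst. */
int words_per_burst(int burst_bytes, int word_bytes)
{
    assert(word_bytes > 0 && burst_bytes % word_bytes == 0);
    return burst_bytes / word_bytes;
}

/* Estimated bus throughput when only one word out of each burst is
   actually used by the test. */
int estimated_bus_mb_per_sec(int measured_mb_per_sec,
                             int burst_bytes, int word_bytes)
{
    return measured_mb_per_sec * words_per_burst(burst_bytes, word_bytes);
}
```

For example, estimated_bus_mb_per_sec(319, 64, 4) gives the 5104 MB/sec quoted above. The same estimate at 64 bits uses a factor of 8, so a measured figure of about 650 MB/sec would scale to about 5200 MB/sec, which is why the two word sizes point to much the same bus speed.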

Results via caches are strange. A sample from 32 bit BusSpd2K is included below to explain possible reasons. Firstly, BusSpd2K uses just MOV instructions for the burst tests. It shows halving of speed from caches from 32 byte (8 word) increments to 64 byte (16 word) and BusSpdMP goes one step further to 32 words address increments. BusSpd2K also shows half speed from L1 cache when ANDing to 1 register instead of 2. With BusMP, the compiler refused to translate code for two registers as hoped for.

Most cache based results do not show expected performance gains on using 2 CPUs but, at least, it is better at 64 bits. Inner loops of the tests have 64 AND instructions and an outer loop runs this for around 0.5 seconds (a long time, and there is little difference at 0.1 seconds). Maybe the cause is cache flushing, with some data coming from RAM.

The above comments relate to the tests on the PC with an AMD CPU, using Windows XP Pro x64. Later, results are given for a Core 2 Duo PC, using 64-Bit Windows Vista. This has faster RAM, larger and faster L2 cache and faster operation on SSE2 instructions.


 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ Measured 2211 MHz, XP Pro x64

 ##############################################################################
 
 Old BusSpd2K Performance Test MBytes/Second

         16wds  8wds

          MovI  MovI  MovI  MovI  MovI  MovI  AndI  AndI  MovM  MovM
  Memory  Reg2  Reg2  Reg2  Reg2  Reg1  Reg2  Reg1  Reg2  Reg1  Reg8
  KBytes Inc64 Inc32 Inc16  Inc8  Inc4  Inc4  Inc4  Inc4  Inc8  Inc8

      4   8070 15711 16498 17247 16538 16763  8670 16454 34291 34254
      8   8437 16391 16544 17044 16838 17064  8765 16787 34148 35264

    128    639  1281  2437  4400  7782  7780  6539  6694  8882  8684
    256    651  1285  2411  4418  7786  7776  6448  6688  8936  8718

  65536    315   609  1009  1478  2789  2792  2656  2842  2940  2940
 131072    315   610  1007  1457  2793  2791  2704  2803  2940  2941

 #####################################################################
      MP Bus Speed Test 32 bit Version 1.1 Sun Aug 21 17:05:26 2005
 
 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
        SSE2 Assembled with Microsoft ml.exe Version 6.15.8803

                  Part 1 - Single Thread MBytes/Second

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

        6     6209     6371     9953    10084     9887     9962    17458
       24     8386     8566     9929    10083    10184    10101    17486
       96      842      660     1259     2353     4865     6416     8959
      384      356      314      569      887     1435     2788     2962
      768      358      314      568      887     1435     2784     2955
     1536      360      314      568      886     1434     2785     2955
    16384      353      318      565      879     1416     2751     2909
   131072      350      318      563      877     1414     2749     2915

               Part 2 - Two Threads Total MBytes/Second

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

        6     8292     8403    14317    16184    17352    18528    34743
       24    12043    13645    17357    18663    19500    19626    34841
       96     1789     1317     2489     4693     9679    12783    17819
      384      318      329      667     1307     2374     4812     4741
      768      318      330      666     1305     2372     4791     4736
     1536      317      331      665     1307     2394     4819     4754
    16384      323      336      673     1303     2349     4778     4711
   131072      322      336      671     1303     2358     4786     4710

 ##############################################################################

      MP Bus Speed Test 64 bit Version 1.1 Sun Aug 21 16:42:21 2005
 
 Via Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
       SSE2 Assembled with Microsoft ml64.exe Version 8.00.40310.39

                  Part 1 - Single Thread MBytes/Second

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

        6     5883    11782    12352    17826    17769    17482    17492
       24    12355    16539    16566    17786    17693    17810    17473
       96     1941     1536     1344     2414     4532     9624     8869
      384      656      719      596      995     1515     2919     2970
      768      657      719      593      991     1506     2912     2958
     1536      669      733      606     1015     1545     2985     3041
    16384      650      710      597      987     1485     2878     2932
   131072      644      706      598      985     1483     2874     2927

               Part 2 - Two Threads Total MBytes/Second

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

        6     8051    16115    16456    25937    28993    31119    29997
       24    16474    23432    26224    31526    33113    34146    33258
       96     3149     3003     2688     4783     8604    19130    17722
      384      597      644      662     1334     2411     4888     4754
      768      596      646      662     1332     2414     4882     4745
     1536      622      673      689     1379     2497     4693     4914
    16384      609      647      673     1332     2392     4866     4718
   131072      606      646      674     1336     2398     4863     4727

 ##############################################################################

 Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz and Vista 64-Bit

 ##############################################################################

      MP Bus Speed Test 32 bit Version 1.11 Fri Jul 20 14:47:06 2007

                  Part 1 - Single Thread MBytes/Second

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

        6     6751     6808     9050     9198     9270     9258    37375
       24     9197     8536     8839     8902     9007     9126    37813
       96     2094     2028     3328     4529     6670     7949    19089
      384     2099     2027     3332     4525     6657     7953    19115
      768     2105     2027     3345     4523     6669     7925    19113
     1536     2037     1997     3294     4457     6558     7886    18806
    16384      316      387      788     1449     2661     4893     5766
   131072      318      385      792     1436     2620     4902     5747

               Part 2 - Two Threads Total MBytes/Second

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

        6     7897     9372    13674    15367    16514    17224    71915
       24    13704    14445    15787    16761    16779    17428    72535
       96     3157     3005     5369     7756    12129    14592    31545
      384     3224     3046     5440     7628    12059    14629    31558
      768     3215     3050     5449     7753    11989    14884    31186
     1536     3151     2954     5330     7549    11810    14653    30595
    16384      317      433      887     1792     3140     5868     7165
   131072      316      431      890     1787     3084     5765     7161

 ##############################################################################

      MP Bus Speed Test 64 bit Version 1.11 Fri Jul 20 14:53:01 2007

                  Part 1 - Single Thread MBytes/Second

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

        6     6304    12699    12904    17342    17446    17443    37201
       24    12971    17349    16962    17284    17380    17502    37853
       96     4215     4225     4072     6452     9539    13629    19077
      384     4228     4228     4070     6460     9554    13679    19060
      768     4210     4214     4076     6461     9579    13627    19122
     1536     3930     3973     4048     6398     9421    13424    18797
    16384      597      634      774     1596     2898     5245     5702
   131072      600      636      774     1585     2897     5127     5748

               Part 2 - Two Threads Total MBytes/Second

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

        6     5916    11822    12642    17265    19151    20362    72137
       24    12410    17491    18825    20290    20947    21166    72328
       96     6007     6089     5723     9960    13424    17715    31282
      384     6050     6124     5881    10123    13557    17751    31424
      768     6109     6137     5840     9991    13459    17586    31079
     1536     5952     5978     5755     9846    13108    17207    30502
    16384      613      629      846     1694     3219     5317     7037
   131072      611      629      840     1690     3217     5260     7172

Another version of the 64 bit benchmark was produced. This just uses the single thread test, with command line options to select memory size used, running time and log file name. More than one version can then be run at the same time. Results are shown below for two programs running concurrently to test L1 cache, L2 cache and RAM. Speed from caches is seen to double, unlike the same tests using two threads. Results are for the AMD based PC.


   Kbytes    Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

    6 Prog 1     5706    11458    11868    17738    17742    17452    17431
    6 Prog 2     5731    11349    11882    17839    17803    17505    17473

   96 Prog 1     1926     1498     1347     2449     4495     9619     8796
   96 Prog 2     1936     1505     1349     2455     4506     9642     8813

 1536 Prog 1      295      319      328      642     1174     2366     2444
 1536 Prog 2      299      328      347      694     1240     2334     2809

RandMP Benchmark

This MP benchmark uses variable amounts of memory to measure speed via caches and RAM, first as a single thread, then as two threads to demonstrate the impact of two CPUs. It is based on RandMem (RandMem.zip), with serial and random read and read/write tests. Serial and random tests use the same code via indexing to read and write 4 byte words, e.g. a sequence such as tot = tot & xi[xi[i+ 0]] | xi[xi[i+ 2]] & --- for reading and xi[xi[i+ 0]] = xi[xi[i+ 2]]; for read/write. The inner loops have more than 600 CPU instructions. RandMP64 and RandMP32 versions, to run via Win64 and Win32, can be found in DualCore.zip.
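
The idea of using the same indexed code for serial and random access can be sketched as follows. This is a guess at the general scheme, not the RandMem source: all function names are hypothetical, the loop uses only the two terms shown above rather than the full unrolled sequence, the step of 4 is an arbitrary choice, and the random number generator is a simple one picked for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serial case: each entry holds its own index, so xi[xi[i]] walks
   through memory in order. */
void fill_serial(uint32_t *xi, size_t n)
{
    size_t i;
    assert(xi != NULL);
    for (i = 0; i < n; i++)
        xi[i] = (uint32_t)i;
}

/* Random case: shuffle the same indices so that xi[xi[i]] jumps
   around the array. Uses a small LCG, for this sketch only. */
void fill_random(uint32_t *xi, size_t n, uint32_t seed)
{
    size_t i;
    fill_serial(xi, n);
    for (i = n; i > 1; i--) {
        size_t j;
        uint32_t t;
        seed = seed * 1664525u + 1013904223u;
        j = (size_t)(seed >> 8) % i;
        t = xi[i - 1];
        xi[i - 1] = xi[j];
        xi[j] = t;
    }
}

/* Read test: two indexed reads per step, combined into tot. */
uint32_t indexed_read(const uint32_t *xi, size_t n)
{
    uint32_t tot = 0xFFFFFFFFu;
    size_t i;
    for (i = 0; i + 2 < n; i += 4)
        tot = (tot & xi[xi[i]]) | xi[xi[i + 2]];
    return tot;
}

/* Read/write test using the same indexing. */
void indexed_read_write(uint32_t *xi, size_t n)
{
    size_t i;
    for (i = 0; i + 2 < n; i += 4)
        xi[xi[i]] = xi[xi[i + 2]];
}
```

Because every stored value is a valid index into the array, the double indexing stays in bounds for both fills, and also after the read/write test has copied values around. Only the contents of the array decide whether the access pattern is serial or random.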

The benchmark has four tests, Serial Read (RD), Serial Read/Write (RW), Random Read and Random Read/Write. With two threads, each has its own code and uses the same data, but the second thread starts at the half way point. Each thread runs the same number of repeat passes, so variations in the time taken are reflected in the relative speeds of the two threads.
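
The arrangement of start points might look something like the sketch below. It assumes that each thread covers the whole array and wraps round from the end to the beginning, which the description does not state explicitly. The names ThreadWork, read_pass_from and run_thread_work are hypothetical, and thread creation and timing are left out, so only the division of work is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One serial read pass over all n words, beginning at word `start` and
   wrapping round to the beginning of the array (an assumed behaviour). */
uint32_t read_pass_from(const uint32_t *data, size_t n, size_t start)
{
    assert(n > 0 && start < n);
    uint32_t tot = 0;
    size_t i = start;
    for (size_t count = 0; count < n; count++) {
        tot += data[i];
        if (++i == n)
            i = 0;
    }
    return tot;
}

/* Work for one thread: shared data, own start point, same pass count. */
typedef struct {
    const uint32_t *data;   /* shared by both threads */
    size_t n;               /* number of 4 byte words */
    size_t start;           /* 0 for thread 1, n / 2 for thread 2 */
    int passes;             /* identical for both threads */
    uint32_t result;        /* total from the last pass */
} ThreadWork;

/* Body that each thread would execute. */
void run_thread_work(ThreadWork *w)
{
    w->result = 0;
    for (int p = 0; p < w->passes; p++)
        w->result = read_pass_from(w->data, w->n, w->start);
}
```

In a threaded program, one ThreadWork would be set up with start = 0 and the other with start = n / 2, each being handed to its own thread and timed separately.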

Below are example results of the 32 bit version on a single CPU using Windows XP and of the 64 bit version on a dual core CPU via Windows XP x64 (the 32 bit version produces very similar results). Using one thread, RW speed is slower than RD and, with random access, speed falls further as data size increases. Running two threads on a single CPU produces about the same total speed as the single thread. With two CPUs, read only speed is mostly around double that of a single thread, but read/write speed via caches can be worse than for a single thread (or single CPU).

Looking at dual core results, with Serial RW and Random RW at 6 KB, the CPU is executing at around 1360 Million Instructions Per Second (MIPS) or 0.62 MIPS/MHz with a single thread. With two threads, each CPU runs at 340 MIPS (0.15 MIPS/MHz) with Serial RW and 154 MIPS (0.07 MIPS/MHz) with Random RW. This can be put down to Windows flushing caches to maintain data coherency. Modifying the benchmark, so that each thread accesses its own data array, enables RW cache tests to run at 1360 MIPS on each CPU.
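
The arithmetic behind these figures can be checked with a small sketch. The divisor 3.2 is the one quoted with the results tables; the function names mbps_to_mips and mips_per_mhz are hypothetical and are not part of the benchmark.

```c
#include <assert.h>

/* Approximate MIPS from MBytes/second, using the divisor of 3.2 quoted
   with the results tables. */
double mbps_to_mips(double mbytes_per_sec)
{
    return mbytes_per_sec / 3.2;
}

/* MIPS per MHz for a CPU with the given measured clock speed. */
double mips_per_mhz(double mips, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return mips / cpu_mhz;
}
```

For example, the single thread Serial RW result of 4346 MBytes/second at 6 KB gives 4346 / 3.2 = 1358 MIPS, quoted as around 1360, and 1360 / 2211 MHz rounds to 0.62 MIPS/MHz. With two threads, 1090 MBytes/second gives about 340 MIPS and 494 MBytes/second gives about 154 MIPS.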

The above comments relate to results on the PC with an AMD CPU and using Windows XP Pro x64. Later results are for a Core 2 Duo system using 64-Bit Windows Vista. RAM on this system is nearly twice as fast, but the tests show speeds up to 4 times faster. Measured L1 cache speeds are much faster on the Read/Write tests, as are speeds via L2 cache.


           AMD Athlon(tm) XP 2600+ Measured 2088 MHz

 #####################################################################
  RandMP Write/Read Test 32 bit Version 1.0 Sat Aug 27 19:33:14 2005
 
 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86


               ------------------ MBytes Per Second At --------------------
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB
 1 Thread
 Serial RD     7773    7748    3616     895     896     889     891     892
 Serial RW     3655    3657    2193     660     663     661     663     658
 Random RD     7527    7599    2165     628     313     240     192      57
 Random RW     3686    3693    2034     439     190     141     116      44

 2 Threads
 Serial RD1    4510    4522    2043     444     448     466     447     534
 Serial RD2    3911    3906    1813     443     444     442     442     442

 Serial RW1    1890    2133    1153     346     328     348     349     340
 Serial RW2    1832    1828    1097     327     342     328     328     326

 Random RD1    4429    4297    1134     311     169     115     103      31
 Random RD2    3781    3803    1067     302     151     116      92      28

 Random RW1    1928    1941    1050     219      95      75      61      24
 Random RW2    1837    1849    1012     220      92      71      58      22

           For approximate speed in MIPS divide MBytes/Second by 3.2


 AMD Athlon 64 X2 Dual Core Processor 4200+ Measured 2211 MHz, XP Pro x64

 #####################################################################
  RandMP Write/Read Test 64 bit Version 1.0 Sat Aug 27 19:17:58 2005
 
 Via Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64

               ------------------ MBytes Per Second At --------------------
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB
 1 Thread
 Serial RD     8552    8518    5115    5132    2369    2353    2344    2305
 Serial RW     4346    4340    2702    2697    1349    1352    1354    1351
 Random RD     8176    8244    3733    1620     872     389     255     170
 Random RW     4384    4332    2865    1483     563     236     161     136

 2 Threads
 Serial RD1    8374    8532    5064    5010    2075    2096    2021    2026
 Serial RD2    8532    8394    5176    5108    2111    2062    2049    2054

 Serial RW1    1090    1172    1110    1096    1041     867     864     866
 Serial RW2    1083    1136    1089    1076    1049     866     855     824

 Random RD1    8147    8024    3683    1638     485     193     126     100
 Random RD2    8154    8158    3701    1637     485     195     125     101

 Random RW1     494     489     448     406     352     152      86      75
 Random RW2     495     490     449     406     343     152      87      75

           For approximate speed in MIPS divide MBytes/Second by 3.2

 
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz and Vista 64-Bit

 #####################################################################
  RandMP Write/Read Test 64 bit Version 1.0 Sat Aug 27 19:17:58 2005

               ------------------ MBytes Per Second At --------------------
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB
 1 Thread
 Serial RD     8742    9128    7498    7468    7486    7429    4417    4391
 Serial RW     8428    9332    7665    7663    7662    7165    2442    2397
 Random RD     8918    9404    4244    3304    3183    2790     638     458
 Random RW     8014    8523    3390    2752    2656    2462     418     289

 2 Threads
 Serial RD1    8435    9094    7334    7336    7365    7238    4024    2817
 Serial RD2    8460    8943    7183    7168    7201    7159    3962    2764

 Serial RW1    2007    2181    6931    6995    6984    6738    1643    1521
 Serial RW2    2010    2174    6789    6801    6806    6651    1568    1433

 Random RD1    8576    9392    3530    2695    2604    2292     450     443
 Random RD2    8598    9180    3478    2666    2553    2256     455     443

 Random RW1     730     759    1409    1984    1991    1923     282     292
 Random RW2     733     759    1398    1955    1961    1897     277     292


Roy Longbottom October 2007

At the time of writing, Virgin FreeSpace Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection