Title

Paging Problems Including 64-Bit Vista


Summary

The following new PC with 4 GB RAM initially appeared to have only 3 GB. This was corrected by enabling Memory Remapping in BIOS.

Core 2 Duo 2400 MHz, Asus P5B motherboard, 800 MHz DDR2 RAM, Seagate ST3400633AS SATA-300 disk, 16 MB buffer, 7200 RPM, GeForce 8600 GT graphics, Windows Vista 64-Bit.

Testing at 3 GB indicated slow performance on a benchmark that requested little more than 1 GB. Earlier tests, using Windows XP Pro x64 on a PC with 1 GB RAM, produced worse than expected speeds with paging. This appears to be an issue with 64-Bit Windows, where bitmaps for fast BitBlt copying can be created for larger images, increasing memory demands.

With 4 GB of RAM available and usable via 64-Bit Vista (and 1 GB with XP x64), a benchmark was run to measure the impact of paging. The main observation is that, when too much RAM is requested, the slowdown due to paging can be enormous, with speeds much lower than those of normal disk input/output. So, careful consideration of data size is needed when programming. Further measurements show that 64-Bit Vista can be significantly faster than Windows XP x64, as paging speeds depend on random access and Vista can read up to 64 KB at a time, compared with a fixed 4 KB with XP.

Data that can be allocated for a single data array within the 2 GB User Virtual Space with 32 bit Windows was found to be 1.2 GB with XP and 1.5 GB using Windows 2000. Virtual Space for a 32 bit application is shown as 4 GB via 64 bit Windows but only 2 GB could be used. With 64 bit applications, 8192 GB is shown and arrays of up to 8 GB could be allocated using 64-Bit Vista (and 4 GB RAM) but less than 6 GB with XP Pro x64 (1 GB RAM).


BMPSpeed Benchmark

BMPSpeed Benchmark generates BMP files up to 512 MB. It measures speed of saving, loading, scrolling, rotating and editing of files of 0.5, 1, 2, 4 MB and so on, doubling up to 512 MB. Pre-compiled versions of the benchmarks can be found in BMPSpd.zip which also contains the source code and more detailed explanations. Results for a wide range of systems are in BMPSpeed Results.htm. A 64 bit version is also available in Video64.zip with comparisons in 64 Bit Graphics Tests.htm. See also My Home Page for other PC benchmarks and results.

Extra copies of the images for editing result in memory demands of more than twice the largest image size, leading to possible paging to/from disk. Five tests are run at each size, run times being saved in log file BMPTime.txt.

1 - Enlarge with blur editing (copy with add/divide instructions) and display.
2 - Save enlargement to disk.
3 - Load from disk, format and display.
4 - Copy from memory scrolling.
5 - Make an extra copy rotating 90 degrees and display.

Data transfer speeds in MB/second are also recorded for Test 4 where displayed data might be from video RAM cache, main RAM or disk page file. The benchmark also produces real and virtual memory usage statistics.


Results With 3 GB


 BMP Benchmark Version 2.2x for 64 bit Windows Fri Jul 20 15:37:32 2007

           Copyright Roy Longbottom 1999 - 2006

   Input Enlarge    Save    Load  Scroll  Scroll  Rotate     Use
   Image Display         Display /Repeat Overall  90 deg    Fast
  Mbytes    Secs    Secs    Secs   msecs  MB/Sec    Secs  BitBlt

     0.5    0.05    0.01    0.05     0.1  4748.4    0.02      3
     1.0    0.05    0.02    0.08     0.3  4463.6    0.03      3
     2.0    0.07    0.02    0.11     1.1  2475.2    0.04      3
     4.0    0.09    0.03    0.19     2.4  1866.0    0.06      3
     8.0    0.13    0.08    0.31     2.9  1765.0    0.10      3
    16.0    0.20    0.24    0.48     2.7  1832.5    0.17      3
    32.0    0.26    0.52    0.78     2.9  1741.2    0.28      3
    64.0    0.39    1.08    1.38     2.9  1760.0    0.52      3
   128.0    0.68    2.37    2.63     2.9  1740.3    1.03      3
   256.0    1.35    4.62    5.38     3.1  1645.6    4.39      3
   512.0   27.91   13.05   10.59     3.2  1595.6   57.11      3

  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000006F6
  Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz Measured 2402 MHz
  AMD64 processor architecture, 2 CPUs 
  Windows NT  Version 6.0, build 6000, 
  Memory Status Maximum Use
  Mbytes of physical memory    3006
  Percent of memory in use     81
  Free physical memory Mbytes  567
  Mbytes of paging file        6215
  Free Mbytes of paging file   2967
  User Mbytes of virtual space 8388607
  Free user virtual Mbytes     8387500
  Screen setting 1280 x 1024 x 32 bits =  5.2 MB

                    End at Fri Jul 20 15:40:34 2007


More Results

The displaying method comes from a 1997 Microsoft sample program, ShowDib. This uses CreateDIBitmap so that fast BitBlt copying can be used. In the past, the size that can be created for fast copying could vary, depending on the version of Windows and graphics driver. Most recent results via Windows XP showed a limit of 64 MB. In the case of my benchmark, when the DIB cannot be created, the slower StretchDIBits method is used to copy part of the image to the display. Although it should have been clear that CreateDIBitmap would use more memory, it was not obvious on older systems with limited and slower main RAM.

Tests show that the DIBs are at 32 bits, 33% larger than the original BMP data. So, a 512 MB image increases to 682 MB and the program can have two open. RAM space used is outside the user’s virtual space but can show up via free memory space (if large enough) and free paging file space.
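The size arithmetic above can be sketched as follows. This is only an illustration, assuming the source BMP data is 24 bits per pixel (which is what the 33% increase to 32 bits implies); the function names are hypothetical and are not taken from the benchmark source.

```python
def dib_size_mb(bmp_mb, src_bits=24, dib_bits=32):
    """Approximate size in MB of a DIB created from bmp_mb MB of BMP data.

    Assumption: the source is 24 bits per pixel and the DIB is 32 bits.
    """
    return bmp_mb * dib_bits / src_bits


def open_dibs_mb(bmp_mb, copies=2):
    """Memory in MB when `copies` DIBs of this size are open at once."""
    return copies * dib_size_mb(bmp_mb)


if __name__ == "__main__":
    for size in (256, 512):
        print(f"{size} MB BMP: about {int(dib_size_mb(size))} MB per DIB, "
              f"about {int(open_dibs_mb(size))} MB with two open")
```

With two DIBs open at the 512 MB image size, the demand outside the user's virtual space is well over 1 GB, which helps explain the paging seen with 3 GB of RAM.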

Below are Enlarge and Rotate speeds at 256 and 512 MB using 64-Bit Vista and XP Pro x64 with four versions of the benchmark, the original, the 64 bit version, a 32 bit version via a later MS compiler and a version that uses StretchDIBits for the larger images. Also shown are RAM, PageFile and User Virtual Space usage. Some Windows XP results with different RAM size are shown for comparison purposes.

  • 64-Bit Vista speeds are much better with 4 GB RAM available than with 3 GB
  • Speed and memory occupancy are similar with 64 bit, 32 bit and original benchmarks
  • RAM and PageFile use is increased when using CreateDIBitmap (for fast BitBlt copying) vs StretchDIBits
  • 64-Bit Windows can use CreateDIBitmap for larger images and this can lead to poor performance due to excessive paging
  • Enlarge/Rotate (no paging) speeds can be faster when StretchDIBits is used
  • 64-Bit Windows uses 50 to 60 MB more User Virtual Space than Windows XP

What is not shown is the reduction in scrolling speed using StretchDIBits. On the Vista PC at 256 MB, speed fell from 1706 MB/second, or 3.0 milliseconds per screen, using BitBlt to 171 MB/second, or 29.5 milliseconds, with Stretch. On the XP x64 PC, times rose from 4.3 to 32.9 milliseconds.
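The relationship between MB/second and milliseconds per screen can be sketched as below, using the 1280 x 1024 x 32 bits screen setting from the log. This is a rough illustration only: it assumes one full screen is copied per repeat and that 1 MB is 1,000,000 bytes, so the computed times are close to, but not exactly, the reported 3.0 and 29.5 milliseconds. The function names are hypothetical.

```python
def screen_bytes(width, height, bits_per_pixel):
    """Bytes needed for one full screen at the given setting."""
    return width * height * bits_per_pixel // 8


def ms_per_screen(width, height, bits_per_pixel, mb_per_sec):
    """Milliseconds to copy one screen at mb_per_sec (assumes MB = 10**6 bytes)."""
    megabytes = screen_bytes(width, height, bits_per_pixel) / 1e6
    return megabytes / mb_per_sec * 1000.0


if __name__ == "__main__":
    size = screen_bytes(1280, 1024, 32)
    print(f"Screen size {size / 1e6:.1f} MB")
    for label, speed in (("BitBlt", 1706), ("StretchDIBits", 171)):
        print(f"{label}: {ms_per_screen(1280, 1024, 32, speed):.1f} ms per screen")
```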



           RAM BMP Enlarge Rotate    Free    Free    Used    Used    Used    Used    Used
            GB  MB   Secs    Secs  MB RAM  MB RAM      MB  Pgfile  Pgfile      MB      MB
                                    Start     End     RAM   Start     End  Pgfile Virtual

 C2D Vista   3 256   1.35    4.39
 64 bit        512  27.91   57.11             567                    3248            1107

             4 256   1.20    4.13
               512   2.32    5.80    3126     877    2249     959    3288    2329    1107

 32 bit      4 256   1.30    4.22
               512   2.53    5.91    3170     897    2273     957    3275    2318    1094

 Original    4 256   1.48    4.52
               512   2.80    8.15    3182     900    2282     N/A     N/A     N/A    1094

 Stretch     4 256   0.76    3.76
 64 bit        512   1.35    4.53    3169    2170     999     915    1936    1021    1107

 AMD XP x64  1 256 119.28   58.51
 64 bit        512 335.83  832.41     518     183     N/A     415    2734    2319    1081

 32 bit      1 256  71.92   88.30
               512 246.43  971.95     801     129     N/A     407    2736    2329    1076

 Original    1 256  47.39   99.84
               512 189.28 1061.02     524     192     N/A     411    2616    2205    1072

 Stretch     1 256   0.59    9.27
 64 bit        512   8.40  160.08     607      60     N/A     409    1439    1030    1081

 P4 XP     0.5 256  66.79  184.56
 Original      512 140.41  148.08     421      40     N/A      18    1037    1019    1047

 P4 XP       1 256   1.30    7.05
 Original      512   1.88   35.21             131                    1122            1036

 C2D XP      2 256   1.21    5.48
 Original      512   1.71    6.53             608                    1302            1054



4 GB Data

With 8192 GB of user virtual memory available using 64-Bit Windows, compared with 2 GB via 32-Bit versions, it is tempting to write programs with vast data arrays instead of bothering with frequent disk input and output. Some would claim that, when paging is necessary, it will be just as fast as normal disk data transfers.

I ran some tests using IntBurn64 in More64bit.zip and the 32 bit version, the reliability test in BusSpd2k.zip. These are designed to run at the highest speed whilst checking for correct results at a chosen data size and minimum running time. There are six write/read tests, each writing and reading once with a different data pattern, followed by six read only tests. Each of the latter is preceded by an untimed write/read and an extra read pass to calibrate the number of read passes needed for the chosen time. This is a significant overhead when one pass is used.
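The structure of one write/read test can be sketched as below. This is not the IntBurn64 or BusSpd2k code, just a minimal illustration of writing a 64 bit pattern across an array, reading it back and checking the result. The patterns are the six that appear in the benchmark log; the function name, the use of 1024 bytes per KB and the MB/second formula (bytes written plus bytes read) are assumptions. Python interpreter overhead dominates the timing, so the speeds it reports are not comparable with the benchmark figures.

```python
import time
from array import array

# The six 64 bit data patterns listed in the benchmark log.
PATTERNS = [
    0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0xA5A5A5A5A5A5A5A5,
    0x5555555555555555, 0x3333333333333333, 0xF0F0F0F0F0F0F0F0,
]


def write_read_test(size_kb, pattern):
    """Write `pattern` over size_kb KB, read it back and verify.

    Returns (ok, mb_per_sec). Assumes 1 KB = 1024 bytes.
    """
    words = size_kb * 1024 // 8
    start = time.perf_counter()
    data = array("Q", [pattern]) * words            # write pass
    ok = all(value == pattern for value in data)    # read and check pass
    secs = max(time.perf_counter() - start, 1e-9)
    mb_per_sec = 2 * words * 8 / 1e6 / secs         # written plus read
    return ok, mb_per_sec


if __name__ == "__main__":
    for number, pattern in enumerate(PATTERNS, start=1):
        ok, speed = write_read_test(1000, pattern)
        result = "OK" if ok else "ERROR"
        print(f"{number} {speed:8.0f} MB/sec  Pattern {pattern:016X}  Result {result}")
```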

Following is an example log file for the Core 2 Duo with 64-Bit Vista, running for the minimum time at 3860000 KB (3.68 GB) where Vista managed to find sufficient memory space for the last three reading tests at full speed. Maximum write/read speed, at lower memory demands, is around 3300 MB/second, with the first test usually at about 2200 MB/second. With the total running time being too long at 1 hour 24 minutes, I produced a version of the 64 bit benchmark that runs just one write/read test in order to measure paging speeds with data size up to 4 GB and higher.




         64 Bit Integer Reliability Test Version 1.0 for 64 bit OS

                   Copyright (C) Roy Longbottom 2006

  Batch Command KB 3860000 SECS 1 P1 LOG INT64RAM.TXT 

  Test 3860000 KB at 1 seconds per test, Start at Mon Aug 06 20:09:49 2007

 Write/Read
  1      52 MB/sec  Pattern 0000000000000000 	 Result OK         1 passes
  2      21 MB/sec  Pattern FFFFFFFFFFFFFFFF 	 Result OK         1 passes
  3      17 MB/sec  Pattern A5A5A5A5A5A5A5A5 	 Result OK         1 passes
  4      28 MB/sec  Pattern 5555555555555555 	 Result OK         1 passes
  5      24 MB/sec  Pattern 3333333333333333 	 Result OK         1 passes
  6      18 MB/sec  Pattern F0F0F0F0F0F0F0F0 	 Result OK         1 passes

 Read
  1      14 MB/sec  Pattern 0000000000000000 	 Result OK         1 passes
  2      23 MB/sec  Pattern FFFFFFFFFFFFFFFF 	 Result OK         1 passes
  3      21 MB/sec  Pattern A5A5A5A5A5A5A5A5 	 Result OK         1 passes
  4    5265 MB/sec  Pattern 5555555555555555 	 Result OK         2 passes
  5    5330 MB/sec  Pattern 3333333333333333 	 Result OK         2 passes
  6    5301 MB/sec  Pattern F0F0F0F0F0F0F0F0 	 Result OK         2 passes

             Reliability Test Ended Mon Aug 06 21:34:04 2007
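The running time and data size quoted for this log can be checked from its own figures. The sketch below parses the start and end stamps shown in the log and converts the KB parameter to GB, assuming 1 GB = 1024 x 1024 KB; the helper names are hypothetical.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"   # e.g. Mon Aug 06 20:09:49 2007


def elapsed(start_stamp, end_stamp):
    """Time difference between two log time stamps."""
    start = datetime.strptime(start_stamp, LOG_TIME_FORMAT)
    end = datetime.strptime(end_stamp, LOG_TIME_FORMAT)
    return end - start


def kb_to_gb(kb):
    """Convert KB to GB, assuming 1 GB = 1024 * 1024 KB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    run = elapsed("Mon Aug 06 20:09:49 2007", "Mon Aug 06 21:34:04 2007")
    print(f"Running time {run}, data size {kb_to_gb(3860000):.2f} GB")
```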



Paging Test

As can be seen above, running all 12 tests to measure paging speeds with those memory demands took nearly 1 hour 25 minutes. The benchmarks have been modified to use a Paging parameter that runs just one write/read test (now in More64bit.zip and BusSpd2k.zip). The test can only be run from a BAT file with the following example parameters:

Start BusSpd2k Reliability, Paging, KB 100000, Log Paging.txt
Start IntBurn64 Auto, Paging, KB 100000, Log Paging.txt

Following are 32 bit and 64 bit results representing the situation where memory demands are slowly increased. Data transfer speed with paging depends on what has run before. For example, suddenly demanding 80% of memory capacity is likely to produce very slow speed.

For 32 bit Windows, the 2 GB virtual memory space is allocated to the application via a table of unmovable sequential addresses. This space also addresses the EXE file and some items for use by Windows. The table can become fragmented, further reducing space available for a single data array. The maximum that could be used was 1,200,000 KB with Windows XP and 1,500,000 KB using Windows 2000. Some time ago, the BMPSpeed benchmark (see above) was modified so that XP could run using 512 MB images, where memory demands included 2 x 512 MB, 256 MB and 128 MB. The 256 MB was dropped for the last test.

The tables also show normal disk writing/reading speeds. With 32 bit Windows and the two PCs with 512 MB RAM, data transfer rates with paging were relatively good using data size somewhat greater than RAM capacity. Worst case was 3 to 4 times slower than normal disk transfers and 40 to 65 times slower than with data in RAM.

With the 32 bit application running on 64 bit Windows, User Virtual Space is detected as 4 GB by the program. Maximum array size that could be allocated was 2,000,000 KB. At this size with 1 GB RAM, paging speed was 9 times slower than normal disk transfers and 340 times slower than memory based data. Speed had also reduced considerably with 1 GB data.

User Virtual Space is detected as 8192 GB by the 64 bit benchmark, but maximum data array size was between 5,000,000 and 6,000,000 KB on the PC with 1 GB RAM and Windows XP x64, and 7,900,000 KB with Vista and 4 GB memory. Performance of the former was essentially the same as the 32 bit program. Vista paging speeds had a higher tendency to improve with a larger data array, the worst case being 5.5 times slower than normal disk but still 340 times slower than with data in RAM.



            32 Bit BusSpd2K   32 Bit BusSpd2K

      CPU      Athlon XP         Pentium 4
      MHz        2088              1900
   RAM MB         512               512
  Windows        2000                XP
 Disk W/R
   MB/sec          50                49

        KB    Secs  MB/sec      Secs  MB/sec

    100000             970               532

    300000       1     932         2     285
    350000       1     929        13      56
    400000       6     127        22      38
    450000       8     117        19      48
    470000       8     118        14      70
    480000       8     123        15      64
    490000       9     116        24      41
    500000      13      80        21      49
    510000      15      68        27      38
    520000      16      65        23      46
    530000      19      58        29      37
    540000      21      53        32      35

   1200000     154      16       189      13
   1300000                               N/A
   1500000     205      15
   1600000             N/A

   N/A Cannot allocate memory 
      

           32 Bit BusSpd2K   64 Bit IntBurn64          64 Bit IntBurn64

      CPU     Athlon 64         Athlon 64                  Core 2 Duo
      MHz        2210              2210                       2400
   RAM MB        1024              1024                       4096
 Max used                           880                       3850
  Windows       XP x64            XP x64                  64-Bit Vista
 Disk W/R
   MB/sec          55                55                         55

        KB    Secs  MB/sec      Secs  MB/sec         KB    Secs  MB/sec

    100000            2040              2041     100000            3393

    800000      66      25         1    1976    2500000       2    2868
    850000      31      56        23      77    3000000       2    2878
    900000      61      30        58      32    3100000       2    2847
    920000     118      16        61      31    3200000       2    2899
    930000     112      17        91      21    3300000       3    2698
    940000      92      21        96      20    3400000       3    2610
    950000     114      17        93      21    3500000       7    1075
    960000     123      16        89      22    3600000      10     750
    970000     124      16       142      14    3700000      17     459
    980000     125      16       125      16    3800000     107      73
    990000     135      15       119      17    3900000     210      38
   1000000     137      15       128      16    4000000     146      56
   1100000     188      12       188      12
   1200000     223      11       205      12    5000000    1024      10
   1300000     380       7       266      10    7000000     652      22
   1400000     358       8       358       8    7900000     770      21
                                                8000000             N/A
   2000000     683       6       683       6
   2100000             N/A                             32 Bit BusSpd2K 
   5000000                      1707       6
   6000000                               N/A    2000000       2    2139      
                                                2100000             N/A

   N/A Cannot allocate memory 
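The slowdown factors quoted before the tables can be reproduced from the table figures. The sketch below is just that arithmetic, taking the normal disk speed, the speed at 100000 KB (data in RAM) and the slowest paging speed for two of the configurations; the function and dictionary names are hypothetical.

```python
def slowdown(reference_mb_per_sec, paging_mb_per_sec):
    """How many times slower paging is than a reference speed."""
    return reference_mb_per_sec / paging_mb_per_sec


# (disk MB/sec, RAM MB/sec at 100000 KB, slowest paging MB/sec) from the tables.
CASES = {
    "32 bit on XP x64, 2000000 KB": (55, 2040, 6),
    "64 bit on Vista, 5000000 KB": (55, 3393, 10),
}

if __name__ == "__main__":
    for name, (disk, ram, paging) in CASES.items():
        print(f"{name}: {slowdown(disk, paging):.1f} x slower than disk, "
              f"{slowdown(ram, paging):.0f} x slower than RAM")
```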




Paging Disk Activity

Tests were run with Performance Monitor logging of Physical Disk Write and Read Bytes and Bytes/Second. The graphs below are extrapolations of millions of bytes written and read over the 5 second monitoring periods. At least they confirm that the disks can run at 50 to 60 MB/second. CPU utilisation was also reported and was extremely low for most of the time.

Other calculations carried out were for average KB per transfer and this showed a significant difference between XP x64 and 64-Bit Vista. The former consistently produced (approximately) 4 KB per read and 64 KB per write. Vista was completely different, mainly averaging nearly 1000 KB per write for the main writing period. On loading the data there seem to be periods of 16, 32 and 64 KB per read down to 8 KB at the end. Peak reading speeds are as reflected in my DiskGraf Benchmark for block sizes 4 KB and 64 KB respectively.

The slow speeds will be caused by limited random access due to read following write or accessing fragmented data space. At the end of the Core 2 Duo test, when mainly reading is taking place, average time per read access is around 4 milliseconds, about half a disk revolution. With sequential data, speed would be much faster, either reading directly from the disk or via the disk’s buffer. Overall, it appears that Vista paging can be faster than Windows XP as it has the ability to read data at a larger page size.
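The half revolution figure, and the effect of read size on random access throughput, can be sketched as follows. The 7200 RPM speed is that of the Seagate disk listed in the summary. The throughput estimate is an assumption for illustration only: it supposes every read pays the full 4 millisecond access time, so it is a rough bound rather than a measured result. The function names are hypothetical.

```python
def revolution_ms(rpm):
    """Time in milliseconds for one disk revolution."""
    return 60_000.0 / rpm


def access_bound_mb_per_sec(kb_per_read, ms_per_read):
    """MB/second if every read of kb_per_read KB costs ms_per_read milliseconds."""
    return (kb_per_read / 1024.0) / (ms_per_read / 1000.0)


if __name__ == "__main__":
    print(f"Half revolution at 7200 RPM: {revolution_ms(7200) / 2:.2f} ms")
    for kb in (4, 64):
        print(f"{kb} KB per read at 4 ms per access: "
              f"{access_bound_mb_per_sec(kb, 4.0):.1f} MB/sec")
```

Under that assumption, 64 KB reads move 16 times as much data per access as 4 KB reads, which is consistent with the conclusion that a larger read size favours Vista.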




Athlon 64, XP x64, 1 GB RAM, 800000 KB data
Write/Read 205.2 seconds, 8.0 MB/second
Time at RAM speed < 1 second


Core 2 Duo, 64-Bit Vista, 4 GB RAM, 5000000 KB data
Write/Read 499.0 seconds, 21 MB/second
Time at RAM speed < 3 seconds







Roy Longbottom October 2007

At the time of writing, Virgin FreeSpace Internet Home for my benchmarks is via the link
Roy Longbottom's PC Benchmark Collection