Paging Problems Including 64-Bit Vista
Summary
The following new PC with 4 GB RAM initially appeared to have only 3 GB. This was corrected by enabling Memory Remapping in BIOS.
Core 2 Duo 2400 MHz, Asus P5B motherboard, 800 MHz DDR2 RAM,
Seagate ST3400633AS SATA-300 disk, 16 MB buffer, 7200 RPM,
GeForce 8600 GT graphics, Windows Vista 64-Bit.
Testing at 3 GB indicated slow performance on a benchmark that requested little more than 1 GB.
Earlier tests, using Windows XP Pro x64 on a PC with 1 GB RAM, produced worse than expected speeds
with paging. This appears to be an issue with 64-Bit Windows relating to creation of bitmaps and fast BitBlt copying being available for use with larger images.
With 4 GB of RAM available and usable via 64-Bit Vista (and 1 GB with XP x64), a benchmark was run to measure the impact of paging. The main observation is that, when too much RAM is requested, the slowdown due to paging can be enormous, with speeds much slower than normal disk input/output. So, careful consideration of data size is needed when programming.
Further measurements show that 64-Bit Vista can be significantly faster than Windows XP x64, as paging speeds depend on random access times and Vista can read up to 64 KB at a time, compared with a fixed 4 KB with XP.
Data that can be allocated for a single data array within the 2 GB User Virtual Space with 32 bit Windows was found to be 1.2 GB with XP and 1.5 GB using Windows 2000. Virtual Space for a 32 bit application is shown as 4 GB via 64 bit Windows but only 2 GB could be used. With 64 bit applications, 8192 GB is shown and arrays of up to 8 GB could be allocated using 64-Bit Vista (and 4 GB RAM) but less than 6 GB with XP Pro x64 (1 GB RAM).
BMPSpeed Benchmark
BMPSpeed Benchmark generates BMP files up to 512 MB. It measures the speed of saving, loading, scrolling, rotating and editing files of 0.5, 1, 2, 4 MB and so on upwards.
Pre-compiled versions of the benchmarks can be found in BMPSpd.zip, which also contains the source code and more detailed explanations. Results for a wide range of systems are in BMPSpeed Results.htm. A 64 bit version is also available in Video64.zip, with comparisons in 64 Bit Graphics Tests.htm. See also My Home Page for other PC benchmarks and results.
Extra copies of the images for
editing result in memory demands of more than twice the largest image size,
leading to possible paging to/from disk. Five tests are run at each size, run
times being saved in log file BMPTime.txt.
1 - Enlarge with blur editing (copy with add/divide instructions) and display.
2 - Save enlargement to disk.
3 - Load from disk, format and display.
4 - Copy from memory scrolling.
5 - Make an extra copy rotating 90 degrees and display.
Data transfer speeds in MB/second are also recorded for Test 4 where
displayed data might be from video RAM cache, main RAM or disk page
file. The benchmark also produces real and virtual memory usage statistics.
Results With 3 GB
BMP Benchmark Version 2.2x for 64 bit Windows Fri Jul 20 15:37:32 2007
Copyright Roy Longbottom 1999 - 2006
  Input  Enlarge    Save    Load   Scroll   Scroll  Rotate     Use
  Image  Display         Display  /Repeat  Overall  90 deg    Fast
 Mbytes     Secs    Secs    Secs    msecs   MB/Sec    Secs  BitBlt
    0.5     0.05    0.01    0.05      0.1   4748.4    0.02       3
    1.0     0.05    0.02    0.08      0.3   4463.6    0.03       3
    2.0     0.07    0.02    0.11      1.1   2475.2    0.04       3
    4.0     0.09    0.03    0.19      2.4   1866.0    0.06       3
    8.0     0.13    0.08    0.31      2.9   1765.0    0.10       3
   16.0     0.20    0.24    0.48      2.7   1832.5    0.17       3
   32.0     0.26    0.52    0.78      2.9   1741.2    0.28       3
   64.0     0.39    1.08    1.38      2.9   1760.0    0.52       3
  128.0     0.68    2.37    2.63      2.9   1740.3    1.03       3
  256.0     1.35    4.62    5.38      3.1   1645.6    4.39       3
  512.0    27.91   13.05   10.59      3.2   1595.6   57.11       3
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000006F6
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz
AMD64 processor architecture, 2 CPUs
Windows NT Version 6.0, build 6000,
Memory Status Maximum Use
Mbytes of physical memory 3006
Percent of memory in use 81
Free physical memory Mbytes 567
Mbytes of paging file 6215
Free Mbytes of paging file 2967
User Mbytes of virtual space 8388607
Free user virtual Mbytes 8387500
Screen setting 1280 x 1024 x 32 bits = 5.2 MB
End at Fri Jul 20 15:40:34 2007
More Results
The displaying method comes from a 1997 Microsoft sample program, ShowDib. This uses CreateDIBitmap so that fast BitBlt copying can be used. In the past, the size that can be created for fast copying could vary, depending on the version of Windows and graphics driver. Most recent results via Windows XP showed a limit of 64 MB.
In the case of my benchmark, when the DIB cannot be created, the slower StretchDIBits method is used to copy part of the image to the display. Although it should have been clear that CreateDIBitmap would use more memory, it was not obvious on older systems with limited and slower main RAM.
Tests show that the DIBs are at 32 bits, 33% larger than the original BMP data. So, a 512 MB image increases to 682 MB and the program can have two open. RAM space used is outside the user’s virtual space but can show up via free memory space (if large enough) and free paging file space.
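As a rough check of the arithmetic above, the sketch below assumes the original BMP data is 24 bits per pixel, which is what a 33% growth to 32 bits implies. The function name is purely illustrative and is not part of the benchmark.

```python
def dib_size_mb(bmp_mb: float, src_bits: int = 24, dib_bits: int = 32) -> float:
    """Size of a DIB created from BMP pixel data, scaled by bits per pixel.

    Assumption: the source image is 24 bits per pixel and the DIB is 32.
    """
    return bmp_mb * dib_bits / src_bits


one_dib = dib_size_mb(512)
# The program can have two DIBs open at the same time.
print(f"One DIB: {one_dib:.1f} MB, two DIBs: {2 * one_dib:.1f} MB")
```

On this assumption, two DIBs for the 512 MB image need well over 1.3 GB outside the user's virtual space, which may explain why the 3 GB configuration started paging.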
Below are Enlarge and Rotate speeds at 256 and 512 MB using 64-Bit Vista and XP Pro x64 with four versions of the benchmark, the original, the 64 bit version, a 32 bit version via a later MS compiler and a version that uses StretchDIBits for the larger images. Also shown are RAM, PageFile and User Virtual Space usage. Some Windows XP results with different RAM size are shown for comparison purposes.
- 64-Bit Vista speeds are much better with 4 GB RAM available than with 3 GB
- Speed and memory occupancy are similar with 64 bit, 32 bit and original benchmarks
- RAM and PageFile use is increased when using CreateDIBitmap (for fast BitBlt copying) vs StretchDIBits
- 64-Bit Windows can use CreateDIBitmap for larger images and this can lead to poor performance due to excessive paging
- Enlarge/Rotate (no paging) speeds can be faster when StretchDIBits is used
- 64-Bit Windows uses 50 to 60 MB more User Virtual Space than Windows XP
What is not shown is the reduction in scrolling speed using StretchDIBits. On the Vista PC at 256 MB, speed fell from 1706 MB/second, or 3.0 milliseconds per screen, using BitBlt to 171 MB/second, or 29.5 milliseconds, with StretchDIBits. On the XP x64 PC, time per screen went from 4.3 to 32.9 milliseconds.
            RAM  BMP Enlarge  Rotate   Free   Free  Used   Used   Used   Used    Used
             GB   MB    Secs    Secs MB RAM MB RAM    MB Pgfile Pgfile     MB      MB
                                      Start    End   RAM  Start    End Pgfile Virtual
C2D Vista     3  256    1.35    4.39
64 bit           512   27.91   57.11           567                3248           1107
              4  256    1.20    4.13
                 512    2.32    5.80   3126    877  2249    959   3288   2329    1107
32 bit        4  256    1.30    4.22
                 512    2.53    5.91   3170    897  2273    957   3275   2318    1094
Original      4  256    1.48    4.52
                 512    2.80    8.15   3182    900  2282    N/A    N/A    N/A    1094
Stretch       4  256    0.76    3.76
64 bit           512    1.35    4.53   3169   2170   999    915   1936   1021    1107
AMD XP x64    1  256  119.28   58.51
64 bit           512  335.83  832.41    518    183   N/A    415   2734   2319    1081
32 bit        1  256   71.92   88.30
                 512  246.43  971.95    801    129   N/A    407   2736   2329    1076
Original      1  256   47.39   99.84
                 512  189.28 1061.02    524    192   N/A    411   2616   2205    1072
Stretch       1  256    0.59    9.27
64 bit           512    8.40  160.08    607     60   N/A    409   1439   1030    1081
P4 XP       0.5  256   66.79  184.56
Original         512  140.41  148.08    421     40   N/A     18   1037   1019    1047
P4 XP         1  256    1.30    7.05
Original         512    1.88   35.21           131                1122           1036
C2D XP        2  256    1.21    5.48
Original         512    1.71    6.53           608                1302           1054
4 GB Data
With 8192 GB of user virtual memory available using 64-Bit Windows, compared with 2 GB via 32-Bit versions, it is tempting to write programs with vast data arrays instead of bothering with frequent disk input and output. Some would claim that, when paging is necessary, it will be just as fast as normal disk data transfers.
I ran some tests using IntBurn64 in More64bit.zip and the 32 bit version or reliability test in BusSpd2k.zip.
These are designed to run at the highest speed whilst checking for correct results at a chosen data size and minimum running time. There are six tests with write and read once, using different data patterns. This is followed by 6 tests with read only. Each of the latter is preceded by an untimed write/read and an extra read pass to calibrate the number of read passes needed for the chosen time. This is a significant overhead when one pass is used.
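The following is a minimal sketch of a single write/read pass of the kind described above. It is not the IntBurn64 source. It is Python rather than compiled code, so absolute speeds are not comparable with the benchmark. The six 64 bit patterns are those that appear in the log file; the function name and the convention of counting the data once for the write and once for the read are assumptions.

```python
import time
from array import array

# 64 bit data patterns as listed in the benchmark log file.
PATTERNS = [
    0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0xA5A5A5A5A5A5A5A5,
    0x5555555555555555, 0x3333333333333333, 0xF0F0F0F0F0F0F0F0,
]


def write_read_once(kbytes: int, pattern: int):
    """Fill an array of 64 bit words with one pattern, then read and check it.

    Returns (MB/second, result_ok). Assumption: the data size is counted
    twice, once for writing and once for reading.
    """
    words = kbytes * 1024 // 8
    start = time.perf_counter()
    data = array("Q", [pattern]) * words      # write pass
    ok = data.count(pattern) == words         # read pass with result check
    secs = max(time.perf_counter() - start, 1e-9)
    return 2 * kbytes * 1024 / 1e6 / secs, ok


if __name__ == "__main__":
    for number, pattern in enumerate(PATTERNS, start=1):
        speed, ok = write_read_once(100_000, pattern)
        result = "OK" if ok else "ERROR"
        print(f"{number} {speed:8.0f} MB/sec Pattern {pattern:016X} Result {result}")
```

Once the requested size approaches the amount of free RAM, a pass like this spends most of its time waiting for the page file rather than executing instructions.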
Following is an example log file for the Core 2 Duo with 64-Bit Vista, running for the minimum time at 3860000 KB (3.68 GB) where Vista managed to find sufficient memory space for the last three reading tests at full speed. Maximum write/read speed, at lower memory demands, is around 3300 MB/second, with the first test usually at about 2200 MB/second. With the total running time being too long at 1 hour 24 minutes, I produced a version of the 64 bit benchmark that runs just one write/read test in order to measure paging speeds with data size up to 4 GB and higher.
64 Bit Integer Reliability Test Version 1.0 for 64 bit OS
Copyright (C) Roy Longbottom 2006
Batch Command KB 3860000 SECS 1 P1 LOG INT64RAM.TXT
Test 3860000 KB at 1 seconds per test, Start at Mon Aug 06 20:09:49 2007
Write/Read
1 52 MB/sec Pattern 0000000000000000 Result OK 1 passes
2 21 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 1 passes
3 17 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 1 passes
4 28 MB/sec Pattern 5555555555555555 Result OK 1 passes
5 24 MB/sec Pattern 3333333333333333 Result OK 1 passes
6 18 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 1 passes
Read
1 14 MB/sec Pattern 0000000000000000 Result OK 1 passes
2 23 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 1 passes
3 21 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 1 passes
4 5265 MB/sec Pattern 5555555555555555 Result OK 2 passes
5 5330 MB/sec Pattern 3333333333333333 Result OK 2 passes
6 5301 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 2 passes
Reliability Test Ended Mon Aug 06 21:34:04 2007
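The total running time can be checked from the start and end stamps in the log above. This is a small sketch using only the two timestamps shown; it assumes an English locale for the day and month names.

```python
from datetime import datetime

# Timestamp layout used in the benchmark log, e.g. "Mon Aug 06 20:09:49 2007".
FMT = "%a %b %d %H:%M:%S %Y"

start = datetime.strptime("Mon Aug 06 20:09:49 2007", FMT)
end = datetime.strptime("Mon Aug 06 21:34:04 2007", FMT)
elapsed = end - start

print(f"Elapsed {elapsed}, or {elapsed.total_seconds() / 60:.0f} minutes")
```

This gives 1 hour 24 minutes 15 seconds for the 12 tests.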
Paging Test
As can be seen above, running all 12 tests to measure paging speeds with those memory demands took nearly 1 hour 25 minutes. The benchmarks have been modified to use a Paging parameter that runs just one write/read test (now in More64bit.zip and BusSpd2k.zip). The test can only be run from a BAT file with the following example parameters:
Start BusSpd2k Reliability, Paging, KB 100000, Log Paging.txt
Start IntBurn64 Auto, Paging, KB 100000, Log Paging.txt
Following are 32 bit and 64 bit results representing the situation where memory demands are slowly increased. Data transfer speed with paging depends on what has run before. For example, suddenly demanding 80% of memory capacity is likely to produce very slow speed.
For 32 bit Windows, the 2 GB virtual memory space is allocated to the application via a table of unmovable sequential addresses. This space also addresses the EXE file and some items for use by Windows. The table can become fragmented, further reducing the space available for a single data array. The maximum that could be used was 1,200,000 KB with Windows XP and 1,500,000 KB using Windows 2000. Some time ago, the BMPSpeed benchmark (see above) was modified so that XP could run using 512 MB images, where memory demands included 2 x 512 MB, 256 MB and 128 MB. The 256 MB was dropped for the last test.
The tables also show normal disk writing/reading speeds. With 32 bit Windows and the two PCs with 512 MB RAM, data transfer rates with paging were relatively good using data size somewhat greater than RAM capacity. Worst case was 3 to 4 times slower than normal disk transfers and 40 to 65 times slower than with data in RAM.
With the 32 bit application running on 64 bit Windows, User Virtual Space is detected as 4 GB by the program. Maximum array size that could be allocated was 2,000,000 KB. At this size with 1 GB RAM, paging speed was 9 times slower than normal disk transfers and 340 times slower than memory based data. Speed had also reduced considerably with 1 GB data.
User Virtual Space is detected as 8192 GB by the 64 bit benchmark, but maximum data array size was between 5,000,000 and 6,000,000 KB on the PC with 1 GB RAM and Windows XP x64, and 7,900,000 KB with Vista and 4 GB memory. Performance of the former was essentially the same as the 32 bit program. Vista paging speeds had a higher tendency to improve with a larger data array, with the worst case 5.5 times slower than normal disk but still 340 times slower than with data in RAM.
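The slowdown factors quoted in the last three paragraphs can be reproduced from the tables in this section. The sketch below uses the RAM speed at 100000 KB, the normal disk write/read speed and the slowest paging speed for each system; the dictionary layout and function name are illustrative only.

```python
# (RAM MB/sec at 100000 KB, normal disk W/R MB/sec, slowest paging MB/sec)
CASES = {
    "Athlon XP, Windows 2000, 512 MB": (970, 50, 15),
    "Pentium 4, Windows XP, 512 MB": (532, 49, 13),
    "Athlon 64, XP x64, 1 GB, 32 bit at 2,000,000 KB": (2040, 55, 6),
    "Core 2 Duo, 64-Bit Vista, 4 GB": (3393, 55, 10),
}


def slowdown(ram: float, disk: float, paging: float):
    """Return (times slower than disk, times slower than RAM)."""
    return disk / paging, ram / paging


for name, speeds in CASES.items():
    vs_disk, vs_ram = slowdown(*speeds)
    print(f"{name}: {vs_disk:.1f} x slower than disk, {vs_ram:.0f} x slower than RAM")
```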
           32 Bit BusSpd2K   32 Bit BusSpd2K
CPU        Athlon XP         Pentium 4
MHz        2088              1900
RAM MB     512               512
Windows    2000              XP
Disk W/R
MB/sec     50                49

      KB   Secs  MB/sec      Secs  MB/sec
  100000            970               532
  300000      1     932         2     285
  350000      1     929        13      56
  400000      6     127        22      38
  450000      8     117        19      48
  470000      8     118        14      70
  480000      8     123        15      64
  490000      9     116        24      41
  500000     13      80        21      49
  510000     15      68        27      38
  520000     16      65        23      46
  530000     19      58        29      37
  540000     21      53        32      35
 1200000    154      16       189      13
 1300000                      N/A
 1500000    205      15
 1600000    N/A

N/A = Cannot allocate memory
          32 Bit BusSpd2K  64 Bit IntBurn64       64 Bit IntBurn64
CPU       Athlon 64        Athlon 64              Core 2 Duo
MHz       2210             2210                   2400
RAM MB    1024             1024                   4096
Max used                   880                    3850
Windows   XP x64           XP x64                 64-Bit Vista
Disk W/R
MB/sec    55               55                     55

      KB  Secs  MB/sec    Secs  MB/sec          KB  Secs  MB/sec
  100000          2040            2041      100000          3393
  800000    66      25       1    1976     2500000     2    2868
  850000    31      56      23      77     3000000     2    2878
  900000    61      30      58      32     3100000     2    2847
  920000   118      16      61      31     3200000     2    2899
  930000   112      17      91      21     3300000     3    2698
  940000    92      21      96      20     3400000     3    2610
  950000   114      17      93      21     3500000     7    1075
  960000   123      16      89      22     3600000    10     750
  970000   124      16     142      14     3700000    17     459
  980000   125      16     125      16     3800000   107      73
  990000   135      15     119      17     3900000   210      38
 1000000   137      15     128      16     4000000   146      56
 1100000   188      12     188      12
 1200000   223      11     205      12     5000000  1024      10
 1300000   380       7     266      10     7000000   652      22
 1400000   358       8     358       8     7900000   770      21
                                           8000000   N/A
 2000000   683       6     683       6
 2100000   N/A                             32 Bit BusSpd2K
 5000000                  1707       6
 6000000                   N/A             2000000     2    2139
                                           2100000   N/A

N/A = Cannot allocate memory
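The MB/sec figures in the paging tables above appear to be consistent with counting the data twice (one write pass plus one read pass), with 1 KB taken as 1024 bytes and 1 MB as a million bytes. That convention is inferred from the published numbers, not taken from the benchmark source, so treat this sketch as an assumption.

```python
def write_read_mb_per_sec(kbytes: int, seconds: float) -> float:
    """Throughput for one write pass plus one read pass.

    Assumed convention: KB = 1024 bytes, MB = 1,000,000 bytes,
    data size counted once for the write and once for the read.
    """
    return 2 * kbytes * 1024 / seconds / 1e6


# (KB, seconds) pairs copied from the tables above.
for kbytes, secs in [(800000, 66), (1200000, 154), (2000000, 683), (7900000, 770)]:
    print(f"{kbytes:8d} KB in {secs:4d} secs = {write_read_mb_per_sec(kbytes, secs):5.1f} MB/sec")
```

Run in reverse, the same formula estimates how long a given array size would take at a known paging speed.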
Paging Disk Activity
Tests were run with Performance Monitor logging of Physical Disk Write and Read Bytes and Bytes/Second. The graphs below are extrapolations of million bytes written and read over the 5 second monitoring periods. At least they confirm that the disks can run at 50 to 60 MB/second. CPU utilisation was also reported and was extremely low for most of the time.
Other calculations carried out were for average KB per transfer and this showed a significant difference between XP x64 and 64-Bit Vista. The former consistently produced (approximately) 4 KB per read and 64 KB per write. Vista was completely different, mainly averaging nearly 1000 KB per write for the main writing period. On loading the data there seem to be periods of 16, 32 and 64 KB per read down to 8 KB at the end.
Peak reading speeds are as reflected in my DiskGraf Benchmark for block sizes 4 KB and 64 KB respectively.
The slow speeds will be caused by limited random access due to read following write or accessing fragmented data space. At the end of the Core 2 Duo test, when mainly reading is taking place, average time per read access is around 4 milliseconds, about half a disk revolution. With sequential data, speed would be much faster, either reading directly from the disk or via the disk’s buffer. Overall, it appears that Vista paging can be faster than Windows XP as it has the ability to read data at a larger page size.
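To illustrate why read size matters so much, the sketch below works out half a revolution for the 7200 RPM disk and the ceiling this places on random reads. It is a simplified model that assumes every read costs one average rotational delay and ignores seek and transfer times; the function names are illustrative.

```python
def half_revolution_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 60_000.0 / rpm / 2.0


def random_read_mb_per_sec(kb_per_read: float, ms_per_access: float) -> float:
    """Throughput ceiling when each read costs one access time.

    KB = 1024 bytes, MB = 1,000,000 bytes.
    """
    return kb_per_read * 1024 / 1e6 / (ms_per_access / 1000.0)


latency = half_revolution_ms(7200)
print(f"Half a revolution at 7200 RPM = {latency:.2f} ms")
for kb in (4, 64):
    speed = random_read_mb_per_sec(kb, latency)
    print(f"{kb:2d} KB per read = {speed:.1f} MB/sec at most")
```

On this model, 4 KB reads are limited to roughly 1 MB/second and 64 KB reads to roughly 16 MB/second.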
Graph: Athlon 64, XP x64, 1 GB RAM, 800000 KB data
Write/Read 205.2 seconds, 8.0 MB/second
Time at RAM speed < 1 second

Graph: Core 2 Duo, 64-Bit Vista, 4 GB RAM, 5000000 KB data
Write/Read 499.0 seconds, 21 MB/second
Time at RAM speed < 3 seconds
Roy Longbottom October 2007
At the time of writing, the Virgin FreeSpace Internet home for my benchmarks is via the link Roy Longbottom's PC Benchmark Collection.