I would like to highlight several resources describing what programmers must know about how computers work.
The article ‘What every programmer should know about memory’ by Drepper (linked below) deals primarily with NUMA systems and the L1 and L2 CPU caches. Although a little dated, it is still very valuable. It is dated because we no longer optimize our code only to lay data out on cache lines, although alignment is and will remain incredibly important. Alignment of data is important in general and mandatory in some cases: for example, when we pass data from the host (CPU) to an accelerator (GPU) using constant buffers, the data must be aligned. But modern computing gives us a lot more to worry about, only five years after the article was written.
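To make the alignment point concrete, here is a minimal Python sketch using the standard `ctypes` module. The `Sample` structure and the `align_up` helper are hypothetical names I introduce only for illustration, and the 16-byte boundary is an assumed example of a buffer-size requirement, not a figure taken from the articles above.

```python
import ctypes


class Sample(ctypes.Structure):
    # One byte followed by a 4-byte integer. With native alignment, padding
    # is inserted so that `value` starts on a 4-byte boundary.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("value", ctypes.c_uint32),
    ]


def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (size + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    print("offset of value:", Sample.value.offset)
    print("sizeof(Sample): ", ctypes.sizeof(Sample))
    # Assumed example: pad the structure to a 16-byte boundary before upload.
    print("padded to 16:   ", align_up(ctypes.sizeof(Sample), 16))
```

The structure holds only five bytes of payload, yet it occupies more than that because of padding; this is the kind of layout detail that matters when a buffer is handed to another device.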
Hardware, and hence programming, has changed significantly since 2007. Looking at the following graph, which I borrowed from Herb Sutter’s article (referenced below), we can conclude that our code must now be written for a variety of devices and accelerators, and that we have to be able to visualize code architectures executing in parallel on thousands of cores.
We will not see significant performance gains in single-threaded applications in the future merely by upgrading to new hardware. There will be some gains from advances in compilers, but the lesson is that we have to start thinking in terms of computing in the cloud (the following picture is also borrowed from Herb Sutter).
Please read Drepper’s article to understand how the L-caches work. I also refer you to Mark Russinovich’s presentation to understand how modern RAM is managed (at about minute 23 in video #2). The following images were borrowed from the presentation.
When the computer boots, it looks at pages in the free-page list (red box in the middle) and starts moving them to the zero-page list. The operation is highly optimized and runs as fast as the hardware allows.
Once memory allocations are requested, the system starts creating working sets, grabbing pages from the zero-page list.
Once a working set grows big enough, the system starts trimming it by moving pages either to the stand-by list, an instantly reusable cache of pages that can be sent back to the working set, or to the modified list.
The modified list is a list of pages that were written to or modified by an executing program and must be persisted to disk. This operation is ‘lazy’: a worker (the modified page writer below) periodically wakes up, flushes the pages out and then moves them to the stand-by list.
A global valid fault involves a shared page that is reused between several processes; you can see such pages in VMMap from Sysinternals under shareable memory.
Once a page is released (when a stack is released, for example), the page is put back onto the free-page list. Because the page contains private data from a process, it cannot be given to a different process as it is; it must be reset first.
But some allocations can be satisfied directly from the free-page list. That happens when the page will be immediately overwritten by the system, for example when IO reads data from a disk and writes it to that page,
or when a soft fault happens and the page is returned to the same process that it was freed from.
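The page movements described above can be summarized as a small state machine. The following Python sketch is a toy model under my own assumptions, not how Windows implements it: the `PageLists` class and its method names are hypothetical, pages are plain integers, and each list is a simple FIFO queue.

```python
from collections import deque
from enum import Enum


class PageList(Enum):
    FREE = "free"
    ZEROED = "zeroed"
    WORKING_SET = "working set"
    STANDBY = "standby"
    MODIFIED = "modified"


class PageLists:
    """Toy model: every page starts on the free list, as at boot."""

    def __init__(self, page_count: int) -> None:
        self.lists = {name: deque() for name in PageList}
        self.lists[PageList.FREE].extend(range(page_count))

    def _move(self, src: PageList, dst: PageList) -> int:
        page = self.lists[src].popleft()  # IndexError if src is empty
        self.lists[dst].append(page)
        return page

    def zero_pages(self) -> None:
        # Move every free page to the zeroed list.
        while self.lists[PageList.FREE]:
            self._move(PageList.FREE, PageList.ZEROED)

    def allocate(self) -> int:
        # A new allocation takes a page from the zeroed list.
        return self._move(PageList.ZEROED, PageList.WORKING_SET)

    def trim(self, dirty: bool) -> int:
        # Trimmed pages go to standby, or to modified if they were written.
        dst = PageList.MODIFIED if dirty else PageList.STANDBY
        return self._move(PageList.WORKING_SET, dst)

    def write_modified(self) -> None:
        # The lazy writer flushes modified pages, then parks them on standby.
        while self.lists[PageList.MODIFIED]:
            self._move(PageList.MODIFIED, PageList.STANDBY)

    def reuse_standby(self) -> int:
        # A standby page is sent back to the working set.
        return self._move(PageList.STANDBY, PageList.WORKING_SET)

    def release(self) -> int:
        # A released page returns to the free list and must be reset again.
        return self._move(PageList.WORKING_SET, PageList.FREE)

    def counts(self) -> dict:
        return {name.value: len(pages) for name, pages in self.lists.items()}


if __name__ == "__main__":
    memory = PageLists(4)
    memory.zero_pages()
    for _ in range(3):
        memory.allocate()
    memory.trim(dirty=False)
    memory.trim(dirty=True)
    memory.write_modified()
    memory.reuse_standby()
    memory.release()
    print(memory.counts())
```

The model deliberately leaves out the special cases where an allocation is served straight from the free list; it is only meant to make the main flow between the lists easier to follow.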
The Process Explorer screenshot below shows the breakdown and the available memory sizes on my machine. You can see that the free-page list has nothing in it, but there are many pages on the standby and zeroed lists.
I want to close with some information you can get by running Sysinternals tools and PowerShell scripts. For example, in the following output from CoreInfo, the important pieces are LineSize (please read Drepper to understand why) and the L-cache information.
    Logical Processor to NUMA Node Map:
    ********  NUMA Node 0

    Logical Processor to Cache Map:
    **------  Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
    **------  Instruction Cache 0, Level 1, 32 KB, Assoc 4, LineSize 64
    **------  Unified Cache 0, Level 2, 256 KB, Assoc 8, LineSize 64
    ********  Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
    --**----  Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
    --**----  Instruction Cache 1, Level 1, 32 KB, Assoc 4, LineSize 64
    --**----  Unified Cache 2, Level 2, 256 KB, Assoc 8, LineSize 64
    ----**--  Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
    ----**--  Instruction Cache 2, Level 1, 32 KB, Assoc 4, LineSize 64
    ----**--  Unified Cache 3, Level 2, 256 KB, Assoc 8, LineSize 64
    ------**  Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
    ------**  Instruction Cache 3, Level 1, 32 KB, Assoc 4, LineSize 64
    ------**  Unified Cache 4, Level 2, 256 KB, Assoc 8, LineSize 64
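The size, associativity and line size on each of these lines determine the cache geometry: the number of lines is the size divided by the line size, and the number of sets is the number of lines divided by the associativity. Here is a small Python sketch that reads one line of this output and derives both numbers. The `cache_geometry` function and the regular expression are my own illustration and assume the line layout shown above.

```python
import re

# Matches lines such as:
#   **------ Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
CACHE_LINE = re.compile(
    r"(?P<kind>Data|Instruction|Unified) Cache\s+(?P<index>\d+),\s*"
    r"Level\s+(?P<level>\d+),\s*(?P<size>\d+)\s*(?P<unit>KB|MB),\s*"
    r"Assoc\s+(?P<assoc>\d+),\s*LineSize\s+(?P<line>\d+)"
)

UNIT_BYTES = {"KB": 1024, "MB": 1024 * 1024}


def cache_geometry(text: str):
    """Return size, line count and set count for one cache line, or None."""
    match = CACHE_LINE.search(text)
    if match is None:
        return None
    size_bytes = int(match["size"]) * UNIT_BYTES[match["unit"]]
    line_size = int(match["line"])
    assoc = int(match["assoc"])
    lines = size_bytes // line_size
    return {
        "kind": match["kind"],
        "level": int(match["level"]),
        "size_bytes": size_bytes,
        "lines": lines,
        "sets": lines // assoc,
    }


if __name__ == "__main__":
    sample = [
        "**------ Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64",
        "**------ Unified Cache 0, Level 2, 256 KB, Assoc 8, LineSize 64",
        "******** Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64",
    ]
    for entry in sample:
        print(cache_geometry(entry))
```

For the L1 data cache above, 32 KB split into 64-byte lines gives 512 lines, and with 8-way associativity that is 64 sets.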
You can also run a PowerShell script to get similar information:
    PS C:\WINDOWS\system32> C:\ISEScript\ProcessorInfo.ps1

    Caption           : Intel64 Family 6 Model 30 Stepping 5
    DeviceID          : CPU0
    Manufacturer      : GenuineIntel
    MaxClockSpeed     : 2128
    Name              : Intel(R) Core(TM) i7 CPU X 940 @ 2.13GHz
    SocketDesignation : CPU 1

    BlockSize      : 1024
    CacheSpeed     :
    CacheType      : 5
    DeviceID       : Cache Memory 0
    InstalledSize  : 8192
    Level          : 5
    MaxCacheSize   : 8192
    NumberOfBlocks : 8192
    Status         : OK

    BlockSize      : 1024
    CacheSpeed     :
    CacheType      : 5
    DeviceID       : Cache Memory 1
    InstalledSize  : 256
    Level          : 4
    MaxCacheSize   : 256
    NumberOfBlocks : 256
    Status         : OK

    BlockSize      : 1024
    CacheSpeed     :
    CacheType      : 4
    DeviceID       : Cache Memory 2
    InstalledSize  : 32
    Level          : 3
    MaxCacheSize   : 32
    NumberOfBlocks : 32
    Status         : OK
Here’s the script that produced it:
    $computer = "."
    Get-WmiObject -Class "Win32_Processor" -ComputerName $computer
    Write-Host
    Get-WmiObject win32_cachememory
    Write-Host
    Get-WmiObject Win32_Processor | Format-List *
    Write-Host

    $props = Get-WmiObject -Class "Win32_Processor" -Namespace "root\CIMV2" -ComputerName $computer
    foreach ($item in $props) {
        write-host "Address Width: " $item.AddressWidth
        write-host "Architecture: " $item.Architecture
        write-host "Availability: " $item.Availability
        write-host "Caption: " $item.Caption
        write-host "Configuration Manager Error Code: " $item.ConfigManagerErrorCode
        write-host "Configuration Manager User Configuration: " $item.ConfigManagerUserConfig
        write-host "CPU Status: " $item.CpuStatus
        write-host "Creation Class Name: " $item.CreationClassName
        write-host "Current Clock Speed: " $item.CurrentClockSpeed
        write-host "Current Voltage: " $item.CurrentVoltage
        write-host "Data Width: " $item.DataWidth
        write-host "Description: " $item.Description
        write-host "Device ID: " $item.DeviceID
        write-host "Error Cleared: " $item.ErrorCleared
        write-host "Error Description: " $item.ErrorDescription
        write-host "Ext Clock: " $item.ExtClock
        write-host "Family: " $item.Family
        write-host "Installation Date: " $item.InstallDate
        write-host "L2 Cache Size: " $item.L2CacheSize
        write-host "L2 Cache Speed: " $item.L2CacheSpeed
        write-host "Last Error Code: " $item.LastErrorCode
        write-host "Level: " $item.Level
        write-host "Load Percentage: " $item.LoadPercentage
        write-host "Manufacturer: " $item.Manufacturer
        write-host "Maximum Clock Speed: " $item.MaxClockSpeed
        write-host "Name: " $item.Name
        write-host "Other Family Description: " $item.OtherFamilyDescription
        write-host "PNP Device ID: " $item.PNPDeviceID
        write-host "Power Management Capabilities: " $item.PowerManagementCapabilities
        write-host "Power Management Supported: " $item.PowerManagementSupported
        write-host "Processor ID: " $item.ProcessorId
        write-host "Processor Type: " $item.ProcessorType
        write-host "Revision: " $item.Revision
        write-host "Role: " $item.Role
        write-host "Socket Designation: " $item.SocketDesignation
        write-host "Status: " $item.Status
        write-host "Status Information: " $item.StatusInfo
        write-host "Stepping: " $item.Stepping
        write-host "System Creation Class Name: " $item.SystemCreationClassName
        write-host "System Name: " $item.SystemName
        write-host "Unique ID: " $item.UniqueId
        write-host "Upgrade Method: " $item.UpgradeMethod
        write-host "Version: " $item.Version
        write-host "Voltage Caps: " $item.VoltageCaps
        write-host
    }

    get-wmiobject MSAcpi_ThermalZoneTemperature -namespace "root/wmi" `
        | select CurrentTemperature,InstanceName
    get-wmiobject -namespace "root/wmi" -list | findstr Temp
And finally, the following script will give you information about L-caches:
    $computer = "."
    Get-WmiObject Win32_CacheMemory | Format-List *
    Write-Host

    $props = Get-WmiObject -Class "Win32_CacheMemory" -Namespace "root\CIMV2" -ComputerName $computer
    foreach ($item in $props) {
        Write-Host "Access: " $item.Access
        Write-Host "AdditionalErrorData: " $item.AdditionalErrorData
        Write-Host "Associativity: " $item.Associativity
        Write-Host "Availability: " $item.Availability
        Write-Host "BlockSize: " $item.BlockSize
        Write-Host "CacheSpeed: " $item.CacheSpeed
        Write-Host "CacheType: " $item.CacheType
        Write-Host "Caption: " $item.Caption
        Write-Host "ConfigManagerErrorCode: " $item.ConfigManagerErrorCode
        Write-Host "ConfigManagerUserConfig: " $item.ConfigManagerUserConfig
        Write-Host "CorrectableError: " $item.CorrectableError
        Write-Host "CreationClassName: " $item.CreationClassName
        Write-Host "CurrentSRAM: " $item.CurrentSRAM
        Write-Host "Description: " $item.Description
        Write-Host "DeviceID: " $item.DeviceID
        Write-Host "EndingAddress: " $item.EndingAddress
        Write-Host "ErrorAccess: " $item.ErrorAccess
        Write-Host "ErrorAddress: " $item.ErrorAddress
        Write-Host "ErrorCleared: " $item.ErrorCleared
        Write-Host "ErrorCorrectType: " $item.ErrorCorrectType
        Write-Host "ErrorData: " $item.ErrorData
        Write-Host "ErrorDataOrder: " $item.ErrorDataOrder
        Write-Host "ErrorDescription: " $item.ErrorDescription
        Write-Host "ErrorInfo: " $item.ErrorInfo
        Write-Host "ErrorMethodology: " $item.ErrorMethodology
        Write-Host "ErrorResolution: " $item.ErrorResolution
        Write-Host "ErrorTime: " $item.ErrorTime
        Write-Host "ErrorTransferSize: " $item.ErrorTransferSize
        Write-Host "FlushTimer: " $item.FlushTimer
        Write-Host "InstallDate: " $item.InstallDate
        Write-Host "InstalledSize: " $item.InstalledSize
        Write-Host "LastErrorCode: " $item.LastErrorCode
        Write-Host "Level: " $item.Level
        Write-Host "LineSize: " $item.LineSize
        Write-Host "Location: " $item.Location
        Write-Host "MaxCacheSize: " $item.MaxCacheSize
        Write-Host "Name: " $item.Name
        Write-Host "NumberOfBlocks: " $item.NumberOfBlocks
        Write-Host "OtherErrorDescription: " $item.OtherErrorDescription
        Write-Host "PNPDeviceID: " $item.PNPDeviceID
        Write-Host "PowerManagementCapabilities: " $item.PowerManagementCapabilities
        Write-Host "PowerManagementSupported: " $item.PowerManagementSupported
        Write-Host "Purpose: " $item.Purpose
        Write-Host "ReadPolicy: " $item.ReadPolicy
        Write-Host "ReplacementPolicy: " $item.ReplacementPolicy
        Write-Host "StartingAddress: " $item.StartingAddress
        Write-Host "Status: " $item.Status
        Write-Host "StatusInfo: " $item.StatusInfo
        Write-Host "SupportedSRAM: " $item.SupportedSRAM
        Write-Host "SystemCreationClassName: " $item.SystemCreationClassName
        Write-Host "SystemLevelAddress: " $item.SystemLevelAddress
        Write-Host "SystemName: " $item.SystemName
        Write-Host "WritePolicy: " $item.WritePolicy
        Write-Host
    }
Output will be something like this:
    AdditionalErrorData:
    Associativity: 8
    Availability: 3
    BlockSize: 1024
    CacheType: 5
    Caption: Cache Memory
    CreationClassName: Win32_CacheMemory
    CurrentSRAM: 6
    Description: Cache Memory
    DeviceID: Cache Memory 0
    ErrorCorrectType: 5
    InstalledSize: 8192
    Level: 5
    Location: 0
    MaxCacheSize: 8192
    Name: Cache Memory
    NumberOfBlocks: 8192
    Purpose: Unknown
    Status: OK
    StatusInfo: 3
    SupportedSRAM: 6
    SystemCreationClassName: Win32_ComputerSystem
    SystemName: ALAN-HP
    WritePolicy: 3
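The numeric Level and CacheType values in this output are enumeration codes, not cache levels as such. The Python sketch below decodes them for the three caches reported earlier in this post. The two lookup tables reflect my reading of the Win32_CacheMemory documentation (for example, Level 5 meaning tertiary, that is L3) and the assumption that InstalledSize is in kilobytes; please verify both against the documentation. The `describe_cache` helper is a hypothetical name.

```python
# Assumed enumeration values for Win32_CacheMemory (verify against the docs).
CACHE_LEVEL = {
    1: "Other",
    2: "Unknown",
    3: "Primary (L1)",
    4: "Secondary (L2)",
    5: "Tertiary (L3)",
    6: "Not Applicable",
}

CACHE_TYPE = {
    1: "Other",
    2: "Unknown",
    3: "Instruction",
    4: "Data",
    5: "Unified",
}


def describe_cache(device_id: str, installed_size_kb: int,
                   level: int, cache_type: int) -> str:
    """Turn the raw WMI codes into a readable one-line summary."""
    level_name = CACHE_LEVEL.get(level, f"code {level}")
    type_name = CACHE_TYPE.get(cache_type, f"code {cache_type}")
    return f"{device_id}: {installed_size_kb} KB, {level_name}, {type_name}"


if __name__ == "__main__":
    # DeviceID, InstalledSize, Level and CacheType as reported on my machine.
    reported = [
        ("Cache Memory 0", 8192, 5, 5),
        ("Cache Memory 1", 256, 4, 5),
        ("Cache Memory 2", 32, 3, 4),
    ]
    for entry in reported:
        print(describe_cache(*entry))
```

Read this way, the WMI output agrees with CoreInfo: an 8 MB unified L3, a 256 KB unified L2 and a 32 KB L1 data cache.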
References
What every programmer should know about memory
Mysteries of Memory Part 1
Mysteries of Memory Part 2
Welcome to the Jungle
Windows Data Alignment on IPF, x86, and x64