Computer CPU and Memory Architectures

I would like to highlight several resources describing what programmers must know about how computers work.

The article ‘What every programmer should know about memory’ by Drepper (linked below) deals primarily with NUMA systems and the L1 and L2 CPU caches. Although a little dated, it is still very valuable. It is dated because laying data out on cache lines is no longer the only thing we optimize for, although alignment is and will remain incredibly important. Alignment of data is in some cases mandatory: for example, when we pass data from the host (CPU) to an accelerator (GPU) using constant buffers, the data must be aligned. But only five years after the article was written, modern computing gives us a lot more to worry about.
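
To make the alignment point concrete, here is a minimal Python sketch of the underlying arithmetic. It assumes a 64-byte cache line (the LineSize that Coreinfo reports for my machine further down); the helper names align_up and cache_lines_touched are mine and do not come from Drepper's article.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


def cache_lines_touched(address: int, size: int, line_size: int = 64) -> int:
    """Count the cache lines spanned by an access of `size` bytes at `address`.

    line_size=64 is an assumption taken from the Coreinfo output in this post.
    """
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return last_line - first_line + 1


if __name__ == "__main__":
    unaligned = 60                       # an 8-byte value placed at offset 60
    aligned = align_up(unaligned, 8)     # next 8-byte boundary
    print(aligned)                                   # 64
    print(cache_lines_touched(unaligned, 8))         # 2 (straddles two lines)
    print(cache_lines_touched(aligned, 8))           # 1
```

An 8-byte value at offset 60 straddles two 64-byte lines, so reading it can cost two line fetches; moving it to the next 8-byte boundary keeps it inside a single line.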

Hardware, and hence programming, has changed significantly since 2007. The following graph, which I borrowed from Herb Sutter’s article (reference below), leads to the conclusion that our code must now be written for a variety of devices and accelerators, and that we have to be able to visualize code architectures executing in parallel on thousands of cores.

We will not see significant performance gains in single-threaded applications in the future by merely upgrading to new hardware. There will be some gains from advances in compilers, but the lesson is that we have to start thinking in terms of computing in the cloud (the following picture is also borrowed from Herb Sutter).

Please read Drepper’s article to understand how the L-caches work. I also refer you to Mark Russinovich’s presentation to understand how modern RAM is managed (at about minute 23 in video #2). The following images were borrowed from the presentation.

When the computer boots, it looks at pages in the free-page list (red box in the middle) and starts moving them to the zero-page list. The operation is highly optimized and runs as fast as the hardware allows.

Once memory allocations are requested, the system starts creating working sets, grabbing pages from the zero-page list.

Once a working set grows big enough, the system starts trimming it by moving pages either to the stand-by list (an instantly reusable cache of pages that can be sent straight back to the working set) or to the modified list.

The modified list is a list of pages that were written to or modified by an executing program and must be persisted to disk. This operation is ‘lazy’: a worker (the modified page writer below) periodically wakes up, flushes pages out and then moves them to the stand-by list.

A global valid fault involves a shared page that is reused between several processes; you can see such pages in VMMap from Sysinternals under shareable memory.

Once a page is released (when a stack is released, for example), it is put back onto the free-page list. Because the page contains private data from a process, it cannot be handed to a different process as-is; it must be reset first.

But some allocations can be satisfied straight from the free-page list. That happens when the page will be immediately overwritten by the system, for example when I/O reads data from a disk and writes it to that page, or when a soft fault happens and the page is returned to the same process that it was freed from.
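
The page movements described above can be summarized as a small state machine. The Python sketch below is only a toy model of the lists as this post describes them, not of the real Windows memory manager; the event names (zero, trim_dirty, flush_to_disk and so on) and the helpers move and replay are made up for illustration.

```python
from enum import Enum


class PageList(Enum):
    FREE = "free"
    ZEROED = "zeroed"
    WORKING_SET = "working set"
    STANDBY = "stand-by"
    MODIFIED = "modified"


# (current list, event) -> next list. Event names are hypothetical.
TRANSITIONS = {
    (PageList.FREE, "zero"): PageList.ZEROED,
    (PageList.ZEROED, "allocate"): PageList.WORKING_SET,
    (PageList.WORKING_SET, "trim_clean"): PageList.STANDBY,
    (PageList.WORKING_SET, "trim_dirty"): PageList.MODIFIED,
    (PageList.MODIFIED, "flush_to_disk"): PageList.STANDBY,
    (PageList.STANDBY, "reuse"): PageList.WORKING_SET,
    (PageList.WORKING_SET, "release"): PageList.FREE,
    # Shortcuts that skip zeroing because no private data can leak:
    (PageList.FREE, "io_overwrite"): PageList.WORKING_SET,
    (PageList.FREE, "soft_fault_same_process"): PageList.WORKING_SET,
}


def move(current: PageList, event: str) -> PageList:
    """Return the list a page lands on, or raise if the move is not modeled."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"{event!r} is not valid for a page on the {current.value} list"
        ) from None


def replay(events, start=PageList.FREE):
    """Apply a sequence of events to one page and return its final list."""
    current = start
    for event in events:
        current = move(current, event)
    return current


if __name__ == "__main__":
    life = ["zero", "allocate", "trim_dirty", "flush_to_disk", "reuse"]
    print(replay(life).value)
```

Note that there is no direct path from the free-page list to a working set through a plain allocation: the page has to be zeroed first unless one of the two shortcuts applies.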

The screenshot below from Process Explorer shows the breakdown and available memory sizes on my machine. You can see that the free-page list has nothing in it, but there are many pages on the stand-by and zeroed lists.

I want to close with some information you can get by running Sysinternals tools and PowerShell scripts. For example, in the following output from Coreinfo the important information is LineSize (please read Drepper to understand why) and the L-cache information.

Logical Processor to NUMA Node Map:

********  NUMA Node 0
Logical Processor to Cache Map:
**------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**------  Instruction Cache   0, Level 1,   32 KB, Assoc   4, LineSize  64
**------  Unified Cache       0, Level 2,  256 KB, Assoc   8, LineSize  64

********  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--**----  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----  Instruction Cache   1, Level 1,   32 KB, Assoc   4, LineSize  64
--**----  Unified Cache       2, Level 2,  256 KB, Assoc   8, LineSize  64
----**--  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--  Instruction Cache   2, Level 1,   32 KB, Assoc   4, LineSize  64
----**--  Unified Cache       3, Level 2,  256 KB, Assoc   8, LineSize  64
------**  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**  Instruction Cache   3, Level 1,   32 KB, Assoc   4, LineSize  64
------**  Unified Cache       4, Level 2,  256 KB, Assoc   8, LineSize  64
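
As a rough illustration of what these rows tell you, the following Python sketch parses one row and derives the number of cache lines and sets (size / (associativity × line size)), plus the logical processors that share the cache according to the asterisk mask. The regular expression and the describe helper are my own and only assume the row layout shown above; they are not part of Coreinfo.

```python
import re

# Matches rows such as:
# "**------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64"
ROW = re.compile(
    r"^(?P<mask>[*-]+)\s+(?P<kind>\w+) Cache\s+\d+,\s+Level\s+(?P<level>\d+),"
    r"\s+(?P<size>\d+)\s+(?P<unit>KB|MB),\s+Assoc\s+(?P<assoc>\d+),"
    r"\s+LineSize\s+(?P<line>\d+)"
)
UNIT = {"KB": 1024, "MB": 1024 * 1024}


def describe(row: str) -> dict:
    """Derive cache geometry from a single Coreinfo cache-map row."""
    m = ROW.match(row.strip())
    if m is None:
        raise ValueError(f"not a cache row: {row!r}")
    size = int(m["size"]) * UNIT[m["unit"]]
    assoc, line = int(m["assoc"]), int(m["line"])
    return {
        "kind": m["kind"],
        "level": int(m["level"]),
        # Each '*' marks a logical processor that shares this cache.
        "processors": [i for i, c in enumerate(m["mask"]) if c == "*"],
        "lines": size // line,
        "sets": size // (assoc * line),
    }


if __name__ == "__main__":
    l1 = describe("**------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64")
    l3 = describe("********  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64")
    print(l1)
    print(l3)
```

For the L1 data cache this gives 512 lines in 64 sets shared by logical processors 0 and 1; for the L3 cache it gives 131072 lines in 8192 sets shared by all eight logical processors.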

You can also run a PowerShell script to get similar information:

PS C:\WINDOWS\system32> C:\ISEScript\ProcessorInfo.ps1

Caption           : Intel64 Family 6 Model 30 Stepping 5
DeviceID          : CPU0
Manufacturer      : GenuineIntel
MaxClockSpeed     : 2128
Name              : Intel(R) Core(TM) i7 CPU       X 940  @ 2.13GHz
SocketDesignation : CPU 1

BlockSize      : 1024
CacheSpeed     :
CacheType      : 5
DeviceID       : Cache Memory 0
InstalledSize  : 8192
Level          : 5
MaxCacheSize   : 8192
NumberOfBlocks : 8192
Status         : OK

BlockSize      : 1024
CacheSpeed     :
CacheType      : 5
DeviceID       : Cache Memory 1
InstalledSize  : 256
Level          : 4
MaxCacheSize   : 256
NumberOfBlocks : 256
Status         : OK

BlockSize      : 1024
CacheSpeed     :
CacheType      : 4
DeviceID       : Cache Memory 2
InstalledSize  : 32
Level          : 3
MaxCacheSize   : 32
NumberOfBlocks : 32
Status         : OK

Here’s the script that produced it:

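$computer = "."

# Default summary view of each processor
Get-WmiObject -Class "Win32_Processor" -ComputerName $computer
Write-Host

# Default summary view of each cache
Get-WmiObject Win32_CacheMemory
Write-Host

# Every property of each processor
Get-WmiObject Win32_Processor | Format-List *
Write-Host

# Keep the processor objects so selected properties can be printed with labels
$props = Get-WmiObject -Class "Win32_Processor" -Namespace "root\CIMV2" -ComputerName $computer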

foreach ($item in $props)
{
    Write-Host "Address Width:                            " $item.AddressWidth
    Write-Host "Architecture:                             " $item.Architecture
    Write-Host "Availability:                             " $item.Availability
    Write-Host "Caption:                                  " $item.Caption
    Write-Host "Configuration Manager Error Code:         " $item.ConfigManagerErrorCode
    Write-Host "Configuration Manager User Configuration: " $item.ConfigManagerUserConfig
    Write-Host "CPU Status:                               " $item.CpuStatus
    Write-Host "Creation Class Name:                      " $item.CreationClassName
    Write-Host "Current Clock Speed:                      " $item.CurrentClockSpeed
    Write-Host "Current Voltage:                          " $item.CurrentVoltage
    Write-Host "Data Width:                               " $item.DataWidth
    Write-Host "Description:                              " $item.Description
    Write-Host "Device ID:                                " $item.DeviceID
    Write-Host "Error Cleared:                            " $item.ErrorCleared
    Write-Host "Error Description:                        " $item.ErrorDescription
    Write-Host "Ext Clock:                                " $item.ExtClock
    Write-Host "Family:                                   " $item.Family
    Write-Host "Installation Date:                        " $item.InstallDate
    Write-Host "L2 Cache Size:                            " $item.L2CacheSize
    Write-Host "L2 Cache Speed:                           " $item.L2CacheSpeed
    Write-Host "Last Error Code:                          " $item.LastErrorCode
    Write-Host "Level:                                    " $item.Level
    Write-Host "Load Percentage:                          " $item.LoadPercentage
    Write-Host "Manufacturer:                             " $item.Manufacturer
    Write-Host "Maximum Clock Speed:                      " $item.MaxClockSpeed
    Write-Host "Name:                                     " $item.Name
    Write-Host "Other Family Description:                 " $item.OtherFamilyDescription
    Write-Host "PNP Device ID:                            " $item.PNPDeviceID
    Write-Host "Power Management Capabilities:            " $item.PowerManagementCapabilities
    Write-Host "Power Management Supported:               " $item.PowerManagementSupported
    Write-Host "Processor ID:                             " $item.ProcessorId
    Write-Host "Processor Type:                           " $item.ProcessorType
    Write-Host "Revision:                                 " $item.Revision
    Write-Host "Role:                                     " $item.Role
    Write-Host "Socket Designation:                       " $item.SocketDesignation
    Write-Host "Status:                                   " $item.Status
    Write-Host "Status Information:                       " $item.StatusInfo
    Write-Host "Stepping:                                 " $item.Stepping
    Write-Host "System Creation Class Name:               " $item.SystemCreationClassName
    Write-Host "System Name:                              " $item.SystemName
    Write-Host "Unique ID:                                " $item.UniqueId
    Write-Host "Upgrade Method:                           " $item.UpgradeMethod
    Write-Host "Version:                                  " $item.Version
    Write-Host "Voltage Caps:                             " $item.VoltageCaps
    Write-Host
}

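# Thermal zone readings; CurrentTemperature is reported in tenths of a kelvin.
# This query typically needs an elevated prompt.
Get-WmiObject MSAcpi_ThermalZoneTemperature -Namespace "root/wmi" `
    | Select-Object CurrentTemperature, InstanceName

# List the WMI classes in root/wmi whose names contain "Temp"
Get-WmiObject -Namespace "root/wmi" -List | findstr Temp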

And finally, the following script will give you information about L-caches:

$computer = "."

Get-WmiObject Win32_CacheMemory | Format-List *
Write-Host

$props = Get-WmiObject -Class "Win32_CacheMemory" -Namespace "root\CIMV2" -ComputerName $computer

foreach ($item in $props)
{
    Write-Host "Access: "  $item.Access
    Write-Host "AdditionalErrorData: "  $item.AdditionalErrorData
    Write-Host "Associativity: "  $item.Associativity
    Write-Host "Availability: "  $item.Availability
    Write-Host "BlockSize: "  $item.BlockSize
    Write-Host "CacheSpeed: "  $item.CacheSpeed
    Write-Host "CacheType: "  $item.CacheType
    Write-Host "Caption: "  $item.Caption
    Write-Host "ConfigManagerErrorCode: "  $item.ConfigManagerErrorCode
    Write-Host "ConfigManagerUserConfig: "  $item.ConfigManagerUserConfig
    Write-Host "CorrectableError: "  $item.CorrectableError
    Write-Host "CreationClassName: "  $item.CreationClassName
    Write-Host "CurrentSRAM: "  $item.CurrentSRAM
    Write-Host "Description: "  $item.Description
    Write-Host "DeviceID: "  $item.DeviceID
    Write-Host "EndingAddress: "  $item.EndingAddress
    Write-Host "ErrorAccess: "  $item.ErrorAccess
    Write-Host "ErrorAddress: "  $item.ErrorAddress
    Write-Host "ErrorCleared: "  $item.ErrorCleared
    Write-Host "ErrorCorrectType: "  $item.ErrorCorrectType
    Write-Host "ErrorData: "  $item.ErrorData
    Write-Host "ErrorDataOrder: "  $item.ErrorDataOrder
    Write-Host "ErrorDescription: "  $item.ErrorDescription
    Write-Host "ErrorInfo: "  $item.ErrorInfo
    Write-Host "ErrorMethodology: "  $item.ErrorMethodology
    Write-Host "ErrorResolution: "  $item.ErrorResolution
    Write-Host "ErrorTime: "  $item.ErrorTime
    Write-Host "ErrorTransferSize: "  $item.ErrorTransferSize
    Write-Host "FlushTimer: "  $item.FlushTimer
    Write-Host "InstallDate: "  $item.InstallDate
    Write-Host "InstalledSize: "  $item.InstalledSize
    Write-Host "LastErrorCode: "  $item.LastErrorCode
    Write-Host "Level: "  $item.Level
    Write-Host "LineSize: "  $item.LineSize
    Write-Host "Location: "  $item.Location
    Write-Host "MaxCacheSize: "  $item.MaxCacheSize
    Write-Host "Name: "  $item.Name
    Write-Host "NumberOfBlocks: "  $item.NumberOfBlocks
    Write-Host "OtherErrorDescription: "  $item.OtherErrorDescription
    Write-Host "PNPDeviceID: "  $item.PNPDeviceID
    Write-Host "PowerManagementCapabilities: "  $item.PowerManagementCapabilities
    Write-Host "PowerManagementSupported: "  $item.PowerManagementSupported
    Write-Host "Purpose: "  $item.Purpose
    Write-Host "ReadPolicy: "  $item.ReadPolicy
    Write-Host "ReplacementPolicy: "  $item.ReplacementPolicy
    Write-Host "StartingAddress: "  $item.StartingAddress
    Write-Host "Status: "  $item.Status
    Write-Host "StatusInfo: "  $item.StatusInfo
    Write-Host "SupportedSRAM: "  $item.SupportedSRAM
    Write-Host "SystemCreationClassName: "  $item.SystemCreationClassName
    Write-Host "SystemLevelAddress: "  $item.SystemLevelAddress
    Write-Host "SystemName: "  $item.SystemName
    Write-Host "WritePolicy: "  $item.WritePolicy
    Write-Host
}

Output will be something like this:

AdditionalErrorData:
Associativity:  8
Availability:  3
BlockSize:  1024
CacheType:  5
Caption:  Cache Memory
CreationClassName:  Win32_CacheMemory
CurrentSRAM:  6
Description:  Cache Memory
DeviceID:  Cache Memory 0
ErrorCorrectType:  5
InstalledSize:  8192
Level:  5
Location:  0
MaxCacheSize:  8192
Name:  Cache Memory
NumberOfBlocks:  8192
Purpose:  Unknown
Status:  OK
StatusInfo:  3
SupportedSRAM:  6
SystemCreationClassName:  Win32_ComputerSystem
SystemName:  ALAN-HP
WritePolicy:  3
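
The Level and CacheType values in this output are enumeration codes rather than cache levels, which is easy to misread. The Python sketch below decodes them using the value tables as I understand them from the Win32_CacheMemory documentation (treat the tables as an assumption and check them against the docs). The decode helper is hypothetical, and the sample records are copied from the three Cache Memory entries shown earlier in this post.

```python
# Value tables as I read them in the Win32_CacheMemory docs (assumption).
LEVEL = {
    1: "Other", 2: "Unknown", 3: "Primary (L1)",
    4: "Secondary (L2)", 5: "Tertiary (L3)", 6: "Not applicable",
}
CACHE_TYPE = {1: "Other", 2: "Unknown", 3: "Instruction", 4: "Data", 5: "Unified"}


def decode(record: dict) -> str:
    """Turn the numeric WMI codes of one cache record into readable text."""
    level = LEVEL.get(record["Level"], "unrecognized level")
    kind = CACHE_TYPE.get(record["CacheType"], "unrecognized type")
    # BlockSize is in bytes, so BlockSize * NumberOfBlocks is the size in bytes.
    size_kb = record["BlockSize"] * record["NumberOfBlocks"] // 1024
    return f"{level}, {kind}, {size_kb} KB"


if __name__ == "__main__":
    caches = [
        {"Level": 5, "CacheType": 5, "BlockSize": 1024, "NumberOfBlocks": 8192},
        {"Level": 4, "CacheType": 5, "BlockSize": 1024, "NumberOfBlocks": 256},
        {"Level": 3, "CacheType": 4, "BlockSize": 1024, "NumberOfBlocks": 32},
    ]
    for cache in caches:
        print(decode(cache))
```

Read this way, Level 5 with 8192 KB is the 8 MB L3 cache from the Coreinfo listing, Level 4 is the 256 KB L2 cache, and Level 3 is the 32 KB L1 data cache.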

 

References

What every programmer should know about memory
Mysteries of Memory Part 1
Mysteries of Memory Part 2
Welcome to the Jungle
Windows Data Alignment on IPF, x86, and x64
