Crash Dump Analysis Patterns (Part 68)

Setting a thread affinity mask to a specific processor or core makes that thread running in a single processor environment from that thread point of view. It is always scheduled to run on that processor. This potentially creates a problem found on real single processor environments if the processor runs another higher priority thread (Thread Starvation pattern) or loops at dispatch level IRQL (Dispatch Level Spin pattern). Therefore I call this pattern Affine Thread.

Here is one example. Dual core laptop was hanging and kernel memory dump showed the following Wait Chain pattern:

Resource @ nt!PopPolicyLock (0x80563080)    Exclusively owned
    Contention Count = 32
    NumberOfExclusiveWaiters = 9
     Threads: 8b3b08b8-01<*>
     Threads Waiting On Exclusive Access:
              872935f0       8744cb30       87535da8       8755a6b0      
              8588dba8       8a446c10       85891c50       87250020      
              8a6e7da8

The thread 8b3b08b8 blocked other 9 threads and had the following stack trace:

0: kd> !thread 8b3b08b8 1f
THREAD 8b3b08b8  Cid 0004.002c  Teb: 00000000 Win32Thread: 00000000 READY
Not impersonating
DeviceMap                 e1009248
Owning Process            8b3b2830       Image:         System
Wait Start TickCount      44419          Ticks: 8744 (0:00:02:16.625)
Context Switch Count      4579            
UserTime                  00:00:00.000
KernelTime                00:00:01.109
Start Address nt!ExpWorkerThread (0x8053867e)
Stack Init bad00000 Current bacffcb0 Base bad00000 Limit bacfd000 Call 0
Priority 15 BasePriority 12 PriorityDecrement 3 DecrementCount 16
ChildEBP RetAddr 
bacffcc8 804fd2c9 nt!KiUnlockDispatcherDatabase+0x9e
bacffcdc 8052a16f nt!KeSetSystemAffinityThread+0×5b
bacffd04 805caf03 nt!PopCompositeBatteryUpdateThrottleLimit+0×2d
bacffd24 805ca767 nt!PopCompositeBatteryDeviceHandler+0×1c5
bacffd3c 80529d3b nt!PopPolicyWorkerMain+0×25
bacffd7c 8053876d nt!PopPolicyWorkerThread+0xbf
bacffdac 805cff64 nt!ExpWorkerThread+0xef
bacffddc 805460de nt!PspSystemThreadStartup+0×34
00000000 00000000 nt!KiThreadStartup+0×16

Note this function and its first parameter: 

0: kd> !thread 8b3b08b8
[...]
bacffcdc 8052a16f 00000002 8a5b8cd8 00000030 nt!KeSetSystemAffinityThread+0×5b
[…]

The first parameter is KAFFINITY mask and 0×2 is 0y10 (binary) which is the second core. This thread had been already set to run on that core only:

0: kd> dt _KTHREAD 8b3b08b8
nt!_KTHREAD
   +0x000 Header           : _DISPATCHER_HEADER
[...]
   +0×124 Affinity         : 2
[…]
 

Let’s look at our second core:

0: kd> ~1s

1: kd> kL 100
ChildEBP RetAddr 
a8f00618 acd21947 hal!KeAcquireInStackQueuedSpinLock+0x43
a8f00618 acd21947 tcpip!IndicateData+0x98
a8f00684 acd173e5 tcpip!IndicateData+0x98
a8f0070c acd14ef5 tcpip!TCPRcv+0xbb0
a8f0076c acd14b19 tcpip!DeliverToUser+0x18e
a8f007e8 acd14836 tcpip!DeliverToUserEx+0x95e
a8f008a0 acd13928 tcpip!IPRcvPacket+0x6cb
a8f008e0 acd13853 tcpip!ARPRcvIndicationNew+0x149
a8f0091c ba56be45 tcpip!ARPRcvPacket+0x68
a8f00970 b635801d NDIS!ethFilterDprIndicateReceivePacket+0x307
a8f00984 b63581b4 psched!PsFlushReceiveQueue+0x15
a8f009a8 b63585f9 psched!PsEnqueueReceivePacket+0xda
a8f009c0 ba56c8ed psched!ClReceiveComplete+0x13
a8f009d8 b7defdb5 NDIS!EthFilterDprIndicateReceiveComplete+0x7c
a8f00a08 b7df0f78 driverA+0x17db5
a8f00a64 ba55ec2c driverA+0x18f78
a8f00a88 b6b0962c NDIS!ndisMSendCompleteX+0x8d
a8f00a9c b6b0a36d driverB+0x62c
a8f00ab8 ba55e88f driverB+0x136d
a8f00ae0 b7de003c NDIS!NdisReturnPackets+0xe9
a8f00af0 ba55e88f driverA+0x803c
a8f00b18 b6358061 NDIS!NdisReturnPackets+0xe9
a8f00b30 ba55e88f psched!MpReturnPacket+0x3b
a8f00b58 acc877cc NDIS!NdisReturnPackets+0xe9
87749da0 00000000 afd!AfdReturnBuffer+0xe1

1: kd> r
eax=a8f005f8 ebx=a8f00624 ecx=8a9862ed edx=a8f00b94 esi=874e2ed0 edi=8a9862d0
eip=806e6a33 esp=a8f005ec ebp=a8f00618 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202
hal!KeAcquireInStackQueuedSpinLock+0x43:
806e6a33 74ee            je      hal!KeAcquireInStackQueuedSpinLock+0x33 (806e6a23) [br=0]

1: kd> !running

System Processors 3 (affinity mask)
  Idle Processors 1

     Prcb      Current   Next   
  1  bab38120  8a0c8ae8  8b3a7318  …………….

We see the thread 8a0c8ae8 had been spinning on the second core for more than 2 minutes:

1: kd> !thread 8a0c8ae8 1f
THREAD 8a0c8ae8  Cid 0660.0124  Teb: 7ffd7000 Win32Thread: e338c498 RUNNING on processor 1
IRP List:
    8a960008: (0006,01b4) Flags: 00000900  Mdl: 87535908
Not impersonating
DeviceMap                 e2f155b8
Owning Process            87373020       Image:         APPLICATION.EXE
Wait Start TickCount      43918          Ticks: 9245 (0:00:02:24.453)
Context Switch Count      690                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:02:24.453
[…]

Its kernel time looks consistent with the starved thread waiting time:

0: kd> !thread 8b3b08b8 1f
THREAD 8b3b08b8  Cid 0004.002c  Teb: 00000000 Win32Thread: 00000000 READY
Not impersonating
DeviceMap                 e1009248
Owning Process            8b3b2830       Image:         System
Wait Start TickCount      44419          Ticks: 8744 (0:00:02:16.625)
[…]

For comparison, the spinning thread has affinity mask 0y11 (0×3) which means that it can be scheduled to run on both cores:

0: kd> dt _KTHREAD 8a0c8ae8
nt!_KTHREAD
   +0x000 Header           : _DISPATCHER_HEADER
[...]
   +0×124 Affinity         : 3
[…]

- Dmitry Vostokov @ DumpAnalysis.org -

Leave a Reply