Crash Dump Analysis Patterns (Part 68)
Friday, June 27th, 2008Setting a thread affinity mask to a specific processor or core makes that thread running in a single processor environment from that thread point of view. It is always scheduled to run on that processor. This potentially creates a problem found on real single processor environments if the processor runs another higher priority thread (Thread Starvation pattern) or loops at dispatch level IRQL (Dispatch Level Spin pattern). Therefore I call this pattern Affine Thread.
Here is one example. Dual core laptop was hanging and kernel memory dump showed the following Wait Chain pattern:
Resource @ nt!PopPolicyLock (0x80563080) Exclusively owned
Contention Count = 32
NumberOfExclusiveWaiters = 9
Threads: 8b3b08b8-01<*>
Threads Waiting On Exclusive Access:
872935f0 8744cb30 87535da8 8755a6b0
8588dba8 8a446c10 85891c50 87250020
8a6e7da8
The thread 8b3b08b8 blocked other 9 threads and had the following stack trace:
0: kd> !thread 8b3b08b8 1f
THREAD 8b3b08b8 Cid 0004.002c Teb: 00000000 Win32Thread: 00000000 READY
Not impersonating
DeviceMap e1009248
Owning Process 8b3b2830 Image: System
Wait Start TickCount 44419 Ticks: 8744 (0:00:02:16.625)
Context Switch Count 4579
UserTime 00:00:00.000
KernelTime 00:00:01.109
Start Address nt!ExpWorkerThread (0x8053867e)
Stack Init bad00000 Current bacffcb0 Base bad00000 Limit bacfd000 Call 0
Priority 15 BasePriority 12 PriorityDecrement 3 DecrementCount 16
ChildEBP RetAddr
bacffcc8 804fd2c9 nt!KiUnlockDispatcherDatabase+0x9e
bacffcdc 8052a16f nt!KeSetSystemAffinityThread+0×5b
bacffd04 805caf03 nt!PopCompositeBatteryUpdateThrottleLimit+0×2d
bacffd24 805ca767 nt!PopCompositeBatteryDeviceHandler+0×1c5
bacffd3c 80529d3b nt!PopPolicyWorkerMain+0×25
bacffd7c 8053876d nt!PopPolicyWorkerThread+0xbf
bacffdac 805cff64 nt!ExpWorkerThread+0xef
bacffddc 805460de nt!PspSystemThreadStartup+0×34
00000000 00000000 nt!KiThreadStartup+0×16
Note this function and its first parameter:
0: kd> !thread 8b3b08b8
[...]
bacffcdc 8052a16f 00000002 8a5b8cd8 00000030 nt!KeSetSystemAffinityThread+0×5b
[…]
The first parameter is KAFFINITY mask and 0×2 is 0y10 (binary) which is the second core. This thread had been already set to run on that core only:
0: kd> dt _KTHREAD 8b3b08b8
nt!_KTHREAD
+0x000 Header : _DISPATCHER_HEADER
[...]
+0×124 Affinity : 2
[…]
Let’s look at our second core:
0: kd> ~1s
1: kd> kL 100
ChildEBP RetAddr
a8f00618 acd21947 hal!KeAcquireInStackQueuedSpinLock+0x43
a8f00618 acd21947 tcpip!IndicateData+0x98
a8f00684 acd173e5 tcpip!IndicateData+0x98
a8f0070c acd14ef5 tcpip!TCPRcv+0xbb0
a8f0076c acd14b19 tcpip!DeliverToUser+0x18e
a8f007e8 acd14836 tcpip!DeliverToUserEx+0x95e
a8f008a0 acd13928 tcpip!IPRcvPacket+0x6cb
a8f008e0 acd13853 tcpip!ARPRcvIndicationNew+0x149
a8f0091c ba56be45 tcpip!ARPRcvPacket+0x68
a8f00970 b635801d NDIS!ethFilterDprIndicateReceivePacket+0x307
a8f00984 b63581b4 psched!PsFlushReceiveQueue+0x15
a8f009a8 b63585f9 psched!PsEnqueueReceivePacket+0xda
a8f009c0 ba56c8ed psched!ClReceiveComplete+0x13
a8f009d8 b7defdb5 NDIS!EthFilterDprIndicateReceiveComplete+0x7c
a8f00a08 b7df0f78 driverA+0x17db5
a8f00a64 ba55ec2c driverA+0x18f78
a8f00a88 b6b0962c NDIS!ndisMSendCompleteX+0x8d
a8f00a9c b6b0a36d driverB+0x62c
a8f00ab8 ba55e88f driverB+0x136d
a8f00ae0 b7de003c NDIS!NdisReturnPackets+0xe9
a8f00af0 ba55e88f driverA+0x803c
a8f00b18 b6358061 NDIS!NdisReturnPackets+0xe9
a8f00b30 ba55e88f psched!MpReturnPacket+0x3b
a8f00b58 acc877cc NDIS!NdisReturnPackets+0xe9
87749da0 00000000 afd!AfdReturnBuffer+0xe1
1: kd> r
eax=a8f005f8 ebx=a8f00624 ecx=8a9862ed edx=a8f00b94 esi=874e2ed0 edi=8a9862d0
eip=806e6a33 esp=a8f005ec ebp=a8f00618 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
hal!KeAcquireInStackQueuedSpinLock+0x43:
806e6a33 74ee je hal!KeAcquireInStackQueuedSpinLock+0x33 (806e6a23) [br=0]
1: kd> !running
System Processors 3 (affinity mask)
Idle Processors 1
Prcb Current Next
1 bab38120 8a0c8ae8 8b3a7318 …………….
We see the thread 8a0c8ae8 had been spinning on the second core for more than 2 minutes:
1: kd> !thread 8a0c8ae8 1f
THREAD 8a0c8ae8 Cid 0660.0124 Teb: 7ffd7000 Win32Thread: e338c498 RUNNING on processor 1
IRP List:
8a960008: (0006,01b4) Flags: 00000900 Mdl: 87535908
Not impersonating
DeviceMap e2f155b8
Owning Process 87373020 Image: APPLICATION.EXE
Wait Start TickCount 43918 Ticks: 9245 (0:00:02:24.453)
Context Switch Count 690 LargeStack
UserTime 00:00:00.000
KernelTime 00:02:24.453
[…]
Its kernel time looks consistent with the starved thread waiting time:
0: kd> !thread 8b3b08b8 1f
THREAD 8b3b08b8 Cid 0004.002c Teb: 00000000 Win32Thread: 00000000 READY
Not impersonating
DeviceMap e1009248
Owning Process 8b3b2830 Image: System
Wait Start TickCount 44419 Ticks: 8744 (0:00:02:16.625)
[…]
For comparison, the spinning thread has affinity mask 0y11 (0×3) which means that it can be scheduled to run on both cores:
0: kd> dt _KTHREAD 8a0c8ae8
nt!_KTHREAD
+0x000 Header : _DISPATCHER_HEADER
[...]
+0×124 Affinity : 3
[…]
- Dmitry Vostokov @ DumpAnalysis.org -