Crash Dump Analysis Patterns (Part 68)

CARE: Crash Analysis Report Environment

DATA (Dump Analysis + Trace Analysis) Facebook group
Please join the community of memory (dump) and trace analysis engineers. This group promotes scientific methods and memory dump-based worldview.

Twitter @ DumpAnalysis
You can now follow portal and blog news at DumpAnalysis on Twitter

LinkedIn Group Dr. Watson Enthusiasts
All about Dr. Watson errors and more. Get news, excerpts and progress reports about the forthcoming book The Science of Dr. Watson: An Illustrated History of Debugging (ISBN 978-1906717070)

2010 (0x7DA) - The Year of Dump Analysis
2011 (0x7DB) - 2020 (0x7E4) The Debugging Decade

Setting a thread affinity mask to a specific processor or core makes that thread running in a single processor environment from that thread point of view. It is always scheduled to run on that processor. This potentially creates a problem found on real single processor environments if the processor runs another higher priority thread (Thread Starvation pattern) or loops at dispatch level IRQL (Dispatch Level Spin pattern). Therefore I call this pattern Affine Thread.

Here is one example. Dual core laptop was hanging and kernel memory dump showed the following Wait Chain pattern:

Resource @ nt!PopPolicyLock (0x80563080)    Exclusively owned
    Contention Count = 32
    NumberOfExclusiveWaiters = 9
     Threads: 8b3b08b8-01<*>
     Threads Waiting On Exclusive Access:
              872935f0       8744cb30       87535da8       8755a6b0      
              8588dba8       8a446c10       85891c50       87250020      
              8a6e7da8

The thread 8b3b08b8 blocked other 9 threads and had the following stack trace:

0: kd> !thread 8b3b08b8 1f
THREAD 8b3b08b8  Cid 0004.002c  Teb: 00000000 Win32Thread: 00000000 READY
Not impersonating
DeviceMap                 e1009248
Owning Process            8b3b2830       Image:         System
Wait Start TickCount      44419          Ticks: 8744 (0:00:02:16.625)
Context Switch Count      4579            
UserTime                  00:00:00.000
KernelTime                00:00:01.109
Start Address nt!ExpWorkerThread (0x8053867e)
Stack Init bad00000 Current bacffcb0 Base bad00000 Limit bacfd000 Call 0
Priority 15 BasePriority 12 PriorityDecrement 3 DecrementCount 16
ChildEBP RetAddr 
bacffcc8 804fd2c9 nt!KiUnlockDispatcherDatabase+0x9e
bacffcdc 8052a16f nt!KeSetSystemAffinityThread+0×5b
bacffd04 805caf03 nt!PopCompositeBatteryUpdateThrottleLimit+0×2d
bacffd24 805ca767 nt!PopCompositeBatteryDeviceHandler+0×1c5
bacffd3c 80529d3b nt!PopPolicyWorkerMain+0×25
bacffd7c 8053876d nt!PopPolicyWorkerThread+0xbf
bacffdac 805cff64 nt!ExpWorkerThread+0xef
bacffddc 805460de nt!PspSystemThreadStartup+0×34
00000000 00000000 nt!KiThreadStartup+0×16

Note this function and its first parameter: 

0: kd> !thread 8b3b08b8
[...]
bacffcdc 8052a16f 00000002 8a5b8cd8 00000030 nt!KeSetSystemAffinityThread+0×5b
[…]

The first parameter is KAFFINITY mask and 0×2 is 0y10 (binary) which is the second core. This thread had been already set to run on that core only:

0: kd> dt _KTHREAD 8b3b08b8
nt!_KTHREAD
   +0x000 Header           : _DISPATCHER_HEADER
[...]
   +0×124 Affinity         : 2
[…]
 

Let’s look at our second core:

0: kd> ~1s

1: kd> kL 100
ChildEBP RetAddr 
a8f00618 acd21947 hal!KeAcquireInStackQueuedSpinLock+0x43
a8f00618 acd21947 tcpip!IndicateData+0x98
a8f00684 acd173e5 tcpip!IndicateData+0x98
a8f0070c acd14ef5 tcpip!TCPRcv+0xbb0
a8f0076c acd14b19 tcpip!DeliverToUser+0x18e
a8f007e8 acd14836 tcpip!DeliverToUserEx+0x95e
a8f008a0 acd13928 tcpip!IPRcvPacket+0x6cb
a8f008e0 acd13853 tcpip!ARPRcvIndicationNew+0x149
a8f0091c ba56be45 tcpip!ARPRcvPacket+0x68
a8f00970 b635801d NDIS!ethFilterDprIndicateReceivePacket+0x307
a8f00984 b63581b4 psched!PsFlushReceiveQueue+0x15
a8f009a8 b63585f9 psched!PsEnqueueReceivePacket+0xda
a8f009c0 ba56c8ed psched!ClReceiveComplete+0x13
a8f009d8 b7defdb5 NDIS!EthFilterDprIndicateReceiveComplete+0x7c
a8f00a08 b7df0f78 driverA+0x17db5
a8f00a64 ba55ec2c driverA+0x18f78
a8f00a88 b6b0962c NDIS!ndisMSendCompleteX+0x8d
a8f00a9c b6b0a36d driverB+0x62c
a8f00ab8 ba55e88f driverB+0x136d
a8f00ae0 b7de003c NDIS!NdisReturnPackets+0xe9
a8f00af0 ba55e88f driverA+0x803c
a8f00b18 b6358061 NDIS!NdisReturnPackets+0xe9
a8f00b30 ba55e88f psched!MpReturnPacket+0x3b
a8f00b58 acc877cc NDIS!NdisReturnPackets+0xe9
87749da0 00000000 afd!AfdReturnBuffer+0xe1

1: kd> r
eax=a8f005f8 ebx=a8f00624 ecx=8a9862ed edx=a8f00b94 esi=874e2ed0 edi=8a9862d0
eip=806e6a33 esp=a8f005ec ebp=a8f00618 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202
hal!KeAcquireInStackQueuedSpinLock+0x43:
806e6a33 74ee            je      hal!KeAcquireInStackQueuedSpinLock+0x33 (806e6a23) [br=0]

1: kd> !running

System Processors 3 (affinity mask)
  Idle Processors 1

     Prcb      Current   Next   
  1  bab38120  8a0c8ae8  8b3a7318  …………….

We see the thread 8a0c8ae8 had been spinning on the second core for more than 2 minutes:

1: kd> !thread 8a0c8ae8 1f
THREAD 8a0c8ae8  Cid 0660.0124  Teb: 7ffd7000 Win32Thread: e338c498 RUNNING on processor 1
IRP List:
    8a960008: (0006,01b4) Flags: 00000900  Mdl: 87535908
Not impersonating
DeviceMap                 e2f155b8
Owning Process            87373020       Image:         APPLICATION.EXE
Wait Start TickCount      43918          Ticks: 9245 (0:00:02:24.453)
Context Switch Count      690                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:02:24.453
[…]

Its kernel time looks consistent with the starved thread waiting time:

0: kd> !thread 8b3b08b8 1f
THREAD 8b3b08b8  Cid 0004.002c  Teb: 00000000 Win32Thread: 00000000 READY
Not impersonating
DeviceMap                 e1009248
Owning Process            8b3b2830       Image:         System
Wait Start TickCount      44419          Ticks: 8744 (0:00:02:16.625)
[…]

For comparison, the spinning thread has affinity mask 0y11 (0×3) which means that it can be scheduled to run on both cores:

0: kd> dt _KTHREAD 8a0c8ae8
nt!_KTHREAD
   +0x000 Header           : _DISPATCHER_HEADER
[...]
   +0×124 Affinity         : 3
[…]

- Dmitry Vostokov @ DumpAnalysis.org -

           

Announcements

Coming Soon:

Debugging Notebook: Essential Concepts, WinDbg Commands and Tools

Crash Dump Analysis for System Administrators and Support Engineers

New Magazines:

Debugged! MZ/PE: MagaZine for/from Practicing Engineers


New Books:

Memory Dump Analysis Anthology, Volume 3

First Fault Software Problem Solving: A Guide for Engineers, Managers and Users

x64 Windows Debugging: Practical Foundations

Also available:

Windows Debugging: Practical Foundations

DLL List Landscape: The Art from Computer Memory Space

Dumps, Bugs and Debugging Forensics: The Adventures of Dr. Debugalov

WinDbg: A Reference Poster and Learning Cards

Memory Dump Analysis Anthology, Volume 2

Memory Dump Analysis Anthology, Volume 1

New Children's Book:

Baby Turing

Leave a Reply