Let’s now apply the knowledge of crash dump analysis patterns to the real memory dump coming from the hanging system. !locks WinDbg command is the most frequently used start command for such symptoms according to my observation which might be biased. It reveals deep Wait Chain pattern for executive resources:
1: kd> !locks
[...]
Resource @ 0x8b10c168 Exclusively owned
Contention Count = 1950624
NumberOfSharedWaiters = 2
NumberOfExclusiveWaiters = 154
Threads: 88256430-01<*> 882aedb0-01 8a348580-01
Threads Waiting On Exclusive Access:
891fd368 894fd240 88382280 8921b9c0
8aa18db0 89790328 882b8818 88a70bf8
884a2780 88999818 8a13b020 8846a7a0
8a0b3020 8812e568 897b6db0 88a16440
8922c5f8 88bfe3b8 88264ac0 89ff2b40
8a9da020 881cf020 8807adb0 89d64598
887811d0 8822a850 88264820 88194738
8801f7a8 88284020 88628db0 8a071db0
884a84b0 88be46c0 89755b18 89700020
89ca4580 881cddb0 882f7020 88bcf9a0
8921b020 8826fdb0 88a73db0 88211020
8868a1c8 89121280 89e01020 895cedb0
88d03790 883941f8 8910f820 891ebc80
89862db0 88154af8 8821e7d0 881cedb0
8822b020 88094818 8a00b020 89e69020
880bbdb0 8945f690 8954c1c0 88d2cb90
881cd020 8921c448 89550540 8a5a5870
8a159228 893976c0 882847b8 89306578
880eb9a8 8978e020 882f72c0 8966c380
8a12f4b8 8815adb0 881a5020 897c7db0
8873ebf8 88674530 8831b468 88e999a0
88287020 8966e600 88541db0 8826d7a0
88119b10 8a226338 882f7810 888ba348
884b89a8 88d03db0 8826db00 8910adb0
881d8368 89288238 8a00adb0 89125db0
88eb50a0 88dbbdb0 880ed020 895cd5b0
881d4b00 88565db0 886e7780 884b86d8
8a603598 89383020 8826f370 886d2248
88cd1360 881d1888 88bef670 88117db0
890d63f0 894d0368 8826f850 89123020
88209020 8826fac0 88f9bdb0 89027478
894b8d18 882a7338 899b9020 897c3db0
8a13fc50 88b33d50 88b54b68 88652360
8a199020 8910dc98 8833a020 8a194a70
8a5af640 89b717a0 89464db0 8a152878
884773d8 88afe020 88debaf8 88bef2a0
88bd6948 89abddb0 8a133db0 88e0ebe8
88287398 889622e0 8836aab8 88daec80
88c5c450 88225718
[...]
Resource @ 0x8a316c98 Exclusively owned
Contention Count = 40315
NumberOfExclusiveWaiters = 2
Threads: 893bd498-01<*>
Threads Waiting On Exclusive Access:
8846f9a8 88256430
[...]
Resource @ 0x8a2b3800 Exclusively owned
Contention Count = 17735
NumberOfExclusiveWaiters = 1
Threads: 8a30ec80-01<*>
Threads Waiting On Exclusive Access:
893bd498
[...]
14606 total locks, 14 locks currently held
We have this chain: 154 threads -> 88256430 -> 893bd498 -> 8a30ec80
The same conclusion comes from !analyze -v -hang command:
1: kd> !analyze -v -hang
[...]
Scanning for threads blocked on locks ...
CURRENT_IRQL: 2
BLOCKING_THREAD: 8a30ec80
LOCK_ADDRESS: 8a2b3800 -- (!locks 8a2b3800)
Resource @ 0x8a2b3800 Exclusively owned
Contention Count = 17735
NumberOfExclusiveWaiters = 1
Threads: 8a30ec80-01<*>
Threads Waiting On Exclusive Access:
893bd498
BUGCHECK_STR: LOCK_HELD
[...]
Let’s examine the thread 8a30ec80 that holds so many others:
1: kd> !thread 8a30ec80 1f
THREAD 8a30ec80 Cid 3ca0.20f0 Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 2
Not impersonating
DeviceMap e1000930
Owning Process 8a254128 Image: processA.exe
Wait Start TickCount 2024978 Ticks: 2291 (0:00:00:35.796)
Context Switch Count 339739
UserTime 00:00:00.000
KernelTime 01:08:29.484
Start Address driverA!WorkerRoutine (0xbfa4b850)
Stack Init b7409000 Current b7407e74 Base b7409000 Limit b7406000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr
f77353a0 bfa4b880 driverA!DoProcessing+0×256
f77353a4 ff180010 driverA!WorkerRoutine+0×30
[…]
We see that this thread is running on processor 2 and the time spent in kernel is more than one hour. Seems we have the case of Spiking Thread pattern here. Also the thread is not waiting and seems to be moving some data:
1: kd> .thread 8a30ec80
Implicit thread is now 8a30ec80
1: kd> r
Last set context:
eax=baec2950 ebx=00000000 ecx=00001b4a edx=00002275 esi=bae8c010 edi=ba01a018
eip=bfa3b68c esp=f77353a4 ebp=b7407f44 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
driverA!DoProcessing+0×256:
bfa3b68c f3a5 rep movs dword ptr es:[edi],dword ptr [esi]
This driver was also known to have pool allocation problems resulting in various strange system behaviour so let’s inspect the output of !vm command to see whether we have an instance of Insufficient Memory pattern:
1: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 1048242 ( 4192968 Kb)
Page File: \??\C:\pagefile.sys
Current: 4456448 Kb Free Space: 3483860 Kb
Minimum: 4456448 Kb Maximum: 4456448 Kb
Available Pages: 409879 ( 1639516 Kb)
ResAvail Pages: 942398 ( 3769592 Kb)
Locked IO Pages: 124 ( 496 Kb)
Free System PTEs: 182782 ( 731128 Kb)
Free NP PTEs: 32766 ( 131064 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 29 ( 116 Kb)
Modified PF Pages: 29 ( 116 Kb)
NonPagedPool Usage: 13567 ( 54268 Kb)
NonPagedPool Max: 65279 ( 261116 Kb)
PagedPool 0 Usage: 12350 ( 49400 Kb)
PagedPool 1 Usage: 2442 ( 9768 Kb)
PagedPool 2 Usage: 2457 ( 9828 Kb)
PagedPool 3 Usage: 2395 ( 9580 Kb)
PagedPool 4 Usage: 2465 ( 9860 Kb)
PagedPool Usage: 22109 ( 88436 Kb)
PagedPool Maximum: 64512 ( 258048 Kb)
********** 3 pool allocations have failed **********
Shared Commit: 133470 ( 533880 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 24993 ( 99972 Kb)
PagedPool Commit: 22173 ( 88692 Kb)
Driver Commit: 2056 ( 8224 Kb)
Committed pages: 938909 ( 3755636 Kb)
Commit limit: 2119027 ( 8476108 Kb)
[...]
We have signs here but the current size of paged pool and nonpaged pool seems to be very far from their maximum. Perhaps there were failures in session pool allocations? Let’s look at session pool:
1: kd> !vm 4
[...]
Session ID 5 @ f79f3000:
Paged Pool Usage: 35640K
*** 6 Pool Allocation Failures ***
Commit Usage: 36900K
We see the problem in session 5 and let’s see in which session our processA.exe was running:
1: kd> !process 8a254128 0
PROCESS 8a254128 SessionId: 15 Cid: 3ca0 Peb: 7ffd7000 ParentCid: 01ac
DirBase: bff47800 ObjectTable: e779f5c8 HandleCount: 161.
Image: processA.exe
It was session 15 and therefore we might conclude that previous problems with driverA are not connected to this new one. The identified problem is CPU spike. Perhaps the code contains a bug that causes this driver to loop indefinitely.
- Dmitry Vostokov @ DumpAnalysis.org -