Multiple patterns in a dump: case study

Let’s now apply the knowledge of crash dump analysis patterns to the real memory dump coming from the hanging system. !locks WinDbg command is the most frequently used start command for such symptoms according to my observation which might be biased. It reveals deep Wait Chain pattern for executive resources:

1: kd> !locks
[...]
Resource @ 0x8b10c168    Exclusively owned
    Contention Count = 1950624
    NumberOfSharedWaiters = 2
    NumberOfExclusiveWaiters = 154
     Threads: 88256430-01<*> 882aedb0-01    8a348580-01   
     Threads Waiting On Exclusive Access:
              891fd368       894fd240       88382280       8921b9c0      
              8aa18db0       89790328       882b8818       88a70bf8      
              884a2780       88999818       8a13b020       8846a7a0      
              8a0b3020       8812e568       897b6db0       88a16440      
              8922c5f8       88bfe3b8       88264ac0       89ff2b40      
              8a9da020       881cf020       8807adb0       89d64598      
              887811d0       8822a850       88264820       88194738      
              8801f7a8       88284020       88628db0       8a071db0      
              884a84b0       88be46c0       89755b18       89700020      
              89ca4580       881cddb0       882f7020       88bcf9a0      
              8921b020       8826fdb0       88a73db0       88211020      
              8868a1c8       89121280       89e01020       895cedb0      
              88d03790       883941f8       8910f820       891ebc80      
              89862db0       88154af8       8821e7d0       881cedb0      
              8822b020       88094818       8a00b020       89e69020      
              880bbdb0       8945f690       8954c1c0       88d2cb90      
              881cd020       8921c448       89550540       8a5a5870      
              8a159228       893976c0       882847b8       89306578      
              880eb9a8       8978e020       882f72c0       8966c380      
              8a12f4b8       8815adb0       881a5020       897c7db0      
              8873ebf8       88674530       8831b468       88e999a0      
              88287020       8966e600       88541db0       8826d7a0      
              88119b10       8a226338       882f7810       888ba348      
              884b89a8       88d03db0       8826db00       8910adb0      
              881d8368       89288238       8a00adb0       89125db0      
              88eb50a0       88dbbdb0       880ed020       895cd5b0      
              881d4b00       88565db0       886e7780       884b86d8      
              8a603598       89383020       8826f370       886d2248      
              88cd1360       881d1888       88bef670       88117db0      
              890d63f0       894d0368       8826f850       89123020      
              88209020       8826fac0       88f9bdb0       89027478      
              894b8d18       882a7338       899b9020       897c3db0      
              8a13fc50       88b33d50       88b54b68       88652360      
              8a199020       8910dc98       8833a020       8a194a70      
              8a5af640       89b717a0       89464db0       8a152878      
              884773d8       88afe020       88debaf8       88bef2a0      
              88bd6948       89abddb0       8a133db0       88e0ebe8      
              88287398       889622e0       8836aab8       88daec80      
              88c5c450       88225718

[...]

Resource @ 0x8a316c98    Exclusively owned
    Contention Count = 40315
    NumberOfExclusiveWaiters = 2
     Threads: 893bd498-01<*>
     Threads Waiting On Exclusive Access:
              8846f9a8       88256430

[...]

Resource @ 0x8a2b3800    Exclusively owned
    Contention Count = 17735
    NumberOfExclusiveWaiters = 1
     Threads: 8a30ec80-01<*>
     Threads Waiting On Exclusive Access:
              893bd498

[...]

14606 total locks, 14 locks currently held

We have this chain: 154 threads -> 88256430 -> 893bd498 -> 8a30ec80

The same conclusion comes from !analyze -v -hang  command:

1: kd> !analyze -v -hang

[...]

Scanning for threads blocked on locks ...

CURRENT_IRQL:  2

BLOCKING_THREAD:  8a30ec80

LOCK_ADDRESS:  8a2b3800 -- (!locks 8a2b3800)

Resource @ 0x8a2b3800    Exclusively owned
    Contention Count = 17735
    NumberOfExclusiveWaiters = 1
     Threads: 8a30ec80-01<*>
     Threads Waiting On Exclusive Access:
              893bd498

BUGCHECK_STR:  LOCK_HELD

[...]

Let’s examine the thread 8a30ec80 that holds so many others:

1: kd> !thread 8a30ec80 1f
THREAD 8a30ec80  Cid 3ca0.20f0  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 2
Not impersonating
DeviceMap                 e1000930
Owning Process            8a254128       Image:         processA.exe
Wait Start TickCount      2024978        Ticks: 2291 (0:00:00:35.796)
Context Switch Count      339739            
UserTime                  00:00:00.000
KernelTime                01:08:29.484
Start Address driverA!WorkerRoutine (0xbfa4b850)
Stack Init b7409000 Current b7407e74 Base b7409000 Limit b7406000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr 
f77353a0 bfa4b880 driverA!DoProcessing+0×256
f77353a4 ff180010 driverA!WorkerRoutine+0×30
[…]

We see that this thread is running on processor 2 and the time spent in kernel is more than one hour. Seems we have the case of Spiking Thread pattern here. Also the thread is not waiting and seems to be moving some data:

1: kd> .thread 8a30ec80
Implicit thread is now 8a30ec80

1: kd> r
Last set context:
eax=baec2950 ebx=00000000 ecx=00001b4a edx=00002275 esi=bae8c010 edi=ba01a018
eip=bfa3b68c esp=f77353a4 ebp=b7407f44 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
driverA!DoProcessing+0×256:
bfa3b68c f3a5 rep movs dword ptr es:[edi],dword ptr [esi]

This driver was also known to have pool allocation problems resulting in various strange system behaviour so let’s inspect the output of !vm command to see whether we have an instance of Insufficient Memory pattern:

1: kd> !vm

*** Virtual Memory Usage ***
 Physical Memory:     1048242 (   4192968 Kb)
 Page File: \??\C:\pagefile.sys
   Current:   4456448 Kb  Free Space:   3483860 Kb
   Minimum:   4456448 Kb  Maximum:      4456448 Kb
 Available Pages:      409879 (   1639516 Kb)
 ResAvail Pages:       942398 (   3769592 Kb)
 Locked IO Pages:         124 (       496 Kb)
 Free System PTEs:     182782 (    731128 Kb)
 Free NP PTEs:          32766 (    131064 Kb)
 Free Special NP:           0 (         0 Kb)
 Modified Pages:           29 (       116 Kb)
 Modified PF Pages:        29 (       116 Kb)
 NonPagedPool Usage:    13567 (     54268 Kb)
 NonPagedPool Max:      65279 (    261116 Kb)
 PagedPool 0 Usage:     12350 (     49400 Kb)
 PagedPool 1 Usage:      2442 (      9768 Kb)
 PagedPool 2 Usage:      2457 (      9828 Kb)
 PagedPool 3 Usage:      2395 (      9580 Kb)
 PagedPool 4 Usage:      2465 (      9860 Kb)
 PagedPool Usage:       22109 (     88436 Kb)
 PagedPool Maximum:     64512 (    258048 Kb)

 ********** 3 pool allocations have failed **********

 Shared Commit:        133470 (    533880 Kb)
 Special Pool:              0 (         0 Kb)
 Shared Process:        24993 (     99972 Kb)
 PagedPool Commit:      22173 (     88692 Kb)
 Driver Commit:          2056 (      8224 Kb)
 Committed pages:      938909 (   3755636 Kb)
 Commit limit:        2119027 (   8476108 Kb)

[...]

We have signs here but the current size of paged pool and nonpaged pool seems to be very far from their maximum. Perhaps there were failures in session pool allocations? Let’s look at session pool:

1: kd> !vm 4
[...]
 Session ID 5 @ f79f3000:
 Paged Pool Usage:       35640K

 *** 6 Pool Allocation Failures ***

 Commit Usage:           36900K

We see the problem in session 5 and let’s see in which session our processA.exe was running:

1: kd> !process 8a254128 0
PROCESS 8a254128  SessionId: 15  Cid: 3ca0    Peb: 7ffd7000  ParentCid: 01ac
    DirBase: bff47800  ObjectTable: e779f5c8  HandleCount: 161.
    Image: processA.exe

It was session 15 and therefore we might conclude that previous problems with driverA are not connected to this new one. The identified problem is CPU spike. Perhaps the code contains a bug that causes this driver to loop indefinitely.

- Dmitry Vostokov @ DumpAnalysis.org -

Leave a Reply

You must be logged in to post a comment.