Archive for the ‘Complete Memory Dump Analysis’ Category

Webinar: Fundamentals of Complete Crash and Hang Memory Dump Analysis (Second Session)

Wednesday, August 18th, 2010

Do to the attendee limit not all registered for the first session were able to attend. The second session is available:

Date: 23rd of August 2010
Time: 19:00 (BST) 14:00 (Eastern) 11:00 (Pacific)
Duration: 90 minutes

Space is limited.
Reserve your Webinar seat now at:
https://www1.gotomeeting.com/register/823155577

After the second session webinar slides will be published and later a Q&A page will be compiled.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

ALPC wait chains, missing threads, waiting thread time and semantic process coupling: pattern cooperation

Monday, August 16th, 2010

In a complete memory dump we could see ALPC wait chains leading to ServiceA.exe process with a queue of 372 messages. Additionally we could also see ServiceB.exe process waiting for ServiceC.exe with the latter having a queue of 201 messages. Threads that were supposed to process some messages didn’t exist. ServiceC process had a thread that was waiting for ServiceA.exe as well. But there was no any indication for a thread-2-thread deadlock. We could also see that threads waiting for ServiceA.exe sometimes had the greater waiting time than threads waiting for ServiceC so it could be the case that the problem initially started with ServiceA.exe. However, after more thorough analysis we could also see several terminating ApplicationD.exe processes with just one thread waiting in ModuleE with the waiting time exceeding the waiting time of blocked threads waiting for ServiceA and ServiceC. Because of semantic process coupling between ServiceA and ApplicationD it was decided that ModuleE was responsible and its vendor was contacted for updates.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Forthcoming Webinar: Fundamentals of Complete Crash and Hang Memory Dump Analysis

Sunday, July 18th, 2010

Complete Memory Dump Analysis Logo

Memory Dump Analysis Services (DumpAnalysis.com) organizes a free webinar

Date: 18th of August 2010
Time: 21:00 (BST) 16:00 (Eastern) 13:00 (Pacific)
Duration: 90 minutes

Topics include:

- User vs. kernel vs. physical (complete) memory space
- Challenges of complete memory dump analysis
- Common WinDbg commands
- Patterns
- Common mistakes
- Fiber bundles
- Hands-on exercise: a complete memory dump analysis
- A guide to DumpAnalysis.org case studies

Prerequisites: working knowledge of basic user process and kernel memory dump analysis or live debugging using WinDbg 

The webinar link will be posted before 18th of August on DumpAnalysis.com

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Stack trace collection, special process, LPC and critical section wait chains, blocked thread, coupled machines, thread waiting time and IRP distribution anomaly: pattern cooperation

Sunday, July 18th, 2010

It was reported that new remote sessions couldn’t be created. A complete memory dump stack trace collection log lists a special process that would not normally be present in a fully initialized session: userinit.exe. One of its threads is blocked waiting for an LPC response:

kd> !process 0 ff
**** NT ACTIVE PROCESS DUMP ****

[...]

PROCESS 89cf4870  SessionId: 0  Cid: 0fa4    Peb: 7ffde000  ParentCid: 0228
    DirBase: 3c9e6000  ObjectTable: e1627410  HandleCount:  81.
    Image: userinit.exe
    VadRoot 89a80168 Vads 161 Clone 0 Private 517. Modified 4. Locked 0.
    DeviceMap e1003170
    Token                             e1575030
    ElapsedTime                       05:34:29.046
    UserTime                          00:00:00.046
    KernelTime                        00:00:00.234
    QuotaPoolUsage[PagedPool]         42916
    QuotaPoolUsage[NonPagedPool]      7176
    Working Set Sizes (now,min,max)  (1289, 50, 345) (5156KB, 200KB, 1380KB)
    PeakWorkingSetSize                1291
    VirtualSize                       33 Mb
    PeakVirtualSize                   34 Mb
    PageFaultCount                    1411
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      866

[...]

THREAD 89d475a8  Cid 0fa4.0f48  Teb: 7ffda000 Win32Thread: 00000000 WAIT: (WrLpcReply) UserMode Non-Alertable
    89d4779c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 0000acfd:
Current LPC port e5501c28
Not impersonating
DeviceMap                 e1003170
Owning Process            0       Image:         <Unknown>
Attached Process          89cf4870       Image:         userinit.exe
Wait Start TickCount      1845699        Ticks: 1284370 (0:05:34:28.281)
Context Switch Count      149            
UserTime                  00:00:00.000
KernelTime                00:00:00.093
Win32 Start Address PAUTOENR!AEMainThreadProc (0×5e95a798)
Start Address kernel32!BaseThreadStartThunk (0×7c8106f9)
Stack Init b88a1000 Current b88a0c50 Base b88a1000 Limit b889e000 Call 0
Priority 7 BasePriority 7 PriorityDecrement 0 DecrementCount 0
Kernel stack not resident.
ChildEBP RetAddr 
b88a0c68 804e1bf2 nt!KiSwapContext+0×2f
b88a0c74 804e1c3e nt!KiSwapThread+0×8a
b88a0c9c 8057d073 nt!KeWaitForSingleObject+0×1c2
b88a0d50 804dd99f nt!NtRequestWaitReplyPort+0×63d
b88a0d50 7c90e514 nt!KiFastCallEntry+0xfc (TrapFrame @ b88a0d64)
00a8f064 7c90daea ntdll!KiFastSystemCallRet
00a8f068 77e7cac1 ntdll!ZwRequestWaitReplyPort+0xc
00a8f0b4 77e7a33e RPCRT4!LRPC_CCALL::SendReceive+0×228
00a8f0c0 77e7a36f RPCRT4!I_RpcSendReceive+0×24
00a8f0d4 77ef4675 RPCRT4!NdrSendReceive+0×2b
00a8f4b0 76f235e7 RPCRT4!NdrClientCall2+0×222
00a8f4c4 76f2357b DNSAPI!R_ResolverQuery+0×1b
00a8f520 71a526c6 DNSAPI!DnsQuery_W+0×14f
00a8f554 71a5266f mswsock!HostentBlob_Query+0×29
00a8f580 71a51b0a mswsock!Rnr_DoDnsLookup+0×7d
00a8f9c8 71ab32b0 mswsock!NSPLookupServiceNext+0×533
00a8f9e0 71ab3290 WS2_32!NSPROVIDER::NSPLookupServiceNext+0×17
00a8f9fc 71ab325a WS2_32!NSPROVIDERSTATE::LookupServiceNext+0×1c
00a8fa28 71ab31f8 WS2_32!NSQUERY::LookupServiceNext+0xae
00a8fa48 76f775eb WS2_32!WSALookupServiceNextW+0×78
00a8faec 76f6a9d2 WLDAP32!GetHostByNameW+0xef
00a8fb38 76f6667b WLDAP32!OpenLdapServer+0×435
00a8fb58 76f6fb05 WLDAP32!LdapConnect+0×169
00a8fef8 76f704f3 WLDAP32!LdapBind+0×34
00a8ff20 5e95651a WLDAP32!ldap_bind_sW+0×2c
00a8ff68 5e95a887 PAUTOENR!AERobustLdapBind+0xc9
00a8ffb4 7c80b729 PAUTOENR!AEMainThreadProc+0xef
00a8ffec 00000000 kernel32!BaseThreadStart+0×37

We start following the LPC wait chain

kd> !lpc message 0000acfd
Searching message acfd in threads …
    Server thread 89a80328 is working on message acfd                        
Client thread 89d475a8 waiting a reply from acfd                         
Searching thread 89d475a8 in port rundown queues …

Server communication port 0xe12fc438
    Handles: 1   References: 1
    The LpcDataInfoChainHead queue is empty
        Connected port: 0xe5501c28      Server connection port: 0xe1640798

Client communication port 0xe5501c28
    Handles: 1   References: 2
    The LpcDataInfoChainHead queue is empty

Server connection port e1640798  Name: DNSResolver
    Handles: 1   References: 17
    Server process  : 899a0020 (svchost.exe)
    Queue semaphore : 89dabdf0
    Semaphore state 0 (0x0)
    The message queue is empty
    The LpcDataInfoChainHead queue is empty
Done.

kd> !thread 89a80328 1f
THREAD 89a80328  Cid 03c8.0644  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (WrLpcReply) UserMode Non-Alertable
    89a8051c  Semaphore Limit 0×1
Waiting for reply to LPC MessageId 0000acfe:
Current LPC port e10b6b00
Not impersonating
DeviceMap                 e1b093c8
Owning Process            0       Image:         <Unknown>
Attached Process          899a0020       Image:         svchost.exe
Wait Start TickCount      1845699        Ticks: 1284370 (0:05:34:28.281)
Context Switch Count      1208            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×0000acfd
LPC Server thread working on message Id acfd
Start Address kernel32!BaseThreadStartThunk (0×7c8106f9)
Stack Init b7a33000 Current b7a32c50 Base b7a33000 Limit b7a30000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0
Kernel stack not resident.
ChildEBP RetAddr 
b7a32c68 804e1bf2 nt!KiSwapContext+0×2f
b7a32c74 804e1c3e nt!KiSwapThread+0×8a
b7a32c9c 8057d073 nt!KeWaitForSingleObject+0×1c2
b7a32d50 804dd99f nt!NtRequestWaitReplyPort+0×63d
b7a32d50 7c90e514 nt!KiFastCallEntry+0xfc (TrapFrame @ b7a32d64)
00a9eb3c 7c90daea ntdll!KiFastSystemCallRet
00a9eb40 77e7cac1 ntdll!ZwRequestWaitReplyPort+0xc
00a9eb8c 77e7a33e RPCRT4!LRPC_CCALL::SendReceive+0×228
00a9eb98 77e7a36f RPCRT4!I_RpcSendReceive+0×24
00a9ebac 77ef4675 RPCRT4!NdrSendReceive+0×2b
00a9ef88 662e0c48 RPCRT4!NdrClientCall2+0×222
00a9ef9c 662dafa9 hnetcfg!FwOpenDynamicFwPort+0×1b
00a9f048 71a55025 hnetcfg!IcfOpenDynamicFwPort+0×6a
00a9f0e4 71a590f2 mswsock!WSPBind+0×332
00a9f200 71ab2fd7 mswsock!WSPSendTo+0×230
00a9f250 76f252c0 WS2_32!sendto+0×88
00a9f280 76f251ea DNSAPI!SendMessagePrivate+0×18d
00a9f2a0 76f2517c DNSAPI!SendUsingServerInfo+0×33
00a9f2c8 76f25436 DNSAPI!SendUdpToNextDnsServers+0×80
00a9f314 76f24dec DNSAPI!Dns_SendAndRecvUdp+0×121
00a9f34c 76f24d20 DNSAPI!Dns_SendAndRecv+0×7b
00a9f37c 76f24a7d DNSAPI!Query_SingleName+0×8b
00a9f3b0 7677373a DNSAPI!Query_Main+0×11a
00a9f3c8 7677303f dnsrslvr!ResolverQuery+0×48
00a9f8bc 77e799f4 dnsrslvr!R_ResolverQuery+0×111
00a9f8e4 77ef421a RPCRT4!Invoke+0×30
00a9fcf4 77ef46ee RPCRT4!NdrStubCall2+0×297
00a9fd10 77e794bd RPCRT4!NdrServerCall2+0×19
00a9fd44 77e79422 RPCRT4!DispatchToStubInC+0×38
00a9fd98 77e7934e RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×113
00a9fdbc 77e7be64 RPCRT4!RPC_INTERFACE::DispatchToStub+0×84
00a9fdf8 77e7bcc1 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×2db
00a9fe1c 77e7bc05 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×16d
00a9ff80 77e76caf RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×310
00a9ff88 77e76ad1 RPCRT4!RecvLotsaCallsWrapper+0xd
00a9ffa8 77e76c97 RPCRT4!BaseCachedThreadRoutine+0×79
00a9ffb4 7c80b729 RPCRT4!ThreadStartRoutine+0×1a
00a9ffec 00000000 kernel32!BaseThreadStart+0×37

We notice that an endpoint is blocked waiting for a critical section:

kd> !lpc message 0000acfe
Searching message acfe in threads …
    Server thread 89b452e8 is working on message acfe
Client thread 89a80328 waiting a reply from acfe                         
Searching thread 89a80328 in port rundown queues …

Server communication port 0xe11152f8
    Handles: 1   References: 1
    The LpcDataInfoChainHead queue is empty
        Connected port: 0xe10b6b00      Server connection port: 0xe1633380

Client communication port 0xe10b6b00
    Handles: 1   References: 4
    The LpcDataInfoChainHead queue is empty

Server connection port e1633380  Name: trkwks
    Handles: 1   References: 19
    Server process  : 89a20858 (svchost.exe)
    Queue semaphore : 89af47e8
    Semaphore state 0 (0x0)
    The message queue is empty
    The LpcDataInfoChainHead queue is empty
Done.

kd> !thread 89b452e8 1f
THREAD 89b452e8  Cid 03a8.0a28  Teb: 7ff94000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Non-Alertable
    89b466c0  SynchronizationEvent
IRP List:
    89b49008: (0006,01d8) Flags: 00000030  Mdl: 00000000
Not impersonating
DeviceMap                 e1003170
Owning Process            0       Image:         <Unknown>
Attached Process          89a20858       Image:         svchost.exe
Wait Start TickCount      1845699        Ticks: 1284370 (0:05:34:28.281)
Context Switch Count      5            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×0000acfe
LPC Server thread working on message Id acfe
Start Address kernel32!BaseThreadStartThunk (0×7c8106f9)
Stack Init b88dd000 Current b88dcca0 Base b88dd000 Limit b88da000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0 DecrementCount 0
Kernel stack not resident.
ChildEBP RetAddr 
b88dccb8 804e1bf2 nt!KiSwapContext+0×2f
b88dccc4 804e1c3e nt!KiSwapThread+0×8a
b88dccec 8056dff6 nt!KeWaitForSingleObject+0×1c2
b88dcd50 804dd99f nt!NtWaitForSingleObject+0×9a
b88dcd50 7c90e514 nt!KiFastCallEntry+0xfc (TrapFrame @ b88dcd64)
036ef714 7c90df5a ntdll!KiFastSystemCallRet
036ef718 7c91b24b ntdll!ZwWaitForSingleObject+0xc
036ef7a0 7c901046 ntdll!RtlpWaitForCriticalSection+0×132
036ef7a8 6648a33b ntdll!RtlEnterCriticalSection+0×46

036ef7b0 6648c2ed ipnathlp!FwLock+0xa
036ef808 6648c705 ipnathlp!FwDynPortAdd+0×1d
036ef8c4 77e799f4 ipnathlp!FwOpenDynamicFwPort+0×125
036ef8e8 77ef421a RPCRT4!Invoke+0×30
036efcf4 77ef46ee RPCRT4!NdrStubCall2+0×297
036efd10 77e794bd RPCRT4!NdrServerCall2+0×19
036efd44 77e79422 RPCRT4!DispatchToStubInC+0×38
036efd98 77e7934e RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×113
036efdbc 77e7be64 RPCRT4!RPC_INTERFACE::DispatchToStub+0×84
036efdf8 77e7bcc1 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×2db
036efe1c 77e7bc05 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×16d
036eff80 77e76caf RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×310
036eff88 77e76ad1 RPCRT4!RecvLotsaCallsWrapper+0xd
036effa8 77e76c97 RPCRT4!BaseCachedThreadRoutine+0×79
036effb4 7c80b729 RPCRT4!ThreadStartRoutine+0×1a
036effec 00000000 kernel32!BaseThreadStart+0×37

In order to get a critical section wait chain starting from the above thread we need to set the process context, use !cs WinDbg command, then walk thread stack trace parameters:

kd> .process /r /p 89a20858
Implicit process is now 89a20858

kd> !cs -l -o -s
-----------------------------------------
DebugInfo          = 0x7c97e500
Critical section   = 0x7c980600 (ntdll!FastPebLock+0x0)
LOCKED
LockCount          = 0x10
OwningThread       = 0x000004a8
RecursionCount     = 0x1
LockSemaphore      = 0xC20
SpinCount          = 0x00000000
OwningThread       = .thread 89cd9c10
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
-----------------------------------------
DebugInfo          = 0x000d7f08
Critical section   = 0x01e700d4 (+0x1E700D4)
LOCKED
LockCount          = 0x0
OwningThread       = 0x000001b8
RecursionCount     = 0x1
LockSemaphore      = 0x0
SpinCount          = 0x00000000
OwningThread       = .thread 89b3b348
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
-----------------------------------------
DebugInfo          = 0x000d96e0
Critical section   = 0x767e406c (w32time!g_state+0x24)
LOCKED
LockCount          = 0x3
OwningThread       = 0x00000f70
RecursionCount     = 0x2
LockSemaphore      = 0x7FC
SpinCount          = 0x00000000
OwningThread       = .thread 89a6a268
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
-----------------------------------------
DebugInfo          = 0x000e74f0
Critical section   = 0x01e70cc8 (+0x1E70CC8)
LOCKED
LockCount          = 0x2
OwningThread       = 0x00000514
RecursionCount     = 0x1
LockSemaphore      = 0xBA8
SpinCount          = 0x00000000
OwningThread       = .thread 8996a338
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
-----------------------------------------
DebugInfo          = 0x00103d58
Critical section   = 0x0272a8b4 (+0x272A8B4)
LOCKED
LockCount          = 0x0
OwningThread       = 0x00000d38
RecursionCount     = 0x1
LockSemaphore      = 0x0
SpinCount          = 0x00000000
OwningThread       = .thread 89912860
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
-----------------------------------------
DebugInfo          = 0x0010e8f0
Critical section   = 0x664a3fe0 (ipnathlp!gFwMain+0x0)
LOCKED
LockCount          = 0x6
OwningThread       = 0x000009e8
RecursionCount     = 0x1
LockSemaphore      = 0xC48
SpinCount          = 0x00000000
OwningThread       = .thread 898aa600
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
-----------------------------------------
DebugInfo          = 0x0010a7d8
Critical section   = 0x00138cd0 (+0x138CD0)
LOCKED
LockCount          = 0x0
OwningThread       = 0x00000510
RecursionCount     = 0x1
LockSemaphore      = 0x0
SpinCount          = 0x00000000
OwningThread       = .thread 89a2eda8
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
-----------------------------------------
DebugInfo          = 0x00109cb0
Critical section   = 0x02750f18 (+0x2750F18)
LOCKED
LockCount          = 0x0
OwningThread       = 0x00000c84
RecursionCount     = 0x1
LockSemaphore      = 0x0
SpinCount          = 0x00000000
OwningThread       = .thread 898ba3d0
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.

kd> .thread 89b452e8
Implicit thread is now 89b452e8

kd> kv 0n10
ChildEBP RetAddr  Args to Child             
b88dccb8 804e1bf2 89b45358 89b452e8 804e1c3e nt!KiSwapContext+0x2f
b88dccc4 804e1c3e 00000000 00000000 00000000 nt!KiSwapThread+0x8a
b88dccec 8056dff6 00000001 00000006 b88dcd01 nt!KeWaitForSingleObject+0x1c2
b88dcd50 804dd99f 00000c48 00000000 00000000 nt!NtWaitForSingleObject+0x9a
b88dcd50 7c90e514 00000c48 00000000 00000000 nt!KiFastCallEntry+0xfc (TrapFrame @ b88dcd64)
036ef714 7c90df5a 7c91b24b 00000c48 00000000 ntdll!KiFastSystemCallRet
036ef718 7c91b24b 00000c48 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
036ef7a0 7c901046 004a3fe0 6648a33b 664a3fe0 ntdll!RtlpWaitForCriticalSection+0x132
036ef7a8 6648a33b 664a3fe0 6648c2ed 00000000 ntdll!RtlEnterCriticalSection+0×46
036ef7b0 6648c2ed 00000000 00000000 00000001 ipnathlp!FwLock+0xa

The thread above is waiting for the critical section 664a3fe0 which has the owner thread 898aa600:

[...]
Critical section = 0×664a3fe0 (ipnathlp!gFwMain+0×0)
LOCKED
LockCount = 0×6
OwningThread = 0×000009e8
RecursionCount = 0×1
LockSemaphore = 0xC48
SpinCount = 0×00000000
OwningThread = .thread 898aa600
[…]

kd> .thread 898aa600
Implicit thread is now 898aa600

kd> kv 0n10
ChildEBP RetAddr  Args to Child             
b7b46cb8 804e1bf2 898aa670 898aa600 804e1c3e nt!KiSwapContext+0x2f
b7b46cc4 804e1c3e 00000000 00000000 00000000 nt!KiSwapThread+0x8a
b7b46cec 8056dff6 00000001 00000006 ffffff01 nt!KeWaitForSingleObject+0x1c
b7b46d50 804dd99f 00000c20 00000000 00000000 nt!NtWaitForSingleObject+0x9a
b7b46d50 7c90e514 00000c20 00000000 00000000 nt!KiFastCallEntry+0xfc (TrapFrame @ b7b46d64)
029ef324 7c90df5a 7c91b24b 00000c20 00000000 ntdll!KiFastSystemCallRet
029ef328 7c91b24b 00000c20 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
029ef3b0 7c901046 00980600 7c910435 7c980600 ntdll!RtlpWaitForCriticalSection+0x132
029ef3b8 7c910435 7c980600 00000000 00000000 ntdll!RtlEnterCriticalSection+0×46
029ef3f8 7c9145d1 00121abe 00121ab0 00000020 ntdll!RtlAcquirePebLock+0×28

The thread 898aa600 is waiting for the critical section 7c980600 which has the owner thread 89cd9c10:

[...]
Critical section = 0×7c980600 (ntdll!FastPebLock+0×0)
LOCKED
LockCount = 0×10
OwningThread = 0×000004a8
RecursionCount = 0×1
LockSemaphore = 0xC20
SpinCount = 0×00000000
OwningThread = .thread 89cd9c10
[…]

kd> .thread 89cd9c10
Implicit thread is now 89cd9c10

kd> kv 100
ChildEBP RetAddr  Args to Child             
b881c8d4 804e1bf2 89cd9c80 89cd9c10 804e1c3e nt!KiSwapContext+0x2f
b881c8e0 804e1c3e 00000000 89e35b08 89e35b34 nt!KiSwapThread+0x8a
b881c908 f783092e 00000000 00000006 00000000 nt!KeWaitForSingleObject+0x1c2
b881c930 f7830a3b 89e35b08 00000000 f78356d8 Mup!PktPostSystemWork+0x3d
b881c94c f7836712 b881c9b0 b881c9b0 b881c9b8 Mup!PktGetReferral+0xce
b881c980 f783644f b881c9b0 b881c9b8 00000000 Mup!PktCreateDomainEntry+0x224
b881c9d0 f7836018 0000000b 00000000 b881c9f0 Mup!DfsFsctrlIsThisADfsPath+0x2bb
b881ca14 f7835829 89a2e130 899ba350 b881caac Mup!CreateRedirectedFile+0x2cd
b881ca70 804e13eb 89f46ee8 89a2e130 89a2e130 Mup!MupCreate+0x1cb
b881ca80 805794b6 89f46ed0 89df3c44 b881cc18 nt!IopfCallDriver+0x31
b881cb60 8056d03b 89f46ee8 00000000 89df3ba0 nt!IopParseDevice+0xa12
b881cbd8 805701e7 00000000 b881cc18 00000042 nt!ObpLookupObjectName+0x53c
b881cc2c 80579b12 00000000 00000000 00003801 nt!ObOpenObjectByName+0xea
b881cca8 80579be1 00cff67c 00100020 00cff620 nt!IopCreateFile+0x407
b881cd04 80579d18 00cff67c 00100020 00cff620 nt!IoCreateFile+0x8e
b881cd44 804dd99f 00cff67c 00100020 00cff620 nt!NtOpenFile+0x27
b881cd44 7c90e514 00cff67c 00100020 00cff620 nt!KiFastCallEntry+0xfc (TrapFrame @ b881cd64)
00cff5f0 7c90d5aa 7c91e8dd 00cff67c 00100020 ntdll!KiFastSystemCallRet
00cff5f4 7c91e8dd 00cff67c 00100020 00cff620 ntdll!ZwOpenFile+0xc
00cff69c 7c831e58 00cff6a8 00460044 0078894a ntdll!RtlSetCurrentDirectory_U+0x169
00cff6b0 7731889e 0078894a 00000000 00000001 kernel32!SetCurrentDirectoryW+0×2b
00cffb84 7730ffbb 00788450 00788b38 00cffbe0 schedsvc!CSchedWorker::RunNTJob+0×221
00cffe34 7730c03a 01ea9108 8ed032d4 00787df8 schedsvc!CSchedWorker::RunJobs+0×304
00cffe74 77310e4d 7c80a749 00000000 00000000 schedsvc!CSchedWorker::RunNextJobs+0×129
00cfff28 77310efc 7730b592 00000000 000ba4bc schedsvc!CSchedWorker::MainServiceLoop+0×6d9
00cfff2c 7730b592 00000000 000ba4bc 0009a2bc schedsvc!SchedMain+0xb
00cfff5c 7730b69f 00000001 000ba4b8 00cfffa0 schedsvc!SchedStart+0×266
00cfff6c 010011cc 00000001 000ba4b8 00000000 schedsvc!SchedServiceMain+0×33
00cfffa0 77df354b 00000001 000ba4b8 0007e898 svchost!ServiceStarter+0×9e
00cfffb4 7c80b729 000ba4b0 00000000 0007e898 ADVAPI32!ScSvcctrlThreadA+0×12
00cfffec 00000000 77df3539 000ba4b0 00000000 kernel32!BaseThreadStart+0×37

kd> du /c 90 0078894a
0078894a  “\\SERVER_B\Share_X$\Folder_Q”

The thread above is blocked trying to set the current directory residing on another server SERVER_B. Its waiting time is almost 13 min 34 sec:

kd> !thread 89cd9c10 7
THREAD 89cd9c10  Cid 03a8.04a8  Teb: 7ffd5000 Win32Thread: e1cdc2c0 WAIT: (UserRequest) KernelMode Non-Alertable
    89e35b34  SynchronizationEvent
IRP List:
    89a2e130: (0006,0094) Flags: 00000884  Mdl: 00000000
    89a13008: (0006,01b4) Flags: 00000000  Mdl: 00000000
Impersonation token:  e2deea00 (Level Impersonation)
DeviceMap                 e1cfe618
Owning Process            0       Image:         <Unknown>
Attached Process          89a20858       Image:         svchost.exe
Wait Start TickCount      4392           Ticks: 3125677 (0:13:33:58.703)
Context Switch Count      202                 LargeStack
UserTime                  00:00:00.031
KernelTime                00:00:00.015
Win32 Start Address ADVAPI32!ScSvcctrlThreadA (0×77df3539)
Start Address kernel32!BaseThreadStartThunk (0×7c8106f9)
Stack Init b881d000 Current b881c8bc Base b881d000 Limit b8819000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr  Args to Child             
[…] 

Looking at the previous !process 0 ff output we also find the similar system thread running through the same drivers and having the same waiting time:

THREAD 8a04cb30  Cid 0004.0014  Teb: 00000000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
    89e344a8  SynchronizationEvent
IRP List:
    89901348: (0006,0094) Flags: 00000884  Mdl: 00000000
Not impersonating
DeviceMap                 e1003170
Owning Process            0       Image:         <Unknown>
Attached Process          8a04d830       Image:         System
Wait Start TickCount      4392           Ticks: 3125677 (0:13:33:58.703)
Context Switch Count      1890            
UserTime                  00:00:00.000
KernelTime                00:00:00.109
Start Address nt!ExpWorkerThread (0×804e2311)
Stack Init f78b3000 Current f78b27c0 Base f78b3000 Limit f78b0000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr 
f78b27d8 804e1bf2 nt!KiSwapContext+0×2f
f78b27e4 804e1c3e nt!KiSwapThread+0×8a
f78b280c f7836328 nt!KeWaitForSingleObject+0×1c2
f78b282c f783622a Mup!MupiIssueQueryRequest+0×2f
f78b2854 f7836069 Mup!MupiResolvePrefix+0×11b
f78b2898 f7835829 Mup!CreateRedirectedFile+0×35d
f78b28f4 804e13eb Mup!MupCreate+0×1cb
f78b2904 805794b6 nt!IopfCallDriver+0×31
f78b29e4 8056d03b nt!IopParseDevice+0xa12
f78b2a5c 805701e7 nt!ObpLookupObjectName+0×53c
f78b2ab0 80579b12 nt!ObOpenObjectByName+0xea
f78b2b2c 80579be1 nt!IopCreateFile+0×407
f78b2b88 80573e2b nt!IoCreateFile+0×8e
f78b2bc8 804dd99f nt!NtCreateFile+0×30
f78b2bc8 804e3597 nt!KiFastCallEntry+0xfc (TrapFrame @ f78b2bfc)
f78b2c6c f78368ca nt!ZwCreateFile+0×11
f78b2cd4 f78306fa Mup!DfsCreateIpcConnection+0×9c
f78b2d60 f7830aae Mup!_PktGetReferral+0×11d
f78b2d7c 804e23d5 Mup!PktWorkInSystemContext+0×4c
f78b2dac 80576316 nt!ExpWorkerThread+0xef
f78b2ddc 804ec6f9 nt!PspSystemThreadStartup+0×34
00000000 00000000 nt!KiThreadStartup+0×16

It has an IRP having file object pointing the same server SERVER_B:

kd> !irp 89901348
Irp is active with 1 stacks 1 is current (= 0×899013b8)
 No Mdl: No System Buffer: Thread 8a04cb30:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
>[  0, 0]   0  0 89f46ee8 899ab170 00000000-00000000   
        \FileSystem\Mup
   Args: f78b2930 03000020 00070080 00000000
kd> !fileobj 899ab170

\SERVER_B\IPC$

Device Object: 0x89f46ee8   \FileSystem\Mup
Vpb is NULL

Flags:  0x2
 Synchronous IO

CurrentByteOffset: 0

So it looks like we have an instance of Coupled Machines pattern. We also notice that wait chain threads have various Windows socket modules on their thread stacks and we check if there is any IRP distribution anomaly using !irpfind command. Counting IRPs we find the most of them are directed towards HTTP, Tcpip and AFD drivers:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Introducing Crash and Hang Analysis Audit Service

Wednesday, June 16th, 2010

Adding AI. Analysis Improvement.

After reading earlier today Windows Internals pages about system audit an idea came to my mind in the evening to provide audit services for memory dump and software trace analysis. One mind is good but two are better, especially if the second is a pattern-driven AI. Here are possible problem scenarios:

Problem: You are not satisfied with a crash report.

Problem: Your critical issue is escalated to the VP level. Engineers analyze memory dumps and software traces. No definite conclusion so far. You want to be sure that nothing has been omitted from the analysis.

Problem: You analyze a system dump or a software trace. You need a second pair of eyes but don’t want to send your memory dump due to your company security policies.

Other scenarios (use cases) will be added as soon as I see the service fit to the realities of software technical support.

I plan to make this service operational in July - August, 2010. Prices to be announced soon.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

10 Common Mistakes in Memory Analysis (Part 8)

Wednesday, June 16th, 2010

Yet another common mistake is not looking past the first found evidence. For example, not looking further to prove or disprove a hypothesis after finding a pattern. Let me illustrate this by a complete memory dump from a frozen system. Careful analysis of wait chains revealed a thread owning a mutant and blocking other threads from many processes:

THREAD 882e8730  Cid 0f64.2ce0  Teb: 7ff76000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    89a76e08  SynchronizationEvent
IRP List:
    883fbba0: (0006,0220) Flags: 00000830  Mdl: 00000000
Not impersonating
DeviceMap                 e1003880
Owning Process            89e264e8       Image:         ProcessA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      5090720        Ticks: 1981028 (0:08:35:53.562)
Context Switch Count      8376            
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0×010d22a3
LPC Server thread working on message Id 10d22a3
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b6e8b000 Current b6e8ac60 Base b6e8b000 Limit b6e88000 Call 0
Priority 10 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr 
b6e8ac78 8083d26e nt!KiSwapContext+0×26
b6e8aca4 8083dc5e nt!KiSwapThread+0×2e5
b6e8acec 8092cd88 nt!KeWaitForSingleObject+0×346
b6e8ad50 8083387f nt!NtWaitForSingleObject+0×9a
b6e8ad50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ b6e8ad64)
0408f910 7c827d29 ntdll!KiFastSystemCallRet
0408f914 77e61d1e ntdll!ZwWaitForSingleObject+0xc
0408f984 77e61c8d kernel32!WaitForSingleObjectEx+0xac
0408f998 027f0808 kernel32!WaitForSingleObject+0×12
0408f9bc 027588d4 DllA!DriverB_DependentFunction+0×86
[…]
0408ffec 00000000 kernel32!BaseThreadStart+0×34

So we did found a culprit component, DllA, or not? Could this blockage result from earlier problems? We search stack trace collection for any other anomalous activity (Semantic Split) and we find indeed a recurrent stack trace pattern across process landscape:

THREAD 89edadb0  Cid 09fc.1050  Teb: 7ff54000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
    8a38b82c  SynchronizationEvent
IRP List:
    893a7470: (0006,0094) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 e16dc398
Owning Process            8a2ad5e0       Image:         ProcessB.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      3704877        Ticks: 3366871 (0:14:36:47.359)
Context Switch Count      121867            
UserTime                  00:00:25.093
KernelTime                00:00:01.968
Win32 Start Address MSVCR71!_threadstartex (0×7c3494f6)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init f3a9c000 Current f3a9b548 Base f3a9c000 Limit f3a99000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
f3a9b560 8083d26e nt!KiSwapContext+0×26
f3a9b58c 8083dc5e nt!KiSwapThread+0×2e5
f3a9b5d4 f6866255 nt!KeWaitForSingleObject+0×346
f3a9b5f4 f68663e3 DriverA!Block+0×1b
f3a9b604 f5ba05f1 DriverA!BlockWithTracker+0×19
f3a9b634 f5b9ba24 DriverB!DoRequest+0xc2
f3a9b724 f5b9b702 DriverB!QueryInfo+0×392

[…]

0: kd> !irp 893a7470
Irp is active with 1 stacks 1 is current (= 0×893a74e0)
 No Mdl: No System Buffer: Thread 89edadb0:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
>[  e, 0]   0  1 8a216f18 8a023f70 00000000-00000000    pending
        \Driver\DriverB
   Args: 0000015c 00000024 00120003 111af32c

We know that DllA!DriverB_DependentFunction will not work if DriverB device doesn’t function (for example, remote file system access without network access). The thread 89edadb0 had been waiting for more 14 hours and the originally found thread 882e8730 was waiting for less than 9 hours. This suggests looking first into DriverB / DriverA functions first.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

IRP distribution anomaly, inconsistent dump, execution residue, hardware activity, coincidental symbolic information, not my version, virtualized system: pattern cooperation

Monday, June 7th, 2010

It was reported that a system had an abnormally high number of interrupt activity visible in various monitoring tools. A complete memory dump was collected. When we open it and look at an IRP list we find statistically higher number of IRPs belonging to Disk and DriverA device drivers:

0: kd> !irpfind
Irp [ Thread ] irpStack: (Mj,Mn) DevObj [Driver] MDL Process
[…]
8a3d3008 [8b56cb10] irpStack: ( 4,34) 8b1b8030 [ \Driver\Disk] 0×00000000
8a3d3340 [8acab888] irpStack: ( 3, 0) 8b4c6030 [ \FileSystem\Npfs]
8a3d4580 [8b56cb10] irpStack: ( 4,34) 8b1b8030 [ \Driver\Disk] 0×00000000
8a403e00 [8b56cb10] irpStack: ( 4,34) 8b1b8030 [ \Driver\Disk] 0×00000000
8a4047e0 [8b56cb10] irpStack: ( 4,34) 8b1b8030 [ \Driver\Disk] 0×00000000
[…]
8aa6ab28 [00000000] irpStack: ( f, 0) 8b192030 [ \Driver\DriverA] 0×00000000
8aa6ce28 [00000000] irpStack: ( f, 0) 8b192030 [ \Driver\DriverA] 0×00000000
[…]

What we also notice is that the thread 8b56cb10 is also an active running thread so we look at its raw stack to find any executon residue providing hints to possible hardware activity.

0: kd> !stacks
Proc.Thread  .Thread  Ticks   ThreadState Blocker
                            [8b57f7a8 System]
   4.000070  8b579db0 ffffff42 Blocked    +0x1
   4.0000c0  8b5768d0 ffffff42 READY      nt!KiAdjustQuantumThread+0x109
   4.0000e4  8b56cb10 ffffff42 RUNNING    +0xf6fb2044
[…]

0: kd> !thread 8b56cb10
THREAD 8b56cb10  Cid 0004.00e4  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 0
Not impersonating
DeviceMap                 e10018b8
Owning Process            8b57f7a8       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      24437735    
Context Switch Count      33644            
UserTime                  00:00:00.000
KernelTime                00:00:07.843
Start Address nt!MiMappedPageWriter (0x80847ffe)
Stack Init f70c4000 Current f70c3468 Base f70c4000 Limit f70c1000 Call 0
Priority 17 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
f70c3470 00000000 000000be e1912bc0 e1912bc4 0xf6fb2044

However, WinDbg reports another current thread running on the same processor so we obviously have an inconsistent dump and should exercise caution:

0: kd> !thread
THREAD 8089d8c0  Cid 0000.0000  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 0
Not impersonating
Owning Process            8089db40       Image:         Idle
Attached Process          N/A            Image:         N/A
Wait Start TickCount      24437476       Ticks: 69 (0:00:00:01.078)
Context Switch Count      72194391            
UserTime                  00:00:00.000
KernelTime                4 Days 08:57:56.171
Stack Init 8089a8b0 Current 8089a5fc Base 8089a8b0 Limit 808978b0 Call 0
Priority 0 BasePriority 0 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
f3b30c5c 00000000 00000000 00000000 00000000 LiveKdD+0×1c07

0: kd> !running
[...]
Prcbs  Current   Next   
  0    ffdff120  8089d8c0            …………….
  1    f772f120  f7732090            …………….

0: kd> !thread f7732090
THREAD f7732090  Cid 0000.0000  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 1
Not impersonating
Owning Process            8089db40       Image:         Idle
Attached Process          N/A            Image:         N/A
Wait Start TickCount      0              Ticks: 24437545 (4:10:03:56.640)
Context Switch Count      75624870            
UserTime                  00:00:00.000
KernelTime                4 Days 08:56:05.125
Stack Init f78b3000 Current f78b2d4c Base f78b3000 Limit f78b0000 Call 0
Priority 0 BasePriority 0 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
f3b30c5c 00000000 00000000 00000000 00000000 LiveKdD+0×1c07

Let’s come back to the thread 8b56cb10. Its raw stack residue shows traces of SCSI activity:

0: kd> dps f70c1000 f70c4000
[...]
f70c2f38  8b20100c
f70c2f3c  f70c2f9c
f70c2f40  f72922a5 DriverA!BuildScatterGather+0×9b
f70c2f44  8b20100c
f70c2f48  8aa24dfc
[…]
f70c337c  00000000
f70c3380  00000246
f70c3384  808a6228 nt!KiProcessorBlock+0×8
f70c3388  00000002
f70c338c  00000011
f70c3390  00000246
f70c3394  f70c33a4
f70c3398  80a62a73 hal!HalRequestIpi+0×13
f70c339c  00000002
f70c33a0  000000e1
f70c33a4  f70c33dc
f70c33a8  8082e4db nt!KiIpiSend+0×27
f70c33ac  00000002
f70c33b0  f772fa7c
f70c33b4  8b56bdb0
f70c33b8  ffdff120
f70c33bc  00000000
f70c33c0  00000002
f70c33c4  00000001
f70c33c8  00000000
f70c33cc  00000002
f70c33d0  00000002
f70c33d4  f70c33e4
f70c33d8  80a61456 hal!KfLowerIrql+0×62
f70c33dc  00000001
f70c33e0  00000002
f70c33e4  f70c3494
f70c33e8  f70c3450
f70c33ec  8b56cb10
f70c33f0  8b089100
f70c33f4  8a5abe01
f70c33f8  f70c3450
f70c33fc  8b089100
f70c3400  8a5abe01
f70c3404  8b089101
f70c3408  f70c3418
f70c340c  80a61456 hal!KfLowerIrql+0×62
f70c3410  8a5abe98
f70c3414  8b089101
f70c3418  f70c3450
f70c341c  f70c3434
f70c3420  80819c10 nt!FsFilterPerformCompletionCallbacks+0×2e
f70c3424  f70c3450
f70c3428  00000000
f70c342c  00000000
f70c3430  00000000
f70c3434  f70c3584
f70c3438  f70c3584
f70c343c  80815040 nt!FsRtlReleaseFileForModWrite+0×190
f70c3440  f70c3450
f70c3444  8b56cdc4
f70c3448  00010000
f70c344c  8b56cd68
f70c3450  00000024
f70c3454  8b56cbfc
f70c3458  8abe10f0
f70c345c  8a5b4830
f70c3460  8b089100
f70c3464  80a613f4 hal!KfLowerIrql
f70c3468  00000001
f70c346c  00000246
f70c3470  f6fb2044
f70c3474  00000000
f70c3478  000000be
f70c347c  e1912bc0
f70c3480  e1912bc4
f70c3484  8a4b7db8
f70c3488  00000011
f70c348c  f70c34a4
f70c3490  8081610e nt!FsRtlLookupBaseMcbEntry+0×16
f70c3494  80887b75 nt!KiFlushTargetSingleTb+0xd
f70c3498  f70c34d0
f70c349c  8082e431 nt!KiIpiServiceRoutine+0×4d
f70c34a0  f772f121
f70c34a4  00000000
f70c34a8  e2894000
f70c34ac  00000000
f70c34b0  80872322 nt!WRITE_REGISTER_ULONG+0xa
f70c34b4  8b20100c
f70c34b8  80a6157e hal!HalEndSystemInterrupt+0×6e
f70c34bc  8b20100c
f70c34c0  f70c34d0
f70c34c4  80a5e902 hal!HalpIpiHandler+0xd2
f70c34c8  80816209 nt!FsRtlLookupLargeMcbEntry+0×4d
f70c34cc  000000e1
f70c34d0  f70c3564
f70c34d4  80872322 nt!WRITE_REGISTER_ULONG+0xa
f70c34d8  badb0d00
f70c34dc  f6fb2040
f70c34e0  8b20100c
f70c34e4  8b038fb4
f70c34e8  0000f000
f70c34ec  f70c3510
f70c34f0  8b377e10
f70c34f4  8b20100c
f70c34f8  8b038fb4
f70c34fc  00000000
f70c3500  00000000
f70c3504  8b377e64
f70c3508  00000007
f70c350c  f6fb2040
f70c3510  8b201100
f70c3514  0b377e10
f70c3518  00000005
f70c351c  ffdff120
f70c3520  ffdffa40
f70c3524  8b4eca09
f70c3528  8b20100c
f70c352c  ffdffa40
f70c3530  8b4eca09
f70c3534  ffdffa09
f70c3538  f70c3548
f70c353c  80a61456 hal!KfLowerIrql+0×62
f70c3540  8b4ecab4
f70c3544  ffdffa09
f70c3548  f70c356c
f70c354c  80829f70 nt!KeInsertQueueDpc+0×1c4
f70c3550  8b4ecaf8
f70c3554  8b038fb4
f70c3558  8b192001
f70c355c  ffdffa48
f70c3560  ffdff120
f70c3564  00000000
f70c3568  01092855
f70c356c  f70c3580
f70c3570  f727221d SCSIPORT!SpRequestCompletionDpc+0×2d
f70c3574  014ecab4
f70c3578  8b4ecab8
f70c357c  8b4ecaf8
f70c3580  8b4ecbf8
f70c3584  00000102
f70c3588  8b4eca40
f70c358c  8b4ecaf8
f70c3590  8b4ecbf8
f70c3594  8b038f02
f70c3598  f70c35a8
f70c359c  8b4ecbf8
f70c35a0  8b038f02
f70c35a4  8b4ecb02
f70c35a8  f70c35b8
f70c35ac  80a61456 hal!KfLowerIrql+0×62
f70c35b0  8b038f02
f70c35b4  8b4ecb02
f70c35b8  f70c35d8
f70c35bc  80a5f44b hal!KfReleaseSpinLock+0xb
f70c35c0  f72737ae SCSIPORT!SpReceiveScatterGather+0×33b
f70c35c4  8b56bb94
f70c35c8  00000000
f70c35cc  0cd8e000
f70c35d0  0000000f
f70c35d4  0000000f
f70c35d8  f70c3604
f70c35dc  80a60147 hal!HalBuildScatterGatherList+0×1c7
f70c35e0  8b4eca40
f70c35e4  8a5acd20
f70c35e8  8ab7aa98
f70c35ec  8ab7aa30
f70c35f0  8a5acd20
f70c35f4  8b4ecaf8
f70c35f8  8b038fb4
f70c35fc  00804001
f70c3600  00000000
f70c3604  f70c3650
f70c3608  f72733a6 SCSIPORT!ScsiPortStartIo+0×36a
f70c360c  8ab7aa98
f70c3610  8b4eca40
f70c3614  8b56bb38
f70c3618  00000000
f70c361c  00010000
f70c3620  f72736b4 SCSIPORT!SpReceiveScatterGather
f70c3624  8ab7aa30
f70c3628  00000000
f70c362c  8b4eca40
f70c3630  8a5acd20
f70c3634  00000002
f70c3638  8b4eca40
f70c363c  f70c39e0
f70c3640  f70c3658
f70c3644  00000000
f70c3648  80a611ae hal!HalpDispatchSoftwareInterrupt+0×5e
f70c364c  00000000
f70c3650  8a5acd00
f70c3654  00000202
f70c3658  f70c3674
f70c365c  80a613d9 hal!HalpCheckForSoftwareInterrupt+0×81
f70c3660  8b4ecb02
f70c3664  00000000
f70c3668  8b1920e8
f70c366c  8a5acd00
f70c3670  8b4ecb02
f70c3674  f70c3684
f70c3678  80a61456 hal!KfLowerIrql+0×62
f70c367c  8a5acd20
f70c3680  8b4ecb00
f70c3684  f70c36a8
f70c3688  f7273638 SCSIPORT!ScsiPortFdoDispatch+0×279
f70c368c  8b4ecaf8
f70c3690  8b41a228
f70c3694  8a5acd20
f70c3698  f70c36d0
f70c369c  f70c36ac
f70c36a0  8ab7aa30
f70c36a4  8b1920e8
f70c36a8  f70c36c4
f70c36ac  f7273146 SCSIPORT!SpDispatchRequest+0×68
f70c36b0  8b4eca40
f70c36b4  005acdb4
f70c36b8  8b038fb4
f70c36bc  8b1920e8
f70c36c0  8a5acd20
f70c36c4  f70c36e0
f70c36c8  f7272dc3 SCSIPORT!ScsiPortPdoScsi+0×129
f70c36cc  8b1920e8
f70c36d0  8a5acd20
f70c36d4  8a581008
f70c36d8  8a5acd20
f70c36dc  8b192030
f70c36e0  f70c36f4
f70c36e4  f7272299 SCSIPORT!ScsiPortGlobalDispatch+0×1d
f70c36e8  8b192030
f70c36ec  8a5acd20
f70c36f0  8b5441c8
f70c36f4  f70c3708
f70c36f8  8081df85 nt!IofCallDriver+0×45
f70c36fc  8b192030
f70c3700  8a5acd20
f70c3704  8b038f08
f70c3708  f70c3718
f70c370c  f725f607 CLASSPNP!SubmitTransferPacket+0xbb
f70c3710  8b038f08
f70c3714  00000000
f70c3718  f70c374c
f70c371c  f725f2b2 CLASSPNP!ServiceTransferRequest+0×1e4
f70c3720  8b038f08
f70c3724  8b1b80e8
f70c3728  8a581150
f70c372c  8a581008
f70c3730  24052000
f70c3734  00000001
f70c3738  00000001
f70c373c  00011000
f70c3740  00010000
f70c3744  00000000
f70c3748  00000001
f70c374c  f70c3770
f70c3750  f725f533 CLASSPNP!ClassReadWrite+0×159
f70c3754  00000103
f70c3758  00000000
f70c375c  8a581008
f70c3760  8b57e218
f70c3764  8b055790
f70c3768  8b192030
f70c376c  00010000
f70c3770  f70c3784
f70c3774  8081df85 nt!IofCallDriver+0×45
f70c3778  8b1b8030
f70c377c  00000103
f70c3780  8b51d1c8
f70c3784  f70c3794
f70c3788  f74c80cf PartMgr!PmReadWrite+0×95
f70c378c  8b467e00
f70c3790  8a581174
f70c3794  f70c37a8
f70c3798  8081df85 nt!IofCallDriver+0×45
f70c379c  8b0556d8
f70c37a0  8a581008
f70c37a4  8a581198
f70c37a8  f70c37c4
f70c37ac  f7317053 ftdisk!FtDiskReadWrite+0×1a9
f70c37b0  8a5811b4
f70c37b4  8b5570c8
f70c37b8  8b201c40
f70c37bc  24032000
f70c37c0  8b467d48
f70c37c4  f70c37d8
f70c37c8  8081df85 nt!IofCallDriver+0×45
f70c37cc  8b467d48
f70c37d0  8a581008
f70c37d4  8a5811d8
f70c37d8  f70c37f8
f70c37dc  f72c0537 volsnap!VolSnapWrite+0×46f
f70c37e0  8a581008
f70c37e4  8b5851c8
f70c37e8  e25b3668
f70c37ec  fd800000
f70c37f0  8b201c40
f70c37f4  00000002
f70c37f8  f70c380c
f70c37fc  8081df85 nt!IofCallDriver+0×45
f70c3800  8b201b88
f70c3804  8a581008
[…]

There’s past DPC insertion activity but no pending DPCs:

0: kd> ub 80829f70
nt!KeInsertQueueDpc+0x1a9:
80829f55 6a02            push    2
80829f57 5a              pop     edx
80829f58 e857450000      call    nt!KiIpiSend (8082e4b4)
80829f5d eb08            jmp     nt!KeInsertQueueDpc+0x1bb (80829f67)
80829f5f b102            mov     cl,2
80829f61 ff1598108080    call    dword ptr [nt!_imp_HalRequestSoftwareInterrupt (80801098)]
80829f67 8a4dfe          mov     cl,byte ptr [ebp-2]
80829f6a ff1508118080    call    dword ptr [nt!_imp_KfLowerIrql (80801108)]

0: kd> !dpcs
CPU Type      KDPC       Function

dpcs: no pending DPCs found

We notice DriverA and see it also attached to Disk driver device:

0: kd> !irp 8a3d3008
Irp is active with 11 stacks 7 is current (= 0x8a3d3150)
 Mdl=8b56b6a0: No System Buffer: Thread 8b56cb10:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
   Args: 00000000 00000000 00000000 00000000
>[  4,34]   0 e0 8b1b8030 00000000 f74c857e-8b0556d8 Success Error Cancel
        \Driver\Disk PartMgr!PmIoCompletion
   Args: 00010000 00000000 f457e000 00000002
 [  4, 0]   0 e1 8b0556d8 00000000 f7316558-8b467e00 Success Error Cancel pending
        \Driver\PartMgr ftdisk!FtpRefCountCompletionRoutine
   Args: 7274c6df 0002b74c f457e000 00000002
 [  4, 0]   0 e1 8b467d48 00000000 f72bf638-8b201c40 Success Error Cancel pending
        \Driver\Ftdisk volsnap!VspRefCountCompletionRoutine
   Args: 7274c5f4 0002b74c f456e000 00000002
 [  4, 0]   0 e1 8b201b88 00000000 f71814e8-8b206238 Success Error Cancel pending
        \Driver\VolSnap Ntfs!NtfsSingleAsyncCompletionRoutine
   Args: 00010000 00000000 f456e000 00000002
 [  4, 0]   0  0 8b36b020 8a5b4830 00000000-00000000   
        \FileSystem\Ntfs
   Args: 00010000 00000000 08710000 00000000

0: kd> !devobj 8b1b8030
Device object (8b1b8030) is for:
 DR0 \Driver\Disk DriverObject 8b57e218
Current Irp 00000000 RefCount 0 Type 00000007 Flags 00000050
Vpb 8b52f080 Dacl e13e397c DevExt 8b1b80e8 DevObjExt 8b1b8548 Dope 8b467828
ExtensionFlags (0000000000) 
AttachedDevice (Upper) 8b0556d8 \Driver\PartMgr
AttachedTo (Lower) 8b192030 \Driver\DriverA
Device queue is not busy.

0: kd> !devstack 8b1b8030
  !DevObj   !DrvObj            !DevExt   ObjectName
  8b0556d8  \Driver\PartMgr    8b055790 
> 8b1b8030  \Driver\Disk       8b1b80e8  DR0
  8b192030  \Driver\DriverA    8b1920e8  DriverA1Port1Path0Target0Lun0
!DevNode 8b56f5d0 :
  DeviceInst is “SCSI\Disk&Ven_A&Prod_Virtual_disk&Rev_1.0\4&1588251b&0&000″
  ServiceName is “disk” 

Additionally we look at raw stack data of running threads 8089d8c0 and f7732090. We find the same hardware activity there and plus some network driver activity too. We also see “double fault” symbols:

[...]
8089a554 ffdffec0
8089a558 80a6157e hal!HalEndSystemInterrupt+0x6e
8089a55c ffdffec0
8089a560 8089a570 nt!KiDoubleFaultStack+0×2cc0
8089a564 80a5e902 hal!HalpIpiHandler+0xd2
8089a568 00000002
8089a56c 000000e1
8089a570 8089a600 nt!KiDoubleFaultStack+0×2d50
8089a574 f7549ca2 intelppm!AcpiC1Idle+0×12
8089a578 badb0d00
8089a57c 0002b74b
8089a580 00000000
8089a584 f7298da0 DriverA!DevScsiTimer
8089a588 00000000
8089a58c 00000000
8089a590 0005d373
8089a594 00000000
8089a598 8b4ecaf8
8089a59c 00000000
8089a5a0 8a4b1e20
8089a5a4 00000000
8089a5a8 8089a600 nt!KiDoubleFaultStack+0×2d50
8089a5ac 0002b74b
8089a5b0 ffdffee0
[…]

But they seem to be coincidental:

0: kd> ub 8089a570
^ Unable to find valid previous instruction for 'ub 8089a570'

0: kd> u 8089a570
nt!KiDoubleFaultStack+0x2cc0:
8089a570 00a68980a29c add byte ptr [esi-635D7F77h],ah
8089a576 54 push esp
8089a577 f7000ddbba4b test dword ptr [eax],4BBADB0Dh
8089a57d b702 mov bh,2
8089a57f 0000 add byte ptr [eax],al
8089a581 0000 add byte ptr [eax],al
8089a583 00a08d29f700 add byte ptr [eax+0F7298Dh],ah
8089a589 0000 add byte ptr [eax],al

0: kd> ub 8089a600
                 ^ Unable to find valid previous instruction for 'ub 8089a600'

0: kd> u 8089a600
nt!KiDoubleFaultStack+0x2d50:
8089a600 0100            add     dword ptr [eax],eax
8089a602 0000            add     byte ptr [eax],al
8089a604 ebde            jmp     nt!KiDoubleFaultStack+0x2d34 (8089a5e4)
8089a606 888000000000    mov     byte ptr [eax],al
8089a60c 0e              push    cs
8089a60d 0000            add     byte ptr [eax],al
8089a60f 0000            add     byte ptr [eax],al
8089a611 0000            add     byte ptr [eax],al 

We find the system virtualized:

0: kd> !sysinfo machineid
[...]
SystemManufacturer = Virtualization Vendor A
[...]

Looking at DriverA timestamp we notice it is much older than expected and Google search points to similar cases (but not for virtualized systems) where it was recommended to update that driver.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Inconsistent dump, stack trace collection, LPC, thread, process, executive resource wait chains, missing threads and waiting thread time: pattern cooperation

Saturday, February 27th, 2010

Here is a synthetic case study to show various wait chain patterns. The complete memory dump from a frozen system is inconsistent, saved by LiveKd. Stack trace collection shows many threads waiting for LPC replies:

THREAD 87432520  Cid 0b10.0dd8  Teb: 7ff83000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8743270c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 007ade85:
Current LPC port d676e560
Not impersonating
DeviceMap                 d6f05be8
Owning Process            87581c78       Image:         ProcessA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      4093415        Ticks: 659565 (0:02:51:45.703)
Context Switch Count      111255            
UserTime                  00:00:27.781
KernelTime                00:00:02.015
Win32 Start Address MSVCR71!_threadstartex (0×7c3494f6)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b6439000 Current b6438c08 Base b6439000 Limit b6436000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
b6438c10 00000000 0×0

Checking MessageId by using !lpc command gives us the following LPC server thread that is waiting for a mutant owned by thread 866d63e8 (this is equivalent that thread 85b209d0 is waiting for thread 866d63e8):

THREAD 85b209d0  Cid 1524.2770  Teb: 7ff78000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    85e9ba00  Mutant - owning thread 866d63e8
Not impersonating
DeviceMap                 d64008d8
Owning Process            87200880       Image:         ProcessB.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      4093415        Ticks: 659565 (0:02:51:45.703)
Context Switch Count      28            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×007ade85
LPC Server thread working on message Id 7ade85
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b42e5000 Current b42e4c60 Base b42e5000 Limit b42e2000 Call 0
Priority 10 BasePriority 10 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
b42e4c68 00000000 0xa000a02

Thread 866d63e8 is waiting for a process 86b33b30:

THREAD 866d63e8  Cid 1524.3984  Teb: 7ff6e000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    86b33b30  ProcessObject
    866d6460  NotificationTimer
Not impersonating
DeviceMap                 d64008d8
Owning Process            87200880       Image:         ProcessB.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      4755080     
Context Switch Count      4767            
UserTime                  00:00:00.015
KernelTime                00:00:00.000
Win32 Start Address 0×007a5051
LPC Server thread working on message Id 7a5051
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init ab459000 Current ab458c60 Base ab459000 Limit ab456000 Call 0
Priority 10 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr 
ab458c78 8083d5b1 nt!KiSwapContext+0×26
ab458ca4 8083df9e nt!KiSwapThread+0×2e5
ab458cec 8092ae67 nt!KeWaitForSingleObject+0×346
ab458d50 80833bef nt!NtWaitForSingleObject+0×9a
ab458d50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ ab458d64)
0499fbec 7c827d29 ntdll!KiFastSystemCallRet
0499fbf0 76548721 ntdll!ZwWaitForSingleObject+0xc
0499fc0c 7654aa54 ProcessB!WaitForProcess+0×4a
[…]
0499ffec 00000000 kernel32!BaseThreadStart+0×34

We find the following thread in the process 86b33b30 (there is only one thread left where we expect many of them in ProcessC):

THREAD 85a6fdb0  Cid 5c0c.26f4  Teb: 7ffdf000 Win32Thread: b9b778a0 WAIT: (Unknown) KernelMode Non-Alertable
    86686030  SynchronizationEvent
    85a6fe28  NotificationTimer
Not impersonating
DeviceMap                 d6767248
Owning Process            86b33b30       Image:         ProcessC.EXE
Attached Process          N/A            Image:         N/A
Wait Start TickCount      4755681     
Context Switch Count      77668                 LargeStack
UserTime                  00:00:00.437
KernelTime                00:00:04.421
*** ERROR: Module load completed but symbols could not be loaded for ProcessC.EXE
Win32 Start Address 0×300010cc
Start Address kernel32!BaseProcessStartThunk (0×77e617f8)
Stack Init 9ad92000 Current 9ad91a40 Base 9ad92000 Limit 9ad8d000 Call 0
Priority 10 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr 
9ad91a58 8083d5b1 nt!KiSwapContext+0×26
9ad91a84 8083df9e nt!KiSwapThread+0×2e5
9ad91acc 8081e05b nt!KeWaitForSingleObject+0×346
9ad91b08 8082e012 nt!ExpWaitForResource+0xd5
9ad91b28 b6a9c1ee nt!ExAcquireResourceExclusiveLite+0×8d
WARNING: Stack unwind information not available. Following frames may be wrong.
9ad91b38 b6ab09d3 DriverA+0×21ee
[…]
9ad91c3c 80840153 DriverA+0×1ed6

9ad91c50 8092b51f nt!IofCallDriver+0×45
9ad91c64 8092b454 nt!IopSynchronousServiceTail+0×10b
9ad91d00 8092b574 nt!IopXxxControlFile+0×60f
9ad91d34 80833bef nt!NtDeviceIoControlFile+0×2a
9ad91d34 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ 9ad91d64)
0012d4e0 00000000 ntdll!KiFastSystemCallRet

It is blocked by DriverA waiting for an executive resource. Fortunately !locks command works for this inconsistent dump and we finally reach thread 86ba5638:

0: kd> !locks

Resource @ 0x85610d30    Exclusively owned
    Contention Count = 4
    NumberOfExclusiveWaiters = 1
     Threads: 86ba5638-01<*>
     Threads Waiting On Exclusive Access:
              85a6fdb0

This thread belongs to another instance of ProcessC.exe (different process 8690dd88):

0: kd> !thread 86ba5638 1f
THREAD 86ba5638  Cid 186c.1574  Teb: 7ffdf000 Win32Thread: b9a28a70 WAIT: (Unknown) KernelMode Non-Alertable
    8944e3d4  SynchronizationEvent
Not impersonating
DeviceMap                 d6767248
Owning Process            8690dd88       Image:         ProcessC.EXE
Attached Process          N/A            Image:         N/A
Wait Start TickCount      4074148        Ticks: 678832 (0:02:56:46.750)
Context Switch Count      95896                 LargeStack
UserTime                  00:00:00.593
KernelTime                00:00:05.343
*** ERROR: Module load completed but symbols could not be loaded for ProcessC.EXE
Win32 Start Address 0×300010cc
Start Address kernel32!BaseProcessStartThunk (0×77e617f8)
Stack Init 99ef2000 Current 99ef19c0 Base 99ef2000 Limit 99eec000 Call 0
Priority 10 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr 
99ef19d8 8083d5b1 nt!KiSwapContext+0×26
99ef1a04 8083df9e nt!KiSwapThread+0×2e5
99ef1a4c 80853f3f nt!KeWaitForSingleObject+0×346
99ef1a64 8081d45f nt!KiAcquireFastMutex+0×13
99ef1a70 b6a9c70d nt!ExAcquireFastMutex+0×20
WARNING: Stack unwind information not available. Following frames may be wrong.
99ef1a7c b6aaf22a DriverA+0×270d
[…]
99ef1c3c 80840153 DriverA+0×1ed6

99ef1c50 8092b51f nt!IofCallDriver+0×45
99ef1c64 8092b454 nt!IopSynchronousServiceTail+0×10b
99ef1d00 8092b574 nt!IopXxxControlFile+0×60f
99ef1d34 80833bef nt!NtDeviceIoControlFile+0×2a
99ef1d34 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ 99ef1d64)
0012db08 00000000 ntdll!KiFastSystemCallRet

We see this thread is also blocked by DriverA. We also check thread waiting times. All threads involved in wait chains have their Ticks value less than 86ba5638. This means that thread 86ba5638 was blocked earlier. We contact DriverA vendor for any possible updates.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Complete Stack Traces from x64 System

Tuesday, February 9th, 2010

Previously I wrote on how to get a 32-bit stack trace from a 32-bit process thread on an x64 system. There are situations when we are interested in all such stack traces, for example, from a complete memory dump. I wrote a script that extracts both 64-bit and WOW64 32-bit stack traces:

.load wow64exts
!for_each_thread "!thread @#Thread 1f;.thread /w @#Thread; .reload; kb 256; .effmach AMD64"

Example output fragment for a thread fffffa801f3a3bb0 from a very long debugger log file:

[...]

Setting context for owner process...
.process /p /r fffffa8013177c10

                                    

THREAD fffffa801f3a3bb0  Cid 4b4c.5fec  Teb: 000000007efaa000 Win32Thread: fffff900c1efad50 WAIT: (UserRequest) UserMode Non-Alertable
    fffffa8021ce4590  NotificationEvent
    fffffa801f3a3c68  NotificationTimer
Not impersonating
DeviceMap                 fffff8801b551720
Owning Process            fffffa8013177c10       Image:         application.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      14066428       Ticks: 301 (0:00:00:04.695)
Context Switch Count      248                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address mscorwks!Thread::intermediateThreadProc (0x00000000733853b3)
Stack Init fffffa60190e5db0 Current fffffa60190e5940
Base fffffa60190e6000 Limit fffffa60190df000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           Call Site
fffffa60`190e5980 fffff800`01cba0fa nt!KiSwapContext+0x7f
fffffa60`190e5ac0 fffff800`01caedab nt!KiSwapThread+0x13a
fffffa60`190e5b30 fffff800`01f1d608 nt!KeWaitForSingleObject+0x2cb
fffffa60`190e5bc0 fffff800`01cb7973 nt!NtWaitForSingleObject+0x98
fffffa60`190e5c20 00000000`75183d09 nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffffa60`190e5c20)
00000000`069ef118 00000000`75183b06 wow64cpu!CpupSyscallStub+0x9
00000000`069ef120 00000000`74f8ab46 wow64cpu!Thunk0ArgReloadState+0x1a
00000000`069ef190 00000000`74f8a14c wow64!RunCpuSimulation+0xa
00000000`069ef1c0 00000000`771605a8 wow64!Wow64LdrpInitialize+0x4b4
00000000`069ef720 00000000`771168de ntdll! ?? ::FNODOBFM::`string'+0x20aa1
00000000`069ef7d0 00000000`00000000 ntdll!LdrInitializeThunk+0xe

.process /p /r 0
Implicit thread is now fffffa80`1f3a3bb0
WARNING: WOW context retrieval requires
switching to the thread's process context.
Use .process /p fffffa80`1f6b2990 to switch back.
Implicit process is now fffffa80`13177c10
x86 context set
Loading Kernel Symbols
Loading User Symbols
Loading unloaded module list
Loading Wow64 Symbols
ChildEBP RetAddr
06aefc68 76921270 ntdll_772b0000!ZwWaitForSingleObject+0x15
06aefcd8 7328c639 kernel32!WaitForSingleObjectEx+0xbe
06aefd1c 7328c56f mscorwks!PEImage::LoadImage+0x1af
06aefd6c 7328c58e mscorwks!CLREvent::WaitEx+0x117
06aefd80 733770fb mscorwks!CLREvent::Wait+0x17
06aefe00 73377589 mscorwks!ThreadpoolMgr::SafeWait+0x73
06aefe64 733853f9 mscorwks!ThreadpoolMgr::WorkerThreadStart+0x11c
06aeff88 7699eccb mscorwks!Thread::intermediateThreadProc+0x49
06aeff94 7732d24d kernel32!BaseThreadInitThunk+0xe
06aeffd4 7732d45f ntdll_772b0000!__RtlUserThreadStart+0x23
06aeffec 00000000 ntdll_772b0000!_RtlUserThreadStart+0x1b
Effective machine: x64 (AMD64)

[...]

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Stack trace collection, missing threads, waiting time, critical section and LPC wait chains: pattern cooperation

Friday, November 20th, 2009

Windows shutdown couldn’t progress and a complete memory dump was saved for later postmortem analysis. !stacks command showed reduced number of waiting threads in one important system service:

0: kd> !stacks
[...]
                            [89d6d8e8 ServiceA.exe]
1454.0014b0  89d36b60 0000163 Blocked    DriverA!Check+0x177
[...]

Normally this service had at least a dozen waiting threads. If we switch to the process we see many threads missing and the process itself is in the process of exiting (three “process” nouns in one sentence):

0: kd> !process 89d6d8e8
PROCESS 89d6d8e8  SessionId: 0  Cid: 1454    Peb: 7ffd8000  ParentCid: 0234
    DirBase: afa06000  ObjectTable: e5491278  HandleCount: 444.
    Image: ServiceA.exe
    VadRoot 89db18d8 Vads 213 Clone 0 Private 827. Modified 15. Locked 0.
    DeviceMap e10028c8
    Token                             e5556710
    ElapsedTime                       2 Days 02:59:39.285
    UserTime                          00:00:08.375
    KernelTime                        00:00:20.046
    QuotaPoolUsage[PagedPool]         50660
    QuotaPoolUsage[NonPagedPool]      9704
    Working Set Sizes (now,min,max)  (2523, 50, 345) (10092KB, 200KB, 1380KB)
    PeakWorkingSetSize                2953
    VirtualSize                       76 Mb
    PeakVirtualSize                   78 Mb
    PageFaultCount                    19259
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      1522

THREAD 89d36b60  Cid 1454.14b0  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Alertable
    8a8d7438  NotificationEvent
    89d36bd8  NotificationTimer
Not impersonating
DeviceMap                 e10028c8
Owning Process            89d6d8e8       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      11760358       Ticks: 355 (0:00:00:05.546)
Context Switch Count      4591            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Start Address DriverA!CheckProtocolStackThread (0xf762cfa0)
Stack Init f3e7b000 Current f3e7acc0 Base f3e7b000 Limit f3e78000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
f3e7acd8 8083d5b1 nt!KiSwapContext+0x26
f3e7ad04 8083df9e nt!KiSwapThread+0x2e5
f3e7ad4c f762cf8d nt!KeWaitForSingleObject+0x346
[...]
f3e7adac 8092083b DriverA!CheckProtocolStackThread+0x5
f3e7addc 8083fe9f nt!PspSystemThreadStartup+0x2e
00000000 00000000 nt!KiThreadStartup+0x16

THREAD 89ce9580  Cid 1454.1630  Teb: 7ff9e000 Win32Thread: bc1e71f8 WAIT: (Unknown) UserMode Non-Alertable
     893fae40  SynchronizationEvent
Not impersonating
DeviceMap                 e10028c8
Owning Process            89d6d8e8       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      11048203       Ticks: 712510 (0:03:05:32.968)
Context Switch Count      1103                 LargeStack
UserTime                  00:00:00.281
KernelTime                00:00:01.484
Win32 Start Address DllA!OperationThread (0x1003b37e)
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init f65a3000 Current f65a2c60 Base f65a3000 Limit f65a0000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
f65a2c78 8083d5b1 nt!KiSwapContext+0x26
f65a2ca4 8083df9e nt!KiSwapThread+0x2e5
f65a2cec 8092ae67 nt!KeWaitForSingleObject+0x346
f65a2d50 80833bef nt!NtWaitForSingleObject+0x9a
f65a2d50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ f65a2d64)
0293fd18 7c827d29 ntdll!KiFastSystemCallRet
0293fd1c 77e61d1e ntdll!ZwWaitForSingleObject+0xc
0293fd8c 77e61c8d kernel32!WaitForSingleObjectEx+0xac
0293fda0 724c705b kernel32!WaitForSingleObject+0x12
0293fdb4 724c6745 DllB!Cleanup+0x3b
[...]
0293fde0 7c81a352 DllB!DLLEntry+0x62
0293fe00 7c830e90 ntdll!LdrpCallInitRoutine+0x14
0293feb8 77e668ab ntdll!LdrShutdownProcess+0x182
0293ffa4 77e6690d kernel32!_ExitProcess+0×43
0293ffb8 77e792c1 kernel32!ExitProcess+0×14

0293ffec 00000000 kernel32!BaseThreadStart+0×5f

However, the brief scan of all other processes and threads from !process 0 ff command output shows that another important service ServiceB has critical section wait chains:

THREAD 89e1f890  Cid 09f4.1018  Teb: 7ff96000 Win32Thread: bc279160 WAIT: (Unknown) UserMode Non-Alertable
    89d96c30  SynchronizationEvent
Not impersonating
DeviceMap                 e10028c8
Owning Process            8a453b18       Image:         ServiceB.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      11750950       Ticks: 9763 (0:00:02:32.546)
Context Switch Count      327                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address msvcrt!_endthreadex (0×77bcb4bc)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init f6113000 Current f6112c60 Base f6113000 Limit f6110000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 1
Kernel stack not resident.
ChildEBP RetAddr 
f6112c78 8083d5b1 nt!KiSwapContext+0×26
f6112ca4 8083df9e nt!KiSwapThread+0×2e5
f6112cec 8092ae67 nt!KeWaitForSingleObject+0×346
f6112d50 80833bef nt!NtWaitForSingleObject+0×9a
f6112d50 7c82860c nt!KiFastCallEntry+0xfc
09eafd98 7c827d29 ntdll!KiFastSystemCallRet
09eafd9c 7c83d266 ntdll!ZwWaitForSingleObject+0xc
09eafdd8 7c83d2b1 ntdll!RtlpWaitOnCriticalSection+0×1a3
09eafdf8 6738d489 ntdll!RtlEnterCriticalSection+0xa8

[…]
09eaffb8 77e6482f msvcrt!_endthreadex+0xa3
09eaffec 00000000 kernel32!BaseThreadStart+0×34

We switch to this process and find the owner of a critical section that blocks other threads:

0: kd> .process /r /p 8a453b18
Implicit process is now 8a453b18

0: kd> !cs -l -o -s
[...]
DebugInfo          = 0x0a199ea0
Critical section   = 0x0998ac80 (+0x998AC80)
LOCKED
LockCount          = 0x5
WaiterWoken        = No
OwningThread       = 0x00001680
RecursionCount     = 0x1
LockSemaphore      = 0xE08
SpinCount          = 0x00000000
OwningThread       = .thread 89bfc4d8
[…]

0: kd> !thread 89bfc4d8 1f
THREAD 89bfc4d8  Cid 09f4.1680  Teb: 7ff70000 Win32Thread: bc1e79d8 WAIT: (Unknown) UserMode Non-Alertable
    89bfc6c4  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00fbbc86:
Current LPC port e544f108
Not impersonating
DeviceMap                 e10028c8
Owning Process            8a453b18       Image:         ServiceB.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      11049035       Ticks: 711678 (0:03:05:19.968)
Context Switch Count      455269                 LargeStack
UserTime                  00:00:45.312
KernelTime                00:00:10.531
Win32 Start Address msvcrt!_endthreadex (0×77bcb4bc)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init f3b8b000 Current f3b8ac08 Base f3b8b000 Limit f3b88000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
f3b8ac20 8083d5b1 nt!KiSwapContext+0×26
f3b8ac4c 8083df9e nt!KiSwapThread+0×2e5
f3b8ac94 8093edb1 nt!KeWaitForSingleObject+0×346
f3b8ad50 80833bef nt!NtRequestWaitReplyPort+0×776
f3b8ad50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ f3b8ad64)
0f13ebe8 7c827899 ntdll!KiFastSystemCallRet
0f13ebec 77c80a6e ntdll!ZwRequestWaitReplyPort+0xc
0f13ec38 77c7fcf0 RPCRT4!LRPC_CCALL::SendReceive+0×230
0f13ec44 77c80673 RPCRT4!I_RpcSendReceive+0×24
0f13ec58 77ce315a RPCRT4!NdrSendReceive+0×2b
0f13f040 771f40c4 RPCRT4!NdrClientCall2+0×22e
[…]
0f13ffb8 77e6482f msvcrt!_endthreadex+0xa3
0f13ffec 00000000 kernel32!BaseThreadStart+0×34

Following LPC chain we find that the blocking thread in ServiceB was waiting for a response from ServiceA:

0: kd> !lpc message 00fbbc86
[...]
    Server process  : 89d6d8e8 (ServiceA.exe)
[…]

Now the question arises: who was freezing first, ServiceA or ServiceB? We can compare waiting times to answer. We see that waiting time for ServiceB request thread is 3:05:19 and for ServiceA shutdown thread is 03:05:32 (from !thread and !process output above):

Owning Process            8a453b18       Image:         ServiceB.exe
[...]
Wait Start TickCount      11049035       Ticks: 711678 (0:03:05:19.968)

Owning Process            89d6d8e8       Image:         ServiceA.exe
[...]
Wait Start TickCount      11048203       Ticks: 712510 (0:03:05:32.968)

Therefore, we conclude that ServiceB was blocked after ServiceA was blocked. Such questions often arise in multivendor troubleshooting scenarious.

- Dmitry Vostokov @ DumpAnalysis.org -

Critical section high contention and wait chains, blocked threads, and periodic error: memory dump and trace analysis pattern cooperation

Friday, October 9th, 2009

This is the first case study here that shows an interplay of memory dump analysis (DA) and software trace analysis (TA) patterns, what I call DATA analysis patterns (or DA+TA).  

It was reported that one process was blocking vital server functionality. After the process restart the problem was gone away. A complete memory dump was saved on the next occurrence and it revealed critical section wait chains in that process but no critical section deadlocks:

0: kd> .process /r /p 87f76020
Implicit process is now 87f76020
Loading User Symbols
[...]

0: kd> !cs -l -o -s
-----------------------------------------
DebugInfo          = 0x0016c6d8
Critical section   = 0×0032be30 (+0×32BE30)
LOCKED
LockCount          = 0×34
WaiterWoken        = No
OwningThread       = 0×00001c64
RecursionCount     = 0×1
LockSemaphore      = 0×624
SpinCount          = 0×00000000
OwningThread       = .thread 86396db0
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
[…]

The thread 86396db0 (TID 1c64) that blocked more than 50 threads (0×34) was blocked itself sleeping for more than 6 seconds:

0: kd> .thread 86396db0
Implicit thread is now 86396db0

0: kd> kL 100
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr 
ae7f8c98 8083d5b1 nt!KiSwapContext+0x26
ae7f8cc4 8083cf69 nt!KiSwapThread+0x2e5
ae7f8d0c 8092b03f nt!KeDelayExecutionThread+0x2ab
ae7f8d54 80833bef nt!NtDelayExecution+0x84
ae7f8d54 7c82860c nt!KiFastCallEntry+0xfc
1020e8ac 7c826f69 ntdll!KiFastSystemCallRet
1020e8b0 77e41ed5 ntdll!NtDelayExecution+0xc
1020e918 77e424fd kernel32!SleepEx+0x68
1020e928 67739357 kernel32!Sleep+0xf
1020e944 6773c3a2 ComponentA!DB_Driver_Command+0xa7
[…]
1020ec64 67485393 ComponentB!DatabaseSearch+0×34
[…]
1020ffb8 77e6482f msvcrt!_endthreadex+0xa3
1020ffec 00000000 kernel32!BaseThreadStart+0×34

0: kd> kv
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child             
[...]
1020e918 77e424fd 00001b00 00000000 1020e944 kernel32!SleepEx+0x68 (FPO: [SEH])
1020e928 67739357 00001b00 00000000 0032ac6c kernel32!Sleep+0xf (FPO: [1,0,0])
[…]

0: kd> ? 1b00 / 0n1000
Evaluate expression: 6 = 00000006

Critical section it owns shows high contention count too:

0: kd> dt -r1 _RTL_CRITICAL_SECTION   0x0032be30
ProcessA!_RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x0016c6d8 _RTL_CRITICAL_SECTION_DEBUG
      +0x000 Type             : 0
      +0x002 CreatorBackTraceIndex : 0
      +0x004 CriticalSection  : 0x0032be30 _RTL_CRITICAL_SECTION
      +0x008 ProcessLocksList : _LIST_ENTRY [ 0x16c708 - 0x16c6b8 ]
      +0x010 EntryCount       : 0
      +0×014 ContentionCount  : 0xac352
      +0×018 Spare            : [2] 0×43005c
   +0×004 LockCount        : -210
   +0×008 RecursionCount   : 1
   +0×00c OwningThread     : 0×00001c64
   +0×010 LockSemaphore    : 0×00000624
   +0×014 SpinCount        : 0

Fortunately, that process had ETW tracing capability and its software trace recorded before the complete memory dump was saved the following recurrent periodic errorfrom different threads that confirms our observation about the possible problem with a database and explains thread delays we see (> 6 seconds for Sleep):

#    PID  TID  Time         Message
[...]
1972 2780 5992 10:05:11.005 Error: [DB Driver] Not enough space on temp disk
1973 2780 5992 10:05:11.005 Execute DB command sleeps on error (retry 26)
[...]
4513 2780 3292 10:06:02.942 Error: [DB Driver] Not enough space on temp disk
4514 2780 3292 10:06:02.942 Execute DB command sleeps on error (retry 11)
4515 2780 3292 10:06:09.598 Error: [DB Driver] Not enough space on temp disk
4516 2780 3292 10:06:09.598 Execute DB command sleeps on error (retry 12)
[…]

- Dmitry Vostokov @ DumpAnalysis.org -

Truncated dump, stack trace collection, waiting thread time and wait chains: pattern cooperation

Thursday, September 10th, 2009

It was reported that functionality of some applications was not available until ServiceA was killed and restarted. Complete memory dump was forced. For that memory dump WinDbg !stacks command shows every thread bugchecking the system, for example:

0: kd> !stacks
Proc.Thread  .Thread  Ticks   ThreadState Blocker
[...]
                            [8ca72020 ApplicationA.exe]
 4bc.0004c4  8bf76af0 0000cc4 Blocked    nt!KeBugCheckEx+0x1b
 4bc.0004d0  8bf53650 000055d Blocked    nt!KeBugCheckEx+0x1b
 4bc.0004e8  8bf33c88 0000094 Blocked    nt!KeBugCheckEx+0x1b
 4bc.001ffc  8ba86020 0000eed Blocked    nt!KeBugCheckEx+0x1b
 4bc.001e30  8b7a0ca8 00003ac Blocked    nt!KeBugCheckEx+0x1b
 4bc.001934  8b664020 0000eed Blocked    nt!KeBugCheckEx+0x1b
 4bc.001af0  8ca6d3b0 0000094 Blocked    nt!KeBugCheckEx+0x1b
 4bc.001cac  8bf41af0 0001412 Blocked    nt!KeBugCheckEx+0x1b
 4bc.00141c  8b6458d0 0000094 Blocked    nt!KeBugCheckEx+0x1b
[...]

Seeing this nonsense I checked that complete dump was truncated by half because page file was 4Gb but the amount of physical memory was 8Gb:

0: kd> !vm
*** Virtual Memory Usage ***
 Physical Memory:     2096553 (   8386212 Kb)
 Paging File Name paged out
   Current:   4195328 Kb  Free Space:   4182336 Kb
   Minimum:   4195328 Kb  Maximum:      4195328 Kb
[…]

However it was possible to get stack trace collection using !process 0 ff command with most data from _ETHREAD structures present but most of kernel and user space stack traces not present in the output. ServiceA has many threads waiting for LPC requests during last 5 minutes (the bugcheck was forced after 3 - 4 minutes the issue manifested itself):

THREAD 8cc9b590  Cid 053c.0668  Teb: 7ffaf000 Win32Thread: b84e5770 WAIT: (Unknown) UserMode Non-Alertable
    8cc9b77c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 0005bbdf:
Current LPC port e2cee338
IRP List:
    8b6548b8: (0006,0268) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 e1003860
Owning Process            8bf81d88       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      193823         Ticks: 13752 (0:00:03:34.875)
Context Switch Count      389                 LargeStack
UserTime                  00:00:00.015
KernelTime                00:00:00.000
Win32 Start Address 0×7d1f5e70
Start Address 0×77e617ec
Stack Init b3a44000 Current b3a43c08 Base b3a44000 Limit b3a41000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

The similar waits were found for ApplicationB and ApplicationC processes. Therefore it was advised to limit the amount of physical memory to 4Gb and reproduce the issue or if the liveliness of the system needs to be preserved to manually dump the following processes on the next occasion:

ServiceA.exe
ApplicationB.exe
ApplicationC.exe

- Dmitry Vostokov @ DumpAnalysis.org -

Inconsistent dump, blocked threads, wait chains, incorrect stack trace and process factory: pattern cooperation

Tuesday, June 9th, 2009

The more busy the system is, the more inconsistent are complete memory dumps produced by external physical memory dumpers. On the contrary, quiet systems, with idle existent users and when no one can connect, are more consistent than usual. Here is one example:

Loading Dump File [complete_dump.dmp]
Kernel Complete Dump File: Full address space is available

[...]

WARNING: Process directory table base 00039000 doesn't match CR3 CB759000
[...]

BugCheck 0, {0, 0, 0, 0}

Probably caused by : LiveKdD.SYS ( LiveKdD+12d1 )

In such cases I often dump all processes and threads to look for anomalies and indeed we find plenty of blocked threads in several services. For example, ServiceA.exe has blocked threads waiting for critical sections and LPC requests:

0: kd> !process 0 ff
**** NT ACTIVE PROCESS DUMP ****

[...]

THREAD 86f21118  Cid 07d4.2914  Teb: 7ffa5000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
     86f21304  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00f2b546:
Current LPC port d79c6e00
Impersonation token:  db621028 (Level Impersonation)
DeviceMap                 dca24d50
Owning Process            88948d88       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      14788708       Ticks: 3643121 (0:15:48:43.765)
Context Switch Count      52            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address ServiceA!EventWorkerThread (0×0042e2f8)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b2246000 Current b2245c08 Base b2246000 Limit b2243000 Call 0
Priority 12 BasePriority 10 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
b2245c10 00000000 0×149c3
[…]

THREAD 8736fdb0  Cid 07d4.3cc4  Teb: 7ff9c000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
     8736ff9c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00f2b518:
Current LPC port d79c6e00
Impersonation token:  da1b51a8 (Level Impersonation)
DeviceMap                 d9f9e368
Owning Process            88948d88       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      14788707       Ticks: 3643122 (0:15:48:43.781)
Context Switch Count      44            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address ServiceA!EventWorkerThread (0×0042e2f8)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init ae5d3000 Current ae5d2c08 Base ae5d3000 Limit ae5d0000 Call 0
Priority 12 BasePriority 10 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
ae5d2c10 3fe63bfa 0×73e40763
[…]

THREAD 85e89c80  Cid 07d4.5648  Teb: 7ff6a000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
     871d6e80  SynchronizationEvent
Impersonation token:  d7a1c028 (Level Impersonation)
DeviceMap                 d66ef860
Owning Process            88948d88       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      14320472       Ticks: 4111357 (0:17:50:39.953)
Context Switch Count      28            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x00ebebcc
LPC Server thread working on message Id ebebcc
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init a536b000 Current a536ac60 Base a536b000 Limit a5368000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
a536ac78 8083d5b1 nt!KiSwapContext+0x26
a536aca4 8083df9e nt!KiSwapThread+0x2e5
a536acec 8092ae57 nt!KeWaitForSingleObject+0x346
a536ad50 80833bdf nt!NtWaitForSingleObject+0x9a
a536ad50 7c8285ec nt!KiFastCallEntry+0xfc
05b6f7d0 7c827d0b ntdll!KiFastSystemCallRet
05b6f7d4 7c83d236 ntdll!NtWaitForSingleObject+0xc
05b6f810 7c83d281 ntdll!RtlpWaitOnCriticalSection+0×1a3
05b6f830 0040bb5b ntdll!RtlEnterCriticalSection+0xa8

05b6f874 00415889 ServiceA!OnRequest+0×23
[…]
05b6f8b4 77ce33e1 RPCRT4!Invoke+0×30
05b6fcb4 77ce35c4 RPCRT4!NdrStubCall2+0×299
05b6fcd0 77c7ff7a RPCRT4!NdrServerCall2+0×19
05b6fd04 77c8042d RPCRT4!DispatchToStubInCNoAvrf+0×38
05b6fd58 77c80353 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×11f
05b6fd7c 77c7e0d4 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
05b6fdbc 77c7e080 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject+0xc0
05b6fdfc 77c812f0 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×41e
05b6fe20 77c88678 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
05b6ff84 77c88792 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
05b6ff8c 77c8872d RPCRT4!RecvLotsaCallsWrapper+0xd
05b6ffac 77c7b110 RPCRT4!BaseCachedThreadRoutine+0×9d
05b6ffb8 77e64829 RPCRT4!ThreadStartRoutine+0×1b
05b6ffec 00000000 kernel32!BaseThreadStart+0×34

Knowing that ServiceA is crucial for our system functions we examine LPC wait chains. All “Waiting for reply to LPC MessageId <…>” threads depend on another service:

0: kd> !lpc message 00f2b546
Searching message f2b546 in threads ...
Client thread 86f21118 waiting a reply from f2b546                         
    Server thread 860cc9e0 is working on message f2b546
[…]
    Server process  : 87d21698 (svchost.exe)
[…]

We find this thread in svchost.exe hosting ServiceB:

0: kd> !thread 860cc9e0 1f
THREAD 860cc9e0  Cid 1310.4ad8  Teb: 7ff88000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
     871d47d0  Mutant - owning thread 867322e0
Not impersonating
DeviceMap                 d64008f8
Owning Process            87d21698       Image:         svchost.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      14788708       Ticks: 3643121 (0:15:48:43.765)
Context Switch Count      1791            
UserTime                  00:00:00.031
KernelTime                00:00:00.046
Win32 Start Address 0×00f2b546
LPC Server thread working on message Id f2b546
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b055e000 Current b055dc60 Base b055e000 Limit b055b000 Call 0
Priority 12 BasePriority 10 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
b055dc78 8083d5b1 nt!KiSwapContext+0×26
b055dca4 8083df9e nt!KiSwapThread+0×2e5
b055dcec 8092ae57 nt!KeWaitForSingleObject+0×346
b055dd50 80833bdf nt!NtWaitForSingleObject+0×9a
b055dd50 7c8285ec nt!KiFastCallEntry+0xfc
0400f718 7c827d0b ntdll!KiFastSystemCallRet
0400f71c 7654bf59 ntdll!NtWaitForSingleObject+0xc
0400f73c 7654c694 ServiceB!Lock+0×42
0400f754 76558b17 ServiceB!FindObjectById+0×39
0400f8ac 7654fa50 ServiceB!QueryInformation+0×3e
0400f8c8 77c80193 ServiceB!RpcQueryInformation+0×49
0400f8f8 77ce33e1 RPCRT4!Invoke+0×30
0400fcf8 77ce35c4 RPCRT4!NdrStubCall2+0×299
0400fd14 77c7ff7a RPCRT4!NdrServerCall2+0×19
0400fd48 77c8042d RPCRT4!DispatchToStubInCNoAvrf+0×38
0400fd9c 77c80353 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×11f
0400fdc0 77c811dc RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
0400fdfc 77c812f0 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
0400fe20 77c88678 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
0400ff84 77c88792 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
0400ff8c 77c8872d RPCRT4!RecvLotsaCallsWrapper+0xd
0400ffac 77c7b110 RPCRT4!BaseCachedThreadRoutine+0×9d
0400ffb8 77e64829 RPCRT4!ThreadStartRoutine+0×1b
0400ffec 00000000 kernel32!BaseThreadStart+0×34

We notice that about 45 threads are waiting for the mutant 871d47d0 owned by thread 867322e0, like this one with incorrect stack trace because of inconsistent dump file data:

0: kd> !thread 86cad210 1f
THREAD 86cad210  Cid 1310.31a4  Teb: 7ff15000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
     871d47d0  Mutant - owning thread 867322e0
Not impersonating
DeviceMap                 d64008f8
Owning Process            87d21698       Image:         svchost.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      18020896       Ticks: 410933 (0:01:47:00.828)
Context Switch Count      43            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address RPCRT4!ThreadStartRoutine (0×77c7b0f5)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init abb1b000 Current abb1ac60 Base abb1b000 Limit abb18000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
abb1ac68 60203020 0×84
[…]

We inspect the thread 867322e0:

0: kd> !thread 867322e0 1f
THREAD 867322e0  Cid 1310.26dc  Teb: 7ff56000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
     869d4020  ProcessObject
     86732358  NotificationTimer
Not impersonating
DeviceMap                 d64008f8
Owning Process            87d21698       Image:         svchost.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      18454684    
Context Switch Count      5172            
UserTime                  00:00:00.000
KernelTime                00:00:00.031
Win32 Start Address 0×0083fd5d
LPC Server thread working on message Id 83fd5d
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init ad350000 Current ad34fc60 Base ad350000 Limit ad34d000 Call 0
Priority 10 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr 
ad34fc78 8083d5b1 nt!KiSwapContext+0×26
ad34fca4 8083df9e nt!KiSwapThread+0×2e5
ad34fcec 8092ae57 nt!KeWaitForSingleObject+0×346
ad34fd50 80833bdf nt!NtWaitForSingleObject+0×9a
ad34fd50 7c8285ec nt!KiFastCallEntry+0xfc
04f5fbec 7c827d0b ntdll!KiFastSystemCallRet
04f5fbf0 76548721 ntdll!NtWaitForSingleObject+0xc
04f5fc0c 7654c596 ServiceB!WaitForProcess+0×4a

04f5ffb8 77e64829 ServiceB!LpcThread+0×14e
04f5ffec 00000000 kernel32!BaseThreadStart+0×34

We see that it is waiting for the process 869d4020 and we find the latter one with a single thread blocked by DriverA:

0: kd> !process 869d4020 ff
PROCESS 869d4020  SessionId: 18  Cid: 56b8    Peb: 7ffde000  ParentCid: 3608
    DirBase: 758b3000  ObjectTable: dd7083c0  HandleCount:  79.
    Image: rundll32.exe
    VadRoot 869d5228 Vads 107 Clone 0 Private 330. Modified 1706. Locked 0.
    DeviceMap d8d260a8
    Token                             dfa8f028
    ElapsedTime                       1 Day 23:52:06.918
    UserTime                          00:00:00.312
    KernelTime                        00:00:01.484
    QuotaPoolUsage[PagedPool]         34784
    QuotaPoolUsage[NonPagedPool]      5016
    Working Set Sizes (now,min,max)  (810, 50, 345) (3240KB, 200KB, 1380KB)
    PeakWorkingSetSize                2088
    VirtualSize                       37 Mb
    PeakVirtualSize                   47 Mb
    PageFaultCount                    34473
    MemoryPriority                    BACKGROUND
    BasePriority                      10
    CommitCharge                      707
[..]
    CommandLine:  ‘rundll32 C:\WINDOWS\system32\DllA.dll,CheckData /name”PropertyA”‘
[…]

THREAD 86bac020  Cid 56b8.4904  Teb: 7ffdb000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
     8774a930  NotificationEvent
Not impersonating
DeviceMap                 d8d260a8
Owning Process            869d4020       Image:         rundll32.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      10113815       Ticks: 8318014 (1:12:06:08.968)
Context Switch Count      188            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x2b169ecb
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init af12b000 Current af12ac34 Base af12b000 Limit af128000 Call 0
Priority 12 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr 
af12ac4c 8083d5b1 nt!KiSwapContext+0x26
af12ac78 8083df9e nt!KiSwapThread+0x2e5
af12acc0 b5075a82 nt!KeWaitForSingleObject+0x346
WARNING: Stack unwind information not available. Following frames may be wrong.
af12acf4 b507f989 DriverA+0×3a82
[…]

We also check virtual memory consumption and instantly see hundreds of rundll32.exe processes like in a process factory pattern:

0: kd> !vm
[...]
57f4 rundll32.exe       586 (      2344 Kb)
57d0 rundll32.exe       586 (      2344 Kb)
57b4 rundll32.exe       586 (      2344 Kb)
578c rundll32.exe       586 (      2344 Kb)
5780 rundll32.exe       586 (      2344 Kb)
577c rundll32.exe       586 (      2344 Kb)
5744 rundll32.exe       586 (      2344 Kb)
572c rundll32.exe       586 (      2344 Kb)
5724 rundll32.exe       586 (      2344 Kb)
5708 rundll32.exe       586 (      2344 Kb)
56d4 rundll32.exe       586 (      2344 Kb)
56a4 rundll32.exe       586 (      2344 Kb)
564c rundll32.exe       586 (      2344 Kb)
5630 rundll32.exe       586 (      2344 Kb)
55e0 rundll32.exe       586 (      2344 Kb)
55b8 rundll32.exe       586 (      2344 Kb)
55ac rundll32.exe       586 (      2344 Kb)
558c rundll32.exe       586 (      2344 Kb)
550c rundll32.exe       586 (      2344 Kb)
546c rundll32.exe       586 (      2344 Kb)
5428 rundll32.exe       586 (      2344 Kb)
5418 rundll32.exe       586 (      2344 Kb)
53d0 rundll32.exe       586 (      2344 Kb)
53bc rundll32.exe       586 (      2344 Kb)
53b0 rundll32.exe       586 (      2344 Kb)
53a8 rundll32.exe       586 (      2344 Kb)
530c rundll32.exe       586 (      2344 Kb)
5300 rundll32.exe       586 (      2344 Kb)
52e8 rundll32.exe       586 (      2344 Kb)
52a4 rundll32.exe       586 (      2344 Kb)
5294 rundll32.exe       586 (      2344 Kb)
5228 rundll32.exe       586 (      2344 Kb)
5118 rundll32.exe       586 (      2344 Kb)
50bc rundll32.exe       586 (      2344 Kb)
50a8 rundll32.exe       586 (      2344 Kb)
5080 rundll32.exe       586 (      2344 Kb)
5030 rundll32.exe       586 (      2344 Kb)
4fc8 rundll32.exe       586 (      2344 Kb)
4f88 rundll32.exe       586 (      2344 Kb)
4ef0 rundll32.exe       586 (      2344 Kb)
4e84 rundll32.exe       586 (      2344 Kb)
4e70 rundll32.exe       586 (      2344 Kb)
4e68 rundll32.exe       586 (      2344 Kb)
4e4c rundll32.exe       586 (      2344 Kb)
4e38 rundll32.exe       586 (      2344 Kb)
4df8 rundll32.exe       586 (      2344 Kb)
4df4 rundll32.exe       586 (      2344 Kb)
4d2c rundll32.exe       586 (      2344 Kb)
4d00 rundll32.exe       586 (      2344 Kb)
4cd4 rundll32.exe       586 (      2344 Kb)
4cbc rundll32.exe       586 (      2344 Kb)
4c90 rundll32.exe       586 (      2344 Kb)
4c80 rundll32.exe       586 (      2344 Kb)
4c68 rundll32.exe       586 (      2344 Kb)
4c40 rundll32.exe       586 (      2344 Kb)
[...]

By switching the current process context to 869d4020 (.process /r /p command) and using lmv command we check vendors for DllA and DriverA. It seems that they are different. We also see that DllA module is no longer loaded (not seen in the module list) by the time DriverA is waiting for a notification event. All rundll32.exe processes are waiting for that DriverA kernel module. We therefore decide to contact its vendor.

- Dmitry Vostokov @ DumpAnalysis.org -

Stack trace collection, blocked threads, pass through functions and main thread: pattern cooperation

Friday, May 8th, 2009

A server was hanging and a complete memory dump was generated. Usual checks for resource contention or CPU spikes like !locks and !running WinDbg commands didn’t reveal anything. Therefore we decided to list all threads in the system using !stacks and !process 0 ff commands. The former command gives a birds eye overview of threads and it was found suspicious that many threads in different processes had 0 ticks waiting and had this top level call:

0: kd> !stacks
Proc.Thread .Thread Ticks ThreadState Blocker
[...]

[88d9cd88 explorer.exe]
Proc.Thread .Thread  Ticks   ThreadState Blocker
1ecc.001eb4 88d99db0 0000000 Blocked     3rdPartyAVDriver+0×1394
[…]

[862bad88 userinit.exe]
4394.005570 878f0790 0000000 Blocked     3rdPartyAVDriver+0×1394

Inspection of the latter command detailed output of thread stacks showed that all these suspicious threads were in fact blocked when loading a DLL:

0: kd> .process /r /p 862bad88
Implicit process is now 862bad88
Loading User Symbols

0: kd> .thread 878f0790
Implicit thread is now 878f0790

0: kd> kL 100
ChildEBP RetAddr 
9aeb529c 80833e95 nt!KiSwapContext+0x26
9aeb52c8 80829027 nt!KiSwapThread+0x2e5
9aeb5310 b8dff394 nt!KeDelayExecutionThread+0x2ab
WARNING: Stack unwind information not available. Following frames may be wrong.
9aeb5338 b8dffe35 3rdPartyAVDriver+0×1394
9aeb5354 b8e00208 3rdPartyAVDriver+0×1e35
9aeb5398 b8e0e45a 3rdPartyAVDriver+0×2208
9aeb53d0 8081e087 3rdPartyAVDriver+0×1045a

9aeb53e4 b7d6e41b nt!IofCallDriver+0×45
9aeb53f4 b7d6a3ee DriverA!Dispatch+0xfe
9aeb5404 8081e087 DriverA!KUI_dispatch+0×6d

9aeb5418 b8bdb4e1 nt!IofCallDriver+0×45
9aeb543c b8bdb755 DriverB!PassThrough+0xd3
9aeb545c 8081e087 DriverB!Dispatch+0×153

9aeb5470 b6a7b81d nt!IofCallDriver+0×45
9aeb5498 b6a6c5bb DriverC!LowerPassThrough+0×48
9aeb54bc b6a6cc32 DriverC+0×75bb
9aeb5554 b6a7aa57 DriverC+0×7c32
9aeb5564 b6a7aaa7 DriverC+0×15a57
9aeb558c 8081e087 DriverC!DispatchPassThrough+0×48
9aeb55a0 808fb13b nt!IofCallDriver+0×45
9aeb5688 80939c6a nt!IopParseDevice+0xa35
9aeb5708 80935d9e nt!ObpLookupObjectName+0×5b0
9aeb575c 808ece57 nt!ObOpenObjectByName+0xea
9aeb57d8 808ee0f1 nt!IopCreateFile+0×447
9aeb5834 808f1e31 nt!IoCreateFile+0xa3
9aeb5874 8088b19c nt!NtOpenFile+0×27
9aeb5874 7c94860c nt!KiFastCallEntry+0xfc
0007e1fc 7c947339 ntdll!KiFastSystemCallRet
0007e200 7c952653 ntdll!NtOpenFile+0xc
0007e278 7c9525db ntdll!LdrpCreateDllSection+0×4e
0007e364 7c95425c ntdll!LdrpMapDll+0×297
0007e610 7c95408d ntdll!LdrpLoadDll+0×1e4
0007e88c 7c801bf7 ntdll!LdrLoadDll+0×198
0007e8f4 7c81c70b kernel32!LoadLibraryExW+0×1b2
0007e908 7c93a352 kernel32!LoadLibraryW+0×11
0007ee38 7c95348d ntdll!LdrpCallInitRoutine+0×14
0007ef40 7c954339 ntdll!LdrpRunInitializeRoutines+0×367
0007f1d4 7c95408d ntdll!LdrpLoadDll+0×3cd
0007f450 7c801bf7 ntdll!LdrLoadDll+0×198
0007f4b8 7c81c70b kernel32!LoadLibraryExW+0×1b2
0007f4cc 77e556fb kernel32!LoadLibraryW+0×11
0007f558 77e3dc66 USER32!LoadAppDlls+0×179
0007f694 77e3df10 USER32!ClientThreadSetup+0×1c4
0007f698 7c948556 USER32!__ClientThreadSetup+0×5
0007f698 80831348 ntdll!KiUserCallbackDispatcher+0×2e
9aeb5b4c 8091f9e1 nt!KiCallUserMode+0×4
9aeb5ba4 bf8bc7c3 nt!KeUserModeCallback+0×8f
9aeb5bcc bf8bbc93 win32k!xxxClientThreadSetup+0×21
9aeb5ccc bf8b6c46 win32k!xxxCreateThreadInfo+0×804
9aeb5cdc bf8b7ae4 win32k!UserThreadCallout+0×71
9aeb5cf8 80947a1b win32k!W32pThreadCallout+0×3a
9aeb5d54 8088ae9e nt!PsConvertToGuiThread+0×153
9aeb5d58 00000000 nt!KiBBTUnexpectedRange+0xc

The functions shown in blue are known from past issues to be pass through forwarding IRP to the lower drivers in a device driver stack.

Actually the thread stack above belongs to the main thread blocked while loading application DLLs. The conclusion was to check 3rdPartyAVDriver.sys vendor.

- Dmitry Vostokov @ DumpAnalysis.org -

Coupled processes, wait chains, message box, waiting thread time, paged out data, incorrect stack trace, hidden exception, unknown component and execution residue: pattern cooperation

Wednesday, March 11th, 2009

Let’s look at one complete memory dump from a hanging system. It was reported that an application was hanging when trying to create a printer. In its process we can identify this blocked thread that was processing an LPC request and waiting for an LPC reply from a coupled process, a spooler service:

0: kd> !thread 88db89a8 1f
THREAD 88db89a8  Cid 0764.31b8  Teb: 7ffdc000 Win32Thread: bc2dc980 WAIT: (Unknown) UserMode Non-Alertable
    88db8b94  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 0016346d:
Current LPC port e179d780
Not impersonating
DeviceMap                 e157b068
Owning Process            8a95f690       Image:         ApplicationA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      2750643        Ticks: 153453 (0:00:39:57.703)
Context Switch Count      2120                 LargeStack
UserTime                  00:00:00.031
KernelTime                00:00:00.078
Win32 Start Address 0×0015b4fd
LPC Server thread working on message Id 15b4fd
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b7848000 Current b7847c08 Base b7848000 Limit b7845000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
b7847c20 80833ec5 nt!KiSwapContext+0×26
b7847c4c 80829bc0 nt!KiSwapThread+0×2e5
b7847c94 80920f28 nt!KeWaitForSingleObject+0×346
b7847d50 8088ad3c nt!NtRequestWaitReplyPort+0×776
b7847d50 7c8285ec nt!KiFastCallEntry+0xfc (TrapFrame @ b7847d64)
00e6e150 7c82787b ntdll!KiFastSystemCallRet
00e6e154 77c80a6e ntdll!NtRequestWaitReplyPort+0xc
00e6e1a8 77c7fcf0 RPCRT4!LRPC_CCALL::SendReceive+0×230
00e6e1b4 77c80673 RPCRT4!I_RpcSendReceive+0×24
00e6e1c8 77ce315a RPCRT4!NdrSendReceive+0×2b
00e6e5b0 7307ca61 RPCRT4!NdrClientCall2+0×22e
00e6e5c8 73082e15 WINSPOOL!RpcAddPrinter+0×1c
00e6ea50 0040dd93 WINSPOOL!AddPrinterW+0×102
00e6ef40 0040f535 ApplicationA!CreatePrinter+0×8d9

[…]
00e6f8b4 77ce33e1 RPCRT4!Invoke+0×30
00e6fcb4 77ce35c4 RPCRT4!NdrStubCall2+0×299
00e6fcd0 77c7ff7a RPCRT4!NdrServerCall2+0×19
00e6fd04 77c8042d RPCRT4!DispatchToStubInCNoAvrf+0×38
00e6fd58 77c80353 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×11f
00e6fd7c 77c7e0d4 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
00e6fdbc 77c7e080 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject+0xc0
00e6fdfc 77c812f0 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×41e
00e6fe20 77c88678 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
00e6ff84 77c88792 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
00e6ff8c 77c8872d RPCRT4!RecvLotsaCallsWrapper+0xd
00e6ffac 77c7b110 RPCRT4!BaseCachedThreadRoutine+0×9d
00e6ffb8 77e64829 RPCRT4!ThreadStartRoutine+0×1b

We follow an LPC wait chain and see a thread blocked by a message box:

0: kd> !lpc message 0016346d
Searching message 16346d in threads …
    Server thread 891e9db0 is working on message 16346d                        
Client thread 88db89a8 waiting a reply from 16346d                         
Searching thread 88db89a8 in port rundown queues …

0: kd> !thread 891e9db0 1f
THREAD 891e9db0  Cid 045c.3b3c  Teb: 7ffa7000 Win32Thread: bc2d4ea8 WAIT: (Unknown) UserMode Non-Alertable
    88f5c850  SynchronizationEvent
Impersonation token:  e76f5030 (Level Impersonation)
DeviceMap                 e157b068
Owning Process            8ab10460       Image:         spoolsv.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      2414962        Ticks: 489134 (0:02:07:22.718)
Context Switch Count      6797                 LargeStack
UserTime                  00:00:00.359
KernelTime                00:00:01.437
Win32 Start Address 0×0016346d
LPC Server thread working on message Id 16346d
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b6aaa000 Current b6aa9c50 Base b6aaa000 Limit b6aa6000 Call 0
Priority 11 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
b6aa9c68 80833ec5 nt!KiSwapContext+0×26
b6aa9c94 80829bc0 nt!KiSwapThread+0×2e5
b6aa9cdc bf89aacc nt!KeWaitForSingleObject+0×346
b6aa9d38 bf89d952 win32k!xxxSleepThread+0×1be
b6aa9d4c bf89e310 win32k!xxxRealWaitMessageEx+0×12
b6aa9d5c 8088ad3c win32k!NtUserWaitMessage+0×14
b6aa9d5c 7c8285ec nt!KiFastCallEntry+0xfc (TrapFrame @ b6aa9d64)
021267bc 7739bf53 ntdll!KiFastSystemCallRet
021267f4 7738965e USER32!NtUserWaitMessage+0xc
0212681c 7739f762 USER32!InternalDialogBox+0xd0
02126adc 7739f047 USER32!SoftModalMessageBox+0×94b
02126c2c 7739eec9 USER32!MessageBoxWorker+0×2ba
02126c84 773d7d0d USER32!MessageBoxTimeoutW+0×7a
02126cb8 773c42c8 USER32!MessageBoxTimeoutA+0×9c
02126cd8 773c42a4 USER32!MessageBoxExA+0×1b
02126cf4 6dfcf8c2 USER32!MessageBoxA+0×45
WARNING: Frame IP not in any known module. Following frames may be wrong.
02126d98 530a2935 0×6dfcf8c2
02126d9c 20656d6f 0×530a2935
[…]

Looking at this thread waiting time we see the problem happened about 1 hour 20 minutes before our hanging application attempted to print. We want to see other threads in the spooler but most of them are paged out:

0: kd> !process 8ab10460 1f

[...]

THREAD 8a6aedb0  Cid 045c.1090  Teb: 7ffab000 Win32Thread: bc33bbe8 WAIT: (Unknown) UserMode Non-Alertable
    8abb2f40  SynchronizationEvent
    IRP List:
        8a0a2ed0: (0006,0094) Flags: 00000800  Mdl: 00000000
Impersonation token:  e7c94760 (Level Impersonation)
DeviceMap                 e9194e68
Owning Process            8ab10460       Image:         spoolsv.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      1897118        Ticks: 1006978 (0:04:22:14.031)
Context Switch Count      9711                 LargeStack
UserTime                  00:00:00.343
KernelTime                00:00:02.812
Win32 Start Address 0x00165a40
LPC Server thread working on message Id 165a40
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init b7c28000 Current b7c27c60 Base b7c28000 Limit b7c24000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

[...] 

Coming back to our message box thread we inspect MessageBox parameters to see the message:

0: kd> .process /r /p 8ab10460
Implicit process is now 8ab10460
Loading User Symbols

0: kd> !thread 891e9db0
[...]
ChildEBP RetAddr  Args to Child             
[...]
02126cf4 6dfcf8c2 00000000 02126d4c 02126d2c USER32!MessageBoxA+0×45
[…]

0: kd> da /c 100 02126d4c
02126d4c  "Function address 0×77e63ce4 caused a protection fault. (exception code 0xc0000005).

0: kd> ln 0x77e63ce4
(77e63c87)   kernel32!GetModuleFileNameW+0×83   |  (77e63d7a)   kernel32!GetProcAddress

We see that the thread experienced an access violation in GetModuleFileName API call.

We also see that stack trace is incorrect and we try to reconstruct the point of exception by looking at thread raw stack and searching for any hidden exception

0: kd> .thread 891e9db0
Implicit thread is now 891e9db0

0: kd> !teb
TEB at 7ffa7000
    ExceptionList:        0212757c
    StackBase:            02130000
    StackLimit:           02123000

    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffa7000
    EnvironmentPointer:   00000000
    ClientId:             0000045c . 00003b3c
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffde000
    LastErrorValue:       0
    LastStatusValue:      c0000034
    Count Owned Locks:    0
    HardErrorMode:        0

0: kd> dds 02123000 02130000
02123000  00000000
02123004  00000000
02123008  00000000
0212300c  00000000
[...]
021275a8  02127648
021275ac  0212aa0c
021275b0  7c8315c2 ntdll!RtlDispatchException+0×91
021275b4  02127648
021275b8  0212aa0c
021275bc  02127664

021275c0  02127624
021275c4  6dfd0a54
021275c8  063b0000
021275cc  02127648
021275d0  7c8877ec ntdll!PebLdr+0xc
021275d4  77e61a60 kernel32!_except_handler3
021275d8  77e69cf8 kernel32!`string’+0xc
021275dc  ffffffff
021275e0  00080000
021275e4  00089878
021275e8  77b92e40 VERSION!`string’
021275ec  00000000
021275f0  00000000
021275f4  00017608
021275f8  02127540
021275fc  77b92e40 VERSION!`string’
02127600  02127648
02127604  7c828270 ntdll!_except_handler3
02127608  7c829f60 ntdll!CheckHeapFillPattern+0×64
0212760c  ffffffff
02127610  7c829f59 ntdll!RtlFreeHeap+0×70f
02127614  77e62444 kernel32!LocalFree+0×27
02127618  00080000
0212761c  00000000
02127620  77e62457 kernel32!LocalFree+0×108
02127624  5dff1510
02127628  02130000
0212762c  00000000
02127630  0212797c
02127634  7c82855e ntdll!KiUserExceptionDispatcher+0xe
02127638  02123000
0212763c  02127664

02127640  02127648
02127644  02127664
02127648  c0000005
0212764c  00000000
02127650  00000000
02127654  77e63ce4 kernel32!GetModuleFileNameW+0×83
02127658  00000002
0212765c  00000000
02127660  00000018
02127664  0001003f
02127668  00000000
0212766c  00000000
02127670  00000000
[…]

0: kd> .cxr 02127664
eax=00000000 ebx=00000000 ecx=7c82d279 edx=7c8877a0 esi=7c8877ec edi=063b0000
eip=77e63ce4 esp=02127930 ebp=0212797c iopl=0  nv up ei ng nz ac pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000  efl=00010297
kernel32!GetModuleFileNameW+0x83:
001b:77e63ce4 3b7818          cmp     edi,dword ptr [eax+18h] ds:0023:00000018=????????

0: kd> kv
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child             
0212797c 77390f3d 063b0000 021279d0 00000104 kernel32!GetModuleFileNameW+0x83
02127990 77390f13 063b0000 021279d0 00000104 USER32!WowGetModuleFileName+0x14
02127bdc 7738bcbd 063b0000 00000097 00000003 USER32!LoadIcoCur+0x76
02127bfc 06425db5 063b0000 00000097 02127d28 USER32!LoadIconW+0×1b
WARNING: Frame IP not in any known module. Following frames may be wrong.
02127c38 7c82cefe 00000704 00001f10 02127c58 0×6425db5
02127c4c 7c82ceb0 00001f10 00002614 00000704 ntdll!ARRAY_FITS+0×29
02127c8c c0150008 00000000 03923f68 0000001d ntdll!RtlpLocateActivationContextSection+0×1da
02127d20 7c82e0ef 00000000 00000000 00071f10 0xc0150008

What we see is the fact that the thread tried to load an icon from 063b0000 module not listed in lmv command output list perhaps because it was not fully loaded yet. We see that the thread also holds a loader critical section:

0: kd> !cs -l -o -s
-----------------------------------------
DebugInfo          = 0x7c8877c0
Critical section   = 0×7c8877a0 (ntdll!LdrpLoaderLock+0×0)
LOCKED
LockCount          = 0xC
WaiterWoken        = No
OwningThread       = 0×00003b3c
RecursionCount     = 0×1
LockSemaphore      = 0×108
SpinCount          = 0×00000000
OwningThread       = .thread 891e9db0
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.

However we can inspect the unknown module PE headers left in memory:

 0: kd> !dh 063b0000

File Type: DLL
FILE HEADER VALUES
     14C machine (i386)
       4 number of sections
42B9172C time date stamp Wed Jun 22 08:45:48 2005

       0 file pointer to symbol table
       0 number of symbols
      E0 size of optional header
    210E characteristics
            Executable
            Line numbers stripped
            Symbols stripped
            32 bit word machine
            DLL

[...]

Debug Directories(1)
 Type       Size     Address  Pointer
 cv           16       10198     f598 Format: NB10, 42b9172c, 1, c:\printerdriver123UI.pdb

[...]

We see that the module name is printerdriver123UI. So we should recommend to check the printer driver package containing this component. lmv command lists many printer drivers with the similar name prefix and we look at thread execution residue to get better stack trace and find the corresponding printer. We know that the application called AddPrinter API and therefore we try to find the corresponding call on the raw stack that we examined for hidden exceptions before:

[...]
0212ef2c  01f20390
0212ef30  0212f3f8
0212ef34  76144799 localspl!SplAddPrinter+0×5f3
0212ef38  01efb730
0212ef3c  00000003
[…]
0212f3f8  0212f424
0212f3fc  76144ab2 localspl!LocalAddPrinterEx+0×2e
0212f400  00000000
[…]
0212f424  0212f874
0212f428  7407018b SPOOLSS!AddPrinterExW+0×151
0212f42c  00000000
[…]
0212f874  0212f890
0212f878  74070254 SPOOLSS!AddPrinterW+0×17
0212f87c  00000000
[…]

0: kd> kv L=0212ef30
ChildEBP RetAddr  Args to Child             
0212ef30 76144799 01efb730 00000003 00000000 kernel32!GetModuleFileNameW+0x83
0212f3f8 76144ab2 00000000 00000002 011116e0 localspl!SplAddPrinter+0x5f3
0212f424 7407018b 00000000 00000002 011116e0 localspl!LocalAddPrinterEx+0x2e
0212f874 74070254 00000000 00000002 011116e0 SPOOLSS!AddPrinterExW+0x151
0212f890 0100792d 00000000 00000002 011116e0 SPOOLSS!AddPrinterW+0×17
0212f8ac 01006762 00000000 0118f968 00159180 spoolsv!YAddPrinter+0×75
0212f8d0 77c80193 00000000 0118f968 00159180 spoolsv!RpcAddPrinter+0×37
0212f8f8 77ce33e1 0100672b 0212fae0 00000005 RPCRT4!Invoke+0×30
0212fcf8 77ce35c4 00000000 00000000 000cefbc RPCRT4!NdrStubCall2+0×299
0212fd14 77c7ff7a 000cefbc 000bbd20 000cefbc RPCRT4!NdrServerCall2+0×19
0212fd48 77c8042d 01002c47 000cefbc 0212fdec RPCRT4!DispatchToStubInCNoAvrf+0×38
0212fd9c 77c80353 00000005 00000000 0100d228 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×11f
0212fdc0 77c811dc 000cefbc 00000000 0100d228 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
0212fdfc 77c812f0 000c9e80 000a7af0 0014ce70 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
0212fe20 77c88678 000a7b28 0212fe38 000c9e80 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
0212ff84 77c88792 0212ffac 77c8872d 000a7af0 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
0212ff8c 77c8872d 000a7af0 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd
0212ffac 77c7b110 000885d8 0212ffec 77e64829 RPCRT4!BaseCachedThreadRoutine+0×9d
0212ffb8 77e64829 000a3750 00000000 00000000 RPCRT4!ThreadStartRoutine+0×1b
0212ffec 00000000 77c7b0f5 000a3750 00000000 kernel32!BaseThreadStart+0×34

 
Examination of the server-side AddPrinter arguments reveals the printer name:

0: kd> dpu 011116e0 l5
011116e0  00000000
011116e4  00158f48
011116e8  00000000
011116ec  00158fe4
011116f0  00159040 "PRINTER NAME ABCDEF

- Dmitry Vostokov @ DumpAnalysis.org -

Insufficient memory, handle leak, process factory, high contention and busy system: pattern cooperation

Monday, February 16th, 2009

It was reported that one system became slower and slower and eventually the complete memory dump was generated for analysis. Usual !vm 4 command shows pool allocation errors and particularly insufficient session pool memory for user session 1:

1: kd> !vm 4

*** Virtual Memory Usage ***
 Physical Memory:     1048503 (   4194012 Kb)
 Page File: \??\C:\pagefile.sys
   Current:   5109760 Kb  Free Space:   3668764 Kb
   Minimum:   5109760 Kb  Maximum:      5109760 Kb
 Available Pages:      409409 (   1637636 Kb)
 ResAvail Pages:       708135 (   2832540 Kb)
 Locked IO Pages:         108 (       432 Kb)
 Free System PTEs:     137566 (    550264 Kb)
 Free NP PTEs:          27288 (    109152 Kb)
 Free Special NP:           0 (         0 Kb)
 Modified Pages:          780 (      3120 Kb)
 Modified PF Pages:       771 (      3084 Kb)
 NonPagedPool Usage:    37587 (    150348 Kb)
 NonPagedPool Max:      65214 (    260856 Kb)

 PagedPool 0 Usage:     21583 (     86332 Kb)
 PagedPool 1 Usage:      3054 (     12216 Kb)
 PagedPool 2 Usage:      3076 (     12304 Kb)
 PagedPool 3 Usage:      3050 (     12200 Kb)
 PagedPool 4 Usage:      3040 (     12160 Kb)
 PagedPool Usage:       33803 (    135212 Kb)
 PagedPool Maximum:     66560 (    266240 Kb)

 ********** 185454 pool allocations have failed **********

 Shared Commit:        123289 (    493156 Kb)
 Special Pool:              0 (         0 Kb)
 Shared Process:       101937 (    407748 Kb)
 PagedPool Commit:      33850 (    135400 Kb)
 Driver Commit:          1697 (      6788 Kb)
 Committed pages:     1159581 (   4638324 Kb)
 Commit limit:        2283111 (   9132444 Kb)

[...]

 Session ID 1 @ af925000:
 Paged Pool Usage:       16868K

 *** 371034 Pool Allocation Failures ***

 Commit Usage:           17960K

 Session ID 2 @ af927000:
 Paged Pool Usage:        2856K
 Commit Usage:            3860K

[...]

The exploration of session 1 processes shows Process Factory pattern (5,000 launched processes) with explorer.exe ran amok:

1: kd> !sprocess 1
Dumping Session 1

_MM_SESSION_SPACE af925000
_MMSESSION        af925d80
PROCESS 87db34a0  SessionId: 1  Cid: 4a68    Peb: 7ffd4000  ParentCid: 01b8
    DirBase: cffb6d20  ObjectTable: e6084268  HandleCount: 11677.
    Image: csrss.exe

PROCESS 86083020  SessionId: 1  Cid: 4560    Peb: 7ffde000  ParentCid: 2978
    DirBase: cffb6f40  ObjectTable: e6c41410  HandleCount:  79.
    Image: AnotherApp.exe

PROCESS 8607c020  SessionId: 1  Cid: 44c8    Peb: 7ffdc000  ParentCid: 4cf8
    DirBase: cffb7080  ObjectTable: e3c9fd38  HandleCount: 25407.
    Image: explorer.exe

[...]

PROCESS 85e94738  SessionId: 1  Cid: 4868    Peb: 7ffde000  ParentCid: 44c8
    DirBase: cffb7c00  ObjectTable: e7038840  HandleCount:  39.
    Image: application.exe

PROCESS 85e5c020  SessionId: 1  Cid: 4668    Peb: 7ffd4000  ParentCid: 4a50
    DirBase: cffb7de0  ObjectTable: e11891d0  HandleCount: 172.
    Image: application.exe

PROCESS 85e51578  SessionId: 1  Cid: 4c9c    Peb: 7ffde000  ParentCid: 44c8
    DirBase: cffb7e40  ObjectTable: e2ee0070  HandleCount:  39.
    Image: application.exe

[...]

PROCESS 85c81020  SessionId: 1  Cid: 53a4    Peb: 7ffd7000  ParentCid: 44c8
    DirBase: cffb89c0  ObjectTable: e6d2f600  HandleCount:  39.
    Image: application.exe

PROCESS 85c6fb18  SessionId: 1  Cid: 53a8    Peb: 7ffd7000  ParentCid: 44c8
    DirBase: cffb89e0  ObjectTable: e54df078  HandleCount:  39.
    Image: application.exe

PROCESS 85c60020  SessionId: 1  Cid: 53bc    Peb: 7ffdf000  ParentCid: 44c8
    DirBase: cffb8a40  ObjectTable: e1214e90  HandleCount:  39.
    Image: application.exe

[...]

We can also see the very big number of handles in both csrss.exe and explorer.exe. If we look at one of 5,000 application.exe processes we would see that it is a GUI application and this explains session pool exhaustion:

1: kd> !process 85c60020 1f
PROCESS 85c60020  SessionId: 1  Cid: 53bc    Peb: 7ffdf000  ParentCid: 44c8
    DirBase: cffb8a40  ObjectTable: e1214e90  HandleCount:  39.
    Image: application.exe
    VadRoot 85c97450 Vads 80 Clone 0 Private 173. Modified 159. Locked 0.
    DeviceMap e3743340
    Token                             e2ead5e0
    ElapsedTime                       00:54:17.218
    UserTime                          00:00:00.015
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         32460
    QuotaPoolUsage[NonPagedPool]      3200
    Working Set Sizes (now,min,max)  (30, 50, 345) (120KB, 200KB, 1380KB)
    PeakWorkingSetSize                778
    VirtualSize                       16 Mb
    PeakVirtualSize                   17 Mb
    PageFaultCount                    829
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      212

THREAD 85c25db0  Cid 53bc.5470  Teb: 7ffde000 Win32Thread: bd1a18a8 WAIT: (Unknown) UserMode Non-Alertable
    85bbf2a8  SynchronizationEvent
Not impersonating
DeviceMap                 e3743340
Owning Process            85c60020       Image:         application.exe
Wait Start TickCount      1121133        Ticks: 206877 (0:00:53:52.453)
Context Switch Count      19355  NoStackSwap    LargeStack
UserTime                  00:00:00.015
KernelTime                00:00:00.000
Win32 Start Address application (0x00402a38)
Start Address kernel32!BaseProcessStartThunk (0x7c8217f8)
Stack Init ad2b8000 Current ad2b7b1c Base ad2b8000 Limit ad2b4000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
ad2b7b34 80833ec5 nt!KiSwapContext+0x26
ad2b7b60 80829bc0 nt!KiSwapThread+0x2e5
ad2b7ba8 bf89aacc nt!KeWaitForSingleObject+0x346
ad2b7c04 bf8c4816 win32k!xxxSleepThread+0×1be
ad2b7ca0 bf8a134f win32k!xxxInterSendMsgEx+0×798
ad2b7cec bf85ae3c win32k!xxxSendMessageTimeout+0×1f3
ad2b7d10 bf8c13dc win32k!xxxWrapSendMessage+0×1b
ad2b7d40 8088ad3c win32k!NtUserMessageCall+0×9d

ad2b7d40 7c9485ec nt!KiFastCallEntry+0xfc (TrapFrame @ ad2b7d64)
0012f814 00000000 ntdll!KiFastSystemCallRet

We also see extremely high executive resource contention and various signs of a busy system like the fact that all processors are busy and very high number of ready threads at normal priority (about 1,000):

1: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****

Resource @ DriverA (0xbae016a0)    Exclusively owned
    Contention Count = 277088870
    NumberOfExclusiveWaiters = 3591
     Threads: 83b46db0-01<*> << Too many exclusive waiters to list>>

1: kd> !running

System Processors ff (affinity mask)
  Idle Processors 0

     Prcb      Current   Next   
  0  ffdff120  86300b40            …………….
  1  f772f120  f77320a0  852fadb0  …………….
  2  f7737120  f773a0a0  861528d0  …………….
  3  f773f120  f77420a0  877a4020  …………….
  4  f7747120  f774a0a0  865909a8  …………….
  5  f774f120  f77520a0  83d62cd0  …………….
  6  f7757120  f775a0a0  841aa020  …………….
  7  f775f120  f77620a0  ffb7b5e0  …………….

1: kd> !ready
Processor 0: No threads in READY state
Processor 1: Ready Threads at priority 10
    THREAD 861b1898  Cid 4860.4894  Teb: 7ffd8000 Win32Thread: bc1669a8 READY
Processor 1: Ready Threads at priority 9
    THREAD 83b46db0  Cid 44c8.a11c  Teb: 7f8f8000 Win32Thread: bcbabb38 READY
Processor 1: Ready Threads at priority 8
    THREAD 851408d0  Cid 6338.6398  Teb: 7ffdf000 Win32Thread: bc9bfbb8 READY
    THREAD 83eb1730  Cid 8f30.9568  Teb: 7ffdf000 Win32Thread: 00000000 READY
    THREAD fd310af8  Cid d53c.b4d8  Teb: 7ffdf000 Win32Thread: 00000000 READY
    THREAD 83bce8d0  Cid abec.6450  Teb: 7ffdd000 Win32Thread: 00000000 READY
    THREAD 83de2388  Cid a618.d79c  Teb: 7ffde000 Win32Thread: 00000000 READY
    THREAD 84197b40  Cid 44c8.6974  Teb: 7fe6a000 Win32Thread: bc662d48 READY
[…]

Processor 1: Ready Threads at priority 7
    THREAD 86d0c890  Cid 3524.365c  Teb: 7ff8e000 Win32Thread: bc403bb0 READY
Processor 2: No threads in READY state
Processor 3: No threads in READY state
Processor 4: No threads in READY state
Processor 5: No threads in READY state
Processor 6: No threads in READY state
Processor 7: No threads in READY state

There is AnotherApp.exe in the session process list that executes a VBS script:  

1: kd> .process /r /p 86083020
Implicit process is now 86083020
Loading User Symbols

1: kd> !process 86083020
[...]

        THREAD 860b0db0  Cid 4560.4e4c  Teb: 7ffdd000 Win32Thread: bc1bf4d0 WAIT: (Unknown) KernelMode Non-Alertable
            860b0e28  NotificationTimer

1: kd> .thread 860b0db0
Implicit thread is now 860b0db0

1: kd> kv 100
ChildEBP RetAddr  Args to Child             
ae7ec76c 80833ec5 860b0db0 860b0ea0 00000000 nt!KiSwapContext+0x26
ae7ec798 80829069 0000001e 00000000 bae016a0 nt!KiSwapThread+0x2e5
ae7ec7e0 badff394 00000000 00000000 ffdff5f0 nt!KeDelayExecutionThread+0x2ab
[...]
0012f6ac 7c8024d5 00000000 00000000 00a43280 kernel32!CreateProcessInternalA+0x28b
0012f6e4 00401b31 00000000 00a43280 00000000 kernel32!CreateProcessA+0×2c
[…]

1: kd> da /c 100 00a43280
00a43280  “wscript.exe C:\Scripts\script.vbs parameters”

The examination of that script showed that it contained a coding error resulted in an infinite loop.

- Dmitry Vostokov @ DumpAnalysis.org -

Insufficient memory, handle leak, wait chain, deadlock, inconsistent dump and overaged system: pattern cooperation

Wednesday, December 24th, 2008

In one complete memory dump taken from the system refusing user connections but not hung completely we can see the signs of past pool allocation failures (see Insufficient Memory):

0: kd> !vm
[...]
       PagedPool Usage:       47391 (    189564 Kb)
       PagedPool Maximum:     67584 (    270336 Kb)

       ********** 981 pool allocations have failed **********
[…]

We check paged pool usage but the output is inconsistent (shown in magenta color):

0: kd> !poolused 4
   Sorting by  Paged Pool Consumed

  Pool Used:
            NonPaged            Paged
 Tag    Allocs     Used    Allocs     Used
 LSmi        0        0        -1 4294967240 BlockTypeMisc
 PpEE        0        0        -1 4294967040 PNP_DEVICE_EVENT_ENTRY_TAG , Binary: nt!pnp
 CM         58     2320        -1 4294967000 Configuration Manager (registry) , Binary: nt!cm
 SeSc        0        0       -65 4294966112 Captured Security Descriptor , Binary: nt!se
 RxMs        1     1096       -99 4294947312 misc.
 CM38        0        0        -2 4294942720 Internal Configuration manager allocations , Binary: nt!cm
 RxFc        0        0        -8 4294879664 FCB
 Lfs         0        0      -907 4294872976 Lfs allocations
 xSMB        0        0      -179 4293500928 IFSKIT sample SMB mini-redirector , Binary: smbmrx.sys

 AAAA        4      224       581 51639048 UNKNOWN pooltag ‘AAAA’, please update pooltag.txt
 BBBB        2    65664      2582 16362984 UNKNOWN pooltag ‘BBBB’, please update pooltag.txt

 MmSt        0        0     10718 14944776 Mm section object prototype ptes , Binary: nt!mm
[…]

However we see that drivers using AAAA and BBBB consumed almost 65Mb and we can search for them as described here.

Dumping processes we notice signs of possible handle leak:

0: kd> !process 0 0
[...]
PROCESS 89ac09c0  SessionId: 0  Cid: 04b0    Peb: 7ffd5000  ParentCid: 0480
    DirBase: cc210000  ObjectTable: e13991a0  HandleCount: 3329.
    Image: csrss.exe

PROCESS 89ae4508  SessionId: 0  Cid: 07a8    Peb: 7ffdf000  ParentCid: 04f4
    DirBase: cb330000  ObjectTable: e115c220  HandleCount: 4476.
    Image: svchost.exe

PROCESS 868d1d88  SessionId: 0  Cid: 4120    Peb: 7ffd8000  ParentCid: 04f4
    DirBase: 95558000  ObjectTable: e1135428  HandleCount: 2255.
    Image: AppC.exe
[…]

We see lots of threads in the process 89ae4508 waiting for LPC reply:

0: kd> !thread 86b64388 1f
THREAD 86b64388  Cid 07a8.0fbc  Teb: 7ff73000 Win32Thread: bc173ac0 WAIT: (Unknown) UserMode Non-Alertable
    86b64574  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 06345018:
Current LPC port e169dd90
Impersonation token:  e492a6c0 (Level Impersonation)
DeviceMap                 e1603ce8
Owning Process            89ae4508       Image:         svchost.exe
Wait Start TickCount      148053822      Ticks: 23982406 (4:08:05:25.093)
Context Switch Count      11                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address DllA!ThreadEntry (0×752e27fe)
Start Address kernel32!BaseThreadStartThunk (0×77e6b5f3)
Stack Init 901e5000 Current 901e4c08 Base 901e5000 Limit 901e2000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

Now we follow LPC wait chain:

0: kd> !lpc message 06345018
Searching message 6345018 in threads ...
Client thread 86b64388 waiting a reply from 6345018                         
    Server thread 87f0d790 is working on message 6345018
[…]

0: kd> !thread 87f0d790 1f
THREAD 87f0d790  Cid 0de4.5b2c  Teb: 7ff8f000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    87f0d97c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 0634501d:
Current LPC port eb2e6450
Impersonation token:  e93ff870 (Level Impersonation)
DeviceMap                 e1603ce8
Owning Process            89c36690       Image:         AppA.exe
Wait Start TickCount      113650910      Ticks: 58385318 (10:13:24:30.593)
Context Switch Count      373            
UserTime                  00:00:00.015
KernelTime                00:00:00.000
Win32 Start Address 0×06345018
LPC Server thread working on message Id 6345018
Start Address kernel32!BaseThreadStartThunk (0×77e6b5f3)
Stack Init 8b0e6000 Current 8b0e5c08 Base 8b0e6000 Limit 8b0e3000 Call 0
Priority 11 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

0: kd> !lpc message 0634501d
Searching message 634501d in threads ...
Client thread 87f0d790 waiting a reply from 634501d                         
    Server thread 89137780 is working on message 634501d
[…]

0: kd> !thread 89137780 1f
THREAD 89137780  Cid 1884.41f8  Teb: 7ff90000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8913796c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 064aa11b:
Current LPC port ea3fc860
Impersonation token:  e93ff870 (Level Impersonation)
DeviceMap                 e1603ce8
Owning Process            8a608020       Image:         AppB.exe
Wait Start TickCount      148002015      Ticks: 24034213 (4:08:18:54.578)
Context Switch Count      700            
UserTime                  00:00:00.015
KernelTime                00:00:00.000
Win32 Start Address 0×0634501d
LPC Server thread working on message Id 634501d
Start Address kernel32!BaseThreadStartThunk (0×77e6b5f3)
Stack Init 8b749000 Current 8b748c08 Base 8b749000 Limit 8b746000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

0: kd> !lpc message 064aa11b
Searching message 64aa11b in threads ...
Client thread 89137780 waiting a reply from 64aa11b                         
    Server thread 87acb728 is working on message 64aa11b
[…]

0: kd> !thread 87acb728 1f
THREAD 87acb728  Cid 4120.4078  Teb: 7ff3f000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    87acb914  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 064aa127:
Current LPC port e7ec63f0
Not impersonating
DeviceMap                 e1003910
Owning Process            868d1d88       Image:         AppC.exe
Wait Start TickCount      147996856      Ticks: 24039372 (4:08:20:15.187)
Context Switch Count      440            
UserTime                  00:00:00.812
KernelTime                00:00:00.015
Win32 Start Address 0×064aa11b
LPC Server thread working on message Id 64aa11b
Start Address kernel32!BaseThreadStartThunk (0×77e6b5f3)
Stack Init 8ba35000 Current 8ba34c08 Base 8ba35000 Limit 8ba32000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

0: kd> !lpc message 064aa127
Searching message 64aa127 in threads ...
    Server thread 899e1750 is working on message 64aa127
Client thread 87acb728 waiting a reply from 64aa127                         
[…]

0: kd> !thread 899e1750 1f
THREAD 899e1750  Cid 0a0c.6cfc  Teb: 7ff16000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    89293390  Mutant - owning thread 89a9dc38
Not impersonating
DeviceMap                 e1003910
Owning Process            892b8a38       Image:         svchost.exe
Wait Start TickCount      148115996      Ticks: 23920232 (4:07:49:13.625)
Context Switch Count      166            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×064aa127
LPC Server thread working on message Id 64aa127
Start Address kernel32!BaseThreadStartThunk (0×77e6b5f3)
Stack Init 8c7b1000 Current 8c7b0c60 Base 8c7b1000 Limit 8c7ae000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

We finally come to a thread waiting for a mutant and we inspect its owner:

0: kd> !thread 89a9dc38 1f
THREAD 89a9dc38  Cid 0a0c.185c  Teb: 7ff8d000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    89a9de24  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 064a90f1:
Current LPC port e15992a8
Not impersonating
DeviceMap                 e1003910
Owning Process            892b8a38       Image:         svchost.exe
Wait Start TickCount      148115996      Ticks: 23920232 (4:07:49:13.625)
Context Switch Count      29043            
UserTime                  00:00:01.046
KernelTime                00:00:00.968
Win32 Start Address 0×064a8fb6
LPC Server thread working on message Id 64a8fb6
Start Address kernel32!BaseThreadStartThunk (0×77e6b5f3)
Stack Init 92201000 Current 92200c08 Base 92201000 Limit 921fe000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

We see it is waiting for an LPC reply from the thread waiting for the mutant we already saw:

0: kd> !lpc message 064a90f1
Searching message 64a90f1 in threads ...
Client thread 89a9dc38 waiting a reply from 64a90f1                      
    Server thread 88806a28 is working on message 64a90f1
[…]

0: kd> !thread 88806a28 1f
THREAD 88806a28  Cid 0a0c.10b8  Teb: 7ff82000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    89293390  Mutant - owning thread 89a9dc38
Not impersonating
DeviceMap                 e1003910
Owning Process            892b8a38       Image:         svchost.exe
Wait Start TickCount      148115996      Ticks: 23920232 (4:07:49:13.625)
Context Switch Count      532            
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0×064a90f1
LPC Server thread working on message Id 64a90f1
Start Address kernel32!BaseThreadStartThunk (0×77e6b5f3)
Stack Init 94ef3000 Current 94ef2c60 Base 94ef3000 Limit 94ef0000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.

We have

… -> Thread(899e1750) -> Mutant(89293390) Thread(89a9dc38) -> LPC(064a90f1) Thread(88806a28) -> Mutant(89293390) Thread(89a9dc38)

This looks like a deadlock although we cannot examine stack traces which are not resident (stack data resides in a page file).

We also notice the system uptime which might suggest that all these abnormalities had been gradually accumulated (see Overaged System):

0: kd> version
[…]
System Uptime: 61 days 3:40:01.122
[…]

- Dmitry Vostokov @ DumpAnalysis.org -

Blocked GUI thread, wait chain and virtualized process: pattern cooperation

Friday, December 19th, 2008

In one system even Task Manager was hanging and forced complete memory dump reveals a blocked GUI dialog thread waiting for an LPC reply:

THREAD fffffade69e36310  Cid 4b30.4f34  Teb: 000007fffffde000 Win32Thread: fffff97ff3f97620 WAIT: (Unknown) UserMode Non-Alertable
    fffffade69e36678  Semaphore Limit 0x1
Waiting for reply to LPC MessageId a7531ec4:
Current LPC port fffffa8007118190
Not impersonating
DeviceMap                 fffffa80021c6240
Owning Process            fffffadee0332040       Image:         taskmgr.exe
Wait Start TickCount      80209499       Ticks: 5596 (0:00:01:27.437)
Context Switch Count      173                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.031
Win32 Start Address taskmgr!wWinMainCRTStartup (0×000000010001bfe0)
Start Address kernel32!BaseProcessStart (0×0000000077d59620)
Stack Init fffffad7c6e5ae00 Current fffffad7c6e5a8e0
Base fffffad7c6e5b000 Limit fffffad7c6e52000 Call 0
Priority 14 BasePriority 13 PriorityDecrement 0
Child-SP          RetAddr           Call Site
fffffad7`c6e5a920 fffff800`01027752 nt!KiSwapContext+0×85
fffffad7`c6e5aaa0 fffff800`0102835e nt!KiSwapThread+0×3c9
fffffad7`c6e5ab00 fffff800`01283ebb nt!KeWaitForSingleObject+0×5a6
fffffad7`c6e5ab80 fffff800`0102e3fd nt!NtRequestWaitReplyPort+0xa60
fffffad7`c6e5ac70 00000000`77ef0c0a nt!KiSystemServiceCopyEnd+0×3 (TrapFrame @ fffffad7`c6e5ac70)
00000000`000adef8 000007ff`7fd6df46 ntdll!ZwRequestWaitReplyPort+0xa
00000000`000adf00 000007ff`7fd6e17f RPCRT4!LRPC_CCALL::SendReceive+0×447
00000000`000adfe0 000007ff`7fe974c8 RPCRT4!NdrSendReceive+0xf6
00000000`000ae010 000007ff`7fe975bd RPCRT4!NdrpClientCall2+0×7e2
00000000`000ae5f0 000007ff`7d4d8a0d RPCRT4!NdrClientCall2+0×1d
00000000`000ae620 00000001`00011029 WINSTA!WinStationGetProcessSid+0xfd
00000000`000ae6a0 00000001`00010e3b taskmgr!CProcInfo::SetProcessUsername+0×189
00000000`000ae930 00000001`000118e9 taskmgr!CProcInfo::SetData+0×53b
00000000`000aebb0 00000001`000122f4 taskmgr!CProcPage::UpdateProcInfoArray+0×449
00000000`000aee70 00000001`00013eab taskmgr!CProcPage::TimerEvent+0×14
00000000`000aeea0 00000001`00004e8f taskmgr!CProcPage::Initialize+0×29b
00000000`000af320 00000001`000070e8 taskmgr!MainWnd_OnInitDialog+0×2ff
00000000`000af620 00000000`77c34582 taskmgr!MainWindowProc+0×3f8
00000000`000af680 00000000`77c34bf2 USER32!UserCallDlgProcCheckWow+0×210
00000000`000af760 00000000`77c2ff5f USER32!DefDlgProcWorker+0xf6
00000000`000af7f0 00000000`77c29beb USER32!InternalCreateDialog+0×16a7
00000000`000af980 00000001`00007c97 USER32!CreateDialogParamW+0xab
00000000`000afa00 00000001`0001c24b taskmgr!wWinMain+0×5f7
00000000`000afed0 00000000`77d5964c taskmgr!wWinMainCRTStartup+0×26b
00000000`000aff80 00000000`00000000 kernel32!BaseProcessStart+0×29

We search for that LPC message:

6: kd> !lpc message a7531ec4
Searching message a7531ec4 in threads ...
    Server thread fffffade6b8cf7a0 is working on message a7531ec4
Client thread fffffade69e36310 waiting a reply from a7531ec4                         
[…]

and then examine the corresponding server thread (for visual clarity only 1 argument is shown for each function call):

6: kd> !thread fffffade6b8cf7a0 16
THREAD fffffade6b8cf7a0  Cid 0238.8ff4  Teb: 000007fffff48000 Win32Thread: 0000000000000000 WAIT: (Unknown) UserMode Non-Alertable
    fffffade6f2e93c0  SynchronizationEvent
Impersonation token:  fffffa8013a17700 (Level Impersonation)
DeviceMap                 fffffa80021c6240
Owning Process            fffffade6f337040       Image:         svchost.exe
Wait Start TickCount      80209499       Ticks: 5596 (0:00:01:27.437)
Context Switch Count      40138            
UserTime                  00:00:00.140
KernelTime                00:00:00.312
Win32 Start Address 0×00000000a7531ec4
LPC Server thread working on message Id a7531ec4
Start Address kernel32!BaseThreadStart (0×0000000077d6b6a0)
Stack Init fffffad867f5ae00 Current fffffad867f5a950
Base fffffad867f5b000 Limit fffffad867f55000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
Child-SP          RetAddr           : Args to Child     : Call Site
fffffad8`67f5a990 fffff800`01027752 : fffffa80`071181f0 : nt!KiSwapContext+0×85
fffffad8`67f5ab10 fffff800`0102835e : 00000000`00000000 : nt!KiSwapThread+0×3c9
fffffad8`67f5ab70 fffff800`0127f03f : 00000000`00000000 : nt!KeWaitForSingleObject+0×5a6
fffffad8`67f5abf0 fffff800`0102e3fd : fffffade`6b8cf7a0 : nt!NtWaitForSingleObject+0xc1
fffffad8`67f5ac70 00000000`77ef0a2a : 00000000`77efb9a6 : nt!KiSystemServiceCopyEnd+0×3 (TrapFrame @ fffffad8`67f5ac70)
00000000`04d0f098 00000000`77efb9a6 : 00000000`00000000 : ntdll!NtWaitForSingleObject+0xa
00000000`04d0f0a0 00000000`77efba10 : 000007ff`7b8c3500 : ntdll!RtlpWaitOnCriticalSection+0×240
00000000`04d0f120 000007ff`7b887042 : 00000000`04d0f300 : ntdll!RtlEnterCriticalSection+0xa9
[…]
00000000`04d0f190 000007ff`7fe96cc9 : 00000000`00000048 : RPCRT4!Invoke+0×65
00000000`04d0f210 000007ff`7fe9758d : 00000000`04d0f8f0 : RPCRT4!NdrStubCall2+0×54d
00000000`04d0f7d0 000007ff`7fd697b4 : 00000000`0450a8fd : RPCRT4!NdrServerCall2+0×1d
00000000`04d0f800 000007ff`7fd6990d : 00000000`00377aa0 : RPCRT4!DispatchToStubInCNoAvrf+0×14
00000000`04d0f830 000007ff`7fd69766 : 00000000`00000001 : RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×50d
00000000`04d0f9a0 000007ff`7fd6b214 : 00000000`06c18ba0 : RPCRT4!RPC_INTERFACE::DispatchToStub+0×2ec
00000000`04d0fa20 000007ff`7fd6b9e3 : 00000000`06c18ba0 : RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×63b
00000000`04d0fae0 000007ff`7fd7007c : 00000000`048348a0 : RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×3bf
00000000`04d0fba0 000007ff`7fd45369 : 00000000`0525fd30 : RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×710
00000000`04d0feb0 000007ff`7fd65996 : 00000000`00000000 : RPCRT4!RecvLotsaCallsWrapper+0×9
00000000`04d0fee0 000007ff`7fd65d51 : 00000000`0492de90 : RPCRT4!BaseCachedThreadRoutine+0xde
00000000`04d0ff50 00000000`77d6b6da : 00000000`77d6b6a0 : RPCRT4!ThreadStartRoutine+0×21
00000000`04d0ff80 00000000`00000000 : 00000000`00000000 : kernel32!BaseThreadStart+0×3a 

We see the thread above is waiting for a critical section and we examine it using !cs WinDbg extension command after switching to the process context of that thread:

6: kd> .process /r /p fffffade6f337040
Implicit process is now fffffade`6f337040
Loading User Symbols

6: kd> !cs 000007ff`7b8c3500
Critical section   = 0×000007ff7b8c3500 (DllA!Lock+0×0)
DebugInfo          = 0×00000000000ffea0
LOCKED
LockCount          = 0×8
WaiterWoken        = No
OwningThread       = 0×0000000000004100
RecursionCount     = 0×1
LockSemaphore      = 0×8E0
SpinCount          = 0×0000000000000000 

From the output we know the critical section owner thread and examine it too (function call arguments are not shown for visual clarity):

6: kd> !thread -t 4100 16
Looking for thread Cid = 4100 …
THREAD fffffade6b9df4f0  Cid 0238.4100  Teb: 000007fffff58000 Win32Thread: 0000000000000000 WAIT: (Unknown) UserMode Non-Alertable
    fffffadff5edadc0  Mutant - owning thread fffffadfe2486bf0
Impersonation token:  fffffa8010197250 (Level Impersonation)
Owning Process            fffffade6f337040       Image:         svchost.exe
Wait Start TickCount      79672315       Ticks: 542780 (0:02:21:20.937)
Context Switch Count      174            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×00000000a75007ac
LPC Server thread working on message Id a75007ac
Start Address kernel32!BaseThreadStart (0×0000000077d6b6a0)
Stack Init fffffad83a94ee00 Current fffffad83a94e950
Base fffffad83a94f000 Limit fffffad83a949000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
Child-SP          RetAddr           : Call Site
fffffad8`3a94e990 fffff800`01027752 : nt!KiSwapContext+0×85
fffffad8`3a94eb10 fffff800`0102835e : nt!KiSwapThread+0×3c9
fffffad8`3a94eb70 fffff800`0127f03f : nt!KeWaitForSingleObject+0×5a6
fffffad8`3a94ebf0 fffff800`0102e3fd : nt!NtWaitForSingleObject+0xc1
fffffad8`3a94ec70 00000000`77ef0a2a : nt!KiSystemServiceCopyEnd+0×3 (TrapFrame @ fffffad8`3a94ec70)
00000000`03c8efa8 000007ff`7b880427 : ntdll!NtWaitForSingleObject+0xa
[…]
00000000`03c8f1a0 000007ff`7fe96cc9 : RPCRT4!Invoke+0×65
00000000`03c8f210 000007ff`7fe9758d : RPCRT4!NdrStubCall2+0×54d
00000000`03c8f7d0 000007ff`7fd697b4 : RPCRT4!NdrServerCall2+0×1d
00000000`03c8f800 000007ff`7fd6990d : RPCRT4!DispatchToStubInCNoAvrf+0×14
00000000`03c8f830 000007ff`7fd69766 : RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×50d
00000000`03c8f9a0 000007ff`7fd6b214 : RPCRT4!RPC_INTERFACE::DispatchToStub+0×2ec
00000000`03c8fa20 000007ff`7fd6b9e3 : RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×63b
00000000`03c8fae0 000007ff`7fd7007c : RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×3bf
00000000`03c8fba0 000007ff`7fd45369 : RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×710
00000000`03c8feb0 000007ff`7fd65996 : RPCRT4!RecvLotsaCallsWrapper+0×9
00000000`03c8fee0 000007ff`7fd65d51 : RPCRT4!BaseCachedThreadRoutine+0xde
00000000`03c8ff50 00000000`77d6b6da : RPCRT4!ThreadStartRoutine+0×21
00000000`03c8ff80 00000000`00000000 : kernel32!BaseThreadStart+0×3a

We see that it is waiting for a mutant and examine its owner thread:

6: kd> !thread fffffadfe2486bf0 16
THREAD fffffadfe2486bf0  Cid 0238.7010  Teb: 000007fffff5e000 Win32Thread: 0000000000000000 WAIT: (Unknown) UserMode Non-Alertable
    fffffadfe2486f58  Semaphore Limit 0x1
Waiting for reply to LPC MessageId a7507485:
Current LPC port fffffa8006822c80
Not impersonating
DeviceMap                 fffffa8000003790
Owning Process            fffffade6f337040       Image:         svchost.exe
Wait Start TickCount      79744596       Ticks: 470499 (0:02:02:31.546)
Context Switch Count      29636            
UserTime                  00:00:00.062
KernelTime                00:00:00.156
Win32 Start Address 0×00000000a74ff6e4
LPC Server thread working on message Id a74ff6e4
Start Address kernel32!BaseThreadStart (0×0000000077d6b6a0)
Stack Init fffffad7d98a0e00 Current fffffad7d98a08e0
Base fffffad7d98a1000 Limit fffffad7d989b000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
Child-SP          RetAddr           : Call Site
fffffad7`d98a0920 fffff800`01027752 : nt!KiSwapContext+0×85
fffffad7`d98a0aa0 fffff800`0102835e : nt!KiSwapThread+0×3c9
fffffad7`d98a0b00 fffff800`01283ebb : nt!KeWaitForSingleObject+0×5a6
fffffad7`d98a0b80 fffff800`0102e3fd : nt!NtRequestWaitReplyPort+0xa60
fffffad7`d98a0c70 00000000`77ef0c0a : nt!KiSystemServiceCopyEnd+0×3 (TrapFrame @ fffffad7`d98a0c70)
00000000`0430dc18 000007ff`7fd6df46 : ntdll!ZwRequestWaitReplyPort+0xa
00000000`0430dc20 000007ff`7fd6e17f : RPCRT4!LRPC_CCALL::SendReceive+0×447
00000000`0430dd10 000007ff`7fe974c8 : RPCRT4!NdrSendReceive+0xf6
00000000`0430dd40 000007ff`7fe975bd : RPCRT4!NdrpClientCall2+0×7e2
00000000`0430e320 00000000`676b1dd7 : RPCRT4!NdrClientCall2+0×1d
[…]

The thread above is waiting for an LPC message and we examine its corresponding server thread:

6: kd> !lpc message a7507485
Searching message a7507485 in threads ...
Client thread fffffadfe2486bf0 waiting a reply from a7507485                         
    Server thread fffffade6f3c9720 is working on message a7507485                        
[...]

6: kd> !thread fffffade6f3c9720 16
THREAD fffffade6f3c9720  Cid 6308.a1c8  Teb: 000000007ef24000 Win32Thread: 0000000000000000 WAIT: (Unknown) UserMode Non-Alertable
    fffffade68d31a40  SynchronizationEvent
Not impersonating
DeviceMap                 fffffa80016ce1b0
Owning Process            fffffade69c29430       Image:         AppA.exe
Wait Start TickCount      79744596       Ticks: 470499 (0:02:02:31.546)
Context Switch Count      7230            
UserTime                  00:00:00.218
KernelTime                00:00:00.109
Win32 Start Address 0×00000000a7507485
LPC Server thread working on message Id a7507485
Start Address kernel32!BaseThreadStartThunk (0×000000007d4d1504)
Stack Init fffffad83f7e4e00 Current fffffad83f7e4950
Base fffffad83f7e5000 Limit fffffad83f7df000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
Child-SP          RetAddr           : Call Site
fffffad8`3f7e4990 fffff800`01027752 : nt!KiSwapContext+0×85
fffffad8`3f7e4b10 fffff800`0102835e : nt!KiSwapThread+0×3c9
fffffad8`3f7e4b70 fffff800`0127f03f : nt!KeWaitForSingleObject+0×5a6
fffffad8`3f7e4bf0 fffff800`0102e3fd : nt!NtWaitForSingleObject+0xc1
fffffad8`3f7e4c70 00000000`78b842d9 : nt!KiSystemServiceCopyEnd+0×3 (TrapFrame @ fffffad8`3f7e4c70)
00000000`0257f108 00000000`78b840e5 : wow64cpu!CpupSyscallStub+0×9
00000000`0257f110 00000000`6b006a5a : wow64cpu!Thunk0ArgReloadState+0×1a
00000000`0257f180 00000000`6b005e0d : wow64!RunCpuSimulation+0xa
00000000`0257f1b0 00000000`77f109f0 : wow64!Wow64LdrpInitialize+0×2ed

00000000`0257f6f0 00000000`77ef30a5 : ntdll!LdrpInitialize+0×2aa
00000000`0257f7d0 00000000`7d4d1504 : ntdll!KiUserApcDispatcher+0×15 (TrapFrame @ 00000000`0257fb38)
00000000`0257fcc8 00000000`00000000 : kernel32!BaseThreadStartThunk

We see the thread belongs to a virtualized WOW64 process and presently I’m not aware of how to get its 32-bit stacks from complete memory dumps. WOW64 extensions seems only working in process user dumps. Therefore I recommended to get a userdump of AppA process using 32-bit version of userdump.exe as soon as the same condition starts surfacing again.

- Dmitry Vostokov @ DumpAnalysis.org -

Truncated dump, spiking thread, not my version and hooked functions: pattern cooperation

Friday, November 14th, 2008

Recently I got another manual complete memory dump from a hang workstation where default analysis pointed to a possibility for a Spiking Thread pattern:

0: kd> !analyze -v
[...]
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
[...]
GetContextState failed, 0x80070026
Unable to read selector for PCR for processor 1
GetContextState failed, 0x80070026
Unable to read selector for PCR for processor 1

PROCESS_NAME:  AppA.exe

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from 808b73a4 to 808b72cb

STACK_TEXT: 
f46f5b44 808b73a4 e1a36008 00000004 00000018 nt!HvpFindFreeCellInThisViewWindow+0xd7
f46f5b6c 808bd07b e1a36008 00000002 00000018 nt!HvpFindFreeCell+0x98
f46f5b98 808bd588 e1a36008 e1a362fc 00000000 nt!HvpDoAllocateCell+0x69
f46f5bbc 808d0b22 e1a36008 009214a0 c94e24a4 nt!HvReallocateCell+0x9a
f46f5bdc 808c1028 e1a36008 051fa3e8 00000003 nt!CmpAddValueToList+0x46
f46f5c28 808c406a e1a36008 cddb7ccc f46f5d0c nt!CmpSetValueKeyNew+0xfa
f46f5cb4 808b7e2f e45872e0 f46f5d0c 00000004 nt!CmSetValueKey+0x426
f46f5d44 8088978c 00000438 0012fad0 00000000 nt!NtSetValueKey+0x241
f46f5d44 7c8285ec 00000438 0012fad0 00000000 nt!KiFastCallEntry+0xfc
0012fa58 7c827b7b 77f77703 00000438 0012fad0 ntdll!KiFastSystemCallRet
0012fa5c 77f77703 00000438 0012fad0 00000000 ntdll!ZwSetValueKey+0xc
0012faa0 77f5ec90 00000438 0012fad0 00000004 ADVAPI32!LocalBaseRegSetValue+0x12c
0012fb04 60072e40 00000438 6290c0c4 00000000 ADVAPI32!RegSetValueExA+0x160
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fbf4 628e2d57 60062e70 60062b40 80000001 DLLA!GetObjectId+0×9600
[…]

FOLLOWUP_IP:
nt!HvpFindFreeCellInThisViewWindow+d7
808b72cb 034508          add     eax,dword ptr [ebp+8]

Looking at this thread information we see it RUNNING (this is also evident from its call stack):

0: kd> !thread
THREAD 8a0b2890  Cid 0814.10e8  Teb: 7ffdf000 Win32Thread: bc217c68 RUNNING on processor 0
Not impersonating
DeviceMap                 e440acc0
Owning Process            8a0d85f8       Image:         AppA.exe
Wait Start TickCount      153974         Ticks: 0
Context Switch Count      16905                 LargeStack
UserTime                  00:00:03.109
KernelTime                00:00:17.500
[…]

We also see that the thread accumulated 17 seconds as time spent in kernel. Switching to AppA process context and looking at its Image version we see that it is 5.80.x:

0: kd> lmv m AppA
start    end        module name
00400000 0049c000   AppA   (deferred)            
    Image path: C:\PROGRA~1\AppA\AppA.exe
    Image name: AppA.exe
    Timestamp:        Thu Jun 05 14:51:52 2008 (4847EF78)
    CheckSum:         0009D068
    ImageSize:        0009C000
    File version:     5.80.5.1764
    Product version:  5.80.0.0

However from Google search we can find that there is newer version available (variant of Not My Version pattern) and even some indication on various forums that the older ones had problems with CPU resource utilization. We may stop here but I usually scan all threads for any suspicious signs and we can see another running thread on the second CPU:

THREAD 8a2ed5d0  Cid 11b4.1100  Teb: 7ffdf000 Win32Thread: bc342b80 RUNNING on processor 1
Not impersonating
DeviceMap                 e44fc100
Owning Process            8a1efcb0       Image:         calc.exe
Wait Start TickCount      153973         Ticks: 1 (0:00:00:00.015)
Context Switch Count      50736                 LargeStack
UserTime                  00:01:04.515
KernelTime                00:00:00.015
Win32 Start Address calc (0×0101e23a)
Start Address kernel32!BaseProcessStartThunk (0×77e617f8)
Stack Init f4cd6000 Current f4cd5d00 Base f4cd6000 Limit f4cd1000 Call 0
Priority 6 BasePriority 6 PriorityDecrement 0
Unable to get context for thread running on processor 1, Win32 error 0n38

We also see that this thread spent more than a minute in user mode. Unfortunately we cannot see its thread stack because the dump shows signs of Truncated Dump pattern:

Loading Dump File [MEMORY.DMP]
Kernel Complete Dump File: Full address space is available

************************************************************
WARNING: Dump file has been truncated.  Data may be missing.
************************************************************

[…]

0: kd> ~1
GetContextState failed, 0×80070026
Unable to read selector for PCR for processor 1
WARNING: Unable to reset page directories
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
Unable to get program counter
GetContextState failed, 0×80070026
Unable to get current machine context, Win32 error 0n38
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026

The dump file size 4,177,920 Kb is less than amount of physical memory 4,192,948 Kb:

1: kd> !vm
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
Unable to get program counter

*** Virtual Memory Usage ***
 Physical Memory:     1048237 (   4192948 Kb)
 Page File: \??\R:\pagefile.sys
   Current:   4177920 Kb  Free Space:   4154440 Kb
   Minimum:   4177920 Kb  Maximum:      4194304 Kb

We can stop here and still recommend to upgrade AppA product seen from the thread running on the first processor but the fact that the second thread belongs to innocent calc.exe demands some attention. Was it calculating incessantly some financial figures following button clicks from a financial genius? Taking advantage of a complete memory dump and the fact that this process spent most of the time in user space we can check for Hooked Functions pattern:

1: kd> .process /r /p 8a1efcb0
Implicit process is now 8a1efcb0
Loading User Symbols
..........................
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026

1: kd> !chkimg -lo 50 -d !user32 -v
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
Unable to get program counter
Searching for module with expression: !user32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\USER32.dll\45D70AC791000\USER32.dll
No range specified

Scanning section:    .text
Size: 392891
Range to scan: 77381000-773e0ebb
    7738c341-7738c345  5 bytes - USER32!CreateWindowExA
 [ 8b ff 55 8b ec:e9 ba 3c 00 c0 ]
[…]
Total bytes compared: 73728(18%)
Number of errors: 75
75 errors : !user32 (7738c341-773a154d)
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026

1: kd> u 7738c341
USER32!CreateWindowExA:
7738c341 e9ba3c00c0      jmp     37390000
7738c346 6801000040      push    40000001h
7738c34b ff7534          push    dword ptr [ebp+34h]
7738c34e ff7530          push    dword ptr [ebp+30h]
7738c351 ff752c          push    dword ptr [ebp+2Ch]
7738c354 ff7528          push    dword ptr [ebp+28h]
7738c357 ff7524          push    dword ptr [ebp+24h]
7738c35a ff7520          push    dword ptr [ebp+20h]
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026

1: kd> u 37390000
37390000 e96b91562e      jmp     HookA!CreateWindowExA (658f9170)
37390005 8bff            mov     edi,edi
37390007 55              push    ebp
37390008 8bec            mov     ebp,esp
3739000a e937c3ff3f      jmp     USER32!CreateWindowExA+0×5 (7738c346)
3739000f 0000            add     byte ptr [eax],al
37390011 0000            add     byte ptr [eax],al
37390013 0000            add     byte ptr [eax],al
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026

Indeed we see that HookA module is involved and we can recommend to test the stability of the system without the product that uses it or upgrading or disabling this component.

- Dmitry Vostokov @ DumpAnalysis.org -

Lateral damage, stack overflow and execution residue: pattern cooperation

Wednesday, November 5th, 2008

As I mentioned in comments to Lateral Damage pattern it lies in between the normal healthy dump files and corrupt dumps. For example, the following 8Gb complete memory dump that fits perfectly into 16Gb page file had the problem of missing processor control region making it impossible to get meaningful information from certain WinDbg commands:

0: kd> !analyze -v

[...]

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: f7727fe0
Arg3: 00000000
Arg4: 00000000

Debugging Details:
------------------

Unable to read selector for PCR for processor 1
Unable to read selector for PCR for processor 3
Unable to read selector for PCR for processor 1
Unable to read selector for PCR for processor 3

[...]

STACK_TEXT: 
WARNING: Stack unwind information not available. Following frames may be wrong.
8089a600 8088ddf2 00000000 0000000e 00000000 processr+0x2886
8089a604 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xa

[...]

0: kd> ~1
Unable to read selector for PCR for processor 1
WARNING: Unable to reset page directories

1: kd> !pcr
Unable to read selector for PCR for processor 1
Cannot get PRCB address 

1: kd> kv
ChildEBP RetAddr  Args to Child             
WARNING: Frame IP not in any known module. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 0×0

The bugcheck argument 1 shows that we have a double fault that most often results from kernel stack overflow. If we go back to processor 0 to inspect its TSS we don’t get meaningful results too (we expect the value of Backlink to be 0×28):

0: kd> !pcr
KPCR for Processor 0 at ffdff000:
    Major 1 Minor 1
 NtTib.ExceptionList: ffffffff
     NtTib.StackBase: 00000000
    NtTib.StackLimit: 00000000
  NtTib.SubSystemTib: 80042000
       NtTib.Version: 2a1b0b08
   NtTib.UserPointer: 00000001
       NtTib.SelfTib: 00000000

             SelfPcr: ffdff000
                Prcb: ffdff120
                Irql: 0000001f
                 IRR: 00000000
                 IDR: ffffffff
       InterruptMode: 00000000
                 IDT: 8003f400
                 GDT: 8003f000
                 TSS: 80042000

       CurrentThread: 8089d8c0
          NextThread: 00000000
          IdleThread: 8089d8c0

           DpcQueue:

0: kd> dt _KTSS 80042000
nt!_KTSS
   +0×000 Backlink         : 0xc45
   +0×002 Reserved0        : 0×4d8a
   +0×004 Esp0             : 0×8089a6a0
   +0×008 Ss0              : 0×10
   +0×00a Reserved1        : 0xb70f
   +0×00c NotUsed1         : [4] 0×5031ff00
   +0×01c CR3              : 0×8b55ff8b
   +0×020 Eip              : 0xc75ffec
   +0×024 EFlags           : 0xe80875ff
   +0×028 Eax              : 0xfffffbdd
   +0×02c Ecx              : 0×1b75c084
   +0×030 Edx              : 0×8b184d8b
   +0×034 Ebx              : 0×7d8b57d1
   +0×038 Esp              : 0×2e9c110
   +0×03c Ebp              : 0xf3ffc883
   +0×040 Esi              : 0×83ca8bab
   +0×044 Edi              : 0xaaf303e1
   +0×048 Es               : 0xeb5f
   +0×04a Reserved2        : 0×6819
   +0×04c Cs               : 0×24fc
   +0×04e Reserved3        : 0×44
   +0×050 Ss               : 0×75ff
   +0×052 Reserved4        : 0xff18
   +0×054 Ds               : 0×1475
   +0×056 Reserved5        : 0×75ff
   +0×058 Fs               : 0xff10
   +0×05a Reserved6        : 0xc75
   +0×05c Gs               : 0×75ff
   +0×05e Reserved7        : 0xe808
   +0×060 LDT              : 0
   +0×062 Reserved8        : 0xffff
   +0×064 Flags            : 0
   +0×066 IoMapBase        : 0×20ac
   +0×068 IoMaps           : [1] _KiIoAccessMap
   +0×208c IntDirectionMap  : [32]  “???”

However if we try to list all thread stacks we see one thread running on processor 1:

0: kd> !process 0 ff

[...]

THREAD 8a241db0  Cid 1218.4420  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 1
IRP List:
   8b200008: (0006,0244) Flags: 00000884  Mdl: 00000000
   89beedb8: (0006,0244) Flags: 00000884  Mdl: 00000000
Not impersonating
DeviceMap                 e1002060
Owning Process            8bc63d88       Image:         svchost.exe
Wait Start TickCount      10242012       Ticks: 0
Context Switch Count      1832            
UserTime                  00:00:00.000
KernelTime                00:00:00.046
Start Address termdd (0xf75cc218)
Stack Init 9c849000 Current 9c846938 Base 9c849000 Limit 9c846000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
Unable to read selector for PCR for processor 1

[...]

Now we can look at its raw stack to see execution residue and try to reconstruct partial stack traces:

0: kd> dds 9c846000  9c849000
9c846000  94040001
9c846004  00000014
9c846008  8d147848
9c84600c  8d0bfd08
9c846010  8d0bfd00
9c846014  00000001
9c846018  8d0bfd08
9c84601c  8d0bfd00
9c846020  8d0bfd00
9c846024  9c846034
9c846028  80a5c456 hal!KfLowerIrql+0×62
9c84602c  8d0bfdd8
9c846030  8d0bfd00
9c846034  9c846060
9c846038  80a5a56d hal!KeReleaseQueuedSpinLock+0×2d
9c84603c  00000011
9c846040  00000001
9c846044  8a241db0
9c846048  0000000e
9c84604c  00000000
9c846050  8d0bfdc0
9c846054  05000000
9c846058  00007400
9c84605c  00000001
9c846060  9c846084
9c846064  808b6138 driverA!MapData+0×4a
9c846068  8d0bfd08
9c84606c  00007400
9c846070  00000000
9c846074  00000018
9c846078  00000028
9c84607c  00001000
9c846080  00000018
9c846084  9c84609c
9c846088  f7b8f2e5 driverB!CheckData+0×7a
9c84608c  01b47538
9c846090  00000028
9c846094  0000001c
[…]

0: kd> k L=9c846024 9c846024 9c846024
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
9c846024 80a5c456 0×9c846024
9c846034 80a5a56d hal!KfLowerIrql+0×62
9c8460f0 8080d164 hal!KeReleaseQueuedSpinLock+0×2d
9c846060 808b6138 driverA!RemapData+0×3e
9c846084 f7b8f2e5 driverA!MapData+0×4a
9c84609c f7b8f340 driverB!CheckData+0×7a
9c8460e4 808b4000 driverB!CheckAttributes+0×36f
9c84610c f7b8e503 driverB!AddToRecord+0×2a
9c846174 f7b90df0 driverB!ReadRecord+0×1d0
f7b8e508 90909090 driverB!ReadAllRecords+0×7a
[…]

Using the current stack pointer we get another partial stack trace:

0: kd> k L=9c846034 9c846938 9c846938
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
9c846954 8081df65 0×9c846938
9c846968 808f5437 nt!IofCallDriver+0×45
9c84697c 808ef963 nt!IopSynchronousServiceTail+0×10b
9c8469a0 8088978c nt!NtQueryDirectoryFile+0×5d
9c8469a0 8082f1c1 nt!KiFastCallEntry+0xfc
9c846a44 f5296f4b nt!ZwQueryDirectoryFile+0×11
9c846a90 f5297451 DriverC+0×2f4b
9c846adc f52a54cb DriverC+0×3451
9c846af8 f52a44e6 DriverC+0×114cb
9c846b1c f52b2941 DriverC+0×104e6
9c846b4c f52b2626 DriverC+0×1e941
9c846b88 f52a34a7 DriverC+0×1e626
9c846be8 f52a487c DriverC+0xf4a7
[…]

Using different base pointers for k command we can reconstruct different partial stack traces. We can also analyze the longest ones for any stack usage using variant of knf command that shows stack frame size in bytes and find drivers that consume the most of kernel stack. Because we see execution residue on top of the kernel stack (Limit) we can suspect this thread caused the actual stack overflow which resulted in the double fault bugcheck.

- Dmitry Vostokov @ DumpAnalysis.org -