Stack trace collection, suspended threads, not my version, special process, main thread and blocked LPC chain threads: pattern cooperation

It was reported that one server was hanging during automated reboot. Stack trace collection shows a few suspended and frozen threads. They all belong to the same process, ServiceA:

PROCESS 8545eb18  SessionId: 0  Cid: 0fec    Peb: 7ffd4000  ParentCid: 0fdc
    DirBase: 3fbeb8e0  ObjectTable: e19dd1d0  HandleCount: 169.
    Image: ServiceA.exe

THREAD 859cc900  Cid 0fec.0ff0  Teb: 7ffdf000 Win32Thread: bc1738d0 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1

       859cca90  Semaphore Limit 0×2

THREAD 858c6480  Cid 0fec.0ff4  Teb: 7ffde000 Win32Thread: bc178c40 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
       f55747d8  SynchronizationEvent

THREAD 859f2338  Cid 0fec.0ff8  Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1

       859f24c8  Semaphore Limit 0×2

THREAD 859be2b8  Cid 0fec.0ffc  Teb: 7ffdc000 Win32Thread: bc1915d8 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1

       859be448  Semaphore Limit 0×2

[...]

When zooming into this process we see that one thread was processing an exception:

0: kd> .process /r /p 8545eb18
Implicit process is now 8545eb18
Loading User Symbols

0: kd> !process 8545eb18
[...]
THREAD 858c6480  Cid 0fec.0ff4  Teb: 7ffde000 Win32Thread: bc178c40 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
    f55747d8  SynchronizationEvent
Not impersonating
DeviceMap                 e10008e8
Owning Process            8545eb18       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      6927           Ticks: 89866 (0:00:23:24.156)
Context Switch Count      156                 LargeStack
UserTime                  00:00:00.031
KernelTime                00:00:00.000
Win32 Start Address 0x611054cb
Start Address kernel32!BaseThreadStartThunk (0x7c8217ec)
Stack Init f5575000 Current f557471c Base f5575000 Limit f5571000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
f5574734 80833ec5 nt!KiSwapContext+0x26
f5574760 80829c14 nt!KiSwapThread+0x2e5
f55747a8 809a25c8 nt!KeWaitForSingleObject+0x346
f5574888 809a3739 nt!DbgkpQueueMessage+0x178
f55748ac 809a386e nt!DbgkpSendApiMessage+0x45
f5574938 8082d7ec nt!DbgkForwardException+0x90
f5574cf4 8088bed2 nt!KiDispatchException+0×1ea
f5574d5c 8088be86 nt!CommonDispatchException+0×4a
f5574da0 7c829c3a nt!Kei386EoiHelper+0×186
f5574dd0 00000000 kernel32!LoadResource+0×5d

We zoom into its parameters in search of semantically consistent output of .exr, .cxr and .trap commands:

0: kd> .thread 858c6480
Implicit thread is now 858c6480

0: kd> kv 100
ChildEBP RetAddr  Args to Child             
f5574734 80833ec5 858c6480 858c6528 00000200 nt!KiSwapContext+0x26
f5574760 80829c14 00000000 858c6480 f55747d0 nt!KiSwapThread+0x2e5
f55747a8 809a25c8 f55747d8 00000000 00000000 nt!KeWaitForSingleObject+0x346
f5574888 809a3739 8545eb18 00000000 f55748c0 nt!DbgkpQueueMessage+0x178
f55748ac 809a386e f55748c0 00000001 f5574d64 nt!DbgkpSendApiMessage+0x45
f5574938 8082d7ec f5574d10 00000001 00000000 nt!DbgkForwardException+0x90
f5574cf4 8088bed2 f5574d10 00000000 f5574d64nt!KiDispatchException+0×1ea
f5574d5c 8088be86 005bf4b4 61213267 badb0d00 nt!CommonDispatchException+0×4a
f5574da0 7c829c3a 71c22898 00000001 ffffffff nt!Kei386EoiHelper+0×186
f5574dd0 00000000 005bf448 00000023 00000000 kernel32!LoadResource+0×5d

After probing parameters for KiDispatchException we get these results pointing to ModuleA:

0: kd> .exr f5574d10
ExceptionAddress: 61213267 (ModuleA!GetData+0×0000b57f)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 71c22898
Attempt to read from address 71c22898

0: kd> .trap f5574d64
ErrCode = 00000004
eax=71c22898 ebx=0073a7a8 ecx=7c829c3a edx=71c1c000 esi=00000104 edi=00000000
eip=61213267 esp=005bf448 ebp=005bf4b4 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ModuleA!GetData+0×0000b57f:
001b:61213267 0fb700 movzx   eax,word ptr [eax]  ds:0023:71c22898=????

We check its data using lmv WinDbg command and find out that it is old and needs to be updated. But we don’t stop our investigation here. The fact that ServiceA was suspended means that it was probably being debugged or memory dumped. And indeed, we see NTSD in a process list:

0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 8619d5d0  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: 3fbeb020  ObjectTable: e1001e08  HandleCount: 1651.
    Image: System

PROCESS 85e95d88  SessionId: none  Cid: 019c    Peb: 7ffdf000  ParentCid: 0004
    DirBase: 3fbeb040  ObjectTable: e16d5f18  HandleCount:  19.
    Image: smss.exe

PROCESS 85e4fd88  SessionId: 0  Cid: 01cc    Peb: 7ffd4000  ParentCid: 019c
    DirBase: 3fbeb060  ObjectTable: e1561d70  HandleCount: 907.
    Image: csrss.exe

PROCESS 85e42d88  SessionId: 0  Cid: 01e4    Peb: 7ffde000  ParentCid: 019c
    DirBase: 3fbeb080  ObjectTable: e16a97b0  HandleCount: 504.
    Image: winlogon.exe

[...]

PROCESS 85a4dd18  SessionId: 0  Cid: 0fdc    Peb: 7ffda000  ParentCid: 0214
    DirBase: 3fbeb520  ObjectTable: e1aa5b38  HandleCount: 121.
    Image: ntsd.exe

[...]

If we zoom into NTSD process we would see that its main thread was waiting for a console input:

0: kd> !process 0fdc ff
[...]
THREAD 859f8768  Cid 0fdc.0fe0  Teb: 7ffdf000 Win32Thread: bc14cb38 WAIT: (Unknown) UserMode Non-Alertable
    859f8954  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00001f98:
Current LPC port e19f03a0
Not impersonating
DeviceMap                 e10008e8
Owning Process            85a4dd18       Image:         ntsd.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      6932           Ticks: 89861 (0:00:23:24.078)
Context Switch Count      450                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.078
Win32 Start Address ntsd!mainCRTStartup (0×0100845a)
Start Address kernel32!BaseProcessStartThunk (0×7c8217f8)
Stack Init f55c5000 Current f55c4c08 Base f55c5000 Limit f55c1000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
f55c4c20 80833ec5 nt!KiSwapContext+0×26
f55c4c4c 80829c14 nt!KiSwapThread+0×2e5
f55c4c94 80920fba nt!KeWaitForSingleObject+0×346
f55c4d50 8088b3fc nt!NtRequestWaitReplyPort+0×776
f55c4d50 7c94860c nt!KiFastCallEntry+0xfc
0006ece0 7c947899 ntdll!KiFastSystemCallRet
0006ece4 7c94ec4a ntdll!ZwRequestWaitReplyPort+0xc
0006ed04 7c80cf8c ntdll!CsrClientCallServer+0×8c
0006edfc 7c872904 kernel32!ReadConsoleInternal+0×1b8
0006ee84 7c8018f4 kernel32!ReadConsoleA+0×3b
0006eedc 01005141 kernel32!ReadFile+0×64

0006ef04 01006974 ntsd!ConIn+0×183
0006ff38 010082d1 ntsd!MainLoop+0×1eb
0006ff44 01008589 ntsd!main+0×149
0006ffc0 7c82f23b ntsd!mainCRTStartup+0×12f
0006fff0 00000000 kernel32!BaseProcessStart+0×23

We follow LPC chain to csrss.exe to find out another blocked thread there:

0: kd> !lpc message 00001f98
Searching message 1f98 in threads …
Client thread 859f8768 waiting a reply from 1f98                         
Searching thread 859f8768 in port rundown queues …

Server communication port 0xe19b6b08
    Handles: 1   References: 1
    The LpcDataInfoChainHead queue is empty
        Connected port: 0xe19f03a0      Server connection port: 0xe1361d20

Client communication port 0xe19f03a0
    Handles: 1   References: 4
    The LpcDataInfoChainHead queue is empty

Server connection port e1361d20  Name: ServiceAPort
    Handles: 1   References: 233
    Server process  : 85e4fd88 (csrss.exe)
    Queue semaphore : 85e9b078
    Semaphore state 0 (0×0)
    The message queue is empty
    The LpcDataInfoChainHead queue is empty
Done.

0: kd> !process 85e4fd88 ff
[…]
THREAD 8549db60  Cid 01cc.1390  Teb: 7ffad000 Win32Thread: bc15aea8 WAIT: (Unknown) UserMode Non-Alertable
    8549dd4c  Semaphore Limit 0×1
Waiting for reply to LPC MessageId 00004feb:
Pending LPC Reply Message:
e191b6d0: [e1a162e8,e19ffc18]
Not impersonating
DeviceMap                 e10008e8
Owning Process            85e4fd88       Image:         csrss.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      12095          Ticks: 84698 (0:00:22:03.406)
Context Switch Count      35                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Start Address 0×75943b55
Stack Init f5625000 Current f5624bf0 Base f5625000 Limit f5622000 Call 0
Priority 15 BasePriority 13 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
f5624c08 80833ec5 nt!KiSwapContext+0×26
f5624c34 80829c14 nt!KiSwapThread+0×2e5
f5624c7c 809222f6 nt!KeWaitForSingleObject+0×346
f5624d38 8088b3fc nt!NtSecureConnectPort+0×6ce
f5624d38 7c94860c nt!KiFastCallEntry+0xfc
015ff778 7c947939 ntdll!KiFastSystemCallRet
015ff77c 77c2e7c3 ntdll!NtSecureConnectPort+0xc
015ff8a0 77c4607b RPCRT4!LRPC_CASSOCIATION::OpenLpcPort+0×21e
015ff8e0 77c45ffb RPCRT4!LRPC_CASSOCIATION::ActuallyDoBinding+0×55
015ff958 77c4f6a5 RPCRT4!LRPC_CASSOCIATION::AllocateCCall+0×190
015ff98c 77c4f5d1 RPCRT4!LRPC_BINDING_HANDLE::AllocateCCall+0×1f2
015ff9b8 77c4f201 RPCRT4!LRPC_BINDING_HANDLE::NegotiateTransferSyntax+0xd3
015ff9d0 77c4ed14 RPCRT4!I_RpcGetBufferWithObject+0×5b
015ff9e0 77c4f464 RPCRT4!I_RpcGetBuffer+0xf
015ff9f0 77cb30e4 RPCRT4!NdrGetBuffer+0×2e
015ffddc 779b4695 RPCRT4!NdrClientCall2+0×197
[…]

We follow LPC chain again to see that csrss.exe thread was waiting for a reply from our suspended and frozen ServiceA: 

0: kd> !lpc message 00004feb
Searching message 4feb in threads …
Client thread 8549db60 waiting a reply from 4feb                         
Searching thread 8549db60 in port rundown queues …

Server connection port e19a50e0  Name: ServiceAPort
    Handles: 1   References: 20
    Server process  : 8545eb18 (ServiceA.exe)
    Queue semaphore : 85443320
    Semaphore state 9 (0×9)
        Messages in queue:
        0000 e1a866e0 - Busy  Id=000022e7  From: 01e4.01e8  Context=80060004  [e19a50f0 . e1878688]
                   Length=011800e8  Type=0000000a (LPC_CONNECTION_REQUEST)
                   Data: 00000000 00000000 00000000 00000000 00000000 00000000
        0000 e1878688 - Busy  Id=00003297  From: 0ac0.0b54  Context=804d0045  [e1a866e0 . e1036740]
                   Length=011800e8  Type=0000000a (LPC_CONNECTION_REQUEST)
                   Data: 00000000 00000000 00000000 00000000 00000000 00000000
        0000 e1036740 - Busy  Id=00003986  From: 0ce4.0ce8  Context=00000042  [e1878688 . e1441228]
                   Length=011800e8  Type=0000000a (LPC_CONNECTION_REQUEST)
                   Data: 00000000 00000000 00000000 00000000 00000000 00000000
        0000 e1441228 - Busy  Id=00003a32  From: 0db4.0e14  Context=00000050  [e1036740 . e1a162e8]
                   Length=011800e8  Type=0000000a (LPC_CONNECTION_REQUEST)
                   Data: 00000000 00000000 00000000 00000000 00000000 00000000
        0000 e1a162e8 - Busy  Id=00004c75  From: 059c.05ac  Context=00000051  [e1441228 . e191b6d0]
                   Length=011800e8  Type=0000000a (LPC_CONNECTION_REQUEST)
                   Data: 00000000 00000000 00000000 00000000 00000000 00000000
        0000 e191b6d0 - Busy  Id=00004feb  From: 01cc.1390  Context=00000051  [e1a162e8 . e19ffc18]
                   Length=011800e8  Type=0000000a (LPC_CONNECTION_REQUEST)
                   Data: 00000000 00000000 00000000 00000000 00000000 00000000
        0000 e19ffc18 - Busy  Id=000055e3  From: 13fc.05b4  Context=800d0009  [e191b6d0 . e19f4ea0]
                   Length=011800e8  Type=0000000a (LPC_CONNECTION_REQUEST)
                   Data: 00000000 00000000 00000000 00000000 00000000 00000000
        0000 e19f4ea0 - Busy  Id=00006844  From: 0b00.0f20  Context=006b3d60  [e19ffc18 . e19a50f0]
                   Length=011800e8  Type=0000000a (LPC_CONNECTION_REQUEST)
                   Data: 00000000 00000000 00000000 00000000 00000000 00000000
    The message queue contains 8 messages
    The LpcDataInfoChainHead queue is empty
Done.

It doesn’t look as a deadlock because, although we have a cyclic process wait chain ServiceA -> NTSD -> CSRSS -> ServiceA, NTSD was waiting for a different thread in CSRSS than the one in CSRSS waiting for a reply from ServiceA. If these threads are unrelated then we don’t have a deadlock, strictly speaking, because the latter involves activity chains with ownership, not a container dependency (a process is a container for threads). I illustrated all this on the following diagram:

- Dmitry Vostokov @ DumpAnalysis.org -

One Response to “Stack trace collection, suspended threads, not my version, special process, main thread and blocked LPC chain threads: pattern cooperation”

  1. Software Generalist » Blog Archive » Reading Notebook: 11-August-09 Says:

    […] today I wrote a case study where we can see how an exception dispatcher sends a message: http://www.dumpanalysis.org/blog/index.php/2009/08/11/stack-trace-collection-suspended-threads-not-m…. Also here is another case study showing the flow of exception processing originated from user mode […]

Leave a Reply

You must be logged in to post a comment.