10 Common Mistakes in Memory Analysis (Part 4)

One of the common mistakes that I observe is to habitually stick to certain WinDbg commands to recognize patterns. One example is !locks command used to find out any wait chains and deadlock conditions among threads. Recently a service process was reported to be hang and !locks command showed no blocked threads:

0:000> !locks
CritSec +18caf94 at 018CAF94
LockCount          -2
RecursionCount     1
OwningThread       58e8
EntryCount         0
ContentionCount    0
*** Locked

CritSec +18cc7c4 at 018CC7C4
LockCount          -2
RecursionCount     1
OwningThread       58e8
EntryCount         0
ContentionCount    0
*** Locked

The number of threads waiting for the lock is 0 (this calculation is explained in the MSDN article): 

0:000> ? ((-1) - (-2)) >> 2
Evaluate expression: 0 = 00000000

In the past, for that hang sevice memory dumps, !locks command always showed LockCount values corresponding to several waiting threads. Therefore, an engineer assumed that the dump was taken at some random time, not at the time the service was hanging, and asked for a new right dump. The mistake here is that the engineer didn’t look at the corresponding thread stack trace that shows the characteristic pattern of the blocked thread waiting for a reply from an LRPC call:

0:000> ~~[58e8]kc 100

ntdll!KiFastSystemCallRet
ntdll!NtRequestWaitReplyPort
RPCRT4!LRPC_CCALL::SendReceive
RPCRT4!I_RpcSendReceive
RPCRT4!NdrSendReceive
RPCRT4!NdrClientCall2

ServiceA!foo
[…]
ServiceA!bar
RPCRT4!NdrStubCall2
RPCRT4!NdrServerCall2
RPCRT4!DispatchToStubInCNoAvrf
RPCRT4!RPC_INTERFACE::DispatchToStubWorker
RPCRT4!RPC_INTERFACE::DispatchToStub
RPCRT4!RPC_INTERFACE::DispatchToStubWithObject
RPCRT4!LRPC_SCALL::DealWithRequestMessage
RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest
RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls
RPCRT4!RecvLotsaCallsWrapper
RPCRT4!BaseCachedThreadRoutine
RPCRT4!ThreadStartRoutine
kernel32!BaseThreadStart

We don’t see other blocked threads and wait chains because the dump was saved as soon as the freezing condition was detected: the service didn’t allow a user connection to proceed. If more users tried to connect we would have seen critical section wait chains that are absent in this dump.

To prevent such mistakes checklists are indispensable. For one example, see Crash Dump Analysis Checklist. You can also order it in print:

WinDbg: A Reference Poster and Learning Cards

- Dmitry Vostokov @ DumpAnalysis.org -

2 Responses to “10 Common Mistakes in Memory Analysis (Part 4)”

  1. Crash Dump Analysis » Blog Archive » Blocked LPC thread, coupled processes, stack trace collection and blocked GUI thread: pattern cooperation Says:

    […] small case study continues where Not using checklists common mistake case study left, after identifying the blocked LPC thread in ServiceA process. We […]

  2. Dmitry Vostokov Says:

    Checklists also help to discover apparently “unrelated” anomalies like increased number of threads in one process and a handle leak in anoher

Leave a Reply

You must be logged in to post a comment.