10 Common Mistakes in Memory Analysis (Part 4)
One common mistake I observe is habitually relying on the same few WinDbg commands to recognize patterns. One example is the !locks command, which is used to find wait chains and deadlock conditions among threads. Recently a service process was reported to be hanging, yet the !locks command showed no blocked threads:
0:000> !locks
CritSec +18caf94 at 018CAF94
LockCount -2
RecursionCount 1
OwningThread 58e8
EntryCount 0
ContentionCount 0
*** Locked
CritSec +18cc7c4 at 018CC7C4
LockCount -2
RecursionCount 1
OwningThread 58e8
EntryCount 0
ContentionCount 0
*** Locked
The number of threads waiting for each of these locks is 0 (the LockCount calculation is explained in the MSDN article):
0:000> ? ((-1) - (-2)) >> 2
Evaluate expression: 0 = 00000000
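The same arithmetic can be written out as a minimal Python sketch. It assumes the LockCount layout used since Windows Server 2003 SP1, where the lowest bit is 0 when the critical section is locked, the next bit is 0 when a waiting thread has been woken, and the remaining bits hold the ones' complement of the number of waiting threads. The function name decode_lock_count is hypothetical and exists only for this illustration.

```python
def decode_lock_count(lock_count: int) -> dict:
    """Decode a critical section LockCount value (assumed post-2003 SP1 layout)."""
    raw = lock_count & 0xFFFFFFFF  # view the signed value as a 32-bit pattern
    return {
        # Bit 0 is clear when the critical section is locked.
        "locked": (raw & 1) == 0,
        # Bit 1 is clear when a thread has been woken for this lock.
        "waiter_woken": (raw & 2) == 0,
        # Same expression as the WinDbg command: ((-1) - LockCount) >> 2
        "waiting_threads": ((-1 - lock_count) & 0xFFFFFFFF) >> 2,
    }


if __name__ == "__main__":
    # LockCount value reported by !locks for both critical sections above.
    print(decode_lock_count(-2))
```

For the value -2 from this dump the sketch reports a locked critical section with no waiting threads, which agrees with the debugger expression.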
In earlier memory dumps of that hanging service, the !locks command had always shown LockCount values corresponding to several waiting threads. An engineer therefore assumed that this dump had been taken at some random time rather than while the service was hanging, and asked for a new, correct dump. The mistake here is that the engineer didn’t look at the stack trace of the owning thread, which shows the characteristic pattern of a blocked thread waiting for a reply from an LRPC call:
0:000> ~~[58e8]kc 100
ntdll!KiFastSystemCallRet
ntdll!NtRequestWaitReplyPort
RPCRT4!LRPC_CCALL::SendReceive
RPCRT4!I_RpcSendReceive
RPCRT4!NdrSendReceive
RPCRT4!NdrClientCall2
ServiceA!foo
[…]
ServiceA!bar
RPCRT4!NdrStubCall2
RPCRT4!NdrServerCall2
RPCRT4!DispatchToStubInCNoAvrf
RPCRT4!RPC_INTERFACE::DispatchToStubWorker
RPCRT4!RPC_INTERFACE::DispatchToStub
RPCRT4!RPC_INTERFACE::DispatchToStubWithObject
RPCRT4!LRPC_SCALL::DealWithRequestMessage
RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest
RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls
RPCRT4!RecvLotsaCallsWrapper
RPCRT4!BaseCachedThreadRoutine
RPCRT4!ThreadStartRoutine
kernel32!BaseThreadStart
We don’t see other blocked threads and wait chains because the dump was saved as soon as the freezing condition was detected: the service didn’t allow a user connection to proceed. Had more users tried to connect, we would have seen the critical section wait chains that are absent from this dump.
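The step that was skipped, looking at the owning thread's stack for a known blocking frame, can be sketched as a small Python check over the text of a kc stack trace. This is only an illustration: the BLOCKING_FRAMES table and the find_blocking_frames helper are hypothetical, the table is far from complete, and a real checklist covers many more patterns.

```python
# Hypothetical, deliberately short table of frames that suggest a blocked thread.
BLOCKING_FRAMES = {
    "ntdll!NtRequestWaitReplyPort": "waiting for an LPC/LRPC reply",
    "ntdll!NtWaitForSingleObject": "waiting on a kernel object",
    "ntdll!RtlpWaitForCriticalSection": "waiting for a critical section",
}


def find_blocking_frames(stack_text: str) -> list:
    """Return (frame, meaning) pairs for known blocking frames in kc output."""
    hits = []
    for line in stack_text.splitlines():
        frame = line.strip()
        if frame in BLOCKING_FRAMES:
            hits.append((frame, BLOCKING_FRAMES[frame]))
    return hits


# The top frames of the stack trace for thread 58e8 shown above.
SAMPLE_STACK = """\
ntdll!KiFastSystemCallRet
ntdll!NtRequestWaitReplyPort
RPCRT4!LRPC_CCALL::SendReceive
RPCRT4!I_RpcSendReceive
RPCRT4!NdrSendReceive
RPCRT4!NdrClientCall2
ServiceA!foo
"""

if __name__ == "__main__":
    for frame, meaning in find_blocking_frames(SAMPLE_STACK):
        print(f"{frame}: {meaning}")
```

Run against the frames from this dump, the sketch flags ntdll!NtRequestWaitReplyPort, the frame that identifies the thread as blocked even though !locks reports no waiters.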
Checklists are indispensable for preventing such mistakes. For one example, see Crash Dump Analysis Checklist. You can also order it in print:
WinDbg: A Reference Poster and Learning Cards
- Dmitry Vostokov @ DumpAnalysis.org -
April 23rd, 2010 at 11:31 am
Checklists also help to discover apparently “unrelated” anomalies, such as an increased number of threads in one process and a handle leak in another