10 Common Mistakes in Memory Analysis (Part 4)

CARE: Crash Analysis Report Environment

DATA (Dump Analysis + Trace Analysis) Facebook group
Please join the community of memory (dump) and trace analysis engineers. This group promotes scientific methods and memory dump-based worldview.

Twitter @ DumpAnalysis
You can now follow portal and blog news at DumpAnalysis on Twitter

LinkedIn Group Dr. Watson Enthusiasts
All about Dr. Watson errors and more. Get news, excerpts and progress reports about the forthcoming book The Science of Dr. Watson: An Illustrated History of Debugging (ISBN 978-1906717070)

2010 (0x7DA) - The Year of Dump Analysis
2011 (0x7DB) - 2020 (0x7E4) The Debugging Decade

One of the common mistakes that I observe is to habitually stick to certain WinDbg commands to recognize patterns. One example is !locks command used to find out any wait chains and deadlock conditions among threads. Recently a service process was reported to be hang and !locks command showed no blocked threads:

0:000> !locks
CritSec +18caf94 at 018CAF94
LockCount          -2
RecursionCount     1
OwningThread       58e8
EntryCount         0
ContentionCount    0
*** Locked

CritSec +18cc7c4 at 018CC7C4
LockCount          -2
RecursionCount     1
OwningThread       58e8
EntryCount         0
ContentionCount    0
*** Locked

The number of threads waiting for the lock is 0 (this calculation is explained in the MSDN article): 

0:000> ? ((-1) - (-2)) >> 2
Evaluate expression: 0 = 00000000

In the past, for that hang sevice memory dumps, !locks command always showed LockCount values corresponding to several waiting threads. Therefore, an engineer assumed that the dump was taken at some random time, not at the time the service was hanging, and asked for a new right dump. The mistake here is that the engineer didn’t look at the corresponding thread stack trace that shows the characteristic pattern of the blocked thread waiting for a reply from an LRPC call:

0:000> ~~[58e8]kc 100

ntdll!KiFastSystemCallRet
ntdll!NtRequestWaitReplyPort
RPCRT4!LRPC_CCALL::SendReceive
RPCRT4!I_RpcSendReceive
RPCRT4!NdrSendReceive
RPCRT4!NdrClientCall2

ServiceA!foo
[…]
ServiceA!bar
RPCRT4!NdrStubCall2
RPCRT4!NdrServerCall2
RPCRT4!DispatchToStubInCNoAvrf
RPCRT4!RPC_INTERFACE::DispatchToStubWorker
RPCRT4!RPC_INTERFACE::DispatchToStub
RPCRT4!RPC_INTERFACE::DispatchToStubWithObject
RPCRT4!LRPC_SCALL::DealWithRequestMessage
RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest
RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls
RPCRT4!RecvLotsaCallsWrapper
RPCRT4!BaseCachedThreadRoutine
RPCRT4!ThreadStartRoutine
kernel32!BaseThreadStart

We don’t see other blocked threads and wait chains because the dump was saved as soon as the freezing condition was detected: the service didn’t allow a user connection to proceed. If more users tried to connect we would have seen critical section wait chains that are absent in this dump.

To prevent such mistakes checklists are indispensable. For one example, see Crash Dump Analysis Checklist. You can also order it in print:

WinDbg: A Reference Poster and Learning Cards

- Dmitry Vostokov @ DumpAnalysis.org -

           

Announcements

Coming Soon:

Debugging Notebook: Essential Concepts, WinDbg Commands and Tools

Crash Dump Analysis for System Administrators and Support Engineers

New Magazines:

Debugged! MZ/PE: MagaZine for/from Practicing Engineers


New Books:

Memory Dump Analysis Anthology, Volume 3

First Fault Software Problem Solving: A Guide for Engineers, Managers and Users

x64 Windows Debugging: Practical Foundations

Also available:

Windows Debugging: Practical Foundations

DLL List Landscape: The Art from Computer Memory Space

Dumps, Bugs and Debugging Forensics: The Adventures of Dr. Debugalov

WinDbg: A Reference Poster and Learning Cards

Memory Dump Analysis Anthology, Volume 2

Memory Dump Analysis Anthology, Volume 1

New Children's Book:

Baby Turing

One Response to “10 Common Mistakes in Memory Analysis (Part 4)”

  1. Crash Dump Analysis » Blog Archive » Blocked LPC thread, coupled processes, stack trace collection and blocked GUI thread: pattern cooperation Says:

    […] small case study continues where Not using checklists common mistake case study left, after identifying the blocked LPC thread in ServiceA process. We […]

Leave a Reply