Crash Dump Analysis Checklist
Sometimes the root cause of a problem is not obvious from a memory dump. Here is the first version of a crash dump analysis checklist to help experienced engineers avoid missing important information. The checklist doesn’t prescribe any specific steps; it just lists points to double-check when looking at a memory dump. It is not complete at the moment, and any suggestions are welcome.
General:
- Symbol servers (.symfix)
- Internal database(s) search
- Google or Microsoft search for suspected components as this could be a known issue. Sometimes a simple search immediately points to the fix on a vendor’s site
- The tool used to save a dump (to flag false positive, incomplete or inconsistent dumps)
- OS/SP version (version)
- Language
- Debug time
- System uptime
- Computer name (dS srv!srvcomputername or !envvar COMPUTERNAME)
- List of loaded and unloaded modules (lmv or !dlls)
- Hardware configuration (!sysinfo)
- Default stack trace depth (.kframes 1000)
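The General items that map directly to debugger commands can be batched so that they run the same way on every dump. The following is a minimal sketch in Python, assuming the console debugger cdb.exe from Debugging Tools for Windows is on PATH; its -z option opens a dump file and -c runs an initial command string, with commands separated by semicolons. The function names and the default command list are illustrative assumptions, not part of the checklist itself.

```python
import subprocess

# Commands taken from the General section of the checklist.
GENERAL_COMMANDS = [".symfix", "version", ".kframes 1000", "lmv"]


def build_cdb_args(dump_path, commands=GENERAL_COMMANDS, debugger="cdb.exe"):
    """Return an argument list that opens dump_path, runs commands, then quits."""
    # "q" ends the debugging session so the batch does not wait for input.
    script = "; ".join(list(commands) + ["q"])
    return [debugger, "-z", dump_path, "-c", script]


def run_general_checks(dump_path):
    """Run the batch and return the debugger output (needs cdb.exe on PATH)."""
    result = subprocess.run(
        build_cdb_args(dump_path), capture_output=True, text=True
    )
    return result.stdout
```

Keeping the command list as data makes it easy to maintain separate lists for the other sections of the checklist.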
Application or service:
- Default analysis (!analyze -v or !analyze -v -hang for hangs)
- Critical sections (!cs -s -l -o, !locks) for both crashes and hangs
- Component timestamps, duplication and paths. DLL Hell? (lmv and !dlls)
- Do any newer components exist?
- Process threads (~*kv or !uniqstack) for multiple exceptions and blocking functions
- Process uptime
- Your components on the full raw stack of the problem thread
- Your components on the full raw stack of the main application thread
- Process size
- Number of threads
- Gflags value (!gflag)
- Time consumed by threads (!runaway)
- Environment (!peb)
- Import table (!dh)
- Hooked functions (!chkimg)
- Exception handlers (!exchain)
- Computer name (!envvar COMPUTERNAME)
- Process heap stats and validation (!heap -s, !heap -s -v)
- CLR threads? (mscorwks or clr modules on stack traces) If yes, use the .NET checklist below
- Hidden (unhandled and handled) exceptions on thread raw stacks
System hang:
- Default analysis (!analyze -v -hang)
- ERESOURCE contention (!locks)
- Processes and virtual memory including session space (!vm 4)
- Important services are present and not hanging
- Pools (!poolused)
- Waiting threads (!stacks)
- Critical system queues (!exqueue f)
- I/O (!irpfind)
- The list of all thread stack traces (!process 0 3f)
- LPC/ALPC chain for suspected threads (!lpc message or !alpc /m after search for “Waiting for reply to LPC” or “Waiting for reply to ALPC” in !process 0 3f output)
- RPC threads (search for “RPCRT4!OSF” in !process 0 3f output)
- Mutants (search for “Mutant - owning thread” in !process 0 3f output)
- Critical sections for suspected processes (!cs -l -o -s)
- Sessions, session processes (!session, !sprocess)
- Processes (size, handle table size) (!process 0 0)
- Running threads (!running)
- Ready threads (!ready)
- DPC queues (!dpcs)
- The list of APCs (!apc)
- Internal queued spinlocks (!qlocks)
- Computer name (dS srv!srvcomputername)
- File cache, VACB (!filecache)
- File objects for blocked thread IRPs (!irp -> !fileobj)
- Network (!ndiskd.miniports and !ndiskd.pktpools)
- Disk (!scsikd.classext -> !scsikd.classext class_device 2)
- Modules rdbss, mrxdav, mup, mrxsmb in stack traces
- Functions Ntfs!Ntfs*, nt!Fs* and fltmgr!Flt* in stack traces
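Several of the checks above are text searches in !process 0 3f output. If that output has been saved to a log file (for example with .logopen before running the command), the searches can be done in one pass. The following is a minimal sketch in Python; the function name and the returned tuple layout are illustrative assumptions, and the marker strings are the ones quoted in this checklist.

```python
# Search strings quoted in the System hang section of the checklist.
MARKERS = [
    "Waiting for reply to LPC",
    "Waiting for reply to ALPC",
    "RPCRT4!OSF",
    "Mutant - owning thread",
]


def find_markers(log_text, markers=MARKERS):
    """Return (line_number, marker, line) for each log line containing a marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for marker in markers:
            if marker in line:
                hits.append((number, marker, line.strip()))
    return hits
```

The line numbers make it easy to jump back to the surrounding thread in the log and pick up the message or thread address for !lpc message or !alpc /m.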
BSOD:
- Default analysis (!analyze -v)
- Pool address (!pool)
- Component timestamps (lmv)
- Processes and virtual memory (!vm 4)
- Current threads on other processors
- Raw stack
- Bugcheck description (including ln exception address for corrupt or truncated dumps)
- Bugcheck callback data (!bugdump for systems prior to Windows XP SP1)
- Bugcheck secondary callback data (.enumtag)
- Computer name (dS srv!srvcomputername)
- Hardware configuration (!sysinfo)
.NET application or service:
- CLR module and SOS extension versions (lmv and .chain)
- Managed exceptions (~*e !pe)
- Nested managed exceptions (!pe -nested)
- Managed threads (!Threads -special)
- Managed stack traces (~*e !CLRStack)
- Managed execution residue (~*e !DumpStackObjects and !DumpRuntimeTypes)
- Managed heap (!VerifyHeap, !DumpHeap -stat and !eeheap -gc)
- GC handles (!GCHandles, !GCHandleLeaks)
- Finalizer queue (!FinalizeQueue)
- Sync blocks (!syncblk)
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
June 28th, 2007 at 5:17 pm
Added the check for a pool address: !pool <address>.
Useful to see what pool tag is associated with the data; it gives an idea about the data structure, the struct field currently accessed, what component it comes from, etc.
June 28th, 2007 at 5:37 pm
Added import table check (!dh) to verify that it is not corrupt. Useful in some cases where memory optimization or rebasing products are used.
September 7th, 2007 at 4:41 pm
Hello, Dmitry!
Terrific site! Thank you!!!!
(I’m a developer who supports load testing of the Citrix protocol in the HP (formerly Mercury Interactive) LoadRunner product. I sometimes have headaches with customer support and remote debugging too.)
September 11th, 2007 at 11:20 am
Thanks!
Dmitry
November 5th, 2007 at 1:05 pm
Added the command to list ready-to-run threads: !ready
November 9th, 2007 at 2:42 pm
Added !dpcs and !apc commands
November 22nd, 2007 at 4:53 pm
Added !chkimg
November 23rd, 2007 at 4:34 pm
Added !exchain
January 25th, 2008 at 4:58 pm
Added !qlocks
April 17th, 2008 at 1:55 am
wait_for_client_connects: Process 1872 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
*
* BEGIN STACK DUMP:
* 04/16/08 10:18:42 spid 0
*
* Exception Address = 77FCC2C2 (RtlAllocateHeap + 1d3)
* Exception Code = c0000005 E
* Access Violation occurred writing address 00000005
*
* MODULE BASE END SIZE
* sqlservr 00400000 008d2fff 004d3000
* ntdll 77f80000 77ffcfff 0007d000
* KERNEL32 77e50000 77f31fff 000e2000
* ADVAPI32 796d0000 79731fff 00062000
* RPCRT4 786f0000 7875efff 0006f000
* USER32 77de0000 77e44fff 00065000
* GDI32 77f40000 77f7bfff 0003c000
* ole32 7cf00000 7cfeefff 000ef000
* OLEAUT32 77980000 77a1afff 0009b000
* VERSION 777d0000 777d6fff 00007000
* LZ32 75940000 75945fff 00006000
* opends60 41060000 41085fff 00026000
* ums 41090000 4109cfff 0000d000
* MSVCRT 78000000 78044fff 00045000
* sqlsort 04000000 0408efff 0008f000
* MSVCIRT 780a0000 780b1fff 00012000
* IMM32 75df0000 75e09fff 0001a000
* sqlevn70 410a0000 410a6fff 00007000
* COMNEVNT 410b0000 410fefff 0004f000
* ODBC32 00eb0000 00ee1fff 00032000
* COMCTL32 716f0000 71779fff 0008a000
* SHELL32 78f90000 791d5fff 00246000
* SHLWAPI 63180000 631cbfff 0004c000
* comdlg32 76ae0000 76b1dfff 0003e000
* SQLWOA 41100000 4110bfff 0000c000
* odbcint 1f850000 1f865fff 00016000
* NDDEAPI 76930000 76936fff 00007000
* WINSPOOL 777b0000 777cdfff 0001e000
* MPR 79b20000 79b2ffff 00010000
* SQLTrace 41130000 4117dfff 0004e000
* NETAPI32 7cea0000 7ceeffff 00050000
* Secur32 797b0000 797befff 0000f000
* NTDSAPI 77bc0000 77bd0fff 00011000
* DNSAPI 77950000 77973fff 00024000
* WSOCK32 74fc0000 74fc9fff 0000a000
* WS2_32 74fa0000 74fb3fff 00014000
* WS2HELP 74f90000 74f97fff 00008000
* WLDAP32 77920000 77949fff 0002a000
* NETRAP 75140000 75145fff 00006000
* SAMLIB 750d0000 750defff 0000f000
* SSNMPN70 41190000 41195fff 00006000
* SSMSRP70 411b0000 411b7fff 00008000
* SSMSSO70 411a0000 411aafff 0000b000
* XOLEHLP 048e0000 048e7fff 00008000
* MSDTCPRX 048f0000 049aafff 000bb000
* MTXCLU 049b0000 049bffff 00010000
* CLUSAPI 049c0000 049cffff 00010000
* RESUTILS 049d0000 049dcfff 0000d000
* USERENV 049e0000 04a40fff 00061000
* rnr20 04a50000 04a5bfff 0000c000
* iphlpapi 04aa0000 04ab2fff 00013000
* ICMP 04ac0000 04ac4fff 00005000
* MPRAPI 04ad0000 04ae6fff 00017000
* ACTIVEDS 04af0000 04b1efff 0002f000
* ADSLDPC 04b20000 04b42fff 00023000
* RTUTILS 04b50000 04b5dfff 0000e000
* SETUPAPI 04b60000 04c0dfff 000ae000
* RASAPI32 04c10000 04c42fff 00033000
* RASMAN 04c50000 04c60fff 00011000
* TAPI32 04c70000 04c91fff 00022000
* DHCPCSVC 04ca0000 04cb8fff 00019000
* winrnr 04d50000 04d57fff 00008000
* rasadhlp 04d60000 04d64fff 00005000
* mswsock 05490000 054a1fff 00012000
* msafd 054f0000 0550dfff 0001e000
* wshtcpip 05550000 05556fff 00007000
* SQLRGSTR 059f0000 059f4fff 00005000
* security 05a90000 05a93fff 00004000
* msv1_0 05aa0000 05ac0fff 00021000
* CRYPT32 05ad0000 05b56fff 00087000
* MSASN1 05b70000 05b7ffff 00010000
* xpsqlbot 05b90000 05b95fff 00006000
* sqlboot 05ba0000 05ba7fff 00008000
* xpsql70 075b0000 075bafff 0000b000
* xpstar 07600000 07630fff 00031000
* SQLWID 07640000 07645fff 00006000
* SQLSVC 07650000 07668fff 00019000
* odbcbcp 07670000 07675fff 00006000
* SQLRESLD 07680000 07685fff 00006000
* W95SCM 07690000 07697fff 00008000
* SQLSVC 076a0000 076a5fff 00006000
* imagehlp 0d9e0000 0da02fff 00023000
* DBGHELP 0db50000 0db7cfff 0002d000
* sqlimage 10180000 101acfff 0002d000
*
* Edi: 00D90000: 0000fe00 00000000 00001002 eeffeeff 00000100 000000c8
* Esi: 04DFAFD8: 04dfafe8 04dfafe8 00d901b8 00000001 00100001 00070008
* Eax: 00000001:
* Ebx: 00000007:
* Ecx: 00D901B8: 00d901c8 00d901c8 00d901c0 00d901c0 04dfafe0 00000001
* Edx: 00000061:
* Eip: 77FCC2C2: ffff2885 8903e8c1 c18b0eb7 0f000000 d6850fc1 3b044889
* Ebp: 0540FE70: 00000020 00000030 00000000 00d90000 78001532 0540feb0
* SegCs: 0000001B:
* EFlags: 00000216:
* Esp: 0540FCA4: 00000000 00000000 00000000 00000000 01762254 00000020
* SegSs: 00000023:
April 17th, 2008 at 9:56 am
* BEGIN STACK DUMP:
* 04/16/08 10:18:42 spid 0
*
* Exception Address = 77FCC2C2 (RtlAllocateHeap + 1d3)
An exception inside RtlAllocateHeap points to an instance of heap corruption, and you need to enable full page heap to isolate the component:
http://www.dumpanalysis.org/blog/index.php/2006/10/31/crash-dump-analysis-patterns-part-2/
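The module table in a stack dump like the one above can also be used to confirm which module contains the exception address. The following is a minimal sketch in Python, assuming the "* NAME BASE END SIZE" line layout shown in that dump; the function names are illustrative assumptions.

```python
import re

# Matches lines such as "* ntdll 77f80000 77ffcfff 0007d000".
MODULE_LINE = re.compile(
    r"^\*\s+(\S+)\s+([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s*$"
)


def parse_modules(dump_text):
    """Return a list of (name, base, end) tuples from the module table."""
    modules = []
    for line in dump_text.splitlines():
        match = MODULE_LINE.match(line.strip())
        if match:
            name, base, end, _size = match.groups()
            modules.append((name, int(base, 16), int(end, 16)))
    return modules


def find_module(modules, address):
    """Return (name, offset_from_base) for the module containing address, or None."""
    for name, base, end in modules:
        if base <= address <= end:
            return name, address - base
    return None


# Three rows copied from the dump above.
sample = """* MODULE BASE END SIZE
* sqlservr 00400000 008d2fff 004d3000
* ntdll 77f80000 77ffcfff 0007d000
* KERNEL32 77e50000 77f31fff 000e2000"""
print(find_module(parse_modules(sample), 0x77FCC2C2))
```

For the exception address 77FCC2C2 this reports ntdll, which is consistent with the RtlAllocateHeap symbol in the dump.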
April 23rd, 2008 at 1:07 pm
Added !bugdump and .enumtag
June 16th, 2008 at 3:27 pm
Added .kframes 100
June 18th, 2008 at 2:14 pm
Added a check for component paths
June 19th, 2008 at 4:04 pm
Added a check for duplicated components
July 11th, 2008 at 7:01 pm
Added !locks -v check when we have signs of critical section corruption
July 28th, 2008 at 2:38 pm
Added !cs -s -l -o for process memory dumps
August 1st, 2008 at 7:51 am
Great work and very useful to all professional developers.
~Subbu
August 12th, 2008 at 4:58 pm
[…] This is an application I wrote 10 years ago that I’m adapting as a high-level interface to WinDbg (it can be any GUI debugger, actually). The basic idea revolves around floating buttons (listbox and task bar icons, optionally) that dynamically change with every new window or application. The number of buttons is unlimited; they can be repositioned to any corner of the screen, and they can play sounds and show video and pictures. On click they play elaborate macro commands, including keystrokes and mouse movements, written in a special scripting language. For example, we can create buttons for the CDA checklist. […]
September 18th, 2008 at 10:26 am
Added commands to extract computer name
September 18th, 2008 at 11:32 am
[…] Farah who blogged about .cmdtree command I was able to create the first version of cmdtree.txt for Crash Dump Analysis Checklist to include common commands that I use. It can be found […]
December 16th, 2008 at 12:02 pm
Added search for “Waiting for reply to LPC” in !process 0 ff output to detect LPC wait chains
December 16th, 2008 at 12:56 pm
[…] ago I started with a few commands like !analyze -v, kv and dd and progressed to an elaborate checklist. Here the natural logarithm can be used to approximate the […]
February 26th, 2009 at 7:40 pm
[…] 1. First, have a checklist […]
March 5th, 2009 at 11:28 am
Added search for “Mutant - owning thread”
March 20th, 2009 at 12:13 am
Added “Waiting for reply to ALPC Message” and !alpc /m
July 3rd, 2009 at 4:47 pm
[…] prevent such mistakes checklists are indispensable. For one example, see Crash Dump Analysis Checklist. You can also order it in […]
July 16th, 2009 at 10:25 am
Added a check for important services for your environment
November 6th, 2009 at 11:26 am
Added !sysinfo for checking hardware configuration
November 6th, 2009 at 12:01 pm
Added .symfix and version commands
December 24th, 2009 at 2:24 pm
Added !filecache to check for free VACB
January 4th, 2010 at 11:08 am
Added lmv and !dlls (the latter is for user and complete memory dumps) as a general check for loaded and unloaded modules and their versions
January 4th, 2010 at 12:00 pm
Hi Dmitry!
I found the following commands handy during debugging:
.lastevent
!gle -all
!heap -s
.cxr and .exr
!address
and for managed side:
~* e !clrstack -all
!dumpstack
!pe
!dumpheap -stat -type Exception
!dso
August 26th, 2010 at 12:11 pm
“!sysinfo” ?!?!
September 8th, 2010 at 1:34 pm
“!sysinfo” ?!?!
Oh, right, I got it. Kernel debugging. Never mind!
February 18th, 2011 at 10:51 pm
[…] Crash dump analysis checklist: http://www.dumpanalysis.org/blog/index.php/2007/06/20/crash-dump-analysis-checklist/ […]
April 4th, 2011 at 4:01 pm
Removed !ntsd.locks as deprecated and replaced by !cs
October 26th, 2011 at 12:56 pm
Massive update. Added process heap stats, hidden (unhandled and handled) exceptions on thread raw stacks, multiple exceptions and blocking calls. Added the initial checklist for .NET
November 16th, 2011 at 4:09 pm
Adding !locks back to user process memory dump analysis checklist as it’s working now with the latest WinDbg
February 10th, 2012 at 12:14 pm
Added !DumpRuntimeTypes for .NET execution residue
March 23rd, 2012 at 1:16 pm
Added !heap -s -v to validate heap
March 28th, 2012 at 8:03 pm
[…] am just copy and paste from http://www.dumpanalysis.org/blog/index.php/2007/06/20/crash-dump-analysis-checklist/ for my own […]
May 20th, 2012 at 6:51 pm
[…] - Crash Dump Analysis Checklist […]
July 31st, 2012 at 3:40 pm
Added file objects for blocked thread IRPs: !irp then !fileobj
July 8th, 2013 at 9:23 pm
Added !process 0 3f for Windows 8
August 25th, 2013 at 5:19 pm
Added !GCHandleLeaks
June 6th, 2014 at 2:17 pm
Added network and disk checks
September 15th, 2014 at 8:07 pm
Added rdbss, mrxdav, mup, mrxsmb checks for remote file access
July 2nd, 2015 at 5:57 pm
Added Ntfs!Ntfs* and nt!Fs* thread stack trace checks
February 23rd, 2016 at 11:56 am
Added fltmgr!Flt* for stack trace checks
December 15th, 2016 at 11:53 am
Added “RPCRT4!OSF” for stack trace checks