Crash Dump Analysis AntiPatterns (Part 6)

November 22nd, 2007

Need the crash dump. Period. This might be the first thought when an engineer gets a stack trace fragment without symbolic information. It is usually based on the following presupposition:

We need an actual dump file to suggest further troubleshooting steps.

This is not actually true unless it is the first time you have the problem and get stack trace for it. Consider the following fragment from bugcheck kernel dump when no symbols were applied because the customer didn’t have them:

b90529f8 8085eced nt!KeBugCheckEx+0x1b
b9052a70 8088c798 nt!MmAccessFault+0xb25
b9052a70 bfabd940 nt!_KiTrap0E+0xdc
WARNING: Stack unwind information not available. Following frames may be wrong.
b9052b14 bfabe452 MyDriver+0x27940

We can convert module+offset information into module!function+offset2 using MAP files or using DIA SDK (Debug Interface Access SDK) to query PDB files if we know module timestamp. This might be seen as a tedious exercise but we don’t need to do it if we keep raw stack trace signatures in some database when doing crash dump analysis. If we use our own symbol servers we might want to remove references to them and reload symbols. Then redo previous stack trace commands.

In my case it happened that I already analyzed similar previous bugcheck crash dumps months ago and saved stack trace prior to applying symbols. This helped me to point to solution without requesting the crash dump corresponding to that stack trace.

- Dmitry Vostokov @ DumpAnalysis.org -

Critical thinking when troubleshooting

November 22nd, 2007

Faulty thinking happens all the time in technical support environments partly due to hectic and demanding business realities.

Simple*ology book pointed me to this website:

http://www.fallacyfiles.org/ 

which taxonomically organizes fallacies:

http://www.fallacyfiles.org/taxonomy.html

For example, False Cause. Technical examples might include false causes inferred from trace analysis, customer problem description that includes steps to reproduce the problem, etc. This also applies to debugging and importance of thinking skills has been emphasized in the following book:

Debugging by Thinking: A Multidisciplinary Approach

Surface-level of basic crash dump analysis is less influenced by false cause fallacies because it doesn’t have explicitly recorded sequence of events although some caution should be exercised during detailed analysis of thread waiting times and other historical information.   

Warning: when exercising critical thinking recursively we need to stop at the right time to avoid paralysis of analysis :-) 

- Dmitry Vostokov @ DumpAnalysis.org

Crash Dump Analysis Patterns (Part 37)

November 21st, 2007

Some bugs are fixed using brute-force approach via putting an exception handler to catch access violations and other exceptions. Long time ago I saw one such “incredible fix” when the image processing application was crashing after approximately Nth heap free runtime call. To ignore crashes a SEH handler was put in place but the application started to crash in different places. Therefore the additional fix was to skip free calls when approaching N and resume afterwards. The application started to crash less frequently.

Here getting Early Crash Dump when a first-chance exception happens can help in component identification before corruption starts spreading across data. Recall that when an access violation happens in a process thread in user mode the system generates the first-chance exception which can be caught by an attached debugger and if there is no such debugger the system tries to find an exception handler and if that exception handler catches and dismisses the exception the thread resumes its normal execution path. If there are no such handlers found the system generates the so called second-chance exception with the same exception context to notify the attached debugger and if it is not attached a default thread exception handler usually saves a postmortem user dump.

You can get first-chance exception memory dumps with:

Here is an example configuration rule for crashes in Debug Diagnostic tool for TestDefaultDebugger process (Unconfigured First Chance Exceptions option is set to Full Userdump):

    

When we push the big crash button in TestDefaultDebugger dialog box two crash dumps are saved, with first and second-chance exceptions pointing to the same code:

Loading Dump File [C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of TestDefaultDebugger.exe\TestDefaultDebugger__PID__4316__ Date__11_21_2007__Time_04_28_27PM__2__First chance exception 0XC0000005.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Dump created by DbgHost. First chance exception 0XC0000005′
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Wed Nov 21 16:28:27.000 2007 (GMT+0)
System Uptime: 0 days 23:45:34.711
Process Uptime: 0 days 0:01:09.000

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(10dc.590): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00000001 ecx=0017fe70 edx=00000000 esi=00425ae8 edi=0017fe70
eip=004014f0 esp=0017f898 ebp=0017f8a4 iopl=0 nv up ei ng nz ac pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 c7050000000000000000 mov dword ptr ds:[0],0  ds:002b:00000000=????????

Loading Dump File [C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of TestDefaultDebugger.exe\TestDefaultDebugger__PID__4316__ Date__11_21_2007__Time_04_28_34PM__693__ Second_Chance_Exception_C0000005.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Dump created by DbgHost. Second_Chance_Exception_C0000005
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Wed Nov 21 16:28:34.000 2007 (GMT+0)
System Uptime: 0 days 23:45:39.313
Process Uptime: 0 days 0:01:16.000

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(10dc.590): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00000001 ecx=0017fe70 edx=00000000 esi=00425ae8 edi=0017fe70
eip=004014f0 esp=0017f898 ebp=0017f8a4 iopl=0 nv up ei ng nz ac pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 c7050000000000000000 mov dword ptr ds:[0],0  ds:002b:00000000=????????

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis on Solaris x86 - AMD64

November 20th, 2007

Found the following book which is an interesting read to see crash dump analysis from a different operating system architecture perspective but on the same Intel / AMD platform:

http://www.genunix.org/gen/crashdump/book.pdf

- Dmitry Vostokov @ DumpAnalysis.org

Crash Dump Analysis Patterns (Part 31a)

November 20th, 2007

I have already discussed Passive Thread pattern in user space. In this part I continue with kernel space and passive system threads that don’t run in any user process context. These threads belong to the so called System process, don’t have any user space stack and their full stack traces can be seen from the output of !process command (if not completely paged out):

1: kd> !process 0 ff System

or from system portion of !stacks 2 command.  

Some system threads from that list belong to core OS functionality and are not passive (function offsets can vary for different OS versions and service packs):

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!MmZeroPageThread+0×180
nt!Phase1Initialization+0xe
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!MiModifiedPageWriter+0×59
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!MiMappedPageWriter+0xad
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!KeBalanceSetManager+0×101
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!KeSwapProcessOrStack+0×44
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!EtwpLogger+0xdd
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!KiExecuteDpc+0×198
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!CcQueueLazyWriteScanThread+0×73
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!ExpWorkerThreadBalanceManager+0×85
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

Other threads belong to various worker queues (they can also be seen from !exqueue ff command output) and wait for data items to arrive (passive threads):

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!ExpWorkerThread+0×104
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

or

nt!KiSwapContext+0x26
nt!KiSwapThread+0x2e5
nt!KeRemoveQueue+0x417
nt!ExpWorkerThread+0xc8
nt!PspSystemThreadStartup+0×2e
nt!KiThreadStartup+0×16

Non-Exp system threads having Worker, Logging or Logger substrings in their function names are passive threads and wait for data too, for example:

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!PfTLoggingWorker+0×81
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!EtwpLogger+0xdd
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
rdpdr!RxpWorkerThreadDispatcher+0×6f
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
HTTP!UlpThreadPoolWorker+0×26c
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
srv2!SrvProcWorkerThread+0×74
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
srv!WorkerThread+0×90
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

Any deviations in memory dump can raise suspicion like in the stack below for driver.sys 

nt!KiSwapContext+0x26
nt!KiSwapThread+0x284
nt!KeWaitForSingleObject+0×346
nt!ExpWaitForResource+0xd5
nt!ExAcquireResourceExclusiveLite+0×8d
nt!ExEnterCriticalRegionAndAcquireResourceExclusive+0×19

driver!ProcessItem+0×2f
driver!DelayedWorker+0×27

nt!ExpWorkerThread+0×104
nt!PspSystemThreadStartup+0×5b
nt!KiStartSystemThread+0×16

- Dmitry Vostokov @ DumpAnalysis.org

Modeling side of DLL Injection

November 19th, 2007

Component injection can be used to model various process and system software behavior by writing customized DLL/SYS and injecting them into process/kernel space. Although often depicted either as security threat or value-added hooking mechanism very little has been written about its use to model various software defects. Here I don’t mean testing but studying faulty behavior and artifacts after injecting specific DLLs with design and implementation defects. For example, forgetting to release database connections or not closing file handles. NotMyLeak is an attempt to do it for different kind of leaks on x86 and x64 Windows platforms. It uses automatic DLL injection via standard Windows hooking mechanism. Stay tuned.

- Dmitry Vostokov @ DumpAnalysis.org -  

NotMyLeak

November 19th, 2007

To troubleshoot and study memory leaks the following tool called NotMyLeak will be released soon. It injects different kinds of leaks into specified processes and system:

  • Process heap
  • Runtime library
  • Performance counters
  • Kernel paged pool
  • Kernel nonpaged pool
  • IRP
  • Handles
  • PTE
  • etc…

The idea is to model various real-time leaks, analyze memory dumps and then apply discovered patterns to crash dump analysis of memory dumps coming from real-world systems.   

The draft GUI (subject to change):

Note: the tool name prefix NotMy… was inspired by the name of Mark Russinovich’s tool called NotMyFault.

- Dmitry Vostokov @ DumpAnalysis.org

Windows Internals book

November 19th, 2007

Scheduled to be updated with Windows Vista and Windows Server 2008 details:

Windows® Internals, Fifth Edition

- Dmitry Vostokov @ DumpAnalysis.org

Filtering processes

November 19th, 2007

When I analyze memory dumps coming from Microsoft or Citrix terminal service environments I frequently need to find a process hosting terminal service. In Windows 2000 it was the separate process termsrv.exe and now it is termsrv.dll which can be loaded into any of several instances of svchost.exe. The simplest way to narrow down that svchost.exe process if we have a complete memory dump is to use the module option of WinDbg !process command:

!process /m termsrv.dll 0

!process /m wsxica.dll 0

!process /m ctxrdpwsx.dll 0

Note: this option works only with W2K3, XP and later OS

Also to list all processes with user space stacks having the same image name we can use:

!process 0 ff msiexec.exe

or  

!process 0 ff svchost.exe

Note: this command works with W2K too as well as session option (/s)

- Dmitry Vostokov @ DumpAnalysis.org

Exceptions Ab Initio

November 16th, 2007

Where do native exceptions come from? How do they propagate from hardware and eventually result in crash dumps? I was asking these questions when I started doing crash dump analysis more than four years ago and I tried to find answers using IA-32 Intel® Architecture Software Developer’s Manual, WinDbg and complete memory dumps.

Eventually I wrote some blog posts about my findings. They are buried between many other posts so I dug them out and put on a dedicated page:

Interrupts and Exceptions Explained

- Dmitry Vostokov @ DumpAnalysis.org

Memorillion and Quadrimemorillion

November 15th, 2007

What are these? These are names of the number of possible unique complete memory dumps when address space is 32 bit and 64-bit correspondingly:

256232 and 256264

The first of them can be approximated by 101010

This idea came to me after I learnt about the so called “immense number” proposed by Walter Elsasser. This number is so big that its digits cannot be listed because there is not enough particles in observable Universe to write them.

Certainly one memorillion is more than one googol 10100 but it requires only approx. 1010 particles in ideal case to list its digits and therefore not an immense number. It is however far less than one googolplex 1010100.

Consider a complete memory dump with bytes written in hexadecimal notation:

0x50414745554d500f000000ce0e00000090...

This number has more than 8 billion digits… And it is one possible number out of memorillion of them. So one memorillion in hexadecimal notation is just

0xFFFFFFFFFFFFFFFFFFFFF... + 1

where we have 2*232 ‘F’ symbols written sequentially. One quadrimemorillion has 2*264 ‘F’ symbols.

Also the question about the number of possible crash dumps can be considered as Microsoft interview style question when you have possible candidates and you want to assess their ability to think out of the box and handle large numbers. 

- Dmitry Vostokov @ DumpAnalysis.org -

Making Software Troubleshooting Simple

November 15th, 2007

Excellent read to refine general problem solving skills towards simplicity, understand broad applicability of modeling and just for fun:

Simpleology: The Simple Science of Getting What You Want

But from Amazon

Now I’m going to have a simple lunch and read this simple book. What about the rating? Of course, it is simple too! Maximum! 1 star in my simple zero-one binary rating system - worth (1) or not worth (0) to read.

- Dmitry Vostokov @ DumpAnalysis.org -

News for C++ and MFC funs

November 15th, 2007

I write most of my tools using C++, MFC and STL and I was really delighted to hear about new MFC framework improvements in forthcoming Visual Studio 2008. You can read the following press release from Russian ISV:

http://www.bcgsoft.com/pressreleases/PR071110.pdf

This is also discussed on MS Visual C++ team blog:

http://blogs.msdn.com/vcblog/archive/2007/11/09/quick-tour-of-new-mfc-functionality.aspx

I was also thinking about extending my MFC projects with .NET class library and found this interesting practical book:

Extending MFC Applications with the .NET Framework

Buy from Amazon

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 36)

November 14th, 2007

The pattern I should have written as one of the first is called Local Buffer Overflow. It is observed on x86 platforms when a local variable and a function return address and/or saved frame pointer EBP are overwritten with some data. As a result, the instruction pointer EIP becomes Wild Pointer and we have a process crash in user mode or a bugcheck in kernel mode. Sometimes this pattern is diagnosed by looking at mismatched EBP and ESP values and in the case of ASCII or UNICODE buffer overflow EIP register may contain 4-char or 2-wchar_t value and ESP or EBP or both registers might point at some string fragment like in the example below:

0:000> r
eax=000fa101 ebx=0000c026 ecx=01010001 edx=bd43a010 esi=000003e0 edi=00000000
eip=0048004a esp=0012f158 ebp=00510044 iopl=0  nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000202
0048004a 0000 add     byte ptr [eax],al  ds:0023:000fa101=??

0:000> kL
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f154 00420047 0x48004a
0012f158 00440077 0x420047
0012f15c 00420043 0x440077
0012f160 00510076 0x420043
0012f164 00420049 0x510076
0012f168 00540041 0x420049
0012f16c 00540041 0x540041
...
...
...

Good buffer overflow case studies with complete analysis including assembly language tutorial can be found in Buffer Overflow Attacks book.

Buy from Amazon 

- Dmitry Vostokov @ DumpAnalysis.org -

Recent service downtime

November 13th, 2007

Due to increased popularity this site became slow and even had severe service disruptions during last few days. I have moved it to a dedicated virtual server and now it should be much faster with at least 99.9% service uptime.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 35)

November 12th, 2007

In kernel or complete memory dumps coming from hanging or slow workstations and servers !irpfind WinDbg command may show IRP Distribution Anomaly pattern when certain drivers have excessive count of active IRPs not observed under normal circumstances. I created two IRP distribution graphs from two problem kernel dumps by preprocessing command output using Visual Studio keyboard macros to eliminate completed IRPs and then using Excel. In one case it was a big number of I/O request packets from 3rd-party antivirus filter driver:

\Driver\3rdPartyAvFilter

In the second case it was the huge number of active IRPs targeted to kernel socket ancillary function driver:

\Driver\AFD

Two other peaks on both graphs are related to NTPS and NTFS, pipes and file system and usually normal. Here is IRP distribution graph from my Vista workstation captured while I was writing this post:

- Dmitry Vostokov @ DumpAnalysis.org -

Memory Dump Analysis using Excel

November 9th, 2007

Some WinDbg commands output data in tabular format so it is possible to save their output to a text file, import it to Excel and do sorting, filtering, and graph visualization, etc. Some commands from WinDbg include:

!stacks 1

Lists all threads with Ticks column so you can sort and filter threads that had been waiting no more than 100 ticks, for
example.

!irpfind

Here we can create various histograms, for example, IRP distribution based on [Driver] column.

I’ll show more examples later but now the graph depicting thread distribution in PID - TID coordinates on a busy multiprocessor
system with 25 user sessions and more than 3,000 threads:

WinDbg scripts offer possibility to output various tabulated data via .printf:

0:000> .printf "a\tb\tc"
a       b       c

- Dmitry Vostokov @ DumpAnalysis.org -

TestDefaultDebugger.NET

November 8th, 2007

Sometimes there are situations when we need to test exception handling to see whether it works and how to get dumps or logs from it. For example, a customer reports infrequent process crashes but no dumps are saved. Then we can try some application that crashes immediately to see whether it results in error messages and/or saved crash dumps. This was the motivation behind TestDefaultDebugger package. Unfortunately it contains only native applications and today I needed to test .NET CLR exception handling and see what messages it shows in my environment. So I wrote a simple program in C# that creates an empty Stack object and then calls its Pop method which triggers “Stack empty” exception sufficient for my purposes.

The updated package now includes TestDefaultDebugger.NET.exe and can be downloaded from Citrix support web site (requires free registration):

Download TestDefaultDebugger package

- Dmitry Vostokov @ DumpAnalysis.org -

Symbol file warnings in WinDbg 6.8.0004.0

November 8th, 2007

I started using new WinDbg 6.8.4.0 and found that it prints the following message twice when I open a process dump or a complete memory dump where the current context is from some user mode process:

0:000> !analyze -v
...
...
...
***
***    Your debugger is not using the correct symbols
***
***    In order for this command to work properly, your symbol path
***    must point to .pdb files that have full type information.
***
***    Certain .pdb files (such as the public OS symbols) do not
***    contain the required information.  Contact the group that
***    provided you with these symbols if you need this command to
***    work.
***
***    Type referenced: kernel32!pNlsUserInfo
***

Fortunately kernel32.dll symbols were loaded correctly despite the warning:

0:000> lmv m kernel32
start    end        module name
77e40000 77f42000   kernel32   (pdb symbols)          c:\mssymbols\kernel32.pdb\DF4F569C743446809ACD3DFD1E9FA2AF2\kernel32.pdb
    Loaded symbol image file: kernel32.dll
    Image path: C:\WINDOWS\system32\kernel32.dll
    Image name: kernel32.dll
    Timestamp:        Tue Jul 25 13:31:53 2006 (44C60F39)
    CheckSum:         001059A9
    ImageSize:        00102000
    File version:     5.2.3790.2756
    Product version:  5.2.3790.2756
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     kernel32
    OriginalFilename: kernel32
    ProductVersion:   5.2.3790.2756
    FileVersion:      5.2.3790.2756 (srv03_sp1_gdr.060725-0040)
    FileDescription:  Windows NT BASE API Client DLL
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

Also double checking return addresses on the stack trace shows that symbol mapping was correct (from another dump with the same message):

kd> dpu kernel32!pNlsUserInfo l1
77ecb0a8  77ecb760 "ENU"

kd> kv
ChildEBP RetAddr  Args to Child
f552bbec f79e1743 000000e2 cccccccc 858a0470 nt!KeBugCheckEx+0x1b
WARNING: Stack unwind information not available. Following frames may be wrong.
f552bc38 8081d39d 85699390 8596fe78 860515f8 SystemDump+0x743
f552bc4c 808ec789 8596fee8 860515f8 8596fe78 nt!IofCallDriver+0x45
f552bc60 808ed507 85699390 8596fe78 860515f8 nt!IopSynchronousServiceTail+0x10b
f552bd00 808e60be 00000090 00000000 00000000 nt!IopXxxControlFile+0x5db
f552bd34 80882fa8 00000090 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
f552bd34 7c82ed54 00000090 00000000 00000000 nt!KiFastCallEntry+0xf8
0012efc4 7c8213e4 77e416f1 00000090 00000000 ntdll!KiFastSystemCallRet
0012efc8 77e416f1 00000090 00000000 00000000 ntdll!NtDeviceIoControlFile+0xc
0012f02c 00402208 00000090 9c400004 00947eb8 kernel32!DeviceIoControl+0×137
0012f884 00404f8e 0012fe80 00000001 00000000 SystemDump_400000+0×2208

kd> ub 77e416f1
kernel32!DeviceIoControl+0x11d:
77e416db lea     eax,[ebp-28h]
77e416de push    eax
77e416df push    ebx
77e416e0 push    ebx
77e416e1 push    ebx
77e416e2 push    dword ptr [ebp+8]
77e416e5 je      kernel32!DeviceIoControl+0x131 (77e417f3)
77e416eb call    dword ptr [kernel32!_imp__NtDeviceIoControlFile (77e4103c)]

So everything is allright and messages above shall be ignored. I also got e-mails from other people having the same problem so it seems to be related with this WinDbg release and not with my debugging environment.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dumps for Dummies (Part 7)

November 8th, 2007

In the previous part I introduced clear separation between crashes and hangs and outlined memory dump capturing methods for each category. However, looking from user point of view we need to tell them what is the best way to capture a dump based on observations they have and their failure level, system or component. The latter failure type usually happens with user applications and services.

For user applications the best way is to get a memory dump proactively or put in another words, manually, and do not rely on a postmortem debugger that may not be set up correctly on a problem server in 100 server farm. If any error message box appears with a message that an application stopped working or that it has encountered an application error then you can use process dumpers like userdump.exe.

Suppose we have the following error message when TestDefaultDebugger application crashes on Vista x64 (the same technique is applicable to earlier OS too):

Then we can dump the process while it displays the problem if we know its process ID:

In Vista this can be done even more easily by dumping the process from Task Manager directly:

Choose Create Dump File:

and the process dump is saved in a user location for temporary files:

Although the application above is the native Windows application the same method applies for .NET applications. For example, the forthcoming TestDefaultDebugger.NET application

shows the following dialog:

and we can dump the process manually while it displays the message.

Although both applications will disappear from Task Manager if we choose Close or Quit on their error message boxes and therefore will be considered as crashes under my terminology, at the time when they show their stop messages they are considered as application hangs and this is why we use manual process dumpers.

Other scenarios including system failures will be considered in the next part. 

- Dmitry Vostokov @ DumpAnalysis.org -