Crash Dump Analysis Patterns (Part 41a)

December 12th, 2007

Some memory dumps are generated on purpose to troubleshoot process and system hangs. They are usually called Manual Dumps, manual crash dumps or manual memory dumps. Kernel, complete and kernel mini dumps can be generated using the famous keyboard method described in the following Microsoft article which has been recently updated and contains the fix for USB keyboards:

http://support.microsoft.com/kb/244139

The crash dump will show E2 bugcheck:

MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Various tools including Citrix SystemDump reuse E2 bug check code and its arguments.  There are many other 3rd-party tools used to bugcheck Windows OS such as BANG! from OSR or NotMyFault from Sysinternals. The old one is crash.exe that loads crashdrv.sys and uses the following bugcheck:

Unknown bugcheck code (69696969)
Unknown bugcheck description
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

In a memory dump you would see its characteristic stack trace pointing to crashdrv module: 

STACK_TEXT:
b5b3ebe0 f615888d nt!KeBugCheck+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
b5b3ebec f61584e3 crashdrv+0x88d
b5b3ec00 8041eec9 crashdrv+0x4e3
b5b3ec14 804b328a nt!IopfCallDriver+0x35
b5b3ec28 804b40de nt!IopSynchronousServiceTail+0x60
b5b3ed00 804abd0a nt!IopXxxControlFile+0x5d6
b5b3ed34 80468379 nt!NtDeviceIoControlFile+0x28
b5b3ed34 77f82ca0 nt!KiSystemService+0xc9
0006fed4 7c5794f4 ntdll!NtDeviceIoControlFile+0xb
0006ff38 01001a74 KERNEL32!DeviceIoControl+0xf8
0006ff70 01001981 crash+0x1a74
0006ff80 01001f93 crash+0x1981
0006ffc0 7c5989a5 crash+0x1f93
0006fff0 00000000 KERNEL32!BaseProcessStart+0x3d

Sometimes various hardware buttons are used to trigger NMI and generate a crash dump when keyboard is not available. The bugcheck will be:

NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction. The hardware supplier should be called.
Arguments:
Arg1: 004f4454
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Critical process termination such as session 0 csrss.exe is used to force a memory dump:

CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been terminated.
Several processes and threads are necessary for the operation of the system; when they are terminated (for any reason), the system can no longer function.
Arguments:
Arg1: 00000003, Process
Arg2: 8a090d88, Terminating object
Arg3: 8a090eec, Process image file name
Arg4: 80967b74, Explanatory message (ascii)

- Dmitry Vostokov @ DumpAnalysis.org -

Management: Analysis and Synthesis

December 11th, 2007

I created “Management Bits and Tips” category to write my thoughts on management and just realized how this category title fits into grand scientific modeling approach:

Analysis (Bits) -> Synthesis (Tips)

Contrast this with pure analytic approaches

  • “Management Bits”
  • “Management Bytes”
  • “Management Bits and Bytes”

or with pure synthetic approach “Management Tips”.

I was thinking about “Management QWords” category but abandoned that thought because QWord sounds to me as an abbreviation to “Cursing Words”. “Management DWords” ?

Perhaps I have to start a separate blog otherwise debugging community will complain for this off topic :-)

- Dmitry Vostokov @ DumpAnalysis.org -

Expertise-Driven Motivation

December 11th, 2007

There are many X-Driven motivations out there but I prefer expertise-driven individuals, motivated by the desire to become experts. It is not bullshit as you might think. It is more like a persistent psychological state found in researchers and scientists and the best results are guaranteed when it is supplemented by money-driven positive feedback loop. I’ve seen such people in both software engineering and software technical support environments. It is very interesting topic and I might come back to it later.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 40a)

December 10th, 2007

In Advanced Windows Debugging book I encountered some thread stacks related to debugger events like Exit a Process, Load or Unload a Module and realized that I’ve seen process crash dumps with such stacks traces. These thread stacks are not normally encountered in healthy process dumps and, statistically speaking, when a process terminates or unloads a library the chances to save a memory dump manually using process dumpers like userdump.exe or Task Manager in Vista are very low unless an interactive debugger was attached or breakpoints were set in advance. Therefore the presence of such threads in a captured crash dump usually indicates some problem or at least focuses attention to the procedure used to save a dump. Such pattern merits its own name: Special Stack Trace.

For example, one process dump had the following stack trace showing process termination initiated from .NET runtime:

STACK_TEXT:
0012fc2c 7c827c1b ntdll!KiFastSystemCallRet
0012fc30 77e668c3 ntdll!NtTerminateProcess+0xc
0012fd24 77e66905 KERNEL32!_ExitProcess+0x63
0012fd38 01256d9b KERNEL32!ExitProcess+0x14
0012ff60 01256dc7 mscorwks!SafeExitProcess+0x11a
0012ff6c 011c5fa4 mscorwks!DisableRuntime+0xd0
0012ffb0 79181b5f mscorwks!_CorExeMain+0x8c
0012ffc0 77e6f23b mscoree!_CorExeMain+0x2c
0012fff0 00000000 KERNEL32!BaseProcessStart+0x23

The original problem was an error message box and the application disappeared when a user dismissed the message. How the dump was saved? Someone advised to attach NTSD to that process, hit ‘g’ and then save the memory dump when the process breaks into the debugger again. So the problem was already gone by that time and the better way would have been to create the manual user dump of that process when it was displaying the error message.

- Dmitry Vostokov @ DumpAnalysis.org -

Interrupts and exceptions explained (Part 6)

December 7th, 2007

Previous parts were dealing with exceptions in kernel mode. In this and next parts I’m going to investigate the flow of exception processing in user mode. In part 1 I mentioned that interrupts and exceptions generated when CPU executes code in user mode require a processor to switch the current user mode stack to kernel mode stack. This can be seen when we have a user debugger attached and it gets an exception notification called first chance exception. Because of the stack switch we don’t see any saved processor context on user mode thread stack when WinDbg breaks on first-chance exception in TestDefaultDebugger64:

0:000> r
rax=0000000000000000 rbx=0000000000000001 rcx=000000000012fd80
rdx=00000000000003e8 rsi=000000000012fd80 rdi=0000000140033fe0
rip=0000000140001690 rsp=000000000012f198 rbp=0000000000000111
 r8=0000000000000000  r9=0000000140001690 r10=0000000140001690
r11=000000000012f260 r12=0000000000000000 r13=00000000000003e8
r14=0000000000000110 r15=0000000000000001
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1:
00000001`40001690 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

0:000> kL 100
Child-SP          RetAddr           Call Site
00000000`0012f198 00000001`40004ba0 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
00000000`0012f1a0 00000001`40004de0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1d0 00000001`4000564e TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0x180
00000000`0012f230 00000001`4000c6b4 TestDefaultDebugger64!CDialog::OnCmdMsg+0x32
00000000`0012f270 00000001`4000d4d8 TestDefaultDebugger64!CWnd::OnCommand+0xcc
00000000`0012f300 00000001`400082e0 TestDefaultDebugger64!CWnd::OnWndMsg+0x60
00000000`0012f440 00000001`4000b77a TestDefaultDebugger64!CWnd::WindowProc+0x38
00000000`0012f480 00000001`4000b881 TestDefaultDebugger64!AfxCallWndProc+0xfe
00000000`0012f520 00000000`77c43abc TestDefaultDebugger64!AfxWndProc+0x59
00000000`0012f560 00000000`77c4337a user32!UserCallWinProcCheckWow+0x1f9
00000000`0012f630 00000000`77c4341b user32!SendMessageWorker+0x68c
00000000`0012f6d0 000007ff`7f07c89f user32!SendMessageW+0x9d
00000000`0012f720 000007ff`7f07f2e1 comctl32!Button_ReleaseCapture+0x14f
00000000`0012f750 00000000`77c43abc comctl32!Button_WndProc+0xd51
00000000`0012f8b0 00000000`77c43f5c user32!UserCallWinProcCheckWow+0x1f9
00000000`0012f980 00000000`77c3966a user32!DispatchMessageWorker+0x3af
00000000`0012f9f0 00000001`40007148 user32!IsDialogMessageW+0x256
00000000`0012fac0 00000001`400087f8 TestDefaultDebugger64!CWnd::IsDialogMessageW+0x38
00000000`0012faf0 00000001`4000560f TestDefaultDebugger64!CWnd::PreTranslateInput+0x28
00000000`0012fb20 00000001`4000b2ca TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc3
00000000`0012fb50 00000001`400034a7 TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0x3a
00000000`0012fb80 00000001`40003507 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0x67
00000000`0012fbb0 00000001`400036d2 TestDefaultDebugger64!AfxPreTranslateMessage+0x23
00000000`0012fbe0 00000001`40003717 TestDefaultDebugger64!AfxInternalPumpMessage+0x3a
00000000`0012fc10 00000001`4000a806 TestDefaultDebugger64!AfxPumpMessage+0x1b
00000000`0012fc40 00000001`40005ff2 TestDefaultDebugger64!CWnd::RunModalLoop+0xea
00000000`0012fca0 00000001`40001163 TestDefaultDebugger64!CDialog::DoModal+0x1c6
00000000`0012fd50 00000001`4002ccb1 TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe3
00000000`0012fe80 00000001`40016150 TestDefaultDebugger64!AfxWinMain+0x75
00000000`0012fec0 00000000`77d5964c TestDefaultDebugger64!__tmainCRTStartup+0x260
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x29

0:000> dqs 000000000012f198-20 000000000012f198+20
00000000`0012f178  00000001`4000bc25 TestDefaultDebugger64!CWnd::ReflectLastMsg+0x65
00000000`0012f180  00000000`00080334
00000000`0012f188  00000000`00000006
00000000`0012f190  00000000`0000000d
00000000`0012f198  00000001`40004ba0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1a0  ffffffff`fffffffe
00000000`0012f1a8  00000000`00000000
00000000`0012f1b0  00000000`00000000
00000000`0012f1b8  00000000`00000000

We see that there are no saved SS:RSP, RFLAGS, CS:RIP registers which we see on a stack if an exception happens in kernel mode as shown in part 2. If we bugcheck our system using SystemDump tool to generate complete memory dump at that time we can look later at the whole thread that experienced exception in user mode and its user mode and kernel mode stacks:

kd> !process fffffadfe7055c20 2
PROCESS fffffadfe7055c20
    SessionId: 0  Cid: 0c64    Peb: 7fffffd7000  ParentCid: 07b0
    DirBase: 27e3d000  ObjectTable: fffffa800073a550  HandleCount:  55.
    Image: TestDefaultDebugger64.exe

        THREAD fffffadfe78f2bf0  Cid 0c64.0c68  Teb: 000007fffffde000 Win32Thread: fffff97ff4d71010 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
            fffffadfdf7b6fc0  SynchronizationEvent

        THREAD fffffadfe734c3d0  Cid 0c64.0c88  Teb: 000007fffffdc000 Win32Thread: 0000000000000000 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
            fffffadfe734c670  Semaphore Limit 0x2

kd> .thread /r /p fffffadfe78f2bf0
Implicit thread is now fffffadf`e78f2bf0
Implicit process is now fffffadf`e7055c20
Loading User Symbols

kd> kL 100
Child-SP          RetAddr           Call Site
fffffadf`df7b6d30 fffff800`0103b063 nt!KiSwapContext+0x85
fffffadf`df7b6eb0 fffff800`0103c403 nt!KiSwapThread+0xc3
fffffadf`df7b6ef0 fffff800`013a9dc1 nt!KeWaitForSingleObject+0x528
fffffadf`df7b6f80 fffff800`01336dcf nt!DbgkpQueueMessage+0x281
fffffadf`df7b7130 fffff800`01011c69 nt!DbgkForwardException+0x1c5
fffffadf`df7b74f0 fffff800`0104146f nt!KiDispatchException+0x264
fffffadf`df7b7af0 fffff800`010402e1 nt!KiExceptionExit
fffffadf`df7b7c70 00000001`40001690 nt!KiPageFault+0×1e1
00000000`0012f198 00000001`40004ba0 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
00000000`0012f1a0 00000001`40004de0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1d0 00000001`4000564e TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0×180
00000000`0012f230 00000001`4000c6b4 TestDefaultDebugger64!CDialog::OnCmdMsg+0×32
00000000`0012f270 00000001`4000d4d8 TestDefaultDebugger64!CWnd::OnCommand+0xcc
00000000`0012f300 00000001`400082e0 TestDefaultDebugger64!CWnd::OnWndMsg+0×60
00000000`0012f440 00000001`4000b77a TestDefaultDebugger64!CWnd::WindowProc+0×38
00000000`0012f480 00000001`4000b881 TestDefaultDebugger64!AfxCallWndProc+0xfe
00000000`0012f520 00000000`77c43abc TestDefaultDebugger64!AfxWndProc+0×59
00000000`0012f560 00000000`77c4337a USER32!UserCallWinProcCheckWow+0×1f9
00000000`0012f630 00000000`77c4341b USER32!SendMessageWorker+0×68c
00000000`0012f6d0 000007ff`7f07c89f USER32!SendMessageW+0×9d
00000000`0012f720 000007ff`7f07f2e1 COMCTL32!Button_ReleaseCapture+0×14f
00000000`0012f750 00000000`77c43abc COMCTL32!Button_WndProc+0xd51
00000000`0012f8b0 00000000`77c43f5c USER32!UserCallWinProcCheckWow+0×1f9
00000000`0012f980 00000000`77c3966a USER32!DispatchMessageWorker+0×3af
00000000`0012f9f0 00000001`40007148 USER32!IsDialogMessageW+0×256
00000000`0012fac0 00000001`400087f8 TestDefaultDebugger64!CWnd::IsDialogMessageW+0×38
00000000`0012faf0 00000001`4000560f TestDefaultDebugger64!CWnd::PreTranslateInput+0×28
00000000`0012fb20 00000001`4000b2ca TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc3
00000000`0012fb50 00000001`400034a7 TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0×3a
00000000`0012fb80 00000001`40003507 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0×67
00000000`0012fbb0 00000001`400036d2 TestDefaultDebugger64!AfxPreTranslateMessage+0×23
00000000`0012fbe0 00000001`40003717 TestDefaultDebugger64!AfxInternalPumpMessage+0×3a
00000000`0012fc10 00000001`4000a806 TestDefaultDebugger64!AfxPumpMessage+0×1b
00000000`0012fc40 00000001`40005ff2 TestDefaultDebugger64!CWnd::RunModalLoop+0xea
00000000`0012fca0 00000001`40001163 TestDefaultDebugger64!CDialog::DoModal+0×1c6
00000000`0012fd50 00000000`00000000 TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe3

Dumping kernel mode stack of our thread shows that the processor saved registers there:

kd> dqs fffffadf`df7b7c70  fffffadf`df7b7c70+200
fffffadf`df7b7c70  fffffadf`e78f2bf0
fffffadf`df7b7c78  00000000`00000000
fffffadf`df7b7c80  fffffadf`e78f2b01
fffffadf`df7b7c88  00000000`00000020
...
...
...
fffffadf`df7b7d90  00000000`00000000
fffffadf`df7b7d98  00000000`00000000
fffffadf`df7b7da0  00000000`00000000
fffffadf`df7b7da8  00000000`00000000
fffffadf`df7b7db0  00000000`001629b0
fffffadf`df7b7db8  00000000`00000001
fffffadf`df7b7dc0  00000000`00000001
fffffadf`df7b7dc8  00000000`00000111 ; RBP saved by KiPageFault
fffffadf`df7b7dd0  00000000`00000006 ; Page-Fault Error Code
fffffadf`df7b7dd8  00000001`40001690 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1 ; RIP
fffffadf`df7b7de0  00000000`00000033 ; CS
fffffadf`df7b7de8  00000000`00010246 ; RFLAGS
fffffadf`df7b7df0  00000000`0012f198 ; RSP
fffffadf`df7b7df8  00000000`0000002b ; SS
fffffadf`df7b7e00  00000000`0000027f
fffffadf`df7b7e08  00000000`00000000
fffffadf`df7b7e10  00000000`00000000
fffffadf`df7b7e18  0000ffff`00001f80
fffffadf`df7b7e20  00000000`00000000
fffffadf`df7b7e28  00000000`00000000
fffffadf`df7b7e30  00000000`00000000
fffffadf`df7b7e38  00000000`00000000


kd> .asm no_code_bytes
Assembly options: no_code_bytes

kd> u KiPageFault
nt!KiPageFault:
fffff800`01040100 push    rbp
fffff800`01040101 sub     rsp,158h
fffff800`01040108 lea     rbp,[rsp+80h]
fffff800`01040110 mov     byte ptr [rbp-55h],1
fffff800`01040114 mov     qword ptr [rbp-50h],rax
fffff800`01040118 mov     qword ptr [rbp-48h],rcx
fffff800`0104011c mov     qword ptr [rbp-40h],rdx
fffff800`01040120 mov     qword ptr [rbp-38h],r8

Error code 6 is 110 in binary and volume 3A of Intel manual tells us that “the fault was caused by a non-present page” (bit 0 is cleared), “the access causing the fault was a write” (bit 1 is set) and “the access causing the fault originated when the processor was executing in user mode” (bit 2 is set).

- Dmitry Vostokov @ DumpAnalysis.org -

SIMSIM Software Development Process

December 6th, 2007

Faced with the problem to find time to write troubleshooting tools that spring to my mind I devised this process that seems to be a novel way to write software for busy professionals. Its essence is in writing software when presenting it or when presenting software writing topics, for example, software architecture, design and implementation. It has some agile process flavour but magnified by a bigger audience than pair programming has and nicely complements my Reading Windows-based Code series. SIMSIM is an abbreviation for:

Show IMplementation and Subsequent IMprovement

More details will be announced soon.

- Dmitry Vostokov @ DumpAnalysis.org -
 

Complexity and Memory Dumps (Part 1)

December 5th, 2007

Asking right questions at the appropriate hierarchical organization level is a known solution to complexity. In case of memory dumps it is sometimes useful to forget about bits, bytes, words, dwords and qwords, memory addresses, pointers, runtime structures, API and ask educated questions at component level, the simplest of it is the question about component timestamp, in WinDbg parlance, using variants of lm command, for example:

0:008> lmt m ModuleA
start    end        module name
76290000 762ad000   ModuleA  Sat Feb 17 13:59:59 2007 (45D70A5F)

0:008> lmt m ModuleB
start    end        module name
66c50000 66c65000   ModuleB  Fri Feb 02 22:30:03 2007 (45C3BB6B)

The next step is obvious: test with the newer version. Another good question is about consistency to exclude cases caused by α-particle hits. This latter possibility was mentioned in Andreas Zeller’s book I read some time ago and can be considered as the efficient cause of some crash dumps according to Aristotelian causation categories.   

- Dmitry Vostokov @ DumpAnalysis.org -

CAFF BugCheck

December 4th, 2007

Recently observed it in a kernel dump and found that userdump.sys generates it from userdump.exe request when process monitoring rules in Process Dumper from Microsoft userdump package are set to “Bugcheck after dumping”:

BUGCHECK_STR:  0xCAFF

PROCESS_NAME:  userdump.exe

kd> kL
Child-SP          RetAddr           Call Site
fffffadf`dfcf19b8 fffffadf`dfee38c4 nt!KeBugCheck
fffffadf`dfcf19c0 fffff800`012ce9cf userdump!UdIoctl+0x104
fffffadf`dfcf1a70 fffff800`012df026 nt!IopXxxControlFile+0xa5a
fffffadf`dfcf1b90 fffff800`010410fd nt!NtDeviceIoControlFile+0x56
fffffadf`dfcf1c00 00000000`77ef0a5a nt!KiSystemServiceCopyEnd+0x3
00000000`01eadd58 00000001`0000a755 ntdll!NtDeviceIoControlFile+0xa
00000000`01eadd60 00000000`77ef30a5 userdump_100000000!UdServiceWorkerAPC+0x1005
00000000`01eaf970 00000000`77ef0a2a ntdll!KiUserApcDispatcher+0x15
00000000`01eafe68 00000001`00007fe2 ntdll!NtWaitForSingleObject+0xa
00000000`01eafe70 00000001`00008a39 userdump_100000000!UdServiceWorker+0xb2
00000000`01eaff20 000007ff`7fee4db6 userdump_100000000!UdServiceStart+0x139
00000000`01eaff50 00000000`77d6b6da ADVAPI32!ScSvcctrlThreadW+0x25
00000000`01eaff80 00000000`00000000 kernel32!BaseThreadStart+0x3a

This might be useful if you want to see kernel data that happened to be at exception time. In this case you can avoid requesting complete memory dump of physical memory and ask for kernel memory dump only (if it was configured in Control Panel) together with a user dump.

Note: do not set this option if you are unsure. It can have your production servers bluescreen in the case of false positive dumps.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis AntiPatterns (Part 7)

December 3rd, 2007

Be language - excessive use of “is”. This anti-pattern was inspired by Alfred Korzybski notion of how “is” affects our understanding of the world. In the context of technical support the use of certain verbs sometimes leads to wrong troubleshooting and debugging paths. For example, the following phrase:

It is our pool tag. It is effected by driver A, driver B and driver C.  

Surely driver A, driver B and driver C were not developed by the same company that introduced the problem pool tag (smells Alien Component here). Unless supported by solid evidence the better phrase shall be:

It is our pool tag. It might have been effected by driver A, driver B or driver C.  

I’m not advocating to completely eradicate “be” verbs as was done in E-Prime language but to be conscious in their use. Thanks to Simple*ology in pointing me to the right direction.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 9d)

November 29th, 2007

Finally I got a good example of Deadlock pattern involving LPC. In the stack trace below svchost.exe thread (we call it thread A) receives an LPC call and dispatches it to componentA module which makes another LPC call (MessageId 000135b8) and waiting for a reply: 

THREAD 89143020  Cid 09b4.10dc  Teb: 7ff91000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8914320c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 000135b8:
Current LPC port d64a5328
Not impersonating
DeviceMap                 d64028f0
Owning Process            891b8b80       Image:         svchost.exe
Wait Start TickCount      237408         Ticks: 1890 (0:00:00:29.531)
Context Switch Count      866            
UserTime                  00:00:00.031
KernelTime                00:00:00.015
Win32 Start Address 0×000135b2
LPC Server thread working on message Id 135b2
Start Address kernel32!BaseThreadStartThunk (0×7c82b5f3)
Stack Init b91f9000 Current b91f8c08 Base b91f9000 Limit b91f6000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
b91f8c20 8083e6a2 nt!KiSwapContext+0×26
b91f8c4c 8083f164 nt!KiSwapThread+0×284
b91f8c94 8093983f nt!KeWaitForSingleObject+0×346
b91f8d50 80834d3f nt!NtRequestWaitReplyPort+0×776
b91f8d50 7c94ed54 nt!KiFastCallEntry+0xfc
02bae928 7c941c94 ntdll!KiFastSystemCallRet
02bae92c 77c42700 ntdll!NtRequestWaitReplyPort+0xc
02bae984 77c413ba RPCRT4!LRPC_CCALL::SendReceive+0×230
02bae990 77c42c7f RPCRT4!I_RpcSendReceive+0×24
02bae9a4 77cb5d63 RPCRT4!NdrSendReceive+0×2b
02baec48 674825b6 RPCRT4!NdrClientCall+0×334

02baec5c 67486776 componentA!bar+0×16



02baf8d4 77c40f3b componentA!foo+0×157
02baf8f8 77cb23f7 RPCRT4!Invoke+0×30
02bafcf8 77cb26ed RPCRT4!NdrStubCall2+0×299
02bafd14 77c409be RPCRT4!NdrServerCall2+0×19
02bafd48 77c4093f RPCRT4!DispatchToStubInCNoAvrf+0×38
02bafd9c 77c40865 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×117
02bafdc0 77c434b1 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
02bafdfc 77c41bb3 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
02bafe20 77c45458 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
02baff84 77c2778f RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
02baff8c 77c2f7dd RPCRT4!RecvLotsaCallsWrapper+0xd
02baffac 77c2de88 RPCRT4!BaseCachedThreadRoutine+0×9d
02baffb8 7c82608b RPCRT4!ThreadStartRoutine+0×1b
02baffec 00000000 kernel32!BaseThreadStart+0×34

We search for that LPC message to find the server thread:

1: kd> !lpc message 000135b8
Searching message 135b8 in threads ...
    Server thread 89115db0 is working on message 135b8
Client thread 89143020 waiting a reply from 135b8  


                       

It belongs to Process.exe (we call it thread B):

1: kd> !thread 89115db0 0x16
THREAD 89115db0  Cid 098c.0384  Teb: 7ff79000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8a114628  SynchronizationEvent
Not impersonating
DeviceMap                 d64028f0
Owning Process            8a2c9d88       Image:         Process.exe
Wait Start TickCount      237408         Ticks: 1890 (0:00:00:29.531)
Context Switch Count      1590            
UserTime                  00:00:03.265
KernelTime                00:00:01.671
Win32 Start Address 0x000135b8
LPC Server thread working on message Id 135b8
Start Address kernel32!BaseThreadStartThunk (0x7c82b5f3)
Stack Init b952d000 Current b952cc60 Base b952d000 Limit b952a000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b952cc78 8083e6a2 89115e28 89115db0 89115e58 nt!KiSwapContext+0x26
b952cca4 8083f164 00000000 00000000 00000000 nt!KiSwapThread+0x284
b952ccec 8092db70 8a114628 00000006 ffffff01 nt!KeWaitForSingleObject+0x346
b952cd50 80834d3f 00000a7c 00000000 00000000 nt!NtWaitForSingleObject+0x9a
b952cd50 7c94ed54 00000a7c 00000000 00000000 nt!KiFastCallEntry+0xfc
22aceb48 7c942124 7c95970f 00000a7c 00000000 ntdll!KiFastSystemCallRet
22aceb4c 7c95970f 00000a7c 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
22aceb88 7c959620 00000000 00000004 00002000 ntdll!RtlpWaitOnCriticalSection+0x19c
22aceba8 1b005744 06d30940 1b05ea80 06d30940 ntdll!RtlEnterCriticalSection+0xa8
22acebb0 1b05ea80 06d30940 feffffff 0cd410c0 componentB!bar+0xb



22acf8b0 77c40f3b 00080002 000800e2 00000001 componentB!foo+0xeb
22acf8e0 77cb23f7 0de110dc 22acfac8 00000007 RPCRT4!Invoke+0×30
22acfce0 77cb26ed 00000000 00000000 19f38f94 RPCRT4!NdrStubCall2+0×299
22acfcfc 77c409be 19f38f94 17316ef0 19f38f94 RPCRT4!NdrServerCall2+0×19
22acfd30 77c75e41 0de1dc58 19f38f94 22acfdec RPCRT4!DispatchToStubInCNoAvrf+0×38
22acfd48 77c4093f 0de1dc58 19f38f94 22acfdec RPCRT4!DispatchToStubInCAvrf+0×14
22acfd9c 77c40865 00000041 00000000 0de2b398 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×117
22acfdc0 77c434b1 19f38f94 00000000 0de2b398 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
22acfdfc 77c41bb3 1beeaec8 16b96f50 1baeef00 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
22acfe20 77c45458 16b96f88 22acfe38 1beeaec8 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127

0×16 flags for !thread extension command are used to temporarily set the process context to the owning process and show the first three function call parameters. We see that the thread B is waiting for the critical section 06d30940 and we use user space !locks extension command to find who owns it after switching process context:

1: kd> .process /r /p 8a2c9d88
Implicit process is now 8a2c9d88
Loading User Symbols

1: kd> !ntsdexts.locks

CritSec +6d30940 at 06d30940
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       d6c
EntryCount         0
ContentionCount    1
*** Locked

Now we try to find a thread with TID d6c (thread C):

1: kd> !thread -t d6c
Looking for thread Cid = d6c ...
THREAD 890d8bb8  Cid 098c.0d6c  Teb: 7ff71000 Win32Thread: bc23cc20 WAIT: (Unknown) UserMode Non-Alertable
    890d8da4  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 000135ea:
Current LPC port d649a678
Not impersonating
DeviceMap                 d64028f0
Owning Process            8a2c9d88       Image:         Process.exe
Wait Start TickCount      237641         Ticks: 1657 (0:00:00:25.890)
Context Switch Count      2102                 LargeStack
UserTime                  00:00:00.734
KernelTime                00:00:00.234
Win32 Start Address msvcrt!_endthreadex (0×77b9b4bc)
Start Address kernel32!BaseThreadStartThunk (0×7c82b5f3)
Stack Init ba91d000 Current ba91cc08 Base ba91d000 Limit ba919000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
ba91cc20 8083e6a2 890d8c30 890d8bb8 890d8c60 nt!KiSwapContext+0×26
ba91cc4c 8083f164 890d8da4 890d8d78 890d8bb8 nt!KiSwapThread+0×284
ba91cc94 8093983f 890d8da4 00000011 8a2c9d01 nt!KeWaitForSingleObject+0×346
ba91cd50 80834d3f 000008bc 19c94f00 19c94f00 nt!NtRequestWaitReplyPort+0×776
ba91cd50 7c94ed54 000008bc 19c94f00 19c94f00 nt!KiFastCallEntry+0xfc
2709ebf4 7c941c94 77c42700 000008bc 19c94f00 ntdll!KiFastSystemCallRet
2709ebf8 77c42700 000008bc 19c94f00 19c94f00 ntdll!NtRequestWaitReplyPort+0xc
2709ec44 77c413ba 2709ec80 2709ec64 77c42c7f RPCRT4!LRPC_CCALL::SendReceive+0×230
2709ec50 77c42c7f 2709ec80 779b2770 2709f06c RPCRT4!I_RpcSendReceive+0×24
2709ec64 77cb219b 2709ecac 1957cfe4 1957ab38 RPCRT4!NdrSendReceive+0×2b
2709f04c 779b43a3 779b2770 779b1398 2709f06c RPCRT4!NdrClientCall2+0×22e




2709ff84 77b9b530 26658fb0 00000000 00000000 ComponentC!foo+0×18d
2709ffb8 7c82608b 26d9af70 00000000 00000000 msvcrt!_endthreadex+0xa3
2709ffec 00000000 77b9b4bc 26d9af70 00000000 kernel32!BaseThreadStart+0×34

We see that thread C makes another LPC call (MessageId 000135e) and waiting for a reply. Let’s find the server thread processing the message (thread D):

1: kd> !lpc message 000135ea
Searching message 135ea in threads ...
Client thread 890d8bb8 waiting a reply from 135ea                         
    Server thread 89010020 is working on message 135ea


1: kd> !thread 89010020 16
THREAD 89010020  Cid 09b4.1530  Teb: 7ff93000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8903ba28  Mutant - owning thread 89143020
Not impersonating
DeviceMap                 d64028f0
Owning Process            891b8b80       Image:         svchost.exe
Wait Start TickCount      237641         Ticks: 1657 (0:00:00:25.890)
Context Switch Count      8            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×000135ea
LPC Server thread working on message Id 135ea
Start Address kernel32!BaseThreadStartThunk (0×7c82b5f3)
Stack Init b9455000 Current b9454c60 Base b9455000 Limit b9452000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b9454c78 8083e6a2 89010098 89010020 890100c8 nt!KiSwapContext+0×26
b9454ca4 8083f164 00000000 00000000 00000000 nt!KiSwapThread+0×284
b9454cec 8092db70 8903ba28 00000006 00000001 nt!KeWaitForSingleObject+0×346
b9454d50 80834d3f 00000514 00000000 00000000 nt!NtWaitForSingleObject+0×9a
b9454d50 7c94ed54 00000514 00000000 00000000 nt!KiFastCallEntry+0xfc
02b5f720 7c942124 75fdbe44 00000514 00000000 ntdll!KiFastSystemCallRet
02b5f724 75fdbe44 00000514 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
02b5f744 75fdc57f 000e6014 000da62c 02b5fca0 ComponentD!bar+0×42



02b5f8c8 77c40f3b 000d0a48 02b5fc90 00000001 ComponentD!foo+0×49
02b5f8f8 77cb23f7 75fdf8f2 02b5fae0 00000007 RPCRT4!Invoke+0×30
02b5fcf8 77cb26ed 00000000 00000000 000d4f24 RPCRT4!NdrStubCall2+0×299
02b5fd14 77c409be 000d4f24 000b5d70 000d4f24 RPCRT4!NdrServerCall2+0×19
02b5fd48 77c4093f 75fff834 000d4f24 02b5fdec RPCRT4!DispatchToStubInCNoAvrf+0×38
02b5fd9c 77c40865 00000005 00000000 7600589c RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×117
02b5fdc0 77c434b1 000d4f24 00000000 7600589c RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
02b5fdfc 77c41bb3 000d3550 000a78d0 001054b8 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
02b5fe20 77c45458 000a7908 02b5fe38 000d3550 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
02b5ff84 77c2778f 02b5ffac 77c2f7dd 000a78d0 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
02b5ff8c 77c2f7dd 000a78d0 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd
02b5ffac 77c2de88 0008ae00 02b5ffec 7c82608b RPCRT4!BaseCachedThreadRoutine+0×9d
02b5ffb8 7c82608b 000d5c20 00000000 00000000 RPCRT4!ThreadStartRoutine+0×1b
02b5ffec 00000000 77c2de6d 000d5c20 00000000 kernel32!BaseThreadStart+0×34

We see that thread D is waiting for the mutant object owned by thread A (89143020). Therefore we have a deadlock spanning 2 process boundaries via RPC/LPC calls with the following dependency graph:

A (svchost.exe) LPC-> B (Process.exe) CritSec-> C (Process.exe) LPC-> D (svchost.exe) Obj-> A (svchost.exe)

- Dmitry Vostokov @ DumpAnalysis.org -

Four pillars of software troubleshooting

November 29th, 2007

They are (sorted alphabetically):

  1. Crash Dump Analysis (also called Memory Dump Analysis or Core Dump Analysis)

  2. Problem Reproduction

  3. Trace and Log Analysis

  4. Virtual Assistance (also called Remote Assistance)

 

For troubleshooting software on Windows platforms Citrix provides GoToAssist for virtual on-site presence and Xen for problem reproduction.

- Dmitry Vostokov @ DumpAnalysis.org -

Understanding I/O Completion Ports

November 27th, 2007

Many articles and books explain Windows I/O completion ports from high level design considerations arising when building high-performance server software. But it is hard to recall them later when someone asks to explain and not everyone writes that software. Looking at complete memory dumps has an advantage of a bottom-up or reverse engineering approach where we see internals of server software and can immediately grasp the implementation of certain architectural and design decisions.

Consider this thread stack trace we can find almost inside any service or network application process:

THREAD 86cf09c0  Cid 05cc.2030  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8a3bb970  QueueObject
    86cf0a38  NotificationTimer
Not impersonating
DeviceMap                 e15af5a8
Owning Process            8a3803d8       Image:         svchost.exe
Wait Start TickCount      2131621        Ticks: 1264 (0:00:00:19.750)
Context Switch Count      6            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address RPCRT4!ThreadStartRoutine (0×77c5de6d)
Start Address kernel32!BaseThreadStartThunk (0×77e6b5f3)
Stack Init ba276000 Current ba275c38 Base ba276000 Limit ba273000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
ba275c50 8083d3b1 nt!KiSwapContext+0×26
ba275c7c 8083dea2 nt!KiSwapThread+0×2e5
ba275cc4 8092b205 nt!KeRemoveQueue+0×417
ba275d48 80833a6f nt!NtRemoveIoCompletion+0xdc

ba275d48 7c82ed54 nt!KiFastCallEntry+0xfc
0093feac 7c821bf4 ntdll!KiFastSystemCallRet
0093feb0 77e66142 ntdll!NtRemoveIoCompletion+0xc
0093fedc 77c604c3 kernel32!GetQueuedCompletionStatus+0×29

0093ff18 77c60655 RPCRT4!COMMON_ProcessCalls+0xa1
0093ff84 77c5f9f1 RPCRT4!LOADABLE_TRANSPORT::ProcessIOEvents+0×117
0093ff8c 77c5f7dd RPCRT4!ProcessIOEventsWrapper+0xd
0093ffac 77c5de88 RPCRT4!BaseCachedThreadRoutine+0×9d
0093ffb8 77e6608b RPCRT4!ThreadStartRoutine+0×1b
0093ffec 00000000 kernel32!BaseThreadStart+0×34

We see that I/O completion port is implemented via kernel queue object so requests (work items, completion notifications, etc) are stored in that queue for further processing by threads. The number of active threads processing requests is bound to some maximum value that usually corresponds to the number of processors:

0: kd> dt _KQUEUE 8a3bb970
ntdll!_KQUEUE
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 EntryListHead    : _LIST_ENTRY [ 0x8a3bb980 - 0x8a3bb980 ]
   +0x018 CurrentCount     : 0
   +0×01c MaximumCount     : 2
   +0×020 ThreadListHead   : _LIST_ENTRY [ 0×86cf0ac8 - 0×89ff9520 ]

0: kd> !smt
SMT Summary:
------------
   KeActiveProcessors: **------------------------------ (00000003)
        KiIdleSummary: **------------------------------ (00000003)
No PRCB     Set Master SMT Set                                     IAID
 0 ffdff120 Master     **—————————— (00000003)  00
 1 f772f120 ffdff120   **—————————— (00000003)  01

Kernel work queues are also implemented via the same queue object as we might have guessed already:

THREAD 8a777660  Cid 0004.00d0  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    808b707c  QueueObject
Not impersonating
DeviceMap                 e1000928
Owning Process            8a780818       Image:         System
Wait Start TickCount      2615           Ticks: 2130270 (0:09:14:45.468)
Context Switch Count      301            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Start Address nt!ExpWorkerThread (0×8082d92b)
Stack Init f71e0000 Current f71dfcec Base f71e0000 Limit f71dd000 Call 0
Priority 12 BasePriority 12 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
f71dfd04 8083d3b1 nt!KiSwapContext+0×26
f71dfd30 8083dea2 nt!KiSwapThread+0×2e5
f71dfd78 8082d9c1 nt!KeRemoveQueue+0×417
f71dfdac 809208fc nt!ExpWorkerThread+0xc8
f71dfddc 8083fc9f nt!PspSystemThreadStartup+0×2e
00000000 00000000 nt!KiThreadStartup+0×16

0: kd> dt _KQUEUE 808b707c
ntdll!_KQUEUE
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 EntryListHead    : _LIST_ENTRY [ 0x808b708c - 0x808b708c ]
   +0x018 CurrentCount     : 0
   +0×01c MaximumCount     : 2
   +0×020 ThreadListHead   : _LIST_ENTRY [ 0×8a77a128 - 0×8a777768 ]

I’ve created the simple UML diagram showing high-level relationship between various objects seen from crash dumps. Note that Active Thread object can process items from more than one completion port if its wait was satisfied for one port and then for another but I have never seen this. Obviously Waiting thread can wait only for one completion port. 

- Dmitry Vostokov @ DumpAnalysis.org -

DebugWare

November 27th, 2007

I’ve been slowly accumulating blog posts about various troubleshooting tools for my next book in a row with a working title:

DebugWare: The Art and Craft of Writing Troubleshooting and Debugging Tools

Details will be announced later together with supporting website which is under construction. This book will be about architecture, design and implementation of troubleshooting tools for software technical support.

- Dmitry Vostokov @ DumpAnalysis.org -

Teaching binary to decimal conversion

November 26th, 2007

Sometimes we have data in binary and we want to convert it to decimal to lookup some constant in a header file, for example. I used to do it previously via calc.exe. Now I use .formats WinDbg command and 0y binary prefix:

0:000> .formats 0y111010
Evaluate expression:
  Hex:     0000003a
  Decimal: 58
  Octal:   00000000072
  Binary:  00000000 00000000 00000000 00111010
  Chars:   ...:
  Time:    Thu Jan 01 00:00:58 1970
  Float:   low 8.12753e-044 high 0
  Double:  2.86558e-322

Some months ago I was flying SWISS and found this binary watch in their duty-free catalog which I use now to guess time :-)

01 The One Binary Watch
 

Buy from Amazon

It has 6 binary digits for minutes. There are desktop binary clocks and other binary watches available if you google them but they don’t have 6 binary digits for minutes. They approximate them by using 2 rows or columns: tenths of minutes and minutes (2 + 4 binary digits) and we are all good in handling 4 binary digits because of our work with hexadecimal nibbles but not good in handling more binary digits like 5 or 6 when we see them in one row. 

- Dmitry Vostokov @ DumpAnalysis.org -

Stack traces on the Web

November 26th, 2007

How many WinDbg stack traces on the web are available for mining? Google gave the answer when I searched for typical stack trace text fragments:

"ChildEBP RetAddr" - about 40,200

"ChildEBP RetAddr  Args to Child" - about 30,000

"Frame IP not in any known module" - about 10,800

- Dmitry Vostokov @ DumpAnalysis.org -

Five golden rules of troubleshooting

November 26th, 2007

It is difficult to analyze a problem when you have crash dumps and/or traces from various tracing tools and supporting information you have is incomplete or missing. After doing crash dump and trace analysis including ETW-based traces for more than 4 years I came up with this easy to remember 4WS questions to ask when you send or request traces and memory dumps:

What - What had happened or had been observed? Crash or hang, for example?

When - When did the problem happen if traces were recorded for hours?

Where - What server or workstation had been used for tracing or where memory dumps came from? For example, one trace is from a primary server and two others are from backup servers or one trace is from a client workstation and the other is from a server. 

Why - Why did a customer or a support engineer request a dump or a trace? This could shed the light on various assumptions including presuppositions hidden in problem description.  

Supporting information - needed to find a needle in a hay: process id, thread id, etc. Also, the answer to the following question is important: how dumps and traces were created?

Every trace or memory dump shall be accompanied by 4WS answers.  

4WS rule can be applied to any troubleshooting because even the problem description itself is some kind of a trace.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 39)

November 23rd, 2007

As mentioned in Early Crash Dump pattern saving crash dumps on first-chance exceptions helps to diagnose components that might have caused corruption and later crashes, hangs or CPU spikes by ignoring abnormal exceptions like access violation. In such cases we need to know whether an application installs its own Custom Exception Handler or several of them. If it uses only default handlers provided by runtime or windows subsystem then most likely a first-chance access violation exception will result in a last-chance exception and a postmortem dump. To check a chain of exception handlers we can use WinDbg !exchain extention command. For example:

0:000> !exchain
0017f9d8: TestDefaultDebugger!AfxWinMain+3f5 (00420aa9)
0017fa60: TestDefaultDebugger!AfxWinMain+34c (00420a00)
0017fb20: user32!_except_handler4+0 (770780eb)
0017fcc0: user32!_except_handler4+0 (770780eb)
0017fd24: user32!_except_handler4+0 (770780eb)
0017fe40: TestDefaultDebugger!AfxWinMain+16e (00420822)
0017feec: TestDefaultDebugger!AfxWinMain+797 (00420e4b)
0017ff90: TestDefaultDebugger!_except_handler4+0 (00410e00)
0017ffdc: ntdll!_except_handler4+0 (77961c78)

We see that TestDefaultDebugger doesn’t have its own exception handlers except ones provided by MFC and C/C++ runtime libraries which were linked statically. Here is another example. It was reported that a 3rd-party application was hanging and spiking CPU (Spiking Thread pattern) so a user dump was saved using command line userdump.exe:

0:000> vertarget
Windows Server 2003 Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
Product: Server, suite: TerminalServer
kernel32.dll version: 5.2.3790.4062 (srv03_sp2_gdr.070417-0203)
Debug session time: Thu Nov 22 12:45:59.000 2007 (GMT+0)
System Uptime: 0 days 10:43:07.667
Process Uptime: 0 days 4:51:32.000 
Kernel time: 0 days 0:08:04.000 
User time: 0 days 0:23:09.000

0:000> !runaway 3 
User Mode Time 
Thread Time  
0:1c1c      0 days 0:08:04.218  
1:2e04      0 days 0:00:00.015
Kernel Mode Time 
Thread Time  
0:1c1c      0 days 0:23:09.156  
1:2e04      0 days 0:00:00.031

0:000> kL
ChildEBP RetAddr 
0012fb80 7739bf53 ntdll!KiFastSystemCallRet
0012fbb4 05ca73b0 user32!NtUserWaitMessage+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fd20 05c8be3f 3rdPartyDLL+0x573b0
0012fd50 05c9e9ea 3rdPartyDLL+0x3be3f
0012fd68 7739b6e3 3rdPartyDLL+0x4e9ea
0012fd94 7739b874 user32!InternalCallWinProc+0x28
0012fe0c 7739c8b8 user32!UserCallWinProcCheckWow+0x151
0012fe68 7739c9c6 user32!DispatchClientMessage+0xd9
0012fe90 7c828536 user32!__fnDWORD+0x24
0012febc 7739d1ec ntdll!KiUserCallbackDispatcher+0x2e
0012fef8 7738cee9 user32!NtUserMessageCall+0xc
0012ff18 0050aea9 user32!SendMessageA+0x7f
0012ff70 00452ae4 3rdPartyApp+0x10aea9
0012ffac 00511941 3rdPartyApp+0x52ae4
0012ffc0 77e6f23b 3rdPartyApp+0x111941
0012fff0 00000000 kernel32!BaseProcessStart+0x23

Exception chain showed custom exception handlers:

0:000> !exchain
0012fb8c: 3rdPartyDLL+57acb (05ca7acb)
0012fd28: 3rdPartyDLL+3be57 (05c8be57)
0012fd34: 3rdPartyDLL+3be68 (05c8be68)

0012fdfc: user32!_except_handler3+0 (773aaf18)
  CRT scope  0, func:   user32!UserCallWinProcCheckWow+156 (773ba9ad)
0012fe58: user32!_except_handler3+0 (773aaf18)
0012fea0: ntdll!KiUserCallbackExceptionHandler+0 (7c8284e8)
0012ff3c: 3rdPartyApp+53310 (00453310)
0012ff48: 3rdPartyApp+5334b (0045334b)
0012ff9c: 3rdPartyApp+52d06 (00452d06)
0012ffb4: 3rdPartyApp+38d4 (004038d4)

0012ffe0: kernel32!_except_handler3+0 (77e61a60)
  CRT scope  0, filter: kernel32!BaseProcessStart+29 (77e76a10)
                func:   kernel32!BaseProcessStart+3a (77e81469)

The customer then enabled MS Exception Monitor and selected only Access violation exception code (c0000005) to avoid False Positive Dumps. During application execution various 1st-chance exception crash dumps were saved pointing to numerous access violations including function calls into unloaded modules, for example:

0:000> kL 100
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f910 7739b6e3 <Unloaded_Another3rdParty.dll>+0x4ce58
0012f93c 7739b874 user32!InternalCallWinProc+0x28
0012f9b4 7739c8b8 user32!UserCallWinProcCheckWow+0x151
0012fa10 7739c9c6 user32!DispatchClientMessage+0xd9
0012fa38 7c828536 user32!__fnDWORD+0x24
0012fa64 7739d1ec ntdll!KiUserCallbackDispatcher+0x2e
0012faa0 7738cee9 user32!NtUserMessageCall+0xc
0012fac0 0a0f2e01 user32!SendMessageA+0x7f
0012fae4 0a0f2ac7 3rdPartyDLL+0x52e01
0012fb60 7c81a352 3rdPartyDLL+0x52ac7
0012fb80 7c839dee ntdll!LdrpCallInitRoutine+0x14
0012fc94 77e6b1bb ntdll!LdrUnloadDll+0x41a
0012fca8 0050c9c1 kernel32!FreeLibrary+0x41
0012fdf4 004374af 3rdPartyApp+0x10c9c1
0012fe24 0044a076 3rdPartyApp+0x374af
0012fe3c 7739b6e3 3rdPartyApp+0x4a076
0012fe68 7739b874 user32!InternalCallWinProc+0x28
0012fee0 7739ba92 user32!UserCallWinProcCheckWow+0x151
0012ff48 773a16e5 user32!DispatchMessageWorker+0x327
0012ff58 00452aa0 user32!DispatchMessageA+0xf
0012ffac 00511941 3rdPartyApp+0x52aa0
0012ffc0 77e6f23b 3rdPartyApp+0x111941
0012fff0 00000000 kernel32!BaseProcessStart+0x23

- Dmitry Vostokov @ DumpAnalysis.org -

Four causes of crash dumps

November 23rd, 2007

Obviously the appearance of crash dumps on your computer was caused by something. A bug, fault, defect or something else?

Aristotle suggested 4 types of causation 2 millennia ago and they are:

Material cause - presence of some substance, usually material one (hardware) but can be machine code (software). The distinction between hardware and software is often blurred today because of virtualization.

Formal cause - some form or arrangement (an algorithm)

Efficient cause - an agent (data flow or event caused an algorithm to be executed)

Final cause - the desire of someone (or something, operating system, for example).

We skip material causes because hardware and software are always involved. Obviously final causality should be among of crash dump causes because they were either anticipated or made deliberately. Let’s look at 3 examples with possible causes:

Buffer Overflow

  • Formal cause - a defect in code which might have arisen from incomplete or wrong model

  • Efficient cause - data is too big to fit in a buffer

  • Final cause - operating system and runtime library support decided to save a crash dump

Bugcheck (NMI)

  • Formal cause - NMI handler

  • Efficient cause - a button on a hardware panel or KeBugCheckEx

  • Final cause - “I need a memory dump” desire. Also crash dump saving functions were written before by kernel developers in anticipation of future crash dumps.

Bugcheck (A)

  • Formal cause - a defect in code again or particular disposition of threads

  • Efficient cause - Driver Verifier triggered paging out data

  • Final cause - deliberate OS bugcheck (here we can also say that it was anticipated by OS designers)

Concrete causes depend on the organizational level you use: software/hardware systems/components, modeling act by humans, etc.

- Dmitry Vostokov @ DumpAnalysis.org -

StressPrinters in press

November 23rd, 2007

Thomas Koetzing wrote a useful article on how to use StressPrinters and put some examples:

Understanding and using Citrix StressPrinters

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 38)

November 22nd, 2007

Hooking functions using trampoline method is so common on Windows and sometimes we need to check Hooked Functions in specific modules and determine which module hooked them for troubleshooting or memory forensic analysis needs. If original unhooked modules are available (via symbol server, for example) this can be done by using !chkimg WinDbg extension command:

0:002> !chkimg -lo 50 -d !kernel32 -v
Searching for module with expression: !kernel32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\kernel32.dll\44C60F39102000\kernel32.dll
No range specified

Scanning section:    .text
Size: 564445
Range to scan: 77e41000-77ecacdd
    77e44004-77e44008  5 bytes - kernel32!GetDateFormatA
 [ 8b ff 55 8b ec:e9 f7 bf 08 c0 ]
    77e4412e-77e44132  5 bytes - kernel32!GetTimeFormatA (+0×12a)
 [ 8b ff 55 8b ec:e9 cd be 06 c0 ]
    77e4e857-77e4e85b  5 bytes - kernel32!FileTimeToLocalFileTime (+0xa729)
 [ 8b ff 55 8b ec:e9 a4 17 00 c0 ]
    77e56b5f-77e56b63  5 bytes - kernel32!GetTimeZoneInformation (+0×8308)
 [ 8b ff 55 8b ec:e9 9c 94 00 c0 ]
    77e579a9-77e579ad  5 bytes - kernel32!GetTimeFormatW (+0xe4a)
 [ 8b ff 55 8b ec:e9 52 86 06 c0 ]
    77e57fc8-77e57fcc  5 bytes - kernel32!GetDateFormatW (+0×61f)
 [ 8b ff 55 8b ec:e9 33 80 08 c0 ]
    77e6f32b-77e6f32f  5 bytes - kernel32!GetLocalTime (+0×17363)
 [ 8b ff 55 8b ec:e9 d0 0c 00 c0 ]
    77e6f891-77e6f895  5 bytes - kernel32!LocalFileTimeToFileTime (+0×566)
 [ 8b ff 55 8b ec:e9 6a 07 01 c0 ]
    77e83499-77e8349d  5 bytes - kernel32!SetLocalTime (+0×13c08)
 [ 8b ff 55 8b ec:e9 62 cb 00 c0 ]
    77e88c32-77e88c36  5 bytes - kernel32!SetTimeZoneInformation (+0×5799)
 [ 8b ff 55 8b ec:e9 c9 73 01 c0 ]
Total bytes compared: 564445(100%)
Number of errors: 50
50 errors : !kernel32 (77e44004-77e88c36)

0:002> u 77e44004
kernel32!GetDateFormatA:
77e44004 e9f7bf08c0      jmp     37ed0000
77e44009 81ec18020000    sub     esp,218h
77e4400f a148d1ec77      mov     eax,dword ptr [kernel32!__security_cookie (77ecd148)]
77e44014 53              push    ebx
77e44015 8b5d14          mov     ebx,dword ptr [ebp+14h]
77e44018 56              push    esi
77e44019 8b7518          mov     esi,dword ptr [ebp+18h]
77e4401c 57              push    edi

0:002> u 37ed0000
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for MyDateTimeHooks.dll -
37ed0000 e99b262f2d      jmp     MyDateTimeHooks+0×26a0 (651c26a0)
37ed0005 8bff            mov     edi,edi
37ed0007 55              push    ebp
37ed0008 8bec            mov     ebp,esp
37ed000a e9fa3ff73f      jmp     kernel32!GetDateFormatA+0×5 (77e44009)
37ed000f 0000            add     byte ptr [eax],al
37ed0011 0000            add     byte ptr [eax],al
37ed0013 0000            add     byte ptr [eax],al

- Dmitry Vostokov @ DumpAnalysis.org -