Archive for the ‘Debugging’ Category

New edition of Windows® Internals

Sunday, January 20th, 2008

Finally I can pre-order this 1232 page 5th edition! Looking forward to seeing it in the post.

Windows® Internals: Including Windows Server 2008 and Windows Vista, Fifth Edition (PRO-Developer)

Buy from Amazon

I read all previous editions as the part of my knowledge read ahead cache. Here is my short review of the previous 4th edition.

- Dmitry Vostokov @ DumpAnalysis.org -

Software troubleshooting and causal models

Tuesday, January 15th, 2008

Recently became interested in causality, causal models and how they can be applied to software troubleshooting especially when we have various traces and logs. Looking at traces, system and application event logs and logs from other tools, technical support engineers see correlations between various events and build causal models that are used to trace symptoms back to their causes. They use prior knowledge, assumptions, informed guessing and event order to discern causal structure. Clearly event order in logs influences that so it is important to understand how we think in causal terms in order to learn about our biases.

Another important question from software engineering perspective is how to design tracing components to help technical support and software maintenance engineers build correct causal models of software issues. Just finished reading a book about causal modeling:

Review of Causal Models

- Dmitry Vostokov @ DumpAnalysis.org -

Iterative and Incremental Publishing Process

Monday, January 14th, 2008

Being committed to writing several books brings the question about the proper writing and publishing process beyond page-a-day time management. Borrowed from software engineering I decided to use iterative and incremental process to finish previously announced Windows® Crash Dump Analysis book. The first iteration will result in this working title:

Memory Dump Analysis Anthology, Volume I

This is revised, expanded, commented and cross-referenced 500 page volume of selected blog posts and other useful information. Details will be announced soon so stay tuned. Russian translation is also planned.

The abbreviation “MDAA” actually sounds funny in Russian. Similar to “Hmmm”  :-)

- Dmitry Vostokov @ DumpAnalysis.org -

What is a Software Defect?

Tuesday, January 8th, 2008

Software can be considered as models of real or imagined systems which may be models themselves. Any modeling act involves a mapping between a system and a model that preserves causal, ordering and inclusion relationships and a mapping from the model to the system that translates emerging relationships and causal structures back to that system. The latter I call modeling expectations and any observed deviations in structure and behavior between the model and the system I call software defects which can be functional failures, error messages, crashes or hangs (red line on diagrams below):

Consider ATM software as a venerable example. It models imagined world of ATM transactions which we call ATM software requirements. The latter specifies ACID (atomic, consistent, isolated and durable) transaction rules. If they are broken by the written software we have the defect:

What are software requirements? They are models of real or imagined systems or can be models of past causal and relationship experiences. If requirements are wrong they do not translate back and we still consider software as having a defect:

Translating this to ATM example we have:

Another example where the perceived absence of failures can be considered as a defect is the program designed to model memory leaks that might not be leaking due to a defect in its source code.

Now we can answer the question ”What is a memory corruption?” which is the topic of the next post.

- Dmitry Vostokov @ DumpAnalysis.org -

How to distinguish between 1st and 2nd chances

Wednesday, January 2nd, 2008

Sometimes we look for Early Crash Dump pattern but information about whether an exception was first-chance or second-chance is missing from a crash dump file name or in a crash dump itself, for example:

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(1254.1124): Access violation - code c0000005 (first/second chance not available)
TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1:
00000000`00401570 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

If we recall that first-chance exceptions don’t leave any traces on user space thread stacks (see Interrupts and Exceptions Explained, Part 6 for details) then we won’t see any exception codes on thread raw stack:

0:000> !teb
TEB at 000007fffffde000
    ExceptionList:        0000000000000000
    StackBase:            0000000000130000
    StackLimit:           000000000012b000

    SubSystemTib:         0000000000000000
    FiberData:            0000000000001e00
    ArbitraryUserPointer: 0000000000000000
    Self:                 000007fffffde000
    EnvironmentPointer:   0000000000000000
    ClientId:             0000000000001254 . 0000000000001124
    RpcHandle:            0000000000000000
    Tls Storage:          000007fffffde058
    PEB Address:          000007fffffd5000
    LastErrorValue:       0
    LastStatusValue:      c0000034
    Count Owned Locks:    0
    HardErrorMode:        0

0:000> s -d 000000000012b000 0000000000130000 c0000005

However, we would definitely see it on raw stack in second-chance exception crash dump: 

0:000> s -d 000000000012b000 0000000000130000 c0000005 
00000000`0012f000  c0000005 00000000 00000000 00000000  .

Using raw stack observations we can even tell when a crash dump was saved from a debugger handling a second-chance exception or saved by some postmortem debugger afterwards. For example, on my Vista x64 I see the following difference:

raw stack from a crash dump saved from WinDbg after receiving second-chance exception:

00000000`0012e278  00000000`00000000
00000000`0012e280  00000000`00000000
00000000`0012e288  00000000`7790032c kernel32!IsDebugPortPresent+0×2c
00000000`0012e290  00000000`00000000
00000000`0012e298  00000000`00000000
00000000`0012e2a0  00000000`00000000
00000000`0012e2a8  00000000`7790032c kernel32!IsDebugPortPresent+0×2c
00000000`0012e2b0  00000001`00010000
00000000`0012e2b8  00000000`00000000
00000000`0012e2c0  00000000`00000000
00000000`0012e2c8  00000000`77b63c94 ntdll! ?? ::FNODOBFM::`string’+0xbd14
00000000`0012e2d0  00000000`00000000
00000000`0012e2d8  00000000`0012e420
00000000`0012e2e0  00000000`77b63cb0 ntdll! ?? ::FNODOBFM::`string’+0xbd30
00000000`0012e2e8  00000000`7793cf47 kernel32!UnhandledExceptionFilter+0xb7
00000000`0012e2f0  ffffffff`ffffffff
00000000`0012e2f8  00000000`00000000
00000000`0012e300  00000000`00000000
00000000`0012e308  00000000`00000000
00000000`0012e310  00000000`00318718
00000000`0012e318  00000000`7799eb89 user32!ImeWndProcWorker+0×331
00000000`0012e320  00000000`00000000
00000000`0012e328  00000000`00000000
00000000`0012e330  00000000`00000000
00000000`0012e338  00000000`004189b0 TestDefaultDebugger64!_getptd_noexit+0×80
00000000`0012e340  00000000`01fb4b90
00000000`0012e348  00000000`0012e928
00000000`0012e350  00000000`00000000
00000000`0012e358  00000000`00418a50 TestDefaultDebugger64!_getptd+0×80
00000000`0012e360  00000000`01fb4b90
00000000`0012e368  00000000`0041776d TestDefaultDebugger64!_XcptFilter+0×1d
00000000`0012e370  00000000`0012e4f0
00000000`0012e378  00000000`77aa3cfa ntdll!RtlDecodePointer+0×2a
00000000`0012e380  00000000`00436fc4 TestDefaultDebugger64!__rtc_tzz+0×1f8c
00000000`0012e388  00000000`00000001
00000000`0012e390  00000000`0012f000
00000000`0012e398  00000000`77a60000 ntdll!`string’ <PERF> (ntdll+0×0)
00000000`0012e3a0  00000000`0004c6e1
00000000`0012e3a8  00000000`00000001
00000000`0012e3b0  00000000`0012ff90
00000000`0012e3b8  00000000`77afb1b5 ntdll!RtlUserThreadStart+0×95
00000000`0012e3c0  00000000`0012e420
00000000`0012e3c8  00000000`d4b99e6f
00000000`0012e3d0  00000000`00000000
00000000`0012e3d8  00000000`00000001
00000000`0012e3e0  00000000`0012e4f0
00000000`0012e3e8  00000000`77a8ae4b ntdll!_C_specific_handler+0×8c

raw stack from a crash dump saved by CDB installed as a default postmortem debugger (WER was used to launch it):

0:000> dqs 00000000`0012e278 00000000`0012e3e8
00000000`0012e278  00000000`00000000
00000000`0012e280  00000000`00000000
00000000`0012e288  00000000`00000000
00000000`0012e290  ffffffff`ffffffff
00000000`0012e298  00000000`779257ef kernel32!WerpReportExceptionInProcessContext+0×9f
00000000`0012e2a0  00000000`00000000
00000000`0012e2a8  00000000`0012ff90
00000000`0012e2b0  00000000`0012e420
00000000`0012e2b8  00000000`0041f4dc TestDefaultDebugger64!__CxxUnhandledExceptionFilter+0×5c
00000000`0012e2c0  00000000`00000000
00000000`0012e2c8  00000000`7a8b477b
00000000`0012e2d0  00000000`00000000
00000000`0012e2d8  fffff980`01000000
00000000`0012e2e0  00000000`00000001
00000000`0012e2e8  00000000`7793d07e kernel32!UnhandledExceptionFilter+0×1ee
00000000`0012e2f0  00000000`77b63cb0 ntdll! ?? ::FNODOBFM::`string’+0xbd30
00000000`0012e2f8  000007fe`00000000
00000000`0012e300  00000000`00000000
00000000`0012e308  00000000`00000001
00000000`0012e310  00000000`00000000
00000000`0012e318  00000000`00000000
00000000`0012e320  000007ff`00000000
00000000`0012e328  00000000`00000000
00000000`0012e330  00000000`00000000
00000000`0012e338  00000000`004189b0 TestDefaultDebugger64!_getptd_noexit+0×80
00000000`0012e340  00000000`023f4b90
00000000`0012e348  00000000`77ab8cf8 ntdll!RtlpFindNextActivationContextSection+0xaa
00000000`0012e350  00000000`00000000
00000000`0012e358  00000000`00418a50 TestDefaultDebugger64!_getptd+0×80
00000000`0012e360  00000000`023f4b90
00000000`0012e368  00000000`0041776d TestDefaultDebugger64!_XcptFilter+0×1d
00000000`0012e370  00000000`0012e4f0
00000000`0012e378  00000000`77aa3cfa ntdll!RtlDecodePointer+0×2a
00000000`0012e380  00000000`00436fc4 TestDefaultDebugger64!__rtc_tzz+0×1f8c
00000000`0012e388  00000000`00000001
00000000`0012e390  00000000`0012f000
00000000`0012e398  00000000`77a60000 ntdll!`string’ <PERF> (ntdll+0×0)
00000000`0012e3a0  00000000`0004c6e1
00000000`0012e3a8  00000000`00000001
00000000`0012e3b0  00000000`0012ff90
00000000`0012e3b8  00000000`77afb1b5 ntdll!RtlUserThreadStart+0×95
00000000`0012e3c0  00000000`0012e420
00000000`0012e3c8  00000000`7a8b477b
00000000`0012e3d0  00000000`00000000
00000000`0012e3d8  00000000`00000001
00000000`0012e3e0  00000000`0012e4f0
00000000`0012e3e8  00000000`77a8ae4b ntdll!_C_specific_handler+0×8c

- Dmitry Vostokov @ DumpAnalysis.org -

Raw Stack Dump of all threads (part 2)

Monday, December 24th, 2007

In the previous part I used WinDbg scripting to get raw stack data from user process dump. However the script needs to be modified if the dump is complete memory dump. Here I use !for_each_thread WinDbg extension command to dump stack trace and user space raw stack data for all threads except system threads because they don’t have user space stack counterpart and their TEB address is NULL:

!for_each_thread ".thread /r /p @#Thread; .if (@$teb != 0) {!thread @#Thread; r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2}"

We need to open a log file. It will be huge and we might want to dump raw stack contents for specific process only. In such case we can filter the output of the script using $proc pseudo-register, the address of EPROCESS:

!for_each_thread ".thread /r /p @#Thread; .if (@$teb != 0 & @$proc == <EPROCESS>) {!thread @#Thread; r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2}"

For example:

1: kd>!process 0 0
...
...
...
PROCESS 8596f9c8  SessionId: 0  Cid: 0fac    Peb: 7ffde000  ParentCid: 0f3c
    DirBase: 3fba6520  ObjectTable: d6654e28  HandleCount: 389.
    Image: explorer.exe


1: kd> !for_each_thread ".thread /r /p @#Thread; .if (@$teb != 0 & @$proc == 8596f9c8) {!thread @#Thread; r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2}”
Implicit thread is now 8659b208
Implicit process is now 8659b478
Loading User Symbols

Implicit thread is now 86599db0
Implicit process is now 8659b478
Loading User Symbols

...
...
...
Implicit thread is now 85b32db0
Implicit process is now 8596f9c8
Loading User Symbols

THREAD 85b32db0  Cid 0fac.0fb0  Teb: 7ffdd000 Win32Thread: bc0a6be8 WAIT: (Unknown) UserMode Non-Alertable
    859bda20  SynchronizationEvent
Not impersonating
DeviceMap                 d743e440
Owning Process            8596f9c8       Image:         explorer.exe
Wait Start TickCount      376275         Ticks: 102 (0:00:00:01.593)
Context Switch Count      3509                 LargeStack
UserTime                  00:00:00.078
KernelTime                00:00:00.203
Win32 Start Address Explorer!ModuleEntry (0x010148a4)
Start Address kernel32!BaseProcessStartThunk (0x77e617f8)
Stack Init ba5fe000 Current ba5fdc50 Base ba5fe000 Limit ba5f9000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
ba5fdc68 80833465 85b32db0 85b32e58 00000000 nt!KiSwapContext+0x26
ba5fdc94 80829a62 00000000 bc0a6be8 00000000 nt!KiSwapThread+0x2e5
ba5fdcdc bf89abe3 859bda20 0000000d 00000001 nt!KeWaitForSingleObject+0x346
ba5fdd38 bf89da53 000024ff 00000000 00000001 win32k!xxxSleepThread+0x1be
ba5fdd4c bf89e411 000024ff 00000000 0007fef8 win32k!xxxRealWaitMessageEx+0x12
ba5fdd5c 8088978c 0007ff08 7c8285ec badb0d00 win32k!NtUserWaitMessage+0x14
ba5fdd5c 7c8285ec 0007ff08 7c8285ec badb0d00 nt!KiFastCallEntry+0xfc (TrapFrame @ ba5fdd64)
0007feec 7739bf53 7c92addc 77e619d1 000d9298 ntdll!KiFastSystemCallRet
0007ff08 7c8fadbd 00000000 0007ff5c 0100fff1 USER32!NtUserWaitMessage+0xc
0007ff14 0100fff1 000d9298 7ffde000 0007ffc0 SHELL32!SHDesktopMessageLoop+0x24
0007ff5c 0101490c 00000000 00000000 000207fa Explorer!ExplorerWinMain+0x2c4
0007ffc0 77e6f23b 00000000 00000000 7ffde000 Explorer!ModuleEntry+0x6d
0007fff0 00000000 010148a4 00000000 78746341 kernel32!BaseProcessStart+0x23

Last set context:
TEB at 7ffdd000
    ExceptionList:        0007ffe0
    StackBase:            00080000
    StackLimit:           00072000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffdd000
    EnvironmentPointer:   00000000
    ClientId:             00000fac . 00000fb0
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffde000
    LastErrorValue:       6
    LastStatusValue:      c0000008
    Count Owned Locks:    0
    HardErrorMode:        0
00072000  ????????
00072004  ????????
00072008  ????????
0007200c  ????????
00072010  ????????
00072014  ????????
00072018  ????????
0007201c  ????????
...
...
...
00079ff8  ????????
00079ffc  ????????
0007a000  00000000
0007a004  00000000
0007a008  00000000
0007a00c  00000000
0007a010  00000000
0007a014  00000000
0007a018  00000000
0007a01c  00000000
0007a020  00000000
0007a024  00000000
0007a028  00000000
0007a02c  00000000
...
...
...
0007ff04  0007ff14
0007ff08  0007ff14
0007ff0c  7c8fadbd SHELL32!SHDesktopMessageLoop+0x24
0007ff10  00000000
0007ff14  0007ff5c
0007ff18  0100fff1 Explorer!ExplorerWinMain+0x2c4
0007ff1c  000d9298
0007ff20  7ffde000
0007ff24  0007ffc0
0007ff28  00000000
0007ff2c  0007fd28
0007ff30  0007ff50
0007ff34  7ffde000
0007ff38  7c82758b ntdll!ZwQueryInformationProcess+0xc
0007ff3c  77e6c336 kernel32!GetErrorMode+0x18
0007ff40  ffffffff
0007ff44  0000000c
0007ff48  00000000
0007ff4c  00018fb8
0007ff50  000000ec
0007ff54  00000001
0007ff58  000d9298
0007ff5c  0007ffc0
0007ff60  0101490c Explorer!ModuleEntry+0x6d
0007ff64  00000000
0007ff68  00000000
0007ff6c  000207fa
0007ff70  00000001
0007ff74  00000000
0007ff78  00000000
0007ff7c  00000044
0007ff80  0002084c
0007ff84  0002082c
0007ff88  000207fc
0007ff8c  00000000
0007ff90  00000000
0007ff94  00000000
0007ff98  00000000
0007ff9c  f60e87fc
0007ffa0  00000002
0007ffa4  021a006a
0007ffa8  00000001
0007ffac  00000001
0007ffb0  00000000
0007ffb4  00000000
0007ffb8  00000000
0007ffbc  00000000
0007ffc0  0007fff0
0007ffc4  77e6f23b kernel32!BaseProcessStart+0x23
0007ffc8  00000000
0007ffcc  00000000
0007ffd0  7ffde000
0007ffd4  00000000
0007ffd8  0007ffc8
0007ffdc  b9a94ce4
0007ffe0  ffffffff
0007ffe4  77e61a60 kernel32!_except_handler3
0007ffe8  77e6f248 kernel32!`string'+0x88
0007ffec  00000000
0007fff0  00000000
0007fff4  00000000
0007fff8  010148a4 Explorer!ModuleEntry
0007fffc  00000000
00080000  78746341
...
...
...

Because complete memory dumps contain only physical memory contents some pages of raw stack data can be in page files and therefore unavailable.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 42b)

Wednesday, December 19th, 2007

Here is another example of Wait Chain pattern where objects are critical sections. 

WinDbg can detect them if we use !analyze -v -hang command but it detects only one and not necessarily the longest or widest chain in cases with multiple wait chains:

DERIVED_WAIT_CHAIN:

Dl Eid Cid     WaitType
-- --- ------- --------------------------
   2   8d8.90c Critical Section       -->
   4   8d8.914 Critical Section       -->
   66  8d8.f9c Unknown

Looking at threads we see this chain and we also see that the final thread is blocked waiting for socket. 

 0:167> ~~[90c]kvL
ChildEBP RetAddr  Args to Child             
00bbfd9c 7c942124 7c95970f 00000ea0 00000000 ntdll!KiFastSystemCallRet
00bbfda0 7c95970f 00000ea0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
00bbfddc 7c959620 00000000 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x19c
00bbfdfc 6748d2f9 06018b50 00000000 00000000 ntdll!RtlEnterCriticalSection+0xa8



00bbffb8 7c82608b 00315218 00000000 00000000 msvcrt!_endthreadex+0xa3
00bbffec 00000000 77b9b4bc 00315218 00000000 kernel32!BaseThreadStart+0×34

0:167> ~~[914]kvL 100
ChildEBP RetAddr  Args to Child             
00dbf1cc 7c942124 7c95970f 000004b0 00000000 ntdll!KiFastSystemCallRet
00dbf1d0 7c95970f 000004b0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
00dbf20c 7c959620 00000000 00000004 0031abcc ntdll!RtlpWaitOnCriticalSection+0x19c
00dbf22c 6748d244 0031abd8 003174e0 00dbf254 ntdll!RtlEnterCriticalSection+0xa8



00dbffb8 7c82608b 00315218 00000000 00000000 msvcrt!_endthreadex+0xa3
00dbffec 00000000 77b9b4bc 00315218 00000000 kernel32!BaseThreadStart+0×34

0:167> ~~[f9c]kvL 100
ChildEBP RetAddr  Args to Child             
0fe2a09c 7c942124 71933a09 00000b50 00000001 ntdll!KiFastSystemCallRet
0fe2a0a0 71933a09 00000b50 00000001 0fe2a0c8 ntdll!NtWaitForSingleObject+0xc
0fe2a0dc 7194576e 00000b50 00000234 00000000 mswsock!SockWaitForSingleObject+0x19d
0fe2a154 71a12679 00000234 0fe2a1b4 00000001 mswsock!WSPRecv+0x203
0fe2a190 62985408 00000234 0fe2a1b4 00000001 WS2_32!WSARecv+0x77
0fe2a1d0 6298326b 00000234 0274ebc6 00000810 component!wait+0x338
...
...
...
0fe2ffb8 7c82608b 060cfc70 00000000 00000000 msvcrt!_endthreadex+0xa3
0fe2ffec 00000000 77b9b4bc 060cfc70 00000000 kernel32!BaseThreadStart+0x34

If we look at all held critical sections we would see another thread that blocked +125 other threads:

0:167> !locks

CritSec +31abd8 at 0031abd8
WaiterWoken        No
LockCount          6
RecursionCount     1
OwningThread       f9c
EntryCount         0
ContentionCount    17
*** Locked

CritSec +51e4bd8 at 051e4bd8
WaiterWoken        No
LockCount          125
RecursionCount     1
OwningThread       830
EntryCount         0
ContentionCount    7d
*** Locked

CritSec +5f40620 at 05f40620
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       920
EntryCount         0
ContentionCount    0
*** Locked

CritSec +60b6320 at 060b6320
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       8a8
EntryCount         0
ContentionCount    1
*** Locked

CritSec +6017c60 at 06017c60
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       914
EntryCount         0
ContentionCount    0
*** Locked

CritSec +6018b50 at 06018b50
WaiterWoken        No
LockCount          3
RecursionCount     1
OwningThread       914
EntryCount         0
ContentionCount    3
*** Locked

CritSec +6014658 at 06014658
WaiterWoken        No
LockCount          2
RecursionCount     1
OwningThread       928
EntryCount         0
ContentionCount    2
*** Locked

0:167> ~~[830]kvL 100
ChildEBP RetAddr  Args to Child             
0ff2f300 7c942124 7c95970f 000004b0 00000000 ntdll!KiFastSystemCallRet
0ff2f304 7c95970f 000004b0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0ff2f340 7c959620 00000000 00000004 0031abcc ntdll!RtlpWaitOnCriticalSection+0x19c
0ff2f360 6748d244 0031abd8 003174e0 0ff2f388 ntdll!RtlEnterCriticalSection+0xa8



0ff2ffb8 7c82608b 060cf9a0 00000000 00000000 msvcrt!_endthreadex+0xa3
0ff2ffec 00000000 77b9b4bc 060cf9a0 00000000 kernel32!BaseThreadStart+0×34

Searching for any thread waiting for critical section 051e4bd8 gives us:

   8  Id: 8d8.924 Suspend: 1 Teb: 7ffd5000 Unfrozen
ChildEBP RetAddr  Args to Child             
011ef8e0 7c942124 7c95970f 00000770 00000000 ntdll!KiFastSystemCallRet
011ef8e4 7c95970f 00000770 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
011ef920 7c959620 00000000 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x19c
011ef940 677b209d 051e4bd8 011efa0c 057bd36c ntdll!RtlEnterCriticalSection+0xa8



011effb8 7c82608b 00315510 00000000 00000000 msvcrt!_endthreadex+0xa3
011effec 00000000 77b9b4bc 00315510 00000000 kernel32!BaseThreadStart+0×34

and we can construct yet another wait chain:

   8   8d8.924 Critical Section       -->
   67  8d8.830 Critical Section       -->
   66  8d8.f9c Unknown

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 41b)

Monday, December 17th, 2007

Now Manual Dump pattern as seen from process memory dumps. It is not possible to reliably identify manual dumps here because a debugger or another process dumper might have been attached to a process noninvasively and not leaving traces of intervention so we can only rely on the following information:

Comment field

Loading Dump File [C:\kktools\userdump8.1\x64\notepad.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Userdump generated complete user-mode minidump with Standalone function on COMPUTER-NAME'

Absence of exceptions

Loading Dump File [C:\UserDumps\notepad.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:31:31.000 2007 (GMT+0)
System Uptime: 0 days 0:45:11.148
Process Uptime: 0 days 0:00:36.000
....................
user32!ZwUserGetMessage+0xa:
00000000`76c8e6aa c3              ret
0:000> ~*kL

.  0  Id: 1b8.ed4 Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0029f618 00000000`76c8e6ea user32!ZwUserGetMessage+0xa
00000000`0029f620 00000000`ff2b6eca user32!GetMessageW+0x34
00000000`0029f650 00000000`ff2bcf8b notepad!WinMain+0x176
00000000`0029f6d0 00000000`76d7cdcd notepad!IsTextUTF8+0x24f
00000000`0029f790 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0029f7c0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

Wake debugger exception

Loading Dump File [C:\UserDumps\notepad2.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:35:37.000 2007 (GMT+0)
System Uptime: 0 days 0:49:13.806
Process Uptime: 0 days 0:02:54.000
....................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(314.1b4): Wake debugger - code 80000007 (first/second chance not available)”

user32!ZwUserGetMessage+0xa:
00000000`76c8e6aa c3              ret

Break instruction exception

Loading Dump File [C:\UserDumps\notepad3.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:45:15.000 2007 (GMT+0)
System Uptime: 0 days 0:58:52.699
Process Uptime: 0 days 0:14:20.000
....................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
ntdll!DbgBreakPoint:
00000000`76ecfdf0 cc              int     3

0:001> ~*kL

   0  Id: 1b8.ed4 Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0029f618 00000000`76c8e6ea user32!ZwUserGetMessage+0xa
00000000`0029f620 00000000`ff2b6eca user32!GetMessageW+0x34
00000000`0029f650 00000000`ff2bcf8b notepad!WinMain+0x176
00000000`0029f6d0 00000000`76d7cdcd notepad!IsTextUTF8+0x24f
00000000`0029f790 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0029f7c0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

#  1  Id: 1b8.ec4 Suspend: 1 Teb: 000007ff`fffda000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`030df798 00000000`76f633e8 ntdll!DbgBreakPoint
00000000`030df7a0 00000000`76d7cdcd ntdll!DbgUiRemoteBreakin+0×38

00000000`030df7d0 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`030df800 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

The latter might also be some assertion statement in the code leading to a process crash like in the following instance of Dynamic Memory Corruption pattern (heap corruption):  

FAULTING_IP:
ntdll!DbgBreakPoint+0
77f813b1 cc int 3

EXCEPTION_RECORD: ffffffff -- (.exr ffffffffffffffff)
ExceptionAddress: 77f813b1 (ntdll!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 3
Parameter[0]: 00000000
Parameter[1]: 09aef2ac
Parameter[2]: 09aeeee8

STACK_TEXT:
09aef0bc 77fb76aa ntdll!DbgBreakPoint
09aef0c4 77fa65c2 ntdll!RtlpBreakPointHeap+0×26
09aef2bc 77fb5367 ntdll!RtlAllocateHeapSlowly+0×212
09aef340 77fa64f6 ntdll!RtlDebugAllocateHeap+0xcb
09aef540 77fcc9e3 ntdll!RtlAllocateHeapSlowly+0×5a
09aef720 786f3f11 ntdll!RtlAllocateHeap+0×954
09aef730 786fd10e rpcrt4!operator new+0×12

09aef748 786fc042 rpcrt4!OSF_CCONNECTION::OSF_CCONNECTION+0×174
09aef79c 786fbe0d rpcrt4!OSF_CASSOCIATION::AllocateCCall+0xfa
09aef808 786fbd53 rpcrt4!OSF_BINDING_HANDLE::AllocateCCall+0×1cd
09aef83c 786f1f2f rpcrt4!OSF_BINDING_HANDLE::GetBuffer+0×28
09aef854 786f1ee4 rpcrt4!I_RpcGetBufferWithObject+0×6e
09aef860 786f1ea4 rpcrt4!I_RpcGetBuffer+0xb
09aef86c 78754762 rpcrt4!NdrGetBuffer+0×2b
09aefab8 796d78b5 rpcrt4!NdrClientCall2+0×3f9
09aefac8 796d7821 advapi32!LsarOpenPolicy2+0×14
09aefb1c 796d8b04 advapi32!LsaOpenPolicy+0xaf
09aefb84 796d8aa9 advapi32!LookupAccountSidInternal+0×63
09aefbac 0aaf5d8b advapi32!LookupAccountSidW+0×1f
WARNING: Stack unwind information not available. Following frames may be wrong.
09aeff40 0aad1665 ComponentDLL+0×35d8b
09aeff5c 3f69264c ComponentDLL+0×11665
09aeff7c 780085bc ComponentDLL+0×264c
09aeffb4 77e5438b msvcrt!_endthreadex+0xc1
09aeffec 00000000 kernel32!BaseThreadStart+0×52

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 42a)

Friday, December 14th, 2007

Wait Chain is another pattern and it is simply a sequence of causal relations between events: thread A is waiting for event E to happen that threads B, C or D are supposed to signal at some time in the future but they are all waiting for event F to happen that thread G is about to signal as soon as it finishes processing some critical task:

That subsumes various deadlock patterns too which are causal loops where thread A is waiting for event AB that thread B will signal as soon as thread A signals event BA thread B is waiting for:

In this context “Event” means any type of synchronization object, critical section, LPC/RPC reply or data arrival through some IPC channel and not only Win32 event object or kernel _KEVENT.

As the first example of Wait Chain pattern I show a process being terminated and waiting for the other thread to finish or in other words, considering thread termination as an event itself, the main process thread is waiting for the second thread object to be signaled. The second thread tries to cancel previous I/O request directed to some device. However that IRP is not cancellable and process hangs. This can be depicted on the following diagram:

where Thread A is our main thread waiting for Event A which is thread B itself waiting for I/O cancellation (Event B). Their stack traces are:

THREAD 8a3178d0  Cid 04bc.01cc  Teb: 7ffdf000 Win32Thread: bc1b6e70 WAIT: (Unknown) KernelMode Non-Alertable
    8af2c920  Thread
Not impersonating
DeviceMap                 e1032530
Owning Process            89ff8d88       Image:         processA.exe
Wait Start TickCount      80444          Ticks: 873 (0:00:00:13.640)
Context Switch Count      122                 LargeStack
UserTime                  00:00:00.015
KernelTime                00:00:00.156
Win32 Start Address 0x010148a4
Start Address 0x77e617f8
Stack Init f3f29000 Current f3f28be8 Base f3f29000 Limit f3f25000 Call 0
Priority 15 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr 
f3f28c00 80833465 nt!KiSwapContext+0x26
f3f28c2c 80829a62 nt!KiSwapThread+0x2e5
f3f28c74 8094c0ea nt!KeWaitForSingleObject+0x346 ; stack trace with arguments shows the first parameter as 8af2c920 
f3f28d0c 8094c63f nt!PspExitThread+0×1f0
f3f28d24 8094c839 nt!PspTerminateThreadByPointer+0×4b
f3f28d54 8088978c nt!NtTerminateProcess+0×125
f3f28d54 7c8285ec nt!KiFastCallEntry+0xfc

THREAD 8af2c920  Cid 04bc.079c  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
    8af2c998  NotificationTimer
IRP List:
    8ad26260
: (0006,0220) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 e1032530
Owning Process            89ff8d88       Image:         processA.exe
Wait Start TickCount      81312          Ticks: 5 (0:00:00:00.078)
Context Switch Count      169                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×77da3ea5
Start Address 0×77e617ec
Stack Init f3e09000 Current f3e08bac Base f3e09000 Limit f3e05000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr 
f3e08bc4 80833465 nt!KiSwapContext+0×26
f3e08bf0 80828f0b nt!KiSwapThread+0×2e5
f3e08c38 808ea7a4 nt!KeDelayExecutionThread+0×2ab
f3e08c68 8094c360 nt!IoCancelThreadIo+0×62
f3e08cf0 8094c569 nt!PspExitThread+0×466
f3e08cfc 8082e0b6 nt!PsExitSpecialApc+0×1d
f3e08d4c 80889837 nt!KiDeliverApc+0×1ae
f3e08d4c 7c8285ec nt!KiServiceExit+0×56

By inspecting IRP we can see a device it was directed to, see that it has cancel bit but doesn’t have a cancel routine:

0: kd> !irp 8ad26260  1
Irp is active with 5 stacks 4 is current (= 0x8ad2633c)
 No Mdl: No System Buffer: Thread 8af2c920:  Irp stack trace. 
Flags = 00000000
ThreadListEntry.Flink = 8af2cb28
ThreadListEntry.Blink = 8af2cb28
IoStatus.Status = 00000000
IoStatus.Information = 00000000
RequestorMode = 00000001
Cancel = 01
CancelIrql = 0
ApcEnvironment = 00
UserIosb = 77ecb700
UserEvent = 00000000
Overlay.AsynchronousParameters.UserApcRoutine = 00000000
Overlay.AsynchronousParameters.UserApcContext = 00000000
Overlay.AllocationSize = 00000000 - 00000000
CancelRoutine = 00000000  
UserBuffer = 77ecb720
&Tail.Overlay.DeviceQueueEntry = 8ad262a0
Tail.Overlay.Thread = 8af2c920
Tail.Overlay.AuxiliaryBuffer = 00000000
Tail.Overlay.ListEntry.Flink = 00000000
Tail.Overlay.ListEntry.Blink = 00000000
Tail.Overlay.CurrentStackLocation = 8ad2633c
Tail.Overlay.OriginalFileObject = 89ff8920
Tail.Apc = 00000000
Tail.CompletionKey = 00000000
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000
   

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
>[  c, 2]   0  1 8ab20388 89ff8920 00000000-00000000    pending
        \Device\DeviceA

   Args: 00000020 00000017 00000000 00000000
 [  c, 2]   0  0 8affa4b8 89ff8920 00000000-00000000   
        \Device\DeviceB
   Args: 00000020 00000017 00000000 00000000

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 41a)

Wednesday, December 12th, 2007

Some memory dumps are generated on purpose to troubleshoot process and system hangs. They are usually called Manual Dumps, manual crash dumps or manual memory dumps. Kernel, complete and kernel mini dumps can be generated using the famous keyboard method described in the following Microsoft article which has been recently updated and contains the fix for USB keyboards:

http://support.microsoft.com/kb/244139

The crash dump will show E2 bugcheck:

MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Various tools including Citrix SystemDump reuse E2 bug check code and its arguments.  There are many other 3rd-party tools used to bugcheck Windows OS such as BANG! from OSR or NotMyFault from Sysinternals. The old one is crash.exe that loads crashdrv.sys and uses the following bugcheck:

Unknown bugcheck code (69696969)
Unknown bugcheck description
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

In a memory dump you would see its characteristic stack trace pointing to crashdrv module: 

STACK_TEXT:
b5b3ebe0 f615888d nt!KeBugCheck+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
b5b3ebec f61584e3 crashdrv+0x88d
b5b3ec00 8041eec9 crashdrv+0x4e3
b5b3ec14 804b328a nt!IopfCallDriver+0x35
b5b3ec28 804b40de nt!IopSynchronousServiceTail+0x60
b5b3ed00 804abd0a nt!IopXxxControlFile+0x5d6
b5b3ed34 80468379 nt!NtDeviceIoControlFile+0x28
b5b3ed34 77f82ca0 nt!KiSystemService+0xc9
0006fed4 7c5794f4 ntdll!NtDeviceIoControlFile+0xb
0006ff38 01001a74 KERNEL32!DeviceIoControl+0xf8
0006ff70 01001981 crash+0x1a74
0006ff80 01001f93 crash+0x1981
0006ffc0 7c5989a5 crash+0x1f93
0006fff0 00000000 KERNEL32!BaseProcessStart+0x3d

Sometimes various hardware buttons are used to trigger NMI and generate a crash dump when keyboard is not available. The bugcheck will be:

NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction. The hardware supplier should be called.
Arguments:
Arg1: 004f4454
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Critical process termination such as session 0 csrss.exe is used to force a memory dump:

CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been terminated.
Several processes and threads are necessary for the operation of the system; when they are terminated (for any reason), the system can no longer function.
Arguments:
Arg1: 00000003, Process
Arg2: 8a090d88, Terminating object
Arg3: 8a090eec, Process image file name
Arg4: 80967b74, Explanatory message (ascii)

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 40a)

Monday, December 10th, 2007

In Advanced Windows Debugging book I encountered some thread stacks related to debugger events like Exit a Process, Load or Unload a Module and realized that I’ve seen process crash dumps with such stacks traces. These thread stacks are not normally encountered in healthy process dumps and, statistically speaking, when a process terminates or unloads a library the chances to save a memory dump manually using process dumpers like userdump.exe or Task Manager in Vista are very low unless an interactive debugger was attached or breakpoints were set in advance. Therefore the presence of such threads in a captured crash dump usually indicates some problem or at least focuses attention to the procedure used to save a dump. Such pattern merits its own name: Special Stack Trace.

For example, one process dump had the following stack trace showing process termination initiated from .NET runtime:

STACK_TEXT:
0012fc2c 7c827c1b ntdll!KiFastSystemCallRet
0012fc30 77e668c3 ntdll!NtTerminateProcess+0xc
0012fd24 77e66905 KERNEL32!_ExitProcess+0x63
0012fd38 01256d9b KERNEL32!ExitProcess+0x14
0012ff60 01256dc7 mscorwks!SafeExitProcess+0x11a
0012ff6c 011c5fa4 mscorwks!DisableRuntime+0xd0
0012ffb0 79181b5f mscorwks!_CorExeMain+0x8c
0012ffc0 77e6f23b mscoree!_CorExeMain+0x2c
0012fff0 00000000 KERNEL32!BaseProcessStart+0x23

The original problem was an error message box and the application disappeared when a user dismissed the message. How the dump was saved? Someone advised to attach NTSD to that process, hit ‘g’ and then save the memory dump when the process breaks into the debugger again. So the problem was already gone by that time and the better way would have been to create the manual user dump of that process when it was displaying the error message.

- Dmitry Vostokov @ DumpAnalysis.org -

Interrupts and exceptions explained (Part 6)

Friday, December 7th, 2007

Previous parts were dealing with exceptions in kernel mode. In this and next parts I’m going to investigate the flow of exception processing in user mode. In part 1 I mentioned that interrupts and exceptions generated when CPU executes code in user mode require a processor to switch the current user mode stack to kernel mode stack. This can be seen when we have a user debugger attached and it gets an exception notification called first chance exception. Because of the stack switch we don’t see any saved processor context on user mode thread stack when WinDbg breaks on first-chance exception in TestDefaultDebugger64:

0:000> r
rax=0000000000000000 rbx=0000000000000001 rcx=000000000012fd80
rdx=00000000000003e8 rsi=000000000012fd80 rdi=0000000140033fe0
rip=0000000140001690 rsp=000000000012f198 rbp=0000000000000111
 r8=0000000000000000  r9=0000000140001690 r10=0000000140001690
r11=000000000012f260 r12=0000000000000000 r13=00000000000003e8
r14=0000000000000110 r15=0000000000000001
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1:
00000001`40001690 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

0:000> kL 100
Child-SP          RetAddr           Call Site
00000000`0012f198 00000001`40004ba0 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
00000000`0012f1a0 00000001`40004de0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1d0 00000001`4000564e TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0x180
00000000`0012f230 00000001`4000c6b4 TestDefaultDebugger64!CDialog::OnCmdMsg+0x32
00000000`0012f270 00000001`4000d4d8 TestDefaultDebugger64!CWnd::OnCommand+0xcc
00000000`0012f300 00000001`400082e0 TestDefaultDebugger64!CWnd::OnWndMsg+0x60
00000000`0012f440 00000001`4000b77a TestDefaultDebugger64!CWnd::WindowProc+0x38
00000000`0012f480 00000001`4000b881 TestDefaultDebugger64!AfxCallWndProc+0xfe
00000000`0012f520 00000000`77c43abc TestDefaultDebugger64!AfxWndProc+0x59
00000000`0012f560 00000000`77c4337a user32!UserCallWinProcCheckWow+0x1f9
00000000`0012f630 00000000`77c4341b user32!SendMessageWorker+0x68c
00000000`0012f6d0 000007ff`7f07c89f user32!SendMessageW+0x9d
00000000`0012f720 000007ff`7f07f2e1 comctl32!Button_ReleaseCapture+0x14f
00000000`0012f750 00000000`77c43abc comctl32!Button_WndProc+0xd51
00000000`0012f8b0 00000000`77c43f5c user32!UserCallWinProcCheckWow+0x1f9
00000000`0012f980 00000000`77c3966a user32!DispatchMessageWorker+0x3af
00000000`0012f9f0 00000001`40007148 user32!IsDialogMessageW+0x256
00000000`0012fac0 00000001`400087f8 TestDefaultDebugger64!CWnd::IsDialogMessageW+0x38
00000000`0012faf0 00000001`4000560f TestDefaultDebugger64!CWnd::PreTranslateInput+0x28
00000000`0012fb20 00000001`4000b2ca TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc3
00000000`0012fb50 00000001`400034a7 TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0x3a
00000000`0012fb80 00000001`40003507 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0x67
00000000`0012fbb0 00000001`400036d2 TestDefaultDebugger64!AfxPreTranslateMessage+0x23
00000000`0012fbe0 00000001`40003717 TestDefaultDebugger64!AfxInternalPumpMessage+0x3a
00000000`0012fc10 00000001`4000a806 TestDefaultDebugger64!AfxPumpMessage+0x1b
00000000`0012fc40 00000001`40005ff2 TestDefaultDebugger64!CWnd::RunModalLoop+0xea
00000000`0012fca0 00000001`40001163 TestDefaultDebugger64!CDialog::DoModal+0x1c6
00000000`0012fd50 00000001`4002ccb1 TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe3
00000000`0012fe80 00000001`40016150 TestDefaultDebugger64!AfxWinMain+0x75
00000000`0012fec0 00000000`77d5964c TestDefaultDebugger64!__tmainCRTStartup+0x260
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x29

0:000> dqs 000000000012f198-20 000000000012f198+20
00000000`0012f178  00000001`4000bc25 TestDefaultDebugger64!CWnd::ReflectLastMsg+0x65
00000000`0012f180  00000000`00080334
00000000`0012f188  00000000`00000006
00000000`0012f190  00000000`0000000d
00000000`0012f198  00000001`40004ba0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1a0  ffffffff`fffffffe
00000000`0012f1a8  00000000`00000000
00000000`0012f1b0  00000000`00000000
00000000`0012f1b8  00000000`00000000

We see that there are no saved SS:RSP, RFLAGS, CS:RIP registers which we see on a stack if an exception happens in kernel mode as shown in part 2. If we bugcheck our system using SystemDump tool to generate complete memory dump at that time we can look later at the whole thread that experienced exception in user mode and its user mode and kernel mode stacks:

kd> !process fffffadfe7055c20 2
PROCESS fffffadfe7055c20
    SessionId: 0  Cid: 0c64    Peb: 7fffffd7000  ParentCid: 07b0
    DirBase: 27e3d000  ObjectTable: fffffa800073a550  HandleCount:  55.
    Image: TestDefaultDebugger64.exe

        THREAD fffffadfe78f2bf0  Cid 0c64.0c68  Teb: 000007fffffde000 Win32Thread: fffff97ff4d71010 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
            fffffadfdf7b6fc0  SynchronizationEvent

        THREAD fffffadfe734c3d0  Cid 0c64.0c88  Teb: 000007fffffdc000 Win32Thread: 0000000000000000 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
            fffffadfe734c670  Semaphore Limit 0x2

kd> .thread /r /p fffffadfe78f2bf0
Implicit thread is now fffffadf`e78f2bf0
Implicit process is now fffffadf`e7055c20
Loading User Symbols

kd> kL 100
Child-SP          RetAddr           Call Site
fffffadf`df7b6d30 fffff800`0103b063 nt!KiSwapContext+0x85
fffffadf`df7b6eb0 fffff800`0103c403 nt!KiSwapThread+0xc3
fffffadf`df7b6ef0 fffff800`013a9dc1 nt!KeWaitForSingleObject+0x528
fffffadf`df7b6f80 fffff800`01336dcf nt!DbgkpQueueMessage+0x281
fffffadf`df7b7130 fffff800`01011c69 nt!DbgkForwardException+0x1c5
fffffadf`df7b74f0 fffff800`0104146f nt!KiDispatchException+0x264
fffffadf`df7b7af0 fffff800`010402e1 nt!KiExceptionExit
fffffadf`df7b7c70 00000001`40001690 nt!KiPageFault+0×1e1
00000000`0012f198 00000001`40004ba0 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
00000000`0012f1a0 00000001`40004de0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1d0 00000001`4000564e TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0×180
00000000`0012f230 00000001`4000c6b4 TestDefaultDebugger64!CDialog::OnCmdMsg+0×32
00000000`0012f270 00000001`4000d4d8 TestDefaultDebugger64!CWnd::OnCommand+0xcc
00000000`0012f300 00000001`400082e0 TestDefaultDebugger64!CWnd::OnWndMsg+0×60
00000000`0012f440 00000001`4000b77a TestDefaultDebugger64!CWnd::WindowProc+0×38
00000000`0012f480 00000001`4000b881 TestDefaultDebugger64!AfxCallWndProc+0xfe
00000000`0012f520 00000000`77c43abc TestDefaultDebugger64!AfxWndProc+0×59
00000000`0012f560 00000000`77c4337a USER32!UserCallWinProcCheckWow+0×1f9
00000000`0012f630 00000000`77c4341b USER32!SendMessageWorker+0×68c
00000000`0012f6d0 000007ff`7f07c89f USER32!SendMessageW+0×9d
00000000`0012f720 000007ff`7f07f2e1 COMCTL32!Button_ReleaseCapture+0×14f
00000000`0012f750 00000000`77c43abc COMCTL32!Button_WndProc+0xd51
00000000`0012f8b0 00000000`77c43f5c USER32!UserCallWinProcCheckWow+0×1f9
00000000`0012f980 00000000`77c3966a USER32!DispatchMessageWorker+0×3af
00000000`0012f9f0 00000001`40007148 USER32!IsDialogMessageW+0×256
00000000`0012fac0 00000001`400087f8 TestDefaultDebugger64!CWnd::IsDialogMessageW+0×38
00000000`0012faf0 00000001`4000560f TestDefaultDebugger64!CWnd::PreTranslateInput+0×28
00000000`0012fb20 00000001`4000b2ca TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc3
00000000`0012fb50 00000001`400034a7 TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0×3a
00000000`0012fb80 00000001`40003507 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0×67
00000000`0012fbb0 00000001`400036d2 TestDefaultDebugger64!AfxPreTranslateMessage+0×23
00000000`0012fbe0 00000001`40003717 TestDefaultDebugger64!AfxInternalPumpMessage+0×3a
00000000`0012fc10 00000001`4000a806 TestDefaultDebugger64!AfxPumpMessage+0×1b
00000000`0012fc40 00000001`40005ff2 TestDefaultDebugger64!CWnd::RunModalLoop+0xea
00000000`0012fca0 00000001`40001163 TestDefaultDebugger64!CDialog::DoModal+0×1c6
00000000`0012fd50 00000000`00000000 TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe3

Dumping kernel mode stack of our thread shows that the processor saved registers there:

kd> dqs fffffadf`df7b7c70  fffffadf`df7b7c70+200
fffffadf`df7b7c70  fffffadf`e78f2bf0
fffffadf`df7b7c78  00000000`00000000
fffffadf`df7b7c80  fffffadf`e78f2b01
fffffadf`df7b7c88  00000000`00000020
...
...
...
fffffadf`df7b7d90  00000000`00000000
fffffadf`df7b7d98  00000000`00000000
fffffadf`df7b7da0  00000000`00000000
fffffadf`df7b7da8  00000000`00000000
fffffadf`df7b7db0  00000000`001629b0
fffffadf`df7b7db8  00000000`00000001
fffffadf`df7b7dc0  00000000`00000001
fffffadf`df7b7dc8  00000000`00000111 ; RBP saved by KiPageFault
fffffadf`df7b7dd0  00000000`00000006 ; Page-Fault Error Code
fffffadf`df7b7dd8  00000001`40001690 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1 ; RIP
fffffadf`df7b7de0  00000000`00000033 ; CS
fffffadf`df7b7de8  00000000`00010246 ; RFLAGS
fffffadf`df7b7df0  00000000`0012f198 ; RSP
fffffadf`df7b7df8  00000000`0000002b ; SS
fffffadf`df7b7e00  00000000`0000027f
fffffadf`df7b7e08  00000000`00000000
fffffadf`df7b7e10  00000000`00000000
fffffadf`df7b7e18  0000ffff`00001f80
fffffadf`df7b7e20  00000000`00000000
fffffadf`df7b7e28  00000000`00000000
fffffadf`df7b7e30  00000000`00000000
fffffadf`df7b7e38  00000000`00000000


kd> .asm no_code_bytes
Assembly options: no_code_bytes

kd> u KiPageFault
nt!KiPageFault:
fffff800`01040100 push    rbp
fffff800`01040101 sub     rsp,158h
fffff800`01040108 lea     rbp,[rsp+80h]
fffff800`01040110 mov     byte ptr [rbp-55h],1
fffff800`01040114 mov     qword ptr [rbp-50h],rax
fffff800`01040118 mov     qword ptr [rbp-48h],rcx
fffff800`0104011c mov     qword ptr [rbp-40h],rdx
fffff800`01040120 mov     qword ptr [rbp-38h],r8

Error code 6 is 110 in binary and volume 3A of Intel manual tells us that “the fault was caused by a non-present page” (bit 0 is cleared), “the access causing the fault was a write” (bit 1 is set) and “the access causing the fault originated when the processor was executing in user mode” (bit 2 is set).

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 9d)

Thursday, November 29th, 2007

Finally I got a good example of Deadlock pattern involving LPC. In the stack trace below svchost.exe thread (we call it thread A) receives an LPC call and dispatches it to componentA module which makes another LPC call (MessageId 000135b8) and waiting for a reply: 

THREAD 89143020  Cid 09b4.10dc  Teb: 7ff91000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8914320c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 000135b8:
Current LPC port d64a5328
Not impersonating
DeviceMap                 d64028f0
Owning Process            891b8b80       Image:         svchost.exe
Wait Start TickCount      237408         Ticks: 1890 (0:00:00:29.531)
Context Switch Count      866            
UserTime                  00:00:00.031
KernelTime                00:00:00.015
Win32 Start Address 0×000135b2
LPC Server thread working on message Id 135b2
Start Address kernel32!BaseThreadStartThunk (0×7c82b5f3)
Stack Init b91f9000 Current b91f8c08 Base b91f9000 Limit b91f6000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
b91f8c20 8083e6a2 nt!KiSwapContext+0×26
b91f8c4c 8083f164 nt!KiSwapThread+0×284
b91f8c94 8093983f nt!KeWaitForSingleObject+0×346
b91f8d50 80834d3f nt!NtRequestWaitReplyPort+0×776
b91f8d50 7c94ed54 nt!KiFastCallEntry+0xfc
02bae928 7c941c94 ntdll!KiFastSystemCallRet
02bae92c 77c42700 ntdll!NtRequestWaitReplyPort+0xc
02bae984 77c413ba RPCRT4!LRPC_CCALL::SendReceive+0×230
02bae990 77c42c7f RPCRT4!I_RpcSendReceive+0×24
02bae9a4 77cb5d63 RPCRT4!NdrSendReceive+0×2b
02baec48 674825b6 RPCRT4!NdrClientCall+0×334

02baec5c 67486776 componentA!bar+0×16



02baf8d4 77c40f3b componentA!foo+0×157
02baf8f8 77cb23f7 RPCRT4!Invoke+0×30
02bafcf8 77cb26ed RPCRT4!NdrStubCall2+0×299
02bafd14 77c409be RPCRT4!NdrServerCall2+0×19
02bafd48 77c4093f RPCRT4!DispatchToStubInCNoAvrf+0×38
02bafd9c 77c40865 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×117
02bafdc0 77c434b1 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
02bafdfc 77c41bb3 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
02bafe20 77c45458 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
02baff84 77c2778f RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
02baff8c 77c2f7dd RPCRT4!RecvLotsaCallsWrapper+0xd
02baffac 77c2de88 RPCRT4!BaseCachedThreadRoutine+0×9d
02baffb8 7c82608b RPCRT4!ThreadStartRoutine+0×1b
02baffec 00000000 kernel32!BaseThreadStart+0×34

We search for that LPC message to find the server thread:

1: kd> !lpc message 000135b8
Searching message 135b8 in threads ...
    Server thread 89115db0 is working on message 135b8
Client thread 89143020 waiting a reply from 135b8  


                       

It belongs to Process.exe (we call it thread B):

1: kd> !thread 89115db0 0x16
THREAD 89115db0  Cid 098c.0384  Teb: 7ff79000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8a114628  SynchronizationEvent
Not impersonating
DeviceMap                 d64028f0
Owning Process            8a2c9d88       Image:         Process.exe
Wait Start TickCount      237408         Ticks: 1890 (0:00:00:29.531)
Context Switch Count      1590            
UserTime                  00:00:03.265
KernelTime                00:00:01.671
Win32 Start Address 0x000135b8
LPC Server thread working on message Id 135b8
Start Address kernel32!BaseThreadStartThunk (0x7c82b5f3)
Stack Init b952d000 Current b952cc60 Base b952d000 Limit b952a000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b952cc78 8083e6a2 89115e28 89115db0 89115e58 nt!KiSwapContext+0x26
b952cca4 8083f164 00000000 00000000 00000000 nt!KiSwapThread+0x284
b952ccec 8092db70 8a114628 00000006 ffffff01 nt!KeWaitForSingleObject+0x346
b952cd50 80834d3f 00000a7c 00000000 00000000 nt!NtWaitForSingleObject+0x9a
b952cd50 7c94ed54 00000a7c 00000000 00000000 nt!KiFastCallEntry+0xfc
22aceb48 7c942124 7c95970f 00000a7c 00000000 ntdll!KiFastSystemCallRet
22aceb4c 7c95970f 00000a7c 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
22aceb88 7c959620 00000000 00000004 00002000 ntdll!RtlpWaitOnCriticalSection+0x19c
22aceba8 1b005744 06d30940 1b05ea80 06d30940 ntdll!RtlEnterCriticalSection+0xa8
22acebb0 1b05ea80 06d30940 feffffff 0cd410c0 componentB!bar+0xb



22acf8b0 77c40f3b 00080002 000800e2 00000001 componentB!foo+0xeb
22acf8e0 77cb23f7 0de110dc 22acfac8 00000007 RPCRT4!Invoke+0×30
22acfce0 77cb26ed 00000000 00000000 19f38f94 RPCRT4!NdrStubCall2+0×299
22acfcfc 77c409be 19f38f94 17316ef0 19f38f94 RPCRT4!NdrServerCall2+0×19
22acfd30 77c75e41 0de1dc58 19f38f94 22acfdec RPCRT4!DispatchToStubInCNoAvrf+0×38
22acfd48 77c4093f 0de1dc58 19f38f94 22acfdec RPCRT4!DispatchToStubInCAvrf+0×14
22acfd9c 77c40865 00000041 00000000 0de2b398 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×117
22acfdc0 77c434b1 19f38f94 00000000 0de2b398 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
22acfdfc 77c41bb3 1beeaec8 16b96f50 1baeef00 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
22acfe20 77c45458 16b96f88 22acfe38 1beeaec8 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127

0×16 flags for !thread extension command are used to temporarily set the process context to the owning process and show the first three function call parameters. We see that the thread B is waiting for the critical section 06d30940 and we use user space !locks extension command to find who owns it after switching process context:

1: kd> .process /r /p 8a2c9d88
Implicit process is now 8a2c9d88
Loading User Symbols

1: kd> !ntsdexts.locks

CritSec +6d30940 at 06d30940
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       d6c
EntryCount         0
ContentionCount    1
*** Locked

Now we try to find a thread with TID d6c (thread C):

1: kd> !thread -t d6c
Looking for thread Cid = d6c ...
THREAD 890d8bb8  Cid 098c.0d6c  Teb: 7ff71000 Win32Thread: bc23cc20 WAIT: (Unknown) UserMode Non-Alertable
    890d8da4  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 000135ea:
Current LPC port d649a678
Not impersonating
DeviceMap                 d64028f0
Owning Process            8a2c9d88       Image:         Process.exe
Wait Start TickCount      237641         Ticks: 1657 (0:00:00:25.890)
Context Switch Count      2102                 LargeStack
UserTime                  00:00:00.734
KernelTime                00:00:00.234
Win32 Start Address msvcrt!_endthreadex (0×77b9b4bc)
Start Address kernel32!BaseThreadStartThunk (0×7c82b5f3)
Stack Init ba91d000 Current ba91cc08 Base ba91d000 Limit ba919000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
ba91cc20 8083e6a2 890d8c30 890d8bb8 890d8c60 nt!KiSwapContext+0×26
ba91cc4c 8083f164 890d8da4 890d8d78 890d8bb8 nt!KiSwapThread+0×284
ba91cc94 8093983f 890d8da4 00000011 8a2c9d01 nt!KeWaitForSingleObject+0×346
ba91cd50 80834d3f 000008bc 19c94f00 19c94f00 nt!NtRequestWaitReplyPort+0×776
ba91cd50 7c94ed54 000008bc 19c94f00 19c94f00 nt!KiFastCallEntry+0xfc
2709ebf4 7c941c94 77c42700 000008bc 19c94f00 ntdll!KiFastSystemCallRet
2709ebf8 77c42700 000008bc 19c94f00 19c94f00 ntdll!NtRequestWaitReplyPort+0xc
2709ec44 77c413ba 2709ec80 2709ec64 77c42c7f RPCRT4!LRPC_CCALL::SendReceive+0×230
2709ec50 77c42c7f 2709ec80 779b2770 2709f06c RPCRT4!I_RpcSendReceive+0×24
2709ec64 77cb219b 2709ecac 1957cfe4 1957ab38 RPCRT4!NdrSendReceive+0×2b
2709f04c 779b43a3 779b2770 779b1398 2709f06c RPCRT4!NdrClientCall2+0×22e




2709ff84 77b9b530 26658fb0 00000000 00000000 ComponentC!foo+0×18d
2709ffb8 7c82608b 26d9af70 00000000 00000000 msvcrt!_endthreadex+0xa3
2709ffec 00000000 77b9b4bc 26d9af70 00000000 kernel32!BaseThreadStart+0×34

We see that thread C makes another LPC call (MessageId 000135e) and waiting for a reply. Let’s find the server thread processing the message (thread D):

1: kd> !lpc message 000135ea
Searching message 135ea in threads ...
Client thread 890d8bb8 waiting a reply from 135ea                         
    Server thread 89010020 is working on message 135ea


1: kd> !thread 89010020 16
THREAD 89010020  Cid 09b4.1530  Teb: 7ff93000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8903ba28  Mutant - owning thread 89143020
Not impersonating
DeviceMap                 d64028f0
Owning Process            891b8b80       Image:         svchost.exe
Wait Start TickCount      237641         Ticks: 1657 (0:00:00:25.890)
Context Switch Count      8            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×000135ea
LPC Server thread working on message Id 135ea
Start Address kernel32!BaseThreadStartThunk (0×7c82b5f3)
Stack Init b9455000 Current b9454c60 Base b9455000 Limit b9452000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b9454c78 8083e6a2 89010098 89010020 890100c8 nt!KiSwapContext+0×26
b9454ca4 8083f164 00000000 00000000 00000000 nt!KiSwapThread+0×284
b9454cec 8092db70 8903ba28 00000006 00000001 nt!KeWaitForSingleObject+0×346
b9454d50 80834d3f 00000514 00000000 00000000 nt!NtWaitForSingleObject+0×9a
b9454d50 7c94ed54 00000514 00000000 00000000 nt!KiFastCallEntry+0xfc
02b5f720 7c942124 75fdbe44 00000514 00000000 ntdll!KiFastSystemCallRet
02b5f724 75fdbe44 00000514 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
02b5f744 75fdc57f 000e6014 000da62c 02b5fca0 ComponentD!bar+0×42



02b5f8c8 77c40f3b 000d0a48 02b5fc90 00000001 ComponentD!foo+0×49
02b5f8f8 77cb23f7 75fdf8f2 02b5fae0 00000007 RPCRT4!Invoke+0×30
02b5fcf8 77cb26ed 00000000 00000000 000d4f24 RPCRT4!NdrStubCall2+0×299
02b5fd14 77c409be 000d4f24 000b5d70 000d4f24 RPCRT4!NdrServerCall2+0×19
02b5fd48 77c4093f 75fff834 000d4f24 02b5fdec RPCRT4!DispatchToStubInCNoAvrf+0×38
02b5fd9c 77c40865 00000005 00000000 7600589c RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×117
02b5fdc0 77c434b1 000d4f24 00000000 7600589c RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
02b5fdfc 77c41bb3 000d3550 000a78d0 001054b8 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
02b5fe20 77c45458 000a7908 02b5fe38 000d3550 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
02b5ff84 77c2778f 02b5ffac 77c2f7dd 000a78d0 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
02b5ff8c 77c2f7dd 000a78d0 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd
02b5ffac 77c2de88 0008ae00 02b5ffec 7c82608b RPCRT4!BaseCachedThreadRoutine+0×9d
02b5ffb8 7c82608b 000d5c20 00000000 00000000 RPCRT4!ThreadStartRoutine+0×1b
02b5ffec 00000000 77c2de6d 000d5c20 00000000 kernel32!BaseThreadStart+0×34

We see that thread D is waiting for the mutant object owned by thread A (89143020). Therefore we have a deadlock spanning 2 process boundaries via RPC/LPC calls with the following dependency graph:

A (svchost.exe) LPC-> B (Process.exe) CritSec-> C (Process.exe) LPC-> D (svchost.exe) Obj-> A (svchost.exe)

- Dmitry Vostokov @ DumpAnalysis.org -

Five golden rules of troubleshooting

Monday, November 26th, 2007

It is difficult to analyze a problem when you have crash dumps and/or traces from various tracing tools and supporting information you have is incomplete or missing. After doing crash dump and trace analysis including ETW-based traces for more than 4 years I came up with this easy to remember 4WS questions to ask when you send or request traces and memory dumps:

What - What had happened or had been observed? Crash or hang, for example?

When - When did the problem happen if traces were recorded for hours?

Where - What server or workstation had been used for tracing or where memory dumps came from? For example, one trace is from a primary server and two others are from backup servers or one trace is from a client workstation and the other is from a server. 

Why - Why did a customer or a support engineer request a dump or a trace? This could shed the light on various assumptions including presuppositions hidden in problem description.  

Supporting information - needed to find a needle in a hay: process id, thread id, etc. Also, the answer to the following question is important: how dumps and traces were created?

Every trace or memory dump shall be accompanied by 4WS answers.  

4WS rule can be applied to any troubleshooting because even the problem description itself is some kind of a trace.

- Dmitry Vostokov @ DumpAnalysis.org -

Critical thinking when troubleshooting

Thursday, November 22nd, 2007

Faulty thinking happens all the time in technical support environments partly due to hectic and demanding business realities.

Simple*ology book pointed me to this website:

http://www.fallacyfiles.org/ 

which taxonomically organizes fallacies:

http://www.fallacyfiles.org/taxonomy.html

For example, False Cause. Technical examples might include false causes inferred from trace analysis, customer problem description that includes steps to reproduce the problem, etc. This also applies to debugging and importance of thinking skills has been emphasized in the following book:

Debugging by Thinking: A Multidisciplinary Approach

Surface-level of basic crash dump analysis is less influenced by false cause fallacies because it doesn’t have explicitly recorded sequence of events although some caution should be exercised during detailed analysis of thread waiting times and other historical information.   

Warning: when exercising critical thinking recursively we need to stop at the right time to avoid paralysis of analysis :-) 

- Dmitry Vostokov @ DumpAnalysis.org

Crash Dump Analysis Patterns (Part 37)

Wednesday, November 21st, 2007

Some bugs are fixed using brute-force approach via putting an exception handler to catch access violations and other exceptions. Long time ago I saw one such “incredible fix” when the image processing application was crashing after approximately Nth heap free runtime call. To ignore crashes a SEH handler was put in place but the application started to crash in different places. Therefore the additional fix was to skip free calls when approaching N and resume afterwards. The application started to crash less frequently.

Here getting Early Crash Dump when a first-chance exception happens can help in component identification before corruption starts spreading across data. Recall that when an access violation happens in a process thread in user mode the system generates the first-chance exception which can be caught by an attached debugger and if there is no such debugger the system tries to find an exception handler and if that exception handler catches and dismisses the exception the thread resumes its normal execution path. If there are no such handlers found the system generates the so called second-chance exception with the same exception context to notify the attached debugger and if it is not attached a default thread exception handler usually saves a postmortem user dump.

You can get first-chance exception memory dumps with:

Here is an example configuration rule for crashes in Debug Diagnostic tool for TestDefaultDebugger process (Unconfigured First Chance Exceptions option is set to Full Userdump):

    

When we push the big crash button in TestDefaultDebugger dialog box two crash dumps are saved, with first and second-chance exceptions pointing to the same code:

Loading Dump File [C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of TestDefaultDebugger.exe\TestDefaultDebugger__PID__4316__ Date__11_21_2007__Time_04_28_27PM__2__First chance exception 0XC0000005.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Dump created by DbgHost. First chance exception 0XC0000005′
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Wed Nov 21 16:28:27.000 2007 (GMT+0)
System Uptime: 0 days 23:45:34.711
Process Uptime: 0 days 0:01:09.000

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(10dc.590): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00000001 ecx=0017fe70 edx=00000000 esi=00425ae8 edi=0017fe70
eip=004014f0 esp=0017f898 ebp=0017f8a4 iopl=0 nv up ei ng nz ac pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 c7050000000000000000 mov dword ptr ds:[0],0  ds:002b:00000000=????????

Loading Dump File [C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of TestDefaultDebugger.exe\TestDefaultDebugger__PID__4316__ Date__11_21_2007__Time_04_28_34PM__693__ Second_Chance_Exception_C0000005.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Dump created by DbgHost. Second_Chance_Exception_C0000005
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Wed Nov 21 16:28:34.000 2007 (GMT+0)
System Uptime: 0 days 23:45:39.313
Process Uptime: 0 days 0:01:16.000

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(10dc.590): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00000001 ecx=0017fe70 edx=00000000 esi=00425ae8 edi=0017fe70
eip=004014f0 esp=0017f898 ebp=0017f8a4 iopl=0 nv up ei ng nz ac pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 c7050000000000000000 mov dword ptr ds:[0],0  ds:002b:00000000=????????

- Dmitry Vostokov @ DumpAnalysis.org -

NotMyLeak

Monday, November 19th, 2007

To troubleshoot and study memory leaks the following tool called NotMyLeak will be released soon. It injects different kinds of leaks into specified processes and system:

  • Process heap
  • Runtime library
  • Performance counters
  • Kernel paged pool
  • Kernel nonpaged pool
  • IRP
  • Handles
  • PTE
  • etc…

The idea is to model various real-time leaks, analyze memory dumps and then apply discovered patterns to crash dump analysis of memory dumps coming from real-world systems.   

The draft GUI (subject to change):

Note: the tool name prefix NotMy… was inspired by the name of Mark Russinovich’s tool called NotMyFault.

- Dmitry Vostokov @ DumpAnalysis.org

Windows Internals book

Monday, November 19th, 2007

Scheduled to be updated with Windows Vista and Windows Server 2008 details:

Windows® Internals, Fifth Edition

- Dmitry Vostokov @ DumpAnalysis.org

Filtering processes

Monday, November 19th, 2007

When I analyze memory dumps coming from Microsoft or Citrix terminal service environments I frequently need to find a process hosting terminal service. In Windows 2000 it was the separate process termsrv.exe and now it is termsrv.dll which can be loaded into any of several instances of svchost.exe. The simplest way to narrow down that svchost.exe process if we have a complete memory dump is to use the module option of WinDbg !process command:

!process /m termsrv.dll 0

!process /m wsxica.dll 0

!process /m ctxrdpwsx.dll 0

Note: this option works only with W2K3, XP and later OS

Also to list all processes with user space stacks having the same image name we can use:

!process 0 ff msiexec.exe

or  

!process 0 ff svchost.exe

Note: this command works with W2K too as well as session option (/s)

- Dmitry Vostokov @ DumpAnalysis.org

Exceptions Ab Initio

Friday, November 16th, 2007

Where do native exceptions come from? How do they propagate from hardware and eventually result in crash dumps? I was asking these questions when I started doing crash dump analysis more than four years ago and I tried to find answers using IA-32 Intel® Architecture Software Developer’s Manual, WinDbg and complete memory dumps.

Eventually I wrote some blog posts about my findings. They are buried between many other posts so I dug them out and put on a dedicated page:

Interrupts and Exceptions Explained

- Dmitry Vostokov @ DumpAnalysis.org