Crash Dump Analysis Patterns (Part 13b)

July 15th, 2007

Sometimes handle leaks also result in insufficient memory especially if handles point to structures allocated by OS. Here is the typical example of the handle leak resulted in freezing several servers. The complete memory dump shows exhausted non-paged pool:

0: kd> !vm

*** Virtual Memory Usage ***
 Physical Memory:     1048352 (   4193408 Kb)
 Page File: \??\C:\pagefile.sys
   Current:   4190208 Kb  Free Space:   3749732 Kb
   Minimum:   4190208 Kb  Maximum:      4190208 Kb
 Available Pages:      697734 (   2790936 Kb)
 ResAvail Pages:       958085 (   3832340 Kb)
 Locked IO Pages:          95 (       380 Kb)
 Free System PTEs:     199971 (    799884 Kb)
 Free NP PTEs:            105 (       420 Kb)
 Free Special NP:           0 (         0 Kb)
 Modified Pages:          195 (       780 Kb)
 Modified PF Pages:       195 (       780 Kb)
 NonPagedPool Usage:    65244 (    260976 Kb)
 NonPagedPool Max:      65503 (    262012 Kb)
 ********** Excessive NonPaged Pool Usage *****

 PagedPool 0 Usage:      6576 (     26304 Kb)
 PagedPool 1 Usage:       629 (      2516 Kb)
 PagedPool 2 Usage:       624 (      2496 Kb)
 PagedPool 3 Usage:       608 (      2432 Kb)
 PagedPool 4 Usage:       625 (      2500 Kb)
 PagedPool Usage:        9062 (     36248 Kb)
 PagedPool Maximum:     66560 (    266240 Kb)

********** 184 pool allocations have failed **********

 Shared Commit:          7711 (     30844 Kb)
 Special Pool:              0 (         0 Kb)
 Shared Process:        10625 (     42500 Kb)
 PagedPool Commit:       9102 (     36408 Kb)
 Driver Commit:          1759 (      7036 Kb)
 Committed pages:      425816 (   1703264 Kb)
 Commit limit:        2052560 (   8210240 Kb)

Looking at non-paged pool consumption reveals excessive number of thread objects:

0: kd> !poolused 3
   Sorting by  NonPaged Pool Consumed

  Pool Used:
            NonPaged
 Tag    Allocs    Frees     Diff     Used
 Thre   772672   463590   309082 192867168  Thread objects , Binary: nt!ps
 MmCm       42        9       33 12153104   Calls made to MmAllocateContiguousMemory , Binary: nt!mm


The next logical step would be to list processes and find their handle usage. Indeed there is such a process:

0: kd> !process 0 0



PROCESS 88b75020  SessionId: 7  Cid: 172e4    Peb: 7ffdf000  ParentCid: 17238
    DirBase: c7fb6bc0  ObjectTable: e17f50a0  HandleCount: 143428.
    Image: iexplore.exe


Making the process current and listing its handles shows contiguously allocated handles to thread objects:

0: kd> .process 88b75020
Implicit process is now 88b75020
0: kd> .reload /user

0: kd> !handle



0d94: Object: 88a6b020  GrantedAccess: 001f03ff Entry: e35e1b28
Object: 88a6b020  Type: (8b780c68) Thread
    ObjectHeader: 88a6b008
        HandleCount: 1  PointerCount: 1

0d98: Object: 88a97320  GrantedAccess: 001f03ff Entry: e35e1b30
Object: 88a97320  Type: (8b780c68) Thread
    ObjectHeader: 88a97308
        HandleCount: 1  PointerCount: 1

0d9c: Object: 88b2b020  GrantedAccess: 001f03ff Entry: e35e1b38
Object: 88b2b020  Type: (8b780c68) Thread
    ObjectHeader: 88b2b008
        HandleCount: 1  PointerCount: 1

0da0: Object: 88b2a730  GrantedAccess: 001f03ff Entry: e35e1b40
Object: 88b2a730  Type: (8b780c68) Thread
    ObjectHeader: 88b2a718
        HandleCount: 1  PointerCount: 1

0da4: Object: 88b929a0  GrantedAccess: 001f03ff Entry: e35e1b48
Object: 88b929a0  Type: (8b780c68) Thread
    ObjectHeader: 88b92988
        HandleCount: 1  PointerCount: 1

0da8: Object: 88a57db0  GrantedAccess: 001f03ff Entry: e35e1b50
Object: 88a57db0  Type: (8b780c68) Thread
    ObjectHeader: 88a57d98
        HandleCount: 1  PointerCount: 1

0dac: Object: 88b92db0  GrantedAccess: 001f03ff Entry: e35e1b58
Object: 88b92db0  Type: (8b780c68) Thread
    ObjectHeader: 88b92d98
        HandleCount: 1  PointerCount: 1

0db0: Object: 88b4a730  GrantedAccess: 001f03ff Entry: e35e1b60
Object: 88b4a730  Type: (8b780c68) Thread
    ObjectHeader: 88b4a718
        HandleCount: 1  PointerCount: 1

0db4: Object: 88a7e730  GrantedAccess: 001f03ff Entry: e35e1b68
Object: 88a7e730  Type: (8b780c68) Thread
    ObjectHeader: 88a7e718
        HandleCount: 1  PointerCount: 1

0db8: Object: 88a349a0  GrantedAccess: 001f03ff Entry: e35e1b70
Object: 88a349a0  Type: (8b780c68) Thread
    ObjectHeader: 88a34988
        HandleCount: 1  PointerCount: 1

0dbc: Object: 88a554c0  GrantedAccess: 001f03ff Entry: e35e1b78
Object: 88a554c0  Type: (8b780c68) Thread
    ObjectHeader: 88a554a8
        HandleCount: 1  PointerCount: 1


Examination of these threads shows their stack traces and start address:

0: kd> !thread 88b4a730
THREAD 88b4a730  Cid 0004.1885c  Teb: 00000000 Win32Thread: 00000000 TERMINATED
Not impersonating
DeviceMap                 e1000930
Owning Process            8b7807a8       Image:         System
Wait Start TickCount      975361         Ticks: 980987 (0:04:15:27.921)
Context Switch Count      1
UserTime                  00:00:00.0000
KernelTime                00:00:00.0000
Start Address mydriver!StatusWaitThread (0xf5c5d128)
Stack Init 0 Current f3c4cc98 Base f3c4d000 Limit f3c4a000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child
f3c4ccac 8083129e ffdff5f0 8697ba00 a674c913 hal!KfLowerIrql+0×62
f3c4ccc8 00000000 808ae498 8697ba00 00000000 nt!KiExitDispatcher+0×130

0: kd> !thread 88a554c0
THREAD 88a554c0  Cid 0004.1888c  Teb: 00000000 Win32Thread: 00000000 TERMINATED
Not impersonating
DeviceMap                 e1000930
Owning Process            8b7807a8       Image:         System
Wait Start TickCount      975380         Ticks: 980968 (0:04:15:27.625)
Context Switch Count      1
UserTime                  00:00:00.0000
KernelTime                00:00:00.0000
Start Address mydriver!StatusWaitThread (0xf5c5d128)
Stack Init 0 Current f3c4cc98 Base f3c4d000 Limit f3c4a000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child
f3c4ccac 8083129e ffdff5f0 8697ba00 a674c913 hal!KfLowerIrql+0×62
f3c4ccc8 00000000 808ae498 8697ba00 00000000 nt!KiExitDispatcher+0×130

We can see that they have been terminated and their start address belongs to mydriver.sys. Therefore we can say that mydriver code has to be examined to find the source of the handle leak.

- Dmitry Vostokov @ DumpAnalysis.org -

Music for Debugging

July 15th, 2007

Debugging and understanding multithreaded programs is hard and sometimes it requires running several execution paths mentally. Listening to composers who use multithreading in music can help here. My favourite is J.S. Bach and I recently purchased his complete works (155 CD box from Brilliant Classics). Virtuoso music helps me in live debugging too and here my favourites are Chopin and Liszt. I recently purchased 30 CD box of complete Chopin works (also from Brilliant Classics).

Many software engineers listen to music when writing code and I’m not the exception. However, I have found that not all music suitable for programming helps me during debugging sessions.

Music for relaxation, quiet classical or modern music helps me to think about program design and write solid code. Music with several melodies played simultaneously, heroic and virtuoso works help me to achieve breakthrough and find a bug. The latter kind of music also suits me for listening when doing crash dump analysis or problem troubleshooting.

In 1997 I read a wonderful book “Zen of Windows 95 Programming: Master the Art of Moving to Windows 95 and Creating High-Performance Windows Applications” written by Lou Grinzo and this book provides some music suggestions to listen to while doing programming.

Recently I’ve found one research project related to audio debugging, mapping source code to musical structures

http://www.wsu.edu/~stefika/ProgramAuralization.html

- Dmitry Vostokov @ DumpAnalysis.org -

Interrupts and exceptions explained (Part 4)

July 15th, 2007

The previous part discussed processor interrupts in user mode. In this part I will explain WinDbg .trap command and show how to simulate it manually.

Upon an interrupt a processor saves the current instruction pointer and transfers execution to an interrupt handler as explained in the first part of these series. This interrupt handler has to save a full thread context before calling other functions to do complex interrupt processing. For example, if we disassemble KiTrap0E handler from x86 Windows 2003 crash dump we would see that it saves a lot of registers including segment registers:

3: kd> uf nt!KiTrap0E
...
...
...
nt!KiTrap0E:
e088bb2c mov     word ptr [esp+2],0
e088bb33 push    ebp
e088bb34 push    ebx
e088bb35 push    esi
e088bb36 push    edi
e088bb37 push    fs
e088bb39 mov     ebx,30h
e088bb3e mov     fs,bx
e088bb41 mov     ebx,dword ptr fs:[0]
e088bb48 push    ebx
e088bb49 sub     esp,4
e088bb4c push    eax
e088bb4d push    ecx
e088bb4e push    edx
e088bb4f push    ds
e088bb50 push    es
e088bb51 push    gs
e088bb53 mov     ax,23h
e088bb57 sub     esp,30h
e088bb5a mov     ds,ax
e088bb5d mov     es,ax
e088bb60 mov     ebp,esp
e088bb62 test    dword ptr [esp+70h],20000h
e088bb6a jne     nt!V86_kite_a (e088bb04)
...
...
...

The saved processor state information (context) forms the so called Windows kernel trap frame:

3: kd> dt _KTRAP_FRAME
+0x000 DbgEbp           : Uint4B
+0x004 DbgEip           : Uint4B
+0x008 DbgArgMark       : Uint4B
+0x00c DbgArgPointer    : Uint4B
+0x010 TempSegCs        : Uint4B
+0x014 TempEsp          : Uint4B
+0x018 Dr0              : Uint4B
+0x01c Dr1              : Uint4B
+0x020 Dr2              : Uint4B
+0x024 Dr3              : Uint4B
+0x028 Dr6              : Uint4B
+0x02c Dr7              : Uint4B
+0x030 SegGs            : Uint4B
+0x034 SegEs            : Uint4B
+0x038 SegDs            : Uint4B
+0x03c Edx              : Uint4B
+0x040 Ecx              : Uint4B
+0x044 Eax              : Uint4B
+0x048 PreviousPreviousMode : Uint4B
+0x04c ExceptionList    : Ptr32 _EXCEPTION_REGISTRATION_RECORD
+0x050 SegFs            : Uint4B
+0x054 Edi              : Uint4B
+0x058 Esi              : Uint4B
+0x05c Ebx              : Uint4B
+0x060 Ebp              : Uint4B
+0x064 ErrCode          : Uint4B
+0x068 Eip              : Uint4B
+0x06c SegCs            : Uint4B
+0x070 EFlags           : Uint4B
+0x074 HardwareEsp      : Uint4B
+0x078 HardwareSegSs    : Uint4B
+0x07c V86Es            : Uint4B
+0x080 V86Ds            : Uint4B
+0x084 V86Fs            : Uint4B
+0x088 V86Gs            : Uint4B

This Windows trap frame is not the same as an interrupt frame a processor saves on a current thread stack when an interrupt occurs in kernel mode. The latter frame is very small and consists only of EIP, CS, EFLAGS and ErrorCode. When an interrupt occurs in user mode an x86 processor additionally saves the current stack pointer SS:ESP.

The .trap command finds the trap frame on a current thread stack and sets the current thread register context using the values from that saved structure. You can see that command in action for certain bugchecks when you use !analyze -v:

3: kd> !analyze -v
KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
...
...
...
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: de65190c, The address that the exception occurred at
Arg3: f24f8a74, Trap Frame
Arg4: 00000000



TRAP_FRAME:  f24f8a74 — (.trap fffffffff24f8a74)
.trap fffffffff24f8a74
ErrCode = 00000000
eax=dbc128c0 ebx=dbe4a010 ecx=f24f8ac4 edx=00000001 esi=46525356 edi=00000000
eip=de65190c esp=f24f8ae8 ebp=f24f8b18 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
driver!foo+0×16:
de65190c 837e1c00         cmp     dword ptr [esi+1Ch],0 ds:0023:46525372=????????


If we look at the trap frame we would see the same register values that WinDbg reports above:

3: kd> dt _KTRAP_FRAME f24f8a74
+0x000 DbgEbp           : 0xf24f8b18
+0x004 DbgEip           : 0xde65190c
+0x008 DbgArgMark       : 0xbadb0d00
+0x00c DbgArgPointer    : 1
+0x010 TempSegCs        : 0xb0501cd
+0x014 TempEsp          : 0xdcc01cd0
+0x018 Dr0              : 0xf24f8aa8
+0x01c Dr1              : 0xde46c90a
+0x020 Dr2              : 0
+0x024 Dr3              : 0
+0x028 Dr6              : 0xdbe4a000
+0x02c Dr7              : 0
+0x030 SegGs            : 0
+0x034 SegEs            : 0x23
+0x038 SegDs            : 0x23
+0x03c Edx              : 1
+0x040 Ecx              : 0xf24f8ac4
+0x044 Eax              : 0xdbc128c0
+0x048 PreviousPreviousMode : 0xdbe4a010
+0x04c ExceptionList    : 0xffffffff _EXCEPTION_REGISTRATION_RECORD
+0x050 SegFs            : 0x30
+0x054 Edi              : 0
+0x058 Esi              : 0x46525356
+0x05c Ebx              : 0xdbe4a010
+0x060 Ebp              : 0xf24f8b18
+0x064 ErrCode          : 0
+0x068 Eip              : 0xde65190c ; driver!foo+0x16
+0x06c SegCs            : 8
+0x070 EFlags           : 0x10206
+0x074 HardwareEsp      : 0xdbc171b0
+0x078 HardwareSegSs    : 0xde667677
+0x07c V86Es            : 0xdbc128c0
+0x080 V86Ds            : 0xdbc171c4
+0x084 V86Fs            : 0xf24f8bc4
+0x088 V86Gs            : 0

It is good to know how to find a trap frame manually in the case the stack is corrupt or WinDbg cannot find a trap frame automatically. In this case we can take the advantage of the fact that DS and ES segment registers have the same value in the Windows flat memory model:

   +0x034 SegEs            : 0x23
+0x038 SegDs            : 0x23

We need to find 2 consecutive 0×23 values on the stack. There may be several such places but usually the correct one comes between KiTrapXX address on the stack and the initial processor trap frame shown below in red. This is because KiTrapXX obviously calls other functions to further process an interrupt so its return address is saved on the stack.

3: kd> r
eax=f535713c ebx=de65190c ecx=00000000 edx=e088e1d2 esi=f5357120 edi=00000000
eip=e0827451 esp=f24f8628 ebp=f24f8640 iopl=0 nv up ei ng nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
nt!KeBugCheckEx+0×1b:
e0827451 5d              pop     ebp

3: kd> dds f24f8628 f24f8628+1000
...
...
...
f24f8784  de4b2995 win32k!NtUserQueryWindow
f24f8788  00000000
f24f878c  fe76a324
f24f8790  f24f8d64
f24f8794  0006e43c
f24f8798  e087c041 nt!ExReleaseResourceAndLeaveCriticalRegion+0x5
f24f879c  83f3b801
f24f87a0  f24f8a58
f24f87a4  0000003b
f24f87a8  00000000
f24f87ac  00000030
f24f87b0  00000023
f24f87b4  00000023

f24f87b8  00000000



f24f8a58  00000111
f24f8a5c  f24f8a74
f24f8a60  e088bc08 nt!KiTrap0E+0xdc
f24f8a64  00000000
f24f8a68  46525372
f24f8a6c  00000000
f24f8a70  e0889686 nt!Kei386EoiHelper+0×186
f24f8a74  f24f8b18
f24f8a78  de65190c driver!foo+0×16
f24f8a7c  badb0d00
f24f8a80  00000001
f24f8a84  0b0501cd
f24f8a88  dcc01cd0
f24f8a8c  f24f8aa8
f24f8a90  de46c90a win32k!HANDLELOCK::vLockHandle+0×80
f24f8a94  00000000
f24f8a98  00000000
f24f8a9c  dbe4a000
f24f8aa0  00000000
f24f8aa4  00000000
f24f8aa8  00000023
f24f8aac  00000023

f24f8ab0  00000001
f24f8ab4  f24f8ac4
f24f8ab8  dbc128c0
f24f8abc  dbe4a010
f24f8ac0  ffffffff
f24f8ac4  00000030
f24f8ac8  00000000
f24f8acc  46525356
f24f8ad0  dbe4a010
f24f8ad4  f24f8b18
f24f8ad8  00000000
f24f8adc  de65190c driver!foo+0×16
f24f8ae0  00000008
f24f8ae4  00010206

f24f8ae8  dbc171b0
f24f8aec  de667677 driver!bar+0×173
f24f8af0  dbc128c0
f24f8af4  dbc171c4
f24f8af8  f24f8bc4
f24f8afc  00000000


Subtracting the offset 0×38 from the address of the 00000023 value (f24f8aac) and using dt command we can check _KTRAP_FRAME structure and apply .trap command afterwards:

3: kd> dt _KTRAP_FRAME f24f8aac-38
+0x000 DbgEbp           : 0xf24f8b18
+0x004 DbgEip           : 0xde65190c
+0x008 DbgArgMark       : 0xbadb0d00
+0x00c DbgArgPointer    : 1
+0x010 TempSegCs        : 0xb0501cd
+0x014 TempEsp          : 0xdcc01cd0
+0x018 Dr0              : 0xf24f8aa8
+0x01c Dr1              : 0xde46c90a
+0x020 Dr2              : 0
+0x024 Dr3              : 0
+0x028 Dr6              : 0xdbe4a000
+0x02c Dr7              : 0
+0x030 SegGs            : 0
+0x034 SegEs            : 0x23
+0x038 SegDs            : 0x23
+0x03c Edx              : 1
+0x040 Ecx              : 0xf24f8ac4
+0x044 Eax              : 0xdbc128c0
+0x048 PreviousPreviousMode : 0xdbe4a010
+0x04c ExceptionList    : 0xffffffff _EXCEPTION_REGISTRATION_RECORD
+0x050 SegFs            : 0x30
+0x054 Edi              : 0
+0x058 Esi              : 0x46525356
+0x05c Ebx              : 0xdbe4a010
+0x060 Ebp              : 0xf24f8b18
+0x064 ErrCode          : 0
+0x068 Eip              : 0xde65190c
+0x06c SegCs            : 8
+0x070 EFlags           : 0x10206
+0x074 HardwareEsp      : 0xdbc171b0
+0x078 HardwareSegSs    : 0xde667677
+0x07c V86Es            : 0xdbc128c0
+0x080 V86Ds            : 0xdbc171c4
+0x084 V86Fs            : 0xf24f8bc4
+0x088 V86Gs            : 0

3: kd> ? f24f8aac-38
Evaluate expression: -229668236 = f24f8a74

3: kd> .trap f24f8a74
ErrCode = 00000000
eax=dbc128c0 ebx=dbe4a010 ecx=f24f8ac4 edx=00000001 esi=46525356 edi=00000000
eip=de65190c esp=f24f8ae8 ebp=f24f8b18 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
driver!foo+0x16:
de65190c 837e1c00        cmp     dword ptr [esi+1Ch],0 ds:0023:46525372=????????

In complete memory dumps we can see that _KTRAP_FRAME is saved when calling system services too:

3: kd> kL
ChildEBP RetAddr
f24f8ae8 de667677 driver!foo+0x16
f24f8b18 de667799 driver!bar+0x173
f24f8b90 de4a853e win32k!GreSaveScreenBits+0x69
f24f8bd8 de4922bd win32k!CreateSpb+0x167
f24f8c40 de490bb8 win32k!zzzChangeStates+0x448
f24f8c88 de4912de win32k!zzzBltValidBits+0xe2
f24f8ce0 de4926c6 win32k!xxxEndDeferWindowPosEx+0x13a
f24f8cfc de49aa8f win32k!xxxSetWindowPos+0xb1
f24f8d34 de4acf4d win32k!xxxShowWindow+0x201
f24f8d54 e0888c6c win32k!NtUserShowWindow+0x79
f24f8d54 7c94ed54 nt!KiFastCallEntry+0xfc (TrapFrame @ f24f8d64)
0006e48c 77e34f1d ntdll!KiFastSystemCallRet
0006e53c 77e2f12f USER32!NtUserShowWindow+0xc
0006e570 77e2b0fe USER32!InternalDialogBox+0xa9
0006e590 77e29005 USER32!DialogBoxIndirectParamAorW+0×37
0006e5b4 0103d569 USER32!DialogBoxParamW+0×3f
0006e5d8 0102d2f5 winlogon!Fusion_DialogBoxParam+0×24

and we can get the current thread context before its transition to kernel mode:

3: kd> .trap f24f8d64
ErrCode = 00000000
eax=7ffff000 ebx=00000000 ecx=00000000 edx=7c94ed54 esi=00532e68 edi=0002002c
eip=7c94ed54 esp=0006e490 ebp=0006e53c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
001b:7c94ed54 c3              ret

3: kd> kL
ChildEBP RetAddr
0006e48c 77e34f1d ntdll!KiFastSystemCallRet
0006e53c 77e2f12f USER32!NtUserShowWindow+0xc
0006e570 77e2b0fe USER32!InternalDialogBox+0xa9
0006e590 77e29005 USER32!DialogBoxIndirectParamAorW+0x37
0006e5b4 0103d569 USER32!DialogBoxParamW+0x3f
0006e5d8 0102d2f5 winlogon!Fusion_DialogBoxParam+0x24

In the next part I’ll show an example from an x64 crash dump.

- Dmitry Vostokov @ DumpAnalysis.org -

Reading Windows-based Code (Part 1)

July 13th, 2007

As promised here is the first introductory part of the Code Reading (The Windows Perspective) training. You might need to download and install Microsoft Office Animation Runtime if you don’t have PowerPoint installed:

PowerPoint 2003/2002 Add-in: Office Animation Runtime 

The HTML version of the presentation is located here:

Reading Windows-based Code (Part 1)

- Dmitry Vostokov @ DumpAnalysis.org -

StressPrinters update

July 12th, 2007

The new version 1.3.1 has been published and can be downloaded from Citrix technical support:

StressPrinters 1.3.1 for 32-bit and 64-bit platforms

What’s new:

  1. Configurable timeout to mark potential printer drivers in the log 

  2. The log structure and warnings are documented in the article with an example

  3. AddPrinter command line section in the article for fine-tuning tests

  4. The option to execute a post-processing command after tests 

The motivation behind the creation of this tool is explained in the previous post:

StressPrinters: Stressing Printer Autocreation 

- Dmitry Vostokov @ DumpAnalysis.org -

Troubleshooting as debugging

July 11th, 2007

This post is motivated by TRAFFIC steps introduced by Andreas Zeller in his book ”Why Programs Fail?”. This book is wonderful and it gives practical debugging skills coherent and solid systematical foundation.

However these steps are for fixing defects in code, the traditional view of the software debugging process. Based on an analogy with systems theories where we have different levels of abstraction like psychology, biology, chemistry and physics, I would say that debugging starts when you have the failure at the system level.

If we compare systems to applications, troubleshooting to source code debugging, the question we ask at the higher level is “Who caused the product to fail?” which also has a business and political flavor. Therefore I propose a different acronym: VERSION. If you always try to fix system problems at the code level you will get a huge “traffic” in all sense but if you troubleshoot them first you get a different system / subsystem / component version and get your problem solved faster. This is why we have technical support departments in organizations. 

There are some parallels between TRAFFIC and VERSION steps:

Track                     View the problem
Reproduce                 Environment/repro steps
Automate (and simplify)   Relevant description
Find origins              Subsystem/component
                             identification
Focus                     Identify the origin
                             (subsystem/component)
Isolate (defect in code)  Obtain the solution
                             (replace/eliminate
                              subsystem/component)
Correct (defect in code)  New case study
                             (document,
                              postmortem analysis)

Troubleshooting doesn’t eliminate the need to look at source code. In many cases a support engineer has to be proficient in code reading skill to be able to map from traces to source code. This will help in component identification, especially if your product has extensive tracing facility. I have started development of  ”Code Reading” training targeted for Windows environments and will post some presentations soon. The first one will be available tomorrow, so stay tuned.

- Dmitry Vostokov @ DumpAnalysis.org -

WinDbg is privacy-aware

July 8th, 2007

This is a quick follow up to my previous post about privacy issues related to crash dumps. WinDbg has two options for .dump command to remove the potentially sensitive user data (from WinDbg help):

  • r  - Deletes from the minidump those portions of the stack and store memory that are not useful for recreating the stack trace. Local variables and other data type values are deleted as well. This option does not make the minidump smaller (because these memory sections are simply zeroed), but it is useful if you want to protect the privacy of other applications. 

  • R  - Deletes the full module paths from the minidump. Only the module names will be included. This is a useful option if you want to protect the privacy of the user’s directory structure.

Therefore it is possible to configure CDB or WinDbg as default postmortem debuggers and avoid process data to be included. Most of stack is zeroed except frame data pointers and return addresses used to reconstruct stack trace. Therefore string constants like passwords are eliminated. I made the following test with CDB configured as the default post-mortem debugger on my Windows x64 server:

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug
Debugger="C:\Program Files (x86)\Debugging Tools for Windows\cdb.exe" -p %ld -e %ld -g -c ".dump /o /mrR /u c:\temp\safedump.dmp; q"

I got the following stack trace from TestDefaultDebugger (module names and function offsets are removed for visual clarity):

0:000> kvL 100
ChildEBP RetAddr  Args to Child
002df868 00403263 00425ae8 00000000 002df8a8 OnBnClickedButton1
002df878 00403470 002dfe90 00000000 00000000 _AfxDispatchCmdMsg
002df8a8 00402a27 00000000 00000000 00000000 OnCmdMsg
002df8cc 00408e69 00000000 00000000 00000000 OnCmdMsg
002df91c 004098d9 00000000 00580a9e 00000000 OnCommand
002df9b8 00406258 00000000 00000000 00580a9e OnWndMsg
002df9d8 0040836d 00000000 00000000 00580a9e WindowProc
002dfa40 004083f4 00000000 00000000 00000000 AfxCallWndProc
002dfa60 7d9472d8 00000000 00000000 00000000 AfxWndProc
002dfa8c 7d9475c3 004083c0 00000000 00000000 InternalCallWinProc
002dfb04 7d948626 00000000 004083c0 00000000 UserCallWinProcCheckWow
002dfb48 7d94868d 00000000 00000000 00000000 SendMessageWorker
002dfb6c 7dbf87b3 00000000 00000000 00000000 SendMessageW
002dfb8c 7dbf8895 00000000 00000000 00000000 Button_NotifyParent
002dfba8 7dbfab9a 00000000 00000000 002dfcb0 Button_ReleaseCapture
002dfc38 7d9472d8 00580a9e 00000000 00000000 Button_WndProc
002dfc64 7d9475c3 7dbfa313 00580a9e 00000000 InternalCallWinProc
002dfcdc 7d9477f6 00000000 7dbfa313 00580a9e UserCallWinProcCheckWow
002dfd54 7d947838 00000000 00000000 002dfd90 DispatchMessageWorker
002dfd64 7d956ca0 00000000 00000000 002dfe90 DispatchMessageW
002dfd90 0040568b 00000000 00000000 002dfe90 IsDialogMessageW
002dfda0 004065d8 00000000 00402a07 00000000 IsDialogMessageW
002dfda8 00402a07 00000000 00000000 00000000 PreTranslateInput
002dfdb8 00408041 00000000 00000000 002dfe90 PreTranslateMessage
002dfdc8 00403ae3 00000000 00000000 00000000 WalkPreTranslateTree
002dfddc 00403c1e 00000000 00403b29 00000000 AfxInternalPreTranslateMessage
002dfde4 00403b29 00000000 00403c68 00000000 PreTranslateMessage
002dfdec 00403c68 00000000 00000000 002dfe90 AfxPreTranslateMessage
002dfdfc 00407920 00000000 002dfe90 002dfe6c AfxInternalPumpMessage
002dfe20 004030a1 00000000 00000000 0042ec18 CWnd::RunModalLoop
002dfe6c 0040110d 00000000 0042ec18 0042ec18 CDialog::DoModal
002dff18 004206fb 00000000 00000000 00000000 InitInstance
002dff28 0040e852 00400000 00000000 00000000 AfxWinMain
002dffc0 7d4e992a 00000000 00000000 00000000 __tmainCRTStartup
002dfff0 00000000 0040e8bb 00000000 00000000 BaseProcessStart

We can see that most arguments are zeroes. Those that are not either do not point to valid data or related to function return addresses and frame pointers. This can be seen from the raw stack data as well:

0:000> dds esp
002df86c  00403263 TestDefaultDebugger!_AfxDispatchCmdMsg+0x43
002df870  00425ae8 TestDefaultDebugger!CTestDefaultDebuggerApp::`vftable'+0x154
002df874  00000000
002df878  002df8a8
002df87c  00403470 TestDefaultDebugger!CCmdTarget::OnCmdMsg+0x118
002df880  002dfe90
002df884  00000000
002df888  00000000
002df88c  004014f0 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1
002df890  00000000
002df894  00000000
002df898  00000000
002df89c  002dfe90
002df8a0  00000000
002df8a4  00000000
002df8a8  002df8cc
002df8ac  00402a27 TestDefaultDebugger!CDialog::OnCmdMsg+0x1b
002df8b0  00000000
002df8b4  00000000
002df8b8  00000000
002df8bc  00000000
002df8c0  00000000
002df8c4  002dfe90
002df8c8  00000000
002df8cc  002df91c
002df8d0  00408e69 TestDefaultDebugger!CWnd::OnCommand+0x90
002df8d4  00000000
002df8d8  00000000
002df8dc  00000000
002df8e0  00000000
002df8e4  002dfe90
002df8e8  002dfe90

We can compare it with the normal full or minidump saved with other /m options. The data zeroed when we use /mr option is shown in red color (module names and function offsets are removed for visual clarity):

0:000> kvL 100
ChildEBP RetAddr Args to Child
002df868 00403263 00425ae8 00000111 002df8a8 OnBnClickedButton1
002df878 00403470 002dfe90 000003e8 00000000 _AfxDispatchCmdMsg
002df8a8 00402a27 000003e8 00000000 00000000 OnCmdMsg
002df8cc 00408e69 000003e8 00000000 00000000 OnCmdMsg
002df91c 004098d9 00000000 00271876 d5b6c7f7 OnCommand
002df9b8 00406258 00000111 000003e8 00271876 OnWndMsg
002df9d8 0040836d 00000111 000003e8 00271876 WindowProc
002dfa40 004083f4 00000000 00561878 00000111 AfxCallWndProc
002dfa60 7d9472d8 00561878 00000111 000003e8 AfxWndProc
002dfa8c 7d9475c3 004083c0 00561878 00000111 InternalCallWinProc
002dfb04 7d948626 00000000 004083c0 00561878 UserCallWinProcCheckWow
002dfb48 7d94868d 00aec860 00000000 00000111 SendMessageWorker
002dfb6c 7dbf87b3 00561878 00000111 000003e8 SendMessageW
002dfb8c 7dbf8895 002ec9e0 00000000 0023002c Button_NotifyParent
002dfba8 7dbfab9a 002ec9e0 00000001 002dfcb0 Button_ReleaseCapture
002dfc38 7d9472d8 00271876 00000202 00000000 Button_WndProc
002dfc64 7d9475c3 7dbfa313 00271876 00000202 InternalCallWinProc
002dfcdc 7d9477f6 00000000 7dbfa313 00271876 UserCallWinProcCheckWow
002dfd54 7d947838 002e77f8 00000000 002dfd90 DispatchMessageWorker
002dfd64 7d956ca0 002e77f8 00000000 002dfe90 DispatchMessageW
002dfd90 0040568b 00561878 00000000 002dfe90 IsDialogMessageW
002dfda0 004065d8 002e77f8 00402a07 002e77f8 IsDialogMessageW
002dfda8 00402a07 002e77f8 002e77f8 00561878 PreTranslateInput
002dfdb8 00408041 002e77f8 002e77f8 002dfe90 PreTranslateMessage
002dfdc8 00403ae3 00561878 002e77f8 002e77f8 WalkPreTranslateTree
002dfddc 00403c1e 002e77f8 00403b29 002e77f8 AfxInternalPreTranslateMessage
002dfde4 00403b29 002e77f8 00403c68 002e77f8 PreTranslateMessage
002dfdec 00403c68 002e77f8 00000000 002dfe90 AfxPreTranslateMessage
002dfdfc 00407920 00000004 002dfe90 002dfe6c AfxInternalPumpMessage
002dfe20 004030a1 00000004 d5b6c023 0042ec18 RunModalLoop
002dfe6c 0040110d d5b6c037 0042ec18 0042ec18 DoModal
002dff18 004206fb 00000ece 00000002 00000001 InitInstance
002dff28 0040e852 00400000 00000000 001d083e AfxWinMain
002dffc0 7d4e992a 00000000 00000000 7efdf000 __tmainCRTStartup
002dfff0 00000000 0040e8bb 00000000 000000c8 BaseProcessStart

0:000> dds esp
002df86c 00403263 TestDefaultDebugger!_AfxDispatchCmdMsg+0x43
002df870 00425ae8 TestDefaultDebugger!CTestDefaultDebuggerApp::`vftable'+0x154
002df874 00000111
002df878 002df8a8
002df87c 00403470 TestDefaultDebugger!CCmdTarget::OnCmdMsg+0×118
002df880 002dfe90
002df884 000003e8
002df888 00000000
002df88c 004014f0 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1
002df890 00000000
002df894 00000038
002df898 00000000
002df89c 002dfe90
002df8a0 000003e8
002df8a4 00000000
002df8a8 002df8cc
002df8ac 00402a27 TestDefaultDebugger!CDialog::OnCmdMsg+0×1b
002df8b0 000003e8
002df8b4 00000000
002df8b8 00000000
002df8bc 00000000
002df8c0 000003e8
002df8c4 002dfe90
002df8c8 00000000
002df8cc 002df91c
002df8d0 00408e69 TestDefaultDebugger!CWnd::OnCommand+0×90
002df8d4 000003e8
002df8d8 00000000
002df8dc 00000000
002df8e0 00000000
002df8e4 002dfe90
002df8e8 002dfe90

- Dmitry Vostokov @ DumpAnalysis.org -

WinDbg update 6.7.5.1

July 8th, 2007

The new WinDbg has been released this week: 6.7.5.1. Contains some enhancements since 6.7.5.0 released earlier in April.

What’s New for Debugging Tools for Windows

One improvement is for handling mini-dumps:

When loading modules from a user-mode minidump provide available misc and CV record info from dump. This can allow symbols to be loaded in some cases when PE image file is not available.

- Dmitry Vostokov @ DumpAnalysis.org -

Citrixware

July 8th, 2007

Citrix is a global leader in application delivery and access infrastructure solutions including application streaming and virtualization. There are so many great products developed by this company including WinFrame, MetaFrame, Presentation Server and its clients, Desktop Server, XenServer, XenApp, XenDesktop, Receiver and Dazzle, Access Gateway, Application Firewall, Application Gateway, NetScaler, WANScaler, GoToMeeting, GoToMyPC, GoToWebinar, GoToAssist, EdgeSight and Password Manager.

Citrix is no longer tied to Windows platforms because its products run on Linux, Solaris, FreeBSD, HP-UX, AIX, Symbian and Mac OS X as well. To bind them all together I propose to use the word “Citrixware”.

With more than 180,000 organizations in the world using Citrixware the chances are that you use it too. This is more encompassing word than just simple “accessware”. 

- Dmitry Vostokov @ DumpAnalysis.org -

Resolving security issues with crash dumps

July 8th, 2007

It is a well known fact that crash dumps may contain sensitive and private information. Crash reports that contain binary process extracts may contain it too. There is a conflict here between the desire to get full memory contents for debugging purposes and possible security implications. The solution would be to have postmortem debuggers and user mode process dumpers to implement an option to save only the activity data like stack traces in a text form. Some problems on a system level can be corrected just by looking at thread stack traces, critical section list, full module information, thread times and so on. This can help to identify components that cause process crashes, hangs or CPU spikes.

Users or system administrators can review text data before sending it outside their environment. This was already implemented as Dr. Watson logs. However these logs don’t usually have sufficient information required for crash dump analysis compared to information we can extract from a dump using WinDbg, for example. If you need to analyze kernel and all process activities you can use scripts to convert your kernel and complete memory dumps to text files:

Dmp2Txt: Solving Security Problem

The similar scripts can be applied to user dumps:

Using scripts to process hundreds of user dumps

Generating good scripts in a production environment has one problem: the conversion tool or debugger needs to know about symbols. This can be easily done with Microsoft modules because of Microsoft public symbol server.  Other companies like Citrix have the option to download public symbols:

Debug Symbols for Citrix Presentation Server

Alternatively one can write a WinDbg extension that loads a text file with stack traces, appropriate module images, finds the right PDB files and presents stack traces with full symbolic information. This can also be a separate program that uses Visual Studio DIA (Debug Interface Access) SDK to access PDB files later after receiving a text file from a customer.

I’m currently experimenting with some approaches and will write about them later. Text files will also be used in Internet Based Crash Dump Analysis Service because it is much easier to process text files than crash dumps. Although it is feasible to submit small mini dumps for this purpose they don’t contain much information and require writing specialized OS specific code to parse them. Also text files having the same file size can contain much more useful information without exposing private and sensitive information.

I would appreciate any comments and suggestions regarding this problem. 

- Dmitry Vostokov @ DumpAnalysis.org -

Coping with missing symbolic information

July 5th, 2007

Sometimes there is no private PDB file available for a module in a crash dump although we know they exist for different versions of the same module. The typical example is when we have a public PDB file loaded automatically and we need access to structure definitions, for example, _TEB or _PEB. In this case we need to force WinDbg to load an additional PDB file just to be able to use these structure definitions. This can be achieved by loading an additional module at a different address and forcing it to use another private PDB file. At the same time we want to keep the original module to reference the correct PDB file albeit the public one. Let’s look at one concrete example.   

I was trying to get stack limits for a thread by using !teb command:

0:000> !teb
TEB at 7efdd000
*** Your debugger is not using the correct symbols
***
*** In order for this command to work properly, your symbol path
*** must point to .pdb files that have full type information.
***
*** Certain .pdb files (such as the public OS symbols) do not
*** contain the required information.  Contact the group that
*** provided you with these symbols if you need this command to
*** work.
***
*** Type referenced: ntdll!_TEB
***
error InitTypeRead( TEB )…
0:000> dt ntdll!*

lm command showed that the symbol file was loaded and it was correct so perhaps it was the public symbol file or _TEB definition was missing in it: 

0:000> lm m ntdll
start    end      module name
7d600000 7d6f0000 ntdll (pdb symbols) c:\websymbols\wntdll.pdb\ 40B574C84D5C42708465A7E4A1E4D7CC2\wntdll.pdb

I looked at the size of wntdll.pdb and it was 1,091Kb. I searched for other ntdll.pdb files, found one with the bigger size 1,187Kb and appended it to my symbol search path:

0:000> .sympath+ C:\websymbols\ntdll.pdb\ DCE823FCF71A4BF5AA489994520EA18F2
Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols; C:\websymbols\ntdll.pdb\DCE823FCF71A4BF5AA489994520EA18F2

Then I looked at my symbol cache folder for ntdll.dll, chose a path to a random one and loaded it at the address not occupied by other modules forcing to load symbol files and ignore a mismatch if any:

0:000> .reload /f /i C:\websymbols\ntdll.dll\45D709FFf0000\ntdll.dll=7E000000
0:000> lm
start    end        module name
...
...
...
7d600000 7d6f0000   ntdll      (pdb symbols)          c:\websymbols\wntdll.pdb\40B574C84D5C42708465A7E4A1E4D7CC2\wntdll.pdb
7d800000 7d890000   GDI32      (deferred)
7d8d0000 7d920000   Secur32    (deferred)
7d930000 7da00000   USER32     (deferred)
7da20000 7db00000   RPCRT4     (deferred)
7e000000 7e000000   ntdll_7e000000   (pdb symbols)          C:\websymbols\ntdll.pdb\DCE823FCF71A4BF5AA489994520EA18F2\ntdll.pdb

The additional ntdll.dll was loaded at 7e000000 address and its module name became ntdll_7e000000. Because I knew TEB address I could see the values of _TEB structure fields immediately:

0:000> dt -r1 ntdll_7e000000!_TEB 7efdd000
   +0×000 NtTib            : _NT_TIB
      +0×000 ExceptionList    : 0×0012fec0 _EXCEPTION_REGISTRATION_RECORD
      +0×004 StackBase        : 0×00130000
      +0×008 StackLimit       : 0×0011c000

      +0×00c SubSystemTib     : (null)
      +0×010 FiberData        : 0×00001e00
      +0×010 Version          : 0×1e00
      +0×014 ArbitraryUserPointer : (null)
      +0×018 Self             : 0×7efdd000 _NT_TIB
   +0×01c EnvironmentPointer : (null)
   +0×020 ClientId         : _CLIENT_ID
      +0×000 UniqueProcess    : 0×00000e0c
      +0×004 UniqueThread     : 0×000013dc
   +0×028 ActiveRpcHandle  : (null)
   +0×02c ThreadLocalStoragePointer : (null)
   +0×030 ProcessEnvironmentBlock : 0×7efde000 _PEB
      +0×000 InheritedAddressSpace : 0 ”
      +0×001 ReadImageFileExecOptions : 0×1 ”
      +0×002 BeingDebugged    : 0×1 ”
      +0×003 BitField         : 0 ”
      +0×003 ImageUsesLargePages : 0y0
      +0×003 SpareBits        : 0y0000000 (0)
      +0×004 Mutant           : 0xffffffff
      +0×008 ImageBaseAddress : 0×00400000
      +0×00c Ldr              : 0×7d6a01e0 _PEB_LDR_DATA
      +0×010 ProcessParameters : 0×00020000 _RTL_USER_PROCESS_PARAMETERS
      +0×014 SubSystemData    : (null)
      +0×018 ProcessHeap      : 0×00210000
      +0×01c FastPebLock      : 0×7d6a00e0 _RTL_CRITICAL_SECTION
      +0×020 AtlThunkSListPtr : (null)
      +0×024 SparePtr2        : (null)
      +0×028 EnvironmentUpdateCount : 1
      +0×02c KernelCallbackTable : 0×7d9419f0
      +0×030 SystemReserved   : [1] 0
      +0×034 SpareUlong       : 0
      +0×038 FreeList         : (null)
      +0×03c TlsExpansionCounter : 0
      +0×040 TlsBitmap        : 0×7d6a2058
      +0×044 TlsBitmapBits    : [2] 0xf
      +0×04c ReadOnlySharedMemoryBase : 0×7efe0000
      +0×050 ReadOnlySharedMemoryHeap : 0×7efe0000
      +0×054 ReadOnlyStaticServerData : 0×7efe0cd0  -> (null)
      +0×058 AnsiCodePageData : 0×7efb0000
      +0×05c OemCodePageData  : 0×7efc1000
      +0×060 UnicodeCaseTableData : 0×7efd2000
      +0×064 NumberOfProcessors : 8
      +0×068 NtGlobalFlag     : 0×70
      +0×070 CriticalSectionTimeout : _LARGE_INTEGER 0xffffe86d`079b8000
      +0×078 HeapSegmentReserve : 0×100000
      +0×07c HeapSegmentCommit : 0×2000
      +0×080 HeapDeCommitTotalFreeThreshold : 0×10000
      +0×084 HeapDeCommitFreeBlockThreshold : 0×1000
      +0×088 NumberOfHeaps    : 5
      +0×08c MaximumNumberOfHeaps : 0×10
      +0×090 ProcessHeaps     : 0×7d6a06a0  -> 0×00210000
      +0×094 GdiSharedHandleTable : (null)
      +0×098 ProcessStarterHelper : (null)
      +0×09c GdiDCAttributeList : 0
      +0×0a0 LoaderLock       : 0×7d6a0180 _RTL_CRITICAL_SECTION
      +0×0a4 OSMajorVersion   : 5
      +0×0a8 OSMinorVersion   : 2
      +0×0ac OSBuildNumber    : 0xece
      +0×0ae OSCSDVersion     : 0×200
      +0×0b0 OSPlatformId     : 2
      +0×0b4 ImageSubsystem   : 2
      +0×0b8 ImageSubsystemMajorVersion : 4
      +0×0bc ImageSubsystemMinorVersion : 0
      +0×0c0 ImageProcessAffinityMask : 0
      +0×0c4 GdiHandleBuffer  : [34] 0
      +0×14c PostProcessInitRoutine : (null)
      +0×150 TlsExpansionBitmap : 0×7d6a2050
      +0×154 TlsExpansionBitmapBits : [32] 1
      +0×1d4 SessionId        : 1
      +0×1d8 AppCompatFlags   : _ULARGE_INTEGER 0×0
      +0×1e0 AppCompatFlagsUser : _ULARGE_INTEGER 0×0
      +0×1e8 pShimData        : (null)
      +0×1ec AppCompatInfo    : (null)
      +0×1f0 CSDVersion       : _UNICODE_STRING “Service Pack 2″
      +0×1f8 ActivationContextData : (null)
      +0×1fc ProcessAssemblyStorageMap : (null)
      +0×200 SystemDefaultActivationContextData : 0×00180000 _ACTIVATION_CONTEXT_DATA
      +0×204 SystemAssemblyStorageMap : (null)
      +0×208 MinimumStackCommit : 0
      +0×20c FlsCallback      : 0×002137b0  -> (null)
      +0×210 FlsListHead      : _LIST_ENTRY [ 0×2139c8 - 0×2139c8 ]
      +0×218 FlsBitmap        : 0×7d6a2040
      +0×21c FlsBitmapBits    : [4] 0×33
      +0×22c FlsHighIndex     : 5
   +0×034 LastErrorValue   : 0
   +0×038 CountOfOwnedCriticalSections : 0
   +0×03c CsrClientThread  : (null)
   +0×040 Win32ThreadInfo  : (null)
   +0×044 User32Reserved   : [26] 0
   +0×0ac UserReserved     : [5] 0
   +0×0c0 WOW32Reserved    : 0×78b81910
   +0×0c4 CurrentLocale    : 0×409
   +0×0c8 FpSoftwareStatusRegister : 0
   +0×0cc SystemReserved1  : [54] (null)
   +0×1a4 ExceptionCode    : 0
   +0×1a8 ActivationContextStackPointer : 0×00211ea0 _ACTIVATION_CONTEXT_STACK
      +0×000 ActiveFrame      : (null)
      +0×004 FrameListCache   : _LIST_ENTRY [ 0×211ea4 - 0×211ea4 ]
      +0×00c Flags            : 0
      +0×010 NextCookieSequenceNumber : 1
      +0×014 StackId          : 0×9444f8
   +0×1ac SpareBytes1      : [40]  “”
   +0×1d4 GdiTebBatch      : _GDI_TEB_BATCH
      +0×000 Offset           : 0
      +0×004 HDC              : 0
      +0×008 Buffer           : [310] 0
   +0×6b4 RealClientId     : _CLIENT_ID
      +0×000 UniqueProcess    : 0×00000e0c
      +0×004 UniqueThread     : 0×000013dc
   +0×6bc GdiCachedProcessHandle : (null)
   +0×6c0 GdiClientPID     : 0
   +0×6c4 GdiClientTID     : 0
   +0×6c8 GdiThreadLocalInfo : (null)
   +0×6cc Win32ClientInfo  : [62] 0
   +0×7c4 glDispatchTable  : [233] (null)
   +0xb68 glReserved1      : [29] 0
   +0xbdc glReserved2      : (null)
   +0xbe0 glSectionInfo    : (null)
   +0xbe4 glSection        : (null)
   +0xbe8 glTable          : (null)
   +0xbec glCurrentRC      : (null)
   +0xbf0 glContext        : (null)
   +0xbf4 LastStatusValue  : 0xc0000135
   +0xbf8 StaticUnicodeString : _UNICODE_STRING “mscoree.dll”
      +0×000 Length           : 0×16
      +0×002 MaximumLength    : 0×20a
      +0×004 Buffer           : 0×7efddc00  “mscoree.dll”
   +0xc00 StaticUnicodeBuffer : [261] 0×6d
   +0xe0c DeallocationStack : 0×00030000
   +0xe10 TlsSlots         : [64] (null)
   +0xf10 TlsLinks         : _LIST_ENTRY [ 0×0 - 0×0 ]
      +0×000 Flink            : (null)
      +0×004 Blink            : (null)
   +0xf18 Vdm              : (null)
   +0xf1c ReservedForNtRpc : (null)
   +0xf20 DbgSsReserved    : [2] (null)
   +0xf28 HardErrorMode    : 0
   +0xf2c Instrumentation  : [14] (null)
   +0xf64 SubProcessTag    : (null)
   +0xf68 EtwTraceData     : (null)
   +0xf6c WinSockData      : (null)
   +0xf70 GdiBatchCount    : 0×7efdb000
   +0xf74 InDbgPrint       : 0 ”
   +0xf75 FreeStackOnTermination : 0 ”
   +0xf76 HasFiberData     : 0 ”
   +0xf77 IdealProcessor   : 0×3 ”
   +0xf78 GuaranteedStackBytes : 0
   +0xf7c ReservedForPerf  : (null)
   +0xf80 ReservedForOle   : (null)
   +0xf84 WaitingOnLoaderLock : 0
   +0xf88 SparePointer1    : 0
   +0xf8c SoftPatchPtr1    : 0
   +0xf90 SoftPatchPtr2    : 0
   +0xf94 TlsExpansionSlots : (null)
   +0xf98 ImpersonationLocale : 0
   +0xf9c IsImpersonating  : 0
   +0xfa0 NlsCache         : (null)
   +0xfa4 pShimData        : (null)
   +0xfa8 HeapVirtualAffinity : 0
   +0xfac CurrentTransactionHandle : (null)
   +0xfb0 ActiveFrame      : (null)
   +0xfb4 FlsData          : 0×002139c8
   +0xfb8 SafeThunkCall    : 0 ”
   +0xfb9 BooleanSpare     : [3]  “”

Of course, if I knew in advance that StackBase and StackLimit were the second and the third double words I could have just dumped the first 3 double words at TEB address:

0:000> dd 7efdd000 l3
7efdd000  0012fec0 00130000 0011c000

- Dmitry Vostokov @ DumpAnalysis.org -

Citrixofication

July 5th, 2007

Following the invention of one of the greatest technological thinkers of our civilization, Thomas Edison, at the beginning of the 20th century, that prompted the electrification of our world, I would like to introduce and coin the word ”Citrixofication”, the electrifying power of Citrix that transforms our lives in the 21st century - instant access to any software application whenever and wherever we want. I am very proud that I work for Citrix, the company that changes the way we work at the beginning of the 21st century, like electricity transformed our lives a century ago. The moment you do remote access to your Windows application you use solutions and technology invented by Citrix. Every Windows computer in the world has the code developed by Citrix, ICA protocol code! To be honest I was thinking about “Citrification” but I found that it is already used in chemistry so I tried  ”Citrixification” but this was already used in one French document. The only word left was “Citrixofication” or just “Citrixfication”, whatever you prefer.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 9b)

July 3rd, 2007

This is a follow up to the previous Deadlock pattern post. In this part I’m going to show an example of ERESOURCE deadlock in the Windows kernel.

ERESOURCE (executive resource) is a Windows synchronization object that has ownership semantics. 

An executive resource can be owned exclusively or can have a shared ownership. This is similar to the following file sharing analogy: when a file is opened for writing others can’t write or read it; if you have that file opened for reading others can read from it but can’t write to it.

ERESOURCE structure is linked into a list and have threads as owners which allows us to quickly find deadlocks using !locks command in kernel and complete memory dumps. Here is the definition of _ERESOURCE from x86 and x64 Windows:

0: kd> dt -r1 _ERESOURCE
   +0x000 SystemResourcesList : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x008 OwnerTable       : Ptr32 _OWNER_ENTRY
      +0x000 OwnerThread      : Uint4B
      +0x004 OwnerCount       : Int4B
      +0x004 TableSize        : Uint4B
   +0x00c ActiveCount      : Int2B
   +0x00e Flag             : Uint2B
   +0x010 SharedWaiters    : Ptr32 _KSEMAPHORE
      +0x000 Header           : _DISPATCHER_HEADER
      +0x010 Limit            : Int4B
   +0x014 ExclusiveWaiters : Ptr32 _KEVENT
      +0x000 Header           : _DISPATCHER_HEADER
   +0x018 OwnerThreads     : [2] _OWNER_ENTRY
      +0x000 OwnerThread      : Uint4B
      +0x004 OwnerCount       : Int4B
      +0x004 TableSize        : Uint4B
   +0x028 ContentionCount  : Uint4B
   +0x02c NumberOfSharedWaiters : Uint2B
   +0x02e NumberOfExclusiveWaiters : Uint2B
   +0x030 Address          : Ptr32 Void
   +0x030 CreatorBackTraceIndex : Uint4B
   +0x034 SpinLock         : Uint4B

0: kd> dt -r1  _ERESOURCE
nt!_ERESOURCE
   +0x000 SystemResourcesList : _LIST_ENTRY
      +0x000 Flink            : Ptr64 _LIST_ENTRY
      +0x008 Blink            : Ptr64 _LIST_ENTRY
   +0x010 OwnerTable       : Ptr64 _OWNER_ENTRY
      +0x000 OwnerThread      : Uint8B
      +0x008 OwnerCount       : Int4B
      +0x008 TableSize        : Uint4B
   +0x018 ActiveCount      : Int2B
   +0x01a Flag             : Uint2B
   +0x020 SharedWaiters    : Ptr64 _KSEMAPHORE
      +0x000 Header           : _DISPATCHER_HEADER
      +0x018 Limit            : Int4B
   +0x028 ExclusiveWaiters : Ptr64 _KEVENT
      +0x000 Header           : _DISPATCHER_HEADER
   +0x030 OwnerThreads     : [2] _OWNER_ENTRY
      +0x000 OwnerThread      : Uint8B
      +0x008 OwnerCount       : Int4B
      +0x008 TableSize        : Uint4B
   +0x050 ContentionCount  : Uint4B
   +0x054 NumberOfSharedWaiters : Uint2B
   +0x056 NumberOfExclusiveWaiters : Uint2B
   +0x058 Address          : Ptr64 Void
   +0x058 CreatorBackTraceIndex : Uint8B
   +0x060 SpinLock         : Uint8B

If we have a list of resources from !locks output we can start following threads that own these resources. Owner threads are marked with a star character (*):

0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks......

Resource @ 0x8815b928    Exclusively owned
    Contention Count = 6234751
    NumberOfExclusiveWaiters = 53
     Threads: 89ab8db0-01<*>
     Threads Waiting On Exclusive Access:
        8810fa08       880f5b40       88831020       87e33020
        880353f0       88115020       88131678       880f5db0
        89295420       88255378       880f8b40       8940d020
        880f58d0       893ee500       880edac8       880f8db0
        89172938       879b3020       88091510       88038020
        880407b8       88051020       89511db0       8921f020
        880e9db0       87c33020       88064cc0       88044730
        8803f020       87a2a020       89529380       8802d330
        89a53020       89231b28       880285b8       88106b90
        8803cbc8       88aa3020       88093400       8809aab0
        880ea540       87d46948       88036020       8806e198
        8802d020       88038b40       8826b020       88231020
        890a2020       8807f5d0
     

We see that 53 threads are waiting for _KTHREAD 89ab8db0 to release _ERESOURCE 8815b928. Searching for this thread address reveals the following: 

Resource @ 0x88159560    Exclusively owned
    Contention Count = 166896
    NumberOfExclusiveWaiters = 1
     Threads: 8802a790-01<*>
     Threads Waiting On Exclusive Access:
              89ab8db0
  

We see that thread 89ab8db0 is waiting for 8802a790 to release resource 88159560. We continue searching for thread 8802a790 waiting for another thread but we skip occurences when this thread is not waiting:

Resource @ 0x881f7b60    Exclusively owned
     Threads: 8802a790-01<*>

Resource @ 0x8824b418    Exclusively owned
    Contention Count = 34
     Threads: 8802a790-01<*>
 

Resource @ 0x8825e5a0    Exclusively owned
     Threads: 8802a790-01<*>

Resource @ 0x88172428    Exclusively owned
    Contention Count = 5
    NumberOfExclusiveWaiters = 1
     Threads: 8802a790-01<*>
     Threads Waiting On Exclusive Access:
              880f5020

Searching further we see that thread 8802a790 is waiting for thread 880f5020 to release resource 89bd7bf0:

Resource @ 0x89bd7bf0    Exclusively owned
    Contention Count = 1
    NumberOfExclusiveWaiters = 1
     Threads: 880f5020-01<*>
     Threads Waiting On Exclusive Access:
              8802a790

If we look carefully we would see that we have already seen thread 880f5020 above and I repeat the fragment (with appropriate colors now):

Resource @ 0x88172428    Exclusively owned
    Contention Count = 5
    NumberOfExclusiveWaiters = 1
     Threads: 8802a790-01<*>
     Threads Waiting On Exclusive Access:
              880f5020

We see that thread 880f5020 is waiting for thread 8802a790 and thread 8802a790 is waiting for thread 880f5020.

Therefore we have identified the classical deadlock. What we have to do now is to look at stack traces of these threads to see involved components.

- Dmitry Vostokov @ DumpAnalysis.org -

PDBFinder (public version 3.5)

July 1st, 2007

Version 3.5 uses the new binary database format and achieves the following results compare to the previous version 3.0.1:

  • 2 times smaller database size
  • 5 times faster database load time on startup!

It is fully backwards compatible with 3.0.1 and 2.x database formats and silently converts your old database to the new format on the first load.

Additionally the new version fixes the bug in version 3.0.1 sometimes manifested when removing and then adding folders before building the new database which resulted in incorrectly built database. 

The next version 4.0 is currently under development and it will have the following features:

  • The ability to open multiple databases
  • The ability to exclude certain folders during build to avoid excessive search results output
  • Fully configurable OS and language search options (which are currently disabled for public version)

PDBFinder upgrade is available for download from Citrix support.

If you still use version 2.x there is some additional information about features in version 3.5:

http://www.dumpanalysis.org/blog/index.php/2007/05/04/pdbfinder-public-version-301/

- Dmitry Vostokov @ DumpAnalysis.org -

GDB for WinDbg Users (Part 5)

July 1st, 2007

Displaying thread stack trace is the most used action in crash or core dump analysis and debugging. To show various available GDB commands I created the next version of the test program with the following source code:

#include <stdio.h>

void func_1(int param_1, char param_2, int *param_3, char *param_4);
void func_2(int param_1, char param_2, int *param_3, char *param_4);
void func_3(int param_1, char param_2, int *param_3, char *param_4);
void func_4();

int val_1;
char val_2;
int *pval_1 = &val_1;
char *pval_2 = &val_2;

int main()
{
  val_1 = 1;
  val_2 = '1';
  func_1(val_1, val_2, (int *)pval_1, (char *)pval_2);
  return 0;
}

void func_1(int param_1, char param_2, int *param_3, char *param_4)
{
  val_1 = 2;
  val_2 = '2';
  func_2(param_1, param_2, param_3, param_4);
}

void func_2(int param_1, char param_2, int *param_3, char *param_4)
{
  val_1 = 3;
  val_2 = '3';
  func_3(param_1, param_2, param_3, param_4);
}

void func_3(int param_1, char param_2, int *param_3, char *param_4)
{
  *pval_1 += param_1;
  *pval_2 += param_2;
  func_4();
}

void func_4()
{
  puts("Hello World!");
}

I compiled it with -g gcc compiler option to generate symbolic information. It will be needed for GDB to display function arguments and local variables.

C:\MinGW\examples>..\bin\gcc -g -o test.exe test.c

If you have a crash in func_4 then you can examine stack trace (backtrace) once you open a core dump. Because we don’t have a core dump of our test program we will simulate the stack trace by putting a breakpoint on func_4. In GDB this can be done by break command:

C:\MinGW\examples>..\bin\gdb test.exe
...
...
...
(gdb) break func_4
Breakpoint 1 at 0x40141d
(gdb) run
Starting program: C:\MinGW\examples/test.exe
Breakpoint 1, 0x0040141d in func_4 ()
(gdb)

In WinDbg the breakpoint command is bp:

CommandLine: C:\dmitri\test\release\test.exe
Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 0040f000   test.exe
ModLoad: 7d4c0000 7d5f0000   NOT_AN_IMAGE
ModLoad: 7d600000 7d6f0000   C:\W2K3\SysWOW64\ntdll32.dll
ModLoad: 7d4c0000 7d5f0000   C:\W2K3\syswow64\kernel32.dll
(103c.17d8): Break instruction exception - code 80000003 (first chance)
eax=7d600000 ebx=7efde000 ecx=00000005 edx=00000020 esi=7d6a01f4 edi=00221f38
eip=7d61002d esp=0012fb4c ebp=0012fcac iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
ntdll32!DbgBreakPoint:
7d61002d cc              int     3

0:000> bp func_4

0:000> g
ModLoad: 71c20000 71c32000   C:\W2K3\SysWOW64\tsappcmp.dll
ModLoad: 77ba0000 77bfa000   C:\W2K3\syswow64\msvcrt.dll
ModLoad: 77f50000 77fec000   C:\W2K3\syswow64\ADVAPI32.dll
ModLoad: 7da20000 7db00000   C:\W2K3\syswow64\RPCRT4.dll
Breakpoint 0 hit
eax=0040c9d0 ebx=7d4d8dc9 ecx=0040c9d0 edx=00000064 esi=00000002 edi=00000ece
eip=00408be0 esp=0012ff24 ebp=0012ff28 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
test!func_4:
00408be0 55              push    ebp

I had to disable optimization in the project properties otherwise Visual C++ compiler optimizes away all function calls and produces the following short code:

0:000> uf main
00401000 push    offset test!`string' (004020f4)
00401005 mov     dword ptr [test!val_1 (0040337c)],4
0040100f mov     byte ptr [test!val_2 (00403378)],64h
00401016 call    dword ptr [test!_imp__puts (004020a0)]
0040101c add     esp,4
0040101f xor     eax,eax
00401021 ret

I will talk about setting breakpoints in another part and here I’m going to concentrate only on commands that examine call stack. backtrace or bt command shows stack trace. backtrace <N> or bt <N> shows only the innermost N stack frames. backtrace -<N> or bt -<N> shows only the outermost N stack frames. backtrace full or bt full additionally shows local variables. There are also variants backtrace full <N> or bt full <N> and backtrace full -<N> or bt full -<N>:

(gdb) backtrace
#0  func_4 () at test.c:48
#1  0x00401414 in func_3 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:43
#2  0x004013da in func_2 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:35
#3  0x0040139a in func_1 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:27
#4  0x00401355 in main () at test.c:18

(gdb) bt
#0  func_4 () at test.c:48
#1  0x00401414 in func_3 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:43
#2  0x004013da in func_2 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:35
#3  0x0040139a in func_1 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:27
#4  0x00401355 in main () at test.c:18

(gdb) bt 2
#0  func_4 () at test.c:48
#1  0x00401414 in func_3 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:43
(More stack frames follow...)

(gdb) bt -2
#3  0x0040139a in func_1 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:27
#4  0x00401355 in main () at test.c:18

(gdb) bt full
#0  func_4 () at test.c:48
No locals.
#1  0x00401414 in func_3 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:43
        param_2 = 49 '1'
#2  0x004013da in func_2 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:35
        param_2 = 49 '1'
#3  0x0040139a in func_1 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:27
        param_2 = 49 '1'
#4  0x00401355 in main () at test.c:18
No locals.

(gdb) bt full 2
#0  func_4 () at test.c:48
No locals.
#1  0x00401414 in func_3 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:43
        param_2 = 49 '1'
(More stack frames follow...)

(gdb) bt full -2
#3  0x0040139a in func_1 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:27
        param_2 = 49 '1'
#4  0x00401355 in main () at test.c:18
No locals.

(gdb)

In WinDbg there is only one k command but it has many parameters, for example:

- default stack trace with source code lines

0:000> k
ChildEBP RetAddr
0012ff20 00408c30 test!func_4 [c:\dmitri\test\test\test.cpp @ 47]
0012ff28 00408c69 test!func_3+0x30 [c:\dmitri\test\test\test.cpp @ 44]
0012ff40 00408c99 test!func_2+0x29 [c:\dmitri\test\test\test.cpp @ 35]
0012ff58 00408cd3 test!func_1+0x29 [c:\dmitri\test\test\test.cpp @ 27]
0012ff70 00401368 test!main+0x33 [c:\dmitri\test\test\test.cpp @ 18]
0012ffc0 7d4e992a test!__tmainCRTStartup+0x15f [f:\sp\vctools\crt_bld\self_x86\crt\src\crt0.c @ 327]
0012fff0 00000000 kernel32!BaseProcessStart+0x28

- stack trace without source code lines

0:000> kL
ChildEBP RetAddr
0012ff20 00408c30 test!func_4
0012ff28 00408c69 test!func_3+0x30
0012ff40 00408c99 test!func_2+0x29
0012ff58 00408cd3 test!func_1+0x29
0012ff70 00401368 test!main+0x33
0012ffc0 7d4e992a test!__tmainCRTStartup+0x15f
0012fff0 00000000 kernel32!BaseProcessStart+0x28

- full stack trace without source code lines showing 3 stack arguments for every stack frame, calling convention and optimization information

0:000> kvL
ChildEBP RetAddr  Args to Child
0012ff20 00408c30 0012ff40 00408c69 00000001 test!func_4 (CONV: cdecl)
0012ff28 00408c69 00000001 00000031 0040c9d4 test!func_3+0x30 (CONV: cdecl)
0012ff40 00408c99 00000001 00000031 0040c9d4 test!func_2+0x29 (CONV: cdecl)
0012ff58 00408cd3 00000001 00000031 0040c9d4 test!func_1+0x29 (CONV: cdecl)
0012ff70 00401368 00000001 004230e0 00423120 test!main+0x33 (CONV: cdecl)
0012ffc0 7d4e992a 00000000 00000000 7efde000 test!__tmainCRTStartup+0x15f (FPO: [Non-Fpo]) (CONV: cdecl)
0012fff0 00000000 004013bf 00000000 00000000 kernel32!BaseProcessStart+0x28 (FPO: [Non-Fpo])

- stack trace without source code lines showing all function parameters

0:000> kPL
ChildEBP RetAddr
0012ff20 00408c30 test!func_4(void)
0012ff28 00408c69 test!func_3(
   int param_1 = 1,
   char param_2 = 49 '1',
   int * param_3 = 0x0040c9d4,
   char * param_4 = 0x0040c9d0 "d")+0x30
0012ff40 00408c99 test!func_2(
   int param_1 = 1,
   char param_2 = 49 '1',
   int * param_3 = 0x0040c9d4,
   char * param_4 = 0x0040c9d0 "d")+0x29
0012ff58 00408cd3 test!func_1(
   int param_1 = 1,
   char param_2 = 49 '1',
   int * param_3 = 0x0040c9d4,
   char * param_4 = 0x0040c9d0 "d")+0x29
0012ff70 00401368 test!main(void)+0x33
0012ffc0 7d4e992a test!__tmainCRTStartup(void)+0x15f
0012fff0 00000000 kernel32!BaseProcessStart+0x28

- stack trace without source code lines showing stack frame numbers

0:000> knL
 # ChildEBP RetAddr
00 0012ff20 00408c30 test!func_4
01 0012ff28 00408c69 test!func_3+0x30
02 0012ff40 00408c99 test!func_2+0x29
03 0012ff58 00408cd3 test!func_1+0x29
04 0012ff70 00401368 test!main+0x33
05 0012ffc0 7d4e992a test!__tmainCRTStartup+0x15f
06 0012fff0 00000000 kernel32!BaseProcessStart+0x28

- stack trace without source code lines showing the distance between stack frames in bytes

0:000> knfL
 #   Memory  ChildEBP RetAddr
00           0012ff20 00408c30 test!func_4
01         8 0012ff28 00408c69 test!func_3+0x30
02        18 0012ff40 00408c99 test!func_2+0x29
03        18 0012ff58 00408cd3 test!func_1+0x29
04        18 0012ff70 00401368 test!main+0x33
05        50 0012ffc0 7d4e992a test!__tmainCRTStartup+0x15f
06        30 0012fff0 00000000 kernel32!BaseProcessStart+0x28

- stack trace without source code lines showing the innermost 2 frames:

0:000> kL 2
ChildEBP RetAddr
0012ff20 00408c30 test!func_4
0012ff28 00408c69 test!func_3+0x30

If you want to see stack traces from all threads in a process use the following command: 

(gdb) thread apply all bt

Thread 1 (thread 728.0xc0c):
#0  func_4 () at test.c:48
#1  0x00401414 in func_3 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:43
#2  0x004013da in func_2 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:35
#3  0x0040139a in func_1 (param_1=1, param_2=49 '1', param_3=0x404080,
    param_4=0x404070 "d") at test.c:27
#4  0x00401355 in main () at test.c:18
(gdb)

In WinDbg it is ~*k. Any parameter shown above can be used, for example:

0:000> ~*kL

.  0  Id: 103c.17d8 Suspend: 1 Teb: 7efdd000 Unfrozen
ChildEBP RetAddr
0012ff20 00408c30 test!func_4
0012ff28 00408c69 test!func_3+0x30
0012ff40 00408c99 test!func_2+0x29
0012ff58 00408cd3 test!func_1+0x29
0012ff70 00401368 test!main+0x33
0012ffc0 7d4e992a test!__tmainCRTStartup+0x15f
0012fff0 00000000 kernel32!BaseProcessStart+0x28

Therefore, our next version of the map contains these new commands:

Action                      | GDB                 | WinDbg
----------------------------------------------------------
Start the process           | run                 | g
Exit                        | (q)uit              | q
Disassemble (forward)       | (disas)semble       | uf, u
Disassemble N instructions  | x/<N>i              | -
Disassemble (backward)      | -                   | ub
Stack trace                 | backtrace (bt)      | k
Full stack trace            | bt full             | kv
Partial trace (innermost)   | bt <N>              | k <N>
Partial trace (outermost)   | bt -<N>             | -
Stack trace for all threads | thread apply all bt | ~*k
Breakpoint                  | break               | bp

- Dmitry Vostokov @ DumpAnalysis.org -

GDB for WinDbg Users (Part 4)

July 1st, 2007

If you are looking for debugging tutorials with a wider scope than just listing various debugger commands then the following books will be useful:

Both use GDB for debugging case studies and will be useful for engineers with any level of debugging experience.

- Dmitry Vostokov @ DumpAnalysis.org -

Correcting Microsoft article about userdump.exe

June 28th, 2007

There is much confusion among Microsoft and Citrix customers on how to use userdump.exe to save a process dump. Microsoft published an article about userdump.exe and it has the following title:

How to use the Userdump.exe tool to create a dump file

Unfortunately all scenarios listed there start with:

1. Run the Setup.exe program for your processor.

It also says:

<…> move to the version of Userdump.exe for your processor at the command prompt 

I would like to correct the article here. You don’t need to run setup.exe, you just need to copy userdump.exe and dbghelp.dll. The latter is important because the version of that DLL in your system32 folder can be older and userdump.exe will not start:

C:\kktools\userdump8.1\x64>userdump.exe

!!!!!!!!!! Error !!!!!!!!!!
Unsupported DbgHelp.dll version.
Path   : C:\W2K3\system32\DbgHelp.dll
Version: 5.2.3790.1830

C:\kktools\userdump8.1\x64>

For most customers running setup.exe and configuring the default rules in Exception Monitor creates the significant amount of false positive dumps. If we want to manually dump a process we don’t need automatically generated dumps or fine tune Exception Monitor rules to reduce the number of dumps.

Just an additional note: if you have an error dialog box showing that a program got an exception you can find that process in Task Manager and use userdump.exe to save that process dump manually. Then inside the dump it is possible to see that error. Therefore in the case when a default postmortem debugger wasn’t configured in the registry you can still get a dump for postmortem crash dump analysis. Here is an example. I removed a postmortem debugger from

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
Debugger=

Now if we run TestDefaultDebugger tool and hit the big crash button we get the following message box:

 

If we save TestDefaultDebugger process dump manually using userdump.exe when this message box is shown

C:\kktools\userdump8.1\x64>userdump.exe 5264 c:\tdd.dmp
User Mode Process Dumper (Version 8.1.2929.4)
Copyright (c) Microsoft Corp. All rights reserved.
Dumping process 5264 (TestDefaultDebugger64.exe) to
c:\tdd.dmp...
The process was dumped successfully.

and open it in WinDbg we can see the problem thread there:

0:000> kn
#  Child-SP          RetAddr           Call Site
00 00000000`0012dab8 00000000`77dbfb3b ntdll!ZwRaiseHardError+0xa
01 00000000`0012dac0 00000000`004148c6 kernel32!UnhandledExceptionFilter+0x6c8
02 00000000`0012e2f0 00000000`004165f6 TestDefaultDebugger64!__tmainCRTStartup$filt$0+0x16
03 00000000`0012e320 00000000`78ee4bdd TestDefaultDebugger64!__C_specific_handler+0xa6
04 00000000`0012e3b0 00000000`78ee685a ntdll!RtlpExecuteHandlerForException+0xd
05 00000000`0012e3e0 00000000`78ef3a5d ntdll!RtlDispatchException+0x1b4
06 00000000`0012ea90 00000000`00401570 ntdll!KiUserExceptionDispatch+0x2d
07 00000000`0012f028 00000000`00403d4d TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
08 00000000`0012f030 00000000`00403f75 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc1
09 00000000`0012f070 00000000`004030cc TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0x169
0a 00000000`0012f0f0 00000000`0040c18d TestDefaultDebugger64!CDialog::OnCmdMsg+0x28
0b 00000000`0012f150 00000000`0040cfbd TestDefaultDebugger64!CWnd::OnCommand+0xc9
0c 00000000`0012f200 00000000`0040818f TestDefaultDebugger64!CWnd::OnWndMsg+0x55
0d 00000000`0012f360 00000000`0040b2e5 TestDefaultDebugger64!CWnd::WindowProc+0x33
0e 00000000`0012f3c0 00000000`0040b3d2 TestDefaultDebugger64!AfxCallWndProc+0xf1
0f 00000000`0012f480 00000000`77c439fc TestDefaultDebugger64!AfxWndProc+0x4e
10 00000000`0012f4e0 00000000`77c432ba user32!UserCallWinProcCheckWow+0x1f9
11 00000000`0012f5b0 00000000`77c4335b user32!SendMessageWorker+0x68c
12 00000000`0012f650 000007ff`7f07c5af user32!SendMessageW+0x9d
13 00000000`0012f6a0 000007ff`7f07eb8e comctl32!Button_ReleaseCapture+0x14f

The second parameter to RtlDispatchException is the pointer to the exception context so if we dump the stack trace verbosely we can get that pointer and pass it to .cxr command:

0:000> kv
Child-SP          RetAddr           : Args to Child
...
...
...
00000000`0012e3e0 00000000`78ef3a5d : 00000000`0040c9ec 00000000`0012ea90 00000000`00000001 00000000`00000111 : ntdll!RtlDispatchException+0×1b4


0:000> .cxr 00000000`0012ea90
rax=0000000000000000 rbx=0000000000000001 rcx=000000000012fd70
rdx=00000000000003e8 rsi=000000000012fd70 rdi=0000000000432e90
rip=0000000000401570 rsp=000000000012f028 rbp=0000000000000111
 r8=0000000000000000  r9=0000000000401570 r10=0000000000401570
r11=000000000015abb0 r12=0000000000000000 r13=00000000000003e8
r14=0000000000000110 r15=0000000000000001
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1:
00000000`00401570 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

We see that it was NULL pointer dereference that caused the process termination. Now we can dump the full stack trace that led to our crash:

0:000> kn 100
#  Child-SP          RetAddr           Call Site
00 00000000`0012f028 00000000`00403d4d TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
01 00000000`0012f030 00000000`00403f75 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc1
02 00000000`0012f070 00000000`004030cc TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0x169
03 00000000`0012f0f0 00000000`0040c18d TestDefaultDebugger64!CDialog::OnCmdMsg+0x28
04 00000000`0012f150 00000000`0040cfbd TestDefaultDebugger64!CWnd::OnCommand+0xc9
05 00000000`0012f200 00000000`0040818f TestDefaultDebugger64!CWnd::OnWndMsg+0x55
06 00000000`0012f360 00000000`0040b2e5 TestDefaultDebugger64!CWnd::WindowProc+0x33
07 00000000`0012f3c0 00000000`0040b3d2 TestDefaultDebugger64!AfxCallWndProc+0xf1
08 00000000`0012f480 00000000`77c439fc TestDefaultDebugger64!AfxWndProc+0x4e
09 00000000`0012f4e0 00000000`77c432ba user32!UserCallWinProcCheckWow+0x1f9
0a 00000000`0012f5b0 00000000`77c4335b user32!SendMessageWorker+0x68c
0b 00000000`0012f650 000007ff`7f07c5af user32!SendMessageW+0x9d
0c 00000000`0012f6a0 000007ff`7f07eb8e comctl32!Button_ReleaseCapture+0x14f
0d 00000000`0012f6d0 00000000`77c439fc comctl32!Button_WndProc+0x8ee
0e 00000000`0012f830 00000000`77c43e9c user32!UserCallWinProcCheckWow+0x1f9
0f 00000000`0012f900 00000000`77c3965a user32!DispatchMessageWorker+0x3af
10 00000000`0012f970 00000000`0040706d user32!IsDialogMessageW+0x256
11 00000000`0012fa40 00000000`0040868c TestDefaultDebugger64!CWnd::IsDialogMessageW+0x35
12 00000000`0012fa80 00000000`0040309c TestDefaultDebugger64!CWnd::PreTranslateInput+0x28
13 00000000`0012fab0 00000000`0040ae73 TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc0
14 00000000`0012faf0 00000000`004047fc TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0x33
15 00000000`0012fb30 00000000`00404857 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0x64233]
16 00000000`0012fb70 00000000`00404a17 TestDefaultDebugger64!AfxPreTranslateMessage+0x23
17 00000000`0012fba0 00000000`00404a57 TestDefaultDebugger64!AfxInternalPumpMessage+0x37
18 00000000`0012fbe0 00000000`0040a419 TestDefaultDebugger64!AfxPumpMessage+0x1b
19 00000000`0012fc10 00000000`00403a3a TestDefaultDebugger64!CWnd::RunModalLoop+0xe5
1a 00000000`0012fc90 00000000`00401139 TestDefaultDebugger64!CDialog::DoModal+0x1ce
1b 00000000`0012fd40 00000000`0042bbbd TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe9
1c 00000000`0012fe70 00000000`00414848 TestDefaultDebugger64!AfxWinMain+0x69
1d 00000000`0012fed0 00000000`77d5966c TestDefaultDebugger64!__tmainCRTStartup+0x258
1e 00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x29

The same technique can be used to dump a process when any kind of error message box appears, for example, when a .NET application displays a .NET exception message box or a native application shows a run-time error dialog box. 

- Dmitry Vostokov @ DumpAnalysis.org -

GDB for WinDbg Users (Part 3)

June 28th, 2007

One of the common tasks in crash dump analysis is to disassemble various functions. In GDB it can be done by using two different commands: disassemble and x/i.

The first command gets a function name, an address or a range of addresses and can be shortened to just disas:

(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
0x4012f0 <main>:        push   ebp
0x4012f1 <main+1>:      mov    ebp,esp
0x4012f3 <main+3>:      sub    esp,0x8
0x4012f6 <main+6>:      and    esp,0xfffffff0
0x4012f9 <main+9>:      mov    eax,0x0
0x4012fe <main+14>:     add    eax,0xf
0x401301 <main+17>:     add    eax,0xf
0x401304 <main+20>:     shr    eax,0x4
0x401307 <main+23>:     shl    eax,0x4
0x40130a <main+26>:     mov    DWORD PTR [ebp-4],eax
0x40130d <main+29>:     mov    eax,DWORD PTR [ebp-4]
0x401310 <main+32>:     call   0x401860 <_alloca>
0x401315 <main+37>:     call   0x401500 <__main>
0x40131a <main+42>:     mov    DWORD PTR [esp],0x403000
0x401321 <main+49>:     call   0x401950 <puts>
0x401326 <main+54>:     mov    eax,0x0
0x40132b <main+59>:     leave
0x40132c <main+60>:     ret
0x40132d <main+61>:     nop
0x40132e <main+62>:     nop
0x40132f <main+63>:     nop
End of assembler dump.
(gdb) disas 0x4012f0
Dump of assembler code for function main:
0x4012f0 <main>:        push   ebp
0x4012f1 <main+1>:      mov    ebp,esp
0x4012f3 <main+3>:      sub    esp,0x8
0x4012f6 <main+6>:      and    esp,0xfffffff0
0x4012f9 <main+9>:      mov    eax,0x0
0x4012fe <main+14>:     add    eax,0xf
0x401301 <main+17>:     add    eax,0xf
0x401304 <main+20>:     shr    eax,0x4
0x401307 <main+23>:     shl    eax,0x4
0x40130a <main+26>:     mov    DWORD PTR [ebp-4],eax
0x40130d <main+29>:     mov    eax,DWORD PTR [ebp-4]
0x401310 <main+32>:     call   0x401860 <_alloca>
0x401315 <main+37>:     call   0x401500 <__main>
0x40131a <main+42>:     mov    DWORD PTR [esp],0x403000
0x401321 <main+49>:     call   0x401950 <puts>
0x401326 <main+54>:     mov    eax,0x0
0x40132b <main+59>:     leave
0x40132c <main+60>:     ret
0x40132d <main+61>:     nop
0x40132e <main+62>:     nop
0x40132f <main+63>:     nop
End of assembler dump.
(gdb) disas 0x4012f0 0x40132d
Dump of assembler code from 0x4012f0 to 0x40132d:
0x4012f0 <main>:        push   ebp
0x4012f1 <main+1>:      mov    ebp,esp
0x4012f3 <main+3>:      sub    esp,0x8
0x4012f6 <main+6>:      and    esp,0xfffffff0
0x4012f9 <main+9>:      mov    eax,0x0
0x4012fe <main+14>:     add    eax,0xf
0x401301 <main+17>:     add    eax,0xf
0x401304 <main+20>:     shr    eax,0x4
0x401307 <main+23>:     shl    eax,0x4
0x40130a <main+26>:     mov    DWORD PTR [ebp-4],eax
0x40130d <main+29>:     mov    eax,DWORD PTR [ebp-4]
0x401310 <main+32>:     call   0x401860 <_alloca>
0x401315 <main+37>:     call   0x401500 <__main>
0x40131a <main+42>:     mov    DWORD PTR [esp],0x403000
0x401321 <main+49>:     call   0x401950 <puts>
0x401326 <main+54>:     mov    eax,0x0
0x40132b <main+59>:     leave
0x40132c <main+60>:     ret
End of assembler dump.
(gdb)

The equivalent for this command in WinDbg is uf (unassemble function) and u (unassemble):

0:000> .asm no_code_bytes
Assembly options: no_code_bytes
0:000> uf main
test!main [test.cpp @ 3]:
00401000 push    offset test!`string' (004020f4)
00401005 call    dword ptr [test!_imp__puts (004020a0)]
0040100b add     esp,4
0040100e xor     eax,eax
00401010 ret
0:000> uf 00401000
test!main [test.cpp @ 3]:
00401000 push    offset test!`string' (004020f4)
00401005 call    dword ptr [test!_imp__puts (004020a0)]
0040100b add     esp,4
0040100e xor     eax,eax
00401010 ret
0:000> u 00401000
test!main [c:\dmitri\test\test\test.cpp @ 3]:
00401000 push    offset test!`string' (004020f4)
00401005 call    dword ptr [test!_imp__puts (004020a0)]
0040100b add     esp,4
0040100e xor     eax,eax
00401010 ret
test!__security_check_cookie [f:\sp\vctools\crt_bld\self_x86\crt\src\intel\secchk.c @ 52]:
00401011 cmp     ecx,dword ptr [test!__security_cookie (00403000)]
00401017 jne     test!__security_check_cookie+0xa (0040101b)
00401019 rep ret
0:000> u 00401000 00401011
test!main [test.cpp @ 3]:
00401000 push    offset test!`string' (004020f4)
00401005 call    dword ptr [test!_imp__puts (004020a0)]
0040100b add     esp,4
0040100e xor     eax,eax
00401010 ret
0:000> u
test!__security_check_cookie [f:\sp\vctools\crt_bld\self_x86\crt\src\intel\secchk.c @ 52]:
00401011 cmp     ecx,dword ptr [test!__security_cookie (00403000)]
00401017 jne     test!__security_check_cookie+0xa (0040101b)
00401019 rep ret
0040101b jmp     test!__report_gsfailure (004012cd)
test!pre_cpp_init [f:\sp\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 321]:
00401020 push    offset test!_RTC_Terminate (004014fd)
00401025 call    test!atexit (004014c7)
0040102a mov     eax,dword ptr [test!_newmode (00403364)]
0040102f mov     dword ptr [esp],offset test!startinfo (0040302c)
0:000> u eip
ntdll32!DbgBreakPoint:
7d61002d int     3
7d61002e ret
7d61002f nop
7d610030 mov     edi,edi
ntdll32!DbgUserBreakPoint:
7d610032 int     3
7d610033 ret
7d610034 mov     edi,edi
ntdll32!DbgBreakPointWithStatus:
7d610036 mov     eax,dword ptr [esp+4]

The second GDB command is x/[N]i address where N is the number of instructions to disassemble:

(gdb) x/i 0x4012f0
0x4012f0 <main>:        push   ebp
(gdb) x/2i 0x4012f0
0x4012f0 <main>:        push   ebp
0x4012f1 <main+1>:      mov    ebp,esp
(gdb) x/3i 0x4012f0
0x4012f0 <main>:        push   ebp
0x4012f1 <main+1>:      mov    ebp,esp
0x4012f3 <main+3>:      sub    esp,0x8
(gdb) x/4i $pc
0x4012f6 <main+6>:      and    esp,0xfffffff0
0x4012f9 <main+9>:      mov    eax,0x0
0x4012fe <main+14>:     add    eax,0xf
0x401301 <main+17>:     add    eax,0xf
(gdb)

I don’t know the way to disassemble just N instructions in WinDbg. However in WinDbg I can disassemble backwards (ub). This is useful, for example, if we have a return address and we want to see the CALL instruction:

0:000> k
ChildEBP RetAddr
0012ff7c 0040117a test!main [test.cpp @ 3]
0012ffc0 7d4e992a test!__tmainCRTStartup+0×10f [f:\sp\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 597]
0012fff0 00000000 kernel32!BaseProcessStart+0×28
0:000> ub 7d4e992a
kernel32!BaseProcessStart+0×10:
7d4e9912 call    kernel32!BasepReport32bitAppLaunching (7d4e9949)
7d4e9917 push    4
7d4e9919 lea     eax,[ebp+8]
7d4e991c push    eax
7d4e991d push    9
7d4e991f push    0FFFFFFFEh
7d4e9921 call    dword ptr [kernel32!_imp__NtSetInformationThread (7d4d032c)]
7d4e9927 call    dword ptr [ebp+8]

So our next version of the map contains these new commands:

Action                     | GDB           | WinDbg
---------------------------------------------------
Start the process          | run           | g
Exit                       | (q)uit        | q
Disassemble (forward)      | (disas)semble | uf, u
Disassemble N instructions | x/i           | -
Disassemble (backward)     | -             | ub

- Dmitry Vostokov @ DumpAnalysis.org -

When a process dies silently

June 28th, 2007

There are cases when default postmortem debugger doesn’t save a dump file. This is because the default postmortem debugger is called from the crashed application thread on Windows prior to Vista and if a thread stack is exhausted or critical thread data is corrupt there is no user dump.  On Vista the default postmorten debugger is called from WER (Windows Error Reporting) process WerFault.exe so there is a chance that it can save a user dump. During my experiments today on Windows 2003 (x64) I found that if we have a stack overflow inside a 64-bit process then the process silently dies. This doesn’t happen for 32-bit processes on the same server on a native 32-bit OS. Here is the added code from the modified default Win32 API project created in Visual Studio 2005:

...
volatile DWORD dwSupressOptimization;
...
void SoFunction();
...
LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
...
  case WM_PAINT:
     hdc = BeginPaint(hWnd, &ps);
     SoFunction();
     EndPaint(hWnd, &ps);
     break;
...
}
...
void SoFunction()
{
  if (++dwSupressOptimization)
  {
     SoFunction();
     WndProc(0,0,0,0);
  }
}

Adding WndProc call to SoFunction is done to eliminate an optimization in Release build when a recursion call is transformed into a loop:

void SoFunction()
{
  if (++dwSupressOptimization)
  {
     SoFunction();
  }
}

0:001> uf SoFunction
00401300 mov     eax,1
00401305 jmp     StackOverflow!SoFunction+0x10 (00401310)
00401310 add     dword ptr [StackOverflow!dwSupressOptimization (00403374)],eax
00401316 mov     ecx,dword ptr [StackOverflow!dwSupressOptimization (00403374)]
0040131c jne     StackOverflow!SoFunction+0x10 (00401310)
0040131e ret

Therefore without WndProc added or more complicated SoFunction there is no stack overflow but a loop with 4294967295 (0xFFFFFFFF) iterations.

If we compile an x64 project with WndProc call included in SoFunction and run it we would never get a dump from any default postmortem debugger although TestDefaultDebugger64 tool crashes with a dump. I also observed a strange behavior that the application disappears only during the second window repaint although it shall crash immediately when we launch it and the main window is shown. What I have seen is when I launch the application it is running and the main window is visible. When I force it to repaint by minimizing and then maximizing, for example, only then it disappears from the screen and the process list.

If we launch 64-bit WinDbg, load and run our application we would hit the first chance exception:

0:000> g
(159c.fc4): Stack overflow - code c00000fd (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
StackOverflow!SoFunction+0x22:
00000001`40001322 e8d9ffffff call StackOverflow!SoFunction (00000001`40001300)

Stack trace looks like normal stack overflow:

0:000> k
Child-SP          RetAddr           Call Site
00000000`00033fe0 00000001`40001327 StackOverflow!SoFunction+0x22
00000000`00034020 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034060 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000340a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000340e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034120 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034160 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000341a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000341e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034220 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034260 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000342a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000342e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034320 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034360 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000343a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000343e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034420 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034460 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000344a0 00000001`40001327 StackOverflow!SoFunction+0x27

RSP was inside stack guard page during the CALL instruction.

0:000> r
rax=0000000000003eed rbx=00000000000f26fe rcx=0000000077c4080a
rdx=0000000000000000 rsi=000000000000000f rdi=0000000000000000
rip=0000000140001322 rsp=0000000000033fe0 rbp=00000001400035f0
 r8=000000000012fb18 r9=00000001400035f0 r10=0000000000000000
r11=0000000000000246 r12=000000000012fdd8 r13=000000000012fd50
r14=00000000000f26fe r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
StackOverflow!SoFunction+0×22:
00000001`40001322 e8d9ffffff call StackOverflow!SoFunction (00000001`40001300)

0:000> uf StackOverflow!SoFunction
00000001`40001300 sub     rsp,38h
00000001`40001304 mov     rax,qword ptr [StackOverflow!__security_cookie (00000001`40003000)]
00000001`4000130b xor     rax,rsp
00000001`4000130e mov     qword ptr [rsp+20h],rax
00000001`40001313 add     dword ptr [StackOverflow!dwSupressOptimization (00000001`400035e4)],1
00000001`4000131a mov     eax,dword ptr [StackOverflow!dwSupressOptimization (00000001`400035e4)]
00000001`40001320 je      StackOverflow!SoFunction+0×37 (00000001`40001337)
00000001`40001322 call    StackOverflow!SoFunction (00000001`40001300)
00000001`40001327 xor     r9d,r9d
00000001`4000132a xor     r8d,r8d
00000001`4000132d xor     edx,edx
00000001`4000132f xor     ecx,ecx
00000001`40001331 call    qword ptr [StackOverflow!_imp_DefWindowProcW (00000001`40002198)]
00000001`40001337 mov     rcx,qword ptr [rsp+20h]
00000001`4000133c xor     rcx,rsp
00000001`4000133f call    StackOverflow!__security_check_cookie (00000001`40001360)
00000001`40001344 add     rsp,38h
00000001`40001348 ret

However this guard page is not the last stack page as can be seen from TEB and the current RSP address (0×33fe0):

0:000> !teb
TEB at 000007fffffde000
    ExceptionList:        0000000000000000
    StackBase:            0000000000130000
    StackLimit:           0000000000031000
    SubSystemTib:         0000000000000000
    FiberData:            0000000000001e00
    ArbitraryUserPointer: 0000000000000000
    Self:                 000007fffffde000
    EnvironmentPointer:   0000000000000000
    ClientId:             000000000000159c . 0000000000000fc4
    RpcHandle:            0000000000000000
    Tls Storage:          0000000000000000
    PEB Address:          000007fffffd5000
    LastErrorValue:       0
    LastStatusValue:      c0000135
    Count Owned Locks:    0
    HardErrorMode:        0

If we continue execution and force the main application window to invalidate (repaint) itself we get another first chance exception instead of second chance:

0:000> g
(159c.fc4): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
StackOverflow!SoFunction+0x22:
00000001`40001322 call StackOverflow!SoFunction (00000001`40001300)

What we see now is that RSP is outside the valid stack region (stack limit) 0×31000:

0:000> k
Child-SP          RetAddr           Call Site
00000000`00030ff0 00000001`40001327 StackOverflow!SoFunction+0×22
00000000`00031030 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031070 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000310b0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000310f0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031130 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031170 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000311b0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000311f0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031230 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031270 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000312b0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000312f0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031330 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031370 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000313b0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000313f0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031430 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031470 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000314b0 00000001`40001327 StackOverflow!SoFunction+0×27
0:000> r
rax=0000000000007e98 rbx=00000000000f26fe rcx=0000000077c4080a
rdx=0000000000000000 rsi=000000000000000f rdi=0000000000000000
rip=0000000140001322 rsp=0000000000030ff0 rbp=00000001400035f0
 r8=000000000012faa8  r9=00000001400035f0 r10=0000000000000000
r11=0000000000000246 r12=000000000012fd68 r13=000000000012fce0
r14=00000000000f26fe r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
StackOverflow!SoFunction+0×22:
00000001`40001322 call    StackOverflow!SoFunction (00000001`40001300)

Therefore we expect the second chance exception at the same address here and we get it indeed when we continue execution:

0:000> g
(159c.fc4): Access violation - code c0000005 (!!! second chance !!!)
StackOverflow!SoFunction+0x22:
00000001`40001322 call    StackOverflow!SoFunction (00000001`40001300)

Now we see why the process died silently. There was no stack space left for exception dispatch handler functions and therefore for the default unhandled exception filter that launches the default postmortem debugger to save a process dump. So it looks like on x64 Windows when our process had first chance stack overflow exception there was no second chance exception afterwards and after handling first chance stack overflow exception process execution resumed and finally hit its thread stack limit. This doesn’t happen with 32-bit processes even on x64 Windows where unhandled first chance stack overflow exception results in immediate second chance stack overflow exception at the same stack address and therefore there is a sufficient room for the local variables for exception handler and filter functions.

This is an example of what happened before exception handling changes in Vista.

- Dmitry Vostokov @ DumpAnalysis.org -

GDB for WinDbg Users (Part 2)

June 26th, 2007

The primary motivation for this tutorial is to help WinDbg users starting with FreeBSD or Linux core dump analysis like myself to quickly learn GDB debugger commands because most debugging and crash dump analysis principles and techniques are the same for both worlds. You need to disassemble, dump memory locations, list threads and their stack traces, etc. GDB users starting with Windows crash dump analysis can learn WinDbg commands quickly so this tutorial has a second name: ”WinDbg for GDB users“. I don’t want to create a separate tutorial for this to avoid duplication but I have created a separate blog category “WinDbg for GDB users” to include selected posts where I map WinDbg commands to GDB commands and vice versa.

Although GDB is primarily used on Unix systems it is possible to use it on Windows. For this tutorial I use MinGW (Minimalist GNU for Windows):

http://www.mingw.org

You can download and install the current MinGW package from SourceForge:

http://sourceforge.net/project/showfiles.php?group_id=2435

Next you need to download an install GDB package. At the time of this writing both packages (MinGW-5.1.3.exe and gdb-5.2.1-1.exe) were available at the following location:

http://sourceforge.net/project/showfiles.php?group_id=2435&package_id=82721

When installing MinGW package select MinGW base tools and g++ compiler. This will download necessary components for GNU C/C++ environment. When installing GDB package select the same destination folder you used when installing MinGW package.

Now we can create the first C program we will use for learning GDB commands:

#include <stdio.h>
int main()
{
  puts("Hello World!");
  return 0;
}

Create test.c file, save it in examples folder, compile and link into test.exe:

C:\MinGW>mkdir examples

C:\MinGW\examples>..\bin\gcc -o test.exe test.c

C:\MinGW\examples>test
Hello World!

Now you can run it under GDB: 

C:\MinGW\examples>..\bin\gdb test.exe
GNU gdb 5.2.1
...
...
...
(gdb) run
Starting program: C:\MinGW\examples/test.exe

Program exited normally.
(gdb) q

C:\MinGW\examples>

WinDbg equivalent to GDB run command is g.

Here is the command line to launch WinDbg and load the same program:

C:\MinGW\examples>"c:\Program Files\Debugging Tools for Windows\WinDbg" -y SRV*c:\symbols*http://msdl.microsoft.com/download/symbols test.exe

WinDbg will set the initial breakpoint and you can execute the process with g command:

Microsoft (R) Windows Debugger  Version 6.7.0005.0
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: test.exe
Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 00406000   image00400000
ModLoad: 7c900000 7c9b0000   ntdll.dll
ModLoad: 7c800000 7c8f4000   C:\WINDOWS\system32\kernel32.dll
ModLoad: 77c10000 77c68000   C:\WINDOWS\system32\msvcrt.dll
(220.fbc): Break instruction exception - code 80000003 (first chance)
eax=00341eb4 ebx=7ffde000 ecx=00000004 edx=00000010 esi=00341f48 edi=00341eb4
eip=7c901230 esp=0022fb20 ebp=0022fc94 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ntdll!DbgBreakPoint:
7c901230 cc              int     3
0:000> g
eax=0022fe60 ebx=00000000 ecx=0022fe68 edx=7c90eb94 esi=7c90e88e edi=00000000
eip=7c90eb94 esp=0022fe68 ebp=0022ff64 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
7c90eb94 c3              ret

q command to end a debugging session is the same for both debuggers.  

So our first map between GDB and WinDbg commands contains the following entries:

Action                  GDB     | WinDbg
----------------------------------------
Start the process       run     | g
Exit                    (q)uit  | q

- Dmitry Vostokov @ DumpAnalysis.org -