Archive for December, 2007

Upgrade in power

Sunday, December 30th, 2007

I’ve upgraded the hosting to make it more reliable especially for e-mail and notification delivery. Recently I found that people had problems contacting me through the contact form. I apologize for any inconvenience caused.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis E-zine

Thursday, December 27th, 2007

I’ve found that e-mail subscribers do not get notifications and this has been going on for almost all December. This let me think about distributing a monthly e-zine highlighting written posts and other useful and related information. Previously registered subscribers will receive the first sample e-zine issue soon. If you would like to subscribe too (or unsubscribe) you can do it at the bottom of the main blog page.

- Dmitry Vostokov @ DumpAnalysis.org -

Raw Stack Dump of all threads (part 2)

Monday, December 24th, 2007

In the previous part I used WinDbg scripting to get raw stack data from user process dump. However the script needs to be modified if the dump is complete memory dump. Here I use !for_each_thread WinDbg extension command to dump stack trace and user space raw stack data for all threads except system threads because they don’t have user space stack counterpart and their TEB address is NULL:

!for_each_thread ".thread /r /p @#Thread; .if (@$teb != 0) {!thread @#Thread; r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2}"

We need to open a log file. It will be huge and we might want to dump raw stack contents for specific process only. In such case we can filter the output of the script using $proc pseudo-register, the address of EPROCESS:

!for_each_thread ".thread /r /p @#Thread; .if (@$teb != 0 & @$proc == <EPROCESS>) {!thread @#Thread; r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2}"

For example:

1: kd>!process 0 0
...
...
...
PROCESS 8596f9c8  SessionId: 0  Cid: 0fac    Peb: 7ffde000  ParentCid: 0f3c
    DirBase: 3fba6520  ObjectTable: d6654e28  HandleCount: 389.
    Image: explorer.exe


1: kd> !for_each_thread ".thread /r /p @#Thread; .if (@$teb != 0 & @$proc == 8596f9c8) {!thread @#Thread; r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2}”
Implicit thread is now 8659b208
Implicit process is now 8659b478
Loading User Symbols

Implicit thread is now 86599db0
Implicit process is now 8659b478
Loading User Symbols

...
...
...
Implicit thread is now 85b32db0
Implicit process is now 8596f9c8
Loading User Symbols

THREAD 85b32db0  Cid 0fac.0fb0  Teb: 7ffdd000 Win32Thread: bc0a6be8 WAIT: (Unknown) UserMode Non-Alertable
    859bda20  SynchronizationEvent
Not impersonating
DeviceMap                 d743e440
Owning Process            8596f9c8       Image:         explorer.exe
Wait Start TickCount      376275         Ticks: 102 (0:00:00:01.593)
Context Switch Count      3509                 LargeStack
UserTime                  00:00:00.078
KernelTime                00:00:00.203
Win32 Start Address Explorer!ModuleEntry (0x010148a4)
Start Address kernel32!BaseProcessStartThunk (0x77e617f8)
Stack Init ba5fe000 Current ba5fdc50 Base ba5fe000 Limit ba5f9000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
ba5fdc68 80833465 85b32db0 85b32e58 00000000 nt!KiSwapContext+0x26
ba5fdc94 80829a62 00000000 bc0a6be8 00000000 nt!KiSwapThread+0x2e5
ba5fdcdc bf89abe3 859bda20 0000000d 00000001 nt!KeWaitForSingleObject+0x346
ba5fdd38 bf89da53 000024ff 00000000 00000001 win32k!xxxSleepThread+0x1be
ba5fdd4c bf89e411 000024ff 00000000 0007fef8 win32k!xxxRealWaitMessageEx+0x12
ba5fdd5c 8088978c 0007ff08 7c8285ec badb0d00 win32k!NtUserWaitMessage+0x14
ba5fdd5c 7c8285ec 0007ff08 7c8285ec badb0d00 nt!KiFastCallEntry+0xfc (TrapFrame @ ba5fdd64)
0007feec 7739bf53 7c92addc 77e619d1 000d9298 ntdll!KiFastSystemCallRet
0007ff08 7c8fadbd 00000000 0007ff5c 0100fff1 USER32!NtUserWaitMessage+0xc
0007ff14 0100fff1 000d9298 7ffde000 0007ffc0 SHELL32!SHDesktopMessageLoop+0x24
0007ff5c 0101490c 00000000 00000000 000207fa Explorer!ExplorerWinMain+0x2c4
0007ffc0 77e6f23b 00000000 00000000 7ffde000 Explorer!ModuleEntry+0x6d
0007fff0 00000000 010148a4 00000000 78746341 kernel32!BaseProcessStart+0x23

Last set context:
TEB at 7ffdd000
    ExceptionList:        0007ffe0
    StackBase:            00080000
    StackLimit:           00072000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffdd000
    EnvironmentPointer:   00000000
    ClientId:             00000fac . 00000fb0
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffde000
    LastErrorValue:       6
    LastStatusValue:      c0000008
    Count Owned Locks:    0
    HardErrorMode:        0
00072000  ????????
00072004  ????????
00072008  ????????
0007200c  ????????
00072010  ????????
00072014  ????????
00072018  ????????
0007201c  ????????
...
...
...
00079ff8  ????????
00079ffc  ????????
0007a000  00000000
0007a004  00000000
0007a008  00000000
0007a00c  00000000
0007a010  00000000
0007a014  00000000
0007a018  00000000
0007a01c  00000000
0007a020  00000000
0007a024  00000000
0007a028  00000000
0007a02c  00000000
...
...
...
0007ff04  0007ff14
0007ff08  0007ff14
0007ff0c  7c8fadbd SHELL32!SHDesktopMessageLoop+0x24
0007ff10  00000000
0007ff14  0007ff5c
0007ff18  0100fff1 Explorer!ExplorerWinMain+0x2c4
0007ff1c  000d9298
0007ff20  7ffde000
0007ff24  0007ffc0
0007ff28  00000000
0007ff2c  0007fd28
0007ff30  0007ff50
0007ff34  7ffde000
0007ff38  7c82758b ntdll!ZwQueryInformationProcess+0xc
0007ff3c  77e6c336 kernel32!GetErrorMode+0x18
0007ff40  ffffffff
0007ff44  0000000c
0007ff48  00000000
0007ff4c  00018fb8
0007ff50  000000ec
0007ff54  00000001
0007ff58  000d9298
0007ff5c  0007ffc0
0007ff60  0101490c Explorer!ModuleEntry+0x6d
0007ff64  00000000
0007ff68  00000000
0007ff6c  000207fa
0007ff70  00000001
0007ff74  00000000
0007ff78  00000000
0007ff7c  00000044
0007ff80  0002084c
0007ff84  0002082c
0007ff88  000207fc
0007ff8c  00000000
0007ff90  00000000
0007ff94  00000000
0007ff98  00000000
0007ff9c  f60e87fc
0007ffa0  00000002
0007ffa4  021a006a
0007ffa8  00000001
0007ffac  00000001
0007ffb0  00000000
0007ffb4  00000000
0007ffb8  00000000
0007ffbc  00000000
0007ffc0  0007fff0
0007ffc4  77e6f23b kernel32!BaseProcessStart+0x23
0007ffc8  00000000
0007ffcc  00000000
0007ffd0  7ffde000
0007ffd4  00000000
0007ffd8  0007ffc8
0007ffdc  b9a94ce4
0007ffe0  ffffffff
0007ffe4  77e61a60 kernel32!_except_handler3
0007ffe8  77e6f248 kernel32!`string'+0x88
0007ffec  00000000
0007fff0  00000000
0007fff4  00000000
0007fff8  010148a4 Explorer!ModuleEntry
0007fffc  00000000
00080000  78746341
...
...
...

Because complete memory dumps contain only physical memory contents some pages of raw stack data can be in page files and therefore unavailable.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugchecks: CF

Friday, December 21st, 2007

Bugcheck CF name is the second longest one (doesn’t fit in this post title):     

TERMINAL_SERVER_DRIVER_MADE_INCORRECT_MEMORY_REFERENCE (cf)
Arguments:
Arg1: a020b1d4, memory referenced
Arg2: 00000000, value 0 = read operation, 1 = write operation
Arg3: a020b1d4, If non-zero, the instruction address which referenced the bad memory
 address.
Arg4: 00000000, Mm internal code.
 A driver has been incorrectly ported to Terminal Server. It is referencing session space addresses from the system process context.  Probably from queueing an item to a system worker thread. The broken driver's name is displayed on the screen.

Although bugcheck explanation mentions only system process context it can also happen in an arbitrary process context. Recall that kernel space address mapping is usually considered as persistent where virtual-to-physical mapping doesn’t change between switching threads that belong to different processes. However there is so called session space in multi-user terminal services environments where different users can use different display drivers, for example:

  • MS RDP users - RDPDD.DLL
  • Citrix ICA users - VDTW30.DLL
  • Vista users - TSDDD.DLL
  • Console user - Some H/W related video driver like ATI or NVIDIA

These drivers are not committed at the same time persistently since OS boot although their module addresses might remain fixed. Therefore when a new user session is created the appropriate display driver corresponding to terminal services protocol is loaded and mapped to the so called session space starting from A0000000 (x86) or FFFFF90000000000 (x64) after win32k.sys address range (on first usage) and then committed to physical memory by proper PTE entries in process page tables. During thread switch, if the new process context belongs to a different session with a different display driver, the current display driver is decommitted by clearing its PTEs and the new driver is committed by setting its proper PTE entries.

Therefore in the system process context like worker threads virtual addresses corresponding to display driver code and data might be unknown. This can also happen in an arbitrary process context if we access the code that belongs to a display driver that doesn’t correspond to the current session protocol. This can be illustrated with the following example where TSDD can be either RDP or ICA display driver.

In the list of loaded modules we can see that ATI and TSDD drivers are loaded: 

0: kd> lm
start    end        module name
...
...
...
77d30000 77d9f000   RPCRT4     (deferred)            
77e10000 77e6f000   USER32     (deferred)            
77f40000 77f7c000   GDI32      (deferred)            
77f80000 77ffc000   ntdll      (pdb symbols)
78000000 78045000   MSVCRT     (deferred)            
7c2d0000 7c335000   ADVAPI32   (deferred)            
7c340000 7c34f000   Secur32    (deferred)            
7c570000 7c624000   KERNEL32   (deferred)            
7cc30000 7cc70000   winsrv     (deferred)            
80062000 80076f80   hal        (deferred)            
80400000 805a2940   nt         (pdb symbols)
a0000000 a0190ce0   win32k     (pdb symbols)
a0191000 a01e8000   ati2drad   (deferred)            
a01f0000 a0296000   tsdd       (no symbols)
          
b4a60000 b4a72320   naveng     (deferred)            
b4a73000 b4b44c40   navex15    (deferred)    


The bugcheck happens in 3rd-partyApp process context running inside some terminal session:

PROCESS_NAME:  3rd-partyApp.exe

TRAP_FRAME:  b475f84c -- (.trap 0xffffffffb475f84c)
ErrCode = 00000000
eax=a020b1d4 ebx=00000000 ecx=04e0443b edx=ffffffff esi=a21b6778 edi=a201b018
eip=a020b1d4 esp=b475f8c0 ebp=b475f900 iopl=3 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00013246
TSDD+0×1b1d4:
a020b1d4 ??              ???

Examining display driver virtual address shows that it is unknown (PTE is NULL): 

0: kd> !pte a020b1d4
A020B1D4  - PDE at C0300A00        PTE at C028082C
          contains 14AB6863      contains 00000000
        pfn 14ab6 –DA–KWV       not valid

ATI display driver address is unknown too: 

0: kd> !pte a0191000
A0191000  - PDE at C0300A00        PTE at C0280644
          contains 3E301863      contains 00000000
        pfn 3e301 –DA–KWV       not valid

Let’s switch to another terminal session : 

PROCESS 87540a60  SessionId: 45  Cid: 3954    Peb: 7ffdf000  ParentCid: 0164
    DirBase: 2473d000  ObjectTable: 889b2c48  TableSize: 182.
    Image: csrss.exe

0: kd> .process /r /p 87540a60
Implicit process is now 87540a60
Loading User Symbols
................

Now TSDD display driver address is valid:

0: kd> !pte a020b1d4
A020B1D4  - PDE at C0300A00        PTE at C028082C
          contains 5D2C2863      contains 20985863
        pfn 5d2c2 –DA–KWV    pfn 20985 –DA–KWV

but ATI driver address is not. It is unknown and this is expected because no real display hardware is used: 

0: kd> !pte a0191000
A0191000  - PDE at C0300A00        PTE at C0280644
          contains 5D2C2863      contains 00000000
        pfn 5d2c2 –DA–KWV       not valid

Let’s switch to session 0 where display is “physical”: 

PROCESS 8898a5e0  SessionId: 0  Cid: 0180    Peb: 7ffdf000  ParentCid: 0164
    DirBase: 14c58000  ObjectTable: 8898b948  TableSize: 1322.
    Image: csrss.exe

0: kd> .process /r /p 8898a5e0
Implicit process is now 8898a5e0
Loading User Symbols
..............

TSDD driver address is unknown and this is expected too because we no longer use terminal services protocol: 

0: kd> !pte a020b1d4
A020B1D4  - PDE at C0300A00        PTE at C028082C
          contains 14AB6863      contains 00000000
        pfn 14ab6 –DA–KWV       not valid

However ATI display driver addresses are not unknown (not NULL) and their 2 selected pages are either in transition or in a page file:  

0: kd> !pte a0191000
A0191000  - PDE at C0300A00        PTE at C0280644
          contains 14AB6863      contains 156DD882
        pfn 14ab6 –DA–KWV       not valid
                               Transition:  156dd
                               Protect:  4

0: kd> !pte a0198000
A0198000  - PDE at C0300A00        PTE at C0280660
          contains 14AB6863      contains 000B9060
        pfn 14ab6 –DA–KWV       not valid
                               PageFile    0
                               Offset b9
                               Protect:  3

- Dmitry Vostokov @ DumpAnalysis.org -

2007 in Retrospection (Part 1)

Thursday, December 20th, 2007

I started this blog in August, 2006 and during Christmas, 2007 moved it to dumpanalysis.org registered in March, 2006. However I didn’t track any site stats until Feb, 2007 when I added Google Analytics and ShinyStat counters. I was surprised by the growing number of visits started with 2,000/month in February and exceeding 9,000/month in November. Here is the simplified daily visits curve from ShinyStat:

I’ll put more interesting analytic observations from 2007 data later on.

Wishing all visitors a Happy Christmas and New Year.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 42b)

Wednesday, December 19th, 2007

Here is another example of Wait Chain pattern where objects are critical sections. 

WinDbg can detect them if we use !analyze -v -hang command but it detects only one and not necessarily the longest or widest chain in cases with multiple wait chains:

DERIVED_WAIT_CHAIN:

Dl Eid Cid     WaitType
-- --- ------- --------------------------
   2   8d8.90c Critical Section       -->
   4   8d8.914 Critical Section       -->
   66  8d8.f9c Unknown

Looking at threads we see this chain and we also see that the final thread is blocked waiting for socket. 

 0:167> ~~[90c]kvL
ChildEBP RetAddr  Args to Child             
00bbfd9c 7c942124 7c95970f 00000ea0 00000000 ntdll!KiFastSystemCallRet
00bbfda0 7c95970f 00000ea0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
00bbfddc 7c959620 00000000 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x19c
00bbfdfc 6748d2f9 06018b50 00000000 00000000 ntdll!RtlEnterCriticalSection+0xa8



00bbffb8 7c82608b 00315218 00000000 00000000 msvcrt!_endthreadex+0xa3
00bbffec 00000000 77b9b4bc 00315218 00000000 kernel32!BaseThreadStart+0×34

0:167> ~~[914]kvL 100
ChildEBP RetAddr  Args to Child             
00dbf1cc 7c942124 7c95970f 000004b0 00000000 ntdll!KiFastSystemCallRet
00dbf1d0 7c95970f 000004b0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
00dbf20c 7c959620 00000000 00000004 0031abcc ntdll!RtlpWaitOnCriticalSection+0x19c
00dbf22c 6748d244 0031abd8 003174e0 00dbf254 ntdll!RtlEnterCriticalSection+0xa8



00dbffb8 7c82608b 00315218 00000000 00000000 msvcrt!_endthreadex+0xa3
00dbffec 00000000 77b9b4bc 00315218 00000000 kernel32!BaseThreadStart+0×34

0:167> ~~[f9c]kvL 100
ChildEBP RetAddr  Args to Child             
0fe2a09c 7c942124 71933a09 00000b50 00000001 ntdll!KiFastSystemCallRet
0fe2a0a0 71933a09 00000b50 00000001 0fe2a0c8 ntdll!NtWaitForSingleObject+0xc
0fe2a0dc 7194576e 00000b50 00000234 00000000 mswsock!SockWaitForSingleObject+0x19d
0fe2a154 71a12679 00000234 0fe2a1b4 00000001 mswsock!WSPRecv+0x203
0fe2a190 62985408 00000234 0fe2a1b4 00000001 WS2_32!WSARecv+0x77
0fe2a1d0 6298326b 00000234 0274ebc6 00000810 component!wait+0x338
...
...
...
0fe2ffb8 7c82608b 060cfc70 00000000 00000000 msvcrt!_endthreadex+0xa3
0fe2ffec 00000000 77b9b4bc 060cfc70 00000000 kernel32!BaseThreadStart+0x34

If we look at all held critical sections we would see another thread that blocked +125 other threads:

0:167> !locks

CritSec +31abd8 at 0031abd8
WaiterWoken        No
LockCount          6
RecursionCount     1
OwningThread       f9c
EntryCount         0
ContentionCount    17
*** Locked

CritSec +51e4bd8 at 051e4bd8
WaiterWoken        No
LockCount          125
RecursionCount     1
OwningThread       830
EntryCount         0
ContentionCount    7d
*** Locked

CritSec +5f40620 at 05f40620
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       920
EntryCount         0
ContentionCount    0
*** Locked

CritSec +60b6320 at 060b6320
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       8a8
EntryCount         0
ContentionCount    1
*** Locked

CritSec +6017c60 at 06017c60
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       914
EntryCount         0
ContentionCount    0
*** Locked

CritSec +6018b50 at 06018b50
WaiterWoken        No
LockCount          3
RecursionCount     1
OwningThread       914
EntryCount         0
ContentionCount    3
*** Locked

CritSec +6014658 at 06014658
WaiterWoken        No
LockCount          2
RecursionCount     1
OwningThread       928
EntryCount         0
ContentionCount    2
*** Locked

0:167> ~~[830]kvL 100
ChildEBP RetAddr  Args to Child             
0ff2f300 7c942124 7c95970f 000004b0 00000000 ntdll!KiFastSystemCallRet
0ff2f304 7c95970f 000004b0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0ff2f340 7c959620 00000000 00000004 0031abcc ntdll!RtlpWaitOnCriticalSection+0x19c
0ff2f360 6748d244 0031abd8 003174e0 0ff2f388 ntdll!RtlEnterCriticalSection+0xa8



0ff2ffb8 7c82608b 060cf9a0 00000000 00000000 msvcrt!_endthreadex+0xa3
0ff2ffec 00000000 77b9b4bc 060cf9a0 00000000 kernel32!BaseThreadStart+0×34

Searching for any thread waiting for critical section 051e4bd8 gives us:

   8  Id: 8d8.924 Suspend: 1 Teb: 7ffd5000 Unfrozen
ChildEBP RetAddr  Args to Child             
011ef8e0 7c942124 7c95970f 00000770 00000000 ntdll!KiFastSystemCallRet
011ef8e4 7c95970f 00000770 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
011ef920 7c959620 00000000 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x19c
011ef940 677b209d 051e4bd8 011efa0c 057bd36c ntdll!RtlEnterCriticalSection+0xa8



011effb8 7c82608b 00315510 00000000 00000000 msvcrt!_endthreadex+0xa3
011effec 00000000 77b9b4bc 00315510 00000000 kernel32!BaseThreadStart+0×34

and we can construct yet another wait chain:

   8   8d8.924 Critical Section       -->
   67  8d8.830 Critical Section       -->
   66  8d8.f9c Unknown

- Dmitry Vostokov @ DumpAnalysis.org -

Management Bits and Tips blog

Tuesday, December 18th, 2007

To disassociate management activities and thoughts with crashes and hangs I have created a separate blog called

Management Bits and Tips

with the subtitle “Reflections on Software Engineering and Software Technical Support Management”.

Although, in the future, I reserve the right to metaphorically relate crash and hang dump analysis patterns with technical and people management.

All future posts in Management Bits and Tips category and related posts in Software Techical Support category will go there and here I will only post monthly or bi-monthly summary.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 41b)

Monday, December 17th, 2007

Now Manual Dump pattern as seen from process memory dumps. It is not possible to reliably identify manual dumps here because a debugger or another process dumper might have been attached to a process noninvasively and not leaving traces of intervention so we can only rely on the following information:

Comment field

Loading Dump File [C:\kktools\userdump8.1\x64\notepad.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Userdump generated complete user-mode minidump with Standalone function on COMPUTER-NAME'

Absence of exceptions

Loading Dump File [C:\UserDumps\notepad.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:31:31.000 2007 (GMT+0)
System Uptime: 0 days 0:45:11.148
Process Uptime: 0 days 0:00:36.000
....................
user32!ZwUserGetMessage+0xa:
00000000`76c8e6aa c3              ret
0:000> ~*kL

.  0  Id: 1b8.ed4 Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0029f618 00000000`76c8e6ea user32!ZwUserGetMessage+0xa
00000000`0029f620 00000000`ff2b6eca user32!GetMessageW+0x34
00000000`0029f650 00000000`ff2bcf8b notepad!WinMain+0x176
00000000`0029f6d0 00000000`76d7cdcd notepad!IsTextUTF8+0x24f
00000000`0029f790 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0029f7c0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

Wake debugger exception

Loading Dump File [C:\UserDumps\notepad2.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:35:37.000 2007 (GMT+0)
System Uptime: 0 days 0:49:13.806
Process Uptime: 0 days 0:02:54.000
....................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(314.1b4): Wake debugger - code 80000007 (first/second chance not available)”

user32!ZwUserGetMessage+0xa:
00000000`76c8e6aa c3              ret

Break instruction exception

Loading Dump File [C:\UserDumps\notepad3.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:45:15.000 2007 (GMT+0)
System Uptime: 0 days 0:58:52.699
Process Uptime: 0 days 0:14:20.000
....................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
ntdll!DbgBreakPoint:
00000000`76ecfdf0 cc              int     3

0:001> ~*kL

   0  Id: 1b8.ed4 Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0029f618 00000000`76c8e6ea user32!ZwUserGetMessage+0xa
00000000`0029f620 00000000`ff2b6eca user32!GetMessageW+0x34
00000000`0029f650 00000000`ff2bcf8b notepad!WinMain+0x176
00000000`0029f6d0 00000000`76d7cdcd notepad!IsTextUTF8+0x24f
00000000`0029f790 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0029f7c0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

#  1  Id: 1b8.ec4 Suspend: 1 Teb: 000007ff`fffda000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`030df798 00000000`76f633e8 ntdll!DbgBreakPoint
00000000`030df7a0 00000000`76d7cdcd ntdll!DbgUiRemoteBreakin+0×38

00000000`030df7d0 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`030df800 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

The latter might also be some assertion statement in the code leading to a process crash like in the following instance of Dynamic Memory Corruption pattern (heap corruption):  

FAULTING_IP:
ntdll!DbgBreakPoint+0
77f813b1 cc int 3

EXCEPTION_RECORD: ffffffff -- (.exr ffffffffffffffff)
ExceptionAddress: 77f813b1 (ntdll!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 3
Parameter[0]: 00000000
Parameter[1]: 09aef2ac
Parameter[2]: 09aeeee8

STACK_TEXT:
09aef0bc 77fb76aa ntdll!DbgBreakPoint
09aef0c4 77fa65c2 ntdll!RtlpBreakPointHeap+0×26
09aef2bc 77fb5367 ntdll!RtlAllocateHeapSlowly+0×212
09aef340 77fa64f6 ntdll!RtlDebugAllocateHeap+0xcb
09aef540 77fcc9e3 ntdll!RtlAllocateHeapSlowly+0×5a
09aef720 786f3f11 ntdll!RtlAllocateHeap+0×954
09aef730 786fd10e rpcrt4!operator new+0×12

09aef748 786fc042 rpcrt4!OSF_CCONNECTION::OSF_CCONNECTION+0×174
09aef79c 786fbe0d rpcrt4!OSF_CASSOCIATION::AllocateCCall+0xfa
09aef808 786fbd53 rpcrt4!OSF_BINDING_HANDLE::AllocateCCall+0×1cd
09aef83c 786f1f2f rpcrt4!OSF_BINDING_HANDLE::GetBuffer+0×28
09aef854 786f1ee4 rpcrt4!I_RpcGetBufferWithObject+0×6e
09aef860 786f1ea4 rpcrt4!I_RpcGetBuffer+0xb
09aef86c 78754762 rpcrt4!NdrGetBuffer+0×2b
09aefab8 796d78b5 rpcrt4!NdrClientCall2+0×3f9
09aefac8 796d7821 advapi32!LsarOpenPolicy2+0×14
09aefb1c 796d8b04 advapi32!LsaOpenPolicy+0xaf
09aefb84 796d8aa9 advapi32!LookupAccountSidInternal+0×63
09aefbac 0aaf5d8b advapi32!LookupAccountSidW+0×1f
WARNING: Stack unwind information not available. Following frames may be wrong.
09aeff40 0aad1665 ComponentDLL+0×35d8b
09aeff5c 3f69264c ComponentDLL+0×11665
09aeff7c 780085bc ComponentDLL+0×264c
09aeffb4 77e5438b msvcrt!_endthreadex+0xc1
09aeffec 00000000 kernel32!BaseThreadStart+0×52

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 42a)

Friday, December 14th, 2007

Wait Chain is another pattern and it is simply a sequence of causal relations between events: thread A is waiting for event E to happen that threads B, C or D are supposed to signal at some time in the future but they are all waiting for event F to happen that thread G is about to signal as soon as it finishes processing some critical task:

That subsumes various deadlock patterns too which are causal loops where thread A is waiting for event AB that thread B will signal as soon as thread A signals event BA thread B is waiting for:

In this context “Event” means any type of synchronization object, critical section, LPC/RPC reply or data arrival through some IPC channel and not only Win32 event object or kernel _KEVENT.

As the first example of Wait Chain pattern I show a process being terminated and waiting for the other thread to finish or in other words, considering thread termination as an event itself, the main process thread is waiting for the second thread object to be signaled. The second thread tries to cancel previous I/O request directed to some device. However that IRP is not cancellable and process hangs. This can be depicted on the following diagram:

where Thread A is our main thread waiting for Event A which is thread B itself waiting for I/O cancellation (Event B). Their stack traces are:

THREAD 8a3178d0  Cid 04bc.01cc  Teb: 7ffdf000 Win32Thread: bc1b6e70 WAIT: (Unknown) KernelMode Non-Alertable
    8af2c920  Thread
Not impersonating
DeviceMap                 e1032530
Owning Process            89ff8d88       Image:         processA.exe
Wait Start TickCount      80444          Ticks: 873 (0:00:00:13.640)
Context Switch Count      122                 LargeStack
UserTime                  00:00:00.015
KernelTime                00:00:00.156
Win32 Start Address 0x010148a4
Start Address 0x77e617f8
Stack Init f3f29000 Current f3f28be8 Base f3f29000 Limit f3f25000 Call 0
Priority 15 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr 
f3f28c00 80833465 nt!KiSwapContext+0x26
f3f28c2c 80829a62 nt!KiSwapThread+0x2e5
f3f28c74 8094c0ea nt!KeWaitForSingleObject+0x346 ; stack trace with arguments shows the first parameter as 8af2c920 
f3f28d0c 8094c63f nt!PspExitThread+0×1f0
f3f28d24 8094c839 nt!PspTerminateThreadByPointer+0×4b
f3f28d54 8088978c nt!NtTerminateProcess+0×125
f3f28d54 7c8285ec nt!KiFastCallEntry+0xfc

THREAD 8af2c920  Cid 04bc.079c  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
    8af2c998  NotificationTimer
IRP List:
    8ad26260
: (0006,0220) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 e1032530
Owning Process            89ff8d88       Image:         processA.exe
Wait Start TickCount      81312          Ticks: 5 (0:00:00:00.078)
Context Switch Count      169                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×77da3ea5
Start Address 0×77e617ec
Stack Init f3e09000 Current f3e08bac Base f3e09000 Limit f3e05000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr 
f3e08bc4 80833465 nt!KiSwapContext+0×26
f3e08bf0 80828f0b nt!KiSwapThread+0×2e5
f3e08c38 808ea7a4 nt!KeDelayExecutionThread+0×2ab
f3e08c68 8094c360 nt!IoCancelThreadIo+0×62
f3e08cf0 8094c569 nt!PspExitThread+0×466
f3e08cfc 8082e0b6 nt!PsExitSpecialApc+0×1d
f3e08d4c 80889837 nt!KiDeliverApc+0×1ae
f3e08d4c 7c8285ec nt!KiServiceExit+0×56

By inspecting IRP we can see a device it was directed to, see that it has cancel bit but doesn’t have a cancel routine:

0: kd> !irp 8ad26260  1
Irp is active with 5 stacks 4 is current (= 0x8ad2633c)
 No Mdl: No System Buffer: Thread 8af2c920:  Irp stack trace. 
Flags = 00000000
ThreadListEntry.Flink = 8af2cb28
ThreadListEntry.Blink = 8af2cb28
IoStatus.Status = 00000000
IoStatus.Information = 00000000
RequestorMode = 00000001
Cancel = 01
CancelIrql = 0
ApcEnvironment = 00
UserIosb = 77ecb700
UserEvent = 00000000
Overlay.AsynchronousParameters.UserApcRoutine = 00000000
Overlay.AsynchronousParameters.UserApcContext = 00000000
Overlay.AllocationSize = 00000000 - 00000000
CancelRoutine = 00000000  
UserBuffer = 77ecb720
&Tail.Overlay.DeviceQueueEntry = 8ad262a0
Tail.Overlay.Thread = 8af2c920
Tail.Overlay.AuxiliaryBuffer = 00000000
Tail.Overlay.ListEntry.Flink = 00000000
Tail.Overlay.ListEntry.Blink = 00000000
Tail.Overlay.CurrentStackLocation = 8ad2633c
Tail.Overlay.OriginalFileObject = 89ff8920
Tail.Apc = 00000000
Tail.CompletionKey = 00000000
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000
   

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
>[  c, 2]   0  1 8ab20388 89ff8920 00000000-00000000    pending
        \Device\DeviceA

   Args: 00000020 00000017 00000000 00000000
 [  c, 2]   0  0 8affa4b8 89ff8920 00000000-00000000   
        \Device\DeviceB
   Args: 00000020 00000017 00000000 00000000

- Dmitry Vostokov @ DumpAnalysis.org -

Flawless writing with Google

Thursday, December 13th, 2007

Management Bits and Tips 0×1 - Many managers have flawless writing skills (bit). Use Google to check your writing (tip).

It is especially important for non-native English speakers like me. You can search simple sub-sentences and their alterations to compare search results.

For example, today I had a discussion about this sub-sentence:

“It’s main advantage is “

It gives 539 search results. However the sentence without apostrophe

“Its main advantage is “

gives 8,870 search results. Let’s check combinations with two “it”.

  • “It’s main advantage is it’s ” - 192
  • “Its main advantage is it’s ” - 0
  • “It’s main advantage is its ” - 299
  • “Its main advantage is its ” - 836

So you get the idea of what is more correct or more widely used from descriptive grammar point of view. 

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 41a)

Wednesday, December 12th, 2007

Some memory dumps are generated on purpose to troubleshoot process and system hangs. They are usually called Manual Dumps, manual crash dumps or manual memory dumps. Kernel, complete and kernel mini dumps can be generated using the famous keyboard method described in the following Microsoft article which has been recently updated and contains the fix for USB keyboards:

http://support.microsoft.com/kb/244139

The crash dump will show E2 bugcheck:

MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Various tools including Citrix SystemDump reuse E2 bug check code and its arguments.  There are many other 3rd-party tools used to bugcheck Windows OS such as BANG! from OSR or NotMyFault from Sysinternals. The old one is crash.exe that loads crashdrv.sys and uses the following bugcheck:

Unknown bugcheck code (69696969)
Unknown bugcheck description
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

In a memory dump you would see its characteristic stack trace pointing to crashdrv module: 

STACK_TEXT:
b5b3ebe0 f615888d nt!KeBugCheck+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
b5b3ebec f61584e3 crashdrv+0x88d
b5b3ec00 8041eec9 crashdrv+0x4e3
b5b3ec14 804b328a nt!IopfCallDriver+0x35
b5b3ec28 804b40de nt!IopSynchronousServiceTail+0x60
b5b3ed00 804abd0a nt!IopXxxControlFile+0x5d6
b5b3ed34 80468379 nt!NtDeviceIoControlFile+0x28
b5b3ed34 77f82ca0 nt!KiSystemService+0xc9
0006fed4 7c5794f4 ntdll!NtDeviceIoControlFile+0xb
0006ff38 01001a74 KERNEL32!DeviceIoControl+0xf8
0006ff70 01001981 crash+0x1a74
0006ff80 01001f93 crash+0x1981
0006ffc0 7c5989a5 crash+0x1f93
0006fff0 00000000 KERNEL32!BaseProcessStart+0x3d

Sometimes various hardware buttons are used to trigger NMI and generate a crash dump when keyboard is not available. The bugcheck will be:

NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction. The hardware supplier should be called.
Arguments:
Arg1: 004f4454
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Critical process termination such as session 0 csrss.exe is used to force a memory dump:

CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been terminated.
Several processes and threads are necessary for the operation of the system; when they are terminated (for any reason), the system can no longer function.
Arguments:
Arg1: 00000003, Process
Arg2: 8a090d88, Terminating object
Arg3: 8a090eec, Process image file name
Arg4: 80967b74, Explanatory message (ascii)

- Dmitry Vostokov @ DumpAnalysis.org -

Management: Analysis and Synthesis

Tuesday, December 11th, 2007

I created “Management Bits and Tips” category to write my thoughts on management and just realized how this category title fits into grand scientific modeling approach:

Analysis (Bits) -> Synthesis (Tips)

Contrast this with pure analytic approaches

  • “Management Bits”
  • “Management Bytes”
  • “Management Bits and Bytes”

or with pure synthetic approach “Management Tips”.

I was thinking about “Management QWords” category but abandoned that thought because QWord sounds to me as an abbreviation to “Cursing Words”. “Management DWords” ?

Perhaps I have to start a separate blog otherwise debugging community will complain for this off topic :-)

- Dmitry Vostokov @ DumpAnalysis.org -

Expertise-Driven Motivation

Tuesday, December 11th, 2007

There are many X-Driven motivations out there but I prefer expertise-driven individuals, motivated by the desire to become experts. It is not bullshit as you might think. It is more like a persistent psychological state found in researchers and scientists and the best results are guaranteed when it is supplemented by money-driven positive feedback loop. I’ve seen such people in both software engineering and software technical support environments. It is very interesting topic and I might come back to it later.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 40a)

Monday, December 10th, 2007

In Advanced Windows Debugging book I encountered some thread stacks related to debugger events like Exit a Process, Load or Unload a Module and realized that I’ve seen process crash dumps with such stacks traces. These thread stacks are not normally encountered in healthy process dumps and, statistically speaking, when a process terminates or unloads a library the chances to save a memory dump manually using process dumpers like userdump.exe or Task Manager in Vista are very low unless an interactive debugger was attached or breakpoints were set in advance. Therefore the presence of such threads in a captured crash dump usually indicates some problem or at least focuses attention to the procedure used to save a dump. Such pattern merits its own name: Special Stack Trace.

For example, one process dump had the following stack trace showing process termination initiated from .NET runtime:

STACK_TEXT:
0012fc2c 7c827c1b ntdll!KiFastSystemCallRet
0012fc30 77e668c3 ntdll!NtTerminateProcess+0xc
0012fd24 77e66905 KERNEL32!_ExitProcess+0x63
0012fd38 01256d9b KERNEL32!ExitProcess+0x14
0012ff60 01256dc7 mscorwks!SafeExitProcess+0x11a
0012ff6c 011c5fa4 mscorwks!DisableRuntime+0xd0
0012ffb0 79181b5f mscorwks!_CorExeMain+0x8c
0012ffc0 77e6f23b mscoree!_CorExeMain+0x2c
0012fff0 00000000 KERNEL32!BaseProcessStart+0x23

The original problem was an error message box and the application disappeared when a user dismissed the message. How the dump was saved? Someone advised to attach NTSD to that process, hit ‘g’ and then save the memory dump when the process breaks into the debugger again. So the problem was already gone by that time and the better way would have been to create the manual user dump of that process when it was displaying the error message.

- Dmitry Vostokov @ DumpAnalysis.org -

Interrupts and exceptions explained (Part 6)

Friday, December 7th, 2007

Previous parts were dealing with exceptions in kernel mode. In this and next parts I’m going to investigate the flow of exception processing in user mode. In part 1 I mentioned that interrupts and exceptions generated when CPU executes code in user mode require a processor to switch the current user mode stack to kernel mode stack. This can be seen when we have a user debugger attached and it gets an exception notification called first chance exception. Because of the stack switch we don’t see any saved processor context on user mode thread stack when WinDbg breaks on first-chance exception in TestDefaultDebugger64:

0:000> r
rax=0000000000000000 rbx=0000000000000001 rcx=000000000012fd80
rdx=00000000000003e8 rsi=000000000012fd80 rdi=0000000140033fe0
rip=0000000140001690 rsp=000000000012f198 rbp=0000000000000111
 r8=0000000000000000  r9=0000000140001690 r10=0000000140001690
r11=000000000012f260 r12=0000000000000000 r13=00000000000003e8
r14=0000000000000110 r15=0000000000000001
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1:
00000001`40001690 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

0:000> kL 100
Child-SP          RetAddr           Call Site
00000000`0012f198 00000001`40004ba0 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
00000000`0012f1a0 00000001`40004de0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1d0 00000001`4000564e TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0x180
00000000`0012f230 00000001`4000c6b4 TestDefaultDebugger64!CDialog::OnCmdMsg+0x32
00000000`0012f270 00000001`4000d4d8 TestDefaultDebugger64!CWnd::OnCommand+0xcc
00000000`0012f300 00000001`400082e0 TestDefaultDebugger64!CWnd::OnWndMsg+0x60
00000000`0012f440 00000001`4000b77a TestDefaultDebugger64!CWnd::WindowProc+0x38
00000000`0012f480 00000001`4000b881 TestDefaultDebugger64!AfxCallWndProc+0xfe
00000000`0012f520 00000000`77c43abc TestDefaultDebugger64!AfxWndProc+0x59
00000000`0012f560 00000000`77c4337a user32!UserCallWinProcCheckWow+0x1f9
00000000`0012f630 00000000`77c4341b user32!SendMessageWorker+0x68c
00000000`0012f6d0 000007ff`7f07c89f user32!SendMessageW+0x9d
00000000`0012f720 000007ff`7f07f2e1 comctl32!Button_ReleaseCapture+0x14f
00000000`0012f750 00000000`77c43abc comctl32!Button_WndProc+0xd51
00000000`0012f8b0 00000000`77c43f5c user32!UserCallWinProcCheckWow+0x1f9
00000000`0012f980 00000000`77c3966a user32!DispatchMessageWorker+0x3af
00000000`0012f9f0 00000001`40007148 user32!IsDialogMessageW+0x256
00000000`0012fac0 00000001`400087f8 TestDefaultDebugger64!CWnd::IsDialogMessageW+0x38
00000000`0012faf0 00000001`4000560f TestDefaultDebugger64!CWnd::PreTranslateInput+0x28
00000000`0012fb20 00000001`4000b2ca TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc3
00000000`0012fb50 00000001`400034a7 TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0x3a
00000000`0012fb80 00000001`40003507 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0x67
00000000`0012fbb0 00000001`400036d2 TestDefaultDebugger64!AfxPreTranslateMessage+0x23
00000000`0012fbe0 00000001`40003717 TestDefaultDebugger64!AfxInternalPumpMessage+0x3a
00000000`0012fc10 00000001`4000a806 TestDefaultDebugger64!AfxPumpMessage+0x1b
00000000`0012fc40 00000001`40005ff2 TestDefaultDebugger64!CWnd::RunModalLoop+0xea
00000000`0012fca0 00000001`40001163 TestDefaultDebugger64!CDialog::DoModal+0x1c6
00000000`0012fd50 00000001`4002ccb1 TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe3
00000000`0012fe80 00000001`40016150 TestDefaultDebugger64!AfxWinMain+0x75
00000000`0012fec0 00000000`77d5964c TestDefaultDebugger64!__tmainCRTStartup+0x260
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x29

0:000> dqs 000000000012f198-20 000000000012f198+20
00000000`0012f178  00000001`4000bc25 TestDefaultDebugger64!CWnd::ReflectLastMsg+0x65
00000000`0012f180  00000000`00080334
00000000`0012f188  00000000`00000006
00000000`0012f190  00000000`0000000d
00000000`0012f198  00000001`40004ba0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1a0  ffffffff`fffffffe
00000000`0012f1a8  00000000`00000000
00000000`0012f1b0  00000000`00000000
00000000`0012f1b8  00000000`00000000

We see that there are no saved SS:RSP, RFLAGS, CS:RIP registers which we see on a stack if an exception happens in kernel mode as shown in part 2. If we bugcheck our system using SystemDump tool to generate complete memory dump at that time we can look later at the whole thread that experienced exception in user mode and its user mode and kernel mode stacks:

kd> !process fffffadfe7055c20 2
PROCESS fffffadfe7055c20
    SessionId: 0  Cid: 0c64    Peb: 7fffffd7000  ParentCid: 07b0
    DirBase: 27e3d000  ObjectTable: fffffa800073a550  HandleCount:  55.
    Image: TestDefaultDebugger64.exe

        THREAD fffffadfe78f2bf0  Cid 0c64.0c68  Teb: 000007fffffde000 Win32Thread: fffff97ff4d71010 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
            fffffadfdf7b6fc0  SynchronizationEvent

        THREAD fffffadfe734c3d0  Cid 0c64.0c88  Teb: 000007fffffdc000 Win32Thread: 0000000000000000 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
            fffffadfe734c670  Semaphore Limit 0x2

kd> .thread /r /p fffffadfe78f2bf0
Implicit thread is now fffffadf`e78f2bf0
Implicit process is now fffffadf`e7055c20
Loading User Symbols

kd> kL 100
Child-SP          RetAddr           Call Site
fffffadf`df7b6d30 fffff800`0103b063 nt!KiSwapContext+0x85
fffffadf`df7b6eb0 fffff800`0103c403 nt!KiSwapThread+0xc3
fffffadf`df7b6ef0 fffff800`013a9dc1 nt!KeWaitForSingleObject+0x528
fffffadf`df7b6f80 fffff800`01336dcf nt!DbgkpQueueMessage+0x281
fffffadf`df7b7130 fffff800`01011c69 nt!DbgkForwardException+0x1c5
fffffadf`df7b74f0 fffff800`0104146f nt!KiDispatchException+0x264
fffffadf`df7b7af0 fffff800`010402e1 nt!KiExceptionExit
fffffadf`df7b7c70 00000001`40001690 nt!KiPageFault+0×1e1
00000000`0012f198 00000001`40004ba0 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
00000000`0012f1a0 00000001`40004de0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1d0 00000001`4000564e TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0×180
00000000`0012f230 00000001`4000c6b4 TestDefaultDebugger64!CDialog::OnCmdMsg+0×32
00000000`0012f270 00000001`4000d4d8 TestDefaultDebugger64!CWnd::OnCommand+0xcc
00000000`0012f300 00000001`400082e0 TestDefaultDebugger64!CWnd::OnWndMsg+0×60
00000000`0012f440 00000001`4000b77a TestDefaultDebugger64!CWnd::WindowProc+0×38
00000000`0012f480 00000001`4000b881 TestDefaultDebugger64!AfxCallWndProc+0xfe
00000000`0012f520 00000000`77c43abc TestDefaultDebugger64!AfxWndProc+0×59
00000000`0012f560 00000000`77c4337a USER32!UserCallWinProcCheckWow+0×1f9
00000000`0012f630 00000000`77c4341b USER32!SendMessageWorker+0×68c
00000000`0012f6d0 000007ff`7f07c89f USER32!SendMessageW+0×9d
00000000`0012f720 000007ff`7f07f2e1 COMCTL32!Button_ReleaseCapture+0×14f
00000000`0012f750 00000000`77c43abc COMCTL32!Button_WndProc+0xd51
00000000`0012f8b0 00000000`77c43f5c USER32!UserCallWinProcCheckWow+0×1f9
00000000`0012f980 00000000`77c3966a USER32!DispatchMessageWorker+0×3af
00000000`0012f9f0 00000001`40007148 USER32!IsDialogMessageW+0×256
00000000`0012fac0 00000001`400087f8 TestDefaultDebugger64!CWnd::IsDialogMessageW+0×38
00000000`0012faf0 00000001`4000560f TestDefaultDebugger64!CWnd::PreTranslateInput+0×28
00000000`0012fb20 00000001`4000b2ca TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc3
00000000`0012fb50 00000001`400034a7 TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0×3a
00000000`0012fb80 00000001`40003507 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0×67
00000000`0012fbb0 00000001`400036d2 TestDefaultDebugger64!AfxPreTranslateMessage+0×23
00000000`0012fbe0 00000001`40003717 TestDefaultDebugger64!AfxInternalPumpMessage+0×3a
00000000`0012fc10 00000001`4000a806 TestDefaultDebugger64!AfxPumpMessage+0×1b
00000000`0012fc40 00000001`40005ff2 TestDefaultDebugger64!CWnd::RunModalLoop+0xea
00000000`0012fca0 00000001`40001163 TestDefaultDebugger64!CDialog::DoModal+0×1c6
00000000`0012fd50 00000000`00000000 TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe3

Dumping kernel mode stack of our thread shows that the processor saved registers there:

kd> dqs fffffadf`df7b7c70  fffffadf`df7b7c70+200
fffffadf`df7b7c70  fffffadf`e78f2bf0
fffffadf`df7b7c78  00000000`00000000
fffffadf`df7b7c80  fffffadf`e78f2b01
fffffadf`df7b7c88  00000000`00000020
...
...
...
fffffadf`df7b7d90  00000000`00000000
fffffadf`df7b7d98  00000000`00000000
fffffadf`df7b7da0  00000000`00000000
fffffadf`df7b7da8  00000000`00000000
fffffadf`df7b7db0  00000000`001629b0
fffffadf`df7b7db8  00000000`00000001
fffffadf`df7b7dc0  00000000`00000001
fffffadf`df7b7dc8  00000000`00000111 ; RBP saved by KiPageFault
fffffadf`df7b7dd0  00000000`00000006 ; Page-Fault Error Code
fffffadf`df7b7dd8  00000001`40001690 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1 ; RIP
fffffadf`df7b7de0  00000000`00000033 ; CS
fffffadf`df7b7de8  00000000`00010246 ; RFLAGS
fffffadf`df7b7df0  00000000`0012f198 ; RSP
fffffadf`df7b7df8  00000000`0000002b ; SS
fffffadf`df7b7e00  00000000`0000027f
fffffadf`df7b7e08  00000000`00000000
fffffadf`df7b7e10  00000000`00000000
fffffadf`df7b7e18  0000ffff`00001f80
fffffadf`df7b7e20  00000000`00000000
fffffadf`df7b7e28  00000000`00000000
fffffadf`df7b7e30  00000000`00000000
fffffadf`df7b7e38  00000000`00000000


kd> .asm no_code_bytes
Assembly options: no_code_bytes

kd> u KiPageFault
nt!KiPageFault:
fffff800`01040100 push    rbp
fffff800`01040101 sub     rsp,158h
fffff800`01040108 lea     rbp,[rsp+80h]
fffff800`01040110 mov     byte ptr [rbp-55h],1
fffff800`01040114 mov     qword ptr [rbp-50h],rax
fffff800`01040118 mov     qword ptr [rbp-48h],rcx
fffff800`0104011c mov     qword ptr [rbp-40h],rdx
fffff800`01040120 mov     qword ptr [rbp-38h],r8

Error code 6 is 110 in binary and volume 3A of Intel manual tells us that “the fault was caused by a non-present page” (bit 0 is cleared), “the access causing the fault was a write” (bit 1 is set) and “the access causing the fault originated when the processor was executing in user mode” (bit 2 is set).

- Dmitry Vostokov @ DumpAnalysis.org -

SIMSIM Software Development Process

Thursday, December 6th, 2007

Faced with the problem to find time to write troubleshooting tools that spring to my mind I devised this process that seems to be a novel way to write software for busy professionals. Its essence is in writing software when presenting it or when presenting software writing topics, for example, software architecture, design and implementation. It has some agile process flavour but magnified by a bigger audience than pair programming has and nicely complements my Reading Windows-based Code series. SIMSIM is an abbreviation for:

Show IMplementation and Subsequent IMprovement

More details will be announced soon.

- Dmitry Vostokov @ DumpAnalysis.org -
 

Complexity and Memory Dumps (Part 1)

Wednesday, December 5th, 2007

Asking right questions at the appropriate hierarchical organization level is a known solution to complexity. In case of memory dumps it is sometimes useful to forget about bits, bytes, words, dwords and qwords, memory addresses, pointers, runtime structures, API and ask educated questions at component level, the simplest of it is the question about component timestamp, in WinDbg parlance, using variants of lm command, for example:

0:008> lmt m ModuleA
start    end        module name
76290000 762ad000   ModuleA  Sat Feb 17 13:59:59 2007 (45D70A5F)

0:008> lmt m ModuleB
start    end        module name
66c50000 66c65000   ModuleB  Fri Feb 02 22:30:03 2007 (45C3BB6B)

The next step is obvious: test with the newer version. Another good question is about consistency to exclude cases caused by α-particle hits. This latter possibility was mentioned in Andreas Zeller’s book I read some time ago and can be considered as the efficient cause of some crash dumps according to Aristotelian causation categories.   

- Dmitry Vostokov @ DumpAnalysis.org -

CAFF BugCheck

Tuesday, December 4th, 2007

Recently observed it in a kernel dump and found that userdump.sys generates it from userdump.exe request when process monitoring rules in Process Dumper from Microsoft userdump package are set to “Bugcheck after dumping”:

BUGCHECK_STR:  0xCAFF

PROCESS_NAME:  userdump.exe

kd> kL
Child-SP          RetAddr           Call Site
fffffadf`dfcf19b8 fffffadf`dfee38c4 nt!KeBugCheck
fffffadf`dfcf19c0 fffff800`012ce9cf userdump!UdIoctl+0x104
fffffadf`dfcf1a70 fffff800`012df026 nt!IopXxxControlFile+0xa5a
fffffadf`dfcf1b90 fffff800`010410fd nt!NtDeviceIoControlFile+0x56
fffffadf`dfcf1c00 00000000`77ef0a5a nt!KiSystemServiceCopyEnd+0x3
00000000`01eadd58 00000001`0000a755 ntdll!NtDeviceIoControlFile+0xa
00000000`01eadd60 00000000`77ef30a5 userdump_100000000!UdServiceWorkerAPC+0x1005
00000000`01eaf970 00000000`77ef0a2a ntdll!KiUserApcDispatcher+0x15
00000000`01eafe68 00000001`00007fe2 ntdll!NtWaitForSingleObject+0xa
00000000`01eafe70 00000001`00008a39 userdump_100000000!UdServiceWorker+0xb2
00000000`01eaff20 000007ff`7fee4db6 userdump_100000000!UdServiceStart+0x139
00000000`01eaff50 00000000`77d6b6da ADVAPI32!ScSvcctrlThreadW+0x25
00000000`01eaff80 00000000`00000000 kernel32!BaseThreadStart+0x3a

This might be useful if you want to see kernel data that happened to be at exception time. In this case you can avoid requesting complete memory dump of physical memory and ask for kernel memory dump only (if it was configured in Control Panel) together with a user dump.

Note: do not set this option if you are unsure. It can have your production servers bluescreen in the case of false positive dumps.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis AntiPatterns (Part 7)

Monday, December 3rd, 2007

Be language - excessive use of “is”. This anti-pattern was inspired by Alfred Korzybski notion of how “is” affects our understanding of the world. In the context of technical support the use of certain verbs sometimes leads to wrong troubleshooting and debugging paths. For example, the following phrase:

It is our pool tag. It is effected by driver A, driver B and driver C.  

Surely driver A, driver B and driver C were not developed by the same company that introduced the problem pool tag (smells Alien Component here). Unless supported by solid evidence the better phrase shall be:

It is our pool tag. It might have been effected by driver A, driver B or driver C.  

I’m not advocating to completely eradicate “be” verbs as was done in E-Prime language but to be conscious in their use. Thanks to Simple*ology in pointing me to the right direction.

- Dmitry Vostokov @ DumpAnalysis.org -