Archive for the ‘Crash Dump Patterns’ Category

Crash Dump Analysis Patterns (Part 78a)

Monday, December 1st, 2008

This December starts with easy and obvious patterns that I forgot to write about. Integer division by zero is one of the most frequent exceptions. It is easily recognizable in process crash dumps by the processor instruction that caused this exception type (DIV or IDIV):

FAULTING_IP:
DLL!FindHighestID+278
1b2713c4 f775e4 div dword ptr [ebp-0×1c]

EXCEPTION_RECORD: ffffffff -- (.exr ffffffffffffffff)
ExceptionAddress: 1b2713c4 (DLL!FindHighestID+0x00000278)
ExceptionCode: c0000094 (Integer divide-by-zero)
ExceptionFlags: 00000000
NumberParameters: 0

or

FAULTING_IP:
Application+263d8
004263d8 f7fe idiv eax,esi

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 004263d8 (Application+0x000263d8)
ExceptionCode: c0000094 (Integer divide-by-zero)
ExceptionFlags: 00000000
NumberParameters: 0

ERROR_CODE: (NTSTATUS) 0xc0000094 - {EXCEPTION} Integer division by zero.

- Dmitry Vostokov @ DumpAnalysis.org -

WOW64, blocked threads and coupled processes: pattern cooperation

Friday, November 28th, 2008

Memory dump analysis always starts when a user complains. In this case it was a hanging application from a document processing suit. The manual dump was saved:

Loading Dump File [processA.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Userdump generated complete user-mode minidump with Standalone function on SERVER1'

Main thread stack trace shows a virtualized process (WOW64):

0:000> kL
Child-SP          RetAddr           Call Site
00000000`0016e7b8 00000000`6b006a5a wow64cpu!WaitForMultipleObjects32+0×3a
00000000`0016e860 00000000`6b0097f4 wow64!RunCpuSimulation+0xa
00000000`0016e890 00000000`6b2936a2 wow64!Wow64KiUserCallbackDispatcher+0×114
00000000`0016ebd0 00000000`77ef317f wow64win!whcbfnDWORD+0xc2
00000000`0016ed80 00000000`78b842d9 ntdll!KiUserCallbackDispatcherContinue
00000000`0016ee08 00000000`78b8428e wow64cpu!CpupSyscallStub+0×9
00000000`0016ee10 00000000`00000000 wow64cpu!Thunk0Arg+0×5
 

Therefore we switch to x86 32-bit mode and get the right thread stack:

0:000> .load wow64exts

0:000> .effmach x86
Effective machine: x86 compatible (x86)

0:000:x86> kv
ChildEBP          RetAddr           Args to Child                                        
0012dcac 7d948836 009db2c0 00000000 0000004a user32!NtUserMessageCall+0x15
0012dcd0 30059282 000b0296 0000004a 00000000 user32!SendMessageW+0×82
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fef8 3000268e 02110024 30000000 b90fcc31 ApplicationA+0×59282
0012ff30 3000260b 30000000 00000000 0022245d ApplicationA+0×268e
0012ffc0 7d4e7d2a 00000000 00000000 7efde000 ApplicationA+0×260b
0012fff0 00000000 30001d28 00000000 00000000 kernel32!BaseProcessStart+0×28

We see that the main threads is blocked by sending a synchronous message via SendMessage Win32 API function call. The first argument to it is a window handle. In our case it is 000b0296. It is also known that ApplicationA launches another ApplicationB (coupled process) and its manual memory dump was saved too. It is also a virtualized process and its main GUI thread is blocked:

0:000:x86> kv 100
ChildEBP          RetAddr           Args to Child                                        
0012ce80 7d4e286c 00000003 0012cecc 00000000 ntdll_7d600000!NtWaitForMultipleObjects+0x15
0012cf28 7d4e3e8e 00000003 0012cf6c 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0012cf44 0cc7c897 00000003 0012cf6c 00000001 kernel32!WaitForMultipleObjects+0×18
WARNING: Stack unwind information not available. Following frames may be wrong.
0012cf74 0cc7c990 ffffffff 0cc74b23 00000001 3rdPartyDLL+0xc897
[…]
0012d814 7d947568 3a0b28d7 000b0296 00000002 user32!InternalCallWinProc+0×28
0012d88c 7d947d93 00000000 3a0b28d7 000b0296 user32!UserCallWinProcCheckWow+0×114
0012d8e8 7d947e46 009db2c0 00000000 00000002 user32!DispatchClientMessage+0xdf
0012d924 7d61ea0e 0012d93c 00000000 0012d9b8 user32!__fnDWORD+0×2b
0012d958 3a0baf6a 000b0296 02114600 0012d98c ntdll_7d600000!KiUserCallbackDispatcher+0×2e
[…]
0012db28 7d947568 3a0b28d7 000b0296 00000010 user32!InternalCallWinProc+0×28
0012dba0 7d94778d 00000000 3a0b28d7 000b0296 user32!UserCallWinProcCheckWow+0×114
0012dc18 7d9477d0 0012dc88 00000000 0012dc4c user32!DispatchMessageWorker+0×37b
0012dc28 3a0b89ec 0012dc88 00000000 0219401c user32!DispatchMessageW+0xf
[…]
0012ffc0 7d4e7d2a 00000000 00000000 7efde000 ApplicationB+0×260b
0012fff0 00000000 30001d28 00000000 00000000 kernel32!BaseProcessStart+0×28

We see that it is blocked waiting for synchronization objects after receiving a message to the same window handle 000b0296 sent from ApplicationA:

0:000:x86> dd 0012dc88 l1
00000000`0012dc88 000b0296

DispatchMessage has its first argument as a pointer to an MSG structure with the first field as a window handle (HWND). 

Looking at arguments to WaitForMultipleObjects we see that it is waiting for all three objects to be signalled simultaneously:

0012cf44 0cc7c897 00000003 0012cf6c 00000001kernel32!WaitForMultipleObjects+0×18

0:000:x86> dd 0012cf6c l3
00000000`0012cf6c  00001490 0000149c 00001494

0:000:x86> !handle 00001490
Handle 0000000000001490
  Type          Mutant

0:000:x86> !handle 0000149c
Handle 000000000000149c
  Type          Event

0:000:x86> !handle 00001494
Handle 0000000000001494
  Type          Mutant

Because the waiting call was originated from 3rdPartyDLL module we can recommend to contact its vendor after determining the origin from the output of lmv command.

- Dmitry Vostokov @ DumpAnalysis.org -

WinDbg poster and cards book is out!

Friday, November 28th, 2008

Due to some technical difficulties the release of WinDbg: A Reference Poster and Learning Cards has been delayed by 2 weeks. Now I got a proof copy and approved the book distribution on Amazon, B&N and other bookshops worldwide. Hope you will enjoy it and find it useful.

The similar book for GDB will be announced soon.

- Dmitry Vostokov @ DumpAnalysis.org -

Debugged! Magazine

Tuesday, November 25th, 2008

As one of the new initiatives for the Year of Debugging  DumpAnalysis Portal will publish bimonthly full color 16 page publication called:

Debugged! MZ/PE: MagaZine for/from Practicing Engineers
The only serial publication dedicated entirely to Windows® debugging

The first issue is planned for March, 2009 and will have ISBN-13: 978-1-906717-38-4. If it goes well I’m planning to have ISSN number assigned to it too. More details will be announced soon.

- Dmitry Vostokov @ DumpAnalysis.org

DLL List Landscape

Sunday, November 23rd, 2008

DLL is also a recursive acronym for DLL List Landscape. OpenTask is going to publish soon the new full color book:

Title: DLL List Landscape: The Art from Computer Memory Space
ISBN-13: 978-1-906717-36-0

More details will be announced tomorrow.  

- Dmitry Vostokov @ DumpAnalysis.org -

Heap Corruption

Friday, November 21st, 2008

Below is the list of patterns related to process heap corruption:

and two case studies involving heap corruption:

- Dmitry Vostokov @ DumpAnalysis.org -

Stack trace collection, hidden exception and NULL code pointer: pattern cooperation

Friday, November 21st, 2008

This is a common pattern cooperation in user dumps coming from x64 environments. Its characteristic feature is stack trace collection that appears to be damaged or corrupt with lots of zeroed call sites and sometimes having UNICODE values in saved RSP:

0:001> ~*kL

   0  Id: 668.684 Suspend: 1 Teb: 000007ff`fffde000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`000afcb8 000007ff`7f1ee614 USER32!NtUserWaitMessage+0xa
00000000`000afcc0 000007ff`7f1adf7a SHELL32!CDesktopBrowser::_MessageLoop+0x287
00000000`000afd50 00000001`0001a46c SHELL32!SHDesktopMessageLoop+0x5c
00000000`000afd80 00000001`000231ba Explorer!ExplorerWinMain+0x6f1
00000000`000afec0 00000000`77d5964c Explorer!ModuleEntry+0x1c6
00000000`000aff80 00000000`00000000 kernel32!BaseProcessStart+0x29

   1  Id: 668.20c Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0×0

   2  Id: 668.298 Suspend: 1 Teb: 000007ff`fffd9000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0×0

   3  Id: 668.f34 Suspend: 1 Teb: 000007ff`fffd7000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0×0

   4  Id: 668.824 Suspend: 1 Teb: 000007ff`fffd5000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0×0

   5  Id: 668.820 Suspend: 1 Teb: 000007ff`fffae000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0×0

   6  Id: 668.914 Suspend: 1 Teb: 000007ff`fffac000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0342fb98 000007ff`7fd6ff61 ntdll!ZwReplyWaitReceivePortEx+0xa
00000000`0342fba0 000007ff`7fd45369 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x2a5
00000000`0342feb0 000007ff`7fd65996 RPCRT4!RecvLotsaCallsWrapper+0x9
00000000`0342fee0 000007ff`7fd65d51 RPCRT4!BaseCachedThreadRoutine+0xde
00000000`0342ff50 00000000`77d6b6da RPCRT4!ThreadStartRoutine+0x21
00000000`0342ff80 00000000`00000000 kernel32!BaseThreadStart+0x3a

   7  Id: 668.8cc Suspend: 1 Teb: 000007ff`fffaa000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0x7

   8  Id: 668.1078 Suspend: 1 Teb: 000007ff`fffa8000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0×0

#  9  Id: 668.1118 Suspend: 1 Teb: 000007ff`fffa4000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0×0

  10  Id: 668.574 Suspend: 1 Teb: 000007ff`fffa2000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000020 00000000`00000000 0×0

  11  Id: 668.a54 Suspend: 1 Teb: 000007ff`fffa0000 Unfrozen
Child-SP          RetAddr           Call Site
90000000`0000679f 00000000`00000000 0x72505c74`6e656d65

  12  Id: 668.163c Suspend: 1 Teb: 000007ff`fffa6000 Unfrozen
Child-SP          RetAddr           Call Site
0000001e`00057000 00000000`00000000 0x575c3a43`00000075

  13  Id: 668.5b0 Suspend: 1 Teb: 000007ff`fff9e000 Unfrozen
Child-SP          RetAddr           Call Site
49575c3a`43000000 00000000`00000000 0x4e49575c`3a430000

  14  Id: 668.4c0 Suspend: 1 Teb: 000007ff`fff9c000 Unfrozen
Child-SP          RetAddr           Call Site
3a430000`00200004 00000000`00000000 0x43000000`1f0001c0

  15  Id: 668.774 Suspend: 1 Teb: 000007ff`fff9a000 Unfrozen
Child-SP          RetAddr           Call Site
00410044`00500050 00000000`00000000 0×6e006f`00690074

  16  Id: 668.17c0 Suspend: 1 Teb: 000007ff`fff98000 Unfrozen
Child-SP          RetAddr           Call Site
005c0029`00360038 00000000`00000000 0×500048`005c0029

However we notice ‘#’ in front of thread 9:

#  9  Id: 668.1118 Suspend: 1 Teb: 000007ff`fffa4000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`00000000 00000000`00000000 0×0

This marks the thread that caught the exception. We can either switch to it by ~9s or using ~#s command:

0:010> ~#s
00000000`00000000 ??              ???

Now we look at thread raw stack data to search for any hidden exceptions and we find one indeed:

0:009> !teb
TEB at 000007fffffa4000
    ExceptionList:        0000000000000000
    StackBase:            0000000003000000
    StackLimit:           0000000002ff2000

    SubSystemTib:         0000000000000000
    FiberData:            0000000000001e00
    ArbitraryUserPointer: 0000000000000000
    Self:                 000007fffffa4000
    EnvironmentPointer:   0000000000000000
    ClientId:             0000000000000668 . 0000000000001118
    RpcHandle:            0000000000000000
    Tls Storage:          0000000000000000
    PEB Address:          000007fffffdb000
    LastErrorValue:       0
    LastStatusValue:      0
    Count Owned Locks:    0
    HardErrorMode:        0

0:010> dqs 0000000003232000  0000000003240000
00000000`03232000  00000000`00000000
00000000`03232008  00000000`00000000
00000000`03232010  00000000`00000000
[…]
00000000`02ffc8c8  00000000`77ef3202 ntdll!KiUserExceptionDispatcher+0×52
00000000`02ffc8d0  fffffa80`07186100
00000000`02ffc8d8  00000000`02ffc8d0
00000000`02ffc8e0  00000000`00000000
[…]

0:009> .cxr 00000000`02ffc8d0
rax=0000000000000000 rbx=0000000000000000 rcx=00000000671b4610
rdx=ffffffff9be48728 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000002ffce68 rbp=00000000671b4610
 r8=0000000002ffccc0  r9=0000000000000000 r10=000068aa62010001
r11=00000000671b4610 r12=0000000000000000 r13=00000000000006a5
r14=00000000671b45d0 r15=7ffffffffffffffd
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
00000000`00000000 ??              ???

0:009> kL
  *** Stack trace for last set context - .thread/.cxr resets it
Child-SP          RetAddr           Call Site
00000000`02ffce68 00000000`67199841 0×0
00000000`02ffce70 00000000`67193188 DllA!DllUnregisterServer+0×8401

00000000`02ffd230 00000000`67194f93 DllA!DllUnregisterServer+0×1d48
00000000`02ffd350 000007ff`7ca9cae4 DllA!DllUnregisterServer+0×3b53
00000000`02ffd3f0 000007ff`7ca9d867 NETSHELL!GetPrimaryIPv6AddressForAdapter+0×64
00000000`02ffd720 000007ff`7ca1eb4f NETSHELL!CConnectionFolder::GetDetailsOf+0×62a
00000000`02ffdd20 000007ff`7f27f57f NETSHELL!CConnectionFolder::GetDetailsEx+0×21d
00000000`02ffe870 000007ff`7f27eea6 SHELL32!CNameSpaceItemUIProperty::GetPropertyDisplayValue+0xaf
00000000`02ffe910 000007ff`7f1e7bd4 SHELL32!CDetailsSectionInfoTask::RunInitRT+0×213
00000000`02fffc60 000007ff`7cec9c26 SHELL32!CRunnableTask::Run+0×52
00000000`02fffc90 000007ff`7ef773d8 BROWSEUI!CShellTaskScheduler_ThreadProc+0×1be
00000000`02fffd60 00000000`77eea78a SHLWAPI!ExecuteWorkItem+0×28
00000000`02fffd90 00000000`77eea99f ntdll!RtlpWorkerCallout+0×183
00000000`02fffec0 00000000`77eeac75 ntdll!RtlpExecuteWorkerRequest+0×63
00000000`02ffff00 00000000`77d6b6da ntdll!RtlpWorkerThread+0×71
00000000`02ffff80 00000000`00000000 kernel32!BaseThreadStart+0×3a

Checking disassembly we see that DllA module code dereferenced a NULL code pointer:

0:009> ub DllA!DllUnregisterServer+0×8401
DllA!DllUnregisterServer+0×83e4:
00000000`67199824 0100            add     dword ptr [rax],eax
00000000`67199826 00488b          add     byte ptr [rax-75h],cl
00000000`67199829 cdff            int     0FFh
00000000`6719982b 1568ad0100      adc     eax,1AD68h
00000000`67199830 85c0            test    eax,eax
00000000`67199832 0f85d8000000    jne     DllA!DllUnregisterServer+0×84d0 (00000000`67199910)
00000000`67199838 488bcd          mov     rcx,rbp
00000000`6719983b ff1547ad0100    call    qword ptr [DllA!DllUnregisterServer+0×23148 (00000000`671b4588)]

0:009> dq 00000000`671b4588
00000000`671b4588  00000000`00000000 00000000`00000000
00000000`671b4598  000007ff`77317e40 00000000`00000000
00000000`671b45a8  00000000`00000000 00000000`00000000
00000000`671b45b8  00000000`00000000 00000000`00000000
00000000`671b45c8  00000000`00000000 00000000`00000000
00000000`671b45d8  000007ff`77310000 00000000`01b81240
00000000`671b45e8  00000000`00000001 00000000`020c09c0
00000000`671b45f8  00000000`00000001 00000001`00000001

- Dmitry Vostokov @ DumpAnalysis.org -

Truncated dump, spiking thread, not my version and hooked functions: pattern cooperation

Friday, November 14th, 2008

Recently I got another manual complete memory dump from a hang workstation where default analysis pointed to a possibility for a Spiking Thread pattern:

0: kd> !analyze -v
[...]
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
[...]
GetContextState failed, 0x80070026
Unable to read selector for PCR for processor 1
GetContextState failed, 0x80070026
Unable to read selector for PCR for processor 1

PROCESS_NAME:  AppA.exe

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from 808b73a4 to 808b72cb

STACK_TEXT: 
f46f5b44 808b73a4 e1a36008 00000004 00000018 nt!HvpFindFreeCellInThisViewWindow+0xd7
f46f5b6c 808bd07b e1a36008 00000002 00000018 nt!HvpFindFreeCell+0x98
f46f5b98 808bd588 e1a36008 e1a362fc 00000000 nt!HvpDoAllocateCell+0x69
f46f5bbc 808d0b22 e1a36008 009214a0 c94e24a4 nt!HvReallocateCell+0x9a
f46f5bdc 808c1028 e1a36008 051fa3e8 00000003 nt!CmpAddValueToList+0x46
f46f5c28 808c406a e1a36008 cddb7ccc f46f5d0c nt!CmpSetValueKeyNew+0xfa
f46f5cb4 808b7e2f e45872e0 f46f5d0c 00000004 nt!CmSetValueKey+0x426
f46f5d44 8088978c 00000438 0012fad0 00000000 nt!NtSetValueKey+0x241
f46f5d44 7c8285ec 00000438 0012fad0 00000000 nt!KiFastCallEntry+0xfc
0012fa58 7c827b7b 77f77703 00000438 0012fad0 ntdll!KiFastSystemCallRet
0012fa5c 77f77703 00000438 0012fad0 00000000 ntdll!ZwSetValueKey+0xc
0012faa0 77f5ec90 00000438 0012fad0 00000004 ADVAPI32!LocalBaseRegSetValue+0x12c
0012fb04 60072e40 00000438 6290c0c4 00000000 ADVAPI32!RegSetValueExA+0x160
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fbf4 628e2d57 60062e70 60062b40 80000001 DLLA!GetObjectId+0×9600
[…]

FOLLOWUP_IP:
nt!HvpFindFreeCellInThisViewWindow+d7
808b72cb 034508          add     eax,dword ptr [ebp+8]

Looking at this thread information we see it RUNNING (this is also evident from its call stack):

0: kd> !thread
THREAD 8a0b2890  Cid 0814.10e8  Teb: 7ffdf000 Win32Thread: bc217c68 RUNNING on processor 0
Not impersonating
DeviceMap                 e440acc0
Owning Process            8a0d85f8       Image:         AppA.exe
Wait Start TickCount      153974         Ticks: 0
Context Switch Count      16905                 LargeStack
UserTime                  00:00:03.109
KernelTime                00:00:17.500
[…]

We also see that the thread accumulated 17 seconds as time spent in kernel. Switching to AppA process context and looking at its Image version we see that it is 5.80.x:

0: kd> lmv m AppA
start    end        module name
00400000 0049c000   AppA   (deferred)            
    Image path: C:\PROGRA~1\AppA\AppA.exe
    Image name: AppA.exe
    Timestamp:        Thu Jun 05 14:51:52 2008 (4847EF78)
    CheckSum:         0009D068
    ImageSize:        0009C000
    File version:     5.80.5.1764
    Product version:  5.80.0.0

However from Google search we can find that there is newer version available (variant of Not My Version pattern) and even some indication on various forums that the older ones had problems with CPU resource utilization. We may stop here but I usually scan all threads for any suspicious signs and we can see another running thread on the second CPU:

THREAD 8a2ed5d0  Cid 11b4.1100  Teb: 7ffdf000 Win32Thread: bc342b80 RUNNING on processor 1
Not impersonating
DeviceMap                 e44fc100
Owning Process            8a1efcb0       Image:         calc.exe
Wait Start TickCount      153973         Ticks: 1 (0:00:00:00.015)
Context Switch Count      50736                 LargeStack
UserTime                  00:01:04.515
KernelTime                00:00:00.015
Win32 Start Address calc (0×0101e23a)
Start Address kernel32!BaseProcessStartThunk (0×77e617f8)
Stack Init f4cd6000 Current f4cd5d00 Base f4cd6000 Limit f4cd1000 Call 0
Priority 6 BasePriority 6 PriorityDecrement 0
Unable to get context for thread running on processor 1, Win32 error 0n38

We also see that this thread spent more than a minute in user mode. Unfortunately we cannot see its thread stack because the dump shows signs of Truncated Dump pattern:

Loading Dump File [MEMORY.DMP]
Kernel Complete Dump File: Full address space is available

************************************************************
WARNING: Dump file has been truncated.  Data may be missing.
************************************************************

[…]

0: kd> ~1
GetContextState failed, 0×80070026
Unable to read selector for PCR for processor 1
WARNING: Unable to reset page directories
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
Unable to get program counter
GetContextState failed, 0×80070026
Unable to get current machine context, Win32 error 0n38
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026

The dump file size 4,177,920 Kb is less than amount of physical memory 4,192,948 Kb:

1: kd> !vm
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
Unable to get program counter

*** Virtual Memory Usage ***
 Physical Memory:     1048237 (   4192948 Kb)
 Page File: \??\R:\pagefile.sys
   Current:   4177920 Kb  Free Space:   4154440 Kb
   Minimum:   4177920 Kb  Maximum:      4194304 Kb

We can stop here and still recommend to upgrade AppA product seen from the thread running on the first processor but the fact that the second thread belongs to innocent calc.exe demands some attention. Was it calculating incessantly some financial figures following button clicks from a financial genius? Taking advantage of a complete memory dump and the fact that this process spent most of the time in user space we can check for Hooked Functions pattern:

1: kd> .process /r /p 8a1efcb0
Implicit process is now 8a1efcb0
Loading User Symbols
..........................
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026

1: kd> !chkimg -lo 50 -d !user32 -v
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
GetContextState failed, 0x80070026
Unable to get program counter
Searching for module with expression: !user32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\USER32.dll\45D70AC791000\USER32.dll
No range specified

Scanning section:    .text
Size: 392891
Range to scan: 77381000-773e0ebb
    7738c341-7738c345  5 bytes - USER32!CreateWindowExA
 [ 8b ff 55 8b ec:e9 ba 3c 00 c0 ]
[…]
Total bytes compared: 73728(18%)
Number of errors: 75
75 errors : !user32 (7738c341-773a154d)
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026

1: kd> u 7738c341
USER32!CreateWindowExA:
7738c341 e9ba3c00c0      jmp     37390000
7738c346 6801000040      push    40000001h
7738c34b ff7534          push    dword ptr [ebp+34h]
7738c34e ff7530          push    dword ptr [ebp+30h]
7738c351 ff752c          push    dword ptr [ebp+2Ch]
7738c354 ff7528          push    dword ptr [ebp+28h]
7738c357 ff7524          push    dword ptr [ebp+24h]
7738c35a ff7520          push    dword ptr [ebp+20h]
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026

1: kd> u 37390000
37390000 e96b91562e      jmp     HookA!CreateWindowExA (658f9170)
37390005 8bff            mov     edi,edi
37390007 55              push    ebp
37390008 8bec            mov     ebp,esp
3739000a e937c3ff3f      jmp     USER32!CreateWindowExA+0×5 (7738c346)
3739000f 0000            add     byte ptr [eax],al
37390011 0000            add     byte ptr [eax],al
37390013 0000            add     byte ptr [eax],al
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026
GetContextState failed, 0×80070026

Indeed we see that HookA module is involved and we can recommend to test the stability of the system without the product that uses it or upgrading or disabling this component.

- Dmitry Vostokov @ DumpAnalysis.org -

WinDbg: A Reference Poster and Learning Cards

Monday, November 10th, 2008

Suddenly the course of my publishing activities bended a little to produce a DIY poster and learning cards to be published soon. Here are the product details:

Annotation:
WinDbg is a powerful debugger from Microsoft Debugging Tools for Windows. It has more than 350 commands that can be used in different debugging scenarios. The cover of this book is a poster featuring crash dump analysis checklist and common patterns seen in memory dumps and live debugging sessions. Inside the book you can find ready to cut learning cards with commands and their descriptions coloured according to their use for crash dump or live debugging sessions and user, kernel or complete memory dumps. Tossing cards can create unexpected connections between commands and help to learn them more quickly. Uncut pages can also serve as birds eye view to WinDbg debugging capabilities. More than 350 WinDbg commands including meta-commands and extensions are included.

  • Title: WinDbg: A Reference Poster and Learning Cards
  • Authors: Dmitry Vostokov
  • Publisher: Opentask (20 November 2008)
  • Language: English
  • Product Dimensions: 28.0 x 21.6
  • ISBN-13: 978-1-906717-29-2
  • Paperback: 20 pages

Book Excerpt

Front cover:

Back cover:

After you take inside pages out you are left with a cover that you can use as a crash dump analysis checklist and patterns poster:

I also plan to update this book on a yearly basis. 

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 42d)

Friday, November 7th, 2008

Another example of Wait Chain pattern for objects with ownership semantics is seen in kernel and complete memory dumps where threads are waiting for thread objects. A thread object is a synchronization object whose owner is a thread so they can be easily identified. For example, the main application thread is waiting for an object:

1: kd> !thread 8818e660 16
THREAD 8818e660  Cid 1890.1c50  Teb: 7ffdf000 Win32Thread: b8411008 WAIT: (Unknown) UserMode Non-Alertable
    87d569d8  Thread
    8818e6d8  NotificationTimer
Not impersonating
DeviceMap                 e10008d8
Owning Process            87db5d88       Image:         App.exe
Wait Start TickCount      299006         Ticks: 255 (0:00:00:03.984)
Context Switch Count      1208                 LargeStack
UserTime                  00:00:00.203
KernelTime                00:00:00.203
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b42c3000 Current b42c2c60 Base b42c3000 Limit b42be000 Call 0
Priority 15 BasePriority 15 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b42c2c78 80833e8d 8818e660 8818e708 00000003 nt!KiSwapContext+0×26
b42c2ca4 80829b74 00000000 b42c2d14 00000000 nt!KiSwapThread+0×2e5
b42c2cec 8093b034 87d569d8 00000006 00804c01 nt!KeWaitForSingleObject+0×346
b42c2d50 8088ac4c 000001ec 00000000 b42c2d14 nt!NtWaitForSingleObject+0×9a
b42c2d50 7c8285ec 000001ec 00000000 b42c2d14 nt!KiFastCallEntry+0xfc
0006fde4 7c827d0b 77e61d1e 000001ec 00000000 ntdll!KiFastSystemCallRet
0006fde8 77e61d1e 000001ec 00000000 0006fe2c ntdll!NtWaitForSingleObject+0xc
0006fe58 77e61c8d 000001ec 00001388 00000000 kernel32!WaitForSingleObjectEx+0xac
0006fe6c 01039308 000001ec 00001388 00000000 kernel32!WaitForSingleObject+0×12
0006fe94 010204ac 0007cc00 00000001 00000002 App!WaitForNotifyList+0xf2
[…]

However that object is a thread too:

THREAD 87d569d8  Cid 1890.1ec0  Teb: 7ffd9000 Win32Thread: b869ba48 WAIT: (Unknown) UserMode Non-Alertable
    8a1f8870  Thread

Therefore, we see that thread 8818e660 is waiting for another thread 87d569d8 which belongs to the same process with PID 1890 and thread 87d569d8 itself is waiting for thread 8a1f8870 which has the following stack trace:

1: kd> !thread 8a1f8870 16
THREAD 8a1f8870  Cid 1890.07d8  Teb: 7ff95000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8a0ce4c0  NotificationEvent
    886f1870  NotificationEvent

Not impersonating
DeviceMap                 e10008d8
Owning Process            87db5d88       Image:         App.exe
Wait Start TickCount      292599         Ticks: 6662 (0:00:01:44.093)
Context Switch Count      17            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address Dll!StartMonitoring (0×758217b8)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b6d4f000 Current b6d4e900 Base b6d4f000 Limit b6d4c000 Call 0
Priority 14 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b6d4e918 80833e8d 8a1f8870 00000002 00140000 nt!KiSwapContext+0×26
b6d4e944 808295ab 8a1f8870 00000002 00000000 nt!KiSwapThread+0×2e5
b6d4e978 8093b290 00000002 b6d4eaac 00000001 nt!KeWaitForMultipleObjects+0×3d7
b6d4ebf4 8093b3f2 00000002 b6d4ec1c 00000001 nt!ObpWaitForMultipleObjects+0×202
b6d4ed48 8088ac4c 00000002 026bfc08 00000001 nt!NtWaitForMultipleObjects+0xc8
b6d4ed48 7c8285ec 00000002 026bfc08 00000001 nt!KiFastCallEntry+0xfc
026bfbb8 7c827cfb 77e6202c 00000002 026bfc08 ntdll!KiFastSystemCallRet
026bfbbc 77e6202c 00000002 026bfc08 00000001 ntdll!NtWaitForMultipleObjects+0xc
026bfc64 77e62fbe 00000002 026bfca4 00000000 kernel32!WaitForMultipleObjectsEx+0×11a
026bfc80 6554a01f 00000002 026bfca4 00000000 kernel32!WaitForMultipleObjects+0×18
026bfcfc 758237a3 cd050002 ffffffff 026bfd4c Dll!GetStatusChange+0×7bf
026bffb8 77e64829 75833120 00000000 00000000 Dll!StartMonitoring+0×14b
026bffec 00000000 758217b8 75833120 00000000 kernel32!BaseThreadStart+0×34

Thread 8a1f8870 is waiting for two notification events disjointly which is confirmed by dumping WaitForMultipleObjects arguments. Neither of them is in signaled state and one is a named event “MyEventObject”:

1: kd> dd 026bfc08 l2
026bfc08  0000008c 00000084

1: kd> !handle 0000008c
processor number 1, process 87db5d88
PROCESS 87db5d88  SessionId: 4  Cid: 1890    Peb: 7ffdc000  ParentCid: 01d0
    DirBase: cfe438e0  ObjectTable: e178c228  HandleCount: 439.
    Image: App.exe

Handle table at e50d2000 with 439 Entries in use
008c: Object: 8a0ce4c0  GrantedAccess: 001f0003 Entry: e50d2118
Object: 8a0ce4c0  Type: (8b26ec00) Event
    ObjectHeader: 8a0ce4a8 (old version)
        HandleCount: 1  PointerCount: 3

1: kd> !handle 00000084
processor number 1, process 87db5d88
PROCESS 87db5d88  SessionId: 4  Cid: 1890    Peb: 7ffdc000  ParentCid: 01d0
    DirBase: cfe438e0  ObjectTable: e178c228  HandleCount: 439.
    Image: App.exe

Handle table at e50d2000 with 439 Entries in use
0084: Object: 886f1870  GrantedAccess: 001f0003 (Inherit) Entry: e50d2108
Object: 886f1870  Type: (8b26ec00) Event
    ObjectHeader: 886f1858 (old version)
        HandleCount: 1  PointerCount: 4
        Directory Object: e43ee320  Name: MyEventObject

1: kd> dt _DISPATCHER_HEADER 8a0ce4c0
cutildll!_DISPATCHER_HEADER
   +0×000 Type             : 0 ”
   +0×001 Absolute         : 0 ”
   +0×002 Size             : 0×4 ”
   +0×003 Inserted         : 0 ”
   +0×003 DebugActive      : 0 ”
   +0×000 Lock             : 262144
   +0×004 SignalState      : 0
   +0×008 WaitListHead     : _LIST_ENTRY [ 0×88519d18 - 0×8a1f8918 ]

1: kd> dt _DISPATCHER_HEADER 886f1870
cutildll!_DISPATCHER_HEADER
   +0×000 Type             : 0 ”
   +0×001 Absolute         : 0 ”
   +0×002 Size             : 0×4 ”
   +0×003 Inserted         : 0 ”
   +0×003 DebugActive      : 0 ”
   +0×000 Lock             : 262144
   +0×004 SignalState      : 0
   +0×008 WaitListHead     : _LIST_ENTRY [ 0×88519d30 - 0×8a1f8930 ]

Here is a diagram showing this wait chain:

- Dmitry Vostokov @ DumpAnalysis.org -

Lateral damage, stack overflow and execution residue: pattern cooperation

Wednesday, November 5th, 2008

As I mentioned in comments to Lateral Damage pattern it lies in between the normal healthy dump files and corrupt dumps. For example, the following 8Gb complete memory dump that fits perfectly into 16Gb page file had the problem of missing processor control region making it impossible to get meaningful information from certain WinDbg commands:

0: kd> !analyze -v

[...]

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: f7727fe0
Arg3: 00000000
Arg4: 00000000

Debugging Details:
------------------

Unable to read selector for PCR for processor 1
Unable to read selector for PCR for processor 3
Unable to read selector for PCR for processor 1
Unable to read selector for PCR for processor 3

[...]

STACK_TEXT: 
WARNING: Stack unwind information not available. Following frames may be wrong.
8089a600 8088ddf2 00000000 0000000e 00000000 processr+0x2886
8089a604 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xa

[...]

0: kd> ~1
Unable to read selector for PCR for processor 1
WARNING: Unable to reset page directories

1: kd> !pcr
Unable to read selector for PCR for processor 1
Cannot get PRCB address 

1: kd> kv
ChildEBP RetAddr  Args to Child             
WARNING: Frame IP not in any known module. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 0×0

The bugcheck argument 1 shows that we have a double fault that most often results from kernel stack overflow. If we go back to processor 0 to inspect its TSS we don’t get meaningful results too (we expect the value of Backlink to be 0×28):

0: kd> !pcr
KPCR for Processor 0 at ffdff000:
    Major 1 Minor 1
 NtTib.ExceptionList: ffffffff
     NtTib.StackBase: 00000000
    NtTib.StackLimit: 00000000
  NtTib.SubSystemTib: 80042000
       NtTib.Version: 2a1b0b08
   NtTib.UserPointer: 00000001
       NtTib.SelfTib: 00000000

             SelfPcr: ffdff000
                Prcb: ffdff120
                Irql: 0000001f
                 IRR: 00000000
                 IDR: ffffffff
       InterruptMode: 00000000
                 IDT: 8003f400
                 GDT: 8003f000
                 TSS: 80042000

       CurrentThread: 8089d8c0
          NextThread: 00000000
          IdleThread: 8089d8c0

           DpcQueue:

0: kd> dt _KTSS 80042000
nt!_KTSS
   +0×000 Backlink         : 0xc45
   +0×002 Reserved0        : 0×4d8a
   +0×004 Esp0             : 0×8089a6a0
   +0×008 Ss0              : 0×10
   +0×00a Reserved1        : 0xb70f
   +0×00c NotUsed1         : [4] 0×5031ff00
   +0×01c CR3              : 0×8b55ff8b
   +0×020 Eip              : 0xc75ffec
   +0×024 EFlags           : 0xe80875ff
   +0×028 Eax              : 0xfffffbdd
   +0×02c Ecx              : 0×1b75c084
   +0×030 Edx              : 0×8b184d8b
   +0×034 Ebx              : 0×7d8b57d1
   +0×038 Esp              : 0×2e9c110
   +0×03c Ebp              : 0xf3ffc883
   +0×040 Esi              : 0×83ca8bab
   +0×044 Edi              : 0xaaf303e1
   +0×048 Es               : 0xeb5f
   +0×04a Reserved2        : 0×6819
   +0×04c Cs               : 0×24fc
   +0×04e Reserved3        : 0×44
   +0×050 Ss               : 0×75ff
   +0×052 Reserved4        : 0xff18
   +0×054 Ds               : 0×1475
   +0×056 Reserved5        : 0×75ff
   +0×058 Fs               : 0xff10
   +0×05a Reserved6        : 0xc75
   +0×05c Gs               : 0×75ff
   +0×05e Reserved7        : 0xe808
   +0×060 LDT              : 0
   +0×062 Reserved8        : 0xffff
   +0×064 Flags            : 0
   +0×066 IoMapBase        : 0×20ac
   +0×068 IoMaps           : [1] _KiIoAccessMap
   +0×208c IntDirectionMap  : [32]  “???”

However if we try to list all thread stacks we see one thread running on processor 1:

0: kd> !process 0 ff

[...]

THREAD 8a241db0  Cid 1218.4420  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 1
IRP List:
   8b200008: (0006,0244) Flags: 00000884  Mdl: 00000000
   89beedb8: (0006,0244) Flags: 00000884  Mdl: 00000000
Not impersonating
DeviceMap                 e1002060
Owning Process            8bc63d88       Image:         svchost.exe
Wait Start TickCount      10242012       Ticks: 0
Context Switch Count      1832            
UserTime                  00:00:00.000
KernelTime                00:00:00.046
Start Address termdd (0xf75cc218)
Stack Init 9c849000 Current 9c846938 Base 9c849000 Limit 9c846000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
Unable to read selector for PCR for processor 1

[...]

Now we can look at its raw stack to see execution residue and try to reconstruct partial stack traces:

0: kd> dds 9c846000  9c849000
9c846000  94040001
9c846004  00000014
9c846008  8d147848
9c84600c  8d0bfd08
9c846010  8d0bfd00
9c846014  00000001
9c846018  8d0bfd08
9c84601c  8d0bfd00
9c846020  8d0bfd00
9c846024  9c846034
9c846028  80a5c456 hal!KfLowerIrql+0×62
9c84602c  8d0bfdd8
9c846030  8d0bfd00
9c846034  9c846060
9c846038  80a5a56d hal!KeReleaseQueuedSpinLock+0×2d
9c84603c  00000011
9c846040  00000001
9c846044  8a241db0
9c846048  0000000e
9c84604c  00000000
9c846050  8d0bfdc0
9c846054  05000000
9c846058  00007400
9c84605c  00000001
9c846060  9c846084
9c846064  808b6138 driverA!MapData+0×4a
9c846068  8d0bfd08
9c84606c  00007400
9c846070  00000000
9c846074  00000018
9c846078  00000028
9c84607c  00001000
9c846080  00000018
9c846084  9c84609c
9c846088  f7b8f2e5 driverB!CheckData+0×7a
9c84608c  01b47538
9c846090  00000028
9c846094  0000001c
[…]

0: kd> k L=9c846024 9c846024 9c846024
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
9c846024 80a5c456 0×9c846024
9c846034 80a5a56d hal!KfLowerIrql+0×62
9c8460f0 8080d164 hal!KeReleaseQueuedSpinLock+0×2d
9c846060 808b6138 driverA!RemapData+0×3e
9c846084 f7b8f2e5 driverA!MapData+0×4a
9c84609c f7b8f340 driverB!CheckData+0×7a
9c8460e4 808b4000 driverB!CheckAttributes+0×36f
9c84610c f7b8e503 driverB!AddToRecord+0×2a
9c846174 f7b90df0 driverB!ReadRecord+0×1d0
f7b8e508 90909090 driverB!ReadAllRecords+0×7a
[…]

Using the current stack pointer we get another partial stack trace:

0: kd> k L=9c846034 9c846938 9c846938
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
9c846954 8081df65 0×9c846938
9c846968 808f5437 nt!IofCallDriver+0×45
9c84697c 808ef963 nt!IopSynchronousServiceTail+0×10b
9c8469a0 8088978c nt!NtQueryDirectoryFile+0×5d
9c8469a0 8082f1c1 nt!KiFastCallEntry+0xfc
9c846a44 f5296f4b nt!ZwQueryDirectoryFile+0×11
9c846a90 f5297451 DriverC+0×2f4b
9c846adc f52a54cb DriverC+0×3451
9c846af8 f52a44e6 DriverC+0×114cb
9c846b1c f52b2941 DriverC+0×104e6
9c846b4c f52b2626 DriverC+0×1e941
9c846b88 f52a34a7 DriverC+0×1e626
9c846be8 f52a487c DriverC+0xf4a7
[…]

Using different base pointers for k command we can reconstruct different partial stack traces. We can also analyze the longest ones for any stack usage using variant of knf command that shows stack frame size in bytes and find drivers that consume the most of kernel stack. Because we see execution residue on top of the kernel stack (Limit) we can suspect this thread caused the actual stack overflow which resulted in the double fault bugcheck.

- Dmitry Vostokov @ DumpAnalysis.org -

The mystery of top hit kifastsystemcallret

Monday, November 3rd, 2008

I was always suspicious why kifastsystemcallretis the most searched keyword and now I think there are automated web scanning engines doing data mining for stack traces to keep their databases for crash dump analysis and other stats up-to-date. This is how I would design my own internet bot to find such stack traces. Originally I thought that people are looking for it and wrote this article:

What is KiFastSystemCallRet?

I might be wrong here and this function is searched by humans indeed because it is on top of stack traces and novice users of WinDbg or other debugging tools check its purpose.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.56

Sunday, October 26th, 2008

This is an example of a complex bugtation:

Bugteriology is the study of bugteria. “It comprises the identification, classification and characterization of” bugterial “species.” Bugteria “are identified by their properties, for example their looks, what” memory dumps “they can” appear in “or not” appear in, “what” bugs “they require for growth, what” effects “they produce, etc. To study morphology, that is the” phenotype “of” bugteria, “a” debugger “is used.”

Virtual Museum of Bacteria, Bacteriology: the study of bacteria

- Dmitry Vostokov @ DumpAnalysis.org -

Introducing Bugteriology

Sunday, October 26th, 2008

I continued thinking about bugteria in memory dumps all the day yesterday and came to the conclusion that the study of crash dump analysis patterns needs its own name and the obvious choice was Bugteriology:

Bugteriology is the study of crash dump analysis patterns (bugteria). Its main subject is identification, classification and characterization of such patterns found in memory dumps (bugterial species).

I initially registered a domain for this purpose (later abandoned) pointing to crash dump analysis and debugging portal but I want to think through this idea and perhaps make it a subdomain of dumpanalysis.org with a page for easy online pattern classification and make it also an online supplement to forthcoming encyclopedia of crash dump analysis patterns.

- Dmitry Vostokov @ DumpAnalysis.org -

Did you find a bugterium in a dump?

Saturday, October 25th, 2008

Yesterday was one of those days when I was in a good mood thinking about bugs. Suddenly a thought stroke me about the similar sounding words bacterium and bugterium (perhaps because I’m currently reading a theoretical biology book, Essays on Life Itself). I admit that it might be sounding the same only for a non-native English ear though. So the new definition was born:

Bugterium (pl. bugteria) - an instance of a memory dump analysis pattern found in a crash (memory, core) dump file.

Why a bugterium and not a cdarium? The motivation (with a hindsight) lies in the complexity of debugging (and life forms). While a bug is a complex thing (and a beast) and it takes sometimes days or weeks to chase and fix (kill) the one, a bugterium (bacterium) is of relatively smaller complexity and can be easily identified and dealt with by component removal or upgrade (massively killed). From software support perspective remember this bugtation No.14:

Crash dump analysis ”is anticipated with” joy, “performed with” eagerness, “and bragged about forever.”

Although the perceived simplicity of crash dump analysis is deceptive (bugtation No.2):

“It requires a very unusual mind to undertake the analysis of the obvious” crash.

Alfred North Whitehead, Science and the Modern World

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 9e)

Saturday, October 25th, 2008

Now it is the turn for the yet another pattern of a deadlock variety involving mixed objects in kernel space. Previously we discussed deadlock patterns involving critical sections in user space, executive resources in kernel space, mixed objects in user space and LPC.

Let’s look at a complete manual dump file from a hanging system:

0: kd> !analyze -v

NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction.  The hardware supplier should
be called.
Arguments:
Arg1: 004f4454
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Here we have problems reading all executive resource locks:

3: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks....

Resource @ nt!CmpRegistryLock (0x808a48c0)    Shared 36 owning threads
    Contention Count = 48
     Threads: 86aecae0-01<*> 8b76db40-01<*> 8b76ddb0-01<*> 89773020-01<*>
              87222db0-01<*> 87024ba8-01<*> 89a324f0-01<*> 86b4e298-01<*>
              87925b40-01<*> 86b4db40-01<*> 8701f738-01<*> 86ffb198-01<*>
              86b492f0-01<*> 8701bad8-01<*> 86ae2db0-01<*> 86c85db0-01<*>
              86a9ddb0-01<*> 86a86db0-01<*> 86aa7db0-01<*> 86a9f5c0-01<*>
              86c5adb0-01<*> 8767ba38-01<*> 86afedb0-01<*> 89877960-01<*>
              8772cdb0-01<*> 87348628-01<*> 874d6748-01<*> 872365e0-01<*>
              87263970-01<*> 873bf020-01<*> 86c13db0-01<*> 893dcdb0-01<*>
              86afa020-01<*> 878e5020-01<*> 874959f8-01<*> 86b2dc70-01<*>
KD: Scanning for held locks…Error 1 in reading nt!_ERESOURCE.SystemResourcesList.Flink @ f76ee2a0

This is probably because the dump was truncated:

Loading Dump File [MEMORY.DMP]
Kernel Complete Dump File: Full address space is available

WARNING: Dump file has been truncated.  Data may be missing.

However looking at the resource 808a48c0 closely we see that it is owned by the thread 86aecae0 (Cid 2810.2910) which is blocked on a mutant owned by the thread 86dcf3a8:

3: kd> !locks -v 0x808a48c0

Resource @ nt!CmpRegistryLock (0x808a48c0)    Shared 36 owning threads
    Contention Count = 48
     Threads: 86aecae0-01<*>

     THREAD 86aecae0  Cid 2810.2910  Teb: 7ffdd000 Win32Thread: bc54ab88 WAIT: (Unknown) KernelMode Non-Alertable
         86dda264  Mutant - owning thread 86dcf3a8
     Not impersonating
     DeviceMap                 da534618
     Owning Process            86f30b70       Image:         ApplicationA.exe
     Wait Start TickCount      1074481        Ticks: 51601 (0:00:13:26.265)
     Context Switch Count      9860                 LargeStack
     UserTime                  00:00:01.125
     KernelTime                00:00:00.890
     Win32 Start Address 0×300019f0
     Start Address kernel32!BaseProcessStartThunk (0×7c8217f8)
     Stack Init b5342000 Current b5341150 Base b5342000 Limit b533d000 Call 0
     Priority 12 BasePriority 10 PriorityDecrement 0
     ChildEBP RetAddr 
     b5341168 80833465 nt!KiSwapContext+0×26
     b5341194 80829a62 nt!KiSwapThread+0×2e5
     b53411dc b91f4c08 nt!KeWaitForSingleObject+0×346
WARNING: Stack unwind information not available. Following frames may be wrong.
     b5341200 b91ee770 driverA+0xec08
     b5341658 b91e9ca7 driverA+0×8770
     b5341af0 8088978c driverA+0×3ca7

     b5341af0 8082f829 nt!KiFastCallEntry+0xfc
     b5341b7c 808ce716 nt!ZwSetInformationFile+0×11
     b5341bbc 808dd8d8 nt!CmpDoFileSetSize+0×5e
     b5341bd4 808bd798 nt!CmpFileSetSize+0×16
     b5341bf4 808be23f nt!HvpGrowLog1+0×52
     b5341c18 808bfc6b nt!HvMarkDirty+0×453
     b5341c40 808c3fd4 nt!HvMarkCellDirty+0×255
     b5341cb4 808b7e2f nt!CmSetValueKey+0×390
     b5341d44 8088978c nt!NtSetValueKey+0×241
     b5341d44 7c9485ec nt!KiFastCallEntry+0xfc
     0013f5fc 00000000 ntdll!KiFastSystemCallRet

8b76db40-01<*>

     THREAD 8b76db40  Cid 0004.00c8  Teb: 00000000 Win32Thread: 00000000 GATEWAIT
     Not impersonating
     DeviceMap                 d6600900
     Owning Process            8b7772a8       Image:         System
     Wait Start TickCount      1074667        Ticks: 51415 (0:00:13:23.359)
     Context Switch Count      65106            
     UserTime                  00:00:00.000
     KernelTime                00:00:00.781
     Start Address nt!ExpWorkerThread (0x80880352)
     Stack Init bae35000 Current bae34c68 Base bae35000 Limit bae32000 Call 0
     Priority 12 BasePriority 12 PriorityDecrement 0
     ChildEBP RetAddr 
     bae34c80 80833465 nt!KiSwapContext+0x26
     bae34cac 8082ffc0 nt!KiSwapThread+0x2e5
     bae34cd4 8087d6f6 nt!KeWaitForGate+0x152
     dbba6d78 00000000 nt!ExfAcquirePushLockExclusive+0x112

[...]

A reminder about Cid. It is the so called Client id composed from Process id and Thread id (Pid.Tid). Also a mutant is just another name for a mutex object which has an ownership semantics:

0: kd> dt _KMUTANT 86dda264
nt!_KMUTANT
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 MutantListEntry  : _LIST_ENTRY [ 0x86dcf3a8 - 0x86dcf3a8 ]
   +0×018 OwnerThread      : 86dcf3a8 _KTHREAD
   +0×01c Abandoned        : 0 ”
   +0×01d ApcDisable       : 0×1 ”

Now we look at that thread 86dcf3a8 and see that it belongs to ApplicationB (Cid 25a0.14b8):

3: kd> !thread 86dcf3a8
THREAD 86dcf3a8  Cid 25a0.14b8  Teb: 7ffa9000 Win32Thread: bc3e0d20 WAIT: (Unknown) UserMode Non-Alertable
    8708b888  Thread
    86dcf420  NotificationTimer
Not impersonating
DeviceMap                 da534618
Owning Process            87272d88       Image:         ApplicationB.exe
Wait Start TickCount      1126054        Ticks: 28 (0:00:00:00.437)
Context Switch Count      2291                 LargeStack
UserTime                  00:00:00.078
KernelTime                00:00:00.218
Win32 Start Address msvcrt!_endthreadex (0×77b9b4bc)
Start Address kernel32!BaseThreadStartThunk (0×7c8217ec)
Stack Init b550a000 Current b5509c60 Base b550a000 Limit b5507000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b5509c78 80833465 86dcf3a8 86dcf450 00000003 nt!KiSwapContext+0×26
b5509ca4 80829a62 00000000 b5509d14 00000000 nt!KiSwapThread+0×2e5
b5509cec 80938d0c 8708b888 00000006 00000001 nt!KeWaitForSingleObject+0×346
b5509d50 8088978c 00000960 00000000 b5509d14 nt!NtWaitForSingleObject+0×9a
b5509d50 7c9485ec 00000960 00000000 b5509d14 nt!KiFastCallEntry+0xfc
WARNING: Stack unwind information not available. Following frames may be wrong.
0454f3cc 00000000 00000000 00000000 00000000 ntdll!KiFastSystemCallRet

We see that it is waiting on 8708b888 object which is a thread itself and it is waiting on the same mutant 86dda264 owned by the thread 86dcf3a8 (Cid 25a0.14b8):

3: kd> !thread 8708b888
THREAD 8708b888  Cid 25a0.1cb0  Teb: 7ffa6000 Win32Thread: bc3ecb20 WAIT: (Unknown) KernelMode Non-Alertable
    86dda264  Mutant - owning thread 86dcf3a8
Not impersonating
DeviceMap                 da534618
Owning Process            87272d88       Image:         ApplicationB.exe
Wait Start TickCount      1070470        Ticks: 55612 (0:00:14:28.937)
Context Switch Count      11                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address dll!_beginthread (0×1b1122a9)
Start Address kernel32!BaseThreadStartThunk (0×7c8217ec)
Stack Init b4d12000 Current b4d117fc Base b4d12000 Limit b4d0f000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b4d11814 80833465 8708b888 8708b930 00000003 nt!KiSwapContext+0×26
b4d11840 80829a62 0000096c b4d118c4 b91e8f08 nt!KiSwapThread+0×2e5
b4d11888 b91f4c08 86dda264 00000006 00000000 nt!KeWaitForSingleObject+0×346
WARNING: Stack unwind information not available. Following frames may be wrong.
b4d118ac b91ee818 86dda260 b4d11d64 86dda000 DriverA+0xec08
b4d11d04 b91e8f58 000025a0 0000096c b4d11d64 DriverA+0×8818
b4d11d58 8088978c 0000096c 0567f974 7c9485ec DriverA+0×2f58

b4d11d58 7c9485ec 0000096c 0567f974 7c9485ec nt!KiFastCallEntry+0xfc
0567f974 30cba6ad 0000096c 00000000 00000003 ntdll!KiFastSystemCallRet

We can summarize our findings on the following wait chain diagram:

 

Looking from the component-object relationship perspective it is DriverA.sys that is waiting on the mutant 86dda264 although both blocked threads B and C belong to ApplicationB process.

- Dmitry Vostokov @ DumpAnalysis.org -

100 Patterns

Wednesday, October 22nd, 2008

Just realized that yesterday I wrote 100th crash dump analysis pattern. Today I’m going to write 101st. Just to remind that the fully classified color catalog of them is planned to be published:

Forthcoming CDAP Encyclopedia

More details will be announced closer to that date.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 77)

Tuesday, October 21st, 2008

This is a very simple pattern I planned to write about long time ago. It is called C++ Exception. It is similar to Managed Code Exception and can be manifested by the same RaiseException call on top of the stack (red). However it is called by Visual C runtime (I consider Microsoft C/C++ implementation here, msvcrt.dll, magenta). The typical example of it might be checking the validity of a C++ stream operator data format (blue):

STACK_TEXT: 
09d6f264 78007108 KERNEL32!RaiseException+0×56
09d6f2a4 677f2a88 msvcrt!_CxxThrowException+0×34
09d6f2bc 6759afff DLL!MyInputStream::operator>>+0×34

Also, some Visual C++ STL implementations check for out of bounds or invalid parameters and call unhandled exception filter directly, for example:

STACK_TEXT: 
0012d2e8 7c90e9ab ntdll!KiFastSystemCallRet
0012d2ec 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc
0012d388 7c80a075 kernel32!WaitForMultipleObjectsEx+0x12c
0012d3a4 6945763c kernel32!WaitForMultipleObjects+0x18
0012dd38 694582b1 faultrep!StartDWException+0x5df
0012edac 7c8633b1 faultrep!ReportFault+0x533
0012f44c 004409b3 kernel32!UnhandledExceptionFilter+0x587
0012f784 00440a1b Application!_invoke_watson+0xc4
0012f79c 00406f4f Application!_invalid_parameter_noinfo+0xc
0012f7a0 0040566b Application!std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator[]+0×12

The latter example also shows how an unhandled exception filter in an application itself calls a postmortem debugger specified by AeDebug registry key (see also Who calls the postmortem debugger? post for detailed explanations).

- Dmitry Vostokov @ DumpAnalysis.org -

MDAA Volume 2 is available on Amazon and B&N

Saturday, October 18th, 2008

Paperback edition of Memory Dump Analysis Anthology, Volume 2 is finally available on Amazon and Barnes & Noble. Search Inside is also available on Amazon. In addition, I updated the list of recommended books:

Listmania! Crash Dump Analysis and Debugging

Hardcover edition will be available on Amazon and B&N in 2-3 weeks.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis AntiPatterns (Part 11)

Thursday, October 16th, 2008

This one is called No Question and is summarized by the following dialog that I observed many times:

As an engineer proceeds with analysis of a problem a phone rings and an irritated customer (internal or external) asks “What is the answer?”, an engineer glancing over the problem description replies, “What is the question?”.

This immediately came to my mind when I found this famous dialog:

“As the poet Gertrude Stein lay on her deathbed, her partner, Alice B. Toklas, leaned over and whispered, ‘What is the answer, Gertrude?’. Replied Stein, ‘What’s the question?’”

- Dmitry Vostokov @ DumpAnalysis.org -