Archive for the ‘Crash Dump Patterns’ Category

Crash Dump Analysis Patterns (Part 68)

Friday, June 27th, 2008

Setting a thread affinity mask to a specific processor or core makes that thread running in a single processor environment from that thread point of view. It is always scheduled to run on that processor. This potentially creates a problem found on real single processor environments if the processor runs another higher priority thread (Thread Starvation pattern) or loops at dispatch level IRQL (Dispatch Level Spin pattern). Therefore I call this pattern Affine Thread.

Here is one example. Dual core laptop was hanging and kernel memory dump showed the following Wait Chain pattern:

Resource @ nt!PopPolicyLock (0x80563080)    Exclusively owned
    Contention Count = 32
    NumberOfExclusiveWaiters = 9
     Threads: 8b3b08b8-01<*>
     Threads Waiting On Exclusive Access:
              872935f0       8744cb30       87535da8       8755a6b0      
              8588dba8       8a446c10       85891c50       87250020      
              8a6e7da8

The thread 8b3b08b8 blocked other 9 threads and had the following stack trace:

0: kd> !thread 8b3b08b8 1f
THREAD 8b3b08b8  Cid 0004.002c  Teb: 00000000 Win32Thread: 00000000 READY
Not impersonating
DeviceMap                 e1009248
Owning Process            8b3b2830       Image:         System
Wait Start TickCount      44419          Ticks: 8744 (0:00:02:16.625)
Context Switch Count      4579            
UserTime                  00:00:00.000
KernelTime                00:00:01.109
Start Address nt!ExpWorkerThread (0x8053867e)
Stack Init bad00000 Current bacffcb0 Base bad00000 Limit bacfd000 Call 0
Priority 15 BasePriority 12 PriorityDecrement 3 DecrementCount 16
ChildEBP RetAddr 
bacffcc8 804fd2c9 nt!KiUnlockDispatcherDatabase+0x9e
bacffcdc 8052a16f nt!KeSetSystemAffinityThread+0×5b
bacffd04 805caf03 nt!PopCompositeBatteryUpdateThrottleLimit+0×2d
bacffd24 805ca767 nt!PopCompositeBatteryDeviceHandler+0×1c5
bacffd3c 80529d3b nt!PopPolicyWorkerMain+0×25
bacffd7c 8053876d nt!PopPolicyWorkerThread+0xbf
bacffdac 805cff64 nt!ExpWorkerThread+0xef
bacffddc 805460de nt!PspSystemThreadStartup+0×34
00000000 00000000 nt!KiThreadStartup+0×16

Note this function and its first parameter: 

0: kd> !thread 8b3b08b8
[...]
bacffcdc 8052a16f 00000002 8a5b8cd8 00000030 nt!KeSetSystemAffinityThread+0×5b
[…]

The first parameter is KAFFINITY mask and 0×2 is 0y10 (binary) which is the second core. This thread had been already set to run on that core only:

0: kd> dt _KTHREAD 8b3b08b8
nt!_KTHREAD
   +0x000 Header           : _DISPATCHER_HEADER
[...]
   +0×124 Affinity         : 2
[…]
 

Let’s look at our second core:

0: kd> ~1s

1: kd> kL 100
ChildEBP RetAddr 
a8f00618 acd21947 hal!KeAcquireInStackQueuedSpinLock+0x43
a8f00618 acd21947 tcpip!IndicateData+0x98
a8f00684 acd173e5 tcpip!IndicateData+0x98
a8f0070c acd14ef5 tcpip!TCPRcv+0xbb0
a8f0076c acd14b19 tcpip!DeliverToUser+0x18e
a8f007e8 acd14836 tcpip!DeliverToUserEx+0x95e
a8f008a0 acd13928 tcpip!IPRcvPacket+0x6cb
a8f008e0 acd13853 tcpip!ARPRcvIndicationNew+0x149
a8f0091c ba56be45 tcpip!ARPRcvPacket+0x68
a8f00970 b635801d NDIS!ethFilterDprIndicateReceivePacket+0x307
a8f00984 b63581b4 psched!PsFlushReceiveQueue+0x15
a8f009a8 b63585f9 psched!PsEnqueueReceivePacket+0xda
a8f009c0 ba56c8ed psched!ClReceiveComplete+0x13
a8f009d8 b7defdb5 NDIS!EthFilterDprIndicateReceiveComplete+0x7c
a8f00a08 b7df0f78 driverA+0x17db5
a8f00a64 ba55ec2c driverA+0x18f78
a8f00a88 b6b0962c NDIS!ndisMSendCompleteX+0x8d
a8f00a9c b6b0a36d driverB+0x62c
a8f00ab8 ba55e88f driverB+0x136d
a8f00ae0 b7de003c NDIS!NdisReturnPackets+0xe9
a8f00af0 ba55e88f driverA+0x803c
a8f00b18 b6358061 NDIS!NdisReturnPackets+0xe9
a8f00b30 ba55e88f psched!MpReturnPacket+0x3b
a8f00b58 acc877cc NDIS!NdisReturnPackets+0xe9
87749da0 00000000 afd!AfdReturnBuffer+0xe1

1: kd> r
eax=a8f005f8 ebx=a8f00624 ecx=8a9862ed edx=a8f00b94 esi=874e2ed0 edi=8a9862d0
eip=806e6a33 esp=a8f005ec ebp=a8f00618 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202
hal!KeAcquireInStackQueuedSpinLock+0x43:
806e6a33 74ee            je      hal!KeAcquireInStackQueuedSpinLock+0x33 (806e6a23) [br=0]

1: kd> !running

System Processors 3 (affinity mask)
  Idle Processors 1

     Prcb      Current   Next   
  1  bab38120  8a0c8ae8  8b3a7318  …………….

We see the thread 8a0c8ae8 had been spinning on the second core for more than 2 minutes:

1: kd> !thread 8a0c8ae8 1f
THREAD 8a0c8ae8  Cid 0660.0124  Teb: 7ffd7000 Win32Thread: e338c498 RUNNING on processor 1
IRP List:
    8a960008: (0006,01b4) Flags: 00000900  Mdl: 87535908
Not impersonating
DeviceMap                 e2f155b8
Owning Process            87373020       Image:         APPLICATION.EXE
Wait Start TickCount      43918          Ticks: 9245 (0:00:02:24.453)
Context Switch Count      690                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:02:24.453
[…]

Its kernel time looks consistent with the starved thread waiting time:

0: kd> !thread 8b3b08b8 1f
THREAD 8b3b08b8  Cid 0004.002c  Teb: 00000000 Win32Thread: 00000000 READY
Not impersonating
DeviceMap                 e1009248
Owning Process            8b3b2830       Image:         System
Wait Start TickCount      44419          Ticks: 8744 (0:00:02:16.625)
[…]

For comparison, the spinning thread has affinity mask 0y11 (0×3) which means that it can be scheduled to run on both cores:

0: kd> dt _KTHREAD 8a0c8ae8
nt!_KTHREAD
   +0x000 Header           : _DISPATCHER_HEADER
[...]
   +0×124 Affinity         : 3
[…]

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 67b)

Wednesday, June 25th, 2008

Nested exception analysis is much simpler in managed code than in unmanaged. Exception object references the inner exception if any and so on:

Exception.InnerException

WinDbg does a good job traversing all nested exceptions when executing !analyze -v command. In the following example of a Windows forms application crash ObjectDisposedException (shown in red) was re-thrown as Exception object with “Critical error” message (shown in magenta) which was re-thrown several times as Exception object with “Critical program error” message (shown in blue) that finally resulted in the process termination request in the top level exception handler:

MANAGED_STACK:
(TransitionMU)
001374B0 0B313757 System!System.Diagnostics.Process.Kill()+0x37
001374E4 0B3129C7 Component!Foo.HandleUnhandledException(System.Exception)+0x137
001374F4 07C0A7D3 Component!Foo+FooBarProcessMenuCommand(System.String)+0x33
(TransitionUM)
(TransitionMU)
[...]

EXCEPTION_OBJECT: !pe 3398614
Exception object: 03398614
Exception type: System.Exception
Message: Critical program error
InnerException: System.Exception, use !PrintException 03398560 to see more StackTrace (generated):

    SP       IP       Function
    001371A8 07C0CDD8 Foo.BarUserInteraction.Process(System.String)
    00137258 07C0CCA6 Foo.BarUserInteraction.ProcessUserInteraction(Sub, BarStepType)
    00137268 07C0A9BA Foo.BarMenu.Process(CMD)
    00137544 07C0A8D8 Foo.BarMenu.ProcessCMD(CMD)
    0013756C 07C0A7BE Foo+FooBar.ProcessBarMenuCommand(System.String)

StackTraceString: <none>
HResult: 80131500

EXCEPTION_OBJECT: !pe 3398560
Exception object: 03398560
Exception type: System.Exception
Message: Critical program error
InnerException: System.Exception, use !PrintException 033984ac to see more StackTrace (generated):

    SP       IP       Function
    00137154 07C0D4CA Foo.BarThreads+ProcessOpenQuery.Execute()
    00137218 07C0D3B3 Foo.BarMenu.ProcessQuery()
    00137220 07C0CCF3 Foo.BarUserInteraction.Process(System.String)

StackTraceString: <none>
HResult: 80131500

EXCEPTION_OBJECT: !pe 33984ac
Exception object: 033984ac
Exception type: System.Exception
Message: Critical program error
InnerException: System.Exception, use !PrintException 033983ec to see more StackTrace (generated):

    SP       IP       Function
    0013704C 0A6149DD Foo.Bar.OpenQueryThreaded(Foo.BarParameter)
    00137154 0A6140D0 Foo.BarThreads+ProcessParameter.Execute()
[…]

StackTraceString: <none>
HResult: 80131500

EXCEPTION_OBJECT: !pe 33983ec
Exception object: 033983ec
Exception type: System.Exception
Message: Critical program error
InnerException: System.Exception, use !PrintException 033982fc to see more StackTrace (generated):

    SP       IP       Function
    00137008 0ACA59F1 Foo.BarApplication.Refresh(Boolean, Boolean)
    001370C4 0A6144E0 Foo.Bar.OpenQueryThreaded(Foo.BarParameter)

StackTraceString: <none>
HResult: 80131500

EXCEPTION_OBJECT: !pe 33982fc
Exception object: 033982fc
Exception type: System.Exception
Message: Critical program error
InnerException: System.Exception, use !PrintException 03398260 to see more StackTrace (generated):

    SP       IP       Function
    00136F3C 0AE24983 Foo.BarDisplay.ShowVariableScreen(Foo.variables.BarVariables)
    00136FDC 0AE204F6 Foo.variables.BarVariables.ShowVariableScreen()
    00137070 0ACAFE1D Foo.BarApplication.ShowVariableScreen(Boolean)
    00137080 0ACA5977 Foo.BarApplication.Refresh(Boolean, Boolean)

StackTraceString: <none>
HResult: 80131500

EXCEPTION_OBJECT: !pe 3398260
Exception object: 03398260
Exception type: System.Exception
Message: Critical error
InnerException: System.ObjectDisposedException, use !PrintException 03397db8 to see more StackTrace (generated):

    SP       IP       Function
    00136FB4 0AE24905 Foo.BarDisplay.ShowVariableScreen(Foo.variables.BarVariables)

StackTraceString: <none>
HResult: 80131500

EXCEPTION_OBJECT: !pe 3397db8
Exception object: 03397db8
Exception type: System.ObjectDisposedException
Message: Cannot access a disposed object.
InnerException: <none>

StackTrace (generated):
    SP       IP       Function
    00136258 06D36158 System.Windows.Forms.Control.CreateHandle()
    001362B8 06D38F96 System.Windows.Forms.Control.get_Handle()
    001362C4 0B0C8C68 System.Windows.Forms.Control.PointToScreen(System.Drawing.Point)
    001362F0 0B0CECB4 System.Windows.Forms.Button.OnMouseUp(System.Windows.Forms.MouseEventArgs)
    00136314 0B0C8BB7 System.Windows.Forms.Control.WmMouseUp(System.Windows.Forms.Message ByRef, System.Windows.Forms.MouseButtons, Int32)
    00136384 06D385A0 System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)
    001363E8 0A69C73E System.Windows.Forms.ButtonBase.WndProc(System.Windows.Forms.Message ByRef)
    00136424 0A69C54D System.Windows.Forms.Button.WndProc(System.Windows.Forms.Message ByRef)
    0013642C 06D37FAD System.Windows.Forms.Control+ControlNativeWindow.OnMessage(System.Windows.Forms.Message ByRef)
    00136430 06D37F87 System.Windows.Forms.Control+ControlNativeWindow.WndProc(System.Windows.Forms.Message ByRef)
    00136440 06D37D9F System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)

StackTraceString: <none>
HResult: 80131622

[…]

EXCEPTION_MESSAGE:  Cannot access a disposed object.

STACK_TEXT: 
00137448 7c827c1b ntdll!KiFastSystemCallRet
0013744c 77e4201b ntdll!NtTerminateProcess+0xc
0013745c 05d78202 kernel32!TerminateProcess+0x20
[...]

- Dmitry Vostokov @ DumpAnalysis.org -

Forthcoming CDAP Encyclopedia

Wednesday, June 25th, 2008

I’m thinking big for a full-color book to celebrate the forthcoming anniversary of Memory Dump Analysis Anthology. Preliminary details:

  • Title: Encyclopedia of Crash Dump Analysis Patterns
  • Author: Dmitry Vostokov
  • Publisher: Opentask (15 April 2009)
  • Language: English
  • Product Dimensions: 21.6 x 14.0
  • ISBN-13: 978-1-906717-21-6
  • Paperback: 400 pages

- Dmitry Vostokov @ DumpAnalysis.org -

Forthcoming CDA pattern classification

Wednesday, June 25th, 2008

There are currently 86 patterns in 67 groups and more are coming. I’m working the first two weeks in July to classify them. The classification scheme(s) should appear around 15th of July.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 67)

Tuesday, June 24th, 2008

In one of my previous posts I wrote that in the case of a first-chance exception it is not possible to see it in a process crash dump because the entire exception processing was done in the kernel space (see the post How to distinguish between 1st and 2nd chances):

However the picture changes if we have Nested Exceptions pattern. In this case we should expect traces of inner exception processing like exception dispatcher code or exception handlers to be present on a raw stack dump:

Consider the following C++ code with two exception handlers:

 __try
 {
   __try
   {
     *(int *)NULL = 0;  // Exception 1
                        // Dump1 1st chance

   }
   __except (EXCEPTION_EXECUTE_HANDLER)
   {
     std::cout << “Inner” << std::endl;
     *(int *)NULL = 0;  // Exception 2
                        // Dump2 1st chance

   }
 }
 __except (EXCEPTION_EXECUTE_HANDLER)
 {
   std::cout << “Outer” << std::endl;
   *(int *)NULL = 0;    // Exception 3
                        // Dump3 1st chance
                        // Dump4 2nd chance

 }

If we run the actual program and we have set a default postmortem debugger we get a second-chance exception dump (Dump4). The program first outputs “Inner” and then “Outer” on a console and then crashes. When we look at the dump we see second-chance exception processing code where the exception record for NtRaiseException is the same and points to Exception 3 context (shown in red color): 

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(11dc.f94): Access violation - code c0000005 (first/second chance not available)
*** ERROR: Module load completed but symbols could not be loaded for NestedException.exe
NestedException+0x1a72:
00000001`40001a72 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

0:000> !teb
TEB at 000007fffffde000
    ExceptionList:        0000000000000000
    StackBase:            0000000000130000
    StackLimit:           000000000012d000
    SubSystemTib:         0000000000000000
    FiberData:            0000000000001e00
    ArbitraryUserPointer: 0000000000000000
    Self:                 000007fffffde000
    EnvironmentPointer:   0000000000000000
    ClientId:             00000000000011dc . 0000000000000f94
    RpcHandle:            0000000000000000
    Tls Storage:          000007fffffde058
    PEB Address:          000007fffffd5000
    LastErrorValue:       0
    LastStatusValue:      c000000d
    Count Owned Locks:    0
    HardErrorMode:        0

0:000> dqs 000000000012d000 0000000000130000
[...]
00000000`0012f918  00000000`00000006
00000000`0012f920  00000000`00000000
00000000`0012f928  00000000`775a208d ntdll!KiUserExceptionDispatch+0×53
00000000`0012f930  00000000`00000000
00000000`0012f938  00000000`0012f930 ; exception context
00000000`0012f940  01c8d5f0`00000000
00000000`0012f948  00000000`00000000
[…]

0:000> ub ntdll!KiUserExceptionDispatch+0x53
ntdll!KiUserExceptionDispatch+0x35:
00000000`775a206f xor     edx,edx
00000000`775a2071 call    ntdll!RtlRestoreContext (00000000`775a2255)
00000000`775a2076 jmp     ntdll!KiUserExceptionDispatch+0x53 (00000000`775a208d)
00000000`775a2078 mov     rcx,rsp
00000000`775a207b add     rcx,4D0h
00000000`775a2082 mov     rdx,rsp
00000000`775a2085 xor     r8b,r8b
00000000`775a2088 call    ntdll!NtRaiseException (00000000`775a1550)

0:000> .cxr 00000000`0012f930
rax=00000001400223d0 rbx=0000000000000000 rcx=0000000140022128
rdx=0000000000000001 rsi=0000000000000006 rdi=0000000140022120
rip=0000000140001a72 rsp=000000000012fed0 rbp=0000000000000000
 r8=000007fffffde000  r9=0000000000000001 r10=0000000000000000
r11=0000000000000246 r12=0000000000000000 r13=0000000000000002
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr ac po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010256
NestedException+0x1a72:
00000001`40001a72 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

However if we have first-chance exception Dump3 from some exception monitoring program we see that NtRaiseException parameter points to ”Inner” Exception 2 context (a different and earlier address, shown in magenta color):

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(11dc.f94): Access violation - code c0000005 (first/second chance not available)
*** ERROR: Module load completed but symbols could not be loaded for NestedException.exe
NestedException+0x1a72:
00000001`40001a72 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

0:000> dqs 000000000012d000 0000000000130000
[...]
00000000`0012f918  00000000`00000006
00000000`0012f920  00000000`00000000
00000000`0012f928  00000000`775a2068 ntdll!KiUserExceptionDispatch+0×2e
00000000`0012f930  00000000`00000000
00000000`0012f938  00000000`0012f930 ; exception context
00000000`0012f940  01c8d5f0`00000000
[…]

0:000>  .cxr 00000000`0012f930
rax=00000001400223d0 rbx=0000000000000000 rcx=0000000140022128
rdx=0000000000000001 rsi=0000000000000006 rdi=0000000140022120
rip=00000001400019fa rsp=000000000012fed0 rbp=0000000000000000
 r8=000007fffffde000  r9=0000000000000001 r10=0000000000000000
r11=0000000000000246 r12=0000000000000000 r13=0000000000000002
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr ac po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010256
NestedException+0x19fa:
00000001`400019fa c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

Similar can be said about Dump2 where NtRaiseException parameter points to Exception 1 context. But Dump1 doesn’t have any traces of exception processing as expected.

All 4 dump files can be downloaded from FTP to play with:

ftp://dumpanalysis.org/pub/CDAPatternNestedExceptions.zip 

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 66)

Friday, June 20th, 2008

Data Contents Locality is a comparative pattern that helps not only in identifying the class of the problem but increases our confidence and degree of belief in the specific hypothesis. Suppose we have a database of notes of previous problems. If we see the same or similar data accessed in the new memory dump we might suppose that the issue is similar. If Data Contents Locality is complemented by Code Path Locality (similar partial stack traces and code residues) it even  greater boosts our confidence in suggesting specific troubleshooting steps, recommending fixes and service packs or routing the problem to the next support or development service supply chain (like escalating the issue).

Suppose we got a new kernel memory dump with IRQL_NOT_LESS_OR_EQUAL  (A) bugcheck pointing to our module and we notice the write access to a structure in a nonpaged pool having specific pool tag:

3: kd> .trap 9ee8d9b0
ErrCode = 00000002
eax=85407650 ebx=858f6650 ecx=ffffffff edx=85407648 esi=858f65a8 edi=858f6620
eip=8083df4c esp=9ee8da24 ebp=9ee8da64 iopl=0 nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000 efl=00010246
nt!KeWaitForSingleObject+0x24f:
8083df4c 8919            mov     dword ptr [ecx],ebx  ds:0023:ffffffff=????????

STACK_TEXT: 
9ee8d9b0 8083df4c badb0d00 85407648 00000000 nt!KiTrap0E+0x2a7
9ee8da64 80853f3f 85407648 0000001d 00000000 nt!KeWaitForSingleObject+0×24f
9ee8da7c 8081d45f 865b18d8 854076b0 f4b9e53b nt!KiAcquireFastMutex+0×13
9ee8da88 f4b9e53b 00000004 86940110 85407638 nt!ExAcquireFastMutex+0×20
9ee8daa8 f4b9ed98 85407638 00000000 86940110 driver!Query+0×143
[…]

3: kd> !pool 85407648
Pool page 85407648 region is Nonpaged pool
 85407000 size:   80 previous size:    0  (Allocated)  Mdl
 85407080 size:   30 previous size:   80  (Allocated)  Even (Protected)
 854070b0 size:   28 previous size:   30  (Allocated)  Ntfn
 854070d8 size:   28 previous size:   28  (Allocated)  NtFs
 85407100 size:   28 previous size:   28  (Allocated)  Ntfn
[...]
 85407570 size:   28 previous size:   70  (Allocated)  Ntfn
 85407598 size:   98 previous size:   28  (Allocated)  File (Protected)
*85407630 size:   b0 previous size:   98  (Free ) *DrvA

Dumping the memory address passed to KeWaitForSingleObject shows simple but peculiar pattern:

3: kd> dd 85407648
85407648  ffffffff ffffffff ffffffff ffffffff
85407658  ffffffff ffffffff ffffffff ffffffff
85407668  ffffffff ffffffff ffffffff ffffffff
85407678  ffffffff ffffffff ffffffff ffffffff
85407688  ffffffff ffffffff ffffffff ffffffff
85407698  ffffffff ffffffff ffffffff ffffffff
854076a8  ffffffff ffffffff ffffffff ffffffff
854076b8  ffffffff ffffffff ffffffff ffffffff

We find several similar cases in our database but with different overall call stacks except the topmost wait call. Then we notice that in previous cases there were mutants associated with thread structure and we have the same now:

0: kd> !thread
THREAD 858f65a8 Cid 474c.4530 Teb: 7ffdf000 Win32Thread: bc012410 RUNNING on processor 0
[…]

3: kd> dt /r _KTHREAD 858f65a8 MutantListHead
nt!_KTHREAD
   +0×010 MutantListHead : _LIST_ENTRY [ 0×86773040 - 0×86773040 ]

3: kd> !pool 86773040
Pool page 86773040 region is Nonpaged pool
*86773000 size:   50 previous size:    0  (Allocated) *Muta (Protected)
  Pooltag Muta : Mutant objects
[…]

This narrows the issue to only a few previous cases. In one previous case WaitBlockList associated with a thread structure had 0xffffffff in its pointers. Our block shows the same pattern:

0: kd> dt -r _KTHREAD 858f65a8  WaitBlockList
nt!_KTHREAD
   +0×054 WaitBlockList : 0×858f6650 _KWAIT_BLOCK

0: kd> dt _KWAIT_BLOCK 0x858f6650
nt!_KWAIT_BLOCK
   +0x000 WaitListEntry    : _LIST_ENTRY [ 0x85407650 - 0xffffffff ]
   +0×008 Thread           : 0×858f65a8 _KTHREAD
   +0×00c Object           : 0×85407648
   +0×010 NextWaitBlock    : 0×858f6650 _KWAIT_BLOCK
   +0×014 WaitKey          : 0
   +0×016 WaitType         : 0×1 ”
   +0×017 SpareByte        : 0 ”

We have probably narrowed down the issue to a specific case. Although this doesn’t work always and mostly based on intuition there are spectacular cases where it really helps in troubleshooting. Here is another example where the contents of EDI register from exception context provided specific recommendation hints. When looking at the crash point we see an instance of Wild Code pattern:

0:000> kv
ChildEBP RetAddr  Args to Child             
WARNING: Frame IP not in any known module. Following frames may be wrong.
49ab5bba 00000000 00000000 00000000 00000000 0x60f1011a

0:000> r
eax=38084ff0 ebx=52303340 ecx=963f1416 edx=0000063d esi=baaff395 edi=678c5804
eip=60f1011a esp=5a9d0f48 ebp=49ab5bba iopl=0  nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000  efl=00210206
60f1011a cd01            int     1

0:000> u
60f1011a cd01            int     1
60f1011c cc              int     3
60f1011d 8d              ???
60f1011e c0eb02          shr     bl,2
60f10121 0f840f31cd01    je      62be3236
60f10127 8d              ???
60f10128 c0cc0f          ror     ah,0Fh
60f1012b 0bce            or      ecx,esi

Looking at raw stack data we notice the presence of a specific component that is known to patch the process import table. Applying techniques outlined in Hooked Functions pattern we notice two different 3rd-party components that patched two different modules (kernel 32 and user32):

0:000> !chkimg -lo 50 -d !kernel32 -v
Searching for module with expression: !kernel32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\kernel32.dll\4626487F102000\kernel32.dll
No range specified

Scanning section:    .text
Size: 564709
Range to scan: 77e41000-77ecade5
    77e41ae5-77e41ae9  5 bytes - kernel32!LoadLibraryExW
 [ 6a 34 68 48 7b:e9 16 e5 f4 07 ]
    77e44a8a-77e44a8e  5 bytes - kernel32!WaitNamedPipeW (+0×2fa5)
 [ 8b ff 55 8b ec:e9 71 b5 f9 07 ]
    77e5106a-77e5106e  5 bytes - kernel32!CreateProcessInternalW (+0xc5e0)
[…]
Total bytes compared: 564709(100%)
Number of errors: 49
49 errors : !kernel32 (77e41ae5-77e9aa16)

0:000> u 77e41ae5
kernel32!LoadLibraryExW:
77e41ae5 jmp     7fd90000
77e41aea out     77h,al
77e41aec call    kernel32!_SEH_prolog (77e6b779)
77e41af1 xor     edi,edi
77e41af3 mov     dword ptr [ebp-28h],edi
77e41af6 mov     dword ptr [ebp-2Ch],edi
77e41af9 mov     dword ptr [ebp-20h],edi
77e41afc cmp     dword ptr [ebp+8],edi

0:000> u 7fd90000
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for ComponentA.dll -
7fd90000 jmp     ComponentA!DllUnregisterServer+0×2700 (678c4280)
7fd90005 push    34h
7fd90007 push    offset kernel32!`string’+0xc (77e67b48)
7fd9000c jmp     kernel32!LoadLibraryExW+0×7 (77e41aec)
7fd90011 add     byte ptr [eax],al
7fd90013 add     byte ptr [eax],al
7fd90015 add     byte ptr [eax],al
7fd90017 add     byte ptr [eax],al

0:000> !chkimg -lo 50 -d !user32 -v
Searching for module with expression: !user32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\user32.dll\45E7BFD692000\user32.dll
No range specified

Scanning section:    .text
Size: 396943
Range to scan: 77381000-773e1e8f
    77383f38-77383f3c  5 bytes - user32!EnumDisplayDevicesW
 [ 8b ff 55 8b ec:e9 c3 c0 82 08 ]
    77384406-7738440a  5 bytes - user32!EnumDisplaySettingsExW (+0×4ce)
 [ 8b ff 55 8b ec:e9 f5 bb 7e 08 ]
    773844d9-773844dd  5 bytes - user32!EnumDisplaySettingsW (+0xd3)
 [ 8b ff 55 8b ec:e9 22 bb 80 08 ]
    7738619b-7738619f  5 bytes - user32!EnumDisplayDevicesA (+0×1cc2)
 [ 8b ff 55 8b ec:e9 60 9e 83 08 ]
    7738e985-7738e989  5 bytes - user32!CreateWindowExA (+0×87ea)
 [ 8b ff 55 8b ec:e9 76 16 8c 08 ]
[…]
Total bytes compared: 396943(100%)
Number of errors: 119
119 errors : !user32 (77383f38-773c960c)

0:000> u 77383f38
user32!EnumDisplayDevicesW:
77383f38 e9c3c08208      jmp     7fbb0000
77383f3d 81ec58030000    sub     esp,358h
77383f43 a1ac243e77      mov     eax,dword ptr [user32!__security_cookie (773e24ac)]
77383f48 8b5508          mov     edx,dword ptr [ebp+8]
77383f4b 83a5acfcffff00  and     dword ptr [ebp-354h],0
77383f52 53              push    ebx
77383f53 56              push    esi
77383f54 8b7510          mov     esi,dword ptr [ebp+10h]

0:000> u 7fbb0000
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for ComponentB.dll -
7fbb0000 e91b43d5e5      jmp     ComponentB+0×4320 (65904320)
7fbb0005 8bff            mov     edi,edi
7fbb0007 55              push    ebp
7fbb0008 8bec            mov     ebp,esp
7fbb000a e92e3f7df7      jmp     user32!EnumDisplayDevicesW+0×5 (77383f3d)
7fbb000f 0000            add     byte ptr [eax],al
7fbb0011 0000            add     byte ptr [eax],al
7fbb0013 0000            add     byte ptr [eax],al

Which one should we try to eliminate first to test our assumption that they somehow resulted in application faults? Looking at register context again we see that one specific register (EDI) has a value that lies in ComponentA address range:

0:000> r
eax=38084ff0 ebx=52303340 ecx=963f1416 edx=0000063d esi=baaff395 edi=678c5804
eip=60f1011a esp=5a9d0f48 ebp=49ab5bba iopl=0  nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000 efl=00210206
60f1011a cd01            int     1

0:000> lm
start    end        module name
00400000 01901000   Application
[...]
678c0000 6791d000   ComponentA   ComponentA.DLL
[…]

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 65)

Thursday, June 19th, 2008

Not My Version is another basic pattern of DLL Hell variety. It is when we look at the component timestamps and paths and realize that that one of the modules from the production environment is older than we had during development and testing. The lmft WinDbg command will produce the necessary output. If there are many modules we might want to create a CAD graph (Component Age Diagram) to spot anomalies. Component version check is one of the basic troubleshooting and system administration activities that will be fully illustrated in the forthcoming book “Crash Dump Analysis for System Administrators” (ISBN-13: 978-1-906717-02-5). Here is one example (module start and end load addresses are removed for visual clarity):

0:000> kL
Child-SP          RetAddr           Call Site
00000000`0012fed8 00000001`40001093 MyDLL!fnMyDLL
00000000`0012fee0 00000001`40001344 2DLLs+0×1093
00000000`0012ff10 00000000`773acdcd 2DLLs+0×1344
00000000`0012ff60 00000000`774fc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

0:000> lmft
module name
MyDLL    C:\OLD\MyDLL.dll Wed Jun 18 14:49:13 2004
user32   C:\Windows\System32\user32.dll Thu Feb 15 05:22:33 2007
kernel32 C:\Windows\System32\kernel32.dll Thu Nov 02 11:14:48 2006
ntdll    C:\Windows\System32\ntdll.dll Thu Nov 02 11:16:02 2006
2DLLs    C:\2DLLs\2DLLs.exe Thu Jun 19 10:46:44 2008 (485A2B04)
uxtheme  C:\Windows\System32\uxtheme.dll Thu Nov 02 11:15:07 2006
rpcrt4   C:\Windows\System32\rpcrt4.dll Tue Jul 17 05:21:15 2007
lpk      C:\Windows\System32\lpk.dll Thu Nov 02 11:12:33 2006
oleaut32 C:\Windows\System32\oleaut32.dll Thu Dec 06 05:09:35 2007
usp10    C:\Windows\System32\usp10.dll Thu Nov 02 11:15:03 2006
ole32    C:\Windows\System32\ole32.dll Thu Nov 02 11:14:31 2006
advapi32 C:\Windows\System32\advapi32.dll Thu Nov 02 11:11:35 2006
gdi32    C:\Windows\System32\gdi32.dll Thu Feb 21 04:40:51 2008
msvcrt   C:\Windows\System32\msvcrt.dll Thu Nov 02 11:13:37 2006
imm32    C:\Windows\System32\imm32.dll Thu Nov 02 11:13:15 2006
msctf    C:\Windows\System32\msctf.dll Thu Nov 02 11:13:42 2006

This pattern should be checked when we have instances of Module Variety and, especially, Duplicated Module. Note that this pattern can also easily become an anti-pattern when applied to an unknown component: Alien Component.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 64)

Thursday, June 19th, 2008

In addition to Module Variety there is another DLL Hell pattern that I call Duplicated Module. Here the same module is loaded at least twice and we can detect this when we see the module load address appended to its name in the output of lm commands (this is done to make the name of the module unique):

0:000> lm
start    end        module name
00b20000 0147f000   MSO_b20000
30000000 309a7000   EXCEL
30c90000 31848000   mso
71c20000 71c32000   tsappcmp
745e0000 7489e000   msi  
76290000 762ad000   imm32 
76b70000 76b7b000   psapi            
76f50000 76f63000   secur32            
77380000 77411000   user32
77670000 777a9000   ole32
77ba0000 77bfa000   msvcrt
77c00000 77c48000   gdi32          
77c50000 77cef000   rpcrt4           
77da0000 77df2000   shlwapi            
77e40000 77f42000   kernel32
77f50000 77feb000   advapi32           
7c800000 7c8c0000   ntdll

Usually his happens when the DLL is loaded from different locations. It can be exactly the same DLL version. The problems usually surface when there are different DLL versions and the new code loads the old version of the DLL and uses it. This may result in interface incompatibility issues and ultimately in application fault like an access violation.

In order to provide a dump to play with I created a small toy program called 2DLLs to model the worst case scenario similar to the one I encountered in a production environment a couple of days ago. The program periodically loads MyDLL module to call one of its functions. Unfortunately in one place it uses hardcoded relative path:

HMODULE hLib = LoadLibrary(L".\\DLL\\MyDLL.dll");

and in another place it relies on DLL search order:

hLib = LoadLibrary(L".\\MyDLL.dll");

PATH variable directories are used for search if this DLL was not found in other locations specified by DLL search order. We see that the problem can happen when another application is installed which uses the old version of that DLL and modifies PATH variable to point to its location. To model interface incompatibility I compiled the version of MyDLL that causes NULL pointer access violation when the same function is called from it. The DLL was placed into a separate folder and the PATH variable was modified to reference that folder:

C:\>set PATH=C:\OLD;%PATH%

The application crashes and the installed default postmortem debugger (CDB) saves its crash dump. If we open it we would see that it crashed in MyDLL_1e60000 module which should trigger suspicion:

0:000> r
rax=0000000001e61010 rbx=0000000000000000 rcx=0000775dcac00000
rdx=0000000000000000 rsi=0000000000000006 rdi=0000000000001770
rip=0000000001e61010 rsp=000000000012fed8 rbp=0000000000000000
 r8=0000000000000000  r9=000000000012fd58 r10=0000000000000001
r11=000000000012fcc0 r12=0000000000000000 r13=0000000000000002
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010200
MyDLL_1e60000!fnMyDLL:
00000000`01e61010 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

0:000> kL
Child-SP          RetAddr           Call Site
00000000`0012fed8 00000001`40001093 MyDLL_1e60000!fnMyDLL
00000000`0012fee0 00000001`40001344 2DLLs+0×1093
00000000`0012ff10 00000000`773acdcd 2DLLs+0×1344
00000000`0012ff60 00000000`774fc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

Looking at the list of modules we see two versions of MyDLL loaded from two different folders:

0:000> lm
start             end                 module name
00000000`01e60000 00000000`01e71000   MyDLL_1e60000
00000000`772a0000 00000000`7736a000   user32
00000000`77370000 00000000`774a1000   kernel32
00000000`774b0000 00000000`7762a000   ntdll
00000001`40000000 00000001`40010000   2DLLs
00000001`80000000 00000001`80011000   MyDLL
000007fe`fc9e0000 000007fe`fca32000   uxtheme
000007fe`fe870000 000007fe`fe9a9000   rpcrt4
000007fe`fe9b0000 000007fe`fe9bc000   lpk
000007fe`fea10000 000007fe`feae8000   oleaut32
000007fe`fecd0000 000007fe`fed6a000   usp10
000007fe`fedd0000 000007fe`fefb0000   ole32
000007fe`fefb0000 000007fe`ff0af000   advapi32
000007fe`ff0d0000 000007fe`ff131000   gdi32
000007fe`ff2e0000 000007fe`ff381000   msvcrt
000007fe`ff390000 000007fe`ff3b8000   imm32
000007fe`ff4b0000 000007fe`ff5b4000   msctf

0:000> lmv m MyDLL_1e60000
start             end                 module name
00000000`01e60000 00000000`01e71000   MyDLL_1e60000
    Loaded symbol image file: MyDLL.dll
    Image path: C:\OLD\MyDLL.dll
    Image name: MyDLL.dll
    Timestamp:        Wed Jun 18 14:49:13 2008 (48591259)
[…]

0:000> lmv m MyDLL
start             end                 module name
00000001`80000000 00000001`80011000   MyDLL
    Image path: C:\2DLLs\DLL\MyDLL.dll
    Image name: MyDLL.dll
    Timestamp:        Wed Jun 18 14:50:56 2008 (485912C0)
[...]

We can also see that the old version of MyDLL was the last loaded DLL:

0:000> !dlls -l

0x002c2680: C:\2DLLs\2DLLs.exe
      Base   0x140000000  EntryPoint  0x1400013b0  Size        0x00010000
      Flags  0x00004000  LoadCount   0x0000ffff  TlsIndex    0x00000000
             LDRP_ENTRY_PROCESSED

[...]

0x002ea9b0: C:\2DLLs\DLL\MyDLL.dll
      Base   0x180000000  EntryPoint  0x1800013d0  Size        0x00011000
      Flags  0x00084004  LoadCount   0x00000001  TlsIndex    0x00000000
             LDRP_IMAGE_DLL
             LDRP_ENTRY_PROCESSED
             LDRP_PROCESS_ATTACH_CALLED

[...]

0x002ec430: C:\OLD\MyDLL.dll
      Base   0×01e60000  EntryPoint  0×01e613e0  Size        0×00011000
      Flags  0×00284004  LoadCount   0×00000001  TlsIndex    0×00000000
             LDRP_IMAGE_DLL
             LDRP_ENTRY_PROCESSED
             LDRP_PROCESS_ATTACH_CALLED
             LDRP_IMAGE_NOT_AT_BASE

We can also see that the PATH variable points to its location and this might explain why it was loaded:

0:000> !peb
PEB at 000007fffffd6000
[...]
Path=C:\OLD;C:\Windows\system32;C:\Windows;[…]
[…]

We might think that the module having address in its name was loaded the last but this is not true. If we save another copy of the dump from the existing one using .dump command and load the new dump file we would see that order of the module names is reversed:

0:000> kL
Child-SP          RetAddr           Call Site
00000000`0012fed8 00000001`40001093 MyDLL!fnMyDLL
00000000`0012fee0 00000001`40001344 2DLLs+0×1093
00000000`0012ff10 00000000`773acdcd 2DLLs+0×1344
00000000`0012ff60 00000000`774fc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

0:000> lm
start             end                 module name
00000000`01e60000 00000000`01e71000   MyDLL
00000000`772a0000 00000000`7736a000   user32
00000000`77370000 00000000`774a1000   kernel32
00000000`774b0000 00000000`7762a000   ntdll
00000001`40000000 00000001`40010000   2DLLs
00000001`80000000 00000001`80011000   MyDLL_180000000
000007fe`fc9e0000 000007fe`fca32000   uxtheme
000007fe`fe870000 000007fe`fe9a9000   rpcrt4
000007fe`fe9b0000 000007fe`fe9bc000   lpk
000007fe`fea10000 000007fe`feae8000   oleaut32
000007fe`fecd0000 000007fe`fed6a000   usp10
000007fe`fedd0000 000007fe`fefb0000   ole32
000007fe`fefb0000 000007fe`ff0af000   advapi32
000007fe`ff0d0000 000007fe`ff131000   gdi32
000007fe`ff2e0000 000007fe`ff381000   msvcrt
000007fe`ff390000 000007fe`ff3b8000   imm32
000007fe`ff4b0000 000007fe`ff5b4000   msctf

0:000> !dlls -l

[...]

0x002ec430: C:\OLD\MyDLL.dll
      Base   0×01e60000  EntryPoint  0×01e613e0  Size        0×00011000
      Flags  0×00284004  LoadCount   0×00000001  TlsIndex    0×00000000
             LDRP_IMAGE_DLL
             LDRP_ENTRY_PROCESSED
             LDRP_PROCESS_ATTACH_CALLED
             LDRP_IMAGE_NOT_AT_BASE

The postprocessed dump file can be downloaded from FTP to play with:

ftp://dumpanalysis.org/pub/CDAPatternDuplicatedModule.zip 

- Dmitry Vostokov @ DumpAnalysis.org -

Memiotics: a definition

Wednesday, June 18th, 2008

Analysis of computer memory snapshots (memory dumps) and their evolution is the domain of memoretics. Computer memory semiotics (memiotics) is the branch of memoretics that studies the interpretation of computer memory, its meaning, signs and symbols.

- Dmitry Vostokov @ DumpAnalysis.org -     

Crash Dump Analysis Patterns (Part 59b)

Thursday, June 12th, 2008

Previously I introduced Missing Component pattern where the example and emphasis was on dynamically loaded modules. In this part I cover statically linked modules. Failure for a loader to find one of them results in a software exception. The most frequent of them are (numbers were taken from Google search):

C0000142 918
C0000143 919
C0000145 1,530
C0000135 24,900

0:001> !error c0000142
Error code: (NTSTATUS) 0xc0000142 (3221225794) - {DLL Initialization Failed}  Initialization of the dynamic link library %hs failed. The process is terminating abnormally.

0:001> !error c0000143
Error code: (NTSTATUS) 0xc0000143 (3221225795) - {Missing System File}  The required system file %hs is bad or missing.

0:001> !error c0000145
Error code: (NTSTATUS) 0xc0000145 (3221225797) - {Application Error}  The application failed to initialize properly (0x%lx). Click on OK to terminate the application.

0:000> !error c0000135
Error code: (NTSTATUS) 0xc0000135 (3221225781) - {Unable To Locate Component}  This application has failed to start because %hs was not found. Re-installing the application may fix this problem.

In this part I only consider user mode exceptions. If we have a default debugger configured it will usually save a crash dump. To model this problem I modified one of my applications by changing all occurrences of KERNEL32.DLL to  KERNEL32.DL using Visual Studio Binary Editor. CDB was configured as a default postmortem debugger (see Custom postmortem debuggers on Vista). When the application was launched CDB attached to it and saved a crash dump. If we open it in WinDbg we get characteristic Special Stack Trace involving loader functions:

Loading Dump File [C:\UserDumps\CDAPatternMissingComponent.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Thu Jun 12 12:03:28.000 2008 (GMT+1)
System Uptime: 1 days 8:46:23.167
Process Uptime: 0 days 0:00:48.000

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(da4.f60): Wake debugger - code 80000007 (first/second chance not available)
eax=00000000 ebx=77c4a174 ecx=75ce3cf9 edx=00000000 esi=7efde028 edi=7efdd000
eip=77bcf1d1 esp=0017fca4 ebp=0017fd00 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ntdll!_LdrpInitialize+0x6d:
77bcf1d1 8b45b8          mov     eax,dword ptr [ebp-48h] ss:002b:0017fcb8=7efde000

0:000> kL
ChildEBP RetAddr 
0017fd00 77b937ea ntdll!_LdrpInitialize+0×6d
0017fd10 00000000 ntdll!LdrInitializeThunk+0×10

Verbose analysis command doesn’t give us an indication of what had happened so we need to dig further:

0:000> !analyze -v
[...]

FAULTING_IP:
+0
00000000 ??              ???

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000
   ExceptionCode: 80000007 (Wake debugger)
  ExceptionFlags: 00000000
NumberParameters: 0

BUGCHECK_STR:  80000007

PROCESS_NAME:  StackOverflow.exe

ERROR_CODE: (NTSTATUS) 0x80000007 - {Kernel Debugger Awakened}  the system debugger was awakened by an interrupt.

NTGLOBALFLAG:  400

APPLICATION_VERIFIER_FLAGS:  0

DERIVED_WAIT_CHAIN: 

Dl Eid Cid     WaitType
-- --- ------- --------------------------
   0   da4.f60 Unknown               

WAIT_CHAIN_COMMAND:  ~0s;k;;

BLOCKING_THREAD:  00000f60

DEFAULT_BUCKET_ID:  APPLICATION_HANG_BusyHang

PRIMARY_PROBLEM_CLASS:  APPLICATION_HANG_BusyHang

LAST_CONTROL_TRANSFER:  from 77b937ea to 77bcf1d1

FAULTING_THREAD:  00000000

STACK_TEXT: 
0017fd00 77b937ea 0017fd24 77b60000 00000000 ntdll!_LdrpInitialize+0x6d
0017fd10 00000000 0017fd24 77b60000 00000000 ntdll!LdrInitializeThunk+0x10

FOLLOWUP_IP:
ntdll!_LdrpInitialize+6d
77bcf1d1 8b45b8          mov     eax,dword ptr [ebp-48h]

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  ntdll!_LdrpInitialize+6d

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: ntdll

IMAGE_NAME:  ntdll.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  4549bdf8

STACK_COMMAND:  ~0s ; kb

BUCKET_ID:  80000007_ntdll!_LdrpInitialize+6d

FAILURE_BUCKET_ID:  ntdll.dll!_LdrpInitialize_80000007_APPLICATION_HANG_BusyHang

Followup: MachineOwner

Last event and error code are not helpful too: 

0:000> .lastevent
Last event: da4.f60: Wake debugger - code 80000007 (first/second chance not available)
  debugger time: Thu Jun 12 15:04:38.917 2008 (GMT+1)

0:000> !gle
LastErrorValue: (Win32) 0 (0) - The operation completed successfully.
LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0

Let’s search thread raw stack data for any signs of exceptions:

0:000> !teb
TEB at 7efdd000
    ExceptionList:        0017fcf0
    StackBase:            00180000
    StackLimit:           0017e000

    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7efdd000
    EnvironmentPointer:   00000000
    ClientId:             00000da4 . 00000f60
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7efde000
    LastErrorValue:       0
    LastStatusValue:      0
    Count Owned Locks:    0
    HardErrorMode:        0

0:000> dds 0017e000 00180000
[...]
0017f8d8  7efdd000
0017f8dc  0017f964
0017f8e0  77c11c78 ntdll!_except_handler4
0017f8e4  00000000
0017f8e8  0017f988
0017f8ec  0017f900
0017f8f0  77ba1ddd ntdll!RtlCallVectoredContinueHandlers+0x15
0017f8f4  0017f988
0017f8f8  0017f9d8
0017f8fc  77c40370 ntdll!RtlpCallbackEntryList
0017f900  0017f970
0017f904  77ba1db5 ntdll!RtlDispatchException+0×11f
0017f908  0017f988
0017f90c  0017f9d8
0017f910  7efde028
0017f914  00000001
0017f918  77630000 kernel32!_imp___aullrem <PERF> (kernel32+0×0)
0017f91c  00000001
0017f920  776ced81 kernel32!_DllMainCRTStartupForGS2+0×10
0017f924  0017f938
0017f928  7765d4d9 kernel32!BaseDllInitialize+0×18
0017f92c  76042340 user32!$$VProc_ImageExportDirectory
0017f930  00000001
0017f934  00000000
0017f938  0017f9e0
0017f93c  77b8f890 ntdll!LdrpSnapThunk+0xc9
0017f940  0040977a StackOverflow+0×977a
0017f944  0000030b
0017f948  76030000 user32!_imp__RegSetValueExW <PERF> (user32+0×0)
0017f94c  76042f94 user32!$$VProc_ImageExportDirectory+0xc54
0017f950  77bb8881 ntdll!LdrpSnapThunk+0×40d
0017f954  0017bb30
0017f958  00409770 StackOverflow+0×9770
0017f95c  00881a50
0017f960  004098b2 StackOverflow+0×98b2
0017f964  77bac282 ntdll!ZwRaiseException+0×12
0017f968  00180000
0017f96c  0017fc48
0017f970  0017fd00
0017f974  77bac282 ntdll!ZwRaiseException+0×12
0017f978  77b7ee72 ntdll!KiUserExceptionDispatcher+0×2a

0017f97c  0017f988 ; exception record
0017f980  0017f9d8 ; exception context
0017f984  00000000
0017f988  c0000135
0017f98c  00000001
0017f990  00000000
0017f994  77bcf1d1 ntdll!_LdrpInitialize+0×6d
0017f998  00000000
0017f99c  77c11c78 ntdll!_except_handler4
0017f9a0  77b8dab8 ntdll!RtlpRunTable+0×218
0017f9a4  fffffffe
0017f9a8  77ba2515 ntdll!vDbgPrintExWithPrefixInternal+0×214
0017f9ac  77ba253b ntdll!DbgPrintEx+0×1e
0017f9b0  77b7f356 ntdll! ?? ::FNODOBFM::`string’
0017f9b4  00000055
0017f9b8  00000003
0017f9bc  77b809c2 ntdll! ?? ::FNODOBFM::`string’
0017f9c0  0017fc9c
0017f9c4  00000001
0017f9c8  0017fd00
0017f9cc  77bcf28e ntdll!_LdrpInitialize+0×12a
0017f9d0  00000055
0017f9d4  75ce3cf9
0017f9d8  0001003f
0017f9dc  00000000
0017f9e0  00000000
0017f9e4  00000000
0017f9e8  00000000
0017f9ec  00000000
0017f9f0  00000000
0017f9f4  0000027f
[…]

We see exception dispatching calls highlighted above. One of their parameters is an exception record and we try to get one:

0:000> .exr 0017f988
ExceptionAddress: 77bcf1d1 (ntdll!_LdrpInitialize+0x0000006d)
   ExceptionCode: c0000135
  ExceptionFlags: 00000001
NumberParameters: 0

Error c0000135 means that the loader was unable to locate a component. Now we try to examine the same raw stack data for any string patterns. For example, the following UNICODE pattern is clearly visible:

0017f2fc  00000000
0017f300  00880ec4
0017f304  77b910d7 ntdll!RtlpDosPathNameToRelativeNtPathName_Ustr+0x344
0017f308  00000000
0017f30c  43000043
0017f310  0042002a
0017f314  0017f33c
0017f318  00000000
0017f31c  00000002
0017f320  00000008
0017f324  00000000
0017f328  0000008c
0017f32c  000a0008
0017f330  77b91670 ntdll!`string'
0017f334  00b92bd6
0017f338  0017f5d4
0017f33c  003a0043
0017f340  0050005c
0017f344  006f0072
0017f348  00720067
0017f34c  006d0061
0017f350  00460020
0017f354  006c0069
0017f358  00730065
0017f35c  00280020
0017f360  00380078
0017f364  00290036
0017f368  0043005c
0017f36c  006d006f
0017f370  006f006d
0017f374  0020006e
0017f378  00690046
0017f37c  0065006c
0017f380  005c0073
0017f384  006f0052
0017f388  00690078
0017f38c  0020006f
0017f390  00680053
0017f394  00720061
0017f398  00640065
0017f39c  0044005c
0017f3a0  004c004c
0017f3a4  00680053
0017f3a8  00720061
0017f3ac  00640065
0017f3b0  004b005c
0017f3b4  00520045
0017f3b8  0045004e
0017f3bc  0033004c
0017f3c0  002e0032
0017f3c4  006c0064

0017f3c8  00000000
0017f3cc  00000000

It is a path to DLL that was probably missing:

0:000> du 0017f33c
0017f33c  "C:\Program Files (x86)\Common Fi"
0017f37c  "les\Roxio Shared\DLLShared\KERNE"
0017f3bc  "L32.dl"

I think the loader was trying to find KERNEL32.dl following the DLL search order and this was the last path element:

0:000> !peb
PEB at 7efde000
    InheritedAddressSpace:    No
    ReadImageFileExecOptions: No
    BeingDebugged:            Yes
    ImageBaseAddress:         00400000
    Ldr                       77c40080
    Ldr.Initialized:          Yes
    Ldr.InInitializationOrderModuleList: 00881ad0 . 008831b8
    Ldr.InLoadOrderModuleList:           00881a50 . 00883dc8
    Ldr.InMemoryOrderModuleList:         00881a58 . 00883dd0
            Base TimeStamp                     Module
[...]
    Environment:  00881de8
[...] 
Path=C:\Windows\system32; C:\Windows; C:\Windows\System32\Wbem; C:\Program Files\ATI Technologies\ATI.ACE; c:\Program Files (x86)\Microsoft SQL Server\90\Tools\binn\; C:\Program Files (x86)\Common Files\Roxio Shared\DLLShared\
[…]

In  similar situations !dlls command might help that shows the load order (-l switch) and points to the last processed DLL:

0:001> !dlls -l

0x004740e8: C:\Program Files\Application\Application.exe
      Base   0x012a0000  EntryPoint  0x012b0903  Size        0x00057000
      Flags  0x00004010  LoadCount   0x0000ffff  TlsIndex    0x00000000
             LDRP_ENTRY_PROCESSED

0x00474158: C:\Windows\SysWOW64\ntdll.dll
      Base   0x77d00000  EntryPoint  0x00000000  Size        0x00160000
      Flags  0x00004014  LoadCount   0x0000ffff  TlsIndex    0x00000000
             LDRP_IMAGE_DLL
             LDRP_ENTRY_PROCESSED

0x00474440: C:\Windows\syswow64\kernel32.dll
      Base   0x77590000  EntryPoint  0x775a1f3e  Size        0x00110000
      Flags  0x00084014  LoadCount   0x0000ffff  TlsIndex    0x00000000
             LDRP_IMAGE_DLL
             LDRP_ENTRY_PROCESSED
             LDRP_PROCESS_ATTACH_CALLED

[…] 

0x00498ff8: C:\Windows\WinSxS\x86_microsoft.windows.common-controls_...\comctl32.dll
      Base   0x74d90000  EntryPoint  0x74dc43e5  Size        0x0019e000
      Flags  0x100c4014  LoadCount   0x00000003  TlsIndex    0x00000000
             LDRP_IMAGE_DLL
             LDRP_ENTRY_PROCESSED
             LDRP_DONT_CALL_FOR_THREADS
             LDRP_PROCESS_ATTACH_CALLED

0x004991b8: C:\Windows\WinSxS\x86_microsoft.vc80.mfcloc_...\MFC80ENU.DLL
      Base   0x71b10000  EntryPoint  0x00000000  Size        0x0000e000
      Flags  0x10004014  LoadCount   0x00000001  TlsIndex    0x00000000
             LDRP_IMAGE_DLL
             LDRP_ENTRY_PROCESSED

If it is difficult to identify what had really happened in crash dumps we can enable loader snaps using gflags and run the application under a debugger. For example, for notepad.exe we have:

Microsoft (R) Windows Debugger Version 6.8.0004.0 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: C:\Windows\notepad.exe
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00000000`ffac0000 00000000`ffaef000   notepad.exe
ModLoad: 00000000`779b0000 00000000`77b2a000   ntdll.dll
LDR: NEW PROCESS
     Image Path: C:\Windows\notepad.exe (notepad.exe)
     Current Directory: C:\Program Files\Debugging Tools for Windows 64-bit\
     Search Path: C:\Windows; C:\Windows\system32; C:\Windows\system; C:\Windows;.; C:\Program Files\Debugging Tools for Windows 64-bit\winext\arcade; C:\Windows\system32;C:\Windows; C:\Windows\System32\Wbem; C:\Program Files\ATI Technologies\ATI.ACE; c:\Program Files (x86)\Microsoft SQL Server\90\Tools\binn\; C:\Program Files (x86)\Common Files\Roxio Shared\DLLShared\
LDR: LdrLoadDll, loading kernel32.dll from
ModLoad: 00000000`777a0000 00000000`778d1000   C:\Windows\system32\kernel32.dll
LDR: kernel32.dll bound to ntdll.dll
LDR: kernel32.dll has stale binding to ntdll.dll
LDR: Stale Bind ntdll.dll from kernel32.dll
LDR: LdrGetProcedureAddress by NAME - BaseThreadInitThunk
[3d8,1278] LDR: Real INIT LIST for process C:\Windows\notepad.exe pid 984 0x3d8
[3d8,1278]    C:\Windows\system32\kernel32.dll init routine 00000000777DC960
[3d8,1278] LDR: kernel32.dll loaded - Calling init routine at 00000000777DC960
LDR: notepad.exe bound to ADVAPI32.dll
ModLoad: 000007fe`fe520000 000007fe`fe61f000   C:\Windows\system32\ADVAPI32.dll
LDR: ADVAPI32.dll bound to ntdll.dll
LDR: ADVAPI32.dll has stale binding to ntdll.dll
LDR: Stale Bind ntdll.dll from ADVAPI32.dll
LDR: ADVAPI32.dll bound to KERNEL32.dll
LDR: ADVAPI32.dll has stale binding to KERNEL32.dll
LDR: ADVAPI32.dll bound to ntdll.dll via forwarder(s) from kernel32.dll
LDR: ADVAPI32.dll has stale binding to ntdll.dll
LDR: Stale Bind KERNEL32.dll from ADVAPI32.dll
LDR: LdrGetProcedureAddress by NAME - RtlAllocateHeap
LDR: LdrGetProcedureAddress by NAME - RtlReAllocateHeap
LDR: LdrGetProcedureAddress by NAME - RtlEncodePointer
LDR: LdrGetProcedureAddress by NAME - RtlDecodePointer
LDR: LdrGetProcedureAddress by NAME - RtlSizeHeap
LDR: LdrGetProcedureAddress by NAME - RtlDeleteCriticalSection
LDR: LdrGetProcedureAddress by NAME - RtlEnterCriticalSection
LDR: LdrGetProcedureAddress by NAME - RtlLeaveCriticalSection
LDR: ADVAPI32.dll bound to RPCRT4.dll
[...]

This technique only works for native platform loader snaps. For example, it doesn’t show loader snaps for 32-bit modules loaded under WOW64 (to the best of my knowledge):

Microsoft (R) Windows Debugger Version 6.8.0004.0 X86
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: C:\Apps\StackOverflow.exe
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 00418000   StackOverflow.exe
ModLoad: 77b60000 77cb0000   ntdll.dll
LDR: NEW PROCESS
     Image Path: C:\Apps\StackOverflow.exe (StackOverflow.exe)
[...]
LDR: Loading (STATIC, NON_REDIRECTED) C:\Windows\system32\wow64cpu.dll
LDR: wow64cpu.dll bound to ntdll.dll
LDR: wow64cpu.dll has stale binding to ntdll.dll
LDR: Stale Bind ntdll.dll from wow64cpu.dll
LDR: wow64cpu.dll bound to wow64.dll
LDR: wow64cpu.dll has stale binding to wow64.dll
LDR: Stale Bind wow64.dll from wow64cpu.dll
LDR: wow64.dll has stale binding to wow64cpu.dll
LDR: Stale Bind wow64cpu.dll from wow64.dll
LDR: Refcount wow64cpu.dll (1)
LDR: Refcount wow64.dll (2)
LDR: Refcount wow64win.dll (1)
LDR: Refcount wow64.dll (3)
LDR: LdrGetProcedureAddress by NAME - Wow64LdrpInitialize
[...]
ModLoad: 77630000 77740000   C:\Windows\syswow64\kernel32.dll
ModLoad: 76030000 76100000   C:\Windows\syswow64\USER32.dll
ModLoad: 775a0000 77630000   C:\Windows\syswow64\GDI32.dll
ModLoad: 76d00000 76dbf000   C:\Windows\syswow64\ADVAPI32.dll
ModLoad: 76df0000 76ee0000   C:\Windows\syswow64\RPCRT4.dll
ModLoad: 75d60000 75dc0000   C:\Windows\syswow64\Secur32.dll

(1ec.1290): Unknown exception - code c0000135 (first chance)
(1ec.1290): Unknown exception - code c0000135 (!!! second chance !!!)
eax=00000000 ebx=77c4a174 ecx=75ce3cf9 edx=00000000 esi=7efde028 edi=7efdd000
eip=77bcf1d1 esp=0017fca4 ebp=0017fd00 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ntdll!_LdrpInitialize+0×6d:
77bcf1d1 8b45b8          mov     eax,dword ptr [ebp-48h] ss:002b:0017fcb8=7efde000

The dump file that I used was modified to remove sensitive information (see Data Hiding in Crash Dumps for this technique). It can be downloaded from FTP to play with:

ftp://dumpanalysis.org/pub/CDAPatternMissingComponent.zip

- Dmitry Vostokov @ DumpAnalysis.org -