Archive for the ‘Crash Dump Patterns’ Category

Crash Dump Analysis Patterns (Part 145)

Saturday, June 25th, 2011

In addition to Blocked Thread, the endpoint threads of Wait Chain patterns, and Blocking Module, we would like to add the Blocking File pattern, which often appears in (but is not limited to) roaming profile scenarios. For example, an application was reported hanging, and in a complete memory dump we can see a thread in its stack trace collection:

THREAD fffffa8005eca060  Cid 14b0.1fec  Teb: 000000007ef84000 Win32Thread: fffff900c26c2c30 WAIT: (Executive) KernelMode Non-Alertable
        fffffa80048e6758  NotificationEvent
IRP List:
       fffffa8005a6c160: (0006,03e8) Flags: 00060000  Mdl: 00000000
Not impersonating
DeviceMap                 fffff8a0055b6620
Owning Process            fffffa80063dd970       Image:         Application.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      171988390      Ticks: 26963639 (4:21:01:46.859)
Context Switch Count      226                 LargeStack
UserTime                  00:00:00.015
KernelTime                00:00:00.015
Win32 Start Address 0x000000006d851f62
Stack Init fffff880075a9db0 Current fffff880075a9770
Base fffff880075aa000 Limit fffff880075a4000 Call 0
Priority 10 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5
Child-SP          RetAddr           Call Site
fffff880`075a97b0 fffff800`0167f752 nt!KiSwapContext+0x7a
fffff880`075a98f0 fffff800`016818af nt!KiCommitThreadWait+0x1d2
fffff880`075a9980 fffff800`019b612a nt!KeWaitForSingleObject+0x19f
fffff880`075a9a20 fffff800`0198feaa nt! ?? ::NNGAKEGL::`string'+0x1d61a
fffff880`075a9a60 fffff800`018ed0e3 nt!IopSynchronousServiceTail+0x35a
fffff880`075a9ad0 fffff800`01677853 nt!NtLockFile+0x514
fffff880`075a9bb0 00000000`77840cea nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`075a9c20)
00000000`0798e488 00000000`7543293b ntdll!ZwLockFile+0xa
00000000`0798e490 00000000`7541cf87 wow64!whNtLockFile+0x7f
00000000`0798e510 00000000`7536276d wow64!Wow64SystemServiceEx+0xd7
00000000`0798edd0 00000000`7541d07e wow64cpu!TurboDispatchJumpAddressEnd+0x24
00000000`0798ee90 00000000`7541c549 wow64!RunCpuSimulation+0xa
00000000`0798eee0 00000000`7786d177 wow64!Wow64LdrpInitialize+0x429
00000000`0798f430 00000000`7782308e ntdll! ?? ::FNODOBFM::`string'+0x2bfe4
00000000`0798f4a0 00000000`00000000 ntdll!LdrInitializeThunk+0xe
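
The Ticks value in the thread header above can be cross-checked against the elapsed time shown in parentheses. A minimal sketch, assuming the typical 15.625 ms clock tick interval (the actual interval varies by system):

```python
def ticks_to_elapsed(ticks, tick_ms=15.625):
    """Convert a thread wait tick count to a d:hh:mm:ss.mmm string.

    Assumes a 15.625 ms clock tick; check the actual interval
    on the target system before relying on this."""
    total_ms = int(ticks * tick_ms)
    ms = total_ms % 1000
    s = total_ms // 1000 % 60
    m = total_ms // 60000 % 60
    h = total_ms // 3600000 % 24
    d = total_ms // 86400000
    return f"{d}:{h:02}:{m:02}:{s:02}.{ms:03}"

# The Ticks value from the thread header above
print(ticks_to_elapsed(26963639))  # 4:21:01:46.859
```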

We immediately spot the anomaly of a lock file attempt and look at its IRP:

0: kd> !irp fffffa8005a6c160
Irp is active with 7 stacks 7 is current (= 0xfffffa8005a6c3e0)
 No Mdl: No System Buffer: Thread fffffa8005eca060:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  2 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 ffffffffc000020c
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
 [ 11, 0]   0  2 fffffa8004da0620 00000000 fffff8800177d9cc-fffffa800710e580   
        \FileSystem\mrxsmb mup!MupiUncProviderCompletion
   Args: 00000000 00000000 00000000 00000000
>[ 11, 1]   0  0 fffffa8004066400 fffffa80048e66c0 00000000-00000000   
        \FileSystem\Mup
   Args: fffffa8004a98120 00000001 00000000 00000000

From that IRP we see a file name: 

0: kd> !fileobj fffffa80048e66c0

[...]\AppData\Roaming\Vendor\Product\Recent\index.dat

LockOperation Set  Device Object: 0xfffffa8004066400   \FileSystem\Mup
Vpb is NULL
Access: Read SharedRead SharedWrite SharedDelete

Flags:  0x40002
 Synchronous IO
 Handle Created

File Object is currently busy and has 0 waiters.

FsContext: 0xfffff8a00e8d9010 FsContext2: 0xfffff8a012e4d688
CurrentByteOffset: 0
Cache Data:
  Section Object Pointers: fffffa8006086928
  Shared Cache Map: 00000000
File object extension is at fffffa8005c8cbe0:

Alternatively we get a 32-bit stack trace from the virtualized process:

0: kd> .process /r /p fffffa80063dd970
Implicit process is now fffffa80`063dd970
Loading User Symbols

0: kd> .thread /w fffffa8005eca060
Implicit thread is now fffffa80`05eca060
The context is partially valid. Only x86 user-mode context is available.
x86 context set

0: kd:x86> .reload
Loading Kernel Symbols
Loading User Symbols
Loading unloaded module list
Loading Wow64 Symbols

0: kd:x86> kv
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child             
07ac8510 774f033f 00000390 00000000 00000000 ntdll_779d0000!ZwLockFile+0x12
07ac8590 774f00d3 061b2b68 ada9964d c0000016 kernel32!BaseDllOpenIniFileOnDisk+0x246
07ac85d0 774efae9 061b2b68 00001000 6d352f20 kernel32!BaseDllReadWriteIniFileOnDisk+0x2d
07ac85e8 775001bf 00000001 00000000 061b2b68 kernel32!BaseDllReadWriteIniFile+0xed
07ac861c 6d928401 07aca71c 00000000 00001000 kernel32!GetPrivateProfileStringW+0x35
WARNING: Stack unwind information not available. Following frames may be wrong.
07ac8640 6d9282f5 07aca71c 00000000 00000000 DLL+0x618401
[…]
07acfb14 774e3677 06757d20 07acfb60 77a09d72 DLL+0x541f6d
07acfb20 77a09d72 06757d20 eca51e43 00000000 kernel32!BaseThreadInitThunk+0xe
07acfb60 77a09d45 6d851f62 06757d20 ffffffff ntdll_779d0000!__RtlUserThreadStart+0x70
07acfb78 00000000 6d851f62 06757d20 00000000 ntdll_779d0000!_RtlUserThreadStart+0x1b

We get the same file name from a file handle:

0: kd> !handle 00000390
processor number 0, process fffffa80063dd970
PROCESS fffffa80063dd970
    SessionId: 5  Cid: 14b0    Peb: 7efdf000  ParentCid: 1fac
    DirBase: 48293000  ObjectTable: fffff8a010515f90  HandleCount: 342.
    Image: Application.exe

Handle table at fffff8a0083e9000 with 444 Entries in use
0390: Object: fffffa80048e66c0  GrantedAccess: 00120089 Entry: fffff8a00866fe40
Object: fffffa80048e66c0  Type: (fffffa8003cf0b40) File
    ObjectHeader: fffffa80048e6690 (new version)
        HandleCount: 1  PointerCount: 3
        Directory Object: 00000000  Name: […]\AppData\Roaming\Vendor\Product\Recent\index.dat {Mup}

We also see the c0000016 error code on the raw stack and examine it too:

0: kd> !error c0000016
Error code: (NTSTATUS) 0xc0000016 (3221225494) - {Still Busy}  The specified I/O request packet (IRP) cannot be disposed of because the I/O operation is not complete.
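
NTSTATUS values like this one have a fixed bit layout, so the severity and facility fields can also be decoded by hand; a small sketch:

```python
def decode_ntstatus(status):
    """Split an NTSTATUS value into its standard bit fields."""
    return {
        "severity": status >> 30,         # 3 means error (the 'c' in c0000016)
        "customer": (status >> 29) & 1,   # 0 for Microsoft-defined codes
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }

info = decode_ntstatus(0xC0000016)
print(info)  # severity 3 (error), facility 0, code 0x16
```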

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 144)

Saturday, June 25th, 2011

This pattern description is as short as the previous one. What would you call a memory dump where you don’t see anything abnormal or even suspicious? We call it a Quiet Dump. For example, in such a dump the stack trace collection would not deviate from reference stack traces, and we would not see any spiking threads.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 143)

Saturday, June 25th, 2011

This pattern description is short. Pleiades is a cluster of modules in lm WinDbg command output that serve a similar function, such as print drivers in a print spooler or Citrix printing services. Usually we know that any one of them could be at fault. Another example is a group of process modules in a complete memory dump serving the same function in separate terminal services sessions.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 142)

Friday, June 24th, 2011

For analysis of memory dumps from coupled processes or, in general, memory fibers from fiber bundle memory spaces, we need to know their creation times (called the debug session time). In some cases we need to know their time sequence: which process memory dump was saved first and how much time had passed before the second process memory dump was saved. Besides the initial output shown when we open a dump, the .time and version WinDbg commands can be used to check this information at any time during analysis.

In one example involving printing we see a blocking thread trying to contact a print spooler service using LPC. Its thread age is no more than 3 seconds. We also have the print spooler service process memory dump, supposedly taken at the same time. However, when we check, we see it was saved 2 minutes before. Moreover, the PrintIsolationHost.exe process memory dump was saved even earlier. So the whole sequence was reversed, because the printing application calls the spooler, which in turn calls the appropriate driver, not the other way around. We call this pattern Unsynchronized Dumps.
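
Checking such a time sequence by hand is easy to get wrong, so it can help to sort the dumps by their debug session times first. A hypothetical sketch (the file names and timestamps below are illustrative, not from the case):

```python
from datetime import datetime

# (dump file, debug session time reported by the .time command)
dumps = [
    ("Application.dmp",        "Tue Jun 14 15:15:31 2011"),
    ("spoolsv.dmp",            "Tue Jun 14 15:13:31 2011"),
    ("PrintIsolationHost.dmp", "Tue Jun 14 15:12:02 2011"),
]

fmt = "%a %b %d %H:%M:%S %Y"
by_time = sorted(dumps, key=lambda d: datetime.strptime(d[1], fmt))
for name, saved in by_time:
    print(saved, name)
```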

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 141)

Thursday, June 23rd, 2011

Sometimes it is useful to know how long ago a thread was created in order to understand when other behavioral patterns possibly started to appear. For example, in user process memory dumps with saved thread time information we can see this using the !runaway WinDbg extension command or the .ttime command. Looking at a stack trace collection, we notice a thread blocked in an LPC call:

0:000> ~40 kc
Call Site
ntdll!NtAlpcSendWaitReceivePort
rpcrt4!LRPC_CCALL::SendReceive
rpcrt4!NdrpClientCall3
rpcrt4!NdrClientCall3
[...]
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

We are interested in when all this started because we want to compare it with other coupled process memory dumps saved at different times:

0:000> !runaway f
 User Mode Time
[...]
 Kernel Mode Time
  Thread       Time
[...]
 Elapsed Time
  Thread       Time
   0:8ac       4 days 11:42:14.484
   1:8b4       4 days 11:42:14.296
[...]
  35:868       4 days 10:18:48.255
  36:73ec      0 days 15:55:31.938
  37:c0bc      0 days 10:36:53.447
  38:782c      0 days 0:02:01.683
  39:5864      0 days 0:00:52.236
  40:5ffc      0 days 0:00:02.565

0:000> ~40s
ntdll!NtAlpcSendWaitReceivePort+0xa:
00000000`76d3ff0a c3              ret

0:040> .ttime
Created: Tue Jun  14 15:15:28.444 2011
Kernel:  0 days 0:00:00.000
User:    0 days 0:00:00.000

0:040> .time
Debug session time: Tue Jun  14 15:15:31.000 2011
System Uptime: 4 days 11:43:21.389
Process Uptime: 4 days 11:42:15.000
  Kernel time: 0 days 0:00:10.000
  User time: 0 days 0:04:22.000

We call this pattern Thread Age. We see that our blocked thread started only recently compared to other threads, and, when we look at their debug session times, that it actually started after the other dumps were saved.
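
Thread age here is simply the difference between the .time debug session time and the .ttime creation time; a minimal check using the values above:

```python
from datetime import datetime

fmt = "%a %b %d %H:%M:%S.%f %Y"
created = datetime.strptime("Tue Jun 14 15:15:28.444 2011", fmt)  # .ttime
session = datetime.strptime("Tue Jun 14 15:15:31.000 2011", fmt)  # .time

age = session - created
print(age.total_seconds())  # about 2.6 seconds, consistent with the
                            # 0:00:02.565 elapsed time reported by !runaway
```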

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Icons for Memory Dump Analysis Patterns (Part 96)

Wednesday, June 22nd, 2011

Today we introduce an icon for Handle Limit (GDI) pattern:

B/W

Color

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 140)

Monday, June 20th, 2011

In this part we introduce the Coupled Modules pattern. Often we identify a pattern that points to a particular module, such as a driver or a DLL, whose functional services other modules could use; therefore, the latter modules can be implicated in abnormal software behavior. For example, detected insufficient kernel paged pool memory pointed to a driver that owns the pool tag DRV:

1: kd> !poolused 4
   Sorting by  Paged Pool Consumed

 Tag    Allocs    Frees     Diff     Used  
 DRV   1466496  1422361    44135  188917256  UNKNOWN pooltag 'DRV ', please update pooltag.txt
 File  6334830  6284036    50794    6735720  File objects
 Thre    53721    45152     8569    4346432  Thread objects , Binary: nt!ps
[…]

This module is known to be a directing module for other drivers (from a stack trace perspective), but we also know that other (directed) modules use its services that require memory allocation.
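
As a sanity check, the Diff column of !poolused is just outstanding allocations, and dividing Used by it gives an average outstanding allocation size for the tag:

```python
# Values from the DRV line of the !poolused output above
allocs, frees, used = 1466496, 1422361, 188917256

diff = allocs - frees
print(diff)          # 44135, matching the Diff column
print(used // diff)  # about 4280 bytes per outstanding allocation
```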

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Stack Trace Patterns

Saturday, June 18th, 2011

A page to reference all the different kinds of stack traces is necessary, so I created this post.

I’ll update it as soon as I add more similar patterns.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Stack trace collection, message box, self-diagnosis, version-specific extension, managed stack trace and managed code exception: pattern cooperation

Saturday, June 18th, 2011

A service written in one of the .NET languages was described as crashing, and a process dump was collected for analysis. The default analysis command couldn’t find an exception, so we had to look at the stack trace collection to find any anomalies. Indeed, there was a thread with message box code:

0:010> kL
Child-SP          RetAddr           Call Site
00000000`1f69e808 00000000`774b4bc4 user32!ZwUserWaitMessage+0xa
00000000`1f69e810 00000000`774b4edd user32!DialogBox2+0x274
00000000`1f69e8a0 00000000`77502920 user32!InternalDialogBox+0x135
00000000`1f69e900 00000000`77501c15 user32!SoftModalMessageBox+0x9b4
00000000`1f69ea30 00000000`7750146b user32!MessageBoxWorker+0x31d
00000000`1f69ebf0 00000000`77501362 user32!MessageBoxTimeoutW+0xb3
00000000`1f69ecc0 000007fe`f1590ce7 user32!MessageBoxW+0x4e
00000000`1f69ed00 000007fe`eb0f5c59 mscorwks!DoNDirectCall__PatchGetThreadCall+0x7b
[…]

MessageBoxW parameters were showing the Self-Diagnosis pattern with a stack trace:

0:010> du 00000000`085f90c8
00000000`085f90c8  "......    at ClassA.foo()
[…]

0:010> du 00000000`085f9c40
00000000`085f9c40  "Assertion Failed: Abort=Quit, Re"
00000000`085f9c80  "try=Debug, Ignore=Continue"

We tried unsuccessfully to load CLR extensions and requested a copy of the .NET Framework from the affected computer. After that we were able to load the version-specific extension and see the managed stack trace and managed code exception:

0:010> !DumpStack
OS Thread Id: 0x8dc (15)
Child-SP         RetAddr          Call Site
000000001f69e808 00000000774b4bc4 user32!ZwUserWaitMessage+0xa
000000001f69e810 00000000774b4edd user32!DialogBox2+0x274
000000001f69e8a0 0000000077502920 user32!InternalDialogBox+0x135
000000001f69e900 0000000077501c15 user32!SoftModalMessageBox+0x9b4
000000001f69ea30 000000007750146b user32!MessageBoxWorker+0x31d
000000001f69ebf0 0000000077501362 user32!MessageBoxTimeoutW+0xb3
000000001f69ecc0 000007fef1590ce7 user32!MessageBoxW+0x4e
000000001f69ed00 000007feeb0f5c59 mscorwks!DoNDirectCall__PatchGetThreadCall+0x7b
[...]
000000001f69e030 000007ff00a9ba1c ModuleA!ClassA.foo()+0x47
[…]
000000001f69fe30 000000007781c521 kernel32!BaseThreadInitThunk+0xd
000000001f69fe60 0000000000000000 ntdll!RtlUserThreadStart+0x1d

0:010> ~0s

0:000> !pe
Exception object: 0000000005a976b8
Exception type: System.FormatException
Message: Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
InnerException: <none>
StackTrace (generated):
    SP               IP               Function
    0000000000D0BE40 000007FEEC2153B0 mscorlib_ni!System.Text.StringBuilder.AppendFormat(System.IFormatProvider, System.String, System.Object[])+0x999280
    0000000000D0BEE0 000007FEEB87C0FA mscorlib_ni!System.String.Format(System.IFormatProvider, System.String, System.Object[])+0x5a
    0000000000D0BF30 000007FF00AB336B ModuleA!ClassB.get()+0xeb

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 139)

Friday, June 17th, 2011

We have CLR Thread and Managed Code Exception patterns. But something is missing from this picture, especially when we try to communicate an analysis. While writing a case study today, I found the need to name another pattern: Managed Stack Trace. Typical examples are stack traces from the !CLRStack and !pe extension commands, or subtraces from the !DumpStack and !EEStack extension commands:

0:000> !pe
Exception object: 0000000005a976b8
Exception type: System.FormatException
Message: Index (zero based) must be greater than or equal to zero and less than the size of the argument list.
InnerException: <none>
StackTrace (generated):
    SP               IP               Function
    0000000000D0BE40 000007FEEC2153B0 mscorlib_ni!System.Text.StringBuilder.AppendFormat(System.IFormatProvider, System.String, System.Object[])+0x999280
    0000000000D0BEE0 000007FEEB87C0FA mscorlib_ni!System.String.Format(System.IFormatProvider, System.String, System.Object[])+0x5a
    0000000000D0BF30 000007FF00AB336B ModuleA!ClassB.get()+0xeb

0:010> !DumpStack
OS Thread Id: 0x8dc (15)
Child-SP         RetAddr          Call Site
000000001f69e808 00000000774b4bc4 user32!ZwUserWaitMessage+0xa
000000001f69e810 00000000774b4edd user32!DialogBox2+0x274
000000001f69e8a0 0000000077502920 user32!InternalDialogBox+0x135
000000001f69e900 0000000077501c15 user32!SoftModalMessageBox+0x9b4
000000001f69ea30 000000007750146b user32!MessageBoxWorker+0x31d
000000001f69ebf0 0000000077501362 user32!MessageBoxTimeoutW+0xb3
000000001f69ecc0 000007fef1590ce7 user32!MessageBoxW+0x4e
000000001f69ed00 000007feeb0f5c59 mscorwks!DoNDirectCall__PatchGetThreadCall+0x7b
[...]
000000001f69e030 000007ff00a9ba1c ModuleA!ClassA.foo()+0x47
[…]
000000001f69fe30 000000007781c521 kernel32!BaseThreadInitThunk+0xd
000000001f69fe60 0000000000000000 ntdll!RtlUserThreadStart+0x1d

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 138)

Sunday, June 12th, 2011

Sometimes we don’t have symbols (No Component Symbols pattern) or have only a restricted set, which we call the No Data Types pattern. For example, in a base OS we have data types:

0:016> dt ntdll!*
          ntdll!LIST_ENTRY64
          ntdll!LIST_ENTRY32
          ntdll!_KUSER_SHARED_DATA
          ntdll!_KSYSTEM_TIME
          ntdll!_KSYSTEM_TIME
          ntdll!_NT_PRODUCT_TYPE
[...]

In the “private” version we don’t have them although the symbol file exists:

0:015> dt ntdll!*
0:015> !lmi ntdll
Loaded Module Info: [ntdll]
         Module: ntdll
   Base Address: 0000000076de0000
     Image Name: ntdll.dll
   Machine Type: 34404 (X64)
     Time Stamp: 4dcd9861 Fri May 13 21:45:21 2011
           Size: 17f000
       CheckSum: 188814
Characteristics: 2022  perf
Debug Data Dirs: Type  Size     VA  Pointer
             CODEVIEW    22, f72a8,   f66a8 RSDS - GUID: {05A648A7-625D-42E7-B736-7816F0CA1E0C}
               Age: 2, Pdb: ntdll.pdb
                CLSID     8, f72a0,   f66a0 [Data not mapped]
     Image Type: MEMORY   - Image read successfully from loaded memory.
    Symbol Type: PDB      - Symbols loaded successfully from symbol server.
                 c:\mss\ntdll.pdb\05A648A7625D42E7B7367816F0CA1E0C2\ntdll.pdb
    Load Report: public symbols , not source indexed
                 c:\mss\ntdll.pdb\05A648A7625D42E7B7367816F0CA1E0C2\ntdll.pdb

In such cases manually loading a proximate module might help: Coping with missing symbolic information (although I haven’t yet tested this on x64 systems). I also thought of naming the pattern Private Modification, but that would not cover many other cases where types were missing from the very beginning.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 137)

Sunday, June 12th, 2011

After getting my hands on crash dumps from the Windows Azure cloud platform, I discerned a few patterns; the first one is simply called Cloud Environment. The pattern covers both development (emulator, if it exists) and real (staging and deployment) environments. It is best diagnosed by looking at specific infrastructure modules:

0:016> lm m Wa*
start             end                 module name
00000000`00b00000 00000000`00b0c000   WaWorkerHost
00000000`74fb0000 00000000`74fbd000   WaRuntimeProxy

0:016> lm m *Azure*
start             end                 module name
00000000`57cd0000 00000000`57d26000   Microsoft_WindowsAzure_StorageClient
00000000`58820000 00000000`5886c000   Microsoft_WindowsAzure_Diagnostics
00000000`5c750000 00000000`5c764000   Microsoft_WindowsAzure_ServiceRuntime

The development platform can, for now, be distinguished by looking at the ntdll version:

0:016> lmv m ntdll
start             end                 module name
00000000`76de0000 00000000`76f5f000   ntdll
    Loaded symbol image file: ntdll.dll
    Image path: D:\Windows\System32\ntdll.dll
    Image name: ntdll.dll
    Timestamp:        Fri May 13 21:45:21 2011 (4DCD9861)
    CheckSum:         00188814
    ImageSize:        0017F000
    File version:     6.0.6002.18446
    Product version:  6.0.6002.18446
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     ntdll.dll
    OriginalFilename: ntdll.dll
    ProductVersion:   6.0.6002.18446
    FileVersion:      6.0.6002.18446 (rd_os_v1.110513-1321)
    FileDescription:  NT Layer DLL
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

0:016> lmv m ntdll
start             end                 module name
00000000`775a0000 00000000`7774b000   ntdll
    Loaded symbol image file: ntdll.dll
    Image path: C:\Windows\System32\ntdll.dll
    Image name: ntdll.dll
    Timestamp:        Tue Jul 14 02:32:27 2009 (4A5BE02B)
    CheckSum:         001B1CB5
    ImageSize:        001AB000
    File version:     6.1.7600.16385
    Product version:  6.1.7600.16385
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     ntdll.dll
    OriginalFilename: ntdll.dll
    ProductVersion:   6.1.7600.16385
    FileVersion:      6.1.7600.16385 (win7_rtm.090713-1255)
    FileDescription:  NT Layer DLL
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

We see that the real Windows Azure is currently RD OS V1 (at least in the datacenter chosen for West European affinity during deployment).
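
The distinction above can be reduced to a check on the ntdll FileVersion string; a hypothetical helper (the classify_environment name and return strings are mine; the branch markers come from the two lmv outputs above):

```python
def classify_environment(file_version):
    """Guess the Windows Azure environment from an ntdll FileVersion string."""
    if "rd_os_v1" in file_version:
        return "real Windows Azure (RD OS V1)"
    if "win7_rtm" in file_version:
        return "development platform (Windows 7 RTM)"
    return "unknown"

print(classify_environment("6.0.6002.18446 (rd_os_v1.110513-1321)"))
print(classify_environment("6.1.7600.16385 (win7_rtm.090713-1255)"))
```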

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Icons for Memory Dump Analysis Patterns (Part 95)

Thursday, June 9th, 2011

Today we introduce an icon for Custom Exception Handler (kernel space) pattern:

B/W

Color

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 136)

Wednesday, June 1st, 2011

Version-Specific Extension is a pattern similar to the Platform-Specific Debugger pattern in that it suggests the course of further debugging actions. Similar instructions are given when a debugger depends on specialized modules that differ by platform (or application) version. We consider here a .NET example where opening a dump shows only that it was perhaps saved manually, with possible hidden exceptions that need to be dug out:

0:000> !analyze -v

FAULTING_IP:
+0
00000000`00000000 ??              ???

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 0000000000000000
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 0

We notice a failed attempt at .NET analysis and the following instructions on how to correct it:

MANAGED_STACK: !dumpstack -EE
Failed to load data access DLL, 0x80004005

Verify that 1) you have a recent build of the debugger (6.2.14 or newer)
            2) the file mscordacwks.dll that matches your version of mscorwks.dll is in the version directory
            3) or, if you are debugging a dump file, verify that the file mscordacwks_<arch>_<arch>_<version>.dll is on your symbol path.
            4) you are debugging on the same architecture as the dump file. For example, an IA64 dump file must be debugged on an IA64 machine.

You can also run the debugger command .cordll to control the debugger's load of mscordacwks.dll.  .cordll -ve -u -l will do a verbose reload. If that succeeds, the SOS command should work on retry.

If you are debugging a minidump, you need to make sure that your executable path is pointing to mscorwks.dll as well.

Because we know that we have the .NET Framework installed on the postmortem debugging machine, we check the target module version:

0:000> lmv m mscorwks
start             end                 module name
000007fe`ee380000 000007fe`eed1d000   mscorwks   (pdb symbols)       
    Loaded symbol image file: mscorwks.dll
    Image path: C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscorwks.dll
    Image name: mscorwks.dll
    Timestamp:        Sun Feb 06 20:53:54 2011 (4D4F0A62)
    CheckSum:         00990593
    ImageSize:        0099D000
    File version:     2.0.50727.5444
    Product version:  2.0.50727.5444
    File flags:       0 (Mask 3F)
    File OS:          4 Unknown Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® .NET Framework
    InternalName:     mscorwks.dll
    OriginalFilename: mscorwks.dll
    ProductVersion:   2.0.50727.5444
    FileVersion:      2.0.50727.5444 (Win7SP1GDR.050727-5400)
    FileDescription:  Microsoft .NET Runtime Common Language Runtime - WorkStation
    LegalCopyright:   © Microsoft Corporation.  All rights reserved.
    Comments:         Flavor=Retail

It is slightly newer (.5444) than the version we have installed (.3619). The customer also sent their framework version together with the memory dump file. So we unload the current SOS extension (for details please see the Managed Code Exception pattern):

0:000> .chain
Extension DLL chain:
    C:\Windows\Microsoft.NET\Framework64\v2.0.50727\sos: image 2.0.50727.3619, API 1.0.0, built Mon Oct 25 06:52:09 2010
        [path: C:\Windows\Microsoft.NET\Framework64\v2.0.50727\sos.dll]
    dbghelp: image 6.11.0001.404, API 6.1.6, built Thu Feb 26 02:10:27 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\dbghelp.dll]
    ext: image 6.11.0001.404, API 1.0.0, built Thu Feb 26 02:10:26 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\winext\ext.dll]
    exts: image 6.11.0001.404, API 1.0.0, built Thu Feb 26 02:10:17 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\WINXP\exts.dll]
    uext: image 6.11.0001.404, API 1.0.0, built Thu Feb 26 02:10:20 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\winext\uext.dll]
    ntsdexts: image 6.1.7015.0, API 1.0.0, built Thu Feb 26 02:09:22 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\WINXP\ntsdexts.dll]

0:000> .unload C:\Windows\Microsoft.NET\Framework64\v2.0.50727\sos
Unloading C:\Windows\Microsoft.NET\Framework64\v2.0.50727\sos extension DLL

and load the customer version:

0:000> .load \MyData\sos.dll

0:000> .chain
Extension DLL chain:
    \MyData\sos: image 2.0.50727.5444, API 1.0.0, built Sun Feb 06 21:14:12 2011
        [path: \MyData\sos.dll]
    dbghelp: image 6.11.0001.404, API 6.1.6, built Thu Feb 26 02:10:27 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\dbghelp.dll]
    ext: image 6.11.0001.404, API 1.0.0, built Thu Feb 26 02:10:26 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\winext\ext.dll]
    exts: image 6.11.0001.404, API 1.0.0, built Thu Feb 26 02:10:17 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\WINXP\exts.dll]
    uext: image 6.11.0001.404, API 1.0.0, built Thu Feb 26 02:10:20 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\winext\uext.dll]
    ntsdexts: image 6.1.7015.0, API 1.0.0, built Thu Feb 26 02:09:22 2009
        [path: C:\Program Files\Debugging Tools for Windows (x64)\WINXP\ntsdexts.dll]

0:000> .cordll -ve -u -l
CLR DLL status: No load attempts

Then we do a load attempt: 

0:000> !CLRStack
CLRDLL: C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscordacwks.dll:2.0.50727.3619 f:0
doesn't match desired version 2.0.50727.5444 f:0
CLRDLL: Unable to find mscordacwks_AMD64_AMD64_2.0.50727.5444.dll by mscorwks search
CLRDLL: Unable to find 'mscordacwks_AMD64_AMD64_2.0.50727.5444.dll' on the path
CLRDLL: Unable to get version info for 'c:\mss\mscorwks.dll\4D4F0A6299d000\mscordacwks_AMD64_AMD64_2.0.50727.5444.dll', Win32 error 0n87
CLRDLL: ERROR: Unable to load DLL mscordacwks_AMD64_AMD64_2.0.50727.5444.dll, Win32 error 0n87
Failed to load data access DLL, 0x80004005

Verify that 1) you have a recent build of the debugger (6.2.14 or newer)
            2) the file mscordacwks.dll that matches your version of mscorwks.dll is in the version directory
            3) or, if you are debugging a dump file, verify that the file mscordacwks_<arch>_<arch>_<version>.dll is on your symbol path.
            4) you are debugging on the same architecture as the dump file. For example, an IA64 dump file must be debugged on an IA64 machine.

You can also run the debugger command .cordll to control the debugger's load of mscordacwks.dll.  .cordll -ve -u -l will do a verbose reload. If that succeeds, the SOS command should work on retry.

If you are debugging a minidump, you need to make sure that your executable path is pointing to mscorwks.dll as well.

We rename mscordacwks.dll to mscordacwks_AMD64_AMD64_2.0.50727.5444.dll and retry: 

0:000> .cordll -ve -u -l
CLR DLL status: No load attempts

0:000> !CLRStack
CLRDLL: C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscordacwks.dll:2.0.50727.3619 f:0
doesn't match desired version 2.0.50727.5444 f:0
CLRDLL: Loaded DLL \MyData\mscordacwks_AMD64_AMD64_2.0.50727.5444.dll
OS Thread Id: 0x16e8 (0)
Child-SP         RetAddr          Call Site
00000000002fe570 000007feeaf8e378 System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32, Int32, Int32)
00000000002fe7c0 000007feeaf8dde5 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
00000000002fe910 000007ff002364b6 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
00000000002fe970 000007feee6414c2 MyApplication.Main(System.String[])

0:000> !pe
Exception object: 00000000034a13f8
Exception type: System.IO.FileNotFoundException
Message: Could not load file or assembly 'System.Windows.Forms.XmlSerializers, Version=2.0.0.0, Culture=neutral, PublicKeyToken= ...' or one of its dependencies. The system cannot find the file specified.
InnerException: System.IO.FileNotFoundException, use !PrintException 00000000034a1b28 to see more
StackTrace (generated):
    SP               IP               Function
    00000000002FD0A0 0000000000000001 mscorlib_ni!System.Reflection.Assembly._nLoad(System.Reflection.AssemblyName, System.String, System.Security.Policy.Evidence, System.Reflection.Assembly, System.Threading.StackCrawlMark ByRef, Boolean, Boolean)+0x2
    00000000002FD0A0 000007FEED7ABF61 mscorlib_ni!System.Reflection.Assembly.InternalLoad(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Threading.StackCrawlMark ByRef, Boolean)+0x1a1
    00000000002FD130 000007FEED7E4804 mscorlib_ni!System.Reflection.Assembly.Load(System.Reflection.AssemblyName)+0x24
    00000000002FD170 000007FEE7855C0A System_Xml_ni!System.Xml.Serialization.TempAssembly.LoadGeneratedAssembly(System.Type, System.String, System.Xml.Serialization.XmlSerializerImplementation ByRef)+0x11a

StackTraceString: <none>
HResult: 80070002

0:000> !PrintException 00000000034a1b28
Exception object: 00000000034a1b28
Exception type: System.IO.FileNotFoundException
Message: Could not load file or assembly 'System.Windows.Forms.XmlSerializers, Version=2.0.0.0, Culture=neutral, PublicKeyToken=...' or one of its dependencies. The system cannot find the file specified.
InnerException: <none>
StackTrace (generated):
<none>
StackTraceString: <none>
HResult: 80070002
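
The renamed file in the walkthrough follows the fixed mscordacwks_<arch>_<arch>_<version>.dll convention cited in the SOS error message; a trivial helper sketch (the function name is mine):

```python
def mscordacwks_name(arch, version):
    """Build the mscordacwks file name that SOS searches for on the symbol path."""
    return f"mscordacwks_{arch}_{arch}_{version}.dll"

print(mscordacwks_name("AMD64", "2.0.50727.5444"))
# mscordacwks_AMD64_AMD64_2.0.50727.5444.dll
```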

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

From Bugchecks to Patterns

Tuesday, May 24th, 2011

One of the questions asked during Introduction to Pattern-Driven Software Problem Solving Webinar was how to map bugcheck codes to crash dump analysis patterns. I’m starting this post to provide a few initial mappings and plan to extend it later.

BAD_POOL_CALLER (c2)

BAD_POOL_HEADER (19)

Dynamic Memory Corruption (kernel pool)

Self-Diagnosis (kernel mode)

UNEXPECTED_KERNEL_MODE_TRAP (7f)
Arg1: 00000000, EXCEPTION_DIVIDED_BY_ZERO

Divide by Zero (kernel mode)

NO_MORE_IRP_STACK_LOCATIONS (35)

Stack Overflow (software implementation)

MANUALLY_INITIATED_CRASH (e2)

NMI_HARDWARE_FAILURE (80)

CRITICAL_OBJECT_TERMINATION (f4)

KMODE_EXCEPTION_NOT_HANDLED (1e)
Arg2: 000000000000f001, The address that the exception occurred at

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
Arg4: 0000f001, address which referenced memory

Manual Dump (kernel)

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Accelerated Windows Memory Dump Analysis Training Course

Sunday, May 8th, 2011

Learn how to analyze application, service and system crashes and freezes, navigate through memory dump space and diagnose heap corruption, memory leaks, CPU spikes, blocked threads, deadlocks, wait chains, and much more. We use a unique and innovative pattern-driven analysis approach to speed up the learning curve.

Accelerated Windows Memory Dump Analysis Logo

Memory Dump Analysis Services organizes a training course.

If you are registered you may optionally submit your memory dumps before the training. In addition to the carefully constructed problems, this allows us to tailor extra examples to the needs of the attendees. All registrants will receive an optional pre-test before the first session and an optional test after each session.

The training consists of 2 four-hour sessions (4 hours on the first day and 4 hours on the next day). When you finish the training you additionally get:

  1. A full transcript in PDF format (retail price $200)
  2. 5 volumes of Memory Dump Analysis Anthology in PDF format (retail price $100)
  3. An attendance certificate (PDF format)

Prerequisites: Basic Windows troubleshooting

Session 1: August 1, 2011 2:00 PM - 6:00 PM BST
Session 2: August 2, 2011 2:00 PM - 6:00 PM BST

Price: 210 USD

Space is limited.
Reserve your remote training seat now.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Metadefect Template Library (Part 0)

Sunday, May 1st, 2011

To model software behavior at application and system levels, test generative debugging scenarios, and construct software defects using metaprogramming, I started working on MdTL (Metadefect Template Library). It consists of C++ templates for structural and behavioral patterns. The simplest examples include Threads&lt;Spike&gt; and Spike&lt;Thread&gt;. The template classes can be more complex, of course, utilizing the full power of C++, STL, and existing libraries like Boost. The unique and novel feature of this library is the inclusion of dual classes of behaviour such as Leak&lt;&gt;, Deadlock&lt;&gt;, Overflow&lt;&gt;, Residue&lt;&gt;, Exception&lt;&gt;, Contention&lt;&gt;, etc., parameterized by various structural memory classes like Process&lt;&gt;, Thread&lt;&gt;, Heap&lt;&gt;, Stack&lt;&gt;, Region&lt;&gt;, Buffer&lt;&gt;, etc. MdTL also includes classes to model software tracing, and this will be used for a simultaneous software trace and memory dump analysis case study in the forthcoming presentation. I also plan to extend this metaprogramming approach in the future to model malware analysis patterns by introducing Metamalware Template Library (MmTL).

Forthcoming parts will introduce template classes and examples.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

The New School of Debugging: What’s New

Thursday, April 28th, 2011

The New School of Debugging places pattern-driven memory and software trace analysis as a solid foundation for any debugging methodology, processes, tools and techniques. Unified Debugging Patterns and Generative Debugging Framework are additional features of the new school. To highlight its essentials I’ve created this draft diagram:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 69b)

Tuesday, April 26th, 2011

This pattern is a kernel mode counterpart to Self-Diagnosis in user mode. It is just a collection of bugcheck codes where a problem is usually detected before corruption causes a fault, exception or trap. A typical example would be the detection of a failed assertion or of corrupt structures such as:

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of the problem, and then special pool applied to the suspect tags or the driver verifier to a suspect driver.
Arguments:
Arg1: 00000020, a pool block header size is corrupt.
Arg2: 8b79d078, The pool entry we were looking for within the page.
Arg3: 8b79d158, The next pool entry.
Arg4: 8a1c0004, (reserved)

More examples will be added in the forthcoming case studies.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 127b)

Monday, April 25th, 2011

Here we continue the Technology-Specific Subtrace pattern series started earlier with a COM interface invocation example. In this part we consider a dynamic memory allocation example in kernel space (kernel pool). Usually pool corruption is detected during pool memory allocation or release with a special bugcheck code, for example:

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of the problem, and then special pool applied to the suspect tags or the driver verifier to a suspect driver.
Arguments:
Arg1: 00000020, a pool block header size is corrupt.
Arg2: 8b79d078, The pool entry we were looking for within the page.
Arg3: 8b79d158, The next pool entry.
Arg4: 8a1c0004, (reserved)

However, pool corruption might be deep enough to trigger an access violation even before self-diagnosis. In such cases, stack subtraces with functions like ExFreePoolWithTag might point to troubleshooting and debugging directions:

ATTEMPTED_WRITE_TO_READONLY_MEMORY (be)
An attempt was made to write to readonly memory. The guilty driver is on the stack trace (and is typically the current instruction pointer).
When possible, the guilty driver’s name (Unicode string) is printed on the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: 00470044, Virtual address for the attempted write.
Arg2: 06d39025, PTE contents.
Arg3: aec0fb30, (reserved)
Arg4: 0000000a, (reserved)

TRAP_FRAME:  aec0fb30 -- (.trap 0xffffffffaec0fb30)
ErrCode = 00000003
eax=8ac12d38 ebx=8b700040 ecx=000001ff edx=00470040 esi=8ac12db8 edi=808b0b40
eip=808949e7 esp=aec0fba4 ebp=aec0fbf0 iopl=0 nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000  efl=00010202
nt!ExFreePoolWithTag+0x6a3:
808949e7 895a04          mov     dword ptr [edx+4],ebx ds:0023:00470044=????????

STACK_TEXT:
aec0faa0 80860121 000000be 00470044 06d39025 nt!KeBugCheckEx+0x1b
aec0fb18 8088e490 00000001 00470044 00000000 nt!MmAccessFault+0xb25
aec0fb18 808949e7 00000001 00470044 00000000 nt!KiTrap0E+0xdc
aec0fbf0 808d93b5 8ac12dc0 00000000 00000000 nt!ExFreePoolWithTag+0x6a3
aec0fc08 808cd304 e5ae5770 8ac12dc0 8aa77db0 nt!CmpFreePostBlock+0x4d
aec0fc3c 8082ea53 8ac12dc0 aec0fc88 aec0fc7c nt!CmpPostApc+0xde
aec0fc8c 80833eec 00000000 00000000 00000000 nt!KiDeliverApc+0xf9
aec0fcc4 808290bd aec0fd64 8099781c 0160fd44 nt!KiSwapThread+0x300
aec0fd0c 809978a0 00000001 00000000 f77275e0 nt!KeDelayExecutionThread+0x2ab
aec0fd54 8088b45c 00000000 0160fd74 0160fd9c nt!NtDelayExecution+0x84
aec0fd54 7c82847c 00000000 0160fd74 0160fd9c nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0160fd9c 00000000 00000000 00000000 00000000 0x7c82847c

1: kd> !pool 8ac12dc0
Pool page 8ac12dc0 region is Nonpaged pool
 8ac12000 size:  858 previous size:    0  (Allocated)  TWPG
 8ac12858 size:    8 previous size:  858  (Free)       ....
 8ac12860 size:   20 previous size:    8  (Allocated)  VadS
 8ac12880 size:    8 previous size:   20  (Free)       NtFs
 8ac12888 size:   20 previous size:    8  (Allocated)  VadS
 8ac128a8 size:   28 previous size:   20  (Allocated)  Ntfn
 8ac128d0 size:   30 previous size:   28  (Allocated)  Vad
 8ac12900 size:   40 previous size:   30  (Allocated)  Muta (Protected)
 8ac12940 size:   38 previous size:   40  (Allocated)  Sema (Protected)
 8ac12978 size:   40 previous size:   38  (Allocated)  Muta (Protected)
 8ac129b8 size:  270 previous size:   40  (Allocated)  Thre (Protected)
 8ac12c28 size:   40 previous size:  270  (Allocated)  Ntfr
 8ac12c68 size:   d0 previous size:   40  (Allocated)  DRIV
8ac12d38 is not a valid large pool allocation, checking large session pool...
8ac12d38 is freed (or corrupt) pool
Bad previous allocation size @8ac12d38, last size was 1a

***
*** An error (or corruption) in the pool was detected;
*** Attempting to diagnose the problem.
***
*** Use !poolval 8ac12000 for more details.
***

Pool page [ 8ac12000 ] is __inVALID.

Analyzing linked list...
[ 8ac12c68 --> 8ac12db8 (size = 0x150 bytes)]: Corrupt region
Scanning for single bit errors...

None found

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -