Archive for the ‘Crash Dump Analysis’ Category

Book: Advanced Windows Debugging

Friday, August 17th, 2007

Waiting for this book to be released:

Advanced Windows Debugging by Mario Hewardt and Daniel Pravat

Buy from Amazon

Already ordered it and will post my review as soon as it arrives.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 22)

Thursday, August 16th, 2007

Sometimes we suspect that a problem was caused by some module but WinDbg lmv command doesn’t show the company name and other verbose information for it and Google search has no results for the file name. I call this pattern Unknown Component.

In such cases additional information can be obtained by dumping the module resource section or the whole module address range and looking for ASCII and UNICODE strings. For example (byte values in db output are omitted for clarity):

2: kd> lmv m driver
start    end        module name
f5022000 f503e400   driver   (deferred)
    Image path: \SystemRoot\System32\drivers\driver.sys
    Image name: driver.sys
    Timestamp:        Tue Jun 12 11:33:16 2007 (466E766C)
    CheckSum:         00021A2C
    ImageSize:        0001C400
    Translations:     0000.04b0 0000.04e0 0409.04b0 0409.04e0

2: kd> db f5022000 f503e400
f5022000  MZ..............
f5022010  ........@.......
f5022020  ................
f5022030  ................
f5022040  ........!..L.!Th
f5022050  is program canno
f5022060  t be run in DOS
f5022070  mode....$.......
f5022080  .g,._.B._.B._.B.
f5022090  _.C.=.B..%Q.X.B.
f50220a0  _.B.].B.Y%H.|.B.
f50220b0  ..D.^.B.Rich_.B.
f50220c0  ........PE..L...
f50220d0  lvnF............
...
...
...
f503ce30  ................
f503ce40  ................
f503ce50  ................
f503ce60  ............0...
f503ce70  ................
f503ce80  ....H...........
f503ce90  ..........4...V.
f503cea0  S._.V.E.R.S.I.O.
f503ceb0  N._.I.N.F.O.....
f503cec0  ................
f503ced0  ........?.......
f503cee0  ................
f503cef0  ....P.....S.t.r.
f503cf00  i.n.g.F.i.l.e.I.
f503cf10  n.f.o...,.....0.
f503cf20  4.0.9.0.4.b.0...
f503cf30  4.....C.o.m.p.a.
f503cf40  n.y.N.a.m.e.....
f503cf50  M.y.C.o.m.p. .A.
f503cf60  G...p.$...F.i.l.
f503cf70  e.D.e.s.c.r.i.p.
f503cf80  t.i.o.n.....M.y.
f503cf90  .B.i.g. .P.r.o.
f503cfa0  d.u.c.t. .H.o.o.
f503cfb0  k...............
f503cfc0  ................
f503cfd0  ....4.....F.i.l.
f503cfe0  e.V.e.r.s.i.o.n.
f503cff0  ....5...1...0...
f503d000  ????????????????
f503d010  ????????????????
f503d020  ????????????????
f503d030  ????????????????
...
...
...

We see that CompanyName is MyComp AG, FileDescription is My Big Product Hook and FileVersion is 5.0.1.

In our example the same information can be retrieved by dumping the image file header and then finding and dumping the resource section:

2: kd> lmv m driver
start    end        module name
f5022000 f503e400   driver   (deferred)
    Image path: \SystemRoot\System32\drivers\driver.sys
    Image name: driver.sys
    Timestamp:        Tue Jun 12 11:33:16 2007 (466E766C)
    CheckSum:         00021A2C
    ImageSize:        0001C400
    Translations:     0000.04b0 0000.04e0 0409.04b0 0409.04e0

2: kd> !dh f5022000 -f

File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
     14C machine (i386)
       6 number of sections
466E766C time date stamp Tue Jun 12 11:33:16 2007

       0 file pointer to symbol table
       0 number of symbols
      E0 size of optional header
     10E characteristics
            Executable
            Line numbers stripped
            Symbols stripped
            32 bit word machine

OPTIONAL HEADER VALUES
     10B magic #
    6.00 linker version
   190A0 size of code
    30A0 size of initialized data
       0 size of uninitialized data
   1A340 address of entry point
     2C0 base of code
         ----- new -----
00010000 image base
      20 section alignment
      20 file alignment
       1 subsystem (Native)
    4.00 operating system version
    0.00 image version
    4.00 subsystem version
   1C400 size of image
     2C0 size of headers
   21A2C checksum
00100000 size of stack reserve
00001000 size of stack commit
00100000 size of heap reserve
00001000 size of heap commit
       0 [       0] address [size] of Export Directory
   1A580 [      50] address [size] of Import Directory
   1AE40 [     348] address [size] of Resource Directory
       0 [       0] address [size] of Exception Directory
       0 [       0] address [size] of Security Directory
   1B1A0 [    1084] address [size] of Base Relocation Directory
     420 [      1C] address [size] of Debug Directory
       0 [       0] address [size] of Description Directory
       0 [       0] address [size] of Special Directory
       0 [       0] address [size] of Thread Storage Directory
       0 [       0] address [size] of Load Configuration Directory
       0 [       0] address [size] of Bound Import Directory
     2C0 [     15C] address [size] of Import Address Table Directory
       0 [       0] address [size] of Delay Import Directory
       0 [       0] address [size] of COR20 Header Directory
       0 [       0] address [size] of Reserved Directory

2: kd> db f5022000+1AE40 f5022000+1AE40+348
f503ce40  ................
f503ce50  ................
f503ce60  ............0...
f503ce70  ................
f503ce80  ....H...........
f503ce90  ..........4...V.
f503cea0  S._.V.E.R.S.I.O.
f503ceb0  N._.I.N.F.O.....
f503cec0  ................
f503ced0  ........?.......
f503cee0  ................
f503cef0  ....P.....S.t.r.
f503cf00  i.n.g.F.i.l.e.I.
f503cf10  n.f.o...,.....0.
f503cf20  4.0.9.0.4.b.0...
f503cf30  4.....C.o.m.p.a.
f503cf40  n.y.N.a.m.e.....
f503cf50  M.y.C.o.m.p. .A.
f503cf60  G...p.$...F.i.l.
f503cf70  e.D.e.s.c.r.i.p.
f503cf80  t.i.o.n.....M.y.
f503cf90  .B.i.g. .P.r.o.
f503cfa0  d.u.c.t. .H.o.o.
f503cfb0  k...............
f503cfc0  ................
f503cfd0  ....4.....F.i.l.
f503cfe0  e.V.e.r.s.i.o.n.
f503cff0  ....5...1...0...
f503d000  ????????????????
f503d010  ????????????????
...
...
...

- Dmitry Vostokov @ DumpAnalysis.org -

Picturing Computer Memory

Wednesday, August 15th, 2007

An alternative to converting memory dumps to picture files is to save a memory range to a binary file and then convert it to a BMP file. Thus you can view the particular DLL or driver mapped into address space, heap or pool region, etc.

To save a memory range to a file use WinDbg .writemem command:

.writemem d2p-range.bin 00800000 0085e000

or

.writemem d2p-range.bin 00400000 L20000

I wrote a WinDbg script that saves a specified memory range and then calls a shell script which automatically converts saved binary file to a BMP file and then runs whatever picture viewer is registered for .bmp extension.

The WinDbg script code (mempicture.txt):

.writemem d2p-range.bin ${$arg1} ${$arg2}
.if (${/d:$arg3})
{
  .shell -i- mempicture.cmd d2p-range ${$arg3}
}
.else
{
  .shell -i- mempicture.cmd d2p-range
}

The shell script (mempicture.cmd):

dump2picture %1.bin %1.bmp %2
%1.bmp

Because WinDbg installation folder is assumed to be the default directory for both scripts and Dump2Picture.exe they should be copied to the same folder where windbg.exe is located. On my system it is

C:\Program Files\Debugging Tools for Windows

Both scripts are now included in Dump2Picture package available for free download:

Dump2Picture package

To call the script from WinDbg use the following command:

$$>a< mempicture.txt Range [bits-per-pixel]

where Range can be in Address1 Address2 or Address Lxxx format, bits-per-pixel can be 8, 16, 24 or 32. By default it is 32.

For example, I loaded a complete Windows x64 memory dump and visualized HAL (hardware abstraction layer) module:

kd> lm
start             end                 module name
fffff800`00800000 fffff800`0085e000   hal
fffff800`01000000 fffff800`0147b000   nt
fffff97f`ff000000 fffff97f`ff45d000   win32k
...
...
...

kd> $$>a< mempicture.txt fffff800`00800000 fffff800`0085e000
Writing 5e001 bytes...

C:\Program Files\Debugging Tools for Windows>dump2picture d2p-range.bin d2p-range.bmp

Dump2Picture version 1.1
Written by Dmitry Vostokov, 2007

d2p-range.bmp
d2p-range.bin
        1 file(s) copied.

C:\Program Files\Debugging Tools for Windows>d2p-range.bmp
<.shell waiting 10 second(s) for process>
.shell: Process exited
kd>

and Windows Picture and Fax Viewer application was launched and displayed the following picture:

Enjoy :-)

- Dmitry Vostokov @ DumpAnalysis.org -

Unicode Illuminated

Tuesday, August 14th, 2007

I generated a memory dump with plenty of Unicode and ASCII strings “Hello World!” to see how they look on a picture. I assume you know the difference between Unicode (UTF-16) and ASCII encodings: wide characters from the former occupy two bytes:

0:000> db 008c7420 l20
008c7420  48 00 65 00 6c 00 6c 00-6f 00 20 00 57 00 6f 00  H.e.l.l.o. .W.o.
008c7430  72 00 6c 00 64 00 21 00-00 00 00 00 00 00 00 00  r.l.d.!.........

and characters from the latter occupy one byte of memory:

0:000> db 008c72b4 l10
008c72b4  48 65 6c 6c 6f 20 57 6f-72 6c 64 21 00 00 00 00  Hello World!....

You can see that the second byte for Unicode English characters is zero. I converted that memory dump into 8 bits-per-pixel bitmap using Dump2Picture and after zooming it sufficiently in Vista Photo Viewer until pixels become squares I got the following picture that illustrates the difference between Unicode and ASCII strings:

Incidentally the same memory dump converted to 32 bits-per-pixel bitmap shows Unicode “Hello World!” strings in green colors :-)

- Dmitry Vostokov @ DumpAnalysis.org -

Dump2Picture update (version 1.1)

Monday, August 13th, 2007

Previously announced Dump2Picture has been updated to version 1.1 with the following improvements to handle 8 bits-per-pixel images correctly:

- Saves grey scale palette

- Calculates right bitmap width and file size

The update can be downloaded from the same link:

Download Dump2Picture

Now 8 bits-per-pixel Vista kernel dump looks much better:

- Dmitry Vostokov @ DumpAnalysis.org -

Visualizing Memory Leaks

Sunday, August 12th, 2007

Dump2Picture can be used to explore memory leaks visually. I created the following small program in Visual C++ that leaks 64Kb every second:

#include "stdafx.h"
#include <windows.h>

int _tmain(int argc, _TCHAR* argv[])
{
  while (true)
  {
    printf("%x\n", (UINT_PTR)malloc(0xFFFF));
    Sleep(1000);
  }

  return 0;
}

Then I sampled 3 dumps at 7Mb, 17Mb and 32Mb process virtual memory size and converted them as 16 bits-per-pixel bitmaps. On the pictures below we can see that the middle black memory area grows significantly. Obviously malloc function allocates zeroed memory and therefore we see black color.

7Mb process memory dump:

17Mb process memory dump:

32Mb process memory dump:

If we zoom in the black area we would see the following pattern:

 

Colored lines inside are heap control structures that are created for every allocated block of memory. If this is correct then allocating smaller memory blocks would create a hatched pattern. And this is true indeed. The following program leaks 256 byte memory blocks:

#include "stdafx.h"
#include <windows.h>

int _tmain(int argc, _TCHAR* argv[])
{
  while (true)
  {
    printf("%x\n", (UINT_PTR)malloc(0xFF));
    Sleep(1000/0xFF);
  }

  return 0;
}

The corresponding process memory picture and zoomed heap area are the following:

Making allocations 4 times smaller makes heap area to look hatched and zoomed picture is more densely packed by heap control structures:

#include "stdafx.h"
#include <windows.h>

int _tmain(int argc, _TCHAR* argv[])
{
  while (true)
  {
    printf("%x\n", (UINT_PTR)malloc(0xFF/4));
    Sleep((1000/0xFF)/4);
  }

  return 0;
}

 

Here is another example. One service was increasing its memory constantly. The crash dump picture shows huge hatched dark region in the middle:

  

and if we zoom in this region:

Because the pattern and allocation size look uniform it could be the true heap memory leak for some operation that allocates constant size buffers. After opening the dump and looking at heap segments that had grown the most we see the same allocation size indeed:

0:000> !.\w2kfre\ntsdexts.heap -h 5
HEAPEXT: Unable to get address of NTDLL!NtGlobalFlag.
Index   Address  Name      Debugging options enabled
  1:   00140000
  2:   00240000
  3:   00310000
  4:   00330000
  5:   00370000
    Segment at 00370000 to 00380000 (00010000 bytes committed)
    Segment at 01680000 to 01780000 (00100000 bytes committed)
    Segment at 019C0000 to 01BC0000 (00200000 bytes committed)
    Segment at 01BC0000 to 01FC0000 (00400000 bytes committed)
    Segment at 01FC0000 to 027C0000 (00800000 bytes committed)
    Segment at 027C0000 to 037C0000 (01000000 bytes committed)
    Segment at 037C0000 to 057C0000 (02000000 bytes committed)

    Segment at 057C0000 to 097C0000 (00155000 bytes committed)



    057B96E0: 01048 . 01048 [07] - busy (1030), tail fill
    057BA728: 01048 . 01048 [07] - busy (1030), tail fill
    057BB770: 01048 . 01048 [07] - busy (1030), tail fill
    057BC7B8: 01048 . 01048 [07] - busy (1030), tail fill
    057BD800: 01048 . 01048 [07] - busy (1030), tail fill
    057BE848: 01048 . 01048 [07] - busy (1030), tail fill
    057BF890: 01048 . 00770 [14] free fill
  Heap entries for Segment07 in Heap 370000
    057C0040: 00040 . 01048 [07] - busy (1030), tail fill
    057C1088: 01048 . 01048 [07] - busy (1030), tail fill
    057C20D0: 01048 . 01048 [07] - busy (1030), tail fill
    057C3118: 01048 . 01048 [07] - busy (1030), tail fill
    057C4160: 01048 . 01048 [07] - busy (1030), tail fill
    057C51A8: 01048 . 01048 [07] - busy (1030), tail fill


- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 21)

Sunday, August 12th, 2007

Sometimes it is possible that a process crash dump doesn’t have all usual threads inside. For example, you expect at least 4 threads including the main process thread but in the dump you see only 3. If you know that some access violations were reported in the event log before (not necessarily for the same PID) you might suspect that one of threads had been terminated due to some reason. I call this pattern Missing Thread.

In order to simulate this problem I created a small multithreaded program in Visual C++:

#include "stdafx.h"
#include <process.h>

void thread_request(void *param)
{
    while (true);
}

int _tmain(int argc, _TCHAR* argv[])
{
    _beginthread(thread_request, 0, NULL);

    try
    {
        if (argc == 2)
        {
            *(int *)NULL = 0;
        }
    }
    catch (...)
    {
        _endthread();
    }

    while (true);

    return 0;
}

If there is a command line argument then the main thread simulates access violation and finishes in the exception handler. In order to use SEH exceptions with C++ try/catch blocks you have to enable /EHa option in C++ Code Generation properties:

If we run the program without command line parameter and take a manual dump from it we would see 2 threads:

0:000> ~*kL

.  0  Id: 1208.fdc Suspend: 1 Teb: 7efdd000 Unfrozen
ChildEBP RetAddr
0012ff70 00401403 MissingThread!wmain+0x58
0012ffc0 7d4e7d2a MissingThread!__tmainCRTStartup+0x15e
0012fff0 00000000 kernel32!BaseProcessStart+0x28

   1  Id: 1208.102c Suspend: 1 Teb: 7efda000 Unfrozen
ChildEBP RetAddr
005dff7c 004010ef MissingThread!thread_request
005dffb4 00401188 MissingThread!_callthreadstart+0x1b
005dffb8 7d4dfe21 MissingThread!_threadstart+0x73
005dffec 00000000 kernel32!BaseThreadStart+0x34

0:000> ~
.  0  Id: 1208.fdc Suspend: 1 Teb: 7efdd000 Unfrozen
   1  Id: 1208.102c Suspend: 1 Teb: 7efda000 Unfrozen

0:000> dd 7efdd000 l4
7efdd000  0012ff64 00130000 0012e000 00000000

I also dumped TEB of the main thread. However if we run the program with any command line parameter and look at its manual dump we would see only one thread with the main thread missing:

0:000> ~*kL

.  0  Id: 1004.12e8 Suspend: 1 Teb: 7efda000 Unfrozen
ChildEBP RetAddr
005dff7c 004010ef MissingThread!thread_request
005dffb4 00401188 MissingThread!_callthreadstart+0x1b
005dffb8 7d4dfe21 MissingThread!_threadstart+0x73
005dffec 00000000 kernel32!BaseThreadStart+0x34

0:000> ~
.  0  Id: 1004.12e8 Suspend: 1 Teb: 7efda000 Unfrozen

If we try to dump TEB address and stack data from the missing main thread we would see that the memory was already decommitted:

0:000> dd 7efdd000 l4
7efdd000  ???????? ???????? ???????? ????????

0:000> dds 0012e000  00130000
0012e000  ????????
0012e004  ????????
0012e008  ????????
0012e00c  ????????
0012e010  ????????
0012e014  ????????
0012e018  ????????
0012e01c  ????????
0012e020  ????????
0012e024  ????????

The same effect can be achieved in the similar program that exits the thread in the custom unhandled exception filter:

#include "stdafx.h"
#include <process.h>
#include <windows.h>

LONG WINAPI CustomUnhandledExceptionFilter(struct _EXCEPTION_POINTERS* ExceptionInfo)
{
    ExitThread(-1);
}

void thread_request(void *param)
{
    while (true);
}

int _tmain(int argc, _TCHAR* argv[])
{
    _beginthread(thread_request, 0, NULL);
    SetUnhandledExceptionFilter(CustomUnhandledExceptionFilter);

    *(int *)NULL = 0;

    while (true);

    return 0;
}

The solution to catch an exception that results in a thread termination would be to run the program under WinDbg or any other debugger:

CommandLine: C:\MissingThread\MissingThread.exe 1
Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 0040f000   MissingThread.exe
ModLoad: 7d4c0000 7d5f0000   NOT_AN_IMAGE
ModLoad: 7d600000 7d6f0000   C:\W2K3\SysWOW64\ntdll32.dll
ModLoad: 7d4c0000 7d5f0000   C:\W2K3\syswow64\kernel32.dll
(df0.12f0): Break instruction exception - code 80000003 (first chance)
eax=7d600000 ebx=7efde000 ecx=00000005 edx=00000020 esi=7d6a01f4 edi=00221f38
eip=7d61002d esp=0012fb4c ebp=0012fcac iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
ntdll32!DbgBreakPoint:
7d61002d cc              int     3

0:000> g
ModLoad: 71c20000 71c32000   C:\W2K3\SysWOW64\tsappcmp.dll
ModLoad: 77ba0000 77bfa000   C:\W2K3\syswow64\msvcrt.dll
ModLoad: 00410000 004ab000   C:\W2K3\syswow64\ADVAPI32.dll
ModLoad: 7da20000 7db00000   C:\W2K3\syswow64\RPCRT4.dll
ModLoad: 7d8d0000 7d920000   C:\W2K3\syswow64\Secur32.dll
(df0.12f0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=000007a0 ebx=7d4d8df9 ecx=78b842d9 edx=00000000 esi=00000002 edi=00000ece
eip=00401057 esp=0012ff50 ebp=0012ff70 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
MissingThread!wmain+0x47:
00401057 c7050000000000000000 mov dword ptr ds:[0],0  ds:002b:00000000=????????

0:000> kL
ChildEBP RetAddr
0012ff70 00401403 MissingThread!wmain+0x47
0012ffc0 7d4e7d2a MissingThread!__tmainCRTStartup+0x15e
0012fff0 00000000 kernel32!BaseProcessStart+0x28

If live debugging is not possible and you are interested in crash dumps saved upon a first chance exception before it is processed in an exception handler you can also use MS userdump after you install it and enable All Exceptions in the Process Monitoring Rules dialog box. Another tool can be used is ADPlus in crash mode from Debugging Tools for Windows.

- Dmitry Vostokov @ DumpAnalysis.org -

Another look at page faults

Thursday, August 9th, 2007

Recently observed this bugcheck with reported “valid” address (in blue):

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: e16623fc, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: ae2b222e, address which referenced memory

TRAP_FRAME:  a54a4a40 -- (.trap 0xffffffffa54a4a40)
ErrCode = 00000000
eax=00000000 ebx=00000000 ecx=e16623f0 edx=00000000 esi=ae2ce428 edi=a54a4b4c
eip=ae2b222e esp=a54a4ab4 ebp=a54a4ac4 iopl=0 nv up ei pl nz ac po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010212
driver!ProcessCommand+0x44:
ae2b222e 39590c cmp dword ptr [ecx+0Ch],ebx ds:0023:e16623fc=00000000

1: kd> dd e16623fc l4
e16623fc 00000000 00790000 004c4c44 00010204

The address belongs to a paged pool:

1: kd> !pool e16623fc
Pool page e16623fc region is Paged pool
 e1662000 size:  3a8 previous size:    0  (Allocated)  NtfF
 e16623a8 size:   10 previous size:  3a8  (Free)       ….
 e16623b8 size:   28 previous size:   10  (Allocated)  Ntfo
 e16623e0 size:    8 previous size:   28  (Free)       CMDa
*e16623e8 size:   20 previous size:    8  (Allocated) *DRV

So why do we have the bugcheck here if the memory wasn’t paged out? This is because page faults occur when pages are marked as invalid in page tables and not only when they are paged out to a disk. We can check whether an address belongs to an invalid page by using !pte command:

1: kd> !pte e16623fc
               VA e16623fc
PDE at 00000000C0603858    PTE at 00000000C070B310
contains 00000000F5434863  contains 00000000E817A8C2
pfn f5434 ---DA--KWEV                           not valid
                       Transition: e817a
                       Protect: 6 - ReadWriteExecute

Let’s check our PTE (page table entry):

1: kd> .formats 00000000E817A8C2
Evaluate expression:
  Hex: e817a8c2
  Decimal: -401102654
  Octal: 35005724302
  Binary: 11101000 00010111 10101000 11000010

We see that 0th (Valid) bit is cleared and this means that PTE marks the page as invalid and also 11th bit (Transition) is set which marks that page as on standby or modified lists. When referenced and IRQL is less than 2 the page will be made valid and added to a process working set. We see the address as “valid” in WinDbg  because that page was not paged out and present in a crash dump. But it is marked as invalid and therefore triggers the page fault. Page fault handler sees that IRQL == 2 and generates D1 bugcheck.

- Dmitry Vostokov @ DumpAnalysis.org -

Basic Windows Crash Dump Analysis (Part 1)

Tuesday, August 7th, 2007

I have published the HTML version (with minor updates) of the original training presentation created in 2005. 

The first part explains various concepts like process, thread, crash, hang, etc. and introduces memory dump classification from memory type and procedure perspectives. It also covers crash dump gathering and verification, explains symbols and lists common scenarios. Here is the link: 

Basic Windows Crash Dump Analysis (Part 1)

- Dmitry Vostokov @ DumpAnalysis.org -

Memory Dump Analysis Patterns

Monday, August 6th, 2007

Because the number of crash dump analysis patterns is growing I created a page that lists them and new will be added to it as soon as I write them. Some patterns are not really related to crashes, for example, memory leaks or CPU spikes so it would be better to call all of them memory dump analysis patterns instead. 

Memory Dump Analysis Patterns

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 20a)

Monday, August 6th, 2007

Memory Leak is another pattern that may be finally manifested as Insufficient Memory pattern in a crash dump. In this part I’ll cover process heap memory leaks. They are usually identified when the process virtual memory size grows over time. It starts with 80Mb and instead of fluctuating normally below 100Mb it suddenly starts growing to 150Mb after some time and then to 300Mb next day and then grows to 600Mb and so on.

Usually a process heap is under suspicion here. To confirm this we need to sample 2-3 consecutive user memory dumps at process sizes 100Mb, 200Mb and 300Mb, for example. This can be done by using Microsoft userdump.exe command line tool. Then we can see whether there is any heap growth by using !heap -s WinDbg command:

1st dump

0:000> !heap -s
  Heap     Flags   Reserv  Commit  Virt
                    (k)     (k)    (k)
---------------------------------------
00140000 00000002    2048   1048   1112
00240000 00008000      64     12     12
00310000 00001002    7232   4308   4600
00420000 00001002    1024    520    520
00340000 00001002     256     40     40
00720000 00001002      64     32     32
00760000 00001002      64     48     48
01020000 00001002     256     24     24
02060000 00001002      64     16     16
02070000 00001003     256    120    120
020b0000 00001003     256      4      4
020f0000 00001003     256      4      4
02130000 00001003     256      4      4
02170000 00001003     256      4      4
021f0000 00001002    1088     76     76
021e0000 00001002      64     16     16
02330000 00001002    1088    428    428
02340000 00011002     256     12     12
02380000 00001002      64     12     12
024c0000 00001003      64      8      8
028d0000 00001002    7232   3756   6188
02ce0000 00001003      64      8      8
07710000 00001002      64     20     20
07b20000 00001002      64     16     16
07f30000 00001002      64     16     16
09050000 00001002     256     12     12
09c80000 00001002  130304 102340 102684
007d0000 00001003     256    192    192
00810000 00001003     256      4      4
0bdd0000 00001003     256      4      4
0be10000 00001003     256      4      4
0be50000 00001003     256      4      4
0be90000 00001003     256     56     56
0bed0000 00001003     256      4      4
0bf10000 00001003     256      4      4
0bf50000 00001003     256      4      4
0bf90000 00001003     256      4      4
00860000 00001002      64     20     20
00870000 00001002      64     20     20
0d760000 00001002     256     12     12
0dc60000 00001002    1088    220    220
0c3a0000 00001002      64     12     12
0c3d0000 00001002    1088    160    364
08420000 00001002      64     64     64

2nd dump

0:000> !heap -s
  Heap     Flags   Reserv  Commit  Virt
                    (k)     (k)    (k)
---------------------------------------
00140000 00000002    8192   4600   4600
00240000 00008000      64     12     12
00310000 00001002    7232   4516   4600
00420000 00001002    1024    520    520
00340000 00001002     256     44     44
00720000 00001002      64     32     32
00760000 00001002      64     48     48
01020000 00001002     256     24     24
02060000 00001002      64     16     16
02070000 00001003     256    124    124
020b0000 00001003     256      4      4
020f0000 00001003     256      4      4
02130000 00001003     256      4      4
02170000 00001003     256      4      4
021f0000 00001002    1088     76     76
021e0000 00001002      64     16     16
02330000 00001002    1088    428    428
02340000 00011002     256     12     12
02380000 00001002      64     12     12
024c0000 00001003      64      8      8
028d0000 00001002    7232   3796   6768
02ce0000 00001003      64      8      8
07710000 00001002      64     20     20
07b20000 00001002      64     16     16
07f30000 00001002      64     16     16
09050000 00001002     256     12     12
09c80000 00001002  261376 221152 221928
007d0000 00001003     256    192    192
00810000 00001003     256      4      4
0bdd0000 00001003     256      4      4
0be10000 00001003     256      4      4
0be50000 00001003     256      4      4
0be90000 00001003     256     60     60
0bed0000 00001003     256      4      4
0bf10000 00001003     256      4      4
0bf50000 00001003     256      4      4
0bf90000 00001003     256      4      4
00860000 00001002      64     20     20
00870000 00001002      64     20     20
0d760000 00001002     256     12     12
0dc60000 00001002    1088    228    228
0c3a0000 00001002      64     12     12
0c3d0000 00001002    1088    168    224
08450000 00001002      64     64     64

We see that the only significant heap growth is at 09c80000 address, from 130Mb to 260Mb. However this doesn’t say which code uses it. In order to find the code we need to enable the so called user mode stack trace database. Please refer to the following Citrix article:

http://support.citrix.com/article/CTX106970

The example in the article is for Citrix IMA service but you can replace ImaSrv.exe with any other executable name.

Suppose that after enabling user mode stack trace database and restarting the program or service we see the growth and we get memory dumps with the following suspicious heap highlighted in red:

0:000> !gflag
Current NtGlobalFlag contents: 0x00001000
ust - Create user mode stack trace database

0:000> !heap -s
NtGlobalFlag enables following debugging aids for new heaps:
  stack back traces
LFH Key: 0x2687ed29
  Heap     Flags   Reserv  Commit  Virt
                    (k)     (k)    (k)
---------------------------------------
00140000 58000062    4096    488    676
00240000 58008060      64     12     12
00360000 58001062    3136   1152   1216
003b0000 58001062      64     32     32
01690000 58001062     256     32     32
016d0000 58001062    1024    520    520
003e0000 58001062      64     48     48
02310000 58001062     256     24     24
02b30000 58001062      64     16     16
02b40000 58001063     256     64     64
02b80000 58001063     256      4      4
02bc0000 58001063     256      4      4
02c00000 58001063     256      4      4
02c40000 58001063     256      4      4
02c80000 58001063     256      4      4
02cc0000 58001063     256      4      4
02d30000 58001063      64      4      4
03140000 58001062    7232   4160   4896
03550000 58001063      64      4      4
07f70000 58001062      64     12     12
08380000 58001062      64     12     12
08790000 58001062      64     12     12
091d0000 58011062     256     12     12
09210000 58001062      64     16     16
09220000 58001062      64     12     12
092a0000 58001062      64     12     12
09740000 58001062     256     12     12
0b1a0000 58001062      64     12     12
0b670000 58001062   64768  39508  39700
0b7b0000 58001062      64     12     12
0c650000 58001062    1088    192    192

Every heap is subdivided into several segments and to see which segments have grown the most we can use !heap -m <heap address> command:

0:000> !heap -m 0b670000
Index   Address  Name      Debugging options enabled
29:   0b670000
    Segment at 0b670000 to 0b6b0000 (00040000 bytes committed)
    Segment at 0c760000 to 0c860000 (00100000 bytes committed)
    Segment at 0c980000 to 0cb80000 (001fe000 bytes committed)
    Segment at 0cb80000 to 0cf80000 (003cc000 bytes committed)
    Segment at 0dc30000 to 0e430000 (00800000 bytes committed)
    Segment at 12330000 to 13330000 (01000000 bytes committed)
    Segment at 13330000 to 15330000 (0078b000 bytes committed)



If we use !heap -a <heap address> command then in addition to the list of heap segments individual heap allocation entries will be dumped as well. This could be very big output and we should open the log file in advance by using .logopen <file name> command.

The output can be like this (taken from another dump):

0:000> !heap -a 000a0000
...
...
...
    Segment00 at 000a0000:
        Flags:           00000000
        Base:            000a0000
        First Entry:     000a0580
        Last Entry:      000b0000
        Total Pages:     00000010
        Total UnCommit:  00000002
        Largest UnCommit:00000000
        UnCommitted Ranges: (1)

    Heap entries for Segment00 in Heap 000a0000
        000a0000: 00000 . 00580 [101] - busy (57f)
        000a0580: 00580 . 00240 [101] - busy (23f)
        000a07c0: 00240 . 00248 [101] - busy (22c)
        000a0a08: 00248 . 00218 [101] - busy (200)
        000a0c20: 00218 . 00ce0 [100]
        000a1900: 00ce0 . 00f88 [101] - busy (f6a)
        000a2888: 00f88 . 04418 [101] - busy (4400)
        000a6ca0: 04418 . 05958 [101] - busy (5940)
        000ac5f8: 05958 . 00928 [101] - busy (90c)
        000acf20: 00928 . 010c0 [100]
        000adfe0: 010c0 . 00020 [111] - busy (1d)
        000ae000:      00002000      - uncommitted bytes.

Then we can inspect individual entries to see stack traces that allocated them by using !heap -p -a <heap entry address> command:

0:000> !heap -p -a 000a6ca0
    address 000a6ca0 found in
    _HEAP @ a0000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        000a6ca0 0b2b 0000  [00]   000a6cb8    05940 - (busy)
        Trace: 2156ac
        7704dab4 ntdll!RtlAllocateHeap+0x0000021d
        75c59b12 USP10!UspAllocCache+0x0000002b
        75c62381 USP10!AllocSizeCache+0x00000048
        75c61c74 USP10!FindOrCreateSizeCacheWithoutRealizationID+0x00000124
        75c61bc0 USP10!FindOrCreateSizeCacheUsingRealizationID+0x00000070
        75c59a97 USP10!UpdateCache+0x0000002b
        75c59a61 USP10!ScriptCheckCache+0x0000005c
        75c59d04 USP10!ScriptStringAnalyse+0x0000012a
        7711140f LPK!LpkStringAnalyse+0x00000114
        7711159e LPK!LpkCharsetDraw+0x00000302
        77111488 LPK!LpkDrawTextEx+0x00000044
        76a4beb3 USER32!DT_DrawStr+0x0000013a
        76a4be45 USER32!DT_DrawJustifiedLine+0x0000005f
        76a49d68 USER32!AddEllipsisAndDrawLine+0x00000186
        76a4bc31 USER32!DrawTextExWorker+0x000001b1
        76a4bedc USER32!DrawTextExW+0x0000001e
        746051d8 uxtheme!CTextDraw::GetTextExtent+0x000000be
        7460515a uxtheme!GetThemeTextExtent+0x00000065
        74611ed4 uxtheme!CThemeMenuBar::MeasureItem+0x00000124
        746119c1 uxtheme!CThemeMenu::OnMeasureItem+0x0000003f
        74611978 uxtheme!CThemeWnd::_PreDefWindowProc+0x00000117
        74601ea5 uxtheme!_ThemeDefWindowProc+0x00000090
        74601f61 uxtheme!ThemeDefWindowProcW+0x00000018
        76a4a09e USER32!DefWindowProcW+0x00000068
        931406 notepad!NPWndProc+0x00000084
        76a51a10 USER32!InternalCallWinProc+0x00000023
        76a51ae8 USER32!UserCallWinProcCheckWow+0x0000014b
        76a51c03 USER32!DispatchClientMessage+0x000000da
        76a3bc24 USER32!__fnINOUTLPUAHMEASUREMENUITEM+0x00000027
        77040e6e ntdll!KiUserCallbackDispatcher+0x0000002e
        76a51d87 USER32!RealDefWindowProcW+0x00000047
        74601f2f uxtheme!_ThemeDefWindowProc+0x000001b8

If we want to dump all heap entries with their corresponding stack traces we can use !heap -k -h <heap address> command.

Note: sometimes all these commands don’t work. In such cases we can use old Windows 2000 extension.

Some prefer to use umdh.exe and get text file logs but the advantage of embedding heap allocation stack traces in a dump is that we are not concerned with sending and configuring symbol files at a customer side.

- Dmitry Vostokov @ DumpAnalysis.org -

Visualizing Memory Dumps

Saturday, August 4th, 2007

As the first step towards Memory Dump Tomography I created a small program that interprets a memory dump as a picture. You can visualize crash dumps with it. The tool is available for free download:

Download Dump2Picture

Simply run it from the command prompt and specify full paths to a dump file and an output BMP file. The memory dump file will be converted by default into true color, 32 bits-per-pixel bitmap. You can specify other values: 8, 16 and 24.

C:\Dump2Picture>Dump2Picture.exe

Dump2Picture version 1.0
Written by Dmitry Vostokov, 2007

Usage: Dump2Picture dumpfile bmpfile [8|16|24|32]

For example:

C:\Dump2Picture>Dump2Picture.exe MEMORY.DMP MEMORY.BMP 8

Dump2Picture version 1.0
Written by Dmitry Vostokov, 2007

MEMORY.BMP
MEMORY.DMP
        1 file(s) copied.

Below are some screenshots of bitmap files created by the tool. Think about them as visualized kernel or user address spaces. 

Vista kernel memory dump (8 bits-per-pixel):

Vista kernel memory dump (16 bits-per-pixel):

Vista kernel memory dump (24 bits-per-pixel):

Vista kernel memory dump (32 bits-per-pixel):

Notepad process user memory dump (8 bits-per-pixel):

Notepad process user memory dump (16 bits-per-pixel):

Notepad process user memory dump (24 bits-per-pixel):

Notepad process user memory dump (32 bits-per-pixel):

Mspaint process user memory dump (32 bits-per-pixel):

Mspaint process user memory dump after loading “Toco Toucan.jpg” from Vista Sample Pictures folder (32 bits-per-pixel):

Citrix ICA client process (wfica32.exe) user memory dump (32 bits-per-pixel):

Enjoy :-)

- Dmitry Vostokov @ DumpAnalysis.org -

Listening to Computer Memory

Sunday, July 29th, 2007

An alternative to converting memory dumps to sound files is to save a memory range to a binary file and then convert it to a wave file. The latter is better for complete memory dumps which can be several Gb in size.

To save a memory range to a file use WinDbg .writemem command:

.writemem d2w-range.bin 00400000 00433000

or

.writemem d2w-range.bin 00400000 L200

I wrote a WinDbg script that saves a specified memory range and then calls a shell script which automatically converts saved binary file to a wave file and then runs whatever sound program is registered for .wav extension. On many systems it is Microsoft Media Player unless you installed any other third-party player.

The WinDbg script code (memsounds.txt):

.writemem d2w-range.bin ${$arg1} ${$arg2}
.if (${/d:$arg5})
{
  .shell -i- memsounds.cmd d2w-range ${$arg3} ${$arg4} ${$arg5}
}
.elsif (${/d:$arg4})
{
  .shell -i- memsounds.cmd d2w-range ${$arg3} ${$arg4}
}
.elsif (${/d:$arg3})
{
  .shell -i- memsounds.cmd d2w-range ${$arg3}
}
.else
{
  .shell -i- memsounds.cmd d2w-range
}

The shell script (memsounds.cmd):

dump2wave %1.bin %1.wav %2 %3 %4
%1.wav

Because WinDbg installation folder is assumed to be the default directory for both scripts and Dump2Wave.exe they should be copied to the same folder where windbg.exe is located. On my system it is

C:\Program Files\Debugging Tools for Windows

Both scripts are included in Dump2Wave package available for free download:

Dump2Wave package

To call the script from WinDbg use the following command:

$$>a< memsounds.txt Range [Freq] [Bits] [Channels]

where Range can be in Address1 Address2 or Address Lxxx format, Freq can be 44100, 22050, 11025 or 8000, Bits can be 8 or 16, Channels can be 1 or 2. By default it is 44100, 16, 2.

If you have a live debugging session or loaded a crash dump you can listen to a memory range immediately. For example, the range of memory from 00400000 to 00433000 interpreted as 44.1KHz 16bit stereo:

0:000> $$>a< memsounds.txt 00400000 00433000
Writing 33001 bytes...

C:\Program Files\Debugging Tools for Windows>dump2wave d2w-range.bin d2w-range.wav

Dump2Wave version 1.2.1
Written by Dmitry Vostokov, 2006

d2w-range.wav
d2w-range.bin
        1 file(s) copied.

C:\Program Files\Debugging Tools for Windows>d2w-range.wav
.shell: Process exited
0:000>

or the same range interpreted as 8KHz 8bit mono:

0:000> $$>a< memsounds.txt 00400000 00433000 8000 8 1
Writing 33001 bytes...

C:\Program Files\Debugging Tools for Windows>dump2wave d2w-range.bin d2w-range.wav 8000 8 1

Dump2Wave version 1.2.1
Written by Dmitry Vostokov, 2006

d2w-range.wav
d2w-range.bin
        1 file(s) copied.

C:\Program Files\Debugging Tools for Windows>d2w-range.wav
.shell: Process exited
0:000>

The script starts Windows Media Player on my system and I only need to push the play button to start listening.

Enjoy :-)

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 9c)

Saturday, July 28th, 2007

This is another variant of Deadlock pattern when we have mixed synchronization objects, for example, events and critical sections. An event may be used to signal the availability of some work item for processing it, the fact that the queue is not empty and a critical section may be used to protect some shared data. 

The typical deadlock scenario here is when one thread resets an event by calling WaitForSingleObject and tries to acquire a critical section. In the mean time the second thread has already acquired that critical section and now is waiting for the event to be set:

Thread A     |  Thread B
..           |  ..
reset Event  |  ..
..           |  acquire CS
wait for CS  |  ..
             |  wait for Event

The classical fix to this bug is to acquire the critical section and wait for the event in the same order in both threads.

In our example crash dump we can easily identify the second thread that acquied the critical section and is waiting for the event 0×480:

0:000> !locks

CritSec ntdll!LdrpLoaderLock+0 at 7c889d94
WaiterWoken        No
LockCount          9
RecursionCount     1
OwningThread       2038
EntryCount         0
ContentionCount    164
*** Locked

  13  Id: 590.2038 Suspend: 1 Teb: 7ffaa000 Unfrozen
ChildEBP RetAddr  Args to Child
0483fd5c 7c822124 77e6bad8 00000480 00000000 ntdll!KiFastSystemCallRet
0483fd60 77e6bad8 00000480 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0483fdd0 77e6ba42 00000480 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0483fde4 776cfb30 00000480 ffffffff 777904f8 kernel32!WaitForSingleObject+0×12
0483fe00 776adfaa 00000480 00000000 00000080 ole32!CDllHost::ClientCleanupFinish+0×2a
0483fe2c 776adf1a 00000000 0483fe7c 77790828 ole32!DllHostProcessUninitialize+0×80
0483fe4c 776b063f 00000000 00000000 0c9ecee0 ole32!ApartmentUninitialize+0xf8
0483fe64 776b06e3 0483fe7c 00000000 00000001 ole32!wCoUninitialize+0×48
0483fe80 776e43f5 00000001 77670000 776afef0 ole32!CoUninitialize+0×65
0483fe8c 776afef0 0483feb4 776b5cb8 77670000 ole32!DoThreadSpecificCleanup+0×63
0483fe94 776b5cb8 77670000 00000003 00000000 ole32!ThreadNotification+0×37
0483feb4 776b5c1b 77670000 00000003 00000000 ole32!DllMain+0×176
0483fed4 7c82257a 77670000 00000003 00000000 ole32!_DllMainCRTStartup+0×52
0483fef4 7c83c195 776b5bd3 77670000 00000003 ntdll!LdrpCallInitRoutine+0×14
0483ffa8 77e661d6 00000000 00000000 0483ffec ntdll!LdrShutdownThread+0xd2
0483ffb8 77e66090 00000000 00000000 00000000 kernel32!ExitThread+0×2f
0483ffec 00000000 77c5de6d 0ab24f68 00000000 kernel32!BaseThreadStart+0×39

0:000> !handle 480 ff
Handle 00000480
  Type          Event
  Attributes    0
  GrantedAccess 0×1f0003:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState,ModifyState
  HandleCount   2
  PointerCount  4
  Name          <none>
  No object specific information available

It is difficult to find the first thread, the one which has reset the event and is waiting for the critical section. In our dump we have 9 such threads from !locks command output:

LockCount          9

Event as a synchronization primitive doesn’t have an owner. Despite this we can try to find 0×480 and WaitForSingleObject address near on some other thread raw stack if that information wasn’t overwritten. Let’s do a memory search:

0:000> s -d 0 L4000000 00000480
000726ec 00000480 00000022 000004a4 00000056
008512a0 00000480 00000480 00000000 00000000
008512a4 00000480 00000000 00000000 01014220
0085ab68 00000480 00000480 00000092 00000000
0085ab6c 00000480 00000092 00000000 01014234
00eb12a0 00000480 00000480 00000000 00000000
00eb12a4 00000480 00000000 00000000 0101e614
00ebeb68 00000480 00000480 00000323 00000000
00ebeb6c 00000480 00000323 00000000 0101e644
03ffb4fc 00000480 d772c13b ce753966 00fa840f
040212a0 00000480 00000480 00000000 00000000
040212a4 00000480 00000000 00000000 01063afc
0402ab68 00000480 00000480 00000fb6 00000000
0402ab6c 00000480 00000fb6 00000000 01063b5c
041312a0 00000480 00000480 00000000 00000000
041312a4 00000480 00000000 00000000 01065b28
0413eb68 00000480 00000480 00001007 00000000
0413eb6c 00000480 00001007 00000000 01065b7c
043412a0 00000480 00000480 00000000 00000000
043412a4 00000480 00000000 00000000 01066b44
0434ab68 00000480 00000480 00001033 00000000
0434ab6c 00000480 00001033 00000000 01066b9c
0483fd68 00000480 00000000 00000000 00000000
0483fdd8 00000480 ffffffff 00000000 0483fe00
0483fdec 00000480 ffffffff 777904f8 77790738
0483fe08 00000480 00000000 00000080 776b0070
0483fe20 00000480 00000000 00000000 0483fe4c

05296f58 00000480 ffffffff ffffffff ffffffff
05297eb0 00000480 00000494 000004a4 000004c0
0557cf9c 00000480 00000000 00000000 00000000
05580adc 00000480 00000000 00000000 00000000
0558715c 00000480 00000000 00000000 00000000
0558d3cc 00000480 00000000 00000000 00000000
0559363c 00000480 00000000 00000000 00000000
0559ee0c 00000480 00000000 00000000 00000000
055a507c 00000480 00000000 00000000 00000000
056768ec 00000480 00000000 00000000 00000000
0568ef14 00000480 00000000 00000000 00000000
0581ff88 00000480 07ca7ee0 0581ff98 776cf2a3
05ed1260 00000480 00000480 00000000 00000000
05ed1264 00000480 00000000 00000000 01276efc
05ed8b68 00000480 00000480 00005c18 00000000
05ed8b6c 00000480 00005c18 00000000 01276f74
08f112a0 00000480 00000480 00000000 00000000
08f112a4 00000480 00000000 00000000 00000000
08f1ab68 00000480 00000480 00007732 00000000
08f1ab6c 00000480 00007732 00000000 01352db0

In blue color I highlighted the thread #13 raw stack occurrences and in red color I highlighted memory locations that belong to another thread raw stack. In fact, these are the only memory locations from search results that make any sense from code perspective. The only meaningful stack traces can be found in memory locations highlighted in blue and red above.

This can be seen if we feed search results to WinDbg dds command (not all output is shown for clarity):

0:000> .foreach (place { s-[1]d 0 L4000000 00000480 }) { dds place -30; .printf "\n" }
000726bc  00000390
000726c0  00000022
000726c4  000003b4
000726c8  00000056
000726cc  00000004
000726d0  6dc3f6fd
000726d4  0000040c
000726d8  0000001e
000726dc  0000042c
000726e0  00000052
000726e4  00000004
000726e8  eacb0f6d
000726ec  00000480
000726f0  00000022
000726f4  000004a4
000726f8  00000056
000726fc  00000004
00072700  62b796d2
00072704  000004fc
00072708  0000001e
0007270c  0000051c
00072710  00000052
00072714  00000004
00072718  2a615cff
0007271c  00000570
00072720  00000024
00072724  00000598
00072728  00000058
0007272c  00000004
00072730  51913e59
00072734  000005f0
00072738  00000016
...
...
...
0568eee4  05680008 xpsp2res+0x1b0008
0568eee8  01200000
0568eeec  00001010
0568eef0  00200001
0568eef4  00000468
0568eef8  00000121
0568eefc  00000000
0568ef00  00000028
0568ef04  00000030
0568ef08  00000060
0568ef0c  00040001
0568ef10  00000000
0568ef14  00000480
0568ef18  00000000
0568ef1c  00000000
0568ef20  00000000
0568ef24  00000000
0568ef28  00000000
0568ef2c  00800000
0568ef30  00008000
0568ef34  00808000
0568ef38  00000080
0568ef3c  00800080
0568ef40  00008080
0568ef44  00808080
0568ef48  00c0c0c0
0568ef4c  00ff0000
0568ef50  0000ff00
0568ef54  00ffff00
0568ef58  000000ff
0568ef5c  00ff00ff
0568ef60  0000ffff

0581ff58  0581ff70
0581ff5c  776b063f ole32!wCoUninitialize+0x48
0581ff60  00000001
0581ff64  00007530
0581ff68  77790438 ole32!gATHost
0581ff6c  00000000
0581ff70  0581ff90
0581ff74  776cf370 ole32!CDllHost::WorkerThread+0xdd
0581ff78  0581ff8c
0581ff7c  00000001
0581ff80  77e6ba50 kernel32!WaitForSingleObjectEx
0581ff84  0657cfe8
0581ff88  00000480

0581ff8c  07ca7ee0
0581ff90  0581ff98
0581ff94  776cf2a3 ole32!DLLHostThreadEntry+0xd
0581ff98  0581ffb8
0581ff9c  776b2307 ole32!CRpcThread::WorkerLoop+0×1e
0581ffa0  77790438 ole32!gATHost
0581ffa4  00000000
0581ffa8  0657cfe8
0581ffac  77670000 ole32!_imp__InstallApplication <PERF> (ole32+0×0)
0581ffb0  776b2374 ole32!CRpcThreadCache::RpcWorkerThreadEntry+0×20
0581ffb4  00000000
0581ffb8  0581ffec
0581ffbc  77e6608b kernel32!BaseThreadStart+0×34
0581ffc0  0657cfe8
0581ffc4  00000000
0581ffc8  00000000
0581ffcc  0657cfe8
0581ffd0  3cfb5963
0581ffd4  0581ffc4

05ed1230  0101f070
05ed1234  05ed1274
05ed1238  05ed1174
05ed123c  05ed0000
05ed1240  05ed1280
05ed1244  00000000
05ed1248  00000000
05ed124c  00000000
05ed1250  05ed8b80
05ed1254  05ed8000
05ed1258  00002000
05ed125c  00001000
05ed1260  00000480
05ed1264  00000480
05ed1268  00000000
05ed126c  00000000
05ed1270  01276efc
05ed1274  05ed12b4
05ed1278  05ed1234
05ed127c  05ed0000
05ed1280  05ed2d00
05ed1284  05ed1240
05ed1288  05ed1400
05ed128c  00000000
05ed1290  05edade0
05ed1294  05eda000
05ed1298  00002000
05ed129c  00001000
05ed12a0  00000220
05ed12a4  00000220
05ed12a8  00000000
05ed12ac  00000000
...
...
...
08f1ab3c  00000000
08f1ab40  00000000
08f1ab44  00000000
08f1ab48  00000000
08f1ab4c  00000000
08f1ab50  00000000
08f1ab54  00000000
08f1ab58  00000000
08f1ab5c  00000000
08f1ab60  abcdbbbb
08f1ab64  08f11000
08f1ab68  00000480
08f1ab6c  00000480
08f1ab70  00007732
08f1ab74  00000000
08f1ab78  01352db0
08f1ab7c  dcbabbbb
08f1ab80  ffffffff
08f1ab84  c0c00ac1
08f1ab88  00000000
08f1ab8c  c0c0c0c0
08f1ab90  c0c0c0c0
08f1ab94  c0c0c0c0
08f1ab98  c0c0c0c0
08f1ab9c  c0c0c0c0
08f1aba0  c0c0c0c0
08f1aba4  ffffffff
08f1aba8  c0c00ac1
08f1abac  00000000
08f1abb0  c0c0c0c0
08f1abb4  c0c0c0c0
08f1abb8  c0c0c0c0

We see that address 0581ff88 is the most meaningful and it also has WaitForSingleObjectEx nearby. This address belongs to the raw stack of the following thread #16:

  16  Id: 590.1a00 Suspend: 1 Teb: 7ffa9000 Unfrozen
ChildEBP RetAddr
0581fc98 7c822124 ntdll!KiFastSystemCallRet
0581fc9c 7c83970f ntdll!NtWaitForSingleObject+0xc
0581fcd8 7c839620 ntdll!RtlpWaitOnCriticalSection+0x19c
0581fcf8 7c83a023 ntdll!RtlEnterCriticalSection+0xa8
0581fe00 77e67bcd ntdll!LdrUnloadDll+0x35
0581fe14 776b46fb kernel32!FreeLibrary+0x41
0581fe20 776b470f ole32!CClassCache::CDllPathEntry::CFinishObject::Finish+0x2f
0581fe34 776b44a0 ole32!CClassCache::CFinishComposite::Finish+0x1d
0581ff0c 776b0bfd ole32!CClassCache::CleanUpDllsForApartment+0x1d0
0581ff38 776b0b1f ole32!FinishShutdown+0xd7
0581ff58 776b063f ole32!ApartmentUninitialize+0x94
0581ff70 776cf370 ole32!wCoUninitialize+0x48
0581ff90 776cf2a3 ole32!CDllHost::WorkerThread+0xdd
0581ff98 776b2307 ole32!DLLHostThreadEntry+0xd
0581ffac 776b2374 ole32!CRpcThread::WorkerLoop+0×1e
0581ffb8 77e6608b ole32!CRpcThreadCache::RpcWorkerThreadEntry+0×20
0581ffec 00000000 kernel32!BaseThreadStart+0×34

And if we disassemble ole32!CRpcThread::WorkerLoop function which is found below WaitForSingleObjectEx on both stack trace and raw stack data from search results we would see that the former function calls the latter function indeed:

0:000> uf ole32!CRpcThread::WorkerLoop
ole32!CRpcThread::WorkerLoop:
776b22e9 mov     edi,edi
776b22eb push    esi
776b22ec mov     esi,ecx
776b22ee cmp     dword ptr [esi+4],0
776b22f2 jne     ole32!CRpcThread::WorkerLoop+0x67 (776b234d)

ole32!CRpcThread::WorkerLoop+0xb:
776b22f4 push    ebx
776b22f5 push    edi
776b22f6 mov     edi,dword ptr [ole32!_imp__WaitForSingleObjectEx (77671304)]
776b22fc mov     ebx,7530h

ole32!CRpcThread::WorkerLoop+0x18:
776b2301 push    dword ptr [esi+0Ch]
776b2304 call    dword ptr [esi+8]
776b2307 call    dword ptr [ole32!_imp__GetCurrentThread (7767130c)]
776b230d push    eax
776b230e call    dword ptr [ole32!_imp__RtlCheckForOrphanedCriticalSections (77671564)]
776b2314 xor     eax,eax
776b2316 cmp     dword ptr [esi],eax
776b2318 mov     dword ptr [esi+8],eax
776b231b mov     dword ptr [esi+0Ch],eax
776b231e je      ole32!CRpcThread::WorkerLoop+0x65 (776b234b)

ole32!CRpcThread::WorkerLoop+0x37:
776b2320 push    esi
776b2321 mov     ecx,offset ole32!gRpcThreadCache (7778fc28)
776b2326 call    ole32!CRpcThreadCache::AddToFreeList (776de78d)

ole32!CRpcThread::WorkerLoop+0x55:
776b232b push    0
776b232d push    ebx
776b232e push    dword ptr [esi]
776b2330 call    edi

776b2332 test    eax,eax
776b2334 je      ole32!CRpcThread::WorkerLoop+0×60 (776cf3be)

ole32!CRpcThread::WorkerLoop+0x44:
776b233a push    esi
776b233b mov     ecx,offset ole32!gRpcThreadCache (7778fc28)
776b2340 call    ole32!CRpcThreadCache::RemoveFromFreeList (776e42de)
776b2345 cmp     dword ptr [esi+4],0
776b2349 je      ole32!CRpcThread::WorkerLoop+0x55 (776b232b)

ole32!CRpcThread::WorkerLoop+0x65:
776b234b pop     edi
776b234c pop     ebx

ole32!CRpcThread::WorkerLoop+0x67:
776b234d pop     esi
776b234e ret

ole32!CRpcThread::WorkerLoop+0x60:
776cf3be cmp     dword ptr [esi+4],eax
776cf3c1 je      ole32!CRpcThread::WorkerLoop+0x18 (776b2301)

ole32!CRpcThread::WorkerLoop+0x69:
776cf3c7 jmp     ole32!CRpcThread::WorkerLoop+0x65 (776b234b)

Therefore we have possibly identified the thread #16 that resets the event by calling WaitForSingleObjectEx and tries to acquire the critical section. We also know the second thread #13 that has already acquired that critical section and now is waiting for the event to be signaled.

- Dmitry Vostokov @ DumpAnalysis.org -

Reconstructing Stack Trace Manually

Wednesday, July 25th, 2007

This is a small case study to complement Incorrect Stack Trace pattern and show how to reconstruct stack trace manually based on an example with complete source code.

I created a small working multithreaded program:

#include "stdafx.h"
#include <stdio.h>
#include <process.h>

typedef void (*REQ_JUMP)();
typedef void (*REQ_RETURN)();

const char str[] = "\0\0\0\0\0\0\0";

bool loop = true;

void return_func()
{
  puts("Return Func");
  loop = false;
  _endthread();
}

void jump_func()
{
  puts("Jump Func");
}

void internal_func_2(void *param_jump,void *param_return)
{
  REQ_JUMP f_jmp = (REQ_JUMP)param_jump;
  REQ_RETURN f_ret = (REQ_RETURN)param_return;

  puts("Internal Func 2");
  // Uncomment memcpy to crash the program
  // Overwrite f_jmp and f_ret with NULL
  // memcpy(&f_ret, str, sizeof(str));
  __asm
  {
     push f_ret;
     mov  eax, f_jmp
     mov  ebp, 0 // use ebp as a general purpose register
     jmp  eax
  }
}

void internal_func_1(void *param)
{
  puts("Internal Func 1");
  internal_func_2(param, &return_func);
}

void thread_request(void *param)
{
  puts("Request");
  internal_func_1(param);
}

int _tmain(int argc, _TCHAR* argv[])
{
  _beginthread(thread_request, 0, (void *)jump_func);
  while (loop);
  return 0;
}

For it I had to disable optimizations in Visual C++ compiler otherwise most of the code would have been eliminated because the program is very small and easy for code optimizer. If we run the program it displays the following output:

Request
Internal Func 1
Internal Func 2
Jump Func
Return Func

internal_func_2 gets two parameters: the function address to jump and the function address to call upon the return. The latter sets loop variable to false in order to break infinite main thread loop and calls _endthread. Why is that complexity in so small sample? I wanted to simulate FPO optimization in an inner function call and also gain control over a return address. This is why I set EBP to zero before jumping and pushed the custom return address which I can change any time. If I used the call instruction then the processor would have determined the return address as the next instruction address.

The code also copies two internal_func_2 parameters into local variables f_jmp and f_ret because the commented memcpy call is crafted to overwrite them with zeroes and do not touch the saved EBP, return address and function arguments. This is all to make stack trace incorrect but at the same time make manual stack reconstruction as easy as possible in this example.

Let’s suppose that memcpy call is a bug that overwrites local variables. Then we have a crash obviously because EAX is zero and jump to zero address will cause access violation. EBP is also 0 because we assigned 0 to it explicitly. Let’s pretend that we wanted to pass some constant via EBP and it is zero.

What we have now:

EBP is 0
EIP is 0
the return address is 0

As you might have expected already when you load a crash dump WinDbg is utterly confused because it has no clue on how to reconstruct the stack trace:

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(bd0.ec8): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00595620 ecx=00000002 edx=00000000 esi=00000000 edi=00000000
eip=00000000 esp=0069ff54 ebp=00000000 iopl=0 nv up ei pl nz ac po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010212
00000000 ??              ???

0:001> kv
ChildEBP RetAddr  Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
0069ff50 00000000 00000000 00000000 0069ff70 0×0

Fortunately ESP is not zero so we can look at raw stack:

0:001> dds esp
0069ff54  00000000
0069ff58  00000000
0069ff5c  00000000
0069ff60  0069ff70
0069ff64  0040187f WrongIP!internal_func_1+0x1f
0069ff68  00401830 WrongIP!jump_func
0069ff6c  00401840 WrongIP!return_func
0069ff70  0069ff7c
0069ff74  0040189c WrongIP!thread_request+0xc
0069ff78  00401830 WrongIP!jump_func
0069ff7c  0069ffb4
0069ff80  78132848 msvcr80!_endthread+0x4b
0069ff84  00401830 WrongIP!jump_func
0069ff88  aa75565b
0069ff8c  00000000
0069ff90  00000000
0069ff94  00595620
0069ff98  c0000005
0069ff9c  0069ff88
0069ffa0  0069fb34
0069ffa4  0069ffdc
0069ffa8  78138cd9 msvcr80!_except_handler4
0069ffac  d207e277
0069ffb0  00000000
0069ffb4  0069ffec
0069ffb8  781328c8 msvcr80!_endthread+0xcb
0069ffbc  7d4dfe21 kernel32!BaseThreadStart+0x34
0069ffc0  00595620
0069ffc4  00000000
0069ffc8  00000000
0069ffcc  00595620
0069ffd0  c0000005

Here we can start searching for the following pairs:

EBP:         PreviousEBP
             Function return address



PreviousEBP: PrePreviousEBP
             Function return address


for example:

0:001> dds esp
0069ff54 00000000
0069ff58 00000000
0069ff5c 00000000
0069ff60 0069ff70
0069ff64 0040187f WrongIP!internal_func_1+0×1f
0069ff68 00401830 WrongIP!jump_func
0069ff6c 00401840 WrongIP!return_func
0069ff70 0069ff7c
0069ff74 0040189c WrongIP!thread_request+0xc
0069ff78 00401830 WrongIP!jump_func
0069ff7c 0069ffb4

This is based on the fact that a function call saves its return address and the standard function prolog saves the previous EBP value and sets ESP to point to it.

push ebp
mov ebp, esp

Therefore our stack looks like this:

0:001> dds esp
0069ff54 00000000
0069ff58 00000000
0069ff5c 00000000
0069ff60 0069ff70
0069ff64 0040187f WrongIP!internal_func_1+0×1f
0069ff68 00401830 WrongIP!jump_func
0069ff6c 00401840 WrongIP!return_func
0069ff70 0069ff7c
0069ff74 0040189c WrongIP!thread_request+0xc
0069ff78 00401830 WrongIP!jump_func
0069ff7c 0069ffb4
0069ff80 78132848 msvcr80!_endthread+0×4b
0069ff84 00401830 WrongIP!jump_func
0069ff88 aa75565b
0069ff8c 00000000
0069ff90 00000000
0069ff94 00595620
0069ff98 c0000005
0069ff9c 0069ff88
0069ffa0 0069fb34
0069ffa4 0069ffdc
0069ffa8 78138cd9 msvcr80!_except_handler4
0069ffac d207e277
0069ffb0 00000000
0069ffb4 0069ffec
0069ffb8 781328c8 msvcr80!_endthread+0xcb
0069ffbc 7d4dfe21 kernel32!BaseThreadStart+0×34
0069ffc0 00595620
0069ffc4 00000000
0069ffc8 00000000
0069ffcc 00595620
0069ffd0 c0000005

Also we double check return addresses to see if they are valid code indeed. The best way is to try to disassemble them backwards. This should show call instructions resulted in saved return addresses:

0:001> ub WrongIP!internal_func_1+0x1f
WrongIP!internal_func_1+0x1:
00401871 mov     ebp,esp
00401873 push    offset WrongIP!GS_ExceptionPointers+0x38 (00402124)
00401878 call    dword ptr [WrongIP!_imp__puts (004020ac)]
0040187e add     esp,4
00401881 push    offset WrongIP!return_func (00401850)
00401886 mov     eax,dword ptr [ebp+8]
00401889 push    eax
0040188a call    WrongIP!internal_func_2 (004017e0)

0:001> ub WrongIP!thread_request+0xc
WrongIP!internal_func_1+0x2d:
0040189d int     3
0040189e int     3
0040189f int     3
WrongIP!thread_request:
004018a0 push    ebp
004018a1 mov     ebp,esp
004018a3 mov     eax,dword ptr [ebp+8]
004018a6 push    eax
004018a7 call    WrongIP!internal_func_1 (00401870)

0:001> ub msvcr80!_endthread+0x4b
msvcr80!_endthread+0x2f:
7813282c pop     esi
7813282d push    0Ch
7813282f push    offset msvcr80!__rtc_tzz+0x64 (781b4b98)
78132834 call    msvcr80!_SEH_prolog4 (78138c80)
78132839 call    msvcr80!_getptd (78132e29)
7813283e and     dword ptr [ebp-4],0
78132842 push    dword ptr [eax+58h]
78132845 call    dword ptr [eax+54h]

0:001> ub msvcr80!_endthread+0xcb
msvcr80!_endthread+0xaf:
781328ac mov     edx,dword ptr [ecx+58h]
781328af mov     dword ptr [eax+58h],edx
781328b2 mov     edx,dword ptr [ecx+4]
781328b5 push    ecx
781328b6 mov     dword ptr [eax+4],edx
781328b9 call    msvcr80!_freefls (78132e41)
781328be call    msvcr80!_initp_misc_winxfltr (781493c1)
781328c3 call    msvcr80!_endthread+0×30 (7813282d)

0:001> ub BaseThreadStart+0x34
kernel32!BaseThreadStart+0x10:
7d4dfdfd mov     eax,dword ptr fs:[00000018h]
7d4dfe03 cmp     dword ptr [eax+10h],1E00h
7d4dfe0a jne     kernel32!BaseThreadStart+0x2e (7d4dfe1b)
7d4dfe0c cmp     byte ptr [kernel32!BaseRunningInServerProcess (7d560008)],0
7d4dfe13 jne     kernel32!BaseThreadStart+0x2e (7d4dfe1b)
7d4dfe15 call    dword ptr [kernel32!_imp__CsrNewThread (7d4d0310)]
7d4dfe1b push    dword ptr [ebp+0Ch]
7d4dfe1e call    dword ptr [ebp+8]

Now we can use extended version of k command and supply custom EBP, ESP and EIP values. We set EBP to the first found address of EBP:PreviousEBP pair and set EIP to 0:

0:001> k L=0069ff60 0069ff60 0
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
0069ff5c 0069ff70 0x0
0069ff60 0040188f 0x69ff70
0069ff70 004018ac WrongIP!internal_func_1+0x1f
0069ff7c 78132848 WrongIP!thread_request+0xc
0069ffb4 781328c8 msvcr80!_endthread+0x4b
0069ffb8 7d4dfe21 msvcr80!_endthread+0xcb
0069ffec 00000000 kernel32!BaseThreadStart+0x34

The stack trace looks good because it also shows BaseThreadStart.

From the backwards disassembly of the return address WrongIP!internal_func_1+0×1f we see that internal_func_1 calls internal_func_2 so we can disassemble the latter function:

0:001> uf internal_func_2
Flow analysis was incomplete, some code may be missing
WrongIP!internal_func_2:
   28 004017e0 push    ebp
   28 004017e1 mov     ebp,esp

   28 004017e3 sub     esp,8
   29 004017e6 mov     eax,dword ptr [ebp+8]
   29 004017e9 mov     dword ptr [ebp-4],eax

   30 004017ec mov     ecx,dword ptr [ebp+0Ch]
   30 004017ef mov     dword ptr [ebp-8],ecx
   32 004017f2 push    offset WrongIP!GS_ExceptionPointers+0×28 (00402114)
   32 004017f7 call    dword ptr [WrongIP!_imp__puts (004020ac)]
   32 004017fd add     esp,4
   33 00401800 push    8
   33 00401802 push    offset WrongIP!GS_ExceptionPointers+0×8 (004020f4)
   33 00401807 lea     edx,[ebp-8]
   33 0040180a push    edx
   33 0040180b call    WrongIP!memcpy (00401010)
   33 00401810 add     esp,0Ch
   35 00401813 push    dword ptr [ebp-8]
   36 00401816 mov     eax,dword ptr [ebp-4]
   37 00401819 mov     ebp,0
   38 0040181e jmp     eax

We see that it takes some value from [ebp-8], puts it into EAX and then jumps to that address. The function uses standard prolog (in blue) and therefore EBP-4 is the local variable. From the code we see that it comes from [EBP+8] which is the first function parameter:

EBP+C: second parameter
EBP+8: first parameter
EBP+4: return address
EBP:   previous EBP
EBP-4: local variable
EBP-8: local variable

If we examine the first parameter we would see it is a valid function address that we were supposed to call:

0:001> kv L=0069ff60 0069ff60 0
ChildEBP RetAddr  Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
0069ff5c 0069ff70 0040188f 00401830 00401850 0x0
0069ff60 0040188f 00401830 00401850 0069ff7c 0x69ff70
0069ff70 004018ac 00401830 0069ffb4 78132848 WrongIP!internal_func_1+0×1f
0069ff7c 78132848 00401830 6d5ba283 00000000 WrongIP!thread_request+0xc
0069ffb4 781328c8 7d4dfe21 00595620 00000000 msvcr80!_endthread+0×4b
0069ffb8 7d4dfe21 00595620 00000000 00000000 msvcr80!_endthread+0xcb
0069ffec 00000000 7813286e 00595620 00000000 kernel32!BaseThreadStart+0×34

0:001> u 00401830
WrongIP!jump_func:
00401830 push    ebp
00401831 mov     ebp,esp
00401833 push    offset WrongIP!GS_ExceptionPointers+0x1c (00402108)
00401838 call    dword ptr [WrongIP!_imp__puts (004020ac)]
0040183e add     esp,4
00401841 pop     ebp
00401842 ret
00401843 int     3

However if we look at the code we would see that we call memcpy with ebp-8 address and the number of bytes to copy is 8. In pseudo-code it would look like:

memcpy(ebp-8, 004020f4, 8);

   33 00401800 push    8
   33 00401802 push    offset WrongIP!GS_ExceptionPointers+0x8 (004020f4)
   33 00401807 lea     edx,[ebp-8]
   33 0040180a push    edx
   33 0040180b call    WrongIP!memcpy (00401010)
   33 00401810 add     esp,0Ch

If we examine 004020f4 address we would see that it contains 8 zeroes:

0:001> db 004020f4 l8
004020f4  00 00 00 00 00 00 00 00
       

Therefore memcpy overwrites our local variables that contain a jump address with zeroes. This explains why we have jumped to 0 address and why EIP was zero.

Finally our reconstructed stack trace looks like this:

WrongIP!internal_func_2+offset ; here we jump
WrongIP!internal_func_1+0x1f
WrongIP!thread_request+0xc
msvcr80!_endthread+0x4b
msvcr80!_endthread+0xcb
kernel32!BaseThreadStart+0x34

This was based on the fact that ESP was valid. If we have a zero or invalid ESP we can look at the entire raw stack range from thread environment block (TEB). Use !teb command to get thread stack range. In my example this command doesn’t work due to the lack of proper MS symbols but it reports TEB address and we can dump it:

0:001> !teb
TEB at 7efda000
error InitTypeRead( TEB )...

0:001> dd 7efda000 l3
7efda000 0069ffa4 006a0000 0069e000

Usually the second double word is the stack limit and the third is the stack base address so we can dump the range and start reconstructing stack trace for our example from the bottom of the stack (BaseThreadStart) or look after exception handling calls (shown in red):

0:001> dds 0069e000 006a0000
0069e000  00000000
0069e004  00000000
...
...
...
0069fb24  7d535b43 kernel32!UnhandledExceptionFilter+0×851



0069fbb0  0069fc20
0069fbb4  7d6354c9 ntdll!RtlDispatchException+0×11f
0069fbb8  0069fc38
0069fbbc  0069fc88
0069fc1c  00000000
0069fc20  00000000
0069fc24  7d61dd26 ntdll!NtRaiseException+0×12
0069fc28  7d61ea51 ntdll!KiUserExceptionDispatcher+0×29
0069fc2c  0069fc38



0069ff38  00000000
0069ff3c  00000000
0069ff40  00000000
0069ff44  00000000
0069ff48  00000000
0069ff4c  00000000
0069ff50  00000000
0069ff54  00000000
0069ff58  00000000
0069ff5c  00000000
0069ff60  0069ff70
0069ff64  0040188f WrongIP!internal_func_1+0×1f
0069ff68  00401830 WrongIP!jump_func
0069ff6c  00401850 WrongIP!return_func
0069ff70  0069ff7c
0069ff74  004018ac WrongIP!thread_request+0xc
0069ff78  00401830 WrongIP!jump_func
0069ff7c  0069ffb4
0069ff80  78132848 msvcr80!_endthread+0×4b
0069ff84  00401830 WrongIP!jump_func
0069ff88  6d5ba283
0069ff8c  00000000
0069ff90  00000000
0069ff94  00595620
0069ff98  c0000005
0069ff9c  0069ff88
0069ffa0  0069fb34
0069ffa4  0069ffdc
0069ffa8  78138cd9 msvcr80!_except_handler4
0069ffac  152916af
0069ffb0  00000000
0069ffb4  0069ffec
0069ffb8  781328c8 msvcr80!_endthread+0xcb
0069ffbc  7d4dfe21 kernel32!BaseThreadStart+0×34
0069ffc0  00595620
0069ffc4  00000000


- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 19)

Sunday, July 22nd, 2007

Almost all threads in any system are waiting for resources or waiting in a ready-to-run queues to be scheduled. At any moment of time the number of running threads is equal to the number of processors. The rest, hundreds and thousands of threads, are waiting. Looking at their waiting times in kernel and complete memory dumps provides some interesting observations that worth their own pattern name: Waiting Thread Time.

When a thread starts waiting that time is recorded in WaitTime field of _KTHREAD structure:

1: kd> dt _KTHREAD 8728a020
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 MutantListHead   : _LIST_ENTRY [ 0x8728a030 - 0x8728a030 ]
   +0x018 InitialStack     : 0xa3a1f000
   +0x01c StackLimit       : 0xa3a1a000
   +0x020 KernelStack      : 0xa3a1ec08
   +0x024 ThreadLock       : 0
   +0x028 ApcState         : _KAPC_STATE
   +0x028 ApcStateFill     : [23]  "H???"
   +0x03f ApcQueueable     : 0x1 ''
   +0x040 NextProcessor    : 0x3 ''
   +0x041 DeferredProcessor : 0x3 ''
   +0x042 AdjustReason     : 0 ''
   +0x043 AdjustIncrement  : 1 ''
   +0x044 ApcQueueLock     : 0
   +0x048 ContextSwitches  : 0x6b7
   +0x04c State            : 0x5 ''
   +0x04d NpxState         : 0xa ''
   +0x04e WaitIrql         : 0 ''
   +0x04f WaitMode         : 1 ''
   +0x050 WaitStatus       : 0
   +0x054 WaitBlockList    : 0x8728a0c8 _KWAIT_BLOCK
   +0x054 GateObject       : 0x8728a0c8 _KGATE
   +0x058 Alertable        : 0 ''
   +0x059 WaitNext         : 0 ''
   +0x05a WaitReason       : 0x11 ''
   +0x05b Priority         : 12 ''
   +0x05c EnableStackSwap  : 0x1 ''
   +0x05d SwapBusy         : 0 ''
   +0x05e Alerted          : [2]  ""
   +0x060 WaitListEntry    : _LIST_ENTRY [ 0x88091e10 - 0x88029ce0 ]
   +0x060 SwapListEntry    : _SINGLE_LIST_ENTRY
   +0x068 Queue            : (null)
   +0×06c WaitTime         : 0×82de9b
   +0×070 KernelApcDisable : 0


This value is also displayed in decimal format as Wait Start TickCount when you list threads or use !thread command:

0: kd> ? 0x82de9b
Evaluate expression: 8576667 = 0082de9b

1: kd> !thread 8728a020
THREAD 8728a020  Cid 4c9c.59a4  Teb: 7ffdf000 Win32Thread: bc012008 WAIT: (Unknown) UserMode Non-Alertable
    8728a20c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 017db413:
Current LPC port e5fcff68
Impersonation token:  e2b07028 (Level Impersonation)
DeviceMap                 e1da6518
Owning Process            89d20740       Image:         winlogon.exe
Wait Start TickCount      8576667        Ticks: 7256 (0:00:01:53.375)
Context Switch Count      1719                 LargeStack
UserTime                  00:00:00.0359
KernelTime                00:00:00.0375

Tick is a system unit of time and KeTimeIncrement double word value contains its equivalent as the number of 100-nanosecond units:

0: kd> dd KeTimeIncrement l1
808a6304  0002625a

0: kd> ? 0002625a
Evaluate expression: 156250 = 0002625a

0: kd> ?? 156250.0/10000000.0
double 0.015625

Therefore on that system one tick is 0.015625 of a second.

The current tick count is available via KeTickCount variable:

0: kd> dd KeTickCount l1
8089c180  0082faf3

If we subtract the recorded start wait time from the current tick count we get the number of ticks passed since the thread began waiting:

0: kd> ? 0082faf3-82de9b
Evaluate expression: 7256 = 00001c58

Using our previously calculated constant of the number of seconds per tick (0.015625) we get the number of seconds passed:

0: kd> ?? 7256.0 * 0.015625
double 113.37499999999999

113.375 seconds is 1 minute 53 seconds and 375 milliseconds:

0: kd> ?? 113.375-60.0
double 53.374999999999986

We can see that this value corresponds to Ticks value that WinDbg shows for the thread:

Wait Start TickCount 8576667 Ticks: 7256 (0:00:01:53.375)

Why do we need to concern ourselves with these ticks? If we know that some activity was frozen for 15 minutes we can filter out threads from our search space because threads with significantly less number of ticks were running at some time and were not waiting for 15 minutes.

Threads with low number of ticks were running recently:

THREAD 86ced020  Cid 0004.3908  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
    b99cb7d0  QueueObject
    86ced098  NotificationTimer
Not impersonating
DeviceMap                 e10038e0
Owning Process            8ad842a8       Image:         System
Wait Start TickCount      8583871        Ticks: 52 (0:00:00:00.812)
Context Switch Count      208
UserTime                  00:00:00.0000
KernelTime                00:00:00.0000
Start Address rdbss!RxWorkerThreadDispatcher (0xb99cdc2e)
Stack Init ad21d000 Current ad21ccd8 Base ad21d000 Limit ad21a000 Call 0
Priority 8 BasePriority 8 PriorityDecrement
ChildEBP RetAddr
ad21ccf0 808330c6 nt!KiSwapContext+0×26
ad21cd1c 8082af7f nt!KiSwapThread+0×284
ad21cd64 b99c00e9 nt!KeRemoveQueue+0×417
ad21cd9c b99cdc48 rdbss!RxpWorkerThreadDispatcher+0×4b
ad21cdac 80948e74 rdbss!RxWorkerThreadDispatcher+0×1a
ad21cddc 8088d632 nt!PspSystemThreadStartup+0×2e
00000000 00000000 nt!KiThreadStartup+0×16

Another application would be to find all threads from different processes whose wait time roughly corresponds to 15 minutes and therefore they might be related to the same frozen activity. For example, these RPC threads below from different processes are most likely related because one is the RPC client thread, the other is the RPC server thread waiting for some object and their common Ticks value is the same: 15131.

THREAD 89cc9750  Cid 0f1c.0f60  Teb: 7ffd6000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    89cc993c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 0000a7e7:
Current LPC port e18fcae8
Not impersonating
DeviceMap                 e10018a8
Owning Process            88d3b938       Image:         svchost.exe
Wait Start TickCount      29614          Ticks: 15131 (0:00:03:56.421)
Context Switch Count      45
UserTime                  00:00:00.0000
KernelTime                00:00:00.0000
Win32 Start Address 0×0000a7e6
LPC Server thread working on message Id a7e6
Start Address kernel32!BaseThreadStartThunk (0×7c82b5bb)
Stack Init f29a6000 Current f29a5c08 Base f29a6000 Limit f29a3000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr
f29a5c20 80832f7a nt!KiSwapContext+0×26
f29a5c4c 8082927a nt!KiSwapThread+0×284
f29a5c94 8091df86 nt!KeWaitForSingleObject+0×346
f29a5d50 80888c6c nt!NtRequestWaitReplyPort+0×776
f29a5d50 7c94ed54 nt!KiFastCallEntry+0xfc
0090f6b8 7c941c94 ntdll!KiFastSystemCallRet
0090f6bc 77c42700 ntdll!NtRequestWaitReplyPort+0xc
0090f708 77c413ba RPCRT4!LRPC_CCALL::SendReceive+0×230
0090f714 77c42c7f RPCRT4!I_RpcSendReceive+0×24
0090f728 77cb5d63 RPCRT4!NdrSendReceive+0×2b
0090f9cc 67b610ca RPCRT4!NdrClientCall+0×334
0090f9dc 67b61c07 component!NotifyOfEvent+0×14



0090ffec 00000000 kernel32!BaseThreadStart+0×34

THREAD 89b49590  Cid 098c.01dc  Teb: 7ff92000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    88c4e020  Thread
    89b49608  NotificationTimer
Not impersonating
DeviceMap                 e10018a8
Owning Process            89d399c0       Image:         MyService.exe
Wait Start TickCount      29614          Ticks: 15131 (0:00:03:56.421)
Context Switch Count      310
UserTime                  00:00:00.0015
KernelTime                00:00:00.0000
Win32 Start Address 0×0000a7e7
LPC Server thread working on message Id a7e7
Start Address kernel32!BaseThreadStartThunk (0×7c82b5bb)
Stack Init f2862000 Current f2861c60 Base f2862000 Limit f285f000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr
f2861c78 80832f7a nt!KiSwapContext+0×26
f2861ca4 8082927a nt!KiSwapThread+0×284
f2861cec 80937e4c nt!KeWaitForSingleObject+0×346
f2861d50 80888c6c nt!NtWaitForSingleObject+0×9a
f2861d50 7c94ed54 nt!KiFastCallEntry+0xfc
0a6cf590 7c942124 ntdll!KiFastSystemCallRet
0a6cf594 7c82baa8 ntdll!NtWaitForSingleObject+0xc
0a6cf604 7c82ba12 kernel32!WaitForSingleObjectEx+0xac
0a6cf618 3f691c11 kernel32!WaitForSingleObject+0×12
0a6cf658 09734436 component2!WaitForResponse+0×75



0a6cf8b4 77cb23f7 RPCRT4!Invoke+0×30
0a6cfcb4 77cb26ed RPCRT4!NdrStubCall2+0×299
0a6cfcd0 77c409be RPCRT4!NdrServerCall2+0×19
0a6cfd04 77c4093f RPCRT4!DispatchToStubInCNoAvrf+0×38
0a6cfd58 77c40865 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×117
0a6cfd7c 77c357eb RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
0a6cfdbc 77c41e26 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject+0xc0
0a6cfdfc 77c41bb3 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
0a6cfe20 77c45458 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
0a6cff84 77c2778f RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
0a6cff8c 77c2f7dd RPCRT4!RecvLotsaCallsWrapper+0xd
0a6cffac 77c2de88 RPCRT4!BaseCachedThreadRoutine+0×9d
0a6cffb8 7c826063 RPCRT4!ThreadStartRoutine+0×1b
0a6cffec 00000000 kernel32!BaseThreadStart+0×34

- Dmitry Vostokov @ DumpAnalysis.org -

Interrupts and exceptions explained (Part 5)

Friday, July 20th, 2007

In this part I’ll show how to simulate .trap WinDbg command when you have x64 Windows kernel and complete memory dumps. 

When you have a fault an x64 processor saves some registers on the current thread stack as explained previously in Part 2. Then an interrupt handler saves _KTRAP_FRAME on the stack:

6: kd> uf nt!KiPageFault
nt!KiPageFault:
fffff800`0102d400 push    rbp
fffff800`0102d401 sub     rsp,158h
fffff800`0102d408 lea     rbp,[rsp+80h]
fffff800`0102d410 mov     byte ptr [rbp-55h],1
fffff800`0102d414 mov     qword ptr [rbp-50h],rax
fffff800`0102d418 mov     qword ptr [rbp-48h],rcx
fffff800`0102d41c mov     qword ptr [rbp-40h],rdx
fffff800`0102d420 mov     qword ptr [rbp-38h],r8
fffff800`0102d424 mov     qword ptr [rbp-30h],r9
fffff800`0102d428 mov     qword ptr [rbp-28h],r10
fffff800`0102d42c mov     qword ptr [rbp-20h],r11
...
...
...

6: kd> dt _KTRAP_FRAME
   +0x000 P1Home           : Uint8B
   +0x008 P2Home           : Uint8B
   +0x010 P3Home           : Uint8B
   +0x018 P4Home           : Uint8B
   +0x020 P5               : Uint8B
   +0x028 PreviousMode     : Char
   +0x029 PreviousIrql     : UChar
   +0x02a FaultIndicator   : UChar
   +0x02b ExceptionActive  : UChar
   +0x02c MxCsr            : Uint4B
   +0x030 Rax              : Uint8B
   +0x038 Rcx              : Uint8B
   +0x040 Rdx              : Uint8B
   +0x048 R8               : Uint8B
   +0x050 R9               : Uint8B
   +0x058 R10              : Uint8B
   +0x060 R11              : Uint8B
   +0x068 GsBase           : Uint8B
   +0x068 GsSwap           : Uint8B
   +0x070 Xmm0             : _M128A
   +0x080 Xmm1             : _M128A
   +0x090 Xmm2             : _M128A
   +0x0a0 Xmm3             : _M128A
   +0x0b0 Xmm4             : _M128A
   +0x0c0 Xmm5             : _M128A
   +0x0d0 FaultAddress     : Uint8B
   +0x0d0 ContextRecord    : Uint8B
   +0x0d0 TimeStamp        : Uint8B
   +0x0d8 Dr0              : Uint8B
   +0x0e0 Dr1              : Uint8B
   +0x0e8 Dr2              : Uint8B
   +0x0f0 Dr3              : Uint8B
   +0x0f8 Dr6              : Uint8B
   +0x100 Dr7              : Uint8B
   +0x108 DebugControl     : Uint8B
   +0x110 LastBranchToRip  : Uint8B
   +0x118 LastBranchFromRip : Uint8B
   +0x120 LastExceptionToRip : Uint8B
   +0x128 LastExceptionFromRip : Uint8B
   +0x108 LastBranchControl : Uint8B
   +0x110 LastBranchMSR    : Uint4B
   +0x130 SegDs            : Uint2B
   +0x132 SegEs            : Uint2B
   +0x134 SegFs            : Uint2B
   +0x136 SegGs            : Uint2B
   +0x138 TrapFrame        : Uint8B
   +0x140 Rbx              : Uint8B
   +0x148 Rdi              : Uint8B
   +0x150 Rsi              : Uint8B
   +0x158 Rbp              : Uint8B
   +0×160 ErrorCode        : Uint8B
   +0×160 ExceptionFrame   : Uint8B
   +0×168 Rip              : Uint8B
   +0×170 SegCs            : Uint2B
   +0×172 Fill1            : [3] Uint2B
   +0×178 EFlags           : Uint4B
   +0×17c Fill2            : Uint4B
   +0×180 Rsp              : Uint8B
   +0×188 SegSs            : Uint2B
   +0×18a Fill3            : [1] Uint2B
   +0×18c CodePatchCycle   : Int4B

Unfortunately the technique to use DS and ES pair to find the trap frame in x86 Windows crash dump doesn’t work here because KiPageFault interrupt handler doesn’t save them as can be found by inspecting its disassembly. Fortunately the registers that an x64 processor pushes upon an interrupt are part of _KTRAP_FRAME shown in blue above. Fill1, Fill2, Fill3 and CodePatchCycle are just dummy values to fill 64-bit slots because CS and SS are 16-bit registers and in 64-bit RFLAGS only the first 32-bit EFLAGS part is currently used. Remember that a processor in 64-bit mode pushes 64-bit values even if values occupy only 16 or 32-bit. Therefore we can try to find CS and SS on the stack because they have the following constant values:

6: kd> r cs
cs=0010
6: kd> r ss
ss=0018

6: kd> k
Child-SP          RetAddr           Call Site
fffffadc`6e02b9e8 fffff800`013731b1 nt!KeBugCheckEx



fffffadc`6e02cd70 fffff800`010202d6 nt!PspSystemThreadStartup+0×3e
fffffadc`6e02cdd0 00000000`00000000 nt!KxStartSystemThread+0×16

6: kd> dqs fffffadc`6e02b9e8 fffffadc`6e02cd70
...
...
...
fffffadc`6e02c938 fffff800`0102d5e1 nt!KiPageFault+0x1e1
...
...
...
fffffadc`6e02ca70  fffff97f`f3937a8c
fffffadc`6e02ca78  fffff97f`ff57d28b driver+0x3028b
fffffadc`6e02ca80  00000000`00000000
fffffadc`6e02ca88  fffff97f`f3937030
fffffadc`6e02ca90  fffff97f`ff5c2990 driver+0x75990
fffffadc`6e02ca98  00000000`00000000
fffffadc`6e02caa0  00000000`00000000 ; ErrorCode
fffffadc`6e02caa8  fffff97f`ff591ed3 driver+0x44ed3 ; RIP
fffffadc`6e02cab0  00000000`00000010 ; CS
fffffadc`6e02cab8  00000000`00010282 ; RFLAGS
fffffadc`6e02cac0  fffffadc`6e02cad0 ; RSP
fffffadc`6e02cac8  00000000`00000018 ; SS
fffffadc`6e02cad0  fffff97f`f382b0e0
fffffadc`6e02cad8  fffffadc`6e02cbd0
fffffadc`6e02cae0  fffff97f`f3937a8c
fffffadc`6e02cae8  fffff97f`f3937030
fffffadc`6e02caf0  00000000`00000000
fffffadc`6e02caf8  00000000`00000001


Now we can calculate the trap frame address by subtracting SegSs offset in _KTRAP_FRAME structure (0×188) from fffffadc`6e02cac8 address:

6: kd> ? fffffadc`6e02cac8-188
Evaluate expression: -5650331285184 = fffffadc`6e02c940

6: kd> .trap fffffadc`6e02c940
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed.
rax=fffffadcdac27298 rbx=0000000000000000 rcx=fffffadcdb45a4c0
rdx=0000000000000555 rsi=fffff97fff5c2990 rdi=fffff97ff3937030
rip=fffff97fff591ed3 rsp=fffffadc6e02cad0 rbp=0000000000000000
 r8=fffffadcdac27250  r9=fffff97ff3824030 r10=0000000000000020
r11=fffffadcdac27250 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe nc
driver+0x44ed3:
fffff97f`ff591ed3 0fb74514  movzx eax,word ptr [rbp+14h] ss:0018:00000000`00000014=????

6: kd> k
Child-SP          RetAddr           Call Site
fffffadc`6e02cad0 fffff97f`ff5935f7 driver+0x44ed3
fffffadc`6e02cc40 fffff800`0124b972 driver+0x465f7
fffffadc`6e02cd70 fffff800`010202d6 nt!PspSystemThreadStartup+0x3e
fffffadc`6e02cdd0 00000000`00000000 nt!KxStartSystemThread+0x16

Our example shows how to find a trap frame manually in x64 kernel or complete memory dump. Usually WinDbg finds trap frames automatically (call arguments are removed from the verbose stack trace for clarity):

6: kd> kv
Child-SP          RetAddr           Call Site
fffffadc`6e02b9e8 fffff800`013731b1 nt!KeBugCheckEx
fffffadc`6e02b9f0 fffff800`010556ab nt!PspSystemThreadStartup+0x270
fffffadc`6e02ba40 fffff800`010549fd nt!_C_specific_handler+0x9b
fffffadc`6e02bad0 fffff800`01054f93 nt!RtlpExecuteHandlerForException+0xd
fffffadc`6e02bb00 fffff800`0100b901 nt!RtlDispatchException+0x2c0
fffffadc`6e02c1c0 fffff800`0102e76f nt!KiDispatchException+0xd9
fffffadc`6e02c7c0 fffff800`0102d5e1 nt!KiExceptionExit
fffffadc`6e02c940 fffff97f`ff591ed3 nt!KiPageFault+0x1e1 (TrapFrame @ fffffadc`6e02c940)
fffffadc`6e02cad0 fffff97f`ff5935f7 driver+0×44ed3
fffffadc`6e02cc40 fffff800`0124b972 driver+0×465f7
fffffadc`6e02cd70 fffff800`010202d6 nt!PspSystemThreadStartup+0×3e
fffffadc`6e02cdd0 00000000`00000000 nt!KxStartSystemThread+0×16

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 18)

Friday, July 20th, 2007

Sometimes the page file size is less than the amount of physical memory. If this is the case and we have configured “Complete memory dump” in Startup and Recovery settings in Control Panel we get truncated dumps. Therefore we can call our next pattern “Truncated Dump”. WinDbg prints a warning when we open such dump:

************************************************************
WARNING: Dump file has been truncated.  Data may be missing.
************************************************************

We can double check this with !vm command:

kd> !vm

*** Virtual Memory Usage ***
       Physical Memory:      511859 (   2047436 Kb)
       Paging File Name paged out
         Current:   1536000 Kb  Free Space:   1522732 Kb
         Minimum:   1536000 Kb  Maximum:      1536000 Kb

We see that the page file size is 1.5Gb but the amount of physical memory is 2Gb. When BSOD happens the physical memory contents will be saved to the page file and the dump file size will be no more than 1.5Gb effectively truncating data needed for crash dump analysis.

Sometimes you can still access some data in truncated dumps but pay attention to what WinDbg says. For example, in the truncated dump shown above the stack and driver code are not available:

kd> kv
ChildEBP RetAddr  Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
f408b004 00000000 00000000 00000000 00000000 driver+0x19237

kd> r
Last set context:
eax=89d55230 ebx=89d21130 ecx=89d21130 edx=89c8cc20 esi=89e24ac0 edi=89c8cc20
eip=f7242237 esp=f408afec ebp=f408b004 iopl=0 nv up ei ng nz ac po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010292
driver+0x19237:
f7242237 ??              ???

kd> dds esp
f408afec  ????????
f408aff0  ????????
f408aff4  ????????
f408aff8  ????????
f408affc  ????????
f408b000  ????????
f408b004  ????????
f408b008  ????????
f408b00c  ????????
f408b010  ????????
f408b014  ????????
f408b018  ????????
f408b01c  ????????
f408b020  ????????
f408b024  ????????
f408b028  ????????
f408b02c  ????????
f408b030  ????????
f408b034  ????????
f408b038  ????????
f408b03c  ????????
f408b040  ????????
f408b044  ????????
f408b048  ????????
f408b04c  ????????
f408b050  ????????
f408b054  ????????
f408b058  ????????
f408b05c  ????????
f408b060  ????????
f408b064  ????????
f408b068  ????????

kd> lmv m driver
start    end        module name
f7229000 f725f000   driver     T (no symbols)
    Loaded symbol image file: driver.sys
    Image path: driver.sys
    Image name: driver.sys
    Timestamp:        unavailable (FFFFFFFE)
    CheckSum:         missing
    ImageSize:        00036000

kd> dd f7229000
f7229000  ???????? ???????? ???????? ????????
f7229010  ???????? ???????? ???????? ????????
f7229020  ???????? ???????? ???????? ????????
f7229030  ???????? ???????? ???????? ????????
f7229040  ???????? ???????? ???????? ????????
f7229050  ???????? ???????? ???????? ????????
f7229060  ???????? ???????? ???????? ????????
f7229070  ???????? ???????? ???????? ????????

If due to some reasons you cannot increase the size of your page file then just configure “Kernel memory dump” in Startup and Recovery. For most all bugchecks kernel memory dump is sufficient except manual crash dumps when you need to inspect user process space.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 17)

Friday, July 20th, 2007

.NET programs also crash either from defects in .NET runtime (Common Language Runtime, CLR) or from non-handled runtime exceptions in managed code executed by .NET virtual machine. The latter exceptions are re-thrown from .NET runtime to be handled by operating system and intercepted by native debuggers. Therefore our next crash dump analysis pattern is called Managed Code Exception

When you get a dump from .NET application it is the dump from a native process. !analyze -v output can usually tell you that exception is actually CLR exception and give you other hints to look at managed code stack (CLR stack):

FAULTING_IP:
kernel32!RaiseException+53
77e4bee7 5e              pop     esi

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 77e4bee7 (kernel32!RaiseException+0x00000053)
   ExceptionCode: e0434f4d (CLR exception)
   ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 80131604

DEFAULT_BUCKET_ID:  CLR_EXCEPTION

PROCESS_NAME:  mmc.exe

ERROR_CODE: (NTSTATUS) 0xe0434f4d - <Unable to get error code text>

MANAGED_STACK: !dumpstack -EE
No export dumpstack found

STACK_TEXT:
05faf3d8 79f97065 e0434f4d 00000001 00000001 kernel32!RaiseException+0x53
WARNING: Stack unwind information not available. Following frames may be wrong.
05faf438 7a0945a4 023f31e0 00000000 00000000 mscorwks!DllCanUnloadNowInternal+0×37a9
05faf4fc 00f2f00a 02066be4 02085ee8 023d0df0 mscorwks!CorLaunchApplication+0×12005
05faf500 02066be4 02085ee8 023d0df0 023d0e2c 0xf2f00a
05faf504 02085ee8 023d0df0 023d0e2c 05e00dfa 0×2066be4
05faf508 023d0df0 023d0e2c 05e00dfa 023d0e10 0×2085ee8
05faf50c 023d0e2c 05e00dfa 023d0e10 05351d30 0×23d0df0
05faf510 05e00dfa 023d0e10 05351d30 023d0e10 0×23d0e2c

FOLLOWUP_IP:
mscorwks!DllCanUnloadNowInternal+37a9
79f97065 c745fcfeffffff  mov     dword ptr [ebp-4],0FFFFFFFEh

SYMBOL_NAME:  mscorwks!DllCanUnloadNowInternal+37a9

MODULE_NAME: mscorwks

IMAGE_NAME:  mscorwks.dll

PRIMARY_PROBLEM_CLASS:  CLR_EXCEPTION

BUGCHECK_STR:  APPLICATION_FAULT_CLR_EXCEPTION

Sometimes you can see mscorwks.dll on raw stack or see it loaded and find it on other thread stacks than the current one.

When you get such hints you might want to get managed code stack as well. First you need to load the appropriate WinDbg SOS extension (Son of Strike) corresponding to .NET runtime version. This can be done by the following command:

0:015> .loadby sos mscorwks

You can check which SOS extension version was loaded this by using .chain command:

0:015> .chain
Extension DLL search Path:
...
...
...
Extension DLL chain:
    C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos: image 2.0.50727.42, API 1.0.0, built Fri Sep 23 08:27:26 2005
        [path: C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll]

    dbghelp: image 6.6.0007.5, API 6.0.6, built Sat Jul 08 21:11:32 2006
        [path: C:\Program Files\Debugging Tools for Windows\dbghelp.dll]
    ext: image 6.6.0007.5, API 1.0.0, built Sat Jul 08 21:10:52 2006
        [path: C:\Program Files\Debugging Tools for Windows\winext\ext.dll]
    exts: image 6.6.0007.5, API 1.0.0, built Sat Jul 08 21:10:48 2006
        [path: C:\Program Files\Debugging Tools for Windows\WINXP\exts.dll]
    uext: image 6.6.0007.5, API 1.0.0, built Sat Jul 08 21:11:02 2006
        [path: C:\Program Files\Debugging Tools for Windows\winext\uext.dll]
    ntsdexts: image 6.0.5457.0, API 1.0.0, built Sat Jul 08 21:29:38 2006
        [path: C:\Program Files\Debugging Tools for Windows\WINXP\ntsdexts.dll]

Then you can use !dumpstack to dump the current stack or !EEStack command to dump all thread stacks. The native stack trace would be mixed with managed stack trace:

0:015> !dumpstack
OS Thread Id: 0x16e8 (15)
Current frame: kernel32!RaiseException+0x53
ChildEBP RetAddr Caller,Callee
05faf390 77e4bee7 kernel32!RaiseException+0x53, calling ntdll!RtlRaiseException
05faf3a8 79e814da mscorwks!Binder::RawGetClass+0x23, calling mscorwks!Module::LookupTypeDef
05faf3bc 79e87ff4 mscorwks!Binder::IsClass+0x21, calling mscorwks!Binder::RawGetClass
05faf3c8 79f958b8 mscorwks!Binder::IsException+0x13, calling mscorwks!Binder::IsClass
05faf3d8 79f97065 mscorwks!RaiseTheExceptionInternalOnly+0x226, calling kernel32!RaiseException
05faf438 7a0945a4 mscorwks!JIT_Throw+0xd0, calling mscorwks!RaiseTheExceptionInternalOnly
05faf4ac 7a0944ea mscorwks!JIT_Throw+0x1e, calling mscorwks!LazyMachStateCaptureState
05faf4c8 793d424e (MethodDesc 0x7924ad68 +0x2e System.Threading.WaitHandle.WaitOne(Int64, Boolean)), calling mscorwks!WaitHandleNative::CorWaitOneNative
05faf4fc 00f2f00a (MethodDesc 0x4f97500 +0x9a Ironring.Management.MMC.SnapinBase+MmcWindow.Invoke(System.Delegate, System.Object[])), calling mscorwks!JIT_Throw
05faf510 05e00dfa (MethodDesc 0×4f98fd8 +0xca MyNamespace.MyClass.MyMethod(Boolean)), calling 05fc7124
05faf55c 00f62fbc (MethodDesc 0×4f95f90 +0×16f4 MyNamespace.MyClass.MyMethod.Initialise(System.Object))

05faf740 793d912f (MethodDesc 0×7925fc70 +0×2f System.Threading._ThreadPoolWaitCallback.WaitCallback_Context(System.Object))
05faf748 793683dd (MethodDesc 0×7913f3d0 +0×81 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object))
05faf75c 793d9218 (MethodDesc 0×7925fc80 +0×6c System.Threading._ThreadPoolWaitCallback.PerformWaitCallback(System.Object)), calling (MethodDesc 0×7913f3d0 +0 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object))
05faf774 79e88f63 mscorwks!CallDescrWorker+0×33
05faf784 79e88ee4 mscorwks!CallDescrWorkerWithHandler+0xa3, calling mscorwks!CallDescrWorker
05faf804 79f20212 mscorwks!DispatchCallBody+0×1e, calling mscorwks!CallDescrWorkerWithHandler
05faf824 79f201bc mscorwks!DispatchCallDebuggerWrapper+0×3d, calling mscorwks!DispatchCallBody
05faf888 79f2024b mscorwks!DispatchCallNoEH+0×51, calling mscorwks!DispatchCallDebuggerWrapper
05faf8bc 7a07bdf0 mscorwks!Holder,2>::~Holder,2>+0xbb, calling mscorwks!DispatchCallNoEH
05faf90c 77e61d1e kernel32!WaitForSingleObjectEx+0xac, calling ntdll!ZwWaitForSingleObject
05faf91c 79ecb4a4 mscorwks!Thread::UserResumeThread+0xfb
05faf92c 79ecb442 mscorwks!Thread::DoADCallBack+0×355, calling mscorwks!Thread::UserResumeThread+0xae
05faf950 79e74afe mscorwks!Thread::EnterRuntimeNoThrow+0×9b, calling mscorwks!_EH_epilog3
05faf988 79e77fe8 mscorwks!PEImage::LoadImage+0×1e1, calling mscorwks!_SEH_epilog4
05faf9c0 79ecb364 mscorwks!Thread::DoADCallBack+0×541, calling mscorwks!Thread::DoADCallBack+0×2a5
05faf9fc 7a0e1b7e mscorwks!Thread::DoADCallBack+0×575, calling mscorwks!Thread::DoADCallBack+0×4d4
05fafa24 7a0e1bab mscorwks!ManagedThreadBase::ThreadPool+0×13, calling mscorwks!Thread::DoADCallBack+0×550
05fafa38 7a07cae8 mscorwks!QueueUserWorkItemCallback+0×9d, calling mscorwks!ManagedThreadBase::ThreadPool
05fafa54 7a07ca48 mscorwks!QueueUserWorkItemCallback, calling mscorwks!UnwindAndContinueRethrowHelperAfterCatch
05fafa90 7a110f08 mscorwks!ThreadpoolMgr::ExecuteWorkRequest+0×40
05fafaa8 7a112328 mscorwks!ThreadpoolMgr::WorkerThreadStart+0×1f2, calling mscorwks!ThreadpoolMgr::ExecuteWorkRequest
05fafad0 79e7839d mscorwks!EEHeapFreeInProcessHeap+0×21, calling mscorwks!EEHeapFree
05fafae0 79e782dc mscorwks!operator delete[]+0×30, calling mscorwks!EEHeapFreeInProcessHeap
05fafb14 79ecb00b mscorwks!Thread::intermediateThreadProc+0×49
05fafb48 77e65512 kernel32!FlsSetValue+0xc7, calling kernel32!_SEH_epilog
05fafb6c 75da14d0 sxs!_calloc_crt+0×19, calling sxs!calloc
05fafb80 77e65512 kernel32!FlsSetValue+0xc7, calling kernel32!_SEH_epilog
05fafb88 75da1401 sxs!_CRT_INIT+0×17e, calling sxs!_initptd
05fafb8c 75da1408 sxs!_CRT_INIT+0×185, calling kernel32!GetCurrentThreadId
05fafb9c 30403805 MMCFormsShim!DllMain+0×15, calling MMCFormsShim!PrxDllMain
05fafbb0 30418b69 MMCFormsShim!__DllMainCRTStartup+0×7a, calling MMCFormsShim!DllMain
05fafbdc 75de0e4c sxs!_SxsDllMain+0×87, calling sxs!DllStartup_CrtInit
05fafbf0 30418bf9 MMCFormsShim!__DllMainCRTStartup+0×10a, calling MMCFormsShim!__SEH_epilog4
05fafbf4 30418c22 MMCFormsShim!_DllMainCRTStartup+0×1d, calling MMCFormsShim!__DllMainCRTStartup
05fafbfc 7c81a352 ntdll!LdrpCallInitRoutine+0×14
05fafc24 7c82ee8b ntdll!LdrpInitializeThread+0×1a5, calling ntdll!RtlLeaveCriticalSection
05fafc2c 7c82edec ntdll!LdrpInitializeThread+0×18f, calling ntdll!_SEH_epilog
05fafc7c 7c82ed71 ntdll!LdrpInitializeThread+0xd8, calling ntdll!RtlActivateActivationContextUnsafeFast
05fafc80 7c82ed35 ntdll!LdrpInitializeThread+0×12c, calling ntdll!RtlDeactivateActivationContextUnsafeFast
05fafcb4 7c82edec ntdll!LdrpInitializeThread+0×18f, calling ntdll!_SEH_epilog
05fafcb8 7c827c3b ntdll!NtTestAlert+0xc
05fafcbc 7c82ecb1 ntdll!_LdrpInitialize+0×1de, calling ntdll!_SEH_epilog
05fafd10 7c82ecb1 ntdll!_LdrpInitialize+0×1de, calling ntdll!_SEH_epilog
05fafd14 7c826d9b ntdll!NtContinue+0xc
05fafd18 7c8284da ntdll!KiUserApcDispatcher+0×3a, calling ntdll!NtContinue
05faffa4 79ecaff9 mscorwks!Thread::intermediateThreadProc+0×37, calling mscorwks!_alloca_probe_16
05faffb8 77e64829 kernel32!BaseThreadStart+0×34

.NET language symbolic names are usually reconstructed from .NET assembly metadata. 

You can examine a CLR exception and get managed stack trace by using !PrintException and !CLRStack commands, for example:

0:014> !PrintException
Exception object: 02320314
Exception type: System.Reflection.TargetInvocationException
Message: Exception has been thrown by the target of an invocation.
InnerException: System.Runtime.InteropServices.COMException, use !PrintException 023201a8 to see more
StackTrace (generated):
    SP       IP       Function
    075AF4FC 016BFD9A Ironring.Management.MMC.SnapinBase+MmcWindow.Invoke(System.Delegate, System.Object[])
    ...
    ...
    ...
    075AF740 793D87AF System.Threading._ThreadPoolWaitCallback.WaitCallback_Context(System.Object)
    075AF748 793608FD System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
    075AF760 793D8898 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback(System.Object)

StackTraceString: <none>
HResult: 80131604

0:014> !PrintException 023201a8
Exception object: 023201a8
Exception type: System.Runtime.InteropServices.COMException
Message: Error HRESULT E_FAIL has been returned from a call to a COM component.
InnerException: <none>
StackTrace (generated):
    SP       IP       Function
    00000000 00000001 Ironring.Management.MMC.IMMCFormsShim.HostUserControl3(System.Object, System.Object, System.String, System.String, Int32, Int32)
    0007F724 073875B9 Ironring.Management.MMC.FormNode.SetShimControl(System.Object)
    0007F738 053D9DDE Ironring.Management.MMC.FormNode.set_ControlType(System.Type)
    ...
    ...
    ...

StackTraceString: <none>
HResult: 80004005

0:014> !CLRStack
OS Thread Id: 0x11ec (14)
ESP       EIP
075af4fc 016bfd9a Ironring.Management.MMC.SnapinBase+MmcWindow.Invoke(System.Delegate, System.Object[])
...
...
...
075af740 793d87af System.Threading._ThreadPoolWaitCallback.WaitCallback_Context(System.Object)
075af748 793608fd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
075af760 793d8898 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback(System.Object)
075af8f0 79e7be1b [GCFrame: 075af8f0]

!help command gives the list of other available SOS extension commands:

0:014> !help

Object Inspection

DumpObj (do)
DumpArray (da)
DumpStackObjects (dso)
DumpHeap
DumpVC
GCRoot
ObjSize
FinalizeQueue
PrintException (pe)
TraverseHeap

Examining code and stacks

Threads
CLRStack
IP2MD
U
DumpStack
EEStack
GCInfo
EHInfo
COMState
BPMD

Examining CLR data structures

DumpDomain
EEHeap
Name2EE
SyncBlk
DumpMT
DumpClass
DumpMD
Token2EE
EEVersion
DumpModule
ThreadPool
DumpAssembly
DumpMethodSig
DumpRuntimeTypes
DumpSig
RCWCleanupList
DumpIL

Diagnostic Utilities

VerifyHeap
DumpLog
FindAppDomain
SaveModule
GCHandles
GCHandleLeaks
VMMap
VMStat
ProcInfo
StopOnException (soe)
MinidumpMode

Other

FAQ

If you are new to .NET and interested in .NET debugging I would recommend 3 books:

Essential .NET, Volume I: The Common Language Runtime

Buy from Amazon

Debugging Microsoft .NET 2.0 Applications

Buy from Amazon

Advanced .NET Debugging

Buy from Amazon

Expert .NET 2.0 IL Assembler

Buy from Amazon

- Dmitry Vostokov @ DumpAnalysis.org -

COM+ crash dumps

Monday, July 16th, 2007

If you have problems with COM+ components you can configure Component Services in Control Panel to save a dump:

Refer to the following article for details:

http://msdn.microsoft.com/msdnmag/issues/01/08/ComXP/

If you want to use userdump.exe to save a crash dump when a failing COM+ application displays an error dialog box refer to the following article:

http://support.microsoft.com/kb/287643

If you want dumps to be automatically collected after some timeout value refer to the following article for details:

http://support.microsoft.com/kb/910904/ 

If you have an exception the following article describes how to get a stack trace from a saved process dump:

http://support.microsoft.com/kb/317317

This article explains how COM+ handles application faults:

Fault Isolation and Failfast Policy

Now I show how to get an error message that was written to event log when COM+ application was terminated due to a different error code than an access violation. If you get a dump from COM+ process look at all threads and find the one that runs through comsvcs.dll:

0:000> ~*k
...
...
...
6 Id: 8d4.1254 Suspend: 0 Teb: 7ffd9000 Unfrozen
ChildEBP RetAddr
0072ee30 7c822124 ntdll!KiFastSystemCallRet
0072ee34 77e6baa8 ntdll!NtWaitForSingleObject+0xc
0072eea4 77e6ba12 kernel32!WaitForSingleObjectEx+0xac
0072eeb8 75c2b250 kernel32!WaitForSingleObject+0x12
0072f340 75c2bb91 comsvcs!FF_RunCmd+0xa2
0072f60c 75c2bc76 comsvcs!FF_DumpProcess_MD+0x21a
0072f850 75c2be83 comsvcs!FF_DumpProcess+0x39
0072fdc0 75c2c351 comsvcs!FailFastStr+0x2ce
0072fe20 75bf31fa comsvcs!CError::WriteToLog+0x198
0072fe8c 75bf3d48 comsvcs!CSurrogateServices::FireApplicationLaunch+0x13b
0072fee0 75bf3e19 comsvcs!CApplication::AsyncApplicationLaunch+0x101
0072feec 7c81a3c5 comsvcs!CApplication::AppLaunchThreadProc+0x18
0072ff44 7c8200fc ntdll!RtlpWorkerCallout+0x71
0072ff64 7c81a3fa ntdll!RtlpExecuteWorkerRequest+0x4f
0072ff78 7c82017f ntdll!RtlpApcCallout+0x11
0072ffb8 77e66063 ntdll!RtlpWorkerThread+0x61
0072ffec 00000000 kernel32!BaseThreadStart+0x34
...
...
...

0:000> ~*kL
...
...
...
   6  Id: 8d4.1254 Suspend: 0 Teb: 7ffd9000 Unfrozen
ChildEBP RetAddr  Args to Child
0072ee30 7c822124 77e6baa8 00000394 00000000
ntdll!KiFastSystemCallRet
0072ee34 77e6baa8 00000394 00000000 00000000
ntdll!NtWaitForSingleObject+0xc
0072eea4 77e6ba12 00000394 ffffffff 00000000
kernel32!WaitForSingleObjectEx+0xac
0072eeb8 75c2b250 00000394 ffffffff 0072f640
kernel32!WaitForSingleObject+0x12
0072f340 75c2bb91 75b8e7fc 75b8e810 000008d4
comsvcs!FF_RunCmd+0xa2
0072f60c 75c2bc76 0072f640 75c6c5c0 0072fe44
comsvcs!FF_DumpProcess_MD+0x21a
0072f850 75c2be83 00000000 77ce21ce 0bd5f0f0
comsvcs!FF_DumpProcess+0×39
0072fdc0 75c2c351 75c6c5c0 75b8b008 00000142
comsvcs!FailFastStr+0×2ce
0072fe20 75bf31fa 0072fe44 75b8b008 00000142
comsvcs!CError::WriteToLog+0×198
0072fe8c 75bf3d48 0bcf5d0c 00000000 0bcf5cf8
comsvcs!CSurrogateServices::FireApplicationLaunch+0×13b
0072fee0 75bf3e19 75bf3e01 0072ff44 7c81a3c5
comsvcs!CApplication::AsyncApplicationLaunch+0×101
0072feec 7c81a3c5 0bcf5cf8 7c889880 0bcf5d50
comsvcs!CApplication::AppLaunchThreadProc+0×18
0072ff44 7c8200fc 75bf3e01 0bcf5cf8 00000000
ntdll!RtlpWorkerCallout+0×71
0072ff64 7c81a3fa 00000000 0bcf5cf8 0bcf5d50
ntdll!RtlpExecuteWorkerRequest+0×4f
0072ff78 7c82017f 7c8200bb 00000000 0bcf5cf8
ntdll!RtlpApcCallout+0×11
0072ffb8 77e66063 00000000 00000000 00000000
ntdll!RtlpWorkerThread+0×61
0072ffec 00000000 7c83ad38 00000000 00000000
kernel32!BaseThreadStart+0×34


FF_DumpProcess function is an indication that the process was being dumped. There is no ComSvcsExceptionFilter function on thread stack but we can still get an error message if we look at FailFastStr function arguments:

0:000> du 75c6c5c0 75c6c5c0+400
75c6c5c0  “{646F1874-46B6-4149-BD55-8C317FB”
75c6c600  “71CC0}….Server Application ID:”
75c6c640  ” {646F1874-46B6-4149-BD55-8C317F”
75c6c680  “B71CC0}..Server Application Inst”
75c6c6c0  “ance ID:..{7A39BC48-78DA-4FBB-A7″
75c6c700  “46-EEA7E42CDAC7}..Server Applica”
75c6c740  “tion Name: My Server”
75c6c780  “..The serious nature of this err”
75c6c7c0  “or has caused the process to ter”
75c6c800  “minate…Error Code = 0×80131600″
75c6c840  ” : ..COM+ Services Internals Inf”
75c6c880  “ormation:..File: d:\nt\com\compl”
75c6c8c0  “us\src\comsvcs\srgtapi\csrgtserv”
75c6c900  “.cpp, Line: 322..Comsvcs.dll fil”
75c6c940  “e version: ENU 2001.12.4720.2517″
75c6c980  ” shp”

Also if we examine parameters of FF_RunCmd call we would see what application was used to dump a process:

ChildEBP RetAddr Args to Child
0072f340 75c2bb91 75b8e7fc 75b8e810 000008d4
comsvcs!FF_RunCmd+0xa2

0:000> du 75b8e7fc
75b8e7fc  “%s %d %s”
0:000> du 75b8e810
75b8e810  “RunDll32 comsvcs.dll,MiniDump”

We can guess that the first parameter is a format string, the second one is the command line for a process dumper, the third one is PID and the fourth one should be the name of a dump file to save. We can double check this from the raw stack:

ChildEBP RetAddr Args to Child
0072f340 75c2bb91 75b8e7fc 75b8e810 000008d4
comsvcs!FF_RunCmd+0xa2

0:000> dd 0072f340
0072f340  0072f60c 75c2bb91 75b8e7fc 75b8e810
     ; saved EBP, return EIP, 1st param, 2nd param
0072f350  000008d4 0072f640 0072f84a 00000000
     ; 3rd param, 4th param

0:000> du 0072f640
0072f640  “C:\WINDOWS\system32\com\dmp\{646″
0072f680  “F1874-46B6-4149-BD55-8C317FB71CC”
0072f6c0  “0}_2007_07_16_12_05_08.dmp”

We can actually find the already formatted command that was passed to CreateProcess call on the raw stack:

0:006> du 0072ef2c
0072ef2c  "RunDll32 comsvcs.dll,MiniDump 22"
0072ef6c  "60 C:\WINDOWS\system32\com\dmp\{"
0072efac  "646F1874-46B6-4149-BD55-8C317FB7"
0072efec  "1CC0}_2007_07_16_12_05_08.dmp"

- Dmitry Vostokov @ DumpAnalysis.org -