Archive for the ‘Debugging’ Category

Crash Dump Analysis Patterns (Part 36)

Wednesday, November 14th, 2007

The pattern I should have written about first is called Local Buffer Overflow. It is observed on x86 platforms when a local variable overflows and overwrites the function return address, the saved frame pointer EBP, or both. As a result, the instruction pointer EIP becomes a Wild Pointer, and we get a process crash in user mode or a bugcheck in kernel mode. Sometimes this pattern can be diagnosed by looking at mismatched EBP and ESP values. In the case of an ASCII or UNICODE buffer overflow, the EIP register may contain a 4-char or 2-wchar_t value, and ESP, EBP or both might point at some string fragment, as in the example below:

0:000> r
eax=000fa101 ebx=0000c026 ecx=01010001 edx=bd43a010 esi=000003e0 edi=00000000
eip=0048004a esp=0012f158 ebp=00510044 iopl=0  nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000202
0048004a 0000 add     byte ptr [eax],al  ds:0023:000fa101=??

0:000> kL
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f154 00420047 0x48004a
0012f158 00440077 0x420047
0012f15c 00420043 0x440077
0012f160 00510076 0x420043
0012f164 00420049 0x510076
0012f168 00540041 0x420049
0012f16c 00540041 0x540041
...
...
...
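To see the 2-wchar_t values hiding in the registers above, we can reinterpret each 32-bit value as two little-endian UTF-16 code units. This is only a minimal Python sketch: the helper name dword_as_utf16 is made up for illustration, and the values are copied from the register dump and stack trace above.

```python
import struct

def dword_as_utf16(value):
    """Interpret a 32-bit value as two little-endian UTF-16 code units."""
    return struct.pack("<I", value).decode("utf-16-le")

# Values copied from the 'r' and 'kL' output above.
eip = 0x0048004A
ebp = 0x00510044
return_addresses = [0x00420047, 0x00440077, 0x00420043,
                    0x00510076, 0x00420049, 0x00540041]

print("eip   ->", dword_as_utf16(eip))   # JH
print("ebp   ->", dword_as_utf16(ebp))   # DQ
print("stack ->", "".join(dword_as_utf16(a) for a in return_addresses))
```

Every value decodes to printable characters, which is what we would expect when a UNICODE string has been copied over the saved EBP and the return address.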

Good buffer overflow case studies with complete analysis, including an assembly language tutorial, can be found in the book Buffer Overflow Attacks.


- Dmitry Vostokov @ DumpAnalysis.org -

TestDefaultDebugger.NET

Thursday, November 8th, 2007

Sometimes we need to test exception handling to see whether it works and how to get dumps or logs from it. For example, a customer reports infrequent process crashes but no dumps are saved. Then we can run an application that crashes immediately to see whether it results in error messages and/or saved crash dumps. This was the motivation behind the TestDefaultDebugger package. Unfortunately, it contained only native applications, and today I needed to test .NET CLR exception handling and see what messages it shows in my environment. So I wrote a simple C# program that creates an empty Stack object and then calls its Pop method, which triggers a “Stack empty” exception, sufficient for my purposes.

The updated package now includes TestDefaultDebugger.NET.exe and can be downloaded from Citrix support web site (requires free registration):

Download TestDefaultDebugger package

- Dmitry Vostokov @ DumpAnalysis.org -

Symbol file warnings in WinDbg 6.8.0004.0

Thursday, November 8th, 2007

I started using the new WinDbg 6.8.4.0 and found that it prints the following message twice when I open a process dump, or a complete memory dump where the current context belongs to a user mode process:

0:000> !analyze -v
...
...
...
***
***    Your debugger is not using the correct symbols
***
***    In order for this command to work properly, your symbol path
***    must point to .pdb files that have full type information.
***
***    Certain .pdb files (such as the public OS symbols) do not
***    contain the required information.  Contact the group that
***    provided you with these symbols if you need this command to
***    work.
***
***    Type referenced: kernel32!pNlsUserInfo
***

Fortunately kernel32.dll symbols were loaded correctly despite the warning:

0:000> lmv m kernel32
start    end        module name
77e40000 77f42000   kernel32   (pdb symbols)          c:\mssymbols\kernel32.pdb\DF4F569C743446809ACD3DFD1E9FA2AF2\kernel32.pdb
    Loaded symbol image file: kernel32.dll
    Image path: C:\WINDOWS\system32\kernel32.dll
    Image name: kernel32.dll
    Timestamp:        Tue Jul 25 13:31:53 2006 (44C60F39)
    CheckSum:         001059A9
    ImageSize:        00102000
    File version:     5.2.3790.2756
    Product version:  5.2.3790.2756
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     kernel32
    OriginalFilename: kernel32
    ProductVersion:   5.2.3790.2756
    FileVersion:      5.2.3790.2756 (srv03_sp1_gdr.060725-0040)
    FileDescription:  Windows NT BASE API Client DLL
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

Also double checking return addresses on the stack trace shows that symbol mapping was correct (from another dump with the same message):

kd> dpu kernel32!pNlsUserInfo l1
77ecb0a8  77ecb760 "ENU"

kd> kv
ChildEBP RetAddr  Args to Child
f552bbec f79e1743 000000e2 cccccccc 858a0470 nt!KeBugCheckEx+0x1b
WARNING: Stack unwind information not available. Following frames may be wrong.
f552bc38 8081d39d 85699390 8596fe78 860515f8 SystemDump+0x743
f552bc4c 808ec789 8596fee8 860515f8 8596fe78 nt!IofCallDriver+0x45
f552bc60 808ed507 85699390 8596fe78 860515f8 nt!IopSynchronousServiceTail+0x10b
f552bd00 808e60be 00000090 00000000 00000000 nt!IopXxxControlFile+0x5db
f552bd34 80882fa8 00000090 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
f552bd34 7c82ed54 00000090 00000000 00000000 nt!KiFastCallEntry+0xf8
0012efc4 7c8213e4 77e416f1 00000090 00000000 ntdll!KiFastSystemCallRet
0012efc8 77e416f1 00000090 00000000 00000000 ntdll!NtDeviceIoControlFile+0xc
0012f02c 00402208 00000090 9c400004 00947eb8 kernel32!DeviceIoControl+0x137
0012f884 00404f8e 0012fe80 00000001 00000000 SystemDump_400000+0x2208

kd> ub 77e416f1
kernel32!DeviceIoControl+0x11d:
77e416db lea     eax,[ebp-28h]
77e416de push    eax
77e416df push    ebx
77e416e0 push    ebx
77e416e1 push    ebx
77e416e2 push    dword ptr [ebp+8]
77e416e5 je      kernel32!DeviceIoControl+0x131 (77e417f3)
77e416eb call    dword ptr [kernel32!_imp__NtDeviceIoControlFile (77e4103c)]

So everything is all right and the messages above can be ignored. I also got e-mails from other people having the same problem, so it seems to be related to this WinDbg release and not to my debugging environment.

- Dmitry Vostokov @ DumpAnalysis.org -

WinDbg has been updated to version 6.8.4.0

Wednesday, November 7th, 2007

A somewhat late notice: I have just found that a new version of WinDbg was released last month:

http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx

http://www.microsoft.com/whdc/devtools/debugging/install64bit.mspx

There do not seem to be many enhancements in this release, according to the link below and relnotes.txt, but at least it is not called Beta:

http://www.microsoft.com/whdc/devtools/debugging/whatsnew.mspx

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 34)

Tuesday, November 6th, 2007

Although crash dumps are static in nature they contain Historical Information about past system dynamics that might give clues to a problem and help with troubleshooting and debugging.

For example, IRP flow between user processes and drivers is readily available in any kernel or complete memory dump. WinDbg !irpfind command will show the list of currently present I/O request packets. !irp command will give individual packet details. 

Recent Driver Verifier improvements in Vista and Windows Server 2008 allow embedding stack traces associated with IRP allocation, completion and cancellation. For more information please look at the following document:

http://www.microsoft.com/whdc/devtools/tools/vistaverifier.mspx

Other information that can be included in process, kernel and complete memory dumps may reveal some history of function calls beyond the current snapshot of thread stacks:

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 13c)

Friday, November 2nd, 2007

Although handle leaks may result in insufficient pool memory, many drivers allocate their own private memory and specify a 4-letter ASCII tag. Here, for example, is non-paged pool usage from my x64 Vista workstation:

lkd> !poolused 3
   Sorting by  NonPaged Pool Consumed

  Pool Used:
            NonPaged
 Tag    Allocs    Frees     Diff     Used
 EtwB      304      134      170  6550080  Etw Buffer , Binary: nt!etw
 File 32630649 32618671    11978  3752928  File objects
 Pool       16       11        5  3363472  Pool tables, etc.
 Ntfr   204791   187152    17639  2258704  ERESOURCE , Binary: ntfs.sys
 FMsl   199039   187685    11354  2179968  STREAM_LIST_CTRL structure , Binary: fltmgr.sys
 MmCa   250092   240351     9741  2134368  Mm control areas for mapped files , Binary: nt!mm
 ViMm   135503   134021     1482  1783824  Video memory manager , Binary: dxgkrnl.sys
 Cont       53       12       41  1567664  Contiguous physical memory allocations for device drivers
 Thre    72558    71527     1031  1234064  Thread objects , Binary: nt!ps
 VoSm      872      851       21  1220544  Bitmap allocations , Binary: volsnap.sys
 NtFs  8122505  8110933    11572  1190960  StrucSup.c , Binary: ntfs.sys
 AmlH        1        0        1  1048576  ACPI AMLI Pooltags
 SaSc    20281    14820     5461  1048512  UNKNOWN pooltag 'SaSc', please update pooltag.txt
 RaRS     1000        0     1000   960000  UNKNOWN pooltag 'RaRS', please update pooltag.txt


If the pool tag is unknown, Microsoft article KB298102 explains how to locate the corresponding driver. We can also use memory search in WinDbg to locate kernel space addresses containing the tag and see what modules they correspond to.
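A common way to locate the driver is to search the driver binaries on disk for the 4 tag characters. The following Python sketch assumes that approach. The function name find_tag_in_drivers and the drivers folder path are illustrative assumptions, and a match only suggests a candidate driver because the same 4 bytes can occur by chance.

```python
from pathlib import Path

def find_tag_in_drivers(tag, drivers_dir):
    """Return names of .sys files whose raw bytes contain the ASCII pool tag."""
    needle = tag.encode("ascii")
    matches = []
    for path in sorted(Path(drivers_dir).glob("*.sys")):
        try:
            if needle in path.read_bytes():
                matches.append(path.name)
        except OSError:
            continue  # skip files that cannot be read
    return matches

if __name__ == "__main__":
    # Hypothetical location; adjust it to the system being examined.
    print(find_tag_in_drivers("SaSc", r"C:\Windows\System32\drivers"))
```

Any driver reported this way still has to be confirmed, for example by checking whether the tag appears next to a pool allocation call in its code.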

WinDbg shows the number of failed pool allocations and also prints a warning when pool usage is close to its maximum. Below are some examples with possible troubleshooting hints.

Session pool

3: kd> !vm

*** Virtual Memory Usage ***
       Physical Memory:     1572637 (   6290548 Kb)
       Page File: \??\C:\pagefile.sys
         Current:   3145728 Kb  Free Space:   3001132 Kb
         Minimum:   3145728 Kb  Maximum:      3145728 Kb
       Available Pages:     1317401 (   5269604 Kb)
       ResAvail Pages:      1478498 (   5913992 Kb)
       Locked IO Pages:         114 (       456 Kb)
       Free System PTEs:     194059 (    776236 Kb)
       Free NP PTEs:          32766 (    131064 Kb)
       Free Special NP:           0 (         0 Kb)
       Modified Pages:          443 (      1772 Kb)
       Modified PF Pages:       442 (      1768 Kb)
       NonPagedPool Usage:    13183 (     52732 Kb)
       NonPagedPool Max:      65215 (    260860 Kb)

       PagedPool 0 Usage:     11328 (     45312 Kb)
       PagedPool 1 Usage:      1473 (      5892 Kb)
       PagedPool 2 Usage:      1486 (      5944 Kb)
       PagedPool 3 Usage:      1458 (      5832 Kb)
       PagedPool 4 Usage:      1505 (      6020 Kb)
       PagedPool Usage:       17250 (     69000 Kb)
       PagedPool Maximum:     65536 (    262144 Kb)

 

       ********** 3441 pool allocations have failed **********
 

       Shared Commit:          8137 (     32548 Kb)
       Special Pool:              0 (         0 Kb)
       Shared Process:         8954 (     35816 Kb)
       PagedPool Commit:      17312 (     69248 Kb)
       Driver Commit:          2095 (      8380 Kb)
       Committed pages:      212476 (    849904 Kb)
       Commit limit:        2312654 (   9250616 Kb)

Paged and non-paged pool usage is far from the maximum, so we check session pool:

3: kd> !vm 4

       Terminal Server Memory Usage By Session:
 

       Session Paged Pool Maximum is 32768K
       Session View Space Maximum is 20480K

 

       Session ID 0 @ f79a1000:
       Paged Pool Usage:        9824K
       Commit Usage:           10148K

 

       Session ID 2 @ f7989000:
       Paged Pool Usage:        1212K
       Commit Usage:            2180K

 

       Session ID 9 @ f79b5000:
       Paged Pool Usage:       32552K

 

       *** 7837 Pool Allocation Failures ***
 

       Commit Usage:           33652K

Here Microsoft article KB840342 might help.

Paged pool

We might have a direct warning:

1: kd> !vm

*** Virtual Memory Usage ***
 Physical Memory:   511881   ( 2047524 Kb)
 Page File: \??\S:\pagefile.sys
    Current:   2098176Kb Free Space:   1837740Kb
    Minimum:   2098176Kb Maximum:      2098176Kb
 Page File: \??\R:\pagefile.sys
    Current:   1048576Kb Free Space:    792360Kb
    Minimum:   1048576Kb Maximum:      1048576Kb
 Available Pages:   201353   (  805412 Kb)
 ResAvail Pages:    426839   ( 1707356 Kb)
 Modified Pages:     45405   (  181620 Kb)
 NonPagedPool Usage: 10042   (   40168 Kb)
 NonPagedPool Max:   68537   (  274148 Kb)
 PagedPool 0 Usage:  26820   (  107280 Kb)
 PagedPool 1 Usage:   1491   (    5964 Kb)
 PagedPool 2 Usage:   1521   (    6084 Kb)
 PagedPool 3 Usage:   1502   (    6008 Kb)
 PagedPool 4 Usage:   1516   (    6064 Kb)
 ********** Excessive Paged Pool Usage *****
 PagedPool Usage:    32850   (  131400 Kb)
 PagedPool Maximum:  40960   (  163840 Kb)
 Shared Commit:      14479   (   57916 Kb)
 Special Pool:           0   (       0 Kb)
 Free System PTEs:  135832   (  543328 Kb)
 Shared Process:     15186   (   60744 Kb)
 PagedPool Commit:   32850   (  131400 Kb)
 Driver Commit:       1322   (    5288 Kb)
 Committed pages:   426786   ( 1707144 Kb)
 Commit limit:     1259456   ( 5037824 Kb)

If there is no warning, we can check the sizes manually. If paged pool usage is close to its maximum but non-paged pool usage is not, then the failed allocations were most likely from paged pool:

0: kd> !vm
 

*** Virtual Memory Usage ***
       Physical Memory:     4193696 (  16774784 Kb)
       Page File: \??\C:\pagefile.sys
         Current:   4193280 Kb  Free Space:   3313120 Kb
         Minimum:   4193280 Kb  Maximum:      4193280 Kb
       Available Pages:     3210617 (  12842468 Kb)
       ResAvail Pages:      4031978 (  16127912 Kb)
       Locked IO Pages:         120 (       480 Kb)
       Free System PTEs:      99633 (    398532 Kb)
       Free NP PTEs:          26875 (    107500 Kb)
       Free Special NP:           0 (         0 Kb)
       Modified Pages:          611 (      2444 Kb)
       Modified PF Pages:       590 (      2360 Kb)
       NonPagedPool 0 Used:    8271 (   33084 Kb)
       NonPagedPool 1 Used:   13828 (   55312 Kb)
       NonPagedPool Usage:    37846 (    151384 Kb)
       NonPagedPool Max:      65215 (    260860 Kb)

       PagedPool 0 Usage:     82308 (    329232 Kb)
       PagedPool 1 Usage:     12700 (     50800 Kb)
       PagedPool 2 Usage:     25702 (    102808 Kb)
       PagedPool Usage:      120710 (    482840 Kb)
       PagedPool Maximum:    134144 (    536576 Kb)

 

      ********** 818 pool allocations have failed **********
 

       Shared Commit:         80168 (    320672 Kb)
       Special Pool:              0 (         0 Kb)
       Shared Process:        55654 (    222616 Kb)
       PagedPool Commit:     120772 (    483088 Kb)
       Driver Commit:          1890 (      7560 Kb)
       Committed pages:     1344388 (   5377552 Kb)
       Commit limit:        5177766 (  20711064 Kb)
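The manual check can be sketched in a few lines of Python. The page counts are copied from the !vm output above, the 4 Kb page size is inferred from the Kb figures printed there, and the function name pool_usage is only illustrative.

```python
PAGE_SIZE_KB = 4  # x86 page size; matches the Kb figures printed by !vm above

def pool_usage(used_pages, max_pages):
    """Return (used_kb, percent_of_maximum) for a pool."""
    used_kb = used_pages * PAGE_SIZE_KB
    percent = 100.0 * used_pages / max_pages
    return used_kb, percent

# Page counts copied from the !vm output above: (usage, maximum).
pools = {
    "NonPagedPool": (37846, 65215),
    "PagedPool": (120710, 134144),
}

for name, (used, maximum) in pools.items():
    kb, pct = pool_usage(used, maximum)
    print(f"{name}: {kb} Kb used, {pct:.1f}% of maximum")
```

With these numbers non-paged pool is at about 58% of its maximum while paged pool is at about 90%, which points to paged pool as the more likely source of the failed allocations.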

!poolused 4 WinDbg command will sort paged pool consumption by pool tag:

0: kd> !poolused 4
   Sorting by  Paged Pool Consumed

 

  Pool Used:
            NonPaged            Paged
 Tag    Allocs     Used    Allocs     Used
 MmSt        0        0     85622 140642616     Mm section object prototype ptes , Binary: nt!mm
 Ntff        5     1040     63715 51991440      FCB_DATA , Binary: ntfs.sys

Here Microsoft article KB312362 might help.

Non-paged pool

0: kd> !vm
 

*** Virtual Memory Usage ***
       Physical Memory:      851775 (   3407100 Kb)
       Page File: \??\C:\pagefile.sys
         Current:   4190208 Kb  Free Space:   4175708 Kb
         Minimum:   4190208 Kb  Maximum:      4190208 Kb
       Available Pages:      147274 (    589096 Kb)
       ResAvail Pages:       769287 (   3077148 Kb)
       Locked IO Pages:         118 (       472 Kb)
       Free System PTEs:     184910 (    739640 Kb)
       Free NP PTEs:            110 (       440 Kb)
       Free Special NP:           0 (         0 Kb)
       Modified Pages:          168 (       672 Kb)
       Modified PF Pages:       168 (       672 Kb)
       NonPagedPool Usage:    64445 (    257780 Kb)
       NonPagedPool Max:      64640 (    258560 Kb)
       ********** Excessive NonPaged Pool Usage *****
       PagedPool 0 Usage:     21912 (     87648 Kb)
       PagedPool 1 Usage:       691 (      2764 Kb)
       PagedPool 2 Usage:       706 (      2824 Kb)
       PagedPool 3 Usage:       704 (      2816 Kb)
       PagedPool 4 Usage:       708 (      2832 Kb)
       PagedPool Usage:       24721 (     98884 Kb)
       PagedPool Maximum:    134144 (    536576 Kb)

 

       ********** 429 pool allocations have failed **********
 

       Shared Commit:          5274 (     21096 Kb)
       Special Pool:              0 (         0 Kb)
       Shared Process:         3958 (     15832 Kb)
       PagedPool Commit:      24785 (     99140 Kb)
       Driver Commit:         19289 (     77156 Kb)
       Committed pages:      646282 (   2585128 Kb)
       Commit limit:        1860990 (   7443960 Kb)

!poolused 3 WinDbg command will sort non-paged pool consumption by pool tag:

0: kd> !poolused 3
   Sorting by  NonPaged Pool Consumed

 

  Pool Used:
            NonPaged
 Tag    Allocs    Frees     Diff
 Ddk   9074558  3859522  5215036  Default for driver allocated memory (user’s of ntddk.h)
 MmCm    43787    42677     1110  Calls made to MmAllocateContiguousMemory , Binary: nt!mm
 LSwi        1        0        1  initial work context
 TCPt  3281838  3281808       30  TCP/IP network protocol , Binary: TCP

Regarding Ddk tag I published a case study earlier:

The search for ‘Ddk’ tag

Microsoft article KB293857 explains how we can use the xpool command from the old kdex2x86.dll extension, which even works for Windows 2003 dumps:

0: kd> !w2kfre\kdex2x86.xpool -map
unable to get NT!MmSizeOfNonPagedMustSucceed location
unable to get NT!MmSubsectionTopPage location
unable to get NT!MmKseg2Frame location
unable to get NT!MmNonPagedMustSucceed location

Status Map of Pool Area Pages
==============================
  'O': one page in use                              ('P': paged out)
  '<': start page of contiguous pages in use        ('{': paged out)
  '>': last page of contiguous pages in use         ('}': paged out)
  '=': intermediate page of contiguous pages in use ('-': paged out)
  '.': one page not used

Non-Paged Pool Area Summary
----------------------------
Maximum Number of Pages  = 64640 pages
Number of Pages In Use   = 36721 pages (56.8%)

          +00000  +08000   +10000  +18000   +20000  +28000   +30000  +38000
82780000: ..OO.OO.OO..O.OO .O..OO.OO.OO..O. OO.O..OO.O..OO.. ..OO.O..OO.OO.OO
827c0000: .O..OO....OO..O. OO.OO.OO....OO.. O....O..OO....OO .O..OO.O..OO..O.
82800000: ..O............. ................ ................ ................
82840000: ................ ................ ................ ................
82880000: ......O.....O... ..O.O.....O..... O.....O.....O... ..O.....O.......
828c0000: ..O.........O... ......OOO.....O. ....O.....O..... O.....O.........
82900000: .O.........OO... O....O........O. ......OO........ OO.O..O.........
82940000: ...............O ..O.OO........OO ................ ...O.....O......
82980000: O.........O..O.. ....O.........O. ........O.....O. ..O.........O...
829c0000: ........O....... ..O...........O. .O..O...O..O.... ..O.........O...
82a00000: ......O..O...... O.........O..... ....O.........O. ................
82a40000: ............O... O..O.O......OO.. ......O.....O... ..O.....O...O.OO
...
...
...
893c0000: ................ ................ ................ ................
89400000: ..........=..=.. ....=.....=..... =..=......=..=.. ....=..=......=.
89440000: ..=............. ............=... =..=.....=..=... =...=.=.....==..
89480000: ....==......=.=. .........=...... ====.=.=........ ................
894c0000: ................ ................ ..........=.=... ...==...........
89500000: ..=............. ..=............. ..=............. ..=.............
89540000: ..=............. ..=............. ..=............. ..=...=.....=..=
89580000: ......=..=...... =..=......=.==== ==..==.=....=... .=....=....=.==.
895c0000: =.....==........ ..=............. =..=......=...=. ................
89600000: ........=...=..= .....=......=..= ==....=......... .........=....=.
89640000: ..=...===...=... ==......=..=..=. ..=..=......=... ......=.=.....=.
...
...
...

Here is another example:

0: kd> !vm

*** Virtual Memory Usage ***
 Physical Memory:   786299   ( 3145196 Kb)
 Page File: \??\C:\pagefile.sys
    Current:   4193280Kb Free Space:   3407908Kb
    Minimum:   4193280Kb Maximum:      4193280Kb
 Available Pages:   200189   (  800756 Kb)
 ResAvail Pages:    657130   ( 2628520 Kb)
 Modified Pages:       762   (    3048 Kb)
 NonPagedPool Usage: 22948   (   91792 Kb)
 NonPagedPool Max:   70145   (  280580 Kb)
 PagedPool 0 Usage:  19666   (   78664 Kb)
 PagedPool 1 Usage:   3358   (   13432 Kb)
 PagedPool 2 Usage:   3306   (   13224 Kb)
 PagedPool 3 Usage:   3312   (   13248 Kb)
 PagedPool 4 Usage:   3309   (   13236 Kb)
 ********** Excessive Paged Pool Usage *****
 PagedPool Usage:    32951   (  131804 Kb)
 PagedPool Maximum:  40960   (  163840 Kb)
 Shared Commit:       9664   (   38656 Kb)
 Special Pool:           0   (       0 Kb)
 Free System PTEs:  103335   (  413340 Kb)
 Shared Process:     45024   (  180096 Kb)
 PagedPool Commit:   32951   (  131804 Kb)
 Driver Commit:       1398   (    5592 Kb)
 Committed pages:   864175   ( 3456700 Kb)
 Commit limit:     1793827   ( 7175308 Kb)

0: kd> !poolused 4
   Sorting by Paged Pool Consumed

  Pool Used:
            NonPaged            Paged
 Tag    Allocs     Used    Allocs     Used
 CM         85     5440     11045 47915424
 MyAV        0        0       186 14391520

 MmSt        0        0     11795 13235744
 Obtb      709    90752      2712 11108352
 Ntff        5     1120      9886  8541504


The MyAV tag looks like the prefix of the MyAVDrv module name, and this is hardly a coincidence. Looking at the list of drivers we see that MyAVDrv.sys was loaded and unloaded several times. Could it be that it didn’t free its paged pool allocations?

0: kd> lmv m MyAVDrv.sys
start    end        module name

Unloaded modules:
a5069000 a5084000   MyAVDrv.sys
    Timestamp: unavailable (00000000)
    Checksum:  00000000
a5069000 a5084000   MyAVDrv.sys
    Timestamp: unavailable (00000000)
    Checksum:  00000000
a5069000 a5084000   MyAVDrv.sys
    Timestamp: unavailable (00000000)
    Checksum:  00000000
b93e1000 b93fc000   MyAVDrv.sys
    Timestamp: unavailable (00000000)
    Checksum:  00000000
b9ae5000 b9b00000   MyAVDrv.sys
    Timestamp: unavailable (00000000)
    Checksum:  00000000
be775000 be790000   MyAVDrv.sys
    Timestamp: unavailable (00000000)
    Checksum:  00000000

Also we see that the CM tag consumes the most paged pool, and the !locks command shows hundreds of threads waiting for the registry lock, an example of the High Contention pattern:

0: kd> !locks

Resource @ nt!CmpRegistryLock (0x80478b00)    Shared 10 owning threads
    Contention Count = 9149810
    NumberOfSharedWaiters = 718
    NumberOfExclusiveWaiters = 21

Therefore we see at least two problems in this memory dump: excessive paged pool usage and high thread contention around the registry resource, slowing down if not halting the system.

- Dmitry Vostokov @ DumpAnalysis.org -

JIT service debugging

Wednesday, October 24th, 2007

If you have services running under the Network Service account (prior to Vista) and they crash, you can use NTSD from recent Debugging Tools for Windows with the -noio switch as described in the following article:

http://www.debuginfo.com/articles/ntsdwatson.html 

You need to copy the latest ntsd.exe, dbghelp.dll and dbgeng.dll to some folder on your system if you don’t want to install Debugging Tools for Windows in your production environment.

The Debugger value under the AeDebug key that I use for 64-bit JIT debugging is:

C:\ntsd\ntsd -p %ld -e %ld -g -noio -c ".dump /ma /u c:\TEMP\new.dmp; q"

It is always good to double check settings with TestDefaultDebugger tool.

- Dmitry Vostokov @ DumpAnalysis.org -

Memory Dump - A Mathematical Definition

Wednesday, October 24th, 2007

This is the first post in Science of Memory Dump Analysis category where I apply philosophy, systems theory, mathematics, physics and computer science ideas. It was inspired after reading Life Itself book written by Robert Rosen where computers are depicted as direct sums of states. As shown in that book, in the case of machines, their synthetic models (direct sums) are equivalent to analytic models (direct product of observables). Taking every single bit as an observable having its values in Z2 set {0, 1} we can make a definition of an ideal memory dump as a direct product or a direct sum of bits saved instantaneously at the given time:

$\prod_i s_i = \bigoplus_i s_i$

Of course, we can also consider bytes having 8 bits as observables having their values from Z256 set, etc.

In our case we can simply rewrite direct sum or product as the list of bits, bytes, words or double words, etc:

$(\ldots, s_{i-1}, s_i, s_{i+1}, \ldots, s_{j-1}, s_j, s_{j+1}, \ldots)$

According to Rosen we include hardware states (registers, for example) and partition memory into input, output states for particular computation and other states.

Saving a memory dump takes a certain amount of time. Suppose that it takes 3 discrete time events (ticks). During the first tick we save memory up to $(\ldots, s_{i-1}, s_i)$, and that memory has some relationship to the state $s_j$. During the second tick the state $s_j$ changes its value to $s'_j$, and during the third tick we copy the rest of the memory $(s_{i+1}, \ldots, s_{j-1}, s'_j, s_{j+1}, \ldots)$. Now we see that the final memory dump is inconsistent, because the part saved first still reflects the old $s_j$:

$(\ldots, s_{i-1}, s_i, s_{i+1}, \ldots, s_{j-1}, s'_j, s_{j+1}, \ldots)$

I explained this earlier in plain words in the Inconsistent Dump pattern. Therefore we might consider a real memory dump as a direct sum of disjoint memory areas $M_t$ taken during some time interval $(t_0, \ldots, t_n)$:

$M = \bigoplus_t M_t$ where $M_t = \bigoplus_k s_{t_k}$, or simply

$M = \bigoplus_t \bigoplus_k s_{t_k}$
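The three-tick scenario can be simulated with a toy model. In this Python sketch memory is a list of bits, the indices I and J and the invariant "bit I mirrors bit J" are assumptions made up for illustration, and save_dump is a hypothetical helper that copies memory in two passes with live memory changing in between.

```python
def save_dump(memory, split, mutate_between=None):
    """Copy memory in two passes with an optional mutation between them."""
    dump = list(memory[:split])      # tick 1: copy the states up to s_i
    if mutate_between is not None:
        mutate_between(memory)       # tick 2: live memory keeps changing
    dump.extend(memory[split:])      # tick 3: copy the remaining states
    return dump

I, J = 1, 4                          # hypothetical related states s_i and s_j

def flip_pair(mem):
    """A consistent live update: both related bits change together."""
    mem[I] ^= 1
    mem[J] ^= 1

before = [0, 1, 0, 0, 1]             # invariant: before[I] == before[J]
after = list(before)
flip_pair(after)                     # the state the live system ends up in

ideal = save_dump(list(before), split=2)
real = save_dump(list(before), split=2, mutate_between=flip_pair)

print("ideal dump keeps the invariant:", ideal[I] == ideal[J])
print("real dump keeps the invariant: ", real[I] == real[J])
print("real dump matches a live state:", real in (before, after))
```

The ideal (instantaneous) dump equals the state before the update, while the dump taken over three ticks matches neither the state before nor the state after it.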

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 32)

Tuesday, October 23rd, 2007

When we look at a thread that is not in the Passive Thread list and looks more like a Blocked Thread, we may ask whether it is the Main Thread. Every process has at least one thread of execution, called the main or primary thread. Most GUI applications have a window message processing loop inside their main thread. When a memory dump is saved, this thread is most likely blocked waiting for window or user-defined messages to arrive and can be considered a Passive Thread. If we see it blocked on something else for some time, we may suspect that the application was hanging.

Here is an example of the normal iexplore.exe thread stack taken from a kernel dump:

PROCESS 88de4140  SessionId: 3  Cid: 15a8    Peb: 7ffdf000  ParentCid: 0e28
    DirBase: 0a43df40  ObjectTable: 88efe008  TableSize: 852.
    Image: IEXPLORE.EXE
    VadRoot 88dbbca8 Clone 0 Private 6604. Modified 951. Locked 0.
    DeviceMap 88de6408
    Token                             e3f5ccf0
    ElapsedTime                        0:10:52.0281
    UserTime                          0:00:06.0250
    KernelTime                        0:00:10.0421
    QuotaPoolUsage[PagedPool]         126784
    QuotaPoolUsage[NonPagedPool]      197704
    Working Set Sizes (now,min,max)  (8347, 50, 345) (33388KB, 200KB, 1380KB)
    PeakWorkingSetSize                10000
    VirtualSize                       280 Mb
    PeakVirtualSize                   291 Mb
    PageFaultCount                    15627
    MemoryPriority                    FOREGROUND
    BasePriority                      8
    CommitCharge                      7440

THREAD 88ee2b00  Cid 15a8.1654  Teb: 7ffde000  Win32Thread: a2242018 WAIT: (WrUserRequest) UserMode Non-Alertable
    88f82ee0  SynchronizationEvent
Not impersonating
Owning Process 88de4140
Wait Start TickCount    104916        Elapsed Ticks: 0
Context Switch Count    100208                   LargeStack
UserTime                  0:00:04.0484
KernelTime                0:00:09.0859
Start Address KERNEL32!BaseProcessStartThunk (0x7c57b70c)
Stack Init be597000 Current be596cc8 Base be597000 Limit be58f000 Call 0
Priority 12 BasePriority 8 PriorityDecrement 0 DecrementCount 0

ChildEBP RetAddr
be596ce0 8042d8d7 nt!KiSwapThread+0x1b1
be596d08 a00019c2 nt!KeWaitForSingleObject+0x1a3
be596d44 a00138c5 win32k!xxxSleepThread+0x18a
be596d54 a00138d1 win32k!xxxWaitMessage+0xe
be596d5c 8046b2a9 win32k!NtUserWaitMessage+0xb
be596d5c 77e3c7cd nt!KiSystemService+0xc9

In the same kernel dump there is another iexplore.exe process with the following main thread stack which had been blocked for 31 seconds:

PROCESS 8811ca00  SessionId: 21  Cid: 4d18    Peb: 7ffdf000  ParentCid: 34c8
    DirBase: 0a086ee0  ObjectTable: 87d07528  TableSize: 677.
    Image: IEXPLORE.EXE
    VadRoot 87a92ae8 Clone 0 Private 4600. Modified 227. Locked 0.
    DeviceMap 88b174e8
    Token                             e49508d0
    ElapsedTime                        0:08:03.0062
    UserTime                          0:00:01.0531
    KernelTime                        0:00:10.0375
    QuotaPoolUsage[PagedPool]         120792
    QuotaPoolUsage[NonPagedPool]      198376
    Working Set Sizes (now,min,max)  (7726, 50, 345) (30904KB, 200KB, 1380KB)
    PeakWorkingSetSize                7735
    VirtualSize                       272 Mb
    PeakVirtualSize                   275 Mb
    PageFaultCount                    11688
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      6498

THREAD 87ce6da0  Cid 4d18.4c68  Teb: 7ffde000  Win32Thread: a22157b8 WAIT: (Executive) KernelMode Non-Alertable
    b5bd6370  NotificationEvent
IRP List:
    885d4808: (0006,00dc) Flags: 00000014  Mdl: 00000000
Not impersonating
Owning Process 8811ca00
Wait Start TickCount    102908        Elapsed Ticks: 2008
Context Switch Count    130138                   LargeStack
UserTime                  0:00:01.0125
KernelTime                0:00:08.0843
Start Address KERNEL32!BaseProcessStartThunk (0x7c57b70c)
Stack Init b5bd7000 Current b5bd62f4 Base b5bd7000 Limit b5bcf000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0

ChildEBP RetAddr
b5bd630c 8042d8d7 nt!KiSwapThread+0x1b1
b5bd6334 bf09342d nt!KeWaitForSingleObject+0x1a3
b5bd6380 bf08896f mrxsmb!SmbCeAssociateExchangeWithMid+0x24b
b5bd63b0 bf0aa0ef mrxsmb!SmbCeTranceive+0xff
b5bd6490 bf0a92df mrxsmb!SmbTransactExchangeStart+0x559
b5bd64a8 bf0a9987 mrxsmb!SmbCeInitiateExchange+0x2ac
b5bd64c4 bf0a96e2 mrxsmb!SmbCeSubmitTransactionRequest+0x124
b5bd6524 bf0ac7c3 mrxsmb!_SmbCeTransact+0x86
b5bd6608 bf104ea0 mrxsmb!MRxSmbQueryFileInformation+0x553
b5bd66b4 bf103aff rdbss!__RxInitializeTopLevelIrpContext+0x52
b5bd6784 bf10da73 rdbss!WPP_SF_ZL+0x4b
b5bd67b4 bf0a8b29 rdbss!RxCleanupPipeQueues+0x117
b5bd67d4 8041ef05 mrxsmb!MRxSmbFsdDispatch+0x118
b5bd67e8 eb833839 nt!IopfCallDriver+0x35
b5bd6890 804a8109 nt!IopQueryXxxInformation+0x164
b5bd68b0 804c7d63 nt!IoQueryFileInformation+0x19
b5bd6a4c 80456562 nt!IopParseDevice+0xe8f
b5bd6ac4 804de0c0 nt!ObpLookupObjectName+0x504
b5bd6bd4 804a929b nt!ObOpenObjectByName+0xc8
b5bd6d54 8046b2a9 nt!NtQueryFullAttributesFile+0xe7
b5bd6d54 77f88887 nt!KiSystemService+0xc9

0: kd> !whattime 0n2008
2008 Ticks in Standard Time: 31.375s
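The same conversion can be done by hand. The sketch below assumes a 15.625 ms timer tick, which is the interval implied by the !whattime output above (2008 ticks giving 31.375 s); other systems may use a different interval, and the function name ticks_to_seconds is only illustrative.

```python
TICK_MS = 15.625  # assumed timer interval, inferred from the !whattime output above

def ticks_to_seconds(ticks, tick_ms=TICK_MS):
    """Convert a thread's Elapsed Ticks value to seconds."""
    return ticks * tick_ms / 1000.0

# Elapsed Ticks of the blocked main thread above.
print(ticks_to_seconds(2008))  # 31.375
```

This is handy when comparing wait times of many threads without running !whattime for each of them.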

The main thread need not be a GUI thread. Most console applications that read input have ReadConsole calls in a normal main thread stack:

0:000> kL
ChildEBP RetAddr
0012fc6c 77d20190 ntdll!KiFastSystemCallRet
0012fc70 77d27fdf ntdll!NtRequestWaitReplyPort+0xc
0012fc90 765d705c ntdll!CsrClientCallServer+0xc2
0012fd8c 76634674 kernel32!ReadConsoleInternal+0x1cd
0012fe14 765eaf6a kernel32!ReadConsoleA+0x40
0012fe7c 6ec35196 kernel32!ReadFile+0x84
0012fec0 6ec35616 MSVCR80!_read_nolock+0x201
0012ff04 6ec45928 MSVCR80!_read+0xc0
0012ff1c 6ec49e47 MSVCR80!_filbuf+0x78
0012ff54 0040100d MSVCR80!getc+0x113
0012ff5c 0040117c ConsoleTest!wmain+0xd
0012ffa0 765d3833 ConsoleTest!__tmainCRTStartup+0x10f
0012ffac 77cfa9bd kernel32!BaseThreadInitThunk+0xe
0012ffec 00000000 ntdll!_RtlUserThreadStart+0x23

0:000> kL
ChildEBP RetAddr
001cf594 77d20190 ntdll!KiFastSystemCallRet
001cf598 77d27fdf ntdll!NtRequestWaitReplyPort+0xc
001cf5b8 765d705c ntdll!CsrClientCallServer+0xc2
001cf6b4 765d6efe kernel32!ReadConsoleInternal+0x1cd
001cf740 49ecd538 kernel32!ReadConsoleW+0x47
001cf7a8 49ecd645 cmd!ReadBufFromConsole+0xb5
001cf7d4 49ec2247 cmd!FillBuf+0x175
001cf7d8 49ec2165 cmd!GetByte+0x11
001cf7f4 49ec20d8 cmd!Lex+0x75
001cf80c 49ec207f cmd!GeToken+0x27
001cf81c 49ec200a cmd!ParseStatement+0x36
001cf830 49ec6038 cmd!Parser+0x46
001cf878 49ecc703 cmd!main+0x1de
001cf8bc 765d3833 cmd!_initterm_e+0x163
001cf8c8 77cfa9bd kernel32!BaseThreadInitThunk+0xe
001cf908 00000000 ntdll!_RtlUserThreadStart+0x23

- Dmitry Vostokov @ DumpAnalysis.org -

Old dumps, new extensions

Tuesday, October 23rd, 2007

Up to now I’ve been using old Windows 2000 WinDbg extensions to extract information from Windows 2003 and XP crash dumps when their native extensions failed. Today I found that I can also go the other way around and extract information from old Windows 2000 crash dumps using WinDbg extensions written for Windows XP and later. Here is an example. The WinDbg !stacks command shows the following not really helpful output from a Windows 2000 complete memory dump:

2: kd> !stacks
Proc.Thread  Thread   Ticks   ThreadState Blocker
                                     [System]
   8.000004  89df8220 0000000 BLOCKED     nt!KiSwapThread+0x1b1
   8.00000c  89dc1860 0003734 BLOCKED     nt!KiSwapThread+0x1b1
   8.000010  89dc15e0 0003734 BLOCKED     nt!KiSwapThread+0x1b1
   8.000014  89dc1360 00003b4 BLOCKED     nt!KiSwapThread+0x1b1
   8.000018  89dc10e0 0003734 BLOCKED     nt!KiSwapThread+0x1b1
   8.00001c  89dc0020 0000381 BLOCKED     nt!KiSwapThread+0x1b1
   8.000020  89dc0da0 00066f6 BLOCKED     nt!KiSwapThread+0x1b1
   8.000024  89dc0b20 00025b4 BLOCKED     nt!KiSwapThread+0x1b1
   8.000028  89dc08a0 00025b4 BLOCKED     nt!KiSwapThread+0x1b1
   8.00002c  89dc0620 0003734 BLOCKED     nt!KiSwapThread+0x1b1
   8.000030  89dc03a0 0003734 BLOCKED     nt!KiSwapThread+0x1b1
   8.000034  89dbf020 00025b4 BLOCKED     nt!KiSwapThread+0x1b1
   8.000038  89dbfda0 00025b4 BLOCKED     nt!KiSwapThread+0x1b1
   8.00003c  89dbfb20 00007b4 BLOCKED     nt!KiSwapThread+0x1b1
   8.000040  89dbf8a0 00007b4 BLOCKED     nt!KiSwapThread+0x1b1
   8.000044  89dbf620 0000074 BLOCKED     nt!KiSwapThread+0x1b1
   8.000048  89dbf3a0 00007b4 BLOCKED     nt!KiSwapThread+0x1b1
...
...
...

This command belongs to different WinDbg extension DLLs (from WinDbg help):

Windows NT 4.0         Unavailable
Windows 2000           Kdextx86.dll
Windows XP and later   Kdexts.dll

and I tried the newer kdexts.dll with better results:

2: kd> !winxp\kdexts.stacks
Proc.Thread  .Thread  Ticks   ThreadState Blocker
                            [89df84a0 System]
   8.0000c8  89db77c0 0000000 Blocked    nt!MiRemoveUnusedSegments+0xf4
   8.0000f0  89c8a020 0019607 Blocked    cpqasm2+0x1ef0
   8.000108  89881900 0000085 Blocked    CPQCISSE+0x3ae8
   8.000110  8982cda0 000000a Blocked    cpqasm2+0x2a523
   8.00013c  8974a9a0 00007d7 Blocked    rdbss!RxSetMinirdrCancelRoutine+0x3d
   8.000148  89747b20 000010a Blocked    rdbss!RxIsOkToPurgeFcb+0x3f
   8.00014c  89758a80 0019493 Blocked    nt!NtNotifyChangeMultipleKeys+0x434
   8.0002dc  89620680 000000e Blocked    cpqasm2+0x5523
   8.0002e0  89620400 00000d2 Blocked    cpqasm2+0x584d
   8.0004ac  895ae9c0 000955b Blocked    srv!SrvOemStringTo8dot3+0xb7
   8.0004c0  8937b4e0 0018fea Blocked    srv!SrvOemStringTo8dot3+0xb7
   8.0004a0  895b09e0 0018fe9 Blocked    srv!SrvOemStringTo8dot3+0xb7
   8.0004cc  893784e0 0018fe8 Blocked    srv!SrvOemStringTo8dot3+0xb7
   8.0004d0  893774e0 000955b Blocked    srv!SrvOemStringTo8dot3+0xb7
   8.0004d4  893764e0 0018fe8 Blocked    srv!SrvOemStringTo8dot3+0xb7
   8.003d68  87abb580 00000b7 Blocked    rdbss!RxSearchForCollapsibleOpen+0x17c
   8.002b94  88e4f180 00000b9 Blocked    rdbss!RxSearchForCollapsibleOpen+0x17c

                            [89736940 smss.exe]

                            [896d3b20 csrss.exe]
 178.000180  896c8020 0000012 Blocked    ntdll!NtReplyWaitReceivePort+0xb
 178.00018c  896c5320 0000012 Blocked    ntdll!NtReplyWaitReceivePort+0xb
 178.001260  88fbcb20 0000060 Blocked    ntdll!NtReplyWaitReceivePort+0xb
 178.001268  88fbbda0 0000060 Blocked    ntdll!NtReplyWaitReceivePort+0xb

                            [896c8740 WINLOGON.EXE]
 174.00019c  896b7740 0000299 Blocked    ntdll!ZwDelayExecution+0xb
 174.0001a0  896b6020 00015dd Blocked    ntdll!NtRemoveIoCompletion+0xb
 174.000f08  8913eda0 00000b0 Blocked    ntdll!ZwWaitForMultipleObjects+0xb
 174.000f0c  8901b020 00000b0 Blocked    ntdll!ZwWaitForSingleObject+0xb
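Once a listing like the one above is saved as text, it is easy to post-process. The following is a rough Python sketch that counts threads per blocker. The regular expression only reflects the column layout visible in this output and the function name is made up for illustration, so treat both as assumptions rather than a general parser:

```python
import re
from collections import Counter

# Matches thread rows such as
# "   8.0000c8  89db77c0 0000000 Blocked    nt!MiRemoveUnusedSegments+0xf4"
# Columns: Proc.Thread, thread address, ticks (hex), state, blocker.
ROW = re.compile(
    r"^\s*([0-9a-f]+)\.([0-9a-f]+)\s+([0-9a-f]{8})\s+([0-9a-f]+)\s+(\w+)\s+(\S+)\s*$",
    re.IGNORECASE,
)


def blocker_counts(stacks_output):
    """Count threads per blocker symbol, ignoring the +0x... offset."""
    counts = Counter()
    for line in stacks_output.splitlines():
        match = ROW.match(line)
        if match:
            symbol = match.group(6).split("+", 1)[0]
            counts[symbol] += 1
    return counts


sample = """\
                            [89df84a0 System]
   8.0004ac  895ae9c0 000955b Blocked    srv!SrvOemStringTo8dot3+0xb7
   8.0004c0  8937b4e0 0018fea Blocked    srv!SrvOemStringTo8dot3+0xb7
   8.003d68  87abb580 00000b7 Blocked    rdbss!RxSearchForCollapsibleOpen+0x17c
"""
print(blocker_counts(sample).most_common(1))
# [('srv!SrvOemStringTo8dot3', 2)]
```

Process header lines in square brackets do not match the pattern and are skipped.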

- Dmitry Vostokov @ DumpAnalysis.org -

Local crash dumps on Vista

Thursday, October 18th, 2007

It appears that Microsoft decided to help customers save full user dumps locally for later postmortem analysis. According to MSDN, this is done via the LocalDumps registry key, starting with Vista SP1 and Windows Server 2008:

http://msdn2.microsoft.com/en-us/library/bb787181.aspx

This is a quote from the article above:

[…] Prior to application termination, the system will check the registry settings to determine whether a local dump is to be collected. The registry settings control whether a full dump is collected versus a minidump. The custom flags specified also determine which information is collected in the dump. […] You can make use of the local dump collection even if WER is disabled. The local dumps are collected even if the user cancels WER reporting at any point. […]

As I understand it, this mechanism is independent of the default postmortem debugger mechanism configured via the AeDebug registry key and might help to solve the problem with native services. I haven’t tried it yet but will do so as soon as I install Vista SP1 or Windows Server 2008 RC0. If it works, dump collection might become easier in production environments because there is no need to install Debugging Tools for Windows to set up a postmortem debugger.
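To make the LocalDumps idea concrete, here is an untested Python sketch that only builds reg add command lines for such a configuration. The value names DumpFolder, DumpCount and DumpType (2 for a full dump, 1 for a minidump) are given as I recall them from the MSDN article and should be verified against it; the folder C:\CrashDumps and the function name are made up for illustration:

```python
# Assumption: key path and value names as documented in the MSDN article
# on collecting user-mode dumps; verify them before use.
KEY = r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def localdumps_commands(folder, count=10, full=True):
    """Build 'reg add' command lines for a LocalDumps configuration."""
    dump_type = 2 if full else 1  # assumed: 2 = full dump, 1 = minidump
    return [
        f'reg add "{KEY}" /v DumpFolder /t REG_EXPAND_SZ /d "{folder}" /f',
        f'reg add "{KEY}" /v DumpCount /t REG_DWORD /d {count} /f',
        f'reg add "{KEY}" /v DumpType /t REG_DWORD /d {dump_type} /f',
    ]


# C:\CrashDumps is a hypothetical folder used only for this example.
for command in localdumps_commands(r"C:\CrashDumps"):
    print(command)
```

The sketch prints the commands instead of running them, so they can be reviewed before being applied from an elevated command prompt.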

- Dmitry Vostokov @ DumpAnalysis.org -

Minidump Analysis (Part 4)

Thursday, October 11th, 2007

In part 3 we explored raw stack dumps. Now suppose we have a minidump with a stack trace that involves our product driver and, for some reason, WinDbg doesn’t pick up symbols automatically and shows the following stack trace and crash address, which point to the driver.sys module:

1: kd> kL
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
ba0fd0e4 bfabd64b driver+0x2df2a
ba0fd1c8 bf8b495d driver+0x1f64b
ba0fd27c bf9166ae win32k!NtGdiBitBlt+0x52d
ba0fd2d8 bf9168d0 win32k!TileWallpaper+0xd4
ba0fd2f8 bf826c83 win32k!xxxDrawWallpaper+0x50
ba0fd324 bf8651df win32k!xxxDesktopPaintCallback+0x48
ba0fd388 bf8280f3 win32k!xxxEnumDisplayMonitors+0x13a
ba0fd3d4 bf8283ab win32k!xxxInternalPaintDesktop+0x66
ba0fd3f8 80833bdf win32k!NtUserPaintDesktop+0x41
ba0fd3f8 7c9485ec nt!KiFastCallEntry+0xfc

1: kd> r
eax=000007d0 ebx=000007d0 ecx=00000086 edx=bfb371a3 esi=bc492000 edi=bfb3775b
eip=bfacbf2a esp=ba0fd0b8 ebp=ba0fd0e4 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202
driver+0x2df2a:
bfacbf2a f3a5 rep movs dword ptr es:[edi],dword ptr [esi] es:0023:bfb3775b=e4405a64 ds:0023:bc492000=????????

We can get the timestamp of this module too:

1: kd> lmv m driver
start    end        module name
bfa9e000 bfb42a00   driver   T (no symbols)
    Loaded symbol image file: driver.sys
    Image path: driver.sys
    Image name: driver.sys
    Timestamp:        Thu Mar 01 20:50:04 2007 (45E73C7C)
    CheckSum:         000A5062
    ImageSize:        000A4A00
    Translations:     0000.04b0 0000.04e0 0409.04b0 0409.04e0
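The numbers in this output can be cross-checked by hand. A small Python sketch, with helper names made up for illustration: the hexadecimal timestamp is treated as seconds since 1970 in UTC (the debugger may render it in its own time zone), and the crash offset and ImageSize are derived from the module start and end addresses:

```python
from datetime import datetime, timezone


def pe_timestamp_utc(hex_stamp):
    """Interpret a module timestamp (hex seconds since 1970) as UTC."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


def module_offset(address, module_start):
    """Offset of an address from the start of its module."""
    return address - module_start


stamp = pe_timestamp_utc("45E73C7C")
print(stamp.isoformat())                            # 2007-03-01T20:50:04+00:00
print(hex(module_offset(0xBFACBF2A, 0xBFA9E000)))   # 0x2df2a (crash address)
print(hex(module_offset(0xBFB42A00, 0xBFA9E000)))   # 0xa4a00 (ImageSize)
```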

We see that no symbols for driver.sys were found, and this is also indicated by the absence of function names and by huge code offsets like 0x2df2a. Perhaps we don’t have a symbol server and store our symbol files somewhere else. Or we got symbols from the developer of a recent fix that bugchecks and we want to apply them. In any case, if we add a path in the Symbol Search Path dialog (File -> Symbol File Path …) or use the .sympath WinDbg command, we are able to get a better stack trace and crash point:

1: kd> .reload
Loading Kernel Symbols
...
Loading User Symbols
Loading unloaded module list
...
Unable to load image driver.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for driver.sys

1: kd> kL
ChildEBP RetAddr
ba0fd0c0 bfabc399 driver!ProcessBytes+0x18
ba0fd0e4 bfabd64b driver!ProcessObject+0xc9
ba0fd1c8 bf8b495d driver!CacheBitBlt+0x13d
ba0fd27c bf9166ae win32k!NtGdiBitBlt+0x52d
ba0fd2d8 bf9168d0 win32k!TileWallpaper+0xd4
ba0fd2f8 bf826c83 win32k!xxxDrawWallpaper+0x50
ba0fd324 bf8651df win32k!xxxDesktopPaintCallback+0x48
ba0fd388 bf8280f3 win32k!xxxEnumDisplayMonitors+0x13a
ba0fd3d4 bf8283ab win32k!xxxInternalPaintDesktop+0x66
ba0fd3f8 80833bdf win32k!NtUserPaintDesktop+0x41
ba0fd3f8 7c9485ec nt!KiFastCallEntry+0xfc

1: kd> r
eax=000007d0 ebx=000007d0 ecx=00000086 edx=bfb371a3 esi=bc492000 edi=bfb3775b
eip=bfacbf2a esp=ba0fd0b8 ebp=ba0fd0e4 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202
driver!ProcessBytes+0x18:
bfacbf2a f3a5 rep movs dword ptr es:[edi],dword ptr [esi] es:0023:bfb3775b=e4405a64 ds:0023:bc492000=????????

Because WinDbg reports that it was unable to verify the timestamp for driver.sys, we might want to double check the return address saved when the ProcessBytes function was called. If the symbols are correct, then disassembling backwards from the return address will most likely show ProcessObject function code and a “call” instruction with the ProcessBytes address. Unfortunately, minidumps don’t contain code except for the currently executing function:

1: kd> ub bfabc399
                 ^ Unable to find valid previous instruction for 'ub bfabc399'

1: kd> uf driver!ProcessObject
No code found, aborting

Therefore we need to point WinDbg to our driver.sys, which contains the executable code. This can be done by specifying a path in the Executable Image Search Path dialog (File -> Image File Path …) or by using the .exepath WinDbg command.

Now we get a more complete stack trace and we are able to double check the return address:

1: kd> .reload
Loading Kernel Symbols
...
Loading User Symbols
Loading unloaded module list
...

1: kd> kL
ChildEBP RetAddr
ba0fd0c0 bfabc399 driver!ProcessBytes+0x18
ba0fd0e4 bfabd64b driver!ProcessObject+0xc9
ba0fd104 bfac5aac driver!CacheBitBlt+0x13d
ba0fd114 bfac6840 driver!ProcessCommand+0x150
ba0fd140 bfac1878 driver!CheckSurface+0x258
ba0fd178 bfaba0ee driver!CopyBitsEx+0xfa
ba0fd1c8 bf8b495d driver!DrvCopyBits+0xb6
ba0fd27c bf9166ae win32k!NtGdiBitBlt+0x52d
ba0fd2d8 bf9168d0 win32k!TileWallpaper+0xd4
ba0fd2f8 bf826c83 win32k!xxxDrawWallpaper+0x50
ba0fd324 bf8651df win32k!xxxDesktopPaintCallback+0x48
ba0fd388 bf8280f3 win32k!xxxEnumDisplayMonitors+0x13a
ba0fd3d4 bf8283ab win32k!xxxInternalPaintDesktop+0x66
ba0fd3f8 80833bdf win32k!NtUserPaintDesktop+0x41
ba0fd3f8 7c9485ec nt!KiFastCallEntry+0xfc

1: kd> ub bfabc399
driver!ProcessObject+0xb7:
bfabc387 57              push    edi
bfabc388 40              inc     eax
bfabc389 50              push    eax
bfabc38a e861fb0000      call    driver!convert (bfacbef0)
bfabc38f ff7508          push    dword ptr [ebp+8]
bfabc392 57              push    edi
bfabc393 50              push    eax
bfabc394 e879fb0000      call    driver!ProcessBytes (bfacbf12)
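The check can also be done arithmetically from the instruction bytes. A minimal Python sketch, with a function name made up for illustration, that decodes an x86 near relative call (opcode E8 followed by a 32-bit little-endian displacement) and confirms that the last call above targets ProcessBytes and returns to bfabc399:

```python
import struct


def call_target(address, code):
    """Decode an x86 near relative CALL (E8 rel32) located at 'address'."""
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    (rel32,) = struct.unpack("<i", code[1:])   # signed little-endian displacement
    next_instruction = address + 5             # this is also the return address
    return (next_instruction + rel32) & 0xFFFFFFFF


call_address = 0xBFABC394
print(hex(call_target(call_address, bytes.fromhex("e879fb0000"))))  # 0xbfacbf12
print(hex(call_address + 5))                                        # 0xbfabc399
```

The decoded target bfacbf12 plus 0x18 gives bfacbf2a, the crash address reported as driver!ProcessBytes+0x18.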

- Dmitry Vostokov @ DumpAnalysis.org -

Heaps and heap corruption explained

Tuesday, October 2nd, 2007

An excellent free chapter explaining the process heap implementation and the debugging of heap corruption issues is available from the authors of the Advanced Windows Debugging book:

Sample Chapter

- Dmitry Vostokov @ DumpAnalysis.org -

Windows service crash dumps in Vista

Monday, October 1st, 2007

I was playing with Vista Platform SDK samples to create a minimal native Windows service that crashes, in order to test various postmortem debugger configurations, Windows Error Reporting (WER) options and the conditions under which crash dumps are available. Initially I put a NULL pointer dereference into the service control handler that processes the service stop command, and although the service crashed under WinDbg, I couldn’t get CDB or NTSD configured as a default postmortem debugger to save the crash dump automatically. I tested both 32-bit and 64-bit versions of my service under x64 Vista and Windows Server 2003 x64.

Here is the source code and stack trace from WinDbg when we attach it to the running service and then try to stop it:

//
// FUNCTION: service_ctrl
//
// PURPOSE: This function is called by the SCM whenever
// ControlService() is called on this service.
//
// PARAMETERS:
// dwCtrlCode - type of control requested
//
// RETURN VALUE:
// none
//
// COMMENTS:
//
VOID WINAPI service_ctrl(DWORD dwCtrlCode)
{
  // Handle the requested control code.
  //
  switch (dwCtrlCode)
  {
  case SERVICE_CONTROL_STOP:
    *(int *)NULL = 0; // deliberate NULL pointer write to crash the service
    ReportStatusToSCMgr(SERVICE_STOP_PENDING, NO_ERROR, 0);
    ServiceStop();
    return;

  // Update the service status.
  //
  case SERVICE_CONTROL_INTERROGATE:
    break;

  // invalid control code
  //
  default:
    break;

  }

  ReportStatusToSCMgr(ssStatus.dwCurrentState, NO_ERROR, 0);
}

0:000> r
rax=0000000000000001 rbx=00000000001e36d0 rcx=0000000000000001
rdx=000000000a9ff32c rsi=0000000000000000 rdi=0000000000401aa0
rip=0000000000401ab9 rsp=000000000012fab0 rbp=00000000001e36d0
 r8=0000000000400000  r9=0000000077b3f990 r10=00000000004000d8
r11=00000000004000d8 r12=0000000000000000 r13=000000000a9ff32c
r14=00000000001e36e8 r15=000000000012fc20
iopl=0 nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b  efl=00010246
simple!service_ctrl+0x19:
00000000`00401ab9 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

0:000> k
Child-SP          RetAddr           Call Site
00000000`0012fab0 000007fe`fe276cee simple!service_ctrl+0x19
00000000`0012faf0 000007fe`fe2cea5d ADVAPI32!ScDispatcherLoop+0x54c
00000000`0012fbf0 00000000`004019f5 ADVAPI32!StartServiceCtrlDispatcherA+0x8d
00000000`0012fe70 00000000`00408b8c simple!main+0x155
00000000`0012fec0 00000000`0040895e simple!__tmainCRTStartup+0x21c
00000000`0012ff30 00000000`7792cdcd simple!mainCRTStartup+0xe
00000000`0012ff60 00000000`77a7c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

If we put while(1); code in place of the NULL pointer dereference, the process is interrupted via a breakpoint and then terminated. No postmortem dump is saved in this case either. Therefore it looks like a fault inside the service main thread is not allowed to execute the potentially blocking operation of the unhandled exception filter, perhaps to avoid blocking the service control manager (SCM) communicating with the service dispatcher code.

On Vista, if the Windows Error Reporting service is running and WER is configured in Control Panel to allow a user to choose reporting settings, we get the familiar WER dialog but without the Debug option to attach a postmortem debugger and save a crash dump.

If we choose the recommended option, we get a dialog showing the path where a minidump file was temporarily stored.

We need to leave this dialog open if we want to open the crash dump or copy it to another location; otherwise the report files will be removed as soon as we dismiss the dialog box (they may be stored temporarily in another location: check Problem Reports and Solutions\View Problem History in Control Panel). If we open the crash dump using WinDbg, we get the same stack trace that we got previously during live debugging:

Loading Dump File [C:\ProgramData\Microsoft\Windows\WER\ReportQueue\Report19527353\WER7346.tmp.mdmp]
User Mini Dump File: Only registers, stack and portions of memory are available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Fri Sep 28 16:36:38.000 2007 (GMT+1)
System Uptime: 2 days 1:42:22.810
Process Uptime: 0 days 0:00:10.000
.....
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(13b0.d54): Access violation - code c0000005 (first/second chance not available)
*** WARNING: Unable to verify checksum for simple.exe
simple!service_ctrl+0x19:
00000000`00401ab9 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????
0:000> k
Child-SP          RetAddr           Call Site
00000000`0012fab0 000007fe`fe276cee simple!service_ctrl+0x19
00000000`0012faf0 000007fe`fe2cea5d advapi32!ScDispatcherLoop+0x54c
00000000`0012fbf0 00000000`004019f5 advapi32!StartServiceCtrlDispatcherA+0x8d
00000000`0012fe70 00000000`00408b8c simple!main+0x155
00000000`0012fec0 00000000`0040895e simple!__tmainCRTStartup+0x21c
00000000`0012ff30 00000000`7792cdcd simple!mainCRTStartup+0xe
00000000`0012ff60 00000000`77a7c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

A fault in any other service thread, for example the one that the SCM starts for every SERVICE_TABLE_ENTRY in the dispatch table, results in a default postmortem debugger saving a crash dump on Windows Server 2003 x64 but not on Vista x64 or Vista x86 (32-bit):

void __cdecl main(int argc, char **argv)
{
  SERVICE_TABLE_ENTRY dispatchTable[] =
  {
  { TEXT(SZSERVICENAME), (LPSERVICE_MAIN_FUNCTION)service_main},
  { NULL, NULL}
  };
  ...
  ...
  ...
  if (!StartServiceCtrlDispatcher(dispatchTable))
    AddToMessageLog(TEXT("StartServiceCtrlDispatcher failed."));
}

void WINAPI service_main(DWORD dwArgc, LPTSTR *lpszArgv)
{
  // register our service control handler:
  //
  sshStatusHandle = RegisterServiceCtrlHandler( TEXT(SZSERVICENAME), service_ctrl);

  if (!sshStatusHandle)
    goto cleanup;

  // SERVICE_STATUS members that don't change in example
  //
  ssStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS;
  ssStatus.dwServiceSpecificExitCode = 0;

  // report the status to the service control manager.
  //
  if (!ReportStatusToSCMgr(
      SERVICE_START_PENDING, // service state
      NO_ERROR, // exit code
      3000)) // wait hint
    goto cleanup;
  *(int *)NULL = 0; // deliberate NULL pointer write to crash this thread
  ...
  ...
  ...
}

It seems the only way to get a crash minidump for analysis is to copy it from the report data as explained above:

Loading Dump File [C:\ProgramData\Microsoft\Windows\WER\ReportQueue\Report0fa05f9d\WER5F42.tmp.mdmp]
User Mini Dump File: Only registers, stack and portions of memory are available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Fri Sep 28 17:50:06.000 2007 (GMT+1)
System Uptime: 0 days 0:30:59.495
Process Uptime: 0 days 0:00:04.000
.....
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(d6c.fcc): Access violation - code c0000005 (first/second chance not available)
*** WARNING: Unable to verify checksum for simple.exe
simple!service_main+0x60:
00000000`00401aa0 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????
0:001> ~*k

   0  Id: d6c.cf4 Suspend: 0 Teb: 000007ff`fffdd000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0012f978 00000000`777026da ntdll!NtReadFile+0xa
00000000`0012f980 000007fe`feb265aa kernel32!ReadFile+0x8a
00000000`0012fa10 000007fe`feb262e3 advapi32!ScGetPipeInput+0x4a
00000000`0012faf0 000007fe`feb7ea5d advapi32!ScDispatcherLoop+0x9a
00000000`0012fbf0 00000000`004019f5 advapi32!StartServiceCtrlDispatcherA+0x8d
00000000`0012fe70 00000000`00408bac simple!main+0x155
00000000`0012fec0 00000000`0040897e simple!__tmainCRTStartup+0x21c
00000000`0012ff30 00000000`7770cdcd simple!mainCRTStartup+0xe
00000000`0012ff60 00000000`7792c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

#  1  Id: d6c.fcc Suspend: 0 Teb: 000007ff`fffdb000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`008eff00 000007fe`feb24bf5 simple!service_main+0x60
00000000`008eff30 00000000`7770cdcd advapi32!ScSvcctrlThreadW+0x25
00000000`008eff60 00000000`7792c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`008eff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

Spawning a custom thread with a NULL pointer access violation doesn’t result in a crash dump on my Vista x86 and x64 systems either. Therefore it appears that no automatic postmortem crash dumps are saved for native Windows services in Vista, unless there is some setting that I missed. This might create problems for traditional 3rd-party Technical Support procedures, especially if Windows Server 2008 (Longhorn) has the same behavior.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 28)

Wednesday, September 26th, 2007

Sometimes we have a problem where some functionality is not available or is unresponsive when we request it. We can suppose that the process implementing that functionality has crashed or hangs. If we know the relationship between processes, we can request several user dumps at once, or a complete memory dump, to analyze the dependency between processes by looking at their stack traces. This is an example of the system-level crash dump analysis pattern that I call Coupled Processes.

Process relationships can be implemented via different interprocess communication (IPC) mechanisms, for example Remote Procedure Call (RPC) via LPC (Local Procedure Call), which can be easily identified in stack traces.

My favorite example here is when some application tries to print and hangs. The printing API is exported from WINSPOOL.DLL, which forwards most requests via RPC to the Windows Print Spooler service. Therefore it is logical to take two dumps, one from that application and one from spoolsv.exe. A similar example comes from Citrix Presentation Server environments and is related to printer autocreation, where there are dependencies between the Citrix Printing Service CpSvc.exe and spoolsv.exe. If new user connections hang and restarting both printing services resolves the issue, then we might need to analyze dumps from both services together to confirm this Procedure Call Chain and find the problem 3rd-party printing component or driver.

Back to my favorite example. In the hung application we have the following thread:

  18  Id: 2130.6320 Suspend: 1 Teb: 7ffa8000 Unfrozen
ChildEBP RetAddr
01eae170 7c821c94 ntdll!KiFastSystemCallRet
01eae174 77c72700 ntdll!NtRequestWaitReplyPort+0xc
01eae1c8 77c713ba rpcrt4!LRPC_CCALL::SendReceive+0x230
01eae1d4 77c72c7f rpcrt4!I_RpcSendReceive+0x24
01eae1e8 77ce219b rpcrt4!NdrSendReceive+0x2b
01eae5d0 7307c9ef rpcrt4!NdrClientCall2+0x22e
01eae5e8 73082d8d winspool!RpcAddPrinter+0x1c
01eaea70 0040d81a winspool!AddPrinterW+0x102
01eaef58 0040ee7c App!AddNewPrinter+0x816
...
...
...

Notice the winspool and rpcrt4 modules. The application is calling the spooler service using RPC to add a new printer and is waiting for a reply. Looking at the spooler service dump shows several threads displaying message boxes and waiting for user input:

  20  Id: 790.5950 Suspend: 1 Teb: 7ffa2000 Unfrozen
ChildEBP RetAddr  Args to Child
03deea70 7739d02f 77392bf3 00000000 00000000 ntdll!KiFastSystemCallRet
03deeaa8 7738f122 03dd0058 00000000 00000001 user32!NtUserWaitMessage+0xc
03deead0 773a1722 77380000 00123690 00000000 user32!InternalDialogBox+0xd0
03deed90 773a1004 03deeeec 03dae378 03dae160 user32!SoftModalMessageBox+0x94b
03deeee0 773b1a28 03deeeec 00000028 00000000 user32!MessageBoxWorker+0x2ba
03deef38 773b19c4 00000000 03defb9c 03def39c user32!MessageBoxTimeoutW+0x7a
03deef58 773b19a0 00000000 03defb9c 03def39c user32!MessageBoxExW+0x1b
03deef74 021f265b 00000000 03defb9c 03def39c user32!MessageBoxW+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
03deef88 00000000 03dae160 03deffec 03dae16a PrinterDriver!UninstallerInstall+0x2cb

Dumping the 3rd parameter of MessageBoxW using WinDbg du command shows the message:

“Installation of the software for your printer is now complete. Restart your computer to make the new settings active.”
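The two stack traces above can be recognized mechanically when many dumps have to be triaged. The following Python sketch classifies a saved stack trace by marker frames. The marker lists are taken only from the traces shown in this post and the function names are made up for illustration, so this is a starting point under those assumptions and not a complete detector:

```python
# Marker frames taken from the two stack traces in this post (assumption:
# other RPC transports or dialog types would need additional markers).
RPC_CLIENT_WAIT = ("ntdll!NtRequestWaitReplyPort", "rpcrt4!LRPC_CCALL::SendReceive")
MESSAGE_BOX_WAIT = ("user32!NtUserWaitMessage", "user32!MessageBoxW")


def frame_symbols(stack_text):
    """Return module!function names from stack trace lines, without offsets."""
    symbols = []
    for line in stack_text.splitlines():
        parts = line.split()
        if parts and "!" in parts[-1]:
            symbols.append(parts[-1].split("+", 1)[0])
    return symbols


def classify(stack_text):
    """Label a thread as an RPC client wait, a message box wait, or other."""
    symbols = set(frame_symbols(stack_text))
    if all(marker in symbols for marker in RPC_CLIENT_WAIT):
        return "waiting for RPC reply"
    if all(marker in symbols for marker in MESSAGE_BOX_WAIT):
        return "waiting for user input in a message box"
    return "other"


app_thread = """\
01eae170 7c821c94 ntdll!KiFastSystemCallRet
01eae174 77c72700 ntdll!NtRequestWaitReplyPort+0xc
01eae1c8 77c713ba rpcrt4!LRPC_CCALL::SendReceive+0x230
01eae5e8 73082d8d winspool!RpcAddPrinter+0x1c
"""
print(classify(app_thread))  # waiting for RPC reply
```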

Another example is when one process starts another and waits for it to finish:

0 Id: 2a34.24d0 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr
0007ec8c 7c822124 ntdll!KiFastSystemCallRet
0007ec90 77e6bad8 ntdll!NtWaitForSingleObject+0xc
0007ed00 77e6ba42 kernel32!WaitForSingleObjectEx+0xac
0007ed14 01002f4c kernel32!WaitForSingleObject+0x12
0007f79c 01003137 userinit!ExecApplication+0x2d3
0007f7dc 0100366b userinit!ExecProcesses+0x1bb
0007fe68 010041fd userinit!StartTheShell+0x132
0007ff1c 010056f1 userinit!WinMain+0x263
0007ffc0 77e523e5 userinit!WinMainCRTStartup+0x186
0007fff0 00000000 kernel32!BaseProcessStart+0x23

- Dmitry Vostokov @ DumpAnalysis.org -

Resolving “Symbol file could not be found”

Monday, September 17th, 2007

On one of my debugging workstations I couldn’t analyze kernel and complete memory dumps from Windows 2003 Server R2. I was always getting this message:

*** ERROR: Symbol file could not be found.  Defaulted to export symbols for ntkrnlmp.exe -

An attempt to reload and overwrite PDB files using the .reload /o /f command didn’t resolve the issue, but the following WinDbg command helped:

1: kd> !sym noisy
noisy mode - symbol prompts on

Reloading symbol files showed that the default symbol path contained a corrupt ntkrnlmp.pdb:

1: kd> .reload
DBGHELP: C:\Program Files\Debugging Tools for Windows\sym\ntkrnlmp.pdb\A91CA63E49A840F4A50509F90ADE10D52\ntkrnlmp.pdb - E_PDB_CORRUPT
DBGHELP: ntkrnlmp.pdb - file not found
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for ntkrnlmp.exe -
DBGHELP: nt - export symbol

Deleting it and reloading symbols again showed problems with the file downloaded from the MS symbol server too:

1: kd> .reload
SYMSRV:  c:\mss\ntkrnlmp.pdb\A91CA63E49A840F4A50509F90ADE10D52\ntkrnlmp.pd_
         The file or directory is corrupted and unreadable.
DBGHELP: ntkrnlmp.pdb - file not found
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for ntkrnlmp.exe -
DBGHELP: nt - export symbols

Removing the folder and reloading symbols resolved the problem: 

1: kd> .reload
DBGHELP: nt - public symbols
         c:\mss\ntkrnlmp.pdb\A91CA63E49A840F4A50509F90ADE10D52\ntkrnlmp.pdb

Now it was time to switch noisy mode off:

1: kd> !sym quiet
quiet mode - symbol prompts on

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 27)

Friday, September 14th, 2007

Sometimes a problem can be identified not from a single Stack Trace pattern but from a Stack Trace Collection.

These include Coupled Processes, Procedure Call Chains and Blocked Threads. All of them will be discussed in subsequent parts; in this part I only discuss various methods of listing stack traces.

  • Process dumps including various process minidumps:

~*kv command lists all process threads

!findstack module[!symbol] 2 command filters all stack traces to show ones containing module or module!symbol

!uniqstack command

  • Kernel minidumps:

Kernel minidumps have only one problem thread, so the kv command or one of its variants suffices.

  • Kernel and complete memory dumps:

!process 0 ff command lists all processes and their threads, including user space process thread stacks for complete memory dumps. This command is valid for Windows XP and later. For older systems use WinDbg scripts.

!stacks 2 [module[!symbol]] command shows kernel mode stack traces, and you can filter the output by module or module!symbol. Filtering is valid only for dumps from Windows XP and later systems.

~[ProcessorN]s;.reload /user;kv command sequence shows the stack trace for the running thread on the specified processor.

!sprocess 0n<SessionId> ff lists all processes and their threads for the specified [terminal services] session.

!for_each_thread ".thread /r /p @#Thread; kv" allows execution of stack trace command variants for each thread. This command can be used in a script to list complete stack traces from an x64 system.

The processor change command is illustrated in this example:

0: kd> ~2s

2: kd> k
ChildEBP RetAddr
eb42bd58 00000000 nt!KiIdleLoop+0x14

2: kd> ~1s;.reload /user;k
Loading User Symbols
...
ChildEBP RetAddr
be4f8c30 eb091f43 i8042prt!I8xProcessCrashDump+0x53
be4f8c8c 8046bfe2 i8042prt!I8042KeyboardInterruptService+0x15d
be4f8c8c 8049470f nt!KiInterruptDispatch+0x32
be4f8d54 80468389 nt!NtSetEvent+0x71
be4f8d54 77f8290a nt!KiSystemService+0xc9
081cfefc 77f88266 ntdll!ZwSetEvent+0xb
081cff0c 77f881b1 ntdll!RtlpUnWaitCriticalSection+0x1b
081cff14 1b00c7d1 ntdll!RtlLeaveCriticalSection+0x1d
081cff4c 1b0034da msjet40!Database::ReadPages+0x81
081cffb4 7c57b3bc msjet40!System::WorkerThread+0x115
081cffec 00000000 KERNEL32!BaseThreadStart+0x52

Example of !findstack command (process dump):

0:000> !findstack kernel32!RaiseException 2
Thread 000, 1 frame(s) match
* 00 0013b3f8 72e8d3ef kernel32!RaiseException+0x53
  01 0013b418 72e9a26b msxml3!Exception::raiseException+0x5f
  02 0013b424 72e8ff00 msxml3!Exception::_throwError+0x22
  03 0013b46c 72e6abaa msxml3!COMSafeControlRoot::getBaseURL+0x3d
  04 0013b4bc 72e6a888 msxml3!Document::loadXML+0x82
  05 0013b510 64b73a9b msxml3!DOMDocumentWrapper::loadXML+0x5a
  06 0013b538 64b74eb6 iepeers!CPersistUserData::initXMLCache+0xa6
  07 0013b560 77d0516e iepeers!CPersistUserData::load+0xfc
  08 0013b57c 77d14abf oleaut32!DispCallFunc+0x16a
...
...
...
  66 0013fec8 0040243d shdocvw!IEWinMain+0x129
  67 0013ff1c 00402744 iexplore!WinMain+0x316
  68 0013ffc0 77e6f23b iexplore!WinMainCRTStartup+0x182
  69 0013fff0 00000000 kernel32!BaseProcessStart+0x23
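When the debugger is not at hand and only a saved log of all thread stacks is available, a similar filter can be approximated offline. The following Python sketch searches text in the layout of ~*k output for a symbol. The thread header pattern is inferred from the listings in these posts and the function name is made up for illustration; it is not a reimplementation of !findstack:

```python
import re

# Thread header lines look like "   0  Id: d6c.cf4 Suspend: 0 Teb: ..."
# or "#  1  Id: d6c.fcc Suspend: 0 ..." (layout assumed from ~*k output).
THREAD_HEADER = re.compile(r"^[.#\s]*(\d+)\s+Id:\s+[0-9a-f]+\.[0-9a-f]+", re.IGNORECASE)


def find_stacks(all_stacks, needle):
    """Return {thread number: [frame lines containing needle]}."""
    matches = {}
    current = None
    for line in all_stacks.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            continue
        if current is not None and needle in line:
            matches.setdefault(current, []).append(line.strip())
    return matches


log = """\
   0  Id: d6c.cf4 Suspend: 0 Teb: 000007ff`fffdd000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0012f978 00000000`777026da ntdll!NtReadFile+0xa
00000000`0012f980 000007fe`feb265aa kernel32!ReadFile+0x8a

#  1  Id: d6c.fcc Suspend: 0 Teb: 000007ff`fffdb000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`008eff00 000007fe`feb24bf5 simple!service_main+0x60
00000000`008eff30 00000000`7770cdcd advapi32!ScSvcctrlThreadW+0x25
"""
print(sorted(find_stacks(log, "simple!service_main")))  # [1]
```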

Example of !stacks command (kernel dump):

2: kd> !stacks 2 nt!PspExitThread
Proc.Thread  .Thread  Ticks   ThreadState Blocker
                            [8a390818 System]

                            [8a1bbbf8 smss.exe]

                            [8a16cbf8 csrss.exe]

                            [89c14bf0 winlogon.exe]

                            [89dda630 services.exe]

                            [89c23af0 lsass.exe]

                            [8a227470 svchost.exe]

                            [89f03bb8 svchost.exe]

                            [89de3820 svchost.exe]

                            [89d09b60 svchost.exe]

                            [89c03530 ccEvtMgr.exe]

                            [89b8f4f0 ccSetMgr.exe]

                            [89dfe8c0 SPBBCSvc.exe]

                            [89c9db18 svchost.exe]

                            [89dfa268 spoolsv.exe]

                            [89dfa6b8 msdtc.exe]

                            [89df38f0 CpSvc.exe]

                            [89d97d88 DefWatch.exe]

                            [89e04020 IBMSPSVC.EXE]

                            [89b54710 IBMSPREM.EXE]

                            [89d9e4b0 IBMSPREM.EXE]

                            [89c2c4e8 svchost.exe]

                            [89d307c0 SavRoam.exe]

                            [89bfcd88 Rtvscan.exe]

                            [89b53b60 uphclean.exe]

                            [89c24020 AgentSVC.exe]

                            [89d75b60 sAginst.exe]

                            [89cf0d88 CdfSvc.exe]

                            [89d87020 cdmsvc.exe]

                            [89dafd88 ctxxmlss.exe]

                            [89d8dd88 encsvc.exe]

                            [89d06d88 ImaSrv.exe]

                            [89d37b60 mfcom.exe]

                            [89c8bb18 SmaService.exe]

                            [89d2ba80 svchost.exe]

                            [89ce8630 XTE.exe]

                            [89b64b60 XTE.exe]

                            [89b7c680 ctxcpusched.exe]

                            [88d94a88 ctxcpuusync.exe]

                            [89ba5418 unsecapp.exe]

                            [89d846e0 wmiprvse.exe]

                            [89cda9d8 ctxwmisvc.exe]

                            [88d6cb78 logon.scr]

                            [88ba0a70 csrss.exe]

                            [88961968 winlogon.exe]

                            [8865f740 rdpclip.exe]

                            [8858db20 wfshell.exe]

                            [88754020 explorer.exe]

                            [88846d88 BacsTray.exe]

                            [886b6180 ccApp.exe]

                            [884bc020 fppdis3a.exe]

                            [885cb350 ctfmon.exe]

                            [888bb918 cscript.exe]

                            [8880b3c8 cscript.exe]

                            [88ad2950 csrss.exe]
 b68.00215c  88930020 0000000 RUNNING    nt!KeBugCheckEx+0x1b
                                        nt!MiCheckSessionPoolAllocations+0xe3
                                        nt!MiDereferenceSessionFinal+0x183
                                        nt!MmCleanProcessAddressSpace+0x6b
                                        nt!PspExitThread+0x5f1
                                        nt!PspTerminateThreadByPointer+0x4b
                                        nt!PspSystemThreadStartup+0x3c
                                        nt!KiThreadStartup+0x16

                            [88629310 winlogon.exe]

                            [88a4d9b0 csrss.exe]

                            [88d9f8b0 winlogon.exe]

                            [88cd5840 wfshell.exe]

                            [8a252440 OUTLOOK.EXE]

                            [8a194bf8 WINWORD.EXE]

                            [88aabd20 ctfmon.exe]

                            [889ef440 EXCEL.EXE]

                            [88bec838 HogiaGUI2.exe]

                            [88692020 csrss.exe]

                            [884dd508 winlogon.exe]

                            [88be1d88 wfshell.exe]

                            [886a7d88 OUTLOOK.EXE]

                            [889baa70 WINWORD.EXE]

                            [8861e3d0 ctfmon.exe]

                            [887bbb68 EXCEL.EXE]

                            [884e4020 csrss.exe]

                            [8889d218 winlogon.exe]

                            [887c8020 wfshell.exe]

Threads Processed: 1101

What if we have a list of processes from a complete memory dump, obtained with the !process 0 0 command, and we want to interrogate a specific process? In this case we need to switch to that process and reload user space symbol files (.process /r /p address).

There is also a separate command to reload user space symbol files at any time (.reload /user).

After switching we can list threads (!process address), dump or search process virtual memory, etc. For example:

1: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 890a3320  SessionId: 0  Cid: 0008    Peb: 00000000  ParentCid: 0000
    DirBase: 00030000  ObjectTable: 890a3e08  TableSize: 405.
    Image: System

PROCESS 889dfd60  SessionId: 0  Cid: 0144    Peb: 7ffdf000  ParentCid: 0008
    DirBase: 0b9e7000  ObjectTable: 889fdb48  TableSize: 212.
    Image: SMSS.EXE

PROCESS 890af020  SessionId: 0  Cid: 0160    Peb: 7ffdf000  ParentCid: 0144
    DirBase: 0ce36000  ObjectTable: 8898e308  TableSize: 747.
    Image: CSRSS.EXE

PROCESS 8893d020  SessionId: 0  Cid: 0178    Peb: 7ffdf000  ParentCid: 0144
    DirBase: 0d33b000  ObjectTable: 890ab4c8  TableSize: 364.
    Image: WINLOGON.EXE

PROCESS 88936020  SessionId: 0  Cid: 0194    Peb: 7ffdf000  ParentCid: 0178
    DirBase: 0d7d5000  ObjectTable: 88980528  TableSize: 872.
    Image: SERVICES.EXE

PROCESS 8897f020  SessionId: 0  Cid: 01a0    Peb: 7ffdf000  ParentCid: 0178
    DirBase: 0d89d000  ObjectTable: 889367c8  TableSize: 623.
    Image: LSASS.EXE

1: kd> .process /r /p 8893d020
Implicit process is now 8893d020
Loading User Symbols
...

1: kd> !process 8893d020
PROCESS 8893d020  SessionId: 0  Cid: 0178    Peb: 7ffdf000  ParentCid: 0144
    DirBase: 0d33b000  ObjectTable: 890ab4c8  TableSize: 364.
    Image: WINLOGON.EXE
    VadRoot 8893a508 Clone 0 Private 1320. Modified 45178. Locked 0.
    DeviceMap 89072448
    Token                             e392f8d0
    ElapsedTime                        9:54:06.0882
    UserTime                          0:00:00.0071
    KernelTime                        0:00:00.0382
    QuotaPoolUsage[PagedPool]         34828
    QuotaPoolUsage[NonPagedPool]      43440
    Working Set Sizes (now,min,max)  (737, 50, 345) (2948KB, 200KB, 1380KB)
    PeakWorkingSetSize                2764
    VirtualSize                       46 Mb
    PeakVirtualSize                   52 Mb
    PageFaultCount                    117462
    MemoryPriority                    FOREGROUND
    BasePriority                      13
    CommitCharge                      1861

        THREAD 8893dda0  Cid 178.15c  Teb: 7ffde000  Win32Thread: a2034908 WAIT: (WrUserRequest) UserMode Non-Alertable
            8893bee0  SynchronizationEvent
        Not impersonating
        Owning Process 8893d020
        Wait Start TickCount    29932455      Elapsed Ticks: 7
        Context Switch Count    28087                   LargeStack
        UserTime                  0:00:00.0023
        KernelTime                0:00:00.0084
        Start Address winlogon!WinMainCRTStartup (0x0101cbb0)
        Stack Init eb1b0000 Current eb1afcc8 Base eb1b0000 Limit eb1ac000 Call 0
        Priority 15 BasePriority 15 PriorityDecrement 0 DecrementCount 0

        ChildEBP RetAddr  Args to Child
        eb1afce0 8042d893 00000000 a2034908 00000001 nt!KiSwapThread+0x1b1
        eb1afd08 a00019c2 8893bee0 0000000d 00000001 nt!KeWaitForSingleObject+0x1a3
        eb1afd44 a0013993 000020ff 00000000 00000001 win32k!xxxSleepThread+0x18a
        eb1afd54 a001399f 0006fdd8 80468389 00000000 win32k!xxxWaitMessage+0xe
        eb1afd5c 80468389 00000000 00000000 00000000 win32k!NtUserWaitMessage+0xb
        eb1afd5c 77e58b53 00000000 00000000 00000000 nt!KiSystemService+0xc9
        0006fdd0 77e33630 00000000 00000000 0000ffff USER32!NtUserWaitMessage+0xb
        0006fe04 77e44327 000100d2 00000000 00000010 USER32!DialogBox2+0x216
        0006fe28 77e38d37 76b90000 76c75c78 00000000 USER32!InternalDialogBox+0xd1
        0006fe48 77e39eba 76b90000 76c75c78 00000000 USER32!DialogBoxIndirectParamAorW+0x34
        0006fe6c 01011749 76b90000 00000578 00000000 USER32!DialogBoxParamW+0x3d
        0006fea8 01018bd3 000755e8 76b90000 00000578 winlogon!TimeoutDialogBoxParam+0x27
        0006fee0 76b93701 000755e8 76b90000 00000578 winlogon!WlxDialogBoxParam+0x7b
        0006ff08 010164c6 0008d0e0 5ffa0000 000755e8 3rdPartyGINA!WlxDisplaySASNotice+0x43
        0006ff20 01014960 000755e8 00000005 00072c9c winlogon!MainLoop+0x96
        0006ff58 0101cd06 00071fc8 00000000 00072c9c winlogon!WinMain+0x37a
        0006fff4 00000000 7ffdf000 000000c8 00000100 winlogon!WinMainCRTStartup+0x156

        THREAD 88980020  Cid 178.188  Teb: 7ffdc000  Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
            88980108  NotificationTimer
        Not impersonating
        Owning Process 8893d020
        Wait Start TickCount    29930810      Elapsed Ticks: 1652
        Context Switch Count    15638
        UserTime                  0:00:00.0000
        KernelTime                0:00:00.0000
        Start Address KERNEL32!BaseThreadStartThunk (0x7c57b740)
        Win32 Start Address ntdll!RtlpTimerThread (0x77faa02d)
        Stack Init bf6f7000 Current bf6f6cc4 Base bf6f7000 Limit bf6f4000 Call 0
        Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 0

        ChildEBP RetAddr  Args to Child
        bf6f6cdc 8042d340 bf6f6d64 00bfffac 00bfffac nt!KiSwapThread+0x1b1
        bf6f6d04 8052aac9 8046c101 00000001 bf6f6d34 nt!KeDelayExecutionThread+0x182
        bf6f6d54 80468389 00000001 00bfffac 00000000 nt!NtDelayExecution+0x7f
        bf6f6d54 77f82831 00000001 00bfffac 00000000 nt!KiSystemService+0xc9
        00bfff9c 77f842c4 00000001 00bfffac 00000000 ntdll!NtDelayExecution+0xb
        00bfffb4 7c57b3bc 0006fe60 00000000 00000000 ntdll!RtlpTimerThread+0x42
        00bfffec 00000000 77faa02d 0006fe60 00000000 KERNEL32!BaseThreadStart+0x52

1: kd> dds 0006fee0
0006fee0  0006ff08
0006fee4  76b93701 3rdPartyGINA!WlxDisplaySASNotice+0x43
0006fee8  000755e8
0006feec  76b90000 3rdPartyGINA
0006fef0  00000578
0006fef4  00000000
0006fef8  76b9370b 3rdPartyGINA!WlxDisplaySASNotice+0x4d
0006fefc  0008d0e0
0006ff00  00000008
0006ff04  00000080
0006ff08  0006ff20
0006ff0c  010164c6 winlogon!MainLoop+0x96
0006ff10  0008d0e0
0006ff14  5ffa0000
0006ff18  000755e8
0006ff1c  00000000
0006ff20  0006ff58
0006ff24  01014960 winlogon!WinMain+0x37a
0006ff28  000755e8
0006ff2c  00000005
0006ff30  00072c9c
0006ff34  00000001
0006ff38  000001bc
0006ff3c  00000005
0006ff40  00000001
0006ff44  0000000d
0006ff48  00000000
0006ff4c  00000000
0006ff50  00000000
0006ff54  0000ffe4
0006ff58  0006fff4
0006ff5c  0101cd06 winlogon!WinMainCRTStartup+0x156

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 25)

Monday, September 10th, 2007

The most important pattern that is used for problem identification and resolution is Stack Trace. Consider the following fragment of !analyze -v output from w3wp.exe crash dump:

STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
1824f90c 5a39f97e 01057b48 01057bd0 5a3215b4 0x0
1824fa50 5a32cf7c 01057b48 00000000 79e651c0 w3core!ISAPI_REQUEST::SendResponseHeaders+0x5d
1824fa78 5a3218ad 01057bd0 79e651c0 79e64d9c w3isapi!SSFSendResponseHeader+0xe0
1824fae8 79e76127 01057bd0 00000003 79e651c0 w3isapi!ServerSupportFunction+0x351
1824fb0c 79e763a3 80000411 00000000 00000000 aspnet_isapi!HttpCompletion::ReportHttpError+0x3a
1824fd50 79e761c3 34df6cf8 79e8e42f 79e8e442 aspnet_isapi!HttpCompletion::ProcessRequestInManagedCode+0x1d1
1824fd5c 79e8e442 34df6cf8 00000000 00000000 aspnet_isapi!HttpCompletion::ProcessCompletion+0x24
1824fd70 791d6211 34df6cf8 18e60110 793ee0d8 aspnet_isapi!CorThreadPoolWorkitemCallback+0x13
1824fd84 791d616a 18e60110 00000000 791d60fa mscorsvr!ThreadpoolMgr::ExecuteWorkRequest+0x19
1824fda4 791fe95c 00000000 8083d5c7 00000000 mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x129
1824ffb8 77e64829 17bb9c18 00000000 00000000 mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44
1824ffec 00000000 791fe91b 17bb9c18 00000000 kernel32!BaseThreadStart+0x34

Ignoring the first 5 numeric columns gives us the following trace:

0x0
w3core!ISAPI_REQUEST::SendResponseHeaders+0x5d
w3isapi!SSFSendResponseHeader+0xe0
w3isapi!ServerSupportFunction+0x351
aspnet_isapi!HttpCompletion::ReportHttpError+0x3a
aspnet_isapi!HttpCompletion::ProcessRequestInManagedCode+0x1d1
aspnet_isapi!HttpCompletion::ProcessCompletion+0x24
aspnet_isapi!CorThreadPoolWorkitemCallback+0x13
mscorsvr!ThreadpoolMgr::ExecuteWorkRequest+0x19
mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x129
mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44
kernel32!BaseThreadStart+0x34

or in general we have something like this:

moduleA!functionX+offsetN
moduleB!functionY+offsetM
...
...
...
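
The step of ignoring the numeric columns can be automated when there are many traces to compare. The following Python sketch is an illustration only: the function name strip_numeric_columns is hypothetical, and it assumes 32-bit output where every numeric column is exactly 8 hex digits, as in the STACK_TEXT above.

```python
import re

# Assumption: each numeric column is exactly 8 hex digits (32-bit dumps).
_HEX_COLUMN = re.compile(r"^[0-9a-fA-F]{8}$")


def strip_numeric_columns(stack_text):
    """Return only the symbolic part of each frame line."""
    frames = []
    for line in stack_text.splitlines():
        fields = line.split()
        # Blank lines and debugger warnings do not start with a hex column.
        if not fields or not _HEX_COLUMN.match(fields[0]):
            continue
        # Skip however many leading hex columns this line has.
        index = 0
        while index < len(fields) and _HEX_COLUMN.match(fields[index]):
            index += 1
        if index < len(fields):
            frames.append(" ".join(fields[index:]))
    return frames
```

Because the loop skips any number of leading hex columns, the same function also works on the two-column output of the k command.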

Sometimes function names are not available or offsets are very big, like 0x2380. If this is the case then we probably don’t have symbol files for moduleA and moduleB:

moduleA+offsetN
moduleB+offsetM
...
...
...

Usually there is some kind of a database of previous issues we can use to match moduleA!functionX+offsetN against. If there is no such match we can try functionX+offsetN, moduleA!functionX or just functionX. If there is no such match again we can try the next signature, moduleB!functionY+offsetM, and moduleB!functionY, etc. Usually, the further down the trace the less useful the signature is for problem resolution. For example, mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x129 will probably match many issues because this signature is common for many ASP.NET applications.
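
The fallback order described above (full signature first, then progressively less specific forms) can be written down as a small Python sketch. The function name signature_candidates is hypothetical, and how the candidates are matched against an issue database is left out because it depends on the database.

```python
def signature_candidates(frame):
    """List search keys for one frame, most specific first."""
    module, bang, rest = frame.partition("!")
    if not bang:
        # No function name, e.g. "moduleA+offsetN": symbols are probably
        # missing, so the only fallback is the bare module name.
        module_only = frame.partition("+")[0]
        return [frame] if module_only == frame else [frame, module_only]
    # Drop the trailing "+offset" part, if there is one.
    function = rest.rpartition("+")[0] if "+" in rest else rest
    ordered = [frame, rest, module + "!" + function, function]
    # Remove duplicates while keeping the order.
    unique = []
    for candidate in ordered:
        if candidate not in unique:
            unique.append(candidate)
    return unique
```

Calling the function on each frame from the top of the trace downwards gives the complete search order; frames near the bottom, such as thread start routines, can usually be skipped.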

If there is no match in internal databases we can try Google. For our example, a Google search for SendResponseHeaders+0x5d returns results that lead to the following discussion:

http://groups.google.com/group/microsoft.public.inetserver.iis/browse_frm/thread/34bc2be635b26531?tvc=1

The same discussion can also be found directly by searching Google Groups.

Another example comes from a BSOD complete memory dump. The analysis command has the following output (stripped for clarity):

KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: bff90ca3, The address that the exception occurred at
Arg3: 00000000, Parameter 0 of the exception
Arg4: 00000000, Parameter 1 of the exception

TRAP_FRAME: bdf80834 -- (.trap ffffffffbdf80834)
ErrCode = 00000000
eax=00000000 ebx=bdf80c34 ecx=89031870 edx=88096928 esi=88096928 edi=8905e7f0
eip=bff90ca3 esp=bdf808a8 ebp=bdf80a44 iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
tsmlvsa+0xfca3:
bff90ca3 8b08 mov ecx,dword ptr [eax] ds:0023:00000000=????????
Resetting default scope

STACK_TEXT:
bdf807c4 80467a15 bdf807e0 00000000 bdf80834 nt!KiDispatchException+0x30e
bdf8082c 804679c6 00000000 bdf80860 804d9f69 nt!CommonDispatchException+0x4d
bdf80838 804d9f69 00000000 00000005 e56c6946 nt!KiUnexpectedInterruptTail+0x207
00000000 00000000 00000000 00000000 00000000 nt!ObpAllocateObject+0xe1

Because the crash point tsmlvsa+0xfca3 is not on the stack trace we use the .trap command:

1: kd> .trap ffffffffbdf80834
ErrCode = 00000000
eax=00000000 ebx=bdf80c34 ecx=89031870 edx=88096928 esi=88096928 edi=8905e7f0
eip=bff90ca3 esp=bdf808a8 ebp=bdf80a44 iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
tsmlvsa+0xfca3:
bff90ca3 8b08 mov ecx,dword ptr [eax] ds:0023:00000000=????????

1: kd> k
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
00000000 bdf80afc tsmlvsa+0xfca3
89080c00 00000040 nt!ObpLookupObjectName+0x504
00000000 00000001 nt!ObOpenObjectByName+0xc5
c0100080 0012b8d8 nt!IopCreateFile+0x407
c0100080 0012b8d8 nt!IoCreateFile+0x36
c0100080 0012b8d8 nt!NtCreateFile+0x2e
c0100080 0012b8d8 nt!KiSystemService+0xc9
c0100080 0012b8d8 ntdll!NtCreateFile+0xb
c0000000 00000000 KERNEL32!CreateFileW+0x343

1: kd> lmv m tsmlvsa
bff81000 bff987c0 tsmlvsa (no symbols)
Loaded symbol image file: tsmlvsa.sys
Image path: tsmlvsa.sys
Image name: tsmlvsa.sys
Timestamp: Thu Mar 18 06:18:51 2004 (40593F4B)
CheckSum: 0002D102
ImageSize: 000177C0
Translations: 0000.04b0 0000.04e0 0409.04b0 0409.04e0

A Google search for tsmlvsa+0xfca3 fails, but if we search just for tsmlvsa we get the first link towards problem resolution:

http://www-1.ibm.com/support/docview.wss?uid=swg1IC40964

- Dmitry Vostokov @ DumpAnalysis.org -

Raw Stack Dump of all threads

Tuesday, August 28th, 2007

Sometimes we need to dump the whole thread stack data to find traces of hooks, printer drivers or just string fragments. This is usually done by finding the appropriate TEB and dumping the data between StackLimit and StackBase addresses, for example:

0:000> ~
.  0  Id: 106c.4e4 Suspend: 1 Teb: 7ffde000 Unfrozen
   1  Id: 106c.4e0 Suspend: 1 Teb: 7ffdc000 Unfrozen
   2  Id: 106c.1158 Suspend: 1 Teb: 7ffdb000 Unfrozen
   3  Id: 106c.c3c Suspend: 1 Teb: 7ffd9000 Unfrozen
   4  Id: 106c.1174 Suspend: 1 Teb: 7ffd8000 Unfrozen
   5  Id: 106c.1168 Suspend: 1 Teb: 7ffd4000 Unfrozen
   6  Id: 106c.1568 Suspend: 1 Teb: 7ffaf000 Unfrozen
   7  Id: 106c.1574 Suspend: 1 Teb: 7ffad000 Unfrozen
   8  Id: 106c.964 Suspend: 1 Teb: 7ffac000 Unfrozen
   9  Id: 106c.1164 Suspend: 1 Teb: 7ffab000 Unfrozen
  10  Id: 106c.d84 Suspend: 1 Teb: 7ffaa000 Unfrozen
  11  Id: 106c.bf4 Suspend: 1 Teb: 7ffa9000 Unfrozen
  12  Id: 106c.eac Suspend: 1 Teb: 7ffa8000 Unfrozen
  13  Id: 106c.614 Suspend: 1 Teb: 7ffd5000 Unfrozen
  14  Id: 106c.cd8 Suspend: 1 Teb: 7ffa7000 Unfrozen
  15  Id: 106c.1248 Suspend: 1 Teb: 7ffa6000 Unfrozen
  16  Id: 106c.12d4 Suspend: 1 Teb: 7ffa4000 Unfrozen
  17  Id: 106c.390 Suspend: 1 Teb: 7ffa3000 Unfrozen
  18  Id: 106c.764 Suspend: 1 Teb: 7ffa1000 Unfrozen
  19  Id: 106c.f48 Suspend: 1 Teb: 7ff5f000 Unfrozen
  20  Id: 106c.14a8 Suspend: 1 Teb: 7ff53000 Unfrozen
  21  Id: 106c.464 Suspend: 1 Teb: 7ff4d000 Unfrozen
  22  Id: 106c.1250 Suspend: 1 Teb: 7ffa5000 Unfrozen
  23  Id: 106c.fac Suspend: 1 Teb: 7ff5c000 Unfrozen
  24  Id: 106c.1740 Suspend: 1 Teb: 7ffd7000 Unfrozen
  25  Id: 106c.ae4 Suspend: 1 Teb: 7ffd6000 Unfrozen
  26  Id: 106c.a4c Suspend: 1 Teb: 7ffdd000 Unfrozen
  27  Id: 106c.1710 Suspend: 1 Teb: 7ffda000 Unfrozen
  28  Id: 106c.1430 Suspend: 1 Teb: 7ffa2000 Unfrozen
  29  Id: 106c.1404 Suspend: 1 Teb: 7ff4e000 Unfrozen
  30  Id: 106c.9a8 Suspend: 1 Teb: 7ff4c000 Unfrozen
  31  Id: 106c.434 Suspend: 1 Teb: 7ff4b000 Unfrozen
  32  Id: 106c.c8c Suspend: 1 Teb: 7ff4a000 Unfrozen
  33  Id: 106c.4f0 Suspend: 1 Teb: 7ff49000 Unfrozen
  34  Id: 106c.be8 Suspend: 1 Teb: 7ffae000 Unfrozen
  35  Id: 106c.14e0 Suspend: 1 Teb: 7ff5d000 Unfrozen
  36  Id: 106c.fe0 Suspend: 1 Teb: 7ff5b000 Unfrozen
  37  Id: 106c.1470 Suspend: 1 Teb: 7ff57000 Unfrozen
  38  Id: 106c.16c4 Suspend: 1 Teb: 7ff5e000 Unfrozen

0:000> !teb 7ffad000
TEB at 7ffad000
    ExceptionList:        0181ff0c
    StackBase:            01820000
    StackLimit:           0181c000

    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffad000
    EnvironmentPointer:   00000000
    ClientId:             0000106c . 00001574
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffdf000
    LastErrorValue:       0
    LastStatusValue:      c000000d
    Count Owned Locks:    0
    HardErrorMode:        0

0:000> dps 0181c000 01820000
0181c000  00000000
0181c004  00000000
0181c008  00000000
0181c00c  00000000
0181c010  00000000
0181c014  00000000
0181c018  00000000
0181c01c  00000000
0181c020  00000000
0181c024  00000000
...
...
...
0181ffb8  0181ffec
0181ffbc  77e6608b kernel32!BaseThreadStart+0x34
0181ffc0  00f31eb0
0181ffc4  00000000
0181ffc8  00000000
0181ffcc  00f31eb0
0181ffd0  8a38f7a8
0181ffd4  0181ffc4
0181ffd8  88a474b8
0181ffdc  ffffffff
0181ffe0  77e6b7d0 kernel32!_except_handler3
0181ffe4  77e66098 kernel32!`string'+0x98
0181ffe8  00000000
0181ffec  00000000
0181fff0  00000000
0181fff4  7923a709
0181fff8  00f31eb0
0181fffc  00000000
01820000  ????????

However, if our process has many threads, as in the example above, and we want to dump stack data from all of them, we need to automate this process. After several attempts I created the following simple script, which can be copy-pasted into the WinDbg command window or saved in a text file to be loaded and executed later via the WinDbg $$>< command. The script takes advantage of the following command:

~e (Thread-Specific Command)

The ~e command executes one or more commands for a specific thread or for all threads in the target process.

(from WinDbg help)

Here is the script:

~*e r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2

Raw stack data from different stacks is separated by !teb output for clarity, for example:

0:000> .logopen rawdata.log
0:000> ~*e r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2
TEB at 7ffde000
    ExceptionList:        0007fd38
    StackBase:            00080000
    StackLimit:           0007c000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffde000
    EnvironmentPointer:   00000000
    ClientId:             0000106c . 000004e4
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffdf000
    LastErrorValue:       0
    LastStatusValue:      c0000034
    Count Owned Locks:    0
    HardErrorMode:        0
0007c000  00000000
0007c004  00000000
0007c008  00000000
0007c00c  00000000
0007c010  00000000
0007c014  00000000
0007c018  00000000
0007c01c  00000000
0007c020  00000000
0007c024  00000000
...
...
...
...
...
...
...
0977ffb4  00000000
0977ffb8  0977ffec
0977ffbc  77e6608b kernel32!BaseThreadStart+0x34
0977ffc0  025c3728
0977ffc4  00000000
0977ffc8  00000000
0977ffcc  025c3728
0977ffd0  a50c4963
0977ffd4  0977ffc4
0977ffd8  000a5285
0977ffdc  ffffffff
0977ffe0  77e6b7d0 kernel32!_except_handler3
0977ffe4  77e66098 kernel32!`string'+0x98
0977ffe8  00000000
0977ffec  00000000
0977fff0  00000000
0977fff4  77bcb4bc msvcrt!_endthreadex+0x2f
0977fff8  025c3728
0977fffc  00000000
09780000  ????????
TEB at 7ffae000
    ExceptionList:        0071ff64
    StackBase:            00720000
    StackLimit:           0071c000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffae000
    EnvironmentPointer:   00000000
    ClientId:             0000106c . 00000be8
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffdf000
    LastErrorValue:       0
    LastStatusValue:      c000000d
    Count Owned Locks:    0
    HardErrorMode:        0
0071c000  00000000
0071c004  00000000
0071c008  00000000
0071c00c  00000000
0071c010  00000000
0071c014  00000000
0071c018  00000000
0071c01c  00000000
0071c020  00000000
0071c024  00000000
0071c028  00000000
0071c02c  00000000
0071c030  00000000
0071c034  00000000
0071c038  00000000
0071c03c  00000000
0071c040  00000000
0071c044  00000000
0071c048  00000000
0071c04c  00000000
0071c050  00000000
0071c054  00000000
...
...
...
...
...
...
...
0:000> .logclose

Instead of (or in addition to) the dps command used in the script we can use the dpu or dpa commands to dump all strings that are pointed to by stack data, or create an even more complex script that does triple dereference.
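
Once the log is closed, the raw data can be post-processed outside the debugger. The following Python sketch is only an illustration: the function name symbolic_entries_by_teb is hypothetical, and it assumes the 32-bit dps layout shown above (address, value, optional symbol) with each thread's data introduced by a "TEB at" line.

```python
import re

# Assumed dps line layout: <address> <value> [symbol], 8 hex digits each.
_ENTRY = re.compile(r"^([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})(?:\s+(\S.*))?$")


def symbolic_entries_by_teb(log_text):
    """Map each TEB address to the (stack address, symbol) pairs in its dump."""
    result = {}
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("TEB at "):
            # A new thread section starts with the !teb output.
            current = line[len("TEB at "):].strip()
            result[current] = []
            continue
        entry = _ENTRY.match(line)
        # Keep only entries that the debugger resolved to a symbol.
        if current is not None and entry and entry.group(3):
            result[current].append((entry.group(1), entry.group(3)))
    return result
```

Filtering the returned symbols by module name is then a quick way to look for traces of hooks or printer drivers across all threads.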

- Dmitry Vostokov @ DumpAnalysis.org -

Moving to kernel space (updated references)

Sunday, August 26th, 2007

If you are developing and debugging user space applications (and/or doing crash dump analysis in user space) and you want to understand Windows kernel dumps and device drivers better (and probably start writing your own kernel tools) here is the reading list I found the most effective over the last 4 years:

0. Read and re-read the Windows Internals book in parallel while reading all the other books. I read all editions, by the way. It will show you the big picture and some useful WinDbg commands and techniques, but you need to read device driver books to fill the gaps and be confident in kernel space.

1. Start with “The Windows 2000 Device Driver Book: A Guide for Programmers (2nd Edition)”. This short book will show you the basics and you can start writing your drivers and kernel tools immediately.


2. Next read the “Windows NT Device Driver Development” book to consolidate your knowledge. This book has been reprinted by OSR.

3. Don’t stop here. Read “Developing Windows NT Device Drivers: A Programmer’s Handbook”. This is a very good book that explains everything in great detail, with good pictures. You will finally understand various buffering methods.

4. Continue with WDM drivers and a modern presentation: “Programming the Microsoft Windows Driver Model, Second Edition”. A must-read even if your drivers are not WDM.

5. Finally, read the “Developing Drivers with the Windows Driver Foundation” book, as this is the future and it also covers ETW (Event Tracing for Windows), WinDbg extensions, PREfast and Static Driver Verifier.

Additional reading (not including DDK Help which you will use anyway) can be done in parallel after finishing “Windows NT Device Driver Development” book:

1. OSR NT Insider articles. I have their full printed collection, 1996 - 2006.

http://www.osronline.com/

2. “Windows NT File System Internals”, reprinted by OSR.

3. The “Rootkits: Subverting the Windows Kernel” book will show you the Windows kernel from a hacker’s perspective. In addition, you will find an overview of kernel areas not covered in other books.

Of course, you must know the C language and its idioms really well. Really know it, down to the assembly language level! I’ll publish another reading list soon. Stay tuned.

- Dmitry Vostokov @ DumpAnalysis.org -