Archive for the ‘Crash Dump Analysis’ Category

Stack trace, invalid code pointer and hooked functions: pattern cooperation

Wednesday, May 13th, 2009

When looking at a stack trace of one crashed process we noticed an invalid code pointer. It is not a NULL code pointer but has the same stack trace pattern:

0:000> kL
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
0013dfb4 00401791 0×5e388583
0013fdf4 0040189f Application!RequestData+0×3e1

0013fee4 00401d0a Application!main+0×3f
0013ffc0 77e4f23b Application!mainCRTStartup+0×16c
0013fff0 00000000 kernel32!BaseProcessStart+0×23

When we look at raw stack data and examine the backward disassembly of the return address we see that invalid code was called from RequestData function and WinDbg stack trace reconstruction is not suspicious (it is structurally and semantically correct):

0:000> dds esp l10
0013dfb8  00401791 Application!RequestData+0x3e1
0013dfbc  00000140
0013dfc0  0013ee50
0013dfc4  00000fa4
0013dfc8  00000000
0013dfcc  00000000
0013dfd0  00000ece
0013dfd4  0013ffc0
0013dfd8  7ffdc000
0013dfdc  00000140
0013dfe0  0000054c
0013dfe4  50000002
0013dfe8  4b4919ac
0013dfec  00000000
0013dff0  00000000
0013dff4  003f003c

0:000> .asm no_code_bytes
Assembly options: no_code_bytes

0:000> ub 00401791
Application!RequestData+0x3c8:
00401778 Application!RequestData+0x3d0 (00401780)
0040177a lea     ebx,[ebx]
00401780 push    0
00401782 push    eax
00401783 lea     ecx,[esp+esi+0E30h]
0040178a push    ecx
0040178b push    edi
0040178c call    Application!recv (0040e382)

When seeing recv call we might suspect that the crash happened just inside that function because the raw stack data upwards (lower addresses) doesn’t have any execution residue left from nested function calls:

0:000> dds esp-100 esp
0013deb8  00000000
0013debc  00000000
0013dec0  00000000
0013dec4  00000000
0013dec8  00000000
0013decc  00000000
0013ded0  00000000
0013ded4  00000000
0013ded8  00000000
0013dedc  00000000
0013dee0  00000000
0013dee4  00000000
0013dee8  00000000
0013deec  00000000
0013def0  00000000
0013def4  00000000
0013def8  00000000
0013defc  00000000
0013df00  00000000
0013df04  00000000
0013df08  00000000
0013df0c  00000000
0013df10  00000000
0013df14  00000000
0013df18  00000000
0013df1c  00000000
0013df20  00000000
0013df24  00000000
0013df28  00000000
0013df2c  00000000
0013df30  00000000
0013df34  00000000
0013df38  00000000
0013df3c  00000000
0013df40  00000000
0013df44  00000000
0013df48  00000000
0013df4c  00000000
0013df50  00000000
0013df54  00000000
0013df58  00000000
0013df5c  00000000
0013df60  00000000
0013df64  00000000
0013df68  00000000
0013df6c  00000000
0013df70  00000000
0013df74  00000000
0013df78  00000000
0013df7c  00000000
0013df80  00000000
0013df84  00000000
0013df88  00000000
0013df8c  00000000
0013df90  00000000
0013df94  00000000
0013df98  00000000
0013df9c  00000000
0013dfa0  00000000
0013dfa4  00000000
0013dfa8  00000000
0013dfac  00000000
0013dfb0  00000000
0013dfb4  00000000
0013dfb8  00401791 Application!RequestData+0x3e1

So we follow recv call forward disassembly (notice that the first jump is indirect):

0:000> u 0040e382
Application!recv:
0040e382 jmp     dword ptr [Application!_imp__recv (00410180)]

Application!closesocket:
0040e388 jmp     dword ptr [Application!_imp__closesocket (00410170)]
Application!WSAGetLastError:
0040e38e jmp     dword ptr [Application!_imp__WSAGetLastError (00410174)]
Application!send:
0040e394 jmp     dword ptr [Application!_imp__send (00410178)]
Application!connect:
0040e39a jmp     dword ptr [Application!_imp__connect (0041017c)]
Application!htons:
0040e3a0 jmp     dword ptr [Application!_imp__htons (00410198)]
Application!setsockopt:
0040e3a6 jmp     dword ptr [Application!_imp__setsockopt (00410184)]
Application!socket:
0040e3ac jmp     dword ptr [Application!_imp__socket (00410188)]

0:000> dps 00410180 l10
00410180  71ad2f7f ws2_32!recv
00410184  71ad2d47 ws2_32!setsockopt
00410188  71ad410c ws2_32!socket
0041018c  71ad7ca1 ws2_32!gethostbyname
00410190  71ad4f3b ws2_32!WSAStartup
00410194  71ad7b5b ws2_32!gethostname
00410198  71ad28bc ws2_32!htons
0041019c  71ad3da8 ws2_32!WSACleanup
004101a0  00000000
004101a4  00000000
004101a8  00000000
004101ac  00000000
004101b0  00000000
004101b4  45cd184e
004101b8  00000000
004101bc  00000002

0:000> u 71ad2f7f
ws2_32!recv:
71ad2f7f jmp     7fd60000

71ad2f84 sub     esp,10h
71ad2f87 push    ebx
71ad2f88 xor     ebx,ebx
71ad2f8a cmp dword ptr [ws2_32!PrologPointer (71ae4044)],offset ws2_32!Prolog_v2 (71ad6067)
71ad2f94 push    esi
71ad2f95 je      ws2_32!recv+0×18 (71ad6207)
71ad2f9b lea     eax,[ebp-8]

0:000> u 7fd60000
7fd60000 jmp     DllA!recv_patch (612101b6)
7fd60005 mov     edi,edi
7fd60007 push    ebp
7fd60008 mov     ebp,esp
7fd6000a jmp     ws2_32!recv+0×5 (71ad2f84)
7fd6000f add     byte ptr [eax],al
7fd60011 add     byte ptr [eax],al
7fd60013 add     byte ptr [eax],al

Finally we see that sockets library functions were patched by a 3rd-party module DllA and we need to contact its vendor.

- Dmitry Vostokov @ DumpAnalysis.org -

The Structure of MDA Certification

Tuesday, May 12th, 2009

Previously announced memory dump analysis certification will have x86/x64-based tracks for Windows and Unix (including Linux / FreeBSD / Mac OS X). Each track consists of 3 exams, each having its own set of requirements and scope:

  • Fundamentals of Memory Dump Analysis
  • Intermediate Memory Dump Analysis
  • Advanced Memory Dump Analysis

More information will be available later. The initiative is supported by OpenTask.

- Dmitry Vostokov @ DumpAnalysis.org -

Memory Dump Analysis Certification (Part 1)

Monday, May 11th, 2009

This is an example from the sample question set of the forthcoming memory dump analysis certification planned by Memory Analysis and Debugging Institute. You can reply to this post with your answers.

Q. Interpret the fragment from WinDbg output below.

0:002> !teb
[...]
TEB at 7efaf000
[...]

0:002> dds poi(7efaf000+8) poi(7efaf000+4)
01192000  00000000
01192004  00000000
01192008  00000000
[...]
0119e448  00010020
0119e44c  00030002
0119e450  00050004
0119e454  00070006
0119e458  00090008
0119e45c  000b000a
0119e460  000d000c
0119e464  000f000e
0119e468  00110010
0119e46c  00130012
0119e470  00150014
0119e474  00170016
0119e478  00190018
0119e47c  001b001a
0119e480  001d001c
0119e484  001f001e
0119e488  00210020
0119e48c  00230022
0119e490  00250024
0119e494  00270026
0119e498  00290028
0119e49c  002b002a
0119e4a0  002d002c
0119e4a4  002f002e
0119e4a8  00310030
0119e4ac  00330032
0119e4b0  00350034
0119e4b4  00370036
0119e4b8  00390038
0119e4bc  003b003a
0119e4c0  003d003c
0119e4c4  003f003e
0119e4c8  00410040 MyService!__InternalCxxFrameHandler+0x5c
0119e4cc  00430042
0119e4d0  00450044
0119e4d4  00470046
0119e4d8  00490048
0119e4dc  004b004a
0119e4e0  004d004c
0119e4e4  004f004e
0119e4e8  00510050 advapi32!`string'+0x164
0119e4ec  00530052 advapi32!GetPerflibKeyValue+0x184
0119e4f0  00550054 advapi32!`string'+0x20c
0119e4f4  00570056 advapi32!_NULL_IMPORT_DESCRIPTOR+0x2714
0119e4f8  00590058 advapi32!szPerflibSectionName <PERF> (advapi32+0x90058)
0119e4fc  005b005a shlwapi!_CRT_INIT+0xaf
0119e500  005d005c shlwapi!_OpenProgidKey+0xee
0119e504  005f005e shlwapi!_pRawDllMain <PERF> (shlwapi+0x5005e)
0119e508  00410060 MyService!__InternalCxxFrameHandler+0x7c
0119e50c  00430042
0119e510  00450044
0119e514  00470046
0119e518  00490048
0119e51c  004b004a
0119e520  004d004c
0119e524  004f004e
0119e528  00510050 advapi32!`string'+0x164
0119e52c  00530052 advapi32!GetPerflibKeyValue+0x184
0119e530  00550054 advapi32!`string'+0x20c
0119e534  00570056 advapi32!_NULL_IMPORT_DESCRIPTOR+0x2714
0119e538  00590058 advapi32!szPerflibSectionName <PERF> (advapi32+0x90058)
0119e53c  007b005a
0119e540  007d007c
0119e544  007f007e
[...]

Choose all answers that are valid:

a. ASCII string fragment
b. Raw stack data
c. Exception handling
d. UNICODE string fragment
e. Partial stack traces
f. Performance monitoring
g. Execution residue

- Dmitry Vostokov @ DumpAnalysis.org -

Raw Stack Dump of all threads (part 3)

Monday, May 11th, 2009

Sometimes the script featured in part 1 doesn’t work because of the lack of symbols or something else:

***                                                                  
***                                                                  
*** Your debugger is not using the correct symbols    
*** 
*** In order for this command to work properly, your symbol path
*** must point to .pdb files that have full type information.
*** 
*** Certain .pdb files (such as the public OS symbols) do not
*** contain the required information.  Contact the group that
*** provided you with these symbols if you need this command to
*** work.
*** 
*** Type referenced: ntdll!_NT_TIB
***
Couldn’t resolve error at ‘ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2′
                    ^ Extra character error in ‘~*e r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2′

This is the case where !teb WinDbg command doesn’t work and we can cope with this as shown in the following post:

Coping with missing symbolic information

Therefore we can adjust our user-mode script to use hard-coded offsets and delineate raw stack outputs by the output of kv WinDbg command:

~*e r? $t0 = @$teb; r? $t1 = @$t0+8; r? $t2 = @$t0+4; kv 100; dps poi(@$t1) poi(@$t2)

- Dmitry Vostokov @ DumpAnalysis.org -

Stack trace collection, blocked threads, pass through functions and main thread: pattern cooperation

Friday, May 8th, 2009

A server was hanging and a complete memory dump was generated. Usual checks for resource contention or CPU spikes like !locks and !running WinDbg commands didn’t reveal anything. Therefore we decided to list all threads in the system using !stacks and !process 0 ff commands. The former command gives a birds eye overview of threads and it was found suspicious that many threads in different processes had 0 ticks waiting and had this top level call:

0: kd> !stacks
Proc.Thread .Thread Ticks ThreadState Blocker
[...]

[88d9cd88 explorer.exe]
Proc.Thread .Thread  Ticks   ThreadState Blocker
1ecc.001eb4 88d99db0 0000000 Blocked     3rdPartyAVDriver+0×1394
[…]

[862bad88 userinit.exe]
4394.005570 878f0790 0000000 Blocked     3rdPartyAVDriver+0×1394

Inspection of the latter command detailed output of thread stacks showed that all these suspicious threads were in fact blocked when loading a DLL:

0: kd> .process /r /p 862bad88
Implicit process is now 862bad88
Loading User Symbols

0: kd> .thread 878f0790
Implicit thread is now 878f0790

0: kd> kL 100
ChildEBP RetAddr 
9aeb529c 80833e95 nt!KiSwapContext+0x26
9aeb52c8 80829027 nt!KiSwapThread+0x2e5
9aeb5310 b8dff394 nt!KeDelayExecutionThread+0x2ab
WARNING: Stack unwind information not available. Following frames may be wrong.
9aeb5338 b8dffe35 3rdPartyAVDriver+0×1394
9aeb5354 b8e00208 3rdPartyAVDriver+0×1e35
9aeb5398 b8e0e45a 3rdPartyAVDriver+0×2208
9aeb53d0 8081e087 3rdPartyAVDriver+0×1045a

9aeb53e4 b7d6e41b nt!IofCallDriver+0×45
9aeb53f4 b7d6a3ee DriverA!Dispatch+0xfe
9aeb5404 8081e087 DriverA!KUI_dispatch+0×6d

9aeb5418 b8bdb4e1 nt!IofCallDriver+0×45
9aeb543c b8bdb755 DriverB!PassThrough+0xd3
9aeb545c 8081e087 DriverB!Dispatch+0×153

9aeb5470 b6a7b81d nt!IofCallDriver+0×45
9aeb5498 b6a6c5bb DriverC!LowerPassThrough+0×48
9aeb54bc b6a6cc32 DriverC+0×75bb
9aeb5554 b6a7aa57 DriverC+0×7c32
9aeb5564 b6a7aaa7 DriverC+0×15a57
9aeb558c 8081e087 DriverC!DispatchPassThrough+0×48
9aeb55a0 808fb13b nt!IofCallDriver+0×45
9aeb5688 80939c6a nt!IopParseDevice+0xa35
9aeb5708 80935d9e nt!ObpLookupObjectName+0×5b0
9aeb575c 808ece57 nt!ObOpenObjectByName+0xea
9aeb57d8 808ee0f1 nt!IopCreateFile+0×447
9aeb5834 808f1e31 nt!IoCreateFile+0xa3
9aeb5874 8088b19c nt!NtOpenFile+0×27
9aeb5874 7c94860c nt!KiFastCallEntry+0xfc
0007e1fc 7c947339 ntdll!KiFastSystemCallRet
0007e200 7c952653 ntdll!NtOpenFile+0xc
0007e278 7c9525db ntdll!LdrpCreateDllSection+0×4e
0007e364 7c95425c ntdll!LdrpMapDll+0×297
0007e610 7c95408d ntdll!LdrpLoadDll+0×1e4
0007e88c 7c801bf7 ntdll!LdrLoadDll+0×198
0007e8f4 7c81c70b kernel32!LoadLibraryExW+0×1b2
0007e908 7c93a352 kernel32!LoadLibraryW+0×11
0007ee38 7c95348d ntdll!LdrpCallInitRoutine+0×14
0007ef40 7c954339 ntdll!LdrpRunInitializeRoutines+0×367
0007f1d4 7c95408d ntdll!LdrpLoadDll+0×3cd
0007f450 7c801bf7 ntdll!LdrLoadDll+0×198
0007f4b8 7c81c70b kernel32!LoadLibraryExW+0×1b2
0007f4cc 77e556fb kernel32!LoadLibraryW+0×11
0007f558 77e3dc66 USER32!LoadAppDlls+0×179
0007f694 77e3df10 USER32!ClientThreadSetup+0×1c4
0007f698 7c948556 USER32!__ClientThreadSetup+0×5
0007f698 80831348 ntdll!KiUserCallbackDispatcher+0×2e
9aeb5b4c 8091f9e1 nt!KiCallUserMode+0×4
9aeb5ba4 bf8bc7c3 nt!KeUserModeCallback+0×8f
9aeb5bcc bf8bbc93 win32k!xxxClientThreadSetup+0×21
9aeb5ccc bf8b6c46 win32k!xxxCreateThreadInfo+0×804
9aeb5cdc bf8b7ae4 win32k!UserThreadCallout+0×71
9aeb5cf8 80947a1b win32k!W32pThreadCallout+0×3a
9aeb5d54 8088ae9e nt!PsConvertToGuiThread+0×153
9aeb5d58 00000000 nt!KiBBTUnexpectedRange+0xc

The functions shown in blue are known from past issues to be pass through forwarding IRP to the lower drivers in a device driver stack.

Actually the thread stack above belongs to the main thread blocked while loading application DLLs. The conclusion was to check 3rdPartyAVDriver.sys vendor.

- Dmitry Vostokov @ DumpAnalysis.org -

Programming Language Pragmatics (3rd Edition)

Friday, May 8th, 2009

As soon as I wrote my review of the 2nd edition I found out that the 3rd edition was recently published and immediately bought it. I intend to read it from cover to cover again and publish my notes and comments in my reading notebook on Software Generalist blog. The new edition is also bundled with a companion CD.

Programming Language Pragmatics, Third Edition

Buy from Amazon

Hope in one of subsequent editions the author includes my Riemann Programming Language :-)

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.91

Monday, May 4th, 2009

On universal memory dumps:

“[…] the first man who noticed the analogy between a” dump “and” an observation “made a notable advance in the history of thought.”

Alfred North Whitehead, Science and the Modern World

- Dmitry Vostokov @ DumpAnalysis.org -

Heap corruption, module variety, execution residue, coincidental symbolic information and critical section corruption: pattern cooperation

Friday, May 1st, 2009

This is a synthesized dump analysis of many similar print spooler crashes in multi-user terminal service environments where old printer drivers are used that were tested only in single-user environments or insufficiently tested in multi-threaded environments. Many such crashes result from dynamic memory corruption of a process heap:

(40dc.4278): Access violation - code c0000005 (!!! second chance !!!)
eax=00000079 ebx=00000001 ecx=0008bff8 edx=00107898 esi=07d7522a edi=00107890
eip=7c8199b2 esp=0155fc14 ebp=0155fc44 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
ntdll!RtlpLowFragHeapFree+0×30:
7c8199b2 8b4604          mov     eax,dword ptr [esi+4] ds:0023:07d7522e=????????

0:017> kL
ChildEBP RetAddr 
0155fc44 7c819770 ntdll!RtlpLowFragHeapFree+0×30
0155fd1c 77c87a2b ntdll!RtlFreeHeap+0×5c

0155fd30 77c87a02 RPCRT4!FreeWrapper+0×1e
0155fd3c 77c821c2 RPCRT4!operator delete+0xd
0155fd50 77c8047b RPCRT4!LRPC_SCALL::FreeBuffer+0×77
0155fd9c 77c80353 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×192
0155fdc0 77c811dc RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
0155fdfc 77c812f0 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
0155fe20 77c88678 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
0155ff84 77c88792 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
0155ff8c 77c8872d RPCRT4!RecvLotsaCallsWrapper+0xd
0155ffac 77c7b110 RPCRT4!BaseCachedThreadRoutine+0×9d
0155ffb8 77e64829 RPCRT4!ThreadStartRoutine+0×1b
0155ffec 00000000 kernel32!BaseThreadStart+0×34

Although any module could corrupt the heap and either Gflags or Application Verifier is recommended to enable full page heap, sometimes we need to point to some print drivers to eliminate or upgrade them in the meantime. When there are many of them we can point to the oldest one:

0:017> lmt
start    end        module name
[...]
010d0000 010d8000   PrintDriver1  2007
01260000 01272000   PrintDriver2  1999
01290000 012da000   PrintDriver3  2009
012f0000 01302000   PrintDriver4  2003
01310000 01320000   PrintDriver5  2004
01320000 01332000   PrintDriver6  2004
01340000 01353000   PrintDriver7  2005
01360000 0139e000   PrintDriver8  2007
013b0000 013c3000   PrintDriver9  2004
013d0000 013e0000   PrintDriver10 2005
013e0000 013f3000   PrintDriver11 2005
01400000 01413000   PrintDriver12 2006
01420000 0146b000   PrintDriver13 2007
01480000 01488000   PrintDriver14 2003
017f0000 0181e000   PrintDriver15 2004
01920000 0192d000   PrintDriver16 2008
01930000 01936000   PrintDriver17 2008
01950000 01959000   PrintDriver18 2008
01960000 01969000   PrintDriver19 2008
01f80000 021e8000   PrintDriver20 2004
032f0000 03514000   PrintDriver21 2003
03cd0000 03cd6000   PrintDriver22 2008
32100000 32148000   PrintDriver23 2008
3ea40000 3ea46000   PrintDriver24 2007
3f000000 3f03d000   PrintDriver25 2009
3f100000 3f133000   PrintDriver26 2009
[…]

The age distribution among 121 modules can be visualized on a CAD diagram:

 

Alternatively we can look at the execution residue on a raw thread stack:

0:017> !teb
TEB at 7ffa9000
    ExceptionList:        0155fd0c
    StackBase:            01560000
    StackLimit:           01550000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffa9000
    EnvironmentPointer:   00000000
    ClientId:             000040dc . 00004278
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffd8000
    LastErrorValue:       0
    LastStatusValue:      8000001a
    Count Owned Locks:    0
    HardErrorMode:        0

0:017> dds 01550000 01560000
01550000  00000000
01550004  00000000
[...]
01554e78  00000000
01554e7c  00000000
01554e80  01554ecc
01554e84  7c82d1bb ntdll!RtlFindActivationContextSectionString+0xe1
01554e88  01554ea4
01554e8c  01554efc
01554e90  00000000
01554e94  020a0018 PrintDriver20!Callback+0×5c88
01554e98  7ffa9c00
01554e9c  00000000
01554ea0  01554ed4
01554ea4  7c82dd6c ntdll!RtlEncodeSystemPointer+0×45b
01554ea8  00020000
01554eac  01554ec8
01554eb0  01554ec8
01554eb4  01554ec8
[…]
01555eb4  0040003e
01555eb8  01556854
01555ebc  00000000
01555ec0  0000003e
01555ec4  0208003e PrintDriver20!GetValue+0xb37fe
01555ec8  00000000
01555ecc  43000000
01555ed0  0000003e
01555ed4  01555ffa
01555ed8  01555fbc
01555edc  01555fa8
01555ee0  001190d8
01555ee4  7c81990d ntdll!RtlpLowFragHeapAlloc+0×210
01555ee8  7c819962 ntdll!RtlpLowFragHeapAlloc+0xc6a
01555eec  0008bff8
01555ef0  00000000
01555ef4  00080000
[…]

The first address 020a0018 seems to be coincidental because its disassembled code is not good:

0:017> ub 020a0018
                 ^ Unable to find valid previous instruction for 'ub 020a0018'

0:017> u 020a0018
PrintDriver20!Callback+0x5c88:
020a0018 048b            add     al,8Bh
020a001a c7              ???

020a001b ebe8            jmp     PrintDriver20!Callback+0×5c75 (020a0005)
020a001d 8d4e28          lea     ecx,[esi+28h]
020a0020 e8d9960100      call    PrintDriver20!DlgProc+0×86ee (020b96fe)
020a0025 8d4e28          lea     ecx,[esi+28h]
020a0028 e87b930100      call    PrintDriver20!DlgProc+0×8398 (020b93a8)
020a002d 8b4618          mov     eax,dword ptr [esi+18h]

However the second address code 0208003e seems sound: cmp is followed by jne:

0:017> ub 0208003e
PrintDriver20!GetValue+0xb37e5:
02080025 8d442414        lea     eax,[esp+14h]
02080029 8b4b2c          mov     ecx,dword ptr [ebx+2Ch]
0208002c 50              push    eax
0208002d e8ce3a0000      call    PrintDriver20!GetValue+0xb72c0 (02083b00)
02080032 8b38            mov     edi,dword ptr [eax]
02080034 8b4c2410        mov     ecx,dword ptr [esp+10h]
02080038 8b41fc          mov     eax,dword ptr [ecx-4]
0208003b 3947fc          cmp     dword ptr [edi-4],eax

0:017> u 0208003e
PrintDriver20!GetValue+0xb37fe:
0208003e 751b            jne     PrintDriver20!GetValue+0xb381b (0208005b)
02080040 8bc8            mov     ecx,eax
02080042 8b742410        mov     esi,dword ptr [esp+10h]
02080046 c1e902          shr     ecx,2
02080049 f3a7            repe cmps dword ptr [esi],dword ptr es:[edi]
0208004b 750e            jne     PrintDriver20!GetValue+0xb381b (0208005b)
0208004d 8bc8            mov     ecx,eax
0208004f 83e103          and     ecx,3

But this is not a function call resulted in saved return address so we can still consider it as a coincidence. However, on the raw stack we also see a large chunk of ASCII data pointing to the same driver in a textual form:

[...]
0155d360  6f742064
0155d364  696e6920
0155d368  6c616974
0155d36c  20657a69
0155d370  61636562
0155d374  20657375
0155d378  75732061
0155d37c  62617469
[...]

0:017> da 0155d360 0155d734
0155d360  "d to initialize because a suitab"
0155d380  "le PrinterDriver20 inf file was ”
[…]

This reinforces our belief in PrinterDriver20. Finally, when looking at critical section list we see corruption signs pointing to the same driver addresses:

0:017> !cs -l -o -s
DebugInfo          = 0x0014bc60
Critical section   = 0x020f7140 (PrintDriver20!DlgProc+0×46130)
LOCKED
LockCount          = 0xFF85EA7F
WaiterWoken        = Yes
OwningThread       = 0×8b0c244c
RecursionCount     = 0×8BFFFBB4
LockSemaphore      = 0×83182444
SpinCount          = 0×088908c4

WARNING: critical section DebugInfo = 0x00000008 doesn't point back
to the DebugInfo found in the active critical sections list = 0x0014bc60.
The critical section was probably reused without calling DeleteCriticalSection.

Cannot read structure field value at 0x0000000a, error 0
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.

DebugInfo          = 0x0014bc88
Critical section   = 0x020f7110 (PrintDriver20!DlgProc+0×46100)
LOCKED
LockCount          = 0×1E7245FF
WaiterWoken        = No
OwningThread       = 0xccccc304
RecursionCount     = 0xC483FFFD
LockSemaphore      = 0×158638B9
SpinCount          = 0xff96e902

WARNING: critical section DebugInfo = 0x0f712068 doesn't point back
to the DebugInfo found in the active critical sections list = 0x0014bc88.
The critical section was probably reused without calling DeleteCriticalSection.

Cannot read structure field value at 0x0f71206a, error 0
[...]

- Dmitry Vostokov @ DumpAnalysis.org -

Viewing Problem Artifacts from Different Angles

Tuesday, April 28th, 2009

I often say or write something like this: “I looked at the dump|trace file from different angles”.

- Dmitry Vostokov @ DumpAnalysis.org -

Null data pointer, incorrect stack trace, changed environment, hooked functions and coincidental symbolic information: pattern cooperation

Tuesday, April 28th, 2009

One application was constantly crashing in a highly dynamic multi-user environment with various value-added GUI-enhancing hooking and patching 3rd-party products. The dump was analyzed and it shows the data NULL pointer access violation:

0:000> r
eax=05503ff0 ebx=05503f10 ecx=6bb7da25 edx=00000000 esi=02203c10 edi=00000000
eip=6b542ba3 esp=04f6f860 ebp=04f6f87c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
gui_plugin!entry+0xe63:
6b542ba3 8b0f            mov     ecx,dword ptr [edi]  ds:0023:00000000=????????

Unfortunately due to the lack of gui_plugin symbols the stack trace is incorrect:

0:000> kv
ChildEBP RetAddr  Args to Child             
WARNING: Stack unwind information not available. Following frames may be wrong.
04f6f87c 00000000 000003b9 00000690 00000d20 gui_plugin!entry+0xe63

We do not attempt stack trace reconstruction here but show one of troubleshooting methods to eliminate possible effects of the changed environment. We look for any code patching modules by examining hooked functions:

0:000> !chkimg -lo 50 -d !kernel32 -v
Searching for module with expression: !kernel32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\kernel32.dll\4791A76Ddb000\kernel32.dll
No range specified

Scanning section:    .text
Size: 835001
Range to scan: 76701000-767ccdb9
    7670a672-7670a676  5 bytes - kernel32!MoveFileW
 [ 8b ff 55 8b ec:e9 89 59 80 09 ]
    7671c5c8-7671c5cc  5 bytes - kernel32!DeleteFileW (+0×11f56)
 [ 8b ff 55 8b ec:e9 33 3a 80 09 ]
    76721070-76721074  5 bytes - kernel32!MoveFileExW (+0×4aa8)
 [ 8b ff 55 8b ec:e9 8b ef 83 09 ]
    7678ff3f-7678ff43  5 bytes - kernel32!ReplaceFileW (+0×6eecf)
 [ 68 b4 03 00 00:e9 bc 00 7a 09 ]
Total bytes compared: 835001(100%)
Number of errors: 20
20 errors : !kernel32 (7670a672-7678ff43)

0:000> u 7670a672
kernel32!MoveFileW:
7670a672 e989598009      jmp     7ff10000
7670a677 6a02            push    2
7670a679 6a00            push    0
7670a67b 6a00            push    0
7670a67d ff750c          push    dword ptr [ebp+0Ch]
7670a680 ff7508          push    dword ptr [ebp+8]
7670a683 e8c4690100      call    kernel32!MoveFileWithProgressW (7672104c)
7670a688 5d              pop     ebp

0:000> u 7ff10000
7ff10000 e99b1e94f5      jmp     3rdPartyHook!MoveFileW (75851ea0)
7ff10005 8bff            mov     edi,edi
7ff10007 55              push    ebp
7ff10008 8bec            mov     ebp,esp
7ff1000a e968a67ff6      jmp     kernel32!MoveFileW+0×5 (7670a677)
7ff1000f 0000            add     byte ptr [eax],al
7ff10011 0000            add     byte ptr [eax],al
7ff10013 0000            add     byte ptr [eax],al

We see lots of references to 3rdPartyHook on the thread raw stack:

0:000> !teb
TEB at 7ffd5000
    ExceptionList:        04f6ff70
    StackBase:            04f70000
    StackLimit:           04f6c000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffd5000
    EnvironmentPointer:   00000000
    ClientId:             00001790 . 000017d0
    RpcHandle:            00000000
    Tls Storage:          7ffd502c
    PEB Address:          7ffd8000
    LastErrorValue:       0
    LastStatusValue:      0
    Count Owned Locks:    0
    HardErrorMode:        0

0:000> dds 04f6c000 04f70000
[...]
04f6cf28  00440042
04f6cf2c  04f6d2d4
04f6cf30  77179834 ntdll!_except_handler4
04f6cf34  003ce48e
04f6cf38  fffffffe
04f6cf3c  7719d584 ntdll!LdrpResSearchResourceInsideDirectory+0x80f
04f6cf40  7719cf9d ntdll!LdrpResSearchResourceMappedFile+0x521
04f6cf44  6b920002
04f6cf48  00000000
04f6cf4c  000bf000
04f6cf50  6b9d9000
04f6cf54  6b9200f8
04f6cf58  00000000
04f6cf5c  01f6d1d4 <Unloaded_D3D9.DLL>+0x12d1d4
04f6cf60  00000003
04f6cf64  04f6cf88
04f6cf68  04f6d370
04f6cf6c  04f6d340
04f6cf70  771d8fb2 ntdll!_SEH_epilog4_GS+0xa
04f6cf74  7719d018 ntdll!LdrpResSearchResourceMappedFile+0x781
04f6cf78  73d77612
04f6cf7c  6b920002
04f6cf80  000bf000
04f6cf84  00000000
04f6cf88  00000001
04f6cf8c  00000000
04f6cf90  00000000
04f6cf94  00000000
04f6cf98  00000000
04f6cf9c  04f6d44c
04f6cfa0  75851c2f 3rdPartyHook+0×1c2f
04f6cfa4  04f6d1e8
04f6cfa8  00000000
04f6cfac  00000000
04f6cfb0  00000000
04f6cfb4  00000000
[…]

The symbolic references point to valid and sound code:

0:000> ub 75851c2f
3rdPartyHook!GetData+0x22d:
75851c1d c20c00          ret     0Ch
75851c20 8b4dfc          mov     ecx,dword ptr [ebp-4]
75851c23 5f              pop     edi
75851c24 5e              pop     esi
75851c25 33cd            xor     ecx,ebp
75851c27 33c0            xor     eax,eax
75851c29 5b              pop     ebx
75851c2a e8590f0000      call    3rdPartyHook!__security_check_cookie (75852b88)

0:000> u 75851c2f
3rdPartyHook!GetData+0x23f:
75851c2f 8be5            mov     esp,ebp
75851c31 5d              pop     ebp
75851c32 c20c00          ret     0Ch
75851c35 cc              int     3
75851c36 cc              int     3
75851c37 cc              int     3
75851c38 cc              int     3
75851c39 cc              int     3

 

We also see the other 4th-party module that we know as patching from our past experience:

[...]
04f6f850  00000000
04f6f854  00000000
04f6f858  00000000
04f6f85c  00000000
04f6f860  1000c200 4thPartyHook!Shared+0×80
04f6f864  00000000
04f6f868  04f6f87c
04f6f86c  00000001
04f6f870  00000000
04f6f874  00000000
04f6f878  00000000
04f6f87c  0000006c
[…]

The address 1000c200 looks suspicious and coincidental as the set of flags. However, when we disassemble the address, we see the valid and sound code and the module seems to be loaded at 0×10000000 address:

0:000> ub 1000c200
4thPartyHook!Shared+0x80:
1000c1e8 e8234effff      call    10001010
1000c1ed a160020710      mov     eax,dword ptr [10070260]
1000c1f2 83c418          add     esp,18h
1000c1f5 8b750c          mov     esi,dword ptr [ebp+0Ch]
1000c1f8 85f6            test    esi,esi
1000c1fa 7530            jne     1000c22c
1000c1fc 85c0            test    eax,eax
1000c1fe 742c            je      1000c22c

0:000> u 1000c200
4thPartyHook!Shared+0x80:
1000c200 8b08            mov     ecx,dword ptr [eax]
1000c202 8b5004          mov     edx,dword ptr [eax+4]
1000c205 53              push    ebx
1000c206 8bd9            mov     ebx,ecx
1000c208 0bda            or      ebx,edx
1000c20a 5b              pop     ebx
1000c20b 741f            je      1000c22c
1000c20d f6400801        test    byte ptr [eax+8],1

0:000> !dh 10000000
[...]
File Type: DLL
FILE HEADER VALUES
     14C machine (i386)
       5 number of sections
       0 file pointer to symbol table
       0 number of symbols
      E0 size of optional header
    2102 characteristics
            Executable
            32 bit word machine
            DLL
[...]
OPTIONAL HEADER VALUES
     10B magic #
    8.00 linker version
   62000 size of code
   15000 size of initialized data
       0 size of uninitialized data
   3E1B6 address of entry point
    1000 base of code
         ----- new -----
10000000 image base
    1000 section alignment
    1000 file alignment
       2 subsystem (Windows GUI)
    4.00 operating system version
    0.00 image version
    4.00 subsystem version
   7A000 size of image
    1000 size of headers
[...]

0:000> lm
start    end        module name
00400000 00425000   App        
[...]
10000000 1007a000   4thPartyHook
[...]
 

The recommendation I usually give in such cases it to remove or disable 3rd- and 4th-party products that do code patching to see if this resolves the problem. This makes the module list internally clean and if the problem persists then we should look for another causes and restore 3rd- and 4th-party products.

- Dmitry Vostokov @ DumpAnalysis.org -

Looking for abnormal: case study

Monday, April 27th, 2009

I’m RARE rule #5 says:

“Provide appropriate explanations and narrative in the cases where analysis is inconclusive”.

Here is the typical example of such case when a kernel dump was taken with the vague description about server problems. The dump file analysis revealed the following abnormal conditions warranting further troubleshooting steps:

The AppA.exe, the part of the customer product, is about 1Gb in size when its typical size should be no more than 200Mb. Perhaps we have a memory leak here. We can suggest to take a few consecutive memory dumps of the growing memory and analyze it later as described in a heap leak pattern. This can also be a .NET leak too if unmanaged AppA.exe happened to load any managed components through 3rd-party DLLs. It could be also  some unknown loaded component reserved and committed large portion of virtual memory space.

0: kd> !vm
[...]
0eec AppA.exe        241366 (    965464 Kb)
03c0 svchost.exe      10304 (     41216 Kb)
0230 lsass.exe         8764 (     35056 Kb)
0298 svchost.exe       6402 (     25608 Kb)
01f4 winlogon.exe      5787 (     23148 Kb)
[…]

We can confirm the absence of handle leaks:

0: kd> !process 0eec
Searching for Process with Cid == eec
Cid Handle table at fffffa80014d6000 with 794 Entries in use
PROCESS fffffade6e601860
    SessionId: 0  Cid: 0eec    Peb: 7efdf000  ParentCid: 0eb8
    DirBase: b10fa000  ObjectTable: fffffa8000c39170  HandleCount: 865.
    Image: AppA.exe
    VadRoot fffffade68d7e580 Vads 1961 Clone 0 Private 237843. Modified 77. Locked 1.
    DeviceMap fffffa8001221580
    Token                             fffffa8001fdebe0
    ElapsedTime                       6 Days 22:18:09.271
    UserTime                          00:23:00.406
    KernelTime                        00:27:31.281

    QuotaPoolUsage[PagedPool]         106968
    QuotaPoolUsage[NonPagedPool]      19055186
    Working Set Sizes (now,min,max)  (240529, 50, 345) (962116KB, 200KB, 1380KB)
    PeakWorkingSetSize                240671
    VirtualSize                       1244 Mb
    PeakVirtualSize                   1246 Mb
    PageFaultCount                    244053
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      241366

Kernel and user times seem high (about 30 minutes) but it correlates with almost 7 day extensive application usage that involves constant database access.

Looking further at running processes we see that the crucial AppB and AppC applications that were supposed to be running to serve user requests are orphaned:

09e8 AppB.exe            0 (         0 Kb)
09c0 AppC.exe            0 (         0 Kb)

Were they closed normally, forcefully terminated after being hang or crashed? These questions should be asked and appropriate measures taken to capture crash dumps in case event logs reveal access violations.

- Dmitry Vostokov @ DumpAnalysis.org -

Unusual in Windows 7

Friday, April 24th, 2009

Noticed UnusualBoost in !process WinDbg command output from the complete memory dump taken from Windows 7 Beta which I’m evaluating for the purposes of memory dump analysis:

THREAD 852b5d48  Cid 071c.0950  Teb: 7ff9c000 Win32Thread: fe1fc008 WAIT: (WrUserRequest) UserMode Non-Alertable
    853e0690  SynchronizationEvent
Not impersonating
DeviceMap                 8f909fc8
Owning Process            8538a030       Image:         explorer.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      47057          Ticks: 9460 (0:00:02:27.812)
Context Switch Count      61            
UserTime                  00:00:00.000
KernelTime                00:00:00.046
Win32 Start Address WINMM!mciwindow (0x73942761)
Stack Init 904b9fd0 Current 904b9a60 Base 904ba000 Limit 904b7000 Call 0
Priority 13 BasePriority 10 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5
Kernel stack not resident.
ChildEBP RetAddr 
904b9a78 8268951d nt!KiSwapContext+0×26
904b9abc 826cf460 nt!KiSwapThread+0×57b
904b9b10 8268ccaf nt!KiCommitThreadWait+0×340
904b9be8 8e50c768 nt!KeWaitForSingleObject+0×3ee
904b9c44 8e50c575 win32k!xxxRealSleepThread+0×1d7
904b9c60 8e508379 win32k!xxxSleepThread+0×2d
904b9cb8 8e50cf9a win32k!xxxRealInternalGetMessage+0×4b2
904b9d1c 8269066a win32k!NtUserGetMessage+0×3f
904b9d1c 771e5704 nt!KiFastCallEntry+0×12a
053af7e8 76fdbb29 ntdll!KiFastSystemCallRet
053af7ec 76fd3f49 USER32!NtUserGetMessage+0xc
053af810 739427e0 USER32!GetMessageA+0×8d
053af848 76f536d6 WINMM!mciwindow+0×102
053af854 771c883c kernel32!BaseThreadInitThunk+0xe
053af894 771c880f ntdll!__RtlUserThreadStart+0×70
053af8ac 00000000 ntdll!_RtlUserThreadStart+0×1b

There is also ForegroundBoost but its meaning is obvious to me.

- Dmitry Vostokov @ DumpAnalysis.org -

FinalExceptionHandler

Friday, April 24th, 2009

I looked at thread raw stack in a process dump from Windows Server 2008 and found ntdll!FinalExceptionHandler symbol:

Loading Dump File [App.dmp]
User Mini Dump File with Full Memory: Only application data is available
Windows Server 2008/Windows Vista Version 6001 (Service Pack 1) MP (4 procs) Free x86 compatible

0:000> !teb
TEB at 7ffde000
    ExceptionList:        0022fdd8
    StackBase:            00230000
    StackLimit:           00225000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffde000
    EnvironmentPointer:   00000000
    ClientId:             00002458 . 00002478
    RpcHandle:            00000000
    Tls Storage:          7ffde02c
    PEB Address:          7ffdf000
    LastErrorValue:       0
    LastStatusValue:      c0000035
    Count Owned Locks:    0
    HardErrorMode:        0

0:000> dds 00225000 00230000
[...]
0022ff7c  00000000
0022ff80  00000000
0022ff84  00000000
0022ff88  0022ff94
0022ff8c  76744911 kernel32!BaseThreadInitThunk+0xe
0022ff90  7ffdf000
0022ff94  0022ffd4
0022ff98  77b5e4b6 ntdll!__RtlUserThreadStart+0x23
0022ff9c  7ffdf000
0022ffa0  2497b80a
0022ffa4  00000000
0022ffa8  00000000
0022ffac  7ffdf000
0022ffb0  00000000
0022ffb4  00000000
0022ffb8  00000000
0022ffbc  0022ffa0
0022ffc0  00000000
0022ffc4  0022ffe4
0022ffc8  77b29834 ntdll!_except_handler4
0022ffcc  530d7826
0022ffd0  00000000
0022ffd4  0022ffec
0022ffd8  77b5e489 ntdll!_RtlUserThreadStart+0x1b
0022ffdc  00401110 App+0x1110
0022ffe0  7ffdf000
0022ffe4  ffffffff
0022ffe8  77bc75de ntdll!FinalExceptionHandler
0022ffec  00000000
0022fff0  00000000
0022fff4  00401110 App+0×1110
0022fff8  7ffdf000
0022fffc  00000000
00230000  78746341

Search on Internet led me to this very interesting review article about Windows memory protection mechanisms:

http://taossa.com/archive/bh08sotirovdowd.pdf

It also explains this new “Final” exception mechanism in W2K8.

I couldn’t find this in raw stack traces on Windows 7 so it might be the case that it is not enabled by default there like in Windows Vista.

- Dmitry Vostokov @ DumpAnalysis.org -

Blocked thread, historical information, execution residue, hidden exception, dynamic memory corruption, incorrect stack trace and not my version: pattern cooperation

Thursday, April 23rd, 2009

One application was hanging after printing and its user dump was taken. Inspection of threads reveals only one blocked thread:

0:000> ~*kL

.  0  Id: 486c.5704 Suspend: 1 Teb: 7ffa9000 Unfrozen
ChildEBP RetAddr 
01c0fb68 7c827cfb ntdll!KiFastSystemCallRet
01c0fb6c 77e6202c ntdll!NtWaitForMultipleObjects+0xc
01c0fc14 77e62fbe kernel32!WaitForMultipleObjectsEx+0x11a
01c0fc30 00439f16 kernel32!WaitForMultipleObjects+0×18
01c0fd94 77bcaed6 App!~MyPrinter+0xe
01c0fdc0 77bcaf82 msvcrt!_cinit+0xad
01c0fdd0 77baf582 msvcrt!_cexit+0xb
01c0fde0 7c81a352 msvcrt!__CRTDLL_INIT+0×15c
01c0fe00 7c830e70 ntdll!LdrpCallInitRoutine+0×14
01c0feb8 77e668a3 ntdll!LdrShutdownProcess+0×182

We see that the process was being terminated and the last thread was blocked in a destructor of MyPrinter class. However we see some historical information, faultrep.dll unloaded, and this means that some exception happened prior to the process shutdown:

0:000> lm
[...]
Unloaded modules:
69500000 69517000   faultrep.dll
[…]

Then we look at the thread raw stack to see any signs of execution residue and hidden exceptions in particular:

0:000> !teb
TEB at 7ffa9000
    ExceptionList:        01c0fc04
    StackBase:            01c10000
    StackLimit:           01c08000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffa9000
    EnvironmentPointer:   00000000
    ClientId:             0000486c . 00005704
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffd9000
    LastErrorValue:       0
    LastStatusValue:      c0000010
    Count Owned Locks:    0
    HardErrorMode:        0

0:000> dds 01c08000 01c10000
01c08000  00000000
01c08004  00000000
01c08008  00000000
[...]
01c0e704  00001000
01c0e708  01c0ea14
01c0e70c  7c82855e ntdll!KiUserExceptionDispatcher+0xe
01c0e710  01c0a000
01c0e714  01c0e73c
01c0e718  01c0e720
01c0e71c  01c0e73c
01c0e720  c0000005
01c0e724  00000000
01c0e728  00000000
01c0e72c  7c82a756 ntdll!RtlpCoalesceFreeBlocks+0×370
01c0e730  00000002
01c0e734  00000000
[…]

Applying exception context we see heap corruption:

0:000> .cxr 01c0e73c
eax=01998db0 ebx=00370000 ecx=01998db0 edx=4d445443 esi=01998da8 edi=01998010
eip=7c82a756 esp=01c0ea08 ebp=01c0ea14 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000    efl=00010246
ntdll!RtlpCoalesceFreeBlocks+0×370:
7c82a756 3b4a04  cmp    ecx,dword ptr [edx+4] ds:0023:4d445447=????????

However stack trace a bit incorrect at the end (shown in magenta) but this can be explained that it is a residual stack trace that happened in the past with some data overwritten by exception processing:

0:000> kL
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr 
01c0ea14 7c82a82b ntdll!RtlpCoalesceFreeBlocks+0x370
01c0eafc 77bbcef6 ntdll!RtlFreeHeap+0x38e
01c0eb44 00421fdc msvcrt!free+0xc3
01c0eb6c 00422395 App!SetData+0x3a5
[...]
01c0f81c 7c829f60 App!CreatePrinter+0x6a
01c0f824 7c829f59 ntdll!CheckHeapFillPattern+0×64
01c0f948 7c82770b ntdll!RtlFreeHeap+0×70f
01c0f96c 77c27631 ntdll!NtQueryValueKey+0xc
01c0fab8 00000000 gdi32!ReadDisableMetaFilesRegKey+0xfc

Because the hang happens after printing we detect old printer drivers in the module list (certainly not my version of the environment used for application testing):

0:000> lmt
start    end        module name
[...]
6b900000 6b925000   PrinterDriverA  [...] 2002
6c900000 6c947000   PrinterDriverB  […] 2002
[…]

- Dmitry Vostokov @ DumpAnalysis.org -

Pattern-Driven Memory Analysis (Part 2)

Tuesday, April 21st, 2009

Before we explain stages of the analysis process shown in Part 1, let’s start with a brief overview of memory dumps, debuggers and logs. Recall that a memory dump is a snapshot of a process, system or physical memory state. This unifies post-mortem analysis and live debugging. Debuggers are tools that allow us to get and modify these memory snapshots. Other tools that allow us to get memory dump files are process dumpers like userdump.exe, Task Manager since Vista, WER, and system dumpers like LiveKd and Win32dd. We should not forget tools and methods that allow us to trigger Windows kernel ability to save consistent memory dump files: NMI button, keyboard method and various software bugcheck-triggers like Citrix SystemDump. Now coming back to debuggers. One of their essential features is to save a debugging session log, formatted textual output saved in a text file for further processing. One good example is !process 0 ff WinDbg command to output all processes and their thread stack traces (see Stack Trace Collection pattern for other variations). 

I’ve created a page to add all P-DMA parts as soon as I write them:

Pattern-Driven Memory Analysis

- Dmitry Vostokov @ DumpAnalysis.org -

Music for Debugging: In the Memory Dump File

Monday, April 20th, 2009

I used to be a fun of Yanni music in the late 1990s. Today I started listening again to some of his albums and recommend them for any debugging session. If you are new to this music there is a compilation album that I’m listening to while I’m writing this post:

In the Mirror

Buy from Amazon

Here is my version of track titles inspired by listening (with my comments in italics): 

1. In the Memory Dump File
2. The Morning Session
3. Love for Debugging
4. A Debugger’s Dream 
5. Within Kernel
6. Forbidden Access
7. Once Upon a Second Chance
8. Chasing Bugs
9. The Main Thread [Special Debugging Version]  
10. Quiet Memory Analyst 
11. Debugging Joy (My Life is Debugging)
12. So Long My Debugger (My Only Friend on Virtual Memory Plains)
13. Before I Leave the Debugger 
14. End of Session (It wasn’t bad after all)
15. Face in the Memory Dump (after applying Natural Memory Visualization techniques: you can see pictures and various artifacts stored in memory buffers)

- Dmitry Vostokov @ DumpAnalysis.org -

The Debugging Decade!

Monday, April 20th, 2009

DumpAnalysis.org announces forthcoming 2011 - 2020 as The Debugging Decade.

Q&A

Q. Why 2011 - 2020?

A. The main reason is the fact that 2009 is The Year of Debugging and 2010 is The Year of Dump Analysis. This naturally extends to a decade.

Q. Do you plan The Debugging Century?

A. Yes, I do. Details will be announced later.

- Dmitry Vostokov @ DumpAnalysis.org -

Manual and early crash dump, stack trace collection, main thread, blocked threads and pass through functions: pattern cooperation

Thursday, April 16th, 2009

The system was hanging and a manual kernel dump file was generated:

0: kd> !analyze -v
[...]
NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction.  The hardware supplier should
be called.
Arguments:
Arg1: 004f4454
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
[...]

Initially it looked like an earlier premature crash dump taken after the reboot:

0: kd> version
[...]
System Uptime: 0 days 0:00:54.203
[…]

However the incident description stated that the system was hanging during its startup so we went further to look at a stack trace collection of all its threads and found the main thread of spooler was hanging for about 23 seconds after 4 seconds since process creation:

0: kd> !process 0 ff

[...]

PROCESS 8cf06020  SessionId: 0  Cid: 04a4    Peb: 7ffde000  ParentCid: 0244
    DirBase: cff8c1a0  ObjectTable: d70086d0  HandleCount:  24.
    Image: spoolsv.exe
    VadRoot 8bb19130 Vads 32 Clone 0 Private 60. Modified 0. Locked 0.
    DeviceMap d66028f0
    Token                             d6e88988
    ElapsedTime                       00:00:27.921
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         27844
    QuotaPoolUsage[NonPagedPool]      1280
    Working Set Sizes (now,min,max)  (312, 50, 345) (1248KB, 200KB, 1380KB)
    PeakWorkingSetSize                312
    VirtualSize                       12 Mb
    PeakVirtualSize                   12 Mb
    PageFaultCount                    322
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      80

THREAD 8cf22db0  Cid 04a4.04a8  Teb: 7ffdd000 Win32Thread: bc155948 WAIT: (Unknown) KernelMode Non-Alertable
    8beeca28  SynchronizationEvent
IRP List:
    8beecd08: (0006,0094) Flags: 00000884  Mdl: 00000000
Not impersonating
DeviceMap                 d66028f0
Owning Process            8cf06020       Image:         spoolsv.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      1981           Ticks: 1488 (0:00:00:23.250)
Context Switch Count      9                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×0100468c
Start Address 0×77e617f8
Stack Init b6a63b30 Current b6a630cc Base b6a64000 Limit b6a60000 Call b6a63b34
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
b6a630e4 80833465 nt!KiSwapContext+0×26
b6a63110 80829a62 nt!KiSwapThread+0×2e5
b6a63158 b5dbc727 nt!KeWaitForSingleObject+0×346
b6a6318c b5dbd020 mrxdav!RxConstructSrvCall+0×187
b6a63200 b5dbd144 mrxdav!RxFindOrCreateConnections+0×58a
b6a6324c b5dbd552 mrxdav!RxConstructVirtualNetRoot+0×38
b6a632b8 b5db27fc mrxdav!RxFindOrConstructVirtualNetRoot+0×220
b6a63304 b5da7ff1 mrxdav!RxPrefixClaim+0×160
b6a6331c b5d93f72 mrxdav!RxCommonDevFCBIoCtl+0×71
b6a633ac b5da780c mrxdav!RxFsdCommonDispatch+0×320
b6a633cc b5da0fc4 mrxdav!RxFsdDispatch+0xd4
b6a63440 8081df65 mrxdav!MRxDAVFsdDispatch+0×1f0
b6a63454 bace232e nt!IofCallDriver+0×45
b6a63460 bace24f6 Mup!MupiIssueQueryRequest+0×18
b6a63488 bace2414 Mup!MupiResolvePrefix+0×11b
b6a634d4 bacdf4ed Mup!CreateRedirectedFile+0×385
b6a63530 8081df65 Mup!MupCreate+0×1cb
b6a63544 808f8f71 nt!IofCallDriver+0×45
b6a6362c 80937942 nt!IopParseDevice+0xa35
b6a636ac 80933a76 nt!ObpLookupObjectName+0×5b0
b6a63700 808ec76b nt!ObOpenObjectByName+0xea
b6a63884 8088978c nt!NtQueryAttributesFile+0×11d
b6a63884 7c8285ec nt!KiFastCallEntry+0xfc (TrapFrame @ b6a63894)

Looking at IRP we see that I/O request was redirected to the following file share:

0: kd> !irp 8beecd08
Irp is active with 1 stacks 1 is current (= 0x8beecd78)
 No Mdl: No System Buffer: Thread 8cf22db0:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
>[  0, 0]   0  0 8cfdcee8 b6a637f0 00000000-00000000   
        \FileSystem\Mup
   Args: b6a63570 01200000 00070000 00000000

0: kd> !fileobj b6a637f0

\SERVER-A\f$\AppFiles\Component.dll

Device Object: 0x8cfdcee8   \FileSystem\Mup
Vpb is NULL

Flags:  0x0

CurrentByteOffset: 0

When looking at other processes we see 2 threads blocked in svchost.exe in the same redirector component:

0: kd> !stacks
Proc.Thread  .Thread  Ticks   ThreadState Blocker
[...]
                            [8bb2a728 svchost.exe]
 3d0.0003d4  8bb2a290 0000701 Blocked    nt!NtReadFile+0x5d5
 3d0.0003d8  8bb2aba0 00006ff Blocked    nt!ObpWaitForMultipleObjects+0x202
 3d0.0003dc  8bb338c0 00005eb Blocked    nt!ObpWaitForMultipleObjects+0x202
 3d0.0003e8  8bb2cdb0 0000892 Blocked    nt!ObpWaitForMultipleObjects+0x202
 3d0.0003ec  8bb2c838 0000892 Blocked    nt!KiFastCallEntry+0xfc
 3d0.0003f0  8bb26cd0 0000892 Blocked    nt!KiFastCallEntry+0xfc
 3d0.000488  8ceffb88 00006fb Blocked    mrxdav!UMRxAssignWork+0xdf
 3d0.00048c  8cec3db0 00005d0 Blocked    mrxdav!UMRxAssignWork+0xdf

 3d0.000490  8cec3b40 00006fb Blocked    nt!KiFastCallEntry+0xfc
 3d0.0004d4  8bb02918 00005c7 Blocked    nt!KiFastCallEntry+0xfc
 3d0.0004d8  8bafdb40 00005c4 Blocked    nt!KiFastCallEntry+0xfc
[…]

In these two threads we see pass through functions in driverA and driverB:

0: kd> !thread 8ceffb88
THREAD 8ceffb88  Cid 03d0.0488  Teb: 7ffd8000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8cef1f28  QueueObject
Not impersonating
DeviceMap                 d6e2c298
Owning Process            8bb2a728       Image:         svchost.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      1682           Ticks: 1787 (0:00:00:27.921)
Context Switch Count      2            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x5a005e80
Start Address 0x77e617ec
Stack Init b5ed8000 Current b5ed7a64 Base b5ed8000 Limit b5ed5000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr
b5ed7a7c 80833465 nt!KiSwapContext+0x26
b5ed7aa8 8082b60f nt!KiSwapThread+0x2e5
b5ed7af0 b5da714d nt!KeRemoveQueue+0x417
b5ed7b10 b5d9d7db mrxdav!UMRxAssignWork+0xdf
b5ed7b70 bae71bc5 mrxdav!MRxDAVFastIoDeviceControl+0x13d
b5ed7ba8 bae7f42d fltmgr!FltpPerformFastIoCall+0x153
b5ed7bf4 b5f57300 fltmgr!FltpFastIoDeviceControl+0xb1
b5ed7c20 b6e20f8f driverB!FS_FastIoDeviceControl+0×60
b5ed7c5c 808f5e2f driverA!FastIoDeviceControl+0xcf

b5ed7d00 808eed08 nt!IopXxxControlFile+0×255
b5ed7d34 8088978c nt!NtDeviceIoControlFile+0×2a
b5ed7d34 7c8285ec nt!KiFastCallEntry+0xfc (TrapFrame @ b5ed7d64)

- Dmitry Vostokov @ DumpAnalysis.org -

A Copernican Revolution in Debugging

Thursday, April 16th, 2009

A number of Copernican revolutions occurred or announced in various branches of various sciences. Now it’s my turn to say that action-based ”earth-centric” debugging is replaced by memory (dump) analysis as a “heliocentric” foundation of debugging. Because even in live debugging we have memory snapshots and differential memory analysis. Traces in trace-based debugging is another example of universal memory dumps. Therefore memory (dump) analysis comes first.

- Dmitry Vostokov @ DumpAnalysis.org -

NULL Data Pointer Pattern: case study

Wednesday, April 15th, 2009

Here is the promised case study for the previous post about data NULL pointers. The complete dump has this bugcheck:

0: kd> !analyze -v

[...]

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck.  Usually the exception address pinpoints the driver/function that caused the problem.  Always note this address as well as the link date of the driver/image that contains this address. Some common problems are exception code 0x80000003.  This means a hard coded breakpoint or assertion was hit, but this system was booted /NODEBUG.  This is not supposed to happen as developers should never have hardcoded breakpoints in retail code, but ... If this happens, make sure a debugger gets connected, and the system is booted /DEBUG.  This will let us see why this breakpoint is happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8081c7c4, The address that the exception occurred at
Arg3: f1b5d730, Trap Frame
Arg4: 00000000

[...]

FAULTING_IP:
nt!IoIsOperationSynchronous+e
8081c7c4 f6412c02        test    byte ptr [ecx+2Ch],2

TRAP_FRAME:  f1b5d730 -- (.trap 0xfffffffff1b5d730)
[...]

0: kd> .trap 0xfffffffff1b5d730
ErrCode = 00000000
eax=8923b008 ebx=00000000 ecx=00000000 edx=8923b008 esi=891312d0 edi=89f0b300
eip=8081c7c4 esp=f1b5d7a4 ebp=f1b5d7a4 iopl=0 nv up ei ng nz ac pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010296
nt!IoIsOperationSynchronous+0xe:
8081c7c4 f6412c02   test    byte ptr [ecx+2Ch],2  ds:0023:0000002c=??

0: kd> kv 100
ChildEBP RetAddr  Args to Child             
f1b5d7a4 f42cdea9 8923b008 89f0b300 8923b008 nt!IoIsOperationSynchronous+0xe
f1b5d7bc 8081df85 89f0b300 8923b008 00000200 driveB!FsdDeviceIoControlFile+0×19
f1b5d7d0 808ed7a9 00000000 f1b5da84 f1b5db6c nt!IofCallDriver+0×45
f1b5da20 f3c3a521 89f0b300 f1b5da84 f1b5da84 nt!IoVolumeDeviceToDosName+0×89
WARNING: Stack unwind information not available. Following frames may be wrong.
f1b5da3c f3c3b58e 00000618 e4e00420 f1b5dad4 driverA+0×18531
[…]
f1b5dc3c 8081df85 89f48b48 87fa3008 89140d30 driverA+0×1df4

f1b5dc50 808f5437 87fa3078 89140d30 87fa3008 nt!IofCallDriver+0×45
f1b5dc64 808f61bf 89f48b48 87fa3008 89140d30 nt!IopSynchronousServiceTail+0×10b
f1b5dd00 808eed08 000000f0 00000000 00000000 nt!IopXxxControlFile+0×5e5
f1b5dd34 808897bc 000000f0 00000000 00000000 nt!NtDeviceIoControlFile+0×2a
f1b5dd34 7c8285ec 000000f0 00000000 00000000 nt!KiFastCallEntry+0xfc (TrapFrame @ f1b5dd64)
0856e154 7c826fcb 77e416f5 000000f0 00000000 ntdll!KiFastSystemCallRet
0856e158 77e416f5 000000f0 00000000 00000000 ntdll!NtDeviceIoControlFile+0xc
0856e1bc 6f050c6c 000000f0 5665824c 0856e234 kernel32!DeviceIoControl+0×137
[…]

From WDK help we know that the first parameter to IoIsOperationSynchronous is a pointer to an IRP structure:

0: kd> !irp 8923b008
Irp is active with 3 stacks 3 is current (= 0x8923b0c0)
 No Mdl: System buffer=878b7288: Thread 8758a020:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
                     Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
                     Args: 00000000 00000000 00000000 00000000
>[  e, 0]   0  0 89f0b300 00000000 00000000-00000000   
              \FileSystem\DriverB
                     Args: 00000200 00000000 004d0008 00000000

Disassembling the function shows some pointer dereferencing and we can reconstruct it starting from EBP+8, a pointer to an IRP. 

0: kd> .asm no_code_bytes
Assembly options: no_code_bytes

0: kd> u nt!IoIsOperationSynchronous nt!IoIsOperationSynchronous+0xe
nt!IoIsOperationSynchronous:
8081c7b6 mov     edi,edi
8081c7b8 push    ebp
8081c7b9 mov     ebp,esp
8081c7bb mov     eax,dword ptr [ebp+8]
8081c7be mov     ecx,dword ptr [eax+60h]
8081c7c1 mov     ecx,dword ptr [ecx+18h]

EAX+60 seems to be a current stack location member of IRP and it is a pointer itself to _IO_STACK_LOCATION structure:

0: kd> dt -r _IRP 8923b008
ntdll!_IRP
   +0x000 Type             : 6
   +0x002 Size             : 0x268
   +0x004 MdlAddress       : (null)
   +0x008 Flags            : 0x70
[...]
   +0x038 CancelRoutine    : (null)
   +0x03c UserBuffer       : 0xf1b5d814
   +0×040 Tail             : __unnamed
      +0×000 Overlay          : __unnamed
         +0×000 DeviceQueueEntry : _KDEVICE_QUEUE_ENTRY
         +0×000 DriverContext    : [4] (null)
         +0×010 Thread           : 0×8758a020 _ETHREAD
         +0×014 AuxiliaryBuffer  : (null)
         +0×018 ListEntry        : _LIST_ENTRY [ 0×0 - 0×0 ]
         +0×020 CurrentStackLocation : 0×8923b0c0 _IO_STACK_LOCATION
[…]

ECX+18 is a pointer to a file object in _IO_STACK_LOCATION structure:

0: kd> dt _IO_STACK_LOCATION 8923b008+60
ntdll!_IO_STACK_LOCATION
   +0x000 MajorFunction    : 0xc0 ''
   +0x001 MinorFunction    : 0xb0 ''
   +0x002 Flags            : 0x23 '#'
   +0x003 Control          : 0x89 ''
   +0x004 Parameters       : __unnamed
   +0x014 DeviceObject     : (null)
   +0×018 FileObject       : (null)
   +0×01c CompletionRoutine : (null)
   +0×020 Context          : (null)

2C offset at the crash point test byte ptr [ecx+2Ch],2 is _FILE_OBJECT Flags member:

0: kd> dt _FILE_OBJECT
ntdll!_FILE_OBJECT
   +0x000 Type             : Int2B
   +0x002 Size             : Int2B
   +0x004 DeviceObject     : Ptr32 _DEVICE_OBJECT
   +0x008 Vpb              : Ptr32 _VPB
   +0x00c FsContext        : Ptr32 Void
   +0x010 FsContext2       : Ptr32 Void
   +0x014 SectionObjectPointer : Ptr32 _SECTION_OBJECT_POINTERS
   +0x018 PrivateCacheMap  : Ptr32 Void
   +0x01c FinalStatus      : Int4B
   +0x020 RelatedFileObject : Ptr32 _FILE_OBJECT
   +0x024 LockOperation    : UChar
   +0x025 DeletePending    : UChar
   +0x026 ReadAccess       : UChar
   +0x027 WriteAccess      : UChar
   +0x028 DeleteAccess     : UChar
   +0x029 SharedRead       : UChar
   +0x02a SharedWrite      : UChar
   +0x02b SharedDelete     : UChar
   +0×02c Flags            : Uint4B
   +0×030 FileName         : _UNICODE_STRING
   +0×038 CurrentByteOffset : _LARGE_INTEGER
   +0×040 Waiters          : Uint4B
   +0×044 Busy             : Uint4B
   +0×048 LastLock         : Ptr32 Void
   +0×04c Lock             : _KEVENT
   +0×05c Event            : _KEVENT
   +0×06c CompletionContext : Ptr32 _IO_COMPLETION_CONTEXT

So it looks like driverA passed an IRP with NULL File object address to driverB and this is also shown in the output of !irp command above.

- Dmitry Vostokov @ DumpAnalysis.org -