Archive for the ‘Debugging’ Category

Raw Stack Dump of all threads (part 3)

Monday, May 11th, 2009

Sometimes the script featured in part 1 doesn’t work because symbols are missing or for some other reason:

***                                                                  
***                                                                  
*** Your debugger is not using the correct symbols    
*** 
*** In order for this command to work properly, your symbol path
*** must point to .pdb files that have full type information.
*** 
*** Certain .pdb files (such as the public OS symbols) do not
*** contain the required information.  Contact the group that
*** provided you with these symbols if you need this command to
*** work.
*** 
*** Type referenced: ntdll!_NT_TIB
***
Couldn't resolve error at 'ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2'
                    ^ Extra character error in '~*e r? $t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? $t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; !teb; dps @$t1 @$t2'

This is a case where the !teb WinDbg command doesn’t work, and we can cope with it as shown in the following post:

Coping with missing symbolic information

Therefore we can adjust our user-mode script to use hard-coded offsets (StackBase at TEB+4 and StackLimit at TEB+8 on 32-bit Windows) and delineate the raw stack outputs with the output of the kv WinDbg command:

~*e r? $t0 = @$teb; r? $t1 = @$t0+8; r? $t2 = @$t0+4; kv 100; dps poi(@$t1) poi(@$t2)
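The hard-coded offsets come from the 32-bit _NT_TIB layout at the start of the TEB, which follows the order of the first fields printed by !teb. Here is a minimal Python sketch of that layout, assuming a 32-bit target (the NT_TIB32 class name is only illustrative; on x64 the fields are pointer-sized, so the offsets differ):

```python
import ctypes

class NT_TIB32(ctypes.Structure):
    """Illustrative mirror of the 32-bit _NT_TIB header at the start of a TEB."""
    _fields_ = [
        ("ExceptionList", ctypes.c_uint32),
        ("StackBase", ctypes.c_uint32),
        ("StackLimit", ctypes.c_uint32),
        ("SubSystemTib", ctypes.c_uint32),
        ("FiberData", ctypes.c_uint32),
        ("ArbitraryUserPointer", ctypes.c_uint32),
        ("Self", ctypes.c_uint32),
    ]

# Offsets used by the script: poi(teb+8) is StackLimit, poi(teb+4) is StackBase.
stack_base_offset = NT_TIB32.StackBase.offset
stack_limit_offset = NT_TIB32.StackLimit.offset
print(hex(stack_base_offset), hex(stack_limit_offset))
```

This is why the script passes poi(@$t1) (StackLimit, the lower address) as the start of the dps range and poi(@$t2) (StackBase) as its end.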

- Dmitry Vostokov @ DumpAnalysis.org -

Programming Language Pragmatics (3rd Edition)

Friday, May 8th, 2009

As soon as I wrote my review of the 2nd edition I found out that the 3rd edition was recently published and immediately bought it. I intend to read it from cover to cover again and publish my notes and comments in my reading notebook on Software Generalist blog. The new edition is also bundled with a companion CD.

Programming Language Pragmatics, Third Edition

Buy from Amazon

I hope that in one of the subsequent editions the author includes my Riemann Programming Language :-)

- Dmitry Vostokov @ DumpAnalysis.org -

A Windows case for delta debugging

Thursday, May 7th, 2009

My local browser crashed today when I copied and pasted RTF text into an HTML editor window. The dump was not saved because I had previously set up logging as described here (my script doesn’t include .dump commands):

All at once: postmortem logs and dump files

Looking at the stack trace, I noticed that the crash happened during HTML processing (call arguments are removed for visual clarity):

STACK_TEXT: 
0476de3c 6970d597 html!FPseudoStyleBis+0x26
0476de48 69703b0e html!BisFromLpxszStyle+0x1c
0476de60 69702ba9 html!LwMultDivRU+0x4b6
0476dea4 6970518a html!FMarkListCallback+0x56c
0476deb4 697068b7 html!JcCalcFromXaExtents+0x91
0476df60 697070c5 html!EmitNonBreakingSpace+0x445
0476e08c 697107ff html!FEmitHtmlFnOtag+0x17d
0476e0b0 696ec6a8 html!ConvertRtfToForeign+0x105
0476e538 696ec745 html!FceRtfToForeign+0x266
0476e560 6b7e5ad4 html!RtfToForeign32+0x51
0476e9a8 6b7e5c83 mshtmled!CRtfToHtmlConverter::ExternalRtfToInternalHtml+0x163
0476edfc 6b79cc15 mshtmled!CRtfToHtmlConverter::StringRtfToStringHtml+0x11a
0476ee18 6b79cd81 mshtmled!CRtfToHtmlConverter::StringRtfToStringHtml+0x38
0476ee2c 6b7cdcea mshtmled!CHTMLEditor::ConvertRTFToHTML+0x12
0476ee98 6b7ce392 mshtmled!CPasteCommand::PasteFromClipboard+0x2c0
0476ef08 6b78d218 mshtmled!CPasteCommand::PrivateExec+0x47a
0476ef2c 6b78d1ad mshtmled!CCommand::Exec+0x4b
0476ef50 6b470d14 mshtmled!CMshtmlEd::Exec+0xf9
0476ef80 6b4688a8 mshtml!CEditRouter::ExecEditCommand+0xd6
0476f328 6b5ceccf mshtml!CDoc::ExecHelper+0x338d
0476f374 6b468a2f mshtml!CFrameSite::Exec+0x264
0476f3a8 6b4687af mshtml!CDoc::RouteCTElement+0xf1
0476f740 6b468586 mshtml!CDoc::ExecHelper+0x325e
0476f760 6b510e7b mshtml!CDoc::Exec+0x1e
0476f798 6b48a708 mshtml!CDoc::OnCommand+0x9c
0476f8ac 6b3997e1 mshtml!CDoc::OnWindowMessage+0x841
0476f8d8 766ff8d2 mshtml!CServer::WndProc+0x78
0476f904 766ff794 USER32!InternalCallWinProc+0x23
0476f97c 767006f6 USER32!UserCallWinProcCheckWow+0x14b
0476f9ac 7670069c USER32!CallWindowProcAorW+0x97
0476f9cc 6ce1851b USER32!CallWindowProcW+0x1b
WARNING: Stack unwind information not available. Following frames may be wrong.
0476fa40 6ce0cdc6 GoogleToolbarDynamic_6D0D6FD66D664927!DllGetClassObject+0x24981
0476fa64 6ce9beaa GoogleToolbarDynamic_6D0D6FD66D664927!DllGetClassObject+0x1922c
0476fa94 766ff8d2 GoogleToolbarDynamic_6D0D6FD66D664927!DllGetClassObject+0xa8310
0476fac0 766ff794 USER32!InternalCallWinProc+0x23
0476fb38 76700a05 USER32!UserCallWinProcCheckWow+0x14b
0476fb78 76700afa USER32!SendMessageWorker+0x4b7
0476fb98 6b47fb9b USER32!SendMessageW+0x7c
0476fbc4 6b3d8e5a mshtml!CElement::PerformTA+0x71
0476fbe4 6b3d8db9 mshtml!CDoc::PerformTA+0xd8
0476fc60 6b46381c mshtml!CDoc::PumpMessage+0x8e0
0476fd14 6b463684 mshtml!CDoc::DoTranslateAccelerator+0x33f
0476fd30 6b4634cc mshtml!CServer::TranslateAcceleratorW+0x56
0476fd50 70c9f550 mshtml!CDoc::TranslateAcceleratorW+0x83
0476fd6c 70c9f600 IEFRAME!CProxyActiveObject::TranslateAcceleratorW+0x30
0476fd90 70c9fca1 IEFRAME!CDocObjectView::TranslateAcceleratorW+0xb1
0476fdb0 70c9faf4 IEFRAME!CCommonBrowser::v_MayTranslateAccelerator+0xda
0476fddc 70c9f7b0 IEFRAME!CShellBrowser2::_MayTranslateAccelerator+0x68
0476fdec 70c9f7f5 IEFRAME!CShellBrowser2::v_MayTranslateAccelerator+0x15
0476fe58 76894911 IEFRAME!CTabWindow::_TabWindowThreadProc+0x264
0476fe64 776ae4b6 kernel32!BaseThreadInitThunk+0xe
0476fea4 776ae489 ntdll!__RtlUserThreadStart+0x23
0476febc 00000000 ntdll!_RtlUserThreadStart+0x1b

I immediately recalled that in Andreas Zeller’s book Why Programs Fail a browser parsing HTML was used as an example to show delta debugging (I think it was Mozilla).
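For readers who have not seen delta debugging, the idea is to shrink a failing input automatically while it keeps failing. The following Python sketch is a simplified variant of the ddmin idea, not code from the book; the `fails` predicate is a hypothetical stand-in for "paste this RTF fragment and check whether the browser crashes":

```python
def ddmin(data, fails):
    """Shrink the list `data` while fails(data) stays True (simplified ddmin)."""
    assert fails(data), "the starting input must reproduce the failure"
    n = 2  # current number of chunks
    while len(data) >= 2:
        chunk = max(len(data) // n, 1)
        subsets = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if fails(subset):
                # One chunk alone reproduces the failure: keep only that chunk.
                data, n, reduced = subset, 2, True
                break
            if fails(complement):
                # The failure survives without this chunk: drop the chunk.
                data, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(data):
                break  # already at single-element granularity
            n = min(n * 2, len(data))
    return data


if __name__ == "__main__":
    # Toy failure condition: the "crash" needs elements 3 and 6 together.
    print(ddmin(list(range(8)), lambda d: 3 in d and 6 in d))
```

In this crash the list elements could be RTF tokens or lines from the clipboard, and the predicate would drive the paste operation in a test browser instance.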

The complete log file can be downloaded from here.

- Dmitry Vostokov @ DumpAnalysis.org -

Trace Analysis Patterns (Part 2)

Thursday, May 7th, 2009

A typical trace is a detailed narrative. It is accompanied by a problem description that lists essential facts. Therefore the first task of any trace analysis is to check the presence of Basic Facts in the trace. If they are not visible or do not correspond then the trace was possibly not recorded during the problem or was taken from a different computer or under different conditions. Here is an example. A user “test01” cannot connect to an application. We look at the trace and find this statement:

No   PID  TID  Date      Time         Statement
[...]
3903 3648 5436 4/29/2009 16:17:36.150 User Name: test01
[...]

At least we can be sure that this trace was taken for the user test01, especially when we expect this or similar trace statements. If we cannot see this trace statement, we can suppose that the trace was taken at the wrong time, for example, after the problem had already happened.
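The Basic Facts check can be automated for text traces. Here is a minimal Python sketch, assuming the six-column text layout shown above (No, PID, TID, Date, Time, Statement); the function name missing_basic_facts is hypothetical:

```python
TRACE = """\
No   PID  TID  Date      Time         Statement
3903 3648 5436 4/29/2009 16:17:36.150 User Name: test01
"""

def missing_basic_facts(trace_text, facts):
    """Return the facts that do not appear in any trace statement."""
    statements = []
    for line in trace_text.splitlines():
        parts = line.split(None, 5)  # the first five columns, then the statement
        if len(parts) == 6 and parts[0].isdigit():
            statements.append(parts[5])
    return [fact for fact in facts if not any(fact in s for s in statements)]

if __name__ == "__main__":
    print(missing_basic_facts(TRACE, ["User Name: test01"]))
```

A non-empty result suggests that the trace was recorded at the wrong time, on a different computer, or under different conditions.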

- Dmitry Vostokov @ TraceAnalysis.org -

Bugtation No.91

Monday, May 4th, 2009

On universal memory dumps:

“[…] the first man who noticed the analogy between a” dump “and” an observation “made a notable advance in the history of thought.”

Alfred North Whitehead, Science and the Modern World

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.90

Saturday, May 2nd, 2009

“The first rule of” debugging “is to have brains and good luck. The second rule of” debugging “is to sit tight and wait till you” hit “a” breakpoint.

George Pólya, How to Solve It

- Dmitry Vostokov @ DumpAnalysis.org -

Heap corruption, module variety, execution residue, coincidental symbolic information and critical section corruption: pattern cooperation

Friday, May 1st, 2009

This is a synthesized dump analysis of many similar print spooler crashes in multi-user terminal service environments that use old printer drivers tested only in single-user environments or insufficiently tested in multi-threaded ones. Many such crashes result from dynamic memory corruption of a process heap:

(40dc.4278): Access violation - code c0000005 (!!! second chance !!!)
eax=00000079 ebx=00000001 ecx=0008bff8 edx=00107898 esi=07d7522a edi=00107890
eip=7c8199b2 esp=0155fc14 ebp=0155fc44 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
ntdll!RtlpLowFragHeapFree+0x30:
7c8199b2 8b4604          mov     eax,dword ptr [esi+4] ds:0023:07d7522e=????????

0:017> kL
ChildEBP RetAddr 
0155fc44 7c819770 ntdll!RtlpLowFragHeapFree+0x30
0155fd1c 77c87a2b ntdll!RtlFreeHeap+0x5c
0155fd30 77c87a02 RPCRT4!FreeWrapper+0x1e
0155fd3c 77c821c2 RPCRT4!operator delete+0xd
0155fd50 77c8047b RPCRT4!LRPC_SCALL::FreeBuffer+0x77
0155fd9c 77c80353 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x192
0155fdc0 77c811dc RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
0155fdfc 77c812f0 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c
0155fe20 77c88678 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127
0155ff84 77c88792 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430
0155ff8c 77c8872d RPCRT4!RecvLotsaCallsWrapper+0xd
0155ffac 77c7b110 RPCRT4!BaseCachedThreadRoutine+0x9d
0155ffb8 77e64829 RPCRT4!ThreadStartRoutine+0x1b
0155ffec 00000000 kernel32!BaseThreadStart+0x34

Although any module could have corrupted the heap, and enabling full page heap with either Gflags or Application Verifier is recommended, sometimes we need to point to some print drivers so that they can be eliminated or upgraded in the meantime. When there are many of them we can point to the oldest one:

0:017> lmt
start    end        module name
[...]
010d0000 010d8000   PrintDriver1  2007
01260000 01272000   PrintDriver2  1999
01290000 012da000   PrintDriver3  2009
012f0000 01302000   PrintDriver4  2003
01310000 01320000   PrintDriver5  2004
01320000 01332000   PrintDriver6  2004
01340000 01353000   PrintDriver7  2005
01360000 0139e000   PrintDriver8  2007
013b0000 013c3000   PrintDriver9  2004
013d0000 013e0000   PrintDriver10 2005
013e0000 013f3000   PrintDriver11 2005
01400000 01413000   PrintDriver12 2006
01420000 0146b000   PrintDriver13 2007
01480000 01488000   PrintDriver14 2003
017f0000 0181e000   PrintDriver15 2004
01920000 0192d000   PrintDriver16 2008
01930000 01936000   PrintDriver17 2008
01950000 01959000   PrintDriver18 2008
01960000 01969000   PrintDriver19 2008
01f80000 021e8000   PrintDriver20 2004
032f0000 03514000   PrintDriver21 2003
03cd0000 03cd6000   PrintDriver22 2008
32100000 32148000   PrintDriver23 2008
3ea40000 3ea46000   PrintDriver24 2007
3f000000 3f03d000   PrintDriver25 2009
3f100000 3f133000   PrintDriver26 2009
[…]
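Picking the oldest module from a long listing is easy to script. The Python sketch below assumes the abbreviated four-column form shown above (start, end, module name, year); real lmt output carries a full timestamp, so the parsing would need adjusting. The sample is a subset of the listing and modules_by_age is a hypothetical name:

```python
LMT_OUTPUT = """\
start    end        module name
010d0000 010d8000   PrintDriver1  2007
01260000 01272000   PrintDriver2  1999
01290000 012da000   PrintDriver3  2009
012f0000 01302000   PrintDriver4  2003
"""

def modules_by_age(listing):
    """Return (year, module name) pairs sorted from oldest to newest."""
    modules = []
    for line in listing.splitlines():
        parts = line.split()
        # Keep only rows shaped like: start end name year
        if len(parts) == 4 and parts[3].isdigit():
            _start, _end, name, year = parts
            modules.append((int(year), name))
    return sorted(modules)

if __name__ == "__main__":
    print(modules_by_age(LMT_OUTPUT)[0])
```

The same sorted list can feed an age distribution chart.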

The age distribution among 121 modules can be visualized on a CAD diagram:

[Diagram: age distribution of the modules]

Alternatively we can look at the execution residue on a raw thread stack:

0:017> !teb
TEB at 7ffa9000
    ExceptionList:        0155fd0c
    StackBase:            01560000
    StackLimit:           01550000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ffa9000
    EnvironmentPointer:   00000000
    ClientId:             000040dc . 00004278
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffd8000
    LastErrorValue:       0
    LastStatusValue:      8000001a
    Count Owned Locks:    0
    HardErrorMode:        0

0:017> dds 01550000 01560000
01550000  00000000
01550004  00000000
[...]
01554e78  00000000
01554e7c  00000000
01554e80  01554ecc
01554e84  7c82d1bb ntdll!RtlFindActivationContextSectionString+0xe1
01554e88  01554ea4
01554e8c  01554efc
01554e90  00000000
01554e94  020a0018 PrintDriver20!Callback+0x5c88
01554e98  7ffa9c00
01554e9c  00000000
01554ea0  01554ed4
01554ea4  7c82dd6c ntdll!RtlEncodeSystemPointer+0x45b
01554ea8  00020000
01554eac  01554ec8
01554eb0  01554ec8
01554eb4  01554ec8
[…]
01555eb4  0040003e
01555eb8  01556854
01555ebc  00000000
01555ec0  0000003e
01555ec4  0208003e PrintDriver20!GetValue+0xb37fe
01555ec8  00000000
01555ecc  43000000
01555ed0  0000003e
01555ed4  01555ffa
01555ed8  01555fbc
01555edc  01555fa8
01555ee0  001190d8
01555ee4  7c81990d ntdll!RtlpLowFragHeapAlloc+0x210
01555ee8  7c819962 ntdll!RtlpLowFragHeapAlloc+0xc6a
01555eec  0008bff8
01555ef0  00000000
01555ef4  00080000
[…]
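Scanning a 64 KB raw stack by eye is tedious, so the symbolic entries can be summarized per module first. Here is a minimal Python sketch, assuming dds output saved as text in the 32-bit format shown above; the sample repeats the symbolic lines from the listing plus one plain value, and residue_by_module is a hypothetical helper name:

```python
import re
from collections import Counter

DDS_OUTPUT = """\
01554e80  01554ecc
01554e84  7c82d1bb ntdll!RtlFindActivationContextSectionString+0xe1
01554e94  020a0018 PrintDriver20!Callback+0x5c88
01554ea4  7c82dd6c ntdll!RtlEncodeSystemPointer+0x45b
01555ec4  0208003e PrintDriver20!GetValue+0xb37fe
01555ee4  7c81990d ntdll!RtlpLowFragHeapAlloc+0x210
01555ee8  7c819962 ntdll!RtlpLowFragHeapAlloc+0xc6a
"""

# address, value, then module!symbol+offset
SYMBOLIC = re.compile(r"^([0-9a-f]{8})\s+([0-9a-f]{8})\s+(\w+)!(\S+)$")

def residue_by_module(dds_text):
    """Count stack slots whose value resolves to a symbol, per module."""
    hits = Counter()
    for line in dds_text.splitlines():
        match = SYMBOLIC.match(line.strip())
        if match:
            hits[match.group(3)] += 1
    return hits

if __name__ == "__main__":
    for module, count in residue_by_module(DDS_OUTPUT).most_common():
        print(module, count)
```

A count only says where to look: each hit still has to be checked for coincidental symbolic information.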

The first address, 020a0018, seems to be coincidental because the code around it does not disassemble into anything sensible:

0:017> ub 020a0018
                 ^ Unable to find valid previous instruction for 'ub 020a0018'

0:017> u 020a0018
PrintDriver20!Callback+0x5c88:
020a0018 048b            add     al,8Bh
020a001a c7              ???
020a001b ebe8            jmp     PrintDriver20!Callback+0x5c75 (020a0005)
020a001d 8d4e28          lea     ecx,[esi+28h]
020a0020 e8d9960100      call    PrintDriver20!DlgProc+0x86ee (020b96fe)
020a0025 8d4e28          lea     ecx,[esi+28h]
020a0028 e87b930100      call    PrintDriver20!DlgProc+0x8398 (020b93a8)
020a002d 8b4618          mov     eax,dword ptr [esi+18h]

However, the code around the second address, 0208003e, seems sound: cmp is followed by jne:

0:017> ub 0208003e
PrintDriver20!GetValue+0xb37e5:
02080025 8d442414        lea     eax,[esp+14h]
02080029 8b4b2c          mov     ecx,dword ptr [ebx+2Ch]
0208002c 50              push    eax
0208002d e8ce3a0000      call    PrintDriver20!GetValue+0xb72c0 (02083b00)
02080032 8b38            mov     edi,dword ptr [eax]
02080034 8b4c2410        mov     ecx,dword ptr [esp+10h]
02080038 8b41fc          mov     eax,dword ptr [ecx-4]
0208003b 3947fc          cmp     dword ptr [edi-4],eax

0:017> u 0208003e
PrintDriver20!GetValue+0xb37fe:
0208003e 751b            jne     PrintDriver20!GetValue+0xb381b (0208005b)
02080040 8bc8            mov     ecx,eax
02080042 8b742410        mov     esi,dword ptr [esp+10h]
02080046 c1e902          shr     ecx,2
02080049 f3a7            repe cmps dword ptr [esi],dword ptr es:[edi]
0208004b 750e            jne     PrintDriver20!GetValue+0xb381b (0208005b)
0208004d 8bc8            mov     ecx,eax
0208004f 83e103          and     ecx,3

But the instruction before 0208003e is not a function call that would have saved it as a return address, so we can still consider it a coincidence. However, on the raw stack we also see a large chunk of ASCII data pointing to the same driver in textual form:

[...]
0155d360  6f742064
0155d364  696e6920
0155d368  6c616974
0155d36c  20657a69
0155d370  61636562
0155d374  20657375
0155d378  75732061
0155d37c  62617469
[...]

0:017> da 0155d360 0155d734
0155d360  "d to initialize because a suitab"
0155d380  "le PrinterDriver20 inf file was "
[…]
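The correspondence between the double words and the da output follows from x86 little-endian byte order: the lowest byte of each value is the first character. A small Python sketch using the eight values shown above (dwords_to_ascii is a hypothetical helper name):

```python
import struct

# The eight double words at 0155d360..0155d37c from the raw stack.
DWORDS = [0x6f742064, 0x696e6920, 0x6c616974, 0x20657a69,
          0x61636562, 0x20657375, 0x75732061, 0x62617469]

def dwords_to_ascii(values):
    """Lay the values out as little-endian bytes and show printable ASCII."""
    raw = b"".join(struct.pack("<I", value) for value in values)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

text = dwords_to_ascii(DWORDS)
print(text)
```

The result is the first 32 characters printed by da at 0155d360.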

This reinforces our belief that PrintDriver20 is involved. Finally, when looking at the critical section list we see signs of corruption pointing to addresses in the same driver:

0:017> !cs -l -o -s
DebugInfo          = 0x0014bc60
Critical section   = 0x020f7140 (PrintDriver20!DlgProc+0x46130)
LOCKED
LockCount          = 0xFF85EA7F
WaiterWoken        = Yes
OwningThread       = 0x8b0c244c
RecursionCount     = 0x8BFFFBB4
LockSemaphore      = 0x83182444
SpinCount          = 0x088908c4

WARNING: critical section DebugInfo = 0x00000008 doesn't point back
to the DebugInfo found in the active critical sections list = 0x0014bc60.
The critical section was probably reused without calling DeleteCriticalSection.

Cannot read structure field value at 0x0000000a, error 0
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.

DebugInfo          = 0x0014bc88
Critical section   = 0x020f7110 (PrintDriver20!DlgProc+0x46100)
LOCKED
LockCount          = 0x1E7245FF
WaiterWoken        = No
OwningThread       = 0xccccc304
RecursionCount     = 0xC483FFFD
LockSemaphore      = 0x158638B9
SpinCount          = 0xff96e902

WARNING: critical section DebugInfo = 0x0f712068 doesn't point back
to the DebugInfo found in the active critical sections list = 0x0014bc88.
The critical section was probably reused without calling DeleteCriticalSection.

Cannot read structure field value at 0x0f71206a, error 0
[...]

- Dmitry Vostokov @ DumpAnalysis.org -

Viewing Problem Artifacts from Different Angles

Tuesday, April 28th, 2009

I often say or write something like this: “I looked at the dump|trace file from different angles”.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.89

Tuesday, April 28th, 2009

On the great divide in modern software factories:

“We have in fact, two kinds of” engineers, “side by side: one that” design, “but do not” code, “and another that” code, “but seldom” design.

Bertrand Russell, Sceptical Essays

- Dmitry Vostokov @ DumpAnalysis.org -

Trace Analysis Patterns (Part 1)

Tuesday, April 28th, 2009

After coming back to engineering I decided to expand the domain of my research and start a new series of posts called Trace Analysis Patterns. In addition to Citrix CDF / Microsoft ETW traces I plan to cover other variants based on my extensive software engineering background, where I used tracing in software products ranging from soft multi-platform real-time systems to static code analysis tools. The connection with memory dump analysis will be covered too because sometimes the combination of static and dynamic data leads to interesting observations and helps to troubleshoot and resolve customer problems, especially when not all data can be collected dynamically.

In fact, stack traces and their collections are specializations of the more general traces. Another example is historical information in memory dump files especially when it is somehow timestamped.  

In this part I start with the obvious and, to some extent, trivial pattern called Periodic Error. This is an error or a status value that is observed repeatedly over the course of a trace:

No     PID  TID   Date      Time         Statement
[...]
664957 1788 22504 4/23/2009 17:59:14.600 MyClass::Initialize: Cannot open connection "Client ID: 310", status=5
[…]
668834 1788 19868 4/23/2009 19:11:52.979 MyClass::Initialize: Cannot open connection "Client ID: 612", status=5
[…]

or 

No     PID  TID   Date      Time         Statement
[...] 
202314 1788 19128 4/21/2009 16:03:46.861 HandleDataLevel: Error 12005 Getting Mask
[…]
347653 1788 17812 4/22/2009 13:26:00.735 HandleDataLevel: Error 12005 Getting Mask
[…]

Here single trace entries can be isolated from the trace and studied in detail. 
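Finding such entries in a large trace can be scripted by grouping statements that differ only in variable parts. The Python sketch below assumes the text layout shown above and treats quoted values as the only variable part; the function name periodic_errors and the masking rule are illustrative assumptions, not part of any tracing tool:

```python
import re
from collections import defaultdict

TRACE = """\
664957 1788 22504 4/23/2009 17:59:14.600 MyClass::Initialize: Cannot open connection "Client ID: 310", status=5
668834 1788 19868 4/23/2009 19:11:52.979 MyClass::Initialize: Cannot open connection "Client ID: 612", status=5
202314 1788 19128 4/21/2009 16:03:46.861 HandleDataLevel: Error 12005 Getting Mask
347653 1788 17812 4/22/2009 13:26:00.735 HandleDataLevel: Error 12005 Getting Mask
"""

def periodic_errors(trace_text, min_count=2):
    """Map each repeated statement to the trace entry numbers where it occurs."""
    groups = defaultdict(list)
    for line in trace_text.splitlines():
        parts = line.split(None, 5)  # No, PID, TID, Date, Time, Statement
        if len(parts) != 6 or not parts[0].isdigit():
            continue
        # Mask quoted values so entries that differ only in them group together.
        key = re.sub(r'"[^"]*"', '"*"', parts[5].strip())
        groups[key].append(int(parts[0]))
    return {key: nums for key, nums in groups.items() if len(nums) >= min_count}

if __name__ == "__main__":
    for statement, numbers in periodic_errors(TRACE).items():
        print(len(numbers), statement, numbers)
```

The entry numbers returned for each group can then be used to isolate single entries for detailed study.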

Be aware, though, that some modules might report periodic errors that are false positives in the sense that they are expected as part of the implementation, for example, when a function returns an error to indicate that a bigger buffer is required or to let the caller estimate its size for a subsequent call. This merits its own pattern name and I will come back to it next time with more examples.

I also created a page where I’ll be adding all tracing patterns:

Trace Analysis Patterns   

- Dmitry Vostokov @ TraceAnalysis.org -

Bugtation No.88

Monday, April 27th, 2009

On the deliberate practice to become a Debugging Expert:

“Of all days, the day on which one has not” debugged “is the one most surely wasted.”

Nicolas Chamfort, Maximes et Pensées

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.87

Sunday, April 26th, 2009

“A” fix “can break a” bug “in two.”

The Talmud

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.86

Saturday, April 25th, 2009

“It is easier to know” programming “in general than to understand one” program “in particular.”

François de La Rochefoucauld, Maxims

- Dmitry Vostokov @ DumpAnalysis.org -

Review of and Notes on The Developer’s Guide to Debugging

Friday, April 24th, 2009

I finally read this book from cover to cover and I must say it is a very sound book that presents a consistent approach to debugging real-life problems with user-land C and C++ code in Linux environments.

The Developer’s Guide to Debugging

Buy from Amazon

Although it mainly uses GDB for illustrations and provides Visual C++ equivalents when possible, it doesn’t cover Debugging Tools for Windows and its main GUI debugger, WinDbg. To rectify this I created extensive notes while reading.

An additional audience for this book might be Windows engineers who need to debug software on Linux or FreeBSD and therefore need a quick GDB crash course. It would also serve as an excellent debugging course or as a supplement to any C or C++ course. Highly recommended if you are a Linux C/C++ software engineer. Even if you are an experienced one, you will find something new or make your debugging more consistent. If you need to teach or mentor juniors, this book helps too.

- Dmitry Vostokov @ DumpAnalysis.org -

On Subjectivity of Software Defects

Wednesday, April 22nd, 2009

If we assume the model-based definition of software defects we can easily see that any change to an underlying model can surface new unanticipated defects and hide known ones. New and evolving disciplines like software security engineering can change our views about solid code and create defects by introducing non-functional constraints on models. Another aspect of this is the interaction of a human debugger with code: the very act of reading code can create defects. However, the latter effect is controversial and belongs to the evolving quantum theory of software defects (see my previous post about bugtanglement).

- Dmitry Vostokov @ DumpAnalysis.org -

Pattern-Driven Memory Analysis (Part 2)

Tuesday, April 21st, 2009

Before we explain the stages of the analysis process shown in Part 1, let’s start with a brief overview of memory dumps, debuggers and logs. Recall that a memory dump is a snapshot of a process, system or physical memory state. This unifies post-mortem analysis and live debugging. Debuggers are tools that allow us to get and modify these memory snapshots. Other tools that allow us to get memory dump files are process dumpers like userdump.exe, Task Manager since Vista, and WER, and system dumpers like LiveKd and Win32dd. We should not forget the tools and methods that trigger the Windows kernel’s ability to save consistent memory dump files: the NMI button, the keyboard method and various software bugcheck triggers like Citrix SystemDump. Now, coming back to debuggers: one of their essential features is the ability to save a debugging session log, formatted textual output saved in a text file for further processing. One good example is the !process 0 ff WinDbg command, which outputs all processes and their thread stack traces (see the Stack Trace Collection pattern for other variations).

I’ve created a page to add all P-DMA parts as soon as I write them:

Pattern-Driven Memory Analysis

- Dmitry Vostokov @ DumpAnalysis.org -

Music for Debugging: In the Memory Dump File

Monday, April 20th, 2009

I used to be a fan of Yanni’s music in the late 1990s. Today I started listening again to some of his albums and recommend them for any debugging session. If you are new to this music there is a compilation album that I’m listening to while I’m writing this post:

In the Mirror

Buy from Amazon

Here is my version of track titles inspired by listening (with my comments in parentheses):

1. In the Memory Dump File
2. The Morning Session
3. Love for Debugging
4. A Debugger’s Dream 
5. Within Kernel
6. Forbidden Access
7. Once Upon a Second Chance
8. Chasing Bugs
9. The Main Thread [Special Debugging Version]  
10. Quiet Memory Analyst 
11. Debugging Joy (My Life is Debugging)
12. So Long My Debugger (My Only Friend on Virtual Memory Plains)
13. Before I Leave the Debugger 
14. End of Session (It wasn’t bad after all)
15. Face in the Memory Dump (after applying Natural Memory Visualization techniques: you can see pictures and various artifacts stored in memory buffers)

- Dmitry Vostokov @ DumpAnalysis.org -

The Debugging Decade!

Monday, April 20th, 2009

DumpAnalysis.org announces the forthcoming decade, 2011 - 2020, as The Debugging Decade.

Q&A

Q. Why 2011 - 2020?

A. The main reason is the fact that 2009 is The Year of Debugging and 2010 is The Year of Dump Analysis. This naturally extends to a decade.

Q. Do you plan The Debugging Century?

A. Yes, I do. Details will be announced later.

- Dmitry Vostokov @ DumpAnalysis.org -

A Copernican Revolution in Debugging

Thursday, April 16th, 2009

A number of Copernican revolutions have occurred or been announced in various branches of science. Now it’s my turn to say that action-based “earth-centric” debugging is replaced by memory (dump) analysis as a “heliocentric” foundation of debugging, because even in live debugging we have memory snapshots and differential memory analysis. Traces in trace-based debugging are another example of universal memory dumps. Therefore memory (dump) analysis comes first.

- Dmitry Vostokov @ DumpAnalysis.org -

NULL Data Pointer Pattern: case study

Wednesday, April 15th, 2009

Here is the promised case study for the previous post about data NULL pointers. The complete dump has this bugcheck:

0: kd> !analyze -v

[...]

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck.  Usually the exception address pinpoints the driver/function that caused the problem.  Always note this address as well as the link date of the driver/image that contains this address. Some common problems are exception code 0x80000003.  This means a hard coded breakpoint or assertion was hit, but this system was booted /NODEBUG.  This is not supposed to happen as developers should never have hardcoded breakpoints in retail code, but ... If this happens, make sure a debugger gets connected, and the system is booted /DEBUG.  This will let us see why this breakpoint is happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8081c7c4, The address that the exception occurred at
Arg3: f1b5d730, Trap Frame
Arg4: 00000000

[...]

FAULTING_IP:
nt!IoIsOperationSynchronous+e
8081c7c4 f6412c02        test    byte ptr [ecx+2Ch],2

TRAP_FRAME:  f1b5d730 -- (.trap 0xfffffffff1b5d730)
[...]

0: kd> .trap 0xfffffffff1b5d730
ErrCode = 00000000
eax=8923b008 ebx=00000000 ecx=00000000 edx=8923b008 esi=891312d0 edi=89f0b300
eip=8081c7c4 esp=f1b5d7a4 ebp=f1b5d7a4 iopl=0 nv up ei ng nz ac pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010296
nt!IoIsOperationSynchronous+0xe:
8081c7c4 f6412c02   test    byte ptr [ecx+2Ch],2  ds:0023:0000002c=??

0: kd> kv 100
ChildEBP RetAddr  Args to Child             
f1b5d7a4 f42cdea9 8923b008 89f0b300 8923b008 nt!IoIsOperationSynchronous+0xe
f1b5d7bc 8081df85 89f0b300 8923b008 00000200 driverB!FsdDeviceIoControlFile+0x19
f1b5d7d0 808ed7a9 00000000 f1b5da84 f1b5db6c nt!IofCallDriver+0x45
f1b5da20 f3c3a521 89f0b300 f1b5da84 f1b5da84 nt!IoVolumeDeviceToDosName+0x89
WARNING: Stack unwind information not available. Following frames may be wrong.
f1b5da3c f3c3b58e 00000618 e4e00420 f1b5dad4 driverA+0x18531
[…]
f1b5dc3c 8081df85 89f48b48 87fa3008 89140d30 driverA+0x1df4
f1b5dc50 808f5437 87fa3078 89140d30 87fa3008 nt!IofCallDriver+0x45
f1b5dc64 808f61bf 89f48b48 87fa3008 89140d30 nt!IopSynchronousServiceTail+0x10b
f1b5dd00 808eed08 000000f0 00000000 00000000 nt!IopXxxControlFile+0x5e5
f1b5dd34 808897bc 000000f0 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
f1b5dd34 7c8285ec 000000f0 00000000 00000000 nt!KiFastCallEntry+0xfc (TrapFrame @ f1b5dd64)
0856e154 7c826fcb 77e416f5 000000f0 00000000 ntdll!KiFastSystemCallRet
0856e158 77e416f5 000000f0 00000000 00000000 ntdll!NtDeviceIoControlFile+0xc
0856e1bc 6f050c6c 000000f0 5665824c 0856e234 kernel32!DeviceIoControl+0x137
[…]

From WDK help we know that the first parameter to IoIsOperationSynchronous is a pointer to an IRP structure:

0: kd> !irp 8923b008
Irp is active with 3 stacks 3 is current (= 0x8923b0c0)
 No Mdl: System buffer=878b7288: Thread 8758a020:  Irp stack trace. 
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
                     Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000   
                     Args: 00000000 00000000 00000000 00000000
>[  e, 0]   0  0 89f0b300 00000000 00000000-00000000   
              \FileSystem\DriverB
                     Args: 00000200 00000000 004d0008 00000000

Disassembling the function shows some pointer dereferencing and we can reconstruct it starting from EBP+8, a pointer to an IRP. 

0: kd> .asm no_code_bytes
Assembly options: no_code_bytes

0: kd> u nt!IoIsOperationSynchronous nt!IoIsOperationSynchronous+0xe
nt!IoIsOperationSynchronous:
8081c7b6 mov     edi,edi
8081c7b8 push    ebp
8081c7b9 mov     ebp,esp
8081c7bb mov     eax,dword ptr [ebp+8]
8081c7be mov     ecx,dword ptr [eax+60h]
8081c7c1 mov     ecx,dword ptr [ecx+18h]

EAX+60 is the CurrentStackLocation member of the IRP, which is itself a pointer to an _IO_STACK_LOCATION structure:

0: kd> dt -r _IRP 8923b008
ntdll!_IRP
   +0x000 Type             : 6
   +0x002 Size             : 0x268
   +0x004 MdlAddress       : (null)
   +0x008 Flags            : 0x70
[...]
   +0x038 CancelRoutine    : (null)
   +0x03c UserBuffer       : 0xf1b5d814
   +0x040 Tail             : __unnamed
      +0x000 Overlay          : __unnamed
         +0x000 DeviceQueueEntry : _KDEVICE_QUEUE_ENTRY
         +0x000 DriverContext    : [4] (null)
         +0x010 Thread           : 0x8758a020 _ETHREAD
         +0x014 AuxiliaryBuffer  : (null)
         +0x018 ListEntry        : _LIST_ENTRY [ 0x0 - 0x0 ]
         +0x020 CurrentStackLocation : 0x8923b0c0 _IO_STACK_LOCATION
[…]

ECX+18 is a pointer to a file object in _IO_STACK_LOCATION structure:

0: kd> dt _IO_STACK_LOCATION 8923b008+60
ntdll!_IO_STACK_LOCATION
   +0x000 MajorFunction    : 0xc0 ''
   +0x001 MinorFunction    : 0xb0 ''
   +0x002 Flags            : 0x23 '#'
   +0x003 Control          : 0x89 ''
   +0x004 Parameters       : __unnamed
   +0x014 DeviceObject     : (null)
   +0x018 FileObject       : (null)
   +0x01c CompletionRoutine : (null)
   +0x020 Context          : (null)

The 2C offset at the crash point, test byte ptr [ecx+2Ch],2, is the Flags member of _FILE_OBJECT:

0: kd> dt _FILE_OBJECT
ntdll!_FILE_OBJECT
   +0x000 Type             : Int2B
   +0x002 Size             : Int2B
   +0x004 DeviceObject     : Ptr32 _DEVICE_OBJECT
   +0x008 Vpb              : Ptr32 _VPB
   +0x00c FsContext        : Ptr32 Void
   +0x010 FsContext2       : Ptr32 Void
   +0x014 SectionObjectPointer : Ptr32 _SECTION_OBJECT_POINTERS
   +0x018 PrivateCacheMap  : Ptr32 Void
   +0x01c FinalStatus      : Int4B
   +0x020 RelatedFileObject : Ptr32 _FILE_OBJECT
   +0x024 LockOperation    : UChar
   +0x025 DeletePending    : UChar
   +0x026 ReadAccess       : UChar
   +0x027 WriteAccess      : UChar
   +0x028 DeleteAccess     : UChar
   +0x029 SharedRead       : UChar
   +0x02a SharedWrite      : UChar
   +0x02b SharedDelete     : UChar
   +0x02c Flags            : Uint4B
   +0x030 FileName         : _UNICODE_STRING
   +0x038 CurrentByteOffset : _LARGE_INTEGER
   +0x040 Waiters          : Uint4B
   +0x044 Busy             : Uint4B
   +0x048 LastLock         : Ptr32 Void
   +0x04c Lock             : _KEVENT
   +0x05c Event            : _KEVENT
   +0x06c CompletionContext : Ptr32 _IO_COMPLETION_CONTEXT

So it looks like driverA passed an IRP with a NULL file object address to driverB, and this is also shown in the output of the !irp command above.
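The pointer chain can be replayed outside the debugger to confirm the faulting address. The Python sketch below models only the two memory locations that matter, using values from the !irp and dt -r _IRP output above: the current stack location is at 0x8923b0c0 and its File field is 00000000. The memory dictionary and the function name are illustrative:

```python
IRP = 0x8923b008

# Sparse model of the dump: address -> 32-bit value stored there.
memory = {
    IRP + 0x60: 0x8923b0c0,         # _IRP.Tail.Overlay.CurrentStackLocation
    0x8923b0c0 + 0x18: 0x00000000,  # _IO_STACK_LOCATION.FileObject (NULL)
}

def faulting_address(irp):
    """Follow the same dereferences as nt!IoIsOperationSynchronous."""
    eax = irp                    # mov eax,dword ptr [ebp+8]
    ecx = memory[eax + 0x60]     # mov ecx,dword ptr [eax+60h]
    ecx = memory[ecx + 0x18]     # mov ecx,dword ptr [ecx+18h]
    return ecx + 0x2C            # test byte ptr [ecx+2Ch],2 reads this address

if __name__ == "__main__":
    print(hex(faulting_address(IRP)))
```

The computed address matches ds:0023:0000002c in the trap frame, which is the Flags offset applied to a NULL _FILE_OBJECT pointer.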

- Dmitry Vostokov @ DumpAnalysis.org -