Management Bits and Tips blog

December 18th, 2007

To dissociate management activities and thoughts from crashes and hangs, I have created a separate blog called

Management Bits and Tips

with the subtitle “Reflections on Software Engineering and Software Technical Support Management”.

I do, however, reserve the right to metaphorically relate crash and hang dump analysis patterns to technical and people management in the future.

All future posts in the Management Bits and Tips category and related posts in the Software Technical Support category will go there; here I will post only a monthly or bi-monthly summary.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 41b)

December 17th, 2007

Now the Manual Dump pattern as seen from process memory dumps. It is not possible to reliably identify manual dumps here because a debugger or another process dumper might have been attached to the process noninvasively, leaving no traces of intervention, so we can only rely on the following information:

Comment field

Loading Dump File [C:\kktools\userdump8.1\x64\notepad.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Userdump generated complete user-mode minidump with Standalone function on COMPUTER-NAME'

Absence of exceptions

Loading Dump File [C:\UserDumps\notepad.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:31:31.000 2007 (GMT+0)
System Uptime: 0 days 0:45:11.148
Process Uptime: 0 days 0:00:36.000
....................
user32!ZwUserGetMessage+0xa:
00000000`76c8e6aa c3              ret
0:000> ~*kL

.  0  Id: 1b8.ed4 Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0029f618 00000000`76c8e6ea user32!ZwUserGetMessage+0xa
00000000`0029f620 00000000`ff2b6eca user32!GetMessageW+0x34
00000000`0029f650 00000000`ff2bcf8b notepad!WinMain+0x176
00000000`0029f6d0 00000000`76d7cdcd notepad!IsTextUTF8+0x24f
00000000`0029f790 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0029f7c0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

Wake debugger exception

Loading Dump File [C:\UserDumps\notepad2.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:35:37.000 2007 (GMT+0)
System Uptime: 0 days 0:49:13.806
Process Uptime: 0 days 0:02:54.000
....................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(314.1b4): Wake debugger - code 80000007 (first/second chance not available)

user32!ZwUserGetMessage+0xa:
00000000`76c8e6aa c3              ret

Break instruction exception

Loading Dump File [C:\UserDumps\notepad3.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Mon Dec 17 16:45:15.000 2007 (GMT+0)
System Uptime: 0 days 0:58:52.699
Process Uptime: 0 days 0:14:20.000
....................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
ntdll!DbgBreakPoint:
00000000`76ecfdf0 cc              int     3

0:001> ~*kL

   0  Id: 1b8.ed4 Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`0029f618 00000000`76c8e6ea user32!ZwUserGetMessage+0xa
00000000`0029f620 00000000`ff2b6eca user32!GetMessageW+0x34
00000000`0029f650 00000000`ff2bcf8b notepad!WinMain+0x176
00000000`0029f6d0 00000000`76d7cdcd notepad!IsTextUTF8+0x24f
00000000`0029f790 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0029f7c0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

#  1  Id: 1b8.ec4 Suspend: 1 Teb: 000007ff`fffda000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`030df798 00000000`76f633e8 ntdll!DbgBreakPoint
00000000`030df7a0 00000000`76d7cdcd ntdll!DbgUiRemoteBreakin+0x38
00000000`030df7d0 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd
00000000`030df800 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

The latter might also come from an assertion statement in the code that leads to a process crash, as in the following instance of the Dynamic Memory Corruption pattern (heap corruption):

FAULTING_IP:
ntdll!DbgBreakPoint+0
77f813b1 cc int 3

EXCEPTION_RECORD: ffffffff -- (.exr ffffffffffffffff)
ExceptionAddress: 77f813b1 (ntdll!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 3
Parameter[0]: 00000000
Parameter[1]: 09aef2ac
Parameter[2]: 09aeeee8

STACK_TEXT:
09aef0bc 77fb76aa ntdll!DbgBreakPoint
09aef0c4 77fa65c2 ntdll!RtlpBreakPointHeap+0x26
09aef2bc 77fb5367 ntdll!RtlAllocateHeapSlowly+0x212
09aef340 77fa64f6 ntdll!RtlDebugAllocateHeap+0xcb
09aef540 77fcc9e3 ntdll!RtlAllocateHeapSlowly+0x5a
09aef720 786f3f11 ntdll!RtlAllocateHeap+0x954
09aef730 786fd10e rpcrt4!operator new+0x12
09aef748 786fc042 rpcrt4!OSF_CCONNECTION::OSF_CCONNECTION+0x174
09aef79c 786fbe0d rpcrt4!OSF_CASSOCIATION::AllocateCCall+0xfa
09aef808 786fbd53 rpcrt4!OSF_BINDING_HANDLE::AllocateCCall+0x1cd
09aef83c 786f1f2f rpcrt4!OSF_BINDING_HANDLE::GetBuffer+0x28
09aef854 786f1ee4 rpcrt4!I_RpcGetBufferWithObject+0x6e
09aef860 786f1ea4 rpcrt4!I_RpcGetBuffer+0xb
09aef86c 78754762 rpcrt4!NdrGetBuffer+0x2b
09aefab8 796d78b5 rpcrt4!NdrClientCall2+0x3f9
09aefac8 796d7821 advapi32!LsarOpenPolicy2+0x14
09aefb1c 796d8b04 advapi32!LsaOpenPolicy+0xaf
09aefb84 796d8aa9 advapi32!LookupAccountSidInternal+0x63
09aefbac 0aaf5d8b advapi32!LookupAccountSidW+0x1f
WARNING: Stack unwind information not available. Following frames may be wrong.
09aeff40 0aad1665 ComponentDLL+0x35d8b
09aeff5c 3f69264c ComponentDLL+0x11665
09aeff7c 780085bc ComponentDLL+0x264c
09aeffb4 77e5438b msvcrt!_endthreadex+0xc1
09aeffec 00000000 kernel32!BaseThreadStart+0x52
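
The indicators listed in this post can be combined into a rough triage helper. This is only a minimal sketch in Python: the function name manual_dump_hints and its parameters are hypothetical, and the values are assumed to be read by hand from the debugger output (the comment text and the stored exception code, 80000007 for Wake debugger and 80000003 for Break instruction exception). Because a break instruction can also come from an assertion, the result is a hint and not a verdict.

```python
# Exception codes taken from the debugger output in this post.
WAKE_DEBUGGER = 0x80000007
BREAK_INSTRUCTION = 0x80000003


def manual_dump_hints(comment=None, exception_code=None):
    """Return a list of reasons why a process dump may be a manual one.

    Hypothetical helper: both inputs are copied by hand from the
    debugger output; no debugger API is involved.
    """
    hints = []
    if comment and "userdump" in comment.lower():
        hints.append("comment field mentions userdump")
    if exception_code is None:
        hints.append("no exception stored in the dump")
    elif exception_code == WAKE_DEBUGGER:
        hints.append("wake debugger exception")
    elif exception_code == BREAK_INSTRUCTION:
        hints.append("break instruction exception (could also be an assertion)")
    return hints
```

An empty list means none of the indicators was found, which still does not prove that the dump was saved automatically.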

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 42a)

December 14th, 2007

Wait Chain is another pattern. It is simply a sequence of causal relations between events: thread A is waiting for event E, which threads B, C or D are supposed to signal at some time in the future, but they are all waiting for event F, which thread G is about to signal as soon as it finishes processing some critical task:

It also subsumes various deadlock patterns, which are causal loops where thread A is waiting for event AB, which thread B will signal as soon as thread A signals event BA, which thread B is itself waiting for:

In this context “Event” means any type of synchronization object, critical section, LPC/RPC reply or data arrival through some IPC channel, and not only a Win32 event object or kernel _KEVENT.
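
The idea of following a chain until it either ends or loops back can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a debugger feature: the function name follow_wait_chain and the dictionary layout are hypothetical, and each waiter is assumed to be blocked on exactly one thing (so the "threads B, C or D" case is reduced to a single signaling thread).

```python
def follow_wait_chain(waits_for, start):
    """Follow a wait chain from `start` and report whether it loops.

    `waits_for` maps a waiter (thread or event) to the single item
    it is blocked on. Returns (chain, is_deadlock).
    """
    chain = [start]
    seen = {start}
    current = start
    while current in waits_for:
        current = waits_for[current]
        chain.append(current)
        if current in seen:
            # We came back to an item already in the chain: a causal loop.
            return chain, True
        seen.add(current)
    # The chain ended at an item that is not waiting for anything.
    return chain, False


# Simplified wait chain: A -> E -> B -> F -> G (G is still running).
plain = {
    "thread A": "event E",
    "event E": "thread B",
    "thread B": "event F",
    "event F": "thread G",
}

# Deadlock: A waits for AB (owned by B), B waits for BA (owned by A).
loop = {
    "thread A": "event AB",
    "event AB": "thread B",
    "thread B": "event BA",
    "event BA": "thread A",
}
```

Calling follow_wait_chain(plain, "thread A") ends at thread G without a loop, whereas the same call on the second dictionary returns to thread A and is reported as a deadlock.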

As the first example of the Wait Chain pattern I show a process that is being terminated and is waiting for another thread to finish. In other words, considering thread termination as an event itself, the main process thread is waiting for the second thread object to be signaled. The second thread tries to cancel a previous I/O request directed to some device. However, that IRP is not cancellable and the process hangs. This can be depicted on the following diagram:

where Thread A is our main thread waiting for Event A which is thread B itself waiting for I/O cancellation (Event B). Their stack traces are:

THREAD 8a3178d0  Cid 04bc.01cc  Teb: 7ffdf000 Win32Thread: bc1b6e70 WAIT: (Unknown) KernelMode Non-Alertable
    8af2c920  Thread
Not impersonating
DeviceMap                 e1032530
Owning Process            89ff8d88       Image:         processA.exe
Wait Start TickCount      80444          Ticks: 873 (0:00:00:13.640)
Context Switch Count      122                 LargeStack
UserTime                  00:00:00.015
KernelTime                00:00:00.156
Win32 Start Address 0x010148a4
Start Address 0x77e617f8
Stack Init f3f29000 Current f3f28be8 Base f3f29000 Limit f3f25000 Call 0
Priority 15 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr 
f3f28c00 80833465 nt!KiSwapContext+0x26
f3f28c2c 80829a62 nt!KiSwapThread+0x2e5
f3f28c74 8094c0ea nt!KeWaitForSingleObject+0x346 ; stack trace with arguments shows the first parameter as 8af2c920 
f3f28d0c 8094c63f nt!PspExitThread+0x1f0
f3f28d24 8094c839 nt!PspTerminateThreadByPointer+0x4b
f3f28d54 8088978c nt!NtTerminateProcess+0x125
f3f28d54 7c8285ec nt!KiFastCallEntry+0xfc

THREAD 8af2c920  Cid 04bc.079c  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
    8af2c998  NotificationTimer
IRP List:
    8ad26260: (0006,0220) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 e1032530
Owning Process            89ff8d88       Image:         processA.exe
Wait Start TickCount      81312          Ticks: 5 (0:00:00:00.078)
Context Switch Count      169                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x77da3ea5
Start Address 0x77e617ec
Stack Init f3e09000 Current f3e08bac Base f3e09000 Limit f3e05000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr 
f3e08bc4 80833465 nt!KiSwapContext+0x26
f3e08bf0 80828f0b nt!KiSwapThread+0x2e5
f3e08c38 808ea7a4 nt!KeDelayExecutionThread+0x2ab
f3e08c68 8094c360 nt!IoCancelThreadIo+0x62
f3e08cf0 8094c569 nt!PspExitThread+0x466
f3e08cfc 8082e0b6 nt!PsExitSpecialApc+0x1d
f3e08d4c 80889837 nt!KiDeliverApc+0x1ae
f3e08d4c 7c8285ec nt!KiServiceExit+0x56

By inspecting the IRP we can see the device it was directed to, and see that it has the cancel bit set but doesn’t have a cancel routine:

0: kd> !irp 8ad26260  1
Irp is active with 5 stacks 4 is current (= 0x8ad2633c)
 No Mdl: No System Buffer: Thread 8af2c920:  Irp stack trace. 
Flags = 00000000
ThreadListEntry.Flink = 8af2cb28
ThreadListEntry.Blink = 8af2cb28
IoStatus.Status = 00000000
IoStatus.Information = 00000000
RequestorMode = 00000001
Cancel = 01
CancelIrql = 0
ApcEnvironment = 00
UserIosb = 77ecb700
UserEvent = 00000000
Overlay.AsynchronousParameters.UserApcRoutine = 00000000
Overlay.AsynchronousParameters.UserApcContext = 00000000
Overlay.AllocationSize = 00000000 - 00000000
CancelRoutine = 00000000  
UserBuffer = 77ecb720
&Tail.Overlay.DeviceQueueEntry = 8ad262a0
Tail.Overlay.Thread = 8af2c920
Tail.Overlay.AuxiliaryBuffer = 00000000
Tail.Overlay.ListEntry.Flink = 00000000
Tail.Overlay.ListEntry.Blink = 00000000
Tail.Overlay.CurrentStackLocation = 8ad2633c
Tail.Overlay.OriginalFileObject = 89ff8920
Tail.Apc = 00000000
Tail.CompletionKey = 00000000
     cmd  flg cl Device   File     Completion-Context
 [  0, 0]   0  0 00000000 00000000 00000000-00000000
   

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
 [  0, 0]   0  0 00000000 00000000 00000000-00000000

   Args: 00000000 00000000 00000000 00000000
>[  c, 2]   0  1 8ab20388 89ff8920 00000000-00000000    pending
        \Device\DeviceA

   Args: 00000020 00000017 00000000 00000000
 [  c, 2]   0  0 8affa4b8 89ff8920 00000000-00000000   
        \Device\DeviceB
   Args: 00000020 00000017 00000000 00000000
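
When many IRPs have to be checked, the two fields of interest can be picked out of saved !irp output with a small script. This is a minimal sketch in Python that assumes the text has been pasted into a string and that the field names look exactly as in the output above (Cancel and CancelRoutine); the function name irp_cancel_state is hypothetical.

```python
import re


def irp_cancel_state(irp_text):
    """Return (cancel_requested, has_cancel_routine) from pasted !irp output.

    Assumes lines of the form "Cancel = 01" and "CancelRoutine = 00000000"
    as they appear in the verbose output shown in this post.
    """
    cancel = re.search(r"^\s*Cancel\s*=\s*([0-9a-fA-F]+)", irp_text, re.MULTILINE)
    routine = re.search(
        r"^\s*CancelRoutine\s*=\s*([0-9a-fA-F]+)", irp_text, re.MULTILINE
    )
    if cancel is None or routine is None:
        raise ValueError("Cancel or CancelRoutine field not found")
    return int(cancel.group(1), 16) != 0, int(routine.group(1), 16) != 0


# Fragment copied from the !irp output above.
sample = """Cancel = 01
CancelIrql = 0
CancelRoutine = 00000000
"""
cancel_requested, has_cancel_routine = irp_cancel_state(sample)
```

For the IRP in this example the cancel bit is set and there is no cancel routine, which is the combination that keeps the second thread waiting in IoCancelThreadIo.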

- Dmitry Vostokov @ DumpAnalysis.org -

Flawless writing with Google

December 13th, 2007

Management Bits and Tips 0x1 - Many managers have flawless writing skills (bit). Use Google to check your writing (tip).

It is especially important for non-native English speakers like me. You can search for simple phrases and their variations to compare search results.

For example, today I had a discussion about this phrase:

“It’s main advantage is “

It gives 539 search results. However, the phrase without the apostrophe

“Its main advantage is “

gives 8,870 search results. Let’s check combinations with two “it”.

  • “It’s main advantage is it’s ” - 192
  • “Its main advantage is it’s ” - 0
  • “It’s main advantage is its ” - 299
  • “Its main advantage is its ” - 836

So you get an idea of what is more correct, or more widely used, from a descriptive grammar point of view.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 41a)

December 12th, 2007

Some memory dumps are generated on purpose to troubleshoot process and system hangs. They are usually called Manual Dumps, manual crash dumps or manual memory dumps. Kernel, complete and kernel mini dumps can be generated using the famous keyboard method described in the following Microsoft article, which has been recently updated and contains the fix for USB keyboards:

http://support.microsoft.com/kb/244139

The crash dump will show E2 bugcheck:

MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Various tools, including Citrix SystemDump, reuse the E2 bugcheck code and its arguments. There are many other third-party tools used to bugcheck Windows, such as BANG! from OSR or NotMyFault from Sysinternals. An older one is crash.exe, which loads crashdrv.sys and uses the following bugcheck:

Unknown bugcheck code (69696969)
Unknown bugcheck description
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

In a memory dump you would see its characteristic stack trace pointing to crashdrv module: 

STACK_TEXT:
b5b3ebe0 f615888d nt!KeBugCheck+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
b5b3ebec f61584e3 crashdrv+0x88d
b5b3ec00 8041eec9 crashdrv+0x4e3
b5b3ec14 804b328a nt!IopfCallDriver+0x35
b5b3ec28 804b40de nt!IopSynchronousServiceTail+0x60
b5b3ed00 804abd0a nt!IopXxxControlFile+0x5d6
b5b3ed34 80468379 nt!NtDeviceIoControlFile+0x28
b5b3ed34 77f82ca0 nt!KiSystemService+0xc9
0006fed4 7c5794f4 ntdll!NtDeviceIoControlFile+0xb
0006ff38 01001a74 KERNEL32!DeviceIoControl+0xf8
0006ff70 01001981 crash+0x1a74
0006ff80 01001f93 crash+0x1981
0006ffc0 7c5989a5 crash+0x1f93
0006fff0 00000000 KERNEL32!BaseProcessStart+0x3d

Sometimes various hardware buttons are used to trigger an NMI and generate a crash dump when a keyboard is not available. The bugcheck will be:

NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction. The hardware supplier should be called.
Arguments:
Arg1: 004f4454
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Terminating a critical process, such as session 0 csrss.exe, is also used to force a memory dump:

CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been terminated.
Several processes and threads are necessary for the operation of the system; when they are terminated (for any reason), the system can no longer function.
Arguments:
Arg1: 00000003, Process
Arg2: 8a090d88, Terminating object
Arg3: 8a090eec, Process image file name
Arg4: 80967b74, Explanatory message (ascii)
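
The bugcheck codes from this post can be kept in a small lookup table for quick triage. The following Python sketch is only an illustration: the table name, the function name and the description strings are mine, and the table covers only the codes shown above. Note that 80 and F4 also occur for genuine hardware faults and unexpected process exits, so a match is a hint that the dump may have been forced, nothing more.

```python
# Bugcheck codes mentioned in this post and the forcing method they suggest.
MANUAL_BUGCHECKS = {
    0xE2: "MANUALLY_INITIATED_CRASH (keyboard method or tools reusing E2)",
    0x69696969: "crash.exe with crashdrv.sys",
    0x80: "NMI_HARDWARE_FAILURE (possibly a hardware NMI button)",
    0xF4: "CRITICAL_OBJECT_TERMINATION (possibly a killed critical process)",
}


def manual_dump_origin(bugcheck_code):
    """Return a description if the code is in the table, otherwise None."""
    return MANUAL_BUGCHECKS.get(bugcheck_code)
```

For example, manual_dump_origin(0xE2) returns the E2 description, while a code that is not listed here, such as 0x50, returns None.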

- Dmitry Vostokov @ DumpAnalysis.org -

Management: Analysis and Synthesis

December 11th, 2007

I created the “Management Bits and Tips” category to write down my thoughts on management and just realized how well this category title fits into the grand scientific modeling approach:

Analysis (Bits) -> Synthesis (Tips)

Contrast this with pure analytic approaches

  • “Management Bits”
  • “Management Bytes”
  • “Management Bits and Bytes”

or with pure synthetic approach “Management Tips”.

I was thinking about a “Management QWords” category but abandoned that thought because QWord sounds to me like an abbreviation for “Cursing Words”. “Management DWords”?

Perhaps I have to start a separate blog, otherwise the debugging community will complain about this off-topic content :-)

- Dmitry Vostokov @ DumpAnalysis.org -

Expertise-Driven Motivation

December 11th, 2007

There are many X-Driven motivations out there but I prefer expertise-driven individuals, motivated by the desire to become experts. It is not bullshit as you might think. It is more like a persistent psychological state found in researchers and scientists, and the best results are guaranteed when it is supplemented by a money-driven positive feedback loop. I’ve seen such people in both software engineering and software technical support environments. It is a very interesting topic and I might come back to it later.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 40a)

December 10th, 2007

In the Advanced Windows Debugging book I encountered some thread stacks related to debugger events like Exit a Process, Load or Unload a Module and realized that I had seen process crash dumps with such stack traces. These thread stacks are not normally encountered in healthy process dumps and, statistically speaking, when a process terminates or unloads a library, the chances of saving a memory dump manually using process dumpers like userdump.exe or Task Manager in Vista are very low unless an interactive debugger was attached or breakpoints were set in advance. Therefore the presence of such threads in a captured crash dump usually indicates some problem or at least draws attention to the procedure used to save the dump. Such a pattern merits its own name: Special Stack Trace.

For example, one process dump had the following stack trace showing process termination initiated from .NET runtime:

STACK_TEXT:
0012fc2c 7c827c1b ntdll!KiFastSystemCallRet
0012fc30 77e668c3 ntdll!NtTerminateProcess+0xc
0012fd24 77e66905 KERNEL32!_ExitProcess+0x63
0012fd38 01256d9b KERNEL32!ExitProcess+0x14
0012ff60 01256dc7 mscorwks!SafeExitProcess+0x11a
0012ff6c 011c5fa4 mscorwks!DisableRuntime+0xd0
0012ffb0 79181b5f mscorwks!_CorExeMain+0x8c
0012ffc0 77e6f23b mscoree!_CorExeMain+0x2c
0012fff0 00000000 KERNEL32!BaseProcessStart+0x23

The original problem was an error message box, and the application disappeared when a user dismissed the message. How was the dump saved? Someone advised attaching NTSD to that process, hitting ‘g’ and then saving the memory dump when the process broke into the debugger again. So the problem was already gone by that time, and the better way would have been to create a manual user dump of that process while it was displaying the error message.

- Dmitry Vostokov @ DumpAnalysis.org -

Interrupts and exceptions explained (Part 6)

December 7th, 2007

Previous parts dealt with exceptions in kernel mode. In this and the next parts I’m going to investigate the flow of exception processing in user mode. In part 1 I mentioned that interrupts and exceptions generated when the CPU executes code in user mode require the processor to switch from the current user mode stack to the kernel mode stack. This can be seen when we have a user mode debugger attached and it gets an exception notification called a first-chance exception. Because of the stack switch we don’t see any saved processor context on the user mode thread stack when WinDbg breaks on a first-chance exception in TestDefaultDebugger64:

0:000> r
rax=0000000000000000 rbx=0000000000000001 rcx=000000000012fd80
rdx=00000000000003e8 rsi=000000000012fd80 rdi=0000000140033fe0
rip=0000000140001690 rsp=000000000012f198 rbp=0000000000000111
 r8=0000000000000000  r9=0000000140001690 r10=0000000140001690
r11=000000000012f260 r12=0000000000000000 r13=00000000000003e8
r14=0000000000000110 r15=0000000000000001
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1:
00000001`40001690 c704250000000000000000 mov dword ptr [0],0 ds:00000000`00000000=????????

0:000> kL 100
Child-SP          RetAddr           Call Site
00000000`0012f198 00000001`40004ba0 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
00000000`0012f1a0 00000001`40004de0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1d0 00000001`4000564e TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0x180
00000000`0012f230 00000001`4000c6b4 TestDefaultDebugger64!CDialog::OnCmdMsg+0x32
00000000`0012f270 00000001`4000d4d8 TestDefaultDebugger64!CWnd::OnCommand+0xcc
00000000`0012f300 00000001`400082e0 TestDefaultDebugger64!CWnd::OnWndMsg+0x60
00000000`0012f440 00000001`4000b77a TestDefaultDebugger64!CWnd::WindowProc+0x38
00000000`0012f480 00000001`4000b881 TestDefaultDebugger64!AfxCallWndProc+0xfe
00000000`0012f520 00000000`77c43abc TestDefaultDebugger64!AfxWndProc+0x59
00000000`0012f560 00000000`77c4337a user32!UserCallWinProcCheckWow+0x1f9
00000000`0012f630 00000000`77c4341b user32!SendMessageWorker+0x68c
00000000`0012f6d0 000007ff`7f07c89f user32!SendMessageW+0x9d
00000000`0012f720 000007ff`7f07f2e1 comctl32!Button_ReleaseCapture+0x14f
00000000`0012f750 00000000`77c43abc comctl32!Button_WndProc+0xd51
00000000`0012f8b0 00000000`77c43f5c user32!UserCallWinProcCheckWow+0x1f9
00000000`0012f980 00000000`77c3966a user32!DispatchMessageWorker+0x3af
00000000`0012f9f0 00000001`40007148 user32!IsDialogMessageW+0x256
00000000`0012fac0 00000001`400087f8 TestDefaultDebugger64!CWnd::IsDialogMessageW+0x38
00000000`0012faf0 00000001`4000560f TestDefaultDebugger64!CWnd::PreTranslateInput+0x28
00000000`0012fb20 00000001`4000b2ca TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc3
00000000`0012fb50 00000001`400034a7 TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0x3a
00000000`0012fb80 00000001`40003507 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0x67
00000000`0012fbb0 00000001`400036d2 TestDefaultDebugger64!AfxPreTranslateMessage+0x23
00000000`0012fbe0 00000001`40003717 TestDefaultDebugger64!AfxInternalPumpMessage+0x3a
00000000`0012fc10 00000001`4000a806 TestDefaultDebugger64!AfxPumpMessage+0x1b
00000000`0012fc40 00000001`40005ff2 TestDefaultDebugger64!CWnd::RunModalLoop+0xea
00000000`0012fca0 00000001`40001163 TestDefaultDebugger64!CDialog::DoModal+0x1c6
00000000`0012fd50 00000001`4002ccb1 TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe3
00000000`0012fe80 00000001`40016150 TestDefaultDebugger64!AfxWinMain+0x75
00000000`0012fec0 00000000`77d5964c TestDefaultDebugger64!__tmainCRTStartup+0x260
00000000`0012ff80 00000000`00000000 kernel32!BaseProcessStart+0x29

0:000> dqs 000000000012f198-20 000000000012f198+20
00000000`0012f178  00000001`4000bc25 TestDefaultDebugger64!CWnd::ReflectLastMsg+0x65
00000000`0012f180  00000000`00080334
00000000`0012f188  00000000`00000006
00000000`0012f190  00000000`0000000d
00000000`0012f198  00000001`40004ba0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1a0  ffffffff`fffffffe
00000000`0012f1a8  00000000`00000000
00000000`0012f1b0  00000000`00000000
00000000`0012f1b8  00000000`00000000

We see that there are no saved SS:RSP, RFLAGS and CS:RIP registers, which we would see on the stack if the exception had happened in kernel mode, as shown in part 2. If we bugcheck our system using the SystemDump tool to generate a complete memory dump at that time, we can later look at the whole thread that experienced the exception in user mode, with both its user mode and kernel mode stacks:

kd> !process fffffadfe7055c20 2
PROCESS fffffadfe7055c20
    SessionId: 0  Cid: 0c64    Peb: 7fffffd7000  ParentCid: 07b0
    DirBase: 27e3d000  ObjectTable: fffffa800073a550  HandleCount:  55.
    Image: TestDefaultDebugger64.exe

        THREAD fffffadfe78f2bf0  Cid 0c64.0c68  Teb: 000007fffffde000 Win32Thread: fffff97ff4d71010 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
            fffffadfdf7b6fc0  SynchronizationEvent

        THREAD fffffadfe734c3d0  Cid 0c64.0c88  Teb: 000007fffffdc000 Win32Thread: 0000000000000000 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
            fffffadfe734c670  Semaphore Limit 0x2

kd> .thread /r /p fffffadfe78f2bf0
Implicit thread is now fffffadf`e78f2bf0
Implicit process is now fffffadf`e7055c20
Loading User Symbols

kd> kL 100
Child-SP          RetAddr           Call Site
fffffadf`df7b6d30 fffff800`0103b063 nt!KiSwapContext+0x85
fffffadf`df7b6eb0 fffff800`0103c403 nt!KiSwapThread+0xc3
fffffadf`df7b6ef0 fffff800`013a9dc1 nt!KeWaitForSingleObject+0x528
fffffadf`df7b6f80 fffff800`01336dcf nt!DbgkpQueueMessage+0x281
fffffadf`df7b7130 fffff800`01011c69 nt!DbgkForwardException+0x1c5
fffffadf`df7b74f0 fffff800`0104146f nt!KiDispatchException+0x264
fffffadf`df7b7af0 fffff800`010402e1 nt!KiExceptionExit
fffffadf`df7b7c70 00000001`40001690 nt!KiPageFault+0x1e1
00000000`0012f198 00000001`40004ba0 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1
00000000`0012f1a0 00000001`40004de0 TestDefaultDebugger64!_AfxDispatchCmdMsg+0xc4
00000000`0012f1d0 00000001`4000564e TestDefaultDebugger64!CCmdTarget::OnCmdMsg+0x180
00000000`0012f230 00000001`4000c6b4 TestDefaultDebugger64!CDialog::OnCmdMsg+0x32
00000000`0012f270 00000001`4000d4d8 TestDefaultDebugger64!CWnd::OnCommand+0xcc
00000000`0012f300 00000001`400082e0 TestDefaultDebugger64!CWnd::OnWndMsg+0x60
00000000`0012f440 00000001`4000b77a TestDefaultDebugger64!CWnd::WindowProc+0x38
00000000`0012f480 00000001`4000b881 TestDefaultDebugger64!AfxCallWndProc+0xfe
00000000`0012f520 00000000`77c43abc TestDefaultDebugger64!AfxWndProc+0x59
00000000`0012f560 00000000`77c4337a USER32!UserCallWinProcCheckWow+0x1f9
00000000`0012f630 00000000`77c4341b USER32!SendMessageWorker+0x68c
00000000`0012f6d0 000007ff`7f07c89f USER32!SendMessageW+0x9d
00000000`0012f720 000007ff`7f07f2e1 COMCTL32!Button_ReleaseCapture+0x14f
00000000`0012f750 00000000`77c43abc COMCTL32!Button_WndProc+0xd51
00000000`0012f8b0 00000000`77c43f5c USER32!UserCallWinProcCheckWow+0x1f9
00000000`0012f980 00000000`77c3966a USER32!DispatchMessageWorker+0x3af
00000000`0012f9f0 00000001`40007148 USER32!IsDialogMessageW+0x256
00000000`0012fac0 00000001`400087f8 TestDefaultDebugger64!CWnd::IsDialogMessageW+0x38
00000000`0012faf0 00000001`4000560f TestDefaultDebugger64!CWnd::PreTranslateInput+0x28
00000000`0012fb20 00000001`4000b2ca TestDefaultDebugger64!CDialog::PreTranslateMessage+0xc3
00000000`0012fb50 00000001`400034a7 TestDefaultDebugger64!CWnd::WalkPreTranslateTree+0x3a
00000000`0012fb80 00000001`40003507 TestDefaultDebugger64!AfxInternalPreTranslateMessage+0x67
00000000`0012fbb0 00000001`400036d2 TestDefaultDebugger64!AfxPreTranslateMessage+0x23
00000000`0012fbe0 00000001`40003717 TestDefaultDebugger64!AfxInternalPumpMessage+0x3a
00000000`0012fc10 00000001`4000a806 TestDefaultDebugger64!AfxPumpMessage+0x1b
00000000`0012fc40 00000001`40005ff2 TestDefaultDebugger64!CWnd::RunModalLoop+0xea
00000000`0012fca0 00000001`40001163 TestDefaultDebugger64!CDialog::DoModal+0x1c6
00000000`0012fd50 00000000`00000000 TestDefaultDebugger64!CTestDefaultDebuggerApp::InitInstance+0xe3

Dumping the kernel mode stack of our thread shows that the processor saved the registers there:

kd> dqs fffffadf`df7b7c70  fffffadf`df7b7c70+200
fffffadf`df7b7c70  fffffadf`e78f2bf0
fffffadf`df7b7c78  00000000`00000000
fffffadf`df7b7c80  fffffadf`e78f2b01
fffffadf`df7b7c88  00000000`00000020
...
...
...
fffffadf`df7b7d90  00000000`00000000
fffffadf`df7b7d98  00000000`00000000
fffffadf`df7b7da0  00000000`00000000
fffffadf`df7b7da8  00000000`00000000
fffffadf`df7b7db0  00000000`001629b0
fffffadf`df7b7db8  00000000`00000001
fffffadf`df7b7dc0  00000000`00000001
fffffadf`df7b7dc8  00000000`00000111 ; RBP saved by KiPageFault
fffffadf`df7b7dd0  00000000`00000006 ; Page-Fault Error Code
fffffadf`df7b7dd8  00000001`40001690 TestDefaultDebugger64!CTestDefaultDebuggerDlg::OnBnClickedButton1 ; RIP
fffffadf`df7b7de0  00000000`00000033 ; CS
fffffadf`df7b7de8  00000000`00010246 ; RFLAGS
fffffadf`df7b7df0  00000000`0012f198 ; RSP
fffffadf`df7b7df8  00000000`0000002b ; SS
fffffadf`df7b7e00  00000000`0000027f
fffffadf`df7b7e08  00000000`00000000
fffffadf`df7b7e10  00000000`00000000
fffffadf`df7b7e18  0000ffff`00001f80
fffffadf`df7b7e20  00000000`00000000
fffffadf`df7b7e28  00000000`00000000
fffffadf`df7b7e30  00000000`00000000
fffffadf`df7b7e38  00000000`00000000
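
The annotated part of this dump can be read as a fixed sequence of six quadwords, from the error code at the lowest address up to SS. The Python sketch below merely labels those six values in that order; the names FAULT_FRAME_FIELDS and decode_fault_frame are hypothetical, and the values are copied by hand from the output above starting at fffffadf`df7b7dd0.

```python
# Order of the saved values from lower to higher addresses,
# as annotated in the dqs output above.
FAULT_FRAME_FIELDS = ("error_code", "rip", "cs", "rflags", "rsp", "ss")


def decode_fault_frame(qwords):
    """Label six consecutive quadwords of the saved fault frame."""
    if len(qwords) != len(FAULT_FRAME_FIELDS):
        raise ValueError("expected exactly six quadwords")
    return dict(zip(FAULT_FRAME_FIELDS, qwords))


# Values copied from fffffadf`df7b7dd0 .. fffffadf`df7b7df8.
frame = decode_fault_frame(
    [0x6, 0x140001690, 0x33, 0x10246, 0x12F198, 0x2B]
)
```

The labeled values agree with the user mode register dump at the beginning of this post: rip, rsp, cs, ss and efl there have the same values as the saved RIP, RSP, CS, SS and RFLAGS here.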


kd> .asm no_code_bytes
Assembly options: no_code_bytes

kd> u KiPageFault
nt!KiPageFault:
fffff800`01040100 push    rbp
fffff800`01040101 sub     rsp,158h
fffff800`01040108 lea     rbp,[rsp+80h]
fffff800`01040110 mov     byte ptr [rbp-55h],1
fffff800`01040114 mov     qword ptr [rbp-50h],rax
fffff800`01040118 mov     qword ptr [rbp-48h],rcx
fffff800`0104011c mov     qword ptr [rbp-40h],rdx
fffff800`01040120 mov     qword ptr [rbp-38h],r8

Error code 6 is 110 in binary, and volume 3A of the Intel manual tells us that “the fault was caused by a non-present page” (bit 0 is cleared), “the access causing the fault was a write” (bit 1 is set) and “the access causing the fault originated when the processor was executing in user mode” (bit 2 is set).
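
The same bit test can be written down as a tiny Python function. It is a sketch that covers only the three low bits discussed here; the function name and the dictionary keys are my own labels, not names from the Intel manual.

```python
def decode_page_fault_error(code):
    """Decode bits 0-2 of a page-fault error code."""
    return {
        # Bit 0: clear = non-present page, set = page-level protection violation.
        "page_present": bool(code & 0x1),
        # Bit 1: clear = read access, set = write access.
        "write_access": bool(code & 0x2),
        # Bit 2: clear = supervisor mode, set = user mode.
        "user_mode": bool(code & 0x4),
    }


fault = decode_page_fault_error(6)
```

For error code 6 this gives a write access from user mode to a non-present page, which matches the mov dword ptr [0],0 instruction in OnBnClickedButton1.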

- Dmitry Vostokov @ DumpAnalysis.org -

SIMSIM Software Development Process

December 6th, 2007

Faced with the problem of finding time to write the troubleshooting tools that spring to my mind, I devised this process, which seems to be a novel way to write software for busy professionals. Its essence is writing software while presenting it or while presenting software writing topics, for example, software architecture, design and implementation. It has some agile process flavour, magnified by a bigger audience than pair programming has, and it nicely complements my Reading Windows-based Code series. SIMSIM is an abbreviation for:

Show IMplementation and Subsequent IMprovement

More details will be announced soon.

- Dmitry Vostokov @ DumpAnalysis.org -
 

Complexity and Memory Dumps (Part 1)

December 5th, 2007

Asking the right questions at the appropriate level of hierarchical organization is a known solution to complexity. In the case of memory dumps it is sometimes useful to forget about bits, bytes, words, dwords and qwords, memory addresses, pointers, runtime structures and APIs, and to ask educated questions at the component level. The simplest of them is the question about a component timestamp, which in WinDbg parlance means using variants of the lm command, for example:

0:008> lmt m ModuleA
start    end        module name
76290000 762ad000   ModuleA  Sat Feb 17 13:59:59 2007 (45D70A5F)

0:008> lmt m ModuleB
start    end        module name
66c50000 66c65000   ModuleB  Fri Feb 02 22:30:03 2007 (45C3BB6B)

The next step is obvious: test with the newer version. Another good question is about consistency, to exclude cases caused by α-particle hits. This latter possibility was mentioned in Andreas Zeller’s book, which I read some time ago, and can be considered the efficient cause of some crash dumps according to the Aristotelian categories of causation.
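
The hexadecimal value in parentheses in the lmt output is the module timestamp as a number of seconds since January 1st, 1970, so module ages can also be compared outside the debugger. Here is a minimal Python sketch; the function name module_timestamp is hypothetical, and the two values are copied from the output above.

```python
from datetime import datetime, timezone


def module_timestamp(hex_stamp):
    """Convert the hex timestamp shown by lmt into a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


module_a = module_timestamp("45D70A5F")  # ModuleA
module_b = module_timestamp("45C3BB6B")  # ModuleB
newer = "ModuleA" if module_a > module_b else "ModuleB"
```

Both values decode to the dates printed by WinDbg above (17 February 2007 13:59:59 and 2 February 2007 22:30:03), and ModuleA turns out to be the newer of the two.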

- Dmitry Vostokov @ DumpAnalysis.org -

CAFF BugCheck

December 4th, 2007

I recently observed this bugcheck in a kernel dump and found that userdump.sys generates it at the request of userdump.exe when process monitoring rules in Process Dumper from the Microsoft userdump package are set to “Bugcheck after dumping”:

BUGCHECK_STR:  0xCAFF

PROCESS_NAME:  userdump.exe

kd> kL
Child-SP          RetAddr           Call Site
fffffadf`dfcf19b8 fffffadf`dfee38c4 nt!KeBugCheck
fffffadf`dfcf19c0 fffff800`012ce9cf userdump!UdIoctl+0x104
fffffadf`dfcf1a70 fffff800`012df026 nt!IopXxxControlFile+0xa5a
fffffadf`dfcf1b90 fffff800`010410fd nt!NtDeviceIoControlFile+0x56
fffffadf`dfcf1c00 00000000`77ef0a5a nt!KiSystemServiceCopyEnd+0x3
00000000`01eadd58 00000001`0000a755 ntdll!NtDeviceIoControlFile+0xa
00000000`01eadd60 00000000`77ef30a5 userdump_100000000!UdServiceWorkerAPC+0x1005
00000000`01eaf970 00000000`77ef0a2a ntdll!KiUserApcDispatcher+0x15
00000000`01eafe68 00000001`00007fe2 ntdll!NtWaitForSingleObject+0xa
00000000`01eafe70 00000001`00008a39 userdump_100000000!UdServiceWorker+0xb2
00000000`01eaff20 000007ff`7fee4db6 userdump_100000000!UdServiceStart+0x139
00000000`01eaff50 00000000`77d6b6da ADVAPI32!ScSvcctrlThreadW+0x25
00000000`01eaff80 00000000`00000000 kernel32!BaseThreadStart+0x3a

This might be useful if you want to see kernel data that happened to be at exception time. In this case you can avoid requesting complete memory dump of physical memory and ask for kernel memory dump only (if it was configured in Control Panel) together with a user dump.

Note: do not set this option if you are unsure. It can make your production servers bluescreen in the case of false positive dumps.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis AntiPatterns (Part 7)

December 3rd, 2007

Be language - excessive use of “is”. This anti-pattern was inspired by Alfred Korzybski’s notion of how “is” affects our understanding of the world. In the context of technical support, the use of certain verbs sometimes leads down the wrong troubleshooting and debugging paths. For example, consider the following phrase:

It is our pool tag. It is effected by driver A, driver B and driver C.  

Surely driver A, driver B and driver C were not developed by the same company that introduced the problem pool tag (this smells of an Alien Component). Unless the claim is supported by solid evidence, a better phrase would be:

It is our pool tag. It might have been effected by driver A, driver B or driver C.  

I’m not advocating the complete eradication of “be” verbs, as was done in the E-Prime language, but rather being conscious of their use. Thanks to Simple*ology for pointing me in the right direction.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 9d)

November 29th, 2007

Finally I got a good example of the Deadlock pattern involving LPC. In the stack trace below, an svchost.exe thread (we call it thread A) receives an LPC call and dispatches it to the componentA module, which makes another LPC call (MessageId 000135b8) and waits for a reply:

THREAD 89143020  Cid 09b4.10dc  Teb: 7ff91000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8914320c  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 000135b8:
Current LPC port d64a5328
Not impersonating
DeviceMap                 d64028f0
Owning Process            891b8b80       Image:         svchost.exe
Wait Start TickCount      237408         Ticks: 1890 (0:00:00:29.531)
Context Switch Count      866            
UserTime                  00:00:00.031
KernelTime                00:00:00.015
Win32 Start Address 0x000135b2
LPC Server thread working on message Id 135b2
Start Address kernel32!BaseThreadStartThunk (0x7c82b5f3)
Stack Init b91f9000 Current b91f8c08 Base b91f9000 Limit b91f6000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
b91f8c20 8083e6a2 nt!KiSwapContext+0x26
b91f8c4c 8083f164 nt!KiSwapThread+0x284
b91f8c94 8093983f nt!KeWaitForSingleObject+0x346
b91f8d50 80834d3f nt!NtRequestWaitReplyPort+0x776
b91f8d50 7c94ed54 nt!KiFastCallEntry+0xfc
02bae928 7c941c94 ntdll!KiFastSystemCallRet
02bae92c 77c42700 ntdll!NtRequestWaitReplyPort+0xc
02bae984 77c413ba RPCRT4!LRPC_CCALL::SendReceive+0x230
02bae990 77c42c7f RPCRT4!I_RpcSendReceive+0x24
02bae9a4 77cb5d63 RPCRT4!NdrSendReceive+0x2b
02baec48 674825b6 RPCRT4!NdrClientCall+0x334

02baec5c 67486776 componentA!bar+0x16



02baf8d4 77c40f3b componentA!foo+0x157
02baf8f8 77cb23f7 RPCRT4!Invoke+0x30
02bafcf8 77cb26ed RPCRT4!NdrStubCall2+0x299
02bafd14 77c409be RPCRT4!NdrServerCall2+0x19
02bafd48 77c4093f RPCRT4!DispatchToStubInCNoAvrf+0x38
02bafd9c 77c40865 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x117
02bafdc0 77c434b1 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
02bafdfc 77c41bb3 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c
02bafe20 77c45458 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127
02baff84 77c2778f RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430
02baff8c 77c2f7dd RPCRT4!RecvLotsaCallsWrapper+0xd
02baffac 77c2de88 RPCRT4!BaseCachedThreadRoutine+0x9d
02baffb8 7c82608b RPCRT4!ThreadStartRoutine+0x1b
02baffec 00000000 kernel32!BaseThreadStart+0x34

We search for that LPC message to find the server thread:

1: kd> !lpc message 000135b8
Searching message 135b8 in threads ...
    Server thread 89115db0 is working on message 135b8
Client thread 89143020 waiting a reply from 135b8  


                       

It belongs to Process.exe (we call it thread B):

1: kd> !thread 89115db0 0x16
THREAD 89115db0  Cid 098c.0384  Teb: 7ff79000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8a114628  SynchronizationEvent
Not impersonating
DeviceMap                 d64028f0
Owning Process            8a2c9d88       Image:         Process.exe
Wait Start TickCount      237408         Ticks: 1890 (0:00:00:29.531)
Context Switch Count      1590            
UserTime                  00:00:03.265
KernelTime                00:00:01.671
Win32 Start Address 0x000135b8
LPC Server thread working on message Id 135b8
Start Address kernel32!BaseThreadStartThunk (0x7c82b5f3)
Stack Init b952d000 Current b952cc60 Base b952d000 Limit b952a000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b952cc78 8083e6a2 89115e28 89115db0 89115e58 nt!KiSwapContext+0x26
b952cca4 8083f164 00000000 00000000 00000000 nt!KiSwapThread+0x284
b952ccec 8092db70 8a114628 00000006 ffffff01 nt!KeWaitForSingleObject+0x346
b952cd50 80834d3f 00000a7c 00000000 00000000 nt!NtWaitForSingleObject+0x9a
b952cd50 7c94ed54 00000a7c 00000000 00000000 nt!KiFastCallEntry+0xfc
22aceb48 7c942124 7c95970f 00000a7c 00000000 ntdll!KiFastSystemCallRet
22aceb4c 7c95970f 00000a7c 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
22aceb88 7c959620 00000000 00000004 00002000 ntdll!RtlpWaitOnCriticalSection+0x19c
22aceba8 1b005744 06d30940 1b05ea80 06d30940 ntdll!RtlEnterCriticalSection+0xa8
22acebb0 1b05ea80 06d30940 feffffff 0cd410c0 componentB!bar+0xb



22acf8b0 77c40f3b 00080002 000800e2 00000001 componentB!foo+0xeb
22acf8e0 77cb23f7 0de110dc 22acfac8 00000007 RPCRT4!Invoke+0x30
22acfce0 77cb26ed 00000000 00000000 19f38f94 RPCRT4!NdrStubCall2+0x299
22acfcfc 77c409be 19f38f94 17316ef0 19f38f94 RPCRT4!NdrServerCall2+0x19
22acfd30 77c75e41 0de1dc58 19f38f94 22acfdec RPCRT4!DispatchToStubInCNoAvrf+0x38
22acfd48 77c4093f 0de1dc58 19f38f94 22acfdec RPCRT4!DispatchToStubInCAvrf+0x14
22acfd9c 77c40865 00000041 00000000 0de2b398 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x117
22acfdc0 77c434b1 19f38f94 00000000 0de2b398 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
22acfdfc 77c41bb3 1beeaec8 16b96f50 1baeef00 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c
22acfe20 77c45458 16b96f88 22acfe38 1beeaec8 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127

The 0x16 flags for the !thread extension command temporarily set the process context to the owning process and show the first three function call parameters. We see that thread B is waiting for the critical section 06d30940, so after switching the process context we use the user space !locks extension command to find out who owns it:

1: kd> .process /r /p 8a2c9d88
Implicit process is now 8a2c9d88
Loading User Symbols

1: kd> !ntsdexts.locks

CritSec +6d30940 at 06d30940
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       d6c
EntryCount         0
ContentionCount    1
*** Locked

Now we try to find a thread with TID d6c (thread C):

1: kd> !thread -t d6c
Looking for thread Cid = d6c ...
THREAD 890d8bb8  Cid 098c.0d6c  Teb: 7ff71000 Win32Thread: bc23cc20 WAIT: (Unknown) UserMode Non-Alertable
    890d8da4  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 000135ea:
Current LPC port d649a678
Not impersonating
DeviceMap                 d64028f0
Owning Process            8a2c9d88       Image:         Process.exe
Wait Start TickCount      237641         Ticks: 1657 (0:00:00:25.890)
Context Switch Count      2102                 LargeStack
UserTime                  00:00:00.734
KernelTime                00:00:00.234
Win32 Start Address msvcrt!_endthreadex (0x77b9b4bc)
Start Address kernel32!BaseThreadStartThunk (0x7c82b5f3)
Stack Init ba91d000 Current ba91cc08 Base ba91d000 Limit ba919000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
ba91cc20 8083e6a2 890d8c30 890d8bb8 890d8c60 nt!KiSwapContext+0x26
ba91cc4c 8083f164 890d8da4 890d8d78 890d8bb8 nt!KiSwapThread+0x284
ba91cc94 8093983f 890d8da4 00000011 8a2c9d01 nt!KeWaitForSingleObject+0x346
ba91cd50 80834d3f 000008bc 19c94f00 19c94f00 nt!NtRequestWaitReplyPort+0x776
ba91cd50 7c94ed54 000008bc 19c94f00 19c94f00 nt!KiFastCallEntry+0xfc
2709ebf4 7c941c94 77c42700 000008bc 19c94f00 ntdll!KiFastSystemCallRet
2709ebf8 77c42700 000008bc 19c94f00 19c94f00 ntdll!NtRequestWaitReplyPort+0xc
2709ec44 77c413ba 2709ec80 2709ec64 77c42c7f RPCRT4!LRPC_CCALL::SendReceive+0x230
2709ec50 77c42c7f 2709ec80 779b2770 2709f06c RPCRT4!I_RpcSendReceive+0x24
2709ec64 77cb219b 2709ecac 1957cfe4 1957ab38 RPCRT4!NdrSendReceive+0x2b
2709f04c 779b43a3 779b2770 779b1398 2709f06c RPCRT4!NdrClientCall2+0x22e




2709ff84 77b9b530 26658fb0 00000000 00000000 ComponentC!foo+0x18d
2709ffb8 7c82608b 26d9af70 00000000 00000000 msvcrt!_endthreadex+0xa3
2709ffec 00000000 77b9b4bc 26d9af70 00000000 kernel32!BaseThreadStart+0x34

We see that thread C makes another LPC call (MessageId 000135ea) and waits for a reply. Let’s find the server thread processing the message (thread D):

1: kd> !lpc message 000135ea
Searching message 135ea in threads ...
Client thread 890d8bb8 waiting a reply from 135ea                         
    Server thread 89010020 is working on message 135ea


1: kd> !thread 89010020 16
THREAD 89010020  Cid 09b4.1530  Teb: 7ff93000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8903ba28  Mutant - owning thread 89143020
Not impersonating
DeviceMap                 d64028f0
Owning Process            891b8b80       Image:         svchost.exe
Wait Start TickCount      237641         Ticks: 1657 (0:00:00:25.890)
Context Switch Count      8            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x000135ea
LPC Server thread working on message Id 135ea
Start Address kernel32!BaseThreadStartThunk (0x7c82b5f3)
Stack Init b9455000 Current b9454c60 Base b9455000 Limit b9452000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b9454c78 8083e6a2 89010098 89010020 890100c8 nt!KiSwapContext+0x26
b9454ca4 8083f164 00000000 00000000 00000000 nt!KiSwapThread+0x284
b9454cec 8092db70 8903ba28 00000006 00000001 nt!KeWaitForSingleObject+0x346
b9454d50 80834d3f 00000514 00000000 00000000 nt!NtWaitForSingleObject+0x9a
b9454d50 7c94ed54 00000514 00000000 00000000 nt!KiFastCallEntry+0xfc
02b5f720 7c942124 75fdbe44 00000514 00000000 ntdll!KiFastSystemCallRet
02b5f724 75fdbe44 00000514 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
02b5f744 75fdc57f 000e6014 000da62c 02b5fca0 ComponentD!bar+0x42



02b5f8c8 77c40f3b 000d0a48 02b5fc90 00000001 ComponentD!foo+0x49
02b5f8f8 77cb23f7 75fdf8f2 02b5fae0 00000007 RPCRT4!Invoke+0x30
02b5fcf8 77cb26ed 00000000 00000000 000d4f24 RPCRT4!NdrStubCall2+0x299
02b5fd14 77c409be 000d4f24 000b5d70 000d4f24 RPCRT4!NdrServerCall2+0x19
02b5fd48 77c4093f 75fff834 000d4f24 02b5fdec RPCRT4!DispatchToStubInCNoAvrf+0x38
02b5fd9c 77c40865 00000005 00000000 7600589c RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x117
02b5fdc0 77c434b1 000d4f24 00000000 7600589c RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
02b5fdfc 77c41bb3 000d3550 000a78d0 001054b8 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c
02b5fe20 77c45458 000a7908 02b5fe38 000d3550 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127
02b5ff84 77c2778f 02b5ffac 77c2f7dd 000a78d0 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430
02b5ff8c 77c2f7dd 000a78d0 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd
02b5ffac 77c2de88 0008ae00 02b5ffec 7c82608b RPCRT4!BaseCachedThreadRoutine+0x9d
02b5ffb8 7c82608b 000d5c20 00000000 00000000 RPCRT4!ThreadStartRoutine+0x1b
02b5ffec 00000000 77c2de6d 000d5c20 00000000 kernel32!BaseThreadStart+0x34

We see that thread D is waiting for the mutant object owned by thread A (89143020). Therefore we have a deadlock spanning two processes via RPC/LPC calls, with the following dependency graph:

A (svchost.exe) LPC-> B (Process.exe) CritSec-> C (Process.exe) LPC-> D (svchost.exe) Obj-> A (svchost.exe)
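
The dependency chain above is a wait-for graph, and a deadlock shows up in it as a cycle. The following is a minimal Python sketch of that check, not a WinDbg feature: the thread labels and addresses come from this post, while the dictionary layout and the find_wait_cycle helper are illustrative assumptions.

```python
# Wait-for graph transcribed by hand from the dependency chain above.
# Each thread waits on exactly one other thread; the second tuple item
# names the synchronization mechanism seen in the dump.
WAITS_FOR = {
    "A": ("B", "LPC"),      # svchost.exe thread 89143020
    "B": ("C", "CritSec"),  # Process.exe thread 89115db0
    "C": ("D", "LPC"),      # Process.exe thread 890d8bb8
    "D": ("A", "Mutant"),   # svchost.exe thread 89010020
}


def find_wait_cycle(waits_for, start):
    """Follow wait edges from `start`. Return the cycle as a list of
    thread names if the chain loops back on itself, otherwise None."""
    path = []
    seen = {}
    current = start
    while current in waits_for:
        if current in seen:
            # Loop detected: the cycle begins where `current` first appeared.
            return path[seen[current]:] + [current]
        seen[current] = len(path)
        path.append(current)
        current = waits_for[current][0]
    return None  # the chain ended at a thread that is not waiting


cycle = find_wait_cycle(WAITS_FOR, "A")
print(" -> ".join(cycle))  # A -> B -> C -> D -> A
```

If any thread in the chain were not blocked (no entry in the dictionary), the function would return None and the hang would need another explanation.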

- Dmitry Vostokov @ DumpAnalysis.org -

Four pillars of software troubleshooting

November 29th, 2007

They are (sorted alphabetically):

  1. Crash Dump Analysis (also called Memory Dump Analysis or Core Dump Analysis)

  2. Problem Reproduction

  3. Trace and Log Analysis

  4. Virtual Assistance (also called Remote Assistance)

 

For troubleshooting software on Windows platforms Citrix provides GoToAssist for virtual on-site presence and Xen for problem reproduction.

- Dmitry Vostokov @ DumpAnalysis.org -

Understanding I/O Completion Ports

November 27th, 2007

Many articles and books explain Windows I/O completion ports from the high-level design considerations that arise when building high-performance server software. But it is hard to recall these explanations later when someone asks for one, and not everyone writes such software. Looking at complete memory dumps offers the advantage of a bottom-up or reverse engineering approach, where we see the internals of server software and can immediately grasp the implementation of certain architectural and design decisions.

Consider this thread stack trace, which we can find inside almost any service or network application process:

THREAD 86cf09c0  Cid 05cc.2030  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8a3bb970  QueueObject
    86cf0a38  NotificationTimer
Not impersonating
DeviceMap                 e15af5a8
Owning Process            8a3803d8       Image:         svchost.exe
Wait Start TickCount      2131621        Ticks: 1264 (0:00:00:19.750)
Context Switch Count      6            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address RPCRT4!ThreadStartRoutine (0x77c5de6d)
Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)
Stack Init ba276000 Current ba275c38 Base ba276000 Limit ba273000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
ba275c50 8083d3b1 nt!KiSwapContext+0x26
ba275c7c 8083dea2 nt!KiSwapThread+0x2e5
ba275cc4 8092b205 nt!KeRemoveQueue+0x417
ba275d48 80833a6f nt!NtRemoveIoCompletion+0xdc

ba275d48 7c82ed54 nt!KiFastCallEntry+0xfc
0093feac 7c821bf4 ntdll!KiFastSystemCallRet
0093feb0 77e66142 ntdll!NtRemoveIoCompletion+0xc
0093fedc 77c604c3 kernel32!GetQueuedCompletionStatus+0x29

0093ff18 77c60655 RPCRT4!COMMON_ProcessCalls+0xa1
0093ff84 77c5f9f1 RPCRT4!LOADABLE_TRANSPORT::ProcessIOEvents+0x117
0093ff8c 77c5f7dd RPCRT4!ProcessIOEventsWrapper+0xd
0093ffac 77c5de88 RPCRT4!BaseCachedThreadRoutine+0x9d
0093ffb8 77e6608b RPCRT4!ThreadStartRoutine+0x1b
0093ffec 00000000 kernel32!BaseThreadStart+0x34

We see that an I/O completion port is implemented via a kernel queue object, so requests (work items, completion notifications, etc.) are stored in that queue for further processing by threads. The number of active threads processing requests is bounded by a maximum value that usually corresponds to the number of processors:

0: kd> dt _KQUEUE 8a3bb970
ntdll!_KQUEUE
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 EntryListHead    : _LIST_ENTRY [ 0x8a3bb980 - 0x8a3bb980 ]
   +0x018 CurrentCount     : 0
   +0x01c MaximumCount     : 2
   +0x020 ThreadListHead   : _LIST_ENTRY [ 0x86cf0ac8 - 0x89ff9520 ]

0: kd> !smt
SMT Summary:
------------
   KeActiveProcessors: **------------------------------ (00000003)
        KiIdleSummary: **------------------------------ (00000003)
No PRCB     Set Master SMT Set                                     IAID
 0 ffdff120 Master     **------------------------------ (00000003)  00
 1 f772f120 ffdff120   **------------------------------ (00000003)  01
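
To make the role of CurrentCount and MaximumCount more concrete, here is a toy Python model of a queue with a concurrency limit. It is a deliberately simplified sketch and not the kernel's algorithm: the ToyQueue class, its method names and the rule that a thread returning for more work has finished its previous item are all assumptions made for illustration. Only the field names are borrowed from the _KQUEUE output above.

```python
from collections import deque


class ToyQueue:
    """Toy model of a kernel queue with a concurrency limit.

    Field names mirror the _KQUEUE dump; the logic is a simplification
    for illustration only.
    """

    def __init__(self, maximum_count):
        self.entries = deque()              # models EntryListHead
        self.current_count = 0              # models CurrentCount
        self.maximum_count = maximum_count  # models MaximumCount
        self.active = set()                 # threads processing an item

    def insert(self, item):
        self.entries.append(item)

    def remove(self, thread):
        """Return the next item for `thread`, or None where a real
        thread would block waiting on the queue."""
        if thread in self.active:
            # Coming back for more work means the previous item is done.
            self.active.discard(thread)
            self.current_count -= 1
        if self.entries and self.current_count < self.maximum_count:
            self.active.add(thread)
            self.current_count += 1
            return self.entries.popleft()
        return None


q = ToyQueue(maximum_count=2)
for request in ("r1", "r2", "r3"):
    q.insert(request)

print(q.remove("t1"))  # r1
print(q.remove("t2"))  # r2
print(q.remove("t3"))  # None: two threads are already active
print(q.remove("t1"))  # r3: t1 finished r1 and picks up the next item
```

With MaximumCount set to 2, as in the dump from the two-processor machine, a third thread gets no work until one of the active threads comes back to the queue.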

Kernel work queues are also implemented via the same queue object as we might have guessed already:

THREAD 8a777660  Cid 0004.00d0  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    808b707c  QueueObject
Not impersonating
DeviceMap                 e1000928
Owning Process            8a780818       Image:         System
Wait Start TickCount      2615           Ticks: 2130270 (0:09:14:45.468)
Context Switch Count      301            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Start Address nt!ExpWorkerThread (0x8082d92b)
Stack Init f71e0000 Current f71dfcec Base f71e0000 Limit f71dd000 Call 0
Priority 12 BasePriority 12 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
f71dfd04 8083d3b1 nt!KiSwapContext+0x26
f71dfd30 8083dea2 nt!KiSwapThread+0x2e5
f71dfd78 8082d9c1 nt!KeRemoveQueue+0x417
f71dfdac 809208fc nt!ExpWorkerThread+0xc8
f71dfddc 8083fc9f nt!PspSystemThreadStartup+0x2e
00000000 00000000 nt!KiThreadStartup+0x16

0: kd> dt _KQUEUE 808b707c
ntdll!_KQUEUE
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 EntryListHead    : _LIST_ENTRY [ 0x808b708c - 0x808b708c ]
   +0x018 CurrentCount     : 0
   +0x01c MaximumCount     : 2
   +0x020 ThreadListHead   : _LIST_ENTRY [ 0x8a77a128 - 0x8a777768 ]

I’ve created a simple UML diagram showing the high-level relationships between the various objects seen in crash dumps. Note that an Active Thread object can process items from more than one completion port if its wait was satisfied for one port and then for another, but I have never seen this. Obviously a Waiting thread can wait for only one completion port.

- Dmitry Vostokov @ DumpAnalysis.org -

DebugWare

November 27th, 2007

I’ve been slowly accumulating blog posts about various troubleshooting tools for my next book, with the working title:

DebugWare: The Art and Craft of Writing Troubleshooting and Debugging Tools

Details will be announced later, together with a supporting website which is under construction. This book will be about the architecture, design and implementation of troubleshooting tools for software technical support.

- Dmitry Vostokov @ DumpAnalysis.org -

Teaching binary to decimal conversion

November 26th, 2007

Sometimes we have data in binary and want to convert it to decimal, for example to look up some constant in a header file. I used to do this via calc.exe. Now I use the .formats WinDbg command and the 0y binary prefix:

0:000> .formats 0y111010
Evaluate expression:
  Hex:     0000003a
  Decimal: 58
  Octal:   00000000072
  Binary:  00000000 00000000 00000000 00111010
  Chars:   ...:
  Time:    Thu Jan 01 00:00:58 1970
  Float:   low 8.12753e-044 high 0
  Double:  2.86558e-322
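
Outside the debugger, the same conversion can be sketched in a few lines of Python. This is only an illustration using Python built-ins, with the same value as in the .formats output above:

```python
# Parse a binary string, much like the 0y prefix in WinDbg.
value = int("111010", 2)

print(value)       # 58
print(hex(value))  # 0x3a
print(oct(value))  # 0o72

# 32-bit binary representation, without the byte grouping WinDbg prints.
bits = format(value, "032b")
print(bits)
```

Python also accepts the 0b prefix for binary literals, so 0b111010 evaluates to the same number.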

Some months ago I was flying SWISS and found this binary watch in their duty-free catalog which I use now to guess time :-)

01 The One Binary Watch
 


It has 6 binary digits for minutes. There are desktop binary clocks and other binary watches available if you google them, but they don’t have 6 binary digits for minutes. They approximate them by using 2 rows or columns: tens of minutes and minutes (2 + 4 binary digits). We are all good at handling 4 binary digits because of our work with hexadecimal nibbles, but not so good at handling 5 or 6 binary digits when we see them in one row.

- Dmitry Vostokov @ DumpAnalysis.org -

Stack traces on the Web

November 26th, 2007

How many WinDbg stack traces on the web are available for mining? Google gave the answer when I searched for typical stack trace text fragments:

"ChildEBP RetAddr" - about 40,200

"ChildEBP RetAddr  Args to Child" - about 30,000

"Frame IP not in any known module" - about 10,800

- Dmitry Vostokov @ DumpAnalysis.org -

Five golden rules of troubleshooting

November 26th, 2007

It is difficult to analyze a problem when you have crash dumps and/or traces from various tracing tools but the supporting information is incomplete or missing. After doing crash dump and trace analysis, including ETW-based traces, for more than 4 years, I came up with these easy-to-remember 4WS questions to ask when you send or request traces and memory dumps:

What - What happened or was observed? A crash or a hang, for example?

When - When did the problem happen if traces were recorded for hours?

Where - Which server or workstation was used for tracing, or where did the memory dumps come from? For example, one trace is from a primary server and two others are from backup servers, or one trace is from a client workstation and the other is from a server.

Why - Why did a customer or a support engineer request a dump or a trace? This could shed light on various assumptions, including presuppositions hidden in the problem description.

Supporting information - needed to find a needle in a haystack: process id, thread id, etc. The answer to the following question is also important: how were the dumps and traces created?

Every trace or memory dump should be accompanied by 4WS answers.

The 4WS rule can be applied to any troubleshooting because even the problem description itself is a kind of trace.

- Dmitry Vostokov @ DumpAnalysis.org -