Archive for the ‘Crash Dump Analysis’ Category

Inside Vista Error Reporting (Part 1)

Saturday, May 19th, 2007

This is a follow up to the post about postmortem debuggers on Windows XP/W2K3. Now we look inside the same mechanism on Vista. After launching TestDefaultDebugger and pushing its crash button we get the following Windows error reporting dialog:

If we attach WinDbg to our TestDefaultDebugger process we would no longer see our default unhandled exception filter waiting for the error reporting process:

Windows XP

0:000> k
ChildEBP RetAddr
0012d318 7c90e9ab ntdll!KiFastSystemCallRet
0012d31c 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc
0012d3b8 7c80a075 kernel32!WaitForMultipleObjectsEx+0×12c
0012d3d4 6945763c kernel32!WaitForMultipleObjects+0×18
0012dd68 694582b1 faultrep!StartDWException+0×5df

0012eddc 7c8633e9 faultrep!ReportFault+0×533
0012f47c 00411eaa kernel32!UnhandledExceptionFilter+0×587
0012f8a4 00403263 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1
0012f8b4 00403470 TestDefaultDebugger!_AfxDispatchCmdMsg+0×43



0012fff0 00000000 kernel32!BaseProcessStart+0×23

Windows Vista

0:001> ~*kL 100
0 Id: 120c.148c Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr
0012f8a4 00403263 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1
0012f8b4 00403470 TestDefaultDebugger!_AfxDispatchCmdMsg+0×43
0012f8e4 00402a27 TestDefaultDebugger!CCmdTarget::OnCmdMsg+0×118
0012f908 00408e69 TestDefaultDebugger!CDialog::OnCmdMsg+0×1b
0012f958 004098d9 TestDefaultDebugger!CWnd::OnCommand+0×90
0012f9f4 00406258 TestDefaultDebugger!CWnd::OnWndMsg+0×36
0012fa14 0040836d TestDefaultDebugger!CWnd::WindowProc+0×22
0012fa7c 004083f4 TestDefaultDebugger!AfxCallWndProc+0×9a
0012fa9c 77b71a10 TestDefaultDebugger!AfxWndProc+0×34
0012fac8 77b71ae8 USER32!InternalCallWinProc+0×23
0012fb40 77b7286a USER32!UserCallWinProcCheckWow+0×14b
0012fb80 77b72bba USER32!SendMessageWorker+0×4b7
0012fba0 7504e5cc USER32!SendMessageW+0×7c
0012fbc0 7504e583 COMCTL32!Button_NotifyParent+0×3d
0012fbdc 7504e680 COMCTL32!Button_ReleaseCapture+0×112
0012fc34 77b71a10 COMCTL32!Button_WndProc+0xa4b
0012fc60 77b71ae8 USER32!InternalCallWinProc+0×23
0012fcd8 77b72a47 USER32!UserCallWinProcCheckWow+0×14b
0012fd3c 77b72a98 USER32!DispatchMessageWorker+0×322
0012fd4c 77b6120c USER32!DispatchMessageW+0xf
0012fd70 0040568b USER32!IsDialogMessageW+0×586
0012fd80 004065d8 TestDefaultDebugger!CWnd::IsDialogMessageW+0×2e
0012fd88 00402a07 TestDefaultDebugger!CWnd::PreTranslateInput+0×29
0012fd98 00408041 TestDefaultDebugger!CDialog::PreTranslateMessage+0×96
0012fda8 00403ae3 TestDefaultDebugger!CWnd::WalkPreTranslateTree+0×1f
0012fdbc 00403c1e TestDefaultDebugger!AfxInternalPreTranslateMessage+0×3b
0012fdc4 00403b29 TestDefaultDebugger!CWinThread::PreTranslateMessage+0×9
0012fdcc 00403c68 TestDefaultDebugger!AfxPreTranslateMessage+0×15
0012fddc 00407920 TestDefaultDebugger!AfxInternalPumpMessage+0×2b
0012fe00 004030a1 TestDefaultDebugger!CWnd::RunModalLoop+0xca
0012fe4c 0040110d TestDefaultDebugger!CDialog::DoModal+0×12c
0012fef8 004206fb TestDefaultDebugger!CTestDefaultDebuggerApp::InitInstance+0xdd
0012ff08 0040e852 TestDefaultDebugger!AfxWinMain+0×47
0012ffa0 77603833 TestDefaultDebugger!__tmainCRTStartup+0×176
0012ffac 779ea9bd kernel32!BaseThreadInitThunk+0xe
0012ffec 00000000 ntdll!_RtlUserThreadStart+0×23
# 1 Id: 120c.17e4 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr
011cff70 77a3f0a9 ntdll!DbgBreakPoint
011cffa0 77603833 ntdll!DbgUiRemoteBreakin+0×3c
011cffac 779ea9bd kernel32!BaseThreadInitThunk+0xe
011cffec 00000000 ntdll!_RtlUserThreadStart+0×23

Let’s look at the faulting thread’s raw stack data:

0:001> ~0 s
eax=00000000 ebx=00000001 ecx=0012fe70 edx=00000000 esi=00425ae8 edi=0012fe70
eip=004014f0 esp=0012f8a8 ebp=0012f8b4 iopl=0 nv up ei ng nz ac pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 mov dword ptr ds:[0],0 ds:0023:00000000=????????
0:000> !teb
TEB at 7ffdf000
ExceptionList: 0012f9e8
StackBase: 00130000
StackLimit: 0012d000
SubSystemTib: 00000000
FiberData: 00001e00
ArbitraryUserPointer: 00000000
Self: 7ffdf000
EnvironmentPointer: 00000000
ClientId: 0000120c . 0000148c
RpcHandle: 00000000
Tls Storage: 7ffdf02c
PEB Address: 7ffda000
LastErrorValue: 0
LastStatusValue: c000008a
Count Owned Locks: 0
HardErrorMode: 0
0:000>dds 0012d000 00130000



0012f368 0012f3c0
0012f36c 7760fb01 kernel32!GetApplicationRecoveryCallback+0×33
0012f370 ffffffff
0012f374 0012f380
0012f378 00000001
0012f37c 00000000
0012f380 00000000
0012f384 00000000
0012f388 00000000
0012f38c 00000000
0012f390 00000000
0012f394 00000000
0012f398 00000000
0012f39c 00000000
0012f3a0 00000000
0012f3a4 00000000
0012f3a8 00000000
0012f3ac 00000000
0012f3b0 00000000
0012f3b4 00000000
0012f3b8 00000000
0012f3bc 00000000
0012f3c0 0012f410
0012f3c4 7767aa88 kernel32!WerpReportExceptionInProcessContext+0×82
0012f3c8 ffffffff
0012f3cc 0012f3ec
0012f3d0 00000000
0012f3d4 00000000
0012f3d8 7767aab7 kernel32!WerpReportExceptionInProcessContext+0xa7
0012f3dc 001257b9
0012f3e0 00000001
0012f3e4 00000000
0012f3e8 0012f4c8
0012f3ec 00000000
0012f3f0 00000000
0012f3f4 00000000
0012f3f8 0012f3dc
0012f3fc ffffffff
0012f400 0012f488
0012f404 775d5ac9 kernel32!_except_handler4
0012f408 77670969 kernel32!Internal_NotifyUILanguageChange+0×4a6
0012f40c fffffffe
0012f410 7767aab7 kernel32!WerpReportExceptionInProcessContext+0xa7
0012f414 77655b41 kernel32!UnhandledExceptionFilter+0×1b2
0012f418 77655cbd kernel32!UnhandledExceptionFilter+0×32e
0012f41c 00125731
0012f420 00000000
0012f424 0012f4c8
0012f428 00000000
0012f42c 00000000
0012f430 00000000
0012f434 00000000
0012f438 00000000
0012f43c 00000800
0012f440 00000000
0012f444 00000000
0012f448 00000000
0012f44c 00000000
0012f450 00000000
0012f454 00000005
0012f458 994ac7c4
0012f45c 00000011
0012f460 00000000
0012f464 0012f5c0
0012f468 775d5ac9 kernel32!_except_handler4
0012f46c 00000001
0012f470 00000000
0012f474 77655cbd kernel32!UnhandledExceptionFilter+0×32e
0012f478 00000000
0012f47c 00000000
0012f480 0012f41c
0012f484 00000024
0012f488 0012f4f4
0012f48c 775d5ac9 kernel32!_except_handler4
0012f490 7765ff59 kernel32!PEWriteResource<_IMAGE_NT_HEADERS>+0×50a
0012f494 fffffffe
0012f498 77655cbd kernel32!UnhandledExceptionFilter+0×32e
0012f49c 77a29f8e ntdll!_RtlUserThreadStart+0×6f
0012f4a0 00000000
0012f4a4 779b8dd4 ntdll!_EH4_CallFilterFunc+0×12
0012f4a8 00000000
0012f4ac 0012ffec
0012f4b0 779ff108 ntdll! ?? ::FNODOBFM::`string’+0xb6e
0012f4b4 0012f4dc
0012f4b8 779b40e4 ntdll!_except_handler4+0xcc
0012f4bc 00000000
0012f4c0 00000000
0012f4c4 00000000
0012f4c8 0012f5c0
0012f4cc 0012f5dc
0012f4d0 779ff118 ntdll! ?? ::FNODOBFM::`string’+0xb7e
0012f4d4 00000001
0012f4d8 0112f5c0
0012f4dc 0012f500
0012f4e0 77a11039 ntdll!ExecuteHandler2+0×26
0012f4e4 fffffffe
0012f4e8 0012ffdc
0012f4ec 0012f5dc
0012f4f0 0012f59c
0012f4f4 0012f9e8
0012f4f8 77a1104d ntdll!ExecuteHandler2+0×3a
0012f4fc 0012ffdc
0012f500 0012f5a8
0012f504 77a1100b ntdll!ExecuteHandler+0×24
0012f508 0012f5c0
0012f50c 0012ffdc
0012f510 0012fe70
0012f514 0012f59c
0012f518 779b8bf2 ntdll!_except_handler4
0012f51c 00000000
0012f520 0012f5c0
0012f524 0012f538
0012f528 779b94e3 ntdll!RtlCallVectoredContinueHandlers+0×15
0012f52c 0012f5c0
0012f530 0012f5dc
0012f534 77a754c0 ntdll!RtlpCallbackEntryList
0012f538 0012f5a8
0012f53c 779b94c1 ntdll!RtlDispatchException+0×11f
0012f540 0012f5c0
0012f544 0012f5dc
0012f548 00425ae8 TestDefaultDebugger!CTestDefaultDebuggerApp::`vftable’+0×154
0012f54c 00000000
0012f550 00000502
0012f554 00000000
0012f558 00a460e0
0012f55c 00000000
0012f560 00000000
0012f564 00000070
0012f568 ffffffff
0012f56c ffffffff
0012f570 77b60dba USER32!UserCallDlgProcCheckWow+0×5f
0012f574 77b60e63 USER32!UserCallDlgProcCheckWow+0×16e
0012f578 0000006c
0012f57c 00000000
0012f580 00000000
0012f584 00000000
0012f588 00000000
0012f58c 0000004e
0012f590 00000000
0012f594 0012f634
0012f598 77bb76cc USER32!_except_handler4
0012f59c 0012f634
0012f5a0 00130000
0012f5a4 00000000
0012f5a8 0012f8b4
0012f5ac 77a10060 ntdll!NtRaiseException+0xc
0012f5b0 77a10eb2 ntdll!KiUserExceptionDispatcher+0×2a
0012f5b4 0012f5c0


It shows the presence of kernel32!UnhandledExceptionFilter calls. Let’s open TestDefaultDebugger.exe in WinDbg, put breakpoint on UnhandledExceptionFilter and trace the execution. We have to change the return value of IsDebugPortPresent to simulate the normal fault handling logic when no active debugger is attached:

0:000> bp kernel32!UnhandledExceptionFilter
0:000> g
(fb0.1190): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00000001 ecx=0012fe70 edx=00000000 esi=00425ae8 edi=0012fe70
eip=004014f0 esp=0012f8a8 ebp=0012f8b4 iopl=0 nv up ei ng nz ac pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 mov dword ptr ds:[0],0 ds:0023:00000000=????????
0:000> g
Breakpoint 0 hit
eax=0042ae58 ebx=00000000 ecx=0042ae58 edx=0042ae58 esi=003b07d8 edi=c0000005
eip=77655984 esp=0012f478 ebp=0012f494 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
kernel32!UnhandledExceptionFilter:
77655984 push 5Ch
0:000> g $$ skip first chance exception
Breakpoint 0 hit
eax=77655984 ebx=00000000 ecx=0012f404 edx=77a10f34 esi=0012f4c8 edi=00000000
eip=77655984 esp=0012f49c ebp=0012ffec iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
kernel32!UnhandledExceptionFilter:
77655984 push 5Ch
0:000> p
eax=77655984 ebx=00000000 ecx=0012f404 edx=77a10f34 esi=0012f4c8 edi=00000000
eip=77655986 esp=0012f498 ebp=0012ffec iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
kernel32!UnhandledExceptionFilter+0×2:
77655986 push offset kernel32!strcat_s+0×128d (77655cf0)



0:000> p
eax=00000000 ebx=0012f4c8 ecx=776558e5 edx=77a10f34 esi=00000000 edi=00000000
eip=77655a33 esp=0012f41c ebp=0012f498 iopl=0 nv up ei pl nz ac po cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000213
kernel32!UnhandledExceptionFilter+0xa5:
77655a33 call kernel32!IsDebugPortPresent (7765594c)
0:000> p
eax=00000001 ebx=0012f4c8 ecx=0012f3f4 edx=77a10f34 esi=00000000 edi=00000000
eip=77655a38 esp=0012f41c ebp=0012f498 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
kernel32!UnhandledExceptionFilter+0xaa:
77655a38 test eax,eax
0:000> r eax=0
0:000> p
eax=00000000 ebx=0012f4c8 ecx=0012f3f4 edx=77a10f34 esi=00000000 edi=00000000
eip=77655a3a esp=0012f41c ebp=0012f498 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
kernel32!UnhandledExceptionFilter+0xac:
77655a3a jne kernel32!UnhandledExceptionFilter+0×22 (776559a6) [br=0]

Next, we continue to step over using p command until we see WerpReportExceptionInProcessContext function and step into it:

0:000> p
eax=c0000022 ebx=0012f4c8 ecx=0012f400 edx=77a10f34 esi=00000000 edi=00000001
eip=77655b3c esp=0012f418 ebp=0012f498 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
kernel32!UnhandledExceptionFilter+0×1ad:
77655b3c call kernel32!WerpReportExceptionInProcessContext (7767aa06)
0:000> t
eax=c0000022 ebx=0012f4c8 ecx=0012f400 edx=77a10f34 esi=00000000 edi=00000001
eip=7767aa06 esp=0012f414 ebp=0012f498 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
kernel32!WerpReportExceptionInProcessContext:
7767aa06 push 14h

At this point if we look at the stack trace we would see:

0:000> kL 100
ChildEBP RetAddr
0012f410 77655b41 kernel32!WerpReportExceptionInProcessContext
0012f498 77a29f8e kernel32!UnhandledExceptionFilter+0×1b2
0012f4a0 779b8dd4 ntdll!_RtlUserThreadStart+0×6f
0012f4b4 779b40f0 ntdll!_EH4_CallFilterFunc+0×12
0012f4dc 77a11039 ntdll!_except_handler4+0×8e
0012f500 77a1100b ntdll!ExecuteHandler2+0×26
0012f5a8 77a10e97 ntdll!ExecuteHandler+0×24
0012f5a8 004014f0 ntdll!KiUserExceptionDispatcher+0xf
0012f8a4 00403263 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1
0012f8b4 00403470 TestDefaultDebugger!_AfxDispatchCmdMsg+0×43


After that we step over again and find that the code flow returns from all exception handlers until KiUserExceptionDispatcher raises exception again via ZwRaiseException call.

So it looks that the default unhandled exception filter in Vista only reports the exception and doesn’t launch the error reporting process that displays the error box, WerFault.exe.

If we click on Debug button on the error reporting dialog to launch the postmortem debugger (I have Visual Studio Just-In-Time Debugger configured in AeDebug\Debugger registry key) and look at its parent process by using Process Explorer for example, we would see it is WerFault.exe which in turn has svchost.exe as its parent.

Now we quit WinDbg and launch TestDefaultDebugger again, push its big crash button and when the error reporting dialog appears we attach another instance of WinDbg to svchost.exe process hosting Windows Error Reporting Service (wersvc.dll).
We see the following threads:

0:000> ~*k
. 0 Id: f8c.f90 Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr
0008f5b4 77a10080 ntdll!KiFastSystemCallRet
0008f5b8 7760853f ntdll!ZwReadFile+0xc
0008f630 7709ffe2 kernel32!ReadFile+0×20e
0008f65c 7709fdfb ADVAPI32!ScGetPipeInput+0×2a
0008f6c4 7709bdd2 ADVAPI32!ScDispatcherLoop+0×6c
0008f93c 004a241d ADVAPI32!StartServiceCtrlDispatcherW+0xce
0008f944 004a2401 svchost!SvcHostMain+0×12
0008f948 004a2183 svchost!wmain+0×5
0008f98c 77603833 svchost!_initterm_e+0×163
0008f998 779ea9bd kernel32!BaseThreadInitThunk+0xe
0008f9d8 00000000 ntdll!_RtlUserThreadStart+0×23
1 Id: f8c.fa4 Suspend: 1 Teb: 7ffdd000 Unfrozen
ChildEBP RetAddr
0086f6d0 77a10690 ntdll!KiFastSystemCallRet
0086f6d4 779cb65b ntdll!ZwWaitForMultipleObjects+0xc
0086f870 77603833 ntdll!TppWaiterpThread+0×294
0086f87c 779ea9bd kernel32!BaseThreadInitThunk+0xe
0086f8bc 00000000 ntdll!_RtlUserThreadStart+0×23
2 Id: f8c.fa8 Suspend: 1 Teb: 7ffdc000 Unfrozen
ChildEBP RetAddr
0031f81c 77a0f2c0 ntdll!KiFastSystemCallRet
0031f820 71cb1545 ntdll!NtAlpcSendWaitReceivePort+0xc
0031fd3c 71cb63c4 wersvc!CWerService::LpcServerThread+0×9c
0031fd44 77603833 wersvc!CWerService::StaticLpcServerThread+0xd
0031fd50 779ea9bd kernel32!BaseThreadInitThunk+0xe
0031fd90 00000000 ntdll!_RtlUserThreadStart+0×23
3 Id: f8c.2cc Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr
00f8f768 77a106a0 ntdll!KiFastSystemCallRet
00f8f76c 776077d4 ntdll!NtWaitForSingleObject+0xc
00f8f7dc 77607742 kernel32!WaitForSingleObjectEx+0xbe
00f8f7f0 71cb6f4b kernel32!WaitForSingleObject+0×12
00f8f858 71cb6803 wersvc!CWerService::ReportCrashKernelMsg+0×256
00f8fb7c 71cb6770 wersvc!CWerService::DispatchPortRequestWorkItem+0×70a
00f8fb90 779c1fbb wersvc!CWerService::StaticDispatchPortRequestWorkItem+0×17
00f8fbb4 77a1a2b8 ntdll!TppSimplepExecuteCallback+0×10c
00f8fcdc 77603833 ntdll!TppWorkerThread+0×522
00f8fce8 779ea9bd kernel32!BaseThreadInitThunk+0xe
00f8fd28 00000000 ntdll!_RtlUserThreadStart+0×23
4 Id: f8c.1b38 Suspend: 1 Teb: 7ffdb000 Unfrozen
ChildEBP RetAddr
00d3fe08 77a10850 ntdll!KiFastSystemCallRet
00d3fe0c 77a1a1b4 ntdll!NtWaitForWorkViaWorkerFactory+0xc
00d3ff34 77603833 ntdll!TppWorkerThread+0×1f6
00d3ff40 779ea9bd kernel32!BaseThreadInitThunk+0xe
00d3ff80 00000000 ntdll!_RtlUserThreadStart+0×23

First, it looks like some LPC notification mechanism is present here (CWerService::LpcServerThread).
Next, if we look at CWerService::ReportCrashKernelMsg code we would see it calls CWerService::ReportCrash which in turn loads faultrep.dll

0:000> .asm no_code_bytes
Assembly options: no_code_bytes
0:000> uf wersvc!CWerService::ReportCrashKernelMsg



wersvc!CWerService::ReportCrashKernelMsg+0×226:
71cb6f13 lea  eax,[ebp-20h]
71cb6f16 push eax
71cb6f17 push dword ptr [ebp-34h]
71cb6f1a push dword ptr [ebp-2Ch]
71cb6f1d call dword ptr [wersvc!_imp__GetCurrentProcessId (71cb1120)]
71cb6f23 push eax
71cb6f24 mov  ecx,dword ptr [ebp-38h]
71cb6f27 call wersvc!CWerService::ReportCrash (71cb7008)
71cb6f2c mov  dword ptr [ebp-1Ch],eax
71cb6f2f cmp  eax,ebx
71cb6f31 jl   wersvc!CWerService::ReportCrashKernelMsg+0×279 (71cb6a10)



0:000> uf wersvc!CWerService::ReportCrash



wersvc!CWerService::ReportCrash+0×3d:
71cb7045 mov  dword ptr [ebp-4],edi
71cb7048 push offset wersvc!`string’ (71cb711c)
71cb704d call dword ptr [wersvc!_imp__LoadLibraryW (71cb1144)]

71cb7053 mov  dword ptr [ebp-2Ch],eax
71cb7056 cmp  eax,edi
71cb7058 je   wersvc!CWerService::ReportCrash+0×52 (71cb9b47)

wersvc!CWerService::ReportCrash+0×88:
71cb705e push offset wersvc!`string’ (71cb7100)
71cb7063 push eax
71cb7064 call dword ptr [wersvc!_imp__GetProcAddress (71cb1140)]

71cb706a mov  ebx,eax
71cb706c cmp  ebx,edi
71cb706e je   wersvc!CWerService::ReportCrash+0×9a (71cb9b7d)



0:000> du 71cb711c
71cb711c “faultrep.dll”
0:000> da 71cb7100
71cb7100 “WerpInitiateCrashReporting”

If we attach a new instance of WinDbg to WerFault.exe and inspect its threads we would see:

0:003> ~*k
0 Id: 1bfc.16c4 Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr
0015de60 77a10690 ntdll!KiFastSystemCallRet
0015de64 77607e09 ntdll!ZwWaitForMultipleObjects+0xc
0015df00 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0×11d
0015df54 77b68b83 USER32!RealMsgWaitForMultipleObjectsEx+0×13c
0015df70 6d46d90d USER32!MsgWaitForMultipleObjects+0×1f
0015dfc0 6d4acd77 wer!UtilMsgWaitForMultipleObjects+0×8a
0015dff4 6d4a7694 wer!CInitialConsentUI::Show+0×133
0015e040 6d4a9a69 wer!CEventUI::GetInitialDialogSelection+0xc6
0015e104 6d46df18 wer!CEventUI::Start+0×32
0015e39c 6d46b743 wer!CWatson::ReportProblem+0×438
0015e3ac 6d46b708 wer!WatsonReportSend+0×1e
0015e3c8 6d46b682 wer!CDWInstance::WatsonReportStub+0×17
0015e3ec 6d472a7f wer!CDWInstance::SubmitReport+0×21e
0015e410 730b6d0d wer!WerReportSubmit+0×5d
0015f33c 730b73c1 faultrep!CCrashWatson::GenerateCrashReport+0×5c4
0015f5d4 730b4de1 faultrep!CCrashWatson::ReportCrash+0×374
0015fad4 009bd895 faultrep!WerpInitiateCrashReporting+0×304
0015fb0c 009b60cd WerFault!UserCrashMain+0×14e
0015fb30 009b644a WerFault!wmain+0xbf
0015fb74 77603833 WerFault!_initterm_e+0×163
1 Id: 1bfc.894 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr
024afbf8 77a10690 ntdll!KiFastSystemCallRet
024afbfc 77607e09 ntdll!ZwWaitForMultipleObjects+0xc
024afc98 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0×11d
024afcec 74fa161a USER32!RealMsgWaitForMultipleObjectsEx+0×13c
024afd0c 74fa2cb6 DUser!CoreSC::Wait+0×59
024afd34 74fa2c55 DUser!CoreSC::WaitMessage+0×54
024afd44 77b615c0 DUser!MphWaitMessageEx+0×22
024afd60 77a10e6e USER32!__ClientWaitMessageExMPH+0×1e
024afd7c 77b6b5bc ntdll!KiUserCallbackDispatcher+0×2e
024afd80 77b61598 USER32!NtUserWaitMessage+0xc
024afdb4 77b61460 USER32!DialogBox2+0×202
024afddc 77b614a2 USER32!InternalDialogBox+0xd0
024afdfc 77b61505 USER32!DialogBoxIndirectParamAorW+0×37
024afe1c 75036c51 USER32!DialogBoxIndirectParamW+0×1b
024afe40 75036beb comctl32!SHFusionDialogBoxIndirectParam+0×2d
024afe74 6d4a65a4 comctl32!CTaskDialog::Show+0×100
024afebc 6d4acb72 wer!IsolationAwareTaskDialogIndirect+0×64
024aff4c 6d4acc39 wer!CInitialConsentUI::InitialDlgThreadRoutine+0×369
024aff54 77603833 wer!CInitialConsentUI::Static_InitialDlgThreadRoutine+0xd
024aff60 779ea9bd kernel32!BaseThreadInitThunk+0xe
2 Id: 1bfc.1a04 Suspend: 1 Teb: 7ffdc000 Unfrozen
ChildEBP RetAddr
012bf998 77a10690 ntdll!KiFastSystemCallRet
012bf99c 77607e09 ntdll!ZwWaitForMultipleObjects+0xc
012bfa38 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0×11d
012bfa8c 74fa161a USER32!RealMsgWaitForMultipleObjectsEx+0×13c
012bfaac 74fa1642 DUser!CoreSC::Wait+0×59
012bfae0 74fac442 DUser!CoreSC::xwProcessNL+0xaa
012bfb00 74fac3a2 DUser!GetMessageExA+0×44
012bfb54 779262b6 DUser!ResourceManager::SharedThreadProc+0xb6
012bfb8c 779263de msvcrt!_endthreadex+0×44
012bfb94 77603833 msvcrt!_endthreadex+0xce
012bfba0 779ea9bd kernel32!BaseThreadInitThunk+0xe
012bfbe0 00000000 ntdll!_RtlUserThreadStart+0×23
# 3 Id: 1bfc.14a4 Suspend: 1 Teb: 7ffdb000 Unfrozen
ChildEBP RetAddr
02a1fc40 77a3f0a9 ntdll!DbgBreakPoint
02a1fc70 77603833 ntdll!DbgUiRemoteBreakin+0×3c
02a1fc7c 779ea9bd kernel32!BaseThreadInitThunk+0xe
02a1fcbc 00000000 ntdll!_RtlUserThreadStart+0×23

Next, we put a breakpoint on CreateProcess, push Debug button on the error reporting dialog and upon the breakpoint hit inspect CreateProcess parameters:

0:003> .asm no_code_bytes
Assembly options: no_code_bytes
0:003> bp kernel32!CreateProcessW
0:003> g
Breakpoint 0 hit
eax=00000000 ebx=00000000 ecx=7ffdf000 edx=0015db30 esi=00000001 edi=00000000
eip=775c1d27 esp=0015dfe0 ebp=0015e408 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
kernel32!CreateProcessW:
775c1d27 mov edi,edi
0:000> ddu esp+8 l1
0015dfe8 008b0000 “”C:\WINDOWS\system32\vsjitdebugger.exe” -p 8064 -e 312″

ESP points to return address, ESP+4 points to the first CreateProcess parameter and ESP+8 points to the second parameter. The thread stack now involves faultrep.dll:

0:000> k
ChildEBP RetAddr
0020dde0 730bb2b5 kernel32!CreateProcessW
0020e20c 730b6dae faultrep!WerpLaunchAeDebug+0×384
0020f140 730b73c1 faultrep!CCrashWatson::GenerateCrashReport+0×665
0020f3d8 730b4de1 faultrep!CCrashWatson::ReportCrash+0×374
0020f8d8 009bd895 faultrep!WerpInitiateCrashReporting+0×304
0020f910 009b60cd WerFault!UserCrashMain+0×14e
0020f934 009b644a WerFault!wmain+0xbf
0020f978 77603833 WerFault!_initterm_e+0×163
0020f984 779ea9bd kernel32!BaseThreadInitThunk+0xe
0020f9c4 00000000 ntdll!_RtlUserThreadStart+0×23

Therefore it looks like calls to faultrep.dll module to report faults and launch the posmortem debugger were moved from UnhandledExceptionFilter to WerFault.exe in Vista.

Finally, let’s go back to our UnhandledExceptionFilter. If we disassemble it we would see that it can call kernel32!WerpLaunchAeDebug too:

0:000> .asm no_code_bytes
Assembly options: no_code_bytes
0:000> uf kernel32!UnhandledExceptionFilter



kernel32!UnhandledExceptionFilter+0×2d0:
77655c5f push dword ptr [ebp-28h]
77655c62 push dword ptr [ebp-1Ch]
77655c65 push dword ptr [ebx+4]
77655c68 push dword ptr [ebx]
77655c6a push 0FFFFFFFEh
77655c6c call kernel32!GetCurrentProcess (775e9145)
77655c71 push eax
77655c72 call kernel32!WerpLaunchAeDebug (7767baaf)
77655c77 test eax,eax
77655c79 jge  kernel32!UnhandledExceptionFilter+0×2f3 (77655c82)



kernel32!UnhandledExceptionFilter+0×303:
77655c92 mov  eax,dword ptr [ebx]
77655c94 push dword ptr [eax]
77655c96 push 0FFFFFFFFh
77655c98 call dword ptr [kernel32!_imp__NtTerminateProcess (775c14bc)]

If we look at WerpLaunchAeDebug code we would see that it calls CreateProcess too and the code is the same as in faultrep.dll. This could mean that faultrep.dll imports that function from kernel32.dll. So some postmortem debugger launching code is still present in the default unhandled exception filter perhaps for compatibility or in case WER doesn’t work or disabled.

High-level description of the differences between Windows XP and Vista application crash support is present in the recent Mark Russinovich’s article:

Inside the Windows Vista Kernel: Part 3 (Enhanced Crash Support)

- Dmitry Vostokov -

Process and Thread Startup in Vista

Saturday, May 19th, 2007

If you looked at process dumps from Vista or did live debugging you might have noticed that there are no longer kernel32 functions BaseProcessStart on the main thread stack and BaseThreadStart on subsequent thread stacks. In Vista we have ntdll!_RtlUserThreadStart which calls kernel32!BaseThreadInitThunk for both main and secondary threads:

0:002> ~*k
0 Id: 13e8.1348 Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr
0009f8d8 77b7199a ntdll!KiFastSystemCallRet
0009f8dc 77b719cd USER32!NtUserGetMessage+0xc
0009f8f8 006b24e8 USER32!GetMessageW+0x33
0009f954 006c2588 calc!WinMain+0x278
0009f9e4 77603833 calc!_initterm_e+0x1a1
0009f9f0 779ea9bd kernel32!BaseThreadInitThunk+0xe
0009fa30 00000000 ntdll!_RtlUserThreadStart+0×23

1 Id: 13e8.534 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr
0236f9d8 77a106a0 ntdll!KiFastSystemCallRet
0236f9dc 776077d4 ntdll!NtWaitForSingleObject+0xc
0236fa4c 77607742 kernel32!WaitForSingleObjectEx+0xbe
0236fa60 006b4958 kernel32!WaitForSingleObject+0×12
0236fa78 77603833 calc!WatchDogThread+0×21
0236fa84 779ea9bd kernel32!BaseThreadInitThunk+0xe
0236fac4 00000000 ntdll!_RtlUserThreadStart+0×23

# 2 Id: 13e8.1188 Suspend: 1 Teb: 7ffdd000 Unfrozen
ChildEBP RetAddr
0078fec8 77a3f0a9 ntdll!DbgBreakPoint
0078fef8 77603833 ntdll!DbgUiRemoteBreakin+0×3c
0078ff04 779ea9bd kernel32!BaseThreadInitThunk+0xe
0078ff44 00000000 ntdll!_RtlUserThreadStart+0×23

0:000> .asm no_code_bytes
Assembly options: no_code_bytes
0:000> uf ntdll!_RtlUserThreadStart
...
...
...
ntdll!_RtlUserThreadStart:
779ea996 push 14h
779ea998 push offset ntdll! ?? ::FNODOBFM::`string'+0xb6e (779ff108)
779ea99d call ntdll!_SEH_prolog4 (779f47d8)
779ea9a2 and  dword ptr [ebp-4],0
779ea9a6 mov  eax,dword ptr [ntdll!Kernel32ThreadInitThunkFunction (77a752a0)]
779ea9ab push dword ptr [ebp+0Ch]
779ea9ae test eax,eax
779ea9b0 je   ntdll!_RtlUserThreadStart+0x32 (779c6326)
...
...
...
0:000> dds ntdll!Kernel32ThreadInitThunkFunction l1
77a752a0 77603821 kernel32!BaseThreadInitThunk

- Dmitry Vostokov -

Process crash - getting the dump manually

Wednesday, May 16th, 2007

Sometimes customers have process crashes with exception dialogs but no dumps are saved due to some reason, for example, Dr. Watson limitation, NTSD doesn’t save dumps on Windows 2000, etc. One solution is to dump the process manually while it displays an error message. Customers and support engineers can use Microsoft userdump.exe for this purpose. Then looking at the dump we would see the exception because it is processed by an exception handler that either shows the error dialog or creates Windows Error Reporting process. Non-interactive services usually call NtRaiseHardError to let csrss.exe display a message. The following stack trace is from IE dump saved when WER error dialog box was shown:

0:000> k
ChildEBP RetAddr
0012973c 7c59a072 NTDLL!ZwWaitForSingleObject+0xb
00129764 7c57b3e9 KERNEL32!WaitForSingleObjectEx+0x71
00129774 00401b2f KERNEL32!WaitForSingleObject+0xf
0012a238 7918cd0e IEXPLORE!DwExceptionFilter+0×284
0012a244 03a3f0c3 mscoree!__CxxUnhandledExceptionFilter+0×46
0012a250 7c59bf8d msvcr71!__CxxUnhandledExceptionFilter+0×46
0012a984 715206e0 KERNEL32!UnhandledExceptionFilter+0×140
0012ee74 71520957 BROWSEUI!BrowserProtectedThreadProc+0×64
0012fef0 71762a0a BROWSEUI!SHOpenFolderWindow+0×1ec
0012ff10 00401ecd SHDOCVW!IEWinMain+0×108
0012ff60 00401f7d IEXPLORE!WinMainT+0×2dc
0012ffc0 7c5989a5 IEXPLORE!ModuleEntry+0×97
0012fff0 00000000 KERNEL32!BaseProcessStart+0×3d

If we disassemble DwExceptionFilter we would see CreateProcess call:

0:000> ub IEXPLORE!DwExceptionFilter+0x284
IEXPLORE!DwExceptionFilter+0x263:
00401b0e call dword ptr [IEXPLORE!_imp__CreateProcessA (00401050)]
00401b14 test eax,eax
00401b16 je   IEXPLORE!DwExceptionFilter+0x2f6 (00401ba1)
00401b1c mov  dword ptr [ebp+7Ch],edi
00401b1f mov  edi,dword ptr [IEXPLORE!_imp__WaitForSingleObject (0040104c)]
00401b25 push 4E20h
00401b2a push dword ptr [ebp+68h]
00401b2d call edi

I already described WER processes in the previous post about post-mortem debugging so I won’t cover it here.

If we run !analyze -v command we are lucky because WinDbg will find the exception for us:

...
...
...
CONTEXT: 0012aa94 -- (.cxr 12aa94)
eax=00000000 ebx=00000000 ecx=00000000 edx=7283e058 esi=0271a60c edi=00000000
eip=35c5f973 esp=0012ad60 ebp=0012ad7c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00010246
componentA!InternalFoo+0x21:
35c5f973 8b01 mov eax,dword ptr [ecx] ds:0023:00000000=????????
...
...
...
STACK_TEXT:
0012ad7c 35c6042f 0012ae10 00000000 35c53390 componentA!InternalFoo+0x21
0012c350 779d7d5d 00000000 001ad114 00000000 componentA!InternalBar+0x157
0012c36c 77a2310e 02b23d5c 00000020 00000004 oleaut32!DispCallFunc+0x15d
0012c3fc 35cc8b60 024d2d94 02b23d5c 00000001 oleaut32!CTypeInfo2::Invoke+0x244
...
...
...

If you see several threads with UnhandledExceptionFilter - Multiple Exceptions pattern - you can set the exception context individually based on the first parameter of UnhandledExceptionFilter which is a pointer to _EXCEPTION_POINTERS structure and then use .cxr command:

0:000> ~*kv
...
...
...
. 0 Id: 1568.68c Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr Args to Child
...
...
...
0012a984 715206e0 0012a9ac 7800bdb5 0012a9b4 KERNEL32!UnhandledExceptionFilter+0×140 (FPO: [Non-Fpo])



0:000> dt _EXCEPTION_POINTERS 0012a9ac
+0×000 ep_xrecord : 0×12aa78
+0×004 ep_context : 0×12aa94
0:000> .cxr 0012aa94
eax=00000000 ebx=00000000 ecx=00000000 edx=7283e058 esi=0271a60c edi=00000000
eip=35c5f973 esp=0012ad60 ebp=0012ad7c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00010246
componentA!InternalFoo+0×21:
35c5f973 8b01 mov eax,dword ptr [ecx] ds:0023:00000000=????????

Another stack fragment comes from some Windows service and it shows the thread calling NtRaiseHardError:

0:000> ~*k
...
...
...
13 Id: 3624.16cc Suspend: 1 Teb: 7ffad000 Unfrozen
ChildEBP RetAddr
0148ed40 7c821b74 ntdll!KiFastSystemCallRet
0148ed44 77e99af9 ntdll!NtRaiseHardError+0xc
0148f3dc 77e84259 kernel32!UnhandledExceptionFilter+0×54b

0148f40c 7c82eeb2 kernel32!_except_handler3+0×61
0148f430 7c82ee84 ntdll!ExecuteHandler2+0×26
0148f4d8 7c82ecc6 ntdll!ExecuteHandler+0×24
0148f4d8 7c81e215 ntdll!KiUserExceptionDispatcher+0xe
0148f7e0 76133437 ntdll!RtlLengthSecurityDescriptor+0×2a
0148f80c 7613f33d serviceA!GetObjectSize+0×1c3
0148f8d0 77c70f3b serviceA!RpcGetObjectSize+0×1b
0148f8f8 77ce23f7 rpcrt4!Invoke+0×30
0148fcf8 77ce26ed rpcrt4!NdrStubCall2+0×299
0148fd14 77c709be rpcrt4!NdrServerCall2+0×19
0148fd48 77c7093f rpcrt4!DispatchToStubInCNoAvrf+0×38
0148fd9c 77c70865 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0×117
0148fdc0 77c734b1 rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3


- Dmitry Vostokov -

Interrupts and exceptions explained (Part 3)

Tuesday, May 15th, 2007

In Part 1 discussed interrupt processing that happens when an x86 processor executes in privileged protected mode (ring 0). It pushes interrupt frame shown in the following pseudo-code:

push EFLAGS
push CS
push EIP
push ErrorCode
EIP := IDT[VectorNumber].ExtendedOffset<<16 +
   IDT[VectorNumber].Offset

Please note that this is an interrupt frame that is created by CPU and not a trap frame created by a software interrupt handler to save CPU state (_KTRAP_FRAME).

If an x86 processor executes in user mode (ring 3) and an interrupt happens then the stack switch occurs before the processor saves user mode stack pointer SS:ESP and pushes the rest of the interrupt frame. Pushing both SS:RSP always happens on x64 processor regardless of the current execution mode, kernel or user. Therefore the following x86 pseudo-code shows how interrupt frame is pushed on the current stack (to be precise, on the kernel space stack if the interrupt happened in user mode):

push SS
push ESP
push EFLAGS
push CS
push EIP
push ErrorCode
EIP := IDT[VectorNumber].ExtendedOffset<<16 +
   IDT[VectorNumber].Offset

Usually CS is 0×1b and SS is 0×23 for x86 Windows flat memory model so we can easily identify this pattern on raw stack data.

Why should we care about an interrupt frame? This is because in complete full memory dumps we can see exceptions that happened in user space and were being processed at the time the dump was saved.

Let’s look at some example:

PROCESS 89a94800 SessionId: 1 Cid: 1050 Peb: 7ffd7000 ParentCid: 08a4
DirBase: 390f5000 ObjectTable: e36ee0b8 HandleCount: 168.
Image: processA.exe
VadRoot 8981d0a0 Vads 309 Clone 0 Private 222555. Modified 10838. Locked 0.
DeviceMap e37957e0
Token e395b8f8
ElapsedTime 07:44:38.505
UserTime 00:54:52.906
KernelTime 00:00:58.109
QuotaPoolUsage[PagedPool] 550152
QuotaPoolUsage[NonPagedPool] 14200
Working Set Sizes (now,min,max) (213200, 50, 345) (852800KB, 200KB, 1380KB)
PeakWorkingSetSize 227093
VirtualSize 1032 Mb
PeakVirtualSize 1032 Mb
PageFaultCount 232357
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 233170
DebugPort 899b6a40

We see that the process has a DebugPort and the presence of it usually shows that some exception happened. Therefore if you dump all processes by entering !process 0 1 command you can search for any unhandled exceptions.

Indeed if we switch to this process (you can also use !process 89a94800 ff command for dumps coming from XP and higher systems) we see KiDispatchException on one of the processA’s threads:

0: kd> .process 89a94800
0: kd> .reload
0: kd> !process 89a94800
...
...
...
THREAD 89a93020 Cid 1050.1054 Teb: 7ffdf000 Win32Thread: bc1da760 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
f44dc3a8 SynchronizationEvent
Not impersonating
DeviceMap e37957e0
Owning Process 89a94800 Image: processA.exe
Wait Start TickCount 4244146 Ticks: 1232980 (0:05:21:05.312)
Context Switch Count 1139234 LargeStack
UserTime 00:54:51.0531
KernelTime 00:00:53.0937
Win32 Start Address processA!WinMainCRTStartup (0x00c534c8)
Start Address kernel32!BaseProcessStartThunk (0x77e617f8)
Stack Init f44dcbd0 Current f44dc2ec Base f44dd000 Limit f44d7000 Call f44dcbd8
Priority 12 BasePriority 8 PriorityDecrement 2
ChildEBP RetAddr
f44dc304 8083d5b1 nt!KiSwapContext+0x26
f44dc330 8083df9e nt!KiSwapThread+0x2e5
f44dc378 809c3cff nt!KeWaitForSingleObject+0x346
f44dc458 809c4f09 nt!DbgkpQueueMessage+0x178
f44dc47c 80977ad9 nt!DbgkpSendApiMessage+0x45
f44dc508 8081a94f nt!DbgkForwardException+0x90
f44dc8c4 808346b4 nt!KiDispatchException+0×1ea
f44dc92c 80834650 nt!CommonDispatchException+0×4a
f44dc9b8 80a801ae nt!Kei386EoiHelper+0×16e
0012f968 0046915d hal!HalpDispatchSoftwareInterrupt+0×5e
0012f998 0047cb72 processA!CalculateClientSizeFromPoint+0×5f
0012f9bc 0047cc1d processA!CalculateFromPoint+0×30
0012fa64 0047de83 processA!DrawUsingMemDC+0×1b9
0012fac0 0099fb43 processA!OnDraw+0×13
0012fb5c 7c17332d processA!OnPaint+0×56
0012fbe8 7c16e0b0 MFC71!CWnd::OnWndMsg+0×340
0012fc08 00c6253a MFC71!CWnd::WindowProc+0×22
0012fc24 0096cf9d processA!WindowProc+0×38
0012fcb8 7c16e1b8 MFC71!AfxCallWndProc+0×91
0012fcd8 7c16e1f6 MFC71!AfxWndProc+0×46
0012fd04 7739b6e3 MFC71!AfxWndProcBase+0×39
0012fd30 7739b874 USER32!InternalCallWinProc+0×28
0012fda8 7739c8b8 USER32!UserCallWinProcCheckWow+0×151
0012fe04 7739c9c6 USER32!DispatchClientMessage+0xd9
0012fe2c 7c828536 USER32!__fnDWORD+0×24
0012fe2c 80832dee ntdll!KiUserCallbackDispatcher+0×2e
f44dcbf0 8092d605 nt!KiCallUserMode+0×4
f44dcc48 bf8a26d3 nt!KeUserModeCallback+0×8f
f44dcccc bf89e985 win32k!SfnDWORD+0xb4
f44dcd0c bf89eb27 win32k!xxxDispatchMessage+0×223
f44dcd58 80833bdf win32k!NtUserDispatchMessage+0×4c
f44dcd58 7c8285ec nt!KiFastCallEntry+0xfc
0012fe2c 7c828536 ntdll!KiFastSystemCallRet
0012fe58 7739c57b ntdll!KiUserCallbackDispatcher+0×2e
0012fea8 773a16e5 USER32!NtUserDispatchMessage+0xc
0012feb8 7c169076 USER32!DispatchMessageA+0xf
0012fec8 7c16913e MFC71!AfxInternalPumpMessage+0×3e
0012fee4 0041cb0b MFC71!CWinThread::Run+0×54
0012ff08 7c172fc5 processA!CMain::Run+0×3b
0012ff18 00c5364d MFC71!AfxWinMain+0×68
0012ffc0 77e6f23b processA!WinMainCRTStartup+0×185
0012fff0 00000000 kernel32!BaseProcessStart+0×23

You might think that exception happened in CalculateClientSizeFromPoint function. However there is no nt!KiTrapXXX call and hal!HalpDispatchSoftwareInterrupt has user space return address and this looks suspicious. So we need to look at raw stack data and find our interrupt frame. We look for KiDispatchException, then for KiTrap substring and finally for 0000001b. If 0000001b and 00000023 are separated by 2 double words then we have found out interrupt frame:

0: kd> .thread 89a93020
Implicit thread is now 89a93020
0: kd> dds esp esp+1000
...
...
...
f44dc2f8 f44dc330
f44dc2fc 89a93098
f44dc300 ffdff120
f44dc304 89a93020
f44dc308 8083d5b1 nt!KiSwapThread+0x2e5
f44dc30c 89a93020
f44dc310 89a930c8
f44dc314 00000000
...
...
...
f44dc4e8 f44dcc38
f44dc4ec 8083a8cc nt!_except_handler3
f44dc4f0 80870868 nt!`string'+0xa4
f44dc4f4 ffffffff
f44dc4f8 80998bfd nt!Ki386CheckDivideByZeroTrap+0x273
f44dc4fc 8083484f nt!KiTrap00+0x88
f44dc500 00000001
f44dc504 0000bb40
f44dc508 f44dc8c4
f44dc50c 8081a94f nt!KiDispatchException+0×1ea
f44dc510 f44dc8e0
f44dc514 00000001
f44dc518 00000000
f44dc51c 00469583 processA!LPtoDP+0×19
f44dc520 16b748f0
f44dc524 00469583 processA!LPtoDP+0×19
f44dc528 00000000
f44dc52c 00000000



f44dc8c0 ffffffff
f44dc8c4 f44dc934
f44dc8c8 808346b4 nt!CommonDispatchException+0×4a
f44dc8cc f44dc8e0
f44dc8d0 00000000
f44dc8d4 f44dc934
f44dc8d8 00000001
f44dc8dc 00000001
f44dc8e0 c0000094
f44dc8e4 00000000
f44dc8e8 00000000
f44dc8ec 00469583 processA!LPtoDP+0×19
f44dc8f0 00000000
f44dc8f4 808a3988 nt!KiAbiosPresent+0×4
f44dc8f8 ffffffff
f44dc8fc 0000a6f2
f44dc900 00469585 processA!LPtoDP+0×1b
f44dc904 00000004
f44dc908 00000000
f44dc90c f9000001
f44dc910 f44dc8dc
f44dc914 ffffffff
f44dc918 f44dcc38
f44dc91c 8083a8cc nt!_except_handler3
f44dc920 80870868 nt!`string’+0xa4
f44dc924 ffffffff
f44dc928 80998bfd nt!Ki386CheckDivideByZeroTrap+0×273
f44dc92c 8083484f nt!KiTrap00+0×88
f44dc930 80834650 nt!Kei386EoiHelper+0×16e
f44dc934 0012f968
f44dc938 00469583 processA!LPtoDP+0×19
f44dc93c badb0d00
f44dc940 00000000
f44dc944 ffffffff
f44dc948 00007fff
f44dc94c 00000000
f44dc950 fffff800
f44dc954 ffffffff
f44dc958 00007fff
f44dc95c 00000000
f44dc960 00000000
f44dc964 80a80000 hal!HalpInitIrqlAuditFlag+0×4e
f44dc968 00000023
f44dc96c 00000023
f44dc970 00000000
f44dc974 00000000
f44dc978 00005334
f44dc97c 00000001
f44dc980 f44dcc38
f44dc984 0000003b
f44dc988 16b748f0
f44dc98c 16b748f0
f44dc990 0012f9fc
f44dc994 0012f968
f44dc998 00000000 ; ErrorCode
f44dc99c 00469583 processA!LPtoDP+0×19 ; EIP
f44dc9a0 0000001b ; CS
f44dc9a4 00010246 ; EFLAGS
f44dc9a8 0012f934 ; ESP
f44dc9ac 00000023 ; SS
f44dc9b0 8982e7e0
f44dc9b4 00000000

Why did we skip the first KiTrap00? Because KiDispatchException is called after KiTrap00 so we should see it before KiTrap00 on raw stack. To see all these calls we can disassemble return addresses:

0: kd> .asm no_code_bytes
Assembly options: no_code_bytes
0: kd> ub nt!KiTrap00+0x88
nt!KiTrap00+0x74:
8083483b test byte ptr [ebp+6Ch],1
8083483f je   nt!KiTrap00+0x81 (80834848)
80834841 cmp  word ptr [ebp+6Ch],1Bh
80834846 jne  nt!KiTrap00+0x9e (80834865)
80834848 sti
80834849 push ebp
8083484a call nt!Ki386CheckDivideByZeroTrap (8099897d)
8083484f mov  ebx,dword ptr [ebp+68h]

nt!KiTrap00+0×88 is not equal to nt!KiTrap00+0×74 so we have OMAP code optimization case here and we have to disassemble raw addresses as seen on the raw stack fragment repeated here:

...
...
...
f44dc8c8 808346b4 nt!CommonDispatchException+0x4a
...
...
...
f44dc924 ffffffff
f44dc928 80998bfd nt!Ki386CheckDivideByZeroTrap+0x273
f44dc92c 8083484f nt!KiTrap00+0×88
f44dc930 80834650 nt!Kei386EoiHelper+0×16e
f44dc934 0012f968


0: kd> u 8083484f
nt!KiTrap00+0×88:
8083484f mov  ebx,dword ptr [ebp+68h]
80834852 jmp  nt!Kei386EoiHelper+0×167 (80834649)
80834857 sti
80834858 mov  ebx,dword ptr [ebp+68h]
8083485b mov  eax,0C0000094h
80834860 jmp  nt!Kei386EoiHelper+0×167 (80834649)
80834865 mov  ebx,dword ptr fs:[124h]
8083486c mov  ebx,dword ptr [ebx+38h]
0: kd> u 80834649
nt!Kei386EoiHelper+0×167:
80834649 xor  ecx,ecx
8083464b call nt!CommonDispatchException (8083466a)
80834650 xor  edx,edx ; nt!Kei386EoiHelper+0×16e
80834652 mov  ecx,1
80834657 call nt!CommonDispatchException (8083466a)
8083465c xor  edx,edx
8083465e mov  ecx,2
80834663 call nt!CommonDispatchException (8083466a)
0: kd> ub 808346b4
nt!CommonDispatchException+0×38:
808346a2 mov  eax,dword ptr [ebp+6Ch]
808346a5 and  eax,1
808346a8 push 1
808346aa push eax
808346ab push ebp
808346ac push 0
808346ae push ecx
808346af call nt!KiDispatchException (80852a53)

So we see that KiTrap00 calls CommonDispatchException which calls KiDispatchException. If we look at our found interrupt frame we see that EIP of the exception was 00469583 and ESP was 0012f934:

...
...
...
f44dc998 00000000 ; ErrorCode
f44dc99c 00469583 processA!LPtoDP+0×19 ; EIP
f44dc9a0 0000001b ; CS
f44dc9a4 00010246 ; EFLAGS
f44dc9a8 0012f934 ; ESP
f44dc9ac 00000023 ; SS


Now we try to reconstruct stack trace by putting the values of ESP and EIP:

0: kd> k L=0012f934 0012f934 00469583 ; EBP ESP EIP format
ChildEBP RetAddr
0012f930 00469a16 processA!LPtoDP+0x19
0012f934 00000000 processA!GetColumnWidth+0x45

Stack trace doesn’t look good, there is neither BaseProcessStart nor BaseThreadStart, perhaps because we specified ESP value twice instead of EBP and ESP. Let’s hope to find EBP value by dumping the memory around ESP:

0: kd> dds 0012f934-10 0012f934+100
0012f924 00000000
0012f928 0012f934 ; the same as ESP
0012f92c 0012f968 ; looks good to us
0012f930 00469572 processA!LPtoDP+0×8
0012f934 00469a16 processA!GetColumnWidth+0×45
0012f938 00005334



0012f964 00005334
0012f968 0012f998
0012f96c 0046915d processA!CalculateClientSizeFromPoint+0×5f
0012f970 00000000
0012f974 0012f9fc
0012f978 16b748f0
0012f97c 0012fa48
0012f980 00000000
0012f984 00000000
0012f988 000003a0
0012f98c 00000237
0012f990 00000014
0012f994 00000000
0012f998 0012f9bc
0012f99c 0047cb72 processA!CalculateFromPoint+0×30
0012f9a0 0012f9fc
0012f9a4 0012f9b4
0012f9a8 0012fa48


So finally we get our stack trace:

0: kd> k L=0012f968 0012f934 00469583 100
ChildEBP RetAddr
0012f930 00469a16 processA!LPtoDP+0x19
0012f968 0046915d processA!GetColumnWidth+0x45
0012f998 0047cb72 processA!CalculateClientSizeFromPoint+0x5f
0012f9bc 0047cc1d processA!CalculateFromPoint+0x30
0012fa64 0047de83 processA!DrawUsingMemDC+0x1b9
0012fac0 0099fb43 processA!OnDraw+0x13
0012fb5c 7c17332d processA!OnPaint+0x56
0012fbe8 7c16e0b0 MFC71!CWnd::OnWndMsg+0x340
0012fc08 00c6253a MFC71!CWnd::WindowProc+0x22
0012fc24 0096cf9d processA!WindowProc+0x38
0012fcb8 7c16e1b8 MFC71!AfxCallWndProc+0x91
0012fcd8 7c16e1f6 MFC71!AfxWndProc+0x46
0012fd04 7739b6e3 MFC71!AfxWndProcBase+0x39
0012fd30 7739b874 USER32!InternalCallWinProc+0x28
0012fda8 7739c8b8 USER32!UserCallWinProcCheckWow+0x151
0012fe04 7739c9c6 USER32!DispatchClientMessage+0xd9
0012fe2c 7c828536 USER32!__fnDWORD+0x24
0012fe2c 80832dee ntdll!KiUserCallbackDispatcher+0x2e
f44dcbf0 8092d605 nt!KiCallUserMode+0x4
f44dcc48 bf8a26d3 nt!KeUserModeCallback+0x8f
f44dcccc bf89e985 win32k!SfnDWORD+0xb4
f44dcd0c bf89eb27 win32k!xxxDispatchMessage+0x223
f44dcd58 80833bdf win32k!NtUserDispatchMessage+0x4c
f44dcd58 7c8285ec nt!KiFastCallEntry+0xfc
0012fe2c 7c828536 ntdll!KiFastSystemCallRet
0012fe58 7739c57b ntdll!KiUserCallbackDispatcher+0x2e
0012fea8 773a16e5 USER32!NtUserDispatchMessage+0xc
0012feb8 7c169076 USER32!DispatchMessageA+0xf
0012fec8 7c16913e MFC71!AfxInternalPumpMessage+0x3e
0012fee4 0041cb0b MFC71!CWinThread::Run+0x54
0012ff08 7c172fc5 processA!CMain::Run+0x3b
0012ff18 00c5364d MFC71!AfxWinMain+0x68
0012ffc0 77e6f23b processA!WinMainCRTStartup+0x185
0012fff0 00000000 kernel32!BaseProcessStart+0x23

- Dmitry Vostokov -

Crash Dump Analysis Patterns (Part 14)

Friday, May 11th, 2007

The next pattern is Spiking Thread. If you have a process dump with many threads from a customer it is sometimes difficult to see which thread there was spiking CPU, that’s why it is always good to have some screenshots or notes from QSlice or Process Explorer showing spiking thread ID and process ID. The latter ID is to make sure that the process dump was from the correct process. New process dumpers and tools from Microsoft (userdump.exe, for example) save thread time information so you can open the dump and see the time spent in kernel and user mode for any thread by entering !runaway command. However if that command shows many threads with similar CPU consumption it will not highlight the particular thread that was spiking at the time the crash dump was saved so screenshots are still useful in some cases.

What to do if you don’t have spiking thread ID? Look at all threads and find those that are not waiting. Almost all threads are waiting most of the time. So the chances to dump the normal process and see some active threads are very low. If the thread is waiting the top function on its stack usually is (for XP/W2K3/Vista):

ntdll!KiFastSystemCallRet

and below it you could see some blocking calls waiting for some synchronization object, Sleep API call, IO completion or for LPC reply:

0:085> ~*kv
...
...
...
64 Id: 1b0.120c Suspend: -1 Teb: 7ff69000 Unfrozen
ChildEBP RetAddr Args to Child
02defe18 7c90e399 ntdll!KiFastSystemCallRet
02defe1c 77e76703 ntdll!NtReplyWaitReceivePortEx+0xc
02deff80 77e76c22 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4
02deff88 77e76a3b rpcrt4!RecvLotsaCallsWrapper+0xd
02deffa8 77e76c0a rpcrt4!BaseCachedThreadRoutine+0×79
02deffb4 7c80b683 rpcrt4!ThreadStartRoutine+0×1a
02deffec 00000000 kernel32!BaseThreadStart+0×37

65 Id: 1b0.740 Suspend: -1 Teb: 7ff67000 Unfrozen
ChildEBP RetAddr Args to Child
02edff44 7c90d85c ntdll!KiFastSystemCallRet
02edff48 7c8023ed ntdll!NtDelayExecution+0xc
02edffa0 57cde2dd kernel32!SleepEx+0×61

02edffb4 7c80b683 component!foo+0×35
02edffec 00000000 kernel32!BaseThreadStart+0×37

66 Id: 1b0.131c Suspend: -1 Teb: 7ff66000 Unfrozen
ChildEBP RetAddr Args to Child
02f4ff38 7c90e9c0 ntdll!KiFastSystemCallRet
02f4ff3c 7c8025cb ntdll!ZwWaitForSingleObject+0xc
02f4ffa0 72001f65 kernel32!WaitForSingleObjectEx+0xa8

02f4ffb4 7c80b683 component!WorkerThread+0×15
02f4ffec 00000000 kernel32!BaseThreadStart+0×37

67 Id: 1b0.1320 Suspend: -1 Teb: 7ff65000 Unfrozen
ChildEBP RetAddr Args to Child
02f8fe1c 7c90e9ab ntdll!KiFastSystemCallRet
02f8fe20 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc
02f8febc 7e4195f9 kernel32!WaitForMultipleObjectsEx+0×12c
02f8ff18 7e4196a8 user32!RealMsgWaitForMultipleObjectsEx+0×13e
02f8ff34 720019f6 user32!MsgWaitForMultipleObjects+0×1f

02f8ffa0 72001a29 component!bar+0xd9
02f8ffb4 7c80b683 component!MonitorWorkerThread+0×11
02f8ffec 00000000 kernel32!BaseThreadStart+0×37

68 Id: 1b0.1340 Suspend: -1 Teb: 7ff63000 Unfrozen
ChildEBP RetAddr Args to Child
0301ff1c 7c90e31b ntdll!KiFastSystemCallRet
0301ff20 7c80a746 ntdll!ZwRemoveIoCompletion+0xc
0301ff4c 57d46e65 kernel32!GetQueuedCompletionStatus+0×29

0301ffb4 7c80b683 component!AsyncEventsThread+0×91
0301ffec 00000000 kernel32!BaseThreadStart+0×37



# 85 Id: 1b0.17b4 Suspend: -1 Teb: 7ffd4000 Unfrozen
ChildEBP RetAddr Args to Child
00daffc8 7c9507a8 ntdll!DbgBreakPoint
00dafff4 00000000 ntdll!DbgUiRemoteBreakin+0×2d

Therefore if you have a different thread like this one below the chances that it was spiking are big:

58 Id: 1b0.9f4 Suspend: -1 Teb: 7ff75000 Unfrozen
ChildEBP RetAddr Args to Child
0280f64c 500af723 componentB!DoSomething+32
0280f85c 500b5391 componentB!CheckSomething+231
0280f884 500b7a3f componentB!ProcessWorkIteme+9f
0301ffec 00000000 kernel32!BaseThreadStart+0x37

There is no KiFastSystemCallRet on top and if we look at the currently executing instruction it indeed does some copy operation:

0:085> ~58r
eax=00000000 ebx=0280fdd4 ecx=0000005f edx=00000000 esi=03d30444 edi=0280f6dc
eip=500a4024 esp=0280f644 ebp=0280f64c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
componentB!DoSomething+32:
500a4024 f3a5 rep movs dword ptr es:[edi],dword ptr [esi] es:0023:0280f6dc=00000409 ds:0023:03d30444=00000409

In a kernel or a complete memory dump you can see spikes by checking KernelTime and UserTime:

0: kd> !thread 88b66768
THREAD 88b66768 Cid 01fc.1550 Teb: 7ffad000 Win32Thread: bc18f240 RUNNING on processor 1
IRP List:
89716008: (0006,0094) Flags: 00000a00 Mdl: 00000000
Impersonation token: e423a030 (Level Impersonation)
DeviceMap e3712480
Owning Process 8a0a56a0 Image: SomeSvc.exe
Wait Start TickCount 1782229 Ticks: 0
Context Switch Count 877610 LargeStack
UserTime 00:00:01.0078
KernelTime 02:23:21.0718

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 13a)

Wednesday, May 9th, 2007

Insufficient Memory pattern can be seen in many complete and kernel memory dumps. This condition could cause the system to crash, become slow, hang or refuse to provide the expected functionality, for example, refuse new terminal server connections. There are many types of memory resources and we can classify them initially into the following categories:

  • Committed memory
  • Virtual memory
    • Kernel space
      • Paged pool
      • Non-paged pool
      • Session pool
      • PTE limits
      • Desktop heap
      • GDI limits
    • User space
      • Virtual regions
      • Process heap

We will talk about all of them in separate parts. What I outline in this part is committed memory exhaustion. Committed memory is an allocated memory backed up by some physical memory or by a reserved space in the page file. Reserving the space needs to be done in case OS wants to swap out that memory’s data to disk if it is not currently used and there is no physical memory available for other processes. If that data is needed again OS brings it back to physical memory. If there is no space in the page file then physical memory is filled up. If committed memory is exhausted most likely the system will hang or result in a bugcheck soon so checking memory statistics shall always be done when you get a kernel or a complete memory dump. Even access violation bugchecks could result from insufficient memory when some memory allocation operation failed but a kernel mode component didn’t check the return value for NULL. Here is an example:

BugCheck 8E, {c0000005, 809203af, aa647c0c, 0}

0: kd> !analyze -v
...
...
...
TRAP_FRAME: aa647c0c -- (.trap ffffffffaa647c0c)
...
...
...

0: kd> .trap ffffffffaa647c0c
ErrCode = 00000000
eax=00000000 ebx=bc1f3cfc ecx=89589250 edx=000018c1 esi=bc1f3ce0 edi=aa647d14
eip=809203af esp=aa647c80 ebp=aa647c80 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
nt!SeTokenType+0x8:
809203af 8b8080000000 mov eax,dword ptr [eax+80h] ds:0023:00000080=????????

0: kd> k
ChildEBP RetAddr
aa647c80 bf8173c5 nt!SeTokenType+0x8
aa647cdc bf81713b win32k!GreGetSpoolMessage+0xb0
aa647d4c 80834d3f win32k!NtGdiGetSpoolMessage+0x96
aa647d4c 7c82ed54 nt!KiFastCallEntry+0xfc

If we enter !vm command to display memory statistics we would see that all committed memory is filled up:

0: kd> !vm
*** Virtual Memory Usage ***
 Physical Memory:      999294 (   3997176 Kb)
 Page File: \??\C:\pagefile.sys
   Current:   4193280 Kb  Free Space:    533744 Kb
   Minimum:   4193280 Kb  Maximum:      4193280 Kb
 Available Pages:       18698 (     74792 Kb)
 ResAvail Pages:       865019 (   3460076 Kb)
 Locked IO Pages:         290 (      1160 Kb)
 Free System PTEs:     155265 (    621060 Kb)
 Free NP PTEs:          32766 (    131064 Kb)
 Free Special NP:           0 (         0 Kb)
 Modified Pages:          113 (       452 Kb)
 Modified PF Pages:        61 (       244 Kb)
 NonPagedPool Usage:    12380 (     49520 Kb)
 NonPagedPool Max:      64799 (    259196 Kb)
 PagedPool 0 Usage:     40291 (    161164 Kb)
 PagedPool 1 Usage:      2463 (      9852 Kb)
 PagedPool 2 Usage:      2455 (      9820 Kb)
 PagedPool 3 Usage:      2453 (      9812 Kb)
 PagedPool 4 Usage:      2488 (      9952 Kb)
 PagedPool Usage:       50150 (    200600 Kb)
 PagedPool Maximum:     67584 (    270336 Kb)

 ********** 18 pool allocations have failed **********

 Shared Commit:         87304 (    349216 Kb)
 Special Pool:              0 (         0 Kb)
 Shared Process:        56241 (    224964 Kb)
 PagedPool Commit:      50198 (    200792 Kb)
 Driver Commit:          1892 (      7568 Kb)
 Committed pages:     2006945 (   8027780 Kb)
 Commit limit:        2008205 (   8032820 Kb)

 ********** 1216024 commit requests have failed  **********

There might have been a memory leak or too many terminal sessions with fat applications to fit in physical memory and the page file. Actually for that particular case there were both.

- Dmitry Vostokov @ DumpAnalysis.org -

Interrupts and exceptions explained (Part 2)

Saturday, May 5th, 2007

As I promised in Part 1, I describe changes in 64-bit Windows. The size of IDTR is 10 bytes where 8 bytes hold 64-bit address of IDT. The size of IDT entry is 16 bytes and it holds the address of an interrupt procedure corresponding to an interrupt vector. However interrupt procedure names are different in x64 Windows, they do not follow the same pattern like KiTrapXX. 

The following UML class diagram describes the relationship and also shows what registers are saved. In native x64 mode SS and RSP are saved regardless of kernel or user mode.

Let’s dump all architecture-defined interrupt procedure names. This is a good exercise because we will use scripting. !pcr extension reports wrong IDT base so we use dt command:

kd> !pcr 0
KPCR for Processor 0 at fffff80001176000:
    Major 1 Minor 1
 NtTib.ExceptionList: fffff80000124000
     NtTib.StackBase: fffff80000125070
    NtTib.StackLimit: 0000000000000000
  NtTib.SubSystemTib: fffff80001176000
       NtTib.Version: 0000000001176180
   NtTib.UserPointer: fffff800011767f0
       NtTib.SelfTib: 000000007ef95000
             SelfPcr: 0000000000000000
                Prcb: fffff80001176180
                Irql: 0000000000000000
                 IRR: 0000000000000000
                 IDR: 0000000000000000
       InterruptMode: 0000000000000000
                 IDT: 0000000000000000
                 GDT: 0000000000000000
                 TSS: 0000000000000000
       CurrentThread: fffffadfe669f890
          NextThread: 0000000000000000
          IdleThread: fffff8000117a300
           DpcQueue:
kd> dt _KPCR fffff80001176000
nt!_KPCR
   +0×000 NtTib            : _NT_TIB
   +0×000 GdtBase          : 0xfffff800`00124000 _KGDTENTRY64
   +0×008 TssBase          : 0xfffff800`00125070 _KTSS64
   +0×010 PerfGlobalGroupMask : (null)
   +0×018 Self             : 0xfffff800`01176000 _KPCR
   +0×020 CurrentPrcb      : 0xfffff800`01176180 _KPRCB
   +0×028 LockArray        : 0xfffff800`011767f0 _KSPIN_LOCK_QUEUE
   +0×030 Used_Self        : 0×00000000`7ef95000
   +0×038 IdtBase          : 0xfffff800`00124070 _KIDTENTRY64
   +0×040 Unused           : [2] 0
   +0×050 Irql             : 0 ”
   +0×051 SecondLevelCacheAssociativity : 0×10 ”
   +0×052 ObsoleteNumber   : 0 ”
   +0×053 Fill0            : 0 ”
   +0×054 Unused0          : [3] 0
   +0×060 MajorVersion     : 1
   +0×062 MinorVersion     : 1
   +0×064 StallScaleFactor : 0×892
   +0×068 Unused1          : [3] (null)
   +0×080 KernelReserved   : [15] 0
   +0×0bc SecondLevelCacheSize : 0×100000
   +0×0c0 HalReserved      : [16] 0×82c5c880
   +0×100 Unused2          : 0
   +0×108 KdVersionBlock   : 0xfffff800`01174ca0
   +0×110 Unused3          : (null)
   +0×118 PcrAlign1        : [24] 0
   +0×180 Prcb             : _KPRCB

Next we dump the first entry of IDT array and glue together OffsetHigh, OffsetMiddle and OffsetLow fields to form the interrupt procedure address corresponding to the interrupt vector 0, divide by zero exception:

kd> dt _KIDTENTRY64 0xfffff800`00124070
nt!_KIDTENTRY64
   +0x000 OffsetLow        : 0xf240
   +0x002 Selector         : 0x10
   +0x004 IstIndex         : 0y000
   +0x004 Reserved0        : 0y00000 (0)
   +0x004 Type             : 0y01110 (0xe)
   +0x004 Dpl              : 0y00
   +0x004 Present          : 0y1
   +0×006 OffsetMiddle     : 0×103
   +0×008 OffsetHigh       : 0xfffff800
   +0×00c Reserved1        : 0
   +0×000 Alignment        : 0×1038e00`0010f240

kd> u 0xfffff8000103f240
nt!KiDivideErrorFault:
fffff800`0103f240 4883ec08        sub     rsp,8
fffff800`0103f244 55              push    rbp
fffff800`0103f245 4881ec58010000  sub     rsp,158h
fffff800`0103f24c 488dac2480000000 lea     rbp,[rsp+80h]
fffff800`0103f254 c645ab01        mov     byte ptr [rbp-55h],1
fffff800`0103f258 488945b0        mov     qword ptr [rbp-50h],rax
fffff800`0103f25c 48894db8        mov     qword ptr [rbp-48h],rcx
fffff800`0103f260 488955c0        mov     qword ptr [rbp-40h],rdx
kd> ln 0xfffff8000103f240
(fffff800`0103f240)   nt!KiDivideErrorFault   |
(fffff800`0103f300)   nt!KiDebugTrapOrFault
Exact matches:
    nt!KiDivideErrorFault = <no type information>

We see that the name of the procedure is KiDivideErrorFault and not KiTrap00. We can dump the second IDT entry manually by adding a 0×10 offset but in order to automate this I wrote the following WinDbg script to dump the first 20 vectors and get their interrupt procedure names:

r? $t0=(_KIDTENTRY64 *)0xfffff800`00124070; .for (r $t1=0; @$t1 <= 13; r? $t0=(_KIDTENTRY64 *)@$t0+1) { .printf “Interrupt vector %d (0x%x):\n”, @$t1, @$t1; ln @@c++(@$t0->OffsetHigh*0×100000000 + @$t0->OffsetMiddle*0×10000 + @$t0->OffsetLow); r $t1=$t1+1 }

Here is the same script but formatted:

r? $t0=(_KIDTENTRY64 *)0xfffff800`00124070;
.for (r $t1=0; @$t1 <= 13; r? $t0=(_KIDTENTRY64 *)@$t0+1)
{
    .printf "Interrupt vector %d (0x%x):\n", @$t1, @$t1;
    ln @@c++(@$t0->OffsetHigh*0x100000000 +
        @$t0->OffsetMiddle*0x10000 + @$t0->OffsetLow);
    r $t1=$t1+1
}

The output on my system is:

Interrupt vector 0 (0x0):
(fffff800`0103f240) nt!KiDivideErrorFault | (fffff800`0103f300) nt!KiDebugTrapOrFault
Exact matches:
  nt!KiDivideErrorFault = <no type information>
Interrupt vector 1 (0×1):
(fffff800`0103f300) nt!KiDebugTrapOrFault | (fffff800`0103f440) nt!KiNmiInterrupt
Exact matches:
  nt!KiDebugTrapOrFault = <no type information>
Interrupt vector 2 (0×2):
(fffff800`0103f440) nt!KiNmiInterrupt | (fffff800`0103f680) nt!KxNmiInterrupt
Exact matches:
  nt!KiNmiInterrupt = <no type information>
Interrupt vector 3 (0×3):
(fffff800`0103f780) nt!KiBreakpointTrap | (fffff800`0103f840) nt!KiOverflowTrap
Exact matches:
  nt!KiBreakpointTrap = <no type information>
Interrupt vector 4 (0×4):
(fffff800`0103f840) nt!KiOverflowTrap | (fffff800`0103f900) nt!KiBoundFault
Exact matches:
  nt!KiOverflowTrap = <no type information>
Interrupt vector 5 (0×5):
(fffff800`0103f900) nt!KiBoundFault | (fffff800`0103f9c0) nt!KiInvalidOpcodeFault
Exact matches:
  nt!KiBoundFault = <no type information>
Interrupt vector 6 (0×6):
(fffff800`0103f9c0) nt!KiInvalidOpcodeFault | (fffff800`0103fb80) nt!KiNpxNotAvailableFault
Exact matches:
  nt!KiInvalidOpcodeFault = <no type information>
Interrupt vector 7 (0×7):
(fffff800`0103fb80) nt!KiNpxNotAvailableFault | (fffff800`0103fc40) nt!KiDoubleFaultAbort
Exact matches:
  nt!KiNpxNotAvailableFault = <no type information>
Interrupt vector 8 (0×8):
(fffff800`0103fc40) nt!KiDoubleFaultAbort | (fffff800`0103fd00) nt!KiNpxSegmentOverrunAbort
Exact matches:
  nt!KiDoubleFaultAbort = <no type information>
Interrupt vector 9 (0×9):
(fffff800`0103fd00) nt!KiNpxSegmentOverrunAbort | (fffff800`0103fdc0) nt!KiInvalidTssFault
Exact matches:
  nt!KiNpxSegmentOverrunAbort = <no type information>
Interrupt vector 10 (0xa):
(fffff800`0103fdc0) nt!KiInvalidTssFault | (fffff800`0103fe80) nt!KiSegmentNotPresentFault
Exact matches:
  nt!KiInvalidTssFault = <no type information>
Interrupt vector 11 (0xb):
(fffff800`0103fe80) nt!KiSegmentNotPresentFault | (fffff800`0103ff80) nt!KiStackFault
Exact matches:
  nt!KiSegmentNotPresentFault = <no type information>
Interrupt vector 12 (0xc):
(fffff800`0103ff80) nt!KiStackFault | (fffff800`01040080) nt!KiGeneralProtectionFault
Exact matches:
  nt!KiStackFault = <no type information>
Interrupt vector 13 (0xd):
(fffff800`01040080) nt!KiGeneralProtectionFault | (fffff800`01040180) nt!KiPageFault
Exact matches:
  nt!KiGeneralProtectionFault = <no type information>
Interrupt vector 14 (0xe):
(fffff800`01040180) nt!KiPageFault | (fffff800`010404c0) nt!KiFloatingErrorFault
Exact matches:
  nt!KiPageFault = <no type information>
Interrupt vector 15 (0xf):
(fffff800`01179090) nt!KxUnexpectedInterrupt0+0xf0 | (fffff800`0117a0c0) nt!KiNode0
Interrupt vector 16 (0×10):
(fffff800`010404c0) nt!KiFloatingErrorFault | (fffff800`01040600) nt!KiAlignmentFault
Exact matches:
  nt!KiFloatingErrorFault = <no type information>
Interrupt vector 17 (0×11):
(fffff800`01040600) nt!KiAlignmentFault | (fffff800`010406c0) nt!KiMcheckAbort
Exact matches:
  nt!KiAlignmentFault = <no type information>
Interrupt vector 18 (0×12):
(fffff800`010406c0) nt!KiMcheckAbort | (fffff800`01040900) nt!KxMcheckAbort
Exact matches:
  nt!KiMcheckAbort = <no type information>
Interrupt vector 19 (0×13):
(fffff800`01040a00) nt!KiXmmException | (fffff800`01040b80) nt!KiRaiseAssertion
Exact matches:
  nt!KiXmmException = <no type information>

Let’s look at some dump.

BugCheck 1E, {ffffffffc0000005, fffffade5ba2d643, 0, 28}

This is KMODE_EXCEPTION_NOT_HANDLED and obviously it could have been invalid memory access. And indeed the stack WinDbg shows after opening a dump and entering !analyze -v command is:

RSP
fffffade`4e88fe68 nt!KeBugCheckEx
fffffade`4e88fe70 nt!KiDispatchException+0x128
fffffade`4e8905f0 nt!KiPageFault+0x1e1
fffffade`4e890780 driver!foo+0x9b

and the exception context is:

2: kd> r
Last set context:
rax=fffffade4e8907f4 rbx=fffffade6de0c2e0 rcx=fffffa8014412000
rdx=fffffade71e7e2ac rsi=0000000000000000 rdi=fffffadffff03000
rip=fffffade5ba2d643 rsp=fffffade4e890780 rbp=fffffade71e7ffff
 r8=00000000000005b0 r9=fffffade4e890a88 r10=fffffadffd077898
r11=fffffade71e7e260 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=0950 es=4e89 fs=fade gs=ffff efl=00010246
driver!foo+0×9b:
fffffade`5ba2d643 8b4e28 mov ecx,dword ptr [rsi+28h] ds:0950:0028=????????

We see that KiPageFault was called and from the dumped IDT we see it corresponds to the interrupt vector 14 (0xE) which is called on any memory reference that is not present in physical memory.

Now I’m going to dump the raw stack around fffffade`4e890780 to see our the values the processor saved before calling KiPageFault:

2: kd> dps fffffade`4e890780-50 fffffade`4e890780+50
fffffade`4e890730  fffffade`6de0c2e0
fffffade`4e890738  fffffadf`fff03000
fffffade`4e890740  00000000`00000000
fffffade`4e890748  fffffade`71e7ffff
fffffade`4e890750  00000000`00000000 ; ErrorCode
fffffade`4e890758  fffffade`5ba2d643 driver!foo+0×9b ; RIP
fffffade`4e890760  00000000`00000010 ; CS
fffffade`4e890768  00000000`00010246 ; RFLAGS
fffffade`4e890770  fffffade`4e890780 ; RSP
fffffade`4e890778  00000000`00000018 ; SS

fffffade`4e890780  00000000`00000000 ; RSP before interrupt
fffffade`4e890788  00000000`00000000
fffffade`4e890790  00000000`00000000
fffffade`4e890798  00000000`00000000
fffffade`4e8907a0  00000000`00000000
fffffade`4e8907a8  00000000`00000000
fffffade`4e8907b0  00000000`00000000
fffffade`4e8907b8  00000000`00000000
fffffade`4e8907c0  00000000`00000000
fffffade`4e8907c8  00000000`00000000
fffffade`4e8907d0  00000000`00000000

You see the values are exactly the same as WinDbg shows in the saved context above. Actually if you look at Page-Fault Error Code bits in Intel Architecture Software Developer’s Manual Volume 3A, you would see that for this case, all zeroes, we have:

  • the page was not present in memory
  • the fault was caused by the read access
  • the processor was executing in kernel mode
  • no reserved bits in page directory were set to 1 when 0s were expected
  • it was not caused by instruction fetch

- Dmitry Vostokov -

Parameterized WinDbg Scripts

Saturday, May 5th, 2007

I was looking for a way to pass arguments to scripts and the new WinDbg 6.7.5 documentation contains the description of $$>a< command:

$$>a< Filename arg1 arg2 arg3 … argn

argn

Specifies an argument that the debugger passes to the script. These parameters can be quoted strings or space-delimited strings. All arguments are optional.

This promises a lot of possibilities to write cool scripts and use structured programming. Couldn’t quickly find in the help who gets the passed parameter, a pseudu-register or a fixed-name alias. Certainly some experimentation is needed here.

- Dmitry Vostokov -

Crash Dumps for Dummies (Part 5)

Saturday, May 5th, 2007

In this part, I try to explain symbol files. They are usually called PDB files because they have .PDB extension although the older ones can have .DBG extension. PDB files are needed to read dump files properly. Without PDB files the dump file data is just a collection of numbers, the contents of memory, without any meaning. PDB files help tools like WinDbg to interpret the data and present it in a human-readable format. Roughly speaking, PDB files contain associations between numbers and their meanings expressed in short text strings:

Because these associations are changed when you have a fix or a service pack on a computer and you have a dump from it you need newer PDB files that correspond to updated components such as DLLs or drivers. 

Long time ago you had to download symbol files manually from Microsoft or get them from CDs. Now Microsoft has its dedicated internet symbol server and WinDbg can download PDB files automatically. However you need to specify Microsoft symbol server location in File\Symbol File Path… dialog and check Reload. The location is usually:

SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols

If you don’t remember the location when you run WinDbg for the first time or on a new computer you can enter .symfix command to set Microsoft symbol server path automatically and specify the location where to download symbol files. You can check your current symbol search path by using .sympath command and don’t forget to reload symbols by entering .reload command:

0:000> .symfix
No downstream store given, using C:\Program Files\Debugging Tools for Windows\sym
0:000> .sympath
Symbol search path is: SRV**http://msdl.microsoft.com/download/symbols
0:000> .symfix c:\websymbols
0:000> .sympath
Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
0:000> .reload

- Dmitry Vostokov @ DumpAnalysis.org -

Reading Korean

Saturday, May 5th, 2007

I’m very pleased that Heejune Kim and Taehwa Lee translated and posted the first crash dump analysis pattern article on www.driveronline.org

Heejune also has a blog insidekernel.net, mainly about WDF.

I’ve just looked at my blog idea notes and found that I have plans to write about 15 more crash dump analysis patterns. I will try to write most of them in forthcoming months and later reorganize and classify them in retrospection. 

As you might have guessed already I have books about Korean language too. First I’d like to mention 2 volumes written by Ross King, Jae-Hoon Yeon and Insun Lee:

Elementary Korean

Continuing Korean 

Korean writing system, Hankul, is usually described as the most scientific writing system in the world and as the world’s best alphabet.

If you are interested in Writing Systems in general (capitalized to avoid confusion with writing systems in C++) I would recommend two books that I browse sometimes:

Writing Systems: A Linguistic Approach

Writing Systems: A Linguistic Introduction

Both books use IPA (International Phonetic Alphabet) to illustrate mapping of writing to sounds so if you have never used IPA and you want to know about it, vowels, consonants and sounds of languages, basics of speech recognition and synthesis I would greatly recommend this amazing book with CD (I read the first edition but seems there is the second one):

Vowels and Consonants

- Dmitry Vostokov -

PDBFinder (public version 3.0.1)

Friday, May 4th, 2007

Finally I’ve managed to create a slimmer version 3.0.1 with many resource consumption improvements:

  • Built database occupies 3.7 times less disk space
  • 2 times faster load
  • 5 times less memory consumption

Additional improvements:

  • “FIXFinder” feature: shows folders where you can find newer modules
  • Search can be restricted to folder names containing sub-string (for example, “release” or “full” symbols only)
  • Ready to copy and paste folder names (for WinDbg Symbol File Path dialog) – exe/dll/sys subfolders are stripped off
  • Additional minor interface improvements (larger screen, ellipsis in paths during build, better keyboard focus handling)

Screenshot:

The tool has been tested with more than 1,000,000 PDB files.

PDBFinder Deluxe Pro 3.0.1 and its documentation can be downloaded from Citrix support web site (requires free registration).

The motivation behind the tool is explained in my previous post: Cons of Symbol Server.

- Dmitry Vostokov -

Cons of Symbol Server

Thursday, May 3rd, 2007

Symbol servers are great. However I found that in crash dump analysis the absence of automatically loaded symbols sometimes helps to identify a problem or at least gives some directions for further research. It also helps to see which hot fixes or service packs for your product were installed on a problem computer. The scenario I use sometimes when I analyze crash dumps from product A is the following:

  1. Set up WinDbg to point to Microsoft Symbol Server
  2. Load a crash dump and enter various commands based on the issue. Some OS or product A components become visible and their symbols are unresolved.
  3. From unresolved OS symbols I’m aware of the latest fixes or privates from MS
  4. From unresolved symbols of the product A and PDBFinder I determine the base product level and this already gives me some directions.
  5. I add the base product A symbols to symbol file path and continue my analysis.
  6. If unresolved symbols of the product A continue to come up I use PDBFinder again to find corresponding symbols and add them to symbol file path. By doing that I’m aware of the product A hot fix and/or service pack level.
  7. Also from the latest version of PDBFinder (3.0.1) I know whether there are any updates to the component in question.

Of course, all this works only if you store all PDB files from all your fixes and service packs in some location(s) with easily identified names, for example, PRODUCTA\VER20\SP31\FIX01. Adding symbols manually helps to be focused on components, gives attention to some threads where they appear. You might think it is a waste of time but it only takes very small percentage of time especially if you look at the dump for a couple of hours.

What is PDBFinder? This is a program I developed to be able to find right symbol files (especially for minidumps). It scans all locations for PDB or DBG files and adds them to a text database. Next time you run PDBFinder it loads that database and you can find PDB or DBG file location by specifying module name and its date. You can also do a fuzzy search by specifying some date interval. If you run it with -update command line option it will build the database automatically, useful for scheduling weekly updates.  

The public version of PDBFinder Deluxe 2.2.1 can be downloaded from Citrix support web site. The new version 3.0.1 on the way with major improvements and will be announced tomorrow.

- Dmitry Vostokov -

The new version of WinDbg has been released

Tuesday, May 1st, 2007

Version 6.7.5 of Debugging Tools for Windows was released last week:

I haven’t found so far significant additions for crash dump analysis. What I noticed today is that when I open a user dump and enter
!analyze -v command the new field appears in the output:

NTGLOBALFLAG:  400

I ran gflags.exe and enabled page heap for notepad.exe. Then I launched notepad.exe, attached WinDbg, entered the command and NTGLOBALFLAG field reflected the change:

NTGLOBALFLAG:  2000000

So I don’t need to type !gflag command anymore. 

- Dmitry Vostokov -

Who calls the postmortem debugger?

Monday, April 30th, 2007

I was trying to understand who calls dwwin.exe, the part of Windows Error Reporting on my Windows XP system when the crash happens. To find an answer I launched TestDefaultDebugger application and after pushing its crash button the following familiar WER dialog box appeared:

I repeated the same when running ProcessHistory in the background and looking at its log I found that the parent process for dwwin.exe and postmortem debugger (if we click on Debug button) was TestDefaultDebugger.exe. In my case the default postmortem debugger was drwtsn32.exe. To look inside I attached WinDbg to TestDefaultDebugger process when WER dialog box above was displayed and got the following stack trace:

0:000> k
ChildEBP RetAddr
0012d318 7c90e9ab ntdll!KiFastSystemCallRet
0012d31c 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc
0012d3b8 7c80a075 kernel32!WaitForMultipleObjectsEx+0x12c
0012d3d4 6945763c kernel32!WaitForMultipleObjects+0x18
0012dd68 694582b1 faultrep!StartDWException+0×5df
0012eddc 7c8633e9 faultrep!ReportFault+0×533
0012f47c 00411eaa kernel32!UnhandledExceptionFilter+0×587
0012f49c 0040e879 TestDefaultDebugger+0×11eaa
0012ffc0 7c816fd7 TestDefaultDebugger+0xe879
0012fff0 00000000 kernel32!BaseProcessStart+0×23

The combination of StartDWException and WaitForMultipleObjects suggests that dwwin.exe process is started there. Indeed, when I disassembled StartDWException function I saw CreateProcess call just before the wait call:

0:000> uf faultrep!StartDWException
...
...
...
69457585 8d8568f7ffff    lea     eax,[ebp-898h]
6945758b 50              push    eax
6945758c 8d8524f7ffff    lea     eax,[ebp-8DCh]
69457592 50              push    eax
69457593 8d85d4fbffff    lea     eax,[ebp-42Ch]
69457599 50              push    eax
6945759a 33c0            xor     eax,eax
6945759c 50              push    eax
6945759d 6820000004      push    4000020h
694575a2 6a01            push    1
694575a4 50              push    eax
694575a5 50              push    eax
694575a6 ffb5a4f7ffff    push    dword ptr [ebp-85Ch]
694575ac 50              push    eax
694575ad ff1558114569    call    dword ptr [faultrep!_imp__CreateProcessW (69451158)]


The second parameter of CreateProcess, [ebp-85Ch], is the address of the process command line and we know EBP value from the call stack above, 0012dd68. Just to remind that parameters for Win32 API functions are pushed from right to left. So we get the command line straight away:

0:000> dpu 0012dd68-85Ch l1
0012d50c  0012d3ec "C:\WINDOWS\system32\dwwin.exe -x -s 208"

If we dismiss WER dialog by clicking on Debug button then the postmortem debugger starts. Also it starts without WER dialog displayed if we rename faultrep.dll beforehand. So it looks like the obvious place to look for postmortem debugger launch is UnhandledExceptionFilter. Indeed, we see it there:

0:000> uf kernel32!UnhandledExceptionFilter
...
...
...
7c8636a8 8d850cfaffff    lea     eax,[ebp-5F4h]
7c8636ae 50              push    eax
7c8636af 8d857cf9ffff    lea     eax,[ebp-684h]
7c8636b5 50              push    eax
7c8636b6 33c0            xor     eax,eax
7c8636b8 50              push    eax
7c8636b9 50              push    eax
7c8636ba 50              push    eax
7c8636bb 6a01            push    1
7c8636bd 50              push    eax
7c8636be 50              push    eax
7c8636bf 53              push    ebx
7c8636c0 50              push    eax
7c8636c1 e86cecf9ff      call    kernel32!CreateProcessW (7c802332)


Because this is the code that yet to be executed we need to put a breakpoint at 7c8636c1, continue execution, and when the breakpoint is hit dump the second parameter to CreateProcess that is the memory EBX points to:

0:000> bp 7c8636c1
0:000> g
Breakpoint 0 hit
eax=00000000 ebx=0012ed78 ecx=0012ec70 edx=7c90eb94 esi=0000003a edi=00000000
eip=7c8636c1 esp=0012ed50 ebp=0012f47c iopl=0 nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000 efl=00000246
kernel32!UnhandledExceptionFilter+0x84b:
7c8636c1 e86cecf9ff      call    kernel32!CreateProcessW (7c802332)
0:000> du @ebx
0012ed78  “c:\drwatson\drwtsn32 -p 656 -e 1″
0012edb8  “72 -g”

You see that UnhandledExceptionFilter is about to launch my custom Dr. Watson postmortem debugger.  

If you look further at UnhandledExceptionFilter disassembled code you would see that after creating postmortem debugger process and waiting for it to finish saving the process dump the function calls NtTerminateProcess.

Therefore all error reporting, calling the postmortem debugger and final process termination are done in the same process that had the exception. The latter two parts are also described in Matt Pietrek’s article as I found afterwards.

One addition to that article written in 1997: starting from Windows XP UnhandledExceptionFilter locates and loads faultrep.dll which launches dwwin.exe to report an error:

kernel32!UnhandledExceptionFilter+0x4f7:
7c863359 8d85acfaffff    lea     eax,[ebp-554h]
7c86335f 50              push    eax
7c863360 8d8570faffff    lea     eax,[ebp-590h]
7c863366 50              push    eax
7c863367 56              push    esi
7c863368 56              push    esi
7c863369 e8fbacfaff      call    kernel32!LdrLoadDll (7c80e069)
0:000> dt _UNICODE_STRING 0012f47c-590
 "C:\WINDOWS\system32\faultrep.dll"
   +0x000 Length           : 0x40
   +0x002 MaximumLength    : 0x100
   +0x004 Buffer           : 0x0012f360  "C:\WINDOWS\system32\faultrep.dll"

You can also see that all processing is done using the same thread stack. So if something is wrong with that stack then you have silent process termination and no error is reported. In Vista there are some improvements that I’m going to cover in detail in a separate post.

- Dmitry Vostokov -

Interrupts and exceptions explained (Part 1)

Saturday, April 28th, 2007

So far we have seen various bugchecks depicted. What I left there is the explanation of how exceptions happen in the first place and how the execution flow reaches KiDispatchException. When some abnormal condition happens such as breakpoint, division by zero or memory protection violation then normal CPU execution flow (code stream) is interrupted (therefore I use the terms “interrupt” and “exception” interchangeably here). The type of interrupt is specified by a number called interrupt vector number. Obviously CPU has to transfer execution to some procedure in memory to handle that interrupt. CPU has to find that procedure, theoretically either having one procedure for all interrupts and specifying interrupt vector number as a parameter or having a table containing pointers to various procedures that correspond to different interrupt vectors. Intel x86 and x64 CPUs use the latter approach which is depicted on the following diagram:

 

When an exception happens (divide by zero, for example) CPU gets the address of the procedure table from IDTR (Interrupt Descriptor Table Register). This IDT (Interrupt Descriptor Table) is a zero-based array of 8-byte descriptors (x86) or 16-byte descriptors (x64). CPU calculates the location of the necessary procedure to call and does some necessary steps like saving appropriate registers before calling it.

The same happens when some external I/O device interrupts. We will talk about external interrupts later. For I/O devices the term “interrupt” is more appropriate.  On the picture above I/O hardware interrupt vector numbers were taken from some dump. These are OS and user-defined numbers. The first 32 vectors are reserved by Intel. 

Before Windows switches CPU to protected mode during boot process it creates IDT table in memory and sets IDTR to point to it by using SIDT instruction.

Let me now illustrate this by using a UML class diagram annotated by pseudo-code that shows what CPU does before calling the appropriate procedure. The pseudo-code on the diagram below is valid for interrupts and exceptions happening when the current CPU execution mode is kernel. For interrupts and exceptions generated when CPU executes code in user mode the picture is a little more complicated because the processor has to switch the current user-mode stack to kernel mode stack.

The following diagram is for 32-bit x86 processor (x64 will be illustrated later):

Let’s see all this in some dump. The address of IDT can be found by using !pcr [processor number] command. Every processor on a multiprocessor machine has its own IDT:

0: kd> !pcr 0
KPCR for Processor 0 at ffdff000:
    Major 1 Minor 1
 NtTib.ExceptionList: f2178b8c
     NtTib.StackBase: 00000000
    NtTib.StackLimit: 00000000
  NtTib.SubSystemTib: 80042000
       NtTib.Version: 0005c645
   NtTib.UserPointer: 00000001
       NtTib.SelfTib: 7ffdf000
             SelfPcr: ffdff000
                Prcb: ffdff120
                Irql: 0000001f
                 IRR: 00000000
                 IDR: ffffffff
       InterruptMode: 00000000
                 IDT: 8003f400
                 GDT: 8003f000
                 TSS: 80042000
       CurrentThread: 88c1d3c0
          NextThread: 00000000
          IdleThread: 808a68c0
           DpcQueue:

Every entry in IDT has the type _KIDTENTRY and we can get the first entry for divide by zero exception which has vector number 0:

0: kd> dt _KIDTENTRY 8003f400
   +0x000 Offset           : 0x47ca
   +0x002 Selector         : 8
   +0x004 Access           : 0x8e00
   +0x006 ExtendedOffset   : 0x8083

By gluing together ExtendedOffset and Offset fields we get the address of the interrupt handling procedure (0×808347ca) which is KiTrap00:

0: kd> ln 0x808347ca
(808347ca)   nt!KiTrap00   |  (808348a5)   nt!Dr_kit1_a
Exact matches:
    nt!KiTrap00

We can also see the interrupt trace in raw stack. For example, we can open the kernel dump and see the following stack trace and registers in the output of !analyze -v command:

ErrCode = 00000000
eax=00001b00 ebx=00001b00 ecx=00000000 edx=00000000 esi=f2178cb4 edi=bc15a838
eip=bf972586 esp=f2178c1c ebp=f2178c90 iopl=0 nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293
driver!foo+0xf9:
bf972586 f77d10 idiv eax,dword ptr [ebp+10h] ss:0010:f2178ca0=00000000
STACK_TEXT:
f2178b44 809989be nt!KeBugCheck+0×14
f2178b9c 8083484f nt!Ki386CheckDivideByZeroTrap+0×41
f2178b9c bf972586 nt!KiTrap00+0×88
f2178c90 bf94c23c driver!foo+0xf9
f2178d54 80833bdf driver!bar+0×11c

The dumping memory around ESP value (f2178c1c) shows the values processor pushes when divide by zero exception happens:

0: kd> dds f2178c1c-100 f2178c1c+100
...
...
...
f2178b80  00000000
f2178b84  f2178b50
f2178b88  00000000
f2178b8c  f2178d44
f2178b90  8083a8cc nt!_except_handler3
f2178b94  80870828 nt!`string'+0xa4
f2178b98  ffffffff
f2178b9c  f2178ba8
f2178ba0  8083484f nt!KiTrap00+0x88
f2178ba4  f2178ba8
f2178ba8  f2178c90
f2178bac  bf972586 driver!foo+0xf9
f2178bb0  badb0d00
f2178bb4  00000000
f2178bb8  0000006d
f2178bbc  bf842315
f2178bc0  f2178c6c
f2178bc4  f2178bec
f2178bc8  bf844154
f2178bcc  f2178c88
f2178bd0  f2178c88
f2178bd4  00000000
f2178bd8  bc150000
f2178bdc  f2170023
f2178be0  f2170023
f2178be4  00000000
f2178be8  00000000
f2178bec  00001b00
f2178bf0  f2178c0c
f2178bf4  f2178d44
f2178bf8  f2170030
f2178bfc  bc15a838
f2178c00  f2178cb4
f2178c04  00001b00
f2178c08  f2178c90
f2178c0c  00000000 ; ErrorCode
f2178c10  bf972586 driver!foo+0xf9 ; EIP
f2178c14  00000008 ; CS
f2178c18  00010293 ; EFLAGS

f2178c1c  00000000 ; <- ESP before interrupt
f2178c20  0013c220
f2178c24  00000000
f2178c28  60000000
f2178c2c  00000001
f2178c30  00000000
f2178c34  00000000


ErrorCode is not the same as interrupt vector number although it is the same number here (0). I won’t cover interrupt error codes here. If you are interested please consult Intel Architecture Software Developer’s Manual.

If we try to execute !idt extension command it will show you only user-defined hardware interrupt vectors:

0: kd> !idt
Dumping IDT:
37: 80a7d1ac hal!PicSpuriousService37
50: 80a7d284 hal!HalpApicRebootService
51: 89495044 serial!SerialCIsrSw (KINTERRUPT 89495008)
52: 89496044 i8042prt!I8042MouseInterruptService (KINTERRUPT 89496008)
53: 894ea044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894ea008)
63: 894f2044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894f2008)
72: 89f59044 atapi!IdePortInterrupt (KINTERRUPT 89f59008)
73: 89580044 NDIS!ndisMIsr (KINTERRUPT 89580008)
83: 899e7824 NDIS!ndisMIsr (KINTERRUPT 899e77e8)
92: 89f56044 atapi!IdePortInterrupt (KINTERRUPT 89f56008)
93: 89f1e044 SCSIPORT!ScsiPortInterrupt (KINTERRUPT 89f1e008)
a3: 894fa044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894fa008)
a4: 894a3044 cpqcidrv+0×22AE (KINTERRUPT 894a3008)
b1: 89f697dc ACPI!ACPIInterruptServiceRoutine (KINTERRUPT 89f697a0)
b3: 89498824 i8042prt!I8042KeyboardInterruptService (KINTERRUPT 894987e8)
b4: 894a2044 cpqasm2+0×5B99 (KINTERRUPT 894a2008)
c1: 80a7d410 hal!HalpBroadcastCallService
d1: 80a7c754 hal!HalpClockInterrupt
e1: 80a7d830 hal!HalpIpiHandler
e3: 80a7d654 hal!HalpLocalApicErrorService
fd: 80a7e11c hal!HalpProfileInterrupt

In the next part I’m going to cover x64 specifics. 

- Dmitry Vostokov -

Bugchecks: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED

Wednesday, April 25th, 2007

Another bugcheck that is similar to KMODE_EXCEPTION_NOT_HANDLED and KERNEL_MODE_EXCEPTION_NOT_HANDLED is SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (0×7E).

This bugcheck happens when you have an exception in a system thread and there is no exception handler to catch it, i.e. no __try/__except. System threads are created by calling PsCreateSystemThread function. Here is its description from DDK:

The PsCreateSystemThread routine creates a system thread that executes in kernel mode and returns a handle for the thread.

By default PspUnhandledExceptionInSystemThread function is set as a default exception handler and its purpose is to call KeBugCheckEx.

The typical call stack in dumps with 7E bugcheck is:

1: kd> k
ChildEBP RetAddr
f70703cc 809a1eb2 nt!KeBugCheckEx+0x1b (FPO: [Non-Fpo])
f70703e8 80964a94 nt!PspUnhandledExceptionInSystemThread+0x1a
f7070ddc 80841a96 nt!PspSystemThreadStartup+0x56
00000000 00000000 nt!KiThreadStartup+0x16

To see how this bugcheck is generated from processor trap we need to look at raw stack. Let’s look at some example. !analyze -v command gives us the following output:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: f69d9dd7, The address that the exception occurred at
Arg3: f70708c0, Exception Record Address
Arg4: f70705bc, Context Record Address
EXCEPTION_RECORD:  f70708c0 -- (.exr fffffffff70708c0)
ExceptionAddress: f69d9dd7 (driver+0x00014dd7)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 8a784020
   Parameter[2]: 00000044
CONTEXT:  f70705bc -- (.cxr fffffffff70705bc)
eax=00000001 ebx=00000032 ecx=8a37c000 edx=00000044 esi=8a37c000 edi=000000c7
eip=f69d9dd7 esp=f7070988 ebp=00004000 iopl=0 nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000 efl=00000246
driver+0x14dd7:
f69d9dd7 cc              int     3
STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
f7070998 f69dfbd1 80800000 00000000 8a37c000 driver+0x14dd7
00000000 00000000 00000000 00000000 00000000 driver+0x1abd1

We see that the thread encountered the breakpoint instruction and we also see that the call stack that led to breakpoint exception is incomplete. Here we must dump the raw stack data and try to reconstruct the stack manually. System threads are started with the execution of KiThreadStartup function. So let’s dump the stack starting from ESP register and up to some value, find startup function there and try to walk EBP chain:

1: kd> dds esp esp+1000
...
...
...
f7070a04  f7070a40
f7070a08  f71172ec NDIS!ndisMCommonHaltMiniport+0×375
f7070a0c  8a37c000
f7070a10  8a60f5c0
f7070a14  8a610748
f7070a18  8a610748
f7070a1c  00000058
f7070a20  00000005
f7070a24  00000004
f7070a28  f7527000
f7070a2c  00000685
f7070a30  f710943a NDIS!ndisMDummyIndicatePacket
f7070a34  00000000
f7070a38  8a610778
f7070a3c  00010748
f7070a40  f7070b74
f7070a44  f7117640 NDIS!ndisMHaltMiniport+0×21
f7070a48  8a610990

… (intermediate frames are omitted to save space)

f7070d7c  e104a338
f7070d80  f7070dac
f7070d84  8083f72e nt!ExpWorkerThread+0xeb
f7070d88  894d0ed8
f7070d8c  00000000
f7070d90  8a784020
f7070d94  00000000
f7070d98  00000000
f7070d9c  00000000
f7070da0  00000001
f7070da4  00000000
f7070da8  808f0cb7 nt!PiWalkDeviceList
f7070dac  f7070ddc
f7070db0  8092ccff nt!PspSystemThreadStartup+0×2e
f7070db4  894d0ed8
f7070db8  00000000
f7070dbc  00000000
f7070dc0  00000000
f7070dc4  f7070db8
f7070dc8  f7070410
f7070dcc  ffffffff
f7070dd0  8083b9bc nt!_except_handler3
f7070dd4  808408f8 nt!ObWatchHandles+0×5f4
f7070dd8  00000000
f7070ddc  00000000
f7070de0  80841a96 nt!KiThreadStartup+0×16
f7070de4  8083f671 nt!ExpWorkerThread

The last EBP value we are able to get is f7070a04 and we can specify it to the extended version of k command together with ESP and EIP value of crash point:

1: kd> k L=f7070a04 f7070988 f69d9dd7
ChildEBP RetAddr
f7070998 f69dfbd1 driver+0x14dd7
f7070a40 f7117640 NDIS!ndisMCommonHaltMiniport+0x375
f7070a48 f711891a NDIS!ndisMHaltMiniport+0x21
f7070b74 f71196e5 NDIS!ndisPnPRemoveDevice+0x189
f7070ba4 8083f9d0 NDIS!ndisPnPDispatch+0x15d
f7070bb8 808f6a25 nt!IofCallDriver+0x45
f7070be4 808e20b5 nt!IopSynchronousCall+0xbe
f7070c38 8080beae nt!IopRemoveDevice+0x97
f7070c60 808e149b nt!IopRemoveLockedDeviceNode+0x160
f7070c78 808e18cc nt!IopDeleteLockedDeviceNode+0x50
f7070cac 808e1732 nt!IopDeleteLockedDeviceNodes+0x3f
f7070d40 808e19b6 nt!PiProcessQueryRemoveAndEject+0x7ad
f7070d5c 808e7879 nt!PiProcessTargetDeviceEvent+0x2a
f7070d80 8083f72e nt!PiWalkDeviceList+0x1d2
f7070dac 8092ccff nt!ExpWorkerThread+0xeb
f7070ddc 80841a96 nt!PspSystemThreadStartup+0x2e
00000000 00000000 nt!KiThreadStartup+0x16

This was the execution path before the exception. However when int 3 instruction was hit then processor generated a trap with interrupt vector 3 (4th entry in interrupt descriptor table, zero-based) and the corresponding function in IDT, KiTrap03, was called. Therefore we need to find this function on the same raw stack and try to build stack trace for our exception handling code path:

1: kd> dds esp esp+1000
f70703b4  0000007e
f70703b8  80000003
f70703bc  f69d9dd7 driver+0x14dd7
f70703c0  f70708c0
f70703c4  f70705bc
f70703c8  00000000
f70703cc  f70703e8
f70703d0  809a1eb2 nt!PspUnhandledExceptionInSystemThread+0x1a
f70703d4  0000007e
...
...
...
f7070418 f707043c
f707041c 8083b578 nt!ExecuteHandler2+0×26
f7070420 f70708c0
f7070424 f7070dcc
f7070428 f70705bc
f707042c f70704d8
f7070430 f7070894
f7070434 8083b58c nt!ExecuteHandler2+0×3a
f7070438 f7070dcc
f707043c f70704ec
f7070440 8083b54a nt!ExecuteHandler+0×24
f7070444 f70708c0

… (intermediate frames are omitted to save space)

f7070890  f7070410
f7070894  f7070dcc
f7070898  8083b9bc nt!_except_handler3
f707089c  80850c10 nt!`string’+0×10c
f70708a0  ffffffff
f70708a4  f7070914
f70708a8  808357a4 nt!CommonDispatchException+0×4a
f70708ac  f70708c0
f70708b0  00000000
f70708b4  f7070914
f70708b8  00000000
f70708bc  00000001
f70708c0  80000003
f70708c4  00000000
f70708c8  00000000
f70708cc  f69d9dd7 driver+0×14dd7
f70708d0  00000003
f70708d4  00000000
f70708d8  8a784020
f70708dc  00000044
f70708e0  ffffffff
f70708e4  00000001
f70708e8  f7737a7c
f70708ec  f7070934
f70708f0  8083f164 nt!KeWaitForSingleObject+0×346
f70708f4  000000c7
f70708f8  00000000
f70708fc  00000031
f7070900  00000000
f7070904  f707091c
f7070908  80840569 nt!KeSetTimerEx+0×179
f707090c  000000c7
f7070910  80835f39 nt!KiTrap03+0xb0
f7070914  00004000

The last EBP value we are able to get is f7070418 so the following stack trace emerges albeit not accurate as it doesn’t show that KeBugCheckEx was called from PspUnhandledExceptionInSystemThread as can be seen from disassembly. However it does show that the main function that orchestrates stack unwinding is KiDispatchException:

1: kd> k L=f7070418
ChildEBP RetAddr
f7070418 8083b578 nt!KeBugCheckEx+0x1b
f707043c 8083b54a nt!ExecuteHandler2+0x26
f70704ec 80810664 nt!ExecuteHandler+0x24
f70708a4 808357a4 nt!KiDispatchException+0x131
f707090c 80835f39 nt!CommonDispatchException+0x4a
f707090c f69d9dd8 nt!KiTrap03+0xb0
1: kd> uf nt!PspUnhandledExceptionInSystemThread
nt!PspUnhandledExceptionInSystemThread:
809a1e98 8bff            mov     edi,edi
809a1e9a 55              push    ebp
809a1e9b 8bec            mov     ebp,esp
809a1e9d 8b4d08          mov     ecx,dword ptr [ebp+8]
809a1ea0 ff7104          push    dword ptr [ecx+4]
809a1ea3 8b01            mov     eax,dword ptr [ecx]
809a1ea5 50              push    eax
809a1ea6 ff700c          push    dword ptr [eax+0Ch]
809a1ea9 ff30            push    dword ptr [eax]
809a1eab 6a7e            push    7Eh
809a1ead e8f197edff      call    nt!KeBugCheckEx (8087b6a3)

At the end to illustrate this bugcheck I created the following UML sequence diagram (trap processing that leads to KiDispatchException will be depicted in another post):

- Dmitry Vostokov -

Bugchecks: KMODE_EXCEPTION_NOT_HANDLED

Wednesday, April 25th, 2007

This bugcheck (0×1E) is essentially the same as KERNEL_MODE_EXCEPTION_NOT_HANDLED (0×8E) although parameters are different:

KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8046ce72, The address that the exception occurred at
Arg3: 00000000, Parameter 0 of the exception
Arg4: 00000000, Parameter 1 of the exception

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address. Some common problems are exception code 0×80000003. This means a hard coded breakpoint or assertion was hit, but this system was booted /NODEBUG. This is not supposed to happen as developers should never have hardcoded breakpoints in retail code, but … If this happens, make sure a debugger gets connected, and the system is booted /DEBUG. This will let us see why this breakpoint is happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 808cbb8d, The address that the exception occurred at
Arg3: f5a84638, Trap Frame
Arg4: 00000000

Bugcheck 0×1E is called from the same routine KiDispatchException on x64 W2K3 and on x86 W2K whereas 0×8E is called on x86 W2K3 and Vista platforms. I haven’t checked this with x64 Vista. Here is the modified UML diagram showing both bugchecks:

    

- Dmitry Vostokov -

Crash Dump Analysis Poster v1.1 (HTML version)

Sunday, April 22nd, 2007

Here is an HTML version of Crash Dump Analysis Poster with hyperlinks. Command links launch WinDbg Help for corresponding topic. If you click on !heap, for example, WinDbg Help window for that command will open. In order to have this functionality you need to save source code of the following HTML file below to your disk and launch it locally.

http://www.dumpanalysis.org/CDAPoster.html

Your WinDbg Help file must be in the default installation path, i.e.

C:\Program Files\Debugging Tools for Windows\debugger.chm

If you installed WinDbg to a different folder then you can simply create the default folder and copy debugger.chm there.

I keep this HTML file open locally on a second monitor and found it very easy to jump to an appropriate command help when I need its parameter description.

This HTML poster was created and edited in Notepad.

I’m working on the second version and will announce it as soon as it is ready.

- Dmitry Vostokov -

Bugchecks: KERNEL_MODE_EXCEPTION_NOT_HANDLED

Sunday, April 22nd, 2007

Here is the next depicted bugcheck: 0×8E. It is very common in kernel crash dumps and it means that:

  1. If access violation exception happened the read or write address was in user space

  2. Frame-based exception handling was allowed, kernel debugger (if any) didn’t handle the exception (first chance), then no exception handlers were willing to process the exception and at last kernel debugger (if any) didn’t handle the exception (second chance)

  3. Frame-based exception handling wasn’t allowed and kernel debugger (if any) didn’t handle the exception

The second option is depicted on the following UML sequence diagram:

Note: if you have an access violation and read or write address is in kernel space you get a different bugcheck as explained in Invalid Pointer Pattern 

I assumed that you know about structured and frame based exception handling (SEH). If you don’t know how it is implemented please read Matt Pietrek’s article: A Crash Course on the Depths of Win32 Structured Exception Handling

References used:

  1. “Windows NT/2000 Native API Reference” book by Gary Nebbett
  2. Local kernel debugging on Windows XP to check that the flow on the diagram above is correct

- Dmitry Vostokov -

More WinDbg Scripts

Saturday, April 21st, 2007

The good collection of WinDbg scripts was posted this month on Roberto Farah’s blog. Most of his scripts use DML (Debugger Markup Language). I also looked over my previous posts and put my old crash dump analysis scripts into one collection.

Found one problem. I was writing a script yesterday to call some external programs and found that I couldn’t easily pass parameters to the script. I mean something like this would be great:

$$><myscript.txt param1 param2

and I could possibly reference parameters inside the script perhaps by some pseudo-register. The current approach is to set up pseudo-registers before running the script and I don’t like it. I’ll post a better solution if I find it.

- Dmitry Vostokov -