Software Diagnostics Library

Resurrecting Dr. Watson on Vista

May 19th, 2007

Feeling nostalgic about pre-Vista times I recalled that one month before upgrading my Windows XP to Vista I saved the copy of Dr. Watson (drwtsn32.exe). Of course, during upgrade, drwtsn32.exe was removed from system32 folder. Now I copied it back and set it as the default postmortem debugger from the elevated command prompt:

When I looked at the registry I found the correctly set key values:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug Debugger=drwtsn32 -p %ld -e %ld -g Auto=1

Auto=1 means do not show the error message box, just go ahead and dump the process. Actually with Auto=0 Dr. Watson doesn’t work on my Vista.

Also I configured Dr. Watson to store the log and full user dump in c:\DrWatson folder by running drwtsn32.exe from the same elevated command prompt:

Next I launched TestDefaultDebugger and hit the big crash button. Access violation happened and I saw familiar “Program Error” message box:

The log was created and the user dump was saved in the specified folder. All subsequent crashes were appended to the log and user.dmp was updated. When I opened the dump in WinDbg I got the following output:

Loading Dump File [C:DrWatsonuser.dmp] User Mini Dump File with Full Memory: Only application data is available Comment: ‘Dr. Watson generated MiniDump’ Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols Executable search path is: Windows Vista Version 6000 UP Free x86 compatible Product: WinNt, suite: SingleUserTS Debug session time: Sat May 19 20:52:23.000 2007 (GMT+1) System Uptime: 5 days 20:00:04.062 Process Uptime: 0 days 0:00:03.000 This dump file has an exception of interest stored in it. The stored exception information can be accessed via .ecxr. (1f70.1e0c): Access violation - code c0000005 (first/second chance not available) eax=00000000 ebx=00000001 ecx=0012fe70 edx=00000000 esi=00425ae8 edi=0012fe70 eip=004014f0 esp=0012f8a8 ebp=0012f8b4 iopl=0 nv up ei ng nz ac pe cy cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010297 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1: 004014f0 c7050000000000000000 mov dword ptr ds:[0],0 ds:0023:00000000=???????

Therefore I believe that if I saved ntsd.exe before upgrading to Vista I would have been able to set it as a default postmortem debugger too.

- Dmitry Vostokov -

Posted in Crash Dump Analysis, Tools, Vista | 8 Comments »

Inside Vista Error Reporting (Part 1)

May 19th, 2007

This is a follow up to the post about postmortem debuggers on Windows XP/W2K3. Now we look inside the same mechanism on Vista. After launching TestDefaultDebugger and pushing its crash button we get the following Windows error reporting dialog:

If we attach WinDbg to our TestDefaultDebugger process we would no longer see our default unhandled exception filter waiting for the error reporting process:

Windows XP

0:000> k ChildEBP RetAddr 0012d318 7c90e9ab ntdll!KiFastSystemCallRet 0012d31c 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc 0012d3b8 7c80a075 kernel32!WaitForMultipleObjectsEx+0×12c 0012d3d4 6945763c kernel32!WaitForMultipleObjects+0×18 0012dd68 694582b1 faultrep!StartDWException+0×5df 0012eddc 7c8633e9 faultrep!ReportFault+0×533 0012f47c 00411eaa kernel32!UnhandledExceptionFilter+0×587 0012f8a4 00403263 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1 0012f8b4 00403470 TestDefaultDebugger!_AfxDispatchCmdMsg+0×43 … … … 0012fff0 00000000 kernel32!BaseProcessStart+0×23

Windows Vista

0:001> ~*kL 100 0 Id: 120c.148c Suspend: 1 Teb: 7ffdf000 Unfrozen ChildEBP RetAddr 0012f8a4 00403263 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1 0012f8b4 00403470 TestDefaultDebugger!_AfxDispatchCmdMsg+0×43 0012f8e4 00402a27 TestDefaultDebugger!CCmdTarget::OnCmdMsg+0×118 0012f908 00408e69 TestDefaultDebugger!CDialog::OnCmdMsg+0×1b 0012f958 004098d9 TestDefaultDebugger!CWnd::OnCommand+0×90 0012f9f4 00406258 TestDefaultDebugger!CWnd::OnWndMsg+0×36 0012fa14 0040836d TestDefaultDebugger!CWnd::WindowProc+0×22 0012fa7c 004083f4 TestDefaultDebugger!AfxCallWndProc+0×9a 0012fa9c 77b71a10 TestDefaultDebugger!AfxWndProc+0×34 0012fac8 77b71ae8 USER32!InternalCallWinProc+0×23 0012fb40 77b7286a USER32!UserCallWinProcCheckWow+0×14b 0012fb80 77b72bba USER32!SendMessageWorker+0×4b7 0012fba0 7504e5cc USER32!SendMessageW+0×7c 0012fbc0 7504e583 COMCTL32!Button_NotifyParent+0×3d 0012fbdc 7504e680 COMCTL32!Button_ReleaseCapture+0×112 0012fc34 77b71a10 COMCTL32!Button_WndProc+0xa4b 0012fc60 77b71ae8 USER32!InternalCallWinProc+0×23 0012fcd8 77b72a47 USER32!UserCallWinProcCheckWow+0×14b 0012fd3c 77b72a98 USER32!DispatchMessageWorker+0×322 0012fd4c 77b6120c USER32!DispatchMessageW+0xf 0012fd70 0040568b USER32!IsDialogMessageW+0×586 0012fd80 004065d8 TestDefaultDebugger!CWnd::IsDialogMessageW+0×2e 0012fd88 00402a07 TestDefaultDebugger!CWnd::PreTranslateInput+0×29 0012fd98 00408041 TestDefaultDebugger!CDialog::PreTranslateMessage+0×96 0012fda8 00403ae3 TestDefaultDebugger!CWnd::WalkPreTranslateTree+0×1f 0012fdbc 00403c1e TestDefaultDebugger!AfxInternalPreTranslateMessage+0×3b 0012fdc4 00403b29 TestDefaultDebugger!CWinThread::PreTranslateMessage+0×9 0012fdcc 00403c68 TestDefaultDebugger!AfxPreTranslateMessage+0×15 0012fddc 00407920 TestDefaultDebugger!AfxInternalPumpMessage+0×2b 0012fe00 004030a1 TestDefaultDebugger!CWnd::RunModalLoop+0xca 0012fe4c 0040110d TestDefaultDebugger!CDialog::DoModal+0×12c 0012fef8 004206fb TestDefaultDebugger!CTestDefaultDebuggerApp::InitInstance+0xdd 0012ff08 0040e852 TestDefaultDebugger!AfxWinMain+0×47 0012ffa0 77603833 TestDefaultDebugger!__tmainCRTStartup+0×176 0012ffac 779ea9bd kernel32!BaseThreadInitThunk+0xe 0012ffec 00000000 ntdll!_RtlUserThreadStart+0×23 # 1 Id: 120c.17e4 Suspend: 1 Teb: 7ffde000 Unfrozen ChildEBP RetAddr 011cff70 77a3f0a9 ntdll!DbgBreakPoint 011cffa0 77603833 ntdll!DbgUiRemoteBreakin+0×3c 011cffac 779ea9bd kernel32!BaseThreadInitThunk+0xe 011cffec 00000000 ntdll!_RtlUserThreadStart+0×23

Let’s look at the faulting thread’s raw stack data:

0:001> ~0 s eax=00000000 ebx=00000001 ecx=0012fe70 edx=00000000 esi=00425ae8 edi=0012fe70 eip=004014f0 esp=0012f8a8 ebp=0012f8b4 iopl=0 nv up ei ng nz ac pe cy cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010297 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1: 004014f0 mov dword ptr ds:[0],0 ds:0023:00000000=???????? 0:000> !teb TEB at 7ffdf000 ExceptionList: 0012f9e8 StackBase: 00130000 StackLimit: 0012d000 SubSystemTib: 00000000 FiberData: 00001e00 ArbitraryUserPointer: 00000000 Self: 7ffdf000 EnvironmentPointer: 00000000 ClientId: 0000120c . 0000148c RpcHandle: 00000000 Tls Storage: 7ffdf02c PEB Address: 7ffda000 LastErrorValue: 0 LastStatusValue: c000008a Count Owned Locks: 0 HardErrorMode: 0 0:000>dds 0012d000 00130000 … … … 0012f368 0012f3c0 0012f36c 7760fb01 kernel32!GetApplicationRecoveryCallback+0×33 0012f370 ffffffff 0012f374 0012f380 0012f378 00000001 0012f37c 00000000 0012f380 00000000 0012f384 00000000 0012f388 00000000 0012f38c 00000000 0012f390 00000000 0012f394 00000000 0012f398 00000000 0012f39c 00000000 0012f3a0 00000000 0012f3a4 00000000 0012f3a8 00000000 0012f3ac 00000000 0012f3b0 00000000 0012f3b4 00000000 0012f3b8 00000000 0012f3bc 00000000 0012f3c0 0012f410 0012f3c4 7767aa88 kernel32!WerpReportExceptionInProcessContext+0×82 0012f3c8 ffffffff 0012f3cc 0012f3ec 0012f3d0 00000000 0012f3d4 00000000 0012f3d8 7767aab7 kernel32!WerpReportExceptionInProcessContext+0xa7 0012f3dc 001257b9 0012f3e0 00000001 0012f3e4 00000000 0012f3e8 0012f4c8 0012f3ec 00000000 0012f3f0 00000000 0012f3f4 00000000 0012f3f8 0012f3dc 0012f3fc ffffffff 0012f400 0012f488 0012f404 775d5ac9 kernel32!_except_handler4 0012f408 77670969 kernel32!Internal_NotifyUILanguageChange+0×4a6 0012f40c fffffffe 0012f410 7767aab7 kernel32!WerpReportExceptionInProcessContext+0xa7 0012f414 77655b41 kernel32!UnhandledExceptionFilter+0×1b2 0012f418 77655cbd kernel32!UnhandledExceptionFilter+0×32e 0012f41c 00125731 0012f420 00000000 0012f424 0012f4c8 0012f428 00000000 0012f42c 00000000 0012f430 00000000 0012f434 00000000 0012f438 00000000 0012f43c 00000800 0012f440 00000000 0012f444 00000000 0012f448 00000000 0012f44c 00000000 0012f450 00000000 0012f454 00000005 0012f458 994ac7c4 0012f45c 00000011 0012f460 00000000 0012f464 0012f5c0 0012f468 775d5ac9 kernel32!_except_handler4 0012f46c 00000001 0012f470 00000000 0012f474 77655cbd kernel32!UnhandledExceptionFilter+0×32e 0012f478 00000000 0012f47c 00000000 0012f480 0012f41c 0012f484 00000024 0012f488 0012f4f4 0012f48c 775d5ac9 kernel32!_except_handler4 0012f490 7765ff59 kernel32!PEWriteResource<_IMAGE_NT_HEADERS>+0×50a 0012f494 fffffffe 0012f498 77655cbd kernel32!UnhandledExceptionFilter+0×32e 0012f49c 77a29f8e ntdll!_RtlUserThreadStart+0×6f 0012f4a0 00000000 0012f4a4 779b8dd4 ntdll!_EH4_CallFilterFunc+0×12 0012f4a8 00000000 0012f4ac 0012ffec 0012f4b0 779ff108 ntdll! ?? ::FNODOBFM::`string’+0xb6e 0012f4b4 0012f4dc 0012f4b8 779b40e4 ntdll!_except_handler4+0xcc 0012f4bc 00000000 0012f4c0 00000000 0012f4c4 00000000 0012f4c8 0012f5c0 0012f4cc 0012f5dc 0012f4d0 779ff118 ntdll! ?? ::FNODOBFM::`string’+0xb7e 0012f4d4 00000001 0012f4d8 0112f5c0 0012f4dc 0012f500 0012f4e0 77a11039 ntdll!ExecuteHandler2+0×26 0012f4e4 fffffffe 0012f4e8 0012ffdc 0012f4ec 0012f5dc 0012f4f0 0012f59c 0012f4f4 0012f9e8 0012f4f8 77a1104d ntdll!ExecuteHandler2+0×3a 0012f4fc 0012ffdc 0012f500 0012f5a8 0012f504 77a1100b ntdll!ExecuteHandler+0×24 0012f508 0012f5c0 0012f50c 0012ffdc 0012f510 0012fe70 0012f514 0012f59c 0012f518 779b8bf2 ntdll!_except_handler4 0012f51c 00000000 0012f520 0012f5c0 0012f524 0012f538 0012f528 779b94e3 ntdll!RtlCallVectoredContinueHandlers+0×15 0012f52c 0012f5c0 0012f530 0012f5dc 0012f534 77a754c0 ntdll!RtlpCallbackEntryList 0012f538 0012f5a8 0012f53c 779b94c1 ntdll!RtlDispatchException+0×11f 0012f540 0012f5c0 0012f544 0012f5dc 0012f548 00425ae8 TestDefaultDebugger!CTestDefaultDebuggerApp::`vftable’+0×154 0012f54c 00000000 0012f550 00000502 0012f554 00000000 0012f558 00a460e0 0012f55c 00000000 0012f560 00000000 0012f564 00000070 0012f568 ffffffff 0012f56c ffffffff 0012f570 77b60dba USER32!UserCallDlgProcCheckWow+0×5f 0012f574 77b60e63 USER32!UserCallDlgProcCheckWow+0×16e 0012f578 0000006c 0012f57c 00000000 0012f580 00000000 0012f584 00000000 0012f588 00000000 0012f58c 0000004e 0012f590 00000000 0012f594 0012f634 0012f598 77bb76cc USER32!_except_handler4 0012f59c 0012f634 0012f5a0 00130000 0012f5a4 00000000 0012f5a8 0012f8b4 0012f5ac 77a10060 ntdll!NtRaiseException+0xc 0012f5b0 77a10eb2 ntdll!KiUserExceptionDispatcher+0×2a 0012f5b4 0012f5c0 … … …

It shows the presence of kernel32!UnhandledExceptionFilter calls. Let’s open TestDefaultDebugger.exe in WinDbg, put breakpoint on UnhandledExceptionFilter and trace the execution. We have to change the return value of IsDebugPortPresent to simulate the normal fault handling logic when no active debugger is attached:

0:000> bp kernel32!UnhandledExceptionFilter 0:000> g (fb0.1190): Access violation - code c0000005 (first chance) First chance exceptions are reported before any exception handling. This exception may be expected and handled. eax=00000000 ebx=00000001 ecx=0012fe70 edx=00000000 esi=00425ae8 edi=0012fe70 eip=004014f0 esp=0012f8a8 ebp=0012f8b4 iopl=0 nv up ei ng nz ac pe cy cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010297 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1: 004014f0 mov dword ptr ds:[0],0 ds:0023:00000000=???????? 0:000> g Breakpoint 0 hit eax=0042ae58 ebx=00000000 ecx=0042ae58 edx=0042ae58 esi=003b07d8 edi=c0000005 eip=77655984 esp=0012f478 ebp=0012f494 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246 kernel32!UnhandledExceptionFilter: 77655984 push 5Ch 0:000> g $$ skip first chance exception Breakpoint 0 hit eax=77655984 ebx=00000000 ecx=0012f404 edx=77a10f34 esi=0012f4c8 edi=00000000 eip=77655984 esp=0012f49c ebp=0012ffec iopl=0 nv up ei pl nz na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206 kernel32!UnhandledExceptionFilter: 77655984 push 5Ch 0:000> p eax=77655984 ebx=00000000 ecx=0012f404 edx=77a10f34 esi=0012f4c8 edi=00000000 eip=77655986 esp=0012f498 ebp=0012ffec iopl=0 nv up ei pl nz na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206 kernel32!UnhandledExceptionFilter+0×2: 77655986 push offset kernel32!strcat_s+0×128d (77655cf0) … … … 0:000> p eax=00000000 ebx=0012f4c8 ecx=776558e5 edx=77a10f34 esi=00000000 edi=00000000 eip=77655a33 esp=0012f41c ebp=0012f498 iopl=0 nv up ei pl nz ac po cy cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000213 kernel32!UnhandledExceptionFilter+0xa5: 77655a33 call kernel32!IsDebugPortPresent (7765594c) 0:000> p eax=00000001 ebx=0012f4c8 ecx=0012f3f4 edx=77a10f34 esi=00000000 edi=00000000 eip=77655a38 esp=0012f41c ebp=0012f498 iopl=0 nv up ei pl nz na po nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 kernel32!UnhandledExceptionFilter+0xaa: 77655a38 test eax,eax 0:000> r eax=0 0:000> p eax=00000000 ebx=0012f4c8 ecx=0012f3f4 edx=77a10f34 esi=00000000 edi=00000000 eip=77655a3a esp=0012f41c ebp=0012f498 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246 kernel32!UnhandledExceptionFilter+0xac: 77655a3a jne kernel32!UnhandledExceptionFilter+0×22 (776559a6) [br=0]

Next, we continue to step over using p command until we see WerpReportExceptionInProcessContext function and step into it:

0:000> p eax=c0000022 ebx=0012f4c8 ecx=0012f400 edx=77a10f34 esi=00000000 edi=00000001 eip=77655b3c esp=0012f418 ebp=0012f498 iopl=0 nv up ei pl nz na po nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 kernel32!UnhandledExceptionFilter+0×1ad: 77655b3c call kernel32!WerpReportExceptionInProcessContext (7767aa06) 0:000> t eax=c0000022 ebx=0012f4c8 ecx=0012f400 edx=77a10f34 esi=00000000 edi=00000001 eip=7767aa06 esp=0012f414 ebp=0012f498 iopl=0 nv up ei pl nz na po nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 kernel32!WerpReportExceptionInProcessContext: 7767aa06 push 14h

At this point if we look at the stack trace we would see:

0:000> kL 100 ChildEBP RetAddr 0012f410 77655b41 kernel32!WerpReportExceptionInProcessContext 0012f498 77a29f8e kernel32!UnhandledExceptionFilter+0×1b2 0012f4a0 779b8dd4 ntdll!_RtlUserThreadStart+0×6f 0012f4b4 779b40f0 ntdll!_EH4_CallFilterFunc+0×12 0012f4dc 77a11039 ntdll!_except_handler4+0×8e 0012f500 77a1100b ntdll!ExecuteHandler2+0×26 0012f5a8 77a10e97 ntdll!ExecuteHandler+0×24 0012f5a8 004014f0 ntdll!KiUserExceptionDispatcher+0xf 0012f8a4 00403263 TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1 0012f8b4 00403470 TestDefaultDebugger!_AfxDispatchCmdMsg+0×43 … … …

After that we step over again and find that the code flow returns from all exception handlers until KiUserExceptionDispatcher raises exception again via ZwRaiseException call.

So it looks that the default unhandled exception filter in Vista only reports the exception and doesn’t launch the error reporting process that displays the error box, WerFault.exe.

If we click on Debug button on the error reporting dialog to launch the postmortem debugger (I have Visual Studio Just-In-Time Debugger configured in AeDebug\Debugger registry key) and look at its parent process by using Process Explorer for example, we would see it is WerFault.exe which in turn has svchost.exe as its parent.

Now we quit WinDbg and launch TestDefaultDebugger again, push its big crash button and when the error reporting dialog appears we attach another instance of WinDbg to svchost.exe process hosting Windows Error Reporting Service (wersvc.dll).
We see the following threads:

0:000> ~*k . 0 Id: f8c.f90 Suspend: 1 Teb: 7ffdf000 Unfrozen ChildEBP RetAddr 0008f5b4 77a10080 ntdll!KiFastSystemCallRet 0008f5b8 7760853f ntdll!ZwReadFile+0xc 0008f630 7709ffe2 kernel32!ReadFile+0×20e 0008f65c 7709fdfb ADVAPI32!ScGetPipeInput+0×2a 0008f6c4 7709bdd2 ADVAPI32!ScDispatcherLoop+0×6c 0008f93c 004a241d ADVAPI32!StartServiceCtrlDispatcherW+0xce 0008f944 004a2401 svchost!SvcHostMain+0×12 0008f948 004a2183 svchost!wmain+0×5 0008f98c 77603833 svchost!_initterm_e+0×163 0008f998 779ea9bd kernel32!BaseThreadInitThunk+0xe 0008f9d8 00000000 ntdll!_RtlUserThreadStart+0×23 1 Id: f8c.fa4 Suspend: 1 Teb: 7ffdd000 Unfrozen ChildEBP RetAddr 0086f6d0 77a10690 ntdll!KiFastSystemCallRet 0086f6d4 779cb65b ntdll!ZwWaitForMultipleObjects+0xc 0086f870 77603833 ntdll!TppWaiterpThread+0×294 0086f87c 779ea9bd kernel32!BaseThreadInitThunk+0xe 0086f8bc 00000000 ntdll!_RtlUserThreadStart+0×23 2 Id: f8c.fa8 Suspend: 1 Teb: 7ffdc000 Unfrozen ChildEBP RetAddr 0031f81c 77a0f2c0 ntdll!KiFastSystemCallRet 0031f820 71cb1545 ntdll!NtAlpcSendWaitReceivePort+0xc 0031fd3c 71cb63c4 wersvc!CWerService::LpcServerThread+0×9c 0031fd44 77603833 wersvc!CWerService::StaticLpcServerThread+0xd 0031fd50 779ea9bd kernel32!BaseThreadInitThunk+0xe 0031fd90 00000000 ntdll!_RtlUserThreadStart+0×23 3 Id: f8c.2cc Suspend: 1 Teb: 7ffde000 Unfrozen ChildEBP RetAddr 00f8f768 77a106a0 ntdll!KiFastSystemCallRet 00f8f76c 776077d4 ntdll!NtWaitForSingleObject+0xc 00f8f7dc 77607742 kernel32!WaitForSingleObjectEx+0xbe 00f8f7f0 71cb6f4b kernel32!WaitForSingleObject+0×12 00f8f858 71cb6803 wersvc!CWerService::ReportCrashKernelMsg+0×256 00f8fb7c 71cb6770 wersvc!CWerService::DispatchPortRequestWorkItem+0×70a 00f8fb90 779c1fbb wersvc!CWerService::StaticDispatchPortRequestWorkItem+0×17 00f8fbb4 77a1a2b8 ntdll!TppSimplepExecuteCallback+0×10c 00f8fcdc 77603833 ntdll!TppWorkerThread+0×522 00f8fce8 779ea9bd kernel32!BaseThreadInitThunk+0xe 00f8fd28 00000000 ntdll!_RtlUserThreadStart+0×23 4 Id: f8c.1b38 Suspend: 1 Teb: 7ffdb000 Unfrozen ChildEBP RetAddr 00d3fe08 77a10850 ntdll!KiFastSystemCallRet 00d3fe0c 77a1a1b4 ntdll!NtWaitForWorkViaWorkerFactory+0xc 00d3ff34 77603833 ntdll!TppWorkerThread+0×1f6 00d3ff40 779ea9bd kernel32!BaseThreadInitThunk+0xe 00d3ff80 00000000 ntdll!_RtlUserThreadStart+0×23

First, it looks like some LPC notification mechanism is present here (CWerService::LpcServerThread).
Next, if we look at CWerService::ReportCrashKernelMsg code we would see it calls CWerService::ReportCrash which in turn loads faultrep.dll

0:000> .asm no_code_bytes Assembly options: no_code_bytes 0:000> uf wersvc!CWerService::ReportCrashKernelMsg … … … wersvc!CWerService::ReportCrashKernelMsg+0×226: 71cb6f13 lea eax,[ebp-20h] 71cb6f16 push eax 71cb6f17 push dword ptr [ebp-34h] 71cb6f1a push dword ptr [ebp-2Ch] 71cb6f1d call dword ptr [wersvc!_imp__GetCurrentProcessId (71cb1120)] 71cb6f23 push eax 71cb6f24 mov ecx,dword ptr [ebp-38h] 71cb6f27 call wersvc!CWerService::ReportCrash (71cb7008) 71cb6f2c mov dword ptr [ebp-1Ch],eax 71cb6f2f cmp eax,ebx 71cb6f31 jl wersvc!CWerService::ReportCrashKernelMsg+0×279 (71cb6a10) … … … 0:000> uf wersvc!CWerService::ReportCrash … … … wersvc!CWerService::ReportCrash+0×3d: 71cb7045 mov dword ptr [ebp-4],edi 71cb7048 push offset wersvc!`string’ (71cb711c) 71cb704d call dword ptr [wersvc!_imp__LoadLibraryW (71cb1144)] 71cb7053 mov dword ptr [ebp-2Ch],eax 71cb7056 cmp eax,edi 71cb7058 je wersvc!CWerService::ReportCrash+0×52 (71cb9b47)

wersvc!CWerService::ReportCrash+0×88: 71cb705e push offset wersvc!`string’ (71cb7100) 71cb7063 push eax 71cb7064 call dword ptr [wersvc!_imp__GetProcAddress (71cb1140)] 71cb706a mov ebx,eax 71cb706c cmp ebx,edi 71cb706e je wersvc!CWerService::ReportCrash+0×9a (71cb9b7d) … … … 0:000> du 71cb711c 71cb711c “faultrep.dll” 0:000> da 71cb7100 71cb7100 “WerpInitiateCrashReporting”

If we attach a new instance of WinDbg to WerFault.exe and inspect its threads we would see:

0:003> ~*k 0 Id: 1bfc.16c4 Suspend: 1 Teb: 7ffdf000 Unfrozen ChildEBP RetAddr 0015de60 77a10690 ntdll!KiFastSystemCallRet 0015de64 77607e09 ntdll!ZwWaitForMultipleObjects+0xc 0015df00 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0×11d 0015df54 77b68b83 USER32!RealMsgWaitForMultipleObjectsEx+0×13c 0015df70 6d46d90d USER32!MsgWaitForMultipleObjects+0×1f 0015dfc0 6d4acd77 wer!UtilMsgWaitForMultipleObjects+0×8a 0015dff4 6d4a7694 wer!CInitialConsentUI::Show+0×133 0015e040 6d4a9a69 wer!CEventUI::GetInitialDialogSelection+0xc6 0015e104 6d46df18 wer!CEventUI::Start+0×32 0015e39c 6d46b743 wer!CWatson::ReportProblem+0×438 0015e3ac 6d46b708 wer!WatsonReportSend+0×1e 0015e3c8 6d46b682 wer!CDWInstance::WatsonReportStub+0×17 0015e3ec 6d472a7f wer!CDWInstance::SubmitReport+0×21e 0015e410 730b6d0d wer!WerReportSubmit+0×5d 0015f33c 730b73c1 faultrep!CCrashWatson::GenerateCrashReport+0×5c4 0015f5d4 730b4de1 faultrep!CCrashWatson::ReportCrash+0×374 0015fad4 009bd895 faultrep!WerpInitiateCrashReporting+0×304 0015fb0c 009b60cd WerFault!UserCrashMain+0×14e 0015fb30 009b644a WerFault!wmain+0xbf 0015fb74 77603833 WerFault!_initterm_e+0×163 1 Id: 1bfc.894 Suspend: 1 Teb: 7ffde000 Unfrozen ChildEBP RetAddr 024afbf8 77a10690 ntdll!KiFastSystemCallRet 024afbfc 77607e09 ntdll!ZwWaitForMultipleObjects+0xc 024afc98 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0×11d 024afcec 74fa161a USER32!RealMsgWaitForMultipleObjectsEx+0×13c 024afd0c 74fa2cb6 DUser!CoreSC::Wait+0×59 024afd34 74fa2c55 DUser!CoreSC::WaitMessage+0×54 024afd44 77b615c0 DUser!MphWaitMessageEx+0×22 024afd60 77a10e6e USER32!__ClientWaitMessageExMPH+0×1e 024afd7c 77b6b5bc ntdll!KiUserCallbackDispatcher+0×2e 024afd80 77b61598 USER32!NtUserWaitMessage+0xc 024afdb4 77b61460 USER32!DialogBox2+0×202 024afddc 77b614a2 USER32!InternalDialogBox+0xd0 024afdfc 77b61505 USER32!DialogBoxIndirectParamAorW+0×37 024afe1c 75036c51 USER32!DialogBoxIndirectParamW+0×1b 024afe40 75036beb comctl32!SHFusionDialogBoxIndirectParam+0×2d 024afe74 6d4a65a4 comctl32!CTaskDialog::Show+0×100 024afebc 6d4acb72 wer!IsolationAwareTaskDialogIndirect+0×64 024aff4c 6d4acc39 wer!CInitialConsentUI::InitialDlgThreadRoutine+0×369 024aff54 77603833 wer!CInitialConsentUI::Static_InitialDlgThreadRoutine+0xd 024aff60 779ea9bd kernel32!BaseThreadInitThunk+0xe 2 Id: 1bfc.1a04 Suspend: 1 Teb: 7ffdc000 Unfrozen ChildEBP RetAddr 012bf998 77a10690 ntdll!KiFastSystemCallRet 012bf99c 77607e09 ntdll!ZwWaitForMultipleObjects+0xc 012bfa38 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0×11d 012bfa8c 74fa161a USER32!RealMsgWaitForMultipleObjectsEx+0×13c 012bfaac 74fa1642 DUser!CoreSC::Wait+0×59 012bfae0 74fac442 DUser!CoreSC::xwProcessNL+0xaa 012bfb00 74fac3a2 DUser!GetMessageExA+0×44 012bfb54 779262b6 DUser!ResourceManager::SharedThreadProc+0xb6 012bfb8c 779263de msvcrt!_endthreadex+0×44 012bfb94 77603833 msvcrt!_endthreadex+0xce 012bfba0 779ea9bd kernel32!BaseThreadInitThunk+0xe 012bfbe0 00000000 ntdll!_RtlUserThreadStart+0×23 # 3 Id: 1bfc.14a4 Suspend: 1 Teb: 7ffdb000 Unfrozen ChildEBP RetAddr 02a1fc40 77a3f0a9 ntdll!DbgBreakPoint 02a1fc70 77603833 ntdll!DbgUiRemoteBreakin+0×3c 02a1fc7c 779ea9bd kernel32!BaseThreadInitThunk+0xe 02a1fcbc 00000000 ntdll!_RtlUserThreadStart+0×23

Next, we put a breakpoint on CreateProcess, push Debug button on the error reporting dialog and upon the breakpoint hit inspect CreateProcess parameters:

0:003> .asm no_code_bytes Assembly options: no_code_bytes 0:003> bp kernel32!CreateProcessW 0:003> g Breakpoint 0 hit eax=00000000 ebx=00000000 ecx=7ffdf000 edx=0015db30 esi=00000001 edi=00000000 eip=775c1d27 esp=0015dfe0 ebp=0015e408 iopl=0 nv up ei pl nz na po nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 kernel32!CreateProcessW: 775c1d27 mov edi,edi 0:000> ddu esp+8 l1 0015dfe8 008b0000 “”C:\WINDOWS\system32\vsjitdebugger.exe” -p 8064 -e 312″

ESP points to return address, ESP+4 points to the first CreateProcess parameter and ESP+8 points to the second parameter. The thread stack now involves faultrep.dll:

0:000> k ChildEBP RetAddr 0020dde0 730bb2b5 kernel32!CreateProcessW 0020e20c 730b6dae faultrep!WerpLaunchAeDebug+0×384 0020f140 730b73c1 faultrep!CCrashWatson::GenerateCrashReport+0×665 0020f3d8 730b4de1 faultrep!CCrashWatson::ReportCrash+0×374 0020f8d8 009bd895 faultrep!WerpInitiateCrashReporting+0×304 0020f910 009b60cd WerFault!UserCrashMain+0×14e 0020f934 009b644a WerFault!wmain+0xbf 0020f978 77603833 WerFault!_initterm_e+0×163 0020f984 779ea9bd kernel32!BaseThreadInitThunk+0xe 0020f9c4 00000000 ntdll!_RtlUserThreadStart+0×23

Therefore it looks like calls to faultrep.dll module to report faults and launch the posmortem debugger were moved from UnhandledExceptionFilter to WerFault.exe in Vista.

Finally, let’s go back to our UnhandledExceptionFilter. If we disassemble it we would see that it can call kernel32!WerpLaunchAeDebug too:

0:000> .asm no_code_bytes Assembly options: no_code_bytes 0:000> uf kernel32!UnhandledExceptionFilter … … … kernel32!UnhandledExceptionFilter+0×2d0: 77655c5f push dword ptr [ebp-28h] 77655c62 push dword ptr [ebp-1Ch] 77655c65 push dword ptr [ebx+4] 77655c68 push dword ptr [ebx] 77655c6a push 0FFFFFFFEh 77655c6c call kernel32!GetCurrentProcess (775e9145) 77655c71 push eax 77655c72 call kernel32!WerpLaunchAeDebug (7767baaf) 77655c77 test eax,eax 77655c79 jge kernel32!UnhandledExceptionFilter+0×2f3 (77655c82) … … … kernel32!UnhandledExceptionFilter+0×303: 77655c92 mov eax,dword ptr [ebx] 77655c94 push dword ptr [eax] 77655c96 push 0FFFFFFFFh 77655c98 call dword ptr [kernel32!_imp__NtTerminateProcess (775c14bc)]

If we look at WerpLaunchAeDebug code we would see that it calls CreateProcess too and the code is the same as in faultrep.dll. This could mean that faultrep.dll imports that function from kernel32.dll. So some postmortem debugger launching code is still present in the default unhandled exception filter perhaps for compatibility or in case WER doesn’t work or disabled.

High-level description of the differences between Windows XP and Vista application crash support is present in the recent Mark Russinovich’s article:

Inside the Windows Vista Kernel: Part 3 (Enhanced Crash Support)

- Dmitry Vostokov -

Posted in Crash Dump Analysis, Debugging, Vista | 3 Comments »

Process and Thread Startup in Vista

May 19th, 2007

If you looked at process dumps from Vista or did live debugging you might have noticed that there are no longer kernel32 functions BaseProcessStart on the main thread stack and BaseThreadStart on subsequent thread stacks. In Vista we have ntdll!_RtlUserThreadStart which calls kernel32!BaseThreadInitThunk for both main and secondary threads:

0:002> ~*k 0 Id: 13e8.1348 Suspend: 1 Teb: 7ffdf000 Unfrozen ChildEBP RetAddr 0009f8d8 77b7199a ntdll!KiFastSystemCallRet 0009f8dc 77b719cd USER32!NtUserGetMessage+0xc 0009f8f8 006b24e8 USER32!GetMessageW+0x33 0009f954 006c2588 calc!WinMain+0x278 0009f9e4 77603833 calc!_initterm_e+0x1a1 0009f9f0 779ea9bd kernel32!BaseThreadInitThunk+0xe 0009fa30 00000000 ntdll!_RtlUserThreadStart+0×23 1 Id: 13e8.534 Suspend: 1 Teb: 7ffde000 Unfrozen ChildEBP RetAddr 0236f9d8 77a106a0 ntdll!KiFastSystemCallRet 0236f9dc 776077d4 ntdll!NtWaitForSingleObject+0xc 0236fa4c 77607742 kernel32!WaitForSingleObjectEx+0xbe 0236fa60 006b4958 kernel32!WaitForSingleObject+0×12 0236fa78 77603833 calc!WatchDogThread+0×21 0236fa84 779ea9bd kernel32!BaseThreadInitThunk+0xe 0236fac4 00000000 ntdll!_RtlUserThreadStart+0×23 # 2 Id: 13e8.1188 Suspend: 1 Teb: 7ffdd000 Unfrozen ChildEBP RetAddr 0078fec8 77a3f0a9 ntdll!DbgBreakPoint 0078fef8 77603833 ntdll!DbgUiRemoteBreakin+0×3c 0078ff04 779ea9bd kernel32!BaseThreadInitThunk+0xe 0078ff44 00000000 ntdll!_RtlUserThreadStart+0×23

0:000> .asm no_code_bytes Assembly options: no_code_bytes 0:000> uf ntdll!_RtlUserThreadStart ... ... ... ntdll!_RtlUserThreadStart: 779ea996 push 14h 779ea998 push offset ntdll! ?? ::FNODOBFM::`string'+0xb6e (779ff108) 779ea99d call ntdll!_SEH_prolog4 (779f47d8) 779ea9a2 and dword ptr [ebp-4],0 779ea9a6 mov eax,dword ptr [ntdll!Kernel32ThreadInitThunkFunction (77a752a0)] 779ea9ab push dword ptr [ebp+0Ch] 779ea9ae test eax,eax 779ea9b0 je ntdll!_RtlUserThreadStart+0x32 (779c6326) ... ... ... 0:000> dds ntdll!Kernel32ThreadInitThunkFunction l1 77a752a0 77603821 kernel32!BaseThreadInitThunk

- Dmitry Vostokov -

Posted in Crash Dump Analysis, Debugging, Vista | No Comments »

Process crash - getting the dump manually

May 16th, 2007

Sometimes customers have process crashes with exception dialogs but no dumps are saved due to some reason, for example, Dr. Watson limitation, NTSD doesn’t save dumps on Windows 2000, etc. One solution is to dump the process manually while it displays an error message. Customers and support engineers can use Microsoft userdump.exe for this purpose. Then looking at the dump we would see the exception because it is processed by an exception handler that either shows the error dialog or creates Windows Error Reporting process. Non-interactive services usually call NtRaiseHardError to let csrss.exe display a message. The following stack trace is from IE dump saved when WER error dialog box was shown:

0:000> k ChildEBP RetAddr 0012973c 7c59a072 NTDLL!ZwWaitForSingleObject+0xb 00129764 7c57b3e9 KERNEL32!WaitForSingleObjectEx+0x71 00129774 00401b2f KERNEL32!WaitForSingleObject+0xf 0012a238 7918cd0e IEXPLORE!DwExceptionFilter+0×284 0012a244 03a3f0c3 mscoree!__CxxUnhandledExceptionFilter+0×46 0012a250 7c59bf8d msvcr71!__CxxUnhandledExceptionFilter+0×46 0012a984 715206e0 KERNEL32!UnhandledExceptionFilter+0×140 0012ee74 71520957 BROWSEUI!BrowserProtectedThreadProc+0×64 0012fef0 71762a0a BROWSEUI!SHOpenFolderWindow+0×1ec 0012ff10 00401ecd SHDOCVW!IEWinMain+0×108 0012ff60 00401f7d IEXPLORE!WinMainT+0×2dc 0012ffc0 7c5989a5 IEXPLORE!ModuleEntry+0×97 0012fff0 00000000 KERNEL32!BaseProcessStart+0×3d

If we disassemble DwExceptionFilter we would see CreateProcess call:

0:000> ub IEXPLORE!DwExceptionFilter+0x284 IEXPLORE!DwExceptionFilter+0x263: 00401b0e call dword ptr [IEXPLORE!_imp__CreateProcessA (00401050)] 00401b14 test eax,eax 00401b16 je IEXPLORE!DwExceptionFilter+0x2f6 (00401ba1) 00401b1c mov dword ptr [ebp+7Ch],edi 00401b1f mov edi,dword ptr [IEXPLORE!_imp__WaitForSingleObject (0040104c)] 00401b25 push 4E20h 00401b2a push dword ptr [ebp+68h] 00401b2d call edi

I already described WER processes in the previous post about post-mortem debugging so I won’t cover it here.

If we run !analyze -v command we are lucky because WinDbg will find the exception for us:

... ... ... CONTEXT: 0012aa94 -- (.cxr 12aa94) eax=00000000 ebx=00000000 ecx=00000000 edx=7283e058 esi=0271a60c edi=00000000 eip=35c5f973 esp=0012ad60 ebp=0012ad7c iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00010246 componentA!InternalFoo+0x21: 35c5f973 8b01 mov eax,dword ptr [ecx] ds:0023:00000000=???????? ... ... ... STACK_TEXT: 0012ad7c 35c6042f 0012ae10 00000000 35c53390 componentA!InternalFoo+0x21 0012c350 779d7d5d 00000000 001ad114 00000000 componentA!InternalBar+0x157 0012c36c 77a2310e 02b23d5c 00000020 00000004 oleaut32!DispCallFunc+0x15d 0012c3fc 35cc8b60 024d2d94 02b23d5c 00000001 oleaut32!CTypeInfo2::Invoke+0x244 ... ... ...

If you see several threads with UnhandledExceptionFilter - Multiple Exceptions pattern - you can set the exception context individually based on the first parameter of UnhandledExceptionFilter which is a pointer to _EXCEPTION_POINTERS structure and then use .cxr command:

0:000> ~*kv ... ... ... . 0 Id: 1568.68c Suspend: 1 Teb: 7ffde000 Unfrozen ChildEBP RetAddr Args to Child ... ... ... 0012a984 715206e0 0012a9ac 7800bdb5 0012a9b4 KERNEL32!UnhandledExceptionFilter+0×140 (FPO: [Non-Fpo]) … … … 0:000> dt _EXCEPTION_POINTERS 0012a9ac +0×000 ep_xrecord : 0×12aa78 +0×004 ep_context : 0×12aa94 0:000> .cxr 0012aa94 eax=00000000 ebx=00000000 ecx=00000000 edx=7283e058 esi=0271a60c edi=00000000 eip=35c5f973 esp=0012ad60 ebp=0012ad7c iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00010246 componentA!InternalFoo+0×21: 35c5f973 8b01 mov eax,dword ptr [ecx] ds:0023:00000000=????????

Another stack fragment comes from some Windows service and it shows the thread calling NtRaiseHardError:

0:000> ~*k ... ... ... 13 Id: 3624.16cc Suspend: 1 Teb: 7ffad000 Unfrozen ChildEBP RetAddr 0148ed40 7c821b74 ntdll!KiFastSystemCallRet 0148ed44 77e99af9 ntdll!NtRaiseHardError+0xc 0148f3dc 77e84259 kernel32!UnhandledExceptionFilter+0×54b 0148f40c 7c82eeb2 kernel32!_except_handler3+0×61 0148f430 7c82ee84 ntdll!ExecuteHandler2+0×26 0148f4d8 7c82ecc6 ntdll!ExecuteHandler+0×24 0148f4d8 7c81e215 ntdll!KiUserExceptionDispatcher+0xe 0148f7e0 76133437 ntdll!RtlLengthSecurityDescriptor+0×2a 0148f80c 7613f33d serviceA!GetObjectSize+0×1c3 0148f8d0 77c70f3b serviceA!RpcGetObjectSize+0×1b 0148f8f8 77ce23f7 rpcrt4!Invoke+0×30 0148fcf8 77ce26ed rpcrt4!NdrStubCall2+0×299 0148fd14 77c709be rpcrt4!NdrServerCall2+0×19 0148fd48 77c7093f rpcrt4!DispatchToStubInCNoAvrf+0×38 0148fd9c 77c70865 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0×117 0148fdc0 77c734b1 rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3 … … …

- Dmitry Vostokov -

Posted in Crash Dump Analysis, Tools | 2 Comments »

Interrupts and exceptions explained (Part 3)

May 15th, 2007

In Part 1 discussed interrupt processing that happens when an x86 processor executes in privileged protected mode (ring 0). It pushes interrupt frame shown in the following pseudo-code:

push EFLAGS push CS push EIP push ErrorCode EIP := IDT[VectorNumber].ExtendedOffset<<16 + IDT[VectorNumber].Offset

Please note that this is an interrupt frame that is created by CPU and not a trap frame created by a software interrupt handler to save CPU state (_KTRAP_FRAME).

If an x86 processor executes in user mode (ring 3) and an interrupt happens then the stack switch occurs before the processor saves user mode stack pointer SS:ESP and pushes the rest of the interrupt frame. Pushing both SS:RSP always happens on x64 processor regardless of the current execution mode, kernel or user. Therefore the following x86 pseudo-code shows how interrupt frame is pushed on the current stack (to be precise, on the kernel space stack if the interrupt happened in user mode):

push SS push ESP push EFLAGS push CS push EIP push ErrorCode EIP := IDT[VectorNumber].ExtendedOffset<<16 + IDT[VectorNumber].Offset

Usually CS is 0×1b and SS is 0×23 for x86 Windows flat memory model so we can easily identify this pattern on raw stack data.

Why should we care about an interrupt frame? This is because in complete full memory dumps we can see exceptions that happened in user space and were being processed at the time the dump was saved.

Let’s look at some example:

PROCESS 89a94800 SessionId: 1 Cid: 1050 Peb: 7ffd7000 ParentCid: 08a4 DirBase: 390f5000 ObjectTable: e36ee0b8 HandleCount: 168. Image: processA.exe VadRoot 8981d0a0 Vads 309 Clone 0 Private 222555. Modified 10838. Locked 0. DeviceMap e37957e0 Token e395b8f8 ElapsedTime 07:44:38.505 UserTime 00:54:52.906 KernelTime 00:00:58.109 QuotaPoolUsage[PagedPool] 550152 QuotaPoolUsage[NonPagedPool] 14200 Working Set Sizes (now,min,max) (213200, 50, 345) (852800KB, 200KB, 1380KB) PeakWorkingSetSize 227093 VirtualSize 1032 Mb PeakVirtualSize 1032 Mb PageFaultCount 232357 MemoryPriority BACKGROUND BasePriority 8 CommitCharge 233170 DebugPort 899b6a40

We see that the process has a DebugPort and the presence of it usually shows that some exception happened. Therefore if you dump all processes by entering !process 0 1 command you can search for any unhandled exceptions.

Indeed if we switch to this process (you can also use !process 89a94800 ff command for dumps coming from XP and higher systems) we see KiDispatchException on one of the processA’s threads:

0: kd> .process 89a94800 0: kd> .reload 0: kd> !process 89a94800 ... ... ... THREAD 89a93020 Cid 1050.1054 Teb: 7ffdf000 Win32Thread: bc1da760 WAIT: (Unknown) KernelMode Non-Alertable SuspendCount 1 f44dc3a8 SynchronizationEvent Not impersonating DeviceMap e37957e0 Owning Process 89a94800 Image: processA.exe Wait Start TickCount 4244146 Ticks: 1232980 (0:05:21:05.312) Context Switch Count 1139234 LargeStack UserTime 00:54:51.0531 KernelTime 00:00:53.0937 Win32 Start Address processA!WinMainCRTStartup (0x00c534c8) Start Address kernel32!BaseProcessStartThunk (0x77e617f8) Stack Init f44dcbd0 Current f44dc2ec Base f44dd000 Limit f44d7000 Call f44dcbd8 Priority 12 BasePriority 8 PriorityDecrement 2 ChildEBP RetAddr f44dc304 8083d5b1 nt!KiSwapContext+0x26 f44dc330 8083df9e nt!KiSwapThread+0x2e5 f44dc378 809c3cff nt!KeWaitForSingleObject+0x346 f44dc458 809c4f09 nt!DbgkpQueueMessage+0x178 f44dc47c 80977ad9 nt!DbgkpSendApiMessage+0x45 f44dc508 8081a94f nt!DbgkForwardException+0x90 f44dc8c4 808346b4 nt!KiDispatchException+0×1ea f44dc92c 80834650 nt!CommonDispatchException+0×4a f44dc9b8 80a801ae nt!Kei386EoiHelper+0×16e 0012f968 0046915d hal!HalpDispatchSoftwareInterrupt+0×5e 0012f998 0047cb72 processA!CalculateClientSizeFromPoint+0×5f 0012f9bc 0047cc1d processA!CalculateFromPoint+0×30 0012fa64 0047de83 processA!DrawUsingMemDC+0×1b9 0012fac0 0099fb43 processA!OnDraw+0×13 0012fb5c 7c17332d processA!OnPaint+0×56 0012fbe8 7c16e0b0 MFC71!CWnd::OnWndMsg+0×340 0012fc08 00c6253a MFC71!CWnd::WindowProc+0×22 0012fc24 0096cf9d processA!WindowProc+0×38 0012fcb8 7c16e1b8 MFC71!AfxCallWndProc+0×91 0012fcd8 7c16e1f6 MFC71!AfxWndProc+0×46 0012fd04 7739b6e3 MFC71!AfxWndProcBase+0×39 0012fd30 7739b874 USER32!InternalCallWinProc+0×28 0012fda8 7739c8b8 USER32!UserCallWinProcCheckWow+0×151 0012fe04 7739c9c6 USER32!DispatchClientMessage+0xd9 0012fe2c 7c828536 USER32!__fnDWORD+0×24 0012fe2c 80832dee ntdll!KiUserCallbackDispatcher+0×2e f44dcbf0 8092d605 nt!KiCallUserMode+0×4 f44dcc48 bf8a26d3 nt!KeUserModeCallback+0×8f f44dcccc bf89e985 win32k!SfnDWORD+0xb4 f44dcd0c bf89eb27 win32k!xxxDispatchMessage+0×223 f44dcd58 80833bdf win32k!NtUserDispatchMessage+0×4c f44dcd58 7c8285ec nt!KiFastCallEntry+0xfc 0012fe2c 7c828536 ntdll!KiFastSystemCallRet 0012fe58 7739c57b ntdll!KiUserCallbackDispatcher+0×2e 0012fea8 773a16e5 USER32!NtUserDispatchMessage+0xc 0012feb8 7c169076 USER32!DispatchMessageA+0xf 0012fec8 7c16913e MFC71!AfxInternalPumpMessage+0×3e 0012fee4 0041cb0b MFC71!CWinThread::Run+0×54 0012ff08 7c172fc5 processA!CMain::Run+0×3b 0012ff18 00c5364d MFC71!AfxWinMain+0×68 0012ffc0 77e6f23b processA!WinMainCRTStartup+0×185 0012fff0 00000000 kernel32!BaseProcessStart+0×23

You might think that exception happened in CalculateClientSizeFromPoint function. However there is no nt!KiTrapXXX call and hal!HalpDispatchSoftwareInterrupt has user space return address and this looks suspicious. So we need to look at raw stack data and find our interrupt frame. We look for KiDispatchException, then for KiTrap substring and finally for 0000001b. If 0000001b and 00000023 are separated by 2 double words then we have found out interrupt frame:

0: kd> .thread 89a93020 Implicit thread is now 89a93020 0: kd> dds esp esp+1000 ... ... ... f44dc2f8 f44dc330 f44dc2fc 89a93098 f44dc300 ffdff120 f44dc304 89a93020 f44dc308 8083d5b1 nt!KiSwapThread+0x2e5 f44dc30c 89a93020 f44dc310 89a930c8 f44dc314 00000000 ... ... ... f44dc4e8 f44dcc38 f44dc4ec 8083a8cc nt!_except_handler3 f44dc4f0 80870868 nt!`string'+0xa4 f44dc4f4 ffffffff f44dc4f8 80998bfd nt!Ki386CheckDivideByZeroTrap+0x273 f44dc4fc 8083484f nt!KiTrap00+0x88 f44dc500 00000001 f44dc504 0000bb40 f44dc508 f44dc8c4 f44dc50c 8081a94f nt!KiDispatchException+0×1ea f44dc510 f44dc8e0 f44dc514 00000001 f44dc518 00000000 f44dc51c 00469583 processA!LPtoDP+0×19 f44dc520 16b748f0 f44dc524 00469583 processA!LPtoDP+0×19 f44dc528 00000000 f44dc52c 00000000 … … … f44dc8c0 ffffffff f44dc8c4 f44dc934 f44dc8c8 808346b4 nt!CommonDispatchException+0×4a f44dc8cc f44dc8e0 f44dc8d0 00000000 f44dc8d4 f44dc934 f44dc8d8 00000001 f44dc8dc 00000001 f44dc8e0 c0000094 f44dc8e4 00000000 f44dc8e8 00000000 f44dc8ec 00469583 processA!LPtoDP+0×19 f44dc8f0 00000000 f44dc8f4 808a3988 nt!KiAbiosPresent+0×4 f44dc8f8 ffffffff f44dc8fc 0000a6f2 f44dc900 00469585 processA!LPtoDP+0×1b f44dc904 00000004 f44dc908 00000000 f44dc90c f9000001 f44dc910 f44dc8dc f44dc914 ffffffff f44dc918 f44dcc38 f44dc91c 8083a8cc nt!_except_handler3 f44dc920 80870868 nt!`string’+0xa4 f44dc924 ffffffff f44dc928 80998bfd nt!Ki386CheckDivideByZeroTrap+0×273 f44dc92c 8083484f nt!KiTrap00+0×88 f44dc930 80834650 nt!Kei386EoiHelper+0×16e f44dc934 0012f968 f44dc938 00469583 processA!LPtoDP+0×19 f44dc93c badb0d00 f44dc940 00000000 f44dc944 ffffffff f44dc948 00007fff f44dc94c 00000000 f44dc950 fffff800 f44dc954 ffffffff f44dc958 00007fff f44dc95c 00000000 f44dc960 00000000 f44dc964 80a80000 hal!HalpInitIrqlAuditFlag+0×4e f44dc968 00000023 f44dc96c 00000023 f44dc970 00000000 f44dc974 00000000 f44dc978 00005334 f44dc97c 00000001 f44dc980 f44dcc38 f44dc984 0000003b f44dc988 16b748f0 f44dc98c 16b748f0 f44dc990 0012f9fc f44dc994 0012f968 f44dc998 00000000 ; ErrorCode f44dc99c 00469583 processA!LPtoDP+0×19 ; EIP f44dc9a0 0000001b ; CS f44dc9a4 00010246 ; EFLAGS f44dc9a8 0012f934 ; ESP f44dc9ac 00000023 ; SS f44dc9b0 8982e7e0 f44dc9b4 00000000

Why did we skip the first KiTrap00? Because KiDispatchException is called after KiTrap00 so we should see it before KiTrap00 on raw stack. To see all these calls we can disassemble return addresses:

0: kd> .asm no_code_bytes Assembly options: no_code_bytes 0: kd> ub nt!KiTrap00+0x88 nt!KiTrap00+0x74: 8083483b test byte ptr [ebp+6Ch],1 8083483f je nt!KiTrap00+0x81 (80834848) 80834841 cmp word ptr [ebp+6Ch],1Bh 80834846 jne nt!KiTrap00+0x9e (80834865) 80834848 sti 80834849 push ebp 8083484a call nt!Ki386CheckDivideByZeroTrap (8099897d) 8083484f mov ebx,dword ptr [ebp+68h]

nt!KiTrap00+0×88 is not equal to nt!KiTrap00+0×74 so we have OMAP code optimization case here and we have to disassemble raw addresses as seen on the raw stack fragment repeated here:

... ... ... f44dc8c8 808346b4 nt!CommonDispatchException+0x4a ... ... ... f44dc924 ffffffff f44dc928 80998bfd nt!Ki386CheckDivideByZeroTrap+0x273 f44dc92c 8083484f nt!KiTrap00+0×88 f44dc930 80834650 nt!Kei386EoiHelper+0×16e f44dc934 0012f968 … … …

0: kd> u 8083484f nt!KiTrap00+0×88: 8083484f mov ebx,dword ptr [ebp+68h] 80834852 jmp nt!Kei386EoiHelper+0×167 (80834649) 80834857 sti 80834858 mov ebx,dword ptr [ebp+68h] 8083485b mov eax,0C0000094h 80834860 jmp nt!Kei386EoiHelper+0×167 (80834649) 80834865 mov ebx,dword ptr fs:[124h] 8083486c mov ebx,dword ptr [ebx+38h] 0: kd> u 80834649 nt!Kei386EoiHelper+0×167: 80834649 xor ecx,ecx 8083464b call nt!CommonDispatchException (8083466a) 80834650 xor edx,edx ; nt!Kei386EoiHelper+0×16e 80834652 mov ecx,1 80834657 call nt!CommonDispatchException (8083466a) 8083465c xor edx,edx 8083465e mov ecx,2 80834663 call nt!CommonDispatchException (8083466a) 0: kd> ub 808346b4 nt!CommonDispatchException+0×38: 808346a2 mov eax,dword ptr [ebp+6Ch] 808346a5 and eax,1 808346a8 push 1 808346aa push eax 808346ab push ebp 808346ac push 0 808346ae push ecx 808346af call nt!KiDispatchException (80852a53)

So we see that KiTrap00 calls CommonDispatchException which calls KiDispatchException. If we look at our found interrupt frame we see that EIP of the exception was 00469583 and ESP was 0012f934:

... ... ... f44dc998 00000000 ; ErrorCode f44dc99c 00469583 processA!LPtoDP+0×19 ; EIP f44dc9a0 0000001b ; CS f44dc9a4 00010246 ; EFLAGS f44dc9a8 0012f934 ; ESP f44dc9ac 00000023 ; SS … … …

Now we try to reconstruct stack trace by putting the values of ESP and EIP:

0: kd> k L=0012f934 0012f934 00469583 ; EBP ESP EIP format ChildEBP RetAddr 0012f930 00469a16 processA!LPtoDP+0x19 0012f934 00000000 processA!GetColumnWidth+0x45

Stack trace doesn’t look good, there is neither BaseProcessStart nor BaseThreadStart, perhaps because we specified ESP value twice instead of EBP and ESP. Let’s hope to find EBP value by dumping the memory around ESP:

0: kd> dds 0012f934-10 0012f934+100 0012f924 00000000 0012f928 0012f934 ; the same as ESP 0012f92c 0012f968 ; looks good to us 0012f930 00469572 processA!LPtoDP+0×8 0012f934 00469a16 processA!GetColumnWidth+0×45 0012f938 00005334 … … … 0012f964 00005334 0012f968 0012f998 0012f96c 0046915d processA!CalculateClientSizeFromPoint+0×5f 0012f970 00000000 0012f974 0012f9fc 0012f978 16b748f0 0012f97c 0012fa48 0012f980 00000000 0012f984 00000000 0012f988 000003a0 0012f98c 00000237 0012f990 00000014 0012f994 00000000 0012f998 0012f9bc 0012f99c 0047cb72 processA!CalculateFromPoint+0×30 0012f9a0 0012f9fc 0012f9a4 0012f9b4 0012f9a8 0012fa48 … … …

So finally we get our stack trace:

0: kd> k L=0012f968 0012f934 00469583 100 ChildEBP RetAddr 0012f930 00469a16 processA!LPtoDP+0x19 0012f968 0046915d processA!GetColumnWidth+0x45 0012f998 0047cb72 processA!CalculateClientSizeFromPoint+0x5f 0012f9bc 0047cc1d processA!CalculateFromPoint+0x30 0012fa64 0047de83 processA!DrawUsingMemDC+0x1b9 0012fac0 0099fb43 processA!OnDraw+0x13 0012fb5c 7c17332d processA!OnPaint+0x56 0012fbe8 7c16e0b0 MFC71!CWnd::OnWndMsg+0x340 0012fc08 00c6253a MFC71!CWnd::WindowProc+0x22 0012fc24 0096cf9d processA!WindowProc+0x38 0012fcb8 7c16e1b8 MFC71!AfxCallWndProc+0x91 0012fcd8 7c16e1f6 MFC71!AfxWndProc+0x46 0012fd04 7739b6e3 MFC71!AfxWndProcBase+0x39 0012fd30 7739b874 USER32!InternalCallWinProc+0x28 0012fda8 7739c8b8 USER32!UserCallWinProcCheckWow+0x151 0012fe04 7739c9c6 USER32!DispatchClientMessage+0xd9 0012fe2c 7c828536 USER32!__fnDWORD+0x24 0012fe2c 80832dee ntdll!KiUserCallbackDispatcher+0x2e f44dcbf0 8092d605 nt!KiCallUserMode+0x4 f44dcc48 bf8a26d3 nt!KeUserModeCallback+0x8f f44dcccc bf89e985 win32k!SfnDWORD+0xb4 f44dcd0c bf89eb27 win32k!xxxDispatchMessage+0x223 f44dcd58 80833bdf win32k!NtUserDispatchMessage+0x4c f44dcd58 7c8285ec nt!KiFastCallEntry+0xfc 0012fe2c 7c828536 ntdll!KiFastSystemCallRet 0012fe58 7739c57b ntdll!KiUserCallbackDispatcher+0x2e 0012fea8 773a16e5 USER32!NtUserDispatchMessage+0xc 0012feb8 7c169076 USER32!DispatchMessageA+0xf 0012fec8 7c16913e MFC71!AfxInternalPumpMessage+0x3e 0012fee4 0041cb0b MFC71!CWinThread::Run+0x54 0012ff08 7c172fc5 processA!CMain::Run+0x3b 0012ff18 00c5364d MFC71!AfxWinMain+0x68 0012ffc0 77e6f23b processA!WinMainCRTStartup+0x185 0012fff0 00000000 kernel32!BaseProcessStart+0x23

- Dmitry Vostokov -

Posted in Crash Dump Analysis, Debugging, Hardware | No Comments »

Crash Dump Analysis Patterns (Part 14)

May 11th, 2007

The next pattern is Spiking Thread. If you have a process dump with many threads from a customer it is sometimes difficult to see which thread there was spiking CPU, that’s why it is always good to have some screenshots or notes from QSlice or Process Explorer showing spiking thread ID and process ID. The latter ID is to make sure that the process dump was from the correct process. New process dumpers and tools from Microsoft (userdump.exe, for example) save thread time information so you can open the dump and see the time spent in kernel and user mode for any thread by entering !runaway command. However if that command shows many threads with similar CPU consumption it will not highlight the particular thread that was spiking at the time the crash dump was saved so screenshots are still useful in some cases.

What to do if you don’t have spiking thread ID? Look at all threads and find those that are not waiting. Almost all threads are waiting most of the time. So the chances to dump the normal process and see some active threads are very low. If the thread is waiting the top function on its stack usually is (for XP/W2K3/Vista):

ntdll!KiFastSystemCallRet

and below it you could see some blocking calls waiting for some synchronization object, Sleep API call, IO completion or for LPC reply:

0:085> ~*kv ... ... ... 64 Id: 1b0.120c Suspend: -1 Teb: 7ff69000 Unfrozen ChildEBP RetAddr Args to Child 02defe18 7c90e399 ntdll!KiFastSystemCallRet 02defe1c 77e76703 ntdll!NtReplyWaitReceivePortEx+0xc 02deff80 77e76c22 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4 02deff88 77e76a3b rpcrt4!RecvLotsaCallsWrapper+0xd 02deffa8 77e76c0a rpcrt4!BaseCachedThreadRoutine+0×79 02deffb4 7c80b683 rpcrt4!ThreadStartRoutine+0×1a 02deffec 00000000 kernel32!BaseThreadStart+0×37

65 Id: 1b0.740 Suspend: -1 Teb: 7ff67000 Unfrozen ChildEBP RetAddr Args to Child 02edff44 7c90d85c ntdll!KiFastSystemCallRet 02edff48 7c8023ed ntdll!NtDelayExecution+0xc 02edffa0 57cde2dd kernel32!SleepEx+0×61 02edffb4 7c80b683 component!foo+0×35 02edffec 00000000 kernel32!BaseThreadStart+0×37

66 Id: 1b0.131c Suspend: -1 Teb: 7ff66000 Unfrozen ChildEBP RetAddr Args to Child 02f4ff38 7c90e9c0 ntdll!KiFastSystemCallRet 02f4ff3c 7c8025cb ntdll!ZwWaitForSingleObject+0xc 02f4ffa0 72001f65 kernel32!WaitForSingleObjectEx+0xa8 02f4ffb4 7c80b683 component!WorkerThread+0×15 02f4ffec 00000000 kernel32!BaseThreadStart+0×37

67 Id: 1b0.1320 Suspend: -1 Teb: 7ff65000 Unfrozen ChildEBP RetAddr Args to Child 02f8fe1c 7c90e9ab ntdll!KiFastSystemCallRet 02f8fe20 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc 02f8febc 7e4195f9 kernel32!WaitForMultipleObjectsEx+0×12c 02f8ff18 7e4196a8 user32!RealMsgWaitForMultipleObjectsEx+0×13e 02f8ff34 720019f6 user32!MsgWaitForMultipleObjects+0×1f 02f8ffa0 72001a29 component!bar+0xd9 02f8ffb4 7c80b683 component!MonitorWorkerThread+0×11 02f8ffec 00000000 kernel32!BaseThreadStart+0×37

68 Id: 1b0.1340 Suspend: -1 Teb: 7ff63000 Unfrozen ChildEBP RetAddr Args to Child 0301ff1c 7c90e31b ntdll!KiFastSystemCallRet 0301ff20 7c80a746 ntdll!ZwRemoveIoCompletion+0xc 0301ff4c 57d46e65 kernel32!GetQueuedCompletionStatus+0×29 0301ffb4 7c80b683 component!AsyncEventsThread+0×91 0301ffec 00000000 kernel32!BaseThreadStart+0×37 … … … # 85 Id: 1b0.17b4 Suspend: -1 Teb: 7ffd4000 Unfrozen ChildEBP RetAddr Args to Child 00daffc8 7c9507a8 ntdll!DbgBreakPoint 00dafff4 00000000 ntdll!DbgUiRemoteBreakin+0×2d

Therefore if you have a different thread like this one below the chances that it was spiking are big:

58 Id: 1b0.9f4 Suspend: -1 Teb: 7ff75000 Unfrozen ChildEBP RetAddr Args to Child 0280f64c 500af723 componentB!DoSomething+32 0280f85c 500b5391 componentB!CheckSomething+231 0280f884 500b7a3f componentB!ProcessWorkIteme+9f 0301ffec 00000000 kernel32!BaseThreadStart+0x37

There is no KiFastSystemCallRet on top and if we look at the currently executing instruction it indeed does some copy operation:

0:085> ~58r eax=00000000 ebx=0280fdd4 ecx=0000005f edx=00000000 esi=03d30444 edi=0280f6dc eip=500a4024 esp=0280f644 ebp=0280f64c iopl=0 nv up ei pl nz na po nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202 componentB!DoSomething+32: 500a4024 f3a5 rep movs dword ptr es:[edi],dword ptr [esi] es:0023:0280f6dc=00000409 ds:0023:03d30444=00000409

In a kernel or a complete memory dump you can see spikes by checking KernelTime and UserTime:

0: kd> !thread 88b66768 THREAD 88b66768 Cid 01fc.1550 Teb: 7ffad000 Win32Thread: bc18f240 RUNNING on processor 1 IRP List: 89716008: (0006,0094) Flags: 00000a00 Mdl: 00000000 Impersonation token: e423a030 (Level Impersonation) DeviceMap e3712480 Owning Process 8a0a56a0 Image: SomeSvc.exe Wait Start TickCount 1782229 Ticks: 0 Context Switch Count 877610 LargeStack UserTime 00:00:01.0078 KernelTime 02:23:21.0718

- Dmitry Vostokov @ DumpAnalysis.org -

Posted in Crash Dump Analysis, Crash Dump Patterns | 24 Comments »

Crash Dump Analysis Patterns (Part 13a)

May 9th, 2007

Insufficient Memory pattern can be seen in many complete and kernel memory dumps. This condition could cause the system to crash, become slow, hang or refuse to provide the expected functionality, for example, refuse new terminal server connections. There are many types of memory resources and we can classify them initially into the following categories:

Committed memory
Virtual memory
- Kernel space
  - Paged pool
  - Non-paged pool
  - Session pool
  - PTE limits
  - Desktop heap
  - GDI limits
- User space
  - Virtual regions
  - Process heap

We will talk about all of them in separate parts. What I outline in this part is committed memory exhaustion. Committed memory is an allocated memory backed up by some physical memory or by a reserved space in the page file. Reserving the space needs to be done in case OS wants to swap out that memory’s data to disk if it is not currently used and there is no physical memory available for other processes. If that data is needed again OS brings it back to physical memory. If there is no space in the page file then physical memory is filled up. If committed memory is exhausted most likely the system will hang or result in a bugcheck soon so checking memory statistics shall always be done when you get a kernel or a complete memory dump. Even access violation bugchecks could result from insufficient memory when some memory allocation operation failed but a kernel mode component didn’t check the return value for NULL. Here is an example:

BugCheck 8E, {c0000005, 809203af, aa647c0c, 0}

0: kd> !analyze -v ... ... ... TRAP_FRAME: aa647c0c -- (.trap ffffffffaa647c0c) ... ... ...

0: kd> .trap ffffffffaa647c0c ErrCode = 00000000 eax=00000000 ebx=bc1f3cfc ecx=89589250 edx=000018c1 esi=bc1f3ce0 edi=aa647d14 eip=809203af esp=aa647c80 ebp=aa647c80 iopl=0 nv up ei pl zr na pe nc cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246 nt!SeTokenType+0x8: 809203af 8b8080000000 mov eax,dword ptr [eax+80h] ds:0023:00000080=????????

0: kd> k ChildEBP RetAddr aa647c80 bf8173c5 nt!SeTokenType+0x8 aa647cdc bf81713b win32k!GreGetSpoolMessage+0xb0 aa647d4c 80834d3f win32k!NtGdiGetSpoolMessage+0x96 aa647d4c 7c82ed54 nt!KiFastCallEntry+0xfc

If we enter !vm command to display memory statistics we would see that all committed memory is filled up:

0: kd> !vm *** Virtual Memory Usage *** Physical Memory: 999294 ( 3997176 Kb) Page File: \??\C:\pagefile.sys Current: 4193280 Kb Free Space: 533744 Kb Minimum: 4193280 Kb Maximum: 4193280 Kb Available Pages: 18698 ( 74792 Kb) ResAvail Pages: 865019 ( 3460076 Kb) Locked IO Pages: 290 ( 1160 Kb) Free System PTEs: 155265 ( 621060 Kb) Free NP PTEs: 32766 ( 131064 Kb) Free Special NP: 0 ( 0 Kb) Modified Pages: 113 ( 452 Kb) Modified PF Pages: 61 ( 244 Kb) NonPagedPool Usage: 12380 ( 49520 Kb) NonPagedPool Max: 64799 ( 259196 Kb) PagedPool 0 Usage: 40291 ( 161164 Kb) PagedPool 1 Usage: 2463 ( 9852 Kb) PagedPool 2 Usage: 2455 ( 9820 Kb) PagedPool 3 Usage: 2453 ( 9812 Kb) PagedPool 4 Usage: 2488 ( 9952 Kb) PagedPool Usage: 50150 ( 200600 Kb) PagedPool Maximum: 67584 ( 270336 Kb)

********** 18 pool allocations have failed **********

Shared Commit: 87304 ( 349216 Kb) Special Pool: 0 ( 0 Kb) Shared Process: 56241 ( 224964 Kb) PagedPool Commit: 50198 ( 200792 Kb) Driver Commit: 1892 ( 7568 Kb) Committed pages: 2006945 ( 8027780 Kb) Commit limit: 2008205 ( 8032820 Kb)

********** 1216024 commit requests have failed **********

There might have been a memory leak or too many terminal sessions with fat applications to fit in physical memory and the page file. Actually for that particular case there were both.

- Dmitry Vostokov @ DumpAnalysis.org -

Posted in Crash Dump Analysis, Crash Dump Patterns | 3 Comments »

Using SSSL principle to design support tools

May 6th, 2007

Start, Stop and Send Log (SSSL) principle was applied to design of many tools created in Citrix for troubleshooting customer issues. My favorite examples are WindowHistory, MessageHistory, and ScreenHistory.

In a typical scenario, a customer has a GUI issue, downloads a tool, runs it, clicks on a “Start” button, tries to reproduce the issue, clicks on a “Stop” button and then sends a log either saved in a file or copied from the tool’s dialog. Simple action, no need for customers to become familiar with colored and complex GUI tools especially when an issue is hot and there is no time to play. Upon the arrival of the log file a skilled engineer can analyze it and provide further recommendations. Of course, SSSL tools must record all information that would be necessary to look at during analysis.

SSSL principle promotes agile iterative and incremental development of tools. Because it is difficult to know in advance about all information necessary to record, the first version usually implements logging of essential data only and what is missing in other available tools. The GUI is very simple and this shortens the development time. Later, after support incidents and analysis of their logs, it becomes apparent that more information needs to be recorded and then the second version is born, etc.

SSSL principle also promotes design and implementation reuse. Because GUI and certain operations are the same you can reuse common source code.

Finally after some time when more and more SSSL tools appear you can refactor them into a unified SSSL framework where individual tools become pluggable components. This is what is coming this summer: GUI History Monitor that will combine all separate SSSL “History” tools together.

- Dmitry Vostokov -

Posted in Announcements, Software Architecture, Tools | No Comments »

Interrupts and exceptions explained (Part 2)

May 5th, 2007

As I promised in Part 1, I describe changes in 64-bit Windows. The size of IDTR is 10 bytes where 8 bytes hold 64-bit address of IDT. The size of IDT entry is 16 bytes and it holds the address of an interrupt procedure corresponding to an interrupt vector. However interrupt procedure names are different in x64 Windows, they do not follow the same pattern like KiTrapXX.

The following UML class diagram describes the relationship and also shows what registers are saved. In native x64 mode SS and RSP are saved regardless of kernel or user mode.

Let’s dump all architecture-defined interrupt procedure names. This is a good exercise because we will use scripting. !pcr extension reports wrong IDT base so we use dt command:

kd> !pcr 0 KPCR for Processor 0 at fffff80001176000: Major 1 Minor 1 NtTib.ExceptionList: fffff80000124000 NtTib.StackBase: fffff80000125070 NtTib.StackLimit: 0000000000000000 NtTib.SubSystemTib: fffff80001176000 NtTib.Version: 0000000001176180 NtTib.UserPointer: fffff800011767f0 NtTib.SelfTib: 000000007ef95000 SelfPcr: 0000000000000000 Prcb: fffff80001176180 Irql: 0000000000000000 IRR: 0000000000000000 IDR: 0000000000000000 InterruptMode: 0000000000000000 IDT: 0000000000000000 GDT: 0000000000000000 TSS: 0000000000000000 CurrentThread: fffffadfe669f890 NextThread: 0000000000000000 IdleThread: fffff8000117a300 DpcQueue: kd> dt _KPCR fffff80001176000 nt!_KPCR +0×000 NtTib : _NT_TIB +0×000 GdtBase : 0xfffff800`00124000 _KGDTENTRY64 +0×008 TssBase : 0xfffff800`00125070 _KTSS64 +0×010 PerfGlobalGroupMask : (null) +0×018 Self : 0xfffff800`01176000 _KPCR +0×020 CurrentPrcb : 0xfffff800`01176180 _KPRCB +0×028 LockArray : 0xfffff800`011767f0 _KSPIN_LOCK_QUEUE +0×030 Used_Self : 0×00000000`7ef95000 +0×038 IdtBase : 0xfffff800`00124070 _KIDTENTRY64 +0×040 Unused : [2] 0 +0×050 Irql : 0 ” +0×051 SecondLevelCacheAssociativity : 0×10 ” +0×052 ObsoleteNumber : 0 ” +0×053 Fill0 : 0 ” +0×054 Unused0 : [3] 0 +0×060 MajorVersion : 1 +0×062 MinorVersion : 1 +0×064 StallScaleFactor : 0×892 +0×068 Unused1 : [3] (null) +0×080 KernelReserved : [15] 0 +0×0bc SecondLevelCacheSize : 0×100000 +0×0c0 HalReserved : [16] 0×82c5c880 +0×100 Unused2 : 0 +0×108 KdVersionBlock : 0xfffff800`01174ca0 +0×110 Unused3 : (null) +0×118 PcrAlign1 : [24] 0 +0×180 Prcb : _KPRCB

Next we dump the first entry of IDT array and glue together OffsetHigh, OffsetMiddle and OffsetLow fields to form the interrupt procedure address corresponding to the interrupt vector 0, divide by zero exception:

kd> dt _KIDTENTRY64 0xfffff800`00124070 nt!_KIDTENTRY64 +0x000 OffsetLow : 0xf240 +0x002 Selector : 0x10 +0x004 IstIndex : 0y000 +0x004 Reserved0 : 0y00000 (0) +0x004 Type : 0y01110 (0xe) +0x004 Dpl : 0y00 +0x004 Present : 0y1 +0×006 OffsetMiddle : 0×103 +0×008 OffsetHigh : 0xfffff800 +0×00c Reserved1 : 0 +0×000 Alignment : 0×1038e00`0010f240

kd> u 0xfffff8000103f240 nt!KiDivideErrorFault: fffff800`0103f240 4883ec08 sub rsp,8 fffff800`0103f244 55 push rbp fffff800`0103f245 4881ec58010000 sub rsp,158h fffff800`0103f24c 488dac2480000000 lea rbp,[rsp+80h] fffff800`0103f254 c645ab01 mov byte ptr [rbp-55h],1 fffff800`0103f258 488945b0 mov qword ptr [rbp-50h],rax fffff800`0103f25c 48894db8 mov qword ptr [rbp-48h],rcx fffff800`0103f260 488955c0 mov qword ptr [rbp-40h],rdx kd> ln 0xfffff8000103f240 (fffff800`0103f240) nt!KiDivideErrorFault | (fffff800`0103f300) nt!KiDebugTrapOrFault Exact matches: nt!KiDivideErrorFault = <no type information>

We see that the name of the procedure is KiDivideErrorFault and not KiTrap00. We can dump the second IDT entry manually by adding a 0×10 offset but in order to automate this I wrote the following WinDbg script to dump the first 20 vectors and get their interrupt procedure names:

r? $t0=(_KIDTENTRY64 *)0xfffff800`00124070; .for (r $t1=0; @$t1 <= 13; r? $t0=(_KIDTENTRY64 *)@$t0+1) { .printf “Interrupt vector %d (0x%x):\n”, @$t1, @$t1; ln @@c++(@$t0->OffsetHigh*0×100000000 + @$t0->OffsetMiddle*0×10000 + @$t0->OffsetLow); r $t1=$t1+1 }

Here is the same script but formatted:

r? $t0=(_KIDTENTRY64 *)0xfffff800`00124070; .for (r $t1=0; @$t1 <= 13; r? $t0=(_KIDTENTRY64 *)@$t0+1) { .printf "Interrupt vector %d (0x%x):\n", @$t1, @$t1; ln @@c++(@$t0->OffsetHigh*0x100000000 + @$t0->OffsetMiddle*0x10000 + @$t0->OffsetLow); r $t1=$t1+1 }

The output on my system is:

Interrupt vector 0 (0x0): (fffff800`0103f240) nt!KiDivideErrorFault | (fffff800`0103f300) nt!KiDebugTrapOrFault Exact matches: nt!KiDivideErrorFault = <no type information> Interrupt vector 1 (0×1): (fffff800`0103f300) nt!KiDebugTrapOrFault | (fffff800`0103f440) nt!KiNmiInterrupt Exact matches: nt!KiDebugTrapOrFault = <no type information> Interrupt vector 2 (0×2): (fffff800`0103f440) nt!KiNmiInterrupt | (fffff800`0103f680) nt!KxNmiInterrupt Exact matches: nt!KiNmiInterrupt = <no type information> Interrupt vector 3 (0×3): (fffff800`0103f780) nt!KiBreakpointTrap | (fffff800`0103f840) nt!KiOverflowTrap Exact matches: nt!KiBreakpointTrap = <no type information> Interrupt vector 4 (0×4): (fffff800`0103f840) nt!KiOverflowTrap | (fffff800`0103f900) nt!KiBoundFault Exact matches: nt!KiOverflowTrap = <no type information> Interrupt vector 5 (0×5): (fffff800`0103f900) nt!KiBoundFault | (fffff800`0103f9c0) nt!KiInvalidOpcodeFault Exact matches: nt!KiBoundFault = <no type information> Interrupt vector 6 (0×6): (fffff800`0103f9c0) nt!KiInvalidOpcodeFault | (fffff800`0103fb80) nt!KiNpxNotAvailableFault Exact matches: nt!KiInvalidOpcodeFault = <no type information> Interrupt vector 7 (0×7): (fffff800`0103fb80) nt!KiNpxNotAvailableFault | (fffff800`0103fc40) nt!KiDoubleFaultAbort Exact matches: nt!KiNpxNotAvailableFault = <no type information> Interrupt vector 8 (0×8): (fffff800`0103fc40) nt!KiDoubleFaultAbort | (fffff800`0103fd00) nt!KiNpxSegmentOverrunAbort Exact matches: nt!KiDoubleFaultAbort = <no type information> Interrupt vector 9 (0×9): (fffff800`0103fd00) nt!KiNpxSegmentOverrunAbort | (fffff800`0103fdc0) nt!KiInvalidTssFault Exact matches: nt!KiNpxSegmentOverrunAbort = <no type information> Interrupt vector 10 (0xa): (fffff800`0103fdc0) nt!KiInvalidTssFault | (fffff800`0103fe80) nt!KiSegmentNotPresentFault Exact matches: nt!KiInvalidTssFault = <no type information> Interrupt vector 11 (0xb): (fffff800`0103fe80) nt!KiSegmentNotPresentFault | (fffff800`0103ff80) nt!KiStackFault Exact matches: nt!KiSegmentNotPresentFault = <no type information> Interrupt vector 12 (0xc): (fffff800`0103ff80) nt!KiStackFault | (fffff800`01040080) nt!KiGeneralProtectionFault Exact matches: nt!KiStackFault = <no type information> Interrupt vector 13 (0xd): (fffff800`01040080) nt!KiGeneralProtectionFault | (fffff800`01040180) nt!KiPageFault Exact matches: nt!KiGeneralProtectionFault = <no type information> Interrupt vector 14 (0xe): (fffff800`01040180) nt!KiPageFault | (fffff800`010404c0) nt!KiFloatingErrorFault Exact matches: nt!KiPageFault = <no type information> Interrupt vector 15 (0xf): (fffff800`01179090) nt!KxUnexpectedInterrupt0+0xf0 | (fffff800`0117a0c0) nt!KiNode0 Interrupt vector 16 (0×10): (fffff800`010404c0) nt!KiFloatingErrorFault | (fffff800`01040600) nt!KiAlignmentFault Exact matches: nt!KiFloatingErrorFault = <no type information> Interrupt vector 17 (0×11): (fffff800`01040600) nt!KiAlignmentFault | (fffff800`010406c0) nt!KiMcheckAbort Exact matches: nt!KiAlignmentFault = <no type information> Interrupt vector 18 (0×12): (fffff800`010406c0) nt!KiMcheckAbort | (fffff800`01040900) nt!KxMcheckAbort Exact matches: nt!KiMcheckAbort = <no type information> Interrupt vector 19 (0×13): (fffff800`01040a00) nt!KiXmmException | (fffff800`01040b80) nt!KiRaiseAssertion Exact matches: nt!KiXmmException = <no type information>

Let’s look at some dump.

BugCheck 1E, {ffffffffc0000005, fffffade5ba2d643, 0, 28}

This is KMODE_EXCEPTION_NOT_HANDLED and obviously it could have been invalid memory access. And indeed the stack WinDbg shows after opening a dump and entering !analyze -v command is:

RSP fffffade`4e88fe68 nt!KeBugCheckEx fffffade`4e88fe70 nt!KiDispatchException+0x128 fffffade`4e8905f0 nt!KiPageFault+0x1e1 fffffade`4e890780 driver!foo+0x9b

and the exception context is:

2: kd> r Last set context: rax=fffffade4e8907f4 rbx=fffffade6de0c2e0 rcx=fffffa8014412000 rdx=fffffade71e7e2ac rsi=0000000000000000 rdi=fffffadffff03000 rip=fffffade5ba2d643 rsp=fffffade4e890780 rbp=fffffade71e7ffff r8=00000000000005b0 r9=fffffade4e890a88 r10=fffffadffd077898 r11=fffffade71e7e260 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 iopl=0 nv up ei pl zr na po nc cs=0010 ss=0018 ds=0950 es=4e89 fs=fade gs=ffff efl=00010246 driver!foo+0×9b: fffffade`5ba2d643 8b4e28 mov ecx,dword ptr [rsi+28h] ds:0950:0028=????????

We see that KiPageFault was called and from the dumped IDT we see it corresponds to the interrupt vector 14 (0xE) which is called on any memory reference that is not present in physical memory.

Now I’m going to dump the raw stack around fffffade`4e890780 to see our the values the processor saved before calling KiPageFault:

2: kd> dps fffffade`4e890780-50 fffffade`4e890780+50 fffffade`4e890730 fffffade`6de0c2e0 fffffade`4e890738 fffffadf`fff03000 fffffade`4e890740 00000000`00000000 fffffade`4e890748 fffffade`71e7ffff fffffade`4e890750 00000000`00000000 ; ErrorCode fffffade`4e890758 fffffade`5ba2d643 driver!foo+0×9b ; RIP fffffade`4e890760 00000000`00000010 ; CS fffffade`4e890768 00000000`00010246 ; RFLAGS fffffade`4e890770 fffffade`4e890780 ; RSP fffffade`4e890778 00000000`00000018 ; SS fffffade`4e890780 00000000`00000000 ; RSP before interrupt fffffade`4e890788 00000000`00000000 fffffade`4e890790 00000000`00000000 fffffade`4e890798 00000000`00000000 fffffade`4e8907a0 00000000`00000000 fffffade`4e8907a8 00000000`00000000 fffffade`4e8907b0 00000000`00000000 fffffade`4e8907b8 00000000`00000000 fffffade`4e8907c0 00000000`00000000 fffffade`4e8907c8 00000000`00000000 fffffade`4e8907d0 00000000`00000000

You see the values are exactly the same as WinDbg shows in the saved context above. Actually if you look at Page-Fault Error Code bits in Intel Architecture Software Developer’s Manual Volume 3A, you would see that for this case, all zeroes, we have:

the page was not present in memory
the fault was caused by the read access
the processor was executing in kernel mode
no reserved bits in page directory were set to 1 when 0s were expected
it was not caused by instruction fetch

- Dmitry Vostokov -

Posted in Bugchecks Depicted, Crash Dump Analysis, Debugging, Hardware | No Comments »

Parameterized WinDbg Scripts

May 5th, 2007

I was looking for a way to pass arguments to scripts and the new WinDbg 6.7.5 documentation contains the description of $$>a< command:

$$>a< Filename arg1 arg2 arg3 … argn

argn

Specifies an argument that the debugger passes to the script. These parameters can be quoted strings or space-delimited strings. All arguments are optional.

This promises a lot of possibilities to write cool scripts and use structured programming. Couldn’t quickly find in the help who gets the passed parameter, a pseudu-register or a fixed-name alias. Certainly some experimentation is needed here.

- Dmitry Vostokov -

Posted in Crash Dump Analysis, WinDbg Scripts | 2 Comments »

Crash Dumps for Dummies (Part 5)

May 5th, 2007

In this part, I try to explain symbol files. They are usually called PDB files because they have .PDB extension although the older ones can have .DBG extension. PDB files are needed to read dump files properly. Without PDB files the dump file data is just a collection of numbers, the contents of memory, without any meaning. PDB files help tools like WinDbg to interpret the data and present it in a human-readable format. Roughly speaking, PDB files contain associations between numbers and their meanings expressed in short text strings:

Because these associations are changed when you have a fix or a service pack on a computer and you have a dump from it you need newer PDB files that correspond to updated components such as DLLs or drivers.

Long time ago you had to download symbol files manually from Microsoft or get them from CDs. Now Microsoft has its dedicated internet symbol server and WinDbg can download PDB files automatically. However you need to specify Microsoft symbol server location in File\Symbol File Path… dialog and check Reload. The location is usually:

SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols

If you don’t remember the location when you run WinDbg for the first time or on a new computer you can enter .symfix command to set Microsoft symbol server path automatically and specify the location where to download symbol files. You can check your current symbol search path by using .sympath command and don’t forget to reload symbols by entering .reload command:

0:000> .symfix No downstream store given, using C:\Program Files\Debugging Tools for Windows\sym 0:000> .sympath Symbol search path is: SRV**http://msdl.microsoft.com/download/symbols 0:000> .symfix c:\websymbols 0:000> .sympath Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols 0:000> .reload

- Dmitry Vostokov @ DumpAnalysis.org -

Posted in Crash Dump Analysis, Crash Dumps for Dummies, WinDbg Tips and Tricks | 2 Comments »

Reading Korean

May 5th, 2007

I’m very pleased that Heejune Kim and Taehwa Lee translated and posted the first crash dump analysis pattern article on www.driveronline.org.

Heejune also has a blog insidekernel.net, mainly about WDF.

I’ve just looked at my blog idea notes and found that I have plans to write about 15 more crash dump analysis patterns. I will try to write most of them in forthcoming months and later reorganize and classify them in retrospection.

As you might have guessed already I have books about Korean language too. First I’d like to mention 2 volumes written by Ross King, Jae-Hoon Yeon and Insun Lee:

Elementary Korean

Continuing Korean

Korean writing system, Hankul, is usually described as the most scientific writing system in the world and as the world’s best alphabet.

If you are interested in Writing Systems in general (capitalized to avoid confusion with writing systems in C++) I would recommend two books that I browse sometimes:

Writing Systems: A Linguistic Approach

Writing Systems: A Linguistic Introduction

Both books use IPA (International Phonetic Alphabet) to illustrate mapping of writing to sounds so if you have never used IPA and you want to know about it, vowels, consonants and sounds of languages, basics of speech recognition and synthesis I would greatly recommend this amazing book with CD (I read the first edition but seems there is the second one):

Vowels and Consonants

- Dmitry Vostokov -

Posted in Crash Dump Analysis | No Comments »

PDBFinder (public version 3.0.1)

May 4th, 2007

Finally I’ve managed to create a slimmer version 3.0.1 with many resource consumption improvements:

Built database occupies 3.7 times less disk space
2 times faster load
5 times less memory consumption

Additional improvements:

“FIXFinder” feature: shows folders where you can find newer modules
Search can be restricted to folder names containing sub-string (for example, “release” or “full” symbols only)
Ready to copy and paste folder names (for WinDbg Symbol File Path dialog) – exe/dll/sys subfolders are stripped off
Additional minor interface improvements (larger screen, ellipsis in paths during build, better keyboard focus handling)

Screenshot:

The tool has been tested with more than 1,000,000 PDB files.

PDBFinder Deluxe Pro 3.0.1 and its documentation can be downloaded from Citrix support web site (requires free registration).

The motivation behind the tool is explained in my previous post: Cons of Symbol Server.

- Dmitry Vostokov -

Posted in Announcements, Crash Dump Analysis, Tools | 2 Comments »

Moving to UML 2.x

May 4th, 2007

As a part of improving my UML diagrams and moving towards UML 2.x I’m evaluating free community editors and found the one that seems to be good: Visual Paradigm for UML 6.0 Community Edition.
Here are some diagram screenshots:

http://www.visual-paradigm.com/product/vpuml/vpumlscreenshots.jsp

If you don’t know you can be certified in UML 2:

http://www.omg.org/uml-certification/index.htm

In 2003 I was one of beta-testers of UML Fundamentals exam and passed it. I guess from my certificate number (B100031) I was the 31st person who passed the exam, the highest bit offset in a double word. Now I’m aiming to be certified in the next Intermediate level. This exam is a pure UML language exam. It is not about applying UML in your domain. The only one book for this certification is this and which I found very good for exam preparation:

UML 2 Certification Guide: Fundamental & Intermediate Exams

- Dmitry Vostokov -

Posted in Books, Software Architecture | 2 Comments »

Cons of Symbol Server

May 3rd, 2007

Symbol servers are great. However I found that in crash dump analysis the absence of automatically loaded symbols sometimes helps to identify a problem or at least gives some directions for further research. It also helps to see which hot fixes or service packs for your product were installed on a problem computer. The scenario I use sometimes when I analyze crash dumps from product A is the following:

Set up WinDbg to point to Microsoft Symbol Server
Load a crash dump and enter various commands based on the issue. Some OS or product A components become visible and their symbols are unresolved.
From unresolved OS symbols I’m aware of the latest fixes or privates from MS
From unresolved symbols of the product A and PDBFinder I determine the base product level and this already gives me some directions.
I add the base product A symbols to symbol file path and continue my analysis.
If unresolved symbols of the product A continue to come up I use PDBFinder again to find corresponding symbols and add them to symbol file path. By doing that I’m aware of the product A hot fix and/or service pack level.
Also from the latest version of PDBFinder (3.0.1) I know whether there are any updates to the component in question.

Of course, all this works only if you store all PDB files from all your fixes and service packs in some location(s) with easily identified names, for example, PRODUCTA\VER20\SP31\FIX01. Adding symbols manually helps to be focused on components, gives attention to some threads where they appear. You might think it is a waste of time but it only takes very small percentage of time especially if you look at the dump for a couple of hours.

What is PDBFinder? This is a program I developed to be able to find right symbol files (especially for minidumps). It scans all locations for PDB or DBG files and adds them to a text database. Next time you run PDBFinder it loads that database and you can find PDB or DBG file location by specifying module name and its date. You can also do a fuzzy search by specifying some date interval. If you run it with -update command line option it will build the database automatically, useful for scheduling weekly updates.

The public version of PDBFinder Deluxe 2.2.1 can be downloaded from Citrix support web site. The new version 3.0.1 on the way with major improvements and will be announced tomorrow.

- Dmitry Vostokov -

Posted in Crash Dump Analysis, Tools | 1 Comment »

The new version of WinDbg has been released

May 1st, 2007

Version 6.7.5 of Debugging Tools for Windows was released last week:

I haven’t found so far significant additions for crash dump analysis. What I noticed today is that when I open a user dump and enter
!analyze -v command the new field appears in the output:

NTGLOBALFLAG: 400

I ran gflags.exe and enabled page heap for notepad.exe. Then I launched notepad.exe, attached WinDbg, entered the command and NTGLOBALFLAG field reflected the change:

NTGLOBALFLAG: 2000000

So I don’t need to type !gflag command anymore.

- Dmitry Vostokov -

Posted in Announcements, Crash Dump Analysis, Debugging, Tools | No Comments »

Who calls the postmortem debugger?

April 30th, 2007

I was trying to understand who calls dwwin.exe, the part of Windows Error Reporting on my Windows XP system when the crash happens. To find an answer I launched TestDefaultDebugger application and after pushing its crash button the following familiar WER dialog box appeared:

I repeated the same when running ProcessHistory in the background and looking at its log I found that the parent process for dwwin.exe and postmortem debugger (if we click on Debug button) was TestDefaultDebugger.exe. In my case the default postmortem debugger was drwtsn32.exe. To look inside I attached WinDbg to TestDefaultDebugger process when WER dialog box above was displayed and got the following stack trace:

0:000> k ChildEBP RetAddr 0012d318 7c90e9ab ntdll!KiFastSystemCallRet 0012d31c 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc 0012d3b8 7c80a075 kernel32!WaitForMultipleObjectsEx+0x12c 0012d3d4 6945763c kernel32!WaitForMultipleObjects+0x18 0012dd68 694582b1 faultrep!StartDWException+0×5df 0012eddc 7c8633e9 faultrep!ReportFault+0×533 0012f47c 00411eaa kernel32!UnhandledExceptionFilter+0×587 0012f49c 0040e879 TestDefaultDebugger+0×11eaa 0012ffc0 7c816fd7 TestDefaultDebugger+0xe879 0012fff0 00000000 kernel32!BaseProcessStart+0×23

The combination of StartDWException and WaitForMultipleObjects suggests that dwwin.exe process is started there. Indeed, when I disassembled StartDWException function I saw CreateProcess call just before the wait call:

0:000> uf faultrep!StartDWException ... ... ... 69457585 8d8568f7ffff lea eax,[ebp-898h] 6945758b 50 push eax 6945758c 8d8524f7ffff lea eax,[ebp-8DCh] 69457592 50 push eax 69457593 8d85d4fbffff lea eax,[ebp-42Ch] 69457599 50 push eax 6945759a 33c0 xor eax,eax 6945759c 50 push eax 6945759d 6820000004 push 4000020h 694575a2 6a01 push 1 694575a4 50 push eax 694575a5 50 push eax 694575a6 ffb5a4f7ffff push dword ptr [ebp-85Ch] 694575ac 50 push eax 694575ad ff1558114569 call dword ptr [faultrep!_imp__CreateProcessW (69451158)] … … …

The second parameter of CreateProcess, [ebp-85Ch], is the address of the process command line and we know EBP value from the call stack above, 0012dd68. Just to remind that parameters for Win32 API functions are pushed from right to left. So we get the command line straight away:

0:000> dpu 0012dd68-85Ch l1 0012d50c 0012d3ec "C:\WINDOWS\system32\dwwin.exe -x -s 208"

If we dismiss WER dialog by clicking on Debug button then the postmortem debugger starts. Also it starts without WER dialog displayed if we rename faultrep.dll beforehand. So it looks like the obvious place to look for postmortem debugger launch is UnhandledExceptionFilter. Indeed, we see it there:

0:000> uf kernel32!UnhandledExceptionFilter ... ... ... 7c8636a8 8d850cfaffff lea eax,[ebp-5F4h] 7c8636ae 50 push eax 7c8636af 8d857cf9ffff lea eax,[ebp-684h] 7c8636b5 50 push eax 7c8636b6 33c0 xor eax,eax 7c8636b8 50 push eax 7c8636b9 50 push eax 7c8636ba 50 push eax 7c8636bb 6a01 push 1 7c8636bd 50 push eax 7c8636be 50 push eax 7c8636bf 53 push ebx 7c8636c0 50 push eax 7c8636c1 e86cecf9ff call kernel32!CreateProcessW (7c802332) … … …

Because this is the code that yet to be executed we need to put a breakpoint at 7c8636c1, continue execution, and when the breakpoint is hit dump the second parameter to CreateProcess that is the memory EBX points to:

0:000> bp 7c8636c1 0:000> g Breakpoint 0 hit eax=00000000 ebx=0012ed78 ecx=0012ec70 edx=7c90eb94 esi=0000003a edi=00000000 eip=7c8636c1 esp=0012ed50 ebp=0012f47c iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246 kernel32!UnhandledExceptionFilter+0x84b: 7c8636c1 e86cecf9ff call kernel32!CreateProcessW (7c802332) 0:000> du @ebx 0012ed78 “c:\drwatson\drwtsn32 -p 656 -e 1″ 0012edb8 “72 -g”

You see that UnhandledExceptionFilter is about to launch my custom Dr. Watson postmortem debugger.

If you look further at UnhandledExceptionFilter disassembled code you would see that after creating postmortem debugger process and waiting for it to finish saving the process dump the function calls NtTerminateProcess.

Therefore all error reporting, calling the postmortem debugger and final process termination are done in the same process that had the exception. The latter two parts are also described in Matt Pietrek’s article as I found afterwards.

One addition to that article written in 1997: starting from Windows XP UnhandledExceptionFilter locates and loads faultrep.dll which launches dwwin.exe to report an error:

kernel32!UnhandledExceptionFilter+0x4f7: 7c863359 8d85acfaffff lea eax,[ebp-554h] 7c86335f 50 push eax 7c863360 8d8570faffff lea eax,[ebp-590h] 7c863366 50 push eax 7c863367 56 push esi 7c863368 56 push esi 7c863369 e8fbacfaff call kernel32!LdrLoadDll (7c80e069) 0:000> dt _UNICODE_STRING 0012f47c-590 "C:\WINDOWS\system32\faultrep.dll" +0x000 Length : 0x40 +0x002 MaximumLength : 0x100 +0x004 Buffer : 0x0012f360 "C:\WINDOWS\system32\faultrep.dll"

You can also see that all processing is done using the same thread stack. So if something is wrong with that stack then you have silent process termination and no error is reported. In Vista there are some improvements that I’m going to cover in detail in a separate post.

- Dmitry Vostokov -

Posted in Crash Dump Analysis, Debugging, Vista | 5 Comments »

Spammers got my attention

April 29th, 2007

I got an email this morning whose subject begins with “We can define a private “stack pointer”…”. It got my attention because of the phrase “stack pointer”. The full subject was even more intriguing:

“We can define a private “stack pointer” that keeps track of which stack level we’re at, and addresses the corresponding register.”

When I opened that email I was utterly disappointed: the same spam advertisement that we get with less clever subjects. However I was curious to understand whether the spammer knew about low-level stuff or built the subject sentence automatically from keywords found on my blog. The latter assumption looked less plausible because it seemed to me that the first part of the sentence was written by a human or AI caught the wave again while I was immersed in crash dump analysis. On the other hand, the sentence as the whole was confusing to me because pointers do not address registers on Intel architecture, it is the way around. Finally I found that the sentence was written in 1988 by a human indeed. I googled it and it was the exact sentence from “LET’S BUILD A COMPILER!” article series:

http://compilers.iecc.com/crenshaw/tutor2.txt

I don’t know whether there was an engine behind the spam that picks subject text from sentences found on web sites based on keywords found on a target audience site or that was just a coincidence and everyone got the same e-mail.

- Dmitry Vostokov -

Posted in Security | No Comments »

Interrupts and exceptions explained (Part 1)

April 28th, 2007

So far we have seen various bugchecks depicted. What I left there is the explanation of how exceptions happen in the first place and how the execution flow reaches KiDispatchException. When some abnormal condition happens such as breakpoint, division by zero or memory protection violation then normal CPU execution flow (code stream) is interrupted (therefore I use the terms “interrupt” and “exception” interchangeably here). The type of interrupt is specified by a number called interrupt vector number. Obviously CPU has to transfer execution to some procedure in memory to handle that interrupt. CPU has to find that procedure, theoretically either having one procedure for all interrupts and specifying interrupt vector number as a parameter or having a table containing pointers to various procedures that correspond to different interrupt vectors. Intel x86 and x64 CPUs use the latter approach which is depicted on the following diagram:

When an exception happens (divide by zero, for example) CPU gets the address of the procedure table from IDTR (Interrupt Descriptor Table Register). This IDT (Interrupt Descriptor Table) is a zero-based array of 8-byte descriptors (x86) or 16-byte descriptors (x64). CPU calculates the location of the necessary procedure to call and does some necessary steps like saving appropriate registers before calling it.

The same happens when some external I/O device interrupts. We will talk about external interrupts later. For I/O devices the term “interrupt” is more appropriate. On the picture above I/O hardware interrupt vector numbers were taken from some dump. These are OS and user-defined numbers. The first 32 vectors are reserved by Intel.

Before Windows switches CPU to protected mode during boot process it creates IDT table in memory and sets IDTR to point to it by using SIDT instruction.

Let me now illustrate this by using a UML class diagram annotated by pseudo-code that shows what CPU does before calling the appropriate procedure. The pseudo-code on the diagram below is valid for interrupts and exceptions happening when the current CPU execution mode is kernel. For interrupts and exceptions generated when CPU executes code in user mode the picture is a little more complicated because the processor has to switch the current user-mode stack to kernel mode stack.

The following diagram is for 32-bit x86 processor (x64 will be illustrated later):

Let’s see all this in some dump. The address of IDT can be found by using !pcr [processor number] command. Every processor on a multiprocessor machine has its own IDT:

0: kd> !pcr 0 KPCR for Processor 0 at ffdff000: Major 1 Minor 1 NtTib.ExceptionList: f2178b8c NtTib.StackBase: 00000000 NtTib.StackLimit: 00000000 NtTib.SubSystemTib: 80042000 NtTib.Version: 0005c645 NtTib.UserPointer: 00000001 NtTib.SelfTib: 7ffdf000 SelfPcr: ffdff000 Prcb: ffdff120 Irql: 0000001f IRR: 00000000 IDR: ffffffff InterruptMode: 00000000 IDT: 8003f400 GDT: 8003f000 TSS: 80042000 CurrentThread: 88c1d3c0 NextThread: 00000000 IdleThread: 808a68c0 DpcQueue:

Every entry in IDT has the type _KIDTENTRY and we can get the first entry for divide by zero exception which has vector number 0:

0: kd> dt _KIDTENTRY 8003f400 +0x000 Offset : 0x47ca +0x002 Selector : 8 +0x004 Access : 0x8e00 +0x006 ExtendedOffset : 0x8083

By gluing together ExtendedOffset and Offset fields we get the address of the interrupt handling procedure (0×808347ca) which is KiTrap00:

0: kd> ln 0x808347ca (808347ca) nt!KiTrap00 | (808348a5) nt!Dr_kit1_a Exact matches: nt!KiTrap00

We can also see the interrupt trace in raw stack. For example, we can open the kernel dump and see the following stack trace and registers in the output of !analyze -v command:

ErrCode = 00000000 eax=00001b00 ebx=00001b00 ecx=00000000 edx=00000000 esi=f2178cb4 edi=bc15a838 eip=bf972586 esp=f2178c1c ebp=f2178c90 iopl=0 nv up ei ng nz ac po cy cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293 driver!foo+0xf9: bf972586 f77d10 idiv eax,dword ptr [ebp+10h] ss:0010:f2178ca0=00000000 STACK_TEXT: f2178b44 809989be nt!KeBugCheck+0×14 f2178b9c 8083484f nt!Ki386CheckDivideByZeroTrap+0×41 f2178b9c bf972586 nt!KiTrap00+0×88 f2178c90 bf94c23c driver!foo+0xf9 f2178d54 80833bdf driver!bar+0×11c

The dumping memory around ESP value (f2178c1c) shows the values processor pushes when divide by zero exception happens:

0: kd> dds f2178c1c-100 f2178c1c+100 ... ... ... f2178b80 00000000 f2178b84 f2178b50 f2178b88 00000000 f2178b8c f2178d44 f2178b90 8083a8cc nt!_except_handler3 f2178b94 80870828 nt!`string'+0xa4 f2178b98 ffffffff f2178b9c f2178ba8 f2178ba0 8083484f nt!KiTrap00+0x88 f2178ba4 f2178ba8 f2178ba8 f2178c90 f2178bac bf972586 driver!foo+0xf9 f2178bb0 badb0d00 f2178bb4 00000000 f2178bb8 0000006d f2178bbc bf842315 f2178bc0 f2178c6c f2178bc4 f2178bec f2178bc8 bf844154 f2178bcc f2178c88 f2178bd0 f2178c88 f2178bd4 00000000 f2178bd8 bc150000 f2178bdc f2170023 f2178be0 f2170023 f2178be4 00000000 f2178be8 00000000 f2178bec 00001b00 f2178bf0 f2178c0c f2178bf4 f2178d44 f2178bf8 f2170030 f2178bfc bc15a838 f2178c00 f2178cb4 f2178c04 00001b00 f2178c08 f2178c90 f2178c0c 00000000 ; ErrorCode f2178c10 bf972586 driver!foo+0xf9 ; EIP f2178c14 00000008 ; CS f2178c18 00010293 ; EFLAGS f2178c1c 00000000 ; <- ESP before interrupt f2178c20 0013c220 f2178c24 00000000 f2178c28 60000000 f2178c2c 00000001 f2178c30 00000000 f2178c34 00000000 … … …

ErrorCode is not the same as interrupt vector number although it is the same number here (0). I won’t cover interrupt error codes here. If you are interested please consult Intel Architecture Software Developer’s Manual.

If we try to execute !idt extension command it will show you only user-defined hardware interrupt vectors:

0: kd> !idt Dumping IDT: 37: 80a7d1ac hal!PicSpuriousService37 50: 80a7d284 hal!HalpApicRebootService 51: 89495044 serial!SerialCIsrSw (KINTERRUPT 89495008) 52: 89496044 i8042prt!I8042MouseInterruptService (KINTERRUPT 89496008) 53: 894ea044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894ea008) 63: 894f2044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894f2008) 72: 89f59044 atapi!IdePortInterrupt (KINTERRUPT 89f59008) 73: 89580044 NDIS!ndisMIsr (KINTERRUPT 89580008) 83: 899e7824 NDIS!ndisMIsr (KINTERRUPT 899e77e8) 92: 89f56044 atapi!IdePortInterrupt (KINTERRUPT 89f56008) 93: 89f1e044 SCSIPORT!ScsiPortInterrupt (KINTERRUPT 89f1e008) a3: 894fa044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894fa008) a4: 894a3044 cpqcidrv+0×22AE (KINTERRUPT 894a3008) b1: 89f697dc ACPI!ACPIInterruptServiceRoutine (KINTERRUPT 89f697a0) b3: 89498824 i8042prt!I8042KeyboardInterruptService (KINTERRUPT 894987e8) b4: 894a2044 cpqasm2+0×5B99 (KINTERRUPT 894a2008) c1: 80a7d410 hal!HalpBroadcastCallService d1: 80a7c754 hal!HalpClockInterrupt e1: 80a7d830 hal!HalpIpiHandler e3: 80a7d654 hal!HalpLocalApicErrorService fd: 80a7e11c hal!HalpProfileInterrupt

In the next part I’m going to cover x64 specifics.

- Dmitry Vostokov -

Posted in Bugchecks Depicted, Crash Dump Analysis, Debugging, Hardware | 2 Comments »

Bugchecks: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED

April 25th, 2007

Another bugcheck that is similar to KMODE_EXCEPTION_NOT_HANDLED and KERNEL_MODE_EXCEPTION_NOT_HANDLED is SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (0×7E).

This bugcheck happens when you have an exception in a system thread and there is no exception handler to catch it, i.e. no __try/__except. System threads are created by calling PsCreateSystemThread function. Here is its description from DDK:

The PsCreateSystemThread routine creates a system thread that executes in kernel mode and returns a handle for the thread.

By default PspUnhandledExceptionInSystemThread function is set as a default exception handler and its purpose is to call KeBugCheckEx.

The typical call stack in dumps with 7E bugcheck is:

1: kd> k ChildEBP RetAddr f70703cc 809a1eb2 nt!KeBugCheckEx+0x1b (FPO: [Non-Fpo]) f70703e8 80964a94 nt!PspUnhandledExceptionInSystemThread+0x1a f7070ddc 80841a96 nt!PspSystemThreadStartup+0x56 00000000 00000000 nt!KiThreadStartup+0x16

To see how this bugcheck is generated from processor trap we need to look at raw stack. Let’s look at some example. !analyze -v command gives us the following output:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e) This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address. Arguments: Arg1: 80000003, The exception code that was not handled Arg2: f69d9dd7, The address that the exception occurred at Arg3: f70708c0, Exception Record Address Arg4: f70705bc, Context Record Address EXCEPTION_RECORD: f70708c0 -- (.exr fffffffff70708c0) ExceptionAddress: f69d9dd7 (driver+0x00014dd7) ExceptionCode: 80000003 (Break instruction exception) ExceptionFlags: 00000000 NumberParameters: 3 Parameter[0]: 00000000 Parameter[1]: 8a784020 Parameter[2]: 00000044 CONTEXT: f70705bc -- (.cxr fffffffff70705bc) eax=00000001 ebx=00000032 ecx=8a37c000 edx=00000044 esi=8a37c000 edi=000000c7 eip=f69d9dd7 esp=f7070988 ebp=00004000 iopl=0 nv up ei pl zr na pe nc cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246 driver+0x14dd7: f69d9dd7 cc int 3 STACK_TEXT: WARNING: Stack unwind information not available. Following frames may be wrong. f7070998 f69dfbd1 80800000 00000000 8a37c000 driver+0x14dd7 00000000 00000000 00000000 00000000 00000000 driver+0x1abd1

We see that the thread encountered the breakpoint instruction and we also see that the call stack that led to breakpoint exception is incomplete. Here we must dump the raw stack data and try to reconstruct the stack manually. System threads are started with the execution of KiThreadStartup function. So let’s dump the stack starting from ESP register and up to some value, find startup function there and try to walk EBP chain:

1: kd> dds esp esp+1000 ... ... ... f7070a04 f7070a40 f7070a08 f71172ec NDIS!ndisMCommonHaltMiniport+0×375 f7070a0c 8a37c000 f7070a10 8a60f5c0 f7070a14 8a610748 f7070a18 8a610748 f7070a1c 00000058 f7070a20 00000005 f7070a24 00000004 f7070a28 f7527000 f7070a2c 00000685 f7070a30 f710943a NDIS!ndisMDummyIndicatePacket f7070a34 00000000 f7070a38 8a610778 f7070a3c 00010748 f7070a40 f7070b74 f7070a44 f7117640 NDIS!ndisMHaltMiniport+0×21 f7070a48 8a610990 … … (intermediate frames are omitted to save space) … f7070d7c e104a338 f7070d80 f7070dac f7070d84 8083f72e nt!ExpWorkerThread+0xeb f7070d88 894d0ed8 f7070d8c 00000000 f7070d90 8a784020 f7070d94 00000000 f7070d98 00000000 f7070d9c 00000000 f7070da0 00000001 f7070da4 00000000 f7070da8 808f0cb7 nt!PiWalkDeviceList f7070dac f7070ddc f7070db0 8092ccff nt!PspSystemThreadStartup+0×2e f7070db4 894d0ed8 f7070db8 00000000 f7070dbc 00000000 f7070dc0 00000000 f7070dc4 f7070db8 f7070dc8 f7070410 f7070dcc ffffffff f7070dd0 8083b9bc nt!_except_handler3 f7070dd4 808408f8 nt!ObWatchHandles+0×5f4 f7070dd8 00000000 f7070ddc 00000000 f7070de0 80841a96 nt!KiThreadStartup+0×16 f7070de4 8083f671 nt!ExpWorkerThread

The last EBP value we are able to get is f7070a04 and we can specify it to the extended version of k command together with ESP and EIP value of crash point:

1: kd> k L=f7070a04 f7070988 f69d9dd7 ChildEBP RetAddr f7070998 f69dfbd1 driver+0x14dd7 f7070a40 f7117640 NDIS!ndisMCommonHaltMiniport+0x375 f7070a48 f711891a NDIS!ndisMHaltMiniport+0x21 f7070b74 f71196e5 NDIS!ndisPnPRemoveDevice+0x189 f7070ba4 8083f9d0 NDIS!ndisPnPDispatch+0x15d f7070bb8 808f6a25 nt!IofCallDriver+0x45 f7070be4 808e20b5 nt!IopSynchronousCall+0xbe f7070c38 8080beae nt!IopRemoveDevice+0x97 f7070c60 808e149b nt!IopRemoveLockedDeviceNode+0x160 f7070c78 808e18cc nt!IopDeleteLockedDeviceNode+0x50 f7070cac 808e1732 nt!IopDeleteLockedDeviceNodes+0x3f f7070d40 808e19b6 nt!PiProcessQueryRemoveAndEject+0x7ad f7070d5c 808e7879 nt!PiProcessTargetDeviceEvent+0x2a f7070d80 8083f72e nt!PiWalkDeviceList+0x1d2 f7070dac 8092ccff nt!ExpWorkerThread+0xeb f7070ddc 80841a96 nt!PspSystemThreadStartup+0x2e 00000000 00000000 nt!KiThreadStartup+0x16

This was the execution path before the exception. However when int 3 instruction was hit then processor generated a trap with interrupt vector 3 (4th entry in interrupt descriptor table, zero-based) and the corresponding function in IDT, KiTrap03, was called. Therefore we need to find this function on the same raw stack and try to build stack trace for our exception handling code path:

1: kd> dds esp esp+1000 f70703b4 0000007e f70703b8 80000003 f70703bc f69d9dd7 driver+0x14dd7 f70703c0 f70708c0 f70703c4 f70705bc f70703c8 00000000 f70703cc f70703e8 f70703d0 809a1eb2 nt!PspUnhandledExceptionInSystemThread+0x1a f70703d4 0000007e ... ... ... f7070418 f707043c f707041c 8083b578 nt!ExecuteHandler2+0×26 f7070420 f70708c0 f7070424 f7070dcc f7070428 f70705bc f707042c f70704d8 f7070430 f7070894 f7070434 8083b58c nt!ExecuteHandler2+0×3a f7070438 f7070dcc f707043c f70704ec f7070440 8083b54a nt!ExecuteHandler+0×24 f7070444 f70708c0 … … (intermediate frames are omitted to save space) … f7070890 f7070410 f7070894 f7070dcc f7070898 8083b9bc nt!_except_handler3 f707089c 80850c10 nt!`string’+0×10c f70708a0 ffffffff f70708a4 f7070914 f70708a8 808357a4 nt!CommonDispatchException+0×4a f70708ac f70708c0 f70708b0 00000000 f70708b4 f7070914 f70708b8 00000000 f70708bc 00000001 f70708c0 80000003 f70708c4 00000000 f70708c8 00000000 f70708cc f69d9dd7 driver+0×14dd7 f70708d0 00000003 f70708d4 00000000 f70708d8 8a784020 f70708dc 00000044 f70708e0 ffffffff f70708e4 00000001 f70708e8 f7737a7c f70708ec f7070934 f70708f0 8083f164 nt!KeWaitForSingleObject+0×346 f70708f4 000000c7 f70708f8 00000000 f70708fc 00000031 f7070900 00000000 f7070904 f707091c f7070908 80840569 nt!KeSetTimerEx+0×179 f707090c 000000c7 f7070910 80835f39 nt!KiTrap03+0xb0 f7070914 00004000

The last EBP value we are able to get is f7070418 so the following stack trace emerges albeit not accurate as it doesn’t show that KeBugCheckEx was called from PspUnhandledExceptionInSystemThread as can be seen from disassembly. However it does show that the main function that orchestrates stack unwinding is KiDispatchException:

1: kd> k L=f7070418 ChildEBP RetAddr f7070418 8083b578 nt!KeBugCheckEx+0x1b f707043c 8083b54a nt!ExecuteHandler2+0x26 f70704ec 80810664 nt!ExecuteHandler+0x24 f70708a4 808357a4 nt!KiDispatchException+0x131 f707090c 80835f39 nt!CommonDispatchException+0x4a f707090c f69d9dd8 nt!KiTrap03+0xb0 1: kd> uf nt!PspUnhandledExceptionInSystemThread nt!PspUnhandledExceptionInSystemThread: 809a1e98 8bff mov edi,edi 809a1e9a 55 push ebp 809a1e9b 8bec mov ebp,esp 809a1e9d 8b4d08 mov ecx,dword ptr [ebp+8] 809a1ea0 ff7104 push dword ptr [ecx+4] 809a1ea3 8b01 mov eax,dword ptr [ecx] 809a1ea5 50 push eax 809a1ea6 ff700c push dword ptr [eax+0Ch] 809a1ea9 ff30 push dword ptr [eax] 809a1eab 6a7e push 7Eh 809a1ead e8f197edff call nt!KeBugCheckEx (8087b6a3)

At the end to illustrate this bugcheck I created the following UML sequence diagram (trap processing that leads to KiDispatchException will be depicted in another post):

- Dmitry Vostokov -

Posted in Bugchecks Depicted, Crash Dump Analysis, Debugging | No Comments »

July 2026
M	T	W	T	F	S	S
« Jun
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Pages

Recent Comments

Categories

Archives

ARM64

Automated Analysis

Blogroll

Debugging Channels

Forensics

Hardware

Linux

Mac OS X

Magazines and Newspapers

Malware Analysis

Medical Diagnostics

Narratology

Related Links

Reversing

Scripting Languages

Source Code

Tracing Tools

Meta