Software Diagnostics Library

Four pillars of software troubleshooting

November 29th, 2007

They are (sorted alphabetically):

Crash Dump Analysis (also called Memory Dump Analysis or Core Dump Analysis)
Problem Reproduction
Trace and Log Analysis
Virtual Assistance (also called Remote Assistance)

For troubleshooting software on Windows platforms, Citrix provides GoToAssist for virtual on-site presence and Xen for problem reproduction.

- Dmitry Vostokov @ DumpAnalysis.org -

Posted in Citrix, Crash Dump Analysis, Software Technical Support | No Comments »

Understanding I/O Completion Ports

November 27th, 2007

Many articles and books explain Windows I/O completion ports from the high-level design considerations that arise when building high-performance server software. However, these explanations are hard to recall later when someone asks for one, and not everyone writes that kind of software. Looking at complete memory dumps offers a bottom-up, reverse engineering approach: we see the internals of server software and can immediately grasp how certain architectural and design decisions are implemented.

Consider this thread stack trace, which we can find inside almost any service or network application process:

THREAD 86cf09c0  Cid 05cc.2030  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8a3bb970  QueueObject
    86cf0a38  NotificationTimer
Not impersonating
DeviceMap                 e15af5a8
Owning Process            8a3803d8       Image:         svchost.exe
Wait Start TickCount      2131621        Ticks: 1264 (0:00:00:19.750)
Context Switch Count      6
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address RPCRT4!ThreadStartRoutine (0x77c5de6d)
Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)
Stack Init ba276000 Current ba275c38 Base ba276000 Limit ba273000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr
ba275c50 8083d3b1 nt!KiSwapContext+0x26
ba275c7c 8083dea2 nt!KiSwapThread+0x2e5
ba275cc4 8092b205 nt!KeRemoveQueue+0x417
ba275d48 80833a6f nt!NtRemoveIoCompletion+0xdc
ba275d48 7c82ed54 nt!KiFastCallEntry+0xfc
0093feac 7c821bf4 ntdll!KiFastSystemCallRet
0093feb0 77e66142 ntdll!NtRemoveIoCompletion+0xc
0093fedc 77c604c3 kernel32!GetQueuedCompletionStatus+0x29
0093ff18 77c60655 RPCRT4!COMMON_ProcessCalls+0xa1
0093ff84 77c5f9f1 RPCRT4!LOADABLE_TRANSPORT::ProcessIOEvents+0x117
0093ff8c 77c5f7dd RPCRT4!ProcessIOEventsWrapper+0xd
0093ffac 77c5de88 RPCRT4!BaseCachedThreadRoutine+0x9d
0093ffb8 77e6608b RPCRT4!ThreadStartRoutine+0x1b
0093ffec 00000000 kernel32!BaseThreadStart+0x34

We see that an I/O completion port is implemented via a kernel queue object, so requests (work items, completion notifications, etc.) are stored in that queue for further processing by threads. The number of active threads processing requests is bounded by a maximum value that usually corresponds to the number of processors:

0: kd> dt _KQUEUE 8a3bb970
ntdll!_KQUEUE
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 EntryListHead    : _LIST_ENTRY [ 0x8a3bb980 - 0x8a3bb980 ]
   +0x018 CurrentCount     : 0
   +0x01c MaximumCount     : 2
   +0x020 ThreadListHead   : _LIST_ENTRY [ 0x86cf0ac8 - 0x89ff9520 ]

0: kd> !smt
SMT Summary:
------------
KeActiveProcessors: **------------------------------ (00000003)
     KiIdleSummary: **------------------------------ (00000003)
No PRCB     Set Master SMT Set                                     IAID
 0 ffdff120 Master     **------------------------------ (00000003) 00
 1 f772f120 ffdff120   **------------------------------ (00000003) 01
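The bounded-concurrency behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class ToyKQueue and its methods are hypothetical names, the fields merely borrow their names from the _KQUEUE output, and the real kernel logic is more involved. Waking the most recent waiter first is a simplification of the LIFO release order documented for completion ports.

```python
class ToyKQueue:
    """Toy model of a kernel queue; not the real kernel implementation."""

    def __init__(self, maximum_count):
        self.entry_list = []        # queued items, oldest first
        self.current_count = 0      # threads currently processing an item
        self.maximum_count = maximum_count
        self.waiters = []           # threads blocked waiting for an item

    def insert(self, item):
        """Queue an item (compare: posting a completion notification)."""
        self.entry_list.append(item)
        return self._dispatch()

    def remove(self, thread):
        """A thread asks for an item and waits if none can be handed out."""
        self.waiters.append(thread)
        return self._dispatch()

    def done(self):
        """An active thread finished its item and frees a slot."""
        self.current_count -= 1
        return self._dispatch()

    def _dispatch(self):
        """Hand items to waiters while the active-thread limit allows it."""
        handed_out = []
        while (self.entry_list and self.waiters
               and self.current_count < self.maximum_count):
            thread = self.waiters.pop()      # most recent waiter first
            item = self.entry_list.pop(0)    # oldest item first
            self.current_count += 1
            handed_out.append((thread, item))
        return handed_out


queue = ToyKQueue(maximum_count=2)
for name in ("T1", "T2", "T3"):
    queue.remove(name)              # the queue is empty, so all three wait
print(queue.insert("a"), queue.insert("b"), queue.insert("c"))
print(queue.done())                 # a slot frees up and "c" is handed out
```

With MaximumCount set to 2, the third item stays queued until one of the two active threads finishes, even though a third thread is waiting.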

Kernel work queues are also implemented via the same queue object as we might have guessed already:

THREAD 8a777660  Cid 0004.00d0  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    808b707c  QueueObject
Not impersonating
DeviceMap                 e1000928
Owning Process            8a780818       Image:         System
Wait Start TickCount      2615           Ticks: 2130270 (0:09:14:45.468)
Context Switch Count      301
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Start Address nt!ExpWorkerThread (0x8082d92b)
Stack Init f71e0000 Current f71dfcec Base f71e0000 Limit f71dd000 Call 0
Priority 12 BasePriority 12 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr
f71dfd04 8083d3b1 nt!KiSwapContext+0x26
f71dfd30 8083dea2 nt!KiSwapThread+0x2e5
f71dfd78 8082d9c1 nt!KeRemoveQueue+0x417
f71dfdac 809208fc nt!ExpWorkerThread+0xc8
f71dfddc 8083fc9f nt!PspSystemThreadStartup+0x2e
00000000 00000000 nt!KiThreadStartup+0x16

0: kd> dt _KQUEUE 808b707c
ntdll!_KQUEUE
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 EntryListHead    : _LIST_ENTRY [ 0x808b708c - 0x808b708c ]
   +0x018 CurrentCount     : 0
   +0x01c MaximumCount     : 2
   +0x020 ThreadListHead   : _LIST_ENTRY [ 0x8a77a128 - 0x8a777768 ]

I’ve created a simple UML diagram showing the high-level relationships between the various objects seen in crash dumps. Note that an Active Thread object can process items from more than one completion port if its wait was satisfied for one port and then for another, but I have never seen this. Obviously, a Waiting Thread can wait for only one completion port.

Posted in Crash Dump Patterns, Software Architecture | 4 Comments »

DebugWare

November 27th, 2007

I’ve been slowly accumulating blog posts about various troubleshooting tools for my next book, which has the working title:

DebugWare: The Art and Craft of Writing Troubleshooting and Debugging Tools

Details will be announced later, together with a supporting website which is under construction. The book will be about the architecture, design and implementation of troubleshooting tools for software technical support.

Posted in Announcements, Books, Software Architecture, Software Technical Support, Tools | No Comments »

Teaching binary to decimal conversion

November 26th, 2007

Sometimes we have data in binary and want to convert it to decimal, for example to look up some constant in a header file. I used to do it via calc.exe. Now I use the .formats WinDbg command and the 0y binary prefix:

0:000> .formats 0y111010
Evaluate expression:
  Hex:     0000003a
  Decimal: 58
  Octal:   00000000072
  Binary:  00000000 00000000 00000000 00111010
  Chars:   ...:
  Time:    Thu Jan 01 00:00:58 1970
  Float:   low 8.12753e-044 high 0
  Double:  2.86558e-322
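For readers without a debugger at hand, the integer part of that conversion can be sketched in Python. The function name formats is only an illustration borrowed from the command name, it assumes a value that fits into 32 bits, and it does not reproduce the Chars, Time, Float or Double rows.

```python
def formats(binary_digits):
    """Convert a string of binary digits into several integer notations."""
    value = int(binary_digits, 2)
    bits = f"{value:032b}"
    return {
        "Hex": f"{value:08x}",
        "Decimal": value,
        "Octal": f"{value:011o}",
        # Group the 32 bits into bytes, as the debugger output does.
        "Binary": " ".join(bits[i:i + 8] for i in range(0, 32, 8)),
    }


for label, text in formats("111010").items():
    print(f"{label}: {text}")
```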

Some months ago I was flying SWISS and found this binary watch in their duty-free catalog, which I now use to guess the time:

01 The One Binary Watch

It has 6 binary digits for minutes. There are desktop binary clocks and other binary watches available if you google them, but they don’t have 6 binary digits for minutes. They approximate them by using 2 rows or columns, tens of minutes and minutes (2 + 4 binary digits). We are all good at handling 4 binary digits because of our work with hexadecimal nibbles, but not so good at handling 5 or 6 binary digits when we see them in one row.

Posted in Fun with Debugging, Hardware, Watches for Debugging, WinDbg Tips and Tricks | No Comments »

Stack traces on the Web

November 26th, 2007

How many WinDbg stack traces on the web are available for mining? Google gave the answer when I searched for typical stack trace text fragments:

"ChildEBP RetAddr" - about 40,200

"ChildEBP RetAddr Args to Child" - about 30,000

"Frame IP not in any known module" - about 10,800

Posted in Crash Dump Analysis, Crash Dump Patterns | 4 Comments »

Five golden rules of troubleshooting

November 26th, 2007

It is difficult to analyze a problem when we have crash dumps and/or traces from various tracing tools but the supporting information is incomplete or missing. After doing crash dump and trace analysis, including ETW-based traces, for more than 4 years, I came up with these easy-to-remember 4WS questions to ask when sending or requesting traces and memory dumps:

What - What happened or was observed? A crash or a hang, for example?

When - When did the problem happen if traces were recorded for hours?

Where - Which server or workstation was used for tracing, or where did the memory dumps come from? For example, one trace is from a primary server and two others are from backup servers, or one trace is from a client workstation and the other is from a server.

Why - Why did a customer or a support engineer request a dump or a trace? This could shed light on various assumptions, including presuppositions hidden in the problem description.

Supporting information - needed to find a needle in a haystack: process id, thread id, etc. The answer to the following question is also important: how were the dumps and traces created?

Every trace or memory dump shall be accompanied by 4WS answers.

The 4WS rule can be applied to any troubleshooting because even the problem description itself is some kind of a trace.
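As one possible way to enforce the rule in a support workflow, a submission could be checked for unanswered questions before it is accepted. This is a hypothetical sketch: the key names and the function missing_4ws are my own illustration, not part of any existing tool.

```python
# Hypothetical keys, one per 4WS question.
FOUR_WS = ("what", "when", "where", "why", "supporting")


def missing_4ws(answers):
    """Return the 4WS questions that have no non-empty answer."""
    return [key for key in FOUR_WS if not str(answers.get(key, "")).strip()]


submission = {
    "what": "application hang",
    "when": "around 12:45, traces cover 4 hours",
    "where": "primary server",
    "why": "",
}
print(missing_4ws(submission))
```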

Posted in Crash Dump Analysis, Crash Dumps for Dummies, Debugging, Software Technical Support, Troubleshooting Methodology | 1 Comment »

Crash Dump Analysis Patterns (Part 39)

November 23rd, 2007

As mentioned in the Early Crash Dump pattern, saving crash dumps on first-chance exceptions helps to diagnose components that might have caused corruption and later crashes, hangs or CPU spikes by ignoring abnormal exceptions like access violations. In such cases we need to know whether an application installs its own Custom Exception Handler or several of them. If it uses only the default handlers provided by the runtime or the Windows subsystem, then a first-chance access violation exception will most likely result in a last-chance exception and a postmortem dump. To check the chain of exception handlers we can use the WinDbg !exchain extension command. For example:

0:000> !exchain
0017f9d8: TestDefaultDebugger!AfxWinMain+3f5 (00420aa9)
0017fa60: TestDefaultDebugger!AfxWinMain+34c (00420a00)
0017fb20: user32!_except_handler4+0 (770780eb)
0017fcc0: user32!_except_handler4+0 (770780eb)
0017fd24: user32!_except_handler4+0 (770780eb)
0017fe40: TestDefaultDebugger!AfxWinMain+16e (00420822)
0017feec: TestDefaultDebugger!AfxWinMain+797 (00420e4b)
0017ff90: TestDefaultDebugger!_except_handler4+0 (00410e00)
0017ffdc: ntdll!_except_handler4+0 (77961c78)

We see that TestDefaultDebugger doesn’t have its own exception handlers except the ones provided by the MFC and C/C++ runtime libraries, which were linked statically.

Here is another example. It was reported that a 3rd-party application was hanging and spiking CPU (Spiking Thread pattern), so a user dump was saved using command line userdump.exe:

0:000> vertarget
Windows Server 2003 Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
Product: Server, suite: TerminalServer
kernel32.dll version: 5.2.3790.4062 (srv03_sp2_gdr.070417-0203)
Debug session time: Thu Nov 22 12:45:59.000 2007 (GMT+0)
System Uptime: 0 days 10:43:07.667
Process Uptime: 0 days 4:51:32.000
  Kernel time: 0 days 0:08:04.000
  User time: 0 days 0:23:09.000

0:000> !runaway 3
 User Mode Time
  Thread       Time
   0:1c1c      0 days 0:08:04.218
   1:2e04      0 days 0:00:00.015
 Kernel Mode Time
  Thread       Time
   0:1c1c      0 days 0:23:09.156
   1:2e04      0 days 0:00:00.031

0:000> kL
ChildEBP RetAddr
0012fb80 7739bf53 ntdll!KiFastSystemCallRet
0012fbb4 05ca73b0 user32!NtUserWaitMessage+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fd20 05c8be3f 3rdPartyDLL+0x573b0
0012fd50 05c9e9ea 3rdPartyDLL+0x3be3f
0012fd68 7739b6e3 3rdPartyDLL+0x4e9ea
0012fd94 7739b874 user32!InternalCallWinProc+0x28
0012fe0c 7739c8b8 user32!UserCallWinProcCheckWow+0x151
0012fe68 7739c9c6 user32!DispatchClientMessage+0xd9
0012fe90 7c828536 user32!__fnDWORD+0x24
0012febc 7739d1ec ntdll!KiUserCallbackDispatcher+0x2e
0012fef8 7738cee9 user32!NtUserMessageCall+0xc
0012ff18 0050aea9 user32!SendMessageA+0x7f
0012ff70 00452ae4 3rdPartyApp+0x10aea9
0012ffac 00511941 3rdPartyApp+0x52ae4
0012ffc0 77e6f23b 3rdPartyApp+0x111941
0012fff0 00000000 kernel32!BaseProcessStart+0x23

Exception chain showed custom exception handlers:

0:000> !exchain
0012fb8c: 3rdPartyDLL+57acb (05ca7acb)
0012fd28: 3rdPartyDLL+3be57 (05c8be57)
0012fd34: 3rdPartyDLL+3be68 (05c8be68)
0012fdfc: user32!_except_handler3+0 (773aaf18)
  CRT scope  0, func:   user32!UserCallWinProcCheckWow+156 (773ba9ad)
0012fe58: user32!_except_handler3+0 (773aaf18)
0012fea0: ntdll!KiUserCallbackExceptionHandler+0 (7c8284e8)
0012ff3c: 3rdPartyApp+53310 (00453310)
0012ff48: 3rdPartyApp+5334b (0045334b)
0012ff9c: 3rdPartyApp+52d06 (00452d06)
0012ffb4: 3rdPartyApp+38d4 (004038d4)
0012ffe0: kernel32!_except_handler3+0 (77e61a60)
  CRT scope  0, filter: kernel32!BaseProcessStart+29 (77e76a10)
                func:   kernel32!BaseProcessStart+3a (77e81469)
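When there are many dumps to check, listing the modules that own the handlers in !exchain output can be scripted. The following is a rough sketch: it assumes the output has been saved as text with one handler per line, the set of system module names is my own assumption, and the helper names are hypothetical. It cannot tell statically linked runtime handlers (as in the TestDefaultDebugger case) from genuinely custom ones.

```python
import re

# Assumption: handlers in these modules are treated as default ones.
SYSTEM_MODULES = {"ntdll", "kernel32", "user32"}

# Matches "0012fb8c: 3rdPartyDLL+57acb (...)" and "0012fdfc: user32!_except...".
HANDLER_LINE = re.compile(r"^[0-9a-f]{8}: ([^!+\s]+)[!+]")


def handler_modules(exchain_output):
    """Return the modules owning handlers, in chain order, without repeats."""
    modules = []
    for line in exchain_output.splitlines():
        match = HANDLER_LINE.match(line.strip())
        if match and match.group(1) not in modules:
            modules.append(match.group(1))
    return modules


def non_system_handler_modules(exchain_output):
    """Return only the modules outside the assumed system set."""
    return [name for name in handler_modules(exchain_output)
            if name.lower() not in SYSTEM_MODULES]


SAMPLE = """\
0012fb8c: 3rdPartyDLL+57acb (05ca7acb)
0012fdfc: user32!_except_handler3+0 (773aaf18)
  CRT scope  0, func:   user32!UserCallWinProcCheckWow+156 (773ba9ad)
0012fea0: ntdll!KiUserCallbackExceptionHandler+0 (7c8284e8)
0012ff3c: 3rdPartyApp+53310 (00453310)
0012ffe0: kernel32!_except_handler3+0 (77e61a60)
"""
print(non_system_handler_modules(SAMPLE))
```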

The customer then enabled MS Exception Monitor and selected only Access violation exception code (c0000005) to avoid False Positive Dumps. During application execution various 1st-chance exception crash dumps were saved pointing to numerous access violations including function calls into unloaded modules, for example:

0:000> kL 100
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f910 7739b6e3 <Unloaded_Another3rdParty.dll>+0x4ce58
0012f93c 7739b874 user32!InternalCallWinProc+0x28
0012f9b4 7739c8b8 user32!UserCallWinProcCheckWow+0x151
0012fa10 7739c9c6 user32!DispatchClientMessage+0xd9
0012fa38 7c828536 user32!__fnDWORD+0x24
0012fa64 7739d1ec ntdll!KiUserCallbackDispatcher+0x2e
0012faa0 7738cee9 user32!NtUserMessageCall+0xc
0012fac0 0a0f2e01 user32!SendMessageA+0x7f
0012fae4 0a0f2ac7 3rdPartyDLL+0x52e01
0012fb60 7c81a352 3rdPartyDLL+0x52ac7
0012fb80 7c839dee ntdll!LdrpCallInitRoutine+0x14
0012fc94 77e6b1bb ntdll!LdrUnloadDll+0x41a
0012fca8 0050c9c1 kernel32!FreeLibrary+0x41
0012fdf4 004374af 3rdPartyApp+0x10c9c1
0012fe24 0044a076 3rdPartyApp+0x374af
0012fe3c 7739b6e3 3rdPartyApp+0x4a076
0012fe68 7739b874 user32!InternalCallWinProc+0x28
0012fee0 7739ba92 user32!UserCallWinProcCheckWow+0x151
0012ff48 773a16e5 user32!DispatchMessageWorker+0x327
0012ff58 00452aa0 user32!DispatchMessageA+0xf
0012ffac 00511941 3rdPartyApp+0x52aa0
0012ffc0 77e6f23b 3rdPartyApp+0x111941
0012fff0 00000000 kernel32!BaseProcessStart+0x23

Posted in Crash Dump Analysis, Crash Dump Patterns, WinDbg Tips and Tricks | 3 Comments »

Four causes of crash dumps

November 23rd, 2007

Obviously the appearance of crash dumps on your computer was caused by something. A bug, fault, defect or something else?

Aristotle suggested 4 types of causation 2 millennia ago and they are:

Material cause - the presence of some substance, usually a material one (hardware), but it can be machine code (software). The distinction between hardware and software is often blurred today because of virtualization.

Formal cause - some form or arrangement (an algorithm)

Efficient cause - an agent (data flow or event caused an algorithm to be executed)

Final cause - the desire of someone (or something, operating system, for example).

We skip material causes because hardware and software are always involved. Obviously, final causality should be among the causes of crash dumps because the dumps were either anticipated or made deliberately. Let’s look at 3 examples with possible causes:

Buffer Overflow

Formal cause - a defect in code which might have arisen from incomplete or wrong model
Efficient cause - data is too big to fit in a buffer
Final cause - operating system and runtime library support decided to save a crash dump

Bugcheck (NMI)

Formal cause - NMI handler
Efficient cause - a button on a hardware panel or KeBugCheckEx
Final cause - “I need a memory dump” desire. Also crash dump saving functions were written before by kernel developers in anticipation of future crash dumps.

Bugcheck (A)

Formal cause - a defect in code again or particular disposition of threads
Efficient cause - Driver Verifier triggered paging out data
Final cause - deliberate OS bugcheck (here we can also say that it was anticipated by OS designers)

Concrete causes depend on the organizational level you use: software/hardware systems/components, modeling act by humans, etc.

Posted in Crash Dump Analysis, Crash Dumps for Dummies, Science of Memory Dump Analysis | 1 Comment »

StressPrinters in press

November 23rd, 2007

Thomas Koetzing wrote a useful article on how to use StressPrinters and included some examples:

Understanding and using Citrix StressPrinters

Posted in Announcements, Citrix, Tools | 1 Comment »

Crash Dump Analysis Patterns (Part 38)

November 22nd, 2007

Hooking functions using the trampoline method is very common on Windows, and sometimes we need to check Hooked Functions in specific modules and determine which module hooked them, for troubleshooting or memory forensic analysis. If the original unhooked modules are available (via a symbol server, for example), this can be done with the !chkimg WinDbg extension command:

0:002> !chkimg -lo 50 -d !kernel32 -v
Searching for module with expression: !kernel32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\kernel32.dll\44C60F39102000\kernel32.dll
No range specified

Scanning section:    .text
Size: 564445
Range to scan: 77e41000-77ecacdd
    77e44004-77e44008  5 bytes - kernel32!GetDateFormatA
        [ 8b ff 55 8b ec:e9 f7 bf 08 c0 ]
    77e4412e-77e44132  5 bytes - kernel32!GetTimeFormatA (+0x12a)
        [ 8b ff 55 8b ec:e9 cd be 06 c0 ]
    77e4e857-77e4e85b  5 bytes - kernel32!FileTimeToLocalFileTime (+0xa729)
        [ 8b ff 55 8b ec:e9 a4 17 00 c0 ]
    77e56b5f-77e56b63  5 bytes - kernel32!GetTimeZoneInformation (+0x8308)
        [ 8b ff 55 8b ec:e9 9c 94 00 c0 ]
    77e579a9-77e579ad  5 bytes - kernel32!GetTimeFormatW (+0xe4a)
        [ 8b ff 55 8b ec:e9 52 86 06 c0 ]
    77e57fc8-77e57fcc  5 bytes - kernel32!GetDateFormatW (+0x61f)
        [ 8b ff 55 8b ec:e9 33 80 08 c0 ]
    77e6f32b-77e6f32f  5 bytes - kernel32!GetLocalTime (+0x17363)
        [ 8b ff 55 8b ec:e9 d0 0c 00 c0 ]
    77e6f891-77e6f895  5 bytes - kernel32!LocalFileTimeToFileTime (+0x566)
        [ 8b ff 55 8b ec:e9 6a 07 01 c0 ]
    77e83499-77e8349d  5 bytes - kernel32!SetLocalTime (+0x13c08)
        [ 8b ff 55 8b ec:e9 62 cb 00 c0 ]
    77e88c32-77e88c36  5 bytes - kernel32!SetTimeZoneInformation (+0x5799)
        [ 8b ff 55 8b ec:e9 c9 73 01 c0 ]
Total bytes compared: 564445(100%)
Number of errors: 50
50 errors : !kernel32 (77e44004-77e88c36)

0:002> u 77e44004
kernel32!GetDateFormatA:
77e44004 e9f7bf08c0      jmp     37ed0000
77e44009 81ec18020000    sub     esp,218h
77e4400f a148d1ec77      mov     eax,dword ptr [kernel32!__security_cookie (77ecd148)]
77e44014 53              push    ebx
77e44015 8b5d14          mov     ebx,dword ptr [ebp+14h]
77e44018 56              push    esi
77e44019 8b7518          mov     esi,dword ptr [ebp+18h]
77e4401c 57              push    edi

0:002> u 37ed0000
*** ERROR: Symbol file could not be found. Defaulted to export symbols for MyDateTimeHooks.dll -
37ed0000 e99b262f2d      jmp     MyDateTimeHooks+0x26a0 (651c26a0)
37ed0005 8bff            mov     edi,edi
37ed0007 55              push    ebp
37ed0008 8bec            mov     ebp,esp
37ed000a e9fa3ff73f      jmp     kernel32!GetDateFormatA+0x5 (77e44009)
37ed000f 0000            add     byte ptr [eax],al
37ed0011 0000            add     byte ptr [eax],al
37ed0013 0000            add     byte ptr [eax],al
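The jump targets in the listings above can also be computed by hand from the patched bytes that !chkimg reports: e9 is a near jmp with a 32-bit relative displacement, and the target is the address of the next instruction plus that displacement. A small sketch (the function name is my own, and it assumes 32-bit code):

```python
def jmp_rel32_target(address, code):
    """Return the target of a 5-byte 'e9 rel32' jmp located at address."""
    if len(code) < 5 or code[0] != 0xE9:
        raise ValueError("not an e9 rel32 jmp")
    displacement = int.from_bytes(code[1:5], "little", signed=True)
    # The displacement is relative to the end of the 5-byte instruction.
    return (address + 5 + displacement) & 0xFFFFFFFF


# Patched bytes of kernel32!GetDateFormatA and the two jumps in the trampoline.
print(hex(jmp_rel32_target(0x77E44004, bytes.fromhex("e9f7bf08c0"))))
print(hex(jmp_rel32_target(0x37ED0000, bytes.fromhex("e99b262f2d"))))
print(hex(jmp_rel32_target(0x37ED000A, bytes.fromhex("e9fa3ff73f"))))
```

The first jump leads to the trampoline at 37ed0000, the second into MyDateTimeHooks, and the third back to kernel32!GetDateFormatA+0x5, after the trampoline has executed the saved original prologue (mov edi,edi; push ebp; mov ebp,esp).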

Posted in Crash Dump Analysis, Crash Dump Patterns, Security, WinDbg Tips and Tricks | 15 Comments »

Crash Dump Analysis AntiPatterns (Part 6)

November 22nd, 2007

Need the crash dump. Period. This might be the first thought when an engineer gets a stack trace fragment without symbolic information. It is usually based on the following presupposition:

We need an actual dump file to suggest further troubleshooting steps.

This is not actually true unless it is the first time we see the problem and get a stack trace for it. Consider the following fragment from a bugcheck kernel dump where no symbols were applied because the customer didn’t have them:

b90529f8 8085eced nt!KeBugCheckEx+0x1b
b9052a70 8088c798 nt!MmAccessFault+0xb25
b9052a70 bfabd940 nt!_KiTrap0E+0xdc
WARNING: Stack unwind information not available. Following frames may be wrong.
b9052b14 bfabe452 MyDriver+0x27940

We can convert module+offset information into module!function+offset2 using MAP files, or using the DIA SDK (Debug Interface Access SDK) to query PDB files if we know the module timestamp. This might be seen as a tedious exercise, but we don’t need to do it if we keep raw stack trace signatures in some database when doing crash dump analysis. If we use our own symbol servers, we might want to remove references to them, reload symbols and then redo the previous stack trace commands.

In my case, I had already analyzed similar bugcheck crash dumps months earlier and saved the stack trace prior to applying symbols. This helped me point to a solution without requesting the crash dump corresponding to that stack trace.
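The module+offset conversion itself is a nearest-preceding-symbol lookup. Below is a minimal sketch, assuming the function start offsets have already been extracted from a MAP file into a list of (offset, name) pairs. The symbol names and offsets in the example are entirely hypothetical, and parsing a real MAP file or querying a PDB through the DIA SDK is not shown.

```python
import bisect


def resolve(symbols, module, offset):
    """Turn module+offset into module!function+offset2.

    symbols is a list of (start_offset, function_name) pairs.
    """
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    # Index of the last function that starts at or before the offset.
    index = bisect.bisect_right(starts, offset) - 1
    if index < 0:
        return f"{module}+0x{offset:x}"
    start, name = ordered[index]
    return f"{module}!{name}+0x{offset - start:x}"


# Hypothetical symbol table for illustration only.
MYDRIVER_SYMBOLS = [
    (0x1000, "DriverEntry"),
    (0x27800, "HypotheticalDispatch"),
    (0x28000, "HypotheticalUnload"),
]
print(resolve(MYDRIVER_SYMBOLS, "MyDriver", 0x27940))
```

The lookup is only meaningful when the symbol table comes from exactly the same build of the module, which is why the module timestamp matters.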

Posted in AntiPatterns, Crash Dump Analysis, Crash Dump Patterns, Software Technical Support | No Comments »

Critical thinking when troubleshooting

November 22nd, 2007

Faulty thinking happens all the time in technical support environments partly due to hectic and demanding business realities.

The Simple*ology book pointed me to this website:

http://www.fallacyfiles.org/

which taxonomically organizes fallacies:

http://www.fallacyfiles.org/taxonomy.html

Take False Cause, for example. Technical examples might include false causes inferred from trace analysis, from a customer problem description that includes steps to reproduce the problem, etc. This also applies to debugging, and the importance of thinking skills has been emphasized in the following book:

Debugging by Thinking: A Multidisciplinary Approach

The surface level of basic crash dump analysis is less influenced by false cause fallacies because it doesn’t have an explicitly recorded sequence of events, although some caution should be exercised during detailed analysis of thread waiting times and other historical information.

Warning: when exercising critical thinking recursively we need to stop at the right time to avoid paralysis of analysis :-)

Posted in Crash Dump Analysis, Debugging, Software Technical Support | No Comments »

Crash Dump Analysis Patterns (Part 37)

November 21st, 2007

Some bugs are fixed using a brute-force approach: putting in an exception handler to catch access violations and other exceptions. A long time ago I saw one such “incredible fix” when an image processing application was crashing after approximately the Nth heap free runtime call. To ignore the crashes an SEH handler was put in place, but the application started to crash in different places. Therefore the additional fix was to skip free calls when approaching N and resume afterwards. The application started to crash less frequently.

Here getting an Early Crash Dump when a first-chance exception happens can help in component identification before corruption starts spreading across data. Recall that when an access violation happens in a process thread in user mode, the system generates a first-chance exception, which can be caught by an attached debugger. If there is no such debugger, the system tries to find an exception handler, and if that handler catches and dismisses the exception, the thread resumes its normal execution path. If no such handler is found, the system generates the so-called second-chance exception with the same exception context to notify the attached debugger, and if no debugger is attached, a default thread exception handler usually saves a postmortem user dump.
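The dispatch sequence just described can be written down as a small decision function. This is a simplified model for illustration, not the actual system logic: the function and parameter names are hypothetical, and it additionally assumes that a debugger which does not handle the first-chance notification lets the search for handlers continue.

```python
def dispatch(debugger_attached, debugger_handles, handler_dismisses):
    """Return the ordered events for one user-mode exception (toy model)."""
    events = []
    if debugger_attached:
        events.append("first-chance notification to debugger")
        if debugger_handles:
            events.append("debugger handled exception")
            return events
    if handler_dismisses:
        events.append("handler dismissed exception, thread resumes")
        return events
    if debugger_attached:
        events.append("second-chance notification to debugger")
    else:
        events.append("default handler saves postmortem dump")
    return events


# A brute-force handler hides the problem unless a tool catches the first chance.
print(dispatch(debugger_attached=False, debugger_handles=False, handler_dismisses=True))
print(dispatch(debugger_attached=True, debugger_handles=False, handler_dismisses=False))
```

The first call shows why such "fixes" leave no dump behind: without a debugger or monitoring tool attached, a dismissed exception produces no notification at all.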

You can get first-chance exception memory dumps with:

Debug Diagnostics
ADPlus in crash mode from Debugging Tools for Windows
Exception Monitor from User Mode Process Dumper package

Here is an example configuration rule for crashes in Debug Diagnostic tool for TestDefaultDebugger process (Unconfigured First Chance Exceptions option is set to Full Userdump):

When we push the big crash button in TestDefaultDebugger dialog box two crash dumps are saved, with first and second-chance exceptions pointing to the same code:

Loading Dump File [C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of TestDefaultDebugger.exe\TestDefaultDebugger__PID__4316__Date__11_21_2007__Time_04_28_27PM__2__First chance exception 0XC0000005.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Dump created by DbgHost. First chance exception 0XC0000005'
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Wed Nov 21 16:28:27.000 2007 (GMT+0)
System Uptime: 0 days 23:45:34.711
Process Uptime: 0 days 0:01:09.000

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(10dc.590): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00000001 ecx=0017fe70 edx=00000000 esi=00425ae8 edi=0017fe70
eip=004014f0 esp=0017f898 ebp=0017f8a4 iopl=0         nv up ei ng nz ac pe cy
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 c7050000000000000000 mov dword ptr ds:[0],0  ds:002b:00000000=????????

Loading Dump File [C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of TestDefaultDebugger.exe\TestDefaultDebugger__PID__4316__Date__11_21_2007__Time_04_28_34PM__693__Second_Chance_Exception_C0000005.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Dump created by DbgHost. Second_Chance_Exception_C0000005'
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Wed Nov 21 16:28:34.000 2007 (GMT+0)
System Uptime: 0 days 23:45:39.313
Process Uptime: 0 days 0:01:16.000

Posted in Crash Dump Analysis, Crash Dump Patterns, Debugging, Tools | 5 Comments »

Crash Dump Analysis on Solaris x86 - AMD64

November 20th, 2007

I found the following book, an interesting read that shows crash dump analysis from the perspective of a different operating system architecture but on the same Intel / AMD platform:

http://www.genunix.org/gen/crashdump/book.pdf

Posted in Assembly Language, Books, Crash Dump Analysis, Software Architecture | No Comments »

Crash Dump Analysis Patterns (Part 31a)

November 20th, 2007

I have already discussed the Passive Thread pattern in user space. In this part I continue with kernel space and passive system threads that don’t run in any user process context. These threads belong to the so-called System process, don’t have a user space stack, and their full stack traces can be seen in the output of the !process command (if not completely paged out):

1: kd> !process 0 ff System

or from system portion of !stacks 2 command.

Some system threads from that list belong to core OS functionality and are not passive (function offsets can vary for different OS versions and service packs):

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!MmZeroPageThread+0x180
nt!Phase1Initialization+0xe
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!MiModifiedPageWriter+0x59
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!MiMappedPageWriter+0xad
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!KeBalanceSetManager+0x101
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!KeSwapProcessOrStack+0x44
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!EtwpLogger+0xdd
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!KiExecuteDpc+0x198
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!CcQueueLazyWriteScanThread+0x73
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!ExpWorkerThreadBalanceManager+0x85
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

Other threads belong to various worker queues (they can also be seen from !exqueue ff command output) and wait for data items to arrive (passive threads):

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!ExpWorkerThread+0x104
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x26
nt!KiSwapThread+0x2e5
nt!KeRemoveQueue+0x417
nt!ExpWorkerThread+0xc8
nt!PspSystemThreadStartup+0x2e
nt!KiThreadStartup+0x16

Non-Exp system threads having Worker, Logging or Logger substrings in their function names are passive threads and wait for data too, for example:

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!PfTLoggingWorker+0x81
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!EtwpLogger+0xdd
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
rdpdr!RxpWorkerThreadDispatcher+0x6f
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
HTTP!UlpThreadPoolWorker+0x26c
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
srv2!SrvProcWorkerThread+0x74
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
srv!WorkerThread+0x90
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

Any deviation in a memory dump can raise suspicion, as in the stack below for driver.sys:

nt!KiSwapContext+0x26
nt!KiSwapThread+0x284
nt!KeWaitForSingleObject+0x346
nt!ExpWaitForResource+0xd5
nt!ExAcquireResourceExclusiveLite+0x8d
nt!ExEnterCriticalRegionAndAcquireResourceExclusive+0x19
driver!ProcessItem+0x2f
driver!DelayedWorker+0x27
nt!ExpWorkerThread+0x104
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16
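When a dump contains hundreds of system threads, a rough first pass over worker stacks can be automated. The sketch below is a heuristic of my own, not an established rule: it treats a worker thread as idle when only scheduler and wait primitives sit above the thread routine, and flags everything else for a manual look. The prefix lists are assumptions based on the frames shown in this post, and the check does not separate core OS threads from passive ones.

```python
# Assumed frame prefixes, taken from the stacks shown in this post.
WAIT_PREFIXES = ("nt!KiSwap", "nt!KeWaitFor", "nt!KeRemoveQueue")
STARTUP_PREFIXES = ("nt!PspSystemThreadStartup", "nt!KiStartSystemThread",
                    "nt!KiThreadStartup")


def waits_in_thread_routine(frames):
    """True if only wait primitives sit above a single thread routine.

    frames lists the stack from the top, e.g. 'nt!KiSwapContext+0x84'.
    """
    body = [f for f in frames if not f.startswith(STARTUP_PREFIXES)]
    waits = 0
    while waits < len(body) and body[waits].startswith(WAIT_PREFIXES):
        waits += 1
    return waits > 0 and len(body) - waits == 1


idle_worker = [
    "nt!KiSwapContext+0x84", "nt!KiSwapThread+0x125",
    "nt!KeRemoveQueueEx+0x848", "nt!ExpWorkerThread+0x104",
    "nt!PspSystemThreadStartup+0x5b", "nt!KiStartSystemThread+0x16",
]
suspicious = [
    "nt!KiSwapContext+0x26", "nt!KiSwapThread+0x284",
    "nt!KeWaitForSingleObject+0x346", "nt!ExpWaitForResource+0xd5",
    "nt!ExAcquireResourceExclusiveLite+0x8d",
    "nt!ExEnterCriticalRegionAndAcquireResourceExclusive+0x19",
    "driver!ProcessItem+0x2f", "driver!DelayedWorker+0x27",
    "nt!ExpWorkerThread+0x104", "nt!PspSystemThreadStartup+0x5b",
    "nt!KiStartSystemThread+0x16",
]
print(waits_in_thread_routine(idle_worker), waits_in_thread_routine(suspicious))
```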

Posted in Crash Dump Analysis, Crash Dump Patterns | 4 Comments »

Modeling side of DLL Injection

November 19th, 2007

Component injection can be used to model various kinds of process and system software behavior by writing customized DLL/SYS modules and injecting them into process/kernel space. Although it is often depicted either as a security threat or as a value-added hooking mechanism, very little has been written about its use for modeling software defects. Here I don’t mean testing but studying faulty behavior and artifacts after injecting specific DLLs with design and implementation defects, for example, forgetting to release database connections or not closing file handles. NotMyLeak is an attempt to do this for different kinds of leaks on x86 and x64 Windows platforms. It uses automatic DLL injection via the standard Windows hooking mechanism. Stay tuned.

Posted in Science of Memory Dump Analysis | No Comments »

NotMyLeak

November 19th, 2007

A tool called NotMyLeak for troubleshooting and studying memory leaks will be released soon. It injects different kinds of leaks into specified processes and the system:

Process heap
Runtime library
Performance counters
Kernel paged pool
Kernel nonpaged pool
IRP
Handles
PTE
etc…

The idea is to model various real-time leaks, analyze memory dumps and then apply discovered patterns to crash dump analysis of memory dumps coming from real-world systems.

The draft GUI (subject to change):

Note: the tool name prefix NotMy… was inspired by the name of Mark Russinovich’s tool called NotMyFault.

Posted in Announcements, Crash Dump Analysis, Crash Dump Patterns, Debugging, Tools | No Comments »

Windows Internals book

November 19th, 2007

Scheduled to be updated with Windows Vista and Windows Server 2008 details:

Windows® Internals, Fifth Edition

Posted in Announcements, Books, Crash Dump Analysis, Debugging, Software Architecture, Software Technical Support, Vista | 2 Comments »

Filtering processes

November 19th, 2007

When I analyze memory dumps coming from Microsoft or Citrix terminal service environments, I frequently need to find the process hosting the terminal service. In Windows 2000 it was a separate process, termsrv.exe, and now it is termsrv.dll, which can be loaded into any of several instances of svchost.exe. If we have a complete memory dump, the simplest way to narrow down that svchost.exe process is to use the module option of the WinDbg !process command:

!process /m termsrv.dll 0

!process /m wsxica.dll 0

!process /m ctxrdpwsx.dll 0

Note: this option works only with W2K3, XP and later OS

Also to list all processes with user space stacks having the same image name we can use:

!process 0 ff msiexec.exe

!process 0 ff svchost.exe

Note: this command works with W2K too as well as session option (/s)

Posted in Crash Dump Analysis, Debugging, WinDbg Tips and Tricks | No Comments »

Exceptions Ab Initio

November 16th, 2007

Where do native exceptions come from? How do they propagate from hardware and eventually result in crash dumps? I was asking these questions when I started doing crash dump analysis more than four years ago and I tried to find answers using IA-32 Intel® Architecture Software Developer’s Manual, WinDbg and complete memory dumps.

Eventually I wrote some blog posts about my findings. They were buried among many other posts, so I dug them out and put them on a dedicated page:

Interrupts and Exceptions Explained

Posted in Announcements, Assembly Language, Bugchecks Depicted, Crash Dump Analysis, Debugging, Hardware | No Comments »
