Crash Dump Analysis Patterns (Part 39)

November 23rd, 2007

As mentioned in the Early Crash Dump pattern, saving crash dumps on first-chance exceptions helps to diagnose components that might have caused corruption, and later crashes, hangs or CPU spikes, by ignoring abnormal exceptions such as access violations. In such cases we need to know whether an application installs its own Custom Exception Handler or several of them. If it uses only the default handlers provided by the runtime or the Windows subsystem then a first-chance access violation exception will most likely result in a last-chance exception and a postmortem dump. To check the chain of exception handlers we can use the WinDbg !exchain extension command. For example:

0:000> !exchain
0017f9d8: TestDefaultDebugger!AfxWinMain+3f5 (00420aa9)
0017fa60: TestDefaultDebugger!AfxWinMain+34c (00420a00)
0017fb20: user32!_except_handler4+0 (770780eb)
0017fcc0: user32!_except_handler4+0 (770780eb)
0017fd24: user32!_except_handler4+0 (770780eb)
0017fe40: TestDefaultDebugger!AfxWinMain+16e (00420822)
0017feec: TestDefaultDebugger!AfxWinMain+797 (00420e4b)
0017ff90: TestDefaultDebugger!_except_handler4+0 (00410e00)
0017ffdc: ntdll!_except_handler4+0 (77961c78)

We see that TestDefaultDebugger doesn’t have its own exception handlers apart from those provided by the statically linked MFC and C/C++ runtime libraries. Here is another example. It was reported that a 3rd-party application was hanging and spiking CPU (Spiking Thread pattern), so a user dump was saved using the command-line tool userdump.exe:

0:000> vertarget
Windows Server 2003 Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
Product: Server, suite: TerminalServer
kernel32.dll version: 5.2.3790.4062 (srv03_sp2_gdr.070417-0203)
Debug session time: Thu Nov 22 12:45:59.000 2007 (GMT+0)
System Uptime: 0 days 10:43:07.667
Process Uptime: 0 days 4:51:32.000 
Kernel time: 0 days 0:08:04.000 
User time: 0 days 0:23:09.000

0:000> !runaway 3 
User Mode Time 
Thread Time  
0:1c1c      0 days 0:08:04.218  
1:2e04      0 days 0:00:00.015
Kernel Mode Time 
Thread Time  
0:1c1c      0 days 0:23:09.156  
1:2e04      0 days 0:00:00.031

0:000> kL
ChildEBP RetAddr 
0012fb80 7739bf53 ntdll!KiFastSystemCallRet
0012fbb4 05ca73b0 user32!NtUserWaitMessage+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fd20 05c8be3f 3rdPartyDLL+0x573b0
0012fd50 05c9e9ea 3rdPartyDLL+0x3be3f
0012fd68 7739b6e3 3rdPartyDLL+0x4e9ea
0012fd94 7739b874 user32!InternalCallWinProc+0x28
0012fe0c 7739c8b8 user32!UserCallWinProcCheckWow+0x151
0012fe68 7739c9c6 user32!DispatchClientMessage+0xd9
0012fe90 7c828536 user32!__fnDWORD+0x24
0012febc 7739d1ec ntdll!KiUserCallbackDispatcher+0x2e
0012fef8 7738cee9 user32!NtUserMessageCall+0xc
0012ff18 0050aea9 user32!SendMessageA+0x7f
0012ff70 00452ae4 3rdPartyApp+0x10aea9
0012ffac 00511941 3rdPartyApp+0x52ae4
0012ffc0 77e6f23b 3rdPartyApp+0x111941
0012fff0 00000000 kernel32!BaseProcessStart+0x23

Exception chain showed custom exception handlers:

0:000> !exchain
0012fb8c: 3rdPartyDLL+57acb (05ca7acb)
0012fd28: 3rdPartyDLL+3be57 (05c8be57)
0012fd34: 3rdPartyDLL+3be68 (05c8be68)

0012fdfc: user32!_except_handler3+0 (773aaf18)
  CRT scope  0, func:   user32!UserCallWinProcCheckWow+156 (773ba9ad)
0012fe58: user32!_except_handler3+0 (773aaf18)
0012fea0: ntdll!KiUserCallbackExceptionHandler+0 (7c8284e8)
0012ff3c: 3rdPartyApp+53310 (00453310)
0012ff48: 3rdPartyApp+5334b (0045334b)
0012ff9c: 3rdPartyApp+52d06 (00452d06)
0012ffb4: 3rdPartyApp+38d4 (004038d4)

0012ffe0: kernel32!_except_handler3+0 (77e61a60)
  CRT scope  0, filter: kernel32!BaseProcessStart+29 (77e76a10)
                func:   kernel32!BaseProcessStart+3a (77e81469)
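
Exception chains like the two above can also be summarized by a small script when many dumps have to be compared. The following Python sketch is not part of WinDbg. It assumes the !exchain line format shown above, and the SYSTEM_MODULES set and the function names are hypothetical choices made for illustration:

```python
import re
from collections import Counter

# Matches handler lines such as "0012fb8c: 3rdPartyDLL+57acb (05ca7acb)".
# The line format is inferred from the !exchain listings in this post.
HANDLER_LINE = re.compile(r"^[0-9a-fA-F]{8}:\s+(?P<module>[^!+\s]+)[!+]")

# Hypothetical list of modules treated as system-provided in this sketch.
SYSTEM_MODULES = {"ntdll", "kernel32", "user32"}


def handlers_by_module(exchain_output):
    """Count exception registration records per module."""
    counts = Counter()
    for line in exchain_output.splitlines():
        match = HANDLER_LINE.match(line.strip())
        if match:
            counts[match.group("module")] += 1
    return counts


def custom_handler_modules(exchain_output):
    """Return the modules outside SYSTEM_MODULES that own handlers."""
    counts = handlers_by_module(exchain_output)
    return sorted(m for m in counts if m.lower() not in SYSTEM_MODULES)
```

For the second chain above this would report 3rdPartyApp and 3rdPartyDLL. Statically linked runtime handlers, such as TestDefaultDebugger!_except_handler4 in the first chain, appear under the application module, so the module name is only a first hint and symbol names still need to be checked.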

The customer then enabled MS Exception Monitor and selected only Access violation exception code (c0000005) to avoid False Positive Dumps. During application execution various 1st-chance exception crash dumps were saved pointing to numerous access violations including function calls into unloaded modules, for example:

0:000> kL 100
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f910 7739b6e3 <Unloaded_Another3rdParty.dll>+0x4ce58
0012f93c 7739b874 user32!InternalCallWinProc+0x28
0012f9b4 7739c8b8 user32!UserCallWinProcCheckWow+0x151
0012fa10 7739c9c6 user32!DispatchClientMessage+0xd9
0012fa38 7c828536 user32!__fnDWORD+0x24
0012fa64 7739d1ec ntdll!KiUserCallbackDispatcher+0x2e
0012faa0 7738cee9 user32!NtUserMessageCall+0xc
0012fac0 0a0f2e01 user32!SendMessageA+0x7f
0012fae4 0a0f2ac7 3rdPartyDLL+0x52e01
0012fb60 7c81a352 3rdPartyDLL+0x52ac7
0012fb80 7c839dee ntdll!LdrpCallInitRoutine+0x14
0012fc94 77e6b1bb ntdll!LdrUnloadDll+0x41a
0012fca8 0050c9c1 kernel32!FreeLibrary+0x41
0012fdf4 004374af 3rdPartyApp+0x10c9c1
0012fe24 0044a076 3rdPartyApp+0x374af
0012fe3c 7739b6e3 3rdPartyApp+0x4a076
0012fe68 7739b874 user32!InternalCallWinProc+0x28
0012fee0 7739ba92 user32!UserCallWinProcCheckWow+0x151
0012ff48 773a16e5 user32!DispatchMessageWorker+0x327
0012ff58 00452aa0 user32!DispatchMessageA+0xf
0012ffac 00511941 3rdPartyApp+0x52aa0
0012ffc0 77e6f23b 3rdPartyApp+0x111941
0012fff0 00000000 kernel32!BaseProcessStart+0x23

- Dmitry Vostokov @ DumpAnalysis.org -

Four causes of crash dumps

November 23rd, 2007

Obviously the appearance of crash dumps on your computer was caused by something. A bug, fault, defect or something else?

Aristotle suggested 4 types of causation 2 millennia ago and they are:

Material cause - presence of some substance, usually material one (hardware) but can be machine code (software). The distinction between hardware and software is often blurred today because of virtualization.

Formal cause - some form or arrangement (an algorithm)

Efficient cause - an agent (data flow or event caused an algorithm to be executed)

Final cause - the desire of someone (or something, operating system, for example).

We skip material causes because hardware and software are always involved. Obviously final causality should be among the crash dump causes because dumps were either anticipated or made deliberately. Let’s look at 3 examples with possible causes:

Buffer Overflow

  • Formal cause - a defect in code which might have arisen from incomplete or wrong model

  • Efficient cause - data is too big to fit in a buffer

  • Final cause - operating system and runtime library support decided to save a crash dump

Bugcheck (NMI)

  • Formal cause - NMI handler

  • Efficient cause - a button on a hardware panel or KeBugCheckEx

  • Final cause - “I need a memory dump” desire. Also crash dump saving functions were written before by kernel developers in anticipation of future crash dumps.

Bugcheck (A)

  • Formal cause - a defect in code again or particular disposition of threads

  • Efficient cause - Driver Verifier triggered paging out data

  • Final cause - deliberate OS bugcheck (here we can also say that it was anticipated by OS designers)

Concrete causes depend on the organizational level you use: software/hardware systems/components, modeling act by humans, etc.

- Dmitry Vostokov @ DumpAnalysis.org -

StressPrinters in press

November 23rd, 2007

Thomas Koetzing wrote a useful article on how to use StressPrinters and included some examples:

Understanding and using Citrix StressPrinters

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 38)

November 22nd, 2007

Hooking functions using the trampoline method is so common on Windows that sometimes we need to check Hooked Functions in specific modules and determine which module hooked them, either for troubleshooting or for memory forensic analysis. If the original unhooked modules are available (via a symbol server, for example) this can be done with the !chkimg WinDbg extension command:

0:002> !chkimg -lo 50 -d !kernel32 -v
Searching for module with expression: !kernel32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\kernel32.dll\44C60F39102000\kernel32.dll
No range specified

Scanning section:    .text
Size: 564445
Range to scan: 77e41000-77ecacdd
    77e44004-77e44008  5 bytes - kernel32!GetDateFormatA
 [ 8b ff 55 8b ec:e9 f7 bf 08 c0 ]
    77e4412e-77e44132  5 bytes - kernel32!GetTimeFormatA (+0x12a)
 [ 8b ff 55 8b ec:e9 cd be 06 c0 ]
    77e4e857-77e4e85b  5 bytes - kernel32!FileTimeToLocalFileTime (+0xa729)
 [ 8b ff 55 8b ec:e9 a4 17 00 c0 ]
    77e56b5f-77e56b63  5 bytes - kernel32!GetTimeZoneInformation (+0x8308)
 [ 8b ff 55 8b ec:e9 9c 94 00 c0 ]
    77e579a9-77e579ad  5 bytes - kernel32!GetTimeFormatW (+0xe4a)
 [ 8b ff 55 8b ec:e9 52 86 06 c0 ]
    77e57fc8-77e57fcc  5 bytes - kernel32!GetDateFormatW (+0x61f)
 [ 8b ff 55 8b ec:e9 33 80 08 c0 ]
    77e6f32b-77e6f32f  5 bytes - kernel32!GetLocalTime (+0x17363)
 [ 8b ff 55 8b ec:e9 d0 0c 00 c0 ]
    77e6f891-77e6f895  5 bytes - kernel32!LocalFileTimeToFileTime (+0x566)
 [ 8b ff 55 8b ec:e9 6a 07 01 c0 ]
    77e83499-77e8349d  5 bytes - kernel32!SetLocalTime (+0x13c08)
 [ 8b ff 55 8b ec:e9 62 cb 00 c0 ]
    77e88c32-77e88c36  5 bytes - kernel32!SetTimeZoneInformation (+0x5799)
 [ 8b ff 55 8b ec:e9 c9 73 01 c0 ]
Total bytes compared: 564445(100%)
Number of errors: 50
50 errors : !kernel32 (77e44004-77e88c36)

0:002> u 77e44004
kernel32!GetDateFormatA:
77e44004 e9f7bf08c0      jmp     37ed0000
77e44009 81ec18020000    sub     esp,218h
77e4400f a148d1ec77      mov     eax,dword ptr [kernel32!__security_cookie (77ecd148)]
77e44014 53              push    ebx
77e44015 8b5d14          mov     ebx,dword ptr [ebp+14h]
77e44018 56              push    esi
77e44019 8b7518          mov     esi,dword ptr [ebp+18h]
77e4401c 57              push    edi

0:002> u 37ed0000
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for MyDateTimeHooks.dll -
37ed0000 e99b262f2d      jmp     MyDateTimeHooks+0x26a0 (651c26a0)
37ed0005 8bff            mov     edi,edi
37ed0007 55              push    ebp
37ed0008 8bec            mov     ebp,esp
37ed000a e9fa3ff73f      jmp     kernel32!GetDateFormatA+0x5 (77e44009)
37ed000f 0000            add     byte ptr [eax],al
37ed0011 0000            add     byte ptr [eax],al
37ed0013 0000            add     byte ptr [eax],al
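
The jump targets in the listings above can be recomputed by hand from the patched bytes reported by !chkimg. The following Python sketch decodes a 5-byte x86 jmp rel32 instruction (opcode E9). The function names are hypothetical and the bracketed "original:patched" layout is taken from the -d output shown above:

```python
def rel32_jump_target(address, patch_bytes):
    """Return the destination of a 5-byte 'jmp rel32' located at address."""
    if len(patch_bytes) != 5 or patch_bytes[0] != 0xE9:
        raise ValueError("not a 5-byte jmp rel32 instruction")
    # The 32-bit displacement is little-endian and signed.
    displacement = int.from_bytes(patch_bytes[1:], "little", signed=True)
    # It is relative to the address of the next instruction (address + 5).
    return (address + 5 + displacement) & 0xFFFFFFFF


def parse_patch(chkimg_bytes):
    """Split '[ original:patched ]' into two byte strings."""
    original, patched = chkimg_bytes.strip("[] ").split(":")
    return bytes.fromhex(original), bytes.fromhex(patched)


original, patched = parse_patch("[ 8b ff 55 8b ec:e9 f7 bf 08 c0 ]")
print(hex(rel32_jump_target(0x77E44004, patched)))  # 0x37ed0000
```

The original bytes 8b ff 55 8b ec are the mov edi,edi / push ebp / mov ebp,esp prologue, which is why the trampoline at 37ed0005 replays exactly these instructions before jumping back to kernel32!GetDateFormatA+0x5.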

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis AntiPatterns (Part 6)

November 22nd, 2007

Need the crash dump. Period. This might be the first thought when an engineer gets a stack trace fragment without symbolic information. It is usually based on the following presupposition:

We need an actual dump file to suggest further troubleshooting steps.

This is not actually true unless it is the first time you have the problem and get stack trace for it. Consider the following fragment from bugcheck kernel dump when no symbols were applied because the customer didn’t have them:

b90529f8 8085eced nt!KeBugCheckEx+0x1b
b9052a70 8088c798 nt!MmAccessFault+0xb25
b9052a70 bfabd940 nt!_KiTrap0E+0xdc
WARNING: Stack unwind information not available. Following frames may be wrong.
b9052b14 bfabe452 MyDriver+0x27940

We can convert module+offset information into module!function+offset2 using MAP files or using DIA SDK (Debug Interface Access SDK) to query PDB files if we know module timestamp. This might be seen as a tedious exercise but we don’t need to do it if we keep raw stack trace signatures in some database when doing crash dump analysis. If we use our own symbol servers we might want to remove references to them and reload symbols. Then redo previous stack trace commands.

In my case it happened that I had already analyzed similar bugcheck crash dumps months earlier and saved the stack trace prior to applying symbols. This helped me to point to a solution without requesting the crash dump corresponding to that stack trace.
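
A database of raw stack trace signatures does not need to be elaborate. The following Python sketch keys analysis notes by the symbol column of a stack trace fragment, so a later fragment without symbols can be matched against earlier cases. The class, the function names and the frame layout (ChildEBP, RetAddr, symbol) are assumptions made for illustration:

```python
import re

# Matches frames such as "b9052b14 bfabe452 MyDriver+0x27940".
FRAME = re.compile(r"^[0-9a-fA-F]{8}\s+[0-9a-fA-F]{8}\s+(\S+)$")


def raw_signature(stack_text):
    """Return the symbol column of every frame as a hashable tuple."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME.match(line.strip())
        if match:  # WARNING lines and blank lines are skipped
            frames.append(match.group(1))
    return tuple(frames)


class SignatureDatabase:
    """Hypothetical in-memory store of notes keyed by raw signature."""

    def __init__(self):
        self._notes = {}

    def remember(self, stack_text, note):
        self._notes[raw_signature(stack_text)] = note

    def lookup(self, stack_text):
        return self._notes.get(raw_signature(stack_text))
```

Saving the signature before symbols are applied is the important design choice here: module+offset frames stay comparable with fragments sent by customers who have no symbols.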

- Dmitry Vostokov @ DumpAnalysis.org -

Critical thinking when troubleshooting

November 22nd, 2007

Faulty thinking happens all the time in technical support environments partly due to hectic and demanding business realities.

Simple*ology book pointed me to this website:

http://www.fallacyfiles.org/ 

which taxonomically organizes fallacies:

http://www.fallacyfiles.org/taxonomy.html

For example, False Cause. Technical examples might include false causes inferred from trace analysis, customer problem description that includes steps to reproduce the problem, etc. This also applies to debugging and importance of thinking skills has been emphasized in the following book:

Debugging by Thinking: A Multidisciplinary Approach

The surface level of basic crash dump analysis is less influenced by false cause fallacies because a dump doesn’t contain an explicitly recorded sequence of events, although some caution should be exercised during detailed analysis of thread waiting times and other historical information.

Warning: when exercising critical thinking recursively we need to stop at the right time to avoid paralysis of analysis :-) 

- Dmitry Vostokov @ DumpAnalysis.org

Crash Dump Analysis Patterns (Part 37)

November 21st, 2007

Some bugs are fixed using a brute-force approach: an exception handler is put in place to catch access violations and other exceptions. A long time ago I saw one such “incredible fix” when an image processing application was crashing after approximately the Nth heap free runtime call. To ignore the crashes a SEH handler was put in place, but the application started to crash in different places. Therefore the additional fix was to skip free calls when approaching N and resume afterwards. The application started to crash less frequently.

Here getting an Early Crash Dump when a first-chance exception happens can help in component identification before corruption starts spreading across data. Recall that when an access violation happens in a process thread in user mode the system generates a first-chance exception, which can be caught by an attached debugger. If there is no such debugger the system tries to find an exception handler, and if that handler catches and dismisses the exception the thread resumes its normal execution path. If no such handler is found the system generates the so-called second-chance exception with the same exception context to notify the attached debugger, and if no debugger is attached a default thread exception handler usually saves a postmortem user dump.
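
The sequence just described can be written down as a toy decision model. The Python sketch below is only an illustration of the order of events, not of any Windows API. The function name, its parameters and the event strings are hypothetical, and it assumes that a debugger which sees the first-chance exception without handling it passes it on to the handler search:

```python
def dispatch_access_violation(debugger_attached, debugger_handles,
                              handler_dismisses):
    """Return the ordered list of events for one access violation."""
    events = ["first-chance exception"]
    if debugger_attached:
        events.append("debugger notified (first chance)")
        if debugger_handles:
            events.append("thread resumes")
            return events
    # The system searches the chain of exception handlers.
    if handler_dismisses:
        events.append("exception handler dismisses the exception")
        events.append("thread resumes")
        return events
    # No handler dealt with the exception.
    events.append("second-chance exception")
    if debugger_attached:
        events.append("debugger notified (second chance)")
    else:
        events.append("default handler saves postmortem dump")
    return events
```

The case described at the beginning of this part corresponds to handler_dismisses being true: the thread resumes, no postmortem dump is saved, and only a dump taken at the first-chance stage records where the access violation happened.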

First-chance exception memory dumps can be saved with tools such as the Debug Diagnostic tool or MS Exception Monitor. For example, a crash rule in the Debug Diagnostic tool for the TestDefaultDebugger process can have its Unconfigured First Chance Exceptions option set to Full Userdump.

When we push the big crash button in TestDefaultDebugger dialog box two crash dumps are saved, with first and second-chance exceptions pointing to the same code:

Loading Dump File [C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of TestDefaultDebugger.exe\TestDefaultDebugger__PID__4316__ Date__11_21_2007__Time_04_28_27PM__2__First chance exception 0XC0000005.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Dump created by DbgHost. First chance exception 0XC0000005'
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Wed Nov 21 16:28:27.000 2007 (GMT+0)
System Uptime: 0 days 23:45:34.711
Process Uptime: 0 days 0:01:09.000

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(10dc.590): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00000001 ecx=0017fe70 edx=00000000 esi=00425ae8 edi=0017fe70
eip=004014f0 esp=0017f898 ebp=0017f8a4 iopl=0 nv up ei ng nz ac pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 c7050000000000000000 mov dword ptr ds:[0],0  ds:002b:00000000=????????

Loading Dump File [C:\Program Files (x86)\DebugDiag\Logs\Crash rule for all instances of TestDefaultDebugger.exe\TestDefaultDebugger__PID__4316__ Date__11_21_2007__Time_04_28_34PM__693__ Second_Chance_Exception_C0000005.dmp]
User Mini Dump File with Full Memory: Only application data is available

Comment: 'Dump created by DbgHost. Second_Chance_Exception_C0000005'
Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Wed Nov 21 16:28:34.000 2007 (GMT+0)
System Uptime: 0 days 23:45:39.313
Process Uptime: 0 days 0:01:16.000

This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(10dc.590): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00000001 ecx=0017fe70 edx=00000000 esi=00425ae8 edi=0017fe70
eip=004014f0 esp=0017f898 ebp=0017f8a4 iopl=0 nv up ei ng nz ac pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010297
TestDefaultDebugger!CTestDefaultDebuggerDlg::OnBnClickedButton1:
004014f0 c7050000000000000000 mov dword ptr ds:[0],0  ds:002b:00000000=????????

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis on Solaris x86 - AMD64

November 20th, 2007

Found the following book which is an interesting read to see crash dump analysis from a different operating system architecture perspective but on the same Intel / AMD platform:

http://www.genunix.org/gen/crashdump/book.pdf

- Dmitry Vostokov @ DumpAnalysis.org

Crash Dump Analysis Patterns (Part 31a)

November 20th, 2007

I have already discussed Passive Thread pattern in user space. In this part I continue with kernel space and passive system threads that don’t run in any user process context. These threads belong to the so called System process, don’t have any user space stack and their full stack traces can be seen from the output of !process command (if not completely paged out):

1: kd> !process 0 ff System

or from system portion of !stacks 2 command.  

Some system threads from that list belong to core OS functionality and are not passive (function offsets can vary for different OS versions and service packs):

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!MmZeroPageThread+0x180
nt!Phase1Initialization+0xe
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!MiModifiedPageWriter+0x59
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!MiMappedPageWriter+0xad
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!KeBalanceSetManager+0x101
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!KeSwapProcessOrStack+0x44
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!EtwpLogger+0xdd
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!KiExecuteDpc+0x198
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!CcQueueLazyWriteScanThread+0x73
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!ExpWorkerThreadBalanceManager+0x85
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

Other threads belong to various worker queues (they can also be seen from !exqueue ff command output) and wait for data items to arrive (passive threads):

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!ExpWorkerThread+0x104
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

or

nt!KiSwapContext+0x26
nt!KiSwapThread+0x2e5
nt!KeRemoveQueue+0x417
nt!ExpWorkerThread+0xc8
nt!PspSystemThreadStartup+0x2e
nt!KiThreadStartup+0x16

Non-Exp system threads having Worker, Logging or Logger substrings in their function names are passive threads and wait for data too, for example:

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForMultipleObjects+0x703
nt!PfTLoggingWorker+0x81
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
nt!EtwpLogger+0xdd
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
rdpdr!RxpWorkerThreadDispatcher+0x6f
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeWaitForSingleObject+0x5f5
HTTP!UlpThreadPoolWorker+0x26c
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
srv2!SrvProcWorkerThread+0x74
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

nt!KiSwapContext+0x84
nt!KiSwapThread+0x125
nt!KeRemoveQueueEx+0x848
nt!KeRemoveQueue+0x21
srv!WorkerThread+0x90
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16

Any deviation in a memory dump can raise suspicion, as in the stack below for driver.sys:

nt!KiSwapContext+0x26
nt!KiSwapThread+0x284
nt!KeWaitForSingleObject+0x346
nt!ExpWaitForResource+0xd5
nt!ExAcquireResourceExclusiveLite+0x8d
nt!ExEnterCriticalRegionAndAcquireResourceExclusive+0x19

driver!ProcessItem+0x2f
driver!DelayedWorker+0x27

nt!ExpWorkerThread+0x104
nt!PspSystemThreadStartup+0x5b
nt!KiStartSystemThread+0x16
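
The rules of thumb from this part can be turned into a rough filter for long thread lists. The Python sketch below looks at the function that called the wait and applies the naming heuristics described above: ExpWorkerThread waiting directly on its queue, or a non-Exp function with a Worker, Logging or Logger substring. The function names and the sets of scheduler and wait routines are assumptions made for illustration, and a thread that does not look passive is not necessarily a problem:

```python
SCHEDULER = {"nt!KiSwapContext", "nt!KiSwapThread"}
WAITS = {
    "nt!KeWaitForSingleObject",
    "nt!KeWaitForMultipleObjects",
    "nt!KeRemoveQueue",
    "nt!KeRemoveQueueEx",
}
PASSIVE_HINTS = ("Worker", "Logging", "Logger")


def waiting_function(frames):
    """Return module!function of the frame that called the wait, or None.

    frames lists the stack top first, e.g. "nt!KiSwapContext+0x84".
    """
    names = [f.strip().split("+", 1)[0] for f in frames if f.strip()]
    index = 0
    while index < len(names) and names[index] in SCHEDULER | WAITS:
        index += 1
    if index == 0 or index == len(names):
        return None  # thread is not in a wait, or nothing called the wait
    return names[index]


def looks_passive(frames):
    """Apply the naming heuristics to one stack trace."""
    caller = waiting_function(frames)
    if caller is None:
        return False
    function = caller.split("!", 1)[-1]
    if function.startswith("Exp"):
        # Only a worker thread waiting directly on its queue counts.
        return function == "ExpWorkerThread"
    return any(hint in function for hint in PASSIVE_HINTS)
```

For the driver.sys stack above the caller of the wait is nt!ExpWaitForResource, so the thread is not reported as passive and is left for manual review.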

- Dmitry Vostokov @ DumpAnalysis.org

Modeling side of DLL Injection

November 19th, 2007

Component injection can be used to model various process and system software behavior by writing customized DLL/SYS and injecting them into process/kernel space. Although often depicted either as security threat or value-added hooking mechanism very little has been written about its use to model various software defects. Here I don’t mean testing but studying faulty behavior and artifacts after injecting specific DLLs with design and implementation defects. For example, forgetting to release database connections or not closing file handles. NotMyLeak is an attempt to do it for different kind of leaks on x86 and x64 Windows platforms. It uses automatic DLL injection via standard Windows hooking mechanism. Stay tuned.

- Dmitry Vostokov @ DumpAnalysis.org -  

NotMyLeak

November 19th, 2007

To troubleshoot and study memory leaks the following tool called NotMyLeak will be released soon. It injects different kinds of leaks into specified processes and system:

  • Process heap
  • Runtime library
  • Performance counters
  • Kernel paged pool
  • Kernel nonpaged pool
  • IRP
  • Handles
  • PTE
  • etc…

The idea is to model various real-time leaks, analyze memory dumps and then apply discovered patterns to crash dump analysis of memory dumps coming from real-world systems.   

The draft GUI (subject to change):

Note: the tool name prefix NotMy… was inspired by the name of Mark Russinovich’s tool called NotMyFault.

- Dmitry Vostokov @ DumpAnalysis.org

Windows Internals book

November 19th, 2007

Scheduled to be updated with Windows Vista and Windows Server 2008 details:

Windows® Internals, Fifth Edition

- Dmitry Vostokov @ DumpAnalysis.org

Filtering processes

November 19th, 2007

When I analyze memory dumps coming from Microsoft or Citrix terminal service environments I frequently need to find a process hosting terminal service. In Windows 2000 it was the separate process termsrv.exe and now it is termsrv.dll which can be loaded into any of several instances of svchost.exe. The simplest way to narrow down that svchost.exe process if we have a complete memory dump is to use the module option of WinDbg !process command:

!process /m termsrv.dll 0

!process /m wsxica.dll 0

!process /m ctxrdpwsx.dll 0

Note: this option works only with W2K3, XP and later OS

Also to list all processes with user space stacks having the same image name we can use:

!process 0 ff msiexec.exe

or  

!process 0 ff svchost.exe

Note: this command works with W2K too as well as session option (/s)

- Dmitry Vostokov @ DumpAnalysis.org

Exceptions Ab Initio

November 16th, 2007

Where do native exceptions come from? How do they propagate from hardware and eventually result in crash dumps? I was asking these questions when I started doing crash dump analysis more than four years ago and I tried to find answers using IA-32 Intel® Architecture Software Developer’s Manual, WinDbg and complete memory dumps.

Eventually I wrote some blog posts about my findings. They are buried between many other posts so I dug them out and put on a dedicated page:

Interrupts and Exceptions Explained

- Dmitry Vostokov @ DumpAnalysis.org

Memorillion and Quadrimemorillion

November 15th, 2007

What are these? These are the names of the numbers of possible unique complete memory dumps when the address space is 32-bit and 64-bit respectively:

256^(2^32) and 256^(2^64)

The first of them can be approximated by 10^(10^10)

This idea came to me after I learnt about the so-called “immense number” proposed by Walter Elsasser. This number is so big that its digits cannot be listed because there are not enough particles in the observable Universe to write them.

Certainly one memorillion is more than one googol 10^100 but it requires only approx. 10^10 particles in the ideal case to list its digits and is therefore not an immense number. It is however far less than one googolplex 10^(10^100).

Consider a complete memory dump with bytes written in hexadecimal notation:

0x50414745554d500f000000ce0e00000090...

This number has more than 8 billion digits… And it is one possible number out of memorillion of them. So one memorillion in hexadecimal notation is just

0xFFFFFFFFFFFFFFFFFFFFF... + 1

where we have 2*2^32 ‘F’ symbols written sequentially. One quadrimemorillion has 2*2^64 ‘F’ symbols.
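
These estimates can be checked with logarithms, without ever building the numbers themselves. The Python sketch below is a small verification aid with hypothetical function names. It relies only on the fact that a dump of an N-bit address space has 2^N bytes, each taking one of 256 values:

```python
import math


def log10_dump_count(address_bits):
    """log10 of 256 ** (2 ** address_bits), i.e. roughly its digit count."""
    return (2 ** address_bits) * math.log10(256)


def hex_digit_count(address_bits):
    """Number of hexadecimal digits in a dump: two per byte."""
    return 2 * 2 ** address_bits


def all_f_plus_one(byte_count):
    """0xFF...F + 1 with two 'F' symbols per byte."""
    return int("F" * (2 * byte_count), 16) + 1


digits = log10_dump_count(32)
print(f"{digits:.3e}")                 # about 1.03e10 decimal digits
print(f"{math.log10(digits):.2f}")     # about 10, hence 10^(10^10)
print(hex_digit_count(32))             # 8589934592, more than 8 billion
print(all_f_plus_one(4) == 256 ** 4)   # the same identity on a 4-byte scale
```

The number of decimal digits, about 1.03 * 10^10, is far above the 100 digits of one googol and far below the 10^100 digits of one googolplex, in agreement with the comparison above.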

Also the question about the number of possible crash dumps can be considered as Microsoft interview style question when you have possible candidates and you want to assess their ability to think out of the box and handle large numbers. 

- Dmitry Vostokov @ DumpAnalysis.org -

Making Software Troubleshooting Simple

November 15th, 2007

Excellent read to refine general problem solving skills towards simplicity, understand broad applicability of modeling and just for fun:

Simpleology: The Simple Science of Getting What You Want

Buy from Amazon

Now I’m going to have a simple lunch and read this simple book. What about the rating? Of course, it is simple too! Maximum! 1 star in my simple zero-one binary rating system - worth (1) or not worth (0) to read.

- Dmitry Vostokov @ DumpAnalysis.org -

News for C++ and MFC fans

November 15th, 2007

I write most of my tools using C++, MFC and STL and I was really delighted to hear about new MFC framework improvements in the forthcoming Visual Studio 2008. You can read the following press release from a Russian ISV:

http://www.bcgsoft.com/pressreleases/PR071110.pdf

This is also discussed on MS Visual C++ team blog:

http://blogs.msdn.com/vcblog/archive/2007/11/09/quick-tour-of-new-mfc-functionality.aspx

I was also thinking about extending my MFC projects with .NET class library and found this interesting practical book:

Extending MFC Applications with the .NET Framework

Buy from Amazon

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 36)

November 14th, 2007

The pattern I should have written as one of the first is called Local Buffer Overflow. It is observed on x86 platforms when a local variable and a function return address and/or saved frame pointer EBP are overwritten with some data. As a result, the instruction pointer EIP becomes a Wild Pointer and we have a process crash in user mode or a bugcheck in kernel mode. Sometimes this pattern is diagnosed by looking at mismatched EBP and ESP values. In the case of an ASCII or UNICODE buffer overflow the EIP register may contain a 4-char or 2-wchar_t value, and ESP or EBP or both registers might point at some string fragment, as in the example below:

0:000> r
eax=000fa101 ebx=0000c026 ecx=01010001 edx=bd43a010 esi=000003e0 edi=00000000
eip=0048004a esp=0012f158 ebp=00510044 iopl=0  nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000202
0048004a 0000 add     byte ptr [eax],al  ds:0023:000fa101=??

0:000> kL
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f154 00420047 0x48004a
0012f158 00440077 0x420047
0012f15c 00420043 0x440077
0012f160 00510076 0x420043
0012f164 00420049 0x510076
0012f168 00540041 0x420049
0012f16c 00540041 0x540041
...
...
...
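
To see the string fragment hidden in such values we can reinterpret each 32-bit value as characters. The Python sketch below, with hypothetical function names, treats a value as two little-endian UTF-16 code units or as four ASCII bytes and returns the text only when every character is printable ASCII:

```python
def as_utf16_pair(value):
    """Decode a 32-bit value as 2 wchar_t characters, or return None."""
    raw = value.to_bytes(4, "little")
    text = raw.decode("utf-16-le", errors="replace")
    return text if all(32 <= ord(ch) < 127 for ch in text) else None


def as_ascii_quad(value):
    """Decode a 32-bit value as 4 ASCII characters, or return None."""
    raw = value.to_bytes(4, "little")
    return raw.decode("ascii") if all(32 <= b < 127 for b in raw) else None


# EIP and the "return addresses" from the stack trace above.
values = [0x0048004A, 0x00420047, 0x00440077, 0x00420043,
          0x00510076, 0x00420049, 0x00540041, 0x00540041]
print("".join(as_utf16_pair(v) for v in values))  # JHGBwDCBvQIBATAT
```

A genuine code address such as 0x77e6f23b does not decode to printable characters under either interpretation, so a run of values that all decode cleanly is a strong hint that a UNICODE or ASCII string has overwritten the stack.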

Good buffer overflow case studies with complete analysis including assembly language tutorial can be found in Buffer Overflow Attacks book.

Buy from Amazon 

- Dmitry Vostokov @ DumpAnalysis.org -

Recent service downtime

November 13th, 2007

Due to increased popularity this site became slow and even had severe service disruptions during the last few days. I have moved it to a dedicated virtual server and now it should be much faster with at least 99.9% service uptime.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 35)

November 12th, 2007

In kernel or complete memory dumps coming from hanging or slow workstations and servers !irpfind WinDbg command may show IRP Distribution Anomaly pattern when certain drivers have excessive count of active IRPs not observed under normal circumstances. I created two IRP distribution graphs from two problem kernel dumps by preprocessing command output using Visual Studio keyboard macros to eliminate completed IRPs and then using Excel. In one case it was a big number of I/O request packets from 3rd-party antivirus filter driver:

\Driver\3rdPartyAvFilter

In the second case it was the huge number of active IRPs targeted to kernel socket ancillary function driver:

\Driver\AFD
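
The per-driver tally behind such graphs can also be sketched in Python instead of keyboard macros and Excel. This is a minimal sketch, assuming that each line for an active IRP carries the target driver name in square brackets, such as [ \Driver\AFD]. That line layout and the sample lines are assumptions made for illustration, not real !irpfind output, and completed IRPs are assumed to have been filtered out already:

```python
import re
from collections import Counter

# Assumed layout: the driver name is the bracketed field that starts
# with a backslash; other bracketed fields are ignored.
DRIVER = re.compile(r"\[\s*(\\[^\]\s]+)\s*\]")


def irp_distribution(lines):
    """Count active IRPs per target driver."""
    counts = Counter()
    for line in lines:
        match = DRIVER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Placeholder lines for illustration only.
SAMPLE_LINES = [
    r"irp-1 [ \Driver\AFD]",
    r"irp-2 [ \FileSystem\Ntfs]",
    r"irp-3 [ \Driver\AFD]",
]

for driver, count in irp_distribution(SAMPLE_LINES).most_common():
    print(f"{count:6d}  {driver}")
```

Sorting by count puts the drivers with the most active IRPs at the top, which is where an anomalous peak would show up first.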

Two other peaks on both graphs are related to NPFS and NTFS (named pipes and file system) and are usually normal. Here is the IRP distribution graph from my Vista workstation captured while I was writing this post:

- Dmitry Vostokov @ DumpAnalysis.org -