Archive for October 30th, 2006

Crash Dump Analysis Patterns (Part 1)

Monday, October 30th, 2006

After doing crash dump analysis exclusively for more than 3 years I decided to organize my knowledge into a set of patterns (so to speak in a dump analysis pattern language and therefore try to facilitate its common vocabulary).

What is a pattern? It is a general solution you can apply in a specific context to a common recurrent problem.

There are many pattern and pattern languages in software engineering, for example, look at the following almanac that lists +700 patterns:

The Pattern Almanac 2000

and the following link is very useful:

Patterns Library

The first pattern I’m going to introduce today is Multiple Exceptions. This pattern captures the known fact that there could be as many exceptions (”crashes”) as many threads in a process. The following UML diagram depicts the relationship between Process, Thread and Exception entities:

Every process in Windows has at least one execution thread so there could be at least one exception per thread (like invalid memory reference) if things go wrong. There could be second exception in that thread if exception handling code experiences another exception or the first exception was handled and you have another one and so on.

So what is the general solution to that common problem when an application or service crashes and you have a crash dump file (common recurrent problem) from a customer (specific context)? The general solution is to look at all threads and their stacks and do not rely on what tools say.

Here is a concrete example from one of the dumps I got today:

Internet Explorer crashed and I opened it in WinDbg and ran ‘!analyze -v’ command. This is what I got in my WinDbg output:

ExceptionAddress: 7c822583 (ntdll!DbgBreakPoint)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 8fb834b8
   Parameter[2]: 00000003

Break instruction, you might think, shows that the dump was taken manually from the running application and there was no crash - the customer sent the wrong dump or misunderstood instructions. However I looked at all threads and noticed the following two stacks (threads 15 and 16):

0:016>~*kL
...
15  Id: 1734.8f4 Suspend: 1 Teb: 7ffab000 Unfrozen
ntdll!KiFastSystemCallRet
ntdll!NtRaiseHardError+0xc
kernel32!UnhandledExceptionFilter+0x54b
kernel32!BaseThreadStart+0x4a
kernel32!_except_handler3+0x61
ntdll!ExecuteHandler2+0x26
ntdll!ExecuteHandler+0x24
ntdll!KiUserExceptionDispatcher+0xe
componentA!xxx
componentB!xxx
mshtml!xxx
kernel32!BaseThreadStart+0x34

# 16  Id: 1734.11a4 Suspend: 1 Teb: 7ffaa000 Unfrozen
ntdll!DbgBreakPoint
ntdll!DbgUiRemoteBreakin+0x36

So we see here that the real crash happened in componentA.dll and componentB.dll or mshtml.dll might have influenced that. Why this happened? The customer might have dumped Internet Explorer manually while it was displaying an exception message box. The following reference says that ZwRaiseHardError displays a message box containing an error message:

Windows NT/2000 Native API Reference

Buy from Amazon

Or perhaps something else happened. Many cases where we see multiple thread exceptions in one process dump happened because crashed threads displayed message boxes like Visual C++ debug message box and preventing that process from termination. In our dump under discussion WinDbg automatic analysis command recognized only the last breakpoint exception (shown as # 16). In conclusion we shouldn’t rely on ”automatic analysis” often anyway and probably should write our own extension to list possible multiple exceptions (based on some heuristics I will talk about later).

- Dmitry Vostokov @ DumpAnalysis.org -

Applying API Wrapper Pattern

Monday, October 30th, 2006

Recently I had been porting my old Win32 legacy project (more than 100,000 lines) to Windows Mobile. Here I summarize the approach I used and I can say now that it was very successful (no single crash since I finished my porting - the original Win32 program was very stable indeed but we all know that hidden bugs surface or introduced when project is ported to another platform). The project was written in Windows 3.x 16-bit era and then it was already ported to Win32 in Windows 95 era. Win32 interface is huge and it contains many legacy Win16 functions (mainly for easy portability and compatibility with existing Win16 code base). The following UML component diagram depicts application dependencies on many Win32 API and runtime libraries from build perspective:

Windows Mobile (in essence Windows CE) has smaller interface and many functions available in Win32 API (especially legacy Win16) and many C runtime functions are absent. My project uses many such functions (due to its history) so the interface becomes broken. The following UML component diagram depicts this dependency:

First I ported my project to use UNICODE strings and UNICODE function equivalents throughout. This was a huge task already. Then instead of further rewriting my code which uses many absent functions and therefore quite possibly to introduce new bugs due to changed semantics I decided to apply a variant of Adapter or Wrapper pattern (for non-object-oriented API). My application now still uses old functions but links to a set of libraries translating these calls into existing Windows CE API. The following UML component depicts the final component infrastructure from build perspective:

Here is the list of functions I had to translate:

- RegEmu.lib: translates INI file calls into registry

  • GetProfileString
  • GetProfileInt
  • GetPrivateProfileString
  • GetPrivateProfileInt
  • WriteProfileString
  • WritePrivateProfileString

- FileEmu.lib: various file system related calls

  • _lcreatW
  • _lopenW
  • OpenFileW
  • WinExecW
  • _wsplitpath
  • _wsplitpathparam
  • _waccess
  • _wmkdir
  • _wrename
  • _wfindfirst
  • _wfindnext
  • _findclose
  • _wunlink
  • _wrmdir
  • _llseek
  • _lclose
  • _hread
  • _hwrite
  • _lread
  • _lwrite
  • _wstat
  • _wgetcwd
  • _wmakepath
  • _chdrive
  • _wchdir
  • ShellExecute
  • GetWindowsDirectory
  • GetSystemDirectory

- GdiEmu.lib: various graphics functions

  • SetMapMode
  • CreateFont
  • CreateDIBitmap

- UserEmu.lib: various UI related functions

  • GetKeyNameText
  • GetScrollPos
  • GetScrollRange
  • GetLastActivePopup
  • IsIconic
  • IsZoomed
  • IsBadStringPtr
  • IsMenu
  • VkKeyScan
  • GetKeyboardState
  • SetKeyboardState
  • ToAscii
  • MulDiv

PS. If you are interested in applying patterns to your C++ projects or simply in using them to describe your design and architecture these books are good place to start:

Design Patterns: Elements of Reusable Object-Oriented Software

Buy from Amazon

Pattern-Oriented Software Architecture, Volume 1: A System of Patterns

Buy from Amazon

Pattern-Oriented Software Architecture, Volume 2, Patterns for Concurrent and Networked Objects

Buy from Amazon

- Dmitry Vostokov -