Archive for November, 2008

Crash Dump Analysis Patterns (Part 42d)

Friday, November 7th, 2008

Another example of Wait Chain pattern for objects with ownership semantics is seen in kernel and complete memory dumps where threads are waiting for thread objects. A thread object is a synchronization object whose owner is a thread so they can be easily identified. For example, the main application thread is waiting for an object:

1: kd> !thread 8818e660 16
THREAD 8818e660  Cid 1890.1c50  Teb: 7ffdf000 Win32Thread: b8411008 WAIT: (Unknown) UserMode Non-Alertable
    87d569d8  Thread
    8818e6d8  NotificationTimer
Not impersonating
DeviceMap                 e10008d8
Owning Process            87db5d88       Image:         App.exe
Wait Start TickCount      299006         Ticks: 255 (0:00:00:03.984)
Context Switch Count      1208                 LargeStack
UserTime                  00:00:00.203
KernelTime                00:00:00.203
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b42c3000 Current b42c2c60 Base b42c3000 Limit b42be000 Call 0
Priority 15 BasePriority 15 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b42c2c78 80833e8d 8818e660 8818e708 00000003 nt!KiSwapContext+0×26
b42c2ca4 80829b74 00000000 b42c2d14 00000000 nt!KiSwapThread+0×2e5
b42c2cec 8093b034 87d569d8 00000006 00804c01 nt!KeWaitForSingleObject+0×346
b42c2d50 8088ac4c 000001ec 00000000 b42c2d14 nt!NtWaitForSingleObject+0×9a
b42c2d50 7c8285ec 000001ec 00000000 b42c2d14 nt!KiFastCallEntry+0xfc
0006fde4 7c827d0b 77e61d1e 000001ec 00000000 ntdll!KiFastSystemCallRet
0006fde8 77e61d1e 000001ec 00000000 0006fe2c ntdll!NtWaitForSingleObject+0xc
0006fe58 77e61c8d 000001ec 00001388 00000000 kernel32!WaitForSingleObjectEx+0xac
0006fe6c 01039308 000001ec 00001388 00000000 kernel32!WaitForSingleObject+0×12
0006fe94 010204ac 0007cc00 00000001 00000002 App!WaitForNotifyList+0xf2
[…]

However that object is a thread too:

THREAD 87d569d8  Cid 1890.1ec0  Teb: 7ffd9000 Win32Thread: b869ba48 WAIT: (Unknown) UserMode Non-Alertable
    8a1f8870  Thread

Therefore, we see that thread 8818e660 is waiting for another thread 87d569d8 which belongs to the same process with PID 1890 and thread 87d569d8 itself is waiting for thread 8a1f8870 which has the following stack trace:

1: kd> !thread 8a1f8870 16
THREAD 8a1f8870  Cid 1890.07d8  Teb: 7ff95000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    8a0ce4c0  NotificationEvent
    886f1870  NotificationEvent

Not impersonating
DeviceMap                 e10008d8
Owning Process            87db5d88       Image:         App.exe
Wait Start TickCount      292599         Ticks: 6662 (0:00:01:44.093)
Context Switch Count      17            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address Dll!StartMonitoring (0×758217b8)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b6d4f000 Current b6d4e900 Base b6d4f000 Limit b6d4c000 Call 0
Priority 14 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b6d4e918 80833e8d 8a1f8870 00000002 00140000 nt!KiSwapContext+0×26
b6d4e944 808295ab 8a1f8870 00000002 00000000 nt!KiSwapThread+0×2e5
b6d4e978 8093b290 00000002 b6d4eaac 00000001 nt!KeWaitForMultipleObjects+0×3d7
b6d4ebf4 8093b3f2 00000002 b6d4ec1c 00000001 nt!ObpWaitForMultipleObjects+0×202
b6d4ed48 8088ac4c 00000002 026bfc08 00000001 nt!NtWaitForMultipleObjects+0xc8
b6d4ed48 7c8285ec 00000002 026bfc08 00000001 nt!KiFastCallEntry+0xfc
026bfbb8 7c827cfb 77e6202c 00000002 026bfc08 ntdll!KiFastSystemCallRet
026bfbbc 77e6202c 00000002 026bfc08 00000001 ntdll!NtWaitForMultipleObjects+0xc
026bfc64 77e62fbe 00000002 026bfca4 00000000 kernel32!WaitForMultipleObjectsEx+0×11a
026bfc80 6554a01f 00000002 026bfca4 00000000 kernel32!WaitForMultipleObjects+0×18
026bfcfc 758237a3 cd050002 ffffffff 026bfd4c Dll!GetStatusChange+0×7bf
026bffb8 77e64829 75833120 00000000 00000000 Dll!StartMonitoring+0×14b
026bffec 00000000 758217b8 75833120 00000000 kernel32!BaseThreadStart+0×34

Thread 8a1f8870 is waiting for two notification events disjointly which is confirmed by dumping WaitForMultipleObjects arguments. Neither of them is in signaled state and one is a named event “MyEventObject”:

1: kd> dd 026bfc08 l2
026bfc08  0000008c 00000084

1: kd> !handle 0000008c
processor number 1, process 87db5d88
PROCESS 87db5d88  SessionId: 4  Cid: 1890    Peb: 7ffdc000  ParentCid: 01d0
    DirBase: cfe438e0  ObjectTable: e178c228  HandleCount: 439.
    Image: App.exe

Handle table at e50d2000 with 439 Entries in use
008c: Object: 8a0ce4c0  GrantedAccess: 001f0003 Entry: e50d2118
Object: 8a0ce4c0  Type: (8b26ec00) Event
    ObjectHeader: 8a0ce4a8 (old version)
        HandleCount: 1  PointerCount: 3

1: kd> !handle 00000084
processor number 1, process 87db5d88
PROCESS 87db5d88  SessionId: 4  Cid: 1890    Peb: 7ffdc000  ParentCid: 01d0
    DirBase: cfe438e0  ObjectTable: e178c228  HandleCount: 439.
    Image: App.exe

Handle table at e50d2000 with 439 Entries in use
0084: Object: 886f1870  GrantedAccess: 001f0003 (Inherit) Entry: e50d2108
Object: 886f1870  Type: (8b26ec00) Event
    ObjectHeader: 886f1858 (old version)
        HandleCount: 1  PointerCount: 4
        Directory Object: e43ee320  Name: MyEventObject

1: kd> dt _DISPATCHER_HEADER 8a0ce4c0
cutildll!_DISPATCHER_HEADER
   +0×000 Type             : 0 ”
   +0×001 Absolute         : 0 ”
   +0×002 Size             : 0×4 ”
   +0×003 Inserted         : 0 ”
   +0×003 DebugActive      : 0 ”
   +0×000 Lock             : 262144
   +0×004 SignalState      : 0
   +0×008 WaitListHead     : _LIST_ENTRY [ 0×88519d18 - 0×8a1f8918 ]

1: kd> dt _DISPATCHER_HEADER 886f1870
cutildll!_DISPATCHER_HEADER
   +0×000 Type             : 0 ”
   +0×001 Absolute         : 0 ”
   +0×002 Size             : 0×4 ”
   +0×003 Inserted         : 0 ”
   +0×003 DebugActive      : 0 ”
   +0×000 Lock             : 262144
   +0×004 SignalState      : 0
   +0×008 WaitListHead     : _LIST_ENTRY [ 0×88519d30 - 0×8a1f8930 ]

Here is a diagram showing this wait chain:

- Dmitry Vostokov @ DumpAnalysis.org -

Hide, seek and dump in a Citrix farm

Friday, November 7th, 2008

CtxHideEx32 tool has been updated to the version 1.1 and can be downloaded from Citrix support. It now allows a substring search for a window title or class, for example:

CtxHideEx32.exe HIDE "*error" "" OK

As by-product coupled with an optional command line I discovered that it allows to automatically dump any process displaying a message box with an error message in its window title. Here is an example using TestDefaultDebugger64 to simulate an application fault message where the following instance of CtxHideEx32 was setup to dump a process showing WER dialog on Vista:

CtxHideEx32.exe NONE "*Microsoft Windows" "" "C:\kktools\userdump8.1\x64\userdump.exe %d"

We click on a big lightning button:

and then WER dialog appears:

Immediately CtxHideEx32 kicks in and starts dumping the owner process incessantly so you better to dismiss this dialog by choosing something:

We see it was WerFault.exe. 

Note: I think I have to amend CtxHideEx32 to make it wait until the spawned command line interpreter finishes its job. Stay tuned.

- Dmitry Vostokov @ DumpAnalysis.org -

Googol Dump: A Computational SF Novel

Friday, November 7th, 2008

Science fiction books are among my favourite. I used to read lots of them (in Russian) during my school and university years. Also I started reading science fiction in English 8 years ago upon my arrival to Ireland and one of Asimov’s books about Foundation was my first English fiction book fully read from cover to cover. Now I want to write something fictional related to memory dump analysis and the notion of 3-dimensional memory dumps is a fascinating idea to exploit:

Googol Dump - a 3D memory dump where the 3rd dimension is a time arrow of computational 2D memory snapshots.

Note: one googol is 10100 and this number of bits is sufficient enough to record the full history of memory snapshots for 64-bit, 128-bit, 256-bit and even 1024-bit computers running thousands of years.

The novel is planned to be published next year (ISBN: 978-1906717322) and is written from the perspective of a debugger.

- Dmitry Vostokov @ DumpAnalysis.org -

To bugcheck or not to bugcheck

Thursday, November 6th, 2008

This “Hamlet’s Question” of software technical support is often asked and unfortunately sometimes not even asked at all when troubleshooting and debugging complex enterprise environments. For applications the question of saving crash dumps is trivial. If a process is not in memory and not visible in Task Manager we won’t be able to dump it manually. With OS always running even when hanging the question often degenerates to “Let’s bugcheck and send the crash dump to dump file divers”. After that decision huge amounts of energy are spent in collecting, sending and storing gigabytes of data with always very little or no return. Therefore here is the preliminary list of symptoms where manual system dumps are appropriate and when they are not:

When a manual system dump is appropriate

  • - The system hangs visually (no GUI activity possible)

  • - No connections or logins are possible

  • - Abnormal system metrics (like pool, thread or process number growth)

  • - Insufficient system or session memory

When a manual process user dump is more appropriate than a complete memory dump

  • - Process hangs visually (other applications work as normal)

  • - Error message box appears

  • - Abnormal process metrics (like process memory growth or handle leaks)

When manual kernel and complete memory dumps are almost useless (I say almost because in rare circumstances they can aid in problem resolution so it is better not to collect them until explicitly asked from skilled memory dump file diver)

  • - Application failures resulted in their disappearance from the list of running processes

  • - Functional bugs (dynamic activity that requires historical tracing of events)

Note: 3rd-party kernel mode software developers should not face this question during the development of their drivers and delegate the responsibility for difficult bugcheck or panic decisions to an operating system. Surely Windows core developers face this question too.

Next we discuss another related question about choosing between kernel and complete memory dump options in Control Panel. 

- Dmitry Vostokov @ DumpAnalysis.org -

Lateral damage, stack overflow and execution residue: pattern cooperation

Wednesday, November 5th, 2008

As I mentioned in comments to Lateral Damage pattern it lies in between the normal healthy dump files and corrupt dumps. For example, the following 8Gb complete memory dump that fits perfectly into 16Gb page file had the problem of missing processor control region making it impossible to get meaningful information from certain WinDbg commands:

0: kd> !analyze -v

[...]

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: f7727fe0
Arg3: 00000000
Arg4: 00000000

Debugging Details:
------------------

Unable to read selector for PCR for processor 1
Unable to read selector for PCR for processor 3
Unable to read selector for PCR for processor 1
Unable to read selector for PCR for processor 3

[...]

STACK_TEXT: 
WARNING: Stack unwind information not available. Following frames may be wrong.
8089a600 8088ddf2 00000000 0000000e 00000000 processr+0x2886
8089a604 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xa

[...]

0: kd> ~1
Unable to read selector for PCR for processor 1
WARNING: Unable to reset page directories

1: kd> !pcr
Unable to read selector for PCR for processor 1
Cannot get PRCB address 

1: kd> kv
ChildEBP RetAddr  Args to Child             
WARNING: Frame IP not in any known module. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 0×0

The bugcheck argument 1 shows that we have a double fault that most often results from kernel stack overflow. If we go back to processor 0 to inspect its TSS we don’t get meaningful results too (we expect the value of Backlink to be 0×28):

0: kd> !pcr
KPCR for Processor 0 at ffdff000:
    Major 1 Minor 1
 NtTib.ExceptionList: ffffffff
     NtTib.StackBase: 00000000
    NtTib.StackLimit: 00000000
  NtTib.SubSystemTib: 80042000
       NtTib.Version: 2a1b0b08
   NtTib.UserPointer: 00000001
       NtTib.SelfTib: 00000000

             SelfPcr: ffdff000
                Prcb: ffdff120
                Irql: 0000001f
                 IRR: 00000000
                 IDR: ffffffff
       InterruptMode: 00000000
                 IDT: 8003f400
                 GDT: 8003f000
                 TSS: 80042000

       CurrentThread: 8089d8c0
          NextThread: 00000000
          IdleThread: 8089d8c0

           DpcQueue:

0: kd> dt _KTSS 80042000
nt!_KTSS
   +0×000 Backlink         : 0xc45
   +0×002 Reserved0        : 0×4d8a
   +0×004 Esp0             : 0×8089a6a0
   +0×008 Ss0              : 0×10
   +0×00a Reserved1        : 0xb70f
   +0×00c NotUsed1         : [4] 0×5031ff00
   +0×01c CR3              : 0×8b55ff8b
   +0×020 Eip              : 0xc75ffec
   +0×024 EFlags           : 0xe80875ff
   +0×028 Eax              : 0xfffffbdd
   +0×02c Ecx              : 0×1b75c084
   +0×030 Edx              : 0×8b184d8b
   +0×034 Ebx              : 0×7d8b57d1
   +0×038 Esp              : 0×2e9c110
   +0×03c Ebp              : 0xf3ffc883
   +0×040 Esi              : 0×83ca8bab
   +0×044 Edi              : 0xaaf303e1
   +0×048 Es               : 0xeb5f
   +0×04a Reserved2        : 0×6819
   +0×04c Cs               : 0×24fc
   +0×04e Reserved3        : 0×44
   +0×050 Ss               : 0×75ff
   +0×052 Reserved4        : 0xff18
   +0×054 Ds               : 0×1475
   +0×056 Reserved5        : 0×75ff
   +0×058 Fs               : 0xff10
   +0×05a Reserved6        : 0xc75
   +0×05c Gs               : 0×75ff
   +0×05e Reserved7        : 0xe808
   +0×060 LDT              : 0
   +0×062 Reserved8        : 0xffff
   +0×064 Flags            : 0
   +0×066 IoMapBase        : 0×20ac
   +0×068 IoMaps           : [1] _KiIoAccessMap
   +0×208c IntDirectionMap  : [32]  “???”

However if we try to list all thread stacks we see one thread running on processor 1:

0: kd> !process 0 ff

[...]

THREAD 8a241db0  Cid 1218.4420  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 1
IRP List:
   8b200008: (0006,0244) Flags: 00000884  Mdl: 00000000
   89beedb8: (0006,0244) Flags: 00000884  Mdl: 00000000
Not impersonating
DeviceMap                 e1002060
Owning Process            8bc63d88       Image:         svchost.exe
Wait Start TickCount      10242012       Ticks: 0
Context Switch Count      1832            
UserTime                  00:00:00.000
KernelTime                00:00:00.046
Start Address termdd (0xf75cc218)
Stack Init 9c849000 Current 9c846938 Base 9c849000 Limit 9c846000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
Unable to read selector for PCR for processor 1

[...]

Now we can look at its raw stack to see execution residue and try to reconstruct partial stack traces:

0: kd> dds 9c846000  9c849000
9c846000  94040001
9c846004  00000014
9c846008  8d147848
9c84600c  8d0bfd08
9c846010  8d0bfd00
9c846014  00000001
9c846018  8d0bfd08
9c84601c  8d0bfd00
9c846020  8d0bfd00
9c846024  9c846034
9c846028  80a5c456 hal!KfLowerIrql+0×62
9c84602c  8d0bfdd8
9c846030  8d0bfd00
9c846034  9c846060
9c846038  80a5a56d hal!KeReleaseQueuedSpinLock+0×2d
9c84603c  00000011
9c846040  00000001
9c846044  8a241db0
9c846048  0000000e
9c84604c  00000000
9c846050  8d0bfdc0
9c846054  05000000
9c846058  00007400
9c84605c  00000001
9c846060  9c846084
9c846064  808b6138 driverA!MapData+0×4a
9c846068  8d0bfd08
9c84606c  00007400
9c846070  00000000
9c846074  00000018
9c846078  00000028
9c84607c  00001000
9c846080  00000018
9c846084  9c84609c
9c846088  f7b8f2e5 driverB!CheckData+0×7a
9c84608c  01b47538
9c846090  00000028
9c846094  0000001c
[…]

0: kd> k L=9c846024 9c846024 9c846024
ChildEBP RetAddr 
WARNING: Frame IP not in any known module. Following frames may be wrong.
9c846024 80a5c456 0×9c846024
9c846034 80a5a56d hal!KfLowerIrql+0×62
9c8460f0 8080d164 hal!KeReleaseQueuedSpinLock+0×2d
9c846060 808b6138 driverA!RemapData+0×3e
9c846084 f7b8f2e5 driverA!MapData+0×4a
9c84609c f7b8f340 driverB!CheckData+0×7a
9c8460e4 808b4000 driverB!CheckAttributes+0×36f
9c84610c f7b8e503 driverB!AddToRecord+0×2a
9c846174 f7b90df0 driverB!ReadRecord+0×1d0
f7b8e508 90909090 driverB!ReadAllRecords+0×7a
[…]

Using the current stack pointer we get another partial stack trace:

0: kd> k L=9c846034 9c846938 9c846938
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
9c846954 8081df65 0×9c846938
9c846968 808f5437 nt!IofCallDriver+0×45
9c84697c 808ef963 nt!IopSynchronousServiceTail+0×10b
9c8469a0 8088978c nt!NtQueryDirectoryFile+0×5d
9c8469a0 8082f1c1 nt!KiFastCallEntry+0xfc
9c846a44 f5296f4b nt!ZwQueryDirectoryFile+0×11
9c846a90 f5297451 DriverC+0×2f4b
9c846adc f52a54cb DriverC+0×3451
9c846af8 f52a44e6 DriverC+0×114cb
9c846b1c f52b2941 DriverC+0×104e6
9c846b4c f52b2626 DriverC+0×1e941
9c846b88 f52a34a7 DriverC+0×1e626
9c846be8 f52a487c DriverC+0xf4a7
[…]

Using different base pointers for k command we can reconstruct different partial stack traces. We can also analyze the longest ones for any stack usage using variant of knf command that shows stack frame size in bytes and find drivers that consume the most of kernel stack. Because we see execution residue on top of the kernel stack (Limit) we can suspect this thread caused the actual stack overflow which resulted in the double fault bugcheck.

- Dmitry Vostokov @ DumpAnalysis.org -

New powerful memory snapshot tool

Wednesday, November 5th, 2008

Matthieu Suiche has released the new version of win32dd tool with the ability to save physical memory in a WinDbg-compliant memory dump file including pages that normally are not saved in a complete memory dump.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.60

Monday, November 3rd, 2008

This is an example of an almost totally bugtated quotation:

Blue Color “of” Crash.

Ernest Lehman, Sweet Smell of Success

- Dmitry Vostokov @ DumpAnalysis.org -

The mystery of top hit kifastsystemcallret

Monday, November 3rd, 2008

I was always suspicious why kifastsystemcallretis the most searched keyword and now I think there are automated web scanning engines doing data mining for stack traces to keep their databases for crash dump analysis and other stats up-to-date. This is how I would design my own internet bot to find such stack traces. Originally I thought that people are looking for it and wrote this article:

What is KiFastSystemCallRet?

I might be wrong here and this function is searched by humans indeed because it is on top of stack traces and novice users of WinDbg or other debugging tools check its purpose.

- Dmitry Vostokov @ DumpAnalysis.org -