Software Diagnostics Library

Crash Dump Analysis Patterns (Part 1b)

Almost 2 years have passed since I wrote the first post about crash dump analysis patterns: Multiple Exceptions. Today I write about multiple exceptions or faults in kernel mode. Here I distinguish multiple exceptions from nested exceptions. The latter ones in kernel result in double faults, see for example, Stack Overflow pattern. At the first glance it looked like the dump was saved manually:

0: kd> !analyze -v [...] MANUALLY_INITIATED_CRASH (e2) The user manually initiated this crash dump. Arguments: Arg1: 00000000 Arg2: 00000000 Arg3: 00000000 Arg4: 00000000 [...]

However further down in analysis report there was the presence of page fault:

TRAP_FRAME: a38df520 -- (.trap 0xffffffffa38df520) ErrCode = 00000002 eax=b6d9220f ebx=b6ab4ffb ecx=00000304 edx=eaf2fdea esi=b6d9214c edi=b6ab8189 eip=bfa10e6e esp=a38df594 ebp=a38df5ac iopl=0 nv up ei ng nz ac po cy cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293 driver+0x2ae6e: bfa10e6e 895304 mov dword ptr [ebx+4],edx ds:0023:b6ab4fff=???????? Resetting default scope

STACK_TEXT: a38df410 b48aa532 000000e2 00000000 00000000 nt!KeBugCheckEx+0x1b a38df440 b48a9d2c 000eba28 9282c8c6 00000000 i8042prt!I8xProcessCrashDump+0x256 a38df488 80839595 89d0c008 8a0eb970 0001000a i8042prt!I8042KeyboardInterruptService+0x225 a38df488 80836bfa 89d0c008 8a0eb970 0001000a nt!KiInterruptDispatch+0x49 a38df520 bfa10e6e badb0d00 eaf2fdea 8867cbe8 nt!KiTrap0E+0xbc WARNING: Stack unwind information not available. Following frames may be wrong. a38df5ac bfa22461 b6ab423b 000003dc 00000007 driver+0×2ae6e […]

Looking at b6ab4fff address shows that it crosses page boundary, see Data Alignment pattern.

We also see that this thread was running and consumed too much kernel time, see Spiking Thread pattern:

0: kd> !thread THREAD 88e686d8 Cid 1e48.1f7c Teb: 7ffdf000 Win32Thread: b669de70 RUNNING on processor 0 Not impersonating DeviceMap dc971120 Owning Process 889e0d88 Image: ProcessA.EXE Wait Start TickCount 9231345 Ticks: 0 Context Switch Count 2196221 LargeStack UserTime 00:00:35.562 KernelTime 04:51:23.656 […]

Intrigued, we see another running thread on the second processor:

0: kd> !running Prcb Current Next 0 ffdff120 88e686d8 ................ 1 f7727120 88bd33f8 …………….

0: kd> !thread 88bd33f8 THREAD 88bd33f8 Cid 2fdc.27f0 Teb: 7ffdf000 Win32Thread: b6640ab8 RUNNING on processor 1 Not impersonating DeviceMap d7a13b40 Owning Process 89e45200 Image: ProcessA.EXE Wait Start TickCount 9231345 Ticks: 0 Context Switch Count 2324364 LargeStack UserTime 00:00:21.171 KernelTime 05:02:09.500 Win32 Start Address ProcessA (0×30001e28) Start Address kernel32!BaseProcessStartThunk (0×77e617f8) Stack Init ac5e7bd0 Current ac5e7078 Base ac5e8000 Limit ac5e1000 Call ac5e7bd8 Priority 6 BasePriority 6 PriorityDecrement 0 ChildEBP RetAddr Args to Child ac5e7150 bfa10e6e badb0d00 dbeaffdb 89a793d8 nt!KiTrap0E+0xbc (FPO: [0,0] TrapFrame @ ac5e7150) WARNING: Stack unwind information not available. Following frames may be wrong. ac5e71dc bfa22461 b701f15f ffffff24 00000007 driver+0×2ae6e […]

We see it is spiking CPU too and we detect a possible loop in page fault handler:

0: kd> .thread 88bd33f8 Implicit thread is now 88bd33f8

0: kd> ~1s

1: kd> r eax=fffff81c ebx=ac5e71dc ecx=88bd33f8 edx=dbeaffdb esi=b6f81168 edi=b701fffe eip=80836bfa esp=ac5e7150 ebp=ac5e7150 iopl=0 nv up ei pl nz na pe nc cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000206 nt!KiTrap0E+0xbc: 80836bfa 0f84e5010000 je nt!KiTrap0E+0×2a7 (80836de5) [br=0]

When looking at raw stack we see that the loop happened after processing this exception:

1: kd> .trap ac5e7150 ErrCode = 00000002 eax=b6f8122b ebx=b701fffa ecx=fffffe4c edx=dbeaffdb esi=b6f81168 edi=b70201a0 eip=bfa10e6e esp=ac5e71c4 ebp=ac5e71dc iopl=0 nv up ei ng nz ac po cy cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293 driver+0×2ae6e: bfa10e6e 895304 mov dword ptr [ebx+4],edx ds:0023:b701fffe=????????

The address crosses page boundary too:

1: kd> !pte b701fffe VA b701fffe PDE at C0300B70 PTE at C02DC07C contains 642CF863 contains 2F336863 pfn 642cf ---DA--KWEV pfn 2f336 ---DA--KWEV

1: kd> !pte b701fffe+3 VA b7020001 PDE at C0300B70 PTE at C02DC080 contains 642CF863 contains 00000080 pfn 642cf ---DA--KWEV not valid DemandZero Protect: 4 - ReadWrite

This is because trap processing code is found below the current ESP value and also 3rd-party virtual block drivers which I guess were trying to satisfy page fault (the latter not shown in the raw stack fragment here):

1: kd> dds esp-1000 esp [...] ac5e6f78 00000002 ac5e6f7c 899c05b0 ac5e6f80 88bd33f8 ac5e6f84 00000010 ac5e6f88 ac5e702c ac5e6f8c 808457ff nt!KeContextFromKframes+0x9b ac5e6f90 00000023 ac5e6f94 f7727120 ac5e6f98 00000000 ac5e6f9c 808458fd nt!KeContextFromKframes+0x2bc ac5e6fa0 ac5e70dc ac5e6fa4 1f840a42 ac5e6fa8 00000000 ac5e6fac f7727000 ac5e6fb0 00000000 ac5e6fb4 f7727a7c ac5e6fb8 ac5e6fd4 ac5e6fbc 808398d4 nt!KiDispatchInterrupt+0xd8 ac5e6fc0 00000000 ac5e6fc4 80a801ae hal!HalpDispatchSoftwareInterrupt+0x5e ac5e6fc8 ac5e700c ac5e6fcc ac5e7001 ac5e6fd0 00000002 ac5e6fd4 ac5e6ff0 ac5e6fd8 80a80397 hal!HalpCheckForSoftwareInterrupt+0x3f ac5e6fdc 00000002 ac5e6fe0 ac5e700c ac5e6fe4 ac5e700c ac5e6fe8 ac5e70b0 ac5e6fec 00000001 ac5e6ff0 f772f120 ac5e6ff4 88bd33f8 ac5e6ff8 00000002 ac5e6ffc ac5e700c ac5e7000 8a0a88a0 ac5e7004 88bd33f8 ac5e7008 f7727002 ac5e700c 80a8057e hal!HalEndSystemInterrupt+0x6e ac5e7010 88bd33f8 ac5e7014 f7727002 ac5e7018 00000002 ac5e701c ac5e702c ac5e7020 80a80456 hal!KfLowerIrql+0x62 ac5e7024 f7727000 ac5e7028 0000bb40 ac5e702c ac5e70ac ac5e7030 808093eb nt!KiSaveProcessorState+0x20 ac5e7034 ac5e70dc ac5e7038 00000000 ac5e703c 808093f0 nt!KiSaveProcessorState+0x25 ac5e7040 f772713c ac5e7044 8087dcbd nt!KiFreezeTargetExecution+0x6a ac5e7048 ac5e70dc ac5e704c 00000000 ac5e7050 f7727120 ac5e7054 00000000 ac5e7058 80a7e501 hal!KeAcquireQueuedSpinLockRaiseToSynch+0x21 ac5e705c 88bd3401 ac5e7060 ac5e7070 ac5e7064 80a80456 hal!KfLowerIrql+0x62 ac5e7068 80a7e530 hal!KeReleaseInStackQueuedSpinLock ac5e706c 88bd3401 ac5e7070 ac5e70b0 ac5e7074 80a7e56d hal!KeReleaseQueuedSpinLock+0x2d ac5e7078 80823822 nt!KiDeliverApc+0x1cc ac5e707c 00000000 ac5e7080 ac806e00 ac5e7084 00000200 ac5e7088 00000000 ac5e708c 88bd343c ac5e7090 00000001 ac5e7094 ac5e7934 ac5e7098 89e45200 ac5e709c 809282c8 nt!CmpPostApc ac5e70a0 00000000 ac5e70a4 0000010c ac5e70a8 1d01f008 ac5e70ac ac5e70dc ac5e70b0 80837c86 nt!KiIpiServiceRoutine+0x8b ac5e70b4 ac5e70dc ac5e70b8 00000000 ac5e70bc 80836bfa nt!KiTrap0E+0xbc ac5e70c0 b6f81168 ac5e70c4 ac5e7150 ac5e70c8 80a7d8fc hal!HalpIpiHandler+0xcc ac5e70cc ac5e70dc ac5e70d0 00000000 ac5e70d4 80a80300 hal!HalpLowerIrqlHardwareInterrupts+0x10c ac5e70d8 000000e1 ac5e70dc ac5e7150 ac5e70e0 80836bfa nt!KiTrap0E+0xbc ac5e70e4 badb0d00 ac5e70e8 dbeaffdb ac5e70ec ac5e70fc ac5e70f0 80a80456 hal!KfLowerIrql+0x62 ac5e70f4 2f336801 ac5e70f8 ac806e00 ac5e70fc ac5e7138 ac5e7100 8081a2bf nt!MmAccessFault+0x558 ac5e7104 b701fffe ac5e7108 00000000 ac5e710c 00000000 ac5e7110 00000023 ac5e7114 00000023 ac5e7118 dbeaffdb ac5e711c 88bd33f8 ac5e7120 fffff81c ac5e7124 00000000 ac5e7128 ac5e72b0 ac5e712c 00000030 ac5e7130 b701fffe ac5e7134 b6f81168 ac5e7138 ac5e71dc ac5e713c ac5e7150 ac5e7140 00000000 ac5e7144 80836bfa nt!KiTrap0E+0xbc ac5e7148 00000008 ac5e714c 00000206 ac5e7150 ac5e71dc

What we may guess here is the fact that two page faults happened simultaneously or nearly at the same time and one of them possibly during the attempt to satisfy it and this resulted in two processors looping. The whole system was hang and the usual keyboard method via Scroll Lock was used to generate the manual dump.

- Dmitry Vostokov @ DumpAnalysis.org -

This entry was posted on Wednesday, October 15th, 2008 at 7:32 am and is filed under Crash Dump Analysis, Crash Dump Patterns, Debugging. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

3 Responses to “Crash Dump Analysis Patterns (Part 1b)”

Crash Dump Analysis » Blog Archive » Invalid pointer, incorrect stack trace, multiple exceptions, insufficient memory and memory leak: pattern cooperation Says:
June 10th, 2009 at 2:59 pm
[…] the same time we see a second fault on another processor but it seems to be a demand zero page […]
Crash Dump Analysis » Blog Archive » WOW64 process, NULL data pointer, stack overflow, main thread, incorrect stack trace, nested exceptions, hidden exception, manual dump, multiple exceptions and virtualized system: pattern cooperation Says:
July 12th, 2009 at 5:59 pm
[…] see a page fault on a kernel stack trace of the main process […]
Crash Dump Analysis » Blog Archive » Icons for Memory Dump Analysis Patterns (Part 2) Says:
March 9th, 2010 at 12:52 pm
[…] we introduce an icon for Multiple Exceptions (kernel mode) […]

You must be logged in to post a comment.

Crash Dump Analysis Patterns (Part 1b)

3 Responses to “Crash Dump Analysis Patterns (Part 1b)”

Leave a Reply