Crash Dump Analysis Patterns (Part 1b)

Almost 2 years have passed since I wrote the first post about crash dump analysis patterns: Multiple Exceptions. Today I write about multiple exceptions or faults in kernel mode. Here I distinguish multiple exceptions from nested exceptions. The latter ones in kernel result in double faults, see for example, Stack Overflow pattern. At the first glance it looked like the dump was saved manually:

0: kd> !analyze -v
[...]
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
[...]

However further down in analysis report there was the presence of page fault:

TRAP_FRAME:  a38df520 -- (.trap 0xffffffffa38df520)
ErrCode = 00000002
eax=b6d9220f ebx=b6ab4ffb ecx=00000304 edx=eaf2fdea esi=b6d9214c edi=b6ab8189
eip=bfa10e6e esp=a38df594 ebp=a38df5ac iopl=0  nv up ei ng nz ac po cy
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000  efl=00010293
driver+0x2ae6e:
bfa10e6e 895304   mov     dword ptr [ebx+4],edx ds:0023:b6ab4fff=????????
Resetting default scope

STACK_TEXT: 
a38df410 b48aa532 000000e2 00000000 00000000 nt!KeBugCheckEx+0x1b
a38df440 b48a9d2c 000eba28 9282c8c6 00000000 i8042prt!I8xProcessCrashDump+0x256
a38df488 80839595 89d0c008 8a0eb970 0001000a i8042prt!I8042KeyboardInterruptService+0x225
a38df488 80836bfa 89d0c008 8a0eb970 0001000a nt!KiInterruptDispatch+0x49
a38df520 bfa10e6e badb0d00 eaf2fdea 8867cbe8 nt!KiTrap0E+0xbc
WARNING: Stack unwind information not available. Following frames may be wrong.
a38df5ac bfa22461 b6ab423b 000003dc 00000007 driver+0×2ae6e
[…]

Looking at b6ab4fff address shows that it crosses page boundary, see Data Alignment pattern.

We also see that this thread was running and consumed too much kernel time, see Spiking Thread pattern:

0: kd> !thread
THREAD 88e686d8  Cid 1e48.1f7c  Teb: 7ffdf000 Win32Thread: b669de70 RUNNING on processor 0
Not impersonating
DeviceMap                 dc971120
Owning Process            889e0d88       Image:         ProcessA.EXE
Wait Start TickCount      9231345        Ticks: 0
Context Switch Count      2196221                 LargeStack
UserTime                  00:00:35.562
KernelTime                04:51:23.656
[…]

Intrigued, we see another running thread on the second processor:

0: kd> !running
     Prcb      Current   Next   
  0  ffdff120  88e686d8            ................
  1  f7727120  88bd33f8            …………….

0: kd> !thread 88bd33f8
THREAD 88bd33f8  Cid 2fdc.27f0  Teb: 7ffdf000 Win32Thread: b6640ab8 RUNNING on processor 1
Not impersonating
DeviceMap                 d7a13b40
Owning Process            89e45200       Image:         ProcessA.EXE
Wait Start TickCount      9231345        Ticks: 0
Context Switch Count      2324364                 LargeStack
UserTime                  00:00:21.171
KernelTime                05:02:09.500
Win32 Start Address ProcessA (0×30001e28)
Start Address kernel32!BaseProcessStartThunk (0×77e617f8)
Stack Init ac5e7bd0 Current ac5e7078 Base ac5e8000 Limit ac5e1000 Call ac5e7bd8
Priority 6 BasePriority 6 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
ac5e7150 bfa10e6e badb0d00 dbeaffdb 89a793d8 nt!KiTrap0E+0xbc (FPO: [0,0] TrapFrame @ ac5e7150)
WARNING: Stack unwind information not available. Following frames may be wrong.
ac5e71dc bfa22461 b701f15f ffffff24 00000007 driver+0×2ae6e
[…]

We see it is spiking CPU too and we detect a possible loop in page fault handler:

0: kd> .thread 88bd33f8
Implicit thread is now 88bd33f8

0: kd> ~1s

1: kd> r
eax=fffff81c ebx=ac5e71dc ecx=88bd33f8 edx=dbeaffdb esi=b6f81168 edi=b701fffe
eip=80836bfa esp=ac5e7150 ebp=ac5e7150 iopl=0 nv up ei pl nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000  efl=00000206
nt!KiTrap0E+0xbc:
80836bfa 0f84e5010000  je  nt!KiTrap0E+0×2a7 (80836de5) [br=0]

When looking at raw stack we see that the loop happened after processing this exception:

1: kd> .trap ac5e7150
ErrCode = 00000002
eax=b6f8122b ebx=b701fffa ecx=fffffe4c edx=dbeaffdb esi=b6f81168 edi=b70201a0
eip=bfa10e6e esp=ac5e71c4 ebp=ac5e71dc iopl=0 nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293
driver+0×2ae6e:
bfa10e6e 895304 mov dword ptr [ebx+4],edx ds:0023:b701fffe=????????

The address crosses page boundary too:

1: kd> !pte b701fffe
               VA b701fffe
PDE at   C0300B70        PTE at C02DC07C
contains 642CF863      contains 2F336863
pfn 642cf ---DA--KWEV    pfn 2f336 ---DA--KWEV

1: kd> !pte b701fffe+3
               VA b7020001
PDE at   C0300B70        PTE at C02DC080
contains 642CF863      contains 00000080
pfn 642cf ---DA--KWEV                           not valid
                       DemandZero
                       Protect: 4 - ReadWrite

 
This is because trap processing code is found below the current ESP value and also 3rd-party virtual block drivers which I guess were trying to satisfy page fault (the latter not shown in the raw stack fragment here):

1: kd> dds esp-1000 esp
[...]
ac5e6f78  00000002
ac5e6f7c  899c05b0
ac5e6f80  88bd33f8
ac5e6f84  00000010
ac5e6f88  ac5e702c
ac5e6f8c  808457ff nt!KeContextFromKframes+0x9b
ac5e6f90  00000023
ac5e6f94  f7727120
ac5e6f98  00000000
ac5e6f9c  808458fd nt!KeContextFromKframes+0x2bc
ac5e6fa0  ac5e70dc
ac5e6fa4  1f840a42
ac5e6fa8  00000000
ac5e6fac  f7727000
ac5e6fb0  00000000
ac5e6fb4  f7727a7c
ac5e6fb8  ac5e6fd4
ac5e6fbc  808398d4 nt!KiDispatchInterrupt+0xd8
ac5e6fc0  00000000
ac5e6fc4  80a801ae hal!HalpDispatchSoftwareInterrupt+0x5e
ac5e6fc8  ac5e700c
ac5e6fcc  ac5e7001
ac5e6fd0  00000002
ac5e6fd4  ac5e6ff0
ac5e6fd8  80a80397 hal!HalpCheckForSoftwareInterrupt+0x3f
ac5e6fdc  00000002
ac5e6fe0  ac5e700c
ac5e6fe4  ac5e700c
ac5e6fe8  ac5e70b0
ac5e6fec  00000001
ac5e6ff0  f772f120
ac5e6ff4  88bd33f8
ac5e6ff8  00000002
ac5e6ffc  ac5e700c
ac5e7000  8a0a88a0
ac5e7004  88bd33f8
ac5e7008  f7727002
ac5e700c  80a8057e hal!HalEndSystemInterrupt+0x6e
ac5e7010  88bd33f8
ac5e7014  f7727002
ac5e7018  00000002
ac5e701c  ac5e702c
ac5e7020  80a80456 hal!KfLowerIrql+0x62
ac5e7024  f7727000
ac5e7028  0000bb40
ac5e702c  ac5e70ac
ac5e7030  808093eb nt!KiSaveProcessorState+0x20
ac5e7034  ac5e70dc
ac5e7038  00000000
ac5e703c  808093f0 nt!KiSaveProcessorState+0x25
ac5e7040  f772713c
ac5e7044  8087dcbd nt!KiFreezeTargetExecution+0x6a
ac5e7048  ac5e70dc
ac5e704c  00000000
ac5e7050  f7727120
ac5e7054  00000000
ac5e7058  80a7e501 hal!KeAcquireQueuedSpinLockRaiseToSynch+0x21
ac5e705c  88bd3401
ac5e7060  ac5e7070
ac5e7064  80a80456 hal!KfLowerIrql+0x62
ac5e7068  80a7e530 hal!KeReleaseInStackQueuedSpinLock
ac5e706c  88bd3401
ac5e7070  ac5e70b0
ac5e7074  80a7e56d hal!KeReleaseQueuedSpinLock+0x2d
ac5e7078  80823822 nt!KiDeliverApc+0x1cc
ac5e707c  00000000
ac5e7080  ac806e00
ac5e7084  00000200
ac5e7088  00000000
ac5e708c  88bd343c
ac5e7090  00000001
ac5e7094  ac5e7934
ac5e7098  89e45200
ac5e709c  809282c8 nt!CmpPostApc
ac5e70a0  00000000
ac5e70a4  0000010c
ac5e70a8  1d01f008
ac5e70ac  ac5e70dc
ac5e70b0  80837c86 nt!KiIpiServiceRoutine+0x8b
ac5e70b4  ac5e70dc
ac5e70b8  00000000
ac5e70bc  80836bfa nt!KiTrap0E+0xbc
ac5e70c0  b6f81168
ac5e70c4  ac5e7150
ac5e70c8  80a7d8fc hal!HalpIpiHandler+0xcc
ac5e70cc  ac5e70dc
ac5e70d0  00000000
ac5e70d4  80a80300 hal!HalpLowerIrqlHardwareInterrupts+0x10c
ac5e70d8  000000e1
ac5e70dc  ac5e7150
ac5e70e0  80836bfa nt!KiTrap0E+0xbc
ac5e70e4  badb0d00
ac5e70e8  dbeaffdb
ac5e70ec  ac5e70fc
ac5e70f0  80a80456 hal!KfLowerIrql+0x62
ac5e70f4  2f336801
ac5e70f8  ac806e00
ac5e70fc  ac5e7138
ac5e7100  8081a2bf nt!MmAccessFault+0x558
ac5e7104  b701fffe
ac5e7108  00000000
ac5e710c  00000000
ac5e7110  00000023
ac5e7114  00000023
ac5e7118  dbeaffdb
ac5e711c  88bd33f8
ac5e7120  fffff81c
ac5e7124  00000000
ac5e7128  ac5e72b0
ac5e712c  00000030
ac5e7130  b701fffe
ac5e7134  b6f81168
ac5e7138  ac5e71dc
ac5e713c  ac5e7150
ac5e7140  00000000
ac5e7144  80836bfa nt!KiTrap0E+0xbc
ac5e7148  00000008
ac5e714c  00000206
ac5e7150  ac5e71dc

What we may guess here is the fact that two page faults happened simultaneously or nearly at the same time and one of them possibly during the attempt to satisfy it and this resulted in two processors looping. The whole system was hang and the usual keyboard method via Scroll Lock was used to generate the manual dump.

- Dmitry Vostokov @ DumpAnalysis.org -

3 Responses to “Crash Dump Analysis Patterns (Part 1b)”

  1. Crash Dump Analysis » Blog Archive » Invalid pointer, incorrect stack trace, multiple exceptions, insufficient memory and memory leak: pattern cooperation Says:

    […] the same time we see a second fault on another processor but it seems to be a demand zero page […]

  2. Crash Dump Analysis » Blog Archive » WOW64 process, NULL data pointer, stack overflow, main thread, incorrect stack trace, nested exceptions, hidden exception, manual dump, multiple exceptions and virtualized system: pattern cooperation Says:

    […] see a page fault on a kernel stack trace of the main process […]

  3. Crash Dump Analysis » Blog Archive » Icons for Memory Dump Analysis Patterns (Part 2) Says:

    […] we introduce an icon for Multiple Exceptions (kernel mode) […]

Leave a Reply