Crash Dump Analysis Patterns (Part 1b)
Almost 2 years have passed since I wrote the first post about crash dump analysis patterns: Multiple Exceptions. Today I write about multiple exceptions or faults in kernel mode. Here I distinguish multiple exceptions from nested exceptions. The latter ones in kernel result in double faults, see for example, Stack Overflow pattern. At the first glance it looked like the dump was saved manually:
0: kd> !analyze -v
[...]
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
[...]
However further down in analysis report there was the presence of page fault:
TRAP_FRAME: a38df520 -- (.trap 0xffffffffa38df520)
ErrCode = 00000002
eax=b6d9220f ebx=b6ab4ffb ecx=00000304 edx=eaf2fdea esi=b6d9214c edi=b6ab8189
eip=bfa10e6e esp=a38df594 ebp=a38df5ac iopl=0 nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293
driver+0x2ae6e:
bfa10e6e 895304 mov dword ptr [ebx+4],edx ds:0023:b6ab4fff=????????
Resetting default scope
STACK_TEXT:
a38df410 b48aa532 000000e2 00000000 00000000 nt!KeBugCheckEx+0x1b
a38df440 b48a9d2c 000eba28 9282c8c6 00000000 i8042prt!I8xProcessCrashDump+0x256
a38df488 80839595 89d0c008 8a0eb970 0001000a i8042prt!I8042KeyboardInterruptService+0x225
a38df488 80836bfa 89d0c008 8a0eb970 0001000a nt!KiInterruptDispatch+0x49
a38df520 bfa10e6e badb0d00 eaf2fdea 8867cbe8 nt!KiTrap0E+0xbc
WARNING: Stack unwind information not available. Following frames may be wrong.
a38df5ac bfa22461 b6ab423b 000003dc 00000007 driver+0×2ae6e
[…]
Looking at b6ab4fff address shows that it crosses page boundary, see Data Alignment pattern.
We also see that this thread was running and consumed too much kernel time, see Spiking Thread pattern:
0: kd> !thread
THREAD 88e686d8 Cid 1e48.1f7c Teb: 7ffdf000 Win32Thread: b669de70 RUNNING on processor 0
Not impersonating
DeviceMap dc971120
Owning Process 889e0d88 Image: ProcessA.EXE
Wait Start TickCount 9231345 Ticks: 0
Context Switch Count 2196221 LargeStack
UserTime 00:00:35.562
KernelTime 04:51:23.656
[…]
Intrigued, we see another running thread on the second processor:
0: kd> !running
Prcb Current Next
0 ffdff120 88e686d8 ................
1 f7727120 88bd33f8 …………….
0: kd> !thread 88bd33f8
THREAD 88bd33f8 Cid 2fdc.27f0 Teb: 7ffdf000 Win32Thread: b6640ab8 RUNNING on processor 1
Not impersonating
DeviceMap d7a13b40
Owning Process 89e45200 Image: ProcessA.EXE
Wait Start TickCount 9231345 Ticks: 0
Context Switch Count 2324364 LargeStack
UserTime 00:00:21.171
KernelTime 05:02:09.500
Win32 Start Address ProcessA (0×30001e28)
Start Address kernel32!BaseProcessStartThunk (0×77e617f8)
Stack Init ac5e7bd0 Current ac5e7078 Base ac5e8000 Limit ac5e1000 Call ac5e7bd8
Priority 6 BasePriority 6 PriorityDecrement 0
ChildEBP RetAddr Args to Child
ac5e7150 bfa10e6e badb0d00 dbeaffdb 89a793d8 nt!KiTrap0E+0xbc (FPO: [0,0] TrapFrame @ ac5e7150)
WARNING: Stack unwind information not available. Following frames may be wrong.
ac5e71dc bfa22461 b701f15f ffffff24 00000007 driver+0×2ae6e
[…]
We see it is spiking CPU too and we detect a possible loop in page fault handler:
0: kd> .thread 88bd33f8
Implicit thread is now 88bd33f8
0: kd> ~1s
1: kd> r
eax=fffff81c ebx=ac5e71dc ecx=88bd33f8 edx=dbeaffdb esi=b6f81168 edi=b701fffe
eip=80836bfa esp=ac5e7150 ebp=ac5e7150 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000206
nt!KiTrap0E+0xbc:
80836bfa 0f84e5010000 je nt!KiTrap0E+0×2a7 (80836de5) [br=0]
When looking at raw stack we see that the loop happened after processing this exception:
1: kd> .trap ac5e7150
ErrCode = 00000002
eax=b6f8122b ebx=b701fffa ecx=fffffe4c edx=dbeaffdb esi=b6f81168 edi=b70201a0
eip=bfa10e6e esp=ac5e71c4 ebp=ac5e71dc iopl=0 nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293
driver+0×2ae6e:
bfa10e6e 895304 mov dword ptr [ebx+4],edx ds:0023:b701fffe=????????
The address crosses page boundary too:
1: kd> !pte b701fffe
VA b701fffe
PDE at C0300B70 PTE at C02DC07C
contains 642CF863 contains 2F336863
pfn 642cf ---DA--KWEV pfn 2f336 ---DA--KWEV
1: kd> !pte b701fffe+3
VA b7020001
PDE at C0300B70 PTE at C02DC080
contains 642CF863 contains 00000080
pfn 642cf ---DA--KWEV not valid
DemandZero
Protect: 4 - ReadWrite
This is because trap processing code is found below the current ESP value and also 3rd-party virtual block drivers which I guess were trying to satisfy page fault (the latter not shown in the raw stack fragment here):
1: kd> dds esp-1000 esp
[...]
ac5e6f78 00000002
ac5e6f7c 899c05b0
ac5e6f80 88bd33f8
ac5e6f84 00000010
ac5e6f88 ac5e702c
ac5e6f8c 808457ff nt!KeContextFromKframes+0x9b
ac5e6f90 00000023
ac5e6f94 f7727120
ac5e6f98 00000000
ac5e6f9c 808458fd nt!KeContextFromKframes+0x2bc
ac5e6fa0 ac5e70dc
ac5e6fa4 1f840a42
ac5e6fa8 00000000
ac5e6fac f7727000
ac5e6fb0 00000000
ac5e6fb4 f7727a7c
ac5e6fb8 ac5e6fd4
ac5e6fbc 808398d4 nt!KiDispatchInterrupt+0xd8
ac5e6fc0 00000000
ac5e6fc4 80a801ae hal!HalpDispatchSoftwareInterrupt+0x5e
ac5e6fc8 ac5e700c
ac5e6fcc ac5e7001
ac5e6fd0 00000002
ac5e6fd4 ac5e6ff0
ac5e6fd8 80a80397 hal!HalpCheckForSoftwareInterrupt+0x3f
ac5e6fdc 00000002
ac5e6fe0 ac5e700c
ac5e6fe4 ac5e700c
ac5e6fe8 ac5e70b0
ac5e6fec 00000001
ac5e6ff0 f772f120
ac5e6ff4 88bd33f8
ac5e6ff8 00000002
ac5e6ffc ac5e700c
ac5e7000 8a0a88a0
ac5e7004 88bd33f8
ac5e7008 f7727002
ac5e700c 80a8057e hal!HalEndSystemInterrupt+0x6e
ac5e7010 88bd33f8
ac5e7014 f7727002
ac5e7018 00000002
ac5e701c ac5e702c
ac5e7020 80a80456 hal!KfLowerIrql+0x62
ac5e7024 f7727000
ac5e7028 0000bb40
ac5e702c ac5e70ac
ac5e7030 808093eb nt!KiSaveProcessorState+0x20
ac5e7034 ac5e70dc
ac5e7038 00000000
ac5e703c 808093f0 nt!KiSaveProcessorState+0x25
ac5e7040 f772713c
ac5e7044 8087dcbd nt!KiFreezeTargetExecution+0x6a
ac5e7048 ac5e70dc
ac5e704c 00000000
ac5e7050 f7727120
ac5e7054 00000000
ac5e7058 80a7e501 hal!KeAcquireQueuedSpinLockRaiseToSynch+0x21
ac5e705c 88bd3401
ac5e7060 ac5e7070
ac5e7064 80a80456 hal!KfLowerIrql+0x62
ac5e7068 80a7e530 hal!KeReleaseInStackQueuedSpinLock
ac5e706c 88bd3401
ac5e7070 ac5e70b0
ac5e7074 80a7e56d hal!KeReleaseQueuedSpinLock+0x2d
ac5e7078 80823822 nt!KiDeliverApc+0x1cc
ac5e707c 00000000
ac5e7080 ac806e00
ac5e7084 00000200
ac5e7088 00000000
ac5e708c 88bd343c
ac5e7090 00000001
ac5e7094 ac5e7934
ac5e7098 89e45200
ac5e709c 809282c8 nt!CmpPostApc
ac5e70a0 00000000
ac5e70a4 0000010c
ac5e70a8 1d01f008
ac5e70ac ac5e70dc
ac5e70b0 80837c86 nt!KiIpiServiceRoutine+0x8b
ac5e70b4 ac5e70dc
ac5e70b8 00000000
ac5e70bc 80836bfa nt!KiTrap0E+0xbc
ac5e70c0 b6f81168
ac5e70c4 ac5e7150
ac5e70c8 80a7d8fc hal!HalpIpiHandler+0xcc
ac5e70cc ac5e70dc
ac5e70d0 00000000
ac5e70d4 80a80300 hal!HalpLowerIrqlHardwareInterrupts+0x10c
ac5e70d8 000000e1
ac5e70dc ac5e7150
ac5e70e0 80836bfa nt!KiTrap0E+0xbc
ac5e70e4 badb0d00
ac5e70e8 dbeaffdb
ac5e70ec ac5e70fc
ac5e70f0 80a80456 hal!KfLowerIrql+0x62
ac5e70f4 2f336801
ac5e70f8 ac806e00
ac5e70fc ac5e7138
ac5e7100 8081a2bf nt!MmAccessFault+0x558
ac5e7104 b701fffe
ac5e7108 00000000
ac5e710c 00000000
ac5e7110 00000023
ac5e7114 00000023
ac5e7118 dbeaffdb
ac5e711c 88bd33f8
ac5e7120 fffff81c
ac5e7124 00000000
ac5e7128 ac5e72b0
ac5e712c 00000030
ac5e7130 b701fffe
ac5e7134 b6f81168
ac5e7138 ac5e71dc
ac5e713c ac5e7150
ac5e7140 00000000
ac5e7144 80836bfa nt!KiTrap0E+0xbc
ac5e7148 00000008
ac5e714c 00000206
ac5e7150 ac5e71dc
What we may guess here is the fact that two page faults happened simultaneously or nearly at the same time and one of them possibly during the attempt to satisfy it and this resulted in two processors looping. The whole system was hang and the usual keyboard method via Scroll Lock was used to generate the manual dump.
- Dmitry Vostokov @ DumpAnalysis.org -
June 10th, 2009 at 2:59 pm
[…] the same time we see a second fault on another processor but it seems to be a demand zero page […]
July 12th, 2009 at 5:59 pm
[…] see a page fault on a kernel stack trace of the main process […]
March 9th, 2010 at 12:52 pm
[…] we introduce an icon for Multiple Exceptions (kernel mode) […]