Lateral damage, stack overflow and execution residue: pattern cooperation
As I mentioned in the comments to the Lateral Damage pattern, it lies between normal healthy dump files and corrupt dumps. For example, the following 8 GB complete memory dump, which fits comfortably into a 16 GB page file, was missing the processor control region, which made it impossible to get meaningful information from certain WinDbg commands:
0: kd> !analyze -v
[...]
UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault). The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
use .trap on that value
Else
.trap on the appropriate frame will show where the trap was taken
(on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: f7727fe0
Arg3: 00000000
Arg4: 00000000
Debugging Details:
------------------
Unable to read selector for PCR for processor 1
Unable to read selector for PCR for processor 3
Unable to read selector for PCR for processor 1
Unable to read selector for PCR for processor 3
[...]
STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
8089a600 8088ddf2 00000000 0000000e 00000000 processr+0x2886
8089a604 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xa
[...]
0: kd> ~1
Unable to read selector for PCR for processor 1
WARNING: Unable to reset page directories
1: kd> !pcr
Unable to read selector for PCR for processor 1
Cannot get PRCB address
1: kd> kv
ChildEBP RetAddr Args to Child
WARNING: Frame IP not in any known module. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 0x0
The bugcheck argument 1 shows that we have a double fault, which most often results from a kernel stack overflow. If we go back to processor 0 to inspect its TSS, we don’t get meaningful results either (we expect the value of Backlink to be 0x28):
0: kd> !pcr
KPCR for Processor 0 at ffdff000:
Major 1 Minor 1
NtTib.ExceptionList: ffffffff
NtTib.StackBase: 00000000
NtTib.StackLimit: 00000000
NtTib.SubSystemTib: 80042000
NtTib.Version: 2a1b0b08
NtTib.UserPointer: 00000001
NtTib.SelfTib: 00000000
SelfPcr: ffdff000
Prcb: ffdff120
Irql: 0000001f
IRR: 00000000
IDR: ffffffff
InterruptMode: 00000000
IDT: 8003f400
GDT: 8003f000
TSS: 80042000
CurrentThread: 8089d8c0
NextThread: 00000000
IdleThread: 8089d8c0
DpcQueue:
0: kd> dt _KTSS 80042000
nt!_KTSS
+0x000 Backlink : 0xc45
+0x002 Reserved0 : 0x4d8a
+0x004 Esp0 : 0x8089a6a0
+0x008 Ss0 : 0x10
+0x00a Reserved1 : 0xb70f
+0x00c NotUsed1 : [4] 0x5031ff00
+0x01c CR3 : 0x8b55ff8b
+0x020 Eip : 0xc75ffec
+0x024 EFlags : 0xe80875ff
+0x028 Eax : 0xfffffbdd
+0x02c Ecx : 0x1b75c084
+0x030 Edx : 0x8b184d8b
+0x034 Ebx : 0x7d8b57d1
+0x038 Esp : 0x2e9c110
+0x03c Ebp : 0xf3ffc883
+0x040 Esi : 0x83ca8bab
+0x044 Edi : 0xaaf303e1
+0x048 Es : 0xeb5f
+0x04a Reserved2 : 0x6819
+0x04c Cs : 0x24fc
+0x04e Reserved3 : 0x44
+0x050 Ss : 0x75ff
+0x052 Reserved4 : 0xff18
+0x054 Ds : 0x1475
+0x056 Reserved5 : 0x75ff
+0x058 Fs : 0xff10
+0x05a Reserved6 : 0xc75
+0x05c Gs : 0x75ff
+0x05e Reserved7 : 0xe808
+0x060 LDT : 0
+0x062 Reserved8 : 0xffff
+0x064 Flags : 0
+0x066 IoMapBase : 0x20ac
+0x068 IoMaps : [1] _KiIoAccessMap
+0x208c IntDirectionMap : [32] "???"
However, if we try to list all thread stacks, we see one thread running on processor 1:
0: kd> !process 0 ff
[...]
THREAD 8a241db0 Cid 1218.4420 Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 1
IRP List:
8b200008: (0006,0244) Flags: 00000884 Mdl: 00000000
89beedb8: (0006,0244) Flags: 00000884 Mdl: 00000000
Not impersonating
DeviceMap e1002060
Owning Process 8bc63d88 Image: svchost.exe
Wait Start TickCount 10242012 Ticks: 0
Context Switch Count 1832
UserTime 00:00:00.000
KernelTime 00:00:00.046
Start Address termdd (0xf75cc218)
Stack Init 9c849000 Current 9c846938 Base 9c849000 Limit 9c846000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
Unable to read selector for PCR for processor 1
[...]
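The Stack Init / Current / Base / Limit line in the thread output above allows a quick estimate of how much kernel stack the thread had consumed. The following is a minimal Python sketch of that arithmetic; the function name `stack_usage` is hypothetical, and it assumes that Current is the thread's saved stack pointer, which may be stale for a thread that was running at the time of the bugcheck.

```python
def stack_usage(base, limit, current):
    """Return kernel stack size, bytes used and bytes free.

    The x86 kernel stack grows down from `base` towards `limit`.
    `current` is assumed to be the saved stack pointer (an assumption:
    for a RUNNING thread it may not reflect the deepest point reached).
    """
    return {
        "size": base - limit,
        "used": base - current,
        "free": current - limit,
    }


# Values copied from the thread output above.
usage = stack_usage(base=0x9C849000, limit=0x9C846000, current=0x9C846938)
for name, value in usage.items():
    print(f"{name}: {value} bytes (0x{value:x})")
```

With these values the stack is 12 KB in size, and the saved stack pointer sits a little over 2 KB above Limit.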
Now we can look at its raw stack to see execution residue and try to reconstruct partial stack traces:
0: kd> dds 9c846000 9c849000
9c846000 94040001
9c846004 00000014
9c846008 8d147848
9c84600c 8d0bfd08
9c846010 8d0bfd00
9c846014 00000001
9c846018 8d0bfd08
9c84601c 8d0bfd00
9c846020 8d0bfd00
9c846024 9c846034
9c846028 80a5c456 hal!KfLowerIrql+0x62
9c84602c 8d0bfdd8
9c846030 8d0bfd00
9c846034 9c846060
9c846038 80a5a56d hal!KeReleaseQueuedSpinLock+0x2d
9c84603c 00000011
9c846040 00000001
9c846044 8a241db0
9c846048 0000000e
9c84604c 00000000
9c846050 8d0bfdc0
9c846054 05000000
9c846058 00007400
9c84605c 00000001
9c846060 9c846084
9c846064 808b6138 driverA!MapData+0x4a
9c846068 8d0bfd08
9c84606c 00007400
9c846070 00000000
9c846074 00000018
9c846078 00000028
9c84607c 00001000
9c846080 00000018
9c846084 9c84609c
9c846088 f7b8f2e5 driverB!CheckData+0x7a
9c84608c 01b47538
9c846090 00000028
9c846094 0000001c
[…]
0: kd> k L=9c846024 9c846024 9c846024
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
9c846024 80a5c456 0x9c846024
9c846034 80a5a56d hal!KfLowerIrql+0x62
9c8460f0 8080d164 hal!KeReleaseQueuedSpinLock+0x2d
9c846060 808b6138 driverA!RemapData+0x3e
9c846084 f7b8f2e5 driverA!MapData+0x4a
9c84609c f7b8f340 driverB!CheckData+0x7a
9c8460e4 808b4000 driverB!CheckAttributes+0x36f
9c84610c f7b8e503 driverB!AddToRecord+0x2a
9c846174 f7b90df0 driverB!ReadRecord+0x1d0
f7b8e508 90909090 driverB!ReadAllRecords+0x7a
[…]
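The reconstruction above relies on the x86 frame-pointer convention: each saved EBP slot on the stack points to the caller's saved EBP, and the return address sits 4 bytes above it. As a rough illustration, here is a minimal Python sketch that follows such a chain through the text of a dds listing. The helper names `parse_dds` and `walk_ebp_chain` are hypothetical, the sample contains only the chain slots copied from the raw stack shown earlier, and the sketch assumes every function on the path keeps a standard EBP frame (no frame-pointer omission).

```python
import re

# One dds line: address, 32-bit value, optional symbol.
DDS_LINE = re.compile(r"^\s*([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})(?:\s+(\S.*?))?\s*$")

DDS_SAMPLE = """\
9c846024 9c846034
9c846028 80a5c456 hal!KfLowerIrql+0x62
9c846034 9c846060
9c846038 80a5a56d hal!KeReleaseQueuedSpinLock+0x2d
9c846060 9c846084
9c846064 808b6138 driverA!MapData+0x4a
9c846084 9c84609c
9c846088 f7b8f2e5 driverB!CheckData+0x7a
"""


def parse_dds(text):
    """Map each stack address to a (value, symbol) pair."""
    mem = {}
    for line in text.splitlines():
        m = DDS_LINE.match(line)
        if m:
            mem[int(m.group(1), 16)] = (int(m.group(2), 16), m.group(3) or "")
    return mem


def walk_ebp_chain(mem, ebp):
    """Follow saved-EBP links: [ebp] is the caller's EBP, [ebp+4] the return address."""
    frames = []
    while ebp in mem and ebp + 4 in mem:
        saved_ebp, _ = mem[ebp]
        ret_addr, symbol = mem[ebp + 4]
        frames.append((ebp, ret_addr, symbol))
        if saved_ebp <= ebp:  # the chain must move towards the stack base
            break
        ebp = saved_ebp
    return frames


frames = walk_ebp_chain(parse_dds(DDS_SAMPLE), 0x9C846024)
for ebp, ret_addr, symbol in frames:
    print(f"{ebp:08x} {ret_addr:08x} {symbol}")
```

Trying every stack slot that holds a plausible stack address as a starting EBP is one way to enumerate candidate partial traces, although many of them will be stale residue rather than a live call chain.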
Using the current stack pointer we get another partial stack trace:
0: kd> k L=9c846034 9c846938 9c846938
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
9c846954 8081df65 0x9c846938
9c846968 808f5437 nt!IofCallDriver+0x45
9c84697c 808ef963 nt!IopSynchronousServiceTail+0x10b
9c8469a0 8088978c nt!NtQueryDirectoryFile+0x5d
9c8469a0 8082f1c1 nt!KiFastCallEntry+0xfc
9c846a44 f5296f4b nt!ZwQueryDirectoryFile+0x11
9c846a90 f5297451 DriverC+0x2f4b
9c846adc f52a54cb DriverC+0x3451
9c846af8 f52a44e6 DriverC+0x114cb
9c846b1c f52b2941 DriverC+0x104e6
9c846b4c f52b2626 DriverC+0x1e941
9c846b88 f52a34a7 DriverC+0x1e626
9c846be8 f52a487c DriverC+0xf4a7
[…]
By passing different base pointers to the k command we can reconstruct different partial stack traces. We can also analyze the longest ones for stack usage with a variant of the knf command, which shows each stack frame's size in bytes, and find the drivers that consume the most kernel stack. Because we see execution residue at the very top of the kernel stack (Limit), we can suspect that this thread caused the actual stack overflow that resulted in the double fault bugcheck.
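When only the plain k output is at hand, per-frame stack usage can be approximated from the gaps between adjacent ChildEBP values. The following is a minimal Python sketch of that idea, not a replacement for the debugger's own frame-size output; the helper names `frame_sizes` and `usage_by_module` are hypothetical, and the sample is the tail of the second partial trace above. Each gap is attributed to the frame with the higher ChildEBP, which is an assumption about how to assign the bytes.

```python
import re
from collections import defaultdict

# One k line: ChildEBP, RetAddr, symbol.
K_LINE = re.compile(r"^\s*([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+(\S+)")

K_SAMPLE = """\
9c846a44 f5296f4b nt!ZwQueryDirectoryFile+0x11
9c846a90 f5297451 DriverC+0x2f4b
9c846adc f52a54cb DriverC+0x3451
9c846af8 f52a44e6 DriverC+0x114cb
9c846b1c f52b2941 DriverC+0x104e6
9c846b4c f52b2626 DriverC+0x1e941
9c846b88 f52a34a7 DriverC+0x1e626
9c846be8 f52a487c DriverC+0xf4a7
"""


def frame_sizes(k_output):
    """Return (symbol, bytes) pairs from gaps between adjacent ChildEBP values."""
    frames = []
    for line in k_output.splitlines():
        m = K_LINE.match(line)
        if m:
            frames.append((int(m.group(1), 16), m.group(3)))
    sizes = []
    for (ebp, _), (next_ebp, next_symbol) in zip(frames, frames[1:]):
        delta = next_ebp - ebp
        if delta > 0:  # skip repeated or out-of-order frames
            sizes.append((next_symbol, delta))
    return sizes


def usage_by_module(sizes):
    """Sum frame sizes per module (the text before '!' or '+')."""
    totals = defaultdict(int)
    for symbol, size in sizes:
        totals[re.split(r"[!+]", symbol)[0]] += size
    return dict(totals)


sizes = frame_sizes(K_SAMPLE)
totals = usage_by_module(sizes)
for module, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{module}: {total} bytes")
```

Run over a longer reconstructed trace, the per-module totals give a first indication of which driver is the heaviest consumer of kernel stack.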
- Dmitry Vostokov @ DumpAnalysis.org -
July 28th, 2009 at 2:57 pm
Thank you, I was able to extract more information from a customer’s “corrupted” crash dump with this writeup.
March 30th, 2010 at 5:55 pm
The Backlink of the TSS that you pointed to was not corrupted; it is just uninitialized. See SS0, ESP0, Flags and LDT: they are valid, so the structure itself is not corrupted. Rather, it has never been used. On a double fault Windows uses task gate #8 in the IDT.
March 30th, 2010 at 6:47 pm
Thanks! I had never considered the possibility of it being uninitialized. Now I’m thinking about another pattern (if I find other cases).
June 21st, 2010 at 11:09 am