Crash Dump Analysis Patterns (Part 16a)
In this part I will show an example of the Stack Overflow pattern in the x86 Windows kernel. When it happens in kernel mode, we usually get bugcheck 7F with the first argument being EXCEPTION_DOUBLE_FAULT (8):
UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it’s a trap of a kind that the kernel isn’t allowed to have/catch (bound trap) or that is always instant death (double fault). The first number in the bugcheck params is the number of the trap (8 = double fault, etc). Consult an Intel x86 family manual to learn more about what these traps are. Here is a *portion* of those codes:
If kv shows a taskGate
use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
use .trap on that value
Else
.trap on the appropriate frame will show where the trap was taken (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: f7747fe0
Arg3: 00000000
Arg4: 00000000
The kernel stack size for a thread is limited to 12 KB, and the stack is guarded by an invalid page. Therefore, when code touches an address on that guard page, the processor generates a page fault, tries to push registers onto the same exhausted stack, and gets a second page fault. This is what “double fault” means. In this scenario the processor switches to another stack via the TSS (task state segment) task switching mechanism, because the IDT entry for trap 8 contains not an interrupt handler address but a so-called TSS segment selector. This selector points to a memory segment that contains a new kernel stack pointer. The difference between a normal IDT entry and the double fault entry can be seen by inspecting the IDT:
5: kd> !pcr 5
KPCR for Processor 5 at f7747000:
Major 1 Minor 1
NtTib.ExceptionList: b044e0b8
NtTib.StackBase: 00000000
NtTib.StackLimit: 00000000
NtTib.SubSystemTib: f7747fe0
NtTib.Version: 00ae1064
NtTib.UserPointer: 00000020
NtTib.SelfTib: 7ffdf000
SelfPcr: f7747000
Prcb: f7747120
Irql: 00000000
IRR: 00000000
IDR: ffffffff
InterruptMode: 00000000
IDT: f774d800
GDT: f774d400
TSS: f774a2e0
CurrentThread: 8834c020
NextThread: 00000000
IdleThread: f774a090
5: kd> dt _KIDTENTRY f774d800
+0x000 Offset : 0x97e8
+0x002 Selector : 8
+0x004 Access : 0x8e00
+0x006 ExtendedOffset : 0x8088
5: kd> ln 0x808897e8
(808897e8) nt!KiTrap00 | (808898c0) nt!Dr_kit1_a
Exact matches:
nt!KiTrap00
5: kd> dt _KIDTENTRY f774d800+7*8
+0x000 Offset : 0xa880
+0x002 Selector : 8
+0x004 Access : 0x8e00
+0x006 ExtendedOffset : 0x8088
5: kd> ln 8088a880
(8088a880) nt!KiTrap07 | (8088ab72) nt!KiTrap08
Exact matches:
nt!KiTrap07
5: kd> dt _KIDTENTRY f774d800+8*8
+0x000 Offset : 0x1238
+0x002 Selector : 0x50
+0x004 Access : 0x8500
+0x006 ExtendedOffset : 0
5: kd> dt _KIDTENTRY f774d800+9*8
+0x000 Offset : 0xac94
+0x002 Selector : 8
+0x004 Access : 0x8e00
+0x006 ExtendedOffset : 0x8088
5: kd> ln 8088ac94
(8088ac94) nt!KiTrap09 | (8088ad10) nt!Dr_kita_a
Exact matches:
nt!KiTrap09
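The relationship between the _KIDTENTRY fields and the addresses passed to ln above can be sketched in Python. This is only an illustration and not debugger functionality: the helper names are made up for this example, and the gate type decoding assumes the standard x86 descriptor layout, in which the type bits sit in the high byte of the Access field.

```python
# Gate types from the x86 descriptor format; the type bits live in the
# high byte of the 16-bit Access field.
GATE_TYPES = {0x5: "task gate", 0xE: "32-bit interrupt gate", 0xF: "32-bit trap gate"}

def handler_address(offset, extended_offset):
    """Combine the low and high 16-bit halves into a 32-bit handler address."""
    return (extended_offset << 16) | offset

def gate_type(access):
    """Classify an IDT entry by the type bits of its Access field."""
    return GATE_TYPES.get((access >> 8) & 0x1F, "other")

# (Offset, Selector, Access, ExtendedOffset) copied from the dt output above.
idt_entries = {
    0: (0x97E8, 0x08, 0x8E00, 0x8088),
    7: (0xA880, 0x08, 0x8E00, 0x8088),
    8: (0x1238, 0x50, 0x8500, 0x0000),
    9: (0xAC94, 0x08, 0x8E00, 0x8088),
}

for vector, (offset, selector, access, ext) in idt_entries.items():
    kind = gate_type(access)
    if kind == "task gate":
        # A task gate has no handler address; its selector names a TSS.
        print(f"trap {vector}: {kind}, TSS selector {selector:#x}")
    else:
        print(f"trap {vector}: {kind}, handler {handler_address(offset, ext):#010x}")
```

Only the entry for trap 8 decodes as a task gate, which is why its Offset and ExtendedOffset do not combine into a KiTrap address the way the other entries do.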
If we switch to selector 50 explicitly, we see the nt!KiTrap08 function, which raises the bugcheck and saves the dump by calling KeBugCheck2:
5: kd> .tss 50
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=00000000 edi=00000000
eip=8088ab72 esp=f774d3c0 ebp=00000000 iopl=0 nv up di pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000000
nt!KiTrap08:
8088ab72 fa cli
5: kd> .asm no_code_bytes
Assembly options: no_code_bytes
5: kd> uf nt!KiTrap08
nt!KiTrap08:
8088ab72 cli
8088ab73 mov eax,dword ptr fs:[00000040h]
8088ab79 mov ecx,dword ptr fs:[124h]
8088ab80 mov edi,dword ptr [ecx+38h]
8088ab83 mov ecx,dword ptr [edi+18h]
8088ab86 mov dword ptr [eax+1Ch],ecx
8088ab89 mov cx,word ptr [edi+30h]
8088ab8d mov word ptr [eax+66h],cx
8088ab91 mov ecx,dword ptr [edi+20h]
8088ab94 test ecx,ecx
8088ab96 je nt!KiTrap08+0x2a (8088ab9c)
nt!KiTrap08+0x26:
8088ab98 mov cx,48h
nt!KiTrap08+0x2a:
8088ab9c mov word ptr [eax+60h],cx
8088aba0 mov ecx,dword ptr fs:[3Ch]
8088aba7 lea eax,[ecx+50h]
8088abaa mov byte ptr [eax+5],89h
8088abae pushfd
8088abaf and dword ptr [esp],0FFFFBFFFh
8088abb6 popfd
8088abb7 mov eax,dword ptr fs:[0000003Ch]
8088abbd mov ch,byte ptr [eax+57h]
8088abc0 mov cl,byte ptr [eax+54h]
8088abc3 shl ecx,10h
8088abc6 mov cx,word ptr [eax+52h]
8088abca mov eax,dword ptr fs:[00000040h]
8088abd0 mov dword ptr fs:[40h],ecx
nt!KiTrap08+0x65:
8088abd7 push 0
8088abd9 push 0
8088abdb push 0
8088abdd push eax
8088abde push 8
8088abe0 push 7Fh
8088abe2 call nt!KeBugCheck2 (80826a92)
8088abe7 jmp nt!KiTrap08+0x65 (8088abd7)
We can inspect the TSS address shown in the !pcr command output above:
5: kd> dt _KTSS f774a2e0
+0x000 Backlink : 0x28
+0x002 Reserved0 : 0
+0x004 Esp0 : 0xf774d3c0
+0x008 Ss0 : 0x10
+0x00a Reserved1 : 0
+0x00c NotUsed1 : [4] 0
+0x01c CR3 : 0x646000
+0x020 Eip : 0x8088ab72
+0x024 EFlags : 0
+0x028 Eax : 0
+0x02c Ecx : 0
+0x030 Edx : 0
+0x034 Ebx : 0
+0x038 Esp : 0xf774d3c0
+0x03c Ebp : 0
+0x040 Esi : 0
+0x044 Edi : 0
+0x048 Es : 0x23
+0x04a Reserved2 : 0
+0x04c Cs : 8
+0x04e Reserved3 : 0
+0x050 Ss : 0x10
+0x052 Reserved4 : 0
+0x054 Ds : 0x23
+0x056 Reserved5 : 0
+0x058 Fs : 0x30
+0x05a Reserved6 : 0
+0x05c Gs : 0
+0x05e Reserved7 : 0
+0x060 LDT : 0
+0x062 Reserved8 : 0
+0x064 Flags : 0
+0x066 IoMapBase : 0x20ac
+0x068 IoMaps : [1] _KiIoAccessMap
+0x208c IntDirectionMap : [32] "???"
We see that EIP points to nt!KiTrap08 and that the Backlink value is 28, which is the selector of the TSS that was active before the double fault trap:
5: kd> .tss 28
eax=00000020 ebx=8bef5100 ecx=01404800 edx=8bee4aa8 esi=01404400 edi=00000000
eip=80882e4b esp=b044e000 ebp=b044e034 iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
nt!_SEH_prolog+0x1b:
80882e4b push esi
5: kd> k 100
ChildEBP RetAddr
b044e034 f7b840ac nt!_SEH_prolog+0x1b
b044e054 f7b846e6 Ntfs!NtfsMapStream+0x4b
b044e0c8 f7b84045 Ntfs!NtfsReadMftRecord+0x86
b044e100 f7b840f4 Ntfs!NtfsReadFileRecord+0x7a
b044e138 f7b7cdb5 Ntfs!NtfsLookupInFileRecord+0x37
b044e210 f7b6efef Ntfs!NtfsWriteFileSizes+0x76
b044e260 f7b6eead Ntfs!NtfsFlushAndPurgeScb+0xd4
b044e464 f7b7e302 Ntfs!NtfsCommonCleanup+0x1ca8
b044e5d4 8081dce5 Ntfs!NtfsFsdCleanup+0xcf
b044e5e8 f70fac53 nt!IofCallDriver+0x45
b044e610 8081dce5 fltMgr!FltpDispatch+0x6f
b044e624 f420576a nt!IofCallDriver+0x45
b044e634 f4202621 component2!DispatchEx+0xa4
b044e640 8081dce5 component2!Dispatch+0x53
b044e654 f4e998c7 nt!IofCallDriver+0x45
b044e67c f4e9997c component!PassThrough+0xbb
b044e688 8081dce5 component!Dispatch+0x78
b044e69c f41e72ff nt!IofCallDriver+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
b044e6c0 f41e71ed ofant+0xc2ff
00000000 00000000 ofant+0xc1ed
This is what !analyze -v does for this dump:
STACK_COMMAND: .tss 0x28 ; kb
In our case the fault occurs on an NTFS code path: nt!_SEH_prolog, the helper that sets up an SEH frame, causes the double fault when it tries to save registers on the stack. Let’s look at the stack trace and the crash point. We see that ESP points to the beginning of the valid stack page, but the push instruction decrements ESP before accessing memory, and the previous page is clearly invalid:
TSS: 00000028 -- (.tss 28)
eax=00000020 ebx=8bef5100 ecx=01404800 edx=8bee4aa8 esi=01404400 edi=00000000
eip=80882e4b esp=b044e000 ebp=b044e034 iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
nt!_SEH_prolog+0x1b:
80882e4b 56 push esi
5: kd> dd b044e000-4
b044dffc ???????? 8bef5100 00000000 00000000
b044e00c 00000000 00000000 00000000 00000000
b044e01c 00000000 00000000 b044e0b8 80880c80
b044e02c 808b6426 80801300 b044e054 f7b840ac
b044e03c 8bece5e0 b044e064 00000400 00000001
b044e04c b044e134 b044e164 b044e0c8 f7b846e6
b044e05c b044e480 8bee4aa8 01404400 00000000
b044e06c 00000400 b044e134 b044e164 e143db08
5: kd> !pte b044e000-4
VA b044dffc
PDE at 00000000C0602C10 PTE at 00000000C0582268
contains 000000010AA3C863 contains 0000000000000000
pfn 10aa3c ---DA--KWEV
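The address arithmetic behind this can be sketched in a few lines of Python. This is a minimal illustration that assumes 4 KB pages and a 4-byte push; the function names are invented for this example.

```python
PAGE_SIZE = 0x1000  # assumption: standard 4 KB x86 page

def push_target(esp, size=4):
    """Address written by a push: ESP is decremented first, then memory is written."""
    return (esp - size) & 0xFFFFFFFF

def page_base(address):
    """Start address of the page that contains the given address."""
    return address & ~(PAGE_SIZE - 1)

esp = 0xB044E000                 # ESP from the .tss 28 output
target = push_target(esp)        # where "push esi" tries to write

print(f"push writes to {target:#010x}")
print(f"ESP page {page_base(esp):#010x}, target page {page_base(target):#010x}")
```

The target address is the same b044dffc that dd displays as ???????? and for which !pte reports an empty PTE.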
WinDbg was unable to get all stack frames, and we don’t see any large frame sizes (the “Memory” column below):
5: kd> knf 100
*** Stack trace for last set context - .thread/.cxr resets it
# Memory ChildEBP RetAddr
00 b044e034 f7b840ac nt!_SEH_prolog+0x1b
01 20 b044e054 f7b846e6 Ntfs!NtfsMapStream+0x4b
02 74 b044e0c8 f7b84045 Ntfs!NtfsReadMftRecord+0x86
03 38 b044e100 f7b840f4 Ntfs!NtfsReadFileRecord+0x7a
04 38 b044e138 f7b7cdb5 Ntfs!NtfsLookupInFileRecord+0x37
05 d8 b044e210 f7b6efef Ntfs!NtfsWriteFileSizes+0x76
06 50 b044e260 f7b6eead Ntfs!NtfsFlushAndPurgeScb+0xd4
07 204 b044e464 f7b7e302 Ntfs!NtfsCommonCleanup+0x1ca8
08 170 b044e5d4 8081dce5 Ntfs!NtfsFsdCleanup+0xcf
09 14 b044e5e8 f70fac53 nt!IofCallDriver+0x45
0a 28 b044e610 8081dce5 fltMgr!FltpDispatch+0x6f
0b 14 b044e624 f420576a nt!IofCallDriver+0x45
0c 10 b044e634 f4202621 component2!DispatchEx+0xa4
0d c b044e640 8081dce5 component2!Dispatch+0x53
0e 14 b044e654 f4e998c7 nt!IofCallDriver+0x45
0f 28 b044e67c f4e9997c component!PassThrough+0xbb
10 c b044e688 8081dce5 component!Dispatch+0x78
11 14 b044e69c f41e72ff nt!IofCallDriver+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
12 24 b044e6c0 f41e71ed ofant+0xc2ff
13 00000000 00000000 ofant+0xc1ed
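The “Memory” column is simply the distance between consecutive ChildEBP values, so the same check can be reproduced outside the debugger. The Python sketch below uses the first nine frames from the listing above; the 0x400 threshold for a “big” frame is an arbitrary value chosen for this example and not a WinDbg setting.

```python
# ChildEBP values of frames 00..08 from the knf output above.
child_ebps = [
    0xB044E034, 0xB044E054, 0xB044E0C8, 0xB044E100, 0xB044E138,
    0xB044E210, 0xB044E260, 0xB044E464, 0xB044E5D4,
]

def frame_sizes(ebps):
    """Distance between each frame pointer and the next one up the stack."""
    return [later - earlier for earlier, later in zip(ebps, ebps[1:])]

BIG_FRAME = 0x400  # hypothetical threshold, chosen for illustration only

sizes = frame_sizes(child_ebps)
for index, size in enumerate(sizes, start=1):
    note = "  <-- large" if size >= BIG_FRAME else ""
    print(f"{index:02x} {size:>4x}{note}")

print(f"largest frame: {max(sizes):#x} bytes")
```

With these values no frame reaches the threshold, which matches the observation that the visible part of the stack contains no unusually large frames.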
To see all components involved we need to dump the raw stack data (12 KB is 0x3000 bytes). There we can also see traces of software exceptions that were processed and get partial stack traces for them. Some caution is required because these stack traces might be incomplete and misleading due to overwritten stack data.
5: kd> dds b044e000 b044e000+3000
…
…
…
…
b044ebc4 b044ec74
b044ebc8 b044ec50
b044ebcc f41f9458 ofant+0x1e458
b044ebd0 b044f140
b044ebd4 b044ef44
b044ebd8 b044f138
b044ebdc 80877290 nt!RtlDispatchException+0x8c
b044ebe0 b044ef44
b044ebe4 b044f138
b044ebe8 b044ec74
b044ebec b044ec50
b044ebf0 f41f9458 ofant+0x1e458
b044ebf4 8a7668c0
b044ebf8 e16c2e80
b044ebfc 00000000
b044ec00 00000000
b044ec04 00000002
b044ec08 01000000
b044ec0c 00000000
b044ec10 00000000
...
...
...
b044ec60 00000000
b044ec64 b044ef94
b044ec68 8088e13f nt!RtlRaiseStatus+0x47
b044ec6c b044ef44
b044ec70 b044ec74
b044ec74 00010007
…
…
…
b0450fe8 00000000
b0450fec 00000000
b0450ff0 00000000
b0450ff4 00000000
b0450ff8 00000000
b0450ffc 00000000
b0451000 ????????
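The range passed to dds follows directly from the 12 KB stack size. A small Python sketch of that arithmetic is shown here; it assumes the stack starts at the page boundary where ESP was found, and the in_stack helper is invented for this example.

```python
KERNEL_STACK_SIZE = 12 * 1024          # 12 KB, as stated in the text

stack_limit = 0xB044E000               # lowest valid address (ESP at the fault)
stack_base = stack_limit + KERNEL_STACK_SIZE

def in_stack(address):
    """True if the address lies inside the thread's kernel stack range."""
    return stack_limit <= address < stack_base

print(f"size  {KERNEL_STACK_SIZE:#x}")
print(f"range {stack_limit:#010x} .. {stack_base:#010x}")
print(in_stack(0xB044EF44), in_stack(0xB0451000))
```

The computed upper bound is the first address that dds shows as ????????, and values such as b044ef44 found in the dump fall inside the range, so they can be treated as pointers into the same stack.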
5: kd> .exr b044ef44
ExceptionAddress: f41dde6d (ofant+0x00002e6d)
ExceptionCode: c0000043
ExceptionFlags: 00000001
NumberParameters: 0
5: kd> .cxr b044ec74
eax=c0000043 ebx=00000000 ecx=89fe1bc0 edx=b044f084 esi=e16c2e80 edi=8a7668c0
eip=f41dde6d esp=b044efa0 ebp=b044f010 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246
ofant+0x2e6d:
f41dde6d e92f010000 jmp ofant+0x2fa1 (f41ddfa1)
5: kd> knf
*** Stack trace for last set context - .thread/.cxr resets it
# Memory ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
00 b044f010 f41ddce6 ofant+0x2e6d
01 b0 b044f0c0 f41dd930 ofant+0x2ce6
02 38 b044f0f8 f41e88eb ofant+0x2930
03 2c b044f124 f6598eba ofant+0xd8eb
04 24 b044f148 f41dcd40 SYMEVENT!SYMEvent_AllocVMData+0x84da
05 18 b044f160 8081dce5 ofant+0x1d40
06 14 b044f174 f6596741 nt!IofCallDriver+0x45
07 28 b044f19c f659dd70 SYMEVENT!SYMEvent_AllocVMData+0x5d61
08 1c b044f1b8 f65967b9 SYMEVENT!EventObjectCreate+0xa60
09 40 b044f1f8 8081dce5 SYMEVENT!SYMEvent_AllocVMData+0x5dd9
0a 14 b044f20c 808f8255 nt!IofCallDriver+0x45
0b e8 b044f2f4 80936af5 nt!IopParseDevice+0xa35
0c 80 b044f374 80932de6 nt!ObpLookupObjectName+0x5a9
0d 54 b044f3c8 808ea211 nt!ObOpenObjectByName+0xea
0e 7c b044f444 808eb4ab nt!IopCreateFile+0x447
0f 5c b044f4a0 808edf2a nt!IoCreateFile+0xa3
10 40 b044f4e0 80888c6c nt!NtCreateFile+0x30
11 0 b044f4e0 8082e105 nt!KiFastCallEntry+0xfc
12 a4 b044f584 f657f20d nt!ZwCreateFile+0x11
13 54 b044f5d8 f65570f6 NAVAP+0x2e20d
Therefore, the following components found on the raw stack look suspicious:
ofant.sys, SYMEVENT.SYS and NAVAP.sys.
We should check their timestamps using the lmv command and contact their vendors for any existing updates. A workaround would be to remove those products. The rest are Microsoft modules plus the drivers component.sys and component2.sys.
The functions of the latter two do not use significant amounts of stack for local variables.
An OSR NT Insider article provides another example:
http://www.osronline.com/article.cfm?article=254
The following Citrix article provides an example of stack overflow in ICA protocol stack:
http://support.citrix.com/article/CTX106209
- Dmitry Vostokov @ DumpAnalysis.org -
February 11th, 2008 at 2:43 pm
Here is another example from NT Debugging blog:
http://blogs.msdn.com/ntdebugging/archive/2008/02/01/kernel-stack-overflows.aspx
May 12th, 2008 at 5:56 pm
Dmitry
This dump file gives clues as to the trap and .tss selector. What do these clues mean:
1: kd> !analyze -v
UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it’s a trap of a kind
that the kernel isn’t allowed to have/catch (bound trap) or that
is always instant death (double fault). The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
use .trap on that value
Else
.trap on the appropriate frame will show where the trap was taken
(on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: f7727fe0
Arg3: 00000000
Arg4: 00000000
Debugging Details:
------------------
WARNING: Process directory table base AFFB7740 doesn’t match CR3 00545000
Unable to get PEB pointer
WARNING: Process directory table base AFFB7740 doesn’t match CR3 00545000
Unable to get PEB pointer
BUGCHECK_STR: 0x7f_8
TSS: 00000028 -- (.tss 0x28)
eax=00000000 ebx=f78dd100 ecx=a53a5b40 edx=007ffff8 esi=80000000 edi=c0603018
eip=8085e1d0 esp=f78dcfb8 ebp=f78dd008 iopl=0 nv up ei ng nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010286
nt!MmAccessFault+0x8:
8085e1d0 and dword ptr [ebp-1Ch],0 ss:0010:f78dcfec=00000000
Resetting default scope
DEFAULT_BUCKET_ID: DRIVER_FAULT
PROCESS_NAME: drwtsn32.exe
CURRENT_IRQL: 2
TRAP_FRAME: f78dd308 -- (.trap 0xfffffffff78dd308)
ErrCode = 00000000
eax=40000000 ebx=c0400000 ecx=c0603018 edx=007ffff8 esi=80000000 edi=00000001
eip=8084d509 esp=f78dd37c ebp=f78dd3e8 iopl=0 nv up ei pl nz na po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010203
nt!MiCheckPdeForPagedPool+0x73:
8084d509 mov eax,dword ptr [ecx] ds:0023:c0603018=00549063
Resetting default scope
LAST_CONTROL_TRANSFER: from 8088c798 to 8085e1d0
STACK_TEXT:
f78dd008 8088c798 00000000 c0603018 00000000 nt!MmAccessFault+0x8
(the dump file continues)
June 24th, 2008 at 3:44 am
According to the result of: dds b044e000 b044e000+3000
May I know why “b044ef44” and “b044ec74” were chosen for inspection? Do we need to choose the values that come after “RtlRaiseStatus+0x47”? What if I can’t find “RtlRaiseStatus+0x47”?
Thanks!
June 24th, 2008 at 7:39 am
I chose b044e000 because I wanted to inspect the raw stack data from the top of the stack (the ESP value shown above). By choosing other ranges we might miss something.
March 7th, 2010 at 4:53 am
I have the same question as Blue Fish had (June 24th, 2008 at 3:44 am):
May I know why “b044ef44” and “b044ec74” were chosen for inspection?
March 7th, 2010 at 11:42 pm
Certain functions like RtlDispatchException, RtlRaiseStatus and others have function parameters pointing to an exception record and context. See, for example,
http://source.winehq.org/source/dlls/kernel32/except.c#L84
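As a rough illustration of how this applies to the dds output in the post, the Python sketch below looks for a return address inside RtlRaiseStatus and takes the next two stack values as candidates for .exr and .cxr. It is a heuristic under an assumption, namely that the two values stored right after the return address are the exception record and context arguments of the call; the function names are made up for this example, and the candidates still need to be validated in the debugger.

```python
import re

# One dds line: address, value, optional symbol.
DDS_LINE = re.compile(r"^([0-9a-f]{8})\s+([0-9a-f]{8})(?:\s+(\S+))?$")

def parse_dds(text):
    """Turn dds output into (address, value, symbol-or-None) tuples."""
    rows = []
    for line in text.splitlines():
        match = DDS_LINE.match(line.strip())
        if match:
            rows.append((int(match.group(1), 16), int(match.group(2), 16), match.group(3)))
    return rows

def values_after(rows, symbol_prefix, count=2):
    """Values stored right after the first return address matching the symbol."""
    for index, (_, _, symbol) in enumerate(rows):
        if symbol and symbol.startswith(symbol_prefix):
            return [value for _, value, _ in rows[index + 1:index + 1 + count]]
    return None

# Fragment of the raw stack shown in the post.
sample = """\
b044ec64 b044ef94
b044ec68 8088e13f nt!RtlRaiseStatus+0x47
b044ec6c b044ef44
b044ec70 b044ec74
b044ec74 00010007
"""

record, context = values_after(parse_dds(sample), "nt!RtlRaiseStatus")
print(f".exr {record:08x}")
print(f".cxr {context:08x}")
```

If RtlRaiseStatus is not on the stack, the same search can be tried with other symbols such as nt!RtlDispatchException, keeping in mind that the position of the record and context values relative to the return address may differ there.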