Crash Dump Analysis Patterns (Part 16a)

In this part I show an example of the Stack Overflow pattern in the x86 Windows kernel. When a stack overflow happens in kernel mode, we usually get bugcheck 7F with the first argument EXCEPTION_DOUBLE_FAULT (8):

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it’s a trap of a kind that the kernel isn’t allowed to have/catch (bound trap) or that is always instant death (double fault). The first number in the bugcheck params is the number of the trap (8 = double fault, etc). Consult an Intel x86 family manual to learn more about what these traps are. Here is a *portion* of those codes:
If kv shows a taskGate
  use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
  use .trap on that value
Else
  .trap on the appropriate frame will show where the trap was taken (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: f7747fe0
Arg3: 00000000
Arg4: 00000000

The kernel stack of a thread is limited to 12KB and is guarded by an invalid page. Therefore, when code touches an address on that guard page, the processor generates a page fault, tries to push the fault information onto the same exhausted stack and gets a second page fault. This is what “double fault” means. In this scenario the processor switches to another stack using the TSS (task state segment) task-switching mechanism, because the IDT entry for trap 8 contains not an interrupt handler address but a so-called TSS segment selector (a task gate). This selector refers to a TSS whose saved state includes a new kernel stack pointer. The difference between a normal IDT entry and the double fault entry can be seen by inspecting the IDT:

5: kd> !pcr 5
KPCR for Processor 5 at f7747000:
    Major 1 Minor 1
 NtTib.ExceptionList: b044e0b8
     NtTib.StackBase: 00000000
    NtTib.StackLimit: 00000000
  NtTib.SubSystemTib: f7747fe0
       NtTib.Version: 00ae1064
   NtTib.UserPointer: 00000020
       NtTib.SelfTib: 7ffdf000
             SelfPcr: f7747000
                Prcb: f7747120
                Irql: 00000000
                 IRR: 00000000
                 IDR: ffffffff
       InterruptMode: 00000000
                 IDT: f774d800
                 GDT: f774d400
                 TSS: f774a2e0
       CurrentThread: 8834c020
          NextThread: 00000000
          IdleThread: f774a090

5: kd> dt _KIDTENTRY f774d800
   +0x000 Offset           : 0x97e8
   +0x002 Selector         : 8
   +0x004 Access           : 0x8e00
   +0x006 ExtendedOffset   : 0x8088

5: kd> ln 0x808897e8
(808897e8)   nt!KiTrap00   |  (808898c0)   nt!Dr_kit1_a
Exact matches:
    nt!KiTrap00

5: kd> dt _KIDTENTRY f774d800+7*8
   +0x000 Offset           : 0xa880
   +0x002 Selector         : 8
   +0x004 Access           : 0x8e00
   +0x006 ExtendedOffset   : 0x8088

5: kd> ln 8088a880
(8088a880)   nt!KiTrap07   |  (8088ab72)   nt!KiTrap08
Exact matches:
    nt!KiTrap07

5: kd> dt _KIDTENTRY f774d800+8*8
   +0x000 Offset           : 0x1238
   +0x002 Selector         : 0x50
   +0x004 Access           : 0x8500
   +0x006 ExtendedOffset   : 0

5: kd> dt _KIDTENTRY f774d800+9*8
   +0x000 Offset           : 0xac94
   +0x002 Selector         : 8
   +0x004 Access           : 0x8e00
   +0x006 ExtendedOffset   : 0x8088

5: kd> ln 8088ac94
(8088ac94)   nt!KiTrap09   |  (8088ad10)   nt!Dr_kita_a
Exact matches:
    nt!KiTrap09

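The two kinds of IDT entries above can be told apart mechanically. Here is a minimal Python sketch (not a WinDbg extension) that decodes the _KIDTENTRY fields as dt prints them, assuming the standard x86 gate layout: the high byte of Access holds the present bit, the DPL and the gate type, and for an interrupt gate the handler address is ExtendedOffset:Offset. The helper names idt_entry_address and decode_kidtentry are hypothetical.

```python
# Gate types from the low 4 bits of the x86 IDT descriptor type byte.
GATE_TYPES = {
    0x5: "task gate",
    0x6: "16-bit interrupt gate",
    0x7: "16-bit trap gate",
    0xE: "32-bit interrupt gate",
    0xF: "32-bit trap gate",
}


def idt_entry_address(idt_base: int, vector: int) -> int:
    """Each x86 IDT entry is 8 bytes, hence the f774d800+N*8 addresses above."""
    return idt_base + vector * 8


def decode_kidtentry(offset: int, selector: int, access: int, extended_offset: int) -> dict:
    """Decode the four _KIDTENTRY fields as printed by 'dt _KIDTENTRY'."""
    type_byte = access >> 8  # 0x8e00 -> 0x8e, 0x8500 -> 0x85
    entry = {
        "present": bool(type_byte & 0x80),
        "dpl": (type_byte >> 5) & 0x3,
        "type": GATE_TYPES.get(type_byte & 0x0F, "reserved"),
        "selector": selector,
    }
    if entry["type"] == "task gate":
        # Offset fields are unused; the selector names a TSS descriptor in the GDT.
        entry["tss_selector"] = selector
    else:
        # Handler address = ExtendedOffset (high 16 bits) : Offset (low 16 bits).
        entry["handler"] = (extended_offset << 16) | offset
    return entry


if __name__ == "__main__":
    idt = 0xF774D800  # IDT base from !pcr
    samples = {
        0: (0x97E8, 0x8, 0x8E00, 0x8088),  # nt!KiTrap00
        7: (0xA880, 0x8, 0x8E00, 0x8088),  # nt!KiTrap07
        8: (0x1238, 0x50, 0x8500, 0x0),    # double fault task gate
        9: (0xAC94, 0x8, 0x8E00, 0x8088),  # nt!KiTrap09
    }
    for vector, fields in samples.items():
        print(hex(idt_entry_address(idt, vector)), decode_kidtentry(*fields))
```

For vector 8 the sketch reports a task gate with selector 0x50 and no handler address, which is exactly why ln cannot resolve that entry.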
If we switch to the TSS selector 50 explicitly, we see the nt!KiTrap08 function, which bugchecks the system and saves the dump by calling KeBugCheck2:

5: kd> .tss 50
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=00000000 edi=00000000
eip=8088ab72 esp=f774d3c0 ebp=00000000 iopl=0 nv up di pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000000
nt!KiTrap08:
8088ab72 fa              cli

5: kd> .asm no_code_bytes
Assembly options: no_code_bytes

5: kd> uf nt!KiTrap08
nt!KiTrap08:
8088ab72 cli
8088ab73 mov     eax,dword ptr fs:[00000040h]
8088ab79 mov     ecx,dword ptr fs:[124h]
8088ab80 mov     edi,dword ptr [ecx+38h]
8088ab83 mov     ecx,dword ptr [edi+18h]
8088ab86 mov     dword ptr [eax+1Ch],ecx
8088ab89 mov     cx,word ptr [edi+30h]
8088ab8d mov     word ptr [eax+66h],cx
8088ab91 mov     ecx,dword ptr [edi+20h]
8088ab94 test    ecx,ecx
8088ab96 je      nt!KiTrap08+0x2a (8088ab9c)

nt!KiTrap08+0x26:
8088ab98 mov     cx,48h

nt!KiTrap08+0x2a:
8088ab9c mov     word ptr [eax+60h],cx
8088aba0 mov     ecx,dword ptr fs:[3Ch]
8088aba7 lea     eax,[ecx+50h]
8088abaa mov     byte ptr [eax+5],89h
8088abae pushfd
8088abaf and     dword ptr [esp],0FFFFBFFFh
8088abb6 popfd
8088abb7 mov     eax,dword ptr fs:[0000003Ch]
8088abbd mov     ch,byte ptr [eax+57h]
8088abc0 mov     cl,byte ptr [eax+54h]
8088abc3 shl     ecx,10h
8088abc6 mov     cx,word ptr [eax+52h]
8088abca mov     eax,dword ptr fs:[00000040h]
8088abd0 mov     dword ptr fs:[40h],ecx

nt!KiTrap08+0x65:
8088abd7 push    0
8088abd9 push    0
8088abdb push    0
8088abdd push    eax
8088abde push    8
8088abe0 push    7Fh
8088abe2 call    nt!KeBugCheck2 (80826a92)
8088abe7 jmp     nt!KiTrap08+0x65 (8088abd7)
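
Among other things, the listing writes 89h into byte 5 of the GDT descriptor at offset 50h (present, DPL 0, type 9: an available 32-bit TSS, i.e. the busy bit cleared), clears the NT flag (bit 14, hence the mask 0FFFFBFFFh) in EFLAGS, and reassembles the base address of that descriptor from bytes 2-3, 4 and 7 to store it into the PCR TSS field at fs:[40h]. The pushes before the call are stdcall arguments in reverse order, so the previous fs:[40h] value in eax becomes the second bugcheck parameter. The Python sketch below only mirrors that arithmetic on the x86 descriptor format; the descriptor bytes are hypothetical (chosen so the base equals the TSS address !pcr reports), and treating Arg2 f7747fe0 as the value of eax is an assumption read off the listing.

```python
EFLAGS_NT = 1 << 14       # Nested Task flag; 0FFFFBFFFh == ~EFLAGS_NT
TSS_AVAILABLE_32 = 0x89   # P=1, DPL=0, type 9 (available 32-bit TSS)
TSS_BUSY_32 = 0x8B        # P=1, DPL=0, type 0Bh (busy 32-bit TSS)


def gdt_offset(selector: int) -> int:
    """Bits 0-1 are RPL and bit 2 is TI, so the GDT byte offset is selector & ~7."""
    return selector & ~0x7


def descriptor_base(desc: bytes) -> int:
    """Base[15:0] in bytes 2-3, base[23:16] in byte 4, base[31:24] in byte 7."""
    return desc[2] | (desc[3] << 8) | (desc[4] << 16) | (desc[7] << 24)


def mark_tss_available(desc: bytes) -> bytes:
    """Equivalent of 'mov byte ptr [eax+5],89h'."""
    patched = bytearray(desc)
    patched[5] = TSS_AVAILABLE_32
    return bytes(patched)


def clear_nt(eflags: int) -> int:
    """Equivalent of 'and dword ptr [esp],0FFFFBFFFh' on the pushed EFLAGS."""
    return eflags & 0xFFFFBFFF


def bugcheck_args(pushes: list) -> list:
    """stdcall arguments are pushed right to left; reverse to get call order."""
    return list(reversed(pushes))


if __name__ == "__main__":
    gdt_base = 0xF774D400  # GDT base from !pcr
    # Hypothetical busy TSS descriptor (limit 67h) whose base is f774a2e0.
    desc = bytes([0x67, 0x00, 0xE0, 0xA2, 0x74, TSS_BUSY_32, 0x00, 0xF7])
    print(hex(gdt_base + gdt_offset(0x50)))   # 0xf774d450
    print(hex(descriptor_base(desc)))         # 0xf774a2e0
    print(hex(mark_tss_available(desc)[5]))   # 0x89
    print([hex(a) for a in bugcheck_args([0, 0, 0, 0xF7747FE0, 8, 0x7F])])
```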

We can inspect the TSS address shown in the !pcr command output above:

5: kd> dt _KTSS f774a2e0
   +0x000 Backlink         : 0x28
   +0x002 Reserved0        : 0
   +0x004 Esp0             : 0xf774d3c0
   +0x008 Ss0              : 0x10
   +0x00a Reserved1        : 0
   +0x00c NotUsed1         : [4] 0
   +0x01c CR3              : 0x646000
   +0x020 Eip              : 0x8088ab72
   +0x024 EFlags           : 0
   +0x028 Eax              : 0
   +0x02c Ecx              : 0
   +0x030 Edx              : 0
   +0x034 Ebx              : 0
   +0x038 Esp              : 0xf774d3c0
   +0x03c Ebp              : 0
   +0x040 Esi              : 0
   +0x044 Edi              : 0
   +0x048 Es               : 0x23
   +0x04a Reserved2        : 0
   +0x04c Cs               : 8
   +0x04e Reserved3        : 0
   +0x050 Ss               : 0x10
   +0x052 Reserved4        : 0
   +0x054 Ds               : 0x23
   +0x056 Reserved5        : 0
   +0x058 Fs               : 0x30
   +0x05a Reserved6        : 0
   +0x05c Gs               : 0
   +0x05e Reserved7        : 0
   +0x060 LDT              : 0
   +0x062 Reserved8        : 0
   +0x064 Flags            : 0
   +0x066 IoMapBase        : 0x20ac
   +0x068 IoMaps           : [1] _KiIoAccessMap
   +0x208c IntDirectionMap  : [32]  "???"

We see that EIP points to nt!KiTrap08 and that the Backlink value is 28, the selector of the TSS that was active before the double fault trap:

5: kd> .tss 28
eax=00000020 ebx=8bef5100 ecx=01404800 edx=8bee4aa8 esi=01404400 edi=00000000
eip=80882e4b esp=b044e000 ebp=b044e034 iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
nt!_SEH_prolog+0x1b:
80882e4b push    esi
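
The .tss command sets the register context from the values saved in a TSS, so it helps to see where those fields sit. Below is a Python ctypes mirror of the _KTSS fields that dt printed above, up to IoMapBase (IoMaps and IntDirectionMap are omitted). It is transcribed from the dt output, not taken from a Microsoft header, and confirms that the offsets 1Ch, 60h and 66h written by KiTrap08 are CR3, LDT and IoMapBase, and that the Backlink selector 28h refers to GDT entry 5. selector_index is a hypothetical helper.

```python
import ctypes

U16, U32 = ctypes.c_uint16, ctypes.c_uint32


class KTSS(ctypes.Structure):
    """Leading part of x86 _KTSS, transcribed from the 'dt _KTSS' output."""
    _fields_ = [
        ("Backlink", U16), ("Reserved0", U16),
        ("Esp0", U32), ("Ss0", U16), ("Reserved1", U16),
        ("NotUsed1", U32 * 4),
        ("CR3", U32), ("Eip", U32), ("EFlags", U32),
        ("Eax", U32), ("Ecx", U32), ("Edx", U32), ("Ebx", U32),
        ("Esp", U32), ("Ebp", U32), ("Esi", U32), ("Edi", U32),
        ("Es", U16), ("Reserved2", U16), ("Cs", U16), ("Reserved3", U16),
        ("Ss", U16), ("Reserved4", U16), ("Ds", U16), ("Reserved5", U16),
        ("Fs", U16), ("Reserved6", U16), ("Gs", U16), ("Reserved7", U16),
        ("LDT", U16), ("Reserved8", U16), ("Flags", U16), ("IoMapBase", U16),
        # IoMaps and IntDirectionMap follow at +0x68; omitted in this sketch.
    ]


def selector_index(selector: int) -> int:
    """GDT/LDT index of a selector (bits 3-15)."""
    return selector >> 3


if __name__ == "__main__":
    # A few values copied from the 'dt _KTSS f774a2e0' output.
    tss = KTSS(Backlink=0x28, Esp0=0xF774D3C0, CR3=0x646000,
               Eip=0x8088AB72, Esp=0xF774D3C0, Cs=0x8, Ss=0x10, Fs=0x30)
    for name in ("CR3", "LDT", "IoMapBase"):
        print(name, hex(getattr(KTSS, name).offset))
    print("Backlink", hex(tss.Backlink), "-> GDT index", selector_index(tss.Backlink))
```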

5: kd> k 100
ChildEBP RetAddr
b044e034 f7b840ac nt!_SEH_prolog+0x1b
b044e054 f7b846e6 Ntfs!NtfsMapStream+0x4b
b044e0c8 f7b84045 Ntfs!NtfsReadMftRecord+0x86
b044e100 f7b840f4 Ntfs!NtfsReadFileRecord+0x7a
b044e138 f7b7cdb5 Ntfs!NtfsLookupInFileRecord+0x37
b044e210 f7b6efef Ntfs!NtfsWriteFileSizes+0x76
b044e260 f7b6eead Ntfs!NtfsFlushAndPurgeScb+0xd4
b044e464 f7b7e302 Ntfs!NtfsCommonCleanup+0x1ca8
b044e5d4 8081dce5 Ntfs!NtfsFsdCleanup+0xcf
b044e5e8 f70fac53 nt!IofCallDriver+0x45
b044e610 8081dce5 fltMgr!FltpDispatch+0x6f
b044e624 f420576a nt!IofCallDriver+0x45
b044e634 f4202621 component2!DispatchEx+0xa4
b044e640 8081dce5 component2!Dispatch+0x53
b044e654 f4e998c7 nt!IofCallDriver+0x45
b044e67c f4e9997c component!PassThrough+0xbb
b044e688 8081dce5 component!Dispatch+0x78
b044e69c f41e72ff nt!IofCallDriver+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
b044e6c0 f41e71ed ofant+0xc2ff
00000000 00000000 ofant+0xc1ed

!analyze -v suggests the same approach for this dump:

STACK_COMMAND:  .tss 0x28 ; kb

In our case the double fault happens in nt!_SEH_prolog, the helper that NtfsMapStream calls to set up its structured exception handling frame, when it tries to save a register on the stack. Let’s look at the crash point more closely. ESP points to the beginning of the last valid stack page, but push decrements ESP before the memory access, and the previous page is clearly invalid:

TSS:  00000028 -- (.tss 28)
eax=00000020 ebx=8bef5100 ecx=01404800 edx=8bee4aa8 esi=01404400 edi=00000000
eip=80882e4b esp=b044e000 ebp=b044e034 iopl=0  nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
nt!_SEH_prolog+0x1b:
80882e4b 56              push    esi

5: kd> dd b044e000-4
b044dffc  ???????? 8bef5100 00000000 00000000
b044e00c  00000000 00000000 00000000 00000000
b044e01c  00000000 00000000 b044e0b8 80880c80
b044e02c  808b6426 80801300 b044e054 f7b840ac
b044e03c  8bece5e0 b044e064 00000400 00000001
b044e04c  b044e134 b044e164 b044e0c8 f7b846e6
b044e05c  b044e480 8bee4aa8 01404400 00000000
b044e06c  00000400 b044e134 b044e164 e143db08

5: kd> !pte b044e000-4
               VA b044dffc
PDE at 00000000C0602C10    PTE at 00000000C0582268
contains 000000010AA3C863  contains 0000000000000000
pfn 10aa3c ---DA--KWEV
 
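The arithmetic behind this can be checked with a small Python sketch. It assumes a 12KB (0x3000) stack whose upper end is b0451000, the first unreadable address in the raw stack dump further down, and 4KB pages; the helper names are illustrative only.

```python
PAGE_SIZE = 0x1000
KERNEL_STACK_SIZE = 0x3000  # 12KB x86 kernel thread stack


def page_base(addr: int) -> int:
    return addr & ~(PAGE_SIZE - 1)


def push_target(esp: int, operand_size: int = 4) -> int:
    """x86 PUSH decrements ESP first and only then writes to [ESP]."""
    return (esp - operand_size) & 0xFFFFFFFF


if __name__ == "__main__":
    stack_top = 0xB0451000                      # assumed from the dds output
    stack_limit = stack_top - KERNEL_STACK_SIZE
    esp = 0xB044E000                            # ESP at nt!_SEH_prolog+0x1b
    target = push_target(esp)
    print(hex(stack_limit))                     # 0xb044e000
    print(hex(target), hex(page_base(target)))  # 0xb044dffc 0xb044d000
    print("overflow" if target < stack_limit else "ok")  # overflow
```

The computed limit coincides with ESP, so the very next push lands on the guard page whose PTE is zero.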

WinDbg was unable to reconstruct all stack frames, and none of the frames it shows is large (see the “Memory” column below):

5: kd> knf 100
  *** Stack trace for last set context - .thread/.cxr resets it
 #   Memory  ChildEBP RetAddr
00           b044e034 f7b840ac nt!_SEH_prolog+0x1b
01        20 b044e054 f7b846e6 Ntfs!NtfsMapStream+0x4b
02        74 b044e0c8 f7b84045 Ntfs!NtfsReadMftRecord+0x86
03        38 b044e100 f7b840f4 Ntfs!NtfsReadFileRecord+0x7a
04        38 b044e138 f7b7cdb5 Ntfs!NtfsLookupInFileRecord+0x37
05        d8 b044e210 f7b6efef Ntfs!NtfsWriteFileSizes+0x76
06        50 b044e260 f7b6eead Ntfs!NtfsFlushAndPurgeScb+0xd4
07       204 b044e464 f7b7e302 Ntfs!NtfsCommonCleanup+0x1ca8
08       170 b044e5d4 8081dce5 Ntfs!NtfsFsdCleanup+0xcf
09        14 b044e5e8 f70fac53 nt!IofCallDriver+0x45
0a        28 b044e610 8081dce5 fltMgr!FltpDispatch+0x6f
0b        14 b044e624 f420576a nt!IofCallDriver+0x45
0c        10 b044e634 f4202621 component2!DispatchEx+0xa4
0d         c b044e640 8081dce5 component2!Dispatch+0x53
0e        14 b044e654 f4e998c7 nt!IofCallDriver+0x45
0f        28 b044e67c f4e9997c component!PassThrough+0xbb
10         c b044e688 8081dce5 component!Dispatch+0x78
11        14 b044e69c f41e72ff nt!IofCallDriver+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
12        24 b044e6c0 f41e71ed ofant+0xc2ff
13           00000000 00000000 ofant+0xc1ed
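
The “Memory” column is simply the distance between adjacent ChildEBP values. Summing it, as in the Python sketch below (ChildEBP values copied from the knf output; frame_sizes is a hypothetical helper), shows that the recovered frames take 0x68c bytes, a small fraction of the 0x3000-byte stack, so most of the stack usage is not explained by this trace.

```python
# ChildEBP values from the 'knf 100' output above, innermost frame first.
CHILD_EBPS = [
    0xB044E034, 0xB044E054, 0xB044E0C8, 0xB044E100, 0xB044E138,
    0xB044E210, 0xB044E260, 0xB044E464, 0xB044E5D4, 0xB044E5E8,
    0xB044E610, 0xB044E624, 0xB044E634, 0xB044E640, 0xB044E654,
    0xB044E67C, 0xB044E688, 0xB044E69C, 0xB044E6C0,
]
KERNEL_STACK_SIZE = 0x3000


def frame_sizes(child_ebps):
    """Memory column of knf: distance between adjacent ChildEBP values."""
    return [outer - inner for inner, outer in zip(child_ebps, child_ebps[1:])]


if __name__ == "__main__":
    sizes = frame_sizes(CHILD_EBPS)
    print([hex(s) for s in sizes])
    total = sum(sizes)
    print(hex(total), f"{total / KERNEL_STACK_SIZE:.0%} of the stack")
```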

To see all the components involved we need to dump the raw stack data (12KB is 0x3000 bytes). There we can also see some software exceptions that were processed and get partial stack traces for them. Some caution is required, because these stack traces might be incomplete and misleading due to overwritten stack data.

5: kd> dds b044e000 b044e000+3000
...
b044ebc4  b044ec74
b044ebc8  b044ec50
b044ebcc  f41f9458 ofant+0x1e458
b044ebd0  b044f140
b044ebd4  b044ef44
b044ebd8  b044f138
b044ebdc  80877290 nt!RtlDispatchException+0x8c
b044ebe0  b044ef44
b044ebe4  b044f138
b044ebe8  b044ec74
b044ebec  b044ec50
b044ebf0  f41f9458 ofant+0x1e458
b044ebf4  8a7668c0
b044ebf8  e16c2e80
b044ebfc  00000000
b044ec00  00000000
b044ec04  00000002
b044ec08  01000000
b044ec0c  00000000
b044ec10  00000000
...
...
...
b044ec60  00000000
b044ec64  b044ef94
b044ec68  8088e13f nt!RtlRaiseStatus+0x47
b044ec6c  b044ef44
b044ec70  b044ec74
b044ec74  00010007
...
b0450fe8  00000000
b0450fec  00000000
b0450ff0  00000000
b0450ff4  00000000
b0450ff8  00000000
b0450ffc  00000000
b0451000  ????????
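
The addresses passed to .exr and .cxr below come from the slots right after return addresses into nt!RtlDispatchException and nt!RtlRaiseStatus: in this dump they hold pointers to an exception record and a context record, and a valid x86 CONTEXT starts with ContextFlags, typically 00010007 (CONTEXT_FULL), as seen at b044ec74. The Python sketch below automates that heuristic over a hand-transcribed subset of the dds output; the four-slot window, the function name and the symbol table are assumptions for illustration, and any candidates found this way still need to be confirmed with .exr and .cxr.

```python
CONTEXT_FULL_X86 = 0x00010007  # CONTEXT_i386 | CONTROL | INTEGER | SEGMENTS
STACK_LO, STACK_HI = 0xB044E000, 0xB0451000
INTERESTING = ("RtlDispatchException", "RtlRaiseStatus")

# Subset of the 'dds b044e000 b044e000+3000' output: address -> value.
RAW = {
    0xB044EBDC: 0x80877290, 0xB044EBE0: 0xB044EF44, 0xB044EBE4: 0xB044F138,
    0xB044EBE8: 0xB044EC74, 0xB044EBEC: 0xB044EC50, 0xB044EBF0: 0xF41F9458,
    0xB044EC64: 0xB044EF94, 0xB044EC68: 0x8088E13F, 0xB044EC6C: 0xB044EF44,
    0xB044EC70: 0xB044EC74, 0xB044EC74: 0x00010007,
}
# Symbols as resolved by dds for the values above.
SYMBOLS = {
    0x80877290: "nt!RtlDispatchException+0x8c",
    0x8088E13F: "nt!RtlRaiseStatus+0x47",
    0xF41F9458: "ofant+0x1e458",
}


def find_exception_pairs(raw, symbols, window=4):
    """Return (slot, symbol, exception_record, context) candidates."""
    found = []
    for slot in sorted(raw):
        symbol = symbols.get(raw[slot], "")
        if not any(name in symbol for name in INTERESTING):
            continue
        # The next few stack slots are the arguments of the call that returns here.
        args = [raw.get(slot + 4 * i) for i in range(1, window + 1)]
        in_stack = [a for a in args if a is not None and STACK_LO <= a < STACK_HI]
        # A context candidate must point at a CONTEXT_FULL ContextFlags value.
        contexts = [a for a in in_stack if raw.get(a) == CONTEXT_FULL_X86]
        records = [a for a in in_stack if a not in contexts]
        if contexts and records:
            found.append((slot, symbol, records[0], contexts[0]))
    return found


if __name__ == "__main__":
    for slot, symbol, exr, cxr in find_exception_pairs(RAW, SYMBOLS):
        print(f"{slot:08x} {symbol}: .exr {exr:08x} ; .cxr {cxr:08x}")
```

Both return addresses lead to the same pair, b044ef44 and b044ec74, which is a useful consistency check before trusting the reconstructed context.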

5: kd> .exr b044ef44
ExceptionAddress: f41dde6d (ofant+0x00002e6d)
   ExceptionCode: c0000043
  ExceptionFlags: 00000001
NumberParameters: 0

5: kd> .cxr b044ec74
eax=c0000043 ebx=00000000 ecx=89fe1bc0 edx=b044f084 esi=e16c2e80 edi=8a7668c0
eip=f41dde6d esp=b044efa0 ebp=b044f010 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246
ofant+0x2e6d:
f41dde6d e92f010000      jmp     ofant+0x2fa1 (f41ddfa1)

5: kd> knf
  *** Stack trace for last set context - .thread/.cxr resets it
 #   Memory  ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
00           b044f010 f41ddce6 ofant+0x2e6d
01        b0 b044f0c0 f41dd930 ofant+0x2ce6
02        38 b044f0f8 f41e88eb ofant+0x2930
03        2c b044f124 f6598eba ofant+0xd8eb
04        24 b044f148 f41dcd40 SYMEVENT!SYMEvent_AllocVMData+0x84da
05        18 b044f160 8081dce5 ofant+0x1d40
06        14 b044f174 f6596741 nt!IofCallDriver+0x45
07        28 b044f19c f659dd70 SYMEVENT!SYMEvent_AllocVMData+0x5d61
08        1c b044f1b8 f65967b9 SYMEVENT!EventObjectCreate+0xa60
09        40 b044f1f8 8081dce5 SYMEVENT!SYMEvent_AllocVMData+0x5dd9
0a        14 b044f20c 808f8255 nt!IofCallDriver+0x45
0b        e8 b044f2f4 80936af5 nt!IopParseDevice+0xa35
0c        80 b044f374 80932de6 nt!ObpLookupObjectName+0x5a9
0d        54 b044f3c8 808ea211 nt!ObOpenObjectByName+0xea
0e        7c b044f444 808eb4ab nt!IopCreateFile+0x447
0f        5c b044f4a0 808edf2a nt!IoCreateFile+0xa3
10        40 b044f4e0 80888c6c nt!NtCreateFile+0x30
11         0 b044f4e0 8082e105 nt!KiFastCallEntry+0xfc
12        a4 b044f584 f657f20d nt!ZwCreateFile+0x11
13        54 b044f5d8 f65570f6 NAVAP+0x2e20d

Therefore, the following components found on the raw stack look suspicious:

ofant.sys, SYMEVENT.SYS and NAVAP.sys.

We should check their timestamps using the lmv command and contact their vendors for any available updates. A workaround would be to remove those products. The remaining modules are Microsoft modules and the drivers component.sys and component2.sys.

The functions of the latter two drivers don’t use much stack for local variables.

An OSR NT Insider article provides another example:

http://www.osronline.com/article.cfm?article=254

The following Citrix article provides an example of stack overflow in ICA protocol stack:

http://support.citrix.com/article/CTX106209 

- Dmitry Vostokov @ DumpAnalysis.org -

11 Responses to “Crash Dump Analysis Patterns (Part 16a)”

  1. Dmitry Vostokov Says:

    Here is another example from NT Debugging blog:

    http://blogs.msdn.com/ntdebugging/archive/2008/02/01/kernel-stack-overflows.aspx

  2. Bill Says:

    Dmitry

    This dump file gives clues as to the trap and .tss selector. What do these clues mean:

    1: kd> !analyze -v

    UNEXPECTED_KERNEL_MODE_TRAP (7f)
    This means a trap occurred in kernel mode, and it’s a trap of a kind
    that the kernel isn’t allowed to have/catch (bound trap) or that
    is always instant death (double fault). The first number in the
    bugcheck params is the number of the trap (8 = double fault, etc)
    Consult an Intel x86 family manual to learn more about what these
    traps are. Here is a *portion* of those codes:
    If kv shows a taskGate
    use .tss on the part before the colon, then kv.
    Else if kv shows a trapframe
    use .trap on that value
    Else
    .trap on the appropriate frame will show where the trap was taken
    (on x86, this will be the ebp that goes with the procedure KiTrap)
    Endif
    kb will then show the corrected stack.
    Arguments:
    Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
    Arg2: f7727fe0
    Arg3: 00000000
    Arg4: 00000000

    Debugging Details:
    ------------------

    WARNING: Process directory table base AFFB7740 doesn’t match CR3 00545000
    Unable to get PEB pointer

    WARNING: Process directory table base AFFB7740 doesn’t match CR3 00545000
    Unable to get PEB pointer

    BUGCHECK_STR: 0x7f_8

    TSS: 00000028 -- (.tss 0x28)
    eax=00000000 ebx=f78dd100 ecx=a53a5b40 edx=007ffff8 esi=80000000 edi=c0603018
    eip=8085e1d0 esp=f78dcfb8 ebp=f78dd008 iopl=0 nv up ei ng nz na pe nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010286
    nt!MmAccessFault+0x8:
    8085e1d0 and dword ptr [ebp-1Ch],0 ss:0010:f78dcfec=00000000
    Resetting default scope

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    PROCESS_NAME: drwtsn32.exe

    CURRENT_IRQL: 2

    TRAP_FRAME: f78dd308 -- (.trap 0xfffffffff78dd308)
    ErrCode = 00000000
    eax=40000000 ebx=c0400000 ecx=c0603018 edx=007ffff8 esi=80000000 edi=00000001
    eip=8084d509 esp=f78dd37c ebp=f78dd3e8 iopl=0 nv up ei pl nz na po cy
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010203
    nt!MiCheckPdeForPagedPool+0x73:
    8084d509 mov eax,dword ptr [ecx] ds:0023:c0603018=00549063
    Resetting default scope

    LAST_CONTROL_TRANSFER: from 8088c798 to 8085e1d0

    STACK_TEXT:
    f78dd008 8088c798 00000000 c0603018 00000000 nt!MmAccessFault+0x8

    (the dump file continues)

  3. Crash Dump Analysis » Blog Archive » Crash Dump Analysis Patterns (Part 16b) Says:

    […] just found that although I covered Stack Overflow in kernel mode I didn’t do this for user mode. In fact this is one of the simplest patterns to see in crash […]

  4. Blue Fish Says:

    According to the result of : dds b044e000 b044e000+3000
    May I know why you chose “b044ef44” & “b044ec74” to inspect? Do we need to choose the values after “RtlRaiseStatus+0x47”? What if I can’t find “RtlRaiseStatus+0x47”?

    Thanks!

  5. Dmitry Vostokov Says:

    I chose b044e000 because I wanted to inspect the raw stack data from the top of the stack (the ESP value shown above). By choosing other ranges we might miss something.

  6. Crash Dump Analysis » Blog Archive » Lateral damage, stack overflow and execution residue: pattern cooperation Says:

    […] bugcheck argument 1 shows that we have a double fault that most often results from kernel stack overflow. If we go back to processor 0 to inspect its TSS we don’t get meaningful results too (we […]

  7. Crash Dump Analysis » Blog Archive » Variable Kernel Stack in Vista and W2K8 Says:

    […] Stack Overflow Pattern (kernel mode) […]

  8. Finy Says:

    I have the same question as Blue Fish had (June 24th, 2008 at 3:44 am):

    May I know why you chose “b044ef44” & “b044ec74” to inspect?

  9. Dmitry Vostokov Says:

    Certain functions like RtlDispatchException, RtlRaiseStatus and others have function parameters pointing to an exception record and context. See, for example,

    http://source.winehq.org/source/dlls/kernel32/except.c#L84

  10. Crash Dump Analysis » Blog Archive » Icons for Memory Dump Analysis Patterns (Part 31) Says:

    […] we introduce an icon for Stack Overflow (kernel mode) […]

  11. Crash Dump Analysis » Blog Archive » Old Mental Dumps from June 21st Says:

    […] • Crash Dump Analysis Patterns (Part 16a) - Stack overflow in kernel. Generated some comments and can also be see in the following pattern case study: Lateral damage, stack overflow and execution residue […]
