Archive for the ‘Hardware’ Category

Interrupts and exceptions explained (Part 3)

Tuesday, May 15th, 2007

In Part 1 discussed interrupt processing that happens when an x86 processor executes in privileged protected mode (ring 0). It pushes interrupt frame shown in the following pseudo-code:

push EFLAGS
push CS
push EIP
push ErrorCode
EIP := IDT[VectorNumber].ExtendedOffset<<16 +
   IDT[VectorNumber].Offset

Please note that this is an interrupt frame that is created by CPU and not a trap frame created by a software interrupt handler to save CPU state (_KTRAP_FRAME).

If an x86 processor executes in user mode (ring 3) and an interrupt happens then the stack switch occurs before the processor saves user mode stack pointer SS:ESP and pushes the rest of the interrupt frame. Pushing both SS:RSP always happens on x64 processor regardless of the current execution mode, kernel or user. Therefore the following x86 pseudo-code shows how interrupt frame is pushed on the current stack (to be precise, on the kernel space stack if the interrupt happened in user mode):

push SS
push ESP
push EFLAGS
push CS
push EIP
push ErrorCode
EIP := IDT[VectorNumber].ExtendedOffset<<16 +
   IDT[VectorNumber].Offset

Usually CS is 0×1b and SS is 0×23 for x86 Windows flat memory model so we can easily identify this pattern on raw stack data.

Why should we care about an interrupt frame? This is because in complete full memory dumps we can see exceptions that happened in user space and were being processed at the time the dump was saved.

Let’s look at some example:

PROCESS 89a94800 SessionId: 1 Cid: 1050 Peb: 7ffd7000 ParentCid: 08a4
DirBase: 390f5000 ObjectTable: e36ee0b8 HandleCount: 168.
Image: processA.exe
VadRoot 8981d0a0 Vads 309 Clone 0 Private 222555. Modified 10838. Locked 0.
DeviceMap e37957e0
Token e395b8f8
ElapsedTime 07:44:38.505
UserTime 00:54:52.906
KernelTime 00:00:58.109
QuotaPoolUsage[PagedPool] 550152
QuotaPoolUsage[NonPagedPool] 14200
Working Set Sizes (now,min,max) (213200, 50, 345) (852800KB, 200KB, 1380KB)
PeakWorkingSetSize 227093
VirtualSize 1032 Mb
PeakVirtualSize 1032 Mb
PageFaultCount 232357
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 233170
DebugPort 899b6a40

We see that the process has a DebugPort and the presence of it usually shows that some exception happened. Therefore if you dump all processes by entering !process 0 1 command you can search for any unhandled exceptions.

Indeed if we switch to this process (you can also use !process 89a94800 ff command for dumps coming from XP and higher systems) we see KiDispatchException on one of the processA’s threads:

0: kd> .process 89a94800
0: kd> .reload
0: kd> !process 89a94800
...
...
...
THREAD 89a93020 Cid 1050.1054 Teb: 7ffdf000 Win32Thread: bc1da760 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
f44dc3a8 SynchronizationEvent
Not impersonating
DeviceMap e37957e0
Owning Process 89a94800 Image: processA.exe
Wait Start TickCount 4244146 Ticks: 1232980 (0:05:21:05.312)
Context Switch Count 1139234 LargeStack
UserTime 00:54:51.0531
KernelTime 00:00:53.0937
Win32 Start Address processA!WinMainCRTStartup (0x00c534c8)
Start Address kernel32!BaseProcessStartThunk (0x77e617f8)
Stack Init f44dcbd0 Current f44dc2ec Base f44dd000 Limit f44d7000 Call f44dcbd8
Priority 12 BasePriority 8 PriorityDecrement 2
ChildEBP RetAddr
f44dc304 8083d5b1 nt!KiSwapContext+0x26
f44dc330 8083df9e nt!KiSwapThread+0x2e5
f44dc378 809c3cff nt!KeWaitForSingleObject+0x346
f44dc458 809c4f09 nt!DbgkpQueueMessage+0x178
f44dc47c 80977ad9 nt!DbgkpSendApiMessage+0x45
f44dc508 8081a94f nt!DbgkForwardException+0x90
f44dc8c4 808346b4 nt!KiDispatchException+0×1ea
f44dc92c 80834650 nt!CommonDispatchException+0×4a
f44dc9b8 80a801ae nt!Kei386EoiHelper+0×16e
0012f968 0046915d hal!HalpDispatchSoftwareInterrupt+0×5e
0012f998 0047cb72 processA!CalculateClientSizeFromPoint+0×5f
0012f9bc 0047cc1d processA!CalculateFromPoint+0×30
0012fa64 0047de83 processA!DrawUsingMemDC+0×1b9
0012fac0 0099fb43 processA!OnDraw+0×13
0012fb5c 7c17332d processA!OnPaint+0×56
0012fbe8 7c16e0b0 MFC71!CWnd::OnWndMsg+0×340
0012fc08 00c6253a MFC71!CWnd::WindowProc+0×22
0012fc24 0096cf9d processA!WindowProc+0×38
0012fcb8 7c16e1b8 MFC71!AfxCallWndProc+0×91
0012fcd8 7c16e1f6 MFC71!AfxWndProc+0×46
0012fd04 7739b6e3 MFC71!AfxWndProcBase+0×39
0012fd30 7739b874 USER32!InternalCallWinProc+0×28
0012fda8 7739c8b8 USER32!UserCallWinProcCheckWow+0×151
0012fe04 7739c9c6 USER32!DispatchClientMessage+0xd9
0012fe2c 7c828536 USER32!__fnDWORD+0×24
0012fe2c 80832dee ntdll!KiUserCallbackDispatcher+0×2e
f44dcbf0 8092d605 nt!KiCallUserMode+0×4
f44dcc48 bf8a26d3 nt!KeUserModeCallback+0×8f
f44dcccc bf89e985 win32k!SfnDWORD+0xb4
f44dcd0c bf89eb27 win32k!xxxDispatchMessage+0×223
f44dcd58 80833bdf win32k!NtUserDispatchMessage+0×4c
f44dcd58 7c8285ec nt!KiFastCallEntry+0xfc
0012fe2c 7c828536 ntdll!KiFastSystemCallRet
0012fe58 7739c57b ntdll!KiUserCallbackDispatcher+0×2e
0012fea8 773a16e5 USER32!NtUserDispatchMessage+0xc
0012feb8 7c169076 USER32!DispatchMessageA+0xf
0012fec8 7c16913e MFC71!AfxInternalPumpMessage+0×3e
0012fee4 0041cb0b MFC71!CWinThread::Run+0×54
0012ff08 7c172fc5 processA!CMain::Run+0×3b
0012ff18 00c5364d MFC71!AfxWinMain+0×68
0012ffc0 77e6f23b processA!WinMainCRTStartup+0×185
0012fff0 00000000 kernel32!BaseProcessStart+0×23

You might think that exception happened in CalculateClientSizeFromPoint function. However there is no nt!KiTrapXXX call and hal!HalpDispatchSoftwareInterrupt has user space return address and this looks suspicious. So we need to look at raw stack data and find our interrupt frame. We look for KiDispatchException, then for KiTrap substring and finally for 0000001b. If 0000001b and 00000023 are separated by 2 double words then we have found out interrupt frame:

0: kd> .thread 89a93020
Implicit thread is now 89a93020
0: kd> dds esp esp+1000
...
...
...
f44dc2f8 f44dc330
f44dc2fc 89a93098
f44dc300 ffdff120
f44dc304 89a93020
f44dc308 8083d5b1 nt!KiSwapThread+0x2e5
f44dc30c 89a93020
f44dc310 89a930c8
f44dc314 00000000
...
...
...
f44dc4e8 f44dcc38
f44dc4ec 8083a8cc nt!_except_handler3
f44dc4f0 80870868 nt!`string'+0xa4
f44dc4f4 ffffffff
f44dc4f8 80998bfd nt!Ki386CheckDivideByZeroTrap+0x273
f44dc4fc 8083484f nt!KiTrap00+0x88
f44dc500 00000001
f44dc504 0000bb40
f44dc508 f44dc8c4
f44dc50c 8081a94f nt!KiDispatchException+0×1ea
f44dc510 f44dc8e0
f44dc514 00000001
f44dc518 00000000
f44dc51c 00469583 processA!LPtoDP+0×19
f44dc520 16b748f0
f44dc524 00469583 processA!LPtoDP+0×19
f44dc528 00000000
f44dc52c 00000000



f44dc8c0 ffffffff
f44dc8c4 f44dc934
f44dc8c8 808346b4 nt!CommonDispatchException+0×4a
f44dc8cc f44dc8e0
f44dc8d0 00000000
f44dc8d4 f44dc934
f44dc8d8 00000001
f44dc8dc 00000001
f44dc8e0 c0000094
f44dc8e4 00000000
f44dc8e8 00000000
f44dc8ec 00469583 processA!LPtoDP+0×19
f44dc8f0 00000000
f44dc8f4 808a3988 nt!KiAbiosPresent+0×4
f44dc8f8 ffffffff
f44dc8fc 0000a6f2
f44dc900 00469585 processA!LPtoDP+0×1b
f44dc904 00000004
f44dc908 00000000
f44dc90c f9000001
f44dc910 f44dc8dc
f44dc914 ffffffff
f44dc918 f44dcc38
f44dc91c 8083a8cc nt!_except_handler3
f44dc920 80870868 nt!`string’+0xa4
f44dc924 ffffffff
f44dc928 80998bfd nt!Ki386CheckDivideByZeroTrap+0×273
f44dc92c 8083484f nt!KiTrap00+0×88
f44dc930 80834650 nt!Kei386EoiHelper+0×16e
f44dc934 0012f968
f44dc938 00469583 processA!LPtoDP+0×19
f44dc93c badb0d00
f44dc940 00000000
f44dc944 ffffffff
f44dc948 00007fff
f44dc94c 00000000
f44dc950 fffff800
f44dc954 ffffffff
f44dc958 00007fff
f44dc95c 00000000
f44dc960 00000000
f44dc964 80a80000 hal!HalpInitIrqlAuditFlag+0×4e
f44dc968 00000023
f44dc96c 00000023
f44dc970 00000000
f44dc974 00000000
f44dc978 00005334
f44dc97c 00000001
f44dc980 f44dcc38
f44dc984 0000003b
f44dc988 16b748f0
f44dc98c 16b748f0
f44dc990 0012f9fc
f44dc994 0012f968
f44dc998 00000000 ; ErrorCode
f44dc99c 00469583 processA!LPtoDP+0×19 ; EIP
f44dc9a0 0000001b ; CS
f44dc9a4 00010246 ; EFLAGS
f44dc9a8 0012f934 ; ESP
f44dc9ac 00000023 ; SS
f44dc9b0 8982e7e0
f44dc9b4 00000000

Why did we skip the first KiTrap00? Because KiDispatchException is called after KiTrap00 so we should see it before KiTrap00 on raw stack. To see all these calls we can disassemble return addresses:

0: kd> .asm no_code_bytes
Assembly options: no_code_bytes
0: kd> ub nt!KiTrap00+0x88
nt!KiTrap00+0x74:
8083483b test byte ptr [ebp+6Ch],1
8083483f je   nt!KiTrap00+0x81 (80834848)
80834841 cmp  word ptr [ebp+6Ch],1Bh
80834846 jne  nt!KiTrap00+0x9e (80834865)
80834848 sti
80834849 push ebp
8083484a call nt!Ki386CheckDivideByZeroTrap (8099897d)
8083484f mov  ebx,dword ptr [ebp+68h]

nt!KiTrap00+0×88 is not equal to nt!KiTrap00+0×74 so we have OMAP code optimization case here and we have to disassemble raw addresses as seen on the raw stack fragment repeated here:

...
...
...
f44dc8c8 808346b4 nt!CommonDispatchException+0x4a
...
...
...
f44dc924 ffffffff
f44dc928 80998bfd nt!Ki386CheckDivideByZeroTrap+0x273
f44dc92c 8083484f nt!KiTrap00+0×88
f44dc930 80834650 nt!Kei386EoiHelper+0×16e
f44dc934 0012f968


0: kd> u 8083484f
nt!KiTrap00+0×88:
8083484f mov  ebx,dword ptr [ebp+68h]
80834852 jmp  nt!Kei386EoiHelper+0×167 (80834649)
80834857 sti
80834858 mov  ebx,dword ptr [ebp+68h]
8083485b mov  eax,0C0000094h
80834860 jmp  nt!Kei386EoiHelper+0×167 (80834649)
80834865 mov  ebx,dword ptr fs:[124h]
8083486c mov  ebx,dword ptr [ebx+38h]
0: kd> u 80834649
nt!Kei386EoiHelper+0×167:
80834649 xor  ecx,ecx
8083464b call nt!CommonDispatchException (8083466a)
80834650 xor  edx,edx ; nt!Kei386EoiHelper+0×16e
80834652 mov  ecx,1
80834657 call nt!CommonDispatchException (8083466a)
8083465c xor  edx,edx
8083465e mov  ecx,2
80834663 call nt!CommonDispatchException (8083466a)
0: kd> ub 808346b4
nt!CommonDispatchException+0×38:
808346a2 mov  eax,dword ptr [ebp+6Ch]
808346a5 and  eax,1
808346a8 push 1
808346aa push eax
808346ab push ebp
808346ac push 0
808346ae push ecx
808346af call nt!KiDispatchException (80852a53)

So we see that KiTrap00 calls CommonDispatchException which calls KiDispatchException. If we look at our found interrupt frame we see that EIP of the exception was 00469583 and ESP was 0012f934:

...
...
...
f44dc998 00000000 ; ErrorCode
f44dc99c 00469583 processA!LPtoDP+0×19 ; EIP
f44dc9a0 0000001b ; CS
f44dc9a4 00010246 ; EFLAGS
f44dc9a8 0012f934 ; ESP
f44dc9ac 00000023 ; SS


Now we try to reconstruct stack trace by putting the values of ESP and EIP:

0: kd> k L=0012f934 0012f934 00469583 ; EBP ESP EIP format
ChildEBP RetAddr
0012f930 00469a16 processA!LPtoDP+0x19
0012f934 00000000 processA!GetColumnWidth+0x45

Stack trace doesn’t look good, there is neither BaseProcessStart nor BaseThreadStart, perhaps because we specified ESP value twice instead of EBP and ESP. Let’s hope to find EBP value by dumping the memory around ESP:

0: kd> dds 0012f934-10 0012f934+100
0012f924 00000000
0012f928 0012f934 ; the same as ESP
0012f92c 0012f968 ; looks good to us
0012f930 00469572 processA!LPtoDP+0×8
0012f934 00469a16 processA!GetColumnWidth+0×45
0012f938 00005334



0012f964 00005334
0012f968 0012f998
0012f96c 0046915d processA!CalculateClientSizeFromPoint+0×5f
0012f970 00000000
0012f974 0012f9fc
0012f978 16b748f0
0012f97c 0012fa48
0012f980 00000000
0012f984 00000000
0012f988 000003a0
0012f98c 00000237
0012f990 00000014
0012f994 00000000
0012f998 0012f9bc
0012f99c 0047cb72 processA!CalculateFromPoint+0×30
0012f9a0 0012f9fc
0012f9a4 0012f9b4
0012f9a8 0012fa48


So finally we get our stack trace:

0: kd> k L=0012f968 0012f934 00469583 100
ChildEBP RetAddr
0012f930 00469a16 processA!LPtoDP+0x19
0012f968 0046915d processA!GetColumnWidth+0x45
0012f998 0047cb72 processA!CalculateClientSizeFromPoint+0x5f
0012f9bc 0047cc1d processA!CalculateFromPoint+0x30
0012fa64 0047de83 processA!DrawUsingMemDC+0x1b9
0012fac0 0099fb43 processA!OnDraw+0x13
0012fb5c 7c17332d processA!OnPaint+0x56
0012fbe8 7c16e0b0 MFC71!CWnd::OnWndMsg+0x340
0012fc08 00c6253a MFC71!CWnd::WindowProc+0x22
0012fc24 0096cf9d processA!WindowProc+0x38
0012fcb8 7c16e1b8 MFC71!AfxCallWndProc+0x91
0012fcd8 7c16e1f6 MFC71!AfxWndProc+0x46
0012fd04 7739b6e3 MFC71!AfxWndProcBase+0x39
0012fd30 7739b874 USER32!InternalCallWinProc+0x28
0012fda8 7739c8b8 USER32!UserCallWinProcCheckWow+0x151
0012fe04 7739c9c6 USER32!DispatchClientMessage+0xd9
0012fe2c 7c828536 USER32!__fnDWORD+0x24
0012fe2c 80832dee ntdll!KiUserCallbackDispatcher+0x2e
f44dcbf0 8092d605 nt!KiCallUserMode+0x4
f44dcc48 bf8a26d3 nt!KeUserModeCallback+0x8f
f44dcccc bf89e985 win32k!SfnDWORD+0xb4
f44dcd0c bf89eb27 win32k!xxxDispatchMessage+0x223
f44dcd58 80833bdf win32k!NtUserDispatchMessage+0x4c
f44dcd58 7c8285ec nt!KiFastCallEntry+0xfc
0012fe2c 7c828536 ntdll!KiFastSystemCallRet
0012fe58 7739c57b ntdll!KiUserCallbackDispatcher+0x2e
0012fea8 773a16e5 USER32!NtUserDispatchMessage+0xc
0012feb8 7c169076 USER32!DispatchMessageA+0xf
0012fec8 7c16913e MFC71!AfxInternalPumpMessage+0x3e
0012fee4 0041cb0b MFC71!CWinThread::Run+0x54
0012ff08 7c172fc5 processA!CMain::Run+0x3b
0012ff18 00c5364d MFC71!AfxWinMain+0x68
0012ffc0 77e6f23b processA!WinMainCRTStartup+0x185
0012fff0 00000000 kernel32!BaseProcessStart+0x23

- Dmitry Vostokov -

Interrupts and exceptions explained (Part 2)

Saturday, May 5th, 2007

As I promised in Part 1, I describe changes in 64-bit Windows. The size of IDTR is 10 bytes where 8 bytes hold 64-bit address of IDT. The size of IDT entry is 16 bytes and it holds the address of an interrupt procedure corresponding to an interrupt vector. However interrupt procedure names are different in x64 Windows, they do not follow the same pattern like KiTrapXX. 

The following UML class diagram describes the relationship and also shows what registers are saved. In native x64 mode SS and RSP are saved regardless of kernel or user mode.

Let’s dump all architecture-defined interrupt procedure names. This is a good exercise because we will use scripting. !pcr extension reports wrong IDT base so we use dt command:

kd> !pcr 0
KPCR for Processor 0 at fffff80001176000:
    Major 1 Minor 1
 NtTib.ExceptionList: fffff80000124000
     NtTib.StackBase: fffff80000125070
    NtTib.StackLimit: 0000000000000000
  NtTib.SubSystemTib: fffff80001176000
       NtTib.Version: 0000000001176180
   NtTib.UserPointer: fffff800011767f0
       NtTib.SelfTib: 000000007ef95000
             SelfPcr: 0000000000000000
                Prcb: fffff80001176180
                Irql: 0000000000000000
                 IRR: 0000000000000000
                 IDR: 0000000000000000
       InterruptMode: 0000000000000000
                 IDT: 0000000000000000
                 GDT: 0000000000000000
                 TSS: 0000000000000000
       CurrentThread: fffffadfe669f890
          NextThread: 0000000000000000
          IdleThread: fffff8000117a300
           DpcQueue:
kd> dt _KPCR fffff80001176000
nt!_KPCR
   +0×000 NtTib            : _NT_TIB
   +0×000 GdtBase          : 0xfffff800`00124000 _KGDTENTRY64
   +0×008 TssBase          : 0xfffff800`00125070 _KTSS64
   +0×010 PerfGlobalGroupMask : (null)
   +0×018 Self             : 0xfffff800`01176000 _KPCR
   +0×020 CurrentPrcb      : 0xfffff800`01176180 _KPRCB
   +0×028 LockArray        : 0xfffff800`011767f0 _KSPIN_LOCK_QUEUE
   +0×030 Used_Self        : 0×00000000`7ef95000
   +0×038 IdtBase          : 0xfffff800`00124070 _KIDTENTRY64
   +0×040 Unused           : [2] 0
   +0×050 Irql             : 0 ”
   +0×051 SecondLevelCacheAssociativity : 0×10 ”
   +0×052 ObsoleteNumber   : 0 ”
   +0×053 Fill0            : 0 ”
   +0×054 Unused0          : [3] 0
   +0×060 MajorVersion     : 1
   +0×062 MinorVersion     : 1
   +0×064 StallScaleFactor : 0×892
   +0×068 Unused1          : [3] (null)
   +0×080 KernelReserved   : [15] 0
   +0×0bc SecondLevelCacheSize : 0×100000
   +0×0c0 HalReserved      : [16] 0×82c5c880
   +0×100 Unused2          : 0
   +0×108 KdVersionBlock   : 0xfffff800`01174ca0
   +0×110 Unused3          : (null)
   +0×118 PcrAlign1        : [24] 0
   +0×180 Prcb             : _KPRCB

Next we dump the first entry of IDT array and glue together OffsetHigh, OffsetMiddle and OffsetLow fields to form the interrupt procedure address corresponding to the interrupt vector 0, divide by zero exception:

kd> dt _KIDTENTRY64 0xfffff800`00124070
nt!_KIDTENTRY64
   +0x000 OffsetLow        : 0xf240
   +0x002 Selector         : 0x10
   +0x004 IstIndex         : 0y000
   +0x004 Reserved0        : 0y00000 (0)
   +0x004 Type             : 0y01110 (0xe)
   +0x004 Dpl              : 0y00
   +0x004 Present          : 0y1
   +0×006 OffsetMiddle     : 0×103
   +0×008 OffsetHigh       : 0xfffff800
   +0×00c Reserved1        : 0
   +0×000 Alignment        : 0×1038e00`0010f240

kd> u 0xfffff8000103f240
nt!KiDivideErrorFault:
fffff800`0103f240 4883ec08        sub     rsp,8
fffff800`0103f244 55              push    rbp
fffff800`0103f245 4881ec58010000  sub     rsp,158h
fffff800`0103f24c 488dac2480000000 lea     rbp,[rsp+80h]
fffff800`0103f254 c645ab01        mov     byte ptr [rbp-55h],1
fffff800`0103f258 488945b0        mov     qword ptr [rbp-50h],rax
fffff800`0103f25c 48894db8        mov     qword ptr [rbp-48h],rcx
fffff800`0103f260 488955c0        mov     qword ptr [rbp-40h],rdx
kd> ln 0xfffff8000103f240
(fffff800`0103f240)   nt!KiDivideErrorFault   |
(fffff800`0103f300)   nt!KiDebugTrapOrFault
Exact matches:
    nt!KiDivideErrorFault = <no type information>

We see that the name of the procedure is KiDivideErrorFault and not KiTrap00. We can dump the second IDT entry manually by adding a 0×10 offset but in order to automate this I wrote the following WinDbg script to dump the first 20 vectors and get their interrupt procedure names:

r? $t0=(_KIDTENTRY64 *)0xfffff800`00124070; .for (r $t1=0; @$t1 <= 13; r? $t0=(_KIDTENTRY64 *)@$t0+1) { .printf “Interrupt vector %d (0x%x):\n”, @$t1, @$t1; ln @@c++(@$t0->OffsetHigh*0×100000000 + @$t0->OffsetMiddle*0×10000 + @$t0->OffsetLow); r $t1=$t1+1 }

Here is the same script but formatted:

r? $t0=(_KIDTENTRY64 *)0xfffff800`00124070;
.for (r $t1=0; @$t1 <= 13; r? $t0=(_KIDTENTRY64 *)@$t0+1)
{
    .printf "Interrupt vector %d (0x%x):\n", @$t1, @$t1;
    ln @@c++(@$t0->OffsetHigh*0x100000000 +
        @$t0->OffsetMiddle*0x10000 + @$t0->OffsetLow);
    r $t1=$t1+1
}

The output on my system is:

Interrupt vector 0 (0x0):
(fffff800`0103f240) nt!KiDivideErrorFault | (fffff800`0103f300) nt!KiDebugTrapOrFault
Exact matches:
  nt!KiDivideErrorFault = <no type information>
Interrupt vector 1 (0×1):
(fffff800`0103f300) nt!KiDebugTrapOrFault | (fffff800`0103f440) nt!KiNmiInterrupt
Exact matches:
  nt!KiDebugTrapOrFault = <no type information>
Interrupt vector 2 (0×2):
(fffff800`0103f440) nt!KiNmiInterrupt | (fffff800`0103f680) nt!KxNmiInterrupt
Exact matches:
  nt!KiNmiInterrupt = <no type information>
Interrupt vector 3 (0×3):
(fffff800`0103f780) nt!KiBreakpointTrap | (fffff800`0103f840) nt!KiOverflowTrap
Exact matches:
  nt!KiBreakpointTrap = <no type information>
Interrupt vector 4 (0×4):
(fffff800`0103f840) nt!KiOverflowTrap | (fffff800`0103f900) nt!KiBoundFault
Exact matches:
  nt!KiOverflowTrap = <no type information>
Interrupt vector 5 (0×5):
(fffff800`0103f900) nt!KiBoundFault | (fffff800`0103f9c0) nt!KiInvalidOpcodeFault
Exact matches:
  nt!KiBoundFault = <no type information>
Interrupt vector 6 (0×6):
(fffff800`0103f9c0) nt!KiInvalidOpcodeFault | (fffff800`0103fb80) nt!KiNpxNotAvailableFault
Exact matches:
  nt!KiInvalidOpcodeFault = <no type information>
Interrupt vector 7 (0×7):
(fffff800`0103fb80) nt!KiNpxNotAvailableFault | (fffff800`0103fc40) nt!KiDoubleFaultAbort
Exact matches:
  nt!KiNpxNotAvailableFault = <no type information>
Interrupt vector 8 (0×8):
(fffff800`0103fc40) nt!KiDoubleFaultAbort | (fffff800`0103fd00) nt!KiNpxSegmentOverrunAbort
Exact matches:
  nt!KiDoubleFaultAbort = <no type information>
Interrupt vector 9 (0×9):
(fffff800`0103fd00) nt!KiNpxSegmentOverrunAbort | (fffff800`0103fdc0) nt!KiInvalidTssFault
Exact matches:
  nt!KiNpxSegmentOverrunAbort = <no type information>
Interrupt vector 10 (0xa):
(fffff800`0103fdc0) nt!KiInvalidTssFault | (fffff800`0103fe80) nt!KiSegmentNotPresentFault
Exact matches:
  nt!KiInvalidTssFault = <no type information>
Interrupt vector 11 (0xb):
(fffff800`0103fe80) nt!KiSegmentNotPresentFault | (fffff800`0103ff80) nt!KiStackFault
Exact matches:
  nt!KiSegmentNotPresentFault = <no type information>
Interrupt vector 12 (0xc):
(fffff800`0103ff80) nt!KiStackFault | (fffff800`01040080) nt!KiGeneralProtectionFault
Exact matches:
  nt!KiStackFault = <no type information>
Interrupt vector 13 (0xd):
(fffff800`01040080) nt!KiGeneralProtectionFault | (fffff800`01040180) nt!KiPageFault
Exact matches:
  nt!KiGeneralProtectionFault = <no type information>
Interrupt vector 14 (0xe):
(fffff800`01040180) nt!KiPageFault | (fffff800`010404c0) nt!KiFloatingErrorFault
Exact matches:
  nt!KiPageFault = <no type information>
Interrupt vector 15 (0xf):
(fffff800`01179090) nt!KxUnexpectedInterrupt0+0xf0 | (fffff800`0117a0c0) nt!KiNode0
Interrupt vector 16 (0×10):
(fffff800`010404c0) nt!KiFloatingErrorFault | (fffff800`01040600) nt!KiAlignmentFault
Exact matches:
  nt!KiFloatingErrorFault = <no type information>
Interrupt vector 17 (0×11):
(fffff800`01040600) nt!KiAlignmentFault | (fffff800`010406c0) nt!KiMcheckAbort
Exact matches:
  nt!KiAlignmentFault = <no type information>
Interrupt vector 18 (0×12):
(fffff800`010406c0) nt!KiMcheckAbort | (fffff800`01040900) nt!KxMcheckAbort
Exact matches:
  nt!KiMcheckAbort = <no type information>
Interrupt vector 19 (0×13):
(fffff800`01040a00) nt!KiXmmException | (fffff800`01040b80) nt!KiRaiseAssertion
Exact matches:
  nt!KiXmmException = <no type information>

Let’s look at some dump.

BugCheck 1E, {ffffffffc0000005, fffffade5ba2d643, 0, 28}

This is KMODE_EXCEPTION_NOT_HANDLED and obviously it could have been invalid memory access. And indeed the stack WinDbg shows after opening a dump and entering !analyze -v command is:

RSP
fffffade`4e88fe68 nt!KeBugCheckEx
fffffade`4e88fe70 nt!KiDispatchException+0x128
fffffade`4e8905f0 nt!KiPageFault+0x1e1
fffffade`4e890780 driver!foo+0x9b

and the exception context is:

2: kd> r
Last set context:
rax=fffffade4e8907f4 rbx=fffffade6de0c2e0 rcx=fffffa8014412000
rdx=fffffade71e7e2ac rsi=0000000000000000 rdi=fffffadffff03000
rip=fffffade5ba2d643 rsp=fffffade4e890780 rbp=fffffade71e7ffff
 r8=00000000000005b0 r9=fffffade4e890a88 r10=fffffadffd077898
r11=fffffade71e7e260 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=0950 es=4e89 fs=fade gs=ffff efl=00010246
driver!foo+0×9b:
fffffade`5ba2d643 8b4e28 mov ecx,dword ptr [rsi+28h] ds:0950:0028=????????

We see that KiPageFault was called and from the dumped IDT we see it corresponds to the interrupt vector 14 (0xE) which is called on any memory reference that is not present in physical memory.

Now I’m going to dump the raw stack around fffffade`4e890780 to see our the values the processor saved before calling KiPageFault:

2: kd> dps fffffade`4e890780-50 fffffade`4e890780+50
fffffade`4e890730  fffffade`6de0c2e0
fffffade`4e890738  fffffadf`fff03000
fffffade`4e890740  00000000`00000000
fffffade`4e890748  fffffade`71e7ffff
fffffade`4e890750  00000000`00000000 ; ErrorCode
fffffade`4e890758  fffffade`5ba2d643 driver!foo+0×9b ; RIP
fffffade`4e890760  00000000`00000010 ; CS
fffffade`4e890768  00000000`00010246 ; RFLAGS
fffffade`4e890770  fffffade`4e890780 ; RSP
fffffade`4e890778  00000000`00000018 ; SS

fffffade`4e890780  00000000`00000000 ; RSP before interrupt
fffffade`4e890788  00000000`00000000
fffffade`4e890790  00000000`00000000
fffffade`4e890798  00000000`00000000
fffffade`4e8907a0  00000000`00000000
fffffade`4e8907a8  00000000`00000000
fffffade`4e8907b0  00000000`00000000
fffffade`4e8907b8  00000000`00000000
fffffade`4e8907c0  00000000`00000000
fffffade`4e8907c8  00000000`00000000
fffffade`4e8907d0  00000000`00000000

You see the values are exactly the same as WinDbg shows in the saved context above. Actually if you look at Page-Fault Error Code bits in Intel Architecture Software Developer’s Manual Volume 3A, you would see that for this case, all zeroes, we have:

  • the page was not present in memory
  • the fault was caused by the read access
  • the processor was executing in kernel mode
  • no reserved bits in page directory were set to 1 when 0s were expected
  • it was not caused by instruction fetch

- Dmitry Vostokov -

Interrupts and exceptions explained (Part 1)

Saturday, April 28th, 2007

So far we have seen various bugchecks depicted. What I left there is the explanation of how exceptions happen in the first place and how the execution flow reaches KiDispatchException. When some abnormal condition happens such as breakpoint, division by zero or memory protection violation then normal CPU execution flow (code stream) is interrupted (therefore I use the terms “interrupt” and “exception” interchangeably here). The type of interrupt is specified by a number called interrupt vector number. Obviously CPU has to transfer execution to some procedure in memory to handle that interrupt. CPU has to find that procedure, theoretically either having one procedure for all interrupts and specifying interrupt vector number as a parameter or having a table containing pointers to various procedures that correspond to different interrupt vectors. Intel x86 and x64 CPUs use the latter approach which is depicted on the following diagram:

 

When an exception happens (divide by zero, for example) CPU gets the address of the procedure table from IDTR (Interrupt Descriptor Table Register). This IDT (Interrupt Descriptor Table) is a zero-based array of 8-byte descriptors (x86) or 16-byte descriptors (x64). CPU calculates the location of the necessary procedure to call and does some necessary steps like saving appropriate registers before calling it.

The same happens when some external I/O device interrupts. We will talk about external interrupts later. For I/O devices the term “interrupt” is more appropriate.  On the picture above I/O hardware interrupt vector numbers were taken from some dump. These are OS and user-defined numbers. The first 32 vectors are reserved by Intel. 

Before Windows switches CPU to protected mode during boot process it creates IDT table in memory and sets IDTR to point to it by using SIDT instruction.

Let me now illustrate this by using a UML class diagram annotated by pseudo-code that shows what CPU does before calling the appropriate procedure. The pseudo-code on the diagram below is valid for interrupts and exceptions happening when the current CPU execution mode is kernel. For interrupts and exceptions generated when CPU executes code in user mode the picture is a little more complicated because the processor has to switch the current user-mode stack to kernel mode stack.

The following diagram is for 32-bit x86 processor (x64 will be illustrated later):

Let’s see all this in some dump. The address of IDT can be found by using !pcr [processor number] command. Every processor on a multiprocessor machine has its own IDT:

0: kd> !pcr 0
KPCR for Processor 0 at ffdff000:
    Major 1 Minor 1
 NtTib.ExceptionList: f2178b8c
     NtTib.StackBase: 00000000
    NtTib.StackLimit: 00000000
  NtTib.SubSystemTib: 80042000
       NtTib.Version: 0005c645
   NtTib.UserPointer: 00000001
       NtTib.SelfTib: 7ffdf000
             SelfPcr: ffdff000
                Prcb: ffdff120
                Irql: 0000001f
                 IRR: 00000000
                 IDR: ffffffff
       InterruptMode: 00000000
                 IDT: 8003f400
                 GDT: 8003f000
                 TSS: 80042000
       CurrentThread: 88c1d3c0
          NextThread: 00000000
          IdleThread: 808a68c0
           DpcQueue:

Every entry in IDT has the type _KIDTENTRY and we can get the first entry for divide by zero exception which has vector number 0:

0: kd> dt _KIDTENTRY 8003f400
   +0x000 Offset           : 0x47ca
   +0x002 Selector         : 8
   +0x004 Access           : 0x8e00
   +0x006 ExtendedOffset   : 0x8083

By gluing together ExtendedOffset and Offset fields we get the address of the interrupt handling procedure (0×808347ca) which is KiTrap00:

0: kd> ln 0x808347ca
(808347ca)   nt!KiTrap00   |  (808348a5)   nt!Dr_kit1_a
Exact matches:
    nt!KiTrap00

We can also see the interrupt trace in raw stack. For example, we can open the kernel dump and see the following stack trace and registers in the output of !analyze -v command:

ErrCode = 00000000
eax=00001b00 ebx=00001b00 ecx=00000000 edx=00000000 esi=f2178cb4 edi=bc15a838
eip=bf972586 esp=f2178c1c ebp=f2178c90 iopl=0 nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293
driver!foo+0xf9:
bf972586 f77d10 idiv eax,dword ptr [ebp+10h] ss:0010:f2178ca0=00000000
STACK_TEXT:
f2178b44 809989be nt!KeBugCheck+0×14
f2178b9c 8083484f nt!Ki386CheckDivideByZeroTrap+0×41
f2178b9c bf972586 nt!KiTrap00+0×88
f2178c90 bf94c23c driver!foo+0xf9
f2178d54 80833bdf driver!bar+0×11c

The dumping memory around ESP value (f2178c1c) shows the values processor pushes when divide by zero exception happens:

0: kd> dds f2178c1c-100 f2178c1c+100
...
...
...
f2178b80  00000000
f2178b84  f2178b50
f2178b88  00000000
f2178b8c  f2178d44
f2178b90  8083a8cc nt!_except_handler3
f2178b94  80870828 nt!`string'+0xa4
f2178b98  ffffffff
f2178b9c  f2178ba8
f2178ba0  8083484f nt!KiTrap00+0x88
f2178ba4  f2178ba8
f2178ba8  f2178c90
f2178bac  bf972586 driver!foo+0xf9
f2178bb0  badb0d00
f2178bb4  00000000
f2178bb8  0000006d
f2178bbc  bf842315
f2178bc0  f2178c6c
f2178bc4  f2178bec
f2178bc8  bf844154
f2178bcc  f2178c88
f2178bd0  f2178c88
f2178bd4  00000000
f2178bd8  bc150000
f2178bdc  f2170023
f2178be0  f2170023
f2178be4  00000000
f2178be8  00000000
f2178bec  00001b00
f2178bf0  f2178c0c
f2178bf4  f2178d44
f2178bf8  f2170030
f2178bfc  bc15a838
f2178c00  f2178cb4
f2178c04  00001b00
f2178c08  f2178c90
f2178c0c  00000000 ; ErrorCode
f2178c10  bf972586 driver!foo+0xf9 ; EIP
f2178c14  00000008 ; CS
f2178c18  00010293 ; EFLAGS

f2178c1c  00000000 ; <- ESP before interrupt
f2178c20  0013c220
f2178c24  00000000
f2178c28  60000000
f2178c2c  00000001
f2178c30  00000000
f2178c34  00000000


ErrorCode is not the same as interrupt vector number although it is the same number here (0). I won’t cover interrupt error codes here. If you are interested please consult Intel Architecture Software Developer’s Manual.

If we try to execute !idt extension command it will show you only user-defined hardware interrupt vectors:

0: kd> !idt
Dumping IDT:
37: 80a7d1ac hal!PicSpuriousService37
50: 80a7d284 hal!HalpApicRebootService
51: 89495044 serial!SerialCIsrSw (KINTERRUPT 89495008)
52: 89496044 i8042prt!I8042MouseInterruptService (KINTERRUPT 89496008)
53: 894ea044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894ea008)
63: 894f2044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894f2008)
72: 89f59044 atapi!IdePortInterrupt (KINTERRUPT 89f59008)
73: 89580044 NDIS!ndisMIsr (KINTERRUPT 89580008)
83: 899e7824 NDIS!ndisMIsr (KINTERRUPT 899e77e8)
92: 89f56044 atapi!IdePortInterrupt (KINTERRUPT 89f56008)
93: 89f1e044 SCSIPORT!ScsiPortInterrupt (KINTERRUPT 89f1e008)
a3: 894fa044 USBPORT!USBPORT_InterruptService (KINTERRUPT 894fa008)
a4: 894a3044 cpqcidrv+0×22AE (KINTERRUPT 894a3008)
b1: 89f697dc ACPI!ACPIInterruptServiceRoutine (KINTERRUPT 89f697a0)
b3: 89498824 i8042prt!I8042KeyboardInterruptService (KINTERRUPT 894987e8)
b4: 894a2044 cpqasm2+0×5B99 (KINTERRUPT 894a2008)
c1: 80a7d410 hal!HalpBroadcastCallService
d1: 80a7c754 hal!HalpClockInterrupt
e1: 80a7d830 hal!HalpIpiHandler
e3: 80a7d654 hal!HalpLocalApicErrorService
fd: 80a7e11c hal!HalpProfileInterrupt

In the next part I’m going to cover x64 specifics. 

- Dmitry Vostokov -

Notes about NMI_HARDWARE_FAILURE

Saturday, December 23rd, 2006

WinDbg help states that NMI_HARDWARE_FAILURE (0×80) bugcheck 80 indicates a hardware fault. This description can easily lead to a conclusion that a kernel or complete crash dump you just got from your customer doesn’t worth examining. But hardware malfunction is not always the case especially if your customer mentions that their system was hanging and they forced a manual dump. Here I would advise to check whether they have a special hardware for debugging purposes, for example, a card or an integrated iLO chip (Integrated Lights-Out) for remote server administration. Both can generate NMI (Non Maskable Interrupt) on demand and therefore bugcheck the system. If this is the case then it is worth examining their dump to see why the system was hanging.

- Dmitry Vostokov -

Citrix Access Thing (back to the future)

Wednesday, October 18th, 2006

I’m writing a presentation about the history of Voice Recognition on Windows platforms and found the following wikipedia article Covox Speech Thing about pioneering work of Covox Inc. where I was a remote employee in Moscow, Russia in 1992-1993 designing and implementing components for their VoiceBlaster speech recognition application (as far as I remember subject to the past war with Creative Labs and their SoundBlaster trademark)

Looking back almost 15 years ago I envisage that 15 years in the future someone might write an article called “Citrix Access Thing” featuring schematics of Citrix hardware and software appliances. Hope this will not happen.

- Dmitry Vostokov -

Citrix and hardware

Wednesday, September 13th, 2006

Don’t expect me to talk about Netscaler stuff. I’m a Windows guy. It’s started in 1989 when I got PS/2 with 2Mb of memory on board and Windows 2.x as a GUI appliance to an IBM thermal printer. And then suddenly Windows 3.0 appeared and I didn’t have a clue about programming on it (I was an MS DOS guy). Thanks to BBS (some of you probably have never heard about it - it was mini Internet at that time) I got a text file - that wonderful book “Programming Windows” 1st edition written by Charles Petzold and read it twice and being facinated by Windows GUI independence from hardware went straight programming Norton Commander variant. Enough nostalgia. Let’s come back to Citrix and hardware.

I’m a big fan of OSR. Read their articles and bought some hardware from them to learn about USB driver programming, like this one:

 

This is a real USB device! You connect it via cable to your USB port and you have a button, switches and indicators. Inspired by this device I’m writing a driver which will monitor the health of a Citrix server by lighting appropiate indicators when your IMA service is gone, showing the number of sessions active, etc. And there is a button which could force a server to show a blue screen (in another words force a system dump to do an analysis later - that’s my job in Citrix) if things go beyond control of a Citrix administrator. Something like a magic to me. Stay tuned. 

- Dmitry Vostokov -