Crash Dump Analysis Patterns (Part 87)
Sometimes we get rare or hardware-related bugchecks and these might have been influenced by hardware virtualization (that is also a software and could have its own defects). Therefore it is beneficial to recognize when a system is running under a VM (Virtualized System pattern):
Memory Dumps from Xen-virtualized Windows
Memory dumps from VMware images (Virtual PC diagnostics in post comments)
Memory Dumps from Hyper-Virtualized Windows
For example, we get the following bugcheck and stack trace for the first processor:
0: kd> !analyze -v
[...]
CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 00000060, Clock interrupt time out interval in nominal clock ticks.
Arg2: 00000000, 0.
Arg3: 805d8120, The PRCB address of the hung processor.
Arg4: 00000001, 0.
CURRENT_IRQL: 1c
STACK_TEXT:
8d283694 816df8a5 nt!KeBugCheckEx+0x1e
8d2836c8 816e163d nt!KeUpdateRunTime+0xd5
8d2836c8 84617008 nt!KeUpdateSystemTime+0xed
WARNING: Frame IP not in any known module. Following frames may be wrong.
8d283748 816f46a1 0×84617008
8d283758 816fa6aa nt!HvlpInitiateHypercall+0×21
8d283784 8166aca5 nt!HvlNotifyLongSpinWait+0×2b
8d2837a0 816cce7e nt!KeFlushSingleTb+0xc4
8d283808 81681db4 nt!MmAccessFault+0xc1d
8d283808 816dd033 nt!KiTrap0E+0xdc
8d28389c 8168ed58 nt!memcpy+0×33
8d283954 816712bf nt!MmCopyToCachedPage+0×1193
8d2839ec 81663053 nt!CcMapAndCopy+0×210
8d283a74 8c688218 nt!CcFastCopyWrite+0×283
8d283b98 8c40badc Ntfs!NtfsCopyWriteA+0×23e
8d283bcc 8c40bcab fltmgr!FltpPerformFastIoCall+0×22e
8d283bf8 8c41dc30 fltmgr!FltpPassThroughFastIo+0×7d
8d283c3c 818471cd fltmgr!FltpFastIoWrite+0×146
8d283d38 8167ec7a nt!NtWriteFile+0×34c
8d283d38 77115e74 nt!KiFastCallEntry+0×12a
01cfee80 00000000 0×77115e74
The thread was servicing a page fault. Notice the gap between KeUpdateSystemTime and HvlpInitiateHypercall. This is normal and consistent code if we look closer:
0: kd> .asm no_code_bytes
Assembly options: no_code_bytes
0: kd> uf HvlpInitiateHypercall
nt!HvlpInitiateHypercall:
816f4680 push edi
816f4681 push esi
816f4682 push ebx
816f4683 mov eax,dword ptr [esp+10h]
816f4687 mov edx,dword ptr [esp+14h]
816f468b mov ecx,dword ptr [esp+18h]
816f468f mov ebx,dword ptr [esp+1Ch]
816f4693 mov esi,dword ptr [esp+20h]
816f4697 mov edi,dword ptr [esp+24h]
816f469b call dword ptr [nt!HvlpHypercallCodeVa (8176bb8c)]
816f46a1 pop ebx
816f46a2 pop esi
816f46a3 pop edi
816f46a4 ret 18h
0: kd> dp 8176bb8c l1
8176bb8c 84617000
0: kd> uf 84617000
84617000 or eax,80000000h
84617005 vmcall
84617008ret
We have the address of RET instruction (84617008) on the stack trace:
0: kd> kv
ChildEBP RetAddr Args to Child
8d283694 816df8a5 00000101 00000060 00000000 nt!KeBugCheckEx+0x1e
8d2836c8 816e163d 84e1521b 000000d1 8d283784 nt!KeUpdateRunTime+0xd5
8d2836c8 8461700884e1521b 000000d1 8d283784 nt!KeUpdateSystemTime+0xed (FPO: [0,2] TrapFrame @ 8d2836d8)
WARNING: Frame IP not in any known module. Following frames may be wrong.
8d283748 816f46a1 84e6c900 22728000 8172e28c 0×84617008
8d283758 816fa6aa 00010008 00000000 22728000 nt!HvlpInitiateHypercall+0×21 (FPO: [6,3,0])
8d283784 8166aca5 22728000 00000000 00000000 nt!HvlNotifyLongSpinWait+0×2b
[…]
The second processor is busy too:
0: kd> !running
System Processors 3 (affinity mask)
Idle Processors 0
Prcbs Current Next
0 8172c920 84e6c900 ................
1 805d8120 85138030 85a50d78 …………….
0: kd> !thread 85138030
THREAD 85138030 Cid 0564.11c8 Teb: 7ff9f000 Win32Thread: 00000000 RUNNING on processor 1
IRP List:
85ab5d00: (0006,01fc) Flags: 00000884 Mdl: 00000000
85445ab8: (0006,0094) Flags: 00060000 Mdl: 00000000
Not impersonating
DeviceMap 98a7d558
Owning Process 84f0d938 Image: Application.exe
Attached Process N/A Image: N/A
Wait Start TickCount 695643 Ticks: 224 (0:00:00:03.500)
Context Switch Count 20
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x705e2679
Stack Init a1d13000 Current a1d10a70 Base a1d13000 Limit a1d10000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr Args to Child
00000000 00000000 00000000 00000000 00000000 0×0
Because we have (Limit, Current, Base) triple for our thread we check its execution residue on kernel raw stack. We find traces of a hypercall too:
0: kd> dds a1d10000 a1d13000
[...]
a1d1215c 816fa6da nt!HvlEndSystemInterrupt+0x20
a1d12160 40000070
a1d12164 00000000
a1d12168 81608838 hal!HalEndSystemInterrupt+0x7a
a1d1216c 805d8000
a1d12170 a1d12180
a1d12174 81618cc9 hal!HalpIpiHandler+0x189
a1d12178 84f4841b
a1d1217c 000000e1
a1d12180 a1d1222c
a1d12184 84617008
a1d12188 badb0d00
a1d1218c 00000000
a1d12190 81665699 nt!RtlWalkFrameChain+0x58
a1d12194 816659c4 nt!RtlWalkFrameChain+0x377
a1d12198 11c4649e
a1d1219c 00000002
a1d121a0 00000003
a1d121a4 85478444
a1d121a8 0000001f
a1d121ac 00000000
a1d121b0 00000000
a1d121b4 816f46a1 nt!HvlpInitiateHypercall+0×21
a1d121b8 8172c800 nt!KiInitialPCR
a1d121bc 85138030
a1d121c0 85a72ac0
a1d121c4 8171b437 nt!HvlSwitchVirtualAddressSpace+0×28
a1d121c8 00010001
a1d121cc 00000000
a1d121d0 85a72ac0
a1d121d4 85138030
a1d121d8 8172c802 nt!KiInitialPCR+0×2
a1d121dc 00000000
a1d121e0 85138030
a1d121e4 816fa6da nt!HvlEndSystemInterrupt+0×20
a1d121e8 40000070
a1d121ec 00000000
a1d121f0 81608838 hal!HalEndSystemInterrupt+0×7a
a1d121f4 816f46a1 nt!HvlpInitiateHypercall+0×21
a1d121f8 805d8000
a1d121fc 85138030
a1d12200 805dc1e0
a1d12204 8171b437 nt!HvlSwitchVirtualAddressSpace+0×28
a1d12208 00010001
a1d1220c 00000000
a1d12210 00000000
a1d12214 81608468 hal!HalpDispatchSoftwareInterrupt+0×5e
a1d12218 00000000
a1d1221c 00000000
a1d12220 00000206
a1d12224 a1d12240
a1d12228 81608668 hal!HalpCheckForSoftwareInterrupt+0×64
a1d1222c 00000002
a1d12230 00000000
a1d12234 816086a8 hal!KfLowerIrql
a1d12238 00000000
a1d1223c 00000002
a1d12240 a1d12250
a1d12244 8160870c hal!KfLowerIrql+0×64
a1d12248 00000000
a1d1224c 00000000
a1d12250 a1d12294
a1d12254 816e035a nt!KiSwapThread+0×477
a1d12258 85138030
a1d1225c 851380b8
a1d12260 805d8120
a1d12264 0014d1f8
[…]
Looking at raw stack further we can even see that it was processing a page fault too and manually reconstruct its stack trace:
[...]
a1d1074c 85aef510
a1d10750 a1d10768
a1d10754 81678976 nt!IofCallDriver+0×63
a1d10758 84c87d50
a1d1075c 85aef510
a1d10760 00000000
a1d10764 84c87d50
a1d10768 a1d10784
a1d1076c 8166d74e nt!IoPageRead+0×172
a1d10770 85138030
a1d10774 84a1352c
a1d10778 84a134f8
a1d1077c 84a13538
a1d10780 84c87d50
a1d10784 a1d10840
a1d10788 816abf07 nt!MiDispatchFault+0xd14
a1d1078c 00000043
a1d10790 85138030
a1d10794 84a13538
a1d10798 84a1350c
a1d1079c 84a1352c
a1d107a0 8174c800 nt!MmSystemCacheWs
a1d107a4 00000000
a1d107a8 85138030
a1d107ac a5397bf8
a1d107b0 85b01c48
a1d107b4 00000000
a1d107b8 00000000
a1d107bc a5397bf8
a1d107c0 84a1358c
a1d107c4 a1d10864
a1d107c8 00000000
a1d107cc 8463a590
a1d107d0 84a134f8
a1d107d4 c0518000
a1d107d8 00000000
a1d107dc 00000000
a1d107e0 00000028
a1d107e4 a1d107f4
a1d107e8 00000000
a1d107ec 00000038
a1d107f0 859f5640
a1d107f4 a4bfa390
a1d107f8 00000000
a1d107fc 00000000
a1d10800 00000000
a1d10804 a1d10938
a1d10808 818652bd nt!MmCreateSection+0×98f
a1d1080c 00000000
a1d10810 846652e8
a1d10814 00000000
a1d10818 00000000
a1d1081c 00000000
a1d10820 00000028
a1d10824 00000000
a1d10828 00000080
a1d1082c 0000000a
a1d10830 85ae1c98
a1d10834 85ae1c20
a1d10838 00000000
a1d1083c 00000000
a1d10840 a1d108b8
a1d10844 816cd325 nt!MmAccessFault+0×10c6
a1d10848 a3000000
a1d1084c a5397bf8
a1d10850 00000000
a1d10854 8174c800 nt!MmSystemCacheWs
a1d10858 00000000
a1d1085c 00000000
a1d10860 a5397bf8
a1d10864 00000000
[…]
0: kd> k L=a1d10750 a1d10750 a1d10750
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
a1d10750 81678976 0xa1d10750
a1d10768 8166d74e nt!IofCallDriver+0x63
a1d10784 816abf07 nt!IoPageRead+0x172
a1d10840 816cd325 nt!MiDispatchFault+0xd14
a1d108b8 816f0957 nt!MmAccessFault+0x10c6
a1d10924 8181c952 nt!MmCheckCachedPageState+0x801
a1d109b0 8c60f850 nt!CcCopyRead+0x435
a1d109dc 8c613c52 Ntfs!NtfsCachedRead+0x13b
a1d10abc 8c612b6f Ntfs!NtfsCommonRead+0x105a
a1d10b2c 81678976 Ntfs!NtfsFsdRead+0x273
a1d10b44 8c40cba7 nt!IofCallDriver+0x63
a1d10b68 8c40d7c7 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x251
a1d10ba0 8c40dbe7 fltmgr!FltPerformSynchronousIo+0xb9
a1d10c10 9ca680e5 fltmgr!FltReadFile+0x2ed
[...]
A bit of reverse engineering offtopic here: I think that the absence of a trap means that cache processing NTFS code reuses page fault handling code.
- Dmitry Vostokov @ DumpAnalysis.org -
July 12th, 2009 at 6:10 pm
[…] at drivers shows that the system was virtualized under a […]
June 7th, 2010 at 11:49 pm
[…] We find the system virtualized: […]