Crash Dump Analysis Patterns (Part 57)
Thursday, April 3rd, 2008Another pattern that occurs frequently is Hardware Error. This can be internal CPU malfunction due to overheating, RAM or hard disk I/O problem. It usually results in the appropriate bugcheck and the most frequent one is the 6th from the top of Bug Check Frequency Table:
-
BugCheck 9C: MACHINE_CHECK_EXCEPTION
Other relevant bugchecks include:
-
BugCheck 7B: INACCESSIBLE_BOOT_DEVICE
-
BugCheck 77: KERNEL_STACK_INPAGE_ERROR
-
BugCheck 7A: KERNEL_DATA_INPAGE_ERROR
Another bugcheck from this category can also be triggered on purpose to get a crash dump of a hanging or slow system:
Please also note that other popular bugchecks like
-
BugCheck 7F: UNEXPECTED_KERNEL_MODE_TRAP
-
BugCheck 50: PAGE_FAULT_IN_NONPAGED_AREA
can result from RAM problems but we should try to find a software cause first.
Sometimes the following bugchecks like
-
BugCheck 7E: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
report EXCEPTION_DOESNOT_MATCH_CODE where read or write address doesn’t correspond to faulted instruction at EIP:
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: bf802671, The address that the exception occurred at
Arg3: f10b8c74, Exception Record Address
Arg4: f10b88c4, Context Record Address
FAULTING_IP:
driver!AcquireSemaphoreShared+4
bf802671 90 nop
EXCEPTION_RECORD: f10b8c74 -- (.exr fffffffff10b8c74)
ExceptionAddress: bf802671 (driver!AcquireSemaphoreShared+0x00000004)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000001
Parameter[1]: 0000000c
Attempt to write to address 0000000c
CONTEXT: f10b88c4 -- (.cxr fffffffff10b88c4)
eax=884d2d01 ebx=0000000c ecx=00000000 edx=80010031 esi=8851ef60 edi=bc3846d4
eip=bf802671 esp=f10b8d3c ebp=f10b8d70 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
driver!AcquireSemaphoreShared+0x4:
bf802671 90 nop
Resetting default scope
WRITE_ADDRESS: 0000000c
EXCEPTION_DOESNOT_MATCH_CODE: This indicates a hardware error.
Instruction at bf802671 does not read/write to 0000000c
Code mismatch can also happen in user mode but from my experience it usually results from improper Hooked Function or similar corruption:
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 7c848768 (ntdll!_LdrpInitialize+0x00000184)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000001
NumberParameters: 0
DEFAULT_BUCKET_ID: CODE_ADDRESS_MISMATCH
WRITE_ADDRESS: f774f120
FAULTING_IP:
ntdll!_LdrpInitialize+184
7c848768 cc int 3
EXCEPTION_DOESNOT_MATCH_CODE: This indicates a hardware error.
Instruction at 7c848768 does not read/write to f774f120
STACK_TEXT:
0012fd14 7c8284c5 0012fd28 7c800000 00000000 ntdll!_LdrpInitialize+0x184
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25
In such cases EIP might point to the middle of the expected instruction (Wild Code):
FAULTING_IP:
+59c3659
059c3659 86990508f09b xchg bl,byte ptr [ecx-640FF7FBh]
Here is an example of the real hardware error (note the concatenated error code for bugcheck 0×9C):
MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters;
x86 Processors
If the processor has ONLY MCE feature available (For example Intel
Pentium), the parameters are:
1 - Low 32 bits of P5_MC_TYPE MSR
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of P5_MC_ADDR MSR
4 - Low 32 bits of P5_MC_ADDR MSR
If the processor also has MCA feature available (For example Intel
Pentium Pro), the parameters are:
1 - Bank number
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
IA64 Processors
1 - Bugcheck Type
1 - MCA_ASSERT
2 - MCA_GET_STATEINFO
SAL returned an error for SAL_GET_STATEINFO while processing MCA.
3 - MCA_CLEAR_STATEINFO
SAL returned an error for SAL_CLEAR_STATEINFO while processing MCA.
4 - MCA_FATAL
FW reported a fatal MCA.
5 - MCA_NONFATAL
SAL reported a recoverable MCA and we don't support currently
support recovery or SAL generated an MCA and then couldn't
produce an error record.
0xB - INIT_ASSERT
0xC - INIT_GET_STATEINFO
SAL returned an error for SAL_GET_STATEINFO while processing INIT event.
0xD - INIT_CLEAR_STATEINFO
SAL returned an error for SAL_CLEAR_STATEINFO while processing INIT event.
0xE - INIT_FATAL
Not used.
2 - Address of log
3 - Size of log
4 - Error code in the case of x_GET_STATEINFO or x_CLEAR_STATEINFO
AMD64 Processors
1 - Bank number
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
Arguments:
Arg1: 00000000
Arg2: 808a07a0
Arg3: be000300
Arg4: 1008081f
Debugging Details:
------------------
NOTE: This is a hardware error. This error was reported by the CPU
via Interrupt 18. This analysis will provide more information about
the specific error. Please contact the manufacturer for additional
information about this error and troubleshooting assistance.
This error is documented in the following publication:
- IA-32 Intel(r) Architecture Software Developer's Manual
Volume 3: System Programming Guide
Bit Mask:
MA Model Specific MCA
O ID Other Information Error Code Error Code
VV SDP ___________|____________ _______|_______ _______|______
AEUECRC| | |
LRCNVVC| | |
^^^^^^^| | |
6 5 4 3 2 1
3210987654321098765432109876543210987654321098765432109876543210
----------------------------------------------------------------
1011111000000000000000110000000000010000000010000000100000011111
VAL - MCi_STATUS register is valid
Indicates that the information contained within the IA32_MCi_STATUS
register is valid. When this flag is set, the processor follows the
rules given for the OVER flag in the IA32_MCi_STATUS register when
overwriting previously valid entries. The processor sets the VAL
flag and software is responsible for clearing it.
UC - Error Uncorrected
Indicates that the processor did not or was not able to correct the
error condition. When clear, this flag indicates that the processor
was able to correct the error condition.
EN - Error Enabled
Indicates that the error was enabled by the associated EEj bit of the
IA32_MCi_CTL register.
MISCV - IA32_MCi_MISC Register Valid
Indicates that the IA32_MCi_MISC register contains additional
information regarding the error. When clear, this flag indicates
that the IA32_MCi_MISC register is either not implemented or does
not contain additional information regarding the error.
ADDRV - IA32_MCi_ADDR register valid
Indicates that the IA32_MCi_ADDR register contains the address where
the error occurred.
PCC - Processor Context Corrupt
Indicates that the state of the processor might have been corrupted
by the error condition detected and that reliable restarting of the
processor may not be possible.
BUSCONNERR - Bus and Interconnect Error BUS{LL}_{PP}_{RRRR}_{II}_{T}_err
These errors match the format 0000 1PPT RRRR IILL
Concatenated Error Code:
--------------------------
_VAL_UC_EN_MISCV_ADDRV_PCC_BUSCONNERR_1F
This error code can be reported back to the manufacturer.
They may be able to provide additional information based upon
this error. All questions regarding STOP 0x9C should be
directed to the hardware manufacturer.
BUGCHECK_STR: 0x9C_IA32_GenuineIntel
DEFAULT_BUCKET_ID: DRIVER_FAULT
PROCESS_NAME: Idle
CURRENT_IRQL: 2
LAST_CONTROL_TRANSFER: from 80a7fbd8 to 8087b6be
STACK_TEXT:
f773d280 80a7fbd8 0000009c 00000000 f773d2b0 nt!KeBugCheckEx+0x1b
f773d3b4 80a7786f f7737fe0 00000000 00000000 hal!HalpMcaExceptionHandler+0x11e
f773d3b4 f75a9ca2 f7737fe0 00000000 00000000 hal!HalpMcaExceptionHandlerWrapper+0x77
f78c6d50 8083abf2 00000000 0000000e 00000000 intelppm!AcpiC1Idle+0x12
f78c6d54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xa
- Dmitry Vostokov @ DumpAnalysis.org -