Crash Dump Analysis Patterns (Part 57)

Thursday, April 3rd, 2008

Another frequently occurring pattern is Hardware Error. The cause can be an internal CPU malfunction due to overheating, a RAM fault, or a hard disk I/O problem. It usually results in a corresponding bugcheck, and the most frequent one is sixth from the top of the Bug Check Frequency Table:

  • BugCheck 9C: MACHINE_CHECK_EXCEPTION

Other relevant bugchecks include:

  • BugCheck 7B: INACCESSIBLE_BOOT_DEVICE

  • BugCheck 77: KERNEL_STACK_INPAGE_ERROR

  • BugCheck 7A: KERNEL_DATA_INPAGE_ERROR

Another bugcheck from this category can also be triggered on purpose to get a crash dump of a hanging or slow system:

Please also note that other common bugchecks such as

  • BugCheck 7F: UNEXPECTED_KERNEL_MODE_TRAP

  • BugCheck 50: PAGE_FAULT_IN_NONPAGED_AREA

can result from RAM problems, but we should try to find a software cause first.

Sometimes bugchecks such as

  • BugCheck 7E: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED

report EXCEPTION_DOESNOT_MATCH_CODE, where the read or write address doesn't correspond to the faulting instruction at EIP:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: bf802671, The address that the exception occurred at
Arg3: f10b8c74, Exception Record Address
Arg4: f10b88c4, Context Record Address

FAULTING_IP:
driver!AcquireSemaphoreShared+4
bf802671 90 nop

EXCEPTION_RECORD: f10b8c74 -- (.exr fffffffff10b8c74)
ExceptionAddress: bf802671 (driver!AcquireSemaphoreShared+0x00000004)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000001
Parameter[1]: 0000000c
Attempt to write to address 0000000c

CONTEXT: f10b88c4 -- (.cxr fffffffff10b88c4)
eax=884d2d01 ebx=0000000c ecx=00000000 edx=80010031 esi=8851ef60 edi=bc3846d4
eip=bf802671 esp=f10b8d3c ebp=f10b8d70 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
driver!AcquireSemaphoreShared+0x4:
bf802671 90 nop
Resetting default scope

WRITE_ADDRESS: 0000000c

EXCEPTION_DOESNOT_MATCH_CODE: This indicates a hardware error.
Instruction at bf802671 does not read/write to 0000000c
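
For an access violation (c0000005), the two exception parameters printed by .exr encode the kind of access and the target address: Parameter[0] is 0 for a read, 1 for a write, and 8 for an execute (DEP) violation, and Parameter[1] is the address. The following is a minimal Python sketch of that decoding, using the values from the listing above. The helper name describe_access_violation is hypothetical and is not part of any debugger API.

```python
# Hypothetical helper: interprets the two parameters of an access violation
# (exception code c0000005) as printed by the .exr command.
ACCESS_KINDS = {
    0: "read from",
    1: "write to",
    8: "execute non-executable",
}


def describe_access_violation(param0: int, param1: int) -> str:
    """Return a one-line description of the faulting access."""
    kind = ACCESS_KINDS.get(param0, "access (unknown type)")
    return f"Attempt to {kind} address {param1:08x}"


# Parameter[0] and Parameter[1] from the kernel-mode example above.
print(describe_access_violation(0x00000001, 0x0000000C))
```

The faulting instruction in that listing is a nop, which has no memory operand, so a reported write to 0000000c cannot have come from it. That inconsistency is what EXCEPTION_DOESNOT_MATCH_CODE flags.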

A code mismatch can also happen in user mode, but in my experience it usually results from an improper Hooked Function or similar corruption:

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 7c848768 (ntdll!_LdrpInitialize+0x00000184)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000001
NumberParameters: 0

DEFAULT_BUCKET_ID: CODE_ADDRESS_MISMATCH

WRITE_ADDRESS: f774f120

FAULTING_IP:
ntdll!_LdrpInitialize+184
7c848768 cc int 3

EXCEPTION_DOESNOT_MATCH_CODE: This indicates a hardware error.
Instruction at 7c848768 does not read/write to f774f120

STACK_TEXT:
0012fd14 7c8284c5 0012fd28 7c800000 00000000 ntdll!_LdrpInitialize+0x184
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25

In such cases EIP might point into the middle of the intended instruction (Wild Code):

FAULTING_IP:
+59c3659
059c3659 86990508f09b xchg bl,byte ptr [ecx-640FF7FBh]
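
To see why such a disassembly looks plausible yet is meaningless, it helps to decode the six bytes by hand. The Python sketch below is an illustration only: it handles just this one encoding form (opcode 86 with a ModRM byte selecting a register base plus a 32-bit displacement, no SIB byte) and is not a general disassembler. Almost any byte sequence decodes to some instruction, and the huge negative displacement here is a hint that the bytes were never meant to be executed from this offset.

```python
import struct

# Bytes shown by the debugger at 059c3659 (copied from the listing above).
code = bytes.fromhex("86990508f09b")

opcode = code[0]            # 0x86: XCHG r/m8, r8
modrm = code[1]             # 0x99 = 10 011 001 in binary
mod = modrm >> 6            # 2 -> base register + 32-bit displacement
reg = (modrm >> 3) & 7      # 3 -> BL in the 8-bit register set
rm = modrm & 7              # 1 -> ECX as the base register (no SIB byte)

# The displacement is a signed little-endian 32-bit value after the ModRM byte.
(disp,) = struct.unpack_from("<i", code, 2)

REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

sign = "-" if disp < 0 else "+"
text = f"xchg {REG8[reg]},byte ptr [{REG32[rm]}{sign}{abs(disp):X}h]"
print(text)
```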

Here is an example of a real hardware error (note the concatenated error code for bugcheck 0x9C):

MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters;
    x86 Processors
        If the processor has ONLY MCE feature available (For example Intel
        Pentium), the parameters are:
        1 - Low  32 bits of P5_MC_TYPE MSR
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of P5_MC_ADDR MSR
        4 - Low  32 bits of P5_MC_ADDR MSR
        If the processor also has MCA feature available (For example Intel
        Pentium Pro), the parameters are:
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low  32 bits of MCi_STATUS MSR for the MCA bank that had the error
    IA64 Processors
        1 - Bugcheck Type
            1 - MCA_ASSERT
            2 - MCA_GET_STATEINFO
                SAL returned an error for SAL_GET_STATEINFO while processing MCA.
            3 - MCA_CLEAR_STATEINFO
                SAL returned an error for SAL_CLEAR_STATEINFO while processing MCA.
            4 - MCA_FATAL
                FW reported a fatal MCA.
            5 - MCA_NONFATAL
                SAL reported a recoverable MCA and we don't support currently
                support recovery or SAL generated an MCA and then couldn't
                produce an error record.
            0xB - INIT_ASSERT
            0xC - INIT_GET_STATEINFO
                  SAL returned an error for SAL_GET_STATEINFO while processing INIT event.
            0xD - INIT_CLEAR_STATEINFO
                  SAL returned an error for SAL_CLEAR_STATEINFO while processing INIT event.
            0xE - INIT_FATAL
                  Not used.
        2 - Address of log
        3 - Size of log
        4 - Error code in the case of x_GET_STATEINFO or x_CLEAR_STATEINFO
    AMD64 Processors
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low  32 bits of MCi_STATUS MSR for the MCA bank that had the error
Arguments:
Arg1: 00000000
Arg2: 808a07a0
Arg3: be000300
Arg4: 1008081f

Debugging Details:
------------------

   NOTE:  This is a hardware error.  This error was reported by the CPU
   via Interrupt 18.  This analysis will provide more information about
   the specific error.  Please contact the manufacturer for additional
   information about this error and troubleshooting assistance.

   This error is documented in the following publication:

      - IA-32 Intel(r) Architecture Software Developer's Manual
        Volume 3: System Programming Guide

   Bit Mask:

    MA                           Model Specific       MCA
 O  ID      Other Information      Error Code     Error Code
VV  SDP ___________|____________ _______|_______ _______|______
AEUECRC|                        |               |             
LRCNVVC|                        |               |             
^^^^^^^|                        |               |              
   6         5         4         3         2         1
3210987654321098765432109876543210987654321098765432109876543210
----------------------------------------------------------------
1011111000000000000000110000000000010000000010000000100000011111 

VAL   - MCi_STATUS register is valid
        Indicates that the information contained within the IA32_MCi_STATUS
        register is valid.  When this flag is set, the processor follows the
        rules given for the OVER flag in the IA32_MCi_STATUS register when
        overwriting previously valid entries.  The processor sets the VAL
        flag and software is responsible for clearing it.

UC    - Error Uncorrected
        Indicates that the processor did not or was not able to correct the
        error condition.  When clear, this flag indicates that the processor
        was able to correct the error condition.

EN    - Error Enabled
        Indicates that the error was enabled by the associated EEj bit of the
        IA32_MCi_CTL register.

MISCV - IA32_MCi_MISC Register Valid
        Indicates that the IA32_MCi_MISC register contains additional
        information regarding the error.  When clear, this flag indicates
        that the IA32_MCi_MISC register is either not implemented or does
        not contain additional information regarding the error.

ADDRV - IA32_MCi_ADDR register valid
        Indicates that the IA32_MCi_ADDR register contains the address where
        the error occurred.

PCC   - Processor Context Corrupt
        Indicates that the state of the processor might have been corrupted
        by the error condition detected and that reliable restarting of the
        processor may not be possible.

BUSCONNERR - Bus and Interconnect Error   BUS{LL}_{PP}_{RRRR}_{II}_{T}_err
        These errors match the format 0000 1PPT RRRR IILL

   Concatenated Error Code:
   --------------------------
   _VAL_UC_EN_MISCV_ADDRV_PCC_BUSCONNERR_1F

   This error code can be reported back to the manufacturer.
   They may be able to provide additional information based upon
   this error.  All questions regarding STOP 0x9C should be
   directed to the hardware manufacturer.

BUGCHECK_STR:  0x9C_IA32_GenuineIntel

DEFAULT_BUCKET_ID:  DRIVER_FAULT

PROCESS_NAME:  Idle

CURRENT_IRQL:  2

LAST_CONTROL_TRANSFER:  from 80a7fbd8 to 8087b6be

STACK_TEXT: 
f773d280 80a7fbd8 0000009c 00000000 f773d2b0 nt!KeBugCheckEx+0x1b
f773d3b4 80a7786f f7737fe0 00000000 00000000 hal!HalpMcaExceptionHandler+0x11e
f773d3b4 f75a9ca2 f7737fe0 00000000 00000000 hal!HalpMcaExceptionHandlerWrapper+0x77
f78c6d50 8083abf2 00000000 0000000e 00000000 intelppm!AcpiC1Idle+0x12
f78c6d54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xa

- Dmitry Vostokov @ DumpAnalysis.org -