Software Diagnostics Library

Crash Dump Analysis Patterns (Part 57)

Another frequently occurring pattern is Hardware Error. The cause can be an internal CPU malfunction due to overheating, faulty RAM, or a hard disk I/O problem. It usually results in a corresponding bugcheck, the most frequent being the 6th from the top of the Bug Check Frequency Table:

BugCheck 9C: MACHINE_CHECK_EXCEPTION

Other relevant bugchecks include:

BugCheck 7B: INACCESSIBLE_BOOT_DEVICE
BugCheck 77: KERNEL_STACK_INPAGE_ERROR
BugCheck 7A: KERNEL_DATA_INPAGE_ERROR

Another bugcheck from this category can also be triggered on purpose to get a crash dump of a hanging or slow system:

BugCheck 80: NMI_HARDWARE_FAILURE

Please also note that other popular bugchecks like

BugCheck 7F: UNEXPECTED_KERNEL_MODE_TRAP
BugCheck 50: PAGE_FAULT_IN_NONPAGED_AREA

can result from RAM problems but we should try to find a software cause first.
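
The grouping above can be sketched as a small lookup table. This is only an illustration: the codes and names are the ones listed in this post, while the helper name `triage` and the hint strings are hypothetical and not part of any debugger.

```python
# Bugcheck codes and names as listed in the text above.
# The hint strings paraphrase the post; they are not WinDbg output.
HARDWARE_BUGCHECKS = {
    0x9C: "MACHINE_CHECK_EXCEPTION",
    0x7B: "INACCESSIBLE_BOOT_DEVICE",
    0x77: "KERNEL_STACK_INPAGE_ERROR",
    0x7A: "KERNEL_DATA_INPAGE_ERROR",
    0x80: "NMI_HARDWARE_FAILURE",  # may also be triggered on purpose
}
SOFTWARE_FIRST_BUGCHECKS = {
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
}


def triage(code: int) -> str:
    """Return a first-pass hint for a bugcheck code (hypothetical helper)."""
    if code in HARDWARE_BUGCHECKS:
        return f"{HARDWARE_BUGCHECKS[code]}: suspect hardware"
    if code in SOFTWARE_FIRST_BUGCHECKS:
        return f"{SOFTWARE_FIRST_BUGCHECKS[code]}: look for a software cause first"
    return "not covered by this pattern"


print(triage(0x9C))
print(triage(0x50))
```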

Sometimes bugchecks like

BugCheck 7E: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED

report EXCEPTION_DOESNOT_MATCH_CODE, where the read or write address doesn't correspond to the faulting instruction at EIP:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: bf802671, The address that the exception occurred at
Arg3: f10b8c74, Exception Record Address
Arg4: f10b88c4, Context Record Address

FAULTING_IP:
driver!AcquireSemaphoreShared+4
bf802671 90 nop

EXCEPTION_RECORD: f10b8c74 -- (.exr fffffffff10b8c74)
ExceptionAddress: bf802671 (driver!AcquireSemaphoreShared+0x00000004)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000001
Parameter[1]: 0000000c
Attempt to write to address 0000000c

CONTEXT: f10b88c4 -- (.cxr fffffffff10b88c4)
eax=884d2d01 ebx=0000000c ecx=00000000 edx=80010031 esi=8851ef60 edi=bc3846d4
eip=bf802671 esp=f10b8d3c ebp=f10b8d70 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
driver!AcquireSemaphoreShared+0x4:
bf802671 90 nop
Resetting default scope

WRITE_ADDRESS: 0000000c

EXCEPTION_DOESNOT_MATCH_CODE: This indicates a hardware error. Instruction at bf802671 does not read/write to 0000000c
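
The idea behind this check can be sketched in a few lines: if the exception record reports a read or write address but the instruction at the faulting IP has no memory operand, the two do not match. The sketch below is a deliberately tiny heuristic, not WinDbg's actual logic; the function name and the opcode table (limited to the two one-byte opcodes that appear in this post) are assumptions for illustration.

```python
# One-byte x86 opcodes without a memory operand. Deliberately tiny:
# only the two opcodes that appear in this post's dumps.
NO_MEMORY_OPERAND = {0x90: "nop", 0xCC: "int 3"}


def access_mismatch(opcode_bytes: bytes, reported_address) -> bool:
    """True when an access address is reported for an instruction
    that cannot read or write memory (hypothetical helper)."""
    if reported_address is None or not opcode_bytes:
        return False
    return opcode_bytes[0] in NO_MEMORY_OPERAND


# bf802671 90 nop, "Attempt to write to address 0000000c"
print(access_mismatch(bytes([0x90]), 0x0000000C))
```

A real check would need a full instruction decoder; this only shows why a nop that "writes to 0000000c" is suspicious.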

Code mismatch can also happen in user mode, but in my experience it usually results from an improper Hooked Function or similar corruption:

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 7c848768 (ntdll!_LdrpInitialize+0x00000184)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000001
NumberParameters: 0

DEFAULT_BUCKET_ID: CODE_ADDRESS_MISMATCH

WRITE_ADDRESS: f774f120

FAULTING_IP:
ntdll!_LdrpInitialize+184
7c848768 cc int 3

EXCEPTION_DOESNOT_MATCH_CODE: This indicates a hardware error. Instruction at 7c848768 does not read/write to f774f120

STACK_TEXT:
0012fd14 7c8284c5 0012fd28 7c800000 00000000 ntdll!_LdrpInitialize+0x184
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25

In such cases EIP might point to the middle of the expected instruction (Wild Code):

FAULTING_IP:
+59c3659
059c3659 86990508f09b xchg bl,byte ptr [ecx-640FF7FBh]

Here is an example of a real hardware error (note the concatenated error code for bugcheck 0x9C):

MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters;
    x86 Processors
        If the processor has ONLY MCE feature available (For example Intel
        Pentium), the parameters are:
        1 - Low 32 bits of P5_MC_TYPE MSR
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of P5_MC_ADDR MSR
        4 - Low 32 bits of P5_MC_ADDR MSR
        If the processor also has MCA feature available (For example Intel
        Pentium Pro), the parameters are:
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
    IA64 Processors
        1 - Bugcheck Type
            1 - MCA_ASSERT
            2 - MCA_GET_STATEINFO
                SAL returned an error for SAL_GET_STATEINFO while processing MCA.
            3 - MCA_CLEAR_STATEINFO
                SAL returned an error for SAL_CLEAR_STATEINFO while processing MCA.
            4 - MCA_FATAL
                FW reported a fatal MCA.
            5 - MCA_NONFATAL
                SAL reported a recoverable MCA and we don't support currently
                support recovery or SAL generated an MCA and then couldn't
                produce an error record.
            0xB - INIT_ASSERT
            0xC - INIT_GET_STATEINFO
                  SAL returned an error for SAL_GET_STATEINFO while processing INIT event.
            0xD - INIT_CLEAR_STATEINFO
                  SAL returned an error for SAL_CLEAR_STATEINFO while processing INIT event.
            0xE - INIT_FATAL
                  Not used.
        2 - Address of log
        3 - Size of log
        4 - Error code in the case of x_GET_STATEINFO or x_CLEAR_STATEINFO
    AMD64 Processors
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
Arguments:
Arg1: 00000000
Arg2: 808a07a0
Arg3: be000300
Arg4: 1008081f

Debugging Details:
------------------

NOTE: This is a hardware error. This error was reported by the CPU via Interrupt 18. This analysis will provide more information about the specific error. Please contact the manufacturer for additional information about this error and troubleshooting assistance.

This error is documented in the following publication:

- IA-32 Intel(r) Architecture Software Developer's Manual Volume 3: System Programming Guide

Bit Mask:

    MA                           Model Specific       MCA
 O  ID      Other Information      Error Code     Error Code
VV  SDP ___________|____________ _______|_______ _______|______
AEUECRC|                        |               |              |
LRCNVVC|                        |               |              |
^^^^^^^|                        |               |              |
   6         5         4         3         2         1
3210987654321098765432109876543210987654321098765432109876543210
----------------------------------------------------------------
1011111000000000000000110000000000010000000010000000100000011111

VAL - MCi_STATUS register is valid
    Indicates that the information contained within the IA32_MCi_STATUS register is valid. When this flag is set, the processor follows the rules given for the OVER flag in the IA32_MCi_STATUS register when overwriting previously valid entries. The processor sets the VAL flag and software is responsible for clearing it.

UC - Error Uncorrected
    Indicates that the processor did not or was not able to correct the error condition. When clear, this flag indicates that the processor was able to correct the error condition.

EN - Error Enabled
    Indicates that the error was enabled by the associated EEj bit of the IA32_MCi_CTL register.

MISCV - IA32_MCi_MISC Register Valid
    Indicates that the IA32_MCi_MISC register contains additional information regarding the error. When clear, this flag indicates that the IA32_MCi_MISC register is either not implemented or does not contain additional information regarding the error.

ADDRV - IA32_MCi_ADDR register valid
    Indicates that the IA32_MCi_ADDR register contains the address where the error occurred.

PCC - Processor Context Corrupt
    Indicates that the state of the processor might have been corrupted by the error condition detected and that reliable restarting of the processor may not be possible.

BUSCONNERR - Bus and Interconnect Error BUS{LL}_{PP}_{RRRR}_{II}_{T}_err
    These errors match the format 0000 1PPT RRRR IILL

Concatenated Error Code:
--------------------------
_VAL_UC_EN_MISCV_ADDRV_PCC_BUSCONNERR_1F

This error code can be reported back to the manufacturer. They may be able to provide additional information based upon this error. All questions regarding STOP 0x9C should be directed to the hardware manufacturer.

BUGCHECK_STR: 0x9C_IA32_GenuineIntel

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: Idle

CURRENT_IRQL: 2

LAST_CONTROL_TRANSFER: from 80a7fbd8 to 8087b6be

STACK_TEXT:
f773d280 80a7fbd8 0000009c 00000000 f773d2b0 nt!KeBugCheckEx+0x1b
f773d3b4 80a7786f f7737fe0 00000000 00000000 hal!HalpMcaExceptionHandler+0x11e
f773d3b4 f75a9ca2 f7737fe0 00000000 00000000 hal!HalpMcaExceptionHandlerWrapper+0x77
f78c6d50 8083abf2 00000000 0000000e 00000000 intelppm!AcpiC1Idle+0x12
f78c6d54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0xa
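
To see where the concatenated error code comes from, we can rebuild the 64-bit MCi_STATUS value from Arg3 (high 32 bits) and Arg4 (low 32 bits) of the MACHINE_CHECK_EXCEPTION output above and test the flag bits by hand. The sketch below is a minimal illustration: the function name and dictionary keys are hypothetical, the flag positions (bits 63 down to 57) are read off the bit mask, and the bus field split follows the 0000 1PPT RRRR IILL format quoted in the output.

```python
# Flag bits of MCi_STATUS, highest first, as named in the WinDbg output.
# OVER (bit 62) is taken from the header row of the bit mask.
FLAGS = [("VAL", 63), ("OVER", 62), ("UC", 61), ("EN", 60),
         ("MISCV", 59), ("ADDRV", 58), ("PCC", 57)]


def decode_mci_status(high: int, low: int) -> dict:
    """Split MCi_STATUS into flags and error codes (hypothetical helper)."""
    status = (high << 32) | low
    mca = status & 0xFFFF
    result = {
        "status": status,
        "flags": [name for name, bit in FLAGS if (status >> bit) & 1],
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca,
    }
    # Bus and interconnect errors match 0000 1PPT RRRR IILL.
    if (mca & 0xF800) == 0x0800:
        result["bus"] = {
            "PP": (mca >> 9) & 0x3,
            "T": (mca >> 8) & 0x1,
            "RRRR": (mca >> 4) & 0xF,
            "II": (mca >> 2) & 0x3,
            "LL": mca & 0x3,
        }
    return result


decoded = decode_mci_status(0xBE000300, 0x1008081F)
print("_" + "_".join(decoded["flags"]))   # _VAL_UC_EN_MISCV_ADDRV_PCC
print(hex(decoded["mca_code"]), decoded.get("bus"))
```

The flag list reproduces the _VAL_UC_EN_MISCV_ADDRV_PCC prefix, and the low byte of the MCA error code (0x1F) matches the trailing 1F of the concatenated code. The meaning of the individual PP, T, RRRR, II and LL values should be looked up in the Intel manual cited in the output.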

- Dmitry Vostokov @ DumpAnalysis.org -

This entry was posted on Thursday, April 3rd, 2008 at 5:16 pm and is filed under Bugchecks Depicted, Crash Dump Analysis, Crash Dump Patterns, Hardware.

7 Responses to “Crash Dump Analysis Patterns (Part 57)”

Dmitry Vostokov Says:
March 15th, 2010 at 12:15 am
Another sign of a possible hardware error: frequent multiple unrelated bugchecks and/or bugchecks in memory dumps with valid instructions at the faulting IP. Beware also of a misaligned IP, which can look like a valid instruction.
Crash Dump Analysis » Blog Archive » Fault context, wild code and hardware error: pattern cooperation Says:
March 16th, 2010 at 10:38 pm
[…] Most fault IPs were showing signs of Wild Code pattern and that most probably implicated Hardware Error (Looks like WinDbg suggests that MISALIGNED_IP implicates hardware). Here is the listing of […]
Crash Dump Analysis » Blog Archive » Crash Dump Analysis AntiPatterns (Part 14) Says:
June 4th, 2010 at 11:43 pm
[…] wouldn’t be so quick. Check Hardware Error pattern post and comments there. So let’s de-analyze the analysis. “c0000005 is Access […]

Dmitry Vostokov Says:
January 22nd, 2013 at 11:48 pm

Another example is this:

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: fffffa8004b46748, Address of the WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000000000, Low order 32-bits of the MCi_STATUS value.

1: kd> dt -r _WHEA_ERROR_RECORD fffffa8004b46748
hal!_WHEA_ERROR_RECORD
   +0x000 Header           : _WHEA_ERROR_RECORD_HEADER
      +0x000 Signature        : 0x52455043
      +0x004 Revision         : _WHEA_REVISION
         +0x000 MinorRevision    : 0x10 ''
         +0x001 MajorRevision    : 0x2 ''
         +0x000 AsUSHORT         : 0x210
      +0x006 SignatureEnd     : 0xffffffff
      +0x00a SectionCount     : 3
      +0x00c Severity         : 1 ( WheaErrSevFatal )
      +0x010 ValidBits        : _WHEA_ERROR_RECORD_HEADER_VALIDBITS
         +0x000 PlatformId       : 0y0
         +0x000 Timestamp        : 0y1
         +0x000 PartitionId      : 0y0
         +0x000 Reserved         : 0y00000000000000000000000000000 (0)
         +0x000 AsULONG          : 2
      +0x014 Length           : 0x3a0
      +0x018 Timestamp        : _WHEA_TIMESTAMP
         +0x000 Seconds          : 0y00100010 (0x22)
         +0x000 Minutes          : 0y00101011 (0x2b)
         +0x000 Hours            : 0y00001100 (0xc)
         +0x000 Precise          : 0y0
         +0x000 Reserved         : 0y0000000 (0)
         +0x000 Day              : 0y00010110 (0x16)
         +0x000 Month            : 0y00000100 (0x4)
         +0x000 Year             : 0y00001010 (0xa)
         +0x000 Century          : 0y00010100 (0x14)
         +0x000 AsLARGE_INTEGER  : _LARGE_INTEGER 0x140a0416`000c2b22
      +0x020 PlatformId       : _GUID {00000000-0000-0000-0000-000000000000}
         +0x000 Data1            : 0
         +0x004 Data2            : 0
         +0x006 Data3            : 0
         +0x008 Data4            : [8]  ""
      +0x030 PartitionId      : _GUID {00000000-0000-0000-0000-000000000000}
         +0x000 Data1            : 0
         +0x004 Data2            : 0
         +0x006 Data3            : 0
         +0x008 Data4            : [8]  ""
      +0x040 CreatorId        : _GUID {cf07c4bd-b789-4e18-b3c4-1f732cb57131}
         +0x000 Data1            : 0xcf07c4bd
         +0x004 Data2            : 0xb789
         +0x006 Data3            : 0x4e18
         +0x008 Data4            : [8]  "???"
      +0x050 NotifyType       : _GUID {e8f56ffe-919c-4cc5-ba88-65abe14913bb}
         +0x000 Data1            : 0xe8f56ffe
         +0x004 Data2            : 0x919c
         +0x006 Data3            : 0x4cc5
         +0x008 Data4            : [8]  "???"
      +0x060 RecordId         : 0x01cae219`673474d3
      +0x068 Flags            : _WHEA_ERROR_RECORD_HEADER_FLAGS
         +0x000 Recovered        : 0y0
         +0x000 PreviousError    : 0y1
         +0x000 Simulated        : 0y0
         +0x000 Reserved         : 0y00000000000000000000000000000 (0)
         +0x000 AsULONG          : 2
      +0x06c PersistenceInfo  : _WHEA_PERSISTENCE_INFO
         +0x000 Signature        : 0y0000000000000000 (0)
         +0x000 Length           : 0y000000000000000000000000 (0)
         +0x000 Identifier       : 0y0000000000000000 (0)
         +0x000 Attributes       : 0y00
         +0x000 DoNotLog         : 0y0
         +0x000 Reserved         : 0y00000 (0)
         +0x000 AsULONGLONG      : 0
      +0x074 Reserved         : [12]  ""
   +0x080 SectionDescriptor : [1] _WHEA_ERROR_RECORD_SECTION_DESCRIPTOR
      +0x000 SectionOffset    : 0x158
      +0x004 SectionLength    : 0xc0
      +0x008 Revision         : _WHEA_REVISION
         +0x000 MinorRevision    : 0x1 ''
         +0x001 MajorRevision    : 0x2 ''
         +0x000 AsUSHORT         : 0x201
      +0x00a ValidBits        : _WHEA_ERROR_RECORD_SECTION_DESCRIPTOR_VALIDBITS
         +0x000 FRUId            : 0y0
         +0x000 FRUText          : 0y0
         +0x000 Reserved         : 0y000000 (0)
         +0x000 AsUCHAR          : 0 ''
      +0x00b Reserved         : 0 ''
      +0x00c Flags            : _WHEA_ERROR_RECORD_SECTION_DESCRIPTOR_FLAGS
         +0x000 Primary          : 0y1
         +0x000 ContainmentWarning : 0y0
         +0x000 Reset            : 0y0
         +0x000 ThresholdExceeded : 0y0
         +0x000 ResourceNotAvailable : 0y0
         +0x000 LatentError      : 0y0
         +0x000 Reserved         : 0y00000000000000000000000000 (0)
         +0x000 AsULONG          : 1
      +0x010 SectionType      : _GUID {9876ccad-47b4-4bdb-b65e-16f193c4f3db}
         +0x000 Data1            : 0x9876ccad
         +0x004 Data2            : 0x47b4
         +0x006 Data3            : 0x4bdb
         +0x008 Data4            : [8]  "???"
      +0x020 FRUId            : _GUID {00000000-0000-0000-0000-000000000000}
         +0x000 Data1            : 0
         +0x004 Data2            : 0
         +0x006 Data3            : 0
         +0x008 Data4            : [8]  ""
      +0x030 SectionSeverity  : 1 ( WheaErrSevFatal )
      +0x034 FRUText          : [20]  ""
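
Two header fields in the dt output above are easier to read once decoded. The sketch below assumes the Signature is four little-endian ASCII bytes and that the _WHEA_TIMESTAMP fields are plain binary bytes in the order dt lists them (Seconds in the lowest byte, Century in the highest); the function names are hypothetical.

```python
import struct
from datetime import datetime


def decode_signature(value: int) -> str:
    """Interpret the 32-bit Signature field as four little-endian ASCII bytes."""
    return struct.pack("<I", value).decode("ascii")


def decode_whea_timestamp(value: int) -> datetime:
    """Unpack the byte-wide fields of _WHEA_TIMESTAMP.

    The layout is read off the dt output; bits 24-31 (Precise, Reserved)
    are ignored here.
    """
    seconds = value & 0xFF
    minutes = (value >> 8) & 0xFF
    hours = (value >> 16) & 0xFF
    day = (value >> 32) & 0xFF
    month = (value >> 40) & 0xFF
    year = (value >> 48) & 0xFF
    century = (value >> 56) & 0xFF
    return datetime(century * 100 + year, month, day, hours, minutes, seconds)


print(decode_signature(0x52455043))               # CPER
print(decode_whea_timestamp(0x140A0416000C2B22))  # 2010-04-22 12:43:34
```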

Dmitry Vostokov Says:
February 18th, 2013 at 10:30 pm
KERNEL_STACK_INPAGE_ERROR (77)
The requested page of kernel data could not be read in. Caused by
bad block in paging file or disk controller error.
In the case when the first arguments is 0 or 1, the stack signature
in the kernel stack was not found. Again, bad hardware.
An I/O status of c000009c (STATUS_DEVICE_DATA_ERROR) or
C000016AL (STATUS_DISK_OPERATION_FAILED) normally indicates
the data could not be read from the disk due to a bad
block. Upon reboot autocheck will run and attempt to map out the bad
sector. If the status is C0000185 (STATUS_IO_DEVICE_ERROR) and the paging
file is on a SCSI disk device, then the cabling and termination should be
checked. See the knowledge base article on SCSI termination.
Arguments:
Arg1: 0000000000000001, (page was retrieved from disk)
Arg2: fffffa800818e870, value found in stack where signature should be
Arg3: 0000000000000000, 0
Arg4: fffff8800c6e5e80, address of signature on kernel stack

2: kd> k
Child-SP RetAddr Call Site
fffff880`0371da18 fffff800`03110b01 nt!KeBugCheckEx
fffff880`0371da20 fffff800`030c8c54 nt! ?? ::FNODOBFM::`string'+0x51e31
fffff880`0371db30 fffff800`030c8bef nt!MmInPageKernelStack+0x40
fffff880`0371db90 fffff800`030c8928 nt!KiInSwapKernelStacks+0x1f
fffff880`0371dbc0 fffff800`0332be5a nt!KeSwapProcessOrStack+0x84
fffff880`0371dc00 fffff800`03085d26 nt!PspSystemThreadStartup+0x5a
fffff880`0371dc40 00000000`00000000 nt!KiStartSystemThread+0x16
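
The rules in the bugcheck description above can be written down as a small lookup. This is only a sketch of that text: the function name is hypothetical, and because the description does not say which argument carries the I/O status, the status is passed in separately by the caller.

```python
# I/O status values and hints quoted from the bugcheck description above.
IO_STATUS_HINTS = {
    0xC000009C: ("STATUS_DEVICE_DATA_ERROR", "bad block on disk"),
    0xC000016A: ("STATUS_DISK_OPERATION_FAILED", "bad block on disk"),
    0xC0000185: ("STATUS_IO_DEVICE_ERROR",
                 "check cabling and termination if the paging file is on a SCSI disk"),
}


def explain_0x77(arg1: int, io_status=None) -> str:
    """Return the hint that the bugcheck text gives for these values (hypothetical helper)."""
    if arg1 in (0, 1):
        return "stack signature not found in the kernel stack: suspect hardware"
    if io_status in IO_STATUS_HINTS:
        name, hint = IO_STATUS_HINTS[io_status]
        return f"{name}: {hint}"
    return "no hint in the bugcheck description for these values"


# Arg1 is 1 in the dump above.
print(explain_0x77(0x1))
```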
Dmitry Vostokov Says:
October 4th, 2016 at 5:28 pm
For WHEA_UNCORRECTABLE_ERROR (124) we have additional WinDbg commands !whea, !errrec, and !errpkt:

2: kd> !whea
Error Source Table @ fffff8004bbd4a90
4 Error Sources
Error Source 0 @ ffffe00014376bd0
Notify Type : {14374010-e000-ffff-984a-bd4b00f8ffff}
Type : 0x0 (MCE)
Error Count : 1
Record Count : 4
Record Length : 728
Error Records : wrapper @ ffffe000110e0000 record @ ffffe000110e0028
: wrapper @ ffffe000110e0728 record @ ffffe000110e0750
: wrapper @ ffffe000110e0e50 record @ ffffe000110e0e78
: wrapper @ ffffe000110e1578 record @ ffffe000110e15a0
Descriptor : @ ffffe00014376c29
Length : 3cc
Max Raw Data Length : 141
Num Records To Preallocate : 4
Max Sections Per Record : 4
Error Source ID : 0
Flags : 00000000
[…]

2: kd> !errrec ffffe000110e0028
============================================
Common Platform Error Record @ ffffe000110e0028
--------------------------------------------
Record Id : 01d21a1a7e5fffd1
Severity : Fatal (1)
Length : 928
Creator : Microsoft
Notify Type : Machine Check Exception
Timestamp : 9/30/2016 9:05:50 (UTC)
Flags : 0x00000000

============================================
Section 0 : Processor Generic
--------------------------------------------
Descriptor @ ffffe000110e00a8
Section @ ffffe000110e0180
Offset : 344
Length : 192
Flags : 0x00000001 Primary
Severity : Fatal

Proc. Type : x86/x64
Instr. Set : x64
Error Type : Micro-Architectural Error
Flags : 0x00
CPU Version : 0x00000000000306a9
Processor ID : 0x0000000000000002

============================================
Section 1 : x86/x64 Processor Specific
--------------------------------------------
Descriptor @ ffffe000110e00f0
Section @ ffffe000110e0240
Offset : 536
Length : 128
Flags : 0x00000000
Severity : Fatal

Local APIC Id : 0x0000000000000002
CPU Id : a9 06 03 00 00 08 10 02 - bf e3 ba 7f ff fb eb bf
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0 @ ffffe000110e0240

============================================
Section 2 : x86/x64 MCA
--------------------------------------------
Descriptor @ ffffe000110e0138
Section @ ffffe000110e02c0
Offset : 664
Length : 264
Flags : 0x00000000
Severity : Fatal

Error : Internal unclassified (Proc 2 Bank 4)
Status : 0xb200000000100402
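
The Status value reported by !errrec above can be cross-checked against the "Internal unclassified" label. The sketch below assumes the encoding 0000 01xx xxxx xxxx for that class in the low 16 bits of MCi_STATUS, as I recall it from the Intel manual; that pattern is not stated in this post, so verify it against the manual before relying on it. Both function names are hypothetical.

```python
def split_status(status: int):
    """Return (mca_code, model_specific_code) from a 64-bit MCi_STATUS value."""
    return status & 0xFFFF, (status >> 16) & 0xFFFF


def is_internal_unclassified(mca_code: int) -> bool:
    # Assumed encoding 0000 01xx xxxx xxxx; not stated in this post.
    return (mca_code & 0xFC00) == 0x0400


mca_code, model_code = split_status(0xB200000000100402)
print(hex(mca_code), hex(model_code), is_internal_unclassified(mca_code))
```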
Dmitry Vostokov Says:
April 23rd, 2018 at 7:49 pm
Recently we observed internal errors in the Visual C++ compiler followed by memory management bugchecks a few seconds later.
