Notes about NMI_HARDWARE_FAILURE

WinDbg help states that the NMI_HARDWARE_FAILURE bugcheck (0x80) indicates a hardware fault. This description can easily lead to the conclusion that a kernel or complete crash dump you just got from your customer isn’t worth examining. But a hardware malfunction is not always the cause, especially if your customer mentions that their system was hanging and they forced a manual dump. Here I would advise checking whether they have special hardware for debugging purposes, for example, a card or an integrated iLO (Integrated Lights-Out) chip for remote server administration. Both can generate an NMI (Non-Maskable Interrupt) on demand and therefore bugcheck the system. If this is the case, then it is worth examining their dump to see why the system was hanging.

- Dmitry Vostokov -

5 Responses to “Notes about NMI_HARDWARE_FAILURE”

  1. !analyze -v : Notes about NMI_HARDWARE_FAILURE Says:

    […] This document is a translation of the http://www.dumpanalysis.org/blog blog, and the original material may change without notice. This material comes with no legal warranty, and you may visit the original blog to leave feedback. (http://www.dumpanalysis.org/blog/index.php/2006/12/23/notes-about-nmi_hardware_failure/) […]

  2. Crash Dump Analysis » Blog Archive » Crash Dump Analysis Patterns (Part 13f) Says:

    […] in running threads, the first one is a normal memory access fault (blue) and the other is forced NMI bugcheck to save a memory dump […]

  3. Crash Dump Analysis » Blog Archive » Reflecting on 2008 (Part 1) Says:

    […] exception information can be accessed via .ecxr. windbg tips 0x80070026 dmitri windbg gdb teb nmi_hardware_failure system_thread_exception_not_handled system_thread_exception_not_handled (7e) windbg command pool allocations have failed receivelotsacalls trap frame vista dr watson bios […]

  4. Tim Says:

    Can you shed any light on this? I cannot find out the meaning of parameter 1 for a “BugCheck 80”, but am hoping it can give us a clue.

    BugCheck 80, {4f4454, 0, 0, 0}

    *** ERROR: Module load completed but symbols could not be loaded for intelppm.sys
    Probably caused by : intelppm.sys ( intelppm+3b42 )

    Followup: MachineOwner
    ---------

    nt!RtlpBreakWithStatusInstruction:
    fffff800`01026c80 cc int 3

    Regards,

    Tim

  5. Dmitry Vostokov Says:

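    It looks like the ASCII string “TDO” when the bytes are read in little-endian memory order (.formats prints them most significant byte first, as ODT):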

    1: kd> .formats 4f4454
    Evaluate expression:
    Hex: 00000000`004f4454
    Decimal: 5194836
    Octal: 0000000000000023642124
    Binary: 00000000 00000000 00000000 00000000 00000000 01001111 01000100 01010100
    Chars: .....ODT
    Time: Mon Mar 02 03:00:36 1970
    Float: low 7.27952e-039 high 0
    Double: 2.56659e-317
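
The same decoding can be reproduced outside the debugger. The following is a minimal Python sketch, not part of WinDbg; the helper name `bugcheck_param_chars` is hypothetical, and it assumes a 64-bit parameter whose non-printable bytes are shown as dots, mirroring the Chars line of the .formats output above.

```python
def bugcheck_param_chars(value: int, width: int = 8) -> tuple[str, str]:
    """Return (display_order, memory_order) ASCII views of a bugcheck parameter.

    Hypothetical helper for illustration only.
    display_order: most significant byte first, like the .formats Chars line.
    memory_order: least significant byte first, as laid out on a
    little-endian (x86/x64) machine.
    """
    raw = value.to_bytes(width, byteorder="big")
    # Replace non-printable bytes with '.', as the debugger does.
    display = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return display, display[::-1]


display, memory = bugcheck_param_chars(0x4F4454)
print(display)  # .....ODT
print(memory)   # TDO.....
```

Reading the parameter both ways is a cheap first check when a bugcheck argument is undocumented: a value that decodes to printable characters is often a tag or signature rather than an address or status code.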
