The Importance of First Fault
Thursday, November 27th, 2008I’ve been thinking through the so called First Faults after Dan Skwire, a veteran in mission-critical computer system problem resolution, problem prevention, and system recovery, organized a group on LinkedIn for first fault problem solving activity. He also has a website:
http://www.firstfaultproblemresolution.com/
From my software technical support experience first fault problem resolution is very important on Windows platforms, especially in enterprise terminal service and virtualized environments where hundreds of users can be hosted on just one server. Therefore, proper tools, processes and checklists need to be set up and established for effective and efficient troubleshooting and problem resolution from both engineering and customer relationship managing perspectives. Here crash and hang dump analysis helps immensely, especially memory analysis patterns and fault databases. More on this later with specific examples. I’m also working currently on incorporating first fault problem resolution into VERSION troubleshooting steps and PARTS troubleshooting methodology.
- Dmitry Vostokov @ DumpAnalysis.org -