Troubleshooting as debugging
This post is motivated by the TRAFFIC steps introduced by Andreas Zeller in his book “Why Programs Fail”. The book is wonderful: it gives practical debugging skills a coherent and solid systematic foundation.
However, these steps are for fixing defects in code, which is the traditional view of the software debugging process. Based on an analogy with systems theories, where we have different levels of abstraction such as psychology, biology, chemistry and physics, I would say that debugging starts when you have a failure at the system level.
If we compare systems to applications and troubleshooting to source code debugging, the question we ask at the higher level is “Who caused the product to fail?”, which also has a business and political flavor. Therefore I propose a different acronym: VERSION. If you always try to fix system problems at the code level, you will get huge “traffic” in every sense, but if you troubleshoot them first, you get a different system / subsystem / component version and get your problem solved faster. This is why we have technical support departments in organizations.
There are some parallels between TRAFFIC and VERSION steps:
TRAFFIC                     VERSION
Track                       View the problem
Reproduce                   Environment/repro steps
Automate (and simplify)     Relevant description
Find origins                Subsystem/component identification
Focus                       Identify the origin (subsystem/component)
Isolate (defect in code)    Obtain the solution (replace/eliminate subsystem/component)
Correct (defect in code)    New case study (document, postmortem analysis)
Troubleshooting doesn’t eliminate the need to look at source code. In many cases a support engineer has to be proficient in code reading to be able to map from traces to source code. This helps in component identification, especially if your product has extensive tracing facilities. I have started developing “Code Reading” training targeted at Windows environments and will post some presentations soon. The first one will be available tomorrow, so stay tuned.
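As a rough illustration of how traces can support component identification, here is a minimal Python sketch. The trace line format, the component names and the function name suspect_components are all assumptions made up for this example; they are not taken from any real product or tracing tool. The sketch simply counts error-level trace lines per component, so the noisiest component can be examined first.

```python
import re
from collections import Counter

# Hypothetical trace line format (an assumption for this sketch):
#   "<time> <LEVEL> [<component>] <message>"
TRACE_LINE = re.compile(
    r"^\S+\s+(?P<level>[A-Z]+)\s+\[(?P<component>[^\]]+)\]\s+(?P<message>.*)$"
)


def suspect_components(trace_lines):
    """Count ERROR-level trace lines per component, most frequent first."""
    errors = Counter()
    for line in trace_lines:
        match = TRACE_LINE.match(line)
        # Lines that do not follow the assumed format are skipped.
        if match and match.group("level") == "ERROR":
            errors[match.group("component")] += 1
    return errors.most_common()


# Made-up sample trace; the component names are purely illustrative.
sample_trace = [
    "10:00:01 INFO [SessionManager] session created",
    "10:00:02 ERROR [PrintDriver] failed to load driver",
    "10:00:03 ERROR [PrintDriver] retry failed",
    "10:00:04 ERROR [SessionManager] logoff timed out",
    "not a trace line",
]

print(suspect_components(sample_trace))
# prints [('PrintDriver', 2), ('SessionManager', 1)]
```

A count like this only points at a candidate subsystem or component. Confirming it as the origin still takes the code reading described above, that is, mapping the individual trace messages back to the source lines that emit them.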
- Dmitry Vostokov @ DumpAnalysis.org -