Archive for the ‘First Fault Problem Solving’ Category

Patterns of Software Diagnostics (Part 1)

Saturday, June 9th, 2012

While preparing a seminar on Software Diagnostics I made a lot of notes and realized that a system of patterns, corresponding vocabulary and pattern language are needed for this discipline. Here patterns are supposed to be broad in nature and be different from patterns for specific artifacts such as memory dumps and software traces. So the first pattern addresses a diagnostic encounter with a First Fault in comparison to subsequent faults where the problem becomes noticeable and diagnostic resources are allocated. Such faults should not be dismissed. Dan Skwire is a passionate advocate of first fault software problem solving and wrote a book:

First Fault Software Problem Solving: A Guide for Engineers, Managers and Users

The following paper proposes distributed control flow reconstruction for first fault diagnosis:

TraceBack: First Fault Diagnosis by Reconstruction of Distributed Control Flow

Memory Dump Analysis Services uses patterns of abnormal software behavior for its first fault diagnostics that doesn’t require any special instrumentation:

Join Debugging Diagnostics Revolution!

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Just In Time Crash Analysis Report (JIT CAR)

Thursday, April 21st, 2011

Imagine a pattern-driven crash analysis report (car) when you need it: at the very moment of a crash, just in time! And the car drives you to a problem resolution. Imagine also a periodic pattern-driven just-in-time memory space analysis (JIT MSA) that provides you instant intelligent reports on what’s going on inside memory while your application, service or system is running! This is a forthcoming optional client side part of CARE (Crash Analysis Report Environment) which is being developed by Memory Dump Analysis Services engineering team under the leadership of Alexey Golikov. Combined with generative debugging techniques both client and server parts form a complete unique enterprise crash and hang analysis solution suitable for development and production environments. Stay tuned for further exciting updates.

PS. The car drives on a road to the first fault software problem solving.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

The New School of Debugging

Saturday, January 1st, 2011

With the new year starts the new initiative to integrate traditional multidisciplinary debugging approaches and methodologies with multiplatform pattern-driven software problem solving, unified debugging patterns, best practices in memory dump analysis and software tracing, computer security, economics, and the new emerging trends I’m going to write about during this year.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Debugging in 2021: Trends for the Next Decade (Part 1)

Friday, December 17th, 2010

As the new decade is approaching (2011-2020) we would like to make a few previews and predictions:

- Increased complexity of software will bring more methods from biological, social sciences and humanities in addition to existing methods of automated debugging and computer science techniques

- Focus on first fault software problem solving (when aspect)

- Focus on pattern-driven software problem solving (how aspect)

- Fusion of debugging and malware analysis into a unified structural and behavioral pattern framework

- Visual debugging, memory and software trace visualization techniques

- Software maintenance certification

- Focus on domain-driven troubleshooting and debugging tools as a service (debugware TaaS)

- Focus on security issues related to memory dumps and software traces

- New scripting languages and programming language extensions for debugging

- The maturation of the science of memory snapshots and software traces (memoretics)

Imagining is not not limited to the above and more to come and explain in the forthcoming parts.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Review of First Fault Software Problem Solving Book

Sunday, May 2nd, 2010

c’t – Magazin für Computertechnik has published a review of First Fault Software Problem Solving book: 

http://www.heise.de/ct/inhalt/2010/08/192/ (in German)

Fabian Röken kindly translated it into English:

No single large software package comes without errors. It seems that customers simply accept this, patiently waiting and hoping for patches or updates. Skwire sticks up for a more target-aimed approach: one will never get a faultless software, but it would already be a great improvement if flaws were already solved on their first occurrence (”first fault”) and not only after a long analysis (”second fault”).

The advantages are actually obvious. However, a corresponding stringent system architecture, as common on mainframes such as IBM’s z/OS, did not become prevalent in the PC market.

Skwire outlines the types of errors and strategies to resolve them in all details. His 40 years of experience, such as at IBM, shimmers through again and again. He puts emphasis on making sure that the reader understands the terminology he is using: “What is a problem in the first place?”, “What is a service point?” - in some cases he also explains specific metrics such as the “serviceability rating”.

His tool classification includes teaching tips, e.g. regarding the structure of a protocol in case of errors; or for tracking the important information how often an error must occur before a solution has to be approached. His suggestions equally address developers, designers, testers, managers - and the end user. In his last chapter he presents and reviews commercial tools in the first fault and second fault environment.

Skwire addresses a topic which is unfortunately very much neglected, and this alone already makes it worth enough to take a look at his book (***). Short quotations and humorous drawings relax the technical topic. If you are looking for an overview then you will be fine with this book. However, if you are a software developer looking for source code samples then you will search in vain. Skwire has released the book under the print-on-demand process. You will find it on Amazon, for example.

(Tobias Engler/fm)

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

FFSPS Book is No. 1 Microsoft OS English Book Bestseller in Germany

Friday, April 23rd, 2010

Source: Amazon DE (at the time of this writing)

 

It is also the top best seller among OpenTask titles: http://www.opentask.com/top-bestseller-april-2010

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

First Fault Software Problem Solving Book

Wednesday, December 9th, 2009

I’m very pleased to announce that Dan Skwire’s unique book has been published by OpenTask:

First Fault Software Problem Solving: A Guide for Engineers, Managers and Users

 

- Dmitry Vostokov @ DumpAnalysis.org -

Forthcoming Books in Q4, 2009

Thursday, September 17th, 2009

I plan the following titles to be published in Q4:

- Debugged! MZ/PE: Software Tracing, September, 2009 (ISBN: 978-1906717797)
- Windows Debugging Notebook: Essential Concepts, WinDbg Commands and Tools (ISBN: 978-1906717001)
- Memory Dump Analysis Anthology, Volume 3 (ISBN: 978-1906717438 and 978-1906717445)
- Memory Dump Analysis Anthology: Color Supplement for Volumes 1-3 (ISBN: 978-1906717698)
- First Fault Software Problem Solving: A Guide for Engineers, Managers and Users (ISBN: 978-1906717421) by Dan Skwire
- Crash Dump Analysis for System Administrators and Support Engineers (Windows Edition)  (ISBN: 978-1906717025) 

The title of the latter book was slightly changed. After some time we realized that the same material is appropriate for support engineers as well.

- Dmitry Vostokov @ DumpAnalysis.org -

The Importance of First Fault

Thursday, November 27th, 2008

I’ve been thinking through the so called First Faults after Dan Skwire, a veteran in mission-critical computer system  problem resolution, problem prevention, and system recovery, organized a group on LinkedIn for first fault problem solving activity. He also has a website:

http://www.firstfaultproblemresolution.com/ 

From my software technical support experience first fault problem resolution is very important on Windows platforms, especially in enterprise terminal service and virtualized environments where hundreds of users can be hosted on just one server. Therefore, proper tools, processes and checklists need to be set up and established for effective and efficient troubleshooting and problem resolution from both engineering and customer relationship managing perspectives. Here crash and hang dump analysis helps immensely, especially memory analysis patterns and fault databases. More on this later with specific examples. I’m also working currently on incorporating first fault problem resolution into VERSION troubleshooting steps and PARTS troubleshooting methodology.

- Dmitry Vostokov @ DumpAnalysis.org -