Archive for the ‘Troubleshooting Methodology’ Category

Forthcoming Webinar: Fundamentals of Complete Crash and Hang Memory Dump Analysis

Sunday, July 18th, 2010

Complete Memory Dump Analysis Logo

Memory Dump Analysis Services (DumpAnalysis.com) organizes a free webinar

Date: 18th of August 2010
Time: 21:00 (BST) 16:00 (Eastern) 13:00 (Pacific)
Duration: 90 minutes

Topics include:

- User vs. kernel vs. physical (complete) memory space
- Challenges of complete memory dump analysis
- Common WinDbg commands
- Patterns
- Common mistakes
- Fiber bundles
- Hands-on exercise: a complete memory dump analysis
- A guide to DumpAnalysis.org case studies

Prerequisites: working knowledge of basic user process and kernel memory dump analysis or live debugging using WinDbg 

The webinar link will be posted before 18th of August on DumpAnalysis.com

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Review of First Fault Software Problem Solving Book

Sunday, May 2nd, 2010

c’t – Magazin für Computertechnik has published a review of First Fault Software Problem Solving book: 

http://www.heise.de/ct/inhalt/2010/08/192/ (in German)

Fabian Röken kindly translated it into English:

No single large software package comes without errors. It seems that customers simply accept this, patiently waiting and hoping for patches or updates. Skwire sticks up for a more target-aimed approach: one will never get a faultless software, but it would already be a great improvement if flaws were already solved on their first occurrence (”first fault”) and not only after a long analysis (”second fault”).

The advantages are actually obvious. However, a corresponding stringent system architecture, as common on mainframes such as IBM’s z/OS, did not become prevalent in the PC market.

Skwire outlines the types of errors and strategies to resolve them in all details. His 40 years of experience, such as at IBM, shimmers through again and again. He puts emphasis on making sure that the reader understands the terminology he is using: “What is a problem in the first place?”, “What is a service point?” - in some cases he also explains specific metrics such as the “serviceability rating”.

His tool classification includes teaching tips, e.g. regarding the structure of a protocol in case of errors; or for tracking the important information how often an error must occur before a solution has to be approached. His suggestions equally address developers, designers, testers, managers - and the end user. In his last chapter he presents and reviews commercial tools in the first fault and second fault environment.

Skwire addresses a topic which is unfortunately very much neglected, and this alone already makes it worth enough to take a look at his book (***). Short quotations and humorous drawings relax the technical topic. If you are looking for an overview then you will be fine with this book. However, if you are a software developer looking for source code samples then you will search in vain. Skwire has released the book under the print-on-demand process. You will find it on Amazon, for example.

(Tobias Engler/fm)

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

SsOnExpert: DebugWare Patterns in Use

Tuesday, April 20th, 2010

The following tool published by Citrix follows DebugWare patterns in its overall architecture and design and was implemented by a team of engineers using RADII process:

SsOnExpert - Single Sign-On XenApp Plug-in Troubleshooting Tool

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Modern Memory Dump and Software Trace Analysis: Volumes 1-3

Sunday, April 18th, 2010

OpenTask to offer first 3 volumes of Memory Dump Analysis Anthology in one set:

The set is available exclusively from OpenTask e-Commerce web site starting from June. Individual volumes are also available from Amazon, Barnes & Noble and other bookstores worldwide.

Product information:

  • Title: Modern Memory Dump and Software Trace Analysis: Volumes 1-3
  • Author: Dmitry Vostokov
  • Language: English
  • Product Dimensions: 22.86 x 15.24
  • Paperback: 1600 pages
  • Publisher: Opentask (31 May 2010)
  • ISBN-13: 978-1-906717-99-5

Information about individual volumes:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Memory Dump and Software Trace Analysis Training and Seminars

Friday, April 9th, 2010

Plan to start providing training and seminars in my free time. If you are interested please answer these questions (you can either respond here in comments or use this form for private communication http://www.dumpanalysis.org/contact):

  • Are you interested in on-site training, prefer traveling or attending webinars?
  • Are you interested in software trace analysis as well?
  • What specific topics are you interested in?
  • What training level (beginner, intermediate, advanced) are you interested in? (please provide an example, if possible)

Additional topics of expertise that can be integrated into training include Source Code Reading and Analysis, Debugging, Windows Architecture, Device Drivers, Troubleshooting Tools Design and Implementation, Multithreading, Deep Down C and C++, x86 and x64 Assembly Language Reading.

Looking forward to your responses. Any suggestions are welcome.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

The Korean Edition of Memory Dump Analysis Anthology, Volume 1

Monday, April 5th, 2010

I’m very pleased to announce that the Korean edition is available:

The book can be found on: 

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Prescriptive Value-Added Debugging

Saturday, March 13th, 2010

This is a new methodology I’m working upon. The idea came from reading “About the Author” page in a book I got yesterday in my post:

The Nomadic Developer: Surviving and Thriving in the World of Technology Consulting

Buy from Amazon

I post a review here and on Amazon when finished reading. Just a few words now. This is the first career book I’m reading where I find pages in roman numerals useful. The page xiii itself looks like a good template (or an example) for a business-oriented CV summary. Thinking now about updating my CV book (2nd edition?):

Resume and CV: As a Book

Buy from Amazon

With fix-privet,
Dr. DebugLove

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Forthcoming Memory Dump Analysis Anthology, Volume 4

Thursday, February 11th, 2010

This is a revised, edited, cross-referenced and thematically organized volume of selected DumpAnalysis.org blog posts about crash dump analysis and debugging written in July 2009 - January 2010 for software engineers developing and maintaining products on Windows platforms, quality assurance engineers testing software on Windows platforms and technical support and escalation engineers dealing with complex software issues. The fourth volume features:

- 13 new crash dump analysis patterns
- 13 new pattern interaction case studies
- 10 new trace analysis patterns
- 6 new Debugware patterns and case study
- Workaround patterns
- Updated checklist
- Fully cross-referenced with Volume 1, Volume 2 and Volume 3
- New appendixes

Product information:

  • Title: Memory Dump Analysis Anthology, Volume 4
  • Author: Dmitry Vostokov
  • Language: English
  • Product Dimensions: 22.86 x 15.24
  • Paperback: 410 pages
  • Publisher: Opentask (30 March 2010)
  • ISBN-13: 978-1-906717-86-5
  • Hardcover: 410 pages
  • Publisher: Opentask (30 April 2010)
  • ISBN-13: 978-1-906717-87-2

Back cover features memory space art image: Internal Process Combustion.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Workaround Patterns (Part 3)

Tuesday, January 26th, 2010

What happens when Hidden Output and Frozen Process patterns don’t help with annoying popup windows? The former can’t prevent windows from reappearing afresh and the latter could block other coupled processes that might exchange window messages with our suspended process or simply use any IPC mechanism. Here Axed Code pattern can help as demonstrated below. One process was frequently and briefly showing network disconnection message box or dialog. The problem is that it was also bringing its main window into foreground disrupting work in other windows because they were loosing focus. Next time the dialog appeared we found its process ID in Task Manager and attached WinDbg to it. We wasn’t sure what dialog function to intercept so we put a general breakpoint on all “Dialog” functions for all threads:

0:000:x86> bm *Dialog*
[...]
  6: 73a8ba81 @!"MFC80!CDialog::~CDialog"
  7: 73ac25e2 @!"MFC80!CPageSetupDialog::~CPageSetupDialog"
  8: 73a94b6b @!"MFC80!CDHtmlDialog::_AfxSimpleScanf"
  9: 73a8fbe9 @!"MFC80!CFileDialog::OnTypeChange"
 10: 73a90b17 @!"MFC80!CColorDialog::GetRuntimeClass"
 11: 73a8bb4a @!"MFC80!CDialog::CreateIndirect"
[...]
360: 73a93750 @!"MFC80!CDHtmlDialog::OnNavigateComplete"
361: 73a8f1f3 @!"MFC80!CCommonDialog::OnOK"
362: 73a95d9f @!"MFC80!CDHtmlDialog::GetDropTarget"
363: 73a90266 @!"MFC80!CPrintDialog::GetDevMode"
364: 73ac1514 @!"MFC80!COleInsertDialog::COleInsertDialog"
365: 73ac27c7 @!"MFC80!COlePropertiesDialog::COlePropertiesDialog"
366: 73a75282 @!"MFC80!CWnd::UpdateDialogControls"
367: 73a7fd86 @!"MFC80!CDialogBar::SetOccDialogInfo"

0:000:x86> g
Breakpoint 314 hit
MFC80!_AfxPostInitDialog:
73a7134e 55              push    ebp

0:000:x86> kL 100
ChildEBP RetAddr  Args to Child             
0027ed2c 73a7180a MFC80!_AfxPostInitDialog
0027ed90 75628817 MFC80!_AfxActivationWndProc+0x90
0027edbc 7562898e USER32!InternalCallWinProc+0x23
0027ee34 7562c306 USER32!UserCallWinProcCheckWow+0x109
0027ee78 756375a2 USER32!SendMessageWorker+0x55b
0027ef4c 7563787a USER32!InternalCreateDialog+0xb64
0027ef70 75649b65 USER32!CreateDialogIndirectParamAorW+0x33
0027ef9c 75225192 USER32!CreateDialogParamA+0x4a
WARNING: Stack unwind information not available. Following frames may be wrong.
0027efc8 010c3bf1 DllA!WarningPopup+0×152
0027effc 73a71812 ProcessA+0×9fa1
00000000 00000000 MFC80!_AfxActivationWndProc+0×98

Now we cleared all breakpoints and put the new breakpoint on WarningPopup function:

0:000:x86> bc *

0:000:x86> bp DllA!WarningPopup

0:000:x86> g
Breakpoint 0 hit
DllA!WarningPopup:
75225040 51              push    ecx

Then we assumed that the calling convention was the default one used by C or C++ code like _cdecl and took the bold step to replace push ecx with ret instruction:

0:000:x86> a 75225040
75225040 ret
ret
75225041

0:000:x86> g
Breakpoint 0 hit
DllA!WarningPopup:
75225040 c3 ret

0:000:x86> bc *

0:000:x86> g

Result: no warning popups anymore.

I originally intended to name the pattern Patched Code but then realized that code axing can also be done at the source code level as a quick temporal fix.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Workaround Patterns (Part 2)

Monday, January 25th, 2010

Another workaround pattern for some problems is to freeze a process responsible for an annoying or excessive activity like in the case study: Debugger as a Shut Up Application. We can also use other tools for this purpose like Mark Russinovich’s PsSuspend. The suitable name for this pattern is Frozen Process.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Workaround Patterns (Part 1)

Sunday, January 24th, 2010

After fighting HTML comments in Safari and Chrome (see the case study below) I came to an idea to name and catalog workaround patterns in troubleshooting and debugging. The first one is called Hidden Output. Sometimes we can just remove message boxes reporting minor problems and generating unnecessary support calls by hiding their windows, for example, by using CtxHideEx32. A different example is what I did today when troubleshooting Amazon aStore widget HTML code. It worked well in IE8:

However, in Apple Safari and Google Chrome the widget code was visible at the top of the page:

 

After a few unsuccessful attempts to debug the problem and faced with other pressing tasks I got a flash in my mind to hide the visible code by changing its color to be the same as its background:

<font color=”D3E7F4″><script type=”text/javascript”><!–
amazon_ad_tag=”crasdumpanala-20″;
amazon_ad_width=”728″;
amazon_ad_height=”90″;
amazon_color_background=”D3E7F4″;
amazon_color_border=”0000FF”;
amazon_color_logo=”FFFFFF”;
amazon_color_link=”0000FF”;
amazon_ad_logo=”hide”;
amazon_ad_link_target=”new”;
amazon_ad_border=”hide”;
amazon_ad_title=”OpenTask Books, Magazines and Notebooks”; //–></script>
<script type=”text/javascript” src=”http://www.assoc-amazon.com/s/asw.js”></script></font>

 
After that the picture became nicer:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Memory Dump Analysis Anthology, Volume 3

Sunday, December 20th, 2009

“Memory dumps are facts.”

I’m very excited to announce that Volume 3 is available in paperback, hardcover and digital editions:

Memory Dump Analysis Anthology, Volume 3

Table of Contents

In two weeks paperback edition should also appear on Amazon and other bookstores. Amazon hardcover edition is planned to be available in January 2010.

The amount of information was so voluminous that I had to split the originally planned volume into two. Volume 4 should appear by the middle of February together with Color Supplement for Volumes 1-4. 

- Dmitry Vostokov @ DumpAnalysis.org -

Debugged! MZ/PE September issue is out

Wednesday, December 16th, 2009

Finally, after the long delay, the issue is available in print on Amazon and through other sellers:

Debugged! MZ/PE: Software Tracing

Buy from Amazon

- Dmitry Vostokov @ DumpAnalysis.org -

The Law of Simple Tools

Wednesday, December 9th, 2009

In its simplest form the first law of troubleshooting and debugging states that:

The more frequent a problem is, the simpler tool is needed to resolve and fix it.

- Dmitry Vostokov @ DumpAnalysis.org -

First Fault Software Problem Solving Book

Wednesday, December 9th, 2009

I’m very pleased to announce that Dan Skwire’s unique book has been published by OpenTask:

First Fault Software Problem Solving: A Guide for Engineers, Managers and Users

 

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 92)

Tuesday, November 24th, 2009

Sometimes the functionality of a system depends upon a specific application or service process. For example, in a database server environment it might be a database process, in printing environment it is a print spooler process or in a terminal services environment it is a terminal services process (termsvc, hosted by svchost.exe). In system failure scenarios we should check these processes for their presence (and also the presence of any coupled processes), hence the name of this pattern: Missing Process. However, if the vital process is present we should check if it is exited but references to it exist or there are any missing threads or components inside it, any suspended threads and special processes like a postmortem debugger. We shouldn’t also forget about service dependencies and their relevant process startup order. For example, we know that our service is hosted by svchost.exe and we see one such process exited but its object still referenced somewhere:

0: kd> !vm

*** Virtual Memory Usage ***
[...]
         0ed8 svchost.exe          0 (         0 Kb)
[…]

However, another command shows that it could be a different service hosted by the same image, svchost.exe, if we know that ServiceA depends on our service:

0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 8b581818  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: bff4d020  ObjectTable: e1001e18  HandleCount: 1601.
    Image: System

PROCESS 8b06d778  SessionId: none  Cid: 01a8    Peb: 7ffde000  ParentCid: 0004
    DirBase: bff4d040  ObjectTable: e13eae40  HandleCount:  22.
    Image: smss.exe

[...]

PROCESS 8aabed88  SessionId: 0  Cid: 0854    Peb: 7ffd6000  ParentCid: 0220
    DirBase: bff4d4a0  ObjectTable: e1c867a8  HandleCount: 778.
    Image: ServiceA.exe

[...]

PROCESS 8aaa6510  SessionId: 0  Cid: 0ed8    Peb: 7ffd4000  ParentCid: 0220
    DirBase: bff4d580  ObjectTable: 00000000  HandleCount:   0.
    Image: svchost.exe

[...]

Another alternative is that our service was restarted but then exited. If our process is not visible it could be possible that it was either stopped or simply crashed before.

- Dmitry Vostokov @ DumpAnalysis.org -

There Ought to be a Planet at that Location!

Thursday, October 22nd, 2009

One ETW trace pointed to a set of intermittent symptoms (messages were simplified for this post):

#        PID        TID        Message 
[...]
31278    2300       7060       RequestXMLData entry
31281    2300       7060       RequestXMLData: XML error     
[...]

Searching for issues having this error only pointed to a case with a mixed software product environment where some servers had the product version X and other servers the product version X+1. However, in the new case the customer claimed that he had only the product version X+1 on all production servers. We insisted and, after the closer inspection, servers with the product X were found… 

- Dmitry Vostokov @ TraceAnalysis.org -

Can Software Tweet?

Monday, September 28th, 2009

Every PID has its twitter account. Processes emit short trace messages (STM) and others subscribe to them. This is the technical support of the future, the concept of SoftWeet (*):

www.SoftWeet.com

(*) to weet

to know; to wit (archaic)

- Dmitry Vostokov @ DumpAnalysis.org -

Forthcoming Memory Dump Analysis Anthology, Volume 3

Saturday, September 26th, 2009

This is a revised, edited, cross-referenced and thematically organized volume of selected DumpAnalysis.org blog posts about crash dump analysis and debugging written in October 2008 - June 2009 for software engineers developing and maintaining products on Windows platforms, quality assurance engineers testing software on Windows platforms and technical support and escalation engineers dealing with complex software issues. The third volume features:

- 15 new crash dump analysis patterns
- 29 new pattern interaction case studies
- Trace analysis patterns
- Updated checklist
- Fully cross-referenced with Volume 1 and Volume 2
- New appendixes

Product information:

  • Title: Memory Dump Analysis Anthology, Volume 3
  • Author: Dmitry Vostokov
  • Language: English
  • Product Dimensions: 22.86 x 15.24
  • Paperback: 404 pages
  • Publisher: Opentask (20 December 2009)
  • ISBN-13: 978-1-906717-43-8
  • Hardcover: 404 pages
  • Publisher: Opentask (30 January 2010)
  • ISBN-13: 978-1-906717-44-5

Back cover features 3D computer memory visualization image.

- Dmitry Vostokov @ DumpAnalysis.org -

DebugWare Patterns (Part 9)

Thursday, September 24th, 2009

Real troubleshooting is usually done by combining several units of work chosen from a manual. Checklist pattern summarizes this recurrent practice. Checklist Coordinator orchestrates troubleshooting units of work (TUWs) components from TUW Repository according to checklists from Checklist Repository (in the simple case it can be just one checklist). This is illustrated on the following UML component diagram:

- Dmitry Vostokov @ DumpAnalysis.org -