Archive for the ‘Software Trace Analysis’ Category

Trace Analysis Patterns (Part 29)

Monday, September 20th, 2010

Sometimes, when comparing normal, expected (working) and abnormal (non-working) traces we can get a clue for further troubleshooting and debugging by looking at module load events. For example, when we see an unexpected module loaded in our non-working trace, its function (and sometimes even module name) can signify some difference to pay attention to:

#     PID  TID  Time         Message
[...]
14492 6908 6912 11:06:41.953 LoadImageEvent: ImageName(\WINDOWS\system32\3rdPartySso.dll)
[...]

I call this pattern Guest Component and it is a different from Missing Component. Although in the latter pattern a missing component in one trace may appear in another but the component name is known apriori and expected. In the former pattern the component is unexpected. For example, in the trace above, its partial name fragment “Sso” may trigger a suggestion to relate differences in authentication where in a non-working case SSO (single sign-on) was configured.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Debugged! MZ/PE March print issue is out

Sunday, September 19th, 2010

Finally, after the delay, the issue is available in print on Amazon and through other sellers:

Debugged! MZ/PE: Multithreading

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 28)

Monday, September 13th, 2010

Often we see errors in software traces recorded during deviant software behavior (often called non-working software traces) and when we double check their presence in normal expected software behavior traces (often called working traces) we find them there too. This pattern is called False Positive Error. I already mentioned similar false positives when I introduced the first software trace analysis pattern called Periodic Error. Here is an example from the real trace. In a non-working trace we found this error in an adjoint thread of a foreground component:

OpenProcess error 5

However, we found the same error in the working trace so we continued looking and found several other errors:

Message request report: last error 1168, …
[…]
GetMsg result -2146435043

The last one if converted to a hex status is 8010001D but, unfortunately, the same errors were present in the working trace too in the same activity regions.

After that we started comparing both traces looking for a bifurcation point and we found the error that was only present in a non-working trace with a significant trace differences after that:

Error reading from the named pipe: 800700E9

My favourite tool (WinDbg) to convert error and status values gave this description:

0:000> !error 800700E9
Error code: (HRESULT) 0x800700e9 (2147942633) - No process is on the other end of the pipe.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 27)

Friday, August 27th, 2010

Like in a literature (and in a narratology in general) we have components that trace themselves and components that tell the story of a computation including status updates they query about other components and subsystems. This pattern gets its name Diegetic Messages from diegesis. Here’s the difference between diegetic (in blue bold) and non-diegetic trace messages:

PID    TID    TIME          MESSAGE     
11864  11912  06:34:53.598  ModuleA: foo called bar. Status OK.
11620  10372  06:34:59.754  ModuleB: ModuleA integrity check. Status OK.

Some modules may emit messages that tell about their status but from their message text we know the larger computation story like in a process startup sequence example.  

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Bugtation No.124

Monday, August 16th, 2010

Madness, the absence of memory dumps.

Michel Foucault

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

4 Years of Blogging!

Saturday, August 14th, 2010

Today I celebrated 4 years of blogging that resulted in more than 2,000 posts across 8 blogs. I would like to thank everyone for their continuing support!

The updated timeline

This blog post belongs to the 5th year of blogging.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 26)

Tuesday, August 10th, 2010

Sometimes a software trace is truncated when the trace session was stopped prematurely, often when a problem didn’t manifest itself visually. We can diagnose such traces by their short time duration, missing anchor messages or components necessary for analysis. My favourite example is user session initialization in a Citrix terminal services environment when problem effects are visible only after the session is fully initialized and an application is launched but a truncated CDF trace only shows the launch of winlogon.exe despite the presence of a process creation trace provider or other components that record the process launch sequence and the trace itself lasts only a few seconds after that.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Blog (Debugging Slang, Part 15)

Thursday, July 29th, 2010

Blog - Binary log.

Examples: This program blogged. I’m reading its blog now.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Welcome to Memory Dump Analysis Services!

Sunday, July 11th, 2010

Our future sponsor has been registered in Ireland and has its own independent website and logo: DumpAnalysis.com

More information will be available later this month.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Presenting a Software Story

Sunday, July 4th, 2010

It’s time to introduce a conceptual software narratological framework for viewing software traces (using rich ETW / CDF tracing as our main focus). Here we consider a software story (fabula) as a full trace when every component was selected for tracing and emits debug messages during code execution paths. However, during viewing we can filter on and off certain modules, threads, processes, messages, etc. (adjoint threading) and see a different sub-story or plot (sujet). Every software plot (please do not confuse with PLOT acronym) can be presented differently (using appropriate discourse). Some presentational examples include temporal rearrangement, collapse of repetitive regions, source code hypertext (lexia) and allegorical devices such as message tool-tip comments. Here is a diagram that depicts story (fable, fabula) - plot (sujet) - presentation (discourse):

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Music for Debugging: Retry, Abort, Escape

Saturday, July 3rd, 2010

Finally found music appropriate for ETW / CDF trace analysis. It’s Andre Gagnon’s album Escape:

Buy from Amazon

Here’s my version of track titles (some of them are also appropriate for crash dump analysis) with my comments in italics:

1. Non Fatal Error
2. Trace Dance (Samba)
3. En Hive
4. Char, The
5. L”Debug”
6. “Memoria”L
7. Process Hearts (cores)
8. Holidays (, but always looking back)
9. WOW (64)
10. DA+TA Master
11. Concert for 4 Threads (“Concertino” doesn’t sound good here)
12. Toc-Cat-ta of Strings
13. Bugville Promenade (along bug clusters?)
14. MOVS
15. The Sea Named Trace (after Solaris movie)
16. Catching The Bottle (it is often difficult to find a relevant problem message in a billion-line trace)
17. Debug Me Tender (DebugLove?)

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Introducing Crash and Hang Analysis Audit Service

Wednesday, June 16th, 2010

Adding AI. Analysis Improvement.

After reading earlier today Windows Internals pages about system audit an idea came to my mind in the evening to provide audit services for memory dump and software trace analysis. One mind is good but two are better, especially if the second is a pattern-driven AI. Here are possible problem scenarios:

Problem: You are not satisfied with a crash report.

Problem: Your critical issue is escalated to the VP level. Engineers analyze memory dumps and software traces. No definite conclusion so far. You want to be sure that nothing has been omitted from the analysis.

Problem: You analyze a system dump or a software trace. You need a second pair of eyes but don’t want to send your memory dump due to your company security policies.

Other scenarios (use cases) will be added as soon as I see the service fit to the realities of software technical support.

I plan to make this service operational in July - August, 2010. Prices to be announced soon.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

The Extended Software Trace

Sunday, June 13th, 2010

By analogy with paratext let’s introduce a software narratological concept of the extended software trace that consists of a software trace plus additional supporting information that makes troubleshooting and debugging easier. Such “paratextual” information can consists of pictures, videos, accounts of scenarios and past problem histories, customer interviews and even software trace delivery medium and format (if preformatted).

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 25)

Tuesday, June 8th, 2010

Here we introduce a software narratological partitioning of a trace into Head, Prologue, Core, Epilogue and Tail segments. This is useful for comparative software trace analysis. Suppose, a trace started just before the reproduction steps or a start marker was injected (by CDFMarker, for example) and finished just after the last repro steps or after an end marker was injected. Then its core trace messages are surrounded by prologue and epilogue statements. What is before and after are not necessary for analysis and usually distract an analyst. They are shown as grey areas on the following picture where the left trace is for working (non-working) scenario and the right trace is for non-working (working) scenario:

The size of a core segment need not be the same because environments and executed code paths might be different. However, often some traces are truncated (another pattern). Also sometimes it is difficult to establish whether the first trace is normal and the second has a tail or the first one is truncated and the second one is normal with an optional tail. Here artificial markers are important.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Ana-Trace-Log-Lyzer and Closed Session

Wednesday, June 2nd, 2010

This paleodebugging tool was excavated from Central Russia (thanks to Mr. Kutuzov) and generously provided for a photo session by its owner Mr. Mansour:

 

It also inspired this sequence of strcat: Analog -> Anatrace -> Analyzer -> Tracelyzer -> Loglyzer.

… enough tracing. It’s time to close our session:

… what is left? If you are curious, look at this conceptual picture:

Component Trace

If you wonder what electricity has to do with tracing (at a metaphorical level) look at this trace analysis pattern:

Statement Density and Current

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 24)

Wednesday, June 2nd, 2010

The limit of Discontinuity pattern is No Activity. The absence of activity can be seen at a thread level or at a process level where it is similar to Missing Component pattern. The difference from the latter pattern is that we know for certain that we selected our process modules for tracing but don’t see any trace messages. Consider this example:

#  Source        PID  TID  Time         Function     Message
1  TraceSettings 1480 8692 08:04:20.682              **** Start Trace Session 
[... TraceSettings messages 2-11 show that we selected AppA for tracing ...]
12 ModuleB       3124 4816 08:04:37.049 WorkerThread Worker thread running
13 TraceSettings 1480 8692 08:04:41.966              **** Trace Session was stopped

Only modules from AppA process and modules from a coupled process (for example, ModuleB) were selected. However, we only see a reminder message from the coupled process (3124.4816:ModuleB!WorkerThread) and no messages for 21 seconds. Fortunately, AppA process memory dump was saved during the tracing session:

Debug session time: Fri May 21 08:04:31.000 2010 (GMT+0)

We see 2 threads waiting for a critical section:

0:000> ~*kL

  14  Id: 640.8b8 Suspend: 1 Teb: 7ffa7000 Unfrozen
ChildEBP RetAddr 
0248f8c0 7c827d29 ntdll!KiFastSystemCallRet
0248f8c4 7c83d266 ntdll!ZwWaitForSingleObject+0xc
0248f900 7c83d2b1 ntdll!RtlpWaitOnCriticalSection+0x1a3
0248f920 0040dea8 ntdll!RtlEnterCriticalSection+0xa8
[…]
0248f9a4 77ce78aa rpcrt4!Invoke+0×30
0248f9c0 77ce7a94 rpcrt4!NdrCallServerManager+0×17
0248fcb8 77ce7b7c rpcrt4!NdrStubCall+0×1d6
0248fcd0 77c7ff7a rpcrt4!NdrServerCall+0×15
0248fd04 77c8042d rpcrt4!DispatchToStubInCNoAvrf+0×38
0248fd58 77c80353 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0×11f
0248fd7c 77c7e0d4 rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3
0248fdbc 77c7e080 rpcrt4!RPC_INTERFACE::DispatchToStubWithObject+0xc0
0248fdfc 77c812f0 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0×41e
0248fe20 77c88678 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
0248ff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
0248ff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd
0248ffac 77c7b110 rpcrt4!BaseCachedThreadRoutine+0×9d

  15  Id: 640.18c0 Suspend: 1 Teb: 7ffdb000 Unfrozen
ChildEBP RetAddr 
01b8ff40 7c827d29 ntdll!KiFastSystemCallRet
01b8ff44 7c83d266 ntdll!ZwWaitForSingleObject+0xc
01b8ff80 7c83d2b1 ntdll!RtlpWaitOnCriticalSection+0x1a3
01b8ffa0 0040dba7 ntdll!RtlEnterCriticalSection+0xa8
[…]
01b8ffec 00000000 kernel32!BaseThreadStart+0×34

Unfortunately it belongs to a missing thread and blocked threads wait forever:

0:000> !cs -l -o -s
-----------------------------------------
DebugInfo          = 0x01facdd0
Critical section   = 0x01da19c0 (+0x1DA19C0)
LOCKED
LockCount          = 0×2
WaiterWoken        = No
OwningThread       = 0×00001384
RecursionCount     = 0×1
LockSemaphore      = 0×578
SpinCount          = 0×00000000
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled

0:000> ~~[1384]
              ^ Illegal thread error in ‘~~[1384]’

Apparently AppA process was hanging and it explains why we don’t see any activity in the trace. We suggested to enable user mode stack trace database using this article as an example: CTX106970 and get a new dump.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Forthcoming Webinars: Complete Debugging and Crash Analysis for Windows

Sunday, May 30th, 2010

Finally, after careful consideration, I’ve come up with the topic that has been neglected so far but at the same time important for both camps (kernel space and user space, including managed space): complete memory dump and software trace analysis. I plan to publish the first webinar agenda early in July and deliver the webinar in August (the date should be finalized by mid July).

PS. Sailing memory spaces under an RGB flag :-)

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Software Trace Analysis Tools: CDFMarker

Tuesday, May 25th, 2010

Finally Citrix has published a tool (written by my colleague Colm Naish, lead escalation engineer) that allows controlled injection of events into CDF (ETW) trace message stream. This is useful in many troubleshooting scenarios where we need to rely on Significant Event and Anchor Message analysis patterns to partition traces into artificial Activity Regions to start our analysis with. This is also analogous for the imposition of the external time on the stream of tracing events from software narratology perspective:

CDFMarker On Demand - For XenApp and XenDesktop

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Psychoanalysis of Software Troubleshooting and Debugging (Part 1)

Friday, May 21st, 2010

In this part I would like to introduce the notion of Forgotten Facts in opposition to Basic Facts or supporting information. These are facts that engineers often feel uncomfortable to mention because they are troubleshooting information they couldn’t obtain (if they tried) due to some time or customer pressures, failures, incorrectly understood troubleshooting procedures or some other obstacles. Therefore it is important to have a set of counter questions or checklists mapped from common software behaviour patterns to software troubleshooting patterns. Problem descriptions should also be subjected to close reading to reveal unconsciously concealed information. Next part will explore this in more detail with some case studies commissioned by Software Maintenance Institute.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Two Readings of a Software Trace

Friday, May 21st, 2010

When we have a software trace we read it in two directions. The first one is to deconstruct it into a linear ordered source code based on PLOT fragments. The second direction is to construct an interpretation that serve as an explanation for reported software behaviour. During the interpretive reading we remove irrelevant information, compress relevant activity regions and construct the new fictional software trace based on discovered patterns and our problem description.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -