Archive for the ‘Software Trace Analysis Tips and Tricks’ Category

Trace Analysis Patterns (Part 160)

Saturday, September 22nd, 2018

Sometimes, we ask for a log file to see State and Event pattern, and see it there, only to find that we cannot do Back Trace of State Dumps from some Significant Event for Inter-Correlation analysis because our Data Interval is truncated (Truncated Trace). This highlights the importance of proper tracing intervals that we call Significant Interval analysis pattern by analogy with significant digits in scientific measurements. The following diagram illustrate the pattern:

If you find out you get truncated traces and logs often you may want to increase Statement Current for state logging.

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 252)

Sunday, May 21st, 2017

Some Stack Traces reveal a functional purpose, for example, painting, querying a database, doing HTML or JavaScript processing, loading a file, printing, or something else. Such traces from various Stack Trace Collections (unmanaged, managed, predicate, CPUs) may be compared for similarity and help with analysis patterns, such as examples in Wait Chain (C++11 condition variable, SRW lock), finding semantically Coupled Processes, and many others where we look at the meaning of stack trace frame sequences to relate them to each other. We call this analysis pattern Stack Trace Motif by analogy with Motif trace and log analysis pattern. Longer stack traces may contain several Motives and also Technology-Specific Subtraces (for example, COM interface invocation frames).

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 148)

Wednesday, May 17th, 2017

In order to perform Inter-Correlational analysis among traces and logs especially when we have Indexical Trace we need a dual operation: an ability to identify traces and Use Case Trails, and, if necessary, find their corresponding Braids of Activity in an index trace. Some data from the tracing domain or use case description may serve is Intrinsic ID. It can be itself some numeric ID, user or computer name.

A typical log analysis case from a distributed environment is illustrated on the following diagram:

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 147)

Wednesday, May 3rd, 2017

Often, for Inter-Correlational trace and log analysis, we need to make sure that we have synchronized traces. The one version of Unsynchronized Traces analysis pattern is depicted in the following diagram where one trace ends  (possibly Truncated Trace) before the start of another trace and both were traced within one hour:

If tracing was done in different time zones with different local times specified in logs we can determine whether the traces are synchronized (when time zone information is not available in Basic Facts) by looking at minutes as shown for the third trace in the diagram above. This technique can also be used in trace calibration (see Calibrating Trace).

There is a similar analysis pattern for memory analysis called Unsynchronized Dumps.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 113)

Saturday, September 12th, 2015

Recently we analyzed a few logs which ended with a specialized Activity Region from a subsystem that sets operational parameters. The problem description stated that the system became unresponsive after changing parameters in a certain sequence. Usually, for that system, when we stop logging (even after setting parameters) we end up with messages from some Background Components since some time passes between the end of setting parameters activity and the time the operator sends stop logging request:

However, in the problem case we see message flow stops right in the middle of parameter setting activity:

So we advised to check for any crashes or hangs, and, indeed, it was found that the system was actually experiencing system crashes, and we got memory dumps for analysis where we found Top Module from a 3rd-party vendor related to parameter setting activity.

Please also note an analogy here between normal thread stack traces from threads that are waiting most of the time and Spiking Thread stack trace caught up in the middle of some function.

We call this pattern Ruptured Trace after a ruptured computation.

Note, that if it is possible to restart the system and resume the same tracing we may get an instance of Blackout analysis pattern.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 107)

Sunday, May 3rd, 2015

If you analyze ETW-based traces such as CDF you may frequently encounter No Trace Metafile pattern especially after product updates and fixes. This complicates pattern analysis because we may not be able to see Significant Events, Anchor Messages, and Error Messages. In some cases we can recover messages by comparing Message Context for unknown messages. If we have source code access this may also help. Both approaches are illustrated in the following diagram:

The same approach may also be applied for a different kind of trace artifacts when some messages are corrupt. In such cases it is possible to recover diagnostic evidence and, therefore, we call this pattern Recovered Messages.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 103)

Saturday, March 14th, 2015

Sometimes we have a uniform stream of messages that belong to some Activity Region, Thread of Activity, or Adjoint Thread of Activity. We can use micro-Discontinuities to structure that message stream into groups of actions, for example, Macrofunctions, especially if semantics of trace messages is not yet fully clear to us. This may also help us to recognize Visitor trace. Originally we wanted to call this pattern Micro Delays, but, after recognizing that such delays only make sense for one activity (since there can be too many of them in the overall log), we named this pattern Punctuated Activity. Usually such delays are small compare to Timeouts and belong to Silent Messages.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 101)

Sunday, February 22nd, 2015

Sometimes we have Basic Facts in a problem description but can’t find messages corresponding to them in a trace or log file but we are sure the tracing (logging) was done correctly. This may be because we have Sparse Trace, or we are not familiar well with product or system tracing messages (such as with Implementation Discourse).

In such a case we for search for Indirect Message of a possible cause:

Having found such a message we may hypothesize that Missing Message should have located nearby (this is based on semantics of both messages), and we then explore corresponding Message Context:

The same analysis strategy is possible for missing causal messages. Here we search for effect or side effect messages:

Having found them we proceed with further analysis:

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 100)

Saturday, January 31st, 2015

Sometimes we need memory reference information not available in software traces and logs, for example, to see pointer dereferences, to follow pointers and linked structures. In such cases memory dumps saved during logging sessions may help. In case of process memory dumps we can even have several Step Dumps. Complete and kernel memory dumps may be forced after saving a log file. We call such pattern Adjoint Space:

Then we can analyze logs and memory dumps together, for example, to follow pointer data further in memory space:

There is also a reverse situation when we use logs to see past data changes before memory snapshot time (Paratext memory analysis pattern):

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 96)

Wednesday, November 19th, 2014

There are events that by design or system configuration should be seen in a log only once or not seen at all if code responsible for them was executed before tracing session. For example, the launch of certain services during system initialization shouldn’t be seen again when we trace system activity long after that. It can also be just messages from singleton objects in the application log. The appearance of extra Singleton Events may point to design violations or some abnormal system events such as process restart. The latter may Intra-Correlate with the start of the fault handling process such as WerFault.exe in Windows Process Monitor logs (Guest Component).

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 95)

Sunday, November 16th, 2014

Trace and log analysis starts with assessment of artifact File Size, especially with multiple logging scenarios in distributed systems. If all log files are of the same size we might have either Circular Traces or Truncated Traces. Both point to wrong trace timing plan or just using default tracing tool configuration.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 93)

Sunday, November 9th, 2014

Sometimes we have several use case instances traced in one log file. Messages and Activity Regions from many Use Case Trails intermingle and make analysis difficult especially with the absence of UCID (Use Case ID), any other identification tags, or Message Links. However, initially most of the time we are interested in a sequence of Significant Events. After finding Anchor Messages we can use Time Deltas to differentiate between trace statements from different Use Case Trails. Here we assume correct Event Sequence Order. We call this pattern Event Sequence Phase by analogy with wave phases (all such individual “waves” may have different “shapes” due to various delays between different stages of their use case and implementation narratives):

In the picture above we also identified Dominant Event Sequence for use case instance C.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 78)

Wednesday, February 12th, 2014

The next trace and log analysis pattern name is borrowed from Factor Groups in mathematics (or quotient groups). Here a group is, of course, not a mathematical group but just a group (or set) of log messages or trace statements. However, every trace message has variable and invariant parts. Variable parts usually contain some values, addresses or status bits. It can even be string values. Such values form a set too and can be partitioned into disjoint (non-overlapping) subsets. For example, a window foreground status can be either true or false. And we can group messages into disjoint factor groups each one having either only true or only false foreground status. The following trace graph illustrates a WindowHistory64 log where it was reported that one window was periodically loosing and gaining focus:

Factor Group Trace and Log Analysis Pattern

We found messages related to the reported process window title. By using Density Distribution pattern another such group of messages was found for another process window. Then a factor group was formed with two subgroups and their Relative Density was compared. For correlated alternating values it was expected to be 1. This was a very simple case, of course, which was analysed just by looking at a textual log but in more complex cases a computer assistance is required. A member of a factor group can also be generalized as a message subset with messages having variable part values from some domain subset or even calculated from it (some sort of a predicate): Mi = { m | P(m) }, where the original group of messages is a disjoin union of such message subsets: M = U Mi.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 61)

Thursday, December 13th, 2012

Some tracing tools such as Citrix CDFControl have an option to split software traces and logs into several files during long recording. Although this should be done judiciously it is really necessary sometimes. We call this pattern Split Trace. What to do if we get several trace files and we want to use some other analysis tool such as Citrix CDFAnalyzer? If we know that the problem happened just before the tracing was stopped we can look at the last few such files from the file sequence (although we recommend a circular trace here). Otherwise we can convert them into CVS files and import into Excel which also supports adjoint threading.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 60)

Tuesday, December 11th, 2012

We have decided to factor out Anchor Messages example of message interleave into another pattern called Message Interleave. It covers superposition of different anchor messages, for example, process launch and exit, or DLL load and unload:

- Dmitry Vostokov @ + -