Archive for the ‘Trace Analysis Patterns’ Category

Trace Analysis Patterns (Part 107)

Sunday, May 3rd, 2015

If you analyze ETW-based traces such as CDF you may frequently encounter No Trace Metafile pattern especially after product updates and fixes. This complicates pattern analysis because we may not be able to see Significant Events, Anchor Messages, and Error Messages. In some cases we can recover messages by comparing Message Context for unknown messages. If we have source code access this may also help. Both approaches are illustrated in the following diagram:

The same approach may also be applied for a different kind of trace artifacts when some messages are corrupt. In such cases it is possible to recover diagnostic evidence and, therefore, we call this pattern Recovered Messages.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 106)

Tuesday, April 28th, 2015

Sometimes we notice that data values are in a different order than expected. We call this pattern Data Reversal. By data values we mean some variable parts of a specific repeated message such the address of some structure or object. Data Reversal may happen for one message type:

But it can also happen for some message types and not for others. Typical example here are Enter/Leave trace messages for nested synchronization objects such as monitors and critical sections:

Since we talk about the same message type (the same Message Invariant) this pattern is different from Event Sequence Order pattern.

In rare cases we may observe Data Reversal inside one message with several variable parts but this may also be a case of Data Association.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 105)

Thursday, April 23rd, 2015

Reading Boris Uspensky’s book “A Poetics of Composition: The Structure of the Artistic Text and Typology of a Compositional Form” (in its original Russian version) led me to borrow the concept of viewpoints. The resulting analysis pattern is called Trace Viewpoints. These viewpoints are, “subjective” (semantically laden from the perspective of a trace and log reader), and can be (not limited to):

- Error viewpoints (see also False Positive Error, Periodic Error, and Error Distribution)

- Use case (functional) viewpoints (see also Use Case Trail)

- Architectural (design) viewpoints (see also Milestones)

- Implementation viewpoints (see also Implementation Discourse, Macrofunctions, and Focus of Tracing)

- Non-functional viewpoints (see also Counter Value and Diegetic Messages)

- Signal / noise viewpoints (see also Background and Foreground Components)

In comparison, Activity Regions, Data Flow, Thread of Activity, and Adjoint Thread of Activity are “objective” (structural, syntactical) viewpoints.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 104)

Tuesday, March 17th, 2015

Trace Mask is a superposition of two (or many) different traces. This is different from Inter-Correlation pattern where we may only search for certain messages without the synthesis of a new log. The most useful Trace Mask is when we have different time scales (or significantly different Trace Currents). Then we impose an additional structure on the one of the traces:

We got the idea from Narrative Masks discussed in Miroslav Drozda’s book “Narativní masky ruské prózy” (”Narrative Masks in Russian Prose”).

The very simple example of Trace Mask is shown in Debugging TV Episode 0×15.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 103)

Saturday, March 14th, 2015

Sometimes we have a uniform stream of messages that belong to some Activity Region, Thread of Activity, or Adjoint Thread of Activity. We can use micro-Discontinuities to structure that message stream into groups of actions, for example, Macrofunctions, especially if semantics of trace messages is not yet fully clear to us. This may also help us to recognize Visitor trace. Originally we wanted to call this pattern Micro Delays, but, after recognizing that such delays only make sense for one activity (since there can be too many of them in the overall log), we named this pattern Punctuated Activity. Usually such delays are small compare to Timeouts and belong to Silent Messages.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 102)

Tuesday, March 3rd, 2015

When we do tracing and logging much of computational activity is not visible. For live tracing and debugging this can be alleviated by adding Watch Threads. These are selected memory locations that may or may not be formatted according to specific data structures and are inspected at each main trace message occurrence or after specific intervals or events:

This analysis pattern is different from State Dump which is about intrinsic tracing where the developer of logging statements already incorporated variable watch in source code. Watch Threads are completely independent from original tracing and may be added independently. Counter Value is the simplest example of Watch Thread if done externally because the former usually doesn’t require source code and often means some OS or module variable independent of product internals. Watch Thread is also similar to Data Flow pattern where specific data we are interested in is a part of every trace message.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 101)

Sunday, February 22nd, 2015

Sometimes we have Basic Facts in a problem description but can’t find messages corresponding to them in a trace or log file but we are sure the tracing (logging) was done correctly. This may be because we have Sparse Trace, or we are not familiar well with product or system tracing messages (such as with Implementation Discourse).

In such a case we for search for Indirect Message of a possible cause:

Having found such a message we may hypothesize that Missing Message should have located nearby (this is based on semantics of both messages), and we then explore corresponding Message Context:

The same analysis strategy is possible for missing causal messages. Here we search for effect or side effect messages:

Having found them we proceed with further analysis:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 100)

Saturday, January 31st, 2015

Sometimes we need memory reference information not available in software traces and logs, for example, to see pointer dereferences, to follow pointers and linked structures. In such cases memory dumps saved during logging sessions may help. In case of process memory dumps we can even have several Step Dumps. Complete and kernel memory dumps may be forced after saving a log file. We call such pattern Adjoint Space:

Then we can analyze logs and memory dumps together, for example, to follow pointer data further in memory space:

There is also a reverse situation when we use logs to see past data changes before memory snapshot time (Paratext memory analysis pattern):

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 99)

Saturday, January 24th, 2015

Sometimes specific parts of simultaneous Use Case Trails, blocks of Significant Events or Message Sets in general may overlap. This may point to possible synchronization problems such as race conditions (prognostics) or be visible root causes of them if such problems are reported (diagnostics). We call this pattern Activity Overlap:

For example, a first request may start a new session and we expect the second request to be processed by the same already established session:

However, users report the second session started upon the second request. If we filter execution log by session id and do Intra-Correlational analysis we find out that session initialization prologues are overlapped. The new session started because the first session initialization was not completed:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 98)

Wednesday, January 7th, 2015

Some Discontinuities may be Periodic as Silent Messages. If such discontinuities belong to the same Thread of Activity and their Time Deltas are constant we may see Timeout pattern. When timeouts are followed by Error Message we can identify them by back tracing. Timeouts are different from Blackouts where the latter are usually Singleton Events and have large time deltas.

Here is a generalized graphical case study. An error message was identified based on incident Basic Facts:

We filtered the trace for error message TID and found 3 timeouts 30 minutes each:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 97)

Saturday, November 29th, 2014

Some traces and logs may have Periodic Message Blocks with very similar message structure and content (mostly Message Invariants). The only significant difference between them is some unique data. We call such pattern Visitor Trace by analogy with Visitor design pattern where tracing code “visits” each object data or data part to log its content or status.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 96)

Wednesday, November 19th, 2014

There are events that by design or system configuration should be seen in a log only once or not seen at all if code responsible for them was executed before tracing session. For example, the launch of certain services during system initialization shouldn’t be seen again when we trace system activity long after that. It can also be just messages from singleton objects in the application log. The appearance of extra Singleton Events may point to design violations or some abnormal system events such as process restart. The latter may Intra-Correlate with the start of the fault handling process such as WerFault.exe in Windows Process Monitor logs (Guest Component).

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 215)

Sunday, November 16th, 2014

Sometimes Wait Chains such as involving critical sections may have a Missing Thread endpoint. But in some cases we might see a Ghost Thread whose TID was reused by subsequent thread creation in a different process. For example, critical section structure may refer to such TID as in the example below.

// Critical section from LSASS process

THREAD fffffa803431cb50 Cid 03e8.2718 Teb: 000007fffff80000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
fffffa80330e0500 SynchronizationEvent
Impersonation token: fffff8a00b807060 (Level Impersonation)
Owning Process            fffffa8032354c40     Image: lsass.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      107175         Ticks: 19677 (0:00:05:06.963)
Context Switch Count      2303           IdealProcessor: 1
UserTime                  00:00:00.218
KernelTime                00:00:00.109
Win32 Start Address ntdll!TppWorkerThread (0×0000000076e1f2e0)
Stack Init fffff88008e5fdb0 Current fffff88008e5f900
Base fffff88008e60000 Limit fffff88008e5a000 Call 0
Priority 10 BasePriority 10 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Kernel stack not resident.
Child-SP RetAddr Call Site
fffff880`08e5f940 fffff800`01c7cf72 nt!KiSwapContext+0×7a
fffff880`08e5fa80 fffff800`01c8e39f nt!KiCommitThreadWait+0×1d2
fffff880`08e5fb10 fffff800`01f7fe3e nt!KeWaitForSingleObject+0×19f
fffff880`08e5fbb0 fffff800`01c867d3 nt!NtWaitForSingleObject+0xde
fffff880`08e5fc20 00000000`76e5067a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`08e5fc20)
00000000`0427cca8 00000000`76e4d808 ntdll!NtWaitForSingleObject+0xa
00000000`0427ccb0 00000000`76e4d6fb ntdll!RtlpWaitOnCriticalSection+0xe8
00000000`0427cd60 000007fe`f46a4afe ntdll!RtlEnterCriticalSection+0xd1

[…]

1: kd> .process /r /p fffffa8032354c40
Implicit process is now fffffa80`32353b30
Loading User Symbols

1: kd> !cs -l -o -s
-----------------------------------------
DebugInfo          = 0x0000000003475220
Critical section   = 0x0000000003377740 (+0x3377740)
LOCKED
LockCount          = 0×10
WaiterWoken        = No
OwningThread       = 0×00000000000004e4
RecursionCount     = 0×0
LockSemaphore      = 0×0
SpinCount          = 0×0000000000000000
OwningThread       = .thread fffffa80344e4c00
[…]

// The "owner" thread is from winlogon.exe

1: kd> !thread fffffa80344e4c00 3f
THREAD fffffa80344e4c00 Cid 21d0.14e4 Teb: 000007fffffae000 Win32Thread: fffff900c0998c20 WAIT: (WrUserRequest) UserMode Non-Alertable
fffffa80355817d0 SynchronizationEvent
Not impersonating
DeviceMap            fffff8a0000088f0
Owning Process       fffffa8034ff77c0       Image: winlogon.exe
[…]

A PML (Process Monitor) log was recorded before the complete memory dump was forced, and it clearly shows Glued Activity trace analysis pattern. The thread was owned by LSASS but then exited and its TID was subsequently reused by 2 other processes.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 95)

Sunday, November 16th, 2014

Trace and log analysis starts with assessment of artifact File Size, especially with multiple logging scenarios in distributed systems. If all log files are of the same size we might have either Circular Traces or Truncated Traces. Both point to wrong trace timing plan or just using default tracing tool configuration.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 94)

Tuesday, November 11th, 2014

Trace messages may correspond to specific implementation code such as recording the status of an operation, dumping data values, printing errors, or they may correspond to higher levels of software design and architecture, and even to use case stories. We call such messages Milestones by analogy with project management. Alternative names can be Chapter Messages, Summary Messages, Checkpoints, or Use Case Messages. These are different from Macrofunctions which are collections messages grouped by some higher function. Milestone messages are specifically designed distinct trace statements:

They can also be a part of Significant Events, serve the role of Anchor Messages, and be a part of Basic Facts and Vocabulary Index.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 93)

Sunday, November 9th, 2014

Sometimes we have several use case instances traced in one log file. Messages and Activity Regions from many Use Case Trails intermingle and make analysis difficult especially with the absence of UCID (Use Case ID), any other identification tags, or Message Links. However, initially most of the time we are interested in a sequence of Significant Events. After finding Anchor Messages we can use Time Deltas to differentiate between trace statements from different Use Case Trails. Here we assume correct Event Sequence Order. We call this pattern Event Sequence Phase by analogy with wave phases (all such individual “waves” may have different “shapes” due to various delays between different stages of their use case and implementation narratives):

In the picture above we also identified Dominant Event Sequence for use case instance C.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 92)

Sunday, November 9th, 2014

Use cases are implemented in various components such as subsystems, processes, modules, and source code files. Most of the time with good tracing implementation we are able to see Use Case Trails: log messages corresponding to use case scenarios. For simple systems one log may fully correspond to just one use case, but for complex systems, especially distributed client-server ones, there may be several use case instances present simultaneously in one log. One way to disentangle them in the absence of UCID (Use Case ID) or some other grouping tag is to use Event Sequence Phase, the subject of the next pattern.

Master Traces may also correspond to use cases but they should ideally correspond to only one use case instance.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 91)

Monday, October 6th, 2014

Sometimes the absence of messages, for example, errors and exceptions, may save time during troubleshooting and debugging by pointing to what wasn’t happening and provide additional insight. For example, on the picture below we see the same exceptions in the new and old incidents. However, in the old incident we see another exception that was linked to one unavailable server in a distributed broker architecture. Therefore, we can assume provisionally that all servers were operational when the new incident happened.

Missing Message pattern is different from Missing Component pattern where the latter may point to the component that was not loaded or executed, or simply that it wasn’t selected for tracing.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 90)

Thursday, September 25th, 2014

We recently analysed a Process Monitor log which had a several hour gap. So we decided to describe it as Blackout pattern. If you see such a pattern it might have the following possible causes:

- Some files from Split Trace are missing
- Split Trace file set was artificially created
- The tracing scope system was paused or frozen (for example, a virtualized system), or restarted
- The tracing itself was paused

Blackout trace and log analysis pattern

Blackout is different from Visibility Limit where the latter is about the inherent inability to trace whereas the former is only temporary inability due to circumstances listed above. It is also different from Discontinuity pattern where the latter is about gaps in individual Threads of Activity or Adjoint Threads of Activity

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 89)

Thursday, June 26th, 2014

Usually when we analyse traces and find an Anchor or Error Message we do backtracking using a combination of Data Flow and Message Sets and selecting appropriate log messages to form Back Trace leading to a possible root cause message:

This pattern is different from Error Thread pattern which just backtracks messages having the same TID (or in general ATID). It is also different from Exception Stack Trace pattern which is just a serialized stack trace from memory snapshot.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -