Trace Analysis Patterns (Part 85)

May 12th, 2014

Most of the time, especially for large software logs, we need to select messages based on some criteria be it a set of Error Messages, a set of messages containing Basic Facts, or some other predicate. Then we can use selected messages from that Message Set as Anchor Messages or reverse Pivot Messages as an aid in further analysis.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 206)

May 10th, 2014

Here we introduce another Wait Chain pattern where a client thread makes a request and a created server thread servicing the request makes another request to the client which creates a new client thread to service the server request. The new client thread makes a request to the server again and a new server thread is created which makes a new client request, and so on. We call such a pattern Screwbolt Wait Chain. The additional signs here may be an abnormal number of threads and possibly Handle Leak pattern although the latter may be present only in a client or server process only. Thread Age, Waiting Thread Time, and common Blocking Module patterns may be used to unwind the chain and diagnose the possible problem module and corresponding Module Product Process. The pattern is illustrated on this diagram:

Although we initially found this pattern related to LPC /ALPC IPC we think it not limited to it and can occur in different client-server communication implementations.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 84)

May 7th, 2014

One of the powerful trace analysis techniques is using Adjoint Threads of Activity to filter various linear message activities (as a generalization of Thread Of Activity). Such filtered activities can then be analysed either separately (Sheaf of Activities) or together such as a new pattern we introduce here: Message Cover. If we identify parallel ATIDs (ATID is Adjoint TID, see an example) and see that one covers the other we can then make a hypothesis that they are correlated. Here is a graphical example of a Periodic Message Block largely composed from various Error Messages that covers periodic Discontinuities from another ATID (we can also consider the latter as periodic message blocks consisted from Silent Messages):

This is analogous to a cover in topology.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 83)

May 5th, 2014

Introduced last year in Debugging TV Frames episode 0×32 about Android / Java debugging State Dump pattern solves the problem of a program state analysis when memory dump generation is not available or doesn’t help or complicated in the case of interpreted code. Basically a developer identifies a set of state variables and periodically prints their values to the output logging stream. Such output may also include but not limited to Counter Values.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 82)

May 5th, 2014

So far we have been discussing trace analysis patterns related to execution of a particular software version. However, software code changes and also its tracing and logging output: from large scale changes where components are replaced to small scale code refactoring affecting message structure and format. On a software narratological level this corresponds to a narrative about a software trace or log, it evolution. Such Meta Trace analysis pattern is different from Master Trace pattern where the latter is similar to what Metanarrative is usually meant in narratology: a master or grand idea - an expected trace if all functional requirements were correctly identified and implemented during software construction and non-functional ones are met during software execution.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 81)

May 2nd, 2014

Sometimes we are interested in changes in particular {property, value} pairs or in tuples {x1,x2,x3,…) in general where xi can be a number or a substring. This is more general pattern than Message Change because such tuples can be from different sources and belong to different messages:

This pattern is also different from Data Flow where a value stays constant across different sources and messages. It is also different from Gossip pattern which involves more semantic changes. Metaphorically we can think of Data Association pattern as a partial derivative.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Falsity and Coincidence Patterns

April 28th, 2014

A page to reference all different kinds of coincidence and falsity related patterns is necessary, so I created this post:

I’ll update it as soon as I add more similar patterns.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 205)

April 26th, 2014

When calculating effective addresses such as [r10+10h] or [rax+rcx*12h+40h] to show their value in the output of some commands such as .trap or .cxr a debugger uses CPU register values from a saved trap frame or context structure. If such information is invalid the reported effective address doesn’t correspond to the real one during code execution. So we call this analysis pattern False Effective Address similar to False Function Parameters. Therefore, if a fault address is saved during bugcheck or exception processing it may not correspond to the output of some commands where such calculation is necessary. For example, in a bugcheck parameter we have this referenced memory address:

Arg1: fffffadda17d001d, memory referenced

but the output of .trap command shows a NULL pointer address:

NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
[...]
movzx eax,word ptr [rax+10h] 0010=????

Usually we are lucky and an effective address is correct despite the warning such as here and here.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 80)

April 21st, 2014

Sometimes we know from Basic Facts some data or activity we seek to identify in different traces collected together to perform inter-correlational analysis. It can be a shared file name, a named synchronization object, a locked file with sharing violations, a common virtual address in kernel space, or just some activity notification. We call this pattern Shared Point by analogy with intersecting curves in some abstract space.

It is similar to Linked Messages pattern but is more high level and not confined to a common parameter (can be an action description).

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 79)

April 20th, 2014

We mostly analyse real messages in software traces and logs. In such message streams we may see easily detectable Discontinuity patterns. However, in some cases it is beneficial to analyse the absence of messages. Message stream is not uniform, there may be different currents. If time resolution is 1 ms, for example, then we may have a current N msg/ms or in the case of lesser current, such as, 0.5 msg/ms we have the so called Silent Messages (—-):

[...]
11 ms: message
12 ms: ----
13 ms: message
14 ms: ----
15 ms: message
16 ms: message
17 ms: ----
18 ms: ----
19 ms: message
[...]

So, by a silent message we understand the possible message that would occupy the minimal time resolution gap. If we look at the following illustration we would see that the whole pattern analysis apparatus can be applied to the analysis of distribution of silent messages.

Silent Messages pattern is different from Discontinuity pattern because the latter is about large unexpected silences and Sparse Trace which is about missing trace statements from source code.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 190b)

April 19th, 2014

While working on Thread Cluster pattern I realized that we need a predicate version of Module Collection pattern, similar to the predicate version of Stack Trace Collection pattern. A predicate can be anything: a company vendor, semantic proximity, functionality such as printing, remote file management, and so on. Such module sub-collections can be used instead of modules in more complex patterns: an example of software diagnostics pattern substitution and composition. For example, we might be able to identify a possible coupling between 2 semantically different module groups explained by IPC Wait Chains such as on this diagram:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 204)

April 18th, 2014

One of useful patterns for the analysis of system hangs is Thread Waiting Time. If there are many such threads of interest they can be partitioned by waiting time and modules of interest from their stack traces. Modules of interest may include Directing, Coupled, Blocking, Top, and/or Problem modules depending on the problem description. We call the resulting composite pattern Thread Cluster. Extra-dimensional information can also be added such as the number of threads having the same or similar waiting time and other attributes by using different colours. For example, on this diagram, illustrating a real system hang, we see clustering of threads running through one 3rd-party module of interest and having the longest waiting time. Also we are able to identify possibly coupled (semantically related) threads running through another module of interest:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 42l)

April 9th, 2014

This is a variation of a general Wait Chain pattern related to CLR threads. When looking at Stack Trace Collection from a complete memory dump we may find threads using a monitor synchronization mechanism:

[... 32-bit ...]
09d2e908 6ba4d409 clr!CLREvent::WaitEx+0x106
09d2e91c 6bb90160 clr!CLREvent::Wait+0x19
09d2e9ac 6bb90256 clr!AwareLock::EnterEpilogHelper+0xa8
09d2e9ec 6bb9029b clr!AwareLock::EnterEpilog+0x42
09d2ea0c 6ba90f78 clr!AwareLock::Enter+0x5f
09d2eaa8 05952499 clr!JIT_MonEnterWorker_Portable+0xf8
[…]

or

[... 64-bit ...]
00000000`2094e230 000007fe`eedc3e3a clr!CLREvent::WaitEx+0xc1
00000000`2094e2d0 000007fe`eedc3d43 clr!AwareLock::EnterEpilogHelper+0xca
00000000`2094e3a0 000007fe`eee3e613 clr!AwareLock::EnterEpilog+0x63
00000000`2094e400 000007ff`007f4c38 clr!JIT_MonEnterWorker_Portable+0×14f
[…]

When seeing such threads we may ask for a process memory dump to perform .NET memory dump analysis using SOS or other WinDbg extensions such as in Deadlock pattern example for CLR 2 (mscorwks).

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 78)

February 12th, 2014

The next trace and log analysis pattern name is borrowed from Factor Groups in mathematics (or quotient groups). Here a group is, of course, not a mathematical group but just a group (or set) of log messages or trace statements. However, every trace message has variable and invariant parts. Variable parts usually contain some values, addresses or status bits. It can even be string values. Such values form a set too and can be partitioned into disjoint (non-overlapping) subsets. For example, a window foreground status can be either true or false. And we can group messages into disjoint factor groups each one having either only true or only false foreground status. The following trace graph illustrates a WindowHistory64 log where it was reported that one window was periodically loosing and gaining focus:

Factor Group Trace and Log Analysis Pattern

We found messages related to the reported process window title. By using Density Distribution pattern another such group of messages was found for another process window. Then a factor group was formed with two subgroups and their Relative Density was compared. For correlated alternating values it was expected to be 1. This was a very simple case, of course, which was analysed just by looking at a textual log but in more complex cases a computer assistance is required. A member of a factor group can also be generalized as a message subset with messages having variable part values from some domain subset or even calculated from it (some sort of a predicate): Mi = { m | P(m) }, where the original group of messages is a disjoin union of such message subsets: M = U Mi.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 77)

February 11th, 2014

Sometimes we find a grouping of some messages in one trace and then we are interested in the some groupings either in the same trace (Intra-Correlation) or in another trace (Inter-Correlation). We may consider such grouping as having some local density compared to global Message Density pattern. Then we might be interested in that selected message grouping Density Distribution illustrated on this minimal trace graph:

Density Distribution trace and log analysis pattern

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 76)

December 22nd, 2013

Activity Regions or blocks of messages having the same TID or PID usually follow each other in a typical complex software trace. Such following can be completely random and independent or it may be linear based on IPC or some inter-thread communication mechanism. For example, after filtering out Background Components we my find that an RPC client call setup is followed by messages from an RPC server:

Using a coordinate approach with message number and PID axes we can reformat this minimal trace diagram:

We call such pattern Piecewise Activity where we borrowed the concept of a piecewise linear function in mathematics (and piecewise continuity). In some problem software behaviour scenarios where we encountered such analysis pattern it was complemented by Discontinuity pattern. For example, an RPC call may be blocked and we don’t see client messages after that break till the end of the trace. In such cases we always recommended forcing a complete memory dump to check for wait chain memory analysis patterns.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 203)

December 7th, 2013

Sometimes we look at a stack trace collection or it’s predicate subset and recognize that one of parameters is actually the same structure address or handle. We call this pattern Shared Structure. In x64 case we may possibly see it from the return address backwards disassembly (ub WinDbg command) but in x86 case most of the time we can spot that directly from the verbose stack trace, like in the snippet below (unless a parameter memory slot was reused):

THREAD 830f9990 Cid 0428.0e94 Teb: 7ffdf000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Non-Alertable
[...]
ChildEBP RetAddr  Args to Child
0031f74c 7784b071 00000000 00000000 7ffdb000 ntdll!RtlpWaitOnCriticalSection+0x154
0031f774 00a91150 00a9b7a8 00000000 00a91452 ntdll!RtlEnterCriticalSection+0×152
WARNING: Stack unwind information not available. Following frames may be wrong.
0031f7c8 76113833 7ffdb000 0031f814 7784a9bd Application+0×1150
0031f7d4 7784a9bd 7ffdb000 003114bf 00000000 kernel32!BaseThreadInitThunk+0xe
0031f814 00000000 00a914a9 7ffdb000 00000000 ntdll!_RtlUserThreadStart+0×23

THREAD 886ee030 Cid 0428.0ef4 Teb: 7ffde000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Non-Alertable
[...]
ChildEBP RetAddr  Args to Child
0098fcb8 77f881b1 00000000 00000000 001614a0 ntdll!RtlpUnWaitCriticalSection+0x1b
0098fce0 00a9102e 00a9b7a8 00000000 00000000 ntdll!RtlEnterCriticalSection+0×152
WARNING: Stack unwind information not available. Following frames may be wrong.
0098fd28 00a91275 0098fd3c 76113833 001614a0 Application+0×102e
0098fd30 76113833 001614a0 0098fd7c 7784a9bd Application+0×1275
0098fd3c 7784a9bd 001614a0 009811d7 00000000 kernel32!BaseThreadInitThunk+0xe
0098fd7c 00000000 00a911ff 001614a0 00000000 ntdll!_RtlUserThreadStart+0×23

In case of multiple exceptions or even a single exception on one thread involving invalid access to a structure field the reference to the same structure on a different thread may point to possible synchronization problems.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 202)

November 9th, 2013

Sometimes we see the so called Small Values in memory (such as on raw stack) or in CPU registers which can be an ASCII or UNICODE value, some ID or even a handle. When in aggregates they can form a certain Semantic Structure such as a PID.TID example or Regular Data pattern. Here we illustrate a handle example (also an example of a Wait Chain analysis in user space):

0:000> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
00000000`0016de78 000007fe`fcf010dc : 00000000`02c79fa0 00000000`08c3faf0 00000000`021551f0 00000000`08c3fb00 : ntdll!NtWaitForSingleObject+0xa
00000000`0016de80 000007fe`f90e6d7f : 00000000`10b40010 00000000`10b40010 00000000`00000000 00000000`000007e0 : KERNELBASE!WaitForSingleObjectEx+0×79
[…]

0:000> !handle 00000000`000007e0 ff
Handle 00000000000007d0
  Type          Thread
  Attributes    0
  GrantedAccess 0x1fffff:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,
Impersonate,DirectImpersonate
  HandleCount   5
  PointerCount  9
  Name          <none>
  Object specific information
    Thread Id   278c.a58
    Priority    13
    Base Priority 0

0:000> ~~[a58]s
ntdll!NtWaitForMultipleObjects+0xa:
00000000`770c186a c3              ret

0:002> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
00000000`0f6af758 000007fe`fcf01430 : 00000000`00000025 00000000`00000000 00000000`00000000 000007fe`e35a1fb0 : ntdll!NtWaitForMultipleObjects+0xa
00000000`0f6af760 00000000`76e61220 : 00000000`0f6af8a8 00000000`0f6af890 00000000`00000000 00000000`00000000 : KERNELBASE!WaitForMultipleObjectsEx+0xe8
[...]

0:026> dp 00000000`0f6af890 L4
00000000`0f6af890  00000000`00000dbc 00000000`000007c0
00000000`0f6af8a0  00000000`00000000 00000000`00000000

0:002> !handle dbc ff
Handle 0000000000000dbc
  Type          Thread
  Attributes    0
  GrantedAccess 0x1fffff:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,
Impersonate,DirectImpersonate
  HandleCount   2
  PointerCount  4
  Name          <none>
  Object specific information
    Thread Id   278c.24ac
    Priority    14
    Base Priority 0

0:002> !handle 7c0 ff
Handle 00000000000007c0
  Type          Thread
  Attributes    0
  GrantedAccess 0x1fffff:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,
Impersonate,DirectImpersonate
  HandleCount   2
  PointerCount  4
  Name          <none>
  Object specific information
    Thread Id   278c.628
    Priority    14
    Base Priority 0

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 75)

November 9th, 2013

When analysing Inter-Correlation or Intra-Correlation and finding Discontinuities in one part or in a different trace (for example, in client-server environments) it is useful to see if there are corresponding Correlated Discontinuities in another part of the same trace (for example, a different Thread of Activity) or in a different trace. Such pattern may point to the underlying communication problem and may suggest to gather a different trace (for example, a network trace) for further analysis.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 201)

November 4th, 2013

Sometimes there are similar crashes in multiplatform products where only some potion of Crash Signature is similar. We call such a pattern Crash Signature Invariant, for example:

x86: cmp dword ptr [eax], 1
x64: cmp dword ptr [r10]. 1

One crash dump had the following condensed stack trace: 

0: kd> kc
DriverA
win32k!DrvSetMonitorPowerState
win32k!xxxSysCommand
win32k!xxxRealDefWindowProc
win32k!NtUserfnNCDESTROY
win32k!NtUserMessageCall
nt!KiSystemServiceCopyEnd

with the following faulting instruction:

DriverA+0x1234:
cmp     dword ptr [r11],1 ds:002b:00000000`00000000=????????

A search for DriverA led to this x86 crash analysed some time ago:

0: kd> kc
DriverA
nt!IopfCallDriver
win32k!GreDeviceIoControl
win32k!DrvSetMonitorPowerState
win32k!xxxSysCommand
win32k!xxxRealDefWindowProc
win32k!xxxWrapRealDefWindowProc
win32k!NtUserfnNCDESTROY
win32k!NtUserMessageCall
nt!KiSystemServicePostCall

0: kd> r
DtiverA+0x1423:
cmp     dword ptr [ecx],1    ds:0023:00000000=????????

We see common function names on both stack traces and overall flow is the same (only 3 functions are omitted in x64 trace); we see the same NULL pointer dereference for the same comparison instruction with the same comparison operand, #1.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -