Archive for January, 2015

Trace Analysis Patterns (Part 100)

Saturday, January 31st, 2015

Sometimes we need memory reference information that is not available in software traces and logs, for example, to see pointer dereferences or to follow pointers and linked structures. In such cases, memory dumps saved during logging sessions may help. In the case of process memory dumps, we can even have several Step Dumps. Complete and kernel memory dumps may be forced after saving a log file. We call this pattern Adjoint Space:

Then we can analyze logs and memory dumps together, for example, to follow pointer data further in memory space:
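As a rough illustration of reading a log and a dump together, here is a toy sketch in Python. Everything in it is an assumption made for the example: the log line format, the `head=` field, and the `DUMP` dictionary that stands in for a memory dump. A real dump would be inspected with a debugger; the sketch only shows the idea of taking a pointer value from a log message and following it through linked data in memory.

```python
import re

# Hypothetical log line carrying a pointer value (the format is an assumption).
LOG_LINE = "12:00:01.250 TID=42 ProcessItem: head=0x1000"

# Toy stand-in for a memory dump: address -> (payload, next pointer).
DUMP = {
    0x1000: ("item A", 0x1040),
    0x1040: ("item B", 0x1080),
    0x1080: ("item C", 0x0),
}

def pointer_from_log(line):
    """Extract the pointer value logged as head=0x... or return None."""
    match = re.search(r"head=0x([0-9a-fA-F]+)", line)
    return int(match.group(1), 16) if match else None

def follow_list(dump, head):
    """Walk the linked structure starting at head, guarding against cycles."""
    payloads, seen, addr = [], set(), head
    while addr and addr in dump and addr not in seen:
        seen.add(addr)
        payload, addr = dump[addr]
        payloads.append(payload)
    return payloads

head = pointer_from_log(LOG_LINE)
print(hex(head), follow_list(DUMP, head))
```

The log supplies only the address; the linked items behind it are recovered from the adjoint memory space.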

There is also the reverse situation, where we use logs to see data changes that happened before the memory snapshot was taken (the Paratext memory analysis pattern):

- Dmitry Vostokov

Trace Analysis Patterns (Part 99)

Saturday, January 24th, 2015

Sometimes specific parts of simultaneous Use Case Trails, blocks of Significant Events, or Message Sets in general may overlap. This may point to possible synchronization problems such as race conditions (prognostics), or it may be the visible root cause of such problems if they have already been reported (diagnostics). We call this pattern Activity Overlap:

For example, a first request may start a new session and we expect the second request to be processed by the same already established session:

However, users report that a second session is started upon the second request. If we filter the execution log by session id and do Intra-Correlational analysis, we find that the session initialization prologues overlap. The new session started because the first session initialization had not completed:
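The filtering step can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the record layout `(time, session id, message)`, the message texts, and the sample `LOG` data are all hypothetical and not taken from any real trace. It builds one initialization interval per session and reports the pairs of sessions whose intervals overlap.

```python
from itertools import combinations

# Hypothetical log records: (time in seconds, session id, message).
LOG = [
    (0.0, "S1", "session init start"),
    (0.4, "S2", "session init start"),
    (0.9, "S1", "session init done"),
    (1.3, "S2", "session init done"),
    (5.0, "S3", "session init start"),
    (5.5, "S3", "session init done"),
]

def init_intervals(log):
    """Map each session id to its (start, end) initialization interval."""
    started, intervals = {}, {}
    for time, session_id, message in log:
        if message == "session init start":
            started[session_id] = time
        elif message == "session init done" and session_id in started:
            intervals[session_id] = (started[session_id], time)
    return intervals

def overlapping_sessions(log):
    """Return pairs of session ids whose initialization intervals overlap."""
    pairs = []
    sessions = sorted(init_intervals(log).items())
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(sessions, 2):
        # Two intervals overlap when each one starts before the other ends.
        if a_start < b_end and b_start < a_end:
            pairs.append((a, b))
    return pairs

print(overlapping_sessions(LOG))
```

In the sample data, the second session begins its prologue before the first one has finished, which is the overlap described above; the third session starts well after both and is not reported.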

- Dmitry Vostokov

Trace Analysis Patterns (Part 98)

Wednesday, January 7th, 2015

Some Discontinuities may be Periodic as Silent Messages. If such discontinuities belong to the same Thread of Activity and their Time Deltas are constant, we may see the Timeout pattern. When timeouts are followed by an Error Message, we can identify them by back tracing. Timeouts differ from Blackouts: the latter are usually Singleton Events and have large time deltas.

Here is a generalized graphical case study. An error message was identified based on incident Basic Facts:

We filtered the trace for the TID of the error message and found 3 timeouts of 30 minutes each:
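A minimal Python sketch of this kind of filtering follows. The record layout `(time in seconds, TID, message)`, the sample `TRACE`, the 60-second gap threshold, and the 1-second tolerance are all assumptions chosen for illustration; the trace is also assumed to be sorted by time. The code keeps only the messages of one thread, measures the silent gaps between consecutive messages, and checks whether those gaps are roughly constant.

```python
# Hypothetical trace records: (time in seconds, TID, message), sorted by time.
TRACE = [
    (0, 42, "request sent"),
    (10, 7, "unrelated activity"),
    (1800, 42, "retry"),
    (1801, 42, "request sent"),
    (2500, 7, "unrelated activity"),
    (3601, 42, "retry"),
    (3602, 42, "request sent"),
    (5402, 42, "retry"),
    (5403, 42, "error: operation failed"),
]

def find_gaps(trace, tid, min_gap):
    """Return (gap start time, gap length) for silent gaps of one thread."""
    times = [time for time, record_tid, _ in trace if record_tid == tid]
    return [(a, b - a) for a, b in zip(times, times[1:]) if b - a >= min_gap]

def looks_like_timeout(gaps, tolerance=1.0):
    """True if there are at least two gaps and their lengths are near constant."""
    if len(gaps) < 2:
        return False
    lengths = [length for _, length in gaps]
    return max(lengths) - min(lengths) <= tolerance

gaps = find_gaps(TRACE, tid=42, min_gap=60)
print(gaps, looks_like_timeout(gaps))
```

With the sample data, the thread of the error message shows three gaps of 1800 seconds (30 minutes) each before the error, while the other thread has a single gap and is not flagged.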

- Dmitry Vostokov