Archive for the ‘Data Science’ Category

Trace Analysis Patterns (Part 207)

Monday, April 26th, 2021

Trace Schema can be represented as Schema Trace or, avoiding naming confusion, Definition Trace. The resulting trace loses ordering (similar to an unordered Message Set) but allows the application of trace and log analysis patterns, especially if some order is fixed, for example, alphabetical order for names or the original presentation column arrangement. The schema definition of Trace Schema can itself be represented as another Definition Trace, as illustrated in the following diagram:
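As a rough sketch (the schema, message format, and attribute names below are hypothetical illustrations, not from the original post), a table-like schema can be flattened into a Definition Trace with one message per attribute definition, and alphabetical sorting by name fixes one possible order:

```python
# Hypothetical sketch: represent a trace schema as a Definition Trace,
# one message per attribute definition. All names are illustrative.
schema = {
    "PID": "int",
    "TID": "int",
    "Message": "string",
    "Adjoint Thread": "int",
}

# The Definition Trace itself: an (initially unordered) collection of messages.
definition_trace = [f"define {name}: {typ}" for name, typ in schema.items()]

# Fix an alphabetical order by attribute name so that ordering-based
# trace analysis patterns can be applied.
alphabetical = sorted(definition_trace)
print(alphabetical)
```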

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 206)

Sunday, April 11th, 2021

Most trace and log analysis pattern illustrations using the Dia|gram language take one of these two general forms:

Although the first form represents typical ETW trace attributes, the analysis pattern descriptions are usually independent of attribute name semantics. It therefore makes sense to generalize such forms into the following Trace Schema forms, with ATIDs for Adjoint Threads of Activity in the first form and FIDs for Features of Activity in the second form:

Such Trace Schemas are useful for various trace and log joins other than Trace Mask.
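One such join can be sketched as follows (a hypothetical illustration, not from the original post: the logs, attribute names, and values are invented), where two logs sharing a schema attribute such as PID are joined on it:

```python
# Hypothetical sketch: join two logs that share a schema attribute (PID).
# All field names and values are illustrative.
log_a = [
    {"PID": 4, "Message": "open file"},
    {"PID": 8, "Message": "start service"},
]
log_b = [
    {"PID": 4, "Function": "CreateFileW"},
    {"PID": 12, "Function": "ExitProcess"},
]

# Inner join on the shared PID attribute: keep merged messages
# only where both logs mention the same PID.
joined = [
    {**a, **b}
    for a in log_a
    for b in log_b
    if a["PID"] == b["PID"]
]
print(joined)
```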

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 205)

Sunday, April 4th, 2021

When looking at trace and log messages we are usually interested in certain features (for example, but not limited to, when doing feature engineering), which can be labelled via Feature IDs (FIDs). Messages that have the same FID value constitute a Feature of Activity, similar to a Thread of Activity (or an Adjoint Thread of Activity).

Such Features of Activity can span several (A)TIDs, in contrast to Fibers of Activity, which are confined to the same (A)TID and may have different FID values. Therefore, inside an (A)TID there can be several Features of Activity having different FID values.
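The distinction can be sketched in a few lines (a hypothetical illustration; the TID and FID values and messages below are invented, not from the original post). Grouping by FID yields Features of Activity that cross TID boundaries, while a single TID contains messages with several FID values:

```python
from collections import defaultdict

# Hypothetical sketch: trace messages labelled with TID and FID values.
messages = [
    {"TID": 1, "FID": "io",  "Message": "read request"},
    {"TID": 1, "FID": "net", "Message": "socket open"},
    {"TID": 2, "FID": "io",  "Message": "write request"},
]

# Group messages by FID: each group is a Feature of Activity.
features = defaultdict(list)
for msg in messages:
    features[msg["FID"]].append(msg)

# The "io" Feature of Activity spans TIDs 1 and 2,
# while TID 1 contains messages from two different FIDs.
print({fid: [m["TID"] for m in msgs] for fid, msgs in features.items()})
```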

This analysis pattern serves as a base for other data science analysis patterns that we will add next.

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 275)

Wednesday, March 3rd, 2021

If we have Step Dumps or Evental Dumps, or simply some different memory dumps, for example, from Fiber Bundle and Orbifold memory spaces, we may run debugger commands across them. Then we can track changes in their output, as we did in the Stack Trace Change analysis pattern. We call the generalization of the latter pattern Structure Sheaf, by analogy with the structure sheaves of ringed spaces in mathematics. Here we metaphorically treat sequences of debugger commands applied to memory areas (memory structures) as rings of functions on open subsets. We originally wanted to call this analysis pattern Stack Trace (command) for one command and Stack Trace Collection (commands) for a set of commands, but realized that the stack trace analogy makes sense only for sequential memory dumps ordered in time and not for memory dumps taken from different sources.
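The idea of running the same command across dumps and tracking output changes can be sketched as follows (run_command, the dump names, and the canned outputs are hypothetical placeholders, not real debugger automation from the post):

```python
# Hypothetical sketch: apply the same debugger command to several memory
# dumps and track changes in the output. The canned outputs below are
# illustrative stand-ins for real debugger sessions.
def run_command(dump, command):
    # Placeholder: in practice this would drive a debugger and capture
    # its output for the given dump file.
    canned = {
        ("dump1.dmp", "k"): ["ntdll!ZwWaitForSingleObject", "app!Worker"],
        ("dump2.dmp", "k"): ["ntdll!ZwWaitForSingleObject", "app!Spin"],
    }
    return canned[(dump, command)]

dumps = ["dump1.dmp", "dump2.dmp"]
outputs = {d: run_command(d, "k") for d in dumps}

# For each pair of dumps, record the lines that differ between outputs
# (symmetric difference of the two output sets).
changes = [
    (a, b, set(outputs[a]) ^ set(outputs[b]))
    for a, b in zip(dumps, dumps[1:])
]
print(changes)
```

Note that for dumps from different sources (rather than a time-ordered sequence) the pairwise comparison order itself is a choice, which is precisely why the stack trace analogy breaks down there.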

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 260)

Friday, September 27th, 2019

Manual analysis of Execution Residue in stack regions can be quite daunting, so some basic statistics on the distribution of address, value, and symbolic information can be useful. To automate this process we use the Pandas Python library and interpret the preprocessed WinDbg output of the dps and dpp commands as a DataFrame object:

import pandas as pd
import pandas_profiling

# Load the preprocessed dps/dpp output saved as CSV
df = pd.read_csv("stack.csv")

# Generate an HTML profile report; the with block ensures the file is closed
with open("stack.html", "w") as html_file:
    html_file.write(pandas_profiling.ProfileReport(df).to_html())

We get a rudimentary profile report: stack.html for the stack.csv file. The same was also done for the Address, Address/Value, Value, Symbol output of the dpp command: stack4columns.html for the stack4columns.csv file.

We call this analysis pattern Region Profile since any Memory Region can be used. The pattern is not limited to Python/Pandas; different libraries, scripts, and scripting languages can also be used for statistical and correlational profiling. We plan to provide more examples here in the future.
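As one library-free variant (a sketch only; the column names and rows below are assumed stand-ins for the preprocessed dps output, not data from the post), basic value counts can be computed with the standard library alone:

```python
from collections import Counter

# Sketch of a minimal Region Profile without Pandas. In practice the rows
# would come from csv.DictReader over stack.csv; here they are inlined
# illustrative values, and the column names are assumptions.
rows = [
    {"Address": "0x0012f000", "Value": "0x77f00000", "Symbol": "ntdll!RtlFreeHeap"},
    {"Address": "0x0012f004", "Value": "0x00000000", "Symbol": ""},
    {"Address": "0x0012f008", "Value": "0x77f00000", "Symbol": "ntdll!RtlFreeHeap"},
]

# Distribution of symbolic information and raw values in the region.
symbol_counts = Counter(row["Symbol"] for row in rows if row["Symbol"])
value_counts = Counter(row["Value"] for row in rows)
print(symbol_counts.most_common(3))
print(value_counts.most_common(3))
```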

- Dmitry Vostokov @ + -