Artificial Chemistry Approach to Software Trace and Log Analysis

In the past we proposed two metaphors regarding software trace and log analysis patterns (we abbreviate them as TAP):

  • TAP as “genes” of software structure and behavior.
  • Logs as “proteins” generated by code with TAP as patterns of “protein” structure.

We now introduce a third metaphor with strong modeling and implementation potential we are currently working on: Artificial Chemistry (AC) approach* where logs are “DNA” and log analysis is a set of reactions between logs and TAP which are individual “molecules”.

In addition to trace and logs as “macro-molecules”, we also have different molecule families of general patterns (P) and concrete patterns (C). General patterns, general analysis (L) and concrete analysis (A) patterns are also molecules (that may also be composed of patterns and analysis patterns) that may serve the role of enzymes. Here we follow the division of patterns into four types. During the reaction, a trace T is usually transformed into T’ (having a different “energy”) molecule (with a marked site to necessitate further elastic collisions to avoid duplicate analysis).

T + Pi -> T’ + Pi + Ck
T + Ci -> T’ + 2Ci
T + Li -> T’ + Li + Ck
T + Li -> T’ + Li + Ak
T + Ai -> T’ + Ai + Ak
Ci + Ck -> Ci-Ck
Cl + Cm -> Ck
Ai + Ak -> Ai-Ak
T + Ai-Ak -> T’ + Ai-Ak + Ci

Different reactions can be dynamically specified according to a reactor algorithm. The following diagram shows a few elementary reactions:

Concentrations of patterns (reaction educts) increases the chances of producing reaction products according to corresponding reaction "mass action". We can also introduce pattern consuming reactions such as T + Li -> T' + Ck but this requires the constant supply of analysis pattern molecules. Intermediate molecules may react with a log as well and be a part of analysis construction (second order trace and log).

Since traces and logs can be enormous, such reactions can occur randomly according to the Brownian motion of molecules. The reactor algorithm can also use Trace Sharding.

Some reactions may catalyze log transformation into a secondary structure with certain TAP molecules now binding to log sites. Alternatively, we can use different types of reactors, for example, well stirred or topologically arranged. We visualize a reactor for the reactions shown in the diagram above:

We can also add reactions that split and concatenate traces based on collision with certain patterns and reactions between different logs.

Many AC reactions are unpredictable and may uncover emergent novelty that can be missed during the traditional pattern matching and rule-based techniques.

The AC approach also allows simulations of various pattern and reaction sets independently of concrete traces and logs to find the best analysis approaches.

In addition to software trace and log analysis of traditional software execution artifacts, the same AC approach can be applied to malware analysis, network trace analysis and pattern-oriented software data analysis in general.

* Artificial Chemistries by Wolfgang Banzhaf and Lidia Yamamoto (ISBN: 978-0262029438)