Archive for the ‘Machine Learning’ Category

Trace Analysis Patterns (Part 253)

Sunday, September 14th, 2025

Message Embeddings, as a representational technique in ML, are a variant of Trace Field. We can also consider the sequence of Message Embeddings as a trace itself, with columns as latent features forming separate latent Features of Activity. We can also treat these embeddings as sentence embeddings when interpreting traces and logs as Text Traces.
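To make the idea of a trace of embeddings concrete, here is a minimal sketch. It assumes a toy feature-hashing embedding in place of a trained model (real Message Embeddings would normally come from a sentence-embedding model); the function name embed_message, the dimension count, and the sample messages are illustrative only.

```python
import hashlib

def embed_message(message: str, dims: int = 8) -> list[float]:
    """Hash each token into one of `dims` buckets and L2-normalize the counts.

    This is a stand-in for a learned embedding, used only for illustration.
    """
    vec = [0.0] * dims
    for token in message.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % dims] += 1.0
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec

# A tiny trace: one message per row.
trace = ["Reading file test.txt", "Access denied", "Reading file data.bin"]

# The embedded trace: one row per message, one column per latent feature.
embedded_trace = [embed_message(m) for m in trace]

# Reading a single column top to bottom gives one latent Feature of Activity.
latent_feature_0 = [row[0] for row in embedded_trace]
```

Each row can be stored next to the original message as an extra Trace Field, and each column can be followed over time like any other activity feature.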

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 251)

Friday, March 28th, 2025

Trace Precision and Recall analysis patterns are borrowed from information retrieval and machine learning. In essence, Trace Precision is the fraction of trace messages used for successful diagnostics among all selected trace messages, for example, among CoTrace messages to reduce the noise. Trace Recall is the fraction of trace messages used for successful diagnostics among all relevant messages (these can also be messages not included in the trace, see Trace Bias).
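The two fractions can be sketched as follows. This is only an illustration of the definitions above: the function name and the message identifiers are hypothetical, and messages are assumed to be identified by integers.

```python
def trace_precision_recall(selected, used, relevant):
    """Return (Trace Precision, Trace Recall) for sets of message identifiers.

    selected: messages picked for analysis (for example, CoTrace messages)
    used:     messages that actually contributed to successful diagnostics
    relevant: all messages relevant to the problem, including ones that
              were never written to the trace
    """
    selected, used, relevant = set(selected), set(used), set(relevant)
    precision = len(used & selected) / len(selected) if selected else 0.0
    recall = len(used & relevant) / len(relevant) if relevant else 0.0
    return precision, recall

# Five messages were selected, two of them were used for the diagnosis,
# and two more relevant messages (7 and 8) are missing from the trace.
precision, recall = trace_precision_recall(
    selected={1, 2, 3, 4, 5},
    used={2, 3},
    relevant={2, 3, 7, 8},
)
print(precision, recall)  # 0.4 0.5
```

Low precision here points to noise in the selection, while low recall points to relevant messages that were never traced.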

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 250)

Tuesday, March 18th, 2025

Traces and logs can be considered models of computation, as was first proposed in the Trace Bias analysis pattern. Borrowing metaphors from machine learning, we can consider trace messages as Trace Parameters having associated values, for example, through Trace Field. These values can be numerical weights or linguistic values, for example, as proposed by Zadeh. Message values can be field values themselves, for example, Counter Values. As was noted in Trace Variance, different tracing environments with their set of “features” can be considered input data. If we use an artificial neural network metaphor, trace analysis itself can be considered as an activation function. CoTraces, as models of trace analysis, may form other “hidden” layers with “backpropagation” updating message weights. Tracing configuration parameters, including Trace Schemas, can be considered as Trace Hyperparameters. These concepts can be roughly illustrated by the following diagram:

To reduce the number of Trace Parameters we can use Quotient Traces.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 249)

Sunday, March 16th, 2025

In some cases, when we have two trace messages, we can infer the missing Implicit Message. This trace augmentation may help when we have Sparse Traces and we want to train ML models on them. To illustrate, we asked ChatGPT:

We have a log file with two messages: 1. Reading file test.txt 2. Access denied. There is an implicit message in between.

The implicit message between the two log entries is:

“Attempting to open file ‘test.txt’.”

Reasoning:

  1. “Reading file test.txt” → This suggests that the system or user has issued a command to read the file.
  2. Implicit step (not logged explicitly) → The system attempts to open the file to read its contents.
  3. “Access denied.” → The attempt fails due to permission restrictions or security policies.

Thus, the missing step is the system trying to open or access the file, which leads to the access denial.

Implicit Messages are different from Silent Messages which are padding messages to occupy the minimum resolution gaps.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 296)

Sunday, March 31st, 2024

If we dump the contents of memory regions such as thread stacks, we get different region memory addresses due to ASLR. To compare such regions from different executions, we need to make them Normalized Regions by replacing the varying address parts with a common symbolic denominator, for example, 0-based addresses (or some other non-digit symbols to account for the possibility of already existing similar 0-based addresses), and adjusting address offsets appropriately. For example, the following raw stack region from the Foreign Stack analysis pattern is shown first in its original form and then normalized (we do not normalize foreign stack addresses below):

00000055`adbfe000 00000000`00000000
00000055`adbfe008 00000000`00000000
00000055`adbfe010 00000000`00000000
00000055`adbfe018 00000000`00000000
00000055`adbfe020 00000000`00000000

00000055`adbffa58 00000000`00000008
00000055`adbffa60 00000055`adbffb28
00000055`adbffa68 00007ff7`96e116a3 module!func+0x13
00000055`adbffa70 00000055`ad8ffde0
00000055`adbffa78 00000000`00000000
00000055`adbffa80 00000000`00000000
00000055`adbffa88 00007ffa`28ecc9a8 ucrtbase!`string'
00000055`adbffa90 00000000`00000000
00000055`adbffa98 00007ff7`96e116c3 module!thread_proc+0x13
00000055`adbffaa0 00000055`ad8ffde0
00000055`adbffaa8 00000055`adbffb28
00000055`adbffab0 ffffffff`ffffffff
00000055`adbffab8 00007ffa`28fa0748 KERNELBASE!GetAppModelPolicy+0x18
00000055`adbffac0 00000055`adbffba0
00000055`adbffac8 00007ff7`96e12657 module!std::invoke<void (__cdecl*)(int *),int *>+0x27
00000055`adbffad0 00000055`ad8ffde0
00000055`adbffad8 00000055`adbffb00
00000055`adbffae0 00000055`adbffb18

--------`00000000 00000000`00000000
--------`00000008 00000000`00000000
--------`00000010 00000000`00000000
--------`00000018 00000000`00000000
--------`00000020 00000000`00000000

--------`00001a58 00000000`00000008
--------`00001a60 --------`00001b28
--------`00001a68 00007ff7`96e116a3 module!func+0x13
--------`00001a70 00000055`ad8ffde0
--------`00001a78 00000000`00000000
--------`00001a80 00000000`00000000
--------`00001a88 00007ffa`28ecc9a8 ucrtbase!`string'
--------`00001a90 00000000`00000000
--------`00001a98 00007ff7`96e116c3 module!thread_proc+0x13
--------`00001aa0 00000055`ad8ffde0
--------`00001aa8 --------`00001b28
--------`00001ab0 ffffffff`ffffffff
--------`00001ab8 00007ffa`28fa0748 KERNELBASE!GetAppModelPolicy+0x18
--------`00001ac0 --------`00001ba0
--------`00001ac8 00007ff7`96e12657 module!std::invoke<void (__cdecl*)(int *),int *>+0x27
--------`00001ad0 00000055`ad8ffde0
--------`00001ad8 --------`00001b00
--------`00001ae0 --------`00001b18
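The normalization above can be reproduced with a short script. This is a sketch under stated assumptions: the input is dps-style text with an address column, a value column, and an optional symbol; the region bounds are supplied by the caller (here taken from the StackLimit and StackBase of thread 3 in the Foreign Stack post); and normalize_region is an illustrative name, not a WinDbg feature.

```python
def normalize_region(lines, base, limit):
    """Rewrite addresses and values inside [base, limit) as 0-based offsets.

    Anything outside the region (module addresses, foreign stack
    addresses, plain data) is left unchanged.
    """
    def fmt(value):
        if base <= value < limit:
            return f"--------`{value - base:08x}"
        return f"{value >> 32:08x}`{value & 0xFFFFFFFF:08x}"

    out = []
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:          # blank or unrecognized line: keep as-is
            out.append(line)
            continue
        address = int(parts[0].replace("`", ""), 16)
        value = int(parts[1].replace("`", ""), 16)
        symbol = f" {parts[2]}" if len(parts) > 2 else ""
        out.append(f"{fmt(address)} {fmt(value)}{symbol}")
    return out

raw = [
    "00000055`adbffa60 00000055`adbffb28",
    "00000055`adbffa68 00007ff7`96e116a3 module!func+0x13",
    "00000055`adbffa70 00000055`ad8ffde0",
]
normalized = normalize_region(raw, base=0x55ADBFE000, limit=0x55ADC00000)
for line in normalized:
    print(line)
```

Two normalized dumps from different runs can then be compared with an ordinary text diff, since only genuinely different content remains.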

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 295)

Saturday, March 30th, 2024

Values in memory regions such as the stack, when interpreted as addresses, can be classified according to categories such as zero, module (code and data), heap, stack, Foreign Stack, Small Value, error, and out-of-range (for example, a kernel space address in user space). Such a classification can be encoded and optionally visualized as a Region Spectrum:
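One possible encoding is a string with one character per value. The sketch below is an assumption-heavy illustration: the category letters, the Small Value threshold, the error heuristic (a 32-bit value with the high bit set, as in failing NTSTATUS and HRESULT codes), the user-space limit, and the module range are all chosen for the example and do not come from the original pattern description.

```python
USER_SPACE_LIMIT = 0x00007FFFFFFFFFFF  # assumed top of x64 user space

def classify(value, stack, foreign_stacks=(), modules=(), heaps=(),
             small_limit=0x10000):
    """Map one pointer-sized value to a single-character category code."""
    def within(ranges):
        return any(lo <= value < hi for lo, hi in ranges)

    if value == 0:
        return "0"                      # zero
    if value < small_limit:
        return "s"                      # Small Value
    if 0x80000000 <= value <= 0xFFFFFFFF:
        return "E"                      # looks like an error code (heuristic)
    if value > USER_SPACE_LIMIT:
        return "X"                      # out-of-range for user space
    if within([stack]):
        return "S"                      # own stack
    if within(foreign_stacks):
        return "F"                      # Foreign Stack
    if within(modules):
        return "M"                      # module code or data
    if within(heaps):
        return "H"                      # heap
    return "?"                          # unclassified

def region_spectrum(values, **ranges):
    return "".join(classify(v, **ranges) for v in values)

spectrum = region_spectrum(
    [0x8, 0x55ADBFFB28, 0x7FF796E116A3, 0x55AD8FFDE0, 0x0, 0xFFFFFFFFFFFFFFFF],
    stack=(0x55ADBFE000, 0x55ADC00000),
    foreign_stacks=[(0x55AD8FD000, 0x55AD900000)],
    modules=[(0x7FF796E10000, 0x7FF796E20000)],  # hypothetical module range
)
print(spectrum)  # sSMF0X
```

The resulting string can be compared between dumps, searched with regular expressions, or mapped to colors for visualization.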

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 291)

Thursday, March 14th, 2024

Generative AI LLMs such as GPT-4 are very good at producing Annotated Stack Traces. This is similar to the Message Annotations trace and log analysis pattern. Below are two examples from ChatGPT:

Explain this stack trace fragment line by line:
06 ffffa784`8ee7db20 fffff804`0516bbf2 nt!IofCallDriver+0x55
07 ffffa784`8ee7db60 fffff804`0516b9d2 nt!IopSynchronousServiceTail+0x1d2
08 ffffa784`8ee7dc10 fffff804`0516ad36 nt!IopXxxControlFile+0xc82
09 ffffa784`8ee7dd40 fffff804`04e28775 nt!NtDeviceIoControlFile+0x56
0a ffffa784`8ee7ddb0 00007ffe`4d263834 nt!KiSystemServiceCopyEnd+0x25
0b 0000009b`cc0fedd8 00007ffe`4a893ffb ntdll!NtDeviceIoControlFile+0x14
0c 0000009b`cc0fede0 00007ffe`4bb95f91 KERNELBASE!DeviceIoControl+0x6b
0d 0000009b`cc0fee50 00007ff7`6544342f KERNEL32!DeviceIoControlImplementation+0x81

This stack trace appears to be from the Windows operating system, showing a series of function calls related to an I/O operation, specifically a device I/O control operation (DeviceIoControl). The stack trace is displayed in reverse chronological order, meaning the most recent call is shown first. Here’s a line-by-line explanation:

Lines 06 to 09 (nt!…): These lines are part of the Windows kernel (ntoskrnl.exe), which is the core of the Windows operating system. The prefix nt! indicates functions within the kernel.

IofCallDriver (line 06): A kernel function that forwards an I/O request packet (IRP) to a driver. The +0x55 indicates the offset within the function where the call occurred.
IopSynchronousServiceTail (line 07): A function that handles synchronous I/O operations. The +0x1d2 offset shows where in the function the operation occurred, indicating further processing of the IRP.
IopXxxControlFile (line 08): Likely a function that processes control operations (like IOCTL calls) on files. The +0xc82 is the offset where the call happened.
NtDeviceIoControlFile (line 09): A system service function that applications can call to execute device I/O control operations. The +0x56 offset indicates the point of execution within the function.
Line 0a (nt!KiSystemServiceCopyEnd): This is part of the system call dispatch mechanism, which is responsible for transitioning from user mode to kernel mode. +0x25 is the offset within this function.

Line 0b (ntdll!NtDeviceIoControlFile): This represents a call made from a user-mode application into the Windows Native API (NTDLL.dll) to perform a device I/O control operation. The +0x14 offset indicates the execution point within the function.

Line 0c (KERNELBASE!DeviceIoControl): This call originates from the KERNELBASE.dll, a Windows DLL that provides the base-level functionality for high-level APIs. The DeviceIoControl function is a wrapper that applications can use to send control codes to drivers. The +0x6b offset indicates where the function proceeds with the operation.

Line 0d (KERNEL32!DeviceIoControlImplementation): This is the implementation of the DeviceIoControl function in KERNEL32.dll, which is another layer of abstraction that eventually calls into the kernel-mode function to perform the operation. The +0x81 offset indicates the execution point within this implementation.

This stack trace illustrates how a device I/O control operation initiated by an application flows through various layers of the Windows operating system, from high-level APIs down to kernel-mode functions, to interact with a device driver. Each step in the process is crucial for understanding how the operating system processes I/O requests, handles system calls, and communicates with hardware.

Explain this stack trace fragment line by line:
00 msvcrt!vsnwprintf
01 wdscore!StringCchPrintfW
02 wdscore!CLogManager::LogA
03 wdscore!WdsSetupLogMessageA
04 cbscore!CBSWdsLog
05 cbscore!CCbsExecutionObject::AddPackage
06 cbscore!CCbsExecutionObject::AddPackage
07 cbscore!CCbsExecutionObject::AddPackage
08 cbscore!CCbsExecutionObject::RecallOrSupersedLowerVersion
09 cbscore!CCbsExecutionObject::ProcessMumServicing
0a cbscore!CCbsExecutionObject::AddPackage
0b cbscore!CCbsExecutionObject::Plan
0c cbscore!CCbsExecutionObject::GenerateExpressFileList
0d cbscore!CCbsSession::ActionListGenerateMissingFileList
0e cbscore!CCbsPublicSession::WritePackageFileList
0f RPCRT4!Invoke
10 RPCRT4!Ndr64StubWorker
11 RPCRT4!NdrStubCall3
12 combase!CStdStubBuffer_Invoke
13 RPCRT4!CStdStubBuffer_Invoke
14 combase!InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_c9f3956a20c9da92a64affc24fdd69ec>::operator()
15 combase!ObjectMethodExceptionHandlingAction<<lambda_c9f3956a20c9da92a64affc24fdd69ec> >
16 combase!InvokeStubWithExceptionPolicyAndTracing
17 combase!DefaultStubInvoke
18 combase!SyncServerCall::StubInvoke
19 combase!StubInvoke
1a combase!ServerCall::ContextInvoke
1b combase!CServerChannel::ContextInvoke
1c combase!DefaultInvokeInApartment
1d combase!ComInvokeWithLockAndIPID
1e combase!ThreadInvokeReturnHresult
1f combase!ThreadInvoke
20 RPCRT4!DispatchToStubInCNoAvrf
21 RPCRT4!RPC_INTERFACE::DispatchToStubWorker
22 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject
23 RPCRT4!LRPC_SCALL::DispatchRequest
24 RPCRT4!LRPC_SCALL::HandleRequest
25 RPCRT4!LRPC_SASSOCIATION::HandleRequest
26 RPCRT4!LRPC_ADDRESS::HandleRequest
27 RPCRT4!LRPC_ADDRESS::ProcessIO
28 RPCRT4!LrpcIoComplete
29 ntdll!TppAlpcpExecuteCallback
2a ntdll!TppWorkerThread
2b KERNEL32!BaseThreadInitThunk
2c ntdll!RtlUserThreadStart

This stack trace seems to come from a Windows environment, and it illustrates the flow of a specific operation—likely a component-based servicing action (related to Windows Updates or system component management) that involves logging, package addition, and possibly dealing with an RPC (Remote Procedure Call) request. Let’s break down the stack trace line by line:

msvcrt!vsnwprintf: This function is part of the Microsoft Visual C Runtime Library. It’s a variant of snprintf that works with wide (Unicode) strings, used for formatted string operations.

wdscore!StringCchPrintfW: A function likely related to Windows Deployment Services (WDS) for secure string formatting into a buffer. StringCchPrintfW is a safer version of sprintf for wide strings.

wdscore!CLogManager::LogA: A method in the WDS core library for logging messages. This method suggests the operation involves ASCII string logging.

wdscore!WdsSetupLogMessageA: Another logging-related function in WDS, focused on setup log messages.

cbscore!CBSWdsLog: Indicates a function within the Component-Based Servicing (CBS) core library, specifically for logging in the context of WDS.

6-0a. cbscore!CCbsExecutionObject::AddPackage (lines 05 to 0a): Repeated calls to a method that adds a package to the CBS execution object. This could be part of an update or system component management operation.

cbscore!CCbsExecutionObject::RecallOrSupersedLowerVersion: A method that deals with recalling or superseding lower versions of a package or component.

cbscore!CCbsExecutionObject::ProcessMumServicing: A method likely involved in servicing manifest updates (MUM files), which are used in Windows servicing to describe updates.

cbscore!CCbsExecutionObject::Plan: Part of the planning phase in a CBS operation, where the system determines what actions to take with the packages.

cbscore!CCbsExecutionObject::GenerateExpressFileList: Generates a list of files for an express installation, which is a faster update method that only downloads necessary components.

cbscore!CCbsSession::ActionListGenerateMissingFileList: A method to generate a list of missing files based on the planned actions.

cbscore!CCbsPublicSession::WritePackageFileList: Writes out the list of files associated with the packages being serviced.

13-1e. RPCRT4 and combase (lines 0f to 1e): These lines involve the RPC runtime and COM base libraries, indicating that part of the operation was invoked over RPC, a mechanism that allows a program to execute code on another machine or process.

1f-2c. The stack trace concludes with system and thread initialization functions, like TppWorkerThread and BaseThreadInitThunk, which are part of the thread pool and thread startup routines in Windows. These functions are standard for threads executing in the Windows environment.

This trace is a deep dive into how Windows manages system components, specifically around updating or managing packages, and involves various subsystems like WDS, CBS, and RPC communications.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 290)

Wednesday, March 13th, 2024

If we look at memory values in a thread stack region, we see that they either come from the stack or from some other non-stack region, such as heap, pool, or just some static code or data location. What is quite unusual is when such a value belongs to another thread's stack:

0:004> ~0e !teb
TEB at 00000055ad63d000
ExceptionList: 0000000000000000
StackBase: 00000055ad900000
StackLimit: 00000055ad8fd000
SubSystemTib: 0000000000000000
FiberData: 0000000000001e00
ArbitraryUserPointer: 0000000000000000
Self: 00000055ad63d000
EnvironmentPointer: 0000000000000000
ClientId: 00000000000035c4 . 0000000000005fa4
RpcHandle: 0000000000000000
Tls Storage: 000001c2e6269540
PEB Address: 00000055ad63c000
LastErrorValue: 0
LastStatusValue: c0000034
Count Owned Locks: 0
HardErrorMode: 0

0:004> ~3e !teb
TEB at 00000055ad643000
ExceptionList: 0000000000000000
StackBase: 00000055adc00000
StackLimit: 00000055adbfe000
SubSystemTib: 0000000000000000
FiberData: 0000000000001e00
ArbitraryUserPointer: 0000000000000000
Self: 00000055ad643000
EnvironmentPointer: 0000000000000000
ClientId: 00000000000035c4 . 0000000000005f1c
RpcHandle: 0000000000000000
Tls Storage: 000001c2e62714b0
PEB Address: 00000055ad63c000
LastErrorValue: 187
LastStatusValue: c000000d
Count Owned Locks: 0
HardErrorMode: 0

0:004> dps 00000055adbfe000 00000055adc00000
00000055`adbfe000 00000000`00000000
00000055`adbfe008 00000000`00000000
00000055`adbfe010 00000000`00000000
00000055`adbfe018 00000000`00000000
00000055`adbfe020 00000000`00000000

00000055`adbffa58 00000000`00000008
00000055`adbffa60 00000055`adbffb28
00000055`adbffa68 00007ff7`96e116a3 module!func+0x13
00000055`adbffa70 00000055`ad8ffde0
00000055`adbffa78 00000000`00000000
00000055`adbffa80 00000000`00000000
00000055`adbffa88 00007ffa`28ecc9a8 ucrtbase!`string'
00000055`adbffa90 00000000`00000000
00000055`adbffa98 00007ff7`96e116c3 module!thread_proc+0x13
00000055`adbffaa0 00000055`ad8ffde0
00000055`adbffaa8 00000055`adbffb28
00000055`adbffab0 ffffffff`ffffffff
00000055`adbffab8 00007ffa`28fa0748 KERNELBASE!GetAppModelPolicy+0x18
00000055`adbffac0 00000055`adbffba0
00000055`adbffac8 00007ff7`96e12657 module!std::invoke<void (__cdecl*)(int *),int *>+0x27
00000055`adbffad0 00000055`ad8ffde0
00000055`adbffad8 00000055`adbffb00
00000055`adbffae0 00000055`adbffb18

Such Foreign Stack references in user space may point to a possibly questionable use of pointers to local variables in asynchronous scenarios. In kernel space, the !findthreads WinDbg command may find values from the kernel stack of the specified thread address on other threads' kernel stacks, even from different processes. Such references may also reveal deep process and thread relationships in the kernel.
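In user space, the check illustrated above can be automated once the stack ranges of all threads are known. The following sketch assumes the StackLimit and StackBase values have already been collected from !teb output and that the region has been parsed into (address, value) pairs; find_foreign_stack_refs is an illustrative name.

```python
def find_foreign_stack_refs(region, own_thread, stacks):
    """Return (address, value, thread) for values inside another thread's stack.

    region: iterable of (address, value) pairs from the dumped stack
    stacks: {thread_index: (stack_limit, stack_base)} taken from !teb
    """
    refs = []
    for address, value in region:
        for thread, (limit, base) in stacks.items():
            if thread != own_thread and limit <= value < base:
                refs.append((address, value, thread))
    return refs

# StackLimit / StackBase of threads 0 and 3 from the !teb output above.
stacks = {
    0: (0x55AD8FD000, 0x55AD900000),
    3: (0x55ADBFE000, 0x55ADC00000),
}

# A few (address, value) pairs from the dps output of thread 3.
region = [
    (0x55ADBFFA60, 0x55ADBFFB28),    # own stack
    (0x55ADBFFA68, 0x7FF796E116A3),  # module code
    (0x55ADBFFA70, 0x55AD8FFDE0),    # thread 0 stack
]

refs = find_foreign_stack_refs(region, own_thread=3, stacks=stacks)
for address, value, thread in refs:
    print(f"{address:#x} -> {value:#x} (stack of thread {thread})")
```

For the sample above, the only hit is the value 00000055`ad8ffde0, which lies between the StackLimit and StackBase of thread 0.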

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 243)

Sunday, February 18th, 2024

Trace Variance is the measure of variability of traces and logs in different environments. Like Trace Bias, this trace and log analysis pattern metaphor is borrowed from machine learning, in this case from variance. Here, different environments or the same environment at different times are considered as input data.

Other analysis patterns can be used to investigate Trace Variance such as Bifurcation Point, Trace String, and Polytrace.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 242)

Sunday, February 18th, 2024

Trace Bias is the measure of how faithfully traces and logs, as models of computation, reflect the actual computation. The term bias has many meanings. This trace and log analysis pattern metaphor is borrowed from model bias in machine learning and statistics. Traces, and therefore their biases, range from Empty Traces to Sparse Traces to traces from Time Travel Debugging tools. How we do tracing, for example, via Declarative Traces or Moduli Traces, has a direct impact on Trace Bias.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 286)

Sunday, January 28th, 2024

Sometimes, when we have debugging symbols, information about local variables may be helpful in making sense of function disassembly. For example, we have this code fragment from WinDbg uf command:

511 00007ff6`6ab22a44 mov dword ptr [rbp+2078h],1
511 00007ff6`6ab22a4e mov dword ptr [rbp+207Ch],2
513 00007ff6`6ab22a58 mov eax,dword ptr [rbp+2078h]
513 00007ff6`6ab22a5e mov dword ptr [rbp+0Ch],eax
514 00007ff6`6ab22a61 mov dword ptr [rbp+0Ch],64h
515 00007ff6`6ab22a68 mov dword ptr [rbp+48h],3
515 00007ff6`6ab22a6f mov dword ptr [rbp+4Ch],4
516 00007ff6`6ab22a76 mov eax,dword ptr [rbp+0Ch]

Although source code lines are shown, suppose we don’t have source code to match. However, we can match Address Representations, such as [rbp+xxx], from the output of dv /V WinDbg command:

0:000> dv /V
...
000000ab`740fd00c @rbp+0x000c myBase = struct wmain::__l2::Base
...
000000ab`740ff078 @rbp+0x2078 myDerived = struct wmain::__l2::Derived
...
000000ab`740fd048 @rbp+0x0048 myDerived2 = struct wmain::__l2::Derived
...

Another usage is matching values in raw stack data with local variable addresses. Values as addresses and their symbolic representations here have some connection to ADDR Symbolic and Interpreted Pointers.
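The matching of [rbp+xxx] operands against dv /V output can be scripted. The sketch below makes one notable assumption: since dv /V does not show variable sizes, an offset is attributed to the nearest local variable at or below it, which may mislabel an access that actually belongs to an unlisted variable. The helper names parse_locals and annotate are illustrative.

```python
import re
from bisect import bisect_right

def parse_locals(dv_output):
    """Extract sorted (rbp_offset, name) pairs from dv /V output text."""
    pattern = re.compile(r"@rbp\+0x([0-9a-fA-F]+)\s+(\w+)\s+=")
    return sorted((int(m.group(1), 16), m.group(2))
                  for m in pattern.finditer(dv_output))

def annotate(disassembly, local_vars):
    """Replace [rbp+NNNh] operands with [variable+delta] where possible."""
    offsets = [offset for offset, _ in local_vars]

    def label(match):
        offset = int(match.group(1), 16)
        i = bisect_right(offsets, offset) - 1   # nearest variable at or below
        if i < 0:
            return match.group(0)               # nothing below: keep operand
        base, name = local_vars[i]
        delta = offset - base
        return f"[{name}+{delta:#x}]" if delta else f"[{name}]"

    return re.sub(r"\[rbp\+([0-9A-Fa-f]+)h\]", label, disassembly)

dv_output = """\
000000ab`740fd00c @rbp+0x000c myBase = struct wmain::__l2::Base
000000ab`740ff078 @rbp+0x2078 myDerived = struct wmain::__l2::Derived
000000ab`740fd048 @rbp+0x0048 myDerived2 = struct wmain::__l2::Derived
"""
local_vars = parse_locals(dv_output)

print(annotate("mov dword ptr [rbp+207Ch],2", local_vars))
print(annotate("mov eax,dword ptr [rbp+0Ch]", local_vars))
```

With the listing above, [rbp+207Ch] is rendered as [myDerived+0x4] and [rbp+0Ch] as [myBase], which makes the assignments between the three structures easier to follow without source code.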

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 285)

Sunday, January 21st, 2024

Almost 15 years ago we introduced Dereference Fixpoints, where the address value is equal to the value at the address. When doing raw stack data classification and pattern matching, we may be interested in more general Dereference Nearpoints (especially position-independent ones), illustrated in the following diagram:

Such Dereference Nearpoints may appear due to exception processing, when a stack exception address or exception stack pointer address is propagated, and due to multiple structure references, for example, when a local structure address is propagated during function calls.
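A rough way to search for such values is sketched below. Since the diagram is not reproduced here, the sketch assumes one plausible reading: a nearpoint is a location whose stored value lies within a chosen distance of its own address, with a Dereference Fixpoint as the special case of distance zero. The function name, the distance threshold, and the sample addresses are illustrative.

```python
def dereference_nearpoints(region, max_distance=0x100):
    """Return (address, value, delta) where value is close to its own address.

    region: iterable of (address, value) pairs
    delta:  value - address; it does not depend on where the region is
            loaded, so it can be compared across executions
    """
    return [(address, value, value - address)
            for address, value in region
            if abs(value - address) <= max_distance]

region = [
    (0x1000, 0x1000),          # fixpoint: the value equals its address
    (0x1008, 0x1028),          # nearpoint: the value is 0x20 bytes above
    (0x1010, 0x7FF796E116A3),  # unrelated module address
]

nearpoints = dereference_nearpoints(region)
for address, value, delta in nearpoints:
    print(f"{address:#x} -> {value:#x} (delta {delta:#x})")
```

Keeping the delta rather than the absolute value gives a position-independent signature that can be matched across dumps.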

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -