Trace Analysis Patterns (Part 124)

April 30th, 2016

Trace Extension is an obvious log analysis pattern that is about trace messages that refer to some other trace or log that may or may not exist. Sometimes, there can be instructions to enable additional tracing that is not possible to cover by the current trace source. We have seen this in some trace statements from .NET Exception Stack Traces.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 123)

April 27th, 2016

The trace statements in source code can be considered as Declarative Trace by analogy with variable declaration and definition in programming languages such as C and C++. Declaration of the variable doesn’t mean that the variable will be actually used. Some declared variables such as arrays will actually expand in memory when used (as in .bss sections). The same is with trace messages from Declarative Trace. Some of them will not appear in the actual software execution trace and some will be repeated because of loops and multiple code reentrance. However, Declarative Traces are useful for studying the possibilities of tracing and logging design, implementation, and coverage (for example, Sparse Trace). Some trace analysis patterns are also applicable for Declarative Traces such as Message Sets and Bifurcation Points (among different source code versions). This is illustrated in the following picture:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 238)

March 17th, 2016

Sometimes developers introduce their own variants of synchronization code instead of using synchronization API provided by language runtime and OS. If we are lucky we can spot it in function and class method names and then use Constant Subtrace analysis pattern:

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it
# Call Site
00 nt!KiSwapContext
01 nt!KiCommitThreadWait
02 nt!KeWaitForSingleObject
03 nt!NtWaitForSingleObject
04 nt!KiSystemServiceCopyEnd
05 ntdll!ZwWaitForSingleObject
06 KERNELBASE!WaitForSingleObjectEx
07 wbemcomn!CWbemCriticalSection::Enter
08 wbemcore!EnsureInitialized
09 wbemcore!InitAndWaitForClient
0a wbemcore!CWbemLevel1Login::ConnectorLogin
0b wbemcore!CWbemLevel1Login::NTLMLogin
0c RPCRT4!Invoke
0d RPCRT4!NdrStubCall2
0e ole32!CStdStubBuffer_Invoke
0f ole32!SyncStubInvoke
10 ole32!StubInvoke
11 ole32!CCtxComChnl::ContextInvoke
12 ole32!AppInvoke
13 ole32!ComInvokeWithLockAndIPID
14 ole32!ThreadInvoke
15 RPCRT4!DispatchToStubInCNoAvrf
16 RPCRT4!RPC_INTERFACE::DispatchToStubWorker
17 RPCRT4!RPC_INTERFACE::DispatchToStub
18 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject
19 RPCRT4!LRPC_SCALL::DispatchRequest
1a RPCRT4!LRPC_SCALL::HandleRequest
1b RPCRT4!LRPC_SASSOCIATION::HandleRequest
1c RPCRT4!LRPC_ADDRESS::HandleRequest
1d RPCRT4!LRPC_ADDRESS::ProcessIO
1e RPCRT4!LrpcIoComplete
1f ntdll!TppAlpcpExecuteCallback
20 ntdll!TppWorkerThread
21 kernel32!BaseThreadInitThunk
22 ntdll!RtlUserThreadStart

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it
# Call Site
00 repdrvfs!SCachePage::operator=
01 repdrvfs!std::vector<scachepage,wbem_allocator<scachepage> >::erase
02 repdrvfs!CPageCache::Read
03 repdrvfs!CPageFile::GetPage
04 repdrvfs!ValidateBTreeAgainstObjHeap
05 repdrvfs!PerformAllValidations
06 repdrvfs!VerifyRepositoryOnline
07 repdrvfs!VerifyRepository
08 repdrvfs!CPageSource::Startup
09 repdrvfs!CPageSource::Init
0a repdrvfs!CFileCache::InnerInitialize
0b repdrvfs!CFileCache::Initialize
0c repdrvfs!CRepository::Initialize
0d repdrvfs!CRepository::Logon
0e wbemcore!CRepository::Init
0f wbemcore!InitSubsystems
10 wbemcore!ConfigMgr::InitSystem
11 wbemcore!EnsureInitialized
12 wbemcore!InitAndWaitForClient
13 wbemcore!CWbemLevel1Login::ConnectorLogin
14 wbemcore!CWbemLevel1Login::NTLMLogin
15 RPCRT4!Invoke
16 RPCRT4!NdrStubCall2
17 ole32!CStdStubBuffer_Invoke
18 ole32!SyncStubInvoke
19 ole32!StubInvoke
1a ole32!CCtxComChnl::ContextInvoke
1b ole32!AppInvoke
1c ole32!ComInvokeWithLockAndIPID
1d ole32!ThreadInvoke
1e RPCRT4!DispatchToStubInCNoAvrf
1f RPCRT4!RPC_INTERFACE::DispatchToStubWorker
20 RPCRT4!RPC_INTERFACE::DispatchToStub
21 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject
22 RPCRT4!LRPC_SCALL::DispatchRequest
23 RPCRT4!LRPC_SCALL::HandleRequest
24 RPCRT4!LRPC_SASSOCIATION::HandleRequest
25 RPCRT4!LRPC_ADDRESS::HandleRequest
26 RPCRT4!LRPC_ADDRESS::ProcessIO
27 RPCRT4!LrpcIoComplete
28 ntdll!TppAlpcpExecuteCallback
29 ntdll!TppWorkerThread
2a kernel32!BaseThreadInitThunk
2b ntdll!RtlUserThreadStart

These two thread stack traces were spotted from a complete memory dump Stack Trace Collection as the part of a larger ALPC Wait Chain. We switched to these threads using .thread /r /p WinDbg command to get the stripped stack trace via kc command for better illustration. We see Constant Subtrace until wbemcore!EnsureInitialized function which serves as a bifurcation stack frame. The first stack trace has “CriticalSection::Enter” after the bifurcation stack frame compared to the second stack trace which looks like Spiking Thread in user space.

There is no hidden critical section associated with that process except the one which is probably related to the spiking Variable Subtrace since it doesn’t have any LockCount:

0: kd> !cs -l -o -s
DebugInfo = 0x0000000004a774d0
Critical section = 0x000000000308d690 (+0x308D690)
LOCKED
LockCount = 0×0
WaiterWoken = No
OwningThread = 0×0000000000000928
RecursionCount = 0×1
LockSemaphore = 0×0
SpinCount = 0×0000000000000000
OwningThread = .thread fffffa806ebd8060
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled

We can also disassemble wbemcomn!CWbemCriticalSection::Enter and find out that it calls WaitForSingleObject once and no other synchronization API indeed:

0: kd> uf wbemcomn!CWbemCriticalSection::Enter
[...]
wbemcomn!CWbemCriticalSection::Enter+0x41:
000007fe`f78ad1c0 mov rcx,qword ptr [rbx+10h]
000007fe`f78ad1c4 mov edx,r12d
000007fe`f78ad1c7 call qword ptr [wbemcomn!_imp_WaitForSingleObject (000007fe`f78ee4f0)]
000007fe`f78ad1cd cmp eax,esi
000007fe`f78ad1cf je wbemcomn!CWbemCriticalSection::Enter+0x52 (000007fe`f78a62a3) Branch

We add this nonstandard synchronization memory analysis pattern to Wait Chain analysis pattern category.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 237)

March 17th, 2016

Variable Subtrace analysis pattern was introduced for inter-correlational analysis of CPU spikes across memory snapshots with just one thread involved. In contrast, we found Constant Subtrace pattern useful in Wait Chain analysis involving several threads in just one memory snapshot (intra-correlational analysis). Here a constant subtrace groups stack traces from Stack Trace Collection with a bifurcation stack trace frame (similar to Bifurcation Point trace analysis pattern) providing some wait chain relationship hint. Such subtraces traces may be initially found by the preceding wait chain analysis or by technology-specific subtraces such as ALPC/RPC server thread frames (as seen in an example stack from COM interface invocation). Here is a minimal stack trace diagram (similar to minimal trace graphs introduced in Accelerated Windows Software Trace Analysis training) illustrating the pattern (it also shows Spiking Thread pattern in user space as seen from a complete memory dump):

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 122)

March 12th, 2016

Data Selector is a variant of Inter-Correlation trace analysis pattern where we use data found in one trace to select Message Set or Adjoint Thread of Activity in another trace. This analysis activity is depicted in the following picture where we have a client log and corresponding server log. In the server log we have log entries for many client sessions. To select messages corresponding to our client session we use some data attribute in the client trace, for example, the user name, and Linked Messages analysis pattern to find one of the messages in the server log that contains the same user name. Then we find out which user session it belongs to and form its Adjoint Thread:

This pattern is different from Identification Messages where we don’t even know the object that emitted trace messages. In Data Selector case we know in principle what kind of messages we are looking for. We just need to select among many alternatives.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 236)

February 13th, 2016

When we have a performance issue we may request a set of consecutive memory dump saved after some interval. In such memory dumps we may see the same thread(s) having similar stack trace(s). In this simple diagnostic scenario we may diagnose several patterns based on the stack traces: Active Threads that can be Spiking Threads with Spike Intervals or stable, not changing, Wait Chains. Here we may easily identify Top active and Blocking modules based on Module Wait Chain.

More complex case arises when we have different Active Threads and/or Wait Chains with different thread IDs at different times. However, if their Top Module is the same we may have found it as a performance root cause component especially in the case of Active Threads since it is statistically probable that such threads were active for considerable time deltas around the snapshot times (since threads are usually waiting). Such hypothesis may also be confirmed by Inter-Correlation analysis with software traces and logs where we can see Thread of Activity Discontinuities and Time Deltas.

We call this analysis pattern Diachronic Module since we see the module component appears in different thread stack traces diachronically (at different times). The typical simplified scenario is illustrated in this diagram:

This analysis pattern is different from synchronous module case (the module component appears in different thread stack traces at the same time) which was named Ubiquitous Component.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 121)

January 30th, 2016

Often, we need to identify the source of messages based on problem object or subsystem description (what question) before we proceed answering where question (where in the trace we can find messages related to the problem). Even when we know where are messages there can be many sources to select from (if we don’t know the where question we can use Indirect Message analysis pattern). To answer what question we propose Identification Messages analysis pattern. Basic Fact problem description may include properties and behavioural description of the problem object or subsystem. Based on that we can map them to the log messages that such an object can produce:

These messages may not be Error Messages or some other type of messages reflecting abnormal behavior. These messages are only used to identify the software object, module or subsystem.

For example, in one case there were problems with the custom status bar. However, the window handle for it or its parent wasn’t specified in the problem report. In the log file we had a lot of messages describing GUI behavior of many windows. To find out the status bar we thought that it should have small height but long width. Indeed we found one such child window. In addition, for this window the log file contained many messages related to frequent window text changes, possibly reflecting the status bar updates. Having identified the window handle, we proceeded to the analysis of another log with thousands of window messages. Because of the known window handle we were able to select only messages pertaining to our problem status bar.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 120)

January 16th, 2016

When we have very large traces and Basic Facts containing some data values such as a user name, device name, or registry key value we may use Data Interval analysis pattern to select the trace fragment for the initial log analysis. The first and the last trace messages containing selected data for the closed Data Interval. Depending on the trace size and other considerations we can also choose open Data Intervals. It is illustrated in the following diagram where we use Analysis interval notation borrowed mathematics:

Interval boundary messages may also be used as Trace Mask for another trace.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 36, Linux)

December 19th, 2015

This is a Linux variant of Local Buffer Overflow pattern previously described for Mac OS X and Windows platforms. Most of the time simple mistakes in using memory and string manipulation functions are easily detected by runtime. The more sophisticated example which overwrites stack trace without being detected involves overwriting indirectly via a pointer to a local buffer passed to the called function. In such cases we might see incorrect and truncated stack traces:

(gdb) bt
#0  0×0000000000000000 in ?? ()
#1  0×0000000000000000 in ?? ()

(gdb) x/100a $rsp
[...]
0x7fc3dd9dece8: 0x0 0x0
0x7fc3dd9decf8: 0x0 0x0
0x7fc3dd9ded08: 0x0 0x0
0x7fc3dd9ded18: 0x0 0x0
0x7fc3dd9ded28: 0×7fc3dd9ded48 0×4005cc <procA+40>
0×7fc3dd9ded38: 0×422077654e20794d 0×7542207265676769
0×7fc3dd9ded480×72656666 0×0
0×7fc3dd9ded58: 0×0 0×0
0×7fc3dd9ded68: 0×0 0×0
0×7fc3dd9ded78: 0×0 0×0
[…]

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 16b, Linux)

December 19th, 2015

This is a Linux variant of Stack Overflow (user mode) pattern previously described for Mac OS X and Windows platforms:

(gdb) bt
#0  0x00000000004004fb in procF ()
#1  0x000000000040054b in procF ()
#2  0x000000000040054b in procF ()
#3  0x000000000040054b in procF ()
#4  0x000000000040054b in procF ()
#5  0x000000000040054b in procF ()
#6  0x000000000040054b in procF ()
#7  0x000000000040054b in procF ()
#8  0x000000000040054b in procF ()
#9  0x000000000040054b in procF ()
#10 0x000000000040054b in procF ()
#11 0x000000000040054b in procF ()
#12 0x000000000040054b in procF ()
#13 0x000000000040054b in procF ()
#14 0x000000000040054b in procF ()
#15 0x000000000040054b in procF ()
#16 0x000000000040054b in procF ()
[...]

(gdb) bt -10
#15409 0x000000000040054b in procF ()
#15410 0x000000000040054b in procF ()
#15411 0x000000000040054b in procF ()
#15412 0x000000000040055b in procE ()
#15413 0x0000000000400575 in bar_one ()
#15414 0x0000000000400585 in foo_one ()
#15415 0x000000000040059d in thread_one ()
#15416 0x0000000000401690 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#15417 0x0000000000432549 in clone ()
#15418 0x0000000000000000 in ?? ()

In case of a stack overflow the stack pointer is decremented beyond the stack region boundary into an non-accessible region so any stack memory access triggers an access violation:

(gdb) x $rsp
0×7eff46109ec0: 0×0

(gdb) frame 1
#1  0x000000000040054b in procF ()

(gdb) x $rsp
0×7eff4610a0e0: 0×0

(gdb) maintenance info sections
[...]
Core file:
[...]
0×7eff46109000->0×7eff4610a000 at 0×02034000: load13 ALLOC LOAD READONLY HAS_CONTENTS
0×7eff4610a000->0×7eff4690a000 at 0×02035000: load14 ALLOC LOAD HAS_CONTENTS
[…]

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 24, Linux)

December 19th, 2015

This is a Linux variant of Coincidental Symbolic Information pattern previously described for Mac OS X and Windows platforms. The idea is the same: to disassemble the address to see if the preceding instruction is a call. If it is indeed then most likely the symbolic address is a return address from past Execution Residue:

(gdb) x/i 0x4005e6
0x4005e6 <_Z6work_3v+9>: pop    %rbp

(gdb) disassemble 0x4005e6
Dump of assembler code for function _Z6work_3v:
0x00000000004005dd <+0>: push   %rbp
0x00000000004005de <+1>: mov    %rsp,%rbp
0x00000000004005e1 <+4>: callq  0×4005d2 <_Z6work_4v>
0×00000000004005e6 <+9>: pop    %rbp
0×00000000004005e7 <+10>: retq
End of assembler dump.

(gdb) x/4i 0x49c740-4
0x49c73c: add    %al,(%rax)
0x49c73e: add    %al,(%rax)
0×49c740 <default_attr>: add    %al,(%rax)
0×49c742 <default_attr+2>: add    %al,(%rax)

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 60, Linux)

December 19th, 2015

This is a Linux variant of Execution Residue pattern previously described for Mac OS X and Windows platforms. This is symbolic information left in a stack region including ASCII and UNICODE fragments or pointers to them, for example, return addresses from past function calls:

(gdb) bt
#0  0x00000000004431f1 in nanosleep ()
#1  0x00000000004430c0 in sleep ()
#2  0x0000000000400771 in procNE() ()
#3  0x00000000004007aa in bar_two() ()
#4  0x00000000004007b5 in foo_two() ()
#5  0x00000000004007c8 in thread_two(void*) ()
#6  0x00000000004140f0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#7  0x0000000000445879 in clone ()
#8  0x0000000000000000 in ?? ()

(gdb) x/512a $rsp-2000
0x7f4cacc42360: 0x0 0x0
0x7f4cacc42370: 0x0 0x0
0x7f4cacc42380: 0x0 0x0
0x7f4cacc42390: 0x0 0x0
[...]
0x7f4cacc42830: 0x0 0x0
0x7f4cacc42840: 0x0 0x0
0x7f4cacc42850: 0x0 0x0
0x7f4cacc42860: 0x7f4cacc42870 0×4005af <_Z6work_8v+9>
0×7f4cacc42870: 0×7f4cacc42880 0×4005ba <_Z6work_7v+9>
0×7f4cacc42880: 0×7f4cacc42890 0×4005c5 <_Z6work_6v+9>
0×7f4cacc42890: 0×7f4cacc428a0 0×4005d0 <_Z6work_5v+9>
0×7f4cacc428a0: 0×7f4cacc428b0 0×4005db <_Z6work_4v+9>
0×7f4cacc428b0: 0×7f4cacc428c0 0×4005e6 <_Z6work_3v+9>
0×7f4cacc428c0: 0×7f4cacc428d0 0×4005f1 <_Z6work_2v+9>
0×7f4cacc428d0: 0×7f4cacc428e0 0×4005fc <_Z6work_1v+9>
0×7f4cacc428e0: 0×7f4cacc42cf0 0×40060e <_Z4workv+16>
0×7f4cacc428f0: 0×0 0×0
0×7f4cacc42900: 0×0 0×0
0×7f4cacc42910: 0×0 0×0
[…]
0×7f4cacc42af0: 0×0 0×0
0×7f4cacc42b00: 0×0 0×0
0×7f4cacc42b10: 0×0 0×0
0×7f4cacc42b20: 0×0 0×4431e6 <nanosleep+38>
0×7f4cacc42b30: 0×0 0×4430c0 <sleep+224>
0×7f4cacc42b40: 0×0 0×0
0×7f4cacc42b50: 0×0 0×0
0×7f4cacc42b60: 0×0 0×0
0×7f4cacc42b70: 0×0 0×0
[…]
0×7f4cacc42cb0: 0×0 0×0
0×7f4cacc42cc0: 0×0 0×0
0×7f4cacc42cd0: 0×0 0×0
0×7f4cacc42ce0: 0xfffffed2 0×3ad3affa
0×7f4cacc42cf0: 0×7f4cacc42d00 0×0
0×7f4cacc42d00: 0×7f4cacc42d20 0×49c740 <default_attr>
0×7f4cacc42d10: 0×7f4cacc439c0 0×400771 <_Z6procNEv+19>
0×7f4cacc42d20: 0×7f4cacc42d30 0×4007aa <_Z7bar_twov+9>
0×7f4cacc42d30: 0×7f4cacc42d40 0×4007b5 <_Z7foo_twov+9>
0×7f4cacc42d40: 0×7f4cacc42d60 0×4007c8 <_Z10thread_twoPv+17>
0×7f4cacc42d50: 0×0 0×0
0×7f4cacc42d60: 0×0 0×4140f0 <start_thread+208>
0×7f4cacc42d70: 0×0 0×7f4cacc43700
0×7f4cacc42d80: 0×0 0×0
0×7f4cacc42d90: 0×0 0×0
[…]

However, supposed return addresses need to be checked for Coincidental Symbolic Information.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 2, Linux)

December 19th, 2015

This is a Linux variant of Dynamic Memory Corruption (process heap) pattern previously described for Mac OS X and Windows platforms.

The corruption may be internal for heap structures with subsequent memory access violation:

(gdb) bt
#0  0×000000000041482e in _int_malloc ()
#1  0×0000000000416d88 in malloc ()
#2  0×00000000004005dc in proc ()
#3  0×00000000004006ee in bar_three ()
#4  0×00000000004006fe in foo_three ()
#5  0×0000000000400716 in thread_three ()
#6  0×0000000000401760 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#7  0×0000000000432609 in clone ()
#8  0×0000000000000000 in ?? ()

(gdb) x/i $rip
=> 0x41482e <_int_malloc+622>: mov    %rbx,0×10(%r12)

(gdb) x $r12+0x10
0x21687371: Cannot access memory at address 0x21687371

(gdb) p (char[4])0x21687371
$1 = "qsh!"

Or it may be detected with a diagnostic message (similar to double free):

(gdb) bt
#0  0×000000000043ef65 in raise ()
#1  0×0000000000409fc0 in abort ()
#2  0×000000000040bf5b in __libc_message ()
#3  0×0000000000412042 in malloc_printerr ()

#4  0×0000000000416c27 in free ()
#5  0×0000000000400586 in proc ()
#6  0×000000000040067e in bar_four ()
#7  0×000000000040068e in foo_four ()
#8  0×00000000004006a6 in thread_four ()
#9  0×00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#10 0×0000000000432589 in clone ()
#11 0×0000000000000000 in ?? ()

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 235, Linux)

December 18th, 2015

We first introduced Critical Region pattern in Accelerated Mac OS X Core Dump Analysis training but didn’t submit the pattern itself to the catalog at that time.

A critical region is usually a region of code protected by synchronization objects such as critical sections and mutexes. However, Critical Region analysis pattern is about identifying code regions “sandwiched” between contending function calls (which may or may not involve synchronization objects and corresponding synchronization calls such as identified in Contention patterns), and then identifying any possible shared data referenced by such code regions:

(gdb) thread apply all bt

Thread 6 (Thread 0x7f2665377700 (LWP 17000)):
#0  0x00000000004151a1 in _int_malloc ()
#1  0x0000000000416cf8 in malloc ()
#2  0x00000000004005a4 in proc ()
#3  0x0000000000400604 in bar_two ()
#4  0x0000000000400614 in foo_two ()
#5  0x000000000040062c in thread_two ()
#6  0x00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#7  0x0000000000432589 in clone ()
#8  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f2664b76700 (LWP 17001)):
#0  __lll_unlock_wake_private ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:343
#1  0×000000000041886d in _L_unlock_9670 ()
#2  0×0000000000416d22 in malloc ()
#3  0×00000000004005a4 in proc ()

#4  0×0000000000400641 in bar_three ()
#5  0×0000000000400651 in foo_three ()
#6  0×0000000000400669 in thread_three ()
#7  0×00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#8  0×0000000000432589 in clone ()
#9  0×0000000000000000 in ?? ()

Thread 4 (Thread 0x7f2665b78700 (LWP 16999)):
#0  __lll_lock_wait_private ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0×0000000000418836 in _L_lock_9558 ()
#2  0×0000000000416c1c in free ()
#3  0×0000000000400586 in proc ()

#4  0×00000000004005c7 in bar_one ()
#5  0×00000000004005d7 in foo_one ()
#6  0×00000000004005ef in thread_one ()
#7  0×00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#8  0×0000000000432589 in clone ()
#9  0×0000000000000000 in ?? ()

Thread 3 (Thread 0x1ab1860 (LWP 16998)):
#0  0x000000000042fed1 in nanosleep ()
#1  0x000000000042fda0 in sleep ()
#2  0x000000000040078a in main ()

Thread 2 (Thread 0x7f2663b74700 (LWP 17003)):
#0  __lll_lock_wait_private ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x0000000000418836 in _L_lock_9558 ()
#2  0x0000000000416c1c in free ()
#3  0x0000000000400586 in proc ()
#4  0x00000000004006bb in bar_five ()
#5  0x00000000004006cb in foo_five ()
#6  0x00000000004006e3 in thread_five ()
#7  0x00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#8  0x0000000000432589 in clone ()
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f2664375700 (LWP 17002)):
#0  0x000000000043ef65 in raise ()
#1  0x0000000000409fc0 in abort ()
#2  0x000000000040bf5b in __libc_message ()
#3  0x0000000000412042 in malloc_printerr ()
#4  0x0000000000416c27 in free ()
#5  0x0000000000400586 in proc ()
#6  0x000000000040067e in bar_four ()
#7  0x000000000040068e in foo_four ()
#8  0x00000000004006a6 in thread_four ()
#9  0x00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#10 0x0000000000432589 in clone ()
#11 0x0000000000000000 in ?? ()

From threads #4 and #5 we can identify one such a region with a shared buffer 0×6b8fc0 which may further point to heap entries.

(gdb) disassemble proc
Dump of assembler code for function proc:
0x00000000004004f0 <+0>: push   %rbp
0x00000000004004f1 <+1>: mov    %rsp,%rbp
0x00000000004004f4 <+4>: push   %rbx
0x00000000004004f5 <+5>: sub    $0x18,%rsp
0x00000000004004f9 <+9>: callq  0x40ac70 <rand>
0x00000000004004fe <+14>: mov    %eax,%ecx
0x0000000000400500 <+16>: mov    $0x68db8bad,%edx
0x0000000000400505 <+21>: mov    %ecx,%eax
0x0000000000400507 <+23>: imul   %edx
0x0000000000400509 <+25>: sar    $0xc,%edx
0x000000000040050c <+28>: mov    %ecx,%eax
0x000000000040050e <+30>: sar    $0x1f,%eax
0x0000000000400511 <+33>: mov    %edx,%ebx
0x0000000000400513 <+35>: sub    %eax,%ebx
0x0000000000400515 <+37>: mov    %ebx,%eax
0x0000000000400517 <+39>: mov    %eax,-0x14(%rbp)
0x000000000040051a <+42>: mov    -0x14(%rbp),%eax
0x000000000040051d <+45>: imul   $0x2710,%eax,%eax
0x0000000000400523 <+51>: mov    %ecx,%edx
0x0000000000400525 <+53>: sub    %eax,%edx
0x0000000000400527 <+55>: mov    %edx,%eax
0x0000000000400529 <+57>: mov    %eax,-0x14(%rbp)
0x000000000040052c <+60>: callq  0x40ac70 <rand>
0x0000000000400531 <+65>: mov    %eax,%ecx
0x0000000000400533 <+67>: mov    $0x68db8bad,%edx
0x0000000000400538 <+72>: mov    %ecx,%eax
0x000000000040053a <+74>: imul   %edx
0x000000000040053c <+76>: sar    $0xc,%edx
0x000000000040053f <+79>: mov    %ecx,%eax
0x0000000000400541 <+81>: sar    $0x1f,%eax
0x0000000000400544 <+84>: mov    %edx,%ebx
0x0000000000400546 <+86>: sub    %eax,%ebx
0x0000000000400548 <+88>: mov    %ebx,%eax
0x000000000040054a <+90>: mov    %eax,-0x18(%rbp)
0x000000000040054d <+93>: mov    -0x18(%rbp),%eax
0x0000000000400550 <+96>: imul   $0x2710,%eax,%eax
0x0000000000400556 <+102>: mov    %ecx,%edx
0x0000000000400558 <+104>: sub    %eax,%edx
0x000000000040055a <+106>: mov    %edx,%eax
0x000000000040055c <+108>: mov    %eax,-0x18(%rbp)
0x000000000040055f <+111>: mov    -0x14(%rbp),%eax
0x0000000000400562 <+114>: cltq
0x0000000000400564 <+116>: mov    0x6b8fc0(,%rax,8),%rax
0x000000000040056c <+124>: test   %rax,%rax
0x000000000040056f <+127>: je     0x400597 <proc+167>
0x0000000000400571 <+129>: mov    -0x14(%rbp),%eax
0x0000000000400574 <+132>: cltq
0x0000000000400576 <+134>: mov    0x6b8fc0(,%rax,8),%rax
0x000000000040057e <+142>: mov    %rax,%rdi
0x0000000000400581 <+145>: callq  0x416bc0 <free>
0×0000000000400586 <+150>: mov    -0×14(%rbp),%eax
0×0000000000400589 <+153>: cltq
0×000000000040058b <+155>: movq   $0×0,0×6b8fc0(,%rax,8)
0×0000000000400597 <+167>: mov    -0×18(%rbp),%eax
0×000000000040059a <+170>: cltq
0×000000000040059c <+172>: mov    %rax,%rdi

0×000000000040059f <+175>: callq  0×416c90 <malloc>
0×00000000004005a4 <+180>: mov    %rax,%rdx
0×00000000004005a7 <+183>: mov    -0×14(%rbp),%eax
0×00000000004005aa <+186>: cltq
0×00000000004005ac <+188>: mov    %rdx,0×6b8fc0(,%rax,8)
0×00000000004005b4 <+196>: jmpq   0×4004f9 <proc+9>
End of assembler dump.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 6b, Linux)

December 18th, 2015

This is a Linux variant of NULL Pointer (data) pattern previously described for Mac OS X and Windows platforms:

(gdb) bt
#0  0×0000000000400500 in procA ()
#1  0×000000000040057a in bar_two ()
#2  0×000000000040058a in foo_two ()
#3  0×00000000004005a2 in thread_two ()
#4  0×0000000000401630 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#5  0×00000000004324e9 in clone ()
#6  0×0000000000000000 in ?? ()

(gdb) x/i 0x400500
=> 0x400500 <procA+16>: movl   $0x1,(%rax)

(gdb) info r $rax
rax            0×0 0

(gdb) x $rax
0×0: Cannot access memory at address 0×0

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 25, Linux)

December 18th, 2015

This is a Linux variant of Stack Trace pattern previously described for Mac OS X and Windows platforms. Here we show a stack trace when debug symbols are not available (stripped executable) and also how to apply debug symbols from the executable where they were preserved:

(gdb) bt
#0 0x000000000043e4f1 in nanosleep ()
#1 0x000000000043e3c0 in sleep ()
#2 0x0000000000400789 in main ()

(gdb) symbol-file ./App/App.debug
Reading symbols from /home/Apps/App/App.debug...done.

(gdb) bt
#0 0x000000000043e4f1 in nanosleep ()
#1 0x000000000043e3c0 in sleep ()
#2 0x0000000000400789 in main (argc=1, argv=0x7fff5d1572d8) at main.cpp:85

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 6a, Linux)

December 18th, 2015

This is a Linux variant of NULL Pointer (code) pattern previously described for Mac OS X and Windows platforms:

(gdb) bt
#0  0×0000000000000000 in ?? ()
#1  0×0000000000400531 in procB ()
#2  0×00000000004005f8 in bar_four ()
#3  0×0000000000400608 in foo_four ()
#4  0×0000000000400620 in thread_four ()
#5  0×0000000000401630 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#6  0×00000000004324e9 in clone ()
#7  0×0000000000000000 in ?? ()

(gdb) disassemble procB
Dump of assembler code for function procB:
0x0000000000400516 <+0>: push   %rbp
0x0000000000400517 <+1>: mov    %rsp,%rbp
0x000000000040051a <+4>: sub    $0x10,%rsp
0x000000000040051e <+8>: movq   $0x0,-0x8(%rbp)
0x0000000000400526 <+16>: mov    -0x8(%rbp),%rdx
0x000000000040052a <+20>: mov    $0x0,%eax
0x000000000040052f <+25>: callq  *%rdx
0×0000000000400531 <+27>: leaveq
0×0000000000400532 <+28>: retq
End of assembler dump.

(gdb) info r rdx
rdx            0×0 0

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 78a, Linux)

December 15th, 2015

This is a Linux variant of Divide by Zero (user mode) pattern previously described for Mac OS X and Windows platforms:

GNU gdb (GDB)
[...]
Program terminated with signal 8, Arithmetic exception.
#0 0×000000000040056f in procD ()

(gdb) x/i $rip
=> 0x40056f <procD+18>: idivl -0×8(%rbp)

(gdb) info r $rax
rax 0x1 1

(gdb) x/w $rbp-0x8
0x7f0f6806bd28: 0×00000000

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 4, Linux)

December 15th, 2015

This is a Linux variant of Lateral Damage pattern previously described for Windows platforms. It also covers memory dumps where some usual commands may not work and we have to find a workaround to simulate their output, for example, by using other commands:

(gdb) info threads
Cannot find new threads: generic error

(gdb) thread apply all bt
Cannot find new threads: generic error

(gdb) thread 2
[Switching to thread 2 (LWP 12567)]
#0 0x000000000042ff51 in nanosleep ()

(gdb) thread 3
[Switching to thread 3 (LWP 12566)]
#0 0x000000000041482e in _int_malloc ()

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 187, Linux)

December 14th, 2015

Here we publish a Linux variant of Active Thread pattern that was previously introduced for Mac OS X and Windows. Basically it is a thread that is not waiting, sleeping, or suspended (most threads are). However, from a memory dump it is not possible to find out whether it was Spiking Thread at the dump generation time (unless we have a set of memory snapshots and in each one we have the same or similar back trace) and we don’t have any Paratext with CPU consumption stats for threads. For example, in one core dump we have this thread:

(gdb) info threads
Id Target Id Frame
6 Thread 0×7f560d467700 (LWP 3483) 0×00000000004324a9 in clone ()
5 Thread 0×7f560c465700 (LWP 3485) 0×000000000042fe31 in nanosleep ()
4 Thread 0×7f560bc64700 (LWP 3486) 0×000000000042fe31 in nanosleep ()
3 Thread 0×7f560b463700 (LWP 3487) 0×000000000042fe31 in nanosleep ()
2 Thread 0×18b9860 (LWP 3482) 0×000000000042fe31 in nanosleep ()
1 Thread 0×7f560cc66700 (LWP 3484) 0×000000000042fe31 in nanosleep ()

Thread #6 is not waiting so we inspect its back trace:

(gdb) thread 6
[Switching to thread 6 (Thread 0x7f560d467700 (LWP 3483))]
#0 0x00000000004324a9 in clone ()

(gdb) bt
#0 0×00000000004324a9 in clone ()
#1 0×0000000000401560 in ?? () at pthread_create.c:217
#2 0×00007f560d467700 in ?? ()
#3 0×0000000000000000 in ?? ()

(gdb) x/i 0x4324a9
=> 0x4324a9 : test %rax,%rax

Perhaps the core dump was saved at the thread creation time.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -