Archive for the ‘Crash Dump Patterns’ Category

Crash Dump Analysis Patterns (Part 58b)

Sunday, June 5th, 2016

Windows imposes a restriction for the number of GDI handles per process and by default it is approx. 10,000. If this number is reached we can have abnormal software behaviour symptoms such as hangs, glitches in visual appearance, and out-of-memory exceptions resulted in error messages and crashes. We already documented this analysis pattern for kernel and complete memory dumps that we called Handle Limit (GDI). However, one of Software Diagnostics Services’ training customers reported an out-of-memory exception with trace analysis diagnostics pointing to 10,000 leaked GDI objects. The process memory dump was saved and the customer asked whether it was possible to analyze it or similar memory dumps to find out from the dump itself whether we have GDI leak and what GDI objects were involved.

We recreated one of the applications from the kernel pattern variant (the one that leaks fonts) with one modification that it just stops after 10,000 font creation attempts. After launch we tried to open About dialog box but the whole application became unresponsive and no dialog box was visible. We save the process memory dump ad found out that its Main Thread was inside Dialog Box processing:

0:000> kc
# Call Site
00 user32!NtUserWaitMessage
01 user32!DialogBox2
02 user32!InternalDialogBox
03 user32!DialogBoxIndirectParamAorW
04 user32!DialogBoxParamW

05 GUIHandleLeak!WndProc
06 user32!UserCallWinProcCheckWow
07 user32!DispatchMessageWorker
08 GUIHandleLeak!wWinMain
09 GUIHandleLeak!invoke_main
0a GUIHandleLeak!__scrt_common_main_seh
0b kernel32!BaseThreadInitThunk
0c ntdll!RtlUserThreadStart

In order to look at GDI handle table we studied the relevant chapters in Feng Yuan’s book “Windows Graphics Programming” and this post which has all necessary structure definitions: http://stackoverflow.com/questions/13905661/how-to-get-list-of-gdi-handles

We get the current process GDI table address from the disassembly:

0:000> .asm no_code_bytes
Assembly options: no_code_bytes

0:000> uf gdi32!GdiQueryTable
gdi32!GdiQueryTable:
00007ffc`7f172610 sub     rsp,38h
00007ffc`7f172614 or      qword ptr [rsp+20h],0FFFFFFFFFFFFFFFFh
00007ffc`7f17261a lea     rdx,[rsp+20h]
00007ffc`7f17261f mov     ecx,0Eh
00007ffc`7f172624 mov     byte ptr [rsp+28h],0
00007ffc`7f172629 call    qword ptr [gdi32!_imp_NtVdmControl (00007ffc`7f1ba5a8)]
00007ffc`7f17262f test    eax,eax
00007ffc`7f172631 js      gdi32!GdiQueryTable+0x33 (00007ffc`7f172643)  Branch

gdi32!GdiQueryTable+0x23:
00007ffc`7f172633 cmp     byte ptr [rsp+28h],0
00007ffc`7f172638 je      gdi32!GdiQueryTable+0x33 (00007ffc`7f172643)  Branch

gdi32!GdiQueryTable+0x2a:
00007ffc`7f17263a mov     rax,qword ptr [gdi32!pGdiSharedHandleTable (00007ffc`7f2541b8)]
00007ffc`7f172641 jmp     gdi32!GdiQueryTable+0×35 (00007ffc`7f172645)  Branch

gdi32!GdiQueryTable+0x33:
00007ffc`7f172643 xor     eax,eax

gdi32!GdiQueryTable+0x35:
00007ffc`7f172645 add     rsp,38h
00007ffc`7f172649 ret

0:000> dp 00007ffc`7f2541b8 L1
00007ffc`7f2541b8  000000db`56a50000

We dump the first 0×1000 qword values:

0:000> dq 000000db`56a50000 L1000
000000db`56a50000  00000000`00000000 40000000`00000000
000000db`56a50010  00000000`00000000 00000000`00000000
000000db`56a50020  40000000`00000000 00000000`00000000
000000db`56a50030  00000000`00000000 00000000`00000000
000000db`56a50040  00000000`00000000 00000000`00000000
000000db`56a50050  00000000`00000000 00000000`00000000
000000db`56a50060  00000000`00000000 00000000`00000000
000000db`56a50070  00000000`00000000 00000000`00000000
000000db`56a50080  00000000`00000000 00000000`00000000
000000db`56a50090  00000000`00000000 00000000`00000000
000000db`56a500a0  00000000`00000000 00000000`00000000
000000db`56a500b0  00000000`00000000 00000000`00000000
000000db`56a500c0  00000000`00000000 00000000`00000000
000000db`56a500d0  00000000`00000000 00000000`00000000
000000db`56a500e0  00000000`00000000 00000000`00000000
000000db`56a500f0  fffff901`40000e60 40040104`00000000
000000db`56a50100  00000000`00000000 fffff901`40000d60
000000db`56a50110  40080188`00000000 00000000`00000000
000000db`56a50120  fffff901`400008b0 40080108`00000000
000000db`56a50130  00000000`00000000 fffff901`400007c0
000000db`56a50140  40080108`00000000 00000000`00000000
000000db`56a50150  fffff901`400006d0 40080108`00000000
[…]
000000db`56a57ce0 fffff901`429d24f0 400aee0a`00002c30
000000db`56a57cf0 000000db`564f3b20 fffff901`42910570
000000db`56a57d00 400a360a`00002c30 000000db`564e57b0
000000db`56a57d10 fffff901`40700420 40105310`000002b4
000000db`56a57d20 00000089`39410fc0 fffff901`407ec4a0
000000db`56a57d30 400a010a`000002b4 00000089`3900ae70
000000db`56a57d40 fffff901`407036d0 400a010a`000002b4
000000db`56a57d50 00000089`3900ae60 fffff901`440b56e0
000000db`56a57d60 400a030a`00002c30 000000db`564f0e90
000000db`56a57d70 fffff901`43e7fd50 40040104`00000000
000000db`56a57d80 00000000`00000000 fffff901`42c0f010
000000db`56a57d90 400a4b0a`00003190 0000003a`1a30b670
000000db`56a57da0 fffff901`440deaf0 400a6d0a`00002c30
000000db`56a57db0 000000db`564f3b00 fffff901`407f2010
000000db`56a57dc0 40100510`00001704 000000d8`d6230f60
000000db`56a57dd0 fffff901`40714180 40100210`00001704
000000db`56a57de0 000000d8`d6230f48 fffff901`4009d840
000000db`56a57df0 40100210`00001704 000000d8`d6230f78
000000db`56a57e00 fffff901`43e50950 40100230`00001704
000000db`56a57e10 00000000`00000000 fffff901`43e30010
000000db`56a57e20 40100230`00001704 00000000`00000000
000000db`56a57e30 fffff901`44f1d010 4005a105`0000168c
000000db`56a57e40 00000000`00000000 fffff901`440ded80
000000db`56a57e50 400a2e0a`00002c30 000000db`564f3b10
000000db`56a57e60 fffff901`4070b3b0 40050405`00000000
000000db`56a57e70 00000000`00000000 fffff901`42a0a010
000000db`56a57e80 400a870a`00002c30 000000db`564e7160
000000db`56a57e90 fffff901`407a7450 4008cd08`00000000
000000db`56a57ea0 00000000`00000000 fffff901`400c49c0
000000db`56a57eb0 40046904`000002b4 00000089`39410f90
000000db`56a57ec0 fffff901`41fb8010 4005c705`00000000
000000db`56a57ed0 00000000`00000000 fffff901`423dc790
000000db`56a57ee0 40086708`00000000 00000000`00000000
000000db`56a57ef0 fffff901`40699620 40010301`0000168c
000000db`56a57f00 00000000`01100000 fffff901`43e54510
000000db`56a57f10 40050305`0000168c 00000000`00000000
000000db`56a57f20 fffff901`407164c0 40100610`00001448
000000db`56a57f30 000000fe`4d8d0cf0 fffff901`407eee50
000000db`56a57f40 40100410`00001448 000000fe`4d8d0cd8
000000db`56a57f50 fffff901`43e2abb0 40080508`00000000
000000db`56a57f60 00000000`00000000 fffff901`40715010
000000db`56a57f70 40050305`0000168c 00000000`00000000
000000db`56a57f80 fffff901`42872b80 40084e08`00000000
000000db`56a57f90 00000000`00000000 fffff901`407175a0
000000db`56a57fa0 410f080f`00000000 00000000`00000000
000000db`56a57fb0 fffff901`407f4000 40050605`00000000
000000db`56a57fc0 00000000`00000000 fffff901`406d6bb0
000000db`56a57fd0 40080508`00000000 00000000`00000000
000000db`56a57fe0 fffff901`43e3a4c0 40120812`00000000
000000db`56a57ff0 00000000`00000000 fffff901`44fe1010

We see that typical cell value has 3 qwords (8-byte or 4-word sized values for both x64 and Virtualized Process). The non-zeroed data starts from 000000db`56a500f0 address. Clearly some entries have wProcessId equal to the PID from our dump:

0:000> ~
.  0  Id: 2c30.292c Suspend: 0 Teb: 00007ff7`1bf3e000 Unfrozen

Let’s look at one of such entries (the first and the last qword values are pointers):

0:000> dq 000000db`56a57da0 L3
000000db`56a57da0  fffff901`440deaf0 400a6d0a`00002c30
000000db`56a57db0  000000db`564f3b00

0:000> dw 000000db`56a57da0 L3*4
000000db`56a57da0  eaf0 440d f901 ffff 2c30 0000 6d0a 400a
000000db`56a57db0  3b00 564f 00db 0000

Applying 7f to wType word 400a gives us 0xa or 10 which is a font handle:

0:000> ? 400a & 7f
Evaluate expression: 10 = 00000000`0000000a

Since there are entries from other processes from the same session in this table assessing the handle leak visually is difficult so we wrote a WinDbg script that goes from the first non-zero pKernelAddress ($t0) to the first zero entry and for the given wProcessId ($tpid) counts the number of entries ($t1) and the number of entries ($t3) for the given wType ($t2). The script also counts the total entries till the first zero one ($t4):

.while (qwo(@$t0)) { .if (wo(@$t0+8) == @$tpid) {r $t1 = @$t1+1; .if (((qwo(@$t0+8) >> 0n48) & 7f) == @$t2) {r $t3 = @$t3+1} }; r $t0 = @$t0+3*8; r $t4 = @$t4+1}

To execute it we prepare the variables:

0:000> r $t0=000000db`56a500f0

0:000> r $t1=0

0:000> r $t2=a

0:000> r $t3=0

0:000> r $t4=0

0:000> .while (qwo(@$t0)) { .if (wo(@$t0+8) == @$tpid) {r $t1 = @$t1+1; .if (((qwo(@$t0+8) >> 0n48) & 7f) == @$t2) {r $t3 = @$t3+1} }; r $t0 = @$t0+3*8; r $t4 = @$t4+1}

After execution we get the modified variables that show us that the total consecutive non-zero handle table entries is 21464, the total number of entries for the current process is 9990, and the total number of fonts is 9982:

0:000> ? $t0
Evaluate expression: 942052007216 = 000000db`56acdd30

0:000> ? $t1
Evaluate expression: 9990 = 00000000`00002706

0:000> ? $t2
Evaluate expression: 10 = 00000000`0000000a

0:000> ? $t3
Evaluate expression: 9982 = 00000000`000026fe

0:000> ? $t4
Evaluate expression: 21464 = 00000000`000053d8

If we repeat the same script for device contexts (wType is 1) we get only 2 entries for our PID:

0:000> r $t0=000000db`56a500f0

0:000> r $t1=0

0:000> r $t2=1

0:000> r $t3=0

0:000> r $t4=0

0:000> .while (qwo(@$t0)) { .if (wo(@$t0+8) == @$tpid) {r $t1 = @$t1+1; .if (((qwo(@$t0+8) >> 0n48)  & 7f) == @$t2) {r $t3 = @$t3+1} }; r $t0 = @$t0+3*8; r $t4 = @$t4+1}

0:000> ? $t3
Evaluate expression: 2 = 00000000`00000002

Of course, this script may be further improved, for example, to process all possible wType values and print their statistics. It can also be made as a textual WinDbg script procedure with arguments.

We rename the original pattern variant to Handle Limit (GDI, kernel space) and name this pattern variant as Handle Limit (GDI, user space).

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 241)

Sunday, May 29th, 2016

Most Execution Residue traces in memory dumps are not explicitly temporal (see Special and General Trace and Log Analysis) but may be ordered by some space coordinate, such as memory addresses or page frame numbers. Furthermore, virtual space can be further subdivided into places such as modules and physical space may be restructured into places such as processes. Simple space trace of some data value can be constructed using Value References analysis pattern. These and higher structural space trace constructs can be named as a general Place Trace analysis pattern illustrated in this diagram:

Memory attributes, such as page protection, or derived attributes from memory contents can also be considered as Place Trace data. Sometimes, time ordering can be reconstructed by looking at time information for place containers, for example, elapsed process time or ordering in the process list, or thread order and times for stack region thread owners.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 240)

Sunday, May 29th, 2016

Windows processes may contain Execution Residue such as ASCII window class names in mapped memory regions pointing to other running processes (perhaps as a result of Hooksware). For example, calc.exe process memory dump saved on my Windows 10 notebook “knows” about Visio and WinDbg windows that were opened at that time:

0:000> s-a 0 L?FFFFFFFFFFFFFFFF "VISIOA"
00000015`42c6bdd0 56 49 53 49 4f 41 00 00-00 00 00 00 00 00 00 00 VISIOA.............

0:000> s-a 0 L?FFFFFFFFFFFFFFFF "WinDbg"
00000015`42d19720 57 69 6e 44 62 67 46 72-61 6d 65 43 6c 61 73 73 WinDbgFrameClass

This may be useful for some troubleshooting scenarios, for example, pointing to processes which are known for their problematic behavior or Special Processes. Of course, we assume that those windows or classes were genuine, not faked. We call this analysis pattern Window Hint similar to Environment Hint and Module Hint analysis patterns.

Going deeper, we can dump strings from the whole region limiting the output to the strings with length more than 5:

0:000> !address 00000015`42d19720

Usage:                  <unknown>
Base Address:           00000015`42b20000
End Address:            00000015`42d3a000

Region Size:            00000000`0021a000 (   2.102 MB)
State:                  00001000          MEM_COMMIT
Protect:                00000002          PAGE_READONLY
Type:                   00040000          MEM_MAPPED
Allocation Base:        00000015`42b20000
Allocation Protect:     00000002          PAGE_READONLY

Content source: 1 (target), length: 208e0

0:000> s-[l5]sa 00000015`42b20000 00000015`42d3a000
00000015`42b20a60  “#32769″
00000015`42b20cc0  “Message”
00000015`42b20f40  “#32774″
00000015`42b21060  “#32772″
00000015`42b21510  “Ghost”
00000015`42b215e0  “LivePreview”
00000015`42b216f0  “UserAdapterWindowClass”
00000015`42b21ce0  “MSCTFIME Composition”
00000015`42b222a0  “#32772″
00000015`42b22390  “#32772″
00000015`42b22460  “RichEdit20W”
00000015`42b22530  “RichEdit20A”
00000015`42b22600  “ToolbarWindow32″
00000015`42b226e0  “tooltips_class32″
00000015`42b227c0  “msctls_statusbar32″
00000015`42b228a0  “SysListView32″
00000015`42b22980  “SysHeader32″
00000015`42b22a50  “SysTabControl32″
00000015`42b22b30  “SysTreeView32″
00000015`42b22c10  “msctls_trackbar32″
00000015`42b22cf0  “msctls_updown32″
00000015`42b22dd0  “msctls_progress32″
00000015`42b22eb0  “msctls_hotkey32″
00000015`42b22f8f  “‘SysAnimate32″
00000015`42b230f0  “SysIPAddress32″
00000015`42b231d0  “ReBarWindow32″
00000015`42b232b0  “ComboBoxEx32″
00000015`42b23390  “SysMonthCal32″
00000015`42b23470  “SysDateTimePick32″
00000015`42b23550  “DropDown”
00000015`42b23620  “SysLink”
00000015`42b236f0  “SysPager”
00000015`42b23960  “msctls_netaddress”

[...]

00000015`42d175e0  "OutlookFbThreadWnd"
00000015`42d19720  "WinDbgFrameClass"
00000015`42d19750  "DockClass"
00000015`42d19770  "GhostClass"
00000015`42d19a30  "ATL:00007FF60D792730"
00000015`42d1a0f0  "MSCTFIME Composition"
00000015`42d1a4af  "%OleMainThreadWndClass"
00000015`42d1be10  "CicMarshalWndClass"
00000015`42d1c0e0  "VSyncHelper-00000040EC4CA5F0-1f8"
00000015`42d1c100  "8855daf"
00000015`42d1c190  "URL Moniker Notification Window"
00000015`42d1c390  "UserAdapterWindowClass"
00000015`42d1d080  "@>zG#"
00000015`42d1dcaf  "!VSyncHelper-00000040D60C5850-1e"
00000015`42d1dccf  "ef0477df"
00000015`42d20d50  "VSyncHelper-00000040F39C5650-1f0"
00000015`42d20d70  "313c5a0"
00000015`42d250d0  "#32770"
00000015`42d250f0  "URL Moniker Notification Window"
00000015`42d29270  "VSyncHelper-00000079321C32E0-1f2"
00000015`42d29290  "fb11f8c"
00000015`42d2a1d0  "MSCTFIME Composition"
00000015`42d2a480  "CicMarshalWndClass"
00000015`42d2ac80  "MSCTFIME Composition"
00000015`42d2b8d0  "ShockwaveFlashFullScreen"
00000015`42d2bbb8  "P?U!\"
00000015`42d2c690  "Xaml_WindowedPopupClass"
00000015`42d30a10  "ShockwaveFlashFullScreen"
00000015`42d30b50  "MSCTFIME UI"
00000015`42d30b90  "WinBaseClass"
00000015`42d3441f  "!Alternate Owner"
00000015`42d34460  "ShockwaveFlashFullScreen"
00000015`42d344a0  "ATL:00007FF60D792530"
00000015`42d34a50  "SysAnimate32"
00000015`42d34a7f  "'ComboBoxEx32"
00000015`42d34ed0  "tooltips_class32"
00000015`42d34f00  "msctls_statusbar32"
00000015`42d35e70  "RawInputClass"
00000015`42d36a10  "SysTabControl32"
00000015`42d38650  "CicMarshalWndClass"
00000015`42d38eb0  "#32772"
00000015`42d3951f  "!VSyncHelper-000000C9DA06CD10-1f"
00000015`42d3953f  "110e8d16"

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 239)

Monday, May 2nd, 2016

We found out that in Windows 10 (at least on our working system) Notepad is no longer a single threaded application even without opening any common dialogs (like in Evental Dumps analysis pattern example). It has at least 3 additional threads (and other modeling applications we use for our training also have additional threads):

0:000> ~*k

0 Id: 3a64.3b38 Suspend: 1 Teb: 00007ff6`a914d000 Unfrozen
# Child-SP RetAddr Call Site
00 000000e5`6298f938 00007ffa`e57cf8e5 USER32!NtUserGetMessage+0xa
01 000000e5`6298f940 00007ff6`a9603470 USER32!GetMessageW+0x25
02 000000e5`6298f970 00007ff6`a96141f5 notepad!WinMain+0x178
03 000000e5`6298f9f0 00007ffa`e3b42d92 notepad!WinMainCRTStartup+0x1c5
04 000000e5`6298fab0 00007ffa`e5bc9f64 KERNEL32!BaseThreadInitThunk+0x22
05 000000e5`6298fae0 00000000`00000000 ntdll!RtlUserThreadStart+0x34

1 Id: 3a64.38b0 Suspend: 1 Teb: 00007ff6`a914b000 Unfrozen
# Child-SP RetAddr Call Site
00 000000e5`62bffa58 00007ffa`e5bf93a5 ntdll!NtWaitForWorkViaWorkerFactory+0xa
01 000000e5`62bffa60 00007ffa`e3b42d92 ntdll!TppWorkerThread+0x295
02 000000e5`62bffe60 00007ffa`e5bc9f64 KERNEL32!BaseThreadInitThunk+0x22
03 000000e5`62bffe90 00000000`00000000 ntdll!RtlUserThreadStart+0x34

2 Id: 3a64.3940 Suspend: 1 Teb: 00007ff6`a9149000 Unfrozen
# Child-SP RetAddr Call Site
00 000000e5`62c7f718 00007ffa`e5bf93a5 ntdll!NtWaitForWorkViaWorkerFactory+0xa
01 000000e5`62c7f720 00007ffa`e3b42d92 ntdll!TppWorkerThread+0x295
02 000000e5`62c7fb20 00007ffa`e5bc9f64 KERNEL32!BaseThreadInitThunk+0x22
03 000000e5`62c7fb50 00000000`00000000 ntdll!RtlUserThreadStart+0x34

3 Id: 3a64.1030 Suspend: 1 Teb: 00007ff6`a9147000 Unfrozen
# Child-SP RetAddr Call Site
00 000000e5`62d1f878 00007ffa`e5bf93a5 ntdll!NtWaitForWorkViaWorkerFactory+0xa
01 000000e5`62d1f880 00007ffa`e3b42d92 ntdll!TppWorkerThread+0x295
02 000000e5`62d1fc80 00007ffa`e5bc9f64 KERNEL32!BaseThreadInitThunk+0x22
03 000000e5`62d1fcb0 00000000`00000000 ntdll!RtlUserThreadStart+0x34

This gave us an idea for the analysis pattern we call Not My Thread since additional threads can be started by any other process DLLs, for example, by Hooksware. However, we need to distinguish between unexpectedly added threads, threads with Special Stack Traces and Special Threads, for example, from .NET support.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 238)

Thursday, March 17th, 2016

Sometimes developers introduce their own variants of synchronization code instead of using synchronization API provided by language runtime and OS. If we are lucky we can spot it in function and class method names and then use Constant Subtrace analysis pattern:

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it
# Call Site
00 nt!KiSwapContext
01 nt!KiCommitThreadWait
02 nt!KeWaitForSingleObject
03 nt!NtWaitForSingleObject
04 nt!KiSystemServiceCopyEnd
05 ntdll!ZwWaitForSingleObject
06 KERNELBASE!WaitForSingleObjectEx
07 wbemcomn!CWbemCriticalSection::Enter
08 wbemcore!EnsureInitialized
09 wbemcore!InitAndWaitForClient
0a wbemcore!CWbemLevel1Login::ConnectorLogin
0b wbemcore!CWbemLevel1Login::NTLMLogin
0c RPCRT4!Invoke
0d RPCRT4!NdrStubCall2
0e ole32!CStdStubBuffer_Invoke
0f ole32!SyncStubInvoke
10 ole32!StubInvoke
11 ole32!CCtxComChnl::ContextInvoke
12 ole32!AppInvoke
13 ole32!ComInvokeWithLockAndIPID
14 ole32!ThreadInvoke
15 RPCRT4!DispatchToStubInCNoAvrf
16 RPCRT4!RPC_INTERFACE::DispatchToStubWorker
17 RPCRT4!RPC_INTERFACE::DispatchToStub
18 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject
19 RPCRT4!LRPC_SCALL::DispatchRequest
1a RPCRT4!LRPC_SCALL::HandleRequest
1b RPCRT4!LRPC_SASSOCIATION::HandleRequest
1c RPCRT4!LRPC_ADDRESS::HandleRequest
1d RPCRT4!LRPC_ADDRESS::ProcessIO
1e RPCRT4!LrpcIoComplete
1f ntdll!TppAlpcpExecuteCallback
20 ntdll!TppWorkerThread
21 kernel32!BaseThreadInitThunk
22 ntdll!RtlUserThreadStart

0: kd> kc
*** Stack trace for last set context - .thread/.cxr resets it
# Call Site
00 repdrvfs!SCachePage::operator=
01 repdrvfs!std::vector<scachepage,wbem_allocator<scachepage> >::erase
02 repdrvfs!CPageCache::Read
03 repdrvfs!CPageFile::GetPage
04 repdrvfs!ValidateBTreeAgainstObjHeap
05 repdrvfs!PerformAllValidations
06 repdrvfs!VerifyRepositoryOnline
07 repdrvfs!VerifyRepository
08 repdrvfs!CPageSource::Startup
09 repdrvfs!CPageSource::Init
0a repdrvfs!CFileCache::InnerInitialize
0b repdrvfs!CFileCache::Initialize
0c repdrvfs!CRepository::Initialize
0d repdrvfs!CRepository::Logon
0e wbemcore!CRepository::Init
0f wbemcore!InitSubsystems
10 wbemcore!ConfigMgr::InitSystem
11 wbemcore!EnsureInitialized
12 wbemcore!InitAndWaitForClient
13 wbemcore!CWbemLevel1Login::ConnectorLogin
14 wbemcore!CWbemLevel1Login::NTLMLogin
15 RPCRT4!Invoke
16 RPCRT4!NdrStubCall2
17 ole32!CStdStubBuffer_Invoke
18 ole32!SyncStubInvoke
19 ole32!StubInvoke
1a ole32!CCtxComChnl::ContextInvoke
1b ole32!AppInvoke
1c ole32!ComInvokeWithLockAndIPID
1d ole32!ThreadInvoke
1e RPCRT4!DispatchToStubInCNoAvrf
1f RPCRT4!RPC_INTERFACE::DispatchToStubWorker
20 RPCRT4!RPC_INTERFACE::DispatchToStub
21 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject
22 RPCRT4!LRPC_SCALL::DispatchRequest
23 RPCRT4!LRPC_SCALL::HandleRequest
24 RPCRT4!LRPC_SASSOCIATION::HandleRequest
25 RPCRT4!LRPC_ADDRESS::HandleRequest
26 RPCRT4!LRPC_ADDRESS::ProcessIO
27 RPCRT4!LrpcIoComplete
28 ntdll!TppAlpcpExecuteCallback
29 ntdll!TppWorkerThread
2a kernel32!BaseThreadInitThunk
2b ntdll!RtlUserThreadStart

These two thread stack traces were spotted from a complete memory dump Stack Trace Collection as the part of a larger ALPC Wait Chain. We switched to these threads using .thread /r /p WinDbg command to get the stripped stack trace via kc command for better illustration. We see Constant Subtrace until wbemcore!EnsureInitialized function which serves as a bifurcation stack frame. The first stack trace has “CriticalSection::Enter” after the bifurcation stack frame compared to the second stack trace which looks like Spiking Thread in user space.

There is no hidden critical section associated with that process except the one which is probably related to the spiking Variable Subtrace since it doesn’t have any LockCount:

0: kd> !cs -l -o -s
DebugInfo = 0x0000000004a774d0
Critical section = 0x000000000308d690 (+0x308D690)
LOCKED
LockCount = 0×0
WaiterWoken = No
OwningThread = 0×0000000000000928
RecursionCount = 0×1
LockSemaphore = 0×0
SpinCount = 0×0000000000000000
OwningThread = .thread fffffa806ebd8060
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled

We can also disassemble wbemcomn!CWbemCriticalSection::Enter and find out that it calls WaitForSingleObject once and no other synchronization API indeed:

0: kd> uf wbemcomn!CWbemCriticalSection::Enter
[...]
wbemcomn!CWbemCriticalSection::Enter+0x41:
000007fe`f78ad1c0 mov rcx,qword ptr [rbx+10h]
000007fe`f78ad1c4 mov edx,r12d
000007fe`f78ad1c7 call qword ptr [wbemcomn!_imp_WaitForSingleObject (000007fe`f78ee4f0)]
000007fe`f78ad1cd cmp eax,esi
000007fe`f78ad1cf je wbemcomn!CWbemCriticalSection::Enter+0x52 (000007fe`f78a62a3) Branch

We add this nonstandard synchronization memory analysis pattern to Wait Chain analysis pattern category.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 237)

Thursday, March 17th, 2016

Variable Subtrace analysis pattern was introduced for inter-correlational analysis of CPU spikes across memory snapshots with just one thread involved. In contrast, we found Constant Subtrace pattern useful in Wait Chain analysis involving several threads in just one memory snapshot (intra-correlational analysis). Here a constant subtrace groups stack traces from Stack Trace Collection with a bifurcation stack trace frame (similar to Bifurcation Point trace analysis pattern) providing some wait chain relationship hint. Such subtraces traces may be initially found by the preceding wait chain analysis or by technology-specific subtraces such as ALPC/RPC server thread frames (as seen in an example stack from COM interface invocation). Here is a minimal stack trace diagram (similar to minimal trace graphs introduced in Accelerated Windows Software Trace Analysis training) illustrating the pattern (it also shows Spiking Thread pattern in user space as seen from a complete memory dump):

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 236)

Saturday, February 13th, 2016

When we have a performance issue we may request a set of consecutive memory dump saved after some interval. In such memory dumps we may see the same thread(s) having similar stack trace(s). In this simple diagnostic scenario we may diagnose several patterns based on the stack traces: Active Threads that can be Spiking Threads with Spike Intervals or stable, not changing, Wait Chains. Here we may easily identify Top active and Blocking modules based on Module Wait Chain.

More complex case arises when we have different Active Threads and/or Wait Chains with different thread IDs at different times. However, if their Top Module is the same we may have found it as a performance root cause component especially in the case of Active Threads since it is statistically probable that such threads were active for considerable time deltas around the snapshot times (since threads are usually waiting). Such hypothesis may also be confirmed by Inter-Correlation analysis with software traces and logs where we can see Thread of Activity Discontinuities and Time Deltas.

We call this analysis pattern Diachronic Module since we see the module component appears in different thread stack traces diachronically (at different times). The typical simplified scenario is illustrated in this diagram:

This analysis pattern is different from synchronous module case (the module component appears in different thread stack traces at the same time) which was named Ubiquitous Component.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 36, Linux)

Saturday, December 19th, 2015

This is a Linux variant of Local Buffer Overflow pattern previously described for Mac OS X and Windows platforms. Most of the time simple mistakes in using memory and string manipulation functions are easily detected by runtime. The more sophisticated example which overwrites stack trace without being detected involves overwriting indirectly via a pointer to a local buffer passed to the called function. In such cases we might see incorrect and truncated stack traces:

(gdb) bt
#0  0×0000000000000000 in ?? ()
#1  0×0000000000000000 in ?? ()

(gdb) x/100a $rsp
[...]
0x7fc3dd9dece8: 0x0 0x0
0x7fc3dd9decf8: 0x0 0x0
0x7fc3dd9ded08: 0x0 0x0
0x7fc3dd9ded18: 0x0 0x0
0x7fc3dd9ded28: 0×7fc3dd9ded48 0×4005cc <procA+40>
0×7fc3dd9ded38: 0×422077654e20794d 0×7542207265676769
0×7fc3dd9ded480×72656666 0×0
0×7fc3dd9ded58: 0×0 0×0
0×7fc3dd9ded68: 0×0 0×0
0×7fc3dd9ded78: 0×0 0×0
[…]

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 16b, Linux)

Saturday, December 19th, 2015

This is a Linux variant of Stack Overflow (user mode) pattern previously described for Mac OS X and Windows platforms:

(gdb) bt
#0  0x00000000004004fb in procF ()
#1  0x000000000040054b in procF ()
#2  0x000000000040054b in procF ()
#3  0x000000000040054b in procF ()
#4  0x000000000040054b in procF ()
#5  0x000000000040054b in procF ()
#6  0x000000000040054b in procF ()
#7  0x000000000040054b in procF ()
#8  0x000000000040054b in procF ()
#9  0x000000000040054b in procF ()
#10 0x000000000040054b in procF ()
#11 0x000000000040054b in procF ()
#12 0x000000000040054b in procF ()
#13 0x000000000040054b in procF ()
#14 0x000000000040054b in procF ()
#15 0x000000000040054b in procF ()
#16 0x000000000040054b in procF ()
[...]

(gdb) bt -10
#15409 0x000000000040054b in procF ()
#15410 0x000000000040054b in procF ()
#15411 0x000000000040054b in procF ()
#15412 0x000000000040055b in procE ()
#15413 0x0000000000400575 in bar_one ()
#15414 0x0000000000400585 in foo_one ()
#15415 0x000000000040059d in thread_one ()
#15416 0x0000000000401690 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#15417 0x0000000000432549 in clone ()
#15418 0x0000000000000000 in ?? ()

In case of a stack overflow the stack pointer is decremented beyond the stack region boundary into an non-accessible region so any stack memory access triggers an access violation:

(gdb) x $rsp
0×7eff46109ec0: 0×0

(gdb) frame 1
#1  0x000000000040054b in procF ()

(gdb) x $rsp
0×7eff4610a0e0: 0×0

(gdb) maintenance info sections
[...]
Core file:
[...]
0×7eff46109000->0×7eff4610a000 at 0×02034000: load13 ALLOC LOAD READONLY HAS_CONTENTS
0×7eff4610a000->0×7eff4690a000 at 0×02035000: load14 ALLOC LOAD HAS_CONTENTS
[…]

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 24, Linux)

Saturday, December 19th, 2015

This is a Linux variant of Coincidental Symbolic Information pattern previously described for Mac OS X and Windows platforms. The idea is the same: to disassemble the address to see if the preceding instruction is a call. If it is indeed then most likely the symbolic address is a return address from past Execution Residue:

(gdb) x/i 0x4005e6
0x4005e6 <_Z6work_3v+9>: pop    %rbp

(gdb) disassemble 0x4005e6
Dump of assembler code for function _Z6work_3v:
0x00000000004005dd <+0>: push   %rbp
0x00000000004005de <+1>: mov    %rsp,%rbp
0x00000000004005e1 <+4>: callq  0×4005d2 <_Z6work_4v>
0×00000000004005e6 <+9>: pop    %rbp
0×00000000004005e7 <+10>: retq
End of assembler dump.

(gdb) x/4i 0x49c740-4
0x49c73c: add    %al,(%rax)
0x49c73e: add    %al,(%rax)
0×49c740 <default_attr>: add    %al,(%rax)
0×49c742 <default_attr+2>: add    %al,(%rax)

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 60, Linux)

Saturday, December 19th, 2015

This is a Linux variant of Execution Residue pattern previously described for Mac OS X and Windows platforms. This is symbolic information left in a stack region including ASCII and UNICODE fragments or pointers to them, for example, return addresses from past function calls:

(gdb) bt
#0  0x00000000004431f1 in nanosleep ()
#1  0x00000000004430c0 in sleep ()
#2  0x0000000000400771 in procNE() ()
#3  0x00000000004007aa in bar_two() ()
#4  0x00000000004007b5 in foo_two() ()
#5  0x00000000004007c8 in thread_two(void*) ()
#6  0x00000000004140f0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#7  0x0000000000445879 in clone ()
#8  0x0000000000000000 in ?? ()

(gdb) x/512a $rsp-2000
0x7f4cacc42360: 0x0 0x0
0x7f4cacc42370: 0x0 0x0
0x7f4cacc42380: 0x0 0x0
0x7f4cacc42390: 0x0 0x0
[...]
0x7f4cacc42830: 0x0 0x0
0x7f4cacc42840: 0x0 0x0
0x7f4cacc42850: 0x0 0x0
0x7f4cacc42860: 0x7f4cacc42870 0×4005af <_Z6work_8v+9>
0×7f4cacc42870: 0×7f4cacc42880 0×4005ba <_Z6work_7v+9>
0×7f4cacc42880: 0×7f4cacc42890 0×4005c5 <_Z6work_6v+9>
0×7f4cacc42890: 0×7f4cacc428a0 0×4005d0 <_Z6work_5v+9>
0×7f4cacc428a0: 0×7f4cacc428b0 0×4005db <_Z6work_4v+9>
0×7f4cacc428b0: 0×7f4cacc428c0 0×4005e6 <_Z6work_3v+9>
0×7f4cacc428c0: 0×7f4cacc428d0 0×4005f1 <_Z6work_2v+9>
0×7f4cacc428d0: 0×7f4cacc428e0 0×4005fc <_Z6work_1v+9>
0×7f4cacc428e0: 0×7f4cacc42cf0 0×40060e <_Z4workv+16>
0×7f4cacc428f0: 0×0 0×0
0×7f4cacc42900: 0×0 0×0
0×7f4cacc42910: 0×0 0×0
[…]
0×7f4cacc42af0: 0×0 0×0
0×7f4cacc42b00: 0×0 0×0
0×7f4cacc42b10: 0×0 0×0
0×7f4cacc42b20: 0×0 0×4431e6 <nanosleep+38>
0×7f4cacc42b30: 0×0 0×4430c0 <sleep+224>
0×7f4cacc42b40: 0×0 0×0
0×7f4cacc42b50: 0×0 0×0
0×7f4cacc42b60: 0×0 0×0
0×7f4cacc42b70: 0×0 0×0
[…]
0×7f4cacc42cb0: 0×0 0×0
0×7f4cacc42cc0: 0×0 0×0
0×7f4cacc42cd0: 0×0 0×0
0×7f4cacc42ce0: 0xfffffed2 0×3ad3affa
0×7f4cacc42cf0: 0×7f4cacc42d00 0×0
0×7f4cacc42d00: 0×7f4cacc42d20 0×49c740 <default_attr>
0×7f4cacc42d10: 0×7f4cacc439c0 0×400771 <_Z6procNEv+19>
0×7f4cacc42d20: 0×7f4cacc42d30 0×4007aa <_Z7bar_twov+9>
0×7f4cacc42d30: 0×7f4cacc42d40 0×4007b5 <_Z7foo_twov+9>
0×7f4cacc42d40: 0×7f4cacc42d60 0×4007c8 <_Z10thread_twoPv+17>
0×7f4cacc42d50: 0×0 0×0
0×7f4cacc42d60: 0×0 0×4140f0 <start_thread+208>
0×7f4cacc42d70: 0×0 0×7f4cacc43700
0×7f4cacc42d80: 0×0 0×0
0×7f4cacc42d90: 0×0 0×0
[…]

However, supposed return addresses need to be checked for Coincidental Symbolic Information.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 2, Linux)

Saturday, December 19th, 2015

This is a Linux variant of Dynamic Memory Corruption (process heap) pattern previously described for Mac OS X and Windows platforms.

The corruption may be internal for heap structures with subsequent memory access violation:

(gdb) bt
#0  0×000000000041482e in _int_malloc ()
#1  0×0000000000416d88 in malloc ()
#2  0×00000000004005dc in proc ()
#3  0×00000000004006ee in bar_three ()
#4  0×00000000004006fe in foo_three ()
#5  0×0000000000400716 in thread_three ()
#6  0×0000000000401760 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#7  0×0000000000432609 in clone ()
#8  0×0000000000000000 in ?? ()

(gdb) x/i $rip
=> 0x41482e <_int_malloc+622>: mov    %rbx,0×10(%r12)

(gdb) x $r12+0x10
0x21687371: Cannot access memory at address 0x21687371

(gdb) p (char[4])0x21687371
$1 = "qsh!"

Or it may be detected with a diagnostic message (similar to double free):

(gdb) bt
#0  0×000000000043ef65 in raise ()
#1  0×0000000000409fc0 in abort ()
#2  0×000000000040bf5b in __libc_message ()
#3  0×0000000000412042 in malloc_printerr ()

#4  0×0000000000416c27 in free ()
#5  0×0000000000400586 in proc ()
#6  0×000000000040067e in bar_four ()
#7  0×000000000040068e in foo_four ()
#8  0×00000000004006a6 in thread_four ()
#9  0×00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#10 0×0000000000432589 in clone ()
#11 0×0000000000000000 in ?? ()

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 235, Linux)

Friday, December 18th, 2015

We first introduced Critical Region pattern in Accelerated Mac OS X Core Dump Analysis training but didn’t submit the pattern itself to the catalog at that time.

A critical region is usually a region of code protected by synchronization objects such as critical sections and mutexes. However, Critical Region analysis pattern is about identifying code regions “sandwiched” between contending function calls (which may or may not involve synchronization objects and corresponding synchronization calls such as identified in Contention patterns), and then identifying any possible shared data referenced by such code regions:

(gdb) thread apply all bt

Thread 6 (Thread 0x7f2665377700 (LWP 17000)):
#0  0x00000000004151a1 in _int_malloc ()
#1  0x0000000000416cf8 in malloc ()
#2  0x00000000004005a4 in proc ()
#3  0x0000000000400604 in bar_two ()
#4  0x0000000000400614 in foo_two ()
#5  0x000000000040062c in thread_two ()
#6  0x00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#7  0x0000000000432589 in clone ()
#8  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f2664b76700 (LWP 17001)):
#0  __lll_unlock_wake_private ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:343
#1  0×000000000041886d in _L_unlock_9670 ()
#2  0×0000000000416d22 in malloc ()
#3  0×00000000004005a4 in proc ()

#4  0×0000000000400641 in bar_three ()
#5  0×0000000000400651 in foo_three ()
#6  0×0000000000400669 in thread_three ()
#7  0×00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#8  0×0000000000432589 in clone ()
#9  0×0000000000000000 in ?? ()

Thread 4 (Thread 0x7f2665b78700 (LWP 16999)):
#0  __lll_lock_wait_private ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0×0000000000418836 in _L_lock_9558 ()
#2  0×0000000000416c1c in free ()
#3  0×0000000000400586 in proc ()

#4  0×00000000004005c7 in bar_one ()
#5  0×00000000004005d7 in foo_one ()
#6  0×00000000004005ef in thread_one ()
#7  0×00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#8  0×0000000000432589 in clone ()
#9  0×0000000000000000 in ?? ()

Thread 3 (Thread 0x1ab1860 (LWP 16998)):
#0  0x000000000042fed1 in nanosleep ()
#1  0x000000000042fda0 in sleep ()
#2  0x000000000040078a in main ()

Thread 2 (Thread 0x7f2663b74700 (LWP 17003)):
#0  __lll_lock_wait_private ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x0000000000418836 in _L_lock_9558 ()
#2  0x0000000000416c1c in free ()
#3  0x0000000000400586 in proc ()
#4  0x00000000004006bb in bar_five ()
#5  0x00000000004006cb in foo_five ()
#6  0x00000000004006e3 in thread_five ()
#7  0x00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#8  0x0000000000432589 in clone ()
#9  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f2664375700 (LWP 17002)):
#0  0x000000000043ef65 in raise ()
#1  0x0000000000409fc0 in abort ()
#2  0x000000000040bf5b in __libc_message ()
#3  0x0000000000412042 in malloc_printerr ()
#4  0x0000000000416c27 in free ()
#5  0x0000000000400586 in proc ()
#6  0x000000000040067e in bar_four ()
#7  0x000000000040068e in foo_four ()
#8  0x00000000004006a6 in thread_four ()
#9  0x00000000004016c0 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#10 0x0000000000432589 in clone ()
#11 0x0000000000000000 in ?? ()

From threads #4 and #5 we can identify one such a region with a shared buffer 0×6b8fc0 which may further point to heap entries.

(gdb) disassemble proc
Dump of assembler code for function proc:
0x00000000004004f0 <+0>: push   %rbp
0x00000000004004f1 <+1>: mov    %rsp,%rbp
0x00000000004004f4 <+4>: push   %rbx
0x00000000004004f5 <+5>: sub    $0x18,%rsp
0x00000000004004f9 <+9>: callq  0x40ac70 <rand>
0x00000000004004fe <+14>: mov    %eax,%ecx
0x0000000000400500 <+16>: mov    $0x68db8bad,%edx
0x0000000000400505 <+21>: mov    %ecx,%eax
0x0000000000400507 <+23>: imul   %edx
0x0000000000400509 <+25>: sar    $0xc,%edx
0x000000000040050c <+28>: mov    %ecx,%eax
0x000000000040050e <+30>: sar    $0x1f,%eax
0x0000000000400511 <+33>: mov    %edx,%ebx
0x0000000000400513 <+35>: sub    %eax,%ebx
0x0000000000400515 <+37>: mov    %ebx,%eax
0x0000000000400517 <+39>: mov    %eax,-0x14(%rbp)
0x000000000040051a <+42>: mov    -0x14(%rbp),%eax
0x000000000040051d <+45>: imul   $0x2710,%eax,%eax
0x0000000000400523 <+51>: mov    %ecx,%edx
0x0000000000400525 <+53>: sub    %eax,%edx
0x0000000000400527 <+55>: mov    %edx,%eax
0x0000000000400529 <+57>: mov    %eax,-0x14(%rbp)
0x000000000040052c <+60>: callq  0x40ac70 <rand>
0x0000000000400531 <+65>: mov    %eax,%ecx
0x0000000000400533 <+67>: mov    $0x68db8bad,%edx
0x0000000000400538 <+72>: mov    %ecx,%eax
0x000000000040053a <+74>: imul   %edx
0x000000000040053c <+76>: sar    $0xc,%edx
0x000000000040053f <+79>: mov    %ecx,%eax
0x0000000000400541 <+81>: sar    $0x1f,%eax
0x0000000000400544 <+84>: mov    %edx,%ebx
0x0000000000400546 <+86>: sub    %eax,%ebx
0x0000000000400548 <+88>: mov    %ebx,%eax
0x000000000040054a <+90>: mov    %eax,-0x18(%rbp)
0x000000000040054d <+93>: mov    -0x18(%rbp),%eax
0x0000000000400550 <+96>: imul   $0x2710,%eax,%eax
0x0000000000400556 <+102>: mov    %ecx,%edx
0x0000000000400558 <+104>: sub    %eax,%edx
0x000000000040055a <+106>: mov    %edx,%eax
0x000000000040055c <+108>: mov    %eax,-0x18(%rbp)
0x000000000040055f <+111>: mov    -0x14(%rbp),%eax
0x0000000000400562 <+114>: cltq
0x0000000000400564 <+116>: mov    0x6b8fc0(,%rax,8),%rax
0x000000000040056c <+124>: test   %rax,%rax
0x000000000040056f <+127>: je     0x400597 <proc+167>
0x0000000000400571 <+129>: mov    -0x14(%rbp),%eax
0x0000000000400574 <+132>: cltq
0x0000000000400576 <+134>: mov    0x6b8fc0(,%rax,8),%rax
0x000000000040057e <+142>: mov    %rax,%rdi
0x0000000000400581 <+145>: callq  0x416bc0 <free>
0×0000000000400586 <+150>: mov    -0×14(%rbp),%eax
0×0000000000400589 <+153>: cltq
0×000000000040058b <+155>: movq   $0×0,0×6b8fc0(,%rax,8)
0×0000000000400597 <+167>: mov    -0×18(%rbp),%eax
0×000000000040059a <+170>: cltq
0×000000000040059c <+172>: mov    %rax,%rdi

0×000000000040059f <+175>: callq  0×416c90 <malloc>
0×00000000004005a4 <+180>: mov    %rax,%rdx
0×00000000004005a7 <+183>: mov    -0×14(%rbp),%eax
0×00000000004005aa <+186>: cltq
0×00000000004005ac <+188>: mov    %rdx,0×6b8fc0(,%rax,8)
0×00000000004005b4 <+196>: jmpq   0×4004f9 <proc+9>
End of assembler dump.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 6b, Linux)

Friday, December 18th, 2015

This is a Linux variant of NULL Pointer (data) pattern previously described for Mac OS X and Windows platforms:

(gdb) bt
#0  0×0000000000400500 in procA ()
#1  0×000000000040057a in bar_two ()
#2  0×000000000040058a in foo_two ()
#3  0×00000000004005a2 in thread_two ()
#4  0×0000000000401630 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#5  0×00000000004324e9 in clone ()
#6  0×0000000000000000 in ?? ()

(gdb) x/i 0x400500
=> 0x400500 <procA+16>: movl   $0x1,(%rax)

(gdb) info r $rax
rax            0×0 0

(gdb) x $rax
0×0: Cannot access memory at address 0×0

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 25, Linux)

Friday, December 18th, 2015

This is a Linux variant of Stack Trace pattern previously described for Mac OS X and Windows platforms. Here we show a stack trace when debug symbols are not available (stripped executable) and also how to apply debug symbols from the executable where they were preserved:

(gdb) bt
#0 0x000000000043e4f1 in nanosleep ()
#1 0x000000000043e3c0 in sleep ()
#2 0x0000000000400789 in main ()

(gdb) symbol-file ./App/App.debug
Reading symbols from /home/Apps/App/App.debug...done.

(gdb) bt
#0 0x000000000043e4f1 in nanosleep ()
#1 0x000000000043e3c0 in sleep ()
#2 0x0000000000400789 in main (argc=1, argv=0x7fff5d1572d8) at main.cpp:85

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 6a, Linux)

Friday, December 18th, 2015

This is a Linux variant of NULL Pointer (code) pattern previously described for Mac OS X and Windows platforms:

(gdb) bt
#0  0×0000000000000000 in ?? ()
#1  0×0000000000400531 in procB ()
#2  0×00000000004005f8 in bar_four ()
#3  0×0000000000400608 in foo_four ()
#4  0×0000000000400620 in thread_four ()
#5  0×0000000000401630 in start_thread (arg=<optimized out>)
at pthread_create.c:304
#6  0×00000000004324e9 in clone ()
#7  0×0000000000000000 in ?? ()

(gdb) disassemble procB
Dump of assembler code for function procB:
0x0000000000400516 <+0>: push   %rbp
0x0000000000400517 <+1>: mov    %rsp,%rbp
0x000000000040051a <+4>: sub    $0x10,%rsp
0x000000000040051e <+8>: movq   $0x0,-0x8(%rbp)
0x0000000000400526 <+16>: mov    -0x8(%rbp),%rdx
0x000000000040052a <+20>: mov    $0x0,%eax
0x000000000040052f <+25>: callq  *%rdx
0×0000000000400531 <+27>: leaveq
0×0000000000400532 <+28>: retq
End of assembler dump.

(gdb) info r rdx
rdx            0×0 0

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 78a, Linux)

Tuesday, December 15th, 2015

This is a Linux variant of Divide by Zero (user mode) pattern previously described for Mac OS X and Windows platforms:

GNU gdb (GDB)
[...]
Program terminated with signal 8, Arithmetic exception.
#0 0×000000000040056f in procD ()

(gdb) x/i $rip
=> 0x40056f <procD+18>: idivl -0×8(%rbp)

(gdb) info r $rax
rax 0x1 1

(gdb) x/w $rbp-0x8
0x7f0f6806bd28: 0×00000000

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 4, Linux)

Tuesday, December 15th, 2015

This is a Linux variant of Lateral Damage pattern previously described for Windows platforms. It also covers memory dumps where some usual commands may not work and we have to find a workaround to simulate their output, for example, by using other commands:

(gdb) info threads
Cannot find new threads: generic error

(gdb) thread apply all bt
Cannot find new threads: generic error

(gdb) thread 2
[Switching to thread 2 (LWP 12567)]
#0 0x000000000042ff51 in nanosleep ()

(gdb) thread 3
[Switching to thread 3 (LWP 12566)]
#0 0x000000000041482e in _int_malloc ()

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 187, Linux)

Monday, December 14th, 2015

Here we publish a Linux variant of Active Thread pattern that was previously introduced for Mac OS X and Windows. Basically it is a thread that is not waiting, sleeping, or suspended (most threads are). However, from a memory dump it is not possible to find out whether it was Spiking Thread at the dump generation time (unless we have a set of memory snapshots and in each one we have the same or similar back trace) and we don’t have any Paratext with CPU consumption stats for threads. For example, in one core dump we have this thread:

(gdb) info threads
Id Target Id Frame
6 Thread 0×7f560d467700 (LWP 3483) 0×00000000004324a9 in clone ()
5 Thread 0×7f560c465700 (LWP 3485) 0×000000000042fe31 in nanosleep ()
4 Thread 0×7f560bc64700 (LWP 3486) 0×000000000042fe31 in nanosleep ()
3 Thread 0×7f560b463700 (LWP 3487) 0×000000000042fe31 in nanosleep ()
2 Thread 0×18b9860 (LWP 3482) 0×000000000042fe31 in nanosleep ()
1 Thread 0×7f560cc66700 (LWP 3484) 0×000000000042fe31 in nanosleep ()

Thread #6 is not waiting so we inspect its back trace:

(gdb) thread 6
[Switching to thread 6 (Thread 0x7f560d467700 (LWP 3483))]
#0 0x00000000004324a9 in clone ()

(gdb) bt
#0 0×00000000004324a9 in clone ()
#1 0×0000000000401560 in ?? () at pthread_create.c:217
#2 0×00007f560d467700 in ?? ()
#3 0×0000000000000000 in ?? ()

(gdb) x/i 0x4324a9
=> 0x4324a9 : test %rax,%rax

Perhaps the core dump was saved at the thread creation time.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 14, Linux)

Monday, December 14th, 2015

This is a variant of Spiking Thread pattern previously described for Mac OS X and Windows platforms:

(gdb) info threads
Id Target Id Frame
6 LWP 3712 0x00000000004329d1 in nanosleep ()
5 LWP 3717 0×00000000004007a3 in isnan ()
4 LWP 3716 0×00000000004329d1 in nanosleep ()
3 LWP 3715 0×00000000004329d1 in nanosleep ()
2 LWP 3714 0×00000000004329d1 in nanosleep ()
* 1 LWP 3713 0×00000000004329d1 in nanosleep ()

We notice a non-waiting thread and switch to it:

(gdb) thread 5
[Switching to thread 5 (LWP 3717)]
#0 0x00000000004007a3 in isnan ()

(gdb) bt
#0 0×00000000004007a3 in isnan ()
#1 0×0000000000400743 in sqrt ()
#2 0×0000000000400528 in procB ()
#3 0×0000000000400639 in bar_five ()
#4 0×0000000000400649 in foo_five ()
#5 0×0000000000400661 in thread_five ()
#6 0×0000000000403e30 in start_thread ()
#7 0×0000000000435089 in clone ()
#8 0×0000000000000000 in ?? ()

If we disassemble the return address for procB function to come back from sqrt call we see an infinite loop:

(gdb) disassemble 0x400528
Dump of assembler code for function procB:
0x0000000000400500 <+0>: push   %rbp
0x0000000000400501 <+1>: mov    %rsp,%rbp
0x0000000000400504 <+4>: sub    $0x20,%rsp
0x0000000000400508 <+8>: movabs $0x3fd5555555555555,%rax
0x0000000000400512 <+18>: mov    %rax,-0x8(%rbp)
0×0000000000400516 <+22>: mov    -0×8(%rbp),%rax
0×000000000040051a <+26>: mov    %rax,-0×18(%rbp)
0×000000000040051e <+30>: movsd  -0×18(%rbp),%xmm0
0×0000000000400523 <+35>: callq  0×400710 <sqrt>
0×0000000000400528 <+40>: movsd  %xmm0,-0×18(%rbp)
0×000000000040052d <+45>: mov    -0×18(%rbp),%rax
0×0000000000400531 <+49>: mov    %rax,-0×8(%rbp)
0×0000000000400535 <+53>: jmp    0×400516 <procB+22>
End of assembler dump.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -