Trace Analysis Patterns (Part 6, Java)

August 19th, 2013

While working on Mobile Software Diagnostics webinar and exploring the richness of Android platform and similarity of its LogCat traces (Eclipse representation) to Minimal Trace Graphs (see a corresponding slide in trace analysis pattern reference) we recognized the time has come to provide a Java implementation for a general software narratological pattern-oriented trace analysis approach and also see what memory analysis patterns can also be applied there. Here by implementation we mean concrete platform examples. For example, in the past we did similar implementations of memory analysis patterns (originally developed for unmanaged and native Windows code) for .NET (WinDbg) and Mac OS X (GDB). The first trace analysis pattern we cover here is very obvious and simple and is called Exception Stack Trace. In the original pattern example a stack trace was inside a single trace message but can also be split such as each frame has its own message (date and time columns were removed for clarity):

E/AndroidRuntime(31416): java.lang.NullPointerException
E/AndroidRuntime(31416):  at android.view.MotionEvent.writeToParcel(MotionEvent.java:1596)
E/AndroidRuntime(31416):  at com.example.nullpointer.FullscreenActivity$1.onTouch(FullscreenActivity.java:139)
E/AndroidRuntime(31416):  at android.view.View.dispatchTouchEvent(View.java:3881)
E/AndroidRuntime(31416):  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:869)
E/AndroidRuntime(31416):  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:869)
E/AndroidRuntime(31416):  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:869)
E/AndroidRuntime(31416):  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:869)
E/AndroidRuntime(31416):  at android.view.ViewGroup.dispatchTouchEvent(ViewGroup.java:869)
E/AndroidRuntime(31416):  at com.android.internal.policy.impl.PhoneWindow$DecorView. superDispatchTouchEvent(PhoneWindow.java:1750)
E/AndroidRuntime(31416):  at com.android.internal.policy.impl.PhoneWindow. superDispatchTouchEvent(PhoneWindow.java:1135)
E/AndroidRuntime(31416):  at android.app.Activity.dispatchTouchEvent(Activity.java:2096)
E/AndroidRuntime(31416):  at com.android.internal.policy.impl.PhoneWindow$DecorView. dispatchTouchEvent(PhoneWindow.java:1734)
E/AndroidRuntime(31416):  at android.view.ViewRoot.deliverPointerEvent(ViewRoot.java:2216)
E/AndroidRuntime(31416):  at android.view.ViewRoot.handleMessage(ViewRoot.java:1887)
E/AndroidRuntime(31416):  at android.os.Handler.dispatchMessage(Handler.java:99)
E/AndroidRuntime(31416):  at android.os.Looper.loop(Looper.java:130)
E/AndroidRuntime(31416):  at android.app.ActivityThread.main(ActivityThread.java:3687)
E/AndroidRuntime(31416):  at java.lang.reflect.Method.invokeNative(Native Method)
E/AndroidRuntime(31416):  at java.lang.reflect.Method.invoke(Method.java:507)
E/AndroidRuntime(31416):  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:867)
E/AndroidRuntime(31416):  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:625)
E/AndroidRuntime(31416):  at dalvik.system.NativeStart.main(Native Method)

Since many memory analysis patterns are based on stack traces here we also have similarity with the following patterns names as Managed Code Exception and Managed Stack Trace. We hope that in the subsequent pattern implementation examples we show more complex patterns of abnormal software behaviour and may also discover any missing ones.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 200)

August 16th, 2013

If you have found module related patterns in a complete memory dump and suspect a particular module it may be worth looking at Module Product Process if it exists especially if this module (component, DLL) has product information or some related hint (lmv or !lmi commands). In complex environments such modules may be loaded not only by hooking mechanisms but also as plugins. If you are not sure if there is any such process the best ways is to get module collection and find a process module that has the same vendor as the module in question. Then such process should also be analysed for anomalies.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 74)

July 19th, 2013

Most of the time when we look at software trace fragments we recognize certain Motifs* such as client-server interaction, publisher-subscriber notifications, database queries, plugin sequence initialization, etc. This pattern is different from Master Trace which corresponds to a normal use-case or working software scenario and may actually contain several Motifs as it is usually happens in complex software environments. On the other side of the spectrum there are software narremes (basic narrative units) and Macrofunctions (single semantic units). Motifs help to further bridge the great divide between software construction and software diagnostics with software narremes corresponding to implementation patterns, macrofunctions to design patterns, and motifs to architectural patterns although an overlap between these categories is possible.

* The idea of a pattern name comes from motives in mathematics.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 73)

June 12th, 2013

Dialogue is an important pattern especially in network trace analysis. It usually involves a message source, a different message target (although both can be the same) and some alternation between them as shown on this abstract trace diagram:

Message and source are not only IP addresses or port numbers. They can be window handles, for example. Sometimes, the roles of source and target play different Process ID and Thread ID combinations (Client ID, CID). In such cases some parts of a message text may signify reply and response as shown graphically:

The similar illustration can be done for multi-computer trace, for example, when several traces from different servers are combined into one, where a combination of CID and a computer ID (Co) or just Co can play the roles of source and target.

Note that on all illustrations above the 3rd request doesn’t have a reply message: a possible Incomplete History pattern.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 85b)

May 31st, 2013

This is a kernel space counterpart of Ubiquitous Component pattern. Such a component especially when it is Top Module can be a sign of Wait Chain(s) and Blocking Module and if it is present in the same process names - a sign of Distributed Wait Chain.

0: kd> !stacks 0 ModuleA
Proc.Thread  .Thread  Ticks   ThreadState Blocker

                            [fffffa800e673b30 svchost.exe]
 534.006240  fffffa801388f5f0 fffd41d9 Blocked    ModuleA+0x12468

                            [fffffa800e705b30 svchost.exe]
 630.000e14  fffffa800edacb50 fffdcf7a Blocked    ModuleA+0x12468
 630.000f04  fffffa8012c2fb50 fffdcf49 Blocked    ModuleA+0x12468
 630.006610  fffffa80134f5b50 fffdcf46 Blocked    ModuleA+0x12468
 630.001cfc  fffffa800f55a2d0 fffdcf44 Blocked    ModuleA+0x12468
 630.003db8  fffffa80121f1540 fffdcf43 Blocked    ModuleA+0x12468
 630.000b9c  fffffa80133d1780 fffdcf3c Blocked    ModuleA+0x12468
 630.0041c4  fffffa8013c77b50 fffdcf43 Blocked    ModuleA+0x12468
 630.00641c  fffffa8012476b50 fffdcf43 Blocked    ModuleA+0x12468
 630.006424  fffffa8013207b50 fffdcf40 Blocked    ModuleA+0x12468
 630.002fcc  fffffa80128f9060 fffdcf3e Blocked    ModuleA+0x12468
 630.003de8  fffffa80139edb50 fffdcf3d Blocked    ModuleA+0x12468
 630.0062c4  fffffa800f5ff2d0 fffdcf3c Blocked    ModuleA+0x12468
 630.0065e8  fffffa80139dcb50 fffdcf3b Blocked    ModuleA+0x12468
 630.004524  fffffa8011e51b50 fffdcf3a Blocked    ModuleA+0x12468
 630.004570  fffffa801346b060 fffdcf39 Blocked    ModuleA+0x12468
 630.00173c  fffffa8010b99b50 fffdcf39 Blocked    ModuleA+0x12468

                            [fffffa800f63db30 iexplore.exe]
24c4.0024c8  fffffa800fe854e0 fffcb6cf Blocked    ModuleA+0x12468

                            [fffffa8010b9ab30 explorer.exe]
2b64.0043d0  fffffa8012e8ab00 fffd9095 Blocked    ModuleA+0x12468

                            [fffffa800fe55060 explorer.exe]
2c80.002e58  fffffa8012e75060 fffba7af Blocked    ModuleA+0x12468

                            [fffffa8010c54b30 iexplore.exe]
2e3c.002e98  fffffa8010c75620 fffcbb7f Blocked    ModuleA+0x12468

                            [fffffa80111c3720 iexplore.exe]
32d8.003230  fffffa80111b1b00 fffd41d9 Blocked    ModuleA+0x12468

                            [fffffa80110cb690 iexplore.exe]
2e74.002854  fffffa8011121b00 fffbe8a4 Blocked    ModuleA+0x12468

                            [fffffa801146cb30 OUTLOOK.EXE]
35cc.0035e8  fffffa8013831b00 fffaf33a Blocked    ModuleA+0x12468

                            [fffffa80105a5640 OUTLOOK.EXE]
3858.00385c  fffffa801133ab00 fffd3691 Blocked    ModuleA+0x12468

                            [fffffa8011998060 explorer.exe]
3d70.004a0c  fffffa80139ddb00 fffd0482 Blocked    ModuleA+0x12468

                            [fffffa8010ff5850 OUTLOOK.EXE]
3540.000458  fffffa8011052b00 fffbd007 Blocked    ModuleA+0x12468

                            [fffffa8011d3d060 OUTLOOK.EXE]
49f8.0049fc  fffffa8011c78060 fffdbbf9 Blocked    ModuleA+0x12468

                            [fffffa801241b060 OUTLOOK.EXE]
4888.005af0  fffffa8012e8eab0 fffae442 Blocked    ModuleA+0x12468
4888.003d24  fffffa800eca7b00 fffae443 Blocked    ModuleA+0x12468

                            [fffffa8012687b30 explorer.exe]
5048.0051fc  fffffa801129cb00 fffca8bf Blocked    ModuleA+0x12468

                            [fffffa8011c1e060 OUTLOOK.EXE]
52c4.00117c  fffffa80130f8710 fffaa157 Blocked    ModuleA+0x12468
52c4.0045fc  fffffa801374f060 fffaa15e Blocked    ModuleA+0x12468

                            [fffffa8011c42b30 explorer.exe]
5898.0001ec  fffffa80137a1b00 fffd8da0 Blocked    ModuleA+0x12468

                            [fffffa8012e04b30 OUTLOOK.EXE]
5a74.004954  fffffa8012e05060 fffa9ff8 Blocked    ModuleA+0x12468

                            [fffffa8010908b30 spoolsv.exe]
2724.004190  fffffa8011ea1060 fffdcafb Blocked    ModuleA+0x12468

                            [fffffa801206eb30 WerFault.exe]
3e50.005424  fffffa8013c5eb00 fffdcf39 Blocked    ModuleA+0x12468

                            [fffffa800f8cf2a0 WerFault.exe]
 9f4.00570c  fffffa8013c8ab00 fffdca9f Blocked    ModuleA+0x12468

                            [fffffa8013af1060 WerFault.exe]
3c74.002b80  fffffa8013c5c060 fffd9dc8 Blocked    ModuleA+0x12468

                            [fffffa800f8053a0 WINWORD.EXE]
3dd0.0066a8  fffffa800ce618c0 fffd7c02 Blocked    ModuleA+0x12468

                            [fffffa8010b66b30 WINWORD.EXE]
62a4.001934  fffffa801368c430 fffd7ce7 Blocked    ModuleA+0x12468

                            [fffffa80141dc060 WerFault.exe]
17d0.0052e4  fffffa801347a060 fffd57b8 Blocked    ModuleA+0x12468

                            [fffffa8012629760 WerFault.exe]
621c.005b64  fffffa8011e395d0 fffc8dc2 Blocked    ModuleA+0x12468

                            [fffffa80131a75d0 explorer.exe]
4884.002b34  fffffa8013dc3b00 fffd67bc Blocked    ModuleA+0x12468

[...]

Threads Processed: 5948

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 199)

May 28th, 2013

Processes with one thread like Notepad are rare. Such a process is always suspicious especially if it is a service or belongs to a complex product. We call such a pattern One-Thread Process. Usually this happens when all other threads terminated and the remaining thread is blocked in some wait chain. For example, this process has a thread which is blocked in an ALPC request to itself (the same process):

0: kd> !process fffffa8013ed9b30 ff
PROCESS fffffa8013ed9b30
    SessionId: 0  Cid: 44b4    Peb: 7fffffd8000  ParentCid: 0114
    DirBase: 2da448000  ObjectTable: fffff8a01948c670  HandleCount: 660.
    Image: ServiceA.exe
    VadRoot fffffa801356dd10 Vads 398 Clone 0 Private 5795. Modified 204253. Locked 0.
    DeviceMap fffff8a000008340
    Token                             fffff8a01b546060
    ElapsedTime                       01:32:37.622
    UserTime                          00:00:01.421
    KernelTime                        00:00:01.578
    QuotaPoolUsage[PagedPool]         0
    QuotaPoolUsage[NonPagedPool]      0
    Working Set Sizes (now,min,max)  (1525, 50, 345) (6100KB, 200KB, 1380KB)
    PeakWorkingSetSize                7607
    VirtualSize                       178 Mb
    PeakVirtualSize                   182 Mb
    PageFaultCount                    752709
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      8043

        THREAD fffffa8012caab50  Cid 44b4.4f70  Teb: 000007fffff5a000 Win32Thread: 0000000000000000 WAIT: (WrLpcReply) KernelMode Non-Alertable
            fffffa8012caaf18  Semaphore Limit 0x1
        Waiting for reply to ALPC Message fffff8a0194d4780 : queued at port fffffa8012911c80 : owned by process fffffa8013ed9b30
        IRP List:
            fffffa8013923300: (0006,0118) Flags: 00060000  Mdl: 00000000
        Not impersonating
        DeviceMap                 fffff8a000008340
        Owning Process            fffffa8013ed9b30       Image:         ServiceA.exe
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      139828         Ticks: 347372 (0:01:30:27.687)
        Context Switch Count      7380            
        UserTime                  00:00:00.031
        KernelTime                00:00:04.890
        Win32 Start Address ServiceA (0×00000001401156e0)
        Stack Init fffff88014c9ddb0 Current fffff88014c9c6b0
        Base fffff88014c9e000 Limit fffff88014c98000 Call 0
        Priority 8 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           Call Site
        fffff880`14c9c6f0 fffff800`01873652 nt!KiSwapContext+0×7a
        fffff880`14c9c830 fffff800`01884a9f nt!KiCommitThreadWait+0×1d2
        fffff880`14c9c8c0 fffff800`0189f04f nt!KeWaitForSingleObject+0×19f
        fffff880`14c9c960 fffff800`01b919f6 nt!AlpcpSignalAndWait+0×8f
        fffff880`14c9ca10 fffff800`01b910f0 nt!AlpcpReceiveSynchronousReply+0×46
        fffff880`14c9ca70 fffff800`01b9519d nt!AlpcpProcessSynchronousRequest+0×33d
        fffff880`14c9cbb0 fffff800`01b95276 nt!LpcpRequestWaitReplyPort+0×9c
        fffff880`14c9cc10 fffff800`0187ced3 nt!NtRequestWaitReplyPort+0×76
        fffff880`14c9cc60 fffff800`01879490 nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`14c9cc60)
        fffff880`14c9cdf8 fffff880`05c31050 nt!KiServiceLinkage
        fffff880`14c9ce70 fffff880`045ce005 ModuleA+0×12468
        […]
        fffff880`14c9da10 fffff800`01b9d3b6 nt!IopXxxControlFile+0×607
        fffff880`14c9db40 fffff800`0187ced3 nt!NtDeviceIoControlFile+0×56
        fffff880`14c9dbb0 00000000`76d8138a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`14c9dc20)
        00000000`082af028 000007fe`fd366cf6 ntdll!NtDeviceIoControlFile+0xa
        00000000`082af030 00000000`76c2683f KERNELBASE!TlsGetValue+0×1a36
        00000000`082af0a0 00000001`4019d38c kernel32!DeviceIoControlImplementation+0×7f
        […]

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Crash Dump Analysis Patterns (Part 198)

May 26th, 2013

All previous wait chain patterns were about single wait chains. However, it is often a case when there are many different wait chains in a memory dump especially in terminal services environments. There can be ALPC and critical section wait chains at the same time. The can be related or completely disjoint. Here we call a special case of several wait chains having the same structure (and possibly pointing to one direction) Distributed Wait Chain. One such example we put below. In a stack trace collection from a complete memory dump from a hanging system we found several explorer.exe processes with critical section wait chains having the same structure and endpoint of a top and blocking ModuleA:

THREAD fffffa80137cf060  Cid 4884.4f9c  Teb: 000007fffffaa000 Win32Thread: fffff900c0fb98b0 WAIT: (UserRequest) UserMode Non-Alertable
    fffffa8013570dc0  SynchronizationEvent
Not impersonating
DeviceMap                 fffff8a014e21d90
Owning Process            fffffa80131a75d0       Image:         explorer.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      274752         Ticks: 212448 (0:00:55:19.500)
Context Switch Count      9889                 LargeStack
UserTime                  00:00:00.093
KernelTime                00:00:00.171
Win32 Start Address SHLWAPI!WrapperThreadProc (0×000007fefdafc608)
Stack Init fffff88013c25db0 Current fffff88013c25900
Base fffff88013c26000 Limit fffff88013c1b000 Call 0
Priority 11 BasePriority 9 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Kernel stack not resident.
Child-SP          RetAddr           Call Site
fffff880`13c25940 fffff800`01873652 nt!KiSwapContext+0×7a
fffff880`13c25a80 fffff800`01884a9f nt!KiCommitThreadWait+0×1d2
fffff880`13c25b10 fffff800`01b7768e nt!KeWaitForSingleObject+0×19f
fffff880`13c25bb0 fffff800`0187ced3 nt!NtWaitForSingleObject+0xde
fffff880`13c25c20 00000000`76d8135a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`13c25c20)
00000000`0489e518 00000000`76d7e4e8 ntdll!ZwWaitForSingleObject+0xa
00000000`0489e520 00000000`76d7e3db ntdll!RtlpWaitOnCriticalSection+0xe8
00000000`0489e5d0 000007fe`fdf8ff50 ntdll!RtlEnterCriticalSection+0xd1
00000000`0489e600 000007fe`fdf8fbd3 SHELL32!CFSFolder::GetIconOf+0×24b
00000000`0489f3a0 000007fe`fdf903d3 SHELL32!SHGetIconIndexFromPIDL+0×3f
00000000`0489f3d0 00000000`ff900328 SHELL32!SHMapIDListToSystemImageListIndexAsync+0×73
00000000`0489f470 00000000`ff8fff4b Explorer!SFTBarHost::AddImageForItem+0×9c
00000000`0489f4d0 00000000`ff8fd2f1 Explorer!SFTBarHost::_InternalRepopulateList+0×4ad
00000000`0489f5d0 00000000`ff8fd0b4 Explorer!SFTBarHost::_RepopulateList+0×1f3
00000000`0489f600 00000000`ff8fcccd Explorer!SFTBarHost::_OnBackgroundEnumDone+0xc1
00000000`0489f630 00000000`ff8fc9e2 Explorer!SFTBarHost::_WndProc+0×451
00000000`0489f680 00000000`76669bd1 Explorer!SFTBarHost::_WndProc_ProgramsMFU+0×1b
00000000`0489f6b0 00000000`766698da USER32!UserCallWinProcCheckWow+0×1ad
00000000`0489f770 00000000`ff8f1177 USER32!DispatchMessageWorker+0×3b5
00000000`0489f7f0 00000000`ff9130e9 Explorer!CTray::_MessageLoop+0×446
00000000`0489f880 000007fe`fdafc71e Explorer!CTray::MainThreadProc+0×8a
00000000`0489f8b0 00000000`76c2652d SHLWAPI!WrapperThreadProc+0×19b
00000000`0489f9b0 00000000`76d5c521 kernel32!BaseThreadInitThunk+0xd
00000000`0489f9e0 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

0: kd> .process /r /p fffffa80131a75d0
Implicit process is now fffffa80`131a75d0
Loading User Symbols

0: kd> !cs -l -o -s
-----------------------------------------
DebugInfo          = 0x0000000000499d90
Critical section   = 0x000007fefe3d5900 (SHELL32!g_csIconCache+0x0)
LOCKED
LockCount          = 0×2
WaiterWoken        = No
OwningThread       = 0×0000000000002b34
RecursionCount     = 0×1
LockSemaphore      = 0×7F8
SpinCount          = 0×0000000000000000
OwningThread       = .thread fffffa8013dc3b00

THREAD fffffa8013dc3b00  Cid 4884.2b34  Teb: 000007fffffac000 Win32Thread: fffff900c2bc1010 WAIT: (Executive) KernelMode Non-Alertable
    fffff88011c03600  SynchronizationEvent
IRP List:
    fffffa800f8fc790: (0006,0430) Flags: 00000404  Mdl: 00000000
Not impersonating
DeviceMap                 fffff8a014e21d90
Owning Process            fffffa80131a75d0       Image:         explorer.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      170052         Ticks: 317148 (0:01:22:35.437)
Context Switch Count      2                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address SHELL32!ShutdownThreadProc (0x000007fefe13ef54)
Stack Init fffff88011c03db0 Current fffff88011c03320
Base fffff88011c04000 Limit fffff88011bfd000 Call 0
Priority 11 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5
Child-SP          RetAddr           Call Site
fffff880`11c03360 fffff800`01873652 nt!KiSwapContext+0x7a
fffff880`11c034a0 fffff800`01884a9f nt!KiCommitThreadWait+0x1d2
fffff880`11c03530 fffff880`05c12383 nt!KeWaitForSingleObject+0x19f
fffff880`11c035d0 fffff880`012b9288 ModuleA+0×12468
fffff880`11c03750 fffff880`012b7d1b fltmgr!FltpPerformPostCallbacks+0×368
fffff880`11c03820 fffff880`012b66df fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0×39b
fffff880`11c038b0 fffff880`01b895ff fltmgr!FltpDispatch+0xcf
fffff880`11c03a30 fffff800`01b783b4 nt!IopCloseFile+0×11f
fffff880`11c03ac0 fffff800`01b78171 nt!ObpDecrementHandleCount+0xb4
fffff880`11c03b40 fffff800`01b78734 nt!ObpCloseHandleTableEntry+0xb1
fffff880`11c03bd0 fffff800`0187ced3 nt!ObpCloseHandle+0×94
fffff880`11c03c20 00000000`76d8140a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`11c03c20)
00000000`0754f348 000007fe`fd341873 ntdll!NtClose+0xa
00000000`0754f350 00000000`76c32f51 KERNELBASE!CloseHandle+0×13
00000000`0754f380 000007fe`fdaf9690 kernel32!CloseHandleImplementation+0×3d
00000000`0754f490 000007fe`fe191d7f SHLWAPI!CFileStream::Release+0×84
00000000`0754f4c0 000007fe`fe13ed57 SHELL32!IconCacheSave+0×2b7
00000000`0754f780 000007fe`fe13f0c6 SHELL32!CommonRestart+0×2f
00000000`0754f7f0 00000000`76c2652d SHELL32!ShutdownThreadProc+0×172
00000000`0754f820 00000000`76d5c521 kernel32!BaseThreadInitThunk+0xd
00000000`0754f850 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 72)

May 21st, 2013

Sometimes we look at a trace or log and instead of Error Messages we only see their “signs” such as a DLL load event for an error or fault reporting module or a module that is related to symbol files such diasymreader.dll. We call this pattern Hidden Error by an analogy to Hidden Exception in memory dump analysis pattern catalogue although sometimes we can see such modules in memory dump Module Collection. For example, the presence of diasymreader module may signify an unreported .NET exception and suggest a dump collection strategy.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 71)

May 12th, 2013

Sometimes we need to analyse Last Activity before Significant Event or Discontinuity. By this pattern we mean a loose semantic collection of messages before process exit, for example. This might give some clues to further troubleshooting. In one incident a process was suddenly exiting. Its own detailed trace didn’t have any messages explaining that probably due to insufficient tracing coverage (Sparse Trace). Fortunately, a different external trace (from Process Monitor) was collected (Inter-Correlation) and it had LDAP network communication messages just before thread and process exit events.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -   

Memory Dump Analysis Best Practices (Part 3)

May 8th, 2013

Another best practice that is directly related to productivity is a parallel processing of the same memory dump especially in the case of complete memory dumps. Here an analysis might start with running time consuming scripts that dump all process and threads in the variety of formats such as x64 and x86 thread stack traces. However, if the nature of the problem is such that it is possible to start with some pattern and continue unfolding its analysis then we can do that in parallel. One of examples may be a discovered Incomplete Session with an ALPC Wait Chain. Here we can follow such a wait chain while another WinDbg instance dumps all threads for further pattern search later.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org

Memory Dump Analysis Best Practices (Part 2)

May 7th, 2013

We continue with best practices, the previous was SCP. The second best practice is to check the system for additional patterns after the main pattern was found (similar to avoiding Common Mistake 8). For example, in the case of a bug check resulted from NULL pointer dereference or any other exception in some 3rd-party driver code don’t stop but look at all CPUs, processes and threads to find any other patterns such as Spiking Threads, Busy System, and Contention. Inspection of associated thread stack traces might reveal the same module and/or give additional clues to system behaviour prior to the fault.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 70)

May 4th, 2013

For event- or message-driven architectures it is important to differentiate between event and state messages (including state transition). For example, a system may be doing some work while being in some particular state with a lot of tracing and respond to various external events with each of them having a corresponding trace message. Upon such an event the system transitions to some other state with its own set of possible trace messages. We call such a pattern State and Event. Typical example here is a windowing terminal services system and WM_ENDSESSION event illustrated in the following abstract trace diagram with a corresponding state transition diagram below it:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 69)

April 25th, 2013

Sometimes in the case of missing Basic Facts we can discern Indirect Facts from message text and even from other patterns. For example, in one incident we were interested in all messages from the certain process name but its PID was missing from the problem description. Fortunately we were able to get its PID from one of individual messages from completely different source:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Fool (Debugging Slang, Part 38)

March 31st, 2013

Fool - The long version of a foo program used in C and C++ tutorials.

Examples: What a foolish program!

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Generalized Software Narrative and Trace

March 25th, 2013

In the past we viewed software traces and logs as temporarily ordered event sequences. Since events are just memory data we have a map

T  -> M

as can be seen in the definition of a software trace. Here we generalize the domain to any arbitrary set, for example, it can be a list of indexes or pointers or even memory itself. The latter map can give us narrative chains such as

M -> M -> M -> M

and even give us a grand unification of memory and log analysis and the possibility to apply software narratology to memory dump analysis as well. We talk about it soon and provide some generalized software narrative examples.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 68)

March 23rd, 2013

While preparing a presentation on malware narratives we found that one essential pattern is missing from the current log analysis pattern catalog. Most of the time we see some abnormal or unexpected value in a software trace or log such as a network address outside expected range and this triggers further investigation. The message structure may be the same having the same Message Invariant but variable part may contain such values as depicted graphically:

We call this pattern Abnormal Value and plan to add the similar one to memory dump analysis pattern catalog. Please not that we also have Significant Event pattern that is more general and also covers messages without variable part or just suspicious log entries.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

More 10 Common Mistakes in Memory Analysis (Part 1)

March 22nd, 2013

Because we finished the list of 10 common mistakes some time ago we continue with “more” series. A year ago we discovered the need to pay attention to differences between 32-bit and 64-bit versions of critical section structures and the need for explicit symbol qualification in x86 mode to avoid mistakes. This post was in draft since then and we now publish it. Suppose we see the address of a critical section on a 32-bit stack trace:

0:000:x86> kv
ChildEBP RetAddr  Args to Child
0044f40c 774e8dd4 00000a94 00000000 00000000 ntdll_774b0000!ZwWaitForSingleObject+0x15
0044f470 774e8cb8 00000000 00000000 041f4b78 ntdll_774b0000!RtlpWaitOnCriticalSection+0x13e
0044f498 0123f70c 010d97c0 8c62ec9c 010cc5fc ntdll_774b0000!RtlEnterCriticalSection+0×150

0:000:x86> dt _RTL_CRITICAL_SECTION 010d97c0
ntdll!_RTL_CRITICAL_SECTION
+0x000 DebugInfo        : 0x00862680 _RTL_CRITICAL_SECTION_DEBUG
+0x008 LockCount        : 0n1
+0x00c RecursionCount   : 0n103356
+0×010 OwningThread     : 0×00000a94 Void
+0×018 LockSemaphore    : 0×0817d72d Void
+0×020 SpinCount        : 0×6130910c`010d9840

Its owner thread has a94 TID but we don’t see it in the thread list:

0:000:x86> ~
.  0  Id: 19508.17944 Suspend: 0 Teb: 7efdb000 Unfrozen
1  Id: 19508.1922c Suspend: 0 Teb: 7efd8000 Unfrozen
2  Id: 19508.195d4 Suspend: 0 Teb: 7efd5000 Unfrozen
3  Id: 19508.19a80 Suspend: 0 Teb: 7efa7000 Unfrozen
4  Id: 19508.19544 Suspend: 0 Teb: 7efa4000 Unfrozen
5  Id: 19508.1925c Suspend: 0 Teb: 7efa1000 Unfrozen
6  Id: 19508.193d4 Suspend: 0 Teb: 7ef9d000 Unfrozen
7  Id: 19508.19b18 Suspend: 0 Teb: 7ef9a000 Unfrozen
8  Id: 19508.19bfc Suspend: 0 Teb: 7ef97000 Unfrozen
9  Id: 19508.19bc4 Suspend: 0 Teb: 7ef94000 Unfrozen
10  Id: 19508.19a90 Suspend: 0 Teb: 7ef91000 Unfrozen
11  Id: 19508.189c0 Suspend: 0 Teb: 7ef8d000 Unfrozen
12  Id: 19508.193bc Suspend: 0 Teb: 7ef8a000 Unfrozen
13  Id: 19508.18f3c Suspend: 0 Teb: 7ef87000 Unfrozen
14  Id: 19508.18834 Suspend: 0 Teb: 7ef84000 Unfrozen
15  Id: 19508.19aec Suspend: 0 Teb: 7ef81000 Unfrozen
16  Id: 19508.180f4 Suspend: 0 Teb: 7ef7d000 Unfrozen
17  Id: 19508.19a3c Suspend: 0 Teb: 7ef7a000 Unfrozen
18  Id: 19508.1916c Suspend: 0 Teb: 7ef77000 Unfrozen
19  Id: 19508.19324 Suspend: 0 Teb: 7ef74000 Unfrozen
20  Id: 19508.19a78 Suspend: 0 Teb: 7ef71000 Unfrozen
21  Id: 19508.19ad4 Suspend: 0 Teb: 7ef6d000 Unfrozen
22  Id: 19508.19834 Suspend: 0 Teb: 7ef6a000 Unfrozen
23  Id: 19508.19754 Suspend: 0 Teb: 7ef67000 Unfrozen
24  Id: 19508.19aa0 Suspend: 0 Teb: 7ef64000 Unfrozen
25  Id: 19508.19bd0 Suspend: 0 Teb: 7ef61000 Unfrozen
26  Id: 19508.19384 Suspend: 0 Teb: 7ef5d000 Unfrozen
27  Id: 19508.1734c Suspend: 0 Teb: 7ef5a000 Unfrozen
28  Id: 19508.19148 Suspend: 0 Teb: 7ef57000 Unfrozen
29  Id: 19508.19b74 Suspend: 0 Teb: 7ef54000 Unfrozen
30  Id: 19508.18290 Suspend: 0 Teb: 7ef51000 Unfrozen
31  Id: 19508.19a4c Suspend: 0 Teb: 7ef4d000 Unfrozen
32  Id: 19508.19bc0 Suspend: 0 Teb: 7ef4a000 Unfrozen
33  Id: 19508.18bf0 Suspend: 0 Teb: 7ef47000 Unfrozen
34  Id: 19508.1895c Suspend: 0 Teb: 7ef44000 Unfrozen
35  Id: 19508.19314 Suspend: 0 Teb: 7ef41000 Unfrozen
36  Id: 19508.19934 Suspend: 0 Teb: 7ef3a000 Unfrozen
37  Id: 19508.197b0 Suspend: 0 Teb: 7ef31000 Unfrozen
38  Id: 19508.1962c Suspend: 0 Teb: 7ef2d000 Unfrozen
39  Id: 19508.191e0 Suspend: 0 Teb: 7ef2a000 Unfrozen
40  Id: 19508.19438 Suspend: 0 Teb: 7ef27000 Unfrozen
41  Id: 19508.197e8 Suspend: 0 Teb: 7ef24000 Unfrozen
42  Id: 19508.18c38 Suspend: 0 Teb: 7ef21000 Unfrozen
43  Id: 19508.197b4 Suspend: 0 Teb: 7ef1d000 Unfrozen
44  Id: 19508.1978c Suspend: 0 Teb: 7ef1a000 Unfrozen
45  Id: 19508.19b84 Suspend: 0 Teb: 7ef17000 Unfrozen
46  Id: 19508.197a8 Suspend: 0 Teb: 7ef14000 Unfrozen
47  Id: 19508.19660 Suspend: 0 Teb: 7ef3d000 Unfrozen
48  Id: 19508.18574 Suspend: 0 Teb: 7efad000 Unfrozen
49  Id: 19508.17a04 Suspend: 0 Teb: 7efaa000 Unfrozen

We see a correct result if we specify a different structure:

0:000:x86> dt CRITICAL_SECTION 010d97c0
ModuleA!CRITICAL_SECTION
+0x000 DebugInfo        : 0x00862680 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount        : 0n-6
+0x008 RecursionCount   : 0n1
+0×00c OwningThread     : 0×000193bc Void
+0×010 LockSemaphore    : 0×00000a94 Void
+0×014 SpinCount        : 0

This is because the structure definition is from a 32-bit module:

0:000:x86> dt ModuleA!CRITICAL_SECTION
ModuleA!CRITICAL_SECTION
+0x000 DebugInfo        : Ptr32 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount        : Int4B
+0x008 RecursionCount   : Int4B
+0×00c OwningThread     : Ptr32 Void
+0×010 LockSemaphore    : Ptr32 Void
+0×014 SpinCount        : Uint4B

However, the structure we used first is from 64-bit module and has a different offset and size for OwningThread field:

0:000:x86> dt ntdll!_RTL_CRITICAL_SECTION
+0x000 DebugInfo        : Ptr64 _RTL_CRITICAL_SECTION_DEBUG
+0x008 LockCount        : Int4B
+0x00c RecursionCount   : Int4B
+0×010 OwningThread     : Ptr64 Void
+0×018 LockSemaphore    : Ptr64 Void
+0×020 SpinCount        : Uint8B

Because a different 32-bit ntdll module is also loaded we can use it for explicit symbol qualification:

0:000:x86> dt ntdll_774b0000!_RTL_CRITICAL_SECTION
+0×000 DebugInfo        : Ptr32 _RTL_CRITICAL_SECTION_DEBUG
+0×004 LockCount        : Int4B
+0×008 RecursionCount   : Int4B
+0×00c OwningThread     : Ptr32 Void
+0×010 LockSemaphore    : Ptr32 Void
+0×014 SpinCount        : Uint4B

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 67)

March 20th, 2013

Sometimes we have insufficiently detailed problem description and/or there are several similar parallel user activities going on simultaneously, for example, several session are launched in a terminal services environment. In such cases when tracing is done for the duration of specific user activity Dominant Event Sequence pattern may help. Here we select the most full sequence of events or event sequence based on some Basic Facts. For example, if session id was missing in problem description we can choose the most long and full process launch seqence and assume that its session id was the one missing:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Happy St. Patrick’s Screen

March 17th, 2013


- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 66)

March 16th, 2013

Suppose we form an adjoint thread based on some message or operation type or some other attribute:

But we do not know where to start to look backwards for any anomalies relevant to our problem:

We go back to our full trace and find a problem message:

Although it is not in our adjoint thread we formed previously it is still considered as Pivot Message helping us to go backwards there:

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -