Archive for May, 2013

Crash Dump Analysis Patterns (Part 85b)

Friday, May 31st, 2013

This is a kernel space counterpart of Ubiquitous Component pattern. Such a component especially when it is Top Module can be a sign of Wait Chain(s) and Blocking Module and if it is present in the same process names - a sign of Distributed Wait Chain.

0: kd> !stacks 0 ModuleA
Proc.Thread  .Thread  Ticks   ThreadState Blocker

                            [fffffa800e673b30 svchost.exe]
 534.006240  fffffa801388f5f0 fffd41d9 Blocked    ModuleA+0x12468

                            [fffffa800e705b30 svchost.exe]
 630.000e14  fffffa800edacb50 fffdcf7a Blocked    ModuleA+0x12468
 630.000f04  fffffa8012c2fb50 fffdcf49 Blocked    ModuleA+0x12468
 630.006610  fffffa80134f5b50 fffdcf46 Blocked    ModuleA+0x12468
 630.001cfc  fffffa800f55a2d0 fffdcf44 Blocked    ModuleA+0x12468
 630.003db8  fffffa80121f1540 fffdcf43 Blocked    ModuleA+0x12468
 630.000b9c  fffffa80133d1780 fffdcf3c Blocked    ModuleA+0x12468
 630.0041c4  fffffa8013c77b50 fffdcf43 Blocked    ModuleA+0x12468
 630.00641c  fffffa8012476b50 fffdcf43 Blocked    ModuleA+0x12468
 630.006424  fffffa8013207b50 fffdcf40 Blocked    ModuleA+0x12468
 630.002fcc  fffffa80128f9060 fffdcf3e Blocked    ModuleA+0x12468
 630.003de8  fffffa80139edb50 fffdcf3d Blocked    ModuleA+0x12468
 630.0062c4  fffffa800f5ff2d0 fffdcf3c Blocked    ModuleA+0x12468
 630.0065e8  fffffa80139dcb50 fffdcf3b Blocked    ModuleA+0x12468
 630.004524  fffffa8011e51b50 fffdcf3a Blocked    ModuleA+0x12468
 630.004570  fffffa801346b060 fffdcf39 Blocked    ModuleA+0x12468
 630.00173c  fffffa8010b99b50 fffdcf39 Blocked    ModuleA+0x12468

                            [fffffa800f63db30 iexplore.exe]
24c4.0024c8  fffffa800fe854e0 fffcb6cf Blocked    ModuleA+0x12468

                            [fffffa8010b9ab30 explorer.exe]
2b64.0043d0  fffffa8012e8ab00 fffd9095 Blocked    ModuleA+0x12468

                            [fffffa800fe55060 explorer.exe]
2c80.002e58  fffffa8012e75060 fffba7af Blocked    ModuleA+0x12468

                            [fffffa8010c54b30 iexplore.exe]
2e3c.002e98  fffffa8010c75620 fffcbb7f Blocked    ModuleA+0x12468

                            [fffffa80111c3720 iexplore.exe]
32d8.003230  fffffa80111b1b00 fffd41d9 Blocked    ModuleA+0x12468

                            [fffffa80110cb690 iexplore.exe]
2e74.002854  fffffa8011121b00 fffbe8a4 Blocked    ModuleA+0x12468

                            [fffffa801146cb30 OUTLOOK.EXE]
35cc.0035e8  fffffa8013831b00 fffaf33a Blocked    ModuleA+0x12468

                            [fffffa80105a5640 OUTLOOK.EXE]
3858.00385c  fffffa801133ab00 fffd3691 Blocked    ModuleA+0x12468

                            [fffffa8011998060 explorer.exe]
3d70.004a0c  fffffa80139ddb00 fffd0482 Blocked    ModuleA+0x12468

                            [fffffa8010ff5850 OUTLOOK.EXE]
3540.000458  fffffa8011052b00 fffbd007 Blocked    ModuleA+0x12468

                            [fffffa8011d3d060 OUTLOOK.EXE]
49f8.0049fc  fffffa8011c78060 fffdbbf9 Blocked    ModuleA+0x12468

                            [fffffa801241b060 OUTLOOK.EXE]
4888.005af0  fffffa8012e8eab0 fffae442 Blocked    ModuleA+0x12468
4888.003d24  fffffa800eca7b00 fffae443 Blocked    ModuleA+0x12468

                            [fffffa8012687b30 explorer.exe]
5048.0051fc  fffffa801129cb00 fffca8bf Blocked    ModuleA+0x12468

                            [fffffa8011c1e060 OUTLOOK.EXE]
52c4.00117c  fffffa80130f8710 fffaa157 Blocked    ModuleA+0x12468
52c4.0045fc  fffffa801374f060 fffaa15e Blocked    ModuleA+0x12468

                            [fffffa8011c42b30 explorer.exe]
5898.0001ec  fffffa80137a1b00 fffd8da0 Blocked    ModuleA+0x12468

                            [fffffa8012e04b30 OUTLOOK.EXE]
5a74.004954  fffffa8012e05060 fffa9ff8 Blocked    ModuleA+0x12468

                            [fffffa8010908b30 spoolsv.exe]
2724.004190  fffffa8011ea1060 fffdcafb Blocked    ModuleA+0x12468

                            [fffffa801206eb30 WerFault.exe]
3e50.005424  fffffa8013c5eb00 fffdcf39 Blocked    ModuleA+0x12468

                            [fffffa800f8cf2a0 WerFault.exe]
 9f4.00570c  fffffa8013c8ab00 fffdca9f Blocked    ModuleA+0x12468

                            [fffffa8013af1060 WerFault.exe]
3c74.002b80  fffffa8013c5c060 fffd9dc8 Blocked    ModuleA+0x12468

                            [fffffa800f8053a0 WINWORD.EXE]
3dd0.0066a8  fffffa800ce618c0 fffd7c02 Blocked    ModuleA+0x12468

                            [fffffa8010b66b30 WINWORD.EXE]
62a4.001934  fffffa801368c430 fffd7ce7 Blocked    ModuleA+0x12468

                            [fffffa80141dc060 WerFault.exe]
17d0.0052e4  fffffa801347a060 fffd57b8 Blocked    ModuleA+0x12468

                            [fffffa8012629760 WerFault.exe]
621c.005b64  fffffa8011e395d0 fffc8dc2 Blocked    ModuleA+0x12468

                            [fffffa80131a75d0 explorer.exe]
4884.002b34  fffffa8013dc3b00 fffd67bc Blocked    ModuleA+0x12468


Threads Processed: 5948

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 199)

Tuesday, May 28th, 2013

Processes with one thread like Notepad are rare. Such a process is always suspicious especially if it is a service or belongs to a complex product. We call such a pattern One-Thread Process. Usually this happens when all other threads terminated and the remaining thread is blocked in some wait chain. For example, this process has a thread which is blocked in an ALPC request to itself (the same process):

0: kd> !process fffffa8013ed9b30 ff
PROCESS fffffa8013ed9b30
    SessionId: 0  Cid: 44b4    Peb: 7fffffd8000  ParentCid: 0114
    DirBase: 2da448000  ObjectTable: fffff8a01948c670  HandleCount: 660.
    Image: ServiceA.exe
    VadRoot fffffa801356dd10 Vads 398 Clone 0 Private 5795. Modified 204253. Locked 0.
    DeviceMap fffff8a000008340
    Token                             fffff8a01b546060
    ElapsedTime                       01:32:37.622
    UserTime                          00:00:01.421
    KernelTime                        00:00:01.578
    QuotaPoolUsage[PagedPool]         0
    QuotaPoolUsage[NonPagedPool]      0
    Working Set Sizes (now,min,max)  (1525, 50, 345) (6100KB, 200KB, 1380KB)
    PeakWorkingSetSize                7607
    VirtualSize                       178 Mb
    PeakVirtualSize                   182 Mb
    PageFaultCount                    752709
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      8043

        THREAD fffffa8012caab50  Cid 44b4.4f70  Teb: 000007fffff5a000 Win32Thread: 0000000000000000 WAIT: (WrLpcReply) KernelMode Non-Alertable
            fffffa8012caaf18  Semaphore Limit 0x1
        Waiting for reply to ALPC Message fffff8a0194d4780 : queued at port fffffa8012911c80 : owned by process fffffa8013ed9b30
        IRP List:
            fffffa8013923300: (0006,0118) Flags: 00060000  Mdl: 00000000
        Not impersonating
        DeviceMap                 fffff8a000008340
        Owning Process            fffffa8013ed9b30       Image:         ServiceA.exe
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      139828         Ticks: 347372 (0:01:30:27.687)
        Context Switch Count      7380            
        UserTime                  00:00:00.031
        KernelTime                00:00:04.890
        Win32 Start Address ServiceA (0×00000001401156e0)
        Stack Init fffff88014c9ddb0 Current fffff88014c9c6b0
        Base fffff88014c9e000 Limit fffff88014c98000 Call 0
        Priority 8 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           Call Site
        fffff880`14c9c6f0 fffff800`01873652 nt!KiSwapContext+0×7a
        fffff880`14c9c830 fffff800`01884a9f nt!KiCommitThreadWait+0×1d2
        fffff880`14c9c8c0 fffff800`0189f04f nt!KeWaitForSingleObject+0×19f
        fffff880`14c9c960 fffff800`01b919f6 nt!AlpcpSignalAndWait+0×8f
        fffff880`14c9ca10 fffff800`01b910f0 nt!AlpcpReceiveSynchronousReply+0×46
        fffff880`14c9ca70 fffff800`01b9519d nt!AlpcpProcessSynchronousRequest+0×33d
        fffff880`14c9cbb0 fffff800`01b95276 nt!LpcpRequestWaitReplyPort+0×9c
        fffff880`14c9cc10 fffff800`0187ced3 nt!NtRequestWaitReplyPort+0×76
        fffff880`14c9cc60 fffff800`01879490 nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`14c9cc60)
        fffff880`14c9cdf8 fffff880`05c31050 nt!KiServiceLinkage
        fffff880`14c9ce70 fffff880`045ce005 ModuleA+0×12468
        fffff880`14c9da10 fffff800`01b9d3b6 nt!IopXxxControlFile+0×607
        fffff880`14c9db40 fffff800`0187ced3 nt!NtDeviceIoControlFile+0×56
        fffff880`14c9dbb0 00000000`76d8138a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`14c9dc20)
        00000000`082af028 000007fe`fd366cf6 ntdll!NtDeviceIoControlFile+0xa
        00000000`082af030 00000000`76c2683f KERNELBASE!TlsGetValue+0×1a36
        00000000`082af0a0 00000001`4019d38c kernel32!DeviceIoControlImplementation+0×7f

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 198)

Sunday, May 26th, 2013

All previous wait chain patterns were about single wait chains. However, it is often a case when there are many different wait chains in a memory dump especially in terminal services environments. There can be ALPC and critical section wait chains at the same time. The can be related or completely disjoint. Here we call a special case of several wait chains having the same structure (and possibly pointing to one direction) Distributed Wait Chain. One such example we put below. In a stack trace collection from a complete memory dump from a hanging system we found several explorer.exe processes with critical section wait chains having the same structure and endpoint of a top and blocking ModuleA:

THREAD fffffa80137cf060  Cid 4884.4f9c  Teb: 000007fffffaa000 Win32Thread: fffff900c0fb98b0 WAIT: (UserRequest) UserMode Non-Alertable
    fffffa8013570dc0  SynchronizationEvent
Not impersonating
DeviceMap                 fffff8a014e21d90
Owning Process            fffffa80131a75d0       Image:         explorer.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      274752         Ticks: 212448 (0:00:55:19.500)
Context Switch Count      9889                 LargeStack
UserTime                  00:00:00.093
KernelTime                00:00:00.171
Win32 Start Address SHLWAPI!WrapperThreadProc (0×000007fefdafc608)
Stack Init fffff88013c25db0 Current fffff88013c25900
Base fffff88013c26000 Limit fffff88013c1b000 Call 0
Priority 11 BasePriority 9 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Kernel stack not resident.
Child-SP          RetAddr           Call Site
fffff880`13c25940 fffff800`01873652 nt!KiSwapContext+0×7a
fffff880`13c25a80 fffff800`01884a9f nt!KiCommitThreadWait+0×1d2
fffff880`13c25b10 fffff800`01b7768e nt!KeWaitForSingleObject+0×19f
fffff880`13c25bb0 fffff800`0187ced3 nt!NtWaitForSingleObject+0xde
fffff880`13c25c20 00000000`76d8135a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`13c25c20)
00000000`0489e518 00000000`76d7e4e8 ntdll!ZwWaitForSingleObject+0xa
00000000`0489e520 00000000`76d7e3db ntdll!RtlpWaitOnCriticalSection+0xe8
00000000`0489e5d0 000007fe`fdf8ff50 ntdll!RtlEnterCriticalSection+0xd1
00000000`0489e600 000007fe`fdf8fbd3 SHELL32!CFSFolder::GetIconOf+0×24b
00000000`0489f3a0 000007fe`fdf903d3 SHELL32!SHGetIconIndexFromPIDL+0×3f
00000000`0489f3d0 00000000`ff900328 SHELL32!SHMapIDListToSystemImageListIndexAsync+0×73
00000000`0489f470 00000000`ff8fff4b Explorer!SFTBarHost::AddImageForItem+0×9c
00000000`0489f4d0 00000000`ff8fd2f1 Explorer!SFTBarHost::_InternalRepopulateList+0×4ad
00000000`0489f5d0 00000000`ff8fd0b4 Explorer!SFTBarHost::_RepopulateList+0×1f3
00000000`0489f600 00000000`ff8fcccd Explorer!SFTBarHost::_OnBackgroundEnumDone+0xc1
00000000`0489f630 00000000`ff8fc9e2 Explorer!SFTBarHost::_WndProc+0×451
00000000`0489f680 00000000`76669bd1 Explorer!SFTBarHost::_WndProc_ProgramsMFU+0×1b
00000000`0489f6b0 00000000`766698da USER32!UserCallWinProcCheckWow+0×1ad
00000000`0489f770 00000000`ff8f1177 USER32!DispatchMessageWorker+0×3b5
00000000`0489f7f0 00000000`ff9130e9 Explorer!CTray::_MessageLoop+0×446
00000000`0489f880 000007fe`fdafc71e Explorer!CTray::MainThreadProc+0×8a
00000000`0489f8b0 00000000`76c2652d SHLWAPI!WrapperThreadProc+0×19b
00000000`0489f9b0 00000000`76d5c521 kernel32!BaseThreadInitThunk+0xd
00000000`0489f9e0 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

0: kd> .process /r /p fffffa80131a75d0
Implicit process is now fffffa80`131a75d0
Loading User Symbols

0: kd> !cs -l -o -s
DebugInfo          = 0x0000000000499d90
Critical section   = 0x000007fefe3d5900 (SHELL32!g_csIconCache+0x0)
LockCount          = 0×2
WaiterWoken        = No
OwningThread       = 0×0000000000002b34
RecursionCount     = 0×1
LockSemaphore      = 0×7F8
SpinCount          = 0×0000000000000000
OwningThread       = .thread fffffa8013dc3b00

THREAD fffffa8013dc3b00  Cid 4884.2b34  Teb: 000007fffffac000 Win32Thread: fffff900c2bc1010 WAIT: (Executive) KernelMode Non-Alertable
    fffff88011c03600  SynchronizationEvent
IRP List:
    fffffa800f8fc790: (0006,0430) Flags: 00000404  Mdl: 00000000
Not impersonating
DeviceMap                 fffff8a014e21d90
Owning Process            fffffa80131a75d0       Image:         explorer.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      170052         Ticks: 317148 (0:01:22:35.437)
Context Switch Count      2                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address SHELL32!ShutdownThreadProc (0x000007fefe13ef54)
Stack Init fffff88011c03db0 Current fffff88011c03320
Base fffff88011c04000 Limit fffff88011bfd000 Call 0
Priority 11 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5
Child-SP          RetAddr           Call Site
fffff880`11c03360 fffff800`01873652 nt!KiSwapContext+0x7a
fffff880`11c034a0 fffff800`01884a9f nt!KiCommitThreadWait+0x1d2
fffff880`11c03530 fffff880`05c12383 nt!KeWaitForSingleObject+0x19f
fffff880`11c035d0 fffff880`012b9288 ModuleA+0×12468
fffff880`11c03750 fffff880`012b7d1b fltmgr!FltpPerformPostCallbacks+0×368
fffff880`11c03820 fffff880`012b66df fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0×39b
fffff880`11c038b0 fffff880`01b895ff fltmgr!FltpDispatch+0xcf
fffff880`11c03a30 fffff800`01b783b4 nt!IopCloseFile+0×11f
fffff880`11c03ac0 fffff800`01b78171 nt!ObpDecrementHandleCount+0xb4
fffff880`11c03b40 fffff800`01b78734 nt!ObpCloseHandleTableEntry+0xb1
fffff880`11c03bd0 fffff800`0187ced3 nt!ObpCloseHandle+0×94
fffff880`11c03c20 00000000`76d8140a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`11c03c20)
00000000`0754f348 000007fe`fd341873 ntdll!NtClose+0xa
00000000`0754f350 00000000`76c32f51 KERNELBASE!CloseHandle+0×13
00000000`0754f380 000007fe`fdaf9690 kernel32!CloseHandleImplementation+0×3d
00000000`0754f490 000007fe`fe191d7f SHLWAPI!CFileStream::Release+0×84
00000000`0754f4c0 000007fe`fe13ed57 SHELL32!IconCacheSave+0×2b7
00000000`0754f780 000007fe`fe13f0c6 SHELL32!CommonRestart+0×2f
00000000`0754f7f0 00000000`76c2652d SHELL32!ShutdownThreadProc+0×172
00000000`0754f820 00000000`76d5c521 kernel32!BaseThreadInitThunk+0xd
00000000`0754f850 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 72)

Tuesday, May 21st, 2013

Sometimes we look at a trace or log and instead of Error Messages we only see their “signs” such as a DLL load event for an error or fault reporting module or a module that is related to symbol files such diasymreader.dll. We call this pattern Hidden Error by an analogy to Hidden Exception in memory dump analysis pattern catalogue although sometimes we can see such modules in memory dump Module Collection. For example, the presence of diasymreader module may signify an unreported .NET exception and suggest a dump collection strategy.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 71)

Sunday, May 12th, 2013

Sometimes we need to analyse Last Activity before Significant Event or Discontinuity. By this pattern we mean a loose semantic collection of messages before process exit, for example. This might give some clues to further troubleshooting. In one incident a process was suddenly exiting. Its own detailed trace didn’t have any messages explaining that probably due to insufficient tracing coverage (Sparse Trace). Fortunately, a different external trace (from Process Monitor) was collected (Inter-Correlation) and it had LDAP network communication messages just before thread and process exit events.

- Dmitry Vostokov @ + -   

Memory Dump Analysis Best Practices (Part 3)

Wednesday, May 8th, 2013

Another best practice that is directly related to productivity is a parallel processing of the same memory dump especially in the case of complete memory dumps. Here an analysis might start with running time consuming scripts that dump all process and threads in the variety of formats such as x64 and x86 thread stack traces. However, if the nature of the problem is such that it is possible to start with some pattern and continue unfolding its analysis then we can do that in parallel. One of examples may be a discovered Incomplete Session with an ALPC Wait Chain. Here we can follow such a wait chain while another WinDbg instance dumps all threads for further pattern search later.

- Dmitry Vostokov @ +

Memory Dump Analysis Best Practices (Part 2)

Tuesday, May 7th, 2013

We continue with best practices, the previous was SCP. The second best practice is to check the system for additional patterns after the main pattern was found (similar to avoiding Common Mistake 8). For example, in the case of a bug check resulted from NULL pointer dereference or any other exception in some 3rd-party driver code don’t stop but look at all CPUs, processes and threads to find any other patterns such as Spiking Threads, Busy System, and Contention. Inspection of associated thread stack traces might reveal the same module and/or give additional clues to system behaviour prior to the fault.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 70)

Saturday, May 4th, 2013

For event- or message-driven architectures it is important to differentiate between event and state messages (including state transition). For example, a system may be doing some work while being in some particular state with a lot of tracing and respond to various external events with each of them having a corresponding trace message. Upon such an event the system transitions to some other state with its own set of possible trace messages. We call such a pattern State and Event. Typical example here is a windowing terminal services system and WM_ENDSESSION event illustrated in the following abstract trace diagram with a corresponding state transition diagram below it:

- Dmitry Vostokov @ + -