Crash Dump Analysis Patterns (Part 14)

2009 (0x7D9) - The Year of Debugging

The next pattern is Spiking Thread. If you have a process dump with many threads from a customer it is sometimes difficult to see which thread there was spiking CPU, that’s why it is always good to have some screenshots or notes from QSlice or Process Explorer showing spiking thread ID and process ID. The latter ID is to make sure that the process dump was from the correct process. New process dumpers and tools from Microsoft (userdump.exe, for example) save thread time information so you can open the dump and see the time spent in kernel and user mode for any thread by entering !runaway command. However if that command shows many threads with similar CPU consumption it will not highlight the particular thread that was spiking at the time the crash dump was saved so screenshots are still useful in some cases.

What to do if you don’t have spiking thread ID? Look at all threads and find those that are not waiting. Almost all threads are waiting most of the time. So the chances to dump the normal process and see some active threads are very low. If the thread is waiting the top function on its stack usually is (for XP/W2K3/Vista):

ntdll!KiFastSystemCallRet

and below it you could see some blocking calls waiting for some synchronization object, Sleep API call, IO completion or for LPC reply:

0:085> ~*kv
...
...
...
64 Id: 1b0.120c Suspend: -1 Teb: 7ff69000 Unfrozen
ChildEBP RetAddr Args to Child
02defe18 7c90e399 ntdll!KiFastSystemCallRet
02defe1c 77e76703 ntdll!NtReplyWaitReceivePortEx+0xc
02deff80 77e76c22 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4
02deff88 77e76a3b rpcrt4!RecvLotsaCallsWrapper+0xd
02deffa8 77e76c0a rpcrt4!BaseCachedThreadRoutine+0×79
02deffb4 7c80b683 rpcrt4!ThreadStartRoutine+0×1a
02deffec 00000000 kernel32!BaseThreadStart+0×37

65 Id: 1b0.740 Suspend: -1 Teb: 7ff67000 Unfrozen
ChildEBP RetAddr Args to Child
02edff44 7c90d85c ntdll!KiFastSystemCallRet
02edff48 7c8023ed ntdll!NtDelayExecution+0xc
02edffa0 57cde2dd kernel32!SleepEx+0×61

02edffb4 7c80b683 component!foo+0×35
02edffec 00000000 kernel32!BaseThreadStart+0×37

66 Id: 1b0.131c Suspend: -1 Teb: 7ff66000 Unfrozen
ChildEBP RetAddr Args to Child
02f4ff38 7c90e9c0 ntdll!KiFastSystemCallRet
02f4ff3c 7c8025cb ntdll!ZwWaitForSingleObject+0xc
02f4ffa0 72001f65 kernel32!WaitForSingleObjectEx+0xa8

02f4ffb4 7c80b683 component!WorkerThread+0×15
02f4ffec 00000000 kernel32!BaseThreadStart+0×37

67 Id: 1b0.1320 Suspend: -1 Teb: 7ff65000 Unfrozen
ChildEBP RetAddr Args to Child
02f8fe1c 7c90e9ab ntdll!KiFastSystemCallRet
02f8fe20 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc
02f8febc 7e4195f9 kernel32!WaitForMultipleObjectsEx+0×12c
02f8ff18 7e4196a8 user32!RealMsgWaitForMultipleObjectsEx+0×13e
02f8ff34 720019f6 user32!MsgWaitForMultipleObjects+0×1f

02f8ffa0 72001a29 component!bar+0xd9
02f8ffb4 7c80b683 component!MonitorWorkerThread+0×11
02f8ffec 00000000 kernel32!BaseThreadStart+0×37

68 Id: 1b0.1340 Suspend: -1 Teb: 7ff63000 Unfrozen
ChildEBP RetAddr Args to Child
0301ff1c 7c90e31b ntdll!KiFastSystemCallRet
0301ff20 7c80a746 ntdll!ZwRemoveIoCompletion+0xc
0301ff4c 57d46e65 kernel32!GetQueuedCompletionStatus+0×29

0301ffb4 7c80b683 component!AsyncEventsThread+0×91
0301ffec 00000000 kernel32!BaseThreadStart+0×37



# 85 Id: 1b0.17b4 Suspend: -1 Teb: 7ffd4000 Unfrozen
ChildEBP RetAddr Args to Child
00daffc8 7c9507a8 ntdll!DbgBreakPoint
00dafff4 00000000 ntdll!DbgUiRemoteBreakin+0×2d

Therefore if you have a different thread like this one below the chances that it was spiking are big:

58 Id: 1b0.9f4 Suspend: -1 Teb: 7ff75000 Unfrozen
ChildEBP RetAddr Args to Child
0280f64c 500af723 componentB!DoSomething+32
0280f85c 500b5391 componentB!CheckSomething+231
0280f884 500b7a3f componentB!ProcessWorkIteme+9f
0301ffec 00000000 kernel32!BaseThreadStart+0x37

There is no KiFastSystemCallRet on top and if we look at the currently executing instruction it indeed does some copy operation:

0:085> ~58r
eax=00000000 ebx=0280fdd4 ecx=0000005f edx=00000000 esi=03d30444 edi=0280f6dc
eip=500a4024 esp=0280f644 ebp=0280f64c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
componentB!DoSomething+32:
500a4024 f3a5 rep movs dword ptr es:[edi],dword ptr [esi] es:0023:0280f6dc=00000409 ds:0023:03d30444=00000409

In a kernel or a complete memory dump you can see spikes by checking KernelTime and UserTime:

0: kd> !thread 88b66768
THREAD 88b66768 Cid 01fc.1550 Teb: 7ffad000 Win32Thread: bc18f240 RUNNING on processor 1
IRP List:
89716008: (0006,0094) Flags: 00000a00 Mdl: 00000000
Impersonation token: e423a030 (Level Impersonation)
DeviceMap e3712480
Owning Process 8a0a56a0 Image: SomeSvc.exe
Wait Start TickCount 1782229 Ticks: 0
Context Switch Count 877610 LargeStack
UserTime 00:00:01.0078
KernelTime 02:23:21.0718

- Dmitry Vostokov -

Announcements

New Books:

DLL List Landscape: The Art from Computer Memory Space

Dumps, Bugs and Debugging Forensics: The Adventures of Dr. Debugalov

WinDbg: A Reference Poster and Learning Cards

Memory Dump Analysis Anthology, Volume 2

Also available:

Memory Dump Analysis Anthology, Volume 1

New Children's Book:

Baby Turing

9 Responses to “Crash Dump Analysis Patterns (Part 14)”

  1. Dmitry Vostokov Says:

    By default !runaway shows only user mode time. By specifying additional flags it is possible to see both kernel and user time:

    0:000> !runaway 3
    User Mode Time
    Thread Time
    8:15a4 0 days 0:12:32.812
    0:1c00 0 days 0:00:00.312
    9:1b50 0 days 0:00:00.296
    22:2698 0 days 0:00:00.046
    17:22b8 0 days 0:00:00.031
    14:2034 0 days 0:00:00.031
    21:21b4 0 days 0:00:00.000
    20:27b0 0 days 0:00:00.000
    19:278c 0 days 0:00:00.000
    18:2788 0 days 0:00:00.000
    16:2194 0 days 0:00:00.000
    15:2064 0 days 0:00:00.000
    13:2014 0 days 0:00:00.000
    12:1e38 0 days 0:00:00.000
    11:1c54 0 days 0:00:00.000
    10:1d40 0 days 0:00:00.000
    7:1994 0 days 0:00:00.000
    6:1740 0 days 0:00:00.000
    5:1c18 0 days 0:00:00.000
    4:c10 0 days 0:00:00.000
    3:1774 0 days 0:00:00.000
    2:1a08 0 days 0:00:00.000
    1:fb8 0 days 0:00:00.000
    Kernel Mode Time
    Thread Time
    9:1b50 0 days 1:21:54.125
    8:15a4 0 days 0:02:48.390
    0:1c00 0 days 0:00:00.328
    14:2034 0 days 0:00:00.234
    22:2698 0 days 0:00:00.156
    17:22b8 0 days 0:00:00.015
    21:21b4 0 days 0:00:00.000
    20:27b0 0 days 0:00:00.000
    19:278c 0 days 0:00:00.000
    18:2788 0 days 0:00:00.000
    16:2194 0 days 0:00:00.000
    15:2064 0 days 0:00:00.000
    13:2014 0 days 0:00:00.000
    12:1e38 0 days 0:00:00.000
    11:1c54 0 days 0:00:00.000
    10:1d40 0 days 0:00:00.000
    7:1994 0 days 0:00:00.000
    6:1740 0 days 0:00:00.000
    5:1c18 0 days 0:00:00.000
    4:c10 0 days 0:00:00.000
    3:1774 0 days 0:00:00.000
    2:1a08 0 days 0:00:00.000
    1:fb8 0 days 0:00:00.000

    We see that thread #15a4 spikes mostly in user mode but thread #1b50 spikes mostly in kernel mode!

  2. Dmitry Vostokov Says:

    In kernel and complete memory dumps we can scan all threads with

    Ticks: 0

    or

    Elapsed Ticks: 0

    to check their kernel and user times:

    PROCESS 8782cd60 SessionId: 52 Cid: 4a58 Peb: 7ffdf000 ParentCid: 1ea4
    DirBase: 0a0260c0 ObjectTable: 88ab33a8 TableSize: 486.
    Image: IEXPLORE.EXE
    VadRoot 87f59ea8 Clone 0 Private 2077. Modified 123. Locked 0.
    DeviceMap 880f6828
    Token e8217cd0
    ElapsedTime 0:03:09.0765
    UserTime 0:00:00.0890
    KernelTime 0:00:10.0171
    QuotaPoolUsage[PagedPool] 100320
    QuotaPoolUsage[NonPagedPool] 58100
    Working Set Sizes (now,min,max) (4944, 50, 345) (19776KB, 200KB, 1380KB)
    PeakWorkingSetSize 4974
    VirtualSize 83 Mb
    PeakVirtualSize 83 Mb
    PageFaultCount 8544
    MemoryPriority FOREGROUND
    BasePriority 8
    CommitCharge 2262

    THREAD 87836580 Cid 4a58.57cc Teb: 7ffde000 Win32Thread: a224f1d8 WAIT: (Executive) KernelMode Non-Alertable
    89dee788 Semaphore Limit 0×7fffffff
    87836668 NotificationTimer
    Not impersonating
    Owning Process 8782cd60
    Wait Start TickCount 123758 Elapsed Ticks: 0
    Context Switch Count 97636 LargeStack
    UserTime 0:00:00.0593
    KernelTime 0:00:08.0359
    Start Address 0×7c57b70c
    Win32 Start Address 0×00401ee6
    Stack Init ac154770 Current ac154320 Base ac155000 Limit ac14d000 Call ac15477c
    Priority 11 BasePriority 8 PriorityDecrement 0 DecrementCount 0

    ChildEBP RetAddr Args to Child
    ac154338 8042d8d7 00000000 8047bd00 00000001 nt!KiSwapThread+0×1b1
    ac154360 80415d61 89dee788 00000000 00000000 nt!KeWaitForSingleObject+0×1a3
    ac15439c 8041547c 00000000 00000001 8051c501 nt!ExpWaitForResource+0×2d
    ac1543b4 8046907a 8047bd00 00000001 805225e9 nt!ExAcquireResourceSharedLite+0xc6
    ac1543c0 805225e9 00000000 00000001 8051c501 nt!CmpLockRegistry+0×18
    ac154430 8051c718 e7c5fd08 ac15448c 00000001 nt!CmSetValueKey+0×31
    ac1544b4 8046b2a9 00000798 00125c04 00000000 nt!NtSetValueKey+0×196
    ac1544b4 77f88de7 00000798 00125c04 00000000 nt!KiSystemService+0xc9
    00125bb0 00000000 00000000 00000000 00000000 +0×77f88de7

  3. Dmitry Vostokov Says:

    For complete and kernel dumps we can also pay attention to the output of !running command and also to the output of !stacks command (Ticks and ThreadState columns)

  4. Crash Dump Analysis » Blog Archive » Crash Dump Analysis Patterns (Part 44) Says:

    […] Spiking Thread pattern includes normal threads running at PASSIVE_LEVEL or APC_LEVEL IRQL that can be preempted by any other higher priority thread. Therefore, spiking threads are not necessarily ones that were in RUNNING state when the memory dump was saved. They consumed much CPU and this is reflected in their User and Kernel time values. The pattern also includes threads running at DISPATCH_LEVEL and higher IRQL. These threads cannot be preempted by another thread so they usually remain in RUNNING state all the time unless they lower their IRQL. Some of them I’ve seen were trying to acquire a spinlock and I decided to create another more specialized pattern and call it Dispatch Level Spin. We would see it when a spinlock for some data structure wasn’t released or was corrupt and some thread tries to acquire it and enters endless spinning loop unless interrupted by higher IRQL interrupt. These infinite loops can also happen due to software defects in code running at dispatch level or higher IRQL. […]

  5. Crash Dump Analysis » Blog Archive » Crash Dump Analysis Patterns (Part 63) Says:

    […] dispatcher effectively starving other threads. If prioritized threads are CPU-bound we also see Spiking Thread pattern. However, if their thread priorities were normal they would have been preempted by other […]

  6. Crash Dump Analysis » Blog Archive » Heap and spike: pattern cooperation Says:

    […] showing how different patterns interact. It was reported that the service was spiking CPU (see Spiking Thread pattern) and other processes couldn’t communicate with it (see Coupled Processes pattern). […]

  7. !analyze -v : Crash Dump Analysis Patterns (Part 14) Says:

    […] Crash Dump Analysis Patterns (Part 14) […]

  8. Crash Dump Analysis » Blog Archive » Truncated dump, spiking thread, not my version and hooked functions: pattern cooperation Says:

    […] Recently I got another manual complete memory dump from a hang workstation where default analysis pointed to a possibility for a Spiking Thread pattern: […]

  9. Crash Dump Analysis » Blog Archive » Crash Dump Analysis AntiPatterns (Part 12) Says:

    […] CPU spikes, application or session hangs where user space analysis is needed. Here is an example of Spiking Thread pattern in a kernel memory dump where analysis cannot be done because threads were spiking mostly […]

Leave a Reply