Crash Dump Analysis Patterns (Part 14)

The next pattern is Spiking Thread. If you have a process dump with many threads from a customer it is sometimes difficult to see which thread there was spiking CPU, that’s why it is always good to have some screenshots or notes from QSlice or Process Explorer showing spiking thread ID and process ID. The latter ID is to make sure that the process dump was from the correct process. New process dumpers and tools from Microsoft (userdump.exe, for example) save thread time information so you can open the dump and see the time spent in kernel and user mode for any thread by entering !runaway command. However if that command shows many threads with similar CPU consumption it will not highlight the particular thread that was spiking at the time the crash dump was saved so screenshots are still useful in some cases.

What to do if you don’t have spiking thread ID? Look at all threads and find those that are not waiting. Almost all threads are waiting most of the time. So the chances to dump the normal process and see some active threads are very low. If the thread is waiting the top function on its stack usually is (for XP/W2K3/Vista):

ntdll!KiFastSystemCallRet

and below it you could see some blocking calls waiting for some synchronization object, Sleep API call, IO completion or for LPC reply:

0:085> ~*kv
...
...
...
64 Id: 1b0.120c Suspend: -1 Teb: 7ff69000 Unfrozen
ChildEBP RetAddr Args to Child
02defe18 7c90e399 ntdll!KiFastSystemCallRet
02defe1c 77e76703 ntdll!NtReplyWaitReceivePortEx+0xc
02deff80 77e76c22 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4
02deff88 77e76a3b rpcrt4!RecvLotsaCallsWrapper+0xd
02deffa8 77e76c0a rpcrt4!BaseCachedThreadRoutine+0×79
02deffb4 7c80b683 rpcrt4!ThreadStartRoutine+0×1a
02deffec 00000000 kernel32!BaseThreadStart+0×37

65 Id: 1b0.740 Suspend: -1 Teb: 7ff67000 Unfrozen
ChildEBP RetAddr Args to Child
02edff44 7c90d85c ntdll!KiFastSystemCallRet
02edff48 7c8023ed ntdll!NtDelayExecution+0xc
02edffa0 57cde2dd kernel32!SleepEx+0×61

02edffb4 7c80b683 component!foo+0×35
02edffec 00000000 kernel32!BaseThreadStart+0×37

66 Id: 1b0.131c Suspend: -1 Teb: 7ff66000 Unfrozen
ChildEBP RetAddr Args to Child
02f4ff38 7c90e9c0 ntdll!KiFastSystemCallRet
02f4ff3c 7c8025cb ntdll!ZwWaitForSingleObject+0xc
02f4ffa0 72001f65 kernel32!WaitForSingleObjectEx+0xa8

02f4ffb4 7c80b683 component!WorkerThread+0×15
02f4ffec 00000000 kernel32!BaseThreadStart+0×37

67 Id: 1b0.1320 Suspend: -1 Teb: 7ff65000 Unfrozen
ChildEBP RetAddr Args to Child
02f8fe1c 7c90e9ab ntdll!KiFastSystemCallRet
02f8fe20 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc
02f8febc 7e4195f9 kernel32!WaitForMultipleObjectsEx+0×12c
02f8ff18 7e4196a8 user32!RealMsgWaitForMultipleObjectsEx+0×13e
02f8ff34 720019f6 user32!MsgWaitForMultipleObjects+0×1f

02f8ffa0 72001a29 component!bar+0xd9
02f8ffb4 7c80b683 component!MonitorWorkerThread+0×11
02f8ffec 00000000 kernel32!BaseThreadStart+0×37

68 Id: 1b0.1340 Suspend: -1 Teb: 7ff63000 Unfrozen
ChildEBP RetAddr Args to Child
0301ff1c 7c90e31b ntdll!KiFastSystemCallRet
0301ff20 7c80a746 ntdll!ZwRemoveIoCompletion+0xc
0301ff4c 57d46e65 kernel32!GetQueuedCompletionStatus+0×29

0301ffb4 7c80b683 component!AsyncEventsThread+0×91
0301ffec 00000000 kernel32!BaseThreadStart+0×37



# 85 Id: 1b0.17b4 Suspend: -1 Teb: 7ffd4000 Unfrozen
ChildEBP RetAddr Args to Child
00daffc8 7c9507a8 ntdll!DbgBreakPoint
00dafff4 00000000 ntdll!DbgUiRemoteBreakin+0×2d

Therefore if you have a different thread like this one below the chances that it was spiking are big:

58 Id: 1b0.9f4 Suspend: -1 Teb: 7ff75000 Unfrozen
ChildEBP RetAddr Args to Child
0280f64c 500af723 componentB!DoSomething+32
0280f85c 500b5391 componentB!CheckSomething+231
0280f884 500b7a3f componentB!ProcessWorkIteme+9f
0301ffec 00000000 kernel32!BaseThreadStart+0x37

There is no KiFastSystemCallRet on top and if we look at the currently executing instruction it indeed does some copy operation:

0:085> ~58r
eax=00000000 ebx=0280fdd4 ecx=0000005f edx=00000000 esi=03d30444 edi=0280f6dc
eip=500a4024 esp=0280f644 ebp=0280f64c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
componentB!DoSomething+32:
500a4024 f3a5 rep movs dword ptr es:[edi],dword ptr [esi] es:0023:0280f6dc=00000409 ds:0023:03d30444=00000409

In a kernel or a complete memory dump you can see spikes by checking KernelTime and UserTime:

0: kd> !thread 88b66768
THREAD 88b66768 Cid 01fc.1550 Teb: 7ffad000 Win32Thread: bc18f240 RUNNING on processor 1
IRP List:
89716008: (0006,0094) Flags: 00000a00 Mdl: 00000000
Impersonation token: e423a030 (Level Impersonation)
DeviceMap e3712480
Owning Process 8a0a56a0 Image: SomeSvc.exe
Wait Start TickCount 1782229 Ticks: 0
Context Switch Count 877610 LargeStack
UserTime 00:00:01.0078
KernelTime 02:23:21.0718

- Dmitry Vostokov @ DumpAnalysis.org -

24 Responses to “Crash Dump Analysis Patterns (Part 14)”

  1. Dmitry Vostokov Says:

    By default !runaway shows only user mode time. By specifying additional flags it is possible to see both kernel and user time:

    0:000> !runaway 3
    User Mode Time
    Thread Time
    8:15a4 0 days 0:12:32.812
    0:1c00 0 days 0:00:00.312
    9:1b50 0 days 0:00:00.296
    22:2698 0 days 0:00:00.046
    17:22b8 0 days 0:00:00.031
    14:2034 0 days 0:00:00.031
    21:21b4 0 days 0:00:00.000
    20:27b0 0 days 0:00:00.000
    19:278c 0 days 0:00:00.000
    18:2788 0 days 0:00:00.000
    16:2194 0 days 0:00:00.000
    15:2064 0 days 0:00:00.000
    13:2014 0 days 0:00:00.000
    12:1e38 0 days 0:00:00.000
    11:1c54 0 days 0:00:00.000
    10:1d40 0 days 0:00:00.000
    7:1994 0 days 0:00:00.000
    6:1740 0 days 0:00:00.000
    5:1c18 0 days 0:00:00.000
    4:c10 0 days 0:00:00.000
    3:1774 0 days 0:00:00.000
    2:1a08 0 days 0:00:00.000
    1:fb8 0 days 0:00:00.000
    Kernel Mode Time
    Thread Time
    9:1b50 0 days 1:21:54.125
    8:15a4 0 days 0:02:48.390
    0:1c00 0 days 0:00:00.328
    14:2034 0 days 0:00:00.234
    22:2698 0 days 0:00:00.156
    17:22b8 0 days 0:00:00.015
    21:21b4 0 days 0:00:00.000
    20:27b0 0 days 0:00:00.000
    19:278c 0 days 0:00:00.000
    18:2788 0 days 0:00:00.000
    16:2194 0 days 0:00:00.000
    15:2064 0 days 0:00:00.000
    13:2014 0 days 0:00:00.000
    12:1e38 0 days 0:00:00.000
    11:1c54 0 days 0:00:00.000
    10:1d40 0 days 0:00:00.000
    7:1994 0 days 0:00:00.000
    6:1740 0 days 0:00:00.000
    5:1c18 0 days 0:00:00.000
    4:c10 0 days 0:00:00.000
    3:1774 0 days 0:00:00.000
    2:1a08 0 days 0:00:00.000
    1:fb8 0 days 0:00:00.000

    We see that thread #15a4 spikes mostly in user mode but thread #1b50 spikes mostly in kernel mode!

  2. Dmitry Vostokov Says:

    In kernel and complete memory dumps we can scan all threads with

    Ticks: 0

    or

    Elapsed Ticks: 0

    to check their kernel and user times:

    PROCESS 8782cd60 SessionId: 52 Cid: 4a58 Peb: 7ffdf000 ParentCid: 1ea4
    DirBase: 0a0260c0 ObjectTable: 88ab33a8 TableSize: 486.
    Image: IEXPLORE.EXE
    VadRoot 87f59ea8 Clone 0 Private 2077. Modified 123. Locked 0.
    DeviceMap 880f6828
    Token e8217cd0
    ElapsedTime 0:03:09.0765
    UserTime 0:00:00.0890
    KernelTime 0:00:10.0171
    QuotaPoolUsage[PagedPool] 100320
    QuotaPoolUsage[NonPagedPool] 58100
    Working Set Sizes (now,min,max) (4944, 50, 345) (19776KB, 200KB, 1380KB)
    PeakWorkingSetSize 4974
    VirtualSize 83 Mb
    PeakVirtualSize 83 Mb
    PageFaultCount 8544
    MemoryPriority FOREGROUND
    BasePriority 8
    CommitCharge 2262

    THREAD 87836580 Cid 4a58.57cc Teb: 7ffde000 Win32Thread: a224f1d8 WAIT: (Executive) KernelMode Non-Alertable
    89dee788 Semaphore Limit 0×7fffffff
    87836668 NotificationTimer
    Not impersonating
    Owning Process 8782cd60
    Wait Start TickCount 123758 Elapsed Ticks: 0
    Context Switch Count 97636 LargeStack
    UserTime 0:00:00.0593
    KernelTime 0:00:08.0359
    Start Address 0×7c57b70c
    Win32 Start Address 0×00401ee6
    Stack Init ac154770 Current ac154320 Base ac155000 Limit ac14d000 Call ac15477c
    Priority 11 BasePriority 8 PriorityDecrement 0 DecrementCount 0

    ChildEBP RetAddr Args to Child
    ac154338 8042d8d7 00000000 8047bd00 00000001 nt!KiSwapThread+0×1b1
    ac154360 80415d61 89dee788 00000000 00000000 nt!KeWaitForSingleObject+0×1a3
    ac15439c 8041547c 00000000 00000001 8051c501 nt!ExpWaitForResource+0×2d
    ac1543b4 8046907a 8047bd00 00000001 805225e9 nt!ExAcquireResourceSharedLite+0xc6
    ac1543c0 805225e9 00000000 00000001 8051c501 nt!CmpLockRegistry+0×18
    ac154430 8051c718 e7c5fd08 ac15448c 00000001 nt!CmSetValueKey+0×31
    ac1544b4 8046b2a9 00000798 00125c04 00000000 nt!NtSetValueKey+0×196
    ac1544b4 77f88de7 00000798 00125c04 00000000 nt!KiSystemService+0xc9
    00125bb0 00000000 00000000 00000000 00000000 +0×77f88de7

  3. Dmitry Vostokov Says:

    For complete and kernel dumps we can also pay attention to the output of !running command and also to the output of !stacks command (Ticks and ThreadState columns)

  4. Crash Dump Analysis » Blog Archive » Crash Dump Analysis Patterns (Part 44) Says:

    […] Spiking Thread pattern includes normal threads running at PASSIVE_LEVEL or APC_LEVEL IRQL that can be preempted by any other higher priority thread. Therefore, spiking threads are not necessarily ones that were in RUNNING state when the memory dump was saved. They consumed much CPU and this is reflected in their User and Kernel time values. The pattern also includes threads running at DISPATCH_LEVEL and higher IRQL. These threads cannot be preempted by another thread so they usually remain in RUNNING state all the time unless they lower their IRQL. Some of them I’ve seen were trying to acquire a spinlock and I decided to create another more specialized pattern and call it Dispatch Level Spin. We would see it when a spinlock for some data structure wasn’t released or was corrupt and some thread tries to acquire it and enters endless spinning loop unless interrupted by higher IRQL interrupt. These infinite loops can also happen due to software defects in code running at dispatch level or higher IRQL. […]

  5. Crash Dump Analysis » Blog Archive » Crash Dump Analysis Patterns (Part 63) Says:

    […] dispatcher effectively starving other threads. If prioritized threads are CPU-bound we also see Spiking Thread pattern. However, if their thread priorities were normal they would have been preempted by other […]

  6. Crash Dump Analysis » Blog Archive » Heap and spike: pattern cooperation Says:

    […] showing how different patterns interact. It was reported that the service was spiking CPU (see Spiking Thread pattern) and other processes couldn’t communicate with it (see Coupled Processes pattern). […]

  7. !analyze -v : Crash Dump Analysis Patterns (Part 14) Says:

    […] Crash Dump Analysis Patterns (Part 14) […]

  8. Crash Dump Analysis » Blog Archive » Truncated dump, spiking thread, not my version and hooked functions: pattern cooperation Says:

    […] Recently I got another manual complete memory dump from a hang workstation where default analysis pointed to a possibility for a Spiking Thread pattern: […]

  9. Crash Dump Analysis » Blog Archive » Crash Dump Analysis AntiPatterns (Part 12) Says:

    […] CPU spikes, application or session hangs where user space analysis is needed. Here is an example of Spiking Thread pattern in a kernel memory dump where analysis cannot be done because threads were spiking mostly […]

  10. Crash Dump Analysis » Blog Archive » Busy system, blocked threads, wait chains and deadlock: pattern cooperation Says:

    […] none of them consumed much CPU the patter of Spiking Thread is ruled out and CPU load can be explained by the number of active user sessions and this appears […]

  11. Crash Dump Analysis » Blog Archive » Comparative Memory Dump Analysis: CPU Spikes Says:

    […] Comparative analysis is based on saving several consecutive memory dump files to see similarities and differences. Most often this technique is used for memory leaks, for example, process heap leaks. Here we see another example related to CPU spikes. […]

  12. Software Generalist » Blog Archive » Reading Notebook: 07-September-09 Says:

    […] and it entered endless loop becoming unresponsive. CPU was spiking 50% (2 processor machine). I forced a crash dump in Task Manager (Spiking Thread pattern, http://www.dumpanalysis.org/blog/index.php/2007/05/11/crash-dump-analysis-patterns-part-14/): […]

  13. Crash Dump Analysis » Blog Archive » Crash Dump Analysis Patterns (Part 94a) Says:

    […] (a super pattern), be it a handle table size, a heap size, a  number of contended locks, time spent in kernel, and so on. Every system or process property has its average and mean values and large deviations […]

  14. Crash Dump Analysis » Blog Archive » Icons for Memory Dump Analysis Patterns (Part 29) Says:

    […] we introduce an icon for Spiking Thread […]

  15. Crash Dump Analysis » Blog Archive » Succession of Patterns (Part 2) Says:

    […] where Wait Chains (executive resources) and Swarm of Shared Locks were probably resulted from a Spiking Thread. We have these resource […]

  16. Crash Dump Analysis » Blog Archive » Insufficient kernel pool memory, spiking thread and data contents locality: pattern cooperation Says:

    […] We also check CPU consumption and see two spiking threads: […]

  17. Crash Dump Analysis » Blog Archive » Crash Dump Analysis Patterns (Part 106) Says:

    […] level when we detect it using Task Manager, for example. Sometimes that process has only one spiking thread among many but there are cases when CPU consumption is spread among many threads. I call this […]

  18. Crash Dump Analysis » Blog Archive » Case Study: Extremely Inconsitent Dump and CPU Spike Says:

    […] The both “running” threads were showing signs of a spiking thread: […]

  19. Dmitry Vostokov Says:

    Is some legacy dumps with missing !runaway information we can guess spiking threads if they do some sort of a peek message loop:

    0:012> !runaway
    ERROR: !runaway: extension exception 0×80004002.
    “Unable to get thread times - dumps may not have time information”

    6 Id: 136c.13bc Suspend: 1 Teb: 7ffd8000 Unfrozen
    ChildEBP RetAddr Args to Child
    0198f914 5b2d887c 0198f938 00000000 00000000 USER32!PeekMessageA+0xfb
    0198fa20 5b2d3b87 059d6704 00000001 00000000 ModuleA!Choose+0×177
    […]

    0:012> ~6s
    eax=000abbdd ebx=00000400 ecx=00000200 edx=00000abb esi=7ffd8000 edi=0066b3e0
    eip=7e42a3e5 esp=0198f908 ebp=0198f914 iopl=0 nv up ei pl nz na pe nc
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00040206
    USER32!PeekMessageA+0xfb:
    7e42a3e5 2b470c sub eax,dword ptr [edi+0Ch] ds:0023:0066b3ec=000abbdd

  20. Dmitry Vostokov Says:

    It is also useful to combine it with Thread Age pattern to see the magnitude of spike:

    Process Uptime: 0 days 17:35:06.000

    If we look at thread CPU consumption we would see spike values relatively small compared to the overall process uptime:

    0:000> !runaway f
    User Mode Time
    Thread Time
    4:1560 0 days 0:00:08.034
    7:7e0 0 days 0:00:03.338
    11:1244 0 days 0:00:03.213
    8:1510 0 days 0:00:02.324
    3:86c 0 days 0:00:00.015
    […]

    However, compared to thread ages we see them much bigger:

    Elapsed Time
    Thread Time
    […]
    3:86c 0 days 17:35:05.213
    4:1560 0 days 0:02:12.350
    […]
    7:7e0 0 days 0:00:22.429
    11:1244 0 days 0:00:17.655

  21. Dmitry Vostokov Says:

    How to use WinDbg scripts to get spiking threads from kernel and complete memory dumps:
    http://www.dumpanalysis.org/blog/index.php/2011/12/03/2-windbg-scripts-that-changed-the-world/

  22. Dmitry Vostokov Says:

    In kernel dumps we can see all running thread stacks with this command:

    !running -t

    For complete memory dumps user portion would be invalid if thread process contexts are different from the current process on the current CPU so manual thread inspection via !thread 3f is recommended.

  23. Dmitry Vostokov Says:

    Some very small value of ticks like Ticks: 1 is also a sign.

  24. Dmitry Vostokov Says:

    !ready command is also useful:

    0: kd> !ready
    KSHARED_READY_QUEUE fffff8000da1a700: (00) **————————————————————–
    SharedReadyQueue fffff8000da1a700: Ready Threads at priority 8
    THREAD ffff930ac224b080 Cid 0c14.0418 Teb: 000000ade0765000 Win32Thread: ffff930ac254b7a0 READY on processor 80000001
    THREAD ffff930ac5dc3080 Cid 0c58.1af8 Teb: 00000056ece5b000 Win32Thread: 0000000000000000 READY on processor 80000000
    THREAD ffff930abd933040 Cid 0004.00ec Teb: 0000000000000000 Win32Thread: 0000000000000000 READY on processor 80000001
    THREAD ffff930ac1de4080 Cid 0c14.0d3c Teb: 000000ade0755000 Win32Thread: 0000000000000000 READY on processor 80000000
    SharedReadyQueue fffff8000da1a700: Ready Threads at priority 5
    THREAD ffff930ac1e36040 Cid 0bf8.0d68 Teb: 000000ce7c002000 Win32Thread: ffff930ac21f2bf0 READY on processor 80000000
    SharedReadyQueue fffff8000da1a700: Ready Threads at priority 4
    THREAD ffff930ac1b40080 Cid 081c.0ba4 Teb: 0000000000000000 Win32Thread: 0000000000000000 READY on processor 80000000
    SharedReadyQueue fffff8000da1a700: Ready Threads at priority 0
    THREAD ffff930abcfac080 Cid 0004.0038 Teb: 0000000000000000 Win32Thread: 0000000000000000 READY on processor 80000001
    Processor 0: No threads in READY state
    Processor 1: Group Scheduling Queue

    Scb: ffff930ac28a0778 Rank: 2 OQHistory: 0000000000015555
    GenCycles: 706070 LTCycles: 21604816, MinTarget: 633600000, MaxTarget: 1267200000, RankTarget: 63360000
    nt!_KSCB ffff930ac28a0778: Ready Threads at priority 9
    THREAD ffff930ac62c70c0 Cid 158d0.14a8 Teb: 000000f09395c000 Win32Thread: 0000000000000000 READY on processor 1

Leave a Reply