Crash Dump Analysis Patterns (Part 14)
The next pattern is Spiking Thread. If you have a process dump with many threads from a customer it is sometimes difficult to see which thread there was spiking CPU, that’s why it is always good to have some screenshots or notes from QSlice or Process Explorer showing spiking thread ID and process ID. The latter ID is to make sure that the process dump was from the correct process. New process dumpers and tools from Microsoft (userdump.exe, for example) save thread time information so you can open the dump and see the time spent in kernel and user mode for any thread by entering !runaway command. However if that command shows many threads with similar CPU consumption it will not highlight the particular thread that was spiking at the time the crash dump was saved so screenshots are still useful in some cases.
What to do if you don’t have spiking thread ID? Look at all threads and find those that are not waiting. Almost all threads are waiting most of the time. So the chances to dump the normal process and see some active threads are very low. If the thread is waiting the top function on its stack usually is (for XP/W2K3/Vista):
ntdll!KiFastSystemCallRet
and below it you could see some blocking calls waiting for some synchronization object, Sleep API call, IO completion or for LPC reply:
0:085> ~*kv
...
...
...
64 Id: 1b0.120c Suspend: -1 Teb: 7ff69000 Unfrozen
ChildEBP RetAddr Args to Child
02defe18 7c90e399 ntdll!KiFastSystemCallRet
02defe1c 77e76703 ntdll!NtReplyWaitReceivePortEx+0xc
02deff80 77e76c22 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4
02deff88 77e76a3b rpcrt4!RecvLotsaCallsWrapper+0xd
02deffa8 77e76c0a rpcrt4!BaseCachedThreadRoutine+0×79
02deffb4 7c80b683 rpcrt4!ThreadStartRoutine+0×1a
02deffec 00000000 kernel32!BaseThreadStart+0×37
65 Id: 1b0.740 Suspend: -1 Teb: 7ff67000 Unfrozen
ChildEBP RetAddr Args to Child
02edff44 7c90d85c ntdll!KiFastSystemCallRet
02edff48 7c8023ed ntdll!NtDelayExecution+0xc
02edffa0 57cde2dd kernel32!SleepEx+0×61
02edffb4 7c80b683 component!foo+0×35
02edffec 00000000 kernel32!BaseThreadStart+0×37
66 Id: 1b0.131c Suspend: -1 Teb: 7ff66000 Unfrozen
ChildEBP RetAddr Args to Child
02f4ff38 7c90e9c0 ntdll!KiFastSystemCallRet
02f4ff3c 7c8025cb ntdll!ZwWaitForSingleObject+0xc
02f4ffa0 72001f65 kernel32!WaitForSingleObjectEx+0xa8
02f4ffb4 7c80b683 component!WorkerThread+0×15
02f4ffec 00000000 kernel32!BaseThreadStart+0×37
67 Id: 1b0.1320 Suspend: -1 Teb: 7ff65000 Unfrozen
ChildEBP RetAddr Args to Child
02f8fe1c 7c90e9ab ntdll!KiFastSystemCallRet
02f8fe20 7c8094e2 ntdll!ZwWaitForMultipleObjects+0xc
02f8febc 7e4195f9 kernel32!WaitForMultipleObjectsEx+0×12c
02f8ff18 7e4196a8 user32!RealMsgWaitForMultipleObjectsEx+0×13e
02f8ff34 720019f6 user32!MsgWaitForMultipleObjects+0×1f
02f8ffa0 72001a29 component!bar+0xd9
02f8ffb4 7c80b683 component!MonitorWorkerThread+0×11
02f8ffec 00000000 kernel32!BaseThreadStart+0×37
68 Id: 1b0.1340 Suspend: -1 Teb: 7ff63000 Unfrozen
ChildEBP RetAddr Args to Child
0301ff1c 7c90e31b ntdll!KiFastSystemCallRet
0301ff20 7c80a746 ntdll!ZwRemoveIoCompletion+0xc
0301ff4c 57d46e65 kernel32!GetQueuedCompletionStatus+0×29
0301ffb4 7c80b683 component!AsyncEventsThread+0×91
0301ffec 00000000 kernel32!BaseThreadStart+0×37
…
…
…
# 85 Id: 1b0.17b4 Suspend: -1 Teb: 7ffd4000 Unfrozen
ChildEBP RetAddr Args to Child
00daffc8 7c9507a8 ntdll!DbgBreakPoint
00dafff4 00000000 ntdll!DbgUiRemoteBreakin+0×2d
Therefore if you have a different thread like this one below the chances that it was spiking are big:
58 Id: 1b0.9f4 Suspend: -1 Teb: 7ff75000 Unfrozen
ChildEBP RetAddr Args to Child
0280f64c 500af723 componentB!DoSomething+32
0280f85c 500b5391 componentB!CheckSomething+231
0280f884 500b7a3f componentB!ProcessWorkIteme+9f
0301ffec 00000000 kernel32!BaseThreadStart+0x37
There is no KiFastSystemCallRet on top and if we look at the currently executing instruction it indeed does some copy operation:
0:085> ~58r
eax=00000000 ebx=0280fdd4 ecx=0000005f edx=00000000 esi=03d30444 edi=0280f6dc
eip=500a4024 esp=0280f644 ebp=0280f64c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
componentB!DoSomething+32:
500a4024 f3a5 rep movs dword ptr es:[edi],dword ptr [esi] es:0023:0280f6dc=00000409 ds:0023:03d30444=00000409
In a kernel or a complete memory dump you can see spikes by checking KernelTime and UserTime:
0: kd> !thread 88b66768
THREAD 88b66768 Cid 01fc.1550 Teb: 7ffad000 Win32Thread: bc18f240 RUNNING on processor 1
IRP List:
89716008: (0006,0094) Flags: 00000a00 Mdl: 00000000
Impersonation token: e423a030 (Level Impersonation)
DeviceMap e3712480
Owning Process 8a0a56a0 Image: SomeSvc.exe
Wait Start TickCount 1782229 Ticks: 0
Context Switch Count 877610 LargeStack
UserTime 00:00:01.0078
KernelTime 02:23:21.0718
- Dmitry Vostokov @ DumpAnalysis.org -
June 25th, 2007 at 3:09 pm
By default !runaway shows only user mode time. By specifying additional flags it is possible to see both kernel and user time:
0:000> !runaway 3
User Mode Time
Thread Time
8:15a4 0 days 0:12:32.812
0:1c00 0 days 0:00:00.312
9:1b50 0 days 0:00:00.296
22:2698 0 days 0:00:00.046
17:22b8 0 days 0:00:00.031
14:2034 0 days 0:00:00.031
21:21b4 0 days 0:00:00.000
20:27b0 0 days 0:00:00.000
19:278c 0 days 0:00:00.000
18:2788 0 days 0:00:00.000
16:2194 0 days 0:00:00.000
15:2064 0 days 0:00:00.000
13:2014 0 days 0:00:00.000
12:1e38 0 days 0:00:00.000
11:1c54 0 days 0:00:00.000
10:1d40 0 days 0:00:00.000
7:1994 0 days 0:00:00.000
6:1740 0 days 0:00:00.000
5:1c18 0 days 0:00:00.000
4:c10 0 days 0:00:00.000
3:1774 0 days 0:00:00.000
2:1a08 0 days 0:00:00.000
1:fb8 0 days 0:00:00.000
Kernel Mode Time
Thread Time
9:1b50 0 days 1:21:54.125
8:15a4 0 days 0:02:48.390
0:1c00 0 days 0:00:00.328
14:2034 0 days 0:00:00.234
22:2698 0 days 0:00:00.156
17:22b8 0 days 0:00:00.015
21:21b4 0 days 0:00:00.000
20:27b0 0 days 0:00:00.000
19:278c 0 days 0:00:00.000
18:2788 0 days 0:00:00.000
16:2194 0 days 0:00:00.000
15:2064 0 days 0:00:00.000
13:2014 0 days 0:00:00.000
12:1e38 0 days 0:00:00.000
11:1c54 0 days 0:00:00.000
10:1d40 0 days 0:00:00.000
7:1994 0 days 0:00:00.000
6:1740 0 days 0:00:00.000
5:1c18 0 days 0:00:00.000
4:c10 0 days 0:00:00.000
3:1774 0 days 0:00:00.000
2:1a08 0 days 0:00:00.000
1:fb8 0 days 0:00:00.000
We see that thread #15a4 spikes mostly in user mode but thread #1b50 spikes mostly in kernel mode!
October 20th, 2007 at 8:00 pm
In kernel and complete memory dumps we can scan all threads with
Ticks: 0
or
Elapsed Ticks: 0
to check their kernel and user times:
PROCESS 8782cd60 SessionId: 52 Cid: 4a58 Peb: 7ffdf000 ParentCid: 1ea4
DirBase: 0a0260c0 ObjectTable: 88ab33a8 TableSize: 486.
Image: IEXPLORE.EXE
VadRoot 87f59ea8 Clone 0 Private 2077. Modified 123. Locked 0.
DeviceMap 880f6828
Token e8217cd0
ElapsedTime 0:03:09.0765
UserTime 0:00:00.0890
KernelTime 0:00:10.0171
QuotaPoolUsage[PagedPool] 100320
QuotaPoolUsage[NonPagedPool] 58100
Working Set Sizes (now,min,max) (4944, 50, 345) (19776KB, 200KB, 1380KB)
PeakWorkingSetSize 4974
VirtualSize 83 Mb
PeakVirtualSize 83 Mb
PageFaultCount 8544
MemoryPriority FOREGROUND
BasePriority 8
CommitCharge 2262
THREAD 87836580 Cid 4a58.57cc Teb: 7ffde000 Win32Thread: a224f1d8 WAIT: (Executive) KernelMode Non-Alertable
89dee788 Semaphore Limit 0×7fffffff
87836668 NotificationTimer
Not impersonating
Owning Process 8782cd60
Wait Start TickCount 123758 Elapsed Ticks: 0
Context Switch Count 97636 LargeStack
UserTime 0:00:00.0593
KernelTime 0:00:08.0359
Start Address 0×7c57b70c
Win32 Start Address 0×00401ee6
Stack Init ac154770 Current ac154320 Base ac155000 Limit ac14d000 Call ac15477c
Priority 11 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
ac154338 8042d8d7 00000000 8047bd00 00000001 nt!KiSwapThread+0×1b1
ac154360 80415d61 89dee788 00000000 00000000 nt!KeWaitForSingleObject+0×1a3
ac15439c 8041547c 00000000 00000001 8051c501 nt!ExpWaitForResource+0×2d
ac1543b4 8046907a 8047bd00 00000001 805225e9 nt!ExAcquireResourceSharedLite+0xc6
ac1543c0 805225e9 00000000 00000001 8051c501 nt!CmpLockRegistry+0×18
ac154430 8051c718 e7c5fd08 ac15448c 00000001 nt!CmSetValueKey+0×31
ac1544b4 8046b2a9 00000798 00125c04 00000000 nt!NtSetValueKey+0×196
ac1544b4 77f88de7 00000798 00125c04 00000000 nt!KiSystemService+0xc9
00125bb0 00000000 00000000 00000000 00000000 +0×77f88de7
January 9th, 2008 at 4:50 pm
For complete and kernel dumps we can also pay attention to the output of !running command and also to the output of !stacks command (Ticks and ThreadState columns)
January 25th, 2008 at 4:53 pm
[…] Spiking Thread pattern includes normal threads running at PASSIVE_LEVEL or APC_LEVEL IRQL that can be preempted by any other higher priority thread. Therefore, spiking threads are not necessarily ones that were in RUNNING state when the memory dump was saved. They consumed much CPU and this is reflected in their User and Kernel time values. The pattern also includes threads running at DISPATCH_LEVEL and higher IRQL. These threads cannot be preempted by another thread so they usually remain in RUNNING state all the time unless they lower their IRQL. Some of them I’ve seen were trying to acquire a spinlock and I decided to create another more specialized pattern and call it Dispatch Level Spin. We would see it when a spinlock for some data structure wasn’t released or was corrupt and some thread tries to acquire it and enters endless spinning loop unless interrupted by higher IRQL interrupt. These infinite loops can also happen due to software defects in code running at dispatch level or higher IRQL. […]
June 6th, 2008 at 5:01 pm
[…] dispatcher effectively starving other threads. If prioritized threads are CPU-bound we also see Spiking Thread pattern. However, if their thread priorities were normal they would have been preempted by other […]
July 26th, 2008 at 6:39 pm
[…] showing how different patterns interact. It was reported that the service was spiking CPU (see Spiking Thread pattern) and other processes couldn’t communicate with it (see Coupled Processes pattern). […]
September 18th, 2008 at 9:53 am
[…] Crash Dump Analysis Patterns (Part 14) […]
November 14th, 2008 at 8:52 pm
[…] Recently I got another manual complete memory dump from a hang workstation where default analysis pointed to a possibility for a Spiking Thread pattern: […]
December 16th, 2008 at 7:02 pm
[…] CPU spikes, application or session hangs where user space analysis is needed. Here is an example of Spiking Thread pattern in a kernel memory dump where analysis cannot be done because threads were spiking mostly […]
March 3rd, 2009 at 7:54 pm
[…] none of them consumed much CPU the patter of Spiking Thread is ruled out and CPU load can be explained by the number of active user sessions and this appears […]
May 22nd, 2009 at 11:36 pm
[…] Comparative analysis is based on saving several consecutive memory dump files to see similarities and differences. Most often this technique is used for memory leaks, for example, process heap leaks. Here we see another example related to CPU spikes. […]
September 7th, 2009 at 4:16 pm
[…] and it entered endless loop becoming unresponsive. CPU was spiking 50% (2 processor machine). I forced a crash dump in Task Manager (Spiking Thread pattern, http://www.dumpanalysis.org/blog/index.php/2007/05/11/crash-dump-analysis-patterns-part-14/): […]
November 30th, 2009 at 4:49 pm
[…] (a super pattern), be it a handle table size, a heap size, a number of contended locks, time spent in kernel, and so on. Every system or process property has its average and mean values and large deviations […]
April 29th, 2010 at 11:06 am
[…] we introduce an icon for Spiking Thread […]
June 3rd, 2010 at 11:20 pm
[…] where Wait Chains (executive resources) and Swarm of Shared Locks were probably resulted from a Spiking Thread. We have these resource […]
September 18th, 2010 at 11:30 pm
[…] We also check CPU consumption and see two spiking threads: […]
September 19th, 2010 at 9:42 pm
[…] level when we detect it using Task Manager, for example. Sometimes that process has only one spiking thread among many but there are cases when CPU consumption is spread among many threads. I call this […]
September 22nd, 2010 at 9:49 pm
[…] The both “running” threads were showing signs of a spiking thread: […]
February 14th, 2011 at 2:16 pm
Is some legacy dumps with missing !runaway information we can guess spiking threads if they do some sort of a peek message loop:
0:012> !runaway
ERROR: !runaway: extension exception 0×80004002.
“Unable to get thread times - dumps may not have time information”
6 Id: 136c.13bc Suspend: 1 Teb: 7ffd8000 Unfrozen
ChildEBP RetAddr Args to Child
0198f914 5b2d887c 0198f938 00000000 00000000 USER32!PeekMessageA+0xfb
0198fa20 5b2d3b87 059d6704 00000001 00000000 ModuleA!Choose+0×177
[…]
0:012> ~6s
eax=000abbdd ebx=00000400 ecx=00000200 edx=00000abb esi=7ffd8000 edi=0066b3e0
eip=7e42a3e5 esp=0198f908 ebp=0198f914 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00040206
USER32!PeekMessageA+0xfb:
7e42a3e5 2b470c sub eax,dword ptr [edi+0Ch] ds:0023:0066b3ec=000abbdd
August 25th, 2011 at 10:23 am
It is also useful to combine it with Thread Age pattern to see the magnitude of spike:
Process Uptime: 0 days 17:35:06.000
If we look at thread CPU consumption we would see spike values relatively small compared to the overall process uptime:
0:000> !runaway f
User Mode Time
Thread Time
4:1560 0 days 0:00:08.034
7:7e0 0 days 0:00:03.338
11:1244 0 days 0:00:03.213
8:1510 0 days 0:00:02.324
3:86c 0 days 0:00:00.015
[…]
However, compared to thread ages we see them much bigger:
Elapsed Time
Thread Time
[…]
3:86c 0 days 17:35:05.213
4:1560 0 days 0:02:12.350
[…]
7:7e0 0 days 0:00:22.429
11:1244 0 days 0:00:17.655
December 3rd, 2011 at 9:19 pm
How to use WinDbg scripts to get spiking threads from kernel and complete memory dumps:
http://www.dumpanalysis.org/blog/index.php/2011/12/03/2-windbg-scripts-that-changed-the-world/
September 19th, 2013 at 2:54 pm
In kernel dumps we can see all running thread stacks with this command:
!running -t
For complete memory dumps user portion would be invalid if thread process contexts are different from the current process on the current CPU so manual thread inspection via !thread 3f is recommended.
June 28th, 2020 at 3:31 pm
Some very small value of ticks like Ticks: 1 is also a sign.
June 28th, 2020 at 3:32 pm
!ready command is also useful:
0: kd> !ready
KSHARED_READY_QUEUE fffff8000da1a700: (00) **————————————————————–
SharedReadyQueue fffff8000da1a700: Ready Threads at priority 8
THREAD ffff930ac224b080 Cid 0c14.0418 Teb: 000000ade0765000 Win32Thread: ffff930ac254b7a0 READY on processor 80000001
THREAD ffff930ac5dc3080 Cid 0c58.1af8 Teb: 00000056ece5b000 Win32Thread: 0000000000000000 READY on processor 80000000
THREAD ffff930abd933040 Cid 0004.00ec Teb: 0000000000000000 Win32Thread: 0000000000000000 READY on processor 80000001
THREAD ffff930ac1de4080 Cid 0c14.0d3c Teb: 000000ade0755000 Win32Thread: 0000000000000000 READY on processor 80000000
SharedReadyQueue fffff8000da1a700: Ready Threads at priority 5
THREAD ffff930ac1e36040 Cid 0bf8.0d68 Teb: 000000ce7c002000 Win32Thread: ffff930ac21f2bf0 READY on processor 80000000
SharedReadyQueue fffff8000da1a700: Ready Threads at priority 4
THREAD ffff930ac1b40080 Cid 081c.0ba4 Teb: 0000000000000000 Win32Thread: 0000000000000000 READY on processor 80000000
SharedReadyQueue fffff8000da1a700: Ready Threads at priority 0
THREAD ffff930abcfac080 Cid 0004.0038 Teb: 0000000000000000 Win32Thread: 0000000000000000 READY on processor 80000001
Processor 0: No threads in READY state
Processor 1: Group Scheduling Queue
Scb: ffff930ac28a0778 Rank: 2 OQHistory: 0000000000015555
GenCycles: 706070 LTCycles: 21604816, MinTarget: 633600000, MaxTarget: 1267200000, RankTarget: 63360000
nt!_KSCB ffff930ac28a0778: Ready Threads at priority 9
THREAD ffff930ac62c70c0 Cid 158d0.14a8 Teb: 000000f09395c000 Win32Thread: 0000000000000000 READY on processor 1