Software Diagnostics Library

Crash Dump Analysis Patterns (Part 19b)

Previously I wrote about how to calculate thread waiting time in kernel and complete memory dumps (Waiting Thread Time pattern). Now I show how to do it for user dumps. Unfortunately this information is not available in user space and therefore this can be done only approximately by calculating average or maximum waiting time.

The key is to take advantage of !runaway WinDbg command. Modern versions of postmortem debuggers and process dumpers usually save thread time information additionally. For example, /ma switch for .dump command does it. The later versions of userdump.exe also do it. When !runaway is invoked with 0y111 (7) flag it shows each thread user and kernel times and also the time elapsed since each thread was created. Therefore for threads that didn’t do much CPU intensive computation their average waiting time will almost correspond to their elapsed time. Unfortunately we cannot say when that computation took place. It could be the case that the thread in question was waiting 98.9% of the time, then did some computation for 0.01% and was waiting again 1% of the time or it could be the case that it was doing computation 0.01% of the time initially and then waiting the rest of the time (99.9%) until the dump was saved.

Consider this example of a sleeping thread:

0:000> ~32kL 100 ChildEBP RetAddr 0a18f03c 7c826f4b ntdll!KiFastSystemCallRet 0a18f040 77e41ed1 ntdll!NtDelayExecution+0xc 0a18f0a8 77e424ed kernel32!SleepEx+0x68 0a18f0b8 1b0ac343 kernel32!Sleep+0xf 0a18f100 1b00ba49 msjet40!System::AllocatePages+0x15e 0a18f11c 1b00ba08 msjet40!PageDesc::WireCurrentPage+0x2f 0a18f13c 1b00b8dd msjet40!PageDesc::ReadPage+0x119 0a18f164 1b01b567 msjet40!Database::ReadPage+0x7a 0a18f190 1b00e309 msjet40!TableMover::GetNextRow+0xc9 0a18f1a8 1b015de9 msjet40!TableMover::Move+0xc4 0a18f1d8 1b015d9c msjet40!ErrIsamMove+0x6c 0a18f1f0 1b038aa4 msjet40!ErrDispMove+0x43 0a18f43c 1b015d9c msjet40!ErrJPMoveRange+0x350 0a18f454 1b0200b3 msjet40!ErrDispMove+0x43 0a18f6a0 1b021e4d msjet40!ErrMaterializeRows+0x1fd 0a18f6f0 1b021c0d msjet40!ErrJPSetColumnSort+0x191 0a18f718 1b0210a4 msjet40!ErrJPOpenSort+0x105 0a18f750 1b020de5 msjet40!ErrJPOpenRvt+0x171 0a18f9f0 1b039b82 msjet40!ErrExecuteQuery+0x548 0a18fa3c 1b05548b msjet40!ErrExecuteTempQuery+0x13d 0a18fa6c 4c23b01e msjet40!JetExecuteTempQuery+0xc9 0a18fa94 4c23a8d1 odbcjt32!DoJetCloseTable+0x64 0a18fd10 4c23a6b6 odbcjt32!SQLInternalExecute+0x217 0a18fd20 4c23a694 odbcjt32!SQLExecuteCover+0x1f 0a18fd28 488b3fc1 odbcjt32!SQLExecute+0x9 0a18fd44 0c4898b5 odbc32!SQLExecute+0xd3 [...]

The process uptime can be seen from vertarget WinDbg command:

0:000> vertarget [...] Process Uptime: 0 days 1:17:22.000 [...]

Then we dump thread times (only information for our 32nd thread is shown here):

0:000> !runaway 0y111 User Mode Time Thread Time 32:cfc 0 days 0:00:00.109 [...] Kernel Mode Time Thread Time 32:cfc 0 days 0:00:00.062 [...] Elapsed Time Thread Time [...] 32:cfc 0 days 1:17:20.703 [...]

We see that 1 hour, 17 minutes and almost 21 seconds passed since the thread was created. By subtracting this time from process uptime we see that it was created in the first 2 seconds and it was consuming less than one second of CPU time. Therefore, most of the time this thread was waiting. Unfortunately we cannot say when it was waiting most of the time, in the beginning before it started sleeping or after. Fortunately we might continue guessing by looking at Sleep argument:

0:032> kv ChildEBP RetAddr Args to Child 0a18f03c 7c826f4b 77e41ed1 00000000 0a18f080 ntdll!KiFastSystemCallRet 0a18f040 77e41ed1 00000000 0a18f080 00000000 ntdll!NtDelayExecution+0xc 0a18f0a8 77e424ed 00000032 00000000 04390000 kernel32!SleepEx+0x68 0a18f0b8 1b0ac343 00000032 042a34b4 0a18f1c8 kernel32!Sleep+0xf […]

We see that it had been sleeping for at most 32 milliseconds and perhaps this is a retry / sleep loop. We might guess that the thread was recently spinning in that loop and therefore waiting all the time before that. Alternatively we might guess that the retry portion is very small and fast and 32 milliseconds are spread over all elapsed time and the thread was in this loop the significant proportion of time. What we can surely say that the last waiting time was no more than 32 milliseconds. in the case of waiting for an event, for example, I don’t see any reliable way of calculating this time. If anyone knows please post a comment.

- Dmitry Vostokov @ DumpAnalysis.org -

This entry was posted on Thursday, July 10th, 2008 at 1:46 pm and is filed under Crash Dump Analysis, Crash Dump Patterns, WinDbg Tips and Tricks. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

3 Responses to “Crash Dump Analysis Patterns (Part 19b)”

Crash Dump Analysis » Blog Archive » Icons for Memory Dump Analysis Patterns (Part 36) Says:
May 11th, 2010 at 10:07 am
[…] Today we introduce an icon for Waiting Thread Time (user dumps) pattern: […]
David Meijers Says:
December 22nd, 2011 at 12:33 pm
Note that “sleeping for at most 32 milliseconds” means 32 HEX milliseconds.
In other words: 50 milliseconds for mere mortals.
Dmitry Vostokov Says:
December 22nd, 2011 at 2:20 pm
Good catch

You must be logged in to post a comment.

Crash Dump Analysis Patterns (Part 19b)

3 Responses to “Crash Dump Analysis Patterns (Part 19b)”

Leave a Reply