Crash Dump Analysis Patterns (Part 42a)
Wait Chain is another pattern. It is simply a sequence of causal relations between events: thread A is waiting for event E, which threads B, C or D are supposed to signal at some point in the future, but they are all waiting for event F, which thread G will signal as soon as it finishes processing some critical task.
This subsumes various deadlock patterns too. These are causal loops where thread A is waiting for event AB, which thread B will signal as soon as thread A signals event BA, which thread B is in turn waiting for.
In this context, “Event” means any type of synchronization object, critical section, LPC/RPC reply or data arrival through some IPC channel, and not only a Win32 event object or kernel _KEVENT.
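The chain and loop described above can be sketched as a small graph walk. This is only an illustrative model, not a debugger feature: the function name `follow_wait_chain` and the `waits_on` dictionary (each blocked thread mapped to the single thread expected to unblock it) are assumptions made for the example.

```python
def follow_wait_chain(waits_on, start):
    """Follow wait edges from `start`.

    `waits_on` maps a blocked thread to the thread expected to unblock it
    (a simplification: one blocker per thread).
    Returns (chain, is_loop), where is_loop is True if the walk
    returned to a thread already visited, i.e. a causal loop.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in waits_on:
        current = waits_on[current]
        chain.append(current)
        if current in seen:
            return chain, True   # came back to a visited thread: deadlock-style loop
        seen.add(current)
    return chain, False          # reached a thread that is not waiting on anyone


# A waits on B, B waits on G, G is still running: a plain wait chain.
print(follow_wait_chain({"A": "B", "B": "G"}, "A"))
# A waits on B and B waits on A: a causal loop.
print(follow_wait_chain({"A": "B", "B": "A"}, "A"))
```

The one-blocker-per-thread simplification keeps the walk linear; the case where several threads (B, C or D) could signal the same event would need a set of successors per node instead.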
As the first example of the Wait Chain pattern, I show a process being terminated and waiting for another thread to finish. In other words, if we consider thread termination as an event itself, the main process thread is waiting for the second thread object to be signaled. The second thread tries to cancel a previous I/O request directed to some device. However, that IRP is not cancellable and the process hangs. This can be depicted as a two-link chain, where Thread A is our main thread waiting for Event A, which is thread B itself, and thread B is waiting for I/O cancellation (Event B). Their stack traces are:
THREAD 8a3178d0 Cid 04bc.01cc Teb: 7ffdf000 Win32Thread: bc1b6e70 WAIT: (Unknown) KernelMode Non-Alertable
8af2c920 Thread
Not impersonating
DeviceMap e1032530
Owning Process 89ff8d88 Image: processA.exe
Wait Start TickCount 80444 Ticks: 873 (0:00:00:13.640)
Context Switch Count 122 LargeStack
UserTime 00:00:00.015
KernelTime 00:00:00.156
Win32 Start Address 0x010148a4
Start Address 0x77e617f8
Stack Init f3f29000 Current f3f28be8 Base f3f29000 Limit f3f25000 Call 0
Priority 15 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr
f3f28c00 80833465 nt!KiSwapContext+0x26
f3f28c2c 80829a62 nt!KiSwapThread+0x2e5
f3f28c74 8094c0ea nt!KeWaitForSingleObject+0x346 ; stack trace with arguments shows the first parameter as 8af2c920
f3f28d0c 8094c63f nt!PspExitThread+0x1f0
f3f28d24 8094c839 nt!PspTerminateThreadByPointer+0x4b
f3f28d54 8088978c nt!NtTerminateProcess+0x125
f3f28d54 7c8285ec nt!KiFastCallEntry+0xfc
THREAD 8af2c920 Cid 04bc.079c Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
8af2c998 NotificationTimer
IRP List:
8ad26260: (0006,0220) Flags: 00000000 Mdl: 00000000
Not impersonating
DeviceMap e1032530
Owning Process 89ff8d88 Image: processA.exe
Wait Start TickCount 81312 Ticks: 5 (0:00:00:00.078)
Context Switch Count 169 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x77da3ea5
Start Address 0x77e617ec
Stack Init f3e09000 Current f3e08bac Base f3e09000 Limit f3e05000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr
f3e08bc4 80833465 nt!KiSwapContext+0x26
f3e08bf0 80828f0b nt!KiSwapThread+0x2e5
f3e08c38 808ea7a4 nt!KeDelayExecutionThread+0x2ab
f3e08c68 8094c360 nt!IoCancelThreadIo+0x62
f3e08cf0 8094c569 nt!PspExitThread+0x466
f3e08cfc 8082e0b6 nt!PsExitSpecialApc+0x1d
f3e08d4c 80889837 nt!KiDeliverApc+0x1ae
f3e08d4c 7c8285ec nt!KiServiceExit+0x56
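The link between the two threads is visible in the listings themselves: the object address printed under the first THREAD header (8af2c920 Thread) is the address in the second THREAD header. A minimal sketch of extracting that relation from saved text follows. It assumes the flattened layout shown in this post (a THREAD line followed by an "address type" line); real debugger output is indented and may list several wait objects, and the names `wait_objects` and `SAMPLE` are hypothetical.

```python
import re

# "THREAD <address> ..." header line, as in the listings above.
THREAD_RE = re.compile(r"^\s*THREAD\s+([0-9a-fA-F]+)\b")
# "<address> <ObjectType>" line, e.g. "8af2c920 Thread".
WAIT_OBJ_RE = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s+([A-Za-z]+)\s*$")

def wait_objects(dump_text):
    """Map each thread address to (address, type) of the first wait object listed after it."""
    result = {}
    current = None
    for line in dump_text.splitlines():
        m = THREAD_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = WAIT_OBJ_RE.match(line)
        if m and current is not None and current not in result:
            result[current] = (m.group(1), m.group(2))
    return result

SAMPLE = """\
THREAD 8a3178d0 Cid 04bc.01cc Teb: 7ffdf000 Win32Thread: bc1b6e70 WAIT: (Unknown) KernelMode Non-Alertable
8af2c920 Thread
Not impersonating
THREAD 8af2c920 Cid 04bc.079c Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
8af2c998 NotificationTimer
"""

objs = wait_objects(SAMPLE)
# Keep only the waits whose object is itself a thread: these are thread-to-thread links.
thread_edges = {t: addr for t, (addr, kind) in objs.items() if kind == "Thread"}
print(thread_edges)
```

For the sample, the only thread-to-thread link is from 8a3178d0 to 8af2c920; the second thread waits on a timer, which matches its KeDelayExecutionThread frame.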
By inspecting the IRP, we can see the device it was directed to, and that it has the cancel bit set but doesn’t have a cancel routine:
0: kd> !irp 8ad26260 1
Irp is active with 5 stacks 4 is current (= 0x8ad2633c)
No Mdl: No System Buffer: Thread 8af2c920: Irp stack trace.
Flags = 00000000
ThreadListEntry.Flink = 8af2cb28
ThreadListEntry.Blink = 8af2cb28
IoStatus.Status = 00000000
IoStatus.Information = 00000000
RequestorMode = 00000001
Cancel = 01
CancelIrql = 0
ApcEnvironment = 00
UserIosb = 77ecb700
UserEvent = 00000000
Overlay.AsynchronousParameters.UserApcRoutine = 00000000
Overlay.AsynchronousParameters.UserApcContext = 00000000
Overlay.AllocationSize = 00000000 - 00000000
CancelRoutine = 00000000
UserBuffer = 77ecb720
&Tail.Overlay.DeviceQueueEntry = 8ad262a0
Tail.Overlay.Thread = 8af2c920
Tail.Overlay.AuxiliaryBuffer = 00000000
Tail.Overlay.ListEntry.Flink = 00000000
Tail.Overlay.ListEntry.Blink = 00000000
Tail.Overlay.CurrentStackLocation = 8ad2633c
Tail.Overlay.OriginalFileObject = 89ff8920
Tail.Apc = 00000000
Tail.CompletionKey = 00000000
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
>[ c, 2] 0 1 8ab20388 89ff8920 00000000-00000000 pending
\Device\DeviceA
Args: 00000020 00000017 00000000 00000000
[ c, 2] 0 0 8affa4b8 89ff8920 00000000-00000000
\Device\DeviceB
Args: 00000020 00000017 00000000 00000000
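The two fields singled out above (Cancel = 01 and CancelRoutine = 00000000) can be checked mechanically when many IRPs have to be reviewed. The following is a rough sketch that scans saved !irp text for those two lines only. The function name `cancel_pending_without_routine` is hypothetical, and the result is a heuristic that mirrors the manual inspection, not proof that a driver is at fault.

```python
import re

# Matches the "Cancel = 01" line but not "CancelIrql = 0" or "CancelRoutine = ...".
CANCEL_RE = re.compile(r"^\s*Cancel\s*=\s*([0-9a-fA-F]+)\s*$", re.MULTILINE)
ROUTINE_RE = re.compile(r"^\s*CancelRoutine\s*=\s*([0-9a-fA-F]+)\b", re.MULTILINE)

def cancel_pending_without_routine(irp_text):
    """Return True if Cancel is nonzero and CancelRoutine is zero.

    Returns None if either field is missing from the text.
    """
    cancel = CANCEL_RE.search(irp_text)
    routine = ROUTINE_RE.search(irp_text)
    if cancel is None or routine is None:
        return None
    return int(cancel.group(1), 16) != 0 and int(routine.group(1), 16) == 0


# Field values copied from the listing above.
irp_fields = "Cancel = 01\nCancelIrql = 0\nApcEnvironment = 00\nCancelRoutine = 00000000\n"
print(cancel_pending_without_routine(irp_fields))
```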
- Dmitry Vostokov @ DumpAnalysis.org -
April 23rd, 2008 at 1:58 pm
Wait Chain Traversal API in Windows Server 2008:
http://msdn2.microsoft.com/en-us/library/cc308564.aspx