ALPC wait chain, missing threads, message box, zombie and special processes: pattern cooperation
The purpose of this case study is to show how to choose what to include in a fiber bundle memory dump when x64 complete memory dumps are too large to deliver (here the system has about 22 GB of physical memory):
1: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 5880464 ( 23521856 Kb)
[…]
The dump we have is a kernel memory dump. When we dump all processes and threads and search for “Waiting for” we find many ALPC wait chains spanning 3 - 4 processes (sometimes semicircular), some of them originating from processes with missing threads (only one or two threads present where we would expect a dozen in a normal state):
1: kd> !process fffffa800b834c10
PROCESS fffffa800b834c10
SessionId: 205 Cid: 13c40 Peb: 7fffffdb000 ParentCid: 133c0
DirBase: 13b61d000 ObjectTable: fffff8800c2295b0 HandleCount: 58.
Image: ProcessA.exe
VadRoot fffffa8007d70c00 Vads 121 Clone 0 Private 497. Modified 0. Locked 0.
DeviceMap fffff88000007450
Token fffff8800c695560
ElapsedTime 00:03:42.083
UserTime 00:00:00.000
KernelTime 00:00:00.000
QuotaPoolUsage[PagedPool] 65968
QuotaPoolUsage[NonPagedPool] 11520
Working Set Sizes (now,min,max) (1274, 50, 345) (5096KB, 200KB, 1380KB)
PeakWorkingSetSize 1278
VirtualSize 37 Mb
PeakVirtualSize 38 Mb
PageFaultCount 1286
MemoryPriority BACKGROUND
BasePriority 13
CommitCharge 581
THREAD fffffa800b845bb0 Cid 13c40.1332c Teb: 000007fffffde000 Win32Thread: fffff900c0076010 WAIT: (WrLpcReply) UserMode Non-Alertable
fffffa800b845f40 Semaphore Limit 0x1
Waiting for reply to ALPC Message fffff88012527770 : queued at port fffffa80055bca60 : owned by process fffffa80054dfc10
Not impersonating
DeviceMap fffff88000007450
Owning Process fffffa800b834c10 Image: ProcessA.exe
Attached Process N/A Image: N/A
Wait Start TickCount 10912787 Ticks: 14208 (0:00:03:42.000)
Context Switch Count 34 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address 0x00000000fff60260
Stack Init fffffa600e8d5db0 Current fffffa600e8d5670
Base fffffa600e8d6000 Limit fffffa600e8ce000 Call 0
Priority 15 BasePriority 15 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
fffffa60`0e8d56b0 fffff800`016a36fa nt!KiSwapContext+0x7f
fffffa60`0e8d57f0 fffff800`0169835b nt!KiSwapThread+0x13a
fffffa60`0e8d5860 fffff800`016cd4e2 nt!KeWaitForSingleObject+0x2cb
fffffa60`0e8d58f0 fffff800`01916d14 nt!AlpcpSignalAndWait+0x92
fffffa60`0e8d5980 fffff800`019137a6 nt!AlpcpReceiveSynchronousReply+0x44
fffffa60`0e8d59e0 fffff800`0190330f nt!AlpcpProcessSynchronousRequest+0x24f
fffffa60`0e8d5b00 fffff800`016a0ef3 nt!NtAlpcSendWaitReceivePort+0x19f
fffffa60`0e8d5bb0 00000000`774d756a nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffffa60`0e8d5c20)
00000000`0026f038 00000000`00000000 0x774d756a
1: kd> !alpc /m fffff88012527770
Message @ fffff88012527770
MessageID : 0x10E8 (4328)
CallbackID : 0xC3416B (12796267)
SequenceNumber : 0x00000002 (2)
Type : LPC_REQUEST
DataLength : 0x0040 (64)
TotalLength : 0x0068 (104)
Canceled : No
Release : No
ReplyWaitReply : No
Continuation : Yes
OwnerPort : fffffa80076e9660 [ALPC_CLIENT_COMMUNICATION_PORT]
WaitingThread : fffffa800b845bb0
QueueType : ALPC_MSGQUEUE_PENDING
QueuePort : fffffa80055bca60 [ALPC_CONNECTION_PORT]
QueuePortOwnerProcess : fffffa80054dfc10 (ProcessB.exe)
ServerThread : fffffa800b711060
QuotaCharged : No
CancelQueuePort : 0000000000000000
CancelSequencePort : 0000000000000000
CancelSequenceNumber : 0x00000000 (0)
ClientContext : 00000000003fcf20
ServerContext : 0000000000000000
PortContext : 00000000029fda00
CancelPortContext : 0000000000000000
SecurityData : 0000000000000000
View : 0000000000000000
1: kd> !thread fffffa800b711060
THREAD fffffa800b711060 Cid 032c.146e8 Teb: 000007fffff7c000 Win32Thread: 0000000000000000 WAIT: (WrLpcReply) UserMode Non-Alertable
fffffa800b7113f0 Semaphore Limit 0x1
Waiting for reply to ALPC Message fffff8800e401200 : queued at port fffffa8005a32730 : owned by process fffffa8004c39040
Not impersonating
DeviceMap fffff88000007450
Owning Process fffffa80054dfc10 Image: ProcessB.exe
Attached Process N/A Image: N/A
Wait Start TickCount 10916800 Ticks: 10195 (0:00:02:39.296)
Context Switch Count 401
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x000007fefe647780
Stack Init fffffa6001d33db0 Current fffffa6001d33670
Base fffffa6001d34000 Limit fffffa6001d2e000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 1 IoPriority 2 PagePriority 5
Child-SP RetAddr : Call Site
fffffa60`01d336b0 fffff800`016a36fa : nt!KiSwapContext+0x7f
fffffa60`01d337f0 fffff800`0169835b : nt!KiSwapThread+0x13a
fffffa60`01d33860 fffff800`016cd4e2 : nt!KeWaitForSingleObject+0x2cb
fffffa60`01d338f0 fffff800`01916d14 : nt!AlpcpSignalAndWait+0x92
fffffa60`01d33980 fffff800`019137a6 : nt!AlpcpReceiveSynchronousReply+0x44
fffffa60`01d339e0 fffff800`0190330f : nt!AlpcpProcessSynchronousRequest+0x24f
fffffa60`01d33b00 fffff800`016a0ef3 : nt!NtAlpcSendWaitReceivePort+0x19f
fffffa60`01d33bb0 00000000`774d756a : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffffa60`01d33c20)
00000000`03d8e458 00000000`00000000 : 0x774d756a
1: kd> !alpc /m fffff8800e401200
Message @ fffff8800e401200
MessageID : 0x0BA4 (2980)
CallbackID : 0xC3E68A (12838538)
SequenceNumber : 0x00021911 (137489)
Type : LPC_REQUEST
DataLength : 0x00C0 (192)
TotalLength : 0x00E8 (232)
Canceled : No
Release : No
ReplyWaitReply : No
Continuation : Yes
OwnerPort : fffffa8005b119c0 [ALPC_CLIENT_COMMUNICATION_PORT]
WaitingThread : fffffa800b711060
QueueType : ALPC_MSGQUEUE_PENDING
QueuePort : fffffa8005a32730 [ALPC_CONNECTION_PORT]
QueuePortOwnerProcess : fffffa8004c39040 (ProcessC.exe)
ServerThread : fffffa800a843bb0
QuotaCharged : No
CancelQueuePort : 0000000000000000
CancelSequencePort : 0000000000000000
CancelSequenceNumber : 0x00000000 (0)
ClientContext : 0000000002e2e810
ServerContext : 0000000000000000
PortContext : 00000000002f3eb0
CancelPortContext : 0000000000000000
SecurityData : 0000000000000000
View : 0000000000000000
1: kd> !thread fffffa800a843bb0
THREAD fffffa800a843bb0 Cid 048c.fbec Teb: 000007ffffdaa000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
fffffa8006027d80 Semaphore Limit 0x7fffffff
fffffa800a843c68 NotificationTimer
Not impersonating
DeviceMap fffff88001800ba0
Owning Process fffffa8004c39040 Image: ProcessC.exe
Attached Process N/A Image: N/A
Wait Start TickCount 10916801 Ticks: 10194 (0:00:02:39.281)
Context Switch Count 239
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address 0x000007fefe647780
Stack Init fffffa601b280db0 Current fffffa601b280940
Base fffffa601b281000 Limit fffffa601b27b000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Call Site
fffffa60`1b280980 fffff800`016a36fa : nt!KiSwapContext+0x7f
fffffa60`1b280ac0 fffff800`0169835b : nt!KiSwapThread+0x13a
fffffa60`1b280b30 fffff800`019013e8 : nt!KeWaitForSingleObject+0x2cb
fffffa60`1b280bc0 fffff800`016a0ef3 : nt!NtWaitForSingleObject+0x98
fffffa60`1b280c20 00000000`774d6d5a : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffffa60`1b280c20)
00000000`10b7e548 00000000`00000000 : 0x774d6d5a
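The manual walk above (thread, then !alpc /m, then server thread) can be approximated at process level by scanning a saved debugger log. The following is a minimal Python sketch, assuming the log has the same textual layout as the !process and !thread output shown here; the function names alpc_wait_edges and follow_chain are hypothetical, and the regular expressions may need adjusting for other debugger versions. It only relates a waiting process to the process owning the port, so the exact server thread still has to be confirmed with !alpc /m.

```python
import re

# Patterns modelled on the !process / !thread output in this case study.
THREAD_RE = re.compile(r"^\s*THREAD\s+([0-9a-f]{16})\b")
OWNER_RE = re.compile(r"Owning Process\s+([0-9a-f]{16})\s+Image:\s+(\S+)")
ALPC_RE = re.compile(
    r"Waiting for reply to ALPC Message\s+([0-9a-f]{16})\s*:"
    r".*owned by process\s+([0-9a-f]{16})"
)


def alpc_wait_edges(log_text):
    """Return (edges, images).

    edges maps a waiting process address to the set of process addresses
    that own the ports it is waiting on; images maps process address to
    image name.
    """
    edges = {}
    images = {}
    pending_server = None  # port owner seen for the current thread, if any
    for line in log_text.splitlines():
        if THREAD_RE.search(line):
            pending_server = None  # a new thread block starts
            continue
        m = ALPC_RE.search(line)
        if m:
            pending_server = m.group(2)
            continue
        m = OWNER_RE.search(line)
        if m:
            images[m.group(1)] = m.group(2)
            if pending_server:
                edges.setdefault(m.group(1), set()).add(pending_server)
                pending_server = None
    return edges, images


def follow_chain(edges, start):
    """Follow one branch of the wait chain from start, stopping on a cycle."""
    chain = [start]
    seen = {start}
    current = start
    while current in edges:
        nxt = sorted(edges[current])[0]  # only the first branch is followed
        chain.append(nxt)
        if nxt in seen:
            break  # circular wait chain
        seen.add(nxt)
        current = nxt
    return chain
```

Applied to the output above and started at fffffa800b834c10, such a scan should yield the chain ProcessA.exe, ProcessB.exe, ProcessC.exe, which is the list of candidates for user dumps.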
Some processes designed to be non-interactive have threads that wait for UI messages. These could be message box or dialog box threads waiting to be dismissed and blocking other threads:
THREAD fffffa8005a7aa20 Cid 061c.0778 Teb: 000007fffff9e000 Win32Thread: fffff900c079fd50 WAIT: (WrUserRequest) UserMode Non-Alertable
fffffa8005a7a5a0 SynchronizationEvent
Not impersonating
DeviceMap fffff88000007450
Owning Process fffffa80058f01b0 Image: ProcessD.exe
Attached Process N/A Image: N/A
Wait Start TickCount 10911798 Ticks: 15197 (0:00:03:57.453)
Context Switch Count 88939 LargeStack
UserTime 00:00:00.078
KernelTime 00:00:00.609
Win32 Start Address 0x000007fefa8238a0
Stack Init fffffa60046a8db0 Current fffffa60046a8720
Base fffffa60046a9000 Limit fffffa60046a0000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
fffffa60`046a8760 fffff800`016a36fa nt!KiSwapContext+0x7f
fffffa60`046a88a0 fffff800`0169835b nt!KiSwapThread+0x13a
fffffa60`046a8910 fffff960`0014c053 nt!KeWaitForSingleObject+0x2cb
fffffa60`046a89a0 fffff960`0014c0ea win32k!xxxRealSleepThread+0x25f
fffffa60`046a8a40 fffff960`0014bb3a win32k!xxxSleepThread+0x56
fffffa60`046a8a70 fffff960`0014bc39 win32k!xxxRealInternalGetMessage+0x72e
fffffa60`046a8b50 fffff960`0014d0d9 win32k!xxxInternalGetMessage+0x35
fffffa60`046a8b90 fffff800`016a0ef3 win32k!NtUserGetMessage+0x79
fffffa60`046a8c20 00000000`773dd58a nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffffa60`046a8c20)
00000000`03d2f7b8 00000000`00000000 0x773dd58a
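Finding such threads by eye in a large log is tedious, so a small filter may help. The Python sketch below is illustrative only: it assumes the same !process output layout as above, the function name ui_waiters is hypothetical, and the set of image names considered non-interactive has to be supplied by the analyst because the dump itself does not say which processes are expected to have no UI.

```python
import re

# Thread header with a wait reason, e.g. "WAIT: (WrUserRequest)".
WAIT_RE = re.compile(r"^\s*THREAD\s+([0-9a-f]{16})\b.*WAIT:\s+\((\w+)\)")
IMAGE_RE = re.compile(r"Owning Process\s+[0-9a-f]{16}\s+Image:\s+(\S+)")


def ui_waiters(log_text, non_interactive_images):
    """Return (image, thread) pairs for WrUserRequest waits in the given images."""
    hits = []
    thread = None  # current thread address if it waits with WrUserRequest
    for line in log_text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            thread = m.group(1) if m.group(2) == "WrUserRequest" else None
            continue
        if line.lstrip().startswith("THREAD"):
            thread = None  # thread header without a wait reason (running, ready)
            continue
        m = IMAGE_RE.search(line)
        if m and thread:
            if m.group(1) in non_interactive_images:
                hits.append((m.group(1), thread))
            thread = None
    return hits
```

A hit is only a hint: WrUserRequest is the normal state of any idle message loop, so each reported thread still needs its stack trace checked before it is treated as a blocking message box.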
We also have more than 30,000 zombie processes, including some special ones (WerFault.exe) signifying past faults:
1: kd> !vm
[...]
15714 ProcessE.exe 0 ( 0 Kb)
15650 WerFault.exe 0 ( 0 Kb)
15644 ProcessF.exe 0 ( 0 Kb)
15640 ProcessE.exe 0 ( 0 Kb)
15610 ProcessG.exe 0 ( 0 Kb)
1560c ProcessE.exe 0 ( 0 Kb)
155f8 ProcessH.exe 0 ( 0 Kb)
155e8 ProcessE.exe 0 ( 0 Kb)
155c4 ProcessG.exe 0 ( 0 Kb)
155bc ProcessE.exe 0 ( 0 Kb)
155b8 ProcessH.exe 0 ( 0 Kb)
1559c WerFault.exe 0 ( 0 Kb)
15560 ProcessE.exe 0 ( 0 Kb)
[…]
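With tens of thousands of entries, a per-image tally shows which executables leak process objects and how many fault reports are among them. The following Python sketch assumes the !vm process list has been saved to text in the layout shown above (PID, image name, zero commit); the function name count_zombies is hypothetical.

```python
import re
from collections import Counter

# A zombie entry in the !vm process list: hex PID, image name, "0 ( 0 Kb)".
ZOMBIE_RE = re.compile(r"^\s*([0-9a-fA-F]+)\s+(\S+)\s+0\s+\(\s*0\s+Kb\)\s*$")


def count_zombies(vm_text):
    """Count zero-commit process entries per image name."""
    counts = Counter()
    for line in vm_text.splitlines():
        m = ZOMBIE_RE.match(line)
        if m:
            counts[m.group(2)] += 1
    return counts
```

Calling count_zombies(text).most_common() on the full listing would put the most frequent zombie images first, and the WerFault.exe count gives a rough number of past faults to look for in WER reports.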
What we recommend here is to save user dumps of processes A, B, C and D and then force a kernel dump the next time the problem surfaces. We also recommend checking WER settings for any recorded faults and, because the system is W2K8, configuring the LocalDumps registry keys to capture full user dumps.
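As an illustration of the LocalDumps part, the Python sketch below only builds the reg add command lines for a per-application LocalDumps key; it does not modify the registry. The value names DumpFolder, DumpCount and DumpType (2 meaning a full dump) are the documented WER settings, while the function name localdumps_commands, the folder C:\Dumps and the count of 10 are assumptions chosen for the example.

```python
# Documented WER location for user-mode dump collection settings.
LOCALDUMPS_KEY = r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def localdumps_commands(image, dump_folder, dump_count=10):
    """Build reg add commands for a per-image LocalDumps key (full dumps)."""
    key = LOCALDUMPS_KEY + "\\" + image  # per-application subkey, e.g. ProcessA.exe
    return [
        f'reg add "{key}" /v DumpFolder /t REG_EXPAND_SZ /d "{dump_folder}" /f',
        f'reg add "{key}" /v DumpCount /t REG_DWORD /d {dump_count} /f',
        f'reg add "{key}" /v DumpType /t REG_DWORD /d 2 /f',  # 2 = full dump
    ]


if __name__ == "__main__":
    # Print the commands for review before running them in an elevated prompt.
    for image in ("ProcessA.exe", "ProcessB.exe", "ProcessC.exe", "ProcessD.exe"):
        for command in localdumps_commands(image, r"C:\Dumps"):
            print(command)
```

Generating the commands rather than writing the registry directly keeps the change reviewable. LocalDumps only captures crashes, so the hung processes A to D still need dumps saved manually.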
- Dmitry Vostokov @ DumpAnalysis.org -