Archive for May, 2015

Crash Dump Analysis Patterns (Part 227)

Sunday, May 17th, 2015

Managed-code Nested Exceptions give us stack traces bound to the process virtual space. However, exception objects may be marshaled across processes and even computers. The remote stack trace return addresses don’t have the same validity in a different process context. Fortunately, there is a _remoteStackTraceString field in exception objects that contains the original stack trace. The default analysis command sometimes uses it:

0:013> !analyze -v


Exception object: 00000000025203b0
Exception type: System.Reflection.TargetInvocationException
Message: Exception has been thrown by the target of an invocation.
InnerException: System.Management.Instrumentation.WmiProviderInstallationException, Use !PrintException 0000000002522cf0 to see more.
StackTrace (generated):
SP IP Function
000000001D39E720 0000000000000001 Component!Proxy.Start()+0x20
000000001D39E720 000007FEF503D0B6 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x286
000000001D39E880 000007FEF503CE1A mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0xa
000000001D39E8B0 000007FEF503CDD8 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x58
000000001D39E900 000007FEF4FB0302 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()+0x52


MANAGED_STACK_COMMAND: ** Check field _remoteStackTraceString **;!do 2522cf0;!do 2521900


0:013> !DumpObj 2522cf0
000007fef51b77f0 4000054 2c System.String 0 instance 2521900 _remoteStackTraceString

0:013> !DumpObj 2521900
Name: System.String
String: at System.Management.Instrumentation.InstrumentationManager.RegisterType(Type managementType)
at Component.Provider..ctor()
at Component.Start()

Checking this field may also be necessary for exceptions of interest found in managed-space Execution Residue. We call this pattern Distributed Exception. The basic idea is illustrated in the following diagram using borrowed UML notation (it is not limited to just two computers):
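The idea behind the _remoteStackTraceString field can be sketched in Python (a hypothetical analogue for illustration, not the actual .NET mechanism): when an exception crosses a process boundary, its live traceback and return addresses lose their validity, so the original trace is captured as a preformatted string and carried along with the marshaled exception object.

```python
import traceback

# Hypothetical sketch mimicking .NET's _remoteStackTraceString:
# the original stack trace is frozen as text before marshaling,
# because raw return addresses mean nothing in another process.

class MarshaledError(Exception):
    def __init__(self, message, remote_stack_trace_string):
        super().__init__(message)
        # analogue of the _remoteStackTraceString field
        self.remote_stack_trace_string = remote_stack_trace_string

def server_side():
    # the "remote" process where the original exception happens
    try:
        raise ValueError("Exception has been thrown by the target of an invocation.")
    except ValueError as e:
        # capture the original stack trace as a string before marshaling
        return MarshaledError(str(e), traceback.format_exc())

def client_side(err):
    # in the client process, the carried string is the only valid trace
    return err.remote_stack_trace_string

err = server_side()
print(client_side(err))
```

The printed trace still names server_side and the original ValueError, even though the exception object is now examined in a different context.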

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 108)

Wednesday, May 13th, 2015

Palimpsest Messages are messages where some part or all of their content was erased or overwritten.

The name of this pattern comes from palimpsest manuscript scrolls. Such messages may be a part of malnarratives or result from Circular Tracing or trace buffer corruption. Sometimes not all relevant data is erased, and by using Intra- and Inter-Correlation and analyzing Message Invariants it is possible to recover the original data. Also, as in the Recovered Messages pattern, it may be possible to use Message Context to infer some partial content.
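A minimal Python sketch (with hypothetical names) shows how Circular Tracing alone produces the palimpsest effect: a fixed-capacity ring buffer keeps only the most recent messages, so earlier messages are silently erased by later ones.

```python
from collections import deque

# Sketch of Circular Tracing producing Palimpsest Messages:
# a ring buffer of fixed capacity keeps only the last N messages,
# so earlier messages are overwritten (erased) by later ones.

class CircularTrace:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old entries fall off the front

    def log(self, message):
        self.buffer.append(message)

    def dump(self):
        return list(self.buffer)

trace = CircularTrace(capacity=3)
for i in range(5):
    trace.log(f"message {i}")

# messages 0 and 1 have been overwritten - the palimpsest effect
print(trace.dump())  # ['message 2', 'message 3', 'message 4']
```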

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 226)

Wednesday, May 13th, 2015

Occasionally, we look at a Stack Trace Collection and notice an Internal Stack Trace. This is a stack trace that shouldn’t be seen in a normal crash dump because statistically it is rare (we initially planned to name this pattern Rare Stack Trace). It is also not a Special Stack Trace because it is not associated with special system events or problems. Nor is it a stack trace that belongs to various Wait Chains or Spiking Threads. It is also a real stack trace, not a reconstructed or hypothetical one such as a Rough Stack Trace or Past Stack Trace. It is simply a thread stack trace that shows some internal operation, for example, one that suggests that message hooking was involved:

THREAD fffffa8123702b00 Cid 11cc.0448 Teb: 000007fffffda000 Win32Thread: fffff900c1e6ec20 WAIT: (WrUserRequest) UserMode Non-Alertable
fffffa81230cf4e0 SynchronizationEvent
Not impersonating
DeviceMap fffff8a0058745e0
Owning Process fffffa81237a8b30 Image: ProcessA.exe
Attached Process N/A Image: N/A
Wait Start TickCount 1258266 Ticks: 18 (0:00:00:00.280)
Context Switch Count 13752 IdealProcessor: 1 NoStackSwap LargeStack
UserTime 00:00:00.468
KernelTime 00:00:00.187

Win32 Start Address ProcessA!ThreadProc (0x000007feff17c608)
Stack Init fffff8800878c700 Current fffff8800878ba10
Base fffff8800878d000 Limit fffff88008781000 Call fffff8800878c750
Priority 12 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
fffff880`0878ba50 fffff800`01a6c8f2 nt!KiSwapContext+0x7a
fffff880`0878bb90 fffff800`01a7dc9f nt!KiCommitThreadWait+0x1d2
fffff880`0878bc20 fffff960`0010dbd7 nt!KeWaitForSingleObject+0x19f
fffff880`0878bcc0 fffff960`0010dc71 win32k!xxxRealSleepThread+0x257
fffff880`0878bd60 fffff960`000c4bf7 win32k!xxxSleepThread+0x59
fffff880`0878bd90 fffff960`000d07a5 win32k!xxxInterSendMsgEx+0x112a
fffff880`0878bea0 fffff960`00151bf8 win32k!xxxCallHook2+0x62d
fffff880`0878c010 fffff960`000d2454 win32k!xxxCallMouseHook+0x40
fffff880`0878c050 fffff960`0010bf23 win32k!xxxScanSysQueue+0x1828

fffff880`0878c390 fffff960`00118fae win32k!xxxRealInternalGetMessage+0x453
fffff880`0878c470 fffff800`01a76113 win32k!NtUserRealInternalGetMessage+0x7e
fffff880`0878c500 00000000`771b913a nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`0878c570)
00000000`053ff258 000007fe`fac910f4 USER32!NtUserRealInternalGetMessage+0xa
00000000`053ff260 000007fe`fac911fa DUser!CoreSC::xwProcessNL+0x173
00000000`053ff2d0 00000000`771b9181 DUser!MphProcessMessage+0xbd
00000000`053ff330 00000000`774111f5 USER32!_ClientGetMessageMPH+0x3d
00000000`053ff3c0 00000000`771b908a ntdll!KiUserCallbackDispatcherContinue (TrapFrame @ 00000000`053ff288)
00000000`053ff438 00000000`771b9055 USER32!NtUserPeekMessage+0xa
00000000`053ff440 000007fe`ebae03fa USER32!PeekMessageW+0x105
00000000`053ff490 000007fe`ebae4925 ProcessA+0x5a
00000000`053ff820 00000000`773ec541 kernel32!BaseThreadInitThunk+0xd
00000000`053ff850 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

We see that this thread was neither waiting for a significant time nor consuming CPU. It was reported that ProcessA.exe was responding very slowly. So perhaps this was slowly punctuated thread execution with periodic small waits. In fact, Execution Residue analysis revealed Non-Coincidental Symbolic Information pointing to a 3rd-party Message Hook, and its Module Product Process was identified. Removal of that module resolved the problem.

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 127d)

Sunday, May 10th, 2015

Here we add yet another Technology-Specific Subtrace pattern, this time for COM client calls (as compared to COM interface invocation for servers). We recently got a complete memory dump where we had to find the destination server process, and we used the old technique described in the article In Search of Lost CID. We reprint the 32-bit stack subtrace here:

00faf828 7778c38b ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x112
00faf908 776c0565 ole32!CRpcChannelBuffer::SendReceive2+0xd3
00faf974 776c04fa ole32!CAptRpcChnl::SendReceive+0xab
00faf9c8 77ce247f ole32!CCtxComChnl::SendReceive+0x1a9
00faf9e4 77ce252f RPCRT4!NdrProxySendReceive+0x43
00fafdcc 77ce25a6 RPCRT4!NdrClientCall2+0x206

Here’s also an x64 fragment from the Semantic Structures (PID.TID) pattern:

00000000`018ce450 000007fe`ffee041b ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0xa3
00000000`018ce4f0 000007fe`ffd819c6 ole32!CRpcChannelBuffer::SendReceive2+0×11b
00000000`018ce6b0 000007fe`ffd81928 ole32!CAptRpcChnl::SendReceive+0×52
00000000`018ce780 000007fe`ffedfcf5 ole32!CCtxComChnl::SendReceive+0×68
00000000`018ce830 000007fe`ff56ba3b ole32!NdrExtpProxySendReceive+0×45
00000000`018ce860 000007fe`ffee02d0 RPCRT4!NdrpClientCall3+0×2e2

If the call goes over ALPC, it is easy to find the server process and thread (Wait Chain). In the case of a modal loop, we can use the raw stack analysis technique mentioned above (see also this case study).

Other subtrace examples can be found in pattern examples for High Contention (.NET CLR monitors), Wait Chain (RTL_RESOURCE), and in this case study.

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 225)

Saturday, May 9th, 2015

One of the questions we were asked was: what can we do if we get a kernel memory dump instead of the requested complete memory dump? Can it be useful? Of course, if we requested a complete memory dump after analyzing a kernel memory dump, then the second kernel dump may be useful for double-checking. Therefore, we assume here that we got a kernel memory dump for the first time and that the issue is some performance problem or system freeze, not a bugcheck. If we have a bugcheck, then kernel memory dumps are sufficient most of the time, and we do not consider them for this pattern.

Such a kernel memory dump is still useful because of user space diagnostic indicators pointing to possible patterns in user space or “interspace”. We call this pattern User Space Evidence. It is a collective super-pattern like Historical Information.

We can see patterns in kernel memory dumps such as Wait Chains (for example, ALPC or Process Objects), Deadlocks (for example ALPC), kernel stack traces corresponding to specific Dual Stack Traces (for example, exception processing), Handle Leaks, Missing Threads, Module Product Process, One-Thread Processes, Spiking Thread, Process Factory (for example, PPID for Zombie Processes), and others.

The evidence found may point to specific processes and process groups (Coupled Processes, session processes) and suggest process memory dump collection (especially when forcing further complete memory dumps is problematic) or troubleshooting steps for the diagnosed processes.

- Dmitry Vostokov @ + -

Trace Analysis Patterns (Part 107)

Sunday, May 3rd, 2015

If you analyze ETW-based traces such as CDF, you may frequently encounter the No Trace Metafile pattern, especially after product updates and fixes. This complicates pattern analysis because we may not be able to see Significant Events, Anchor Messages, and Error Messages. In some cases we can recover messages by comparing the Message Context of unknown messages. If we have source code access, this may also help. Both approaches are illustrated in the following diagram:

The same approach may also be applied to a different kind of trace artifact where some messages are corrupt. In such cases it is possible to recover diagnostic evidence, and, therefore, we call this pattern Recovered Messages.
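The invariant-matching idea can be sketched in Python (the format strings below are hypothetical examples, not real CDF metadata): the constant parts of known format strings act as Message Invariants against which unknown messages are matched to recover which format produced them.

```python
import re

# Hedged sketch: recovering unknown trace messages by matching them
# against the constant parts (Message Invariants) of known format strings.
# The format strings here are hypothetical, for illustration only.

KNOWN_FORMATS = [
    "Session %d created",
    "Request %d failed with status %d",
]

def invariant_regex(fmt):
    # split on the variable placeholder; escape and rejoin the constant
    # parts so only the invariants must match literally
    parts = [re.escape(p) for p in fmt.split("%d")]
    return re.compile("^" + r"(\d+)".join(parts) + "$")

def recover(message):
    # return the first known format whose invariants match the raw message
    for fmt in KNOWN_FORMATS:
        if invariant_regex(fmt).match(message):
            return fmt
    return None

print(recover("Request 42 failed with status 5"))  # Request %d failed with status %d
print(recover("garbage"))  # None
```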

- Dmitry Vostokov @ + -