Archive for the ‘Crash Dump Patterns’ Category

Crash Dump Analysis Patterns (Part 13g)

Wednesday, December 30th, 2009

Thanks to Sonny Mir who pointed to !filecache WinDbg command to diagnose low VACB (Virtual Address Control Block or View Address Control Block) conditions I was able to discern another Insufficient Memory pattern for control blocks in general. Certain system and subsystem architectures and designs may put a hard limit on the amount of data structures created to manage resources. If there is a dependency on such resources from other subsystems there could be starvation and blockage conditions resulting in a sluggish system behaviour, absence of a functional response and even in some cases a perceived system, service or application freeze.

7: kd> !filecache
***** Dump file cache******
  Reading and sorting VACBs ...
  Removed 0 nonactive VACBs, processing 1907 active VACBs …
File Cache Information
  Current size 408276 kb
  Peak size    468992 kb
  1907 Control Areas
[…]

I plan to add more insufficient control block case studies including user space.

- Dmitry Vostokov @ DumpAnalysis.org -

Memory Dump Analysis Anthology, Volume 3

Sunday, December 20th, 2009

“Memory dumps are facts.”

I’m very excited to announce that Volume 3 is available in paperback, hardcover and digital editions:

Memory Dump Analysis Anthology, Volume 3

Table of Contents

In two weeks paperback edition should also appear on Amazon and other bookstores. Amazon hardcover edition is planned to be available in January 2010.

The amount of information was so voluminous that I had to split the originally planned volume into two. Volume 4 should appear by the middle of February together with Color Supplement for Volumes 1-4. 

- Dmitry Vostokov @ DumpAnalysis.org -

Wait chain, blocked thread, waiting thread time, IRP distribution anomaly and stack trace collection: pattern cooperation

Thursday, December 17th, 2009

A kernel dump from a frozen system shows an executive resource wait chain:

0: kd> !locks
[...] 
Resource @ driverA!Resource (0xf58de4e0)    Exclusively owned
    Contention Count = 4411
    NumberOfExclusiveWaiters = 11
     Threads: 86d14ae8-01<*>
     Threads Waiting On Exclusive Access:
              8a788db0       8750e970       86c568a0       897ed428      
              86e34db0       86ca8ac0       86b22020       86fef5d8      
              872abdb0       86d16750       87b55830      
[…]

The blocking thread 86d14ae8 had been blocked waiting for a notification event for more than 2 hours:

0: kd> !thread 86d14ae8 1f
THREAD 86d14ae8  Cid 0004.29c4  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
    b81e7adc  NotificationEvent
Not impersonating
DeviceMap                 e1001830
Owning Process            8a78b020       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      8378144        Ticks: 503606 (0:02:11:08.843)
Context Switch Count      1016            
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Start Address driverA!WorkerThreadDispatcher (0xf596ea0e)
Stack Init b81e8000 Current b81e7a2c Base b81e8000 Limit b81e5000 Call 0
Priority 14 BasePriority 10 PriorityDecrement 4
ChildEBP RetAddr 
b81e7a44 8083d5b1 nt!KiSwapContext+0×26
b81e7a70 8083df9e nt!KiSwapThread+0×2e5
b81e7ab8 f59d374d nt!KeWaitForSingleObject+0×346
[…]
b81e7b48 f59b9289 driverB!TcpDisconnect+0×42
[…]
b81e7c40 f595a8a5 nt!IofCallDriver+0×45
b81e7c48 f595ba1e driverA!SubmitTdiRequestNoWait+0×28
[…]
b81e7dac 80920833 driverA!WorkerThreadDispatcher+0×1a
b81e7ddc 8083fe9f nt!PspSystemThreadStartup+0×2e
00000000 00000000 nt!KiThreadStartup+0×16

We see that the wait happens after requesting a TCP disconnect so we check the list of IRP to see if there is any distribution anomaly among pending IRP:

0: kd> !irpfind
  Irp    [ Thread ] irpStack: (Mj,Mn)   DevObj  [Driver]         MDL Process
[...]
86c68d98 [88d2bdb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c6a5c0 [89b118c0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c6b008 [87564b40] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c6caf0 [89c75bb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c7bb28 [89c75bb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c7bd98 [8753ddb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c80008 [88d7b378] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c80590 [88e1c368] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c845a8 [89d2b400] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c84b80 [88d7b378] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c86008 [88e1c368] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c86688 [86d9a788] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c86d98 [88d2bdb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c87990 [88e1c368] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c8b640 [8757c3f0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c8f368 [89c75bb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c8f650 [88d66db0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c92590 [87625c30] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c92bc8 [89c75bb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c94008 [8757c3f0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c94318 [89c75bb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c9a308 [89c75bb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c9e008 [88d66db0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86c9e308 [89d2b400] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86ca0350 [87638020] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86ca0870 [88d66db0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86ca0b28 [88d66db0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86ca0d98 [86db0db0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86ca4918 [88d66db0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86ca6878 [87564b40] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86caa458 [88d7b378] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86cacc20 [86d4fb40] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86cb0818 [89c75bb0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86cb3658 [87638020] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]
86cb9d98 [88d66db0] irpStack: ( f, 6)  89cb5ea8 [ \Driver\Tcpip]

[…]

Indeed, we see a high disproportion of TCP I/O requests (many hundreds) after exporting command output to Excel:

We check all stack traces and see one system thread trying to clean TCP connection blocked for almost the same time (more than 2 hours):

0: kd> !stacks
Proc.Thread  .Thread  Ticks   ThreadState Blocker
                            [8a78b020 System]
[...]
   4.00268c  870cf768 00765bd Blocked    tcpip!TCPCleanup+0xcf
[…]

0: kd> !whattime  00765bd
484797 Ticks in Standard Time:   2:06:14.953s

0: kd> !thread 870cf768 1f
THREAD 870cf768  Cid 0004.268c  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
    870a01f4  SynchronizationEvent
IRP List:
    8726fb00: (0006,0268) Flags: 00000404  Mdl: 00000000
Not impersonating
DeviceMap                 e1001830
Owning Process            8a78b020       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      8396953        Ticks: 484797 (0:02:06:14.953)
Context Switch Count      537            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Start Address nt!ExpWorkerThread (0×8082da4b)
Stack Init b87b0000 Current b87afa18 Base b87b0000 Limit b87ad000 Call 0
Priority 15 BasePriority 15 PriorityDecrement 0
ChildEBP RetAddr 
b87afa30 8083d5b1 nt!KiSwapContext+0×26
b87afa5c 8083df9e nt!KiSwapThread+0×2e5
b87afaa4 f5a9f9a6 nt!KeWaitForSingleObject+0×346
b87afaf0 f5a96a9d tcpip!TCPCleanup+0xcf
b87afb2c 80840153 tcpip!TCPDispatch+0×10c

b87afb40 f75eb817 nt!IofCallDriver+0×45
WARNING: Stack unwind information not available. Following frames may be wrong.
b87afb64 f75e8698 driverC!DispatchPassThrough+0×4c
[…]
b87afbcc 8092ec0a nt!IofCallDriver+0×45
b87afbfc 8092b6af nt!IopCloseFile+0×2ae
b87afc2c 8092b852 nt!ObpDecrementHandleCount+0xcc
b87afc54 8092b776 nt!ObpCloseHandleTableEntry+0×131
b87afc98 8092b7c1 nt!ObpCloseHandle+0×82
b87afca8 80833bdf nt!NtClose+0×1b
b87afca8 8083b00c nt!KiFastCallEntry+0xfc (TrapFrame @ b87afcb4)
b87afd24 f59d3a3a nt!ZwClose+0×11
b87afd3c f59b78a1 driverB!TdiCloseConnection+0×38
[…]
b87afdac 80920833 nt!ExpWorkerThread+0xeb
b87afddc 8083fe9f nt!PspSystemThreadStartup+0×2e
00000000 00000000 nt!KiThreadStartup+0×16

- Dmitry Vostokov @ DumpAnalysis.org -

Debugged! MZ/PE September issue is out

Wednesday, December 16th, 2009

Finally, after the long delay, the issue is available in print on Amazon and through other sellers:

Debugged! MZ/PE: Software Tracing

Buy from Amazon

- Dmitry Vostokov @ DumpAnalysis.org -

10 Common Mistakes in Memory Analysis (Part 6)

Tuesday, December 8th, 2009

Some debugger commands or commands they invoke can be context-sensitive and their diagnostic output can depend on a current thread or a process set in a debugger, not to mention loaded debugger extensions and even their load order. Therefore, it is advisable to be context-conscious about or at least to know about context sensitivity. For example, in one mmc.exe process memory dump a default analysis command in x64 WinDbg doesn’t show any managed stack trace reported by a user who had seen it in a failure dialog box:

0:000> !analyze -v

[...]

MANAGED_BITNESS_MISMATCH:
Managed code needs matching platform of sos.dll for proper analysis. Use 'x86' debugger.

PRIMARY_PROBLEM_CLASS:  STATUS_BREAKPOINT

BUGCHECK_STR:  APPLICATION_FAULT_STATUS_BREAKPOINT

STACK_TEXT: 
0007fc98 7c827d19 77e6202c 00000002 0007fce8 ntdll!KiFastSystemCallRet
0007fc9c 77e6202c 00000002 0007fce8 00000001 ntdll!NtWaitForMultipleObjects+0xc
0007fd44 7739bbd1 00000002 0007fd6c 00000000 kernel32!WaitForMultipleObjectsEx+0x11a
0007fda0 6c296601 00000001 0007fdd4 ffffffff user32!RealMsgWaitForMultipleObjectsEx+0x141
0007fdc0 6c29684b 000004ff ffffffff 00000001 duser!CoreSC::Wait+0x3a
0007fdf4 6c29693d 0007fe34 00000000 00000000 duser!CoreSC::xwProcessNL+0xab
0007fe14 773b0c02 0007fe34 00000000 00000000 duser!MphProcessMessage+0x2e
0007fe5c 7c828556 0007fe74 00000014 0007ffb0 user32!__ClientGetMessageMPH+0x30
0007fe84 7739c811 7739c844 01116894 00000000 ntdll!KiUserCallbackDispatcher+0x2e
0007fea4 7f072fd6 01116894 00000000 00000000 user32!NtUserGetMessage+0xc
0007fec0 010080ef 01116894 01116860 00000002 mfc42u!CWinThread::PumpMessage+0x16
0007fef0 7f072dda 01116860 01116860 ffffffff mmc!CAMCApp::PumpMessage+0x37
0007ff08 7f044d5b ffffffff 00000002 7ffd9000 mfc42u!CWinThread::Run+0x4a
0007ff1c 01034e19 01000000 00000000 00020710 mfc42u!AfxWinMain+0x7b
0007ffc0 77e6f23b 00000000 00000000 7ffd9000 mmc!wWinMainCRTStartup+0x19d
0007fff0 00000000 01034cb0 00000000 78746341 kernel32!BaseProcessStart+0x23

Instead of concluding that the dump file wasn’t saved at the time of the failure we pay attention to all aspects of the default analysis and see that we need a platform-specific debugger. We load the same dump file into x86 WinDbg: 

0:000> !analyze -v

[...]

MANAGED_STACK: !dumpstack -EE
No export dumpstack found

Now we see that we need to load SOS extension explicitly and retry: 

0:000> .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll

0:000> !analyze -v

[...]

MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x4ec (0)
Current frame:
ChildEBP RetAddr  Caller,Callee

Managed stack trace is empty here but look at all threads (we list full traces in order not to miss any module) we find one that shows a dialog box reporting a failure:

0:000> ~*kL 100

[...]

  17  Id: 658.7e4 Suspend: 1 Teb: 7ff48000 Unfrozen
ChildEBP RetAddr 
06b4f498 7739bf53 ntdll!KiFastSystemCallRet
06b4f4d0 7738965e user32!NtUserWaitMessage+0xc
06b4f4f8 773896a0 user32!InternalDialogBox+0xd0
06b4f518 773896e8 user32!DialogBoxIndirectParamAorW+0x37
06b4f53c 4afde2e1 user32!DialogBoxParamW+0×3f
06b4f584 4b05c4bc mmcndmgr!IsolationAwareDialogBoxParamW+0×4e
06b4f5a4 4b05c6eb mmcndmgr!ATL::CDialogImpl<CSnapInFailureReportDialog, ATL::CWindow>::DoModal+0×4f
06b4f68c 77c80193 mmcndmgr!CSnapInFailureReporter::ReportSnapInFailure+0×195

06b4f6b8 77ce33e1 rpcrt4!Invoke+0×30
06b4fab8 77ce2ed5 rpcrt4!NdrStubCall2+0×299
06b4fb10 7778d01b rpcrt4!CStdStubBuffer_Invoke+0xc6
06b4fb54 7778cfc8 ole32!SyncStubInvoke+0×37
06b4fb9c 776c120b ole32!StubInvoke+0xa7
06b4fc78 776c0bf5 ole32!CCtxComChnl::ContextInvoke+0xec
06b4fc94 776bc455 ole32!MTAInvoke+0×1a
06b4fcc0 7778ced5 ole32!STAInvoke+0×48
06b4fcf4 7778cd66 ole32!AppInvoke+0xa3
06b4fdc8 7778c24d ole32!ComInvokeWithLockAndIPID+0×2c5
06b4fdf0 776bc344 ole32!ComInvoke+0xca
06b4fe04 776bc30f ole32!ThreadDispatch+0×23
06b4fe1c 7739b6e3 ole32!ThreadWndProc+0xfe
06b4fe48 7739b874 user32!InternalCallWinProc+0×28
06b4fec0 7739ba92 user32!UserCallWinProcCheckWow+0×151
06b4ff28 7739bad0 user32!DispatchMessageWorker+0×327
06b4ff38 7768ffdc user32!DispatchMessageW+0xf
06b4ff6c 7768f366 ole32!CDllHost::STAWorkerLoop+0×5c
06b4ff88 7768f2a2 ole32!CDllHost::WorkerThread+0xc8
06b4ff90 776bbab4 ole32!DLLHostThreadEntry+0xd
06b4ffac 776b1704 ole32!CRpcThread::WorkerLoop+0×26
06b4ffb8 77e6482f ole32!CRpcThreadCache::RpcWorkerThreadEntry+0×20
06b4ffec 00000000 kernel32!BaseThreadStart+0×34

The previous thread #16 is a CLR thread loading an assembly:

  16  Id: 658.28c Suspend: 1 Teb: 7ffd4000 Unfrozen
ChildEBP RetAddr 
0714d024 7c827d29 ntdll!KiFastSystemCallRet
0714d028 77e61d1e ntdll!ZwWaitForSingleObject+0xc
0714d098 73ca790b kernel32!WaitForSingleObjectEx+0xac
0714d0c4 73ca485a cryptnet!CryptRetrieveObjectByUrlWithTimeout+0x12f
0714d0f0 73ca37ce cryptnet!CryptRetrieveObjectByUrlW+0x9b
0714d168 73ca4a60 cryptnet!RetrieveObjectByUrlValidForSubject+0x5b
0714d1b8 73ca3525 cryptnet!RetrieveTimeValidObjectByUrl+0xbc
0714d220 73ca3473 cryptnet!CTVOAgent::GetTimeValidObjectByUrl+0xc2
0714d2d0 73ca3314 cryptnet!CTVOAgent::GetTimeValidObject+0x2f1
0714d300 73ca2c00 cryptnet!FreshestCrlFromCrlGetTimeValidObject+0x2d
0714d344 73ca43a4 cryptnet!CryptGetTimeValidObject+0x58
0714d3a0 73ca3122 cryptnet!GetTimeValidCrl+0x1e0
0714d3e4 73ca3080 cryptnet!GetBaseCrl+0x34
0714d484 761d9033 cryptnet!MicrosoftCertDllVerifyRevocation+0x128
0714d514 761d8eef crypt32!I_CryptRemainingMilliseconds+0x21b
0714d584 761cf39f crypt32!CertVerifyRevocation+0xb7
0714d604 761c6966 crypt32!CChainPathObject::CalculateRevocationStatus+0x1f2
0714d64c 761c6771 crypt32!CChainPathObject::CalculateAdditionalStatus+0x147
0714d708 761c78bc crypt32!CCertChainEngine::CreateChainContextFromPathGraph+0x227
0714d738 761c783f crypt32!CCertChainEngine::GetChainContext+0x44
0714d760 76bb6d8f crypt32!CertGetCertificateChain+0x60
0714d7c4 76bb6bbc wintrust!_WalkChain+0x1a8
0714d800 76bb39ef wintrust!WintrustCertificateTrust+0xb7
0714d8f4 76bb31e2 wintrust!_VerifyTrust+0x144
0714d918 64025b1b wintrust!WinVerifyTrust+0x4e
0714d9bc 7a117c85 mscorsec!GetPublisher+0xe4
0714da14 79ebeccb mscorwks!PEFile::CheckSecurity+0xcb
0714da3c 79ebec14 mscorwks!PEAssembly::DoLoadSignatureChecks+0x3a
0714da64 79ebf05a mscorwks!PEAssembly::PEAssembly+0x109
0714dd00 79ebf155 mscorwks!PEAssembly::DoOpen+0x103
0714dd94 79eb8ff2 mscorwks!PEAssembly::Open+0x79
0714def8 79eb6a5e mscorwks!AppDomain::BindAssemblySpec+0x247
0714df90 79eb691c mscorwks!PEFile::LoadAssembly+0×95
0714e030 79eb68c0 mscorwks!Module::LoadAssembly+0xee

0714e06c 79e92873 mscorwks!Assembly::FindModuleByTypeRef+0×113
0714e0d8 79fc3dc8 mscorwks!ClassLoader::ResolveTokenToTypeDefThrowing+0×88
0714e12c 79fc953d mscorwks!CEEInfo::AddDependencyOnClassToken+0×103
0714e158 79fc61cf mscorwks!CEEInfo::ScanForModuleDependencies+0xa3
0714e1fc 7908bce1 mscorwks!CEEInfo::getArgType+0×256
0714e214 7908bc5b mscorjit!Compiler::eeGetArgType+0×23
0714e25c 79067745 mscorjit!Compiler::impInlineInitVars+0×3c3
0714e4fc 790673d5 mscorjit!Compiler::fgInvokeInlineeCompiler+0×95
0714e518 79067400 mscorjit!Compiler::fgMorphCallInline+0×41
0714e52c 79065272 mscorjit!Compiler::fgInline+0×30
0714e534 7906513e mscorjit!Compiler::fgMorph+0×45
0714e544 79065b8e mscorjit!Compiler::compCompile+0×83
0714e590 79065d33 mscorjit!Compiler::compCompile+0×44f
0714e618 79066448 mscorjit!jitNativeCode+0xef
0714e63c 79fc7198 mscorjit!CILJit::compileMethod+0×25
0714e6a8 79fc722d mscorwks!invokeCompileMethodHelper+0×72
0714e6ec 79fc72a0 mscorwks!invokeCompileMethod+0×31
0714e740 79fc7019 mscorwks!CallCompileMethodWithSEHWrapper+0×5b
0714eae8 79fc6ddb mscorwks!UnsafeJitFunction+0×31b
0714eb8c 79e811a3 mscorwks!MethodDesc::MakeJitWorker+0×1a8
0714ebe4 79e81363 mscorwks!MethodDesc::DoPrestub+0×41b
0714ec34 01c01efe mscorwks!PreStubWorker+0xf3
WARNING: Frame IP not in any known module. Following frames may be wrong.
0714ec4c 06b08f29 0×1c01efe
0714ecb4 06b088dc 0×6b08f29
0714edf0 79e71b4c 0×6b088dc
0714edf4 79e7e45d mscorwks!CallDescrWorker+0×33
0714eeac 79e968b0 mscorwks!MethodDesc::IsSharedByGenericInstantiations+0×1c
0714ef2c 79e9eeb2 mscorwks!MetaSig::MetaSig+0×3a
0714f258 00000000 mscorwks!JIT_MonReliableEnter+0×120

If we switch to it we get a managed stack:

0:000> ~16s
eax=0714ce90 ebx=048bb528 ecx=77e63d5b edx=7ffd4000 esi=0000073c edi=00000000
eip=7c82860c esp=0714d028 ebp=0714d098 iopl=0 nv up ei ng nz ac pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000297
ntdll!KiFastSystemCallRet:
7c82860c c3              ret

0:016> !analyze -v

[...]

MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x28c (16)
Current frame:
ChildEBP RetAddr  Caller,Callee
[...]
0714f264 792e0c3a (MethodDesc 0x79104344 +0xa System.Reflection.RuntimeMethodInfo.GetParametersNoCopy())
0714f28c 792d52d8 (MethodDesc 0x790c5058 +0x48 System.RuntimeMethodHandle.InvokeMethodFast(System.Object, System.Object[], System.Signature, System.Reflection.MethodAttributes, System.RuntimeTypeHandle))
0714f2dc 792d5086 (MethodDesc 0x791043a4 +0x106 System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo, Boolean))
0714f314 792d4f6e (MethodDesc 0x7910439c +0x1e System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo))
0714f338 7928ea4b (MethodDesc 0x79108798 +0x82b System.RuntimeType.InvokeMember(System.String, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object, System.Object[], System.Reflection.ParameterModifier[], System.Globalization.CultureInfo, System.String[]))
0714f478 7973ea9d (MethodDesc 0x79108264 +0x1d System.Type.InvokeMember(System.String, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object, System.Object[]))
[...]
0714f588 792d6cf6 (MethodDesc 0x791939dc +0x66 System.Threading.ThreadHelper.ThreadStart_Context(System.Object))
0714f594 792e019f (MethodDesc 0x7910276c +0x6f System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object))
0714f5a8 792d6c74 (MethodDesc 0x790fbde4 +0x44 System.Threading.ThreadHelper.ThreadStart())

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 95)

Monday, December 7th, 2009

In cases where we don’t see managed code exceptions or managed stack traces by default, we need to identify CLR threads in order to try various SOS commands and start digging into a managed realm. These threads are easily distinguished by mscorwks module on their stack traces (don’t forget to list full stack traces):

0:000> ~*kL 100

.  0  Id: 658.4ec Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr 
0007fc98 7c827d19 ntdll!KiFastSystemCallRet
0007fc9c 77e6202c ntdll!NtWaitForMultipleObjects+0xc
0007fd44 7739bbd1 kernel32!WaitForMultipleObjectsEx+0x11a
0007fda0 6c296601 user32!RealMsgWaitForMultipleObjectsEx+0x141
0007fdc0 6c29684b duser!CoreSC::Wait+0x3a
0007fdf4 6c29693d duser!CoreSC::xwProcessNL+0xab
0007fe14 773b0c02 duser!MphProcessMessage+0x2e
0007fe5c 7c828556 user32!__ClientGetMessageMPH+0x30
0007fe84 7739c811 ntdll!KiUserCallbackDispatcher+0x2e
0007fea4 7f072fd6 user32!NtUserGetMessage+0xc
0007fec0 010080ef mfc42u!CWinThread::PumpMessage+0x16
0007fef0 7f072dda mmc!CAMCApp::PumpMessage+0x37
0007ff08 7f044d5b mfc42u!CWinThread::Run+0x4a
0007ff1c 01034e19 mfc42u!AfxWinMain+0x7b
0007ffc0 77e6f23b mmc!wWinMainCRTStartup+0x19d
0007fff0 00000000 kernel32!BaseProcessStart+0x23

   1  Id: 658.82c Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr 
003afea0 7c827d19 ntdll!KiFastSystemCallRet
003afea4 7c80e5bb ntdll!NtWaitForMultipleObjects+0xc
003aff48 7c80e4a2 ntdll!EtwpWaitForMultipleObjectsEx+0xf7
003affb8 77e6482f ntdll!EtwpEventPump+0x27f
003affec 00000000 kernel32!BaseThreadStart+0x34

   2  Id: 658.648 Suspend: 1 Teb: 7ffdd000 Unfrozen
ChildEBP RetAddr 
00f3fe18 7c827859 ntdll!KiFastSystemCallRet
00f3fe1c 77c885ac ntdll!NtReplyWaitReceivePortEx+0xc
00f3ff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0x198
00f3ff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd
00f3ffac 77c7b110 rpcrt4!BaseCachedThreadRoutine+0x9d
00f3ffb8 77e6482f rpcrt4!ThreadStartRoutine+0x1b
00f3ffec 00000000 kernel32!BaseThreadStart+0x34

   3  Id: 658.640 Suspend: 1 Teb: 7ffdb000 Unfrozen
ChildEBP RetAddr 
0156fdb4 7c827d19 ntdll!KiFastSystemCallRet
0156fdb8 77e6202c ntdll!NtWaitForMultipleObjects+0xc
0156fe60 7739bbd1 kernel32!WaitForMultipleObjectsEx+0x11a
0156febc 6c296601 user32!RealMsgWaitForMultipleObjectsEx+0x141
0156fedc 6c29684b duser!CoreSC::Wait+0x3a
0156ff10 6c28f9e6 duser!CoreSC::xwProcessNL+0xab
0156ff30 6c28bce1 duser!GetMessageExA+0x44
0156ff84 77bcb530 duser!ResourceManager::SharedThreadProc+0xb6
0156ffb8 77e6482f msvcrt!_endthreadex+0xa3
0156ffec 00000000 kernel32!BaseThreadStart+0x34

   4  Id: 658.e74 Suspend: 1 Teb: 7ffda000 Unfrozen
ChildEBP RetAddr 
01d1fe30 7c827d19 ntdll!KiFastSystemCallRet
01d1fe34 77e6202c ntdll!NtWaitForMultipleObjects+0xc
01d1fedc 77e62fbe kernel32!WaitForMultipleObjectsEx+0x11a
01d1fef8 79f02541 kernel32!WaitForMultipleObjects+0x18
01d1ff58 79f0249e mscorwks!DebuggerRCThread::MainLoop+0xe9
01d1ff88 79f023c5 mscorwks!DebuggerRCThread::ThreadProc+0xe5
01d1ffb8 77e6482f mscorwks!DebuggerRCThread::ThreadProcStatic+0×9c

01d1ffec 00000000 kernel32!BaseThreadStart+0×34

   5  Id: 658.4d4 Suspend: 1 Teb: 7ffd8000 Unfrozen
ChildEBP RetAddr 
03dffcc4 7c827d19 ntdll!KiFastSystemCallRet
03dffcc8 77e6202c ntdll!NtWaitForMultipleObjects+0xc
03dffd70 77e62fbe kernel32!WaitForMultipleObjectsEx+0x11a
03dffd8c 79f92bcb kernel32!WaitForMultipleObjects+0x18
03dffdac 79f97028 mscorwks!WKS::WaitForFinalizerEvent+0×77
03dffdc0 79e9845f mscorwks!WKS::GCHeap::FinalizerThreadWorker+0×49
03dffdd4 79e983fb mscorwks!Thread::DoADCallBack+0×32a
03dffe68 79e98321 mscorwks!Thread::ShouldChangeAbortToUnload+0xe3
03dffea4 79eef6cc mscorwks!Thread::ShouldChangeAbortToUnload+0×30a
03dffecc 79eef6dd mscorwks!ManagedThreadBase_NoADTransition+0×32
03dffedc 79f3c63c mscorwks!ManagedThreadBase::FinalizerBase+0xd
03dfff14 79f92015 mscorwks!WKS::GCHeap::FinalizerThreadStart+0xbb
03dfffb8 77e6482f mscorwks!Thread::intermediateThreadProc+0×49

03dfffec 00000000 kernel32!BaseThreadStart+0×34

   6  Id: 658.f54 Suspend: 1 Teb: 7ffd6000 Unfrozen
ChildEBP RetAddr 
040afec4 7c826f69 ntdll!KiFastSystemCallRet
040afec8 77e41ed5 ntdll!NtDelayExecution+0xc
040aff30 79fd8a41 kernel32!SleepEx+0x68
040affac 79fd88ef mscorwks!ThreadpoolMgr::TimerThreadFire+0×6d
040affb8 77e6482f mscorwks!ThreadpoolMgr::TimerThreadStart+0×57

040affec 00000000 kernel32!BaseThreadStart+0×34

   7  Id: 658.988 Suspend: 1 Teb: 7ffd5000 Unfrozen
ChildEBP RetAddr 
0410fc2c 7c827d29 ntdll!KiFastSystemCallRet
0410fc30 77e61d1e ntdll!ZwWaitForSingleObject+0xc
0410fca0 79e8c5f9 kernel32!WaitForSingleObjectEx+0xac
0410fce4 79e8c52f mscorwks!PEImage::LoadImage+0×1af
0410fd34 79e8c54e mscorwks!CLREvent::WaitEx+0×117
0410fd48 79ee3f35 mscorwks!CLREvent::Wait+0×17
0410fe14 79f92015 mscorwks!AppDomain::ADUnloadThreadStart+0×308
0410ffb8 77e6482f mscorwks!Thread::intermediateThreadProc+0×49

0410ffec 00000000 kernel32!BaseThreadStart+0×34

   8  Id: 658.e0 Suspend: 1 Teb: 7ff4f000 Unfrozen
ChildEBP RetAddr 
0422fcec 7c827d19 ntdll!KiFastSystemCallRet
0422fcf0 7c83c7be ntdll!NtWaitForMultipleObjects+0xc
0422ffb8 77e6482f ntdll!RtlpWaitThread+0x161
0422ffec 00000000 kernel32!BaseThreadStart+0x34

   9  Id: 658.db4 Suspend: 1 Teb: 7ff4e000 Unfrozen
ChildEBP RetAddr 
0447fec0 7c827d19 ntdll!KiFastSystemCallRet
0447fec4 77e6202c ntdll!NtWaitForMultipleObjects+0xc
0447ff6c 77e62fbe kernel32!WaitForMultipleObjectsEx+0x11a
0447ff88 76929e35 kernel32!WaitForMultipleObjects+0x18
0447ffb8 77e6482f userenv!NotificationThread+0x5f
0447ffec 00000000 kernel32!BaseThreadStart+0x34

  10  Id: 658.e7c Suspend: 1 Teb: 7ff4c000 Unfrozen
ChildEBP RetAddr 
0550ff7c 7c8277f9 ntdll!KiFastSystemCallRet
0550ff80 71b25914 ntdll!NtRemoveIoCompletion+0xc
0550ffb8 77e6482f mswsock!SockAsyncThread+0x69
0550ffec 00000000 kernel32!BaseThreadStart+0x34

[...]

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 94a)

Monday, November 30th, 2009

Memory dump analysis is all about deviations and of them is Value Deviation (a super pattern), be it a number of open handles, a heap size, a  number of contended lockstime spent in kernel, and so on. Every system or process property has its average and mean values and large deviations are noticable as the so called anomalies. In this post we provide an example of a stack trace size (depth) deviation. The average number of frames for most stack traces is dependent on the type of a memory dump: user, kernel and complete but considerably longer or shorter stack traces are clearly visible in stack trace collections. I originally planned to call this pattern a Black Swan but after a moment of thought dismissed that idea because such deviations are not really rare after all. Here is an example of a stack trace collection from a CPU spiking process with a number of identical stack traces with just only 3 frames:

0:000> ~*kL

[...]

  19  Id: 1054.1430 Suspend: 1 Teb: 7ff9c000 Unfrozen
ChildEBP RetAddr 
1ac6ff50 7739bf53 ntdll!KiFastSystemCallRet
1ac6ffb8 77e6482f user32!NtUserWaitMessage+0xc
1ac6ffec 00000000 kernel32!BaseThreadStart+0x34

  20  Id: 1054.c90 Suspend: 1 Teb: 7ffaf000 Unfrozen
ChildEBP RetAddr 
1b30ff50 7739bf53 ntdll!KiFastSystemCallRet
1b30ffb8 77e6482f user32!NtUserWaitMessage+0xc
1b30ffec 00000000 kernel32!BaseThreadStart+0x34

  21  Id: 1054.a34 Suspend: 1 Teb: 7ff9a000 Unfrozen
ChildEBP RetAddr 
1b63ff50 7739bf53 ntdll!KiFastSystemCallRet
1b63ffb8 77e6482f user32!NtUserWaitMessage+0xc
1b63ffec 00000000 kernel32!BaseThreadStart+0×34

  22  Id: 1054.1584 Suspend: 1 Teb: 7ff99000 Unfrozen
ChildEBP RetAddr 
1ba9ff50 7739bf53 ntdll!KiFastSystemCallRet
1ba9ffb8 77e6482f user32!NtUserWaitMessage+0xc
1ba9ffec 00000000 kernel32!BaseThreadStart+0x34

[...]

These stack traces are correct from RetAddr analysis perspective:

0:000> ub 7739bf53
user32!PeekMessageW+0×11e:
7739bf42 nop
7739bf43 nop
7739bf44 nop
7739bf45 nop
7739bf46 nop
user32!NtUserWaitMessage:
7739bf47 mov     eax,124Ah
7739bf4c mov     edx,offset SharedUserData!SystemCallStub (7ffe0300)
7739bf51 call    dword ptr [edx]

0:000> ub 77e6482f
kernel32!BaseThreadStart+0×10:
77e6480b mov     eax,dword ptr fs:[00000018h]
77e64811 cmp     dword ptr [eax+10h],1E00h
77e64818 jne     kernel32!BaseThreadStart+0×2e (77e64829)
77e6481a cmp     byte ptr [kernel32!BaseRunningInServerProcess (77ecb008)],0
77e64821 jne     kernel32!BaseThreadStart+0×2e (77e64829)
77e64823 call    dword ptr [kernel32!_imp__CsrNewThread (77e4132c)]
77e64829 push    dword ptr [ebp+0Ch]
77e6482c call    dword ptr [ebp+8]

Looking at their thread times reveals that they were the most spikers:

0:000> !runaway
 User Mode Time
  Thread       Time
  19:1430      0 days 0:01:34.109
  22:1584      0 days 0:01:28.140
  21:a34       0 days 0:01:26.765
  20:c90       0 days 0:01:24.218

   0:e78       0 days 0:00:01.687
  10:398       0 days 0:00:01.062
   7:14e8      0 days 0:00:00.250
   4:1258      0 days 0:00:00.093
   6:2e8       0 days 0:00:00.015
   1:11c0      0 days 0:00:00.015
  26:1328      0 days 0:00:00.000
  25:7ec       0 days 0:00:00.000
[…]

In order to hypothesize about a possible culptit component we look at execution residue left on their raw stack data. Indeed, we see lots of non-coincidental symbolic references to 3rdPartyExtension module:

0:000> ~22s
eax=00000000 ebx=00000000 ecx=1ba9f488 edx=00000001 esi=1952bd40 edi=00000000
eip=7c82860c esp=1ba9ff54 ebp=1ba9ffb8 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00240246
ntdll!KiFastSystemCallRet:
7c82860c ret

0:022> !teb
TEB at 7ff99000
    ExceptionList:        1ba9ffdc
    StackBase:            1baa0000
    StackLimit:           1ba8f000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ff99000
    EnvironmentPointer:   00000000
    ClientId:             00001054 . 00001584
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffd5000
    LastErrorValue:       0
    LastStatusValue:      c0000034
    Count Owned Locks:    0
    HardErrorMode:        0

0:022> dds 1ba8f000 1baa0000
1ba8f000  00000000
1ba8f004  00000000
[...]
1ba939e8  00000000
1ba939ec  00000000
1ba939f0  00000037
1ba939f4  1906e6c0
1ba939f8  064e1112 3rdPartyExtension!DllUnregisterServer+0xe1f1f
1ba939fc  1a042678
1ba93a00  034d2918
1ba93a04  00000000
1ba93a08  1a042660
1ba93a0c  00000008
1ba93a10  064e18ea 3rdPartyExtension!DllUnregisterServer+0xe26f7
1ba93a14  1a042678
1ba93a18  00000001
1ba93a1c  034d2870
1ba93a20  034d2b78
1ba93a24  0000001f
1ba93a28  00000007
1ba93a2c  034d2870
1ba93a30  1a01fc68
1ba93a34  00000001
1ba93a38  1ba93a54
1ba93a3c  064e1b45 3rdPartyExtension!DllUnregisterServer+0xe2952
1ba93a40  034d2b78
1ba93a44  00000000
1ba93a48  00000000
1ba93a4c  06e7b498
1ba93a50  00000212
1ba93a54  1ba93c00
1ba93a58  064e3bce 3rdPartyExtension!DllUnregisterServer+0xe49db
1ba93a5c  00000001
1ba93a60  00000001
1ba93a64  00000000
1ba93a68  115d7fbc
1ba93a6c  06e7b498
1ba93a70  062de91d 3rdPartyExtension+0xe91d
1ba93a74  0000020c
1ba93a78  1ba93b78
1ba93a7c  06363797 3rdPartyExtension+0×93797
1ba93a80  00000024
1ba93a84  00000000
1ba93a88  00000000
1ba93a8c  1ba93ee0
[…]

0:022> ub 064e1112
3rdPartyExtension!DllUnregisterServer+0xe1f0d:
064e1100 jge     3rdPartyExtension!DllUnregisterServer+0xe1f16 (064e1109)
064e1102 mov     ecx,dword ptr [ecx+10h]
064e1105 cmp     ecx,eax
064e1107 jne     3rdPartyExtension!DllUnregisterServer+0xe1f0a (064e10fd)
064e1109 push    ecx
064e110a push    ebx
064e110b mov     ecx,edi
064e110d call    3rdPartyExtension!DllUnregisterServer+0xe1d17 (064e0f0a)

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 93)

Tuesday, November 24th, 2009

Analysis of .NET managed code requires processor architectural platform specific SOS extension. For example, x64 WinDbg is not able to analyze the managed stack for a managed code exception in 32-bit process:

0:010> !analyze -v

[...]

FAULTING_IP:
kernel32!RaiseException+53
77e4bee7 5e              pop     esi

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 77e4bee7 (kernel32!RaiseException+0x00000053)
   ExceptionCode: e0434f4d (CLR exception)
  ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 80131509

[...]

MANAGED_STACK: !dumpstack -EE
No export dumpstack found

MANAGED_BITNESS_MISMATCH:
Managed code needs matching platform of sos.dll for proper analysis. Use ‘x86′ debugger.

[...]

0:010> kL 100
ChildEBP RetAddr 
0573f0a4 79f071ac kernel32!RaiseException+0x53
0573f104 79f0a780 mscorwks!RaiseTheExceptionInternalOnly+0x2a8
0573f1a8 058ed3b3 mscorwks!JIT_Rethrow+0xbf
WARNING: Frame IP not in any known module. Following frames may be wrong.
0573f33c 793b0d1f <Unloaded_DllA.dll>+0x58ed3b2
0573f344 79373ecd mscorlib_ni+0x2f0d1f
0573f358 793b0c68 mscorlib_ni+0x2b3ecd
0573f370 79e7c74b mscorlib_ni+0x2f0c68
0573f380 79e7c6cc mscorwks!CallDescrWorker+0x33
0573f400 79e7c8e1 mscorwks!CallDescrWorkerWithHandler+0xa3
0573f53c 79e7c783 mscorwks!MethodDesc::CallDescr+0x19c
0573f558 79e7c90d mscorwks!MethodDesc::CallTargetWorker+0x1f
0573f56c 79fc58cd mscorwks!MethodDescCallSite::Call_RetArgSlot+0x18
0573f754 79ef3207 mscorwks!ThreadNative::KickOffThread_Worker+0x190
0573f768 79ef31a3 mscorwks!Thread::DoADCallBack+0x32a
0573f7fc 79ef30c3 mscorwks!Thread::ShouldChangeAbortToUnload+0xe3
0573f838 79ef4826 mscorwks!Thread::ShouldChangeAbortToUnload+0x30a
0573f860 79fc57b1 mscorwks!Thread::ShouldChangeAbortToUnload+0x33e
0573f878 79fc56ac mscorwks!ManagedThreadBase::KickOff+0x13
0573f914 79f95a2e mscorwks!ThreadNative::KickOffThread+0x269
0573ffb8 77e64829 mscorwks!Thread::intermediateThreadProc+0x49
0573ffec 00000000 kernel32!BaseThreadStart+0x34

So we dutifully run x86 WinDbg and get the better picture of nested exceptions:

0:010> !analyze -v

[...]

MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0xc68 (15)
Current frame:
ChildEBP RetAddr  Caller,Callee

EXCEPTION_OBJECT: !pe 16584f0
Exception object: 016584f0
Exception type: System.InvalidOperationException
Message: There is an error in XML document (12, 12182).

InnerException: System.IO.IOException, use !PrintException 0164f6dc to see more
[…]

StackTraceString: <none>
HResult: 80131509
There are nested exceptions on this thread. Run with -nested for details

EXCEPTION_OBJECT: !pe 164f6dc
Exception object: 0164f6dc
Exception type: System.IO.IOException
Message: Unable to read data from the transport connection: The connection was closed.

InnerException: <none>
[…]

StackTraceString: <none>
HResult: 80131620
There are nested exceptions on this thread. Run with -nested for details

MANAGED_OBJECT: !dumpobj 1655a38
Name: System.String
MethodTable: 790fd8c4
EEClass: 790fd824
Size: 270(0x10e) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__[...]\mscorlib.dll)
String: Unable to read data from the transport connection: The connection was closed.

[...]

EXCEPTION_MESSAGE:  Unable to read data from the transport connection: The connection was closed.

MANAGED_OBJECT_NAME:  System.IO.IOException

[...]

There are other pattern instances of this kind when we need a Platform-Specific Debugger, for example, to do live debugging of an x86 process on x64 machine (needed x64 debugger) or we need to load an old 32-bit DLL extension (needed x86 debugger) for a postmortem analysis.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 92)

Tuesday, November 24th, 2009

Sometimes the functionality of a system depends upon a specific application or service process. For example, in a database server environment it might be a database process, in printing environment it is a print spooler process or in a terminal services environment it is a terminal services process (termsvc, hosted by svchost.exe). In system failure scenarios we should check these processes for their presence (and also the presence of any coupled processes), hence the name of this pattern: Missing Process. However, if the vital process is present we should check if it is exited but references to it exist or there are any missing threads or components inside it, any suspended threads and special processes like a postmortem debugger. We shouldn’t also forget about service dependencies and their relevant process startup order. For example, we know that our service is hosted by svchost.exe and we see one such process exited but its object still referenced somewhere:

0: kd> !vm

*** Virtual Memory Usage ***
[...]
         0ed8 svchost.exe          0 (         0 Kb)
[…]

However, another command shows that it could be a different service hosted by the same image, svchost.exe, if we know that ServiceA depends on our service:

0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 8b581818  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: bff4d020  ObjectTable: e1001e18  HandleCount: 1601.
    Image: System

PROCESS 8b06d778  SessionId: none  Cid: 01a8    Peb: 7ffde000  ParentCid: 0004
    DirBase: bff4d040  ObjectTable: e13eae40  HandleCount:  22.
    Image: smss.exe

[...]

PROCESS 8aabed88  SessionId: 0  Cid: 0854    Peb: 7ffd6000  ParentCid: 0220
    DirBase: bff4d4a0  ObjectTable: e1c867a8  HandleCount: 778.
    Image: ServiceA.exe

[...]

PROCESS 8aaa6510  SessionId: 0  Cid: 0ed8    Peb: 7ffd4000  ParentCid: 0220
    DirBase: bff4d580  ObjectTable: 00000000  HandleCount:   0.
    Image: svchost.exe

[...]

Another alternative is that our service was restarted but then exited. If our process is not visible it could be possible that it was either stopped or simply crashed before.

- Dmitry Vostokov @ DumpAnalysis.org -

Stack trace collection, missing threads, waiting time, critical section and LPC wait chains: pattern cooperation

Friday, November 20th, 2009

Windows shutdown couldn’t progress and a complete memory dump was saved for later postmortem analysis. !stacks command showed reduced number of waiting threads in one important system service:

0: kd> !stacks
[...]
                            [89d6d8e8 ServiceA.exe]
1454.0014b0  89d36b60 0000163 Blocked    DriverA!Check+0x177
[...]

Normally this service had at least a dozen waiting threads. If we switch to the process we see many threads missing and the process itself is in the process of exiting (three “process” nouns in one sentence):

0: kd> !process 89d6d8e8
PROCESS 89d6d8e8  SessionId: 0  Cid: 1454    Peb: 7ffd8000  ParentCid: 0234
    DirBase: afa06000  ObjectTable: e5491278  HandleCount: 444.
    Image: ServiceA.exe
    VadRoot 89db18d8 Vads 213 Clone 0 Private 827. Modified 15. Locked 0.
    DeviceMap e10028c8
    Token                             e5556710
    ElapsedTime                       2 Days 02:59:39.285
    UserTime                          00:00:08.375
    KernelTime                        00:00:20.046
    QuotaPoolUsage[PagedPool]         50660
    QuotaPoolUsage[NonPagedPool]      9704
    Working Set Sizes (now,min,max)  (2523, 50, 345) (10092KB, 200KB, 1380KB)
    PeakWorkingSetSize                2953
    VirtualSize                       76 Mb
    PeakVirtualSize                   78 Mb
    PageFaultCount                    19259
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      1522

THREAD 89d36b60  Cid 1454.14b0  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Alertable
    8a8d7438  NotificationEvent
    89d36bd8  NotificationTimer
Not impersonating
DeviceMap                 e10028c8
Owning Process            89d6d8e8       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      11760358       Ticks: 355 (0:00:00:05.546)
Context Switch Count      4591            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Start Address DriverA!CheckProtocolStackThread (0xf762cfa0)
Stack Init f3e7b000 Current f3e7acc0 Base f3e7b000 Limit f3e78000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
f3e7acd8 8083d5b1 nt!KiSwapContext+0x26
f3e7ad04 8083df9e nt!KiSwapThread+0x2e5
f3e7ad4c f762cf8d nt!KeWaitForSingleObject+0x346
[...]
f3e7adac 8092083b DriverA!CheckProtocolStackThread+0x5
f3e7addc 8083fe9f nt!PspSystemThreadStartup+0x2e
00000000 00000000 nt!KiThreadStartup+0x16

THREAD 89ce9580  Cid 1454.1630  Teb: 7ff9e000 Win32Thread: bc1e71f8 WAIT: (Unknown) UserMode Non-Alertable
     893fae40  SynchronizationEvent
Not impersonating
DeviceMap                 e10028c8
Owning Process            89d6d8e8       Image:         ServiceA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      11048203       Ticks: 712510 (0:03:05:32.968)
Context Switch Count      1103                 LargeStack
UserTime                  00:00:00.281
KernelTime                00:00:01.484
Win32 Start Address DllA!OperationThread (0x1003b37e)
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init f65a3000 Current f65a2c60 Base f65a3000 Limit f65a0000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
f65a2c78 8083d5b1 nt!KiSwapContext+0x26
f65a2ca4 8083df9e nt!KiSwapThread+0x2e5
f65a2cec 8092ae67 nt!KeWaitForSingleObject+0x346
f65a2d50 80833bef nt!NtWaitForSingleObject+0x9a
f65a2d50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ f65a2d64)
0293fd18 7c827d29 ntdll!KiFastSystemCallRet
0293fd1c 77e61d1e ntdll!ZwWaitForSingleObject+0xc
0293fd8c 77e61c8d kernel32!WaitForSingleObjectEx+0xac
0293fda0 724c705b kernel32!WaitForSingleObject+0x12
0293fdb4 724c6745 DllB!Cleanup+0x3b
[...]
0293fde0 7c81a352 DllB!DLLEntry+0x62
0293fe00 7c830e90 ntdll!LdrpCallInitRoutine+0x14
0293feb8 77e668ab ntdll!LdrShutdownProcess+0x182
0293ffa4 77e6690d kernel32!_ExitProcess+0×43
0293ffb8 77e792c1 kernel32!ExitProcess+0×14

0293ffec 00000000 kernel32!BaseThreadStart+0×5f

However, the brief scan of all other processes and threads from !process 0 ff command output shows that another important service ServiceB has critical section wait chains:

THREAD 89e1f890  Cid 09f4.1018  Teb: 7ff96000 Win32Thread: bc279160 WAIT: (Unknown) UserMode Non-Alertable
    89d96c30  SynchronizationEvent
Not impersonating
DeviceMap                 e10028c8
Owning Process            8a453b18       Image:         ServiceB.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      11750950       Ticks: 9763 (0:00:02:32.546)
Context Switch Count      327                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address msvcrt!_endthreadex (0×77bcb4bc)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init f6113000 Current f6112c60 Base f6113000 Limit f6110000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 1
Kernel stack not resident.
ChildEBP RetAddr 
f6112c78 8083d5b1 nt!KiSwapContext+0×26
f6112ca4 8083df9e nt!KiSwapThread+0×2e5
f6112cec 8092ae67 nt!KeWaitForSingleObject+0×346
f6112d50 80833bef nt!NtWaitForSingleObject+0×9a
f6112d50 7c82860c nt!KiFastCallEntry+0xfc
09eafd98 7c827d29 ntdll!KiFastSystemCallRet
09eafd9c 7c83d266 ntdll!ZwWaitForSingleObject+0xc
09eafdd8 7c83d2b1 ntdll!RtlpWaitOnCriticalSection+0×1a3
09eafdf8 6738d489 ntdll!RtlEnterCriticalSection+0xa8

[…]
09eaffb8 77e6482f msvcrt!_endthreadex+0xa3
09eaffec 00000000 kernel32!BaseThreadStart+0×34

We switch to this process and find the owner of a critical section that blocks other threads:

0: kd> .process /r /p 8a453b18
Implicit process is now 8a453b18

0: kd> !cs -l -o -s
[...]
DebugInfo          = 0x0a199ea0
Critical section   = 0x0998ac80 (+0x998AC80)
LOCKED
LockCount          = 0x5
WaiterWoken        = No
OwningThread       = 0x00001680
RecursionCount     = 0x1
LockSemaphore      = 0xE08
SpinCount          = 0x00000000
OwningThread       = .thread 89bfc4d8
[…]

0: kd> !thread 89bfc4d8 1f
THREAD 89bfc4d8  Cid 09f4.1680  Teb: 7ff70000 Win32Thread: bc1e79d8 WAIT: (Unknown) UserMode Non-Alertable
    89bfc6c4  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00fbbc86:
Current LPC port e544f108
Not impersonating
DeviceMap                 e10028c8
Owning Process            8a453b18       Image:         ServiceB.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      11049035       Ticks: 711678 (0:03:05:19.968)
Context Switch Count      455269                 LargeStack
UserTime                  00:00:45.312
KernelTime                00:00:10.531
Win32 Start Address msvcrt!_endthreadex (0×77bcb4bc)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init f3b8b000 Current f3b8ac08 Base f3b8b000 Limit f3b88000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr 
f3b8ac20 8083d5b1 nt!KiSwapContext+0×26
f3b8ac4c 8083df9e nt!KiSwapThread+0×2e5
f3b8ac94 8093edb1 nt!KeWaitForSingleObject+0×346
f3b8ad50 80833bef nt!NtRequestWaitReplyPort+0×776
f3b8ad50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ f3b8ad64)
0f13ebe8 7c827899 ntdll!KiFastSystemCallRet
0f13ebec 77c80a6e ntdll!ZwRequestWaitReplyPort+0xc
0f13ec38 77c7fcf0 RPCRT4!LRPC_CCALL::SendReceive+0×230
0f13ec44 77c80673 RPCRT4!I_RpcSendReceive+0×24
0f13ec58 77ce315a RPCRT4!NdrSendReceive+0×2b
0f13f040 771f40c4 RPCRT4!NdrClientCall2+0×22e
[…]
0f13ffb8 77e6482f msvcrt!_endthreadex+0xa3
0f13ffec 00000000 kernel32!BaseThreadStart+0×34

Following LPC chain we find that the blocking thread in ServiceB was waiting for a response from ServiceA:

0: kd> !lpc message 00fbbc86
[...]
    Server process  : 89d6d8e8 (ServiceA.exe)
[…]

Now the question arises: who was freezing first, ServiceA or ServiceB? We can compare waiting times to answer. We see that waiting time for ServiceB request thread is 3:05:19 and for ServiceA shutdown thread is 03:05:32 (from !thread and !process output above):

Owning Process            8a453b18       Image:         ServiceB.exe
[...]
Wait Start TickCount      11049035       Ticks: 711678 (0:03:05:19.968)

Owning Process            89d6d8e8       Image:         ServiceA.exe
[...]
Wait Start TickCount      11048203       Ticks: 712510 (0:03:05:32.968)

Therefore, we conclude that ServiceB was blocked after ServiceA was blocked. Such questions often arise in multivendor troubleshooting scenarious.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 65b)

Monday, November 16th, 2009

This is a hardware counterpart of Not My Version pattern. Some problems manifest themselves on different hardware not used at the time of the product testing. In such cases we can look at kernel and complete memory dumps, extract hardware information using !sysinfo command and compare differences. This is similar to Virtualized System pattern and might provide troubleshooting hints. One example, I have seen in the past, involved a graphics intensive application that relied heavily upon hardware acceleration features. It was tested with certain processors and chipsets but after a few years failed to work on one computer despite the same OS image and drivers. !sysinfo command revealed significant hardware differences: the failing client computer was newer faster multiprocessor machine.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 91)

Thursday, November 12th, 2009

Sometimes we can observe rare events when abnormal conditions that usually result in a system crash result in a milder problem, for example, a service is unavailable and not affecting other services and users. It was reported that an application was freezing during user session logoff. A complete memory dump was saved at that time and its stack trace collection (!stacks command) shows the following suspicious thread in a user process (all other threads were normally waiting):

0: kd> !stacks
Proc.Thread  .Thread  Ticks   ThreadState Blocker
[...]
                                    [89cfa960 Application.exe]
ea0.001c4c  89a11db0 0499cd1 Blocked    DriverA+0×69db
[…]

0: kd> !thread 89a11db0 16
THREAD 89a11db0  Cid 0ea0.1c4c  Teb: 7ffdf000 Win32Thread: bc347a48 WAIT: (Unknown) KernelMode Non-Alertable
    89b87770  Unknown
    b97004ac  NotificationEvent
IRP List:
    899e2668: (0006,0244) Flags: 00000884  Mdl: 00000000
Not impersonating
DeviceMap                 daf62b28
Owning Process            89cfa960       Image:         Application.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      909331         Ticks: 4824273 (0:20:56:19.265)
Context Switch Count      186                 LargeStack
UserTime                  00:00:00.015
KernelTime                00:00:00.093
*** ERROR: Module load completed but symbols could not be loaded for Application.exe
Win32 Start Address Application (0×00406b2a)
Start Address kernel32!BaseProcessStartThunk (0×77e617f8)
Stack Init b60ceb30 Current b60cdf10 Base b60cf000 Limit b60cb000 Call b60ceb34
Priority 10 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
b60cdf28 80833485 89a11db0 00000002 00000000 nt!KiSwapContext+0×26
b60cdf54 808294b9 dc399008 89b87748 b60ce01c nt!KiSwapThread+0×2e5
b60cdf88 b96d69db 00000002 b60cdfbc 00000001 nt!KeWaitForMultipleObjects+0×3d7
WARNING: Stack unwind information not available. Following frames may be wrong.
b60cdfe8 b96d719e 89b87748 dc399008 b60ce01c DriverA+0×69db
[…]

We notice “89b87770  Unknown” and double check what object the thread is waiting for:

0: kd> dp b60cdfbc L00000002
b60cdfbc  89b87770 b97004ac

These are exactly the same objects that are listed in !thread command output. We see that the second one is normal and resides in a nonpaged area:

0: kd> dt _DISPATCHER_HEADER b97004ac
ntdll!_DISPATCHER_HEADER
   +0x000 Type             : 0 ''
   +0x001 Absolute         : 0 ''
   +0x001 NpxIrql          : 0 ''
   +0x002 Size             : 0x4 ''
   +0x002 Hand             : 0x4 ''
   +0x003 Inserted         : 0 ''
   +0x003 DebugActive      : 0 ''
   +0x000 Lock             : 262144
   +0x004 SignalState      : 0
   +0x008 WaitListHead     : _LIST_ENTRY [ 0x89a11e70 - 0x89a11e70 ]

0: kd> !address b97004ac
  a71e3000 - 13e1d000                          
          Usage       KernelSpaceUsageNonPagedSystem

The other looks like an invalid Random Object from the free nonpaged pool entry (it even says about itself that it is bad) that used to belong in the past to Configuration Manager:

0: kd> !pool 89b87770
Pool page 89b87770 region is Nonpaged pool
[...]
 89b87540 size:   98 previous size:   40  (Allocated)  File (Protected)
*89b875d8 size:  260 previous size:   98  (Free)      *CMpa
  Pooltag CMpa : registry post apcs, Binary : nt!cm

 89b87838 size:   28 previous size:  260  (Allocated)  FSfm
[…]

0: kd> dd 89b87770
89b87770  bad0b0b0 00000000 00000000 00000000
89b87780  8a04be01 00000000 89b87788 89b87788
89b87790  00150006 e56c6946 8993e208 89ab96b8
89b877a0  00000000 00000000 bad0b0b0 c0000800
89b877b0  02110004 63426343 88ebbf80 00001000
89b877c0  00199000 00000000 8993e238 88d0d248
89b877d0  0019a000 00000000 00000000 00000000
89b877e0  00000000 00000000 00000000 00000000

0: kd> dt _DISPATCHER_HEADER 89b87770
ntdll!_DISPATCHER_HEADER
   +0×000 Type             : 0xb0 ”
   +0×001 Absolute         : 0xb0 ”
   +0×001 NpxIrql          : 0xb0 ”
   +0×002 Size             : 0xd0 ”
   +0×002 Hand             : 0xd0 ”
   +0×003 Inserted         : 0xba ”
   +0×003 DebugActive      : 0xba ”
   +0×000 Lock             : -1160728400
   +0×004 SignalState      : 0
   +0×008 WaitListHead     : _LIST_ENTRY [ 0×0 - 0×0 ]

Now some counterfactual thinking. One possible scenario after KeWaitForMultipleObjects was called to wait for both objects to become signalled (3rd WAIT_TYPE parameter) the free pool slot was allocated or coalesced with SignalState becoming nonzero by coincidence and other members becoming random values and then the second normal object becomes signalled when another thread sets the notification event…

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 90)

Wednesday, October 28th, 2009

Sometimes we have a managed code exception that was enveloping a handled unmanaged code exception, Mixed (Nested) Exception:

0:000> !analyze -v

[...]

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 0

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

FAULTING_THREAD:  00000cfc

[...]

EXCEPTION_OBJECT: !pe 1f9af1ac
Exception object: 1f9af1ac
Exception type: System.AccessViolationException
Message: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
InnerException: <none>
StackTrace (generated):
SP       IP       Function
    0012EF3C 28DD9AF9 DllA!Component.getFirstField()+0×11
[…]
0012EFC8 7B194170 System_Windows_Forms_ni!System.Windows.Forms. Control.OnClick(System.EventArgs)+0×70
0012EFE0 7B6F74B4 System_Windows_Forms_ni!System.Windows.Forms. Control.WmMouseUp(System.Windows.Forms.Message ByRef, System.Windows.Forms.MouseButtons, Int32)+0×170
0012F06C 7BA29B66 System_Windows_Forms_ni!System.Windows.Forms. Control.WndProc(System.Windows.Forms.Message ByRef)+0×861516
0012F0C4 7B1D1D6A System_Windows_Forms_ni!System.Windows.Forms. ScrollableControl.WndProc(System.Windows.Forms.Message ByRef)+0×2a
0012F0D0 7B1C8640 System_Windows_Forms_ni!System.Windows.Forms. Control+ControlNativeWindow.OnMessage(System.Windows.Forms.Message ByRef)+0×10
0012F0D8 7B1C85C1 System_Windows_Forms_ni!System.Windows.Forms. Control+ControlNativeWindow.WndProc(System.Windows.Forms.Message ByRef)+0×31
0012F0EC 7B1C849A System_Windows_Forms_ni!System.Windows.Forms. NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)+0×5a

[...]

We see that it was the access violation exception and check the thread with TID cfc:

0:000> kL
ChildEBP RetAddr
0012db54 77d70dde ntdll!KiFastSystemCallRet
0012db58 7b1d8e48 user32!NtUserWaitMessage+0xc
0012dbec 7b1d8937 System_Windows_Forms_ni+0x208e48
0012dc44 7b1d8781 System_Windows_Forms_ni+0x208937
0012dc74 7b6edd1f System_Windows_Forms_ni+0x208781
0012dc8c 7b72246b System_Windows_Forms_ni+0x71dd1f
0012dd18 7b722683 System_Windows_Forms_ni+0x75246b
0012dd58 7b6f77f6 System_Windows_Forms_ni+0x752683
0012dd64 7b6fa27c System_Windows_Forms_ni+0x7277f6
0012f148 77d6f8d2 System_Windows_Forms_ni+0x72a27c
0012f174 77d6f794 user32!InternalCallWinProc+0x23
0012f1ec 77d70008 user32!UserCallWinProcCheckWow+0x14b
0012f250 77d70060 user32!DispatchMessageWorker+0x322
0012f260 0a1412fa user32!DispatchMessageW+0xf
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012f27c 578439f7 0xa1412fa
0012f2ec 578430c9 WindowsBase_ni+0x939f7
0012f2f8 5784306c WindowsBase_ni+0x930c9
0012f304 55bed46e WindowsBase_ni+0x9306c
0012f310 55bec76f PresentationFramework_ni+0x1cd46e
0012f334 55bd3aa6 PresentationFramework_ni+0x1cc76f

If there was an exception it must be hidden so we inspect the thread raw stack:

0:000> !teb
TEB at 7ffdf000
ExceptionList:        0012e470
StackBase:            00130000
StackLimit:           0011e000

SubSystemTib:         00000000
FiberData:            00001e00
ArbitraryUserPointer: 00000000
Self:                 7ffdf000
EnvironmentPointer:   00000000
ClientId:             00000b6c . 00000cfc
RpcHandle:            00000000
Tls Storage:          7ffdf02c
PEB Address:          7ffd4000
LastErrorValue:       0
LastStatusValue:      c0000139
Count Owned Locks:    0
HardErrorMode:        0

0:000> dps 0011e000 00130000
0011e000  00000000
0011e004  00000000
0011e008  00000000
[...]
0012e72c  00130000
0012e730  0011e000
0012e734  00ee350d
0012e738  0012ea3c
0012e73c  77f299f7 ntdll!KiUserExceptionDispatcher+0xf
0012e740  0012e750
0012e744  0012e76c
0012e748  0012e750
0012e74c  0012e76c
0012e750  c0000005
0012e754  00000000
0012e758  00000000
0012e75c  77f17d89 ntdll!RtlLeaveCriticalSection+0×9
0012e760  00000002
0012e764  00000001
0012e768  00000028
0012e76c  0001003f
0012e770  00000000
0012e774  00000000
0012e778  00000000
0012e77c  00000000
[…]

0:000> .cxr 0012e76c
eax=00000020 ebx=09ca1fa0 ecx=781c1b78 edx=00000001 esi=00000020 edi=09ca1ff8
eip=77f17d89 esp=0012ea38 ebp=0012ea3c iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202
ntdll!RtlLeaveCriticalSection+0x9:
77f17d89 834608ff        add     dword ptr [esi+8],0FFFFFFFFh ds:0023:00000028=????????

0:000> kL
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
0012ea3c 7813e5b5 ntdll!RtlLeaveCriticalSection+0x9
0012ea44 2071c9ba msvcr80!_unlock_file+0x35
WARNING: Stack unwind information not available. Following frames may be wrong.
0012ea68 2071c31e DllB!getType+0×286a
0012ee34 206bbfbc DllB!getType+0×3eb
0012ee68 206c8abb DllC+0xbfbc
0012ee98 79e71ca7 DllC!getFirstField+0×3b

0012f148 77d6f8d2 mscorwks!NDirectGenericStubReturnFromCall
0012f1ec 77d70008 user32!InternalCallWinProc+0×23
0012f240 77db51b9 user32!DispatchMessageWorker+0×322
0012f4a4 79e95feb user32!_W32ExceptionHandler+0×18
0012f4fc 79e968b0 mscorwks!MetaSig::HasRetBuffArg+0×5
0012f50c 79e9643e mscorwks!MetaSig::MetaSig+0×3a
0012f610 79e96534 mscorwks!MethodDesc::CallDescr+0xaf
0012f62c 79e96552 mscorwks!MethodDesc::CallTargetWorker+0×1f
0012f644 79eefa45 mscorwks!MethodDescCallSite::CallWithValueTypes+0×1a
0012f7a8 79eef965 mscorwks!ClassLoader::RunMain+0×223

Therefore we identified DllB and DllC components as suspicious. If we check exception chain we see that .NET runtime registered custom exception handlers:

0:000> !exchain
0012e470: mscorwks!COMPlusNestedExceptionHandler+0 (79edd6d7)
0012f13c: mscorwks!FastNExportExceptHandler+0 (7a00a2e7)

0012f1dc: user32!_except_handler4+0 (77db51ba)
0012f240: user32!_except_handler4+0 (77db51ba)
0012f46c: mscorwks!COMPlusFrameHandler+0 (79edc3bc)
0012f4c0: mscorwks!_except_handler4+0 (79f908a2)
0012f798: mscorwks!_except_handler4+0 (79f908a2)
0012fa04: mscorwks!GetManagedNameForTypeInfo+a680 (7a328d90)
0012fed4: mscorwks!GetManagedNameForTypeInfo+82c8 (7a325a3a)
0012ff20: mscorwks!_except_handler4+0 (79f908a2)
0012ff6c: mscorwks!GetManagedNameForTypeInfo+a6e (7a319ee4)
0012ffc4: ntdll!_except_handler4+0 (77ed9834)
Invalid exception stack at ffffffff

We check that GetManagedNameForTypeInfo+a6e (7a319ee4) is an exception handler indeed:

0:000> .asm no_code_bytes
Assembly options: no_code_bytes

0:000> uf 7a319ee4
msvcr80!__CxxFrameHandler:
78158aeb push ebp
78158aec mov ebp,esp
78158aee sub esp,8
78158af1 push ebx
78158af2 push esi
78158af3 push edi
78158af4 cld
78158af5 mov dword ptr [ebp-4],eax
78158af8 xor eax,eax
78158afa push eax
78158afb push eax
78158afc push eax
78158afd push dword ptr [ebp-4]
78158b00 push dword ptr [ebp+14h]
78158b03 push dword ptr [ebp+10h]
78158b06 push dword ptr [ebp+0Ch]
78158b09 push dword ptr [ebp+8]
78158b0c call msvcr80!__InternalCxxFrameHandler (7815897e)
78158b11 add esp,20h
78158b14 mov dword ptr [ebp-8],eax
78158b17 pop edi
78158b18 pop esi
78158b19 pop ebx
78158b1a mov eax,dword ptr [ebp-8]
78158b1d mov esp,ebp
78158b1f pop ebp
78158b20 ret

mscorwks!__CxxFrameHandler3:
79f5f258 jmp dword ptr [mscorwks!_imp____CxxFrameHandler3 (79e711c4)]

mscorwks!GetManagedNameForTypeInfo+0xa6e:
7a319ee4 mov edx,dword ptr [esp+8]
7a319ee8 lea eax,[edx+0Ch]
7a319eeb mov ecx,dword ptr [edx-30h]
7a319eee xor ecx,eax
7a319ef0 call mscorwks!__security_check_cookie (79e72037)
7a319ef5 mov eax,offset mscorwks!_CT??_R0H+0xc14 (7a319f00)
7a319efa jmp mscorwks!__CxxFrameHandler3 (79f5f258)

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 89)

Friday, October 23rd, 2009

When looking at stack traces seasoned memory dump analysis and software maintenance engineers immediately spot the right function to dig into its source code:

STACK_TEXT:
05b3f514 66ed52d3 08d72fee 08da6fc4 05b3f540 msvcrt!wcscpy+0xe
05b3f53c 77c50193 07516fc8 06637fc4 00000006 ModuleA!Add+0xd8
05b3f568 77cb33e1 66ed51fb 05b3f750 00000006 rpcrt4!Invoke+0x30
05b3f968 77cb1968 08d6afe0 0774cf4c 0660af28 rpcrt4!NdrStubCall2+0x299
[...]

Most will start with ModuleA!Add and examine parameters to wcscpy. This is because wcscpy (UNICODE version of strcpy) is considered as a Well-Tested Function. For the purposes of default analysis via !analyse -v command it is possible to configure WinDbg to ignore our own functions and modules as well if we are sure they were well-tested or pass-through. For details please see the old minidump analysis case study.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 88)

Friday, October 23rd, 2009

Some modules like drivers or runtime DLLs are always present after some action has happened. I call them Effect Components. It is the last thing to assume them to be the “Cause” components” or “Root Cause” or the so so called “culprit” components. Typical example, is dump disk driver symbolic references found in execution residue on the raw stack of a running bugchecking thread:

0: kd> !thread
THREAD fffffa8002bdebb0  Cid 03c4.03f0  Teb: 000007fffffde000 Win32Thread: fffff900c20f9810 RUNNING on processor 0
IRP List:
    fffffa8002b986f0: (0006,0118) Flags: 00060000  Mdl: 00000000
Not impersonating
DeviceMap                 fffff88005346920
Owning Process            fffffa80035bec10       Image:         Application.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      35246          Ticks: 7 (0:00:00:00.109)
Context Switch Count      1595                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.031
Win32 Start Address Application (0x0000000140002708)
Stack Init fffffa600495ddb0 Current fffffa600495d720
Base fffffa600495e000 Limit fffffa6004955000 Call 0
Priority 11 BasePriority 8 PriorityDecrement 1 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Call Site
fffffa60`0495d558 fffff800`0186e3ee : nt!KeBugCheckEx
fffffa60`0495d560 fffff800`0186d2cb : nt!KiBugCheckDispatch+0×6e
fffffa60`0495d6a0 fffffa60`03d5917a : nt!KiPageFault+0×20b (TrapFrame @ fffffa60`0495d6a0)
[…]

0: kd> dps fffffa6004955000 fffffa600495e000
fffffa60`04955000  00d4d0c8`00d4d0c8
fffffa60`04955008  00d4d0c8`00d4d0c8
fffffa60`04955010  00d4d0c8`00d4d0c8
[…]
fffffa60`0495c7e0  00000000`00000001
fffffa60`0495c7e8  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495c7f0  fffffa80`024c05a8
fffffa60`0495c7f8  fffffa60`02869ad4 dump_dumpata!IdeDumpNotification+0×1a4
fffffa60`0495c800  fffffa60`0495cb00
fffffa60`0495c808  fffff800`0182ff34 nt!output_l+0×6c0
fffffa60`0495c810  fffffa60`02860110 crashdmp!StrBeginningDump
fffffa60`0495c818  fffffa60`0495cb00
fffffa60`0495c820  00000000`00000000
fffffa60`0495c828  fffffa60`02869b18 dump_dumpata!IdeDumpNotification+0×1e8
fffffa60`0495c830  00000000`00000000
fffffa60`0495c838  fffffa60`0495c8c0
fffffa60`0495c840  00000000`00000000
fffffa60`0495c848  fffffa60`00000024
fffffa60`0495c850  00000000`ffffffff
fffffa60`0495c858  00000000`00000000
fffffa60`0495c860  00000000`00000000
fffffa60`0495c868  fffffa60`0495cb00
fffffa60`0495c870  fffffa80`00000000
fffffa60`0495c878  00000000`00000000
fffffa60`0495c880  00000000`00000101
fffffa60`0495c888  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495c890  fffffa60`0495cb0f
fffffa60`0495c898  fffff800`0182ff34 nt!output_l+0×6c0
fffffa60`0495c8a0  fffffa60`0495cb0f
fffffa60`0495c8a8  fffffa60`0495cb90
fffffa60`0495c8b0  00000000`00000040
fffffa60`0495c8b8  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495c8c0  fffffa80`024c0728
fffffa60`0495c8c8  fffffa80`024c0728
fffffa60`0495c8d0  00000001`00000000
fffffa60`0495c8d8  fffffa60`00000026
fffffa60`0495c8e0  00000000`ffffffff
fffffa60`0495c8e8  00000000`00000000
fffffa60`0495c8f0  fffffa80`00000000
fffffa60`0495c8f8  fffffa60`0495cb90
fffffa60`0495c900  00000000`00000000
fffffa60`0495c908  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495c910  00000000`00000000
fffffa60`0495c918  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495c920  fffff880`05311010
fffffa60`0495c928  00000000`00000002
fffffa60`0495c930  fffffa60`02875094 dump_SATA_Driver!AhciAdapterControl
fffffa60`0495c938  fffffa80`024c6018
fffffa60`0495c940  fffffa80`024c0728
fffffa60`0495c948  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495c950  fffffa80`024c0728
fffffa60`0495c958  00000000`00000000
fffffa60`0495c960  fffffa60`0495ca18
fffffa60`0495c968  00000000`00000000
fffffa60`0495c970  fffffa80`024c0728
fffffa60`0495c978  fffffa60`02876427 dump_SATA_Driver!AhciHwInitialize+0×337
fffffa60`0495c980  fffffa80`024c0be6
fffffa60`0495c988  fffffa60`0286a459 dump_dumpata!IdeDumpWaitOnRequest+0×79
fffffa60`0495c990  00000000`00000000
fffffa60`0495c998  00000000`0000023a
fffffa60`0495c9a0  20474e55`534d4153
fffffa60`0495c9a8  204a4831`36314448
fffffa60`0495c9b0  20202020`20202020
fffffa60`0495c9b8  20202020`20202020
fffffa60`0495c9c0  fffffa80`024c05a8
fffffa60`0495c9c8  fffffa60`02869b18 dump_dumpata!IdeDumpNotification+0×1e8
fffffa60`0495c9d0  00000000`00000000
fffffa60`0495c9d8  fffffa60`0495ca60
fffffa60`0495c9e0  00000000`00000001
fffffa60`0495c9e8  fffffa60`02869396 dump_dumpata!IdeDumpMiniportChannelInitialize+0×236
fffffa60`0495c9f0  fffffa80`024c05a8
fffffa60`0495c9f8  fffffa60`02869ad4 dump_dumpata!IdeDumpNotification+0×1a4
fffffa60`0495ca00  00000000`00000000
fffffa60`0495ca08  fffffa60`0495ca90
fffffa60`0495ca10  00000000`00000001
fffffa60`0495ca18  00000001`00000038
fffffa60`0495ca20  00000000`10010000
fffffa60`0495ca28  00000000`00000003
fffffa60`0495ca30  fffffa80`024c05a8
fffffa60`0495ca38  fffffa60`0286a954 dump_dumpata!AtaPortGetPhysicalAddress+0×2c
fffffa60`0495ca40  fffffa80`024c0728
fffffa60`0495ca48  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495ca50  00000000`00000001
fffffa60`0495ca58  0000003f`022a8856
fffffa60`0495ca60  fffffa80`0000000c
fffffa60`0495ca68  fffffa80`024c0728
fffffa60`0495ca70  00000000`00000200
fffffa60`0495ca78  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495ca80  fffffa80`024c0728
fffffa60`0495ca88  ffff6226`4f5f3eb8
fffffa60`0495ca90  00000000`00000010
fffffa60`0495ca98  fffffa60`02860370 crashdmp!Context+0×30
fffffa60`0495caa0  fffffa80`024c05a8
fffffa60`0495caa8  fffffa60`02875a0d dump_SATA_Driver!AhciHwStartIo+0×69d
fffffa60`0495cab0  fffffa80`024c0728
fffffa60`0495cab8  00000000`00000000
fffffa60`0495cac0  00000000`00000001
fffffa60`0495cac8  fffff800`018f3dfc nt!DisplayCharacter+0×5c
fffffa60`0495cad0  00000000`00000000
fffffa60`0495cad8  fffffa60`02877f6f dump_SATA_Driver!RecordExecutionHistory+0xcf
fffffa60`0495cae0  00000000`00010000
fffffa60`0495cae8  00000000`00000000
fffffa60`0495caf0  fffffa60`0495cd10
fffffa60`0495caf8  fffffa60`0495cc00
fffffa60`0495cb00  fffffa80`024c01c0
fffffa60`0495cb08  fffffa60`02875c3f dump_SATA_Driver!AhciHwInterrupt+0×2b
fffffa60`0495cb10  fffffa80`024c05a8
fffffa60`0495cb18  00000000`00000000
fffffa60`0495cb20  00000000`00000000
fffffa60`0495cb28  fffff800`01d406c9 hal!KeStallExecutionProcessor+0×25
fffffa60`0495cb30  00000000`00010000
fffffa60`0495cb38  00000000`00000000
fffffa60`0495cb40  fffffa60`0495cd10
fffffa60`0495cb48  fffffa60`0495cc00
fffffa60`0495cb50  00000000`00000000
fffffa60`0495cb58  fffffa60`0286a429 dump_dumpata!IdeDumpWaitOnRequest+0×49
fffffa60`0495cb60  fffffa60`02860370 crashdmp!Context+0×30
fffffa60`0495cb68  00000000`d8bda325
fffffa60`0495cb70  00000000`00000000
fffffa60`0495cb78  00000000`0000033e
fffffa60`0495cb80  00000000`00000000
fffffa60`0495cb88  fffffa60`028694d2 dump_dumpata!IdeDumpWritePending+0xee
fffffa60`0495cb90  fffffa80`024c0000
fffffa60`0495cb98  fffffa80`024c01c0
fffffa60`0495cba0  00000000`00000000
fffffa60`0495cba8  00000000`00000000
fffffa60`0495cbb0  fffffa80`024c01c0
fffffa60`0495cbb8  fffffa80`01e3c740
fffffa60`0495cbc0  00000000`00010000
fffffa60`0495cbc8  00000000`00000000
fffffa60`0495cbd0  00000000`0c01f000
fffffa60`0495cbd8  fffffa60`0285bca9 crashdmp!WritePageSpanToDisk+0×181
fffffa60`0495cbe0  00000000`83d81000
fffffa60`0495cbe8  00000000`00000000
fffffa60`0495cbf0  fffffa60`02860370 crashdmp!Context+0×30
fffffa60`0495cbf8  00000000`00000002
fffffa60`0495cc00  00000000`00000000
fffffa60`0495cc08  00000000`00030000
fffffa60`0495cc10  00000000`00000000
fffffa60`0495cc18  fffffa60`00441000
fffffa60`0495cc20  fffffa60`00441000
fffffa60`0495cc28  00000000`00010000
fffffa60`0495cc30  00000000`0000c080
fffffa60`0495cc38  00000000`0000c081
fffffa60`0495cc40  00000000`0000c082
fffffa60`0495cc48  00000000`0000c083
fffffa60`0495cc50  00000000`0000c084
fffffa60`0495cc58  00000000`0000c085
fffffa60`0495cc60  00000000`0000c086
fffffa60`0495cc68  00000000`0000c087
fffffa60`0495cc70  00000000`0000c088
fffffa60`0495cc78  00000000`0000c089
fffffa60`0495cc80  00000000`0000c08a
fffffa60`0495cc88  00000000`0000c08b
fffffa60`0495cc90  00000000`0000c08c
fffffa60`0495cc98  00000000`0000c08d
fffffa60`0495cca0  00000000`0000c08e
fffffa60`0495cca8  00000000`0000c08f
fffffa60`0495ccb0  00000000`00000000
fffffa60`0495ccb8  00000000`00000000
fffffa60`0495ccc0  00000000`00000000
fffffa60`0495ccc8  00000000`00000010
fffffa60`0495ccd0  00000000`0000c01d
fffffa60`0495ccd8  fffffa60`02860370 crashdmp!Context+0×30
fffffa60`0495cce0  00000000`0000bf80
fffffa60`0495cce8  00000000`00000001
fffffa60`0495ccf0  00000000`00000000
fffffa60`0495ccf8  fffffa80`01e353d0
fffffa60`0495cd00  fffffa80`01e353f8
fffffa60`0495cd08  fffffa60`0285bacc crashdmp!WriteFullDump+0×70
fffffa60`0495cd10  00000002`3a3d8000
fffffa60`0495cd18  00000000`0000c080
fffffa60`0495cd20  fffffa80`00000000
fffffa60`0495cd28  fffffa60`0285c9c0 crashdmp!CrashdmpWriteRoutine
fffffa60`0495cd30  fffff880`05311010
fffffa60`0495cd38  00000000`00000002
fffffa60`0495cd40  fffffa60`0495cf70
fffffa60`0495cd48  00000000`00000000
fffffa60`0495cd50  fffffa60`02860370 crashdmp!Context+0×30
fffffa60`0495cd58  fffffa60`0285b835 crashdmp!DumpWrite+0xc5
fffffa60`0495cd60  00000000`00000000
fffffa60`0495cd68  00000000`0000000f
fffffa60`0495cd70  00000000`00000001
fffffa60`0495cd78  fffffa60`00000001
fffffa60`0495cd80  fffffa80`02bdebb0
fffffa60`0495cd88  fffffa60`0285b153 crashdmp!CrashdmpWrite+0×57
fffffa60`0495cd90  00000000`00000000
fffffa60`0495cd98  fffffa60`028602f0 crashdmp!StrInitPortDriver
fffffa60`0495cda0  00000000`00000000
fffffa60`0495cda8  fffffa60`02860a00 crashdmp!ContextCopy
fffffa60`0495cdb0  00000000`00000000
fffffa60`0495cdb8  fffff800`01902764 nt!IoWriteCrashDump+0×3f4
fffffa60`0495cdc0  fffffa60`0495ce00
fffffa60`0495cdc8  00000028`00000025
fffffa60`0495cdd0  fffff800`018afd40 nt! ?? ::FNODOBFM::`string’
fffffa60`0495cdd8  00000000`000000d1
fffffa60`0495cde0  fffff880`05311010
fffffa60`0495cde8  00000000`00000002
fffffa60`0495cdf0  00000000`00000000
fffffa60`0495cdf8  fffffa60`03d5917a
fffffa60`0495ce00  202a2a2a`0a0d0a0d
fffffa60`0495ce08  7830203a`504f5453
fffffa60`0495ce10  31443030`30303030
fffffa60`0495ce18  46464646`78302820
fffffa60`0495ce20  31333530`30383846
fffffa60`0495ce28  fffff800`018f5f83 nt!VidDisplayString+0×143
fffffa60`0495ce30  30303030`30300030
fffffa60`0495ce38  2c323030`30303030
fffffa60`0495ce40  30303030`30307830
fffffa60`0495ce48  30303030`30303030
fffffa60`0495ce50  46464678`302c3030
fffffa60`0495ce58  fffff800`018fe040 nt!KiInvokeBugCheckEntryCallbacks+0×80
fffffa60`0495ce60  fffffa80`02bdebb0
fffffa60`0495ce68  fffff800`01921d52 nt!InbvDisplayString+0×72
fffffa60`0495ce70  fffff880`05311000
fffffa60`0495ce78  fffff800`01d406c9 hal!KeStallExecutionProcessor+0×25
fffffa60`0495ce80  00000000`00000001
fffffa60`0495ce88  00000000`0000000a
fffffa60`0495ce90  fffffa60`03d5917a
fffffa60`0495ce98  00000000`40000082
fffffa60`0495cea0  00000000`00000001
fffffa60`0495cea8  fffff800`01922c3e nt!KeBugCheck2+0×92e
fffffa60`0495ceb0  fffff800`000000d1
fffffa60`0495ceb8  00000000`000004d0
fffffa60`0495cec0  fffff800`01a43640 nt!KiProcessorBlock
fffffa60`0495cec8  00000000`0000000a
fffffa60`0495ced0  fffffa60`03d5917a
fffffa60`0495ced8  fffffa60`0495cf70
fffffa60`0495cee0  fffffa80`02bdebb0
fffffa60`0495cee8  00000000`00000000
fffffa60`0495cef0  00000000`00000000
fffffa60`0495cef8  fffffa80`02bdebb0
fffffa60`0495cf00  00000000`c21a6d00
fffffa60`0495cf08  00000000`00000000
fffffa60`0495cf10  fffff800`0198e7a0 nt!KiInitialPCR+0×2a0
fffffa60`0495cf18  fffff800`0198e680 nt!KiInitialPCR+0×180
fffffa60`0495cf20  fffffa80`02bb7320
fffffa60`0495cf28  00000000`00000000
fffffa60`0495cf30  00000000`00000000
fffffa60`0495cf38  fffff960`00000003
fffffa60`0495cf40  fffffa60`0495e000
fffffa60`0495cf48  fffffa60`04955000
fffffa60`0495cf50  00000001`c0643000
fffffa60`0495cf58  00000000`00000000
fffffa60`0495cf60  fffff900`c06ca53c
fffffa60`0495cf68  fffffa60`0495d090
fffffa60`0495cf70  00000000`00000000
fffffa60`0495cf78  00000000`00000000
fffffa60`0495cf80  00000000`00000000
fffffa60`0495cf88  00000000`00000000
fffffa60`0495cf90  00000000`00000000
fffffa60`0495cf98  00000000`00000000
fffffa60`0495cfa0  00001f80`0010000f
fffffa60`0495cfa8  0053002b`002b0010
fffffa60`0495cfb0  00000286`0018002b
fffffa60`0495cfb8  00000000`00000000
fffffa60`0495cfc0  00000000`00000000
fffffa60`0495cfc8  00000000`00000000
fffffa60`0495cfd0  00000000`00000000
fffffa60`0495cfd8  00000000`00000000
fffffa60`0495cfe0  00000000`00000000
fffffa60`0495cfe8  fffffa60`0495d660
fffffa60`0495cff0  00000000`0000000a
fffffa60`0495cff8  fffff880`05311010
fffffa60`0495d000  fffff880`05311010
fffffa60`0495d008  fffffa60`0495d558
fffffa60`0495d010  fffffa60`0495d720
fffffa60`0495d018  fffffa80`02b986f0
fffffa60`0495d020  fffffa80`02b98720
fffffa60`0495d028  00000000`00000002
fffffa60`0495d030  00000000`00000000
fffffa60`0495d038  fffffa60`03d5917a
fffffa60`0495d040  00000000`000001f1
fffffa60`0495d048  fffffa80`026a9df0
fffffa60`0495d050  00000000`00000001
fffffa60`0495d058  00000000`83360018
fffffa60`0495d060  fffffa80`02b3ee40
fffffa60`0495d068  fffff800`0186e650 nt!KeBugCheckEx
fffffa60`0495d070  00000000`00000000
fffffa60`0495d078  00000000`00000000
fffffa60`0495d080  00000000`00000000
fffffa60`0495d088  00000000`00000000
fffffa60`0495d090  00000000`00000000
fffffa60`0495d098  00000000`00000000
fffffa60`0495d0a0  00000000`00000000
[…]

If a BSOD was reported after installing new drivers we shouldn’t suspect SATA_Driver package here because its components would almost always be present on any bugcheck thread as referenced after a bugcheck cause. There presence is the “effect”. This example might seem trivial and pointless but I’ve seen some memory dump analysis conclusions based on the reversal of causes and effects.

- Dmitry Vostokov @ DumpAnalysis.org -

Statement current, coupled processes, wait chain, spiking thread, hidden exception, message box and not my version: memory dump and trace analysis pattern cooperation

Monday, October 12th, 2009

It was reported that one important system functionality is not available from time to time but is usually restored to normal operation when one service (ServiceA) is restarted. That service was coupled with ServiceB and their memory dumps were saved and delivered for analysis. Unfortunately, nothing raising a suspicion was found inside. To tackle the problem it was advised to get an ETW trace from the system including modules from ServiceA together with process memory dumps when the problem happens again. The trace revealed the following message with exceptionally high statement current of 72,118 msg/s (and also superdense - no other types of trace statements were found inside):

#      PID    TID    Message
[...]
823296 11300  2484   ServiceB notification failed, error code = 6
[…]

Where the error 6 is invalid handle error:

0:000> !error 6
Error code: (Win32) 0x6 (6) - The handle is invalid.

The thread 2484 (9B4) corresponds to thread #22 in ServiceA and it is blocked waiting for an LPC reply:

  22  Id: 2c24.9b4 Suspend: 1 Teb: 7ffa4000 Unfrozen
ChildEBP RetAddr 
020cfa18 7c827899 ntdll!KiFastSystemCallRet
020cfa1c 77c80a6e ntdll!ZwRequestWaitReplyPort+0xc
020cfa68 77c7fcf0 rpcrt4!LRPC_CCALL::SendReceive+0×230

020cfa74 77c80673 rpcrt4!I_RpcSendReceive+0×24
020cfa88 77ce315a rpcrt4!NdrSendReceive+0×2b
020cfe70 73077ca5 rpcrt4!NdrClientCall2+0×22e
020cfe88 73077c2a ServiceA!RpcNextNotification+0×1c
020cffb8 77e6482f ServiceA!EventWatcherThread+0×107
020cffec 00000000 kernel32!BaseThreadStart+0×34

Suspicious of a loop we confirm that the thread was spiking:

0:000> !runaway f
 User Mode Time
  Thread       Time
  22:9b4       0 days 0:41:27.453
  19:4768      0 days 0:00:00.109
[…] 
Kernel Mode Time
  Thread       Time
  22:9b4       0 days 0:24:27.984
  23:407c      0 days 0:00:00.437
[…]
 Elapsed Time
  Thread       Time
[…]
  22:9b4       0 days 5:26:21.499
[…]

Looking at the raw stack data (using !teb and dds WinDbg commands) we see a hidden processed exception:

020cf6c4  020cf4c0
020cf6c8  020cf6d8
020cf6cc  020cf718
020cf6d0  7c828290 ntdll!_except_handler3
020cf6d4  7c82a120 ntdll!CheckHeapFillPattern+0x54
020cf6d8  020cf6e8
020cf6dc  00140000
020cf6e0  7c82a144 ntdll!RtlpAllocateFromHeapLookaside+0x13
020cf6e4  00140868
020cf6e8  020cf910
020cf6ec  7c82a0d8 ntdll!RtlAllocateHeap+0x1dd
020cf6f0  7c82a11c ntdll!RtlAllocateHeap+0xee7
020cf6f4  73074548
020cf6f8  00000000
020cf6fc  00000000
020cf700  00000000
020cf704  00000000
020cf708  00218ef0
020cf70c  020cf728
020cf710  7c82a791 ntdll!RtlpCoalesceFreeBlocks+0x383
020cf714  020d0000
020cf718  00218ef0
020cf71c  020cf9fc
020cf720  7c82865c ntdll!RtlRaiseException+0×3d
020cf724  020ce000
020cf728  020cf72c
020cf72c  00010007
020cf730  020cf810
020cf734  7c829f5d ntdll!RtlFreeHeap+0×20e
020cf738  001407d8
020cf73c  7c829f79 ntdll!RtlFreeHeap+0×70f
020cf740  00000000

After some time another pair of coupled processes was collected where ServiceA(2) was hanging on an LPC request again but this time ServiceB(2) had one thread blocked by a GUI property sheet processing code (a variant of Message Box pattern):

0:015> kL 100
ChildEBP RetAddr 
017fb9f0 7c827d29 ntdll!KiFastSystemCallRet
017fb9f4 77e61d1e ntdll!ZwWaitForSingleObject+0xc
017fba64 77e61c8d kernel32!WaitForSingleObjectEx+0xac
017fba78 6dfcdac3 kernel32!WaitForSingleObject+0x12
[...]
017fbdac 730801c5 compstui!CommonPropertySheetUIW+0×17
017fbdf4 73080f5d ServiceB!CommonPropertySheetUI+0×43

WARNING: Stack unwind information not available. Following frames may be wrong.
017fc27c 5c3ae4e6 ComponentA!DllGetClassObject+0xbf4e
[…]
017ff8f8 77ce33e1 rpcrt4!Invoke+0×30
017ffcf8 77ce35c4 rpcrt4!NdrStubCall2+0×299
017ffd14 77c7ff7a rpcrt4!NdrServerCall2+0×19
017ffd48 77c8042d rpcrt4!DispatchToStubInCNoAvrf+0×38
017ffd9c 77c80353 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0×11f
017ffdc0 77c811dc rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3
017ffdfc 77c812f0 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0×42c
017ffe20 77c88678 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
017fff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
017fff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd
017fffac 77c7b110 rpcrt4!BaseCachedThreadRoutine+0×9d
017fffb8 77e6482f rpcrt4!ThreadStartRoutine+0×1b
017fffec 00000000 kernel32!BaseThreadStart+0×34

ComponentA was also found loaded in ServiceB(1) user dump and in the ServiceB memory dump from the initial coupled pair where nothing was found before. The timestamp of that component was old enough (lmv command) to warrant more attention to it and contact its ISV.

- Dmitry Vostokov @ DumpAnalysis.org -

Critical section high contention and wait chains, blocked threads, and periodic error: memory dump and trace analysis pattern cooperation

Friday, October 9th, 2009

This is the first case study here that shows an interplay of memory dump analysis (DA) and software trace analysis (TA) patterns, what I call DATA analysis patterns (or DA+TA).  

It was reported that one process was blocking vital server functionality. After the process restart the problem was gone away. A complete memory dump was saved on the next occurrence and it revealed critical section wait chains in that process but no critical section deadlocks:

0: kd> .process /r /p 87f76020
Implicit process is now 87f76020
Loading User Symbols
[...]

0: kd> !cs -l -o -s
-----------------------------------------
DebugInfo          = 0x0016c6d8
Critical section   = 0×0032be30 (+0×32BE30)
LOCKED
LockCount          = 0×34
WaiterWoken        = No
OwningThread       = 0×00001c64
RecursionCount     = 0×1
LockSemaphore      = 0×624
SpinCount          = 0×00000000
OwningThread       = .thread 86396db0
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are not enabled.
[…]

The thread 86396db0 (TID 1c64) that blocked more than 50 threads (0×34) was blocked itself sleeping for more than 6 seconds:

0: kd> .thread 86396db0
Implicit thread is now 86396db0

0: kd> kL 100
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr 
ae7f8c98 8083d5b1 nt!KiSwapContext+0x26
ae7f8cc4 8083cf69 nt!KiSwapThread+0x2e5
ae7f8d0c 8092b03f nt!KeDelayExecutionThread+0x2ab
ae7f8d54 80833bef nt!NtDelayExecution+0x84
ae7f8d54 7c82860c nt!KiFastCallEntry+0xfc
1020e8ac 7c826f69 ntdll!KiFastSystemCallRet
1020e8b0 77e41ed5 ntdll!NtDelayExecution+0xc
1020e918 77e424fd kernel32!SleepEx+0x68
1020e928 67739357 kernel32!Sleep+0xf
1020e944 6773c3a2 ComponentA!DB_Driver_Command+0xa7
[…]
1020ec64 67485393 ComponentB!DatabaseSearch+0×34
[…]
1020ffb8 77e6482f msvcrt!_endthreadex+0xa3
1020ffec 00000000 kernel32!BaseThreadStart+0×34

0: kd> kv
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child             
[...]
1020e918 77e424fd 00001b00 00000000 1020e944 kernel32!SleepEx+0x68 (FPO: [SEH])
1020e928 67739357 00001b00 00000000 0032ac6c kernel32!Sleep+0xf (FPO: [1,0,0])
[…]

0: kd> ? 1b00 / 0n1000
Evaluate expression: 6 = 00000006

Critical section it owns shows high contention count too:

0: kd> dt -r1 _RTL_CRITICAL_SECTION   0x0032be30
ProcessA!_RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x0016c6d8 _RTL_CRITICAL_SECTION_DEBUG
      +0x000 Type             : 0
      +0x002 CreatorBackTraceIndex : 0
      +0x004 CriticalSection  : 0x0032be30 _RTL_CRITICAL_SECTION
      +0x008 ProcessLocksList : _LIST_ENTRY [ 0x16c708 - 0x16c6b8 ]
      +0x010 EntryCount       : 0
      +0×014 ContentionCount  : 0xac352
      +0×018 Spare            : [2] 0×43005c
   +0×004 LockCount        : -210
   +0×008 RecursionCount   : 1
   +0×00c OwningThread     : 0×00001c64
   +0×010 LockSemaphore    : 0×00000624
   +0×014 SpinCount        : 0

Fortunately, that process had ETW tracing capability and its software trace recorded before the complete memory dump was saved the following recurrent periodic errorfrom different threads that confirms our observation about the possible problem with a database and explains thread delays we see (> 6 seconds for Sleep):

#    PID  TID  Time         Message
[...]
1972 2780 5992 10:05:11.005 Error: [DB Driver] Not enough space on temp disk
1973 2780 5992 10:05:11.005 Execute DB command sleeps on error (retry 26)
[...]
4513 2780 3292 10:06:02.942 Error: [DB Driver] Not enough space on temp disk
4514 2780 3292 10:06:02.942 Execute DB command sleeps on error (retry 11)
4515 2780 3292 10:06:09.598 Error: [DB Driver] Not enough space on temp disk
4516 2780 3292 10:06:09.598 Execute DB command sleeps on error (retry 12)
[…]

- Dmitry Vostokov @ DumpAnalysis.org -

Forthcoming Memory Dump Analysis Anthology, Volume 3

Saturday, September 26th, 2009

This is a revised, edited, cross-referenced and thematically organized volume of selected DumpAnalysis.org blog posts about crash dump analysis and debugging written in October 2008 - June 2009 for software engineers developing and maintaining products on Windows platforms, quality assurance engineers testing software on Windows platforms and technical support and escalation engineers dealing with complex software issues. The third volume features:

- 15 new crash dump analysis patterns
- 29 new pattern interaction case studies
- Trace analysis patterns
- Updated checklist
- Fully cross-referenced with Volume 1 and Volume 2
- New appendixes

Product information:

  • Title: Memory Dump Analysis Anthology, Volume 3
  • Author: Dmitry Vostokov
  • Language: English
  • Product Dimensions: 22.86 x 15.24
  • Paperback: 404 pages
  • Publisher: Opentask (20 December 2009)
  • ISBN-13: 978-1-906717-43-8
  • Hardcover: 404 pages
  • Publisher: Opentask (30 January 2010)
  • ISBN-13: 978-1-906717-44-5

Back cover features 3D computer memory visualization image.

- Dmitry Vostokov @ DumpAnalysis.org -

ALPC wait chain, missing threads, message box, zombie and special processes: pattern cooperation

Friday, September 18th, 2009

The purpose of this case study is to show how to choose what to include in a fiber bundle memory dump when x64 complete memory dumps are huge and not an option to deliver:

1: kd> !vm

*** Virtual Memory Usage ***
 Physical Memory:     5880464 (   23521856 Kb)
[…]

The dump we have is a kernel. When we dump all processes and threads and look for “Waiting for ” we find many ALPC wait chains spanning 3 - 4 processes (sometimes semicircular), sometimes originated from processes with missing threads (just one or two present threads when we expect a dozen of them in a normal state):

1: kd> !process fffffa800b834c10
PROCESS fffffa800b834c10
    SessionId: 205  Cid: 13c40    Peb: 7fffffdb000  ParentCid: 133c0
    DirBase: 13b61d000  ObjectTable: fffff8800c2295b0  HandleCount:  58.
    Image: ProcessA.exe
    VadRoot fffffa8007d70c00 Vads 121 Clone 0 Private 497. Modified 0. Locked 0.
    DeviceMap fffff88000007450
    Token                             fffff8800c695560
    ElapsedTime                       00:03:42.083
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         65968
    QuotaPoolUsage[NonPagedPool]      11520
    Working Set Sizes (now,min,max)  (1274, 50, 345) (5096KB, 200KB, 1380KB)
    PeakWorkingSetSize                1278
    VirtualSize                       37 Mb
    PeakVirtualSize                   38 Mb
    PageFaultCount                    1286
    MemoryPriority                    BACKGROUND
    BasePriority                      13
    CommitCharge                      581

THREAD fffffa800b845bb0  Cid 13c40.1332c  Teb: 000007fffffde000 Win32Thread: fffff900c0076010 WAIT: (WrLpcReply) UserMode Non-Alertable
    fffffa800b845f40  Semaphore Limit 0x1
Waiting for reply to ALPC Message fffff88012527770 : queued at port fffffa80055bca60 : owned by process fffffa80054dfc10
Not impersonating
DeviceMap                 fffff88000007450
Owning Process            fffffa800b834c10       Image:         ProcessA.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      10912787       Ticks: 14208 (0:00:03:42.000)
Context Switch Count      34                 LargeStack
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0×00000000fff60260
Stack Init fffffa600e8d5db0 Current fffffa600e8d5670
Base fffffa600e8d6000 Limit fffffa600e8ce000 Call 0
Priority 15 BasePriority 15 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           Call Site
fffffa60`0e8d56b0 fffff800`016a36fa nt!KiSwapContext+0×7f
fffffa60`0e8d57f0 fffff800`0169835b nt!KiSwapThread+0×13a
fffffa60`0e8d5860 fffff800`016cd4e2 nt!KeWaitForSingleObject+0×2cb
fffffa60`0e8d58f0 fffff800`01916d14 nt!AlpcpSignalAndWait+0×92
fffffa60`0e8d5980 fffff800`019137a6 nt!AlpcpReceiveSynchronousReply+0×44
fffffa60`0e8d59e0 fffff800`0190330f nt!AlpcpProcessSynchronousRequest+0×24f
fffffa60`0e8d5b00 fffff800`016a0ef3 nt!NtAlpcSendWaitReceivePort+0×19f
fffffa60`0e8d5bb0 00000000`774d756a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffffa60`0e8d5c20)
00000000`0026f038 00000000`00000000 0×774d756a

1: kd> !alpc /m fffff88012527770

Message @ fffff88012527770
  MessageID             : 0x10E8 (4328)
  CallbackID            : 0xC3416B (12796267)
  SequenceNumber        : 0x00000002 (2)
  Type                  : LPC_REQUEST
  DataLength            : 0x0040 (64)
  TotalLength           : 0x0068 (104)
  Canceled              : No
  Release               : No
  ReplyWaitReply        : No
  Continuation          : Yes
  OwnerPort             : fffffa80076e9660 [ALPC_CLIENT_COMMUNICATION_PORT]
  WaitingThread         : fffffa800b845bb0
  QueueType             : ALPC_MSGQUEUE_PENDING
  QueuePort             : fffffa80055bca60 [ALPC_CONNECTION_PORT]
  QueuePortOwnerProcess : fffffa80054dfc10 (ProcessB.exe)
  ServerThread          : fffffa800b711060
  QuotaCharged          : No
  CancelQueuePort       : 0000000000000000
  CancelSequencePort    : 0000000000000000
  CancelSequenceNumber  : 0×00000000 (0)
  ClientContext         : 00000000003fcf20
  ServerContext         : 0000000000000000
  PortContext           : 00000000029fda00
  CancelPortContext     : 0000000000000000
  SecurityData          : 0000000000000000
  View                  : 0000000000000000

1: kd> !thread fffffa800b711060
THREAD fffffa800b711060  Cid 032c.146e8  Teb: 000007fffff7c000 Win32Thread: 0000000000000000 WAIT: (WrLpcReply) UserMode Non-Alertable
    fffffa800b7113f0  Semaphore Limit 0x1
Waiting for reply to ALPC Message fffff8800e401200 : queued at port fffffa8005a32730 : owned by process fffffa8004c39040
Not impersonating
DeviceMap                 fffff88000007450
Owning Process            fffffa80054dfc10       Image:         ProcessB.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      10916800       Ticks: 10195 (0:00:02:39.296)
Context Switch Count      401            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×000007fefe647780
Stack Init fffffa6001d33db0 Current fffffa6001d33670
Base fffffa6001d34000 Limit fffffa6001d2e000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 1 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Call Site
fffffa60`01d336b0 fffff800`016a36fa : nt!KiSwapContext+0×7f
fffffa60`01d337f0 fffff800`0169835b : nt!KiSwapThread+0×13a
fffffa60`01d33860 fffff800`016cd4e2 : nt!KeWaitForSingleObject+0×2cb
fffffa60`01d338f0 fffff800`01916d14 : nt!AlpcpSignalAndWait+0×92
fffffa60`01d33980 fffff800`019137a6 : nt!AlpcpReceiveSynchronousReply+0×44
fffffa60`01d339e0 fffff800`0190330f : nt!AlpcpProcessSynchronousRequest+0×24f
fffffa60`01d33b00 fffff800`016a0ef3 : nt!NtAlpcSendWaitReceivePort+0×19f
fffffa60`01d33bb0 00000000`774d756a : nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffffa60`01d33c20)
00000000`03d8e458 00000000`00000000 : 0×774d756a

1: kd> !alpc /m fffff8800e401200

Message @ fffff8800e401200
  MessageID             : 0x0BA4 (2980)
  CallbackID            : 0xC3E68A (12838538)
  SequenceNumber        : 0x00021911 (137489)
  Type                  : LPC_REQUEST
  DataLength            : 0x00C0 (192)
  TotalLength           : 0x00E8 (232)
  Canceled              : No
  Release               : No
  ReplyWaitReply        : No
  Continuation          : Yes
  OwnerPort             : fffffa8005b119c0 [ALPC_CLIENT_COMMUNICATION_PORT]
  WaitingThread         : fffffa800b711060
  QueueType             : ALPC_MSGQUEUE_PENDING
  QueuePort             : fffffa8005a32730 [ALPC_CONNECTION_PORT]
  QueuePortOwnerProcess : fffffa8004c39040 (ProcessC.exe)
  ServerThread          : fffffa800a843bb0
  QuotaCharged          : No
  CancelQueuePort       : 0000000000000000
  CancelSequencePort    : 0000000000000000
  CancelSequenceNumber  : 0×00000000 (0)
  ClientContext         : 0000000002e2e810
  ServerContext         : 0000000000000000
  PortContext           : 00000000002f3eb0
  CancelPortContext     : 0000000000000000
  SecurityData          : 0000000000000000
  View                  : 0000000000000000

1: kd> !thread fffffa800a843bb0
THREAD fffffa800a843bb0  Cid 048c.fbec  Teb: 000007ffffdaa000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
    fffffa8006027d80  Semaphore Limit 0x7fffffff
    fffffa800a843c68  NotificationTimer
Not impersonating
DeviceMap                 fffff88001800ba0
Owning Process            fffffa8004c39040       Image:         ProcessC.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      10916801       Ticks: 10194 (0:00:02:39.281)
Context Switch Count      239            
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0×000007fefe647780
Stack Init fffffa601b280db0 Current fffffa601b280940
Base fffffa601b281000 Limit fffffa601b27b000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Call Site
fffffa60`1b280980 fffff800`016a36fa : nt!KiSwapContext+0×7f
fffffa60`1b280ac0 fffff800`0169835b : nt!KiSwapThread+0×13a
fffffa60`1b280b30 fffff800`019013e8 : nt!KeWaitForSingleObject+0×2cb
fffffa60`1b280bc0 fffff800`016a0ef3 : nt!NtWaitForSingleObject+0×98
fffffa60`1b280c20 00000000`774d6d5a : nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffffa60`1b280c20)
00000000`10b7e548 00000000`00000000 : 0×774d6d5a

Some processes designed to be non-interactive have threads that wait for UI messages and therefore could be potential message or dialog box threads waiting for a dismissal and blocking other threads:

THREAD fffffa8005a7aa20  Cid 061c.0778  Teb: 000007fffff9e000 Win32Thread: fffff900c079fd50 WAIT: (WrUserRequest) UserMode Non-Alertable
    fffffa8005a7a5a0  SynchronizationEvent
Not impersonating
DeviceMap                 fffff88000007450
Owning Process            fffffa80058f01b0       Image:         ProcessD.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      10911798       Ticks: 15197 (0:00:03:57.453)
Context Switch Count      88939                 LargeStack
UserTime                  00:00:00.078
KernelTime                00:00:00.609
Win32 Start Address 0×000007fefa8238a0
Stack Init fffffa60046a8db0 Current fffffa60046a8720
Base fffffa60046a9000 Limit fffffa60046a0000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           Call Site
fffffa60`046a8760 fffff800`016a36fa nt!KiSwapContext+0×7f
fffffa60`046a88a0 fffff800`0169835b nt!KiSwapThread+0×13a
fffffa60`046a8910 fffff960`0014c053 nt!KeWaitForSingleObject+0×2cb
fffffa60`046a89a0 fffff960`0014c0ea win32k!xxxRealSleepThread+0×25f
fffffa60`046a8a40 fffff960`0014bb3a win32k!xxxSleepThread+0×56
fffffa60`046a8a70 fffff960`0014bc39 win32k!xxxRealInternalGetMessage+0×72e
fffffa60`046a8b50 fffff960`0014d0d9 win32k!xxxInternalGetMessage+0×35
fffffa60`046a8b90 fffff800`016a0ef3 win32k!NtUserGetMessage+0×79

fffffa60`046a8c20 00000000`773dd58a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffffa60`046a8c20)
00000000`03d2f7b8 00000000`00000000 0×773dd58a

We also have more than 30,000 zombie processes including some special ones signifying past faults:

1: kd> !vm
[...]
         15714 ProcessE.exe       0 (         0 Kb)
         15650 WerFault.exe       0 (         0 Kb)
         15644 ProcessF.exe       0 (         0 Kb)
         15640 ProcessE.exe       0 (         0 Kb)
         15610 ProcessG.exe       0 (         0 Kb)
         1560c ProcessE.exe       0 (         0 Kb)
         155f8 ProcessH.exe       0 (         0 Kb)
         155e8 ProcessE.exe       0 (         0 Kb)
         155c4 ProcessG.exe       0 (         0 Kb)
         155bc ProcessE.exe       0 (         0 Kb)
         155b8 ProcessH.exe       0 (         0 Kb)
         1559c WerFault.exe       0 (         0 Kb)
         15560 ProcessE.exe       0 (         0 Kb)
[…]

What we recommend here is to save user dumps of processes A, B, C and D and then force a kernel dump next time the problem surfaces. Also to check WER settings for any recorder faults and, because of the fact the the system is W2K8, configure LocalDumps registry keys to capture full user dumps.

- Dmitry Vostokov @ DumpAnalysis.org -

Video from Microsoft GEC

Monday, September 14th, 2009

Ntdebugging blog has put the link to the video online from Microsoft Global Engineering Conference where I presented the pattern-driven memory dump analysis methodology:

Citrix engineers at Microsoft GEC

Note: you need to open a video link URL from the blog post in Windows Media Player if you don’t have an association for WMV files or save the file.

- Dmitry Vostokov @ DumpAnalysis.org -