Archive for the ‘Trace Analysis Patterns’ Category

Trace Analysis Patterns (Part 19)

Monday, April 5th, 2010

Typical software narrative history consists of requests and responses, for example, function or object method calls and returns:

#     Module PID  TID  Time         File    Function Message
[...]
26060 dllA   1604 7108 10:06:21.746 fileA.c foo      Calling bar
[...]
26232 dllA   1604 7108 10:06:22.262 fileA.c foo      bar returns 0x5
[...]

The code that generates execution history is response-complete if it traces both requests and responses. For such code (except in cases where tracing is stopped before a response) the absence of expected responses could be a sign of blocked threads or quiet exception processing. The code that generates execution history is exception-complete if it also traces exception processing. Response-complete and exception-complete code is called call-complete. If we don’t see response messages for call-complete code we have Incomplete History.

In general, we can talk about the absence of certain messages in a trace as a deviation from the standard trace sequence template corresponding to a use case. The difference there is in a missing request too. This is a topic for next patterns.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Adjoint threads, discontinuity and time delta: software trace pattern cooperation

Wednesday, March 24th, 2010

Here is one of the first case studies in pattern-driven software trace analysis. A user starts printing but nothing comes out. However, if the older printer driver is installed everything works as expected. We suspect that print spooler crashes if the newer printer driver is used. Based on known module name in ETW trace we find PID for print spooler process (19984) and immediately see discontinuity in the trace with the large time delta between the last PID message and the last trace statement (almost 4 minutes): 

No   Source        PID   TID   Time         Message
712  \src\print\ui 19984 16200 12:22:31.571 PropertySheet returns 1
[… no messages for PID 19984 …]
5103 \src\mgmt   1292  7604  12:26:11.659 WaitAction

If we select the adjoint thread of source \src\print\driver (in other words, filter only its messages) we would see discontinuity with the similar time delta. We know that printer driver runs in print spooler context. However, PID had changed and that means print spooler was restarted (perhaps after a crash):

No   Source            PID   TID   Time         Message
557  \src\print\driver 19984 16200 12:22:28.069 DisableDevice returns
[… discontinuity for \print\driver …]
1462 \src\print\driver 10828 17584 12:26:03.854 DllMain

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Component Heap

Friday, March 19th, 2010

Recently, a reader of this blog sent me a minidump file and a debugger log of an application that had about 300 modules loaded in a process address space. What was interesting is the huge amount of ModLoad / Unload module debugger events in the log prior to an access violation exception. Some modules were loaded / unloaded many times, for example (only included lines for just one module but there were many other):

[...]
ModLoad: 16640000 16649000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 16640000
[...]
ModLoad: 192b0000 192b9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 192b0000
[...]
ModLoad: 192b0000 192b9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 192b0000
[...]
ModLoad: 161b0000 161b9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161b0000
[...]
ModLoad: 161e0000 161e9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161e0000
[...]
ModLoad: 161f0000 161f9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161f0000
[...]
ModLoad: 161f0000 161f9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161f0000
[...]
ModLoad: 161f0000 161f9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161f0000
[...]
ModLoad: 161f0000 161f9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161f0000
[...]
ModLoad: 161f0000 161f9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161f0000
[...]
ModLoad: 161f0000 161f9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161f0000
[...]
ModLoad: 161f0000 161f9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 161f0000
[...]
ModLoad: 171b0000 171b9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 171b0000
[...]
ModLoad: 25180000 25189000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 25180000
[...]
ModLoad: 171b0000 171b9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 171b0000
[...]
ModLoad: 171b0000 171b9000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 171b0000
[...]
[...]
[...]
ModLoad: 0df60000 0df69000   X:\Client\Bin\ModuleA.dll
[...]
Unload module X:\Client\Bin\ModuleA.dll at 0df60000
[...]
(f38.560): Access violation - code c0000005 (first chance)
---
--- 1st chance AccessViolation exception ----
[...]

We see the component ModuleA was loaded at different addresses and this looks similar to a singleton object factory with Create / Destroy operations that resembles heap operations Alloc and Free where every allocation can place the same object at a different address. This is why I call all this a component or module heap. The application was COM-based and every domain-specific object was implemented in a separate in-proc COM DLL. There were thousands of such objects.

PS. This story reminds me that when I learnt about COM in 90s I wanted to redesign my word processor written in C to have every paragraph, line and even word to be implemented as a COM object.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 18)

Monday, March 8th, 2010

Sometimes we have a sequence of Activity Regions with increasing values of Statement Current, like depicted here:

The boundaries of regions may be blurry and arbitrarily drawn. Nevertheless, the current is visibly increasing or decreasing, hence the name of this pattern: Trace Acceleration, by analogy with physical acceleration, second-order derivative. We can also metaphorically use here the notion of a partial derivative for trace statement current and acceleration for Threads of Activity and Adjoint Threads of Activity but whether it is useful remains to be seen.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 17)

Thursday, March 4th, 2010

This is an extension of Thread of Activity pattern based on the concept of multibraiding and it is called Adjoint Thread of Activity correspondingly. I’m going to illustrate it soon when I publish a synthetic case study involving several software trace analysis patterns.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 16)

Saturday, February 13th, 2010

Another useful pattern is called Time Delta. This is a time interval between significant events. For example,

#     Module PID  TID  Time         File    Function Message
1                      10:06:18.994                  (Start)
[...]
6060  dllA   1604 7108 10:06:21.746 fileA.c DllMain  DLL_PROCESS_ATTACH
[…]
24480 dllA   1604 7108 10:06:32.262 fileA.c Exec     Path: C:\Program Files\CompanyA\appB.exe
[…]
24550 dllB   1604 9588 10:06:32.362 fileB.c PostMsg  Event Q
[…]
28230                  10:07:05.170                  (End)

Such deltas are useful in examining delays. In the trace fragment above we are interested in dllA activity from its load until it launches appB.exe. We see that the time delta was only 10 seconds. The message #24550 was the last message from the process ID 1604 and after that we didn’t “hear” from that PID for more than 30 seconds until the tracing was stopped.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 15)

Saturday, February 13th, 2010

When looking at software traces and doing either a search for or just scrolling certain messages have our attention immediately. We call them Significant Events and hence the name of this pattern, Significant Event. It could be a recorded exception or an error, a basic fact, a trace message from vocabulary index, or just any trace statement that marks the start of some activity we want to explore in depth, for example, a certain DLL is attached to the process, a coupled process is started or a function is called. The start of a trace and the end of it are trivial significant events and are used in deciding whether the trace is circular, in determining the trace recording interval or its average statement current.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Memory Systems Language (Part 1)

Friday, February 12th, 2010

Computer memory analysis is based on interconnected structures of symbols and we state that there exists a memory language that extends a hierarchy of modeling and implementation languages (both domain-specific and general-purpose):

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Forthcoming Memory Dump Analysis Anthology, Volume 4

Thursday, February 11th, 2010

This is a revised, edited, cross-referenced and thematically organized volume of selected DumpAnalysis.org blog posts about crash dump analysis and debugging written in July 2009 - January 2010 for software engineers developing and maintaining products on Windows platforms, quality assurance engineers testing software on Windows platforms and technical support and escalation engineers dealing with complex software issues. The fourth volume features:

- 13 new crash dump analysis patterns
- 13 new pattern interaction case studies
- 10 new trace analysis patterns
- 6 new Debugware patterns and case study
- Workaround patterns
- Updated checklist
- Fully cross-referenced with Volume 1, Volume 2 and Volume 3
- New appendixes

Product information:

  • Title: Memory Dump Analysis Anthology, Volume 4
  • Author: Dmitry Vostokov
  • Language: English
  • Product Dimensions: 22.86 x 15.24
  • Paperback: 410 pages
  • Publisher: Opentask (30 March 2010)
  • ISBN-13: 978-1-906717-86-5
  • Hardcover: 410 pages
  • Publisher: Opentask (30 April 2010)
  • ISBN-13: 978-1-906717-87-2

Back cover features memory space art image: Internal Process Combustion.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Extending Multithreading to Multibraiding (Adjoint Threading)

Sunday, January 17th, 2010

Having considered computational threads as braided strings and after discerning several software trace analysis patterns (just the beginning) we can see formatted and tabulated software trace output in a new light and employ the “fabric of traces” and braid metaphors for an Adjoint Thread concept. This new concept was motivated by reading about Extended Phenotype (*) and extensive analysis of Citrix ETW-based CDF traces using CDFAnalyzer. The term Adjoint was borrowed from mathematics because the concept we discuss below resembles this metaphorical formula: (Thread A, B) = [A, Thread B]. Let me first illustrate adjoint threading using simplified trace tables. Consider this generalized software trace example (date and time column is omitted for visual clarity):

#

Source Dir

PID

TID

File Name

Function

Message

1

\src\subsystemA

2792

5676

file1.cpp

fooA

Message text…

2

\src\subsystemA

2792

5676

file1.cpp

fooA

Message text…

3

\src\subsystemA

2792

5676

file1.cpp

fooA

Message text…

4

\src\lib

2792

5680

file2.cpp

barA

Message text…

5

\src\subsystemA

2792

5680

file1.cpp

fooA

Message text…

6

\src\subsystemA

2792

5676

file1.cpp

fooA

Message text…

7

\src\lib

2792

5680

file2.cpp

fooA

Message text…

8

\src\lib

2792

5680

file2.cpp

fooA

Message text…

9

\src\subsystemB

2792

3912

file3.cpp

barB

Message text…

10

\src\subsystemB

2792

3912

file3.cpp

barB

Message text…

11

\src\subsystemB

2792

3912

file3.cpp

barB

Message text…

12

\src\subsystemB

2792

3912

file3.cpp

barB

Message text…

13

\src\subsystemB

2792

3912

file3.cpp

barB

Message text…

14

\src\subsystemB

2792

3912

file3.cpp

barB

Message text…

15

\src\subsystemB

2792

2992

file4.cpp

fooB

Message text…

16

\src\subsystemB

2792

3008

file4.cpp

fooB

Message text…

We see several threads in a process PID 2792. In CDFAnalyzer we can filter trace messages that belong to any column and if we filter by TID we get a view of any Thread of Activity. However, each thread can “run” through any source directory, file name or function. If a function belongs to a library multiple threads would access it. This source location (can be considered as a subsystem), file or function view of activity is called an Adjoint Thread. For example, if we filter only subsystemA column in the trace above we get this table:

#

Source Dir

PID

TID

File Name

Function

Message

1

\src\subsystemA

2792

5676

file1.cpp

fooA

Message …

2

\src\subsystemA

2792

5676

file1.cpp

fooA

Message …

3

\src\subsystemA

2792

5676

file1.cpp

fooA

Message …

5

\src\subsystemA

2792

5680

file1.cpp

fooA

Message …

6

\src\subsystemA

2792

5676

file1.cpp

fooA

Message …

7005

\src\subsystemA

2792

5664

file1.cpp

fooA

Message …

10198

\src\subsystemA

2792

5664

file1.cpp

fooA

Message …

10364

\src\subsystemA

2792

5664

file1.cpp

fooA

Message …

10417

\src\subsystemA

2792

5664

file1.cpp

fooA

Message …

10420

\src\subsystemA

2792

5676

file1.cpp

fooA

Message …

10422

\src\subsystemA

2792

5680

file1.cpp

fooA

Message …

10587

\src\subsystemA

2792

5664

file1.cpp

fooA

Message …

10767

\src\subsystemA

2792

5680

file1.cpp

fooA

Message …

11126

\src\subsystemA

2792

5668

file1.cpp

fooA

Message …

11131

\src\subsystemA

2792

5680

file1.cpp

fooA

Message …

11398

\src\subsystemA

2792

5676

file1.cpp

fooA

Message …

11501

\src\subsystemA

2792

5668

file1.cpp

fooA

Message …

11507

\src\subsystemA

2792

5668

file1.cpp

fooA

Message …

11509

\src\subsystemA

2792

5664

file1.cpp

fooA

Message …

11513

\src\subsystemA

2792

5680

file1.cpp

fooA

Message …

11524

\src\subsystemA

2792

5668

file1.cpp

fooA

Message …

We can graphically view subsystemA as a braid string that “permeates the fabric of threads”:

We can get many different braids by changing filters, hence multibraiding. Here is another example of a driver source file view initially permeating 2 process contexts and 4 threads:

#

Source Dir

PID

TID

File Name

Function

Message

41

\src\sys\driver

3636

3848

entry.c

DriverEntry

IOCTL …

80

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

99

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

102

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

179

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

180

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

311

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

447

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

448

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

457

\src\sys\driver

2792

5108

entry.c

DriverEntry

IOCTL …

608

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

614

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

655

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

675

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

678

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

680

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

681

\src\sys\driver

3636

3896

entry.c

DriverEntry

IOCTL …

1145

\src\sys\driver

3636

4960

entry.c

DriverEntry

IOCTL …

1153

\src\sys\driver

3636

4960

entry.c

DriverEntry

IOCTL …

1154

\src\sys\driver

3636

4960

entry.c

DriverEntry

IOCTL …

(*) A bit of digression. Looks like biology keeps giving insights into software, there is even a software phenotype metaphor albeit a bit restricted to code, I just thought that we need also an Extended Software Phenotype.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

The Year of Debugging in Retrospection

Thursday, January 14th, 2010

The Year of Debugging, 0×7D9, was a remarkable year for DumpAnalysis.org. Here is the list of achievements to report:

- Software Trace Analysis as a new discipline with its own set of patterns

- Unification of Memory Dump Analysis with Software Trace Analysis (DA+TA)

- New computer memory dump-based art movements: Opcodism and Physicalist Art

- Discovery of 3D computer memory visualization techniques

- Establishing Software Maintenance Institute

- Broadening software fault injection as Software Defect Construction discipline

- Establishing a new profession of a Software Defect Researcher

- Starting ambitious Dictionary of Debugging

- Publishing Windows Debugging: Practical Foundations book

- Publishing the first x86-free Windows debugging book: x64 Windows Debugging: Practical Foundations

- Establishing the new debugging magazine: Debugged! MZ/PE

- Publishing Memory Dump Analysis Anthology, Volume 3

- Cooperation with OpenTask to promote First Fault Software Problem Solving book

- Establishing Debugging Expert(s) Magazine Online

- Creating the first development process for debugging and software troubleshooting tools: RADII

- Publishing the first pattern-driven memory dump analysis troubleshooting methodology as a foundation for software debugging

- Proposal for an International Memory Analysts and Debuggers Day

- Almost completed Windows Debugging Notebook to be published soon

Now DumpAnalysis.org focuses on The Year of Dump Analysis, 0×7DA, as a foundation for the forthcoming debugging decade and reveals future plans this weekend.

I’m sure that many other organizations and individuals have no less remarkable accomplishments to report for 2009. I promise to track down and write about some of them in the forthcoming book:

The Science of Dr. Watson: An Illustrated History of Debugging (ISBN: 978-1906717070)

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

Trace Analysis Patterns (Part 14)

Tuesday, January 12th, 2010

Inter-Correlation pattern is analogous to the previously described Intra-Correlation pattern but involves several traces from possibly different trace agents recorded (most commonly) at the same time or during an overlapping time interval:

Let’s look at a typical example of an application subclassing windows to add additional look and feel element to its GUI or thjat hooks into window messaging. Suppose this application also records important trace points like window parameters before and after subclassing using ETW technology (Event Tracing for Windows). When we run the application in terminal services environment all windows (including other processes) are shown with an incorrect dimension. We therefore request the application trace and in addition WindowHistory trace to see how coordinates of all windows are changed over time. We easily find some Basic Facts in both traces such as window class name or time but it looks like window handle is different. In another set of traces recorded for comparison we have same window handle values, class name is absent from the ETW trace but a process and thread ID for the same window handle are different. We, therefore, don’t see a correlation between these traces and suspect that both traces in 2 sets were recorded in different terminal sessions, for example:

ETW trace:

#      PID   TID   Time          Message
[…]
46750  5890  6960  10:17:18.825  Subclassing, handle=0×100B8, class=MyWindowClass, […]
[…]

WindowHistory trace:

Handle: 0001006E Class: “MyWindowClass” Title: “”
Captured at: 10:17:19:637
   Process ID: 19e0
Thread ID: 16e4

Parent: 0
Screen position (l,t,r,b): (-2,896,1282,1026)
Client rectangle (l,t,r,b): (0,0,1276,122)
Visible: true
Window placement command: SW_SHOWNORMAL
Foreground: false
HungApp: false
Minimized: false
Maximized: false
[…]

- Dmitry Vostokov @ TraceAnalysis.org -

Mystique Back Covers Revealed

Thursday, January 7th, 2010

Some practical engineers asked me how do Debugged! MZ/PE magazine back covers look like from a birds eye view:

 

One engineer even commented that they look better and better (counterclockwise) :-) 

- Dmitry Vostokov @ DumpAnalysis.org -

Trace Analysis Patterns (Part 13)

Thursday, December 31st, 2009

What will you do confronted with a one million trace messages recorded between 10:44:15 and 10:46:55 with an average trace statement current of 7,000 msg/s from dozens of modules and having a one sentence problem description? One solution is to try to search for a specific vocabulary relevant to the problem description, for example, if a problem is an intermittent re-authentication then we might try to search for a word “password” or a similar one drawn from a troubleshooting domain vocabulary. So it is useful to have a Vocabulary Index to search for. Hence, the same name of this pattern. In our trace example, the search for “password” jumps straight to a small activity region of authorization modules starting from the message number #180,010 and the last “password” occurrence is in the message #180,490 that narrows initial analysis region to just 500 messages. Note the similarity here between a book and its index and a trace as a software narrative and its vocabulary index.

- Dmitry Vostokov @ TraceAnalysis.org -

Memory Dump Analysis Anthology, Volume 3

Sunday, December 20th, 2009

“Memory dumps are facts.”

I’m very excited to announce that Volume 3 is available in paperback, hardcover and digital editions:

Memory Dump Analysis Anthology, Volume 3

Table of Contents

In two weeks paperback edition should also appear on Amazon and other bookstores. Amazon hardcover edition is planned to be available in January 2010.

The amount of information was so voluminous that I had to split the originally planned volume into two. Volume 4 should appear by the middle of February together with Color Supplement for Volumes 1-4. 

- Dmitry Vostokov @ DumpAnalysis.org -

Debugged! MZ/PE September issue is out

Wednesday, December 16th, 2009

Finally, after the long delay, the issue is available in print on Amazon and through other sellers:

Debugged! MZ/PE: Software Tracing

Buy from Amazon

- Dmitry Vostokov @ DumpAnalysis.org -

Trace Analysis Patterns (Part 12)

Tuesday, November 17th, 2009

When looking at lengthy traces with thousands and millions of messages (trace statements) we can see regions of activity where statement current (Jm, msg/s) is much higher than in surrounding temporal regions. Hence the name of this pattern, Activity Region. Here is an illustration for a typical ETW/CDF trace where a middle region of activity (Jm2) signifies a system performing some response function like a user session initialization and application launch:

- Dmitry Vostokov @ TraceAnalysis.org -

Trace Analysis Patterns (Part 11)

Saturday, November 7th, 2009

Birds eye view of software traces makes it easier to see their coarse blocked structure:

where further finer structure is discernible and even nested blocks:

Some blocks of output can be seen when scrolling trace viewer output but if a viewer support zooming it is possible to get an overview and jump directly into a Characteristic Message Block, for example, debug messages of repeated attempts to query a database. If a viewer supports message coloring it also helps. Sometimes this technique is useful to ignore bulk messages and start the analysis around block boundaries.

- Dmitry Vostokov @ TraceAnalysis.org -

Software Trace: Birds Eye View

Friday, November 6th, 2009

Here is a fragment of a condensed view of a CDF (ETW-based) trace imported into MS Word:

- Dmitry Vostokov @ TraceAnalysis.org -

Statement current, coupled processes, wait chain, spiking thread, hidden exception, message box and not my version: memory dump and trace analysis pattern cooperation

Monday, October 12th, 2009

It was reported that one important system functionality is not available from time to time but is usually restored to normal operation when one service (ServiceA) is restarted. That service was coupled with ServiceB and their memory dumps were saved and delivered for analysis. Unfortunately, nothing raising a suspicion was found inside. To tackle the problem it was advised to get an ETW trace from the system including modules from ServiceA together with process memory dumps when the problem happens again. The trace revealed the following message with exceptionally high statement current of 72,118 msg/s (and also superdense - no other types of trace statements were found inside):

#      PID    TID    Message
[...]
823296 11300  2484   ServiceB notification failed, error code = 6
[…]

Where the error 6 is invalid handle error:

0:000> !error 6
Error code: (Win32) 0x6 (6) - The handle is invalid.

The thread 2484 (9B4) corresponds to thread #22 in ServiceA and it is blocked waiting for an LPC reply:

  22  Id: 2c24.9b4 Suspend: 1 Teb: 7ffa4000 Unfrozen
ChildEBP RetAddr 
020cfa18 7c827899 ntdll!KiFastSystemCallRet
020cfa1c 77c80a6e ntdll!ZwRequestWaitReplyPort+0xc
020cfa68 77c7fcf0 rpcrt4!LRPC_CCALL::SendReceive+0×230

020cfa74 77c80673 rpcrt4!I_RpcSendReceive+0×24
020cfa88 77ce315a rpcrt4!NdrSendReceive+0×2b
020cfe70 73077ca5 rpcrt4!NdrClientCall2+0×22e
020cfe88 73077c2a ServiceA!RpcNextNotification+0×1c
020cffb8 77e6482f ServiceA!EventWatcherThread+0×107
020cffec 00000000 kernel32!BaseThreadStart+0×34

Suspicious of a loop we confirm that the thread was spiking:

0:000> !runaway f
 User Mode Time
  Thread       Time
  22:9b4       0 days 0:41:27.453
  19:4768      0 days 0:00:00.109
[…] 
Kernel Mode Time
  Thread       Time
  22:9b4       0 days 0:24:27.984
  23:407c      0 days 0:00:00.437
[…]
 Elapsed Time
  Thread       Time
[…]
  22:9b4       0 days 5:26:21.499
[…]

Looking at the raw stack data (using !teb and dds WinDbg commands) we see a hidden processed exception:

020cf6c4  020cf4c0
020cf6c8  020cf6d8
020cf6cc  020cf718
020cf6d0  7c828290 ntdll!_except_handler3
020cf6d4  7c82a120 ntdll!CheckHeapFillPattern+0x54
020cf6d8  020cf6e8
020cf6dc  00140000
020cf6e0  7c82a144 ntdll!RtlpAllocateFromHeapLookaside+0x13
020cf6e4  00140868
020cf6e8  020cf910
020cf6ec  7c82a0d8 ntdll!RtlAllocateHeap+0x1dd
020cf6f0  7c82a11c ntdll!RtlAllocateHeap+0xee7
020cf6f4  73074548
020cf6f8  00000000
020cf6fc  00000000
020cf700  00000000
020cf704  00000000
020cf708  00218ef0
020cf70c  020cf728
020cf710  7c82a791 ntdll!RtlpCoalesceFreeBlocks+0x383
020cf714  020d0000
020cf718  00218ef0
020cf71c  020cf9fc
020cf720  7c82865c ntdll!RtlRaiseException+0×3d
020cf724  020ce000
020cf728  020cf72c
020cf72c  00010007
020cf730  020cf810
020cf734  7c829f5d ntdll!RtlFreeHeap+0×20e
020cf738  001407d8
020cf73c  7c829f79 ntdll!RtlFreeHeap+0×70f
020cf740  00000000

After some time another pair of coupled processes was collected where ServiceA(2) was hanging on an LPC request again but this time ServiceB(2) had one thread blocked by a GUI property sheet processing code (a variant of Message Box pattern):

0:015> kL 100
ChildEBP RetAddr 
017fb9f0 7c827d29 ntdll!KiFastSystemCallRet
017fb9f4 77e61d1e ntdll!ZwWaitForSingleObject+0xc
017fba64 77e61c8d kernel32!WaitForSingleObjectEx+0xac
017fba78 6dfcdac3 kernel32!WaitForSingleObject+0x12
[...]
017fbdac 730801c5 compstui!CommonPropertySheetUIW+0×17
017fbdf4 73080f5d ServiceB!CommonPropertySheetUI+0×43

WARNING: Stack unwind information not available. Following frames may be wrong.
017fc27c 5c3ae4e6 ComponentA!DllGetClassObject+0xbf4e
[…]
017ff8f8 77ce33e1 rpcrt4!Invoke+0×30
017ffcf8 77ce35c4 rpcrt4!NdrStubCall2+0×299
017ffd14 77c7ff7a rpcrt4!NdrServerCall2+0×19
017ffd48 77c8042d rpcrt4!DispatchToStubInCNoAvrf+0×38
017ffd9c 77c80353 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0×11f
017ffdc0 77c811dc rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3
017ffdfc 77c812f0 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0×42c
017ffe20 77c88678 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
017fff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
017fff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd
017fffac 77c7b110 rpcrt4!BaseCachedThreadRoutine+0×9d
017fffb8 77e6482f rpcrt4!ThreadStartRoutine+0×1b
017fffec 00000000 kernel32!BaseThreadStart+0×34

ComponentA was also found loaded in ServiceB(1) user dump and in the ServiceB memory dump from the initial coupled pair where nothing was found before. The timestamp of that component was old enough (lmv command) to warrant more attention to it and contact its ISV.

- Dmitry Vostokov @ DumpAnalysis.org -