Archive for the ‘CDF Analysis Tips and Tricks’ Category

Trace Analysis Patterns (Part 7)

Monday, August 3rd, 2009

When we have software traces that record process identifiers (PID) and thread identifiers (TID) it is important to differentiate between trace statements sorted by time and by Thread of Activity. The latter is simply the flow of trace messages sorted by TID and it is very helpful in cases with dense traces coming from hundreds of processes and components. Here is an example from MessageHistory bulk trace fragment showing different threads of activity in different colors:

Start time: 21:5:36:651
Format time: 21:5:43:133
Number of messages sent: 24736
Number of messages posted: 905

[...]
21:5:41:990 S PID: a7c TID: 554 HWND: 0×0000000000010E62  Class: “ToolbarWindow32″ Title: “” WM_USER+4b (0×44b) wParam: 0×14 lParam: 0×749e300
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E4A  Class: “CtrlNotifySink” Title: “” WM_NOTIFY (0×4e) wParam: 0×0 lParam: 0×749efa8
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62  Class: “ToolbarWindow32″ Title: “” WM_USER+3f (0×43f) wParam: 0×14 lParam: 0×749e1e0
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62  Class: “ToolbarWindow32″ Title: “” WM_USER+4b (0×44b) wParam: 0×14 lParam: 0×749e300
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62  Class: “ToolbarWindow32″ Title: “” WM_USER+19 (0×419) wParam: 0×14 lParam: 0×0
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62  Class: “ToolbarWindow32″ Title: “” WM_USER+61 (0×461) wParam: 0×6 lParam: 0×0
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E62  Class: “ToolbarWindow32″ Title: “” WM_USER+56 (0×456) wParam: 0×0 lParam: 0×0
21:5:41:990 S PID: a7c TID: 554 HWND: 0×00010E4A  Class: “CtrlNotifySink” Title: “” WM_NOTIFY (0×4e) wParam: 0×0 lParam: 0×749f290
21:5:41:990 S PID: a7c TID: 554 HWND: 0×000E04A8  Class: “CtrlNotifySink” Title: “” WM_NCPAINT (0×85) wParam: 0xffffffffcc043bdb lParam: 0×0
21:5:41:990 P PID: a7c TID: 554 HWND: 0×000E04A8  Class: “CtrlNotifySink” Title: “” WM_PAINT (0xf) wParam: 0×0 lParam: 0×0

21:5:42:007 S PID: 1a8 TID: 660 HWND: 0×0001003C  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_WINDOWPOSCHANGING (0×46) wParam: 0×0 lParam: 0×29af030
21:5:42:007 P PID: a7c TID: 9b4 HWND: 0×00010084  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×6 lParam: 0×0
21:5:42:007 P PID: 1a8 TID: 660 HWND: 0×0001003C  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×8 lParam: 0×0
21:5:42:007 P PID: a7c TID: 9b4 HWND: 0×00010084  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×9 lParam: 0×0
21:5:42:022 P PID: a7c TID: a28 HWND: 0×0001061A  Class: “WPDShServiceObject” Title: “WPDShServiceObject_WND” WM_TIMER (0×113) wParam: 0xd lParam: 0×0
21:5:42:022 P PID: a7c TID: 9b4 HWND: 0×00010084  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×8 lParam: 0×0
21:5:42:022 P PID: a7c TID: 9b4 HWND: 0×00010084  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_PAINT (0xf) wParam: 0×0 lParam: 0×0

21:5:42:036 P PID: 1a8 TID: 660 HWND: 0×0001003C  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×5 lParam: 0×0
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×0001006C  Class: “ReBarWindow32″ Title: “” WM_USER+10 (0×410) wParam: 0×2 lParam: 0×0
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×0001006C  Class: “ReBarWindow32″ Title: “” WM_USER+18 (0×418) wParam: 0×2 lParam: 0×1041a
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×0001006C  Class: “ReBarWindow32″ Title: “” WM_USER+1a (0×41a) wParam: 0×0 lParam: 0×1041c
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×0001006C  Class: “ReBarWindow32″ Title: “” WM_USER+19 (0×419) wParam: 0×0 lParam: 0×0
21:5:42:054 S PID: a7c TID: 9b4 HWND: 0×00010084  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_WINDOWPOSCHANGING (0×46) wParam: 0×0 lParam: 0×2bef960
21:5:42:054 P PID: a7c TID: 9b4 HWND: 0×00010084  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×10 lParam: 0×0
21:5:42:054 P PID: a7c TID: 9b4 HWND: 0×00010084  Class: “CiceroUIWndFrame” Title: “TF_FloatingLangBar_WndTitle” WM_TIMER (0×113) wParam: 0×5 lParam: 0×0

21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E32  Class: “DirectUIHWND” Title: “” WM_NCHITTEST (0×84) wParam: 0×0 lParam: 0×640406
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E30  Class: “DUIViewWndClassName” Title: “” WM_NCHITTEST (0×84) wParam: 0×0 lParam: 0×640406
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E32  Class: “DirectUIHWND” Title: “” WM_SETCURSOR (0×20) wParam: 0×10e32 lParam: 0×2000001
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E30  Class: “DUIViewWndClassName” Title: “” WM_SETCURSOR (0×20) wParam: 0×10e32 lParam: 0×2000001
21:5:42:074 S PID: a7c TID: 554 HWND: 0×00010E20  Class: “ShellTabWindowClass” Title: “Release” WM_SETCURSOR

[…]

Ususally when we see an error indication we select its current thread of activity and investigate what happened in this process and thread before. Here is a synthesised example from real CDF traces:


No     PID  TID  Time         Message
[...]
165797 4280 5696 07:07:23.709 FreeToken Handle 00000000 
165798 4660 7948 07:07:23.709 EnumProcesses failed. Error=-2144534527
165799 7984 6216 07:07:23.749 GetData threw exception
165800 7984 6216 07:07:23.750 === Begin Exception Dump === 
[…]

We see the error and sort by TID 7948 to see what happened before and get additional information like server name: 

No     PID  TID  Time         Message
[...]
165223 4660 7948 07:07:23.704 GetServerName: Exit. ServerName = SERVER02 
165224 4660 7948 07:07:23.704 GetServerProcesses: ServerName is SERVER02
165798 4660 7948 07:07:23.709 EnumProcesses failed. Error=-2144534527 

[…]

- Dmitry Vostokov @ TraceAnalysis.org -

Trace Analysis Patterns (Part 6)

Wednesday, July 29th, 2009

Often analysis of software traces starts with searching for short textual patterns, like a failure or an exception code or simply the word “exception”. And indeed, some software components are able to record their own exceptions or exceptions that were propagated to them including full stack traces. This is all common in .NET and Java environments. Here is a synthetic and beautified example based on real software traces:

N      PID  TID  Message
[...]
265799 8984 4216 ComponentA.Store.GetData threw exception: ‘System.Reflection.TargetInvocationException: DCOM connection to server failed with error: ‘Exception from HRESULT: 0×842D0001′ —> System.Runtime.InteropServices.COMException (0×842D0001): Exception from HRESULT: 0×842D0001
   at ComponentA.GetData(Byte[] pKey)
   at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
   at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at ComponentB.Connections.ComInterfaceProxy.Invoke(IMessage message)’ 
265800 8984 4216 === Begin Exception Dump === 
265801 8984 4216 ComponentB.Exceptions.ConnectionException: DCOM connection to server failed with error: ‘Exception from HRESULT: 0×842D0001′ —> System.Runtime.InteropServices.COMException (0×842D0001): Exception from HRESULT: 0×842D0001
265802 8984 4216 at ComponentA.Store.GetData(Byte[] pKey) 
[…]
265808 8984 4216 Exception rethrown at [0]:
265809 8984 4216 at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg) 
265810 8984 4216 at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type) 
265811 8984 4216 at ComponentA.Store.GetData(Byte[] pKey) 
265812 8984 4216 at ComponentA.App.EnumBusinessObjects()

[…]
265816 8984 4216 ===> InnerException: 
265817 8984 4216 ** COM Exception Error Code: 0×842d0001 
265818 8984 4216 System.Runtime.InteropServices.COMException (0×842D0001): Exception from HRESULT: 0×842D0001 
265819 8984 4216 at ComponentA.Store.GetData(Byte[] pKey) 
265820 8984 4216 === End Exception Dump === 
[…]

In the embedded stack trace we see that App object was trying to enumerate business objects and asked Store object to get some data and the latter object was probably trying to communicate to the real data store via DCOM. The communication attempt failed with HRESULT.

- Dmitry Vostokov @ TraceAnalysis.org -

Welcome to TraceAnalysis.org!

Wednesday, June 3rd, 2009

DumpAnalysis.org acquires TraceAnalysis.org to complete computer DATA artifact analysis. The domain currently points to Dump Analysis Portal page but this might change in the future.

- Dmitry Vostokov @ DumpAnalysis.org -

Trace Analysis Patterns (Part 3)

Wednesday, June 3rd, 2009

Next obvious structural trace analysis pattern is called Circular Trace. Sometimes this information is missing in the problem description or trace metadata doesn’t reflect this. Then circular traces can be detected by trace file size (usually large) and from timestamps, like this 100Mb CDF trace snippet:

No     Module  PID  TID  Date      Time         Statement
[Begin of trace listing]
1      ModuleA 4280 1736 5/28/2009 08:53:50.496 [... Trace statement 1]
2      ModuleB 6212 6216 5/28/2009 08:53:52.876 [... Trace statement 2] 
3      ModuleA 4280 4776 5/28/2009 08:54:13.537 [... Trace statement 3] 
[... Some traced exceptions helpful for analysis ...]
3799   ModuleA 4280 3776 5/28/2009 09:15:00.853 [… Trace statement 3799]
3800   ModuleA 4280 1736 5/27/2009 09:42:12.029 [… Trace statement 3800]
[… Skipped …]
[… Skipped …]
[… Skipped …]
579210 ModuleA 4280 4776 5/28/2009 08:53:35.989 [… Trace statement 579210]
[End of trace listing]

In such traces the analysis region usually can be found at the beginning at the trace because as soon as elusive and hard to reproduce problem happens the trace is stopped.

- Dmitry Vostokov @ TraceAnalysis.org -

Trace Analysis Patterns (Part 2)

Thursday, May 7th, 2009

A typical trace is a detailed narrative. It is accompanied by a problem description that lists essential facts. Therefore the first task of any trace analysis is to check the presence of Basic Facts in the trace. If they are not visible or do not correspond then the trace was possibly not recorded during the problem or was taken from a different computer or under different conditions. Here is an example. A user “test01″ cannot connect to an application. We look at the trace and find this statement:

No   PID  TID  Date      Time         Statement
[...]
3903 3648 5436 4/29/2009 16:17:36.150 User Name: test01
[...]

At least we can be sure that this trace was taken for the user test01 especially when we expect this or similar trace statements. If we could not see this trace statement we can suppose that the trace was taken at the wrong time, for example, after the problem happened already.

- Dmitry Vostokov @ TraceAnalysis.org -

Trace Analysis Patterns (Part 1)

Tuesday, April 28th, 2009

After coming back to engineering I decided to expand the domain of my research and start the new series of posts called Trace Analysis Patterns. In addition to Citrix CDF / Microsoft ETW traces I plan to cover other variants based on my extensive software engineering background in the past where I used tracing in software products ranging from soft multi-platform real-time systems to static code analysis tools. Connection with memory dump analysis will be covered too because sometimes the combination of static and dynamic data leads to interesting observations and helps to troubleshoot and resolve customer problems especially when not all data can be collected dynamically.

In fact, stack traces and their collections are specializations of the more general traces. Another example is historical information in memory dump files especially when it is somehow timestamped.  

In this part I start with the obvious and to some extent the trivial pattern called Periodic Error. This is an error or a status value that is observed periodically many times:

No     PID  TID   Date      Time         Statement
[...]
664957 1788 22504 4/23/2009 17:59:14.600 MyClass::Initialize: Cannot open connection “Client ID: 310″, status=5  
[…]
668834 1788 19868 4/23/2009 19:11:52.979 MyClass::Initialize: Cannot open connection “Client ID: 612″, status=5 
[…]

or 

No     PID  TID   Date      Time         Statement
[...] 
202314 1788 19128 4/21/2009 16:03:46.861 HandleDataLevel: Error 12005 Getting Mask
[…]
347653 1788 17812 4/22/2009 13:26:00.735 HandleDataLevel: Error 12005 Getting Mask
[…]

Here single trace entries can be isolated from the trace and studied in detail. 

Be aware though that some modules might report periodic errors that are false positive, in the sense, that they are expected as a part of implementation details, for example, when a function returns an error to indicate that bigger buffer is required or to estimate its size for a subsequent call. It merits its own pattern name and I come to it next time with more examples.

I also created a page where I’ll will be adding all tracing patterns:

Trace Analysis Patterns   

- Dmitry Vostokov @ TraceAnalysis.org -

Debugger Log Reading Techniques (Part 1)

Thursday, February 26th, 2009

Debugger logs (textual output) from commands like !process 0 ff and various scripts can be very long and consist of thousands of pages. I found the following reading technique useful for my daily memory dump analysis activities:

CSA-QSA

Checklists-Skim-Analyze—Questions-Survey-Analyze   

1. First, have a checklist

2. Skim through the log several times

3. Write analysis notes

4. Have a list of questions based on problem description and steps 1-3

5. Survey the log

6. Write analysis notes

Repeat steps 2,3 and 5,6 if necessary.

This technique can also be applied to reading any large logs, for example, voluminous CDF or ETW traces.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.53

Monday, October 20th, 2008

Trace “back a little to” debug “further.”

John Clarke (1596-1658), Proverbs: English and Latine

- Dmitry Vostokov @ DumpAnalysis.org -

MDAA Volume 2 is available on Amazon and B&N

Saturday, October 18th, 2008

Paperback edition of Memory Dump Analysis Anthology, Volume 2 is finally available on Amazon and Barnes & Noble. Search Inside is also available on Amazon. In addition, I updated the list of recommended books:

Listmania! Crash Dump Analysis and Debugging

Hardcover edition will be available on Amazon and B&N in 2-3 weeks.

- Dmitry Vostokov @ DumpAnalysis.org -

Memory Dump Analysis Anthology, Volume 2

Friday, October 3rd, 2008

“Everything is memory dump.”

I’m very excited to announce that Volume 2 is available in paperback, hardcover and digital editions:

Memory Dump Analysis Anthology, Volume 2

In one or two weeks paperback edition should also appear on Amazon and other bookstores. Amazon hardcover edition is planned to be available by the end of October.

I’m often asked when Volume 3 is available and I currently plan to release it in October - November, 2009. In the mean time I’m planning to concentrate on other publishing projects. 

- Dmitry Vostokov @ DumpAnalysis.org -

MDAA Volume 2: Table of Contents

Wednesday, October 1st, 2008

The book is nearly finished and here is the final TOC:

Memory Dump Analysis Anthology, Volume 2: Table of Contents

- Dmitry Vostokov @ DumpAnalysis.org -

Lean Tracing

Sunday, July 13th, 2008

Sometimes ETW (or CDF) traces can be really huge. Unless we trace the elusive but the specific error we already know about, there is no need to make such traces if we can reproduce the issue. My favourite example is connectivity problems when you cannot connect to a terminal server. The best way is to start tracing, try to connect, get an error and stop tracing. Usually it takes no more than a couple of minutes. We can even trace all modules here just to make sure that we don’t miss anything. It is also better to focus on one specific scenario per one lean trace instead of packing several of them into one big trace.

- Dmitry Vostokov @ DumpAnalysis.org -

ETW Tracing Tools

Tuesday, May 13th, 2008

There are few tools available. Please drop a comment if you know any other. As the focus here is mainly troubleshooting Citrix terminal service environments I put links to Citrix articles where possible:

- Dmitry Vostokov @ DumpAnalysis.org -

The First Windows® Memory Dump Analysis Book!

Tuesday, April 15th, 2008

I’m very proud to announce that it is finally available in both paperback and hardback. Why have I made available both editions? Because I personally prefer hardcover books. You can order the book today and it will be printed in 3-5 days (paperback) or 5-10 days (hardcover) and sent to you:

Memory Dump Analysis Anthology, Volume 1

Note: although listed on Amazon and other online bookstores it is not immediately available at these stores at the moment due to the late submission. I apologize for this. However, I expect that in a few weeks pre-orders taken there will be eventually fulfilled. In the mean time, if you want the book now, you can use the link above.

- Dmitry Vostokov @ DumpAnalysis.org -

CDF Traces: Using Performance Analyzer

Saturday, April 12th, 2008

Because Citrix CDF traces are ETW based it should be possible to process and analyze them using XPrerf and XPerfView from the latest Microsoft Windows Performance Tools:

Windows Performance Tools Kit, v.4.1.1

- Dmitry Vostokov @ DumpAnalysis.org -

CDF Traces: Analyzing Process Launch Sequence

Monday, March 31st, 2008

Citrix CDF traces are based on ETW (Event Tracing for Windows) and therefore Citrix customers, their support personnel and developers can use MS TraceView tool for troubleshooting Citrix terminal service environments:

Viewing Common Diagnostics Facility (CDF) Traces Using TraceView

In cases with slow logon or slow process startup we can analyze process launch sequence to determine delays. In the output trace we can filter tzhook module messages which also contain session id (this is quite handy to differentiate between different sessions), for example: 

PID      TID      TIME          MESSAGE     
21864    21912    06:34:53.598  tzhook: Attach on process - cmd.exe session=51         
21620    20372    06:34:59.754  tzhook: Attach on process - acregl.exe session=51         
18668    21240    06:35:02.704  tzhook: Attach on process - cmstart.exe session=51         
18560    18832    06:35:02.735  tzhook: Attach on process - wfshell.exe session=51         
18204    20060    06:35:06.575  tzhook: Attach on process - icast.exe session=51         
20640    21104    06:35:07.717  tzhook: Attach on process - LOGON.EXE session=51         
21188    21032    06:35:07.938  tzhook: Attach on process - cscript.exe session=51         
21888    19592    06:35:11.157  tzhook: Attach on process - WScript.exe session=51         
20600    20732    06:35:11.780  tzhook: Attach on process - admin.exe session=51         
17976    20456    06:35:18.752  tzhook: Attach on process - winlogon.exe session=53         
21332    13156    06:35:51.625  tzhook: Attach on process - mpnotify.exe session=53         
10988    10732    06:35:57.043  tzhook: Attach on process - rundll32.exe session=53

Here is another process launch sequence for published Notepad application:

PID     TID     TIME            MESSAGE
15828   18720   15:34:02.637    tzhook: Attach on process - winlogon.exe session=2 
5300    18508   15:34:03.043    tzhook: Attach on process - mpnotify.exe session=2 
17948   19300   15:34:03.356    tzhook: Attach on process - userinit.exe session=2 
17956   19316   15:34:03.415    tzhook: Attach on process - cmd.exe session=2 
5384    5324    15:34:03.443    tzhook: Attach on process - cmd.exe session=2 
19432   19264   15:34:03.461    tzhook: Attach on process - SSONSVR.EXE session=2 
12480   7472    15:34:03.466    tzhook: Attach on process - cmd.exe session=2 
19448   19364   15:34:03.474    tzhook: Attach on process - net.exe session=2 
19416   19656   15:34:03.489    tzhook: Attach on process - acregl.exe session=2 
19480   19596   15:34:03.544    tzhook: Attach on process - cmstart.exe session=2 
664     19512   15:34:03.559    tzhook: Attach on process - wfshell.exe session=2 
19904   13140   15:34:03.610    tzhook: Attach on process - net.exe session=2 
6864    20036   15:34:03.746    tzhook: Attach on process - icast.exe session=2 
19540   20016   15:34:03.749    tzhook: Attach on process - ctfmon.exe session=2 
19944   19032   15:34:03.757    tzhook: Attach on process - net.exe session=2 
10232   18356   15:34:03.787    tzhook: Attach on process - notepad.exe session=2

Such sequences are also useful to determine a process upon which the session initialization or startup sequence hangs. In this case a user dump of that process might be useful.

Of course we can do all this with Process Monitor and other similar tools but here we get other Citrix related trace messages as well. All in one.

- Dmitry Vostokov @ DumpAnalysis.org -