Clipboard Issues Explained

December 9th, 2006

I believe every Citrix user has experienced clipboard breaks at least once. I remember my frustration when I couldn’t copy between Outlook and Vantive sessions, and so 2.5 years ago I wrote the RepairCBDChain tool to help temporarily restore clipboard functionality. Recently this feature was incorporated into the ICA client. You can read about it in the client readme file (1. … [From 9.100][#112636]). However, it is not enabled by default, so you can still benefit from this tool if you experience clipboard breaks on the server side, if you want to restore clipboard functionality immediately on your client without closing your session to apply changes to appsrv.ini, or if you are simply still using an old client.

A month ago I promised to explain how my tool works. You all know that the primary method for notifying windows about various events is the window message mechanism. One of these notifications is the clipboard notification message, WM_DRAWCLIPBOARD. Usually an application does not know whether the clipboard content has changed when another program copies new data. Generally, if you open the Edit menu you see Paste enabled if there is data in the clipboard. This is done by the application code itself: it checks whether the clipboard is non-empty and enables the Paste menu item if it is, or disables it otherwise. The ICA client (wfica32.exe), however, needs to know whether the clipboard contains new data in order to send it down the ICA channel to a server session.

Windows has a mechanism to notify applications about clipboard changes. An application interested in such notifications has to register itself in the so-called clipboard chain. Windows inserts it at the top of that chain, and the application is then responsible for propagating changes down the chain:

rc1.jpg

If a 3rd-party application forgets to forward notifications down the chain, then we have a broken clipboard chain and clipboard changes are not sent via the ICA protocol:

rc2.jpg

If you run RepairCBDChain.exe it tries to find the window of wfica32.exe and registers it for clipboard notifications again:

rc3.jpg 

However, if it finds the second instance of wfica32.exe (as in the picture above), the first instance will still be cut off from notifications, and this explains why RepairCBDChain.exe sometimes doesn’t work.

On the server session side the picture is similar (the registered application is wfshell.exe):

rc4.jpg

rc5.jpg

rc6.jpg

You can see WM_DRAWCLIPBOARD messages in MessageHistory logs for the wfica32.exe process:

PID.TID: c20.c0c

HWND: 0x002501D4
Class: "wMFService006600CA004"
Title: "Microsoft Outlook7718 - MetaFrame Presentation Server Client [SpeedScreen On]"

HWND: 0x003F08DC
Class: "Transparent Windows Client"
Title: "^P ^b24 of 24 - Clipboard^b^SItem collected. - \\Remote"

HWND: 0x004E0332
Class: "WFClip"
Title: "WFClip"
17:58:53:484 S WM_DRAWCLIPBOARD (0x308) wParam: 0xd0aa0 lParam: 0x0

HWND: 0x0094036E
Class: "TWI Link"
Title: ""

I hope this little excursion explained the clipboard chain, how it becomes broken and how it is repaired.

- Dmitry Vostokov -

Dmp2Txt: Solving Security Problem

December 9th, 2006

This is a follow-up to my previous Q&A about crash dumps and security issues like exposing confidential information stored in memory: Crash Dumps and Security. It seems a solution exists that allows some sort of crash dump analysis, or at least identification of problem components, without sending complete or kernel memory dumps.

This solution takes advantage of the WinDbg ability to execute scripts of arbitrary complexity. A couple of months ago I wrote about scripts, and they really help me in pulling out various information from complete memory dumps:

WinDbg scripts
Yet another Windbg script
Critical sections

Now I have created a bigger script that combines the commands most frequently used to identify potential problems in memory dumps:

  • !analyze -v
  • !vm 4
  • lmv
  • !locks
  • !poolused 3
  • !poolused 4
  • !exqueue f
  • !irpfind
  • !stacks
  • List of all processes’ thread stacks, loaded modules and critical sections (for complete memory dump)

Other commands can be added if necessary.

How does all this work? A customer has to install Debugging Tools for Windows from Microsoft. This can be done on any workstation and not necessarily in a production environment. Then the customer has to run WinDbg.exe with some parameters including path(s) to symbols (-y), a path to memory dump (-z) and a path to script (-c):

C:\Program Files\Debugging Tools for Windows>WinDbg.exe -y "srv*c:\mss*http://msdl.microsoft.com/download/symbols" -z MEMORY.DMP -c "$$><c:\WinDbgScripts\Dmp2Txt.txt;q" -Q -QS -QY -QSY

Once WinDbg.exe finishes (it can run for a couple of hours if you have many processes in your complete memory dump) you can copy the .log file created in the “C:\Program Files\Debugging Tools for Windows” folder, archive it and send it to support for analysis. Kernel and process data and cached files are not exposed in the log! And because this is a text file, the customer can inspect it before sending.

Here are the contents of Dmp2Txt.txt file:

$$
$$ Dmp2Txt: Dump all necessary information from complete full memory dump into log
$$
.logopen /d
!analyze -v
!vm 4
lmv
!locks
!poolused 3
!poolused 4
!exqueue f
!irpfind
!stacks
r $t0 = nt!PsActiveProcessHead
.for (r $t1 = poi(@$t0); (@$t1 != 0) & (@$t1 != @$t0); r $t1 = poi(@$t1))
{
    r? $t2 = #CONTAINING_RECORD(@$t1, nt!_EPROCESS, ActiveProcessLinks);
    .process @$t2
    .reload
    !process @$t2
    !ntsdexts.locks
    lmv
}
.logclose
$$
$$ Dmp2Txt: End of File
$$

For kernel dumps the script is simpler: 

$$
$$ KeDmp2Txt: Dump all necessary information from kernel dump into log
$$
.logopen /d
!analyze -v
!vm 4
lmv
!locks
!poolused 3
!poolused 4
!exqueue f
!irpfind
!stacks
!process 0 7
.logclose
$$
$$ KeDmp2Txt: End of File
$$

Note: if the dump was generated by LiveKd.exe, the scripts may run forever because such a dump can be inconsistent.

- Dmitry Vostokov -

New TestDefaultDebugger Tool

December 6th, 2006

It often happens that Citrix support advises customers to change their default postmortem debugger to NTSD. But there is no way to test the new settings unless some application crashes again. Some customers come back saying that dumps are not saved despite the new settings, and we don’t know whether that is because a crash hasn’t happened yet, because the default debugger hasn’t been configured properly, or because something else happened.

In addition the arrival of 64-bit Windows brings another problem: there are 2 default postmortem debuggers on 64-bit Windows (for 32-bit and 64-bit applications respectively):

NTSD on x64 Windows

The new tool TestDefaultDebugger forces a crash on itself to test the presence and configuration of the default postmortem debugger (Dr. Watson, NTSD or other). If the default postmortem debugger is configured properly, the OS will launch it to save a dump of the TestDefaultDebugger.exe process.

 

If you enabled NTSD as a default postmortem debugger (CTX105888) the following console window will briefly appear:

Postmortem debuggers are explained here:

Dumps for Dummies (Part 3)

On 64-bit Windows you can run both the 32-bit TestDefaultDebugger.exe and the 64-bit TestDefaultDebugger64.exe applications and then open the crash dumps to see whether both postmortem debuggers have been configured properly. The tool also has a command line interface so you can use it remotely:

c:\>TestDefaultDebugger.exe now

You can download the tool from Citrix support web site:

TestDefaultDebugger v1.0 for 32-bit and 64-bit platforms

- Dmitry Vostokov @ DumpAnalysis.org -

Dumps and Systems Theory

November 24th, 2006

The environment where Citrix software operates is so complex that some education in Systems Theory and a basic understanding of “cause and effect” and the impossibility of “action at a distance” is needed. In a forthcoming mini-series I will try to highlight some of these notions.

- Dmitry Vostokov -

Inside Citrix - November 2006

November 22nd, 2006

Welcome to Inside Citrix. This monthly column gives a glimpse of different aspects of Citrix through our people. Our guests have different areas of responsibility and expertise to give you an idea of what is happening behind the scenes. We discuss items of interest with people from Product Readiness, Escalation, Technical Support, and Engineering just to name a few.

In this installment of Inside Citrix, we discuss the meaning of life with Dmitry Vostokov, EMEA Development Analysis Team Lead.

Q: Hello Dmitry, how are you? I am very happy to conduct this interview as you are a creative and prolific worker. I wonder…has fame caught up to you yet, due to your creativity?

A: I’m fine, thank you! I believe there is a synergistic effect going on here. I make the company famous and the company makes me famous.

Q: So, before I get too far ahead of myself, please tell everyone a bit of your history. Where are you from? What did you do before Citrix? How long have you been with us? What kinds of things have you been doing at Citrix during your tenure?

A: I’m from Russia. I was born near Moscow and I spent 14 years there after enrolling at Moscow State University to study chemistry. In that university, I saw a computer and immediately started programming. My first program was written in FORTRAN and had almost 200 lines. My second program had commercial success: I ported 800 FORTRAN lines to about 2000 PDP-11 assembler lines and achieved a 25 percent increase in speed (the program calculated rocket fuel properties for weeks). Since then I’d been working from home for some U.S. and Russian ISV companies (mostly in speech and image processing domains) until 1999, when I went to work in an office to see a large software factory from the inside out.

In 2001 I went to Ireland to learn English. My first job in Ireland was with Ericsson in a small town as a Senior Software Designer. The title sounded great to me, but I heard rumors that the only engineers in Ericsson were hardware engineers. So that job didn’t last long because I was headhunted by a company called Programming Research and I relocated to Dublin. I spent 1.5 years there and after working briefly for a security company (that company is extinct now) I was hired by Citrix. I’ve already spent 3.16 years here. For Citrix I analyze crash dumps and provide recommendations. It’s like being a computer psychologist assessing brain damage. I also do a bit of escalation work when I have time. I like to provide full escalation and software maintenance cycles whenever I have sufficient resources to analyze the problem, contact the customer, and provide the resolution. I also have an opportunity here to apply my software design and programming skills by writing various troubleshooting tools.

Q: Most people probably didn’t know all of that. I guarantee you that Escalation knows you well. How is the blogging going? How can readers get to your blog?

A: I love blogging. I didn’t even think about blogging until I suddenly realized its potential in information sharing. When I joined the company there was not enough information available about crash dump analysis, so I had to learn on my own. Now I’m happy to share what I have learnt with everyone.

One topic I like to write about in my blog at the moment is crash dump analysis patterns and anti-patterns, where I summarize general solutions you can apply or should not apply in specific contexts to common recurrent dump analysis problems.

More will come…

Q: And the tools that you create, very useful! Can you take a moment to talk about each of the ones you have created? Which ones have you gotten the best feedback about? Which ones have been the most useful?

A: Thanks! I use them too. The tool I got the most complaints about is RepairCBDChain; the tool with the fewest complaints is SystemDump. I got the best feedback about PDBFinder.

All of them are useful in certain troubleshooting scenarios. I’m preparing a presentation about all these tools and I will present it to the EMEA TRM team in December. I’ll definitely publish it as soon as I get feedback about that training.

Here are brief descriptions of these tools (most of them have different versions for various platforms, and some were even ported to Windows Mobile):

• RepairCBDChain: Repairs clipboard functionality and magically you are able to copy/paste again (not always actually – I promise to write a blog post explaining why).

• ADSCleaner: Cleans Windows NT File System (NTFS) file streams created by Citrix memory optimization code if you no longer need this feature (it also frees disk space, by the way).

• ProcessHistory: Tracks processes, threads, and modules on 32-bit and 64-bit platforms. I’m going to release a Windows Mobile version soon.

• MessageHistory: Tracks window messages. It’s similar to Spy++ but much easier to use for troubleshooting and it works on 64-bit platforms too.

• WindowHistory: Tracks windows as they change their appearance, are created, and are destroyed and saves a log file. This is what Spy++ lacks and it was the primary motivation to write this tool.

• SystemDump: Forces a dump immediately or after a specified period of time. This can be done remotely too. It works on both 32-bit and 64-bit Windows! My primary motivation was that the OSR “bang” tool doesn’t work on 64-bit Windows.

• PDBFinder: Helps to find symbol files if you have zillions of them.

• DumpCheck: Verifies that you have a valid dump and even provides recommendations to avoid common mistakes before sending dumps to support.

• CtxHidEx32: Can hide any annoying windows or message boxes and reduce unnecessary support calls. It also has a peculiar feature: you can specify an action to do before hiding the window. When the Media Player window appears it can send a message to your boss.

• Dump2Wave: My most controversial tool that allows you to hear the sound of memory corruption. Some people say it’s useless but I would say it is entertaining.

Some other upcoming tools I’m working days and nights on (when I have free time) are:

• DumpDepends: Helps to automate repetitive dumping.

• DumpAlerts: Provides notification whenever a new dump is saved.

• SessionHistory: Tracks session information.

• HistoryToolbar: Organizes “History” tools into one coherent super tool.

• DumpPlayer: Plays musical dumps in real-time and provides visual images based on crash dump memory contents. I coined a term—Dump Tomography—for this.

Q: They must take some upkeep, as we see a lot of improvements, updates, and so on. I also see you provide a lot of training information on escalation techniques, debugging, analysis, and more. What do you believe is the most important characteristic of a successful escalation engineer?

A: As Winston Churchill said: “Never, never, never give up!”

Q: Any advice for Citrix administrators who might be reading this on how to avoid trouble or have their environment best situated to speed resolution, should an issue occur?

A: If you are asked to generate and/or collect crash dumps, please tell support personnel how you got that dump. And ensure that you are sending the right dump for the right issue.

I started writing Dumps for Dummies blog posts to explain dumps and I promise to continue and expand them.

Q: What do you find most challenging about your job?

A: To work with enormous amounts of information and make quick decisions at the same time.

Q: Is there anything you can share with us about new Citrix products or technologies (not giving away confidential information) that you are excited about?

A: I would tell you that with whatever new technology comes along, crash dumps will be the same! And this gives me some optimism. Whether there will be more or less crash dumps in the future is pretty confidential though…

Q: Any plans to visit Citrix headquarters in Fort Lauderdale, Florida?

A: I’m actually visiting Citrix headquarters at the end of this month! See you there.

Q: Not so much a question, make us laugh!

A: One day we got a fax from a customer where all of the blue screen information was written down by hand—hundreds of digits… How long it took to copy all that from the screen and whether or not he made any mistakes, we will never know. The copy from that fax is still hanging on my desk wall.

Q: What do you do in your free time besides analyzing dumps, debugging and programming?

A: Read books. I read lots of them and about quite diverse subjects. However, my favorite subject for the last four years has been math—the more abstract the better.

It really helps in improving the critical thinking skills required for my job.

Thanks, Dmitry. People will know to look you up online…

WindowHistory Mobile (new release)

November 22nd, 2006

WindowHistory Mobile edition has been updated. It replaces the previous version, WindowHistory CE/Mobile 2.1, and is now available as two separate executables: for Windows Mobile 5.0 (ARMV4I) and Windows Pocket PC 2003 (ARMV4). It has been tested under emulators, on an Acer n300 (480×640 screen) and on a Mio A701 mobile phone (240×320 screen). Here are screenshots from the Windows Mobile 5.0 emulator:

whm50.jpg

whm50w.jpg

The tool also includes Easter Egg (activate soft keyboard, click on and then click on About button. The following window appears with scrolling text of contributors and special thanks):

whm50a.jpg

- Dmitry Vostokov -

Voices from Process Space

November 19th, 2006

Following the release of the Dump2Wave tool, some members of the Citrix community have been asking me to provide some interesting sound fragments from dump files. I was also particularly interested in catching voices from the past: embedded fragments of human voice. So I recorded my “Hello” message, played it in Media Player and then saved a process dump. Then I converted the dump to a CD-quality wave file and saved interesting sound fragments from it (to conserve space, since the original wave file was 76Mb).

To listen to these fragments you can download wave files from the following location:

DumpSounds.zip (8Mb)

Here is the description of what I heard in these wave files:

- dump1.wav

  • violin
  • aliens
  • train sound
  • Hello

- dump2.wav

  • electric guitar
  • signals from cosmos

- dump3.wav

  • Morse code alphabet

- dump4.wav

  • helicopter

- dump5.wav

  • horn
  • some interesting noise and fragments of electronic music

 Enjoy :-)

Of course, you can convert kernel memory dumps to wave files and hear voices from kernel space too…

- Dmitry Vostokov -

Preview of DumpAlerts tool

November 19th, 2006

The tool monitors folders where dumps can be saved including Dr. Watson, a folder specified when NTSD is set as a default debugger, etc. It then alerts a user, an administrator or a software vendor whenever a new dump is saved:

  • Icon in System Tray changes its color from green to red
  • Popup window appears until dismissed
  • E-mail is sent to a specified address
  • Sound is played
  • Custom action is executed, for example, automatically launching WinDbg.exe with the latest dump or copying it to an ftp server

All actions are fully configurable and can be enabled/disabled. Here is the screenshot of the main window:

I’m planning to include TAPI support and alerts from hung applications in the next version(s).

Later this tool will be included in Dump Monitor Suite.

Any comments and suggestions are welcome.

- Dmitry Vostokov -

Crash Dumps for Dummies (Part 4)

November 19th, 2006

In the previous Dumps for Dummies (Part 3) I tried to explain the nature of crashes. Another category of problems happens very often, and we also need a dump for its analysis: hangs. Some confusion exists about the difference between these two categories, crash and hang. Although sometimes a hang is a direct consequence of a crash, most of the time hangs happen independently. They also manifest themselves differently. Let’s look at application (process) crashes and hangs first. When a crash happens, an application (process) often disappears. When a hang happens, an application (process) is still in memory: you can see it in Task Manager, for example, but it doesn’t respond to user commands or to any other requests like pinging a TCP/IP port. If we have a crash in the OS, then the most visible manifestation is a blue screen and/or reboot. If we have a hang, then everything freezes.

Application or system hangs happen because, from a high-level view, the interaction between application or OS components (modules) is done via messages. One component sends a message to another and waits for a response. Some components are critical, for example, the registry. The following hand-made picture depicts a very common system hang situation, when the registry component stops responding. Then every running application (process) stops responding if its execution path depends on registry access.

A very common reason for a hang is a so-called deadlock, when two running applications (their execution paths, threads) are waiting for each other. Here is the analogy with a blocked road:

In order to see what’s inside the process or OS that caused a hang, we need a dump. Usually this dump is called a crash dump too, because the usual method of getting it is to set some sort of trap that causes the application or OS to crash and save the dump. I personally prefer to call these dumps just memory dumps to avoid confusion.

How can you get a memory dump if your application or service hangs?

How can you get a memory dump if your system hangs?

For most system hangs choosing Kernel memory dump option in Control Panel\System\Advanced\Startup and Recovery applet is sufficient. Kernel memory dumps are smaller and less susceptible to corruption or truncation due to small page file size. If you discover that you need to peer inside running user applications then you can always ask for another Complete memory dump when the problem happens again.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dumps and Security

November 17th, 2006

Suppose you work in the banking industry or for any company that has sensitive information. Is it secure to send a crash dump outside for analysis? One semi-anonymous person asked this question on www.dumpanalysis.org, and here is my unedited answer, based on my experience in crash dump analysis and kernel-level development:

"It depends on credit card transactions software design and architecture and what type of dump is configured in Control Panel\System\Advanced\Startup and Recovery applet: Small, Kernel or Complete.

Software usually encrypts data before sending it down TCP/IP stack or other network protocol. If your credit card transactions software doesn't have any kernel space encryption drivers and doesn't rely on any MS or other 3rd-party encryption API that might send data to kernel, communicate to KSECDD or to user-space component like LSASS via LPC/RPC you can safely assume that kernel memory dumps will not have unencrypted data. If encryption is done entirely in user space Small memory dump and Kernel memory dump will only have encrypted fragments. Otherwise there is a probability that BSOD happens just before encryption or after decryption or when secure protocol is being handled. This exposure can even happen in Small memory dumps if BSOD happens in the thread that handles sensitive information in kernel mode.

The same applies if your software stores credit data on any medium. If it stores only encrypted data and decrypts entirely in user space without any transition to kernel it should be safe to enable kernel memory dump.

If your goal is ultimate security then even Small memory dump (64Kb) should not be allowed. But in reality as we consider probabilities sending small memory dump is equivalent to no more than exposing just one credit card number or one password.

What you must avoid at any cost is to enable complete memory dump option in control panel. In this case all your credit card transactions software code and data including file system cache will be exposed.

Contrary to complete memory dump kernel memory dump will not have much data even if some potion of it is being communicated during crash time. I would also be interested in hearing what other experts say. This is very interesting topic."

If you are interested too you can participate in that discussion (registration is needed to avoid spammers):

http://www.dumpanalysis.org/forum/viewtopic.php?t=56

- Dmitry Vostokov -

How WINE can help in Crash Dump Analysis

November 16th, 2006

You probably already know or have heard about the project WINE: Windows API on top of X and Unix

winehq.com 

I first heard about it more than 10 years ago when it started. Today I rediscovered it and was really surprised. I was looking for one NT status code I couldn’t find in the official MS documentation and found it here:

dlls/ntdll/error.c

In order to run Win32 programs WINE emulates all API calls including OLE32, USER32, GDI32, KERNEL32, ADVAPI32 and of course, NTDLL:

dlls/ntdll
dlls/ole32
dlls/user32
dlls/kernel32
dlls/gdi32
dlls/advapi32

Plus hundreds of other components. All source code is located here:

http://cvs.winehq.com/cvsweb/wine/

So if you want to see how a particular function or protocol might hypothetically have been implemented by Windows OS designers, it is a good place to start.

- Dmitry Vostokov -

Horrors of debugging legacy code

November 8th, 2006

We all know that macro definitions in C and C++ are evil. They cause maintenance nightmares by introducing subtle bugs. I never took that seriously until last weekend, when I was debugging my old code written 10 years ago, which uses macros written 15 years ago :-)

My Windows Mobile 5.0 application was crashing when I was using POOM COM interfaces (Pocket Outlook Object Model). The crash never pointed to my code. It always happened after pimstore.dll and other MS modules were loaded and COM interfaces started to return errors. I first suspected that I was using POOM incorrectly and rewrote all the code several times and in different ways. No luck. Then I tried the PoomMaster sample from the Windows Mobile 5.0 SDK and it worked well. So I rewrote my code in exactly the same way as in that sample. No luck. My last hope was that moving the code from my DLL to the EXE (as in the sample SDK project) would eliminate the crashes, but it didn’t help either. Then I slowly started to realize that the problem might be in my old code, and I also noticed that one old piece of code had never been used before. So I started debugging by elimination (commenting out less and less code) until I found a macro. I had to stare at it for a couple of minutes until I realized that one pair of brackets was missing. That caused less memory to be allocated and, worse, the returned pointer to the allocated memory was multiplied by 2! So the net result was a pointer pointing into other modules, and the subsequent string copy was effectively overwriting their memory and eventually causing crashes inside MS DLLs.

Here is that legacy macro:

#define ALLOC(t, p, s) \
((p)=(t)GlobalLock(GlobalAlloc(GHND, (s))))

It allocates memory and returns a pointer. It should have been called like this, with the whole size expression passed as the last macro argument:

if (ALLOC(LPWSTR,lpm->lpszEvents,
    (lstrlen(lpszMacro)+1)*sizeof(WCHAR)))
{
    lstrcpy(lpm->lpszEvents, lpszMacro);
    lpm->nEvents=lstrlen(lpm->lpszEvents)+1;
}

What I found was a missing opening bracket before lstrlen and a missing last enclosing bracket, so the size argument ended after lstrlen(lpszMacro)+1:

if (ALLOC(LPWSTR,lpm->lpszEvents,
    lstrlen(lpszMacro)+1)*sizeof(WCHAR))
{
    lstrcpy(lpm->lpszEvents, lpszMacro);
    lpm->nEvents=lstrlen(lpm->lpszEvents)+1;
}

The resulting code after macro expansion looks like this:

if (lpm->lpszEvents=(LPWSTR)GlobalLock(GlobalAlloc(GHND,
lstrlen(lpszMacro)+1))*sizeof(WCHAR))

You see that the pointer to allocated memory is multiplied by two and string copy is performed to a random place in the address space of other loaded dlls corrupting their data and causing the process to crash later.

- Dmitry Vostokov -

Crash Dump Analysis Patterns (Part 4)

November 3rd, 2006

After looking at one dump today where all thread environment blocks were zeroed and the import table was corrupt, and recalling some similar cases I encountered previously, I came up with the next pattern: Lateral Damage.

When this problem happens you don’t have much choice, and your first temptation is to apply the Alien Component anti-pattern, unless your module list is corrupt and you have a manifestation of another common problem I will talk about next time: Corrupt Dump.

An anti-pattern is not always a bad solution if it is complemented by subsequent verification and backed by experience. If you get damaged process and thread structures, you can point to a suspicious component (supported by some evidence like raw stack analysis and an educated guess) and request additional dumps in the hope of getting a less damaged process space or seeing that component again. In the end, if removing it stabilizes the customer environment, it proves you were right.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis AntiPatterns (Part 1)

November 1st, 2006

In any domain of activity where patterns exist we can find anti-patterns too. They are bad solutions for recurrent problems in specific contexts. One of them I would like to introduce briefly is Alien Component. In essence, when every technique fails or you run out of WinDbg commands, look at some innocent component you have never seen before or don’t have symbols for, be it some driver or hook. Of course, this component cannot be the component developed by the company you are working for. :-)

- Dmitry Vostokov -

Crash Dump Analysis Patterns (Part 3)

November 1st, 2006

Another pattern I observe frequently is False Positive Dump. We get dumps that point in the wrong direction or are not useful for analysis, and this usually happens when the wrong tool was selected or the right one was not properly configured for capturing crash dumps. Here is one example I investigated in detail.

The customer experienced frequent spooler crashes. The dump was sent for investigation to find an offending component: usually it is a printer driver. WinDbg revealed the following exception thread stack (parameters are not shown here for readability):

KERNEL32!RaiseException+0x56
KERNEL32!OutputDebugStringA+0x55
KERNEL32!OutputDebugStringW+0x39
HPZUI041!ConvertTicket+0x3c90
HPZUI041!DllGetClassObject+0x5d9b
HPZUI041!DllGetClassObject+0x11bb

The immediate response is to point to HPZUI041.DLL but if we look at parameters to KERNEL32!OutputDebugStringA we would see that the string passed to it is a valid NULL-terminated string:

0:010> da 000d0040
000d0040  ".Lower DWORD of elapsed time = 3"
000d0060  "750000."

If we disassemble OutputDebugStringA up to RaiseException call we would see:

0:010> u KERNEL32!OutputDebugStringA KERNEL32!OutputDebugStringA+0x55
KERNEL32!OutputDebugStringA:
push    ebp
mov     ebp,esp
push    0FFFFFFFFh
push    offset KERNEL32!'string'+0x10
push    offset KERNEL32!_except_handler3
mov     eax,dword ptr fs:[00000000h]
push    eax
mov     dword ptr fs:[0],esp
push    ecx
push    ecx
sub     esp,228h
push    ebx
push    esi
push    edi
mov     dword ptr [ebp-18h],esp
and     dword ptr [ebp-4],0
mov     edx,dword ptr [ebp+8]
mov     edi,edx
or      ecx,0FFFFFFFFh
xor     eax,eax
repne scas byte ptr es:[edi]
not     ecx
mov     dword ptr [ebp-20h],ecx
mov     dword ptr [ebp-1Ch],edx
lea     eax,[ebp-20h]
push    eax
push    2
push    0
push    40010006h
call    KERNEL32!RaiseException

There are no jumps in the code prior to the KERNEL32!RaiseException call, and this means that raising the exception was expected. Also, the MSDN documentation says:

“If the application has no debugger, the system debugger displays the string. If the application has no debugger and the system debugger is not active, OutputDebugString does nothing.”

So spoolsv.exe might have been monitored by a debugger which caught that exception and instead of dismissing it dumped the spooler process.

If we look at the ‘!analyze -v’ output we see the following:

Comment: 'Userdump generated complete user-mode minidump
with Exception Monitor function on WS002E0O-01-MFP'
ERROR_CODE: (NTSTATUS) 0x40010006 -
Debugger printed exception on control C.

Now we see that the debugger was User Mode Process Dumper, which you can download from the Microsoft web site:

How to use the Userdump.exe tool to create a dump file 

If we download it, install it and write a small console program in Visual C++ to reproduce this crash:

#include "stdafx.h"
#include <windows.h>
int _tmain(int argc, _TCHAR* argv[])
{
    OutputDebugString(_T("Sample string"));
    return 0;
}

and if we compile it in Release mode and configure Process Dumper applet in Control Panel to include TestOutputDebugString.exe with the following properties:

and then run our program we would see Process Dumper catching KERNEL32!RaiseException and saving the dump.

Even if we choose to ignore exceptions that occur inside kernel32.dll this tool still dumps our process. Now we can see that the customer most probably enabled the ‘All Exceptions’ check box too. What the customer should have done is use the default rules, as in the picture below:

Or select exception codes manually. In this case no dump is generated even if we manually select all of them. To check that the latter configuration still catches access violations we can add a line of code dereferencing a NULL pointer, and Process Dumper will catch it and save the dump.

Conclusion: the customer should have used NTSD as the default postmortem debugger from the start. Then, if a crash happened, we would have seen the real offending component or could have applied other patterns and requested additional dumps.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 2)

October 31st, 2006

Another pattern I would like to discuss is Dynamic Memory Corruption (and its user and kernel variants called Heap Corruption and Pool Corruption). You might have already guessed it :-) It is so ubiquitous. Its manifestations are random, and crashes usually happen far away from the original corruption point. In the user-space part of exception thread stacks (don’t forget about the Multiple Exceptions pattern) you would see something like this:

ntdll!RtlpCoalesceFreeBlocks+0x10c
ntdll!RtlFreeHeap+0x142
MSVCRT!free+0xda
componentA!xxx

or this

ntdll!RtlpCoalesceFreeBlocks+0x10c
ntdll!RtlpExtendHeap+0x1c1
ntdll!RtlAllocateHeap+0x3b6
componentA!xxx

or any similar variant, and you need to find the exact component that corrupted the application heap (which usually is not the same as the componentA.dll you see in the crashed thread stack).

For this common recurrent problem we have a general solution: enable heap checking. This general solution has many variants applied in a specific context:

  • parameter value checking for heap functions

  • user space software heap checks before or after certain checkpoints (like “malloc”/“new” and/or “free”/“delete” calls): usually implemented by checking various fill patterns, etc.

  • hardware/OS supported heap checks (like using guard and nonaccessible pages to trap buffer overruns)

The last variant is the most widely used in my experience, mainly because most heap corruptions originate from buffer overflows, and it is easier to rely on instant MMU support than on checking fill patterns. Here is an article from the Citrix support web site describing how you can enable full page heap. It uses a specific process as an example, the Citrix Independent Management Architecture (IMA) service, but you can substitute the name of any application you are interested in debugging:

How to enable full page heap

and another article:

How to check in a user dump that full page heap was enabled

The following Microsoft article discusses various heap related checks:

How to use Pageheap.exe in Windows XP and Windows 2000
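The software variant of heap checking (fill patterns verified at checkpoints) can be illustrated with a small sketch. This is not how page heap or any particular runtime implements it; the guarded namespace, the guard sizes and the 0xAB fill byte are all assumptions made for illustration. Each block is surrounded by guard bytes that are checked on demand and when the block is freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical guarded allocator. Block layout:
// [size field + front guard = kPrefix bytes][user data][back guard = kSuffix bytes]
namespace guarded {

constexpr std::size_t kPrefix = 32;       // multiple of 16 so user data stays aligned
constexpr std::size_t kSuffix = 8;
constexpr unsigned char kFill = 0xAB;     // arbitrary fill pattern

inline void* Alloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kPrefix + size + kSuffix));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &size, sizeof size);                          // remember size
    std::memset(raw + sizeof size, kFill, kPrefix - sizeof size);  // front guard
    std::memset(raw + kPrefix + size, kFill, kSuffix);             // back guard
    return raw + kPrefix;
}

// Returns true if both guards still hold the fill pattern.
// Limitation of this sketch: an underrun that reaches the size field
// makes the back guard check unreliable.
inline bool Check(const void* user) {
    assert(user != nullptr);
    const unsigned char* raw = static_cast<const unsigned char*>(user) - kPrefix;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof size);
    for (std::size_t i = sizeof size; i < kPrefix; ++i)
        if (raw[i] != kFill) return false;                         // underrun
    for (std::size_t i = 0; i < kSuffix; ++i)
        if (raw[kPrefix + size + i] != kFill) return false;        // overrun
    return true;
}

// Checks the guards, releases the block and reports whether it was intact.
inline bool Free(void* user) {
    if (user == nullptr) return true;
    const bool intact = Check(user);
    std::free(static_cast<unsigned char*>(user) - kPrefix);
    return intact;
}

}  // namespace guarded
```

The weakness compared with page heap is visible here: corruption is detected only at the next checkpoint (Check or Free), not at the moment of the bad write, so the stack at detection time may still belong to an innocent component.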

The Windows kernel analog of user-space heap corruption is paged and nonpaged pool corruption. If we consider Windows kernel pools as variants of a heap, then exactly the same techniques are applicable there; for example, the so-called special pool enabled by Driver Verifier is implemented with nonaccessible pages. Refer to the following Microsoft article for further details:

How to use the special pool feature to isolate pool damage

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 1)

October 30th, 2006

After doing crash dump analysis exclusively for more than 3 years I decided to organize my knowledge into a set of patterns (a dump analysis pattern language, so to speak) and so help to establish a common vocabulary.

What is a pattern? It is a general solution you can apply in a specific context to a common recurrent problem.

There are many patterns and pattern languages in software engineering. For example, look at the following almanac that lists 700+ patterns:

The Pattern Almanac 2000

and the following link is very useful:

Patterns Library

The first pattern I’m going to introduce today is Multiple Exceptions. This pattern captures the known fact that there could be as many exceptions (“crashes”) as there are threads in a process. The following UML diagram depicts the relationship between Process, Thread and Exception entities:

Every process in Windows has at least one execution thread, so there could be at least one exception per thread (like an invalid memory reference) if things go wrong. There could be a second exception in that thread if the exception handling code experiences another exception, or if the first exception was handled and then another one occurs, and so on.

So what is the general solution to that common problem when an application or service crashes and you have a crash dump file (common recurrent problem) from a customer (specific context)? The general solution is to look at all threads and their stacks and not to rely on what tools say.

Here is a concrete example from one of the dumps I got today:

Internet Explorer crashed, and I opened its dump in WinDbg and ran the ‘!analyze -v’ command. This is what I got in my WinDbg output:

ExceptionAddress: 7c822583 (ntdll!DbgBreakPoint)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 8fb834b8
   Parameter[2]: 00000003

Break instruction, you might think, shows that the dump was taken manually from the running application and there was no crash - the customer sent the wrong dump or misunderstood instructions. However I looked at all threads and noticed the following two stacks (threads 15 and 16):

0:016>~*kL
...
15  Id: 1734.8f4 Suspend: 1 Teb: 7ffab000 Unfrozen
ntdll!KiFastSystemCallRet
ntdll!NtRaiseHardError+0xc
kernel32!UnhandledExceptionFilter+0x54b
kernel32!BaseThreadStart+0x4a
kernel32!_except_handler3+0x61
ntdll!ExecuteHandler2+0x26
ntdll!ExecuteHandler+0x24
ntdll!KiUserExceptionDispatcher+0xe
componentA!xxx
componentB!xxx
mshtml!xxx
kernel32!BaseThreadStart+0x34

# 16  Id: 1734.11a4 Suspend: 1 Teb: 7ffaa000 Unfrozen
ntdll!DbgBreakPoint
ntdll!DbgUiRemoteBreakin+0x36

So we see here that the real crash happened in componentA.dll, and componentB.dll or mshtml.dll might have influenced that. Why did this happen? The customer might have dumped Internet Explorer manually while it was displaying an exception message box. The following reference says that ZwRaiseHardError displays a message box containing an error message:

Windows NT/2000 Native API Reference


Or perhaps something else happened. In many cases multiple thread exceptions appear in one process dump because crashed threads displayed message boxes, like the Visual C++ debug message box, and so prevented the process from terminating. In the dump under discussion the WinDbg automatic analysis command recognized only the last breakpoint exception (shown as # 16). In conclusion, we shouldn’t rely too much on “automatic analysis” and should probably write our own extension to list possible multiple exceptions (based on some heuristics I will talk about later).
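To make the idea of listing possible multiple exceptions concrete, here is a minimal sketch that scans textual stack traces for frames that suggest exception processing. It is not a WinDbg extension and does not represent the heuristics mentioned above; the ThreadStack type, the function names and the marker list are assumptions taken from the two stacks shown in this post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One thread's stack as text frames, top frame first (as printed by ~*kL).
struct ThreadStack {
    int index;                        // thread index as shown by the debugger
    std::vector<std::string> frames;
};

// Frames that, in this sketch, mark a thread as having processed an exception.
// The list is an assumption based on the stacks above and is far from complete.
inline bool LooksLikeExceptionFrame(const std::string& frame) {
    static const char* const kMarkers[] = {
        "KiUserExceptionDispatcher",
        "UnhandledExceptionFilter",
        "NtRaiseHardError",
        "DbgBreakPoint",
    };
    for (const char* marker : kMarkers)
        if (frame.find(marker) != std::string::npos) return true;
    return false;
}

// Returns the indexes of all threads whose stacks contain a marker frame.
inline std::vector<int> FindExceptionThreads(const std::vector<ThreadStack>& threads) {
    std::vector<int> result;
    for (const ThreadStack& thread : threads) {
        assert(thread.index >= 0);
        for (const std::string& frame : thread.frames) {
            if (LooksLikeExceptionFrame(frame)) {
                result.push_back(thread.index);
                break;  // one hit per thread is enough
            }
        }
    }
    return result;
}
```

Fed with the stacks of threads 15 and 16 from the dump above, such a scan would report both threads, whereas the automatic analysis reported only the breakpoint in thread 16.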

- Dmitry Vostokov @ DumpAnalysis.org -

Applying API Wrapper Pattern

October 30th, 2006

Recently I ported my old legacy Win32 project (more than 100,000 lines) to Windows Mobile. Here I summarize the approach I used, and I can say now that it was very successful (not a single crash since I finished porting; the original Win32 program was very stable indeed, but we all know that hidden bugs surface or are introduced when a project is ported to another platform). The project was written in the Windows 3.x 16-bit era and was later ported to Win32 in the Windows 95 era. The Win32 interface is huge and contains many legacy Win16 functions (mainly for easy portability and compatibility with the existing Win16 code base). The following UML component diagram depicts application dependencies on the many Win32 API and runtime libraries from the build perspective:

Windows Mobile (in essence Windows CE) has a smaller interface: many functions available in the Win32 API (especially legacy Win16 ones) and many C runtime functions are absent. My project uses many such functions (due to its history), so the interface it depends on is broken. The following UML component diagram depicts this dependency:

First I ported my project to use UNICODE strings and UNICODE function equivalents throughout. This was a huge task already. Then, instead of further rewriting the code that uses many absent functions, and quite possibly introducing new bugs due to changed semantics, I decided to apply a variant of the Adapter or Wrapper pattern (for a non-object-oriented API). My application still uses the old functions but links to a set of libraries that translate these calls into the existing Windows CE API. The following UML component diagram depicts the final component infrastructure from the build perspective:

Here is the list of functions I had to translate:

- RegEmu.lib: translates INI file calls into registry

  • GetProfileString
  • GetProfileInt
  • GetPrivateProfileString
  • GetPrivateProfileInt
  • WriteProfileString
  • WritePrivateProfileString

- FileEmu.lib: various file system related calls

  • _lcreatW
  • _lopenW
  • OpenFileW
  • WinExecW
  • _wsplitpath
  • _wsplitpathparam
  • _waccess
  • _wmkdir
  • _wrename
  • _wfindfirst
  • _wfindnext
  • _findclose
  • _wunlink
  • _wrmdir
  • _llseek
  • _lclose
  • _hread
  • _hwrite
  • _lread
  • _lwrite
  • _wstat
  • _wgetcwd
  • _wmakepath
  • _chdrive
  • _wchdir
  • ShellExecute
  • GetWindowsDirectory
  • GetSystemDirectory

- GdiEmu.lib: various graphics functions

  • SetMapMode
  • CreateFont
  • CreateDIBitmap

- UserEmu.lib: various UI related functions

  • GetKeyNameText
  • GetScrollPos
  • GetScrollRange
  • GetLastActivePopup
  • IsIconic
  • IsZoomed
  • IsBadStringPtr
  • IsMenu
  • VkKeyScan
  • GetKeyboardState
  • SetKeyboardState
  • ToAscii
  • MulDiv
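The wrapper idea behind these libraries can be sketched for the INI file case. This is an illustration only and says nothing about how RegEmu.lib is actually written: the Emu-prefixed functions are hypothetical, their signatures are simplified (no file name, plain int return), and a std::map stands in for the registry so that the example stays portable. A real wrapper would keep the original Win32 names and call the registry API inside.

```cpp
#include <cassert>
#include <cwchar>
#include <map>
#include <string>

// Stand-in for the registry: maps "section\key" to a string value.
using FakeRegistry = std::map<std::wstring, std::wstring>;

inline FakeRegistry& Registry() {
    static FakeRegistry store;  // one shared store for the whole program
    return store;
}

inline std::wstring MakePath(const wchar_t* section, const wchar_t* key) {
    return std::wstring(section) + L"\\" + key;
}

// Caller-facing shape of a WriteProfileString-style call; the storage
// behind it is the "registry" rather than an INI file.
inline bool EmuWriteProfileString(const wchar_t* section, const wchar_t* key,
                                  const wchar_t* value) {
    assert(section && key && value);
    Registry()[MakePath(section, key)] = value;
    return true;
}

// Caller-facing shape of a GetProfileInt-style call. In this sketch a missing
// or non-numeric value yields the default (a simplification of this example).
inline int EmuGetProfileInt(const wchar_t* section, const wchar_t* key,
                            int defaultValue) {
    assert(section && key);
    const FakeRegistry::const_iterator it = Registry().find(MakePath(section, key));
    if (it == Registry().end()) return defaultValue;
    wchar_t* end = nullptr;
    const long value = std::wcstol(it->second.c_str(), &end, 10);
    if (end == it->second.c_str()) return defaultValue;  // no digits parsed
    return static_cast<int>(value);
}
```

The point of the design is that calling code does not change: it keeps using the old call shape, and only the library underneath decides where the data lives.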

PS. If you are interested in applying patterns to your C++ projects or simply in using them to describe your design and architecture, these books are a good place to start:

Design Patterns: Elements of Reusable Object-Oriented Software


Pattern-Oriented Software Architecture, Volume 1: A System of Patterns


Pattern-Oriented Software Architecture, Volume 2, Patterns for Concurrent and Networked Objects


- Dmitry Vostokov -

Hide and seek in a Citrix farm

October 29th, 2006

Just want to mention this tool I wrote some time ago: CtxHideEx32. It has nothing to do with CtxHide.exe which is executed during every ICA logon. I simply borrowed the prefix. The purpose of CtxHideEx32 is to hide messages or windows you don’t want your users to see. Or perhaps you want to execute certain actions if some window appears like logging off the poor user who launched Media Player or sending a message to his boss :-).  Policy reinforcement… Read the following article and try (requires free registration on Citrix support web site):

http://support.citrix.com/article/CTX110341 

- Dmitry Vostokov @ DumpAnalysis.org -

Dump2Wave update

October 29th, 2006

Dump2Wave has been updated and can be downloaded from the same link as before:

Download Dump2Wave version 1.2.1

What’s new and corrections list:

  • Dumps with file size not divisible by the product of bytes-per-sample and channels (sample alignment) are converted into correct wave files now
  • Sample alignment is calculated correctly for non-CD quality wave
  • File paths with spaces are allowed now (you need to enclose them in double quotes)
  • Added diagnostic and error messages
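The sample alignment arithmetic mentioned in the list can be sketched as follows. This is not Dump2Wave source code: the function names are hypothetical, and dropping the trailing bytes that do not fill a whole sample frame is an assumption of this example (padding would be another way to handle them).

```cpp
#include <cassert>
#include <cstdint>

// Sample alignment: bytes per sample multiplied by the number of channels.
inline std::uint32_t SampleAlignment(std::uint16_t bitsPerSample,
                                     std::uint16_t channels) {
    assert(bitsPerSample > 0 && bitsPerSample % 8 == 0 && channels > 0);
    return static_cast<std::uint32_t>(bitsPerSample / 8) * channels;
}

// Largest byte count not exceeding fileSize that holds only whole sample
// frames. Assumption of this sketch: the trailing remainder is dropped.
inline std::uint64_t AlignedDataSize(std::uint64_t fileSize,
                                     std::uint32_t alignment) {
    assert(alignment > 0);
    return fileSize - (fileSize % alignment);
}
```

For CD quality audio (16 bits, 2 channels) the alignment is 4 bytes, so a 1003-byte dump would contribute 1000 bytes of wave data; for 8-bit mono the alignment is 1 and every byte is used.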

- Dmitry Vostokov -