Archive for October, 2008

MDAA Volume 2 is available on Amazon and B&N

Saturday, October 18th, 2008

Paperback edition of Memory Dump Analysis Anthology, Volume 2 is finally available on Amazon and Barnes & Noble. Search Inside is also available on Amazon. In addition, I updated the list of recommended books:

Listmania! Crash Dump Analysis and Debugging

Hardcover edition will be available on Amazon and B&N in 2-3 weeks.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.52

Saturday, October 18th, 2008

“The moment you have worked out” a code fix , “start checking it - it probably isn’t right.”

Edmund Callis Berkeley, Computers and Automation magazine, Right Answers: A Short Guide for Obtaining Them

- Dmitry Vostokov @ DumpAnalysis.org -

Windows Internals Certificate

Friday, October 17th, 2008

I was surprised when I found on Microsoft MCP site that I can print my certificate for Windows Internals exam. Here it is:

Seems Microsoft certificates changed colours again. I was certified as MCSD in Microsoft® Windows® Architecture, OLE 2.0 and MFC in 1997 and after upgrading the status by passing “Analyzing Requirements and Defining Solution Architectures” exam around the turn of the century I got updated certificates with silver colors. I’ll put pictures of them later.

- Dmitry Vostokov @ DumpAnalysis.org -

Citrix Certificate

Friday, October 17th, 2008

Shortly after celebrating 5 years at Citrix I got my first certificate since the 1st of December, 2006 when I was certified in Kernel Debugging for Windows®. Now it is also my first Citrix certificate and it marks my 5 year service:

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis AntiPatterns (Part 11)

Thursday, October 16th, 2008

This one is called No Question and is summarized by the following dialog that I observed many times:

As an engineer proceeds with analysis of a problem a phone rings and an irritated customer (internal or external) asks “What is the answer?”, an engineer glancing over the problem description replies, “What is the question?”.

This immediately came to my mind when I found this famous dialog:

“As the poet Gertrude Stein lay on her deathbed, her partner, Alice B. Toklas, leaned over and whispered, ‘What is the answer, Gertrude?’. Replied Stein, ‘What’s the question?’”

- Dmitry Vostokov @ DumpAnalysis.org -

Tales before the dawn of virtualization

Wednesday, October 15th, 2008

Before the advent of virtual machines to Wintel platforms people had to take pictures of bluescreens or even write all data down by hand to fax it:

 

How easy it is today! You just do Print Screen.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.51

Wednesday, October 15th, 2008

The following bugtation is quite wise and dedicated to beginners learning WinDbg (see Common Mistakes and Coincidental Symbolic Information for some examples).

“You rule the” debugger, “not the” debugger “you”.

John Dryden, The Hind and the Panther

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 1b)

Wednesday, October 15th, 2008

Almost 2 years have passed since I wrote the first post about crash dump analysis patterns: Multiple Exceptions. Today I write about multiple exceptions or faults in kernel mode. Here I distinguish multiple exceptions from nested exceptions. The latter ones in kernel result in double faults, see for example, Stack Overflow pattern. At the first glance it looked like the dump was saved manually:

0: kd> !analyze -v
[...]
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
[...]

However further down in analysis report there was the presence of page fault:

TRAP_FRAME:  a38df520 -- (.trap 0xffffffffa38df520)
ErrCode = 00000002
eax=b6d9220f ebx=b6ab4ffb ecx=00000304 edx=eaf2fdea esi=b6d9214c edi=b6ab8189
eip=bfa10e6e esp=a38df594 ebp=a38df5ac iopl=0  nv up ei ng nz ac po cy
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000  efl=00010293
driver+0x2ae6e:
bfa10e6e 895304   mov     dword ptr [ebx+4],edx ds:0023:b6ab4fff=????????
Resetting default scope

STACK_TEXT: 
a38df410 b48aa532 000000e2 00000000 00000000 nt!KeBugCheckEx+0x1b
a38df440 b48a9d2c 000eba28 9282c8c6 00000000 i8042prt!I8xProcessCrashDump+0x256
a38df488 80839595 89d0c008 8a0eb970 0001000a i8042prt!I8042KeyboardInterruptService+0x225
a38df488 80836bfa 89d0c008 8a0eb970 0001000a nt!KiInterruptDispatch+0x49
a38df520 bfa10e6e badb0d00 eaf2fdea 8867cbe8 nt!KiTrap0E+0xbc
WARNING: Stack unwind information not available. Following frames may be wrong.
a38df5ac bfa22461 b6ab423b 000003dc 00000007 driver+0×2ae6e
[…]

Looking at b6ab4fff address shows that it crosses page boundary, see Data Alignment pattern.

We also see that this thread was running and consumed too much kernel time, see Spiking Thread pattern:

0: kd> !thread
THREAD 88e686d8  Cid 1e48.1f7c  Teb: 7ffdf000 Win32Thread: b669de70 RUNNING on processor 0
Not impersonating
DeviceMap                 dc971120
Owning Process            889e0d88       Image:         ProcessA.EXE
Wait Start TickCount      9231345        Ticks: 0
Context Switch Count      2196221                 LargeStack
UserTime                  00:00:35.562
KernelTime                04:51:23.656
[…]

Intrigued, we see another running thread on the second processor:

0: kd> !running
     Prcb      Current   Next   
  0  ffdff120  88e686d8            ................
  1  f7727120  88bd33f8            …………….

0: kd> !thread 88bd33f8
THREAD 88bd33f8  Cid 2fdc.27f0  Teb: 7ffdf000 Win32Thread: b6640ab8 RUNNING on processor 1
Not impersonating
DeviceMap                 d7a13b40
Owning Process            89e45200       Image:         ProcessA.EXE
Wait Start TickCount      9231345        Ticks: 0
Context Switch Count      2324364                 LargeStack
UserTime                  00:00:21.171
KernelTime                05:02:09.500
Win32 Start Address ProcessA (0×30001e28)
Start Address kernel32!BaseProcessStartThunk (0×77e617f8)
Stack Init ac5e7bd0 Current ac5e7078 Base ac5e8000 Limit ac5e1000 Call ac5e7bd8
Priority 6 BasePriority 6 PriorityDecrement 0
ChildEBP RetAddr  Args to Child             
ac5e7150 bfa10e6e badb0d00 dbeaffdb 89a793d8 nt!KiTrap0E+0xbc (FPO: [0,0] TrapFrame @ ac5e7150)
WARNING: Stack unwind information not available. Following frames may be wrong.
ac5e71dc bfa22461 b701f15f ffffff24 00000007 driver+0×2ae6e
[…]

We see it is spiking CPU too and we detect a possible loop in page fault handler:

0: kd> .thread 88bd33f8
Implicit thread is now 88bd33f8

0: kd> ~1s

1: kd> r
eax=fffff81c ebx=ac5e71dc ecx=88bd33f8 edx=dbeaffdb esi=b6f81168 edi=b701fffe
eip=80836bfa esp=ac5e7150 ebp=ac5e7150 iopl=0 nv up ei pl nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000  efl=00000206
nt!KiTrap0E+0xbc:
80836bfa 0f84e5010000  je  nt!KiTrap0E+0×2a7 (80836de5) [br=0]

When looking at raw stack we see that the loop happened after processing this exception:

1: kd> .trap ac5e7150
ErrCode = 00000002
eax=b6f8122b ebx=b701fffa ecx=fffffe4c edx=dbeaffdb esi=b6f81168 edi=b70201a0
eip=bfa10e6e esp=ac5e71c4 ebp=ac5e71dc iopl=0 nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293
driver+0×2ae6e:
bfa10e6e 895304 mov dword ptr [ebx+4],edx ds:0023:b701fffe=????????

The address crosses page boundary too:

1: kd> !pte b701fffe
               VA b701fffe
PDE at   C0300B70        PTE at C02DC07C
contains 642CF863      contains 2F336863
pfn 642cf ---DA--KWEV    pfn 2f336 ---DA--KWEV

1: kd> !pte b701fffe+3
               VA b7020001
PDE at   C0300B70        PTE at C02DC080
contains 642CF863      contains 00000080
pfn 642cf ---DA--KWEV                           not valid
                       DemandZero
                       Protect: 4 - ReadWrite

 
This is because trap processing code is found below the current ESP value and also 3rd-party virtual block drivers which I guess were trying to satisfy page fault (the latter not shown in the raw stack fragment here):

1: kd> dds esp-1000 esp
[...]
ac5e6f78  00000002
ac5e6f7c  899c05b0
ac5e6f80  88bd33f8
ac5e6f84  00000010
ac5e6f88  ac5e702c
ac5e6f8c  808457ff nt!KeContextFromKframes+0x9b
ac5e6f90  00000023
ac5e6f94  f7727120
ac5e6f98  00000000
ac5e6f9c  808458fd nt!KeContextFromKframes+0x2bc
ac5e6fa0  ac5e70dc
ac5e6fa4  1f840a42
ac5e6fa8  00000000
ac5e6fac  f7727000
ac5e6fb0  00000000
ac5e6fb4  f7727a7c
ac5e6fb8  ac5e6fd4
ac5e6fbc  808398d4 nt!KiDispatchInterrupt+0xd8
ac5e6fc0  00000000
ac5e6fc4  80a801ae hal!HalpDispatchSoftwareInterrupt+0x5e
ac5e6fc8  ac5e700c
ac5e6fcc  ac5e7001
ac5e6fd0  00000002
ac5e6fd4  ac5e6ff0
ac5e6fd8  80a80397 hal!HalpCheckForSoftwareInterrupt+0x3f
ac5e6fdc  00000002
ac5e6fe0  ac5e700c
ac5e6fe4  ac5e700c
ac5e6fe8  ac5e70b0
ac5e6fec  00000001
ac5e6ff0  f772f120
ac5e6ff4  88bd33f8
ac5e6ff8  00000002
ac5e6ffc  ac5e700c
ac5e7000  8a0a88a0
ac5e7004  88bd33f8
ac5e7008  f7727002
ac5e700c  80a8057e hal!HalEndSystemInterrupt+0x6e
ac5e7010  88bd33f8
ac5e7014  f7727002
ac5e7018  00000002
ac5e701c  ac5e702c
ac5e7020  80a80456 hal!KfLowerIrql+0x62
ac5e7024  f7727000
ac5e7028  0000bb40
ac5e702c  ac5e70ac
ac5e7030  808093eb nt!KiSaveProcessorState+0x20
ac5e7034  ac5e70dc
ac5e7038  00000000
ac5e703c  808093f0 nt!KiSaveProcessorState+0x25
ac5e7040  f772713c
ac5e7044  8087dcbd nt!KiFreezeTargetExecution+0x6a
ac5e7048  ac5e70dc
ac5e704c  00000000
ac5e7050  f7727120
ac5e7054  00000000
ac5e7058  80a7e501 hal!KeAcquireQueuedSpinLockRaiseToSynch+0x21
ac5e705c  88bd3401
ac5e7060  ac5e7070
ac5e7064  80a80456 hal!KfLowerIrql+0x62
ac5e7068  80a7e530 hal!KeReleaseInStackQueuedSpinLock
ac5e706c  88bd3401
ac5e7070  ac5e70b0
ac5e7074  80a7e56d hal!KeReleaseQueuedSpinLock+0x2d
ac5e7078  80823822 nt!KiDeliverApc+0x1cc
ac5e707c  00000000
ac5e7080  ac806e00
ac5e7084  00000200
ac5e7088  00000000
ac5e708c  88bd343c
ac5e7090  00000001
ac5e7094  ac5e7934
ac5e7098  89e45200
ac5e709c  809282c8 nt!CmpPostApc
ac5e70a0  00000000
ac5e70a4  0000010c
ac5e70a8  1d01f008
ac5e70ac  ac5e70dc
ac5e70b0  80837c86 nt!KiIpiServiceRoutine+0x8b
ac5e70b4  ac5e70dc
ac5e70b8  00000000
ac5e70bc  80836bfa nt!KiTrap0E+0xbc
ac5e70c0  b6f81168
ac5e70c4  ac5e7150
ac5e70c8  80a7d8fc hal!HalpIpiHandler+0xcc
ac5e70cc  ac5e70dc
ac5e70d0  00000000
ac5e70d4  80a80300 hal!HalpLowerIrqlHardwareInterrupts+0x10c
ac5e70d8  000000e1
ac5e70dc  ac5e7150
ac5e70e0  80836bfa nt!KiTrap0E+0xbc
ac5e70e4  badb0d00
ac5e70e8  dbeaffdb
ac5e70ec  ac5e70fc
ac5e70f0  80a80456 hal!KfLowerIrql+0x62
ac5e70f4  2f336801
ac5e70f8  ac806e00
ac5e70fc  ac5e7138
ac5e7100  8081a2bf nt!MmAccessFault+0x558
ac5e7104  b701fffe
ac5e7108  00000000
ac5e710c  00000000
ac5e7110  00000023
ac5e7114  00000023
ac5e7118  dbeaffdb
ac5e711c  88bd33f8
ac5e7120  fffff81c
ac5e7124  00000000
ac5e7128  ac5e72b0
ac5e712c  00000030
ac5e7130  b701fffe
ac5e7134  b6f81168
ac5e7138  ac5e71dc
ac5e713c  ac5e7150
ac5e7140  00000000
ac5e7144  80836bfa nt!KiTrap0E+0xbc
ac5e7148  00000008
ac5e714c  00000206
ac5e7150  ac5e71dc

What we may guess here is the fact that two page faults happened simultaneously or nearly at the same time and one of them possibly during the attempt to satisfy it and this resulted in two processors looping. The whole system was hang and the usual keyboard method via Scroll Lock was used to generate the manual dump.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.50

Tuesday, October 14th, 2008

“On Mind-Body problem and Afterlife: if Mind is merely a computation then what is left is a memory dump in the Body”.

Dmitry Vostokov

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.49

Tuesday, October 14th, 2008

Although complex OS and computer systems are designed to never stop they actually do:

“The only sure thing about” computation “is that it will” stop.

Bret Harte, The Outcasts of Poker Flat, The Luck of Roaring Camp

- Dmitry Vostokov @ DumpAnalysis.org -

5 Years at Citrix!

Tuesday, October 14th, 2008

I’ve just passed 5 year mark in my Citrix career and I say those years were really exciting! Incidentally at the same time, on 14th of October 2003 I started doing true memory dump analysis in WinDbg-sense of it despite the fact that I’d been doing debugging since 1987 and encountered the first symbolic assembly language debugger on PDP-11 RSX-11M Russian clone in 1988. Also 5 years ago I made my transition from software engineering world to software technical support and started my journey to become a manager.

I’ve put a timeline on a portal accessible from the main page or from the following link:

http://www.dumpanalysis.org/Timeline

The timeline will help me in the future to write my memoirs (Crash Dump: A Software Engineering Autobiography, ISBN: 978-1906717193).

Finally, to record the glorious Citrixofication of the world by various CitrixwareOpentask decided to publish a book where I plan to be an editor of various contributors and probably write a chapter or two myself:

Discovering Citrixware (ISBN: 978-1906717278) 

It is not about troubleshooting but more in a sense of discovery. More details will be announced later.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.48

Monday, October 13th, 2008

“… those who have a natural talent for” troubleshooting “are generally quick at every other kind of knowledge …” debugging “is a kind of knowledge in which the best natures should be trained, and which must not be given up.”

Plato, The Republic

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.47

Sunday, October 12th, 2008

Here is a bugtation to accompany my post about Debugging Architects:

“His brain, trained by long years of high” design “and plain” architecture, “had become too subtle, too refined an instrument for” debugging.

Stephen Leacock, Literary Lapses

- Dmitry Vostokov @ DumpAnalysis.org -

I’m Windows Internals certified!

Saturday, October 11th, 2008

Seems railroad to it was a success: just got this message in my e-mail:

Congratulations on passing your recent Microsoft Certification exam, inspiring confidence for your employer, your peers, and yourself with a widely-recognized validation of your skills on Microsoft technology.

Because I haven’t done any exam since Windows Internals beta I assumed that I passed it and I was right! After registering at Microsoft certification site as MCP I was able to build my logo:

Here is the link to Exam 70-660 information and required skills:

http://www.microsoft.com/learning/en/us/Exams/70-660.aspx

- Dmitry Vostokov @ DumpAnalysis.org -

The most downloaded and best rated Citrix tools

Friday, October 10th, 2008

The most downloaded is StressPrinters:

http://support.citrix.com/product/xa/v4.5/tool/?sortBy=READCOUNT

The best rated tools are:

http://support.citrix.com/product/xa/v4.5/tool/?sortBy=RATING

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.46

Friday, October 10th, 2008

“Good” troubleshooters “see analogies between” applications “or” services, “the very best ones see analogies between analogies.”

Stefan Banach

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.45

Thursday, October 9th, 2008

“If the” modules “in my” process “don’t work with or without” yours, “I cannot blame” you.

Francisco Alves, “If the brakes in my car don’t work with or without petrol in the fuel tank, I cannot blame the fuel”

- Dmitry Vostokov @ DumpAnalysis.org -

Early crash dump, blocked thread, not my version and lost opportunity: pattern cooperation

Thursday, October 9th, 2008

It was reported that one important Windows service stops responding from time to time. The customer was proactive in gathering memory dumps and we got several early crash dumps. Most of them were false positive showing normal error handling via throwing an exception:

0:042> kL
ChildEBP RetAddr 
0f7bec6c 77c31e37 kernel32!RaiseException+0×53
0f7bec84 77c32042 rpcrt4!RpcpRaiseException+0×24
0f7bec94 77cb30e4 rpcrt4!NdrGetBuffer+0×46
0f7bf080 09a554a6 rpcrt4!NdrClientCall2+0×197
[…]

However one such dump also had a clearly blocked thread which was blocking 10 different threads:

0:042> !locks

CritSec MyService!MainCriticalSection+0 at 0041b9a0
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       ad0
EntryCount         0
ContentionCount    0
*** Locked

CritSec +339fb8 at 00339fb8
WaiterWoken        No
LockCount          10
RecursionCount     1
OwningThread       ad0
EntryCount         0
ContentionCount    31
*** Locked

0:042> ~~[ad0]kL
ChildEBP RetAddr 
008dc1e0 7c94734b ntdll!KiFastSystemCallRet
008dc1e4 77d96c61 ntdll!NtOpenKey+0xc
008dc244 77d8e15f advapi32!LocalBaseRegOpenKey+0xd0
008dc278 6064fe47 advapi32!RegOpenKeyExA+0×11c
WARNING: Stack unwind information not available. Following frames may be wrong.
008dc8b8 6064fa00 NotMyDLL!getvar+0×4e7
[…]

Checking NotMyDLL module time stamp we identified Not My Version pattern because we expected much later version:

0:042> lmt m NotMyDLL
start    end        module name
60600000 60686000   NotMyDLL  Mon Oct 30 10:14:07 1999

We know this component often had problems in the past and although being stuck in registry access could be a coincidence, registry corruption or system-wide problem we immediately advised to upgrade the component to the latest stable version. We also got a manual dump of the service when the customer tried to restart it and it showed the signs of Lost Opportunity pattern:

0:000> kv
ChildEBP RetAddr Args to Child
1744fd44 7c947d0b 7c821d1e 00001b58 00000000 ntdll!KiFastSystemCallRet
1744fd48 7c821d1e 00001b58 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
1744fdb8 7c821c8d 00001b58 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
1744fdcc 67e223dd 00001b58 ffffffff 1744fdf4 kernel32!WaitForSingleObject+0x12
1744fde0 7c93a352 67e20000 00000000 00000001 MyDLL!DllInitialize+0xed
1744fe00 7c950e70 67e222f0 67e20000 00000000 ntdll!LdrpCallInitRoutine+0x14
1744feb8 7c8268a3 00000000 00000000 00000000 ntdll!LdrShutdownProcess+0x182
1744ffa4 7c826905 c0000005 77e8f3b0 ffffffff kernel32!_ExitProcess+0x43
1744ffb8 7c8392c1 c0000005 00000000 00000000 kernel32!ExitProcess+0×14
1744ffec 00000000 77c4b0f5 0b644720 00000000 kernel32!BaseThreadStart+0×5f

0:000> !teb
TEB at 7ff4b000
    ExceptionList:        1744fda8
    StackBase:            17450000
    StackLimit:           17449000
    SubSystemTib:         00000000
    FiberData:            00001e00
    ArbitraryUserPointer: 00000000
    Self:                 7ff4b000
    EnvironmentPointer:   00000000
    ClientId:             00001e90 . 00001168
    RpcHandle:            00000000
    Tls Storage:          00000000
    PEB Address:          7ffdd000
    LastErrorValue:       0
    LastStatusValue:      103
    Count Owned Locks:    0
    HardErrorMode:        0

0:000> dds 17449000 17450000
[...]
1744f4b0  7c94775b ntdll!NtRaiseHardError+0xc
1744f4b4  7c842610 kernel32!UnhandledExceptionFilter+0×51a
1744f4b8  d0000144
1744f4bc  00000000
[…]

0:000> !error d0000144
Error code: (NTSTATUS) 0xd0000144 (3489661252) - {Application Error} The exception %s (0x%08lx) occurred in the application at location 0x%08lx.

Therefore we additionally advised to dump the process manually using userdump.exe when an error message box appears on the console session. We hope that getting right dump files at the right time via the right method would prove or disprove our hypothesis about NotMyDLL component.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.44

Thursday, October 9th, 2008

“I’m gonna do better than learn to” troubleshoot, “I’m gonna learn to” debug.

Alexander Murray Palmer Haley, Roots

- Dmitry Vostokov @ DumpAnalysis.org -

Dump Analysis Library

Wednesday, October 8th, 2008

Crash Dump Analysis Portal (www.DumpAnalysis.org) has changed its menu bar:

  • - Books menu item is renamed to Reviews.

  • - Added new menu item Library which features many books from my collection partially pictured on the front cover of Memory Dump Analysis Anthology volumes.

You can also access the library from this link:

http://www.dumpanalysis.org/Library 

Currently you can browse about estimated 80% of the collection bought from Amazon.com. Books bought from Amazon (.co.uk | .de | .fr | .ca) and local bookshops will be added in the future.

- Dmitry Vostokov @ DumpAnalysis.org -