Archive for July, 2011

Trace Analysis Patterns (Part 41)

Thursday, July 28th, 2011

UI Message pattern is very useful for troubleshooting system-wide issues because we can map visual behaviour to various activity regions and consider such messages as significant events.

#    Module  PID  TID  Time         Message
2782 ModuleA 2124 5648 10:58:03.356 CreateWindow: Title "..." Class "..."
3512 ModuleA 2124 5648 10:58:08.154 Menu command: Save Data
3583 ModuleA 2124 5648 10:58:08.155 CreateWindow: Title "Save As" Class "Dialog"
[... Data update and replication related messages ...]
4483 ModuleA 2124 5648 10:58:12.342 DestroyWindow: Title "Save As" Class "Dialog"

By filtering the emitting module we can create an adjoint thread:

#    Module  PID  TID  Time         Message
2782 ModuleA 2124 5648 10:58:03.356 CreateWindow: Title "..." Class "..."
3512 ModuleA 2124 5648 10:58:08.154 Menu command: Save Data
3583 ModuleA 2124 5648 10:58:08.155 CreateWindow: Title "Save As" Class "Dialog"
4483 ModuleA 2124 5648 10:58:12.342 DestroyWindow: Title "Save As" Class "Dialog"

- Dmitry Vostokov @ + -

Second Eye (Debugging Slang, Part 25)

Wednesday, July 27th, 2011

Second Eye (or sometimes a stronger variant “second pair of eyes”) - another engineer you typically need when you don’t see anything useful in a memory dump, software trace or source code for problem resolution purposes. You are anxious to recommend something useful.

Examples: Don’t see anything in this huge trace. I need a second eye.

- Dmitry Vostokov @ + -

The First Evidence for Process Resurrection

Saturday, July 23rd, 2011

Recently analyzed a process memory dump and noticed that it (up and running) survived system reboot :-)

0:000> version
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS Personal
kernel32.dll version: 6.0.6000.16386 (vista_rtm.061101-2205)
Machine Name:
Debug session time: Tue Jul 12 16:53:07.000 2011 (UTC + 1:00)
System Uptime: 0 days 1:27:04.516
Process Uptime: 1 days 4:05:35.000
  Kernel time: 0 days 0:00:13.000
  User time: 0 days 0:00:04.000

I have a hypothesis how this could have happened. Interested in knowing yours. I’ll write mine later on.

- Dmitry Vostokov @ + -

A Visit from Vatican

Friday, July 22nd, 2011

I’m pleased to announce that I had a visitor from Vatican City (as reported by Google Analytics):

I hope they were interested in Memory Religion (Memorianity) where I have the title of Memoriarch.

- Dmitry Vostokov @ + -

Vacuum Pages

Friday, July 22nd, 2011

It came to my attention that almost every huge or not so x64 kernel or complete memory dump is diagnosed with excessive pool usage. Sometimes it is too excessive like in the following example:

0: kd> !vm

*** Virtual Memory Usage ***
 Physical Memory:     8387414 (  33549656 Kb)
 Page File: \??\D:\pagefile.sys
   Current:  33856856 Kb  Free Space:  33855520 Kb
   Minimum:  33856856 Kb  Maximum:     46364420 Kb
 Available Pages:     7231844 (  28927376 Kb)
 ResAvail Pages:      7763458 (  31053832 Kb)
 Locked IO Pages:           0 (         0 Kb)
 Free System PTEs:   33556220 ( 134224880 Kb)
 Modified Pages:         2759 (     11036 Kb)
 Modified PF Pages:      2759 (     11036 Kb)
 NonPagedPool Usage: 650867425 (2603469700 Kb)
 NonPagedPoolNx Usage:  83715 (    334860 Kb)
 NonPagedPool Max:    6271754 (  25087016 Kb)
 ********** Excessive NonPaged Pool Usage *****
 PagedPool 0 Usage:     48923 (    195692 Kb)
 PagedPool 1 Usage:     39797 (    159188 Kb)
 PagedPool 2 Usage:     37412 (    149648 Kb)
 PagedPool 3 Usage:     37536 (    150144 Kb)
 PagedPool 4 Usage:     37453 (    149812 Kb)
 PagedPool Usage:      201121 (    804484 Kb)
 PagedPool Maximum:  33554432 ( 134217728 Kb)
 Session Commit:        15829 (     63316 Kb)
 Shared Commit:          7198 (     28792 Kb)
 Special Pool:              0 (         0 Kb)
 Shared Process:       158498 (    633992 Kb)
 PagedPool Commit:     201147 (    804588 Kb)
 Driver Commit:          5761 (     23044 Kb)
 Committed pages:     1126203 (   4504812 Kb)
 Commit limit:       16851145 (  67404580 Kb)

What we can see above is that the amount of used nonpaged pool is more than 2.5 Tb which is far less than the amount of physical memory + page file size (both in total do not exceed 100 Gb). So I conclude that Windows architects did the impossible and are able to create information (pages) from vacuum like matter can be created from vacuum fluctuations. Perhaps they are a step closer to implement some features from Cantor OS.

- Dmitry Vostokov @ + -

Crash Dump Analysis Patterns (Part 146)

Friday, July 15th, 2011

One way to quickly check for something suspicious in a memory dump is to convert it to a debugger log (for example, stack trace collection) and search for textual strings such as “Waiting for“, “Terminate“, “Stop”, “Mutant“, “Exception“, “Crit“, “MessageBox“, “SuspendCount“, etc. The vocabulary, of course, is OS dependent, can have false positives, and can change over time. We name this pattern Problem Vocabulary. It is similar to Vocabulary Index software trace analysis pattern.

For example, recently in a complete memory dump involving lots of ALPC wait chains with potential inter-process deadlock we found the following thread having long waiting time (exceeding ALPC threads waiting times) pointing to a process object to examine further:

THREAD fffffa801338b950  Cid 02a0.7498  Teb: 000007fffffd8000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
    fffffa8012a39b30  ProcessObject
Not impersonating
DeviceMap                 fffff8a000008a70
Owning Process            fffffa800a31d040       Image:         smss.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      9752080        Ticks: 5334204 (0:23:09:06.937)
Context Switch Count      38            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address ntdll!TppWorkerThread (0×000000007722fbc0)
Stack Init fffff88020259db0 Current fffff88020259900
Base fffff8802025a000 Limit fffff88020254000 Call 0
Priority 11 BasePriority 11 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Kernel stack not resident.
Child-SP          RetAddr           Call Site
fffff880`20259940 fffff800`01693f92 nt!KiSwapContext+0×7a
fffff880`20259a80 fffff800`016967af nt!KiCommitThreadWait+0×1d2
fffff880`20259b10 fffff800`01984b2e nt!KeWaitForSingleObject+0×19f
fffff880`20259bb0 fffff800`0168df93 nt!NtWaitForSingleObject+0xde
fffff880`20259c20 00000000`7726135a nt!KiSystemServiceCopyEnd+0×13 (TrapFrame @ fffff880`20259c20)
00000000`0048f648 00000000`48026517 ntdll!NtWaitForSingleObject+0xa
00000000`0048f650 00000000`480269c4 smss!SmpTerminateCSR+0xa3
00000000`0048f6a0 00000000`48023670 smss!SmpStopCsr+0×44
00000000`0048f6d0 00000000`77288137 smss!SmpApiCallback+0×338
00000000`0048f900 00000000`7722feff ntdll! ?? ::FNODOBFM::`string’+0×1f718
00000000`0048f990 00000000`77274a00 ntdll!TppWorkerThread+0×3f8
00000000`0048fc90 00000000`00000000 ntdll!RtlUserThreadStart+0×25

In that process we could see a blocking module and recommended to contact its vendor: 

- Dmitry Vostokov @ + -

User Interface Problem Analysis Patterns (Part 1)

Thursday, July 14th, 2011

As a part of unified debugging pattern and generative debugging approach we extend software behavior analysis patterns such as memory dump and software trace analysis with UI abnormal behaviour patterns. Here by abnormality we mean behavior that users should not encounter while using software. Typical example is some error message or GUI distortion during execution of a functional use case. Such patterns will extend software behavior analysis pattern language we use for description of various post-construction software problems.

The first pattern we start with is called Error Message Box and we link it to Message Box and Self-Diagnosis memory analysis patterns. You can download x86 and x64 modeling examples from this location:

When we start the application it shows a message box:

We then launch Task Manager and find the window:

Then we save a crash dump using right-click context menu:

When we open the process memory dump we see this stack trace:

0:000> ~*kL

.  0  Id: d30.71c Suspend: 0 Teb: 000007ff`fffdd000 Unfrozen
Child-SP          RetAddr           Call Site
00000000`002ff1e8 00000000`77837214 user32!ZwUserWaitMessage+0xa
00000000`002ff1f0 00000000`778374a5 user32!DialogBox2+0x274
00000000`002ff280 00000000`778827f0 user32!InternalDialogBox+0x135
00000000`002ff2e0 00000000`77881ae5 user32!SoftModalMessageBox+0x9b4
00000000`002ff410 00000000`7788133b user32!MessageBoxWorker+0x31d
00000000`002ff5d0 00000000`77881232 user32!MessageBoxTimeoutW+0xb3
00000000`002ff6a0 00000001`3ffa101d user32!MessageBoxW+0×4e
00000000`002ff6e0 00000001`3ffa1039 UIPMessageBox!bar+0×1d
00000000`002ff710 00000001`3ffa1052 UIPMessageBox!foo+0×9
00000000`002ff740 00000001`3ffa11ea UIPMessageBox!wmain+0×12
00000000`002ff770 00000000`7770f56d UIPMessageBox!__tmainCRTStartup+0×15a
00000000`002ff7b0 00000000`77942cc1 kernel32!BaseThreadInitThunk+0xd
00000000`002ff7e0 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

We see there that foo function called bar function which displayed the message box. In real scenarios function name could me more meaningful and give a clue for troubleshooting and debugging in addition to message text:

0:000> ub 00000001`3ffa101d
00000001`3ffa0fff add     byte ptr [rax-7Dh],cl
00000001`3ffa1002 in      al,dx
00000001`3ffa1003 sub     byte ptr [rbp+33h],al
00000001`3ffa1006 leave
00000001`3ffa1007 lea     r8,[UIPMessageBox!__mnames+0×28 (00000001`3ffa83c8)]
00000001`3ffa100e lea     rdx,[UIPMessageBox!__mnames+0×38 (00000001`3ffa83d8)]
00000001`3ffa1015 xor     ecx,ecx
00000001`3ffa1017 call    qword ptr [UIPMessageBox!_imp_MessageBoxW (00000001`3ffa71d8)]

0:000> du 00000001`3ffa83c8
00000001`3ffa83c8  “Problem”

0:000> du 00000001`3ffa83d8
00000001`3ffa83d8  “We have a problem!”

- Dmitry Vostokov @ + -

Raw Stack from Laterally Damaged Memory Dumps

Thursday, July 14th, 2011

Sometimes TEB information is missing from laterally damaged dumps:

0:010> !teb
TEB at 000007fffff9c000
    ExceptionList:        0000000000000000
    StackBase:            0000000000000000
    StackLimit:           0000000000000000
    SubSystemTib:         0000000000000000
    FiberData:            0000000000000000
    ArbitraryUserPointer: 0000000000000000
    Self:                 0000000000000000
    EnvironmentPointer:   0000000000000000
    ClientId:             0000000000000000 . 0000000000000000
    RpcHandle:            0000000000000000
    Tls Storage:          0000000000000000
    PEB Address:          0000000000000000
    LastErrorValue:       0
    LastStatusValue:      0
    Count Owned Locks:    0
    HardErrorMode:        0

In such cases if stack trace is present we can get raw stack data with associated symbolic information by using ChildEBP (x86) or Child-SP (x64) columns:

0:010> kL
Child-SP          RetAddr           Call Site
00000000`0310ec88 000007fe`fd2313a6 ntdll!NtWaitForMultipleObjects+0xa
00000000`0310ec90 00000000`77023143 KERNELBASE!WaitForMultipleObjectsEx+0xe8
00000000`0310ed90 00000000`77099025 kernel32!WaitForMultipleObjectsExImplementation+0xb3
00000000`0310ee20 00000000`770991a7 kernel32!WerpReportFaultInternal+0×215
00000000`0310eec0 00000000`770991ff kernel32!WerpReportFault+0×77
00000000`0310eef0 00000000`7709941c kernel32!BasepReportFault+0×1f
00000000`0310ef20 00000000`772b6228 kernel32!UnhandledExceptionFilter+0×1fc
00000000`0310f000 00000000`77234f48 ntdll! ?? ::FNODOBFM::`string’+0×22c5
00000000`0310f030 00000000`77254f6d ntdll!_C_specific_handler+0×8c
00000000`0310f0a0 00000000`77235b2c ntdll!RtlpExecuteHandlerForException+0xd
00000000`0310f0d0 00000000`7726f638 ntdll!RtlDispatchException+0×3cb
00000000`0310f7b0 00000000`000a1760 ntdll!KiUserExceptionDispatcher+0×2e
00000000`0310fd68 000007fe`f6c1ba28 0xa1760
00000000`0310fd70 000007fe`fb5c4744 ModuleA!Close+0×88
00000000`0310fdb0 000007fe`fb5c7603 ModuleB!Close+0×38
00000000`0310fde0 00000000`7701f56d ModuleB!WorkItem+0×5b
00000000`0310fe10 00000000`77252cc1 kernel32!BaseThreadInitThunk+0xd
00000000`0310fe40 00000000`00000000 ntdll!RtlUserThreadStart+0×1d

0:010> dps 00000000`0310ec88 00000000`0310fe40
00000000`0310ec88  000007fe`fd2313a6 KERNELBASE!WaitForMultipleObjectsEx+0xe8
00000000`0310fe38  00000000`77252cc1 ntdll!RtlUserThreadStart+0×1d
00000000`0310fe40  00000000`00000000

- Dmitry Vostokov @ + -


Tuesday, July 12th, 2011

- Dmitry Vostokov @ + -

Cloud Traces

Tuesday, July 12th, 2011

I was inspired today for a new comic art movement called Computicart (Computical Art). The first composition is called Cloud Traces (remember that memory dumps are just bigger software traces and software traces are just smaller memory dumps):

- Dmitry Vostokov @ + -

Airport Terminal Services Incident

Sunday, July 10th, 2011

A week ago I was flying from Dublin airport and observed “terminal services” incident first hand being the first in the registration queue for an hour while waiting for the problem resolution and “live debugging”. The system couldn’t print tickets. The following dialog phrases were very amusing to hear:

- Press F6 to initialize PE

- A message reads: “Can’t associate a printer with terminal session. Please enter the printer address.”

PS. The problem was resolved by booting the previous system version. It’s aways Mr. Upgrade who’s at fault…

- Dmitry Vostokov @ + -

New Published Books

Sunday, July 10th, 2011

The following books have been published and available on Amazon and B&N:

- Hardcover version of Memory Dump Analysis Anthology, Volume 5

- Memory Dump Analysis Anthology: Color Supplement for Volumes 4-5


- Introduction to Pattern-Driven Software Problem Solving

- Dmitry Vostokov @ + -