Archive for February, 2009

Bugtation No.85

Saturday, February 28th, 2009

A contribution to Software Resistentialism:

Software objects can be classified scientifically into three major categories: those that don’t work, those that crash and those that hang.

Russell Wayne Baker

- Dmitry Vostokov @ DumpAnalysis.org -

Bugtation No.84

Friday, February 27th, 2009

“Don’t” guess “it, get a larger” dump.

Anthony’s Law of Force

- Dmitry Vostokov @ DumpAnalysis.org -

Debugger Log Reading Techniques (Part 1)

Thursday, February 26th, 2009

Debugger logs (textual output) from commands like !process 0 ff and various scripts can be very long and consist of thousands of pages. I found the following reading technique useful for my daily memory dump analysis activities:

CSA-QSA

Checklists-Skim-Analyze—Questions-Survey-Analyze   

1. First, have a checklist

2. Skim through the log several times

3. Write analysis notes

4. Have a list of questions based on problem description and steps 1-3

5. Survey the log

6. Write analysis notes

Repeat steps 2,3 and 5,6 if necessary.

This technique can also be applied to reading any large logs, for example, voluminous CDF or ETW traces.

- Dmitry Vostokov @ DumpAnalysis.org -

Cantor Operating System (Part 1)

Wednesday, February 25th, 2009

Named after Georg Cantor CAN.TOR.OS brings computation from the distant future into today. The transfinite worldview and universe of tomorrow into the finite worldview and universe of today. Cantor OS drives transfinite computing and saves transfinite memory dumps. More on this in subsequent parts as I have to come back to finite memory dumps… One cautious note though: transfinite doesn’t mean absolute infinity, or God-like computation, the latter is the realm of Memory Religion

(∞) TOR is a new transfinite operation in addition to finite OR, AND or XOR 

- Dmitry Vostokov @ DumpAnalysis.org -

Transfinite Memory Dumps (Part 1)

Wednesday, February 25th, 2009

These dumps are larger than any finite memory dump and contain all of them inside (see the definition of a transfinite number). Think about them as a variant of the Library of Babel where all possible memory snapshots of your Windows or Linux PC are stored including Googol dumps. If you have some code then all possible code defects are there too. An interesting question then arises. If this dump is collected what kind of patterns we can see there? Are these patterns extrapolated infinite versions of finite patterns or there come new ones specific to transfinite computations? More on this in the next parts.

- Dmitry Vostokov @ DumpAnalysis.org -

Debugged Paper for Debugged! Magazine

Tuesday, February 24th, 2009

The first issue of Debugged! MZ/PE magazine is going to be printed on debugged paper (not to confused with common bugs-free paper used in publishing houses and printing factories). Once you open the first issue you would instantly recognize that!

- Dmitry Vostokov @ DumpAnalysis.org -

WDPF Book is #1 Assembly Language Bestseller

Monday, February 23rd, 2009

Looked this evening at Amazon and found that the book achieved #1 status (although it might not be the case at the time when you are reading this post):

#1 in  Books > Computers & Internet > Programming > Languages & Tools > Assembly Language Programming

- Dmitry Vostokov @ DumpAnalysis.org -

Riemann Programming Language

Monday, February 23rd, 2009

Named after Bernhard Riemann, this programming language gives software defects first-class status as alternative branches of computation, comparable with multivalued functions and Riemann surfaces. Bugs become first-class constructs. It is reflected in the language syntax, semantics and pragmatics. More on this later.

- Dmitry Vostokov @ DumpAnalysis.org -

OSMOSIS Memory Dumps

Monday, February 23rd, 2009

The main problem of memory dump analysis is the lack of consistent kernel virtual memory dumps saved on demand without system halt. LiveKd and Win32DD tools are physical memory dumpers only and do not save kernel memory dump files. These dumps are known to be inconsistent and I elaborated on different schemes to save memory consistently, for example, 1) to partition physical memory into 2 parts from OS boot time, 2) when memory snapshot is needed raise IRQL on all processors, 3) pump memory contents from one part to another (with compression if necessary, in such partition the reserved part of physical memory could be smaller), 4) lower IRQL on all processors to resume normal OS functions and 5) save consistent memory snapshot from reserved part of physical memory to a dump file in the background. The crucial feature of osmosis is its bipartite division and membrane. Hence the name of the project: 

OSMOSIS

Optimally Saved Memory of System Internal State

Optimally Saved Memory (of) Operating System Internal State

 

This is, of course, for OS running on physical machines, virtual machine case is much simpler in theory because we can freeze the whole VM or save its snapshot and later run an external tool or file converter on it.

- Dmitry Vostokov @ DumpAnalysis.org -

Book Update: Crash Dump Analysis for SA

Friday, February 20th, 2009

One of the good outcomes of the previously announced restructuring: the book Crash Dump Analysis for System Administrators (Windows edition) has been prioritized to be published on 30th of November, 2009 due to the overwhelming demand. The book will soon be available for pre-orders.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 81)

Thursday, February 19th, 2009

One analysis problem that happens frequently is the absence of stack traces due to kernel stack pages being paged out and therefore not present in a complete memory dump that only contains physical memory. This shouldn’t be a problem for kernel or process memory dumps because they contain virtual memory. The problem usually manifests itself either on busy systems utilizing almost all physical memory or on over-aged systems where certain processes hadn’t been used for a long time. It could also be the case when a problem happened some time ago and only diagnosed much later. For example, this LPC wait chain for coupled processes happened to be 2 days ago before the dump was saved:

0: kd> !thread fffffadfcf9e8bf0 1f
THREAD fffffadfcf9e8bf0  Cid 61f0.2c70  Teb: 000007fffffdd000 Win32Thread: fffff97ff381a480 WAIT: (Unknown) UserMode Non-Alertable
    fffffadfcf9e8f58  Semaphore Limit 0x1
Waiting for reply to LPC MessageId 01e2cb39:
Current LPC port fffffa800e5a9d10
Impersonation token:  fffffa80039cd050 (Level Impersonation)
Owning Process fffffadffc7c7c20      
Image:         applicationA.exe
Wait Start TickCount      12018444           
Ticks: 11312740 (2:01:06:01.562)
Context Switch Count      456                 LargeStack
UserTime                  00:00:00.046
KernelTime                00:00:00.078
Start Address applicationA (0×0000000100061411)
Stack Init fffffadc125d4e00 Current fffffadc125d48e0
Base fffffadc125d5000 Limit fffffadc125cc000 Call 0
Priority 15 BasePriority 15 PriorityDecrement 0
Kernel stack not resident.

0: kd> !lpc message 01e2cb39
Searching message 1e2cb39 in threads …
    Server thread fffffadff93c5bf0 is working on message 1e2cb39

0: kd> !thread fffffadff93c5bf0 1f
THREAD fffffadff93c5bf0  Cid 0218.5130  Teb: 000007ffffcbc000 Win32Thread: 0000000000000000 WAIT: (Unknown) UserMode Non-Alertable
    fffffadff6c71c70  SynchronizationEvent
Impersonation token:  fffffa803bde5060 (Level Impersonation)
Owning Process fffffadcde439280     
Image:         applicationB.exe
Wait Start TickCount      12018444        
Ticks: 11312740 (2:01:06:01.562)
Context Switch Count      12            
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0×0000000001e2cb39
LPC Server thread working on message Id 1e2cb39
Start Address kernel32 (0×0000000077d6b6a0)
Stack Init fffffadc28b19e00 Current fffffadc28b19950
Base fffffadc28b1a000 Limit fffffadc28b14000 Call 0
Priority 14 BasePriority 13 PriorityDecrement 0
Kernel stack not resident.

One of tricks I started to recommend in such cases is to save user dumps of processes that could possibly be paged out before forcing a complete memory dump but I haven’t yet got enough data to see whether this helps or not.

- Dmitry Vostokov @ DumpAnalysis.org -

Memory Dump Analysis Pattern: A Definition

Tuesday, February 17th, 2009

I found today that I never came up with a definition for a memory dump analysis pattern and only mentioned similar design pattern definition in the first pattern post. Now it is time to draft it:

a common recurrent identifiable problem together with a set of recommendations and possible solutions to apply in a specific context

In memory dump analysis pattern catalogue some identified patterns are left without a set of recommendations and I’m going to rectify this in the nearest future.

- Dmitry Vostokov @ DumpAnalysis.org -

Wait Chain Patterns

Tuesday, February 17th, 2009

While preparing a presentation for an engineering conference I found that a page to reference various wait chain patterns is necessary, so I created this post:

I’ll update it as soon as I add more similar patterns.

- Dmitry Vostokov @ DumpAnalysis.org -

DLL Link Patterns

Tuesday, February 17th, 2009

While preparing a presentation for an engineering conference I found that another different page to reference link time problems is necessary, so I created this post:

I’ll update it as soon as I add more similar patterns.

- Dmitry Vostokov @ DumpAnalysis.org -

Memory Consumption Patterns

Tuesday, February 17th, 2009

While preparing a presentation for an engineering conference I found that yet another page to reference different kinds of insufficient memory patterns is necessary, so I created this post:

I’ll update it as soon as I add more similar patterns.

- Dmitry Vostokov @ DumpAnalysis.org -

Dynamic Memory Corruption Patterns

Tuesday, February 17th, 2009

While preparing a presentation for an engineering conference I found that another page to reference different kinds of dynamic memory corruption patterns is necessary, so I created this post too:

I’ll update it as soon as I add more similar patterns.

- Dmitry Vostokov @ DumpAnalysis.org -

Deadlock and Livelock Patterns

Tuesday, February 17th, 2009

While preparing a presentation for an engineering conference I found that a page to reference all different kinds of deadlocks is necessary, so I created this post:

I’ll update it as soon as I add more similar patterns.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

The Mystery of Negative LockCount

Monday, February 16th, 2009

I’ve always wondered why LockCount is sometimes negative in the latest OS versions since SPs for Windows Server 2003:

0:000> !locks

CritSec +319ee0 at 00319EE0
LockCount -90
RecursionCount 1
OwningThread 1198
EntryCount 0
ContentionCount 734a6
*** Locked

CritSec +202e3d0 at 0202E3D0
LockCount -2
RecursionCount 1
OwningThread 1198
EntryCount 0
ContentionCount 0
*** Locked

CritSec +1b69ad4 at 01B69AD4
LockCount -2
RecursionCount 1
OwningThread 1198
EntryCount 0
ContentionCount 0
*** Locked

Finally the post on NT Debugging Blog pointed to WinDbg Help article (the section Interpreting Critical Section Fields in Windows Server 2003 SP1 and Later):

http://msdn.microsoft.com/en-us/library/cc267476.aspx

Be aware though that negative values could also mean Critical Section Corruption pattern:

CritSec printerA!DlgProc+53b38 at 107cddd8
WaiterWoken Yes
LockCount -69154612
RecursionCount 276618309
OwningThread 107cdd36

EntryCount 0
ContentionCount 0
*** Locked 

- Dmitry Vostokov @ DumpAnalysis.org -

Insufficient memory, handle leak, process factory, high contention and busy system: pattern cooperation

Monday, February 16th, 2009

It was reported that one system became slower and slower and eventually the complete memory dump was generated for analysis. Usual !vm 4 command shows pool allocation errors and particularly insufficient session pool memory for user session 1:

1: kd> !vm 4

*** Virtual Memory Usage ***
 Physical Memory:     1048503 (   4194012 Kb)
 Page File: \??\C:\pagefile.sys
   Current:   5109760 Kb  Free Space:   3668764 Kb
   Minimum:   5109760 Kb  Maximum:      5109760 Kb
 Available Pages:      409409 (   1637636 Kb)
 ResAvail Pages:       708135 (   2832540 Kb)
 Locked IO Pages:         108 (       432 Kb)
 Free System PTEs:     137566 (    550264 Kb)
 Free NP PTEs:          27288 (    109152 Kb)
 Free Special NP:           0 (         0 Kb)
 Modified Pages:          780 (      3120 Kb)
 Modified PF Pages:       771 (      3084 Kb)
 NonPagedPool Usage:    37587 (    150348 Kb)
 NonPagedPool Max:      65214 (    260856 Kb)

 PagedPool 0 Usage:     21583 (     86332 Kb)
 PagedPool 1 Usage:      3054 (     12216 Kb)
 PagedPool 2 Usage:      3076 (     12304 Kb)
 PagedPool 3 Usage:      3050 (     12200 Kb)
 PagedPool 4 Usage:      3040 (     12160 Kb)
 PagedPool Usage:       33803 (    135212 Kb)
 PagedPool Maximum:     66560 (    266240 Kb)

 ********** 185454 pool allocations have failed **********

 Shared Commit:        123289 (    493156 Kb)
 Special Pool:              0 (         0 Kb)
 Shared Process:       101937 (    407748 Kb)
 PagedPool Commit:      33850 (    135400 Kb)
 Driver Commit:          1697 (      6788 Kb)
 Committed pages:     1159581 (   4638324 Kb)
 Commit limit:        2283111 (   9132444 Kb)

[...]

 Session ID 1 @ af925000:
 Paged Pool Usage:       16868K

 *** 371034 Pool Allocation Failures ***

 Commit Usage:           17960K

 Session ID 2 @ af927000:
 Paged Pool Usage:        2856K
 Commit Usage:            3860K

[...]

The exploration of session 1 processes shows Process Factory pattern (5,000 launched processes) with explorer.exe ran amok:

1: kd> !sprocess 1
Dumping Session 1

_MM_SESSION_SPACE af925000
_MMSESSION        af925d80
PROCESS 87db34a0  SessionId: 1  Cid: 4a68    Peb: 7ffd4000  ParentCid: 01b8
    DirBase: cffb6d20  ObjectTable: e6084268  HandleCount: 11677.
    Image: csrss.exe

PROCESS 86083020  SessionId: 1  Cid: 4560    Peb: 7ffde000  ParentCid: 2978
    DirBase: cffb6f40  ObjectTable: e6c41410  HandleCount:  79.
    Image: AnotherApp.exe

PROCESS 8607c020  SessionId: 1  Cid: 44c8    Peb: 7ffdc000  ParentCid: 4cf8
    DirBase: cffb7080  ObjectTable: e3c9fd38  HandleCount: 25407.
    Image: explorer.exe

[...]

PROCESS 85e94738  SessionId: 1  Cid: 4868    Peb: 7ffde000  ParentCid: 44c8
    DirBase: cffb7c00  ObjectTable: e7038840  HandleCount:  39.
    Image: application.exe

PROCESS 85e5c020  SessionId: 1  Cid: 4668    Peb: 7ffd4000  ParentCid: 4a50
    DirBase: cffb7de0  ObjectTable: e11891d0  HandleCount: 172.
    Image: application.exe

PROCESS 85e51578  SessionId: 1  Cid: 4c9c    Peb: 7ffde000  ParentCid: 44c8
    DirBase: cffb7e40  ObjectTable: e2ee0070  HandleCount:  39.
    Image: application.exe

[...]

PROCESS 85c81020  SessionId: 1  Cid: 53a4    Peb: 7ffd7000  ParentCid: 44c8
    DirBase: cffb89c0  ObjectTable: e6d2f600  HandleCount:  39.
    Image: application.exe

PROCESS 85c6fb18  SessionId: 1  Cid: 53a8    Peb: 7ffd7000  ParentCid: 44c8
    DirBase: cffb89e0  ObjectTable: e54df078  HandleCount:  39.
    Image: application.exe

PROCESS 85c60020  SessionId: 1  Cid: 53bc    Peb: 7ffdf000  ParentCid: 44c8
    DirBase: cffb8a40  ObjectTable: e1214e90  HandleCount:  39.
    Image: application.exe

[...]

We can also see the very big number of handles in both csrss.exe and explorer.exe. If we look at one of 5,000 application.exe processes we would see that it is a GUI application and this explains session pool exhaustion:

1: kd> !process 85c60020 1f
PROCESS 85c60020  SessionId: 1  Cid: 53bc    Peb: 7ffdf000  ParentCid: 44c8
    DirBase: cffb8a40  ObjectTable: e1214e90  HandleCount:  39.
    Image: application.exe
    VadRoot 85c97450 Vads 80 Clone 0 Private 173. Modified 159. Locked 0.
    DeviceMap e3743340
    Token                             e2ead5e0
    ElapsedTime                       00:54:17.218
    UserTime                          00:00:00.015
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         32460
    QuotaPoolUsage[NonPagedPool]      3200
    Working Set Sizes (now,min,max)  (30, 50, 345) (120KB, 200KB, 1380KB)
    PeakWorkingSetSize                778
    VirtualSize                       16 Mb
    PeakVirtualSize                   17 Mb
    PageFaultCount                    829
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      212

THREAD 85c25db0  Cid 53bc.5470  Teb: 7ffde000 Win32Thread: bd1a18a8 WAIT: (Unknown) UserMode Non-Alertable
    85bbf2a8  SynchronizationEvent
Not impersonating
DeviceMap                 e3743340
Owning Process            85c60020       Image:         application.exe
Wait Start TickCount      1121133        Ticks: 206877 (0:00:53:52.453)
Context Switch Count      19355  NoStackSwap    LargeStack
UserTime                  00:00:00.015
KernelTime                00:00:00.000
Win32 Start Address application (0x00402a38)
Start Address kernel32!BaseProcessStartThunk (0x7c8217f8)
Stack Init ad2b8000 Current ad2b7b1c Base ad2b8000 Limit ad2b4000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr 
ad2b7b34 80833ec5 nt!KiSwapContext+0x26
ad2b7b60 80829bc0 nt!KiSwapThread+0x2e5
ad2b7ba8 bf89aacc nt!KeWaitForSingleObject+0x346
ad2b7c04 bf8c4816 win32k!xxxSleepThread+0×1be
ad2b7ca0 bf8a134f win32k!xxxInterSendMsgEx+0×798
ad2b7cec bf85ae3c win32k!xxxSendMessageTimeout+0×1f3
ad2b7d10 bf8c13dc win32k!xxxWrapSendMessage+0×1b
ad2b7d40 8088ad3c win32k!NtUserMessageCall+0×9d

ad2b7d40 7c9485ec nt!KiFastCallEntry+0xfc (TrapFrame @ ad2b7d64)
0012f814 00000000 ntdll!KiFastSystemCallRet

We also see extremely high executive resource contention and various signs of a busy system like the fact that all processors are busy and very high number of ready threads at normal priority (about 1,000):

1: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****

Resource @ DriverA (0xbae016a0)    Exclusively owned
    Contention Count = 277088870
    NumberOfExclusiveWaiters = 3591
     Threads: 83b46db0-01<*> << Too many exclusive waiters to list>>

1: kd> !running

System Processors ff (affinity mask)
  Idle Processors 0

     Prcb      Current   Next   
  0  ffdff120  86300b40            …………….
  1  f772f120  f77320a0  852fadb0  …………….
  2  f7737120  f773a0a0  861528d0  …………….
  3  f773f120  f77420a0  877a4020  …………….
  4  f7747120  f774a0a0  865909a8  …………….
  5  f774f120  f77520a0  83d62cd0  …………….
  6  f7757120  f775a0a0  841aa020  …………….
  7  f775f120  f77620a0  ffb7b5e0  …………….

1: kd> !ready
Processor 0: No threads in READY state
Processor 1: Ready Threads at priority 10
    THREAD 861b1898  Cid 4860.4894  Teb: 7ffd8000 Win32Thread: bc1669a8 READY
Processor 1: Ready Threads at priority 9
    THREAD 83b46db0  Cid 44c8.a11c  Teb: 7f8f8000 Win32Thread: bcbabb38 READY
Processor 1: Ready Threads at priority 8
    THREAD 851408d0  Cid 6338.6398  Teb: 7ffdf000 Win32Thread: bc9bfbb8 READY
    THREAD 83eb1730  Cid 8f30.9568  Teb: 7ffdf000 Win32Thread: 00000000 READY
    THREAD fd310af8  Cid d53c.b4d8  Teb: 7ffdf000 Win32Thread: 00000000 READY
    THREAD 83bce8d0  Cid abec.6450  Teb: 7ffdd000 Win32Thread: 00000000 READY
    THREAD 83de2388  Cid a618.d79c  Teb: 7ffde000 Win32Thread: 00000000 READY
    THREAD 84197b40  Cid 44c8.6974  Teb: 7fe6a000 Win32Thread: bc662d48 READY
[…]

Processor 1: Ready Threads at priority 7
    THREAD 86d0c890  Cid 3524.365c  Teb: 7ff8e000 Win32Thread: bc403bb0 READY
Processor 2: No threads in READY state
Processor 3: No threads in READY state
Processor 4: No threads in READY state
Processor 5: No threads in READY state
Processor 6: No threads in READY state
Processor 7: No threads in READY state

There is AnotherApp.exe in the session process list that executes a VBS script:  

1: kd> .process /r /p 86083020
Implicit process is now 86083020
Loading User Symbols

1: kd> !process 86083020
[...]

        THREAD 860b0db0  Cid 4560.4e4c  Teb: 7ffdd000 Win32Thread: bc1bf4d0 WAIT: (Unknown) KernelMode Non-Alertable
            860b0e28  NotificationTimer

1: kd> .thread 860b0db0
Implicit thread is now 860b0db0

1: kd> kv 100
ChildEBP RetAddr  Args to Child             
ae7ec76c 80833ec5 860b0db0 860b0ea0 00000000 nt!KiSwapContext+0x26
ae7ec798 80829069 0000001e 00000000 bae016a0 nt!KiSwapThread+0x2e5
ae7ec7e0 badff394 00000000 00000000 ffdff5f0 nt!KeDelayExecutionThread+0x2ab
[...]
0012f6ac 7c8024d5 00000000 00000000 00a43280 kernel32!CreateProcessInternalA+0x28b
0012f6e4 00401b31 00000000 00a43280 00000000 kernel32!CreateProcessA+0×2c
[…]

1: kd> da /c 100 00a43280
00a43280  “wscript.exe C:\Scripts\script.vbs parameters”

The examination of that script showed that it contained a coding error resulted in an infinite loop.

- Dmitry Vostokov @ DumpAnalysis.org -

Inverse Temperature Spike

Monday, February 16th, 2009

The first week in February was extremely cold in Ireland with sub-zero temperatures and snowfall. This culminated on Sunday, 8th with even more snowfall that I haven’t seen in Ireland for 8 years. I woke up that morning and everything outside my apartment was white so I went out to take a few pictures:

- Dmitry Vostokov @ DumpAnalysis.org