Archive for the ‘Debugging’ Category

Dr. Debugalov lectures on Debugging

Sunday, July 20th, 2008

Yet another insightful cartoon from Narasimha Vedala (click on it to enlarge):

Spider Man

DBG_SpiderMan from Narasimha Vedala (click to enlarge)

- Dmitry Vostokov @ DumpAnalysis.org -

Dump Analysis on Wikipedia

Sunday, July 20th, 2008

I was very pleased to notice that the reference to crash dump analysis patterns and Memory Dump Analysis Anthology book appeared on Wikipedia:

http://en.wikipedia.org/wiki/Debugging 

Special thanks to a person who added this information!

- Dmitry Vostokov @ DumpAnalysis.org -

Bedtime story from Dr. Debugalov

Sunday, July 20th, 2008

Another cartoon from Narasimha Vedala:

Bedtime Story

DBG_AttemptsBedtimeStory from Narasimha Vedala (click to enlarge)

- Dmitry Vostokov @ DumpAnalysis.org -

DebugWare Patterns (Part 1)

Saturday, July 19th, 2008

These are patterns for troubleshooting and debugging tools: solutions for common recurrent problems. Forthcoming book

DebugWare: The Art and Craft of Writing Troubleshooting Tools 

which I’m writing with Kapildev Ramlal, the author of several Citrix tools, will discuss them in sufficient detail. The first pattern is named:

API Query

Software products use various API and external interfaces to query data or get notifications from operating system environment. Their behaviour depends on API return values or output parameters that are not always logged or logged with insufficient detail. In order to reproduce or diagnose problems an engineer can write a small API Query tool that will periodically or asynchronously query the same set of API and log their input and output data. If the problem happens with the product at some point this additional log will help in problem identification and resolution.

Examples:

TSUserLog
WindowHistory

I’ve set up a dedicated page for DebugWare patterns:

http://www.dumpanalysis.org/blog/index.php/debugware-patterns/

- Dmitry Vostokov @ DumpAnalysis.org

Dr. Debugalov at WHDC

Thursday, July 17th, 2008

The proposed World Heavy-weight Debugging Championship and the cartoon come from Narasimha Vedala (click to enlarge):

At The Championships

DBG_AtTheChampionships from Narasimha Vedala (click to enlarge)

WHDC is also known as the abbreviation of Windows Hardware Developer Central.

- Dmitry Vostokov @ DumpAnalysis.org -

Multiple patterns in a dump: case study

Thursday, July 17th, 2008

Let’s now apply the knowledge of crash dump analysis patterns to the real memory dump coming from the hanging system. !locks WinDbg command is the most frequently used start command for such symptoms according to my observation which might be biased. It reveals deep Wait Chain pattern for executive resources:

1: kd> !locks
[...]
Resource @ 0x8b10c168    Exclusively owned
    Contention Count = 1950624
    NumberOfSharedWaiters = 2
    NumberOfExclusiveWaiters = 154
     Threads: 88256430-01<*> 882aedb0-01    8a348580-01   
     Threads Waiting On Exclusive Access:
              891fd368       894fd240       88382280       8921b9c0      
              8aa18db0       89790328       882b8818       88a70bf8      
              884a2780       88999818       8a13b020       8846a7a0      
              8a0b3020       8812e568       897b6db0       88a16440      
              8922c5f8       88bfe3b8       88264ac0       89ff2b40      
              8a9da020       881cf020       8807adb0       89d64598      
              887811d0       8822a850       88264820       88194738      
              8801f7a8       88284020       88628db0       8a071db0      
              884a84b0       88be46c0       89755b18       89700020      
              89ca4580       881cddb0       882f7020       88bcf9a0      
              8921b020       8826fdb0       88a73db0       88211020      
              8868a1c8       89121280       89e01020       895cedb0      
              88d03790       883941f8       8910f820       891ebc80      
              89862db0       88154af8       8821e7d0       881cedb0      
              8822b020       88094818       8a00b020       89e69020      
              880bbdb0       8945f690       8954c1c0       88d2cb90      
              881cd020       8921c448       89550540       8a5a5870      
              8a159228       893976c0       882847b8       89306578      
              880eb9a8       8978e020       882f72c0       8966c380      
              8a12f4b8       8815adb0       881a5020       897c7db0      
              8873ebf8       88674530       8831b468       88e999a0      
              88287020       8966e600       88541db0       8826d7a0      
              88119b10       8a226338       882f7810       888ba348      
              884b89a8       88d03db0       8826db00       8910adb0      
              881d8368       89288238       8a00adb0       89125db0      
              88eb50a0       88dbbdb0       880ed020       895cd5b0      
              881d4b00       88565db0       886e7780       884b86d8      
              8a603598       89383020       8826f370       886d2248      
              88cd1360       881d1888       88bef670       88117db0      
              890d63f0       894d0368       8826f850       89123020      
              88209020       8826fac0       88f9bdb0       89027478      
              894b8d18       882a7338       899b9020       897c3db0      
              8a13fc50       88b33d50       88b54b68       88652360      
              8a199020       8910dc98       8833a020       8a194a70      
              8a5af640       89b717a0       89464db0       8a152878      
              884773d8       88afe020       88debaf8       88bef2a0      
              88bd6948       89abddb0       8a133db0       88e0ebe8      
              88287398       889622e0       8836aab8       88daec80      
              88c5c450       88225718

[...]

Resource @ 0x8a316c98    Exclusively owned
    Contention Count = 40315
    NumberOfExclusiveWaiters = 2
     Threads: 893bd498-01<*>
     Threads Waiting On Exclusive Access:
              8846f9a8       88256430

[...]

Resource @ 0x8a2b3800    Exclusively owned
    Contention Count = 17735
    NumberOfExclusiveWaiters = 1
     Threads: 8a30ec80-01<*>
     Threads Waiting On Exclusive Access:
              893bd498

[...]

14606 total locks, 14 locks currently held

We have this chain: 154 threads -> 88256430 -> 893bd498 -> 8a30ec80

The same conclusion comes from !analyze -v -hang  command:

1: kd> !analyze -v -hang

[...]

Scanning for threads blocked on locks ...

CURRENT_IRQL:  2

BLOCKING_THREAD:  8a30ec80

LOCK_ADDRESS:  8a2b3800 -- (!locks 8a2b3800)

Resource @ 0x8a2b3800    Exclusively owned
    Contention Count = 17735
    NumberOfExclusiveWaiters = 1
     Threads: 8a30ec80-01<*>
     Threads Waiting On Exclusive Access:
              893bd498

BUGCHECK_STR:  LOCK_HELD

[...]

Let’s examine the thread 8a30ec80 that holds so many others:

1: kd> !thread 8a30ec80 1f
THREAD 8a30ec80  Cid 3ca0.20f0  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 2
Not impersonating
DeviceMap                 e1000930
Owning Process            8a254128       Image:         processA.exe
Wait Start TickCount      2024978        Ticks: 2291 (0:00:00:35.796)
Context Switch Count      339739            
UserTime                  00:00:00.000
KernelTime                01:08:29.484
Start Address driverA!WorkerRoutine (0xbfa4b850)
Stack Init b7409000 Current b7407e74 Base b7409000 Limit b7406000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
ChildEBP RetAddr 
f77353a0 bfa4b880 driverA!DoProcessing+0×256
f77353a4 ff180010 driverA!WorkerRoutine+0×30
[…]

We see that this thread is running on processor 2 and the time spent in kernel is more than one hour. Seems we have the case of Spiking Thread pattern here. Also the thread is not waiting and seems to be moving some data:

1: kd> .thread 8a30ec80
Implicit thread is now 8a30ec80

1: kd> r
Last set context:
eax=baec2950 ebx=00000000 ecx=00001b4a edx=00002275 esi=bae8c010 edi=ba01a018
eip=bfa3b68c esp=f77353a4 ebp=b7407f44 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
driverA!DoProcessing+0×256:
bfa3b68c f3a5 rep movs dword ptr es:[edi],dword ptr [esi]

This driver was also known to have pool allocation problems resulting in various strange system behaviour so let’s inspect the output of !vm command to see whether we have an instance of Insufficient Memory pattern:

1: kd> !vm

*** Virtual Memory Usage ***
 Physical Memory:     1048242 (   4192968 Kb)
 Page File: \??\C:\pagefile.sys
   Current:   4456448 Kb  Free Space:   3483860 Kb
   Minimum:   4456448 Kb  Maximum:      4456448 Kb
 Available Pages:      409879 (   1639516 Kb)
 ResAvail Pages:       942398 (   3769592 Kb)
 Locked IO Pages:         124 (       496 Kb)
 Free System PTEs:     182782 (    731128 Kb)
 Free NP PTEs:          32766 (    131064 Kb)
 Free Special NP:           0 (         0 Kb)
 Modified Pages:           29 (       116 Kb)
 Modified PF Pages:        29 (       116 Kb)
 NonPagedPool Usage:    13567 (     54268 Kb)
 NonPagedPool Max:      65279 (    261116 Kb)
 PagedPool 0 Usage:     12350 (     49400 Kb)
 PagedPool 1 Usage:      2442 (      9768 Kb)
 PagedPool 2 Usage:      2457 (      9828 Kb)
 PagedPool 3 Usage:      2395 (      9580 Kb)
 PagedPool 4 Usage:      2465 (      9860 Kb)
 PagedPool Usage:       22109 (     88436 Kb)
 PagedPool Maximum:     64512 (    258048 Kb)

 ********** 3 pool allocations have failed **********

 Shared Commit:        133470 (    533880 Kb)
 Special Pool:              0 (         0 Kb)
 Shared Process:        24993 (     99972 Kb)
 PagedPool Commit:      22173 (     88692 Kb)
 Driver Commit:          2056 (      8224 Kb)
 Committed pages:      938909 (   3755636 Kb)
 Commit limit:        2119027 (   8476108 Kb)

[...]

We have signs here but the current size of paged pool and nonpaged pool seems to be very far from their maximum. Perhaps there were failures in session pool allocations? Let’s look at session pool:

1: kd> !vm 4
[...]
 Session ID 5 @ f79f3000:
 Paged Pool Usage:       35640K

 *** 6 Pool Allocation Failures ***

 Commit Usage:           36900K

We see the problem in session 5 and let’s see in which session our processA.exe was running:

1: kd> !process 8a254128 0
PROCESS 8a254128  SessionId: 15  Cid: 3ca0    Peb: 7ffd7000  ParentCid: 01ac
    DirBase: bff47800  ObjectTable: e779f5c8  HandleCount: 161.
    Image: processA.exe

It was session 15 and therefore we might conclude that previous problems with driverA are not connected to this new one. The identified problem is CPU spike. Perhaps the code contains a bug that causes this driver to loop indefinitely.

- Dmitry Vostokov @ DumpAnalysis.org -

Bugs prosecute defectors

Thursday, July 17th, 2008

New cartoons from Narasimha Vedala: 

Valley Of Death

DBG_ValleyOfDeath from Narasimha Vedala (click to enlarge)

At A Glance

DBG_AtAGlance from Narasimha Vedala (click to enlarge)

Igor Messes Up

DBG_IgorMessesUp from Narasimha Vedala (click to enlarge)

- Dmitry Vostokov @ DumpAnalysis.org -

Bugs plot a revolution!

Tuesday, July 15th, 2008

More cartoons from Narasimha Vedala:

Heap Analysis

DBG_HeapAnalysis from Narasimha Vedala

Revolution Is Brewing

DBG_DungbeetlesPlot from Narasimha Vedala (click to enlarge)

- Dmitry Vostokov @ DumpAnalysis.org -

Welcome to Dr. Dmitry Debugalov!

Monday, July 14th, 2008

This fictional character and cartoons are the creation of Narasimha Vedala and I’m sure you will enjoy them as much as I do:

Bug On Stack

DBG_BugOnStack from Narasimha Vedala

At The Board

DBG_AtTheBoard from Narasimha Vedala (click to enlarge)

Field Guide To Analysis

DBG_FieldGuideToAnalysis from Narasimha Vedala

Unearthing Bugs

DBG_UnearthingBugs from Narasimha Vedala

Bugs Doom

DBG_BugsDoom from Narasimha Vedala (click to enlarge)

- Dmitry Vostokov @ DumpAnalysis.org -

Lean Tracing

Sunday, July 13th, 2008

Sometimes ETW (or CDF) traces can be really huge. Unless we trace the elusive but the specific error we already know about, there is no need to make such traces if we can reproduce the issue. My favourite example is connectivity problems when you cannot connect to a terminal server. The best way is to start tracing, try to connect, get an error and stop tracing. Usually it takes no more than a couple of minutes. We can even trace all modules here just to make sure that we don’t miss anything. It is also better to focus on one specific scenario per one lean trace instead of packing several of them into one big trace.

- Dmitry Vostokov @ DumpAnalysis.org -

Debugging PHP

Sunday, July 13th, 2008

Yesterday I noticed that certain wp-admin PHP pages were not accessible anymore. I was able to view messages but not to post them or change options. Originally I thought that this was the side effect of having two versions of Wordpress running at the same time or improper DB charset configuration of the second instance (I recently set up the Russian version of my blog). I tried to reinstall the latter instance with proper configuration and language but this didn’t help. HTTP fiddler showed no response from the server for problem PHP pages. I also noticed that my other domains running on the same server didn’t have this problem. Finally I hit upon the clue: one of logs had this error:

[client xx.xx.xx.xxx] PHP Fatal error: Allowed memory size of 8388608 bytes exhausted (tried to allocate 11 bytes) in […] dumpanalysis.org/httpdocs/ru/blog/wp-admin/includes/update.php on line 17

Out of memory error! Googling “PHP memory” hit upon the solution to add the following line to .htaccess file:

php_value memory_limit 20M

Now it all suddenly made sense to me. I didn’t think of possible artificial memory limits in script engines. Adding the second instance of Wordpress hit PHP memory limit which was already almost exhausted by my Wordpress blog with more than 500 posts, Drupal portal, phpBB forum and MediaWiki.

Now I’m able to write posts again :-) Happy debugging!

- Dmitry Vostokov @ DumpAnalysis.org -

Software Debugging Generalist

Saturday, July 12th, 2008

Who’s that person? I came up with this title long time ago when I heard some kernel software developers bragging that they don’t care about user space. Actually it was apparently visible that they never analyzed complete memory dumps in all their entirety. A software debugging generalist feels comfortable in every space and mode, in live debugging of every possible application, service and driver and in postmortem analysis of every memory dump type (see also Programmer Universalis for a similar description). 

- Dmitry Vostokov @ DumpAnalysis.org

In Search of Lost CID

Friday, July 11th, 2008

Paraphrasing the title of Marcel Proust’s “In Search of Lost Time” 6-volume classic we can say there is timeless empirical knowledge. One is target CID (Client ID, PID:TID) for RPC calls. We just need to search for even 16-bit numbers and compare them with the list of available PIDs. The example can be found on ntdebugging blog:

Tracking Down a Multi-Process Deadlock

Actually the second dword after PID can contain even 16-bit TID number as can be seen from another example:

1: kd> kv
ChildEBP RetAddr  Args to Child             
[...]
00faf828 7778c38b 00faf8f0 00faf9f0 06413b54 ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x112
00faf908 776c0565 06413b54 00fafa00 00faf9f0 ole32!CRpcChannelBuffer::SendReceive2+0xd3
00faf974 776c04fa 06413b54 00fafa00 00faf9f0 ole32!CAptRpcChnl::SendReceive+0xab
00faf9c8 77ce247f 06413b54 00fafa00 00faf9f0 ole32!CCtxComChnl::SendReceive+0×1a9
00faf9e4 77ce252f 03213bdc 00fafa2c 0600016e RPCRT4!NdrProxySendReceive+0×43
00fafdcc 77ce25a6 763050e8 76306bba 00fafe04 RPCRT4!NdrClientCall2+0×206
00fafdec 77c64f87 0000000c 00000005 00fafe3c RPCRT4!ObjectStublessClient+0×8b
00fafdfc 7784ba75 03213bdc 03235858 03235850 RPCRT4!ObjectStubless+0xf
[…]

1: kd> dpp 06413b54 l8
06413b54  77672418 7778bcdf ole32!CRpcChannelBuffer::QueryInterface
06413b58  776723e8 777267f5 ole32!CRpcChannelBuffer::QueryInterface
06413b5c  00000003
06413b60  00000002
06413b64  0743fde8 0316f080
06413b68  07676528 031d5ad0
06413b6c  064c9c80 064c9d00
06413b70  078843c0 00000000

1: kd> dd 064c9c80 l4
064c9c80  064c9d00 064c9c00 00003108 00001dac ; PID TID

I have been using this technique for a long time and only now see it documented.

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 71)

Friday, July 11th, 2008

Dynamic memory corruption patterns in user and kernel spaces are specializations of one big parent pattern called Corrupt Structure because crashes there happen due to corrupt or overwritten heap or pool control structures (for the latter see Double Free pattern). Another frequently seen specialization is called Critical Section Corruption which is the subject of this post. Critical sections are linked together through statically pre-allocated or heap-allocated helper structure (shown in magenta) although themselves they can be stored anywhere from static and stack area to heap:

0:001> dt -r1 ntdll!_RTL_CRITICAL_SECTION 77795240
   +0×000 DebugInfo        : 0×00175d28 _RTL_CRITICAL_SECTION_DEBUG
      +0×000 Type             : 0
      +0×002 CreatorBackTraceIndex : 0
      +0×004 CriticalSection  : 0×77795240 _RTL_CRITICAL_SECTION
      +0×008 ProcessLocksList : _LIST_ENTRY [ 0×173a08 - 0×173298 ]
      +0×010 EntryCount       : 0
      +0×014 ContentionCount  : 0
      +0×018 Spare            : [2] 0

   +0×004 LockCount        : -1
   +0×008 RecursionCount   : 0
   +0×00c OwningThread     : (null)
   +0×010 LockSemaphore    : (null)
   +0×014 SpinCount        : 0

0:001> !address 77795240
    77670000 : 77792000 - 00005000
                    Type     01000000 MEM_IMAGE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                  Usage    RegionUsageImage
                    FullPath C:\WINDOWS\system32\ole32.dll

0:001> !address 0×00175d28
    00140000 : 00173000 - 0000d000
                    Type     00020000 MEM_PRIVATE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageHeap
                    Handle   00140000

0:000> !locks

CritSec ntdll!LdrpLoaderLock+0 at 7c8877a0
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       1184
EntryCount         0
ContentionCount    b04707
*** Locked

0:000> dt -r1 _RTL_CRITICAL_SECTION 7c8877a0
   +0×000 DebugInfo        : 0×7c8877c0 _RTL_CRITICAL_SECTION_DEBUG
      +0×000 Type             : 0
      +0×002 CreatorBackTraceIndex : 0
      +0×004 CriticalSection  : 0×7c8877a0 _RTL_CRITICAL_SECTION
      +0×008 ProcessLocksList : _LIST_ENTRY [ 0×7c887be8 - 0×7c887bc8 ]
      +0×010 EntryCount       : 0
      +0×014 ContentionCount  : 0xb04707
      +0×018 Spare            : [2] 0

   +0×004 LockCount        : -2
   +0×008 RecursionCount   : 1
   +0×00c OwningThread     : 0×00001184
   +0×010 LockSemaphore    : 0×0000013c
   +0×014 SpinCount        : 0

0:000> !address 7c8877a0
    7c800000 : 7c887000 - 00003000
                    Type     01000000 MEM_IMAGE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageImage
                    FullPath C:\WINDOWS\system32\ntdll.dll

0:000> !address 0×7c8877c0
    7c800000 : 7c887000 - 00003000
                    Type     01000000 MEM_IMAGE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageImage
                    FullPath C:\WINDOWS\system32\ntdll.dll

Consider the case when CRITICAL_SECTION is defined on a stack and there was Local Buffer Overflow overwriting DebugInfo pointer. Then we have an example of Wild Pointer pattern and traversing the list of critical sections from this point will diverge into completely unrelated memory area or stop there. Consider another example of heap corruption or race condition overwriting ProcessLocksList or CriticalSection pointer. Then we have an instance of Wild Pointer pattern illustrated below:

0:000> !locks

CritSec ntdll!LdrpLoaderLock+0 at 7c8877a0
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       1184
EntryCount         0
ContentionCount    b04707
*** Locked

CritSec +1018de08 at 1018de08
WaiterWoken        Yes
LockCount          -49153
RecursionCount     5046347
OwningThread       460050
EntryCount         0
ContentionCount    0
*** Locked

CritSec +1018ddd8 at 1018ddd8
WaiterWoken        Yes
LockCount          -1
RecursionCount     0
OwningThread       0
*** Locked

CritSec +1018de28 at 1018de28
WaiterWoken        Yes
LockCount          -1
RecursionCount     0
OwningThread       0
*** Locked

CritSec +1018de08 at 1018de08
WaiterWoken        Yes
LockCount          -49153
RecursionCount     5046347
OwningThread       460050
EntryCount         0
ContentionCount    0
*** Locked

CritSec +1018de28 at 1018de28
WaiterWoken        Yes
LockCount          -1
RecursionCount     0
OwningThread       0
*** Locked

CritSec +1018ddd8 at 1018ddd8
WaiterWoken        Yes
LockCount          -1
RecursionCount     0
OwningThread       0
*** Locked

Scanned 841 critical sections

We see the signs of corruption at 1018de08 address which is interpreted as pointing to a locked critical section. To see where the corruption started we need to look at the list of all critical sections either locked or not locked:

0:000> !locks -v

CritSec ntdll!RtlCriticalSectionLock+0 at 7c887780
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    28

CritSec ntdll!LdrpLoaderLock+0 at 7c8877a0
WaiterWoken        No
LockCount          0
RecursionCount     1
OwningThread       1184
EntryCount         0
ContentionCount    b04707
*** Locked

CritSec ntdll!FastPebLock+0 at 7c887740
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    42c9

CritSec ntdll!RtlpCalloutEntryLock+0 at 7c888ea0
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!PMCritSect+0 at 7c8883c0
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!UMLogCritSect+0 at 7c888400
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!RtlpProcessHeapsListLock+0 at 7c887960
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +80608 at 00080608
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    22

[...]

CritSec cabinet!_adbgmsg+13c at 74fb4658
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +c6c17c at 00c6c17c
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +c6c0e4 at 00c6c0e4
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec at 1018de08 does not point back to the debug info at 00136a40
Perhaps the memory that held the critical section has been reused without calling DeleteCriticalSection() ?

CritSec +1018de08 at 1018de08
WaiterWoken        Yes
LockCount          -49153
RecursionCount     5046347
OwningThread       460050
EntryCount         0
ContentionCount    0
*** Locked

CritSec at 1018ddd8 does not point back to the debug info at 00136a68
Perhaps the memory that held the critical section has been reused without calling DeleteCriticalSection() ?

CritSec +1018ddd8 at 1018ddd8
WaiterWoken        Yes
LockCount          -1
RecursionCount     0
OwningThread       0
*** Locked

[...]

We see that the problem appears when the heap-allocated critical section at 00c6c0e4 address is linked to an inconsistent critical section at 0×1018de08 address where memory contains UNICODE string fragment:

0:000> !address 00c6c0e4
    00c60000 : 00c60000 - 00010000
                    Type     00020000 MEM_PRIVATE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageHeap
                    Handle   00c60000

0:000> dt -r1 _RTL_CRITICAL_SECTION 00c6c0e4
   +0x000 DebugInfo        : 0x00161140 _RTL_CRITICAL_SECTION_DEBUG
      +0x000 Type             : 0
      +0x002 CreatorBackTraceIndex : 0
      +0x004 CriticalSection  : 0x00c6c0e4 _RTL_CRITICAL_SECTION
      +0×008 ProcessLocksList : _LIST_ENTRY [ 0×136a48 - 0×119f58 ]
      +0×010 EntryCount       : 0
      +0×014 ContentionCount  : 0
      +0×018 Spare            : [2] 0
   +0×004 LockCount        : -1
   +0×008 RecursionCount   : 0
   +0×00c OwningThread     : (null)
   +0×010 LockSemaphore    : (null)
   +0×014 SpinCount        : 0

0:000> dt -r _RTL_CRITICAL_SECTION_DEBUG 0×00136a48-0×008
   +0×000 Type             : 0
   +0×002 CreatorBackTraceIndex : 0
   +0×004 CriticalSection  : 0×1018de08 _RTL_CRITICAL_SECTION
      +0×000 DebugInfo        : 0×000d001b _RTL_CRITICAL_SECTION_DEBUG
         +0×000 Type             : 0
         +0×002 CreatorBackTraceIndex : 0
         +0×004 CriticalSection  : (null)
         +0×008 ProcessLocksList : _LIST_ENTRY [ 0×0 - 0×0 ]
         +0×010 EntryCount       : 0
         +0×014 ContentionCount  : 0×37e3c700
         +0×018 Spare            : [2] 0×8000025
      +0×004 LockCount        : 196609
      +0×008 RecursionCount   : 5046347
      +0×00c OwningThread     : 0×00460050
      +0×010 LockSemaphore    : 0×00310033
      +0×014 SpinCount        : 0×520044
   +0×008 ProcessLocksList : _LIST_ENTRY [ 0×136a70 - 0×161148 ]
      +0×000 Flink            : 0×00136a70 _LIST_ENTRY [ 0×136a98 - 0×136a48 ]
         +0×000 Flink            : 0×00136a98 _LIST_ENTRY [ 0×136ae8 - 0×136a70 ]
         +0×004 Blink            : 0×00136a48 _LIST_ENTRY [ 0×136a70 - 0×161148 ]
      +0×004 Blink            : 0×00161148 _LIST_ENTRY [ 0×136a48 - 0×119f58 ]
         +0×000 Flink            : 0×00136a48 _LIST_ENTRY [ 0×136a70 - 0×161148 ]
         +0×004 Blink            : 0×00119f58 _LIST_ENTRY [ 0×161148 - 0×16cc3c0 ]
   +0×010 EntryCount       : 0
   +0×014 ContentionCount  : 0
   +0×018 Spare            : [2] 0×2e760000

0:000> !address 0x1018de08
    10120000 : 10120000 - 00100000
                    Type     00020000 MEM_PRIVATE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageIsVAD

The address points miraculously to some DLL: 

0:000> du 1018de08
1018de08  "....componentA.dll"

We might suggest that componentA.dll played some role there.

There are other messages from verbose version of !locks WinDbg command pointing to critical section problems:  

CritSec componentB!Section+0 at 74004008
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

The CritSec componentC!Info+c at 72455074 has been RE-INITIALIZED.
The critical section points to DebugInfo at 00107cc8 instead of 000f4788

CritSec componentC!Info+c at 72455074
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec componentD!foo+8ec0 at 0101add0
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis Patterns (Part 70)

Thursday, July 10th, 2008

Sometimes compilers optimize code by replacing function calls with their bodies. This procedure is called function inlining and functions themselves are called inline. On one platform we can see the real function call on the stack trace but on another platform or product version we only see the same problem instruction. Fortunately the rest of stack trace should be the same. Therefore when comparing Stack Traces we shouldn’t pay attention only to the top function call. This pattern is called Inline Function Optimization.

It is frequently seen when threads crash while copying or moving memory. Consider this stack trace:

0: kd> kL
ChildEBP RetAddr 
f22efaf4 f279ec3d driver!QueueValue+0x26b
f22efb30 8081dcdf driver!BufferAppendData+0x35f
f22efc7c 808f47b7 nt!IofCallDriver+0x45
f22efc90 808f24ee nt!IopSynchronousServiceTail+0x10b
f22efd38 80888c7c nt!NtWriteFile+0x65a
f22efd38 7c82ed54 nt!KiFastCallEntry+0xfc

When looking at rep movs instruction we might suspect that QueueValue was copying memory:

0: kd> r
eax=00000640 ebx=89b23000 ecx=00000190 edx=89b3c828 esi=02124220 edi=e2108f58
eip=f279c797 esp=f22efadc ebp=f22efaf4 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
driver!QueueValue+0x26b:
f279c797 f3a5 rep movs dword ptr es:[edi],dword ptr [esi] es:0023:e2108f58=dfefbecf ds:0023:02124220=????????

On x64 bit platform the same driver had the similar stack trace but with memcpy at its top:

fffffadf`8955f4a8 fffffadf`8d1bef46 driver!memcpy+0x1c0
fffffadf`8955f4b0 fffffadf`8d1c15c9 driver!QueueValue+0x2fe
fffffadf`8955f550 fffff800`01273ed9 driver!BufferAppendData+0x481
[...]

We also see how QueueValue+0×2fe and QueueValue+0×26b are close. In fact the source code for the driver calls RtlCopyMemory function only once and it is defined as memcpy in wdm.h. The latter function is also exported from nt:

0: kd> x nt!
[...]
80881780 nt!memcpy = <no type information>
[...]

but usually can be found in any driver that links it from C runtime library, for example, on my x64 Windows:

1: kd> x nt!memcpy
fffff800`01c464e0 nt!memcpy = <no type information>

1: kd> x srv!memcpy
fffff980`0eafdf20 srv!memcpy = <no type information>

1: kd> x win32k!memcpy
fffff960`000c1b40 win32k!memcpy = <no type information>

Therefore we see that when compiling for x86 platform Visual C++ compiler decided to inline memcpy code but AMD compiler on x64 platform didn’t inline it. The overall stack trace without offsets is very similar and we can suppose that the problem was identical.

- Dmitry Vostokov @ DumpAnalysis.org -

Windows Debugging on Facebook

Thursday, July 10th, 2008

There is a growing Windows Debugging group on Facebook which I joined recently:

http://www.facebook.com/group.php?gid=23775552268

- Dmitry Vostokov @ DumpAnalysis.org -

MDAA V1 becomes #1 Debugging bestseller

Thursday, July 10th, 2008

Previously I noticed that it reached #1 bestseller status in Assembly Language Programming category but today I see it #1 bestseller in Debugging category:

#1 in  Books > Computers & Internet > Programming > Languages & Tools > Debugging
#1 in  Books > Computers & Internet > Programming > Languages & Tools > Assembly Language Programming

And again, I remind that because the status is updated every hour you might not see the same status when you read this post :-)

- Dmitry Vostokov @ DumpAnalysis.org -

It’s Not a Bug, It’s Not a Feature, It’s a Gift!

Wednesday, July 9th, 2008

This little quotation book is a perfect gift to any software engineer or scientist:

It’s Not a Bug, It’s a Feature!: Computer Wit and Wisdom

Buy from Amazon

- Dmitry Vostokov @ DumpAnalysis.org -

The Hidden Tomb in Pyramid of Software Change

Wednesday, July 9th, 2008

How does software change in production environment? My experience suggests 3 major ways:

  1. Executive decision to replace the whole software product with another competing product.
  2. Software troubleshooting at component level like upgrading or eliminating suspicious components and unrelated products that influence behaviour.
  3. Correction of individual components after debugging to address implementation and functional defects, non-functional, design or architecture deficiencies.

This can be shown on the following rough diagram (excluding possible overlapping of levels) highlighting the often hidden role of memory dump analysis in software change:

- Dmitry Vostokov @ DumpAnalysis.org -

Crash Dump Analysis AntiPatterns (Part 10)

Wednesday, July 9th, 2008

Often engineers spend 10 minutes pursuing a certain investigation path and then prematurely closing it and switching to another. This is what I call Myopic Troubleshooting and Debugging.

This anti-pattern name was inspired by Daniel Dennett’s discussion of insufficiently patient scientists doing computer simulations:

“mistaking a failure of imagination for an insight into necessity” (Darwin’s Dangerous Idea, page 175).

Paraphrasing we can say that engineers think of impossibility where their imagination fails.

- Dmitry Vostokov @ DumpAnalysis.org -