Bugtation No.101
Monday, August 31st, 2009Most bugs are permanent.
Dmitry Vostokov, Variation on a theme “Most objects are temporary”
- Dmitry Vostokov @ DumpAnalysis.org -
Most bugs are permanent.
Dmitry Vostokov, Variation on a theme “Most objects are temporary”
- Dmitry Vostokov @ DumpAnalysis.org -
Sometimes not paying attention to all aspects of default analysis makes it difficult to consider an alternative troubleshooting hypothesis. Here is a sample of !analyze -v output showing massive patching (hooked functions pattern) by DriverA module:
KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address. Some common problems are exception code 0x80000003. This means a hard coded breakpoint or assertion was hit, but this system was booted /NODEBUG. This is not supposed to happen as developers should never have hardcoded breakpoints in retail code, but ... If this happens, make sure a debugger gets connected, and the system is booted /DEBUG. This will let us see why this breakpoint is happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8092d47f, The address that the exception occurred at
Arg3: f5205b14, Trap Frame
Arg4: 00000000
[...]
CHKIMG_EXTENSION: !chkimg -lo 50 -d !nt
80822a49-80822a4d 5 bytes - nt!NtYieldExecution
[ 8b ff 55 8b ec:e9 14 3a 95 76 ]
80823c11-80823c14 4 bytes - nt!KeFlushProcessTb+2c (+0x11c8)
[ 69 76 82 80:88 ff ff ff ]
80823c17-80823c1a 4 bytes - nt!KeFlushProcessTb+32 (+0x06)
[ dd 40 01 00:b5 34 b3 76 ]
8083771f-80837725 7 bytes - nt!KeAcquireQueuedSpinLockAtDpcLevel+1b (+0x13b08)
[ f7 41 04 01 00 00 00:e9 c4 f9 b1 76 cc cc ]
80840945-8084094a 6 bytes - nt!KxFlushEntireTb+9 (+0x9226)
[ ff 15 1c 10 80 80:e9 65 67 b1 76 cc ]
80845fe0-80845fe3 4 bytes - nt!KeFlushSingleTb+49 (+0x569b)
[ 14 1d ff ff:dd 10 b1 76 ]
80845fe5 - nt!KeFlushSingleTb+4e (+0x05)
[ b9:c3 ]
8084722d-80847230 4 bytes - nt!KeFlushMultipleTb+45 (+0x1248)
[ 5e e3 82 80:14 00 00 00 ]
80847233-80847236 4 bytes - nt!KeFlushMultipleTb+4b (+0x06)
[ c1 0a ff ff:99 fe b0 76 ]
808c039c-808c039e 3 bytes - nt!NtSetContextThread
[ 8b ff 55:e9 31 5f ]
808c03a0 - nt!NtSetContextThread+4 (+0x04)
[ ec:76 ]
808e3184-808e3188 5 bytes - nt!NtCreateProcess (+0x22de4)
[ 8b ff 55 8b ec:e9 0b 31 89 76 ]
808f6ad0-808f6ad6 7 bytes - nt!NtLoadKeyEx (+0x1394c)
[ 6a 70 68 98 4b 81 80:e9 e7 f8 87 76 90 90 ]
8090c66f-8090c675 7 bytes - nt!NtDeleteValueKey (+0x15b9f)
[ 6a 44 68 60 f0 81 80:e9 c4 9c 86 76 90 90 ]
8090e36c-8090e370 5 bytes - nt!NtTerminateProcess (+0x1cfd)
[ 8b ff 55 8b ec:e9 34 81 86 76 ]
80915342-80915346 5 bytes - nt!NtDeleteKey (+0x6fd6)
[ 8b ff 55 8b ec:e9 c7 0f 86 76 ]
80918114-80918118 5 bytes - nt!NtOpenThread (+0x2dd2)
[ 68 c4 00 00 00:e9 53 e1 85 76 ]
80921eac-80921eb2 7 bytes - nt!NtEnumerateKey (+0x9d98)
[ 6a 48 68 f0 f9 82 80:e9 f5 44 85 76 90 90 ]
80922578-8092257e 7 bytes - nt!NtEnumerateValueKey (+0x6cc)
[ 6a 48 68 10 fc 82 80:e9 13 3e 85 76 90 90 ]
80922efd-80922f01 5 bytes - nt!NtNotifyChangeKey (+0x985)
[ 8b ff 55 8b ec:e9 e4 34 85 76 ]
809246fb-809246ff 5 bytes - nt!NtOpenProcess (+0x17fe)
[ 68 c8 00 00 00:e9 58 1b 85 76 ]
8092c8a0-8092c8a4 5 bytes - nt!NtCreateKey (+0x81a5)
[ 68 c0 00 00 00:e9 55 9a 84 76 ]
8092f3a6-8092f3ac 7 bytes - nt!NtSetValueKey (+0x2b06)
[ 6a 58 68 a0 f6 82 80:e9 a3 6f 84 76 90 90 ]
8092fa88-8092fa8c 5 bytes - nt!NtCreateFile (+0x6e2)
[ 8b ff 55 8b ec:e9 ab 69 84 76 ]
80931311-80931315 5 bytes - nt!NtOpenKey (+0x1889)
[ 68 ac 00 00 00:e9 d0 4f 84 76 ]
809316ed-809316f3 7 bytes - nt!NtQueryValueKey (+0x3dc)
[ 6a 60 68 80 90 84 80:e9 72 4c 84 76 90 90 ]
8093470f-80934715 7 bytes - nt!NtQueryKey (+0x3022)
[ 6a 58 68 c8 97 84 80:e9 0e 1d 84 76 90 90 ]
809354fa-80935500 7 bytes - nt!NtMapViewOfSection (+0xdeb)
[ 6a 38 68 80 a2 84 80:e9 77 0f 84 76 90 90 ]
80935785-80935789 5 bytes - nt!NtUnmapViewOfSection (+0x28b)
[ 8b ff 55 8b ec:e9 02 0d 84 76 ]
8093ba96-8093ba9c 7 bytes - nt!NtProtectVirtualMemory (+0x6311)
[ 6a 44 68 40 03 85 80:e9 b1 a9 83 76 90 90 ]
8093c86d-8093c871 5 bytes - nt!NtSetInformationProcess (+0xdd7)
[ 68 08 01 00 00:e9 4c 9a 83 76 ]
8093ce6b-8093ce71 7 bytes - nt!NtCreateProcessEx (+0x5fe)
[ 6a 0c 68 58 0e 85 80:e9 38 94 83 76 90 90 ]
80978fef-80978ff5 7 bytes - nt!NtQueryMultipleValueKey (+0x3c184)
[ 6a 48 68 f0 f9 86 80:e9 86 d3 7f 76 90 90 ]
80979775-8097977b 7 bytes - nt!NtRenameKey (+0x786)
[ 6a 3c 68 38 fa 86 80:e9 a8 cb 7f 76 90 90 ]
80979caf-80979cb3 5 bytes - nt!NtRestoreKey (+0x53a)
[ 8b ff 55 8b ec:e9 46 c7 7f 76 ]
8097a11c-8097a120 5 bytes - nt!NtUnloadKey (+0x46d)
[ 8b ff 55 8b ec:e9 b1 c2 7f 76 ]
8097a139-8097a13d 5 bytes - nt!NtReplaceKey (+0x1d)
[ 8b ff 55 8b ec:e9 d0 c2 7f 76 ]
197 errors : !nt (80822a49-8097a13d)
MODULE_NAME: DriverA
IMAGE_NAME: DriverA.sys
MEMORY_CORRUPTOR: PATCH_DriverA
FAILURE_BUCKET_ID: MEMORY_CORRUPTION_PATCH_DriverA
BUCKET_ID: MEMORY_CORRUPTION_PATCH_DriverA
However, when we look at the stack trace, we would see that BSOD happened when accessing registry while updating drivers:
FAULTING_IP:
nt!HvpGetCellMapped+97
8092d47f 8b4604 mov eax,dword ptr [esi+4]
TRAP_FRAME: f5205b14 -- (.trap 0xfffffffff5205b14)
ErrCode = 00000000
eax=e1021000 ebx=e101a3b8 ecx=00000003 edx=89214988 esi=00000100 edi=00000000
eip=8092d47f esp=f5205b88 ebp=f5205bfc iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
nt!HvpGetCellMapped+0×97:
8092d47f 8b4604 mov eax,dword ptr [esi+4] ds:0023:00000104=????????
Resetting default scope
PROCESS_NAME: updatedrivers.exe
STACK_TEXT:
f52056e0 8085bb9f 0000008e c0000005 8092d47f nt!KeBugCheckEx+0x1b
f5205aa4 808346b4 f5205ac0 00000000 f5205b14 nt!KiDispatchException+0x3a2
f5205b0c 80834668 f5205bfc 8092d47f badb0d00 nt!CommonDispatchException+0x4a
f5205b98 8092d559 e101a3b8 e63a8e40 0010fc18 nt!Kei386EoiHelper+0x186
f5205bfc 80920fcd e101a3b8 00610052 3b9aca07 nt!HvpGetCellMapped+0×36a
f5205c20 8092248b e63a8e40 e22b4794 00000000 nt!CmpGetValueKeyFromCache+0xa4
f5205cc0 80922649 e63a8e40 00000000 00000001 nt!CmEnumerateValueKey+0×45a
f5205d44 80833bdf 00000058 00000000 00000001 nt!NtEnumerateValueKey+0×1c9
f5205d44 7c9485ec 00000058 00000000 00000001 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
001290fc 00000000 00000000 00000000 00000000 0×7c9485ec
So an alternative hypothesis to pursue would be some sort of registry corruption after driver updates.
- Dmitry Vostokov @ DumpAnalysis.org -
Shortly after being published, x64 Windows Debugging: Practical Foundations book rises to the top of Amazon debugging bestesellers list (on 22:30 19.08.09):
- Dmitry Vostokov @ DumpAnalysis.org -
By analogy with the free verse and the anthropologist John Tedlock’s written narratives of Native American Zuni where different font size was used for different levels I tried today the similar technique with a raw stack data from the previous case study of registry corruption:
[...]
f690a3dc f7a21a06 BOOTVID!ReadWriteMode+0×42
f690a3e0 f7a219a7 BOOTVID!__outpw+0×17
f690a3ec f7a21a76 BOOTVID!SetPixel+0×6a
f690a404 f7a21c1b BOOTVID!DisplayCharacter+0×47
f690a420 b42e14db dump_iaStor+0×3a4db
f690a468 b4364080 dump_iaStor+0xbd080
f690a480 f6249983 ati2mtag+0×1b6983
f690a488 804f2ee6 nt!IopWritePageToDisk+0xe4
f690a4e0 804f2fb6 nt!IopWriteSummaryDump+0×7e
f690a4e4 b42e12d8 dump_iaStor+0×3a2d8
f690a50c 804f3c8d nt!IoWriteCrashDump+0×42d
f690a514 b42e12d8 dump_iaStor+0×3a2d8
f690a584 804f8fa7 nt!KiDumpParameterImages+0×5f
f690a594 f74764bb sptd+0×664bb
f690a598 f74764a0 sptd+0×664a0
f690a59c b42e162a dump_iaStor+0×3a62a
f690a5a8 f7a22394 BOOTVID!PreserveRow+0×7c
f690a5c0 b42e12d8 dump_iaStor+0×3a2d8
f690a5cc 804f9ecd nt!KeBugCheck2+0xa4d
f690a6e0 804f9f43 nt!KeBugCheckEx+0×1b
f690a950 80545d00 nt!KiSwapProcess+0×60
f690a9a0 80522d45 nt!MiDecrementReferenceCount+0×65
f690a9ac 805067ea nt!MiDeferredUnlockPages+0×1c8
f690a9c8 804f9f43 nt!KeBugCheckEx+0×1b
f690a9e8 80548c2d nt!MiFreePoolPages+0×8b
f690aa04 80564d20 nt!NonPagedPoolDescriptor
f690aa28 8054b49a nt!ExFreePoolWithTag+0×1ba
f690aa3c 8062bc17 nt!CmpPinCmView+0xab
f690aa5c 80637e13 nt!HvpDelistBinFreeCells+0xad
f690aa68 8063bf19 nt!CmpFree+0×17
f690aa78 8063eb20 nt!HvpRecoverData+0×3ec
f690aad4 8063ef05 nt!HvMapHive+0×133
f690ab10 80539ac0 nt!_except_handler3
f690ab14 804e0e38 nt!`string’+0×258
f690ab20 8063087e nt!HvInitializeHive+0×416
f690ab38 806383a9 nt!CmpInitializeHive+0×26d
f690ab54 8063bf02 nt!CmpFree
f690ab58 8063b918 nt!CmpFileSetSize
f690ab5c 8063c466 nt!CmpFileWrite
f690ab60 8063c33e nt!CmpFileRead
f690ab64 8063c1fc nt!CmpFileFlush
f690aba4 80625bf9 nt!CmpInitHiveFromFile+0xa3
f690abfc 8062ad8b nt!CmpCmdHiveOpen+0×21
f690ac24 80631f24 nt!CmLoadKey+0×90
f690ac98 80622053 nt!CmConvertHandleToKernelHandle+0×55
f690acb0 806257b4 nt!NtLoadKey2+0×1fc
f690acc8 806259ac nt!NtLoadKey
f690acd8 805bc33f nt!ObpCloseHandleTableEntry+0×14d
f690ad24 805bc401 nt!ObpCloseHandle+0xab
f690ad34 80539ac0 nt!_except_handler3
f690ad38 804e0bd0 nt!`string’+0×364
f690ad44 806259be nt!NtLoadKey+0×12
f690ad58 8054162c nt!KiFastCallEntry+0xfc
f690ade0 805460ee nt!KiThreadStartup+0×16
f690ade4 80626dee nt!CmpLoadHiveThread
f690aec0 bf875fb4 win32k!WatchdogDrvStretchBlt+0×92
f690aee4 bf988527 win32k!_except_handler3
f690aee8 bf995f40 win32k!`string’+0×124
f690aef0 bf875fb4 win32k!WatchdogDrvStretchBlt+0×92
f690aef4 bf873ec2 win32k!EngStretchBltROP+0×3a9
where the larger font size indicates the stack trace from kv command and the smaller font size indicates symbolic information found between call frames that may or may not correspond to partial stack traces left from intermediate nested function calls of the current call sequence or past stack traces and their frames.
- Dmitry Vostokov @ DumpAnalysis.org -
This is the real photo of tracing experiments used for the front cover of September 2009 issue of Debugged! magazine:
- Dmitry Vostokov @ DumpAnalysis.org -
Finally, the front cover has materialized for this issue of Debugged! magazine:
Debugged! MZ/PE: Software Tracing (ISBN: 978-1906717797)
If you have an article idea or if you’d like to write an article for us please use the following contact form:
http://www.dumpanalysis.org/contact
The deadline for this issue is 25th of September.
- Dmitry Vostokov @ DumpAnalysis.org -
This Sunday while drinking Turkish coffee (after 3 years of blogging) I was thinking about the new publishing year starting in September, the new year of reading (already started) and, in particular, about the cover of the forthcoming September issue of Debugged! MZ/PE magazine (to be revealed tonight).
- Dmitry Vostokov @ DumpAnalysis.org -
A friend of mine couldn’t start Windows XP on his notebook. As soon as he entered his credentials in a logon window the system experienced a BSOD event. He booted from another media and collected mini-dumps. All of them were consistent in resisting to my attempts to load symbols and modules. Even explicit downloading the symbol package from Microsoft didn’t help. All bugcheck info and stack traces were like this pointing to pool corruption:
0: kd> !analyze -v
[...]
BAD_POOL_CALLER (c2)
The current thread is making a bad pool request. Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 00000043, Attempt to free a virtual address which was never in any pool
Arg2: c9c00000, Address being freed.
Arg3: 00000000, 0
Arg4: 00000000, 0
[...]
1: kd> kv 100
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
f6cc09e4 80548c2d 000000c2 00000043 c9c00000 nt+0x22f43
f6cc0a24 8054b49a c9c00000 e2039410 e23fd000 nt+0x71c2d
f6cc0a64 8063bf19 c9c00000 00000000 f6cc0ad0 nt+0x7449a
f6cc0a74 8063eb20 c9c00000 00002000 00000000 nt+0x164f19
f6cc0ad0 8063ef05 e1f6e008 00000000 00000000 nt+0x167b20
f6cc0b1c 8063087e e1f6e008 00000000 00000001 nt+0x167f05
f6cc0b34 806383a9 e1f6e101 00000005 00000000 nt+0x15987e
f6cc0ba0 80625bf9 f6cc0bdc 00000005 00000000 nt+0x1613a9
f6cc0bf8 8062ad8b f6cc0d04 00000000 f6cc0c64 nt+0x14ebf9
f6cc0c20 80631f24 f6cc0ccc f6cc0c6c f6cc0c5c nt+0x153d8b
f6cc0cac 806257b4 f6cc0ce4 f6cc0ccc 00000000 nt+0x15af24
f6cc0d40 806259be 0006dcc4 0006dcac 00000000 nt+0x14e7b4
f6cc0d54 8054162c 0006dcc4 0006dcac 0006dcf0 nt+0x14e9be
f6cc0d64 7c91e514 badb0d00 0006dc98 00000000 nt+0x6a62c
f6cc0d68 badb0d00 0006dc98 00000000 00000000 0x7c91e514
f6cc0d6c 0006dc98 00000000 00000000 00000090 0xbadb0d00
f6cc0d70 00000000 00000000 00000090 000000a4 0x6dc98
Portions of raw stack data available in minidump didn’t have any traces of other modules and drivers except nt:
1: kd> !thread
GetPointerFromAddress: unable to read from 80562134
[...]
86485da8: Unable to get thread contents
1: kd> dps f6cc09cc-3000 f6cc09cc+3000
[...]
f6cc095c ????????
f6cc0960 ????????
f6cc0964 ????????
f6cc0968 00000000
f6cc096c 00000000
f6cc0970 003d0058
f6cc0974 f6cc09a8
f6cc0978 00000000
f6cc097c 0000c000
f6cc0980 00000000
f6cc0984 00000000
f6cc0988 8648b4d8
f6cc098c 863eb240
f6cc0990 00000000
f6cc0994 01ffffff
f6cc0998 f6cc093c
f6cc099c 00000000
f6cc09a0 f6cc0a14
f6cc09a4 80539ac0 nt+0x62ac0
f6cc09a8 804d8228 nt+0x1228
f6cc09ac ffffffff
f6cc09b0 00000002
f6cc09b4 80506653 nt+0x2f653
f6cc09b8 f78a9548
f6cc09bc c9c00000
f6cc09c0 0000bb40
[...]
f6cc0fcc 00000000
f6cc0fd0 00000000
f6cc0fd4 00000000
f6cc0fd8 00000000
f6cc0fdc 00000000
f6cc0fe0 7c91d5aa
f6cc0fe4 7c940574
f6cc0fe8 0015fd80
f6cc0fec 00100020
f6cc0ff0 00000000
f6cc0ff4 00000000
f6cc0ff8 00000000
f6cc0ffc 00000000
f6cc1000 ????????
f6cc1004 ????????
f6cc1008 ????????
[...]
So I advised to give me a kernel dump and fortunately there was one available too. It was more amenable for analysis and showed the involvement of registry:
0: kd> kv 100
ChildEBP RetAddr Args to Child
f690a9e4 80548c2d 000000c2 00000043 dcf40000 nt!KeBugCheckEx+0x1b
f690aa24 8054b49a dcf40000 e1294410 e17c6000 nt!MiFreePoolPages+0x8b
f690aa64 8063bf19 dcf40000 00000000 f690aad0 nt!ExFreePoolWithTag+0x1ba
f690aa74 8063eb20 dcf40000 00002000 00000000 nt!CmpFree+0×17
f690aad0 8063ef05 e11c4b60 00000000 00000000 nt!HvpRecoverData+0×3ec
f690ab1c 8063087e e11c4b60 00000000 00000001 nt!HvMapHive+0×133
f690ab34 806383a9 e11c4c01 00000005 00000000 nt!HvInitializeHive+0×416
f690aba0 80625bf9 f690abdc 00000005 00000000 nt!CmpInitializeHive+0×26d
f690abf8 8062ad8b f690ad04 00000000 f690ac64 nt!CmpInitHiveFromFile+0xa3
f690ac20 80631f24 f690accc f690ac6c f690ac5c nt!CmpCmdHiveOpen+0×21
f690acac 806257b4 f690ace4 f690accc 00000000 nt!CmLoadKey+0×90
f690ad40 806259be 0006dcc4 0006dcac 00000000 nt!NtLoadKey2+0×1fc
f690ad54 8054162c 0006dcc4 0006dcac 0006dcf0 nt!NtLoadKey+0×12
f690ad54 7c91e514 0006dcc4 0006dcac 0006dcf0 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0006dcf0 00000000 00000000 00000000 00000000 0×7c91e514
Examination of parameters on raw stack pointed to a user hive for MyFriend user:
0: kd> dpu f690ace4
f690ace4 00000018
f690ace8 80000ce0
f690acec f690ad0c “Z\..(”
f690acf0 00000240
f690acf4 00000000
f690acf8 00000000
f690acfc 00660066
f690ad00 00eddea0
f690ad04 00660066
f690ad08 e10b1e60 “\??\C:\Documents and Settings\MyFriend\ntuser.dat”
f690ad14 00000028
[…]
So the solution was to log as Administrator and recreate the user.
- Dmitry Vostokov @ DumpAnalysis.org -
Finally the book came through the publishing process and is available on Amazon and other bookstores:
x64 Windows Debugging: Practical Foundations
- Dmitry Vostokov @ DumpAnalysis.org -
Today I celebrate 3 years of blogging that resulted in 1,430 posts across 8 blogs. I would like to thank everyone for their continuing support!
This blog post belongs to the 4th year of blogging.
- Dmitry Vostokov @ DumpAnalysis.org -
I was recently asked why the following code used byte ptr modifier for MOV instruction when assigning a number to a memory location pointed to by a register:
C/C++ code:
int a;
int *pa = &a;
void foo()
{
__asm
{
// ...
mov eax, [pa]
mov [eax], 1
// ...
}
}
Generated x86 assembly language code:
0:000:x86> uf foo
[...]
0042d64e c60001 mov byte ptr [eax],1
[…]
It looks like by default Visual C++ inline assembler treats MOV as “byte ptr” because it doesn’t know about C or C++ language semantics. Originally I thought that was the sign of a code optimization because the resulted binary code is smaller than the one generated by dword ptr. In order to check that I used a WinDbg command:
0:000> a
77067dfe mov dword ptr [eax], 1
mov dword ptr [eax], 1
77067e04
0:000> u 77067dfe
ntdll!DbgBreakPoint:
77067dfe c70001000000 mov dword ptr [eax],1
77067e04 0c8b or al,8Bh
77067e06 54 push esp
77067e07 2408 and al,8
77067e09 c70200000000 mov dword ptr [edx],0
77067e0f 897a04 mov dword ptr [edx+4],edi
77067e12 0bff or edi,edi
77067e14 741e je ntdll!RtlInitString+0×34 (77067e34)
This could be possible because the variable “a” is global, initialized to 0 during the program startup, so it is safe to change just one byte. If “a” was a local variable (on stack) than other 3 bytes of DWORD could contain garbage from the previously used stack memory. However, I noticed that the program was compiled as Debug target with all optimization turned off. If Visual C++ compiler was used it should have assumed that the variable “a” could have been referenced from other compilation units and no longer contained 0 before the assignment in foo function. I recreated the same code in C/C++, built the new Debug executable, and indeed, it used dword ptr instead of byte ptr as expected from C/C++ semantics.
- Dmitry Vostokov @ DumpAnalysis.org -
Studying poetry and reading books about Stalin certainly influenced this first verse:
Welcome, Doctor DebugLove!
Your name, pronounced, fixes bugs!
- Dmitry Vostokov @ DumpAnalysis.org -
Pre-ordered today on Amazon this forthcoming book:
Advanced .NET Debugging (Addison-Wesley Microsoft Technology Series)
I was able to find TOC on InformIt. Looking forward to reading it. .NET crash dump (mixed managed and unmanaged code) and software trace analysis is a sizable part of my day-to-day activities.
When ordering I recalled that I’m was also working on a .NET debugging and memory dump analysis book:
Unmanaged Code: Escaping the Matrix of .NET
but I had to postpone it due to other commitments. It is now planned for the next year after I accumulate more material and real-world case studies.
Taking the opportunity, I also created a category .NET Debugging where I put some old blog posts and patterns related to managed code.
- Dmitry Vostokov @ DumpAnalysis.org -
I finished reading Dirac’s biography The Strangest Man 3 months ago and started to read this book. Its title intrigued me when I was browsing recent physics releases on Amazon and I bought it. It looks to me like the mix of brief biographical notes with explanation of physical theories. Here learning from mistakes undoubtedly helps to understand special and general relativity better. I also liked the short and clear explanation of EPR paradox in just one page, “revisionist” and unusual biographical notes on other scientists and their faults, like Galileo and Newton, and notes about Einstein’s private life. This makes him really human (he was like an ideal scientist from Plato Universe for me before). When I was reading Not Even Wrong and the Trouble With Physics books I thought of the possible “yellow press physics” (which is not bad, and doesn’t mean bad quality for me, I like to read yellow press sometimes and listen to pop music) and one day, at lunch, when reading about Newton madness and other peculiar character traits I thought about “yellow press physics” again. Was the choice of this book hardcover and jacket colors (yellow) made deliberate? Anyway, while approaching the end of the book and reading about how Einstein wasted 20-30 years on his idée fixe unified theories I immediately recalled String Theory, and indeed, the author voiced the same thoughts a few moments later when I turned a page over. I also liked the discussion on how General Relativity might have been discovered if it wasn’t formulated by Einstein. The author tells us that it would have been done via a QFT route. Einstein has fallen in my eyes, and now, after reading this book, he is not quite the hero of science like I imagined before. Nevertheless, his stature from McDonald’s is still on my shelves.
Einstein’s Mistakes: The Human Failings of Genius
I don’t want to repeat Einstein’s mistakes…
- Dmitry Vostokov @ LiterateScientist.com -
Previously introduced RADII software development process acquires definite shape as a product supportability driven software support tools development process. In summary, supportability of a product gives rise to Requirements, they expand into Architecture segments, then into Design segments, then into Implementation segments, and finally, into several Improvement phases. In short, RADII:
Every segment is a separate troubleshooting or debugging tool. All segments share elements of RADII via DebugWare patterns and can be further refined via iterative and incremental SDLC if needed.
- Dmitry Vostokov @ DumpAnalysis.org -
It was reported that one server was hanging during automated reboot. Stack trace collection shows a few suspended and frozen threads. They all belong to the same process, ServiceA:
PROCESS 8545eb18 SessionId: 0 Cid: 0fec Peb: 7ffd4000 ParentCid: 0fdc
DirBase: 3fbeb8e0 ObjectTable: e19dd1d0 HandleCount: 169.
Image: ServiceA.exe
THREAD 859cc900 Cid 0fec.0ff0 Teb: 7ffdf000 Win32Thread: bc1738d0 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
859cca90 Semaphore Limit 0×2
THREAD 858c6480 Cid 0fec.0ff4 Teb: 7ffde000 Win32Thread: bc178c40 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
f55747d8 SynchronizationEvent
THREAD 859f2338 Cid 0fec.0ff8 Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
859f24c8 Semaphore Limit 0×2
THREAD 859be2b8 Cid 0fec.0ffc Teb: 7ffdc000 Win32Thread: bc1915d8 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
859be448 Semaphore Limit 0×2
[...]
When zooming into this process we see that one thread was processing an exception:
0: kd> .process /r /p 8545eb18
Implicit process is now 8545eb18
Loading User Symbols
0: kd> !process 8545eb18
[...]
THREAD 858c6480 Cid 0fec.0ff4 Teb: 7ffde000 Win32Thread: bc178c40 WAIT: (Unknown) KernelMode Non-Alertable
SuspendCount 1
f55747d8 SynchronizationEvent
Not impersonating
DeviceMap e10008e8
Owning Process 8545eb18 Image: ServiceA.exe
Attached Process N/A Image: N/A
Wait Start TickCount 6927 Ticks: 89866 (0:00:23:24.156)
Context Switch Count 156 LargeStack
UserTime 00:00:00.031
KernelTime 00:00:00.000
Win32 Start Address 0x611054cb
Start Address kernel32!BaseThreadStartThunk (0x7c8217ec)
Stack Init f5575000 Current f557471c Base f5575000 Limit f5571000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr
f5574734 80833ec5 nt!KiSwapContext+0x26
f5574760 80829c14 nt!KiSwapThread+0x2e5
f55747a8 809a25c8 nt!KeWaitForSingleObject+0x346
f5574888 809a3739 nt!DbgkpQueueMessage+0x178
f55748ac 809a386e nt!DbgkpSendApiMessage+0x45
f5574938 8082d7ec nt!DbgkForwardException+0x90
f5574cf4 8088bed2 nt!KiDispatchException+0×1ea
f5574d5c 8088be86 nt!CommonDispatchException+0×4a
f5574da0 7c829c3a nt!Kei386EoiHelper+0×186
f5574dd0 00000000 kernel32!LoadResource+0×5d
We zoom into its parameters in search of semantically consistent output of .exr, .cxr and .trap commands:
0: kd> .thread 858c6480
Implicit thread is now 858c6480
0: kd> kv 100
ChildEBP RetAddr Args to Child
f5574734 80833ec5 858c6480 858c6528 00000200 nt!KiSwapContext+0x26
f5574760 80829c14 00000000 858c6480 f55747d0 nt!KiSwapThread+0x2e5
f55747a8 809a25c8 f55747d8 00000000 00000000 nt!KeWaitForSingleObject+0x346
f5574888 809a3739 8545eb18 00000000 f55748c0 nt!DbgkpQueueMessage+0x178
f55748ac 809a386e f55748c0 00000001 f5574d64 nt!DbgkpSendApiMessage+0x45
f5574938 8082d7ec f5574d10 00000001 00000000 nt!DbgkForwardException+0x90
f5574cf4 8088bed2 f5574d10 00000000 f5574d64nt!KiDispatchException+0×1ea
f5574d5c 8088be86 005bf4b4 61213267 badb0d00 nt!CommonDispatchException+0×4a
f5574da0 7c829c3a 71c22898 00000001 ffffffff nt!Kei386EoiHelper+0×186
f5574dd0 00000000 005bf448 00000023 00000000 kernel32!LoadResource+0×5d
After probing parameters for KiDispatchException we get these results pointing to ModuleA:
0: kd> .exr f5574d10
ExceptionAddress: 61213267 (ModuleA!GetData+0×0000b57f)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 71c22898
Attempt to read from address 71c22898
0: kd> .trap f5574d64
ErrCode = 00000004
eax=71c22898 ebx=0073a7a8 ecx=7c829c3a edx=71c1c000 esi=00000104 edi=00000000
eip=61213267 esp=005bf448 ebp=005bf4b4 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ModuleA!GetData+0×0000b57f:
001b:61213267 0fb700 movzx eax,word ptr [eax] ds:0023:71c22898=????
We check its data using lmv WinDbg command and find out that it is old and needs to be updated. But we don’t stop our investigation here. The fact that ServiceA was suspended means that it was probably being debugged or memory dumped. And indeed, we see NTSD in a process list:
0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 8619d5d0 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: 3fbeb020 ObjectTable: e1001e08 HandleCount: 1651.
Image: System
PROCESS 85e95d88 SessionId: none Cid: 019c Peb: 7ffdf000 ParentCid: 0004
DirBase: 3fbeb040 ObjectTable: e16d5f18 HandleCount: 19.
Image: smss.exe
PROCESS 85e4fd88 SessionId: 0 Cid: 01cc Peb: 7ffd4000 ParentCid: 019c
DirBase: 3fbeb060 ObjectTable: e1561d70 HandleCount: 907.
Image: csrss.exe
PROCESS 85e42d88 SessionId: 0 Cid: 01e4 Peb: 7ffde000 ParentCid: 019c
DirBase: 3fbeb080 ObjectTable: e16a97b0 HandleCount: 504.
Image: winlogon.exe
[...]
PROCESS 85a4dd18 SessionId: 0 Cid: 0fdc Peb: 7ffda000 ParentCid: 0214
DirBase: 3fbeb520 ObjectTable: e1aa5b38 HandleCount: 121.
Image: ntsd.exe
[...]
If we zoom into NTSD process we would see that its main thread was waiting for a console input:
0: kd> !process 0fdc ff
[...]
THREAD 859f8768 Cid 0fdc.0fe0 Teb: 7ffdf000 Win32Thread: bc14cb38 WAIT: (Unknown) UserMode Non-Alertable
859f8954 Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00001f98:
Current LPC port e19f03a0
Not impersonating
DeviceMap e10008e8
Owning Process 85a4dd18 Image: ntsd.exe
Attached Process N/A Image: N/A
Wait Start TickCount 6932 Ticks: 89861 (0:00:23:24.078)
Context Switch Count 450 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.078
Win32 Start Address ntsd!mainCRTStartup (0×0100845a)
Start Address kernel32!BaseProcessStartThunk (0×7c8217f8)
Stack Init f55c5000 Current f55c4c08 Base f55c5000 Limit f55c1000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr
f55c4c20 80833ec5 nt!KiSwapContext+0×26
f55c4c4c 80829c14 nt!KiSwapThread+0×2e5
f55c4c94 80920fba nt!KeWaitForSingleObject+0×346
f55c4d50 8088b3fc nt!NtRequestWaitReplyPort+0×776
f55c4d50 7c94860c nt!KiFastCallEntry+0xfc
0006ece0 7c947899 ntdll!KiFastSystemCallRet
0006ece4 7c94ec4a ntdll!ZwRequestWaitReplyPort+0xc
0006ed04 7c80cf8c ntdll!CsrClientCallServer+0×8c
0006edfc 7c872904 kernel32!ReadConsoleInternal+0×1b8
0006ee84 7c8018f4 kernel32!ReadConsoleA+0×3b
0006eedc 01005141 kernel32!ReadFile+0×64
0006ef04 01006974 ntsd!ConIn+0×183
0006ff38 010082d1 ntsd!MainLoop+0×1eb
0006ff44 01008589 ntsd!main+0×149
0006ffc0 7c82f23b ntsd!mainCRTStartup+0×12f
0006fff0 00000000 kernel32!BaseProcessStart+0×23
We follow LPC chain to csrss.exe to find out another blocked thread there:
0: kd> !lpc message 00001f98
Searching message 1f98 in threads …
Client thread 859f8768 waiting a reply from 1f98
Searching thread 859f8768 in port rundown queues …
Server communication port 0xe19b6b08
Handles: 1 References: 1
The LpcDataInfoChainHead queue is empty
Connected port: 0xe19f03a0 Server connection port: 0xe1361d20
Client communication port 0xe19f03a0
Handles: 1 References: 4
The LpcDataInfoChainHead queue is empty
Server connection port e1361d20 Name: ServiceAPort
Handles: 1 References: 233
Server process : 85e4fd88 (csrss.exe)
Queue semaphore : 85e9b078
Semaphore state 0 (0×0)
The message queue is empty
The LpcDataInfoChainHead queue is empty
Done.
0: kd> !process 85e4fd88 ff
[…]
THREAD 8549db60 Cid 01cc.1390 Teb: 7ffad000 Win32Thread: bc15aea8 WAIT: (Unknown) UserMode Non-Alertable
8549dd4c Semaphore Limit 0×1
Waiting for reply to LPC MessageId 00004feb:
Pending LPC Reply Message:
e191b6d0: [e1a162e8,e19ffc18]
Not impersonating
DeviceMap e10008e8
Owning Process 85e4fd88 Image: csrss.exe
Attached Process N/A Image: N/A
Wait Start TickCount 12095 Ticks: 84698 (0:00:22:03.406)
Context Switch Count 35 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.000
Start Address 0×75943b55
Stack Init f5625000 Current f5624bf0 Base f5625000 Limit f5622000 Call 0
Priority 15 BasePriority 13 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr
f5624c08 80833ec5 nt!KiSwapContext+0×26
f5624c34 80829c14 nt!KiSwapThread+0×2e5
f5624c7c 809222f6 nt!KeWaitForSingleObject+0×346
f5624d38 8088b3fc nt!NtSecureConnectPort+0×6ce
f5624d38 7c94860c nt!KiFastCallEntry+0xfc
015ff778 7c947939 ntdll!KiFastSystemCallRet
015ff77c 77c2e7c3 ntdll!NtSecureConnectPort+0xc
015ff8a0 77c4607b RPCRT4!LRPC_CASSOCIATION::OpenLpcPort+0×21e
015ff8e0 77c45ffb RPCRT4!LRPC_CASSOCIATION::ActuallyDoBinding+0×55
015ff958 77c4f6a5 RPCRT4!LRPC_CASSOCIATION::AllocateCCall+0×190
015ff98c 77c4f5d1 RPCRT4!LRPC_BINDING_HANDLE::AllocateCCall+0×1f2
015ff9b8 77c4f201 RPCRT4!LRPC_BINDING_HANDLE::NegotiateTransferSyntax+0xd3
015ff9d0 77c4ed14 RPCRT4!I_RpcGetBufferWithObject+0×5b
015ff9e0 77c4f464 RPCRT4!I_RpcGetBuffer+0xf
015ff9f0 77cb30e4 RPCRT4!NdrGetBuffer+0×2e
015ffddc 779b4695 RPCRT4!NdrClientCall2+0×197
[…]
We follow LPC chain again to see that csrss.exe thread was waiting for a reply from our suspended and frozen ServiceA:
0: kd> !lpc message 00004feb
Searching message 4feb in threads …
Client thread 8549db60 waiting a reply from 4feb
Searching thread 8549db60 in port rundown queues …
Server connection port e19a50e0 Name: ServiceAPort
Handles: 1 References: 20
Server process : 8545eb18 (ServiceA.exe)
Queue semaphore : 85443320
Semaphore state 9 (0×9)
Messages in queue:
0000 e1a866e0 - Busy Id=000022e7 From: 01e4.01e8 Context=80060004 [e19a50f0 . e1878688]
Length=011800e8 Type=0000000a (LPC_CONNECTION_REQUEST)
Data: 00000000 00000000 00000000 00000000 00000000 00000000
0000 e1878688 - Busy Id=00003297 From: 0ac0.0b54 Context=804d0045 [e1a866e0 . e1036740]
Length=011800e8 Type=0000000a (LPC_CONNECTION_REQUEST)
Data: 00000000 00000000 00000000 00000000 00000000 00000000
0000 e1036740 - Busy Id=00003986 From: 0ce4.0ce8 Context=00000042 [e1878688 . e1441228]
Length=011800e8 Type=0000000a (LPC_CONNECTION_REQUEST)
Data: 00000000 00000000 00000000 00000000 00000000 00000000
0000 e1441228 - Busy Id=00003a32 From: 0db4.0e14 Context=00000050 [e1036740 . e1a162e8]
Length=011800e8 Type=0000000a (LPC_CONNECTION_REQUEST)
Data: 00000000 00000000 00000000 00000000 00000000 00000000
0000 e1a162e8 - Busy Id=00004c75 From: 059c.05ac Context=00000051 [e1441228 . e191b6d0]
Length=011800e8 Type=0000000a (LPC_CONNECTION_REQUEST)
Data: 00000000 00000000 00000000 00000000 00000000 00000000
0000 e191b6d0 - Busy Id=00004feb From: 01cc.1390 Context=00000051 [e1a162e8 . e19ffc18]
Length=011800e8 Type=0000000a (LPC_CONNECTION_REQUEST)
Data: 00000000 00000000 00000000 00000000 00000000 00000000
0000 e19ffc18 - Busy Id=000055e3 From: 13fc.05b4 Context=800d0009 [e191b6d0 . e19f4ea0]
Length=011800e8 Type=0000000a (LPC_CONNECTION_REQUEST)
Data: 00000000 00000000 00000000 00000000 00000000 00000000
0000 e19f4ea0 - Busy Id=00006844 From: 0b00.0f20 Context=006b3d60 [e19ffc18 . e19a50f0]
Length=011800e8 Type=0000000a (LPC_CONNECTION_REQUEST)
Data: 00000000 00000000 00000000 00000000 00000000 00000000
The message queue contains 8 messages
The LpcDataInfoChainHead queue is empty
Done.
It doesn’t look as a deadlock because, although we have a cyclic process wait chain ServiceA -> NTSD -> CSRSS -> ServiceA, NTSD was waiting for a different thread in CSRSS than the one in CSRSS waiting for a reply from ServiceA. If these threads are unrelated then we don’t have a deadlock, strictly speaking, because the latter involves activity chains with ownership, not a container dependency (a process is a container for threads). I illustrated all this on the following diagram:
- Dmitry Vostokov @ DumpAnalysis.org -
I’m very delighted to be a Dr. DebugLove! There are many Dr. Debug out there (Google shows 1,840,000 hits) but do they really love debugging like I do? Of course, they do, but I’m the first to acknowledge my strange love publicly by accepting a pseudonym.
- Dmitry Vostokov @ DumpAnalysis.org -
Last week I was comparing the existing collection of memory dump analysis patterns to the collection of trace analysis patterns (in formation) in the search of isomorphism (or more correctly, general morphism) similar to Missing Component pattern. It is not a coincidence that such pattern pairs can be formed. For example, it is possible to discern deadlocks from both crash dumps and software traces (if appropriate information is available there). Fundamentally, it is implied by the definition of a software trace as some sort of a memory dump. And we can see traces in memory dumps too, for example, Execution Residue pattern. Because raw stack data resides in stack pages and in contemporary operating systems they are created from zero pages (metaphorically, out of the void) we can say that stack regions of threads are sorted by their creation time, for example, in this process user memory dump:
0:017> !runaway 4
Elapsed Time
Thread Time
0:49c 0 days 5:16:31.076
4:4d8 0 days 5:16:30.967
3:4d0 0 days 5:16:30.967
2:4cc 0 days 5:16:30.967
1:4c8 0 days 5:16:30.967
5:4e8 0 days 5:16:30.936
6:b6c 0 days 5:16:15.695
7:b70 0 days 5:16:15.679
9:b88 0 days 5:16:15.586
8:b84 0 days 5:16:15.586
11:348 0 days 5:16:12.934
10:bfc 0 days 5:16:12.934
12:1200 0 days 5:15:16.528
15:1298 0 days 5:15:15.220
14:1290 0 days 5:15:15.220
13:128c 0 days 5:15:15.220
17:12e4 0 days 5:15:13.257
16:12dc 0 days 5:15:13.257
18:12ec 0 days 5:15:13.117
20:12f4 0 days 5:15:13.085
19:12f0 0 days 5:15:13.085
21:17a0 0 days 5:13:16.321
22:1628 0 days 5:13:15.729
24:1778 0 days 1:35:50.773
23:17ec 0 days 1:35:50.773
25:1570 0 days 1:27:54.190
26:1724 0 days 1:27:10.151
27:1490 0 days 0:05:46.732
28:1950 0 days 0:02:28.153
29:19b4 0 days 0:00:58.108
30:177c 0 days 0:00:38.358
31:1798 0 days 0:00:23.351
32:1a7c 0 days 0:00:08.343
If we have complete memory dumps we can also account for other processes and their elapsed time. Within stack pages we have partial stack traces but do not have exact timing information between them except for stack frames from the current frozen thread stack trace or, if we are lucky, from a partial stack trace from the past execution. However, the timing between frames from different stacks is undefined and we can only guess it from higher level considerations like semantics of procedure calls and other information.
These considerations and the notion of a poset (partially ordered set) let me thinking about memory dumps as posets. I even created my interpretation of POSET abbreviation for this occasion:
POSET
Partially Ordered Software Execution Trace
- Dmitry Vostokov @ DumpAnalysis.org -
Errata for the previous book Windows Debugging: Practical Foundations has been published:
Next week the updated version (revision 2.0) should be available on Amazon and other stores for both paperback and hardback titles. Digital version on Lulu has already been updated.
- Dmitry Vostokov @ DumpAnalysis.org -
The digital version of the book is finally available:
x64 Windows Debugging: Practical Foundations
Paperback should be available in 1-2 weeks on Amazon and other stores. When working on the book I fixed errors in the previous x86 version. Errata file for it should be available tomorrow.
- Dmitry Vostokov @ DumpAnalysis.org -