Viewing Problem Artifacts from Different Angles
Tuesday, April 28th, 2009I often say or write something like this: “I looked at the dump|trace file from different angles”.

- Dmitry Vostokov @ DumpAnalysis.org -
I often say or write something like this: “I looked at the dump|trace file from different angles”.

- Dmitry Vostokov @ DumpAnalysis.org -
One application was constantly crashing in a highly dynamic multi-user environment with various value-added GUI-enhancing hooking and patching 3rd-party products. The dump was analyzed and it shows the data NULL pointer access violation:
0:000> r
eax=05503ff0 ebx=05503f10 ecx=6bb7da25 edx=00000000 esi=02203c10 edi=00000000
eip=6b542ba3 esp=04f6f860 ebp=04f6f87c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
gui_plugin!entry+0xe63:
6b542ba3 8b0f mov ecx,dword ptr [edi] ds:0023:00000000=????????
Unfortunately due to the lack of gui_plugin symbols the stack trace is incorrect:
0:000> kv
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
04f6f87c 00000000 000003b9 00000690 00000d20 gui_plugin!entry+0xe63
We do not attempt stack trace reconstruction here but show one of troubleshooting methods to eliminate possible effects of the changed environment. We look for any code patching modules by examining hooked functions:
0:000> !chkimg -lo 50 -d !kernel32 -v
Searching for module with expression: !kernel32
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\mss\kernel32.dll\4791A76Ddb000\kernel32.dll
No range specified
Scanning section: .text
Size: 835001
Range to scan: 76701000-767ccdb9
7670a672-7670a676 5 bytes - kernel32!MoveFileW
[ 8b ff 55 8b ec:e9 89 59 80 09 ]
7671c5c8-7671c5cc 5 bytes - kernel32!DeleteFileW (+0×11f56)
[ 8b ff 55 8b ec:e9 33 3a 80 09 ]
76721070-76721074 5 bytes - kernel32!MoveFileExW (+0×4aa8)
[ 8b ff 55 8b ec:e9 8b ef 83 09 ]
7678ff3f-7678ff43 5 bytes - kernel32!ReplaceFileW (+0×6eecf)
[ 68 b4 03 00 00:e9 bc 00 7a 09 ]
Total bytes compared: 835001(100%)
Number of errors: 20
20 errors : !kernel32 (7670a672-7678ff43)
0:000> u 7670a672
kernel32!MoveFileW:
7670a672 e989598009 jmp 7ff10000
7670a677 6a02 push 2
7670a679 6a00 push 0
7670a67b 6a00 push 0
7670a67d ff750c push dword ptr [ebp+0Ch]
7670a680 ff7508 push dword ptr [ebp+8]
7670a683 e8c4690100 call kernel32!MoveFileWithProgressW (7672104c)
7670a688 5d pop ebp
0:000> u 7ff10000
7ff10000 e99b1e94f5 jmp 3rdPartyHook!MoveFileW (75851ea0)
7ff10005 8bff mov edi,edi
7ff10007 55 push ebp
7ff10008 8bec mov ebp,esp
7ff1000a e968a67ff6 jmp kernel32!MoveFileW+0×5 (7670a677)
7ff1000f 0000 add byte ptr [eax],al
7ff10011 0000 add byte ptr [eax],al
7ff10013 0000 add byte ptr [eax],al
We see lots of references to 3rdPartyHook on the thread raw stack:
0:000> !teb
TEB at 7ffd5000
ExceptionList: 04f6ff70
StackBase: 04f70000
StackLimit: 04f6c000
SubSystemTib: 00000000
FiberData: 00001e00
ArbitraryUserPointer: 00000000
Self: 7ffd5000
EnvironmentPointer: 00000000
ClientId: 00001790 . 000017d0
RpcHandle: 00000000
Tls Storage: 7ffd502c
PEB Address: 7ffd8000
LastErrorValue: 0
LastStatusValue: 0
Count Owned Locks: 0
HardErrorMode: 0
0:000> dds 04f6c000 04f70000
[...]
04f6cf28 00440042
04f6cf2c 04f6d2d4
04f6cf30 77179834 ntdll!_except_handler4
04f6cf34 003ce48e
04f6cf38 fffffffe
04f6cf3c 7719d584 ntdll!LdrpResSearchResourceInsideDirectory+0x80f
04f6cf40 7719cf9d ntdll!LdrpResSearchResourceMappedFile+0x521
04f6cf44 6b920002
04f6cf48 00000000
04f6cf4c 000bf000
04f6cf50 6b9d9000
04f6cf54 6b9200f8
04f6cf58 00000000
04f6cf5c 01f6d1d4 <Unloaded_D3D9.DLL>+0x12d1d4
04f6cf60 00000003
04f6cf64 04f6cf88
04f6cf68 04f6d370
04f6cf6c 04f6d340
04f6cf70 771d8fb2 ntdll!_SEH_epilog4_GS+0xa
04f6cf74 7719d018 ntdll!LdrpResSearchResourceMappedFile+0x781
04f6cf78 73d77612
04f6cf7c 6b920002
04f6cf80 000bf000
04f6cf84 00000000
04f6cf88 00000001
04f6cf8c 00000000
04f6cf90 00000000
04f6cf94 00000000
04f6cf98 00000000
04f6cf9c 04f6d44c
04f6cfa0 75851c2f 3rdPartyHook+0×1c2f
04f6cfa4 04f6d1e8
04f6cfa8 00000000
04f6cfac 00000000
04f6cfb0 00000000
04f6cfb4 00000000
[…]
The symbolic references point to valid and sound code:
0:000> ub 75851c2f
3rdPartyHook!GetData+0x22d:
75851c1d c20c00 ret 0Ch
75851c20 8b4dfc mov ecx,dword ptr [ebp-4]
75851c23 5f pop edi
75851c24 5e pop esi
75851c25 33cd xor ecx,ebp
75851c27 33c0 xor eax,eax
75851c29 5b pop ebx
75851c2a e8590f0000 call 3rdPartyHook!__security_check_cookie (75852b88)
0:000> u 75851c2f
3rdPartyHook!GetData+0x23f:
75851c2f 8be5 mov esp,ebp
75851c31 5d pop ebp
75851c32 c20c00 ret 0Ch
75851c35 cc int 3
75851c36 cc int 3
75851c37 cc int 3
75851c38 cc int 3
75851c39 cc int 3
We also see the other 4th-party module that we know as patching from our past experience:
[...]
04f6f850 00000000
04f6f854 00000000
04f6f858 00000000
04f6f85c 00000000
04f6f860 1000c200 4thPartyHook!Shared+0×80
04f6f864 00000000
04f6f868 04f6f87c
04f6f86c 00000001
04f6f870 00000000
04f6f874 00000000
04f6f878 00000000
04f6f87c 0000006c
[…]
The address 1000c200 looks suspicious and coincidental as the set of flags. However, when we disassemble the address, we see the valid and sound code and the module seems to be loaded at 0×10000000 address:
0:000> ub 1000c200
4thPartyHook!Shared+0x80:
1000c1e8 e8234effff call 10001010
1000c1ed a160020710 mov eax,dword ptr [10070260]
1000c1f2 83c418 add esp,18h
1000c1f5 8b750c mov esi,dword ptr [ebp+0Ch]
1000c1f8 85f6 test esi,esi
1000c1fa 7530 jne 1000c22c
1000c1fc 85c0 test eax,eax
1000c1fe 742c je 1000c22c
0:000> u 1000c200
4thPartyHook!Shared+0x80:
1000c200 8b08 mov ecx,dword ptr [eax]
1000c202 8b5004 mov edx,dword ptr [eax+4]
1000c205 53 push ebx
1000c206 8bd9 mov ebx,ecx
1000c208 0bda or ebx,edx
1000c20a 5b pop ebx
1000c20b 741f je 1000c22c
1000c20d f6400801 test byte ptr [eax+8],1
0:000> !dh 10000000
[...]
File Type: DLL
FILE HEADER VALUES
14C machine (i386)
5 number of sections
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
2102 characteristics
Executable
32 bit word machine
DLL
[...]
OPTIONAL HEADER VALUES
10B magic #
8.00 linker version
62000 size of code
15000 size of initialized data
0 size of uninitialized data
3E1B6 address of entry point
1000 base of code
----- new -----
10000000 image base
1000 section alignment
1000 file alignment
2 subsystem (Windows GUI)
4.00 operating system version
0.00 image version
4.00 subsystem version
7A000 size of image
1000 size of headers
[...]
0:000> lm
start end module name
00400000 00425000 App
[...]
10000000 1007a000 4thPartyHook
[...]
The recommendation I usually give in such cases it to remove or disable 3rd- and 4th-party products that do code patching to see if this resolves the problem. This makes the module list internally clean and if the problem persists then we should look for another causes and restore 3rd- and 4th-party products.
- Dmitry Vostokov @ DumpAnalysis.org -
On the great divide in modern software factories:
“We have in fact, two kinds of” engineers, “side by side: one that” design, “but do not” code, “and another that” code, “but seldom” design.
Bertrand Russell, Sceptical Essays
- Dmitry Vostokov @ DumpAnalysis.org -
After coming back to engineering I decided to expand the domain of my research and start the new series of posts called Trace Analysis Patterns. In addition to Citrix CDF / Microsoft ETW traces I plan to cover other variants based on my extensive software engineering background in the past where I used tracing in software products ranging from soft multi-platform real-time systems to static code analysis tools. Connection with memory dump analysis will be covered too because sometimes the combination of static and dynamic data leads to interesting observations and helps to troubleshoot and resolve customer problems especially when not all data can be collected dynamically.
In fact, stack traces and their collections are specializations of the more general traces. Another example is historical information in memory dump files especially when it is somehow timestamped.
In this part I start with the obvious and to some extent the trivial pattern called Periodic Error. This is an error or a status value that is observed periodically many times:
No PID TID Date Time Statement
[...]
664957 1788 22504 4/23/2009 17:59:14.600 MyClass::Initialize: Cannot open connection “Client ID: 310″, status=5
[…]
668834 1788 19868 4/23/2009 19:11:52.979 MyClass::Initialize: Cannot open connection “Client ID: 612″, status=5
[…]
or
No PID TID Date Time Statement
[...]
202314 1788 19128 4/21/2009 16:03:46.861 HandleDataLevel: Error 12005 Getting Mask
[…]
347653 1788 17812 4/22/2009 13:26:00.735 HandleDataLevel: Error 12005 Getting Mask
[…]
Here single trace entries can be isolated from the trace and studied in detail.
Be aware though that some modules might report periodic errors that are false positive, in the sense, that they are expected as a part of implementation details, for example, when a function returns an error to indicate that bigger buffer is required or to estimate its size for a subsequent call. It merits its own pattern name and I come to it next time with more examples.
I also created a page where I’ll will be adding all tracing patterns:
- Dmitry Vostokov @ TraceAnalysis.org -