Icons for Memory Dump Analysis Patterns (Part 73)
Wednesday, September 22nd, 2010Today we introduce an icon for Wait Chain (process objects) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Today we introduce an icon for Wait Chain (process objects) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
A page to reference all different kinds of contention is necessary, so I created this post:
I’ll update it as soon as I add more similar patterns.
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Today we introduce an icon for Wait Chain (LPC/ALPC) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Today we introduce an icon for Wait Chain (thread objects) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Sometimes, when comparing normal, expected (working) and abnormal (non-working) traces we can get a clue for further troubleshooting and debugging by looking at module load events. For example, when we see an unexpected module loaded in our non-working trace, its function (and sometimes even module name) can signify some difference to pay attention to:
# PID TID Time Message
[...]
14492 6908 6912 11:06:41.953 LoadImageEvent: ImageName(\WINDOWS\system32\3rdPartySso.dll)
[...]
I call this pattern Guest Component and it is a different from Missing Component. Although in the latter pattern a missing component in one trace may appear in another but the component name is known apriori and expected. In the former pattern the component is unexpected. For example, in the trace above, its partial name fragment “Sso” may trigger a suggestion to relate differences in authentication where in a non-working case SSO (single sign-on) was configured.
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
There many interpretations of the letter M in M-theory but I propose another one: M stands for Memory. In any outcome it surely will be committed to memory in the future either as successful or not. On the other hand I’m now trying to make sense of it in relation to Memory as an ur-foundation (ur-, primordial, German prefix).
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Abnormal CPU consumption detection usually goes at a process level when we detect it using Task Manager, for example. Sometimes that process has only one spiking thread among many but there are cases when CPU consumption is spread among many threads. I call this pattern Distributed Spike. This behavior could be a consequence of a weak process coupling, for example, in these 2 services (where, for simplicity, I highlight in red color threads with more than 1 second CPU time spent in user mode):
0:000> !runaway
User Mode Time
Thread Time
120:4e518 0 days 0:05:09.937
126:531bc 0 days 0:03:56.546
44:334c 0 days 0:03:40.765
133:4fe1c 0 days 0:03:31.156
45:42b4 0 days 0:03:27.328
107:25ae0 0 days 0:03:19.921
49:627c 0 days 0:02:48.250
147:6b90c 0 days 0:02:33.046
136:6620c 0 days 0:02:05.109
127:4f2d0 0 days 0:02:04.046
129:5bc30 0 days 0:02:02.171
48:623c 0 days 0:02:01.796
119:41f00 0 days 0:02:00.562
74:cd18 0 days 0:01:59.453
51:7a4c 0 days 0:01:54.234
35:21d4 0 days 0:01:47.390
148:326dc 0 days 0:01:32.640
123:43c8c 0 days 0:01:32.515
135:67b08 0 days 0:01:32.296
11:aa8 0 days 0:01:30.906
118:42f8c 0 days 0:01:20.265
42:3a3c 0 days 0:01:20.000
77:d024 0 days 0:01:19.734
115:3a840 0 days 0:01:15.625
89:145f4 0 days 0:01:10.500
157:4e310 0 days 0:01:07.625
80:d07c 0 days 0:01:07.468
33:1ab0 0 days 0:01:00.593
117:10bd4 0 days 0:00:59.421
151:1aaa0 0 days 0:00:59.015
28:17bc 0 days 0:00:58.796
83:f3a4 0 days 0:00:55.828
122:41964 0 days 0:00:55.578
149:4101c 0 days 0:00:55.234
10:aa4 0 days 0:00:52.453
106:21b80 0 days 0:00:51.187
132:62e5c 0 days 0:00:49.437
160:3a3a8 0 days 0:00:48.875
137:6bf90 0 days 0:00:48.687
145:6f594 0 days 0:00:47.968
143:58d60 0 days 0:00:45.703
72:ba64 0 days 0:00:44.515
41:19b0 0 days 0:00:44.000
130:5d480 0 days 0:00:43.750
139:6d090 0 days 0:00:42.062
138:6d578 0 days 0:00:40.406
91:17974 0 days 0:00:40.359
152:37f80 0 days 0:00:39.781
81:de68 0 days 0:00:39.265
150:65b2c 0 days 0:00:36.625
162:1f340 0 days 0:00:35.125
85:10650 0 days 0:00:33.546
131:614e8 0 days 0:00:33.093
128:2eddc 0 days 0:00:33.000
146:6f690 0 days 0:00:32.015
161:3c4b4 0 days 0:00:30.421
167:3cde4 0 days 0:00:29.390
171:3979c 0 days 0:00:28.515
166:3cd40 0 days 0:00:28.312
168:68ef0 0 days 0:00:27.781
65:aad0 0 days 0:00:26.593
109:267f4 0 days 0:00:26.390
88:13624 0 days 0:00:26.000
173:5282c 0 days 0:00:24.640
153:71e14 0 days 0:00:23.390
112:322b4 0 days 0:00:22.812
110:9578 0 days 0:00:22.125
175:20230 0 days 0:00:20.250
79:b458 0 days 0:00:20.218
66:61b8 0 days 0:00:19.875
62:9498 0 days 0:00:19.562
156:d900 0 days 0:00:19.015
121:5106c 0 days 0:00:18.687
142:6bb28 0 days 0:00:18.562
46:2cbc 0 days 0:00:17.796
169:d920 0 days 0:00:16.875
154:720b4 0 days 0:00:16.484
170:4ac8c 0 days 0:00:15.968
73:b010 0 days 0:00:13.609
39:3224 0 days 0:00:13.406
172:722e4 0 days 0:00:12.375
63:9780 0 days 0:00:12.203
177:8464 0 days 0:00:11.906
184:22908 0 days 0:00:10.234
140:5765c 0 days 0:00:09.750
174:2f484 0 days 0:00:08.390
50:7230 0 days 0:00:07.125
187:3c324 0 days 0:00:06.765
125:46cf0 0 days 0:00:06.296
178:3a424 0 days 0:00:05.125
114:33d20 0 days 0:00:03.734
165:3ca74 0 days 0:00:01.203
189:3c358 0 days 0:00:01.000
164:3124c 0 days 0:00:00.578
25:be4 0 days 0:00:00.515
17:ba8 0 days 0:00:00.125
104:5cf8 0 days 0:00:00.109
26:e4c 0 days 0:00:00.109
96:5d44 0 days 0:00:00.093
99:5b18 0 days 0:00:00.078
56:8a6c 0 days 0:00:00.078
55:8a68 0 days 0:00:00.078
6:a08 0 days 0:00:00.078
4:a00 0 days 0:00:00.062
103:5cfc 0 days 0:00:00.046
100:5ab8 0 days 0:00:00.046
68:bf34 0 days 0:00:00.046
37:29d4 0 days 0:00:00.046
101:5ab4 0 days 0:00:00.031
98:5b44 0 days 0:00:00.031
97:5d40 0 days 0:00:00.031
57:8a70 0 days 0:00:00.031
53:8a60 0 days 0:00:00.031
36:29c0 0 days 0:00:00.031
16:ac4 0 days 0:00:00.031
1:9e4 0 days 0:00:00.031
60:880c 0 days 0:00:00.015
58:8a5c 0 days 0:00:00.015
24:be0 0 days 0:00:00.015
15:abc 0 days 0:00:00.015
188:13044 0 days 0:00:00.000
186:6530 0 days 0:00:00.000
185:2013c 0 days 0:00:00.000
183:6047c 0 days 0:00:00.000
182:65400 0 days 0:00:00.000
181:61560 0 days 0:00:00.000
180:2b7a4 0 days 0:00:00.000
179:56294 0 days 0:00:00.000
176:20300 0 days 0:00:00.000
163:2ab1c 0 days 0:00:00.000
159:276cc 0 days 0:00:00.000
158:72134 0 days 0:00:00.000
155:6a078 0 days 0:00:00.000
144:6ce98 0 days 0:00:00.000
141:5404 0 days 0:00:00.000
134:65718 0 days 0:00:00.000
124:4bed4 0 days 0:00:00.000
116:3c770 0 days 0:00:00.000
113:b08 0 days 0:00:00.000
111:28e54 0 days 0:00:00.000
108:25fbc 0 days 0:00:00.000
105:20504 0 days 0:00:00.000
102:5cf4 0 days 0:00:00.000
95:5c70 0 days 0:00:00.000
94:5ed4 0 days 0:00:00.000
93:18c2c 0 days 0:00:00.000
92:19fd8 0 days 0:00:00.000
90:c870 0 days 0:00:00.000
87:7994 0 days 0:00:00.000
86:124cc 0 days 0:00:00.000
84:eab8 0 days 0:00:00.000
82:f2a4 0 days 0:00:00.000
78:d5c0 0 days 0:00:00.000
76:cfd0 0 days 0:00:00.000
75:cf64 0 days 0:00:00.000
71:b4f8 0 days 0:00:00.000
70:c628 0 days 0:00:00.000
69:c484 0 days 0:00:00.000
67:be84 0 days 0:00:00.000
64:aa00 0 days 0:00:00.000
61:93f0 0 days 0:00:00.000
59:89e4 0 days 0:00:00.000
54:8a64 0 days 0:00:00.000
52:89a8 0 days 0:00:00.000
47:4c64 0 days 0:00:00.000
43:3fa0 0 days 0:00:00.000
40:2c88 0 days 0:00:00.000
38:2a28 0 days 0:00:00.000
34:1928 0 days 0:00:00.000
32:1668 0 days 0:00:00.000
31:8dc 0 days 0:00:00.000
30:15d4 0 days 0:00:00.000
29:1044 0 days 0:00:00.000
27:fb4 0 days 0:00:00.000
23:bd8 0 days 0:00:00.000
22:bd4 0 days 0:00:00.000
21:bd0 0 days 0:00:00.000
20:bc8 0 days 0:00:00.000
19:bc4 0 days 0:00:00.000
18:bc0 0 days 0:00:00.000
14:ab8 0 days 0:00:00.000
13:ab4 0 days 0:00:00.000
12:ab0 0 days 0:00:00.000
9:aa0 0 days 0:00:00.000
8:a9c 0 days 0:00:00.000
7:a98 0 days 0:00:00.000
5:a04 0 days 0:00:00.000
3:9f4 0 days 0:00:00.000
2:9f0 0 days 0:00:00.000
0:994 0 days 0:00:00.000
This is a real spike in the first service process as can be confirmed by a random non-waiting thread:
0:000> ~143k
ChildEBP RetAddr
050dfc68 7c82d6a4 ntdll!RtlEnterCriticalSection+0x1d
050dfc84 77c7bc50 ntdll!RtlInitializeCriticalSectionAndSpinCount+0x92
050dfc98 77c7bc7c rpcrt4!MUTEX::CommonConstructor+0x1b
050dfcac 77c7c000 rpcrt4!MUTEX::MUTEX+0x13
050dfcc8 77c6ff47 rpcrt4!BINDING_HANDLE::BINDING_HANDLE+0x2d
050dfcd8 77c6ff1f rpcrt4!SVR_BINDING_HANDLE::SVR_BINDING_HANDLE+0x10
050dfcfc 77c6d338 rpcrt4!RPC_ADDRESS::InquireBinding+0x8a
050dfd0c 77c6fd1d rpcrt4!LRPC_SCALL::ToStringBinding+0x16
050dfd1c 76554c83 rpcrt4!RpcBindingToStringBindingW+0x4d
050dfd5c 77c7c42a ServiceA!RpcSecurityCallback+0x1e
050dfdb4 77c7c4b0 rpcrt4!RPC_INTERFACE::CheckSecurityIfNecessary+0x6f
050dfdcc 77c7c46c rpcrt4!LRPC_SBINDING::CheckSecurity+0x4f
050dfdfc 77c812f0 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0x2bb
050dfe20 77c88678 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0x127
050dff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430
050dff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd
050dffac 77c7b110 rpcrt4!BaseCachedThreadRoutine+0x9d
050dffb8 77e64829 rpcrt4!ThreadStartRoutine+0x1b
050dffec 00000000 kernel32!BaseThreadStart+0x34
0:000> ~143r
eax=00000000 ebx=00000000 ecx=7c887784 edx=7c887780 esi=7c887784 edi=00163fb0
eip=7c81a37d esp=050dfc5c ebp=050dfc68 iopl=0 nv up ei ng nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000287
ntdll!RtlEnterCriticalSection+0x1d:
7c81a37d 0f92c0 setb al
0:000> .asm no_code_bytes
Assembly options: no_code_bytes
0:000> u 7c81a37d
ntdll!RtlEnterCriticalSection+0x1d:
7c81a37d setb al
7c81a380 test al,al
7c81a382 je ntdll!RtlEnterCriticalSection+0x28 (7c82b096)
7c81a388 mov ecx,dword ptr fs:[18h]
7c81a38f mov eax,dword ptr [ecx+24h]
7c81a392 pop edi
7c81a393 mov dword ptr [edx+0Ch],eax
7c81a396 mov dword ptr [edx+8],1
0:000> ub 7c81a37d
ntdll!RtlEnterCriticalSection+0x6:
7c81a366 mov edx,dword ptr [ebp+8]
7c81a369 push esi
7c81a36a lea esi,[edx+4]
7c81a36d push edi
7c81a36e mov dword ptr [ebp-4],esi
7c81a371 mov eax,0
7c81a376 mov ecx,dword ptr [ebp-4]
7c81a379 lock btr dword ptr [ecx],eax
The second service is weakly (waiting for event notifications) coupled to the first service above:
0:000> !runaway
User Mode Time
Thread Time
5:dbec 0 days 0:01:50.031
8:46008 0 days 0:01:46.062
11:ad0c 0 days 0:01:13.921
17:932c 0 days 0:01:03.234
14:45d78 0 days 0:00:58.109
15:6d4d0 0 days 0:00:00.015
2:725a4 0 days 0:00:00.015
0:6101c 0 days 0:00:00.015
18:d1c4 0 days 0:00:00.000
16:76bc 0 days 0:00:00.000
13:456a8 0 days 0:00:00.000
12:459e4 0 days 0:00:00.000
10:3c768 0 days 0:00:00.000
9:12d20 0 days 0:00:00.000
7:46010 0 days 0:00:00.000
6:4600c 0 days 0:00:00.000
4:dbf0 0 days 0:00:00.000
3:17ed4 0 days 0:00:00.000
1:61024 0 days 0:00:00.000
0:000> ~11k
ChildEBP RetAddr
0223fa68 7c82787b ntdll!KiFastSystemCallRet
0223fa6c 77c80a6e ntdll!NtRequestWaitReplyPort+0xc
0223fab8 77c7fcf0 rpcrt4!LRPC_CCALL::SendReceive+0x230
0223fac4 77c80673 rpcrt4!I_RpcSendReceive+0x24
0223fad8 77ce315a rpcrt4!NdrSendReceive+0x2b
0223fec0 771f4fbd rpcrt4!NdrClientCall2+0x22e
0223fed8 771f4f60 ServiceB!RpcWaitEvent+0x1c
[...]
0:000> ~17k
ChildEBP RetAddr
0283fa68 7c82787b ntdll!KiFastSystemCallRet
0283fa6c 77c80a6e ntdll!NtRequestWaitReplyPort+0xc
0283fab8 77c7fcf0 rpcrt4!LRPC_CCALL::SendReceive+0x230
0283fac4 77c80673 rpcrt4!I_RpcSendReceive+0x24
0283fad8 77ce315a rpcrt4!NdrSendReceive+0x2b
0283fec0 771f4fbd rpcrt4!NdrClientCall2+0x22e
0283fed8 771f4f60 ServiceB!RpcWaitEvent+0x1c
[...]
Sometimes, semantic process coupling also results in distributed spikes and most often it is possible to predict another spiking process in such cases. In our example above both spiking processes were semantically coupled with another service and it was confirmed that it was spiking too:
0:000> !runaway
User Mode Time
Thread Time
89:10d4 0 days 0:03:03.500
28:a94 0 days 0:00:39.562
73:c10 0 days 0:00:37.531
54:b88 0 days 0:00:37.140
29:a98 0 days 0:00:35.906
27:a90 0 days 0:00:35.500
75:c2c 0 days 0:00:28.812
90:10d8 0 days 0:00:27.000
93:10e4 0 days 0:00:24.265
32:aa4 0 days 0:00:12.906
41:ac8 0 days 0:00:11.890
35:ab0 0 days 0:00:11.875
58:bc4 0 days 0:00:10.218
42:acc 0 days 0:00:09.546
85:e74 0 days 0:00:08.859
36:ab4 0 days 0:00:08.578
72:c0c 0 days 0:00:05.890
70:c04 0 days 0:00:05.687
33:aa8 0 days 0:00:05.046
74:c14 0 days 0:00:04.953
40:ac4 0 days 0:00:04.953
38:abc 0 days 0:00:04.359
39:ac0 0 days 0:00:04.312
34:aac 0 days 0:00:04.140
64:bec 0 days 0:00:03.812
88:10d0 0 days 0:00:03.187
30:a9c 0 days 0:00:02.859
9:a10 0 days 0:00:01.968
37:ab8 0 days 0:00:01.953
92:10e0 0 days 0:00:01.718
83:d00 0 days 0:00:01.125
94:1150 0 days 0:00:01.031
77:c54 0 days 0:00:00.890
98:f2c0 0 days 0:00:00.265
97:eb1c 0 days 0:00:00.265
76:c50 0 days 0:00:00.265
21:a48 0 days 0:00:00.187
22:a4c 0 days 0:00:00.140
63:be8 0 days 0:00:00.093
23:a50 0 days 0:00:00.093
53:af8 0 days 0:00:00.078
24:a54 0 days 0:00:00.046
71:c08 0 days 0:00:00.031
65:bf0 0 days 0:00:00.031
87:e8c 0 days 0:00:00.015
57:bc0 0 days 0:00:00.015
104:6454c 0 days 0:00:00.000
103:63fb4 0 days 0:00:00.000
102:3c5ec 0 days 0:00:00.000
101:65178 0 days 0:00:00.000
100:5d0e4 0 days 0:00:00.000
99:5bae4 0 days 0:00:00.000
96:574 0 days 0:00:00.000
95:b84 0 days 0:00:00.000
91:10dc 0 days 0:00:00.000
86:e88 0 days 0:00:00.000
84:e70 0 days 0:00:00.000
82:c84 0 days 0:00:00.000
81:c68 0 days 0:00:00.000
80:c64 0 days 0:00:00.000
79:c60 0 days 0:00:00.000
78:c5c 0 days 0:00:00.000
69:c00 0 days 0:00:00.000
68:bfc 0 days 0:00:00.000
67:bf8 0 days 0:00:00.000
66:bf4 0 days 0:00:00.000
62:bd8 0 days 0:00:00.000
61:bd4 0 days 0:00:00.000
60:bd0 0 days 0:00:00.000
59:bcc 0 days 0:00:00.000
56:bbc 0 days 0:00:00.000
55:bb8 0 days 0:00:00.000
52:af4 0 days 0:00:00.000
51:af0 0 days 0:00:00.000
50:aec 0 days 0:00:00.000
49:ae8 0 days 0:00:00.000
48:ae4 0 days 0:00:00.000
47:ae0 0 days 0:00:00.000
46:adc 0 days 0:00:00.000
45:ad8 0 days 0:00:00.000
44:ad4 0 days 0:00:00.000
43:ad0 0 days 0:00:00.000
31:aa0 0 days 0:00:00.000
26:a8c 0 days 0:00:00.000
25:a64 0 days 0:00:00.000
20:a44 0 days 0:00:00.000
19:a40 0 days 0:00:00.000
18:a34 0 days 0:00:00.000
17:a30 0 days 0:00:00.000
16:a2c 0 days 0:00:00.000
15:a28 0 days 0:00:00.000
14:a24 0 days 0:00:00.000
13:a20 0 days 0:00:00.000
12:a1c 0 days 0:00:00.000
11:a18 0 days 0:00:00.000
10:a14 0 days 0:00:00.000
8:a0c 0 days 0:00:00.000
7:a08 0 days 0:00:00.000
6:a04 0 days 0:00:00.000
5:a00 0 days 0:00:00.000
4:9fc 0 days 0:00:00.000
3:9f8 0 days 0:00:00.000
2:9f4 0 days 0:00:00.000
1:9f0 0 days 0:00:00.000
0:9e4 0 days 0:00:00.000
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Finally, after the delay, the issue is available in print on Amazon and through other sellers:
Debugged! MZ/PE: Multithreading
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
A complete memory dump was generated from a totally unresponsive frozen system. Looking at its virtual memory stats we see the shortage of nonpaged pool (insufficient kernel pool memory pattern):
0: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 1047977 ( 4191908 Kb)
Page File: \??\C:\pagefile.sys
Current: 4195328 Kb Free Space: 4167696 Kb
Minimum: 4195328 Kb Maximum: 4195328 Kb
Page File: \??\E:\pagefile.sys
Current: 8372224 Kb Free Space: 8348916 Kb
Minimum: 8372224 Kb Maximum: 8388608 Kb
Available Pages: 855939 ( 3423756 Kb)
ResAvail Pages: 958559 ( 3834236 Kb)
Locked IO Pages: 100 ( 400 Kb)
Free System PTEs: 200627 ( 802508 Kb)
Free NP PTEs: 0 ( 0 Kb)
Free Special NP: 120925 ( 483700 Kb)
Modified Pages: 52 ( 208 Kb)
Modified PF Pages: 52 ( 208 Kb)
NonPagedPool Usage: 65199 ( 260796 Kb)
NonPagedPool Max: 65536 ( 262144 Kb)
********** Excessive NonPaged Pool Usage *****
PagedPool 0 Usage: 7731 ( 30924 Kb)
PagedPool 1 Usage: 6334 ( 25336 Kb)
PagedPool 2 Usage: 6316 ( 25264 Kb)
PagedPool 3 Usage: 6333 ( 25332 Kb)
PagedPool 4 Usage: 6318 ( 25272 Kb)
PagedPool Usage: 33032 ( 132128 Kb)
PagedPool Maximum: 66560 ( 266240 Kb)
********** 7041316 pool allocations have failed **********
Session Commit: 2860 ( 11440 Kb)
Shared Commit: 3904 ( 15616 Kb)
Special Pool: 255 ( 1020 Kb)
Shared Process: 9336 ( 37344 Kb)
PagedPool Commit: 33032 ( 132128 Kb)
Driver Commit: 2147 ( 8588 Kb)
Committed pages: 174724 ( 698896 Kb)
Commit limit: 4148057 ( 16592228 Kb)
Dumping sorted pool consumers we see the most used were DRV* pool tags:
0: kd> !poolused 3
Sorting by NonPaged Pool Consumed
Pool Used:
NonPaged
Tag Allocs Frees Diff Used
DRV2 21683882 21280457 403425 80685000 UNKNOWN pooltag ‘DRV2′, please update pooltag.txt
DRV4 46621052 46217627 403425 54156728 UNKNOWN pooltag ‘DRV4′, please update pooltag.txt
DRV5 37848660 37065132 783528 31341120 UNKNOWN pooltag ‘DRV5′, please update pooltag.txt
MmCm 15754 14607 1147 24917536 Calls made to MmAllocateContiguousMemory , Binary: nt!mm
DRV3 16189418 15785993 403425 19364400 UNKNOWN pooltag ‘DRV3′, please update pooltag.txt
[…]
We also check CPU consumption and see two spiking threads:
0: kd> !running
System Processors f (affinity mask)
Idle Processors 9
Prcbs Current Next
1 f7727120 87c5ca48 …………….
2 f772f120 881ccae0 …………….
We see the first thread spent much more kernel time and stack trace involved DriverA module:
0: kd> !thread 87c5ca48 1f
THREAD 87c5ca48 Cid 0ba0.0bc8 Teb: 7ffd7000 Win32Thread: 00000000 RUNNING on processor 1
Not impersonating
DeviceMap e10028f0
Owning Process 87c3a978 Image: ApplicationB.exe
Attached Process N/A Image: N/A
Wait Start TickCount 4516745 Ticks: 2 (0:00:00:00.031)
Context Switch Count 2234544
UserTime 00:00:58.500
KernelTime 00:55:09.218
Win32 Start Address ApplicationB (0×402016bd)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b8771000 Current b87708b4 Base b8771000 Limit b876e000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 1
ChildEBP RetAddr
b8770bd0 80892b6f nt!ExFreePoolWithTag+0xb7
b8770be0 80933bcd nt!ExFreePool+0xf
b8770c0c 808bb471 nt!ObOpenObjectByName+0×163
b8770cf8 b951f08e nt!NtOpenKey+0×1ad
WARNING: Stack unwind information not available. Following frames may be wrong.
b8770d50 808897cc DriverA+0×1608e
b8770d50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ b8770d64)
00c0f0b4 7c827369 ntdll!KiFastSystemCallRet
00c0f0b8 7d206cb1 ntdll!ZwOpenKey+0xc
00c0f118 7d1fe187 ADVAPI32!LocalBaseRegOpenKey+0xd0
00c0f14c 40202284 ADVAPI32!RegOpenKeyExA+0×11c
[…]
00c0ffb8 77e6482f ApplicationB+0×1891
00c0ffec 00000000 kernel32!BaseThreadStart+0×34
0: kd> !thread 881ccae0 1f
THREAD 881ccae0 Cid 05b8.05c4 Teb: 7ffde000 Win32Thread: 00000000 RUNNING on processor 2
Not impersonating
DeviceMap e10028f0
Owning Process 881cf020 Image: ApplicationC.exe
Attached Process N/A Image: N/A
Wait Start TickCount 4516747 Ticks: 0
Context Switch Count 115916
UserTime 00:00:03.468
KernelTime 00:00:53.156
Win32 Start Address ADVAPI32!ScSvcctrlThreadA (0×7d1f5e70)
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init b900d000 Current b900cc50 Base b900d000 Limit b900a000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr
005bf9c4 7d1eef7e ADVAPI32!LocalBaseRegEnumKey+0×2
005bfc24 010091c9 ADVAPI32!RegEnumKeyExA+0×12f
WARNING: Stack unwind information not available. Following frames may be wrong.
005bfc74 0100a28c ApplicationC+0×91c9
[…]
005bffb8 77e6482f ApplicationC+0×8b13
005bffec 00000000 kernel32!BaseThreadStart+0×34
We check the pool entry it was freeing:
0: kd> .thread /r /p 87c5ca48
Implicit thread is now 87c5ca48
Implicit process is now 87c3a978
Loading User Symbols
0: kd> kv L1
ChildEBP RetAddr Args to Child
b8770bd0 80892b6f 8ab6b3c8 00000000 b8770c0c nt!ExFreePoolWithTag+0xb7
0: kd> !pool 8ab6b3c8
Pool page 8ab6b3c8 region is Nonpaged pool
8ab6b000 size: c8 previous size: 0 (Allocated) DRV2
8ab6b0c8 size: 70 previous size: c8 (Allocated) MmCa
8ab6b138 size: 10 previous size: 70 (Free) DRV2
8ab6b148 size: 30 previous size: 10 (Allocated) DRV3
8ab6b178 size: 30 previous size: 30 (Allocated) DRV3
8ab6b1a8 size: 28 previous size: 30 (Allocated) NtFs
8ab6b1d0 size: 30 previous size: 28 (Allocated) DRV3
8ab6b200 size: 18 previous size: 30 (Free) IrpC
8ab6b218 size: a8 previous size: 18 (Allocated) File (Protected)
8ab6b2c0 size: 100 previous size: a8 (Allocated) MmCi
*8ab6b3c0 size: 38 previous size: 100 (Allocated) *ObCi
Pooltag ObCi : captured information for ObCreateObject, Binary : nt!ob
8ab6b3f8 size: 48 previous size: 38 (Allocated) DRV2
8ab6b440 size: 8 previous size: 48 (Free) DRV4
8ab6b448 size: 28 previous size: 8 (Allocated) NtFs
8ab6b470 size: 8 previous size: 28 (Free) Mdl
8ab6b478 size: 60 previous size: 8 (Allocated) DRV4
8ab6b4d8 size: c8 previous size: 60 (Allocated) DRV2
8ab6b5a0 size: a8 previous size: c8 (Allocated) File (Protected)
8ab6b648 size: c8 previous size: a8 (Allocated) DRV2
8ab6b710 size: c8 previous size: c8 (Allocated) DRV2
8ab6b7d8 size: 10 previous size: c8 (Free) File
8ab6b7e8 size: 98 previous size: 10 (Allocated) DRV4
8ab6b880 size: 30 previous size: 98 (Allocated) DRV3
8ab6b8b0 size: 28 previous size: 30 (Allocated) NtFs
8ab6b8d8 size: 28 previous size: 28 (Allocated) DRV5
8ab6b900 size: 30 previous size: 28 (Allocated) DRV3
8ab6b930 size: a8 previous size: 30 (Allocated) File (Protected)
8ab6b9d8 size: 100 previous size: a8 (Allocated) MmCi
8ab6bad8 size: 10 previous size: 100 (Free) ObCi
[…]
In the output above we see all clustering of DRV* pool tags and check their contents:
0: kd> dps 8ab6b000
8ab6b000 0a190000
[…]
8ab6b06c b95349a0 DriverA+0×2b9a0
8ab6b070 87cb85c8
8ab6b074 002d0000
8ab6b078 8ab6b078
8ab6b07c 8ab6b078
0: kd> dps 8ab6b8d8
8ab6b8d8 0a050005
[…]
8ab6b8f8 b95349a0 DriverA+0×2b9a0
8ab6b8fc 00000074
[…]
It looks like all DRV* pool entries have symbolic references in the range of DriverA (data contents locality):
0: kd> lm m DriverA
start end module name
b9509000 b9537f00 DriverA (no symbols)
So the conclusion was to refer to DriverA vendor for any updates.
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Today we introduce an icon for Wait Chain (executive resources) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Exploring the destiny theme of !analyze -vostokov further. Suddenly it came to me today that my initials DV are the same as initials of Debugging Virtuoso It might be also Debugging Vanity as well…
With fix-privet,
Dr. DebugLove
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
The first annual competition was held between 7/7/2010 and 8/8/2010 and the following winners were selected (in alphabetical order):
- Tavis Ormandy
nominated for “windows” story:
https://docs.google.com/View?id=dfqd62nk_228h28szgz
- Andrey Permamedov
selected entries from “Diary of war with bugs”:
http://bugswar.blogspot.com/
- Sathish Venkataraman
sent the tough enterprise debugging story:
to be published in the next issue of Debugged! magazine
All winners will get 3 volumes of Memory Dump Analysis Anthology + Color Supplement all signed by the author.
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
In addition to LPC / ALPC wait chains we can also see RPC chains in complete memory dumps and even mixed (A)LPC / RPC chains. How to distinguish RPC from (A)LPC (and RPC over LPC) threads? Here’s a fragment from an RPC over LPC thread (they also have “waiting for …” or “working on …” strings in THREAD output):
f50e4c20 80833491 nt!KiSwapContext+0x26
f50e4c4c 80829a82 nt!KiSwapThread+0x2e5
f50e4c94 8091ecf2 nt!KeWaitForSingleObject+0x346
f50e4d50 808897cc nt!NtRequestWaitReplyPort+0×776
f50e4d50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ f50e4d64)
00e5f2b8 7c827899 ntdll!KiFastSystemCallRet
00e5f2bc 77c80a6e ntdll!ZwRequestWaitReplyPort+0xc
00e5f308 77c7fcf0 RPCRT4!LRPC_CCALL::SendReceive+0×230
00e5f314 77c80673 RPCRT4!I_RpcSendReceive+0×24
00e5f328 77ce315a RPCRT4!NdrSendReceive+0×2b
00e5f710 771f40c4 RPCRT4!NdrClientCall2+0×22e
Here’s the thread stack of an RPC waiting thread (the connection was over a pipe):
THREAD 8a4b7320 Cid 0208.0220 Teb: 7ffa4000 Win32Thread: bc3eaea8 WAIT: (Unknown) UserMode Non-Alertable
8bc379f8 NotificationEvent
IRP List:
891879a8: (0006,0094) Flags: 00000800 Mdl: 00000000
Not impersonating
DeviceMap e1002270
Owning Process 8a5c8828 Image: ApplicationA.exe
Attached Process N/A Image: N/A
Wait Start TickCount 3044574 Ticks: 37746 (0:00:09:49.781)
Context Switch Count 54673 LargeStack
UserTime 00:00:00.015
KernelTime 00:00:00.046
Win32 Start Address MSVCR90!_threadstartex (0x7854345e)
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init f473b000 Current f473ac60 Base f473b000 Limit f4737000 Call 0
Priority 11 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr
f473ac78 80833491 nt!KiSwapContext+0x26
f473aca4 80829a82 nt!KiSwapThread+0x2e5
f473acec 80938dea nt!KeWaitForSingleObject+0x346
f473ad50 808897cc nt!NtWaitForSingleObject+0x9a
f473ad50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ f473ad64)
0155f0f4 7c827d29 ntdll!KiFastSystemCallRet
0155f0f8 77e61d1e ntdll!ZwWaitForSingleObject+0xc
0155f168 77c6a85f kernel32!WaitForSingleObjectEx+0xac
0155f184 77c69bf7 RPCRT4!UTIL_WaitForSyncIO+0×20
0155f1a8 77c6a935 RPCRT4!UTIL_GetOverlappedResultEx+0×1d
0155f1c4 77c6a8f3 RPCRT4!UTIL_GetOverlappedResult+0×17
0155f1e4 77c6944f RPCRT4!NMP_SyncSendRecv+0×73
0155f20c 77c69667 RPCRT4!OSF_CCONNECTION::TransSendReceive+0×7d
0155f294 77c695d4 RPCRT4!OSF_CCONNECTION::SendFragment+0×2ae
0155f2ec 77c6977a RPCRT4!OSF_CCALL::SendNextFragment+0×1e2
0155f334 77c699f2 RPCRT4!OSF_CCALL::FastSendReceive+0×148
0155f350 77c69975 RPCRT4!OSF_CCALL::SendReceiveHelper+0×5b
0155f380 77c7fcf0 RPCRT4!OSF_CCALL::SendReceive+0×41
0155f38c 77c80673 RPCRT4!I_RpcSendReceive+0×24
0155f3a0 77ce315a RPCRT4!NdrSendReceive+0×2b
0155f788 7d1fa0b1 RPCRT4!NdrClientCall2+0×22e
[…]
0155ffac 785434c7 MSVCR90!_callthreadstartex+0×1b
0155ffb8 77e6482f MSVCR90!_threadstartex+0×69
0155ffec 00000000 kernel32!BaseThreadStart+0×34
Here’s the endpoint thread stack in the RPC service processing the client call:
THREAD 8a631d80 Cid 0244.0290 Teb: 7ffd4000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
8a6354d8 SynchronizationEvent
IRP List:
882d0008: (0006,0094) Flags: 00000900 Mdl: 00000000
8811c450: (0006,0094) Flags: 00000900 Mdl: 00000000
8a4d1b28: (0006,0190) Flags: 00000000 Mdl: 8a4d2e40
8a634188: (0006,0094) Flags: 00000800 Mdl: 00000000
Not impersonating
DeviceMap e1002270
Owning Process 8a5b3ac8 Image: ServiceB.exe
Attached Process N/A Image: N/A
Wait Start TickCount 3041752 Ticks: 40568 (0:00:10:33.875)
Context Switch Count 36194
UserTime 00:00:00.562
KernelTime 00:00:01.093
Win32 Start Address RPCRT4!ThreadStartRoutine (0x77c7b0f5)
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init f5f3e000 Current f5f3dc60 Base f5f3e000 Limit f5f3b000 Call 0
Priority 12 BasePriority 10 PriorityDecrement 0
ChildEBP RetAddr
f5f3dc78 80833491 nt!KiSwapContext+0x26
f5f3dca4 80829a82 nt!KiSwapThread+0x2e5
f5f3dcec 80938dea nt!KeWaitForSingleObject+0x346
f5f3dd50 808897cc nt!NtWaitForSingleObject+0x9a
f5f3dd50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ f5f3dd64)
00def83c 7c827d29 ntdll!KiFastSystemCallRet
00def840 7c83d266 ntdll!ZwWaitForSingleObject+0xc
00def87c 7c83d2b1 ntdll!RtlpWaitOnCriticalSection+0×1a3
00def89c 4ab773ea ntdll!RtlEnterCriticalSection+0xa8
00def8ac 4ab78726 ServiceB!AcquireLock+0×7c
[…]
00def944 77ce33e1 RPCRT4!Invoke+0×30
00defd44 77ce35c4 RPCRT4!NdrStubCall2+0×299
00defd60 77c7ff7a RPCRT4!NdrServerCall2+0×19
00defd94 77c8042d RPCRT4!DispatchToStubInCNoAvrf+0×38
00defde8 77c80353 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×11f
00defe0c 77c68e0d RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
00defe40 77c68cb3 RPCRT4!OSF_SCALL::DispatchHelper+0×149
00defe54 77c68c2b RPCRT4!OSF_SCALL::DispatchRPCCall+0×10d
00defe84 77c68b5e RPCRT4!OSF_SCALL::ProcessReceivedPDU+0×57f
00defea4 77c6e8db RPCRT4!OSF_SCALL::BeginRpcCall+0×194
00deff04 77c6e7b4 RPCRT4!OSF_SCONNECTION::ProcessReceiveComplete+0×435
00deff18 77c7b799 RPCRT4!ProcessConnectionServerReceivedEvent+0×21
00deff84 77c7b9b5 RPCRT4!LOADABLE_TRANSPORT::ProcessIOEvents+0×1b8
00deff8c 77c8872d RPCRT4!ProcessIOEventsWrapper+0xd
00deffac 77c7b110 RPCRT4!BaseCachedThreadRoutine+0×9d
00deffb8 77e6482f RPCRT4!ThreadStartRoutine+0×1b
00deffec 00000000 kernel32!BaseThreadStart+0×34
We also see that the latter thread is waiting for a critical section so we have an example of a mixed wait chain here as well. Another example is an RPC over LPC server thread that is also an RPC client thread:
THREAD 8989f020 Cid 0170.1cfc Teb: 7ff8c000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
89a1c368 NotificationEvent
IRP List:
887aac68: (0006,0094) Flags: 00000800 Mdl: 00000000
Not impersonating
DeviceMap e1002270
Owning Process 8a056b80 Image: ServiceC.exe
Attached Process N/A Image: N/A
Wait Start TickCount 3075354 Ticks: 6966 (0:00:01:48.843)
Context Switch Count 2521
UserTime 00:00:00.031
KernelTime 00:00:00.015
Win32 Start Address 0x00750d91
LPC Server thread working on message Id 750d91
Start Address kernel32!BaseThreadStartThunk (0×77e617ec)
Stack Init f26da000 Current f26d9c60 Base f26da000 Limit f26d7000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0
ChildEBP RetAddr
f26d9c78 80833491 nt!KiSwapContext+0×26
f26d9ca4 80829a82 nt!KiSwapThread+0×2e5
f26d9cec 80938dea nt!KeWaitForSingleObject+0×346
f26d9d50 808897cc nt!NtWaitForSingleObject+0×9a
f26d9d50 7c82860c nt!KiFastCallEntry+0xfc (TrapFrame @ f26d9d64)
03e9efa8 7c827d29 ntdll!KiFastSystemCallRet
03e9efac 77e61d1e ntdll!ZwWaitForSingleObject+0xc
03e9f01c 77c6a85f kernel32!WaitForSingleObjectEx+0xac
03e9f038 77c69bf7 RPCRT4!UTIL_WaitForSyncIO+0×20
03e9f05c 77c6a935 RPCRT4!UTIL_GetOverlappedResultEx+0×1d
03e9f078 77c6a8f3 RPCRT4!UTIL_GetOverlappedResult+0×17
03e9f098 77c6944f RPCRT4!NMP_SyncSendRecv+0×73
03e9f0c0 77c69667 RPCRT4!OSF_CCONNECTION::TransSendReceive+0×7d
03e9f148 77c695d4 RPCRT4!OSF_CCONNECTION::SendFragment+0×2ae
03e9f1a0 77c6977a RPCRT4!OSF_CCALL::SendNextFragment+0×1e2
03e9f1f8 77c699f2 RPCRT4!OSF_CCALL::FastSendReceive+0×148
03e9f214 77c69975 RPCRT4!OSF_CCALL::SendReceiveHelper+0×5b
03e9f244 77c7fcf0 RPCRT4!OSF_CCALL::SendReceive+0×41
03e9f250 77c80673 RPCRT4!I_RpcSendReceive+0×24
03e9f264 77ce315a RPCRT4!NdrSendReceive+0×2b
03e9f64c 7d1fa0b1 RPCRT4!NdrClientCall2+0×22e
03e9f8ac 7654fa50 ServiceC!QueryInformation+0×801
[…]
03e9f8f8 77ce33e1 RPCRT4!Invoke+0×30
03e9fcf8 77ce35c4 RPCRT4!NdrStubCall2+0×299
03e9fd14 77c7ff7a RPCRT4!NdrServerCall2+0×19
03e9fd48 77c8042d RPCRT4!DispatchToStubInCNoAvrf+0×38
03e9fd9c 77c80353 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0×11f
03e9fdc0 77c811dc RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3
03e9fdfc 77c812f0 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0×42c
03e9fe20 77c88678 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0×127
03e9ff84 77c88792 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0×430
03e9ff8c 77c8872d RPCRT4!RecvLotsaCallsWrapper+0xd
03e9ffac 77c7b110 RPCRT4!BaseCachedThreadRoutine+0×9d
03e9ffb8 77e6482f RPCRT4!ThreadStartRoutine+0×1b
03e9ffec 00000000 kernel32!BaseThreadStart+0×34
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
In one of my previous posts I told the fictitious story about the power of human mind in debugging:
Here is another software defect that I think might require a human-computer pair to fix it. Consider the following problem description: after connecting to a remote desktop and launching an office suite application a display shows some aberrations and black lines appear and disappear frequently. If this depends on a real non-virtualized hardware then I don’t see easily how to quantify screen aberrations to setup an automated debugger procedure to locate the code especially if the problem disappears on a virtual machine.
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Often we see errors in software traces recorded during deviant software behavior (often called non-working software traces) and when we double check their presence in normal expected software behavior traces (often called working traces) we find them there too. This pattern is called False Positive Error. I already mentioned similar false positives when I introduced the first software trace analysis pattern called Periodic Error. Here is an example from the real trace. In a non-working trace we found this error in an adjoint thread of a foreground component:
OpenProcess error 5
However, we found the same error in the working trace so we continued looking and found several other errors:
Message request report: last error 1168, …
[…]
GetMsg result -2146435043
The last one if converted to a hex status is 8010001D but, unfortunately, the same errors were present in the working trace too in the same activity regions.
After that we started comparing both traces looking for a bifurcation point and we found the error that was only present in a non-working trace with a significant trace differences after that:
Error reading from the named pipe: 800700E9
My favourite tool (WinDbg) to convert error and status values gave this description:
0:000> !error 800700E9
Error code: (HRESULT) 0x800700e9 (2147942633) - No process is on the other end of the pipe.
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Today we introduce an icon for Wait Chain (critical sections) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Today we introduce an icon for Wait Chain (general) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Today we introduce an icon for Manual Dump (process) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
This forthcoming full color book is the complete transcript of a Webinar organized by Memory Dump Analysis Services (www.DumpAnalysis.com).
It discusses user vs. kernel vs. physical (complete) memory space, challenges of complete memory dump analysis, common WinDbg commands, patterns and pattern-driven analysis methodology, common mistakes, fiber bundles, DumpAnalysis.org case studies and illustrates step by step a hands-on exercise in a complete memory dump analysis.
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -
Today we introduce an icon for Manual Dump (kernel) pattern:
B/W
Color
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -