Crash Dump Analysis Patterns (Part 106)
Sunday, September 19th, 2010Abnormal CPU consumption detection usually goes at a process level when we detect it using Task Manager, for example. Sometimes that process has only one spiking thread among many but there are cases when CPU consumption is spread among many threads. I call this pattern Distributed Spike. This behavior could be a consequence of a weak process coupling, for example, in these 2 services (where, for simplicity, I highlight in red color threads with more than 1 second CPU time spent in user mode):
0:000> !runaway
User Mode Time
Thread Time
120:4e518 0 days 0:05:09.937
126:531bc 0 days 0:03:56.546
44:334c 0 days 0:03:40.765
133:4fe1c 0 days 0:03:31.156
45:42b4 0 days 0:03:27.328
107:25ae0 0 days 0:03:19.921
49:627c 0 days 0:02:48.250
147:6b90c 0 days 0:02:33.046
136:6620c 0 days 0:02:05.109
127:4f2d0 0 days 0:02:04.046
129:5bc30 0 days 0:02:02.171
48:623c 0 days 0:02:01.796
119:41f00 0 days 0:02:00.562
74:cd18 0 days 0:01:59.453
51:7a4c 0 days 0:01:54.234
35:21d4 0 days 0:01:47.390
148:326dc 0 days 0:01:32.640
123:43c8c 0 days 0:01:32.515
135:67b08 0 days 0:01:32.296
11:aa8 0 days 0:01:30.906
118:42f8c 0 days 0:01:20.265
42:3a3c 0 days 0:01:20.000
77:d024 0 days 0:01:19.734
115:3a840 0 days 0:01:15.625
89:145f4 0 days 0:01:10.500
157:4e310 0 days 0:01:07.625
80:d07c 0 days 0:01:07.468
33:1ab0 0 days 0:01:00.593
117:10bd4 0 days 0:00:59.421
151:1aaa0 0 days 0:00:59.015
28:17bc 0 days 0:00:58.796
83:f3a4 0 days 0:00:55.828
122:41964 0 days 0:00:55.578
149:4101c 0 days 0:00:55.234
10:aa4 0 days 0:00:52.453
106:21b80 0 days 0:00:51.187
132:62e5c 0 days 0:00:49.437
160:3a3a8 0 days 0:00:48.875
137:6bf90 0 days 0:00:48.687
145:6f594 0 days 0:00:47.968
143:58d60 0 days 0:00:45.703
72:ba64 0 days 0:00:44.515
41:19b0 0 days 0:00:44.000
130:5d480 0 days 0:00:43.750
139:6d090 0 days 0:00:42.062
138:6d578 0 days 0:00:40.406
91:17974 0 days 0:00:40.359
152:37f80 0 days 0:00:39.781
81:de68 0 days 0:00:39.265
150:65b2c 0 days 0:00:36.625
162:1f340 0 days 0:00:35.125
85:10650 0 days 0:00:33.546
131:614e8 0 days 0:00:33.093
128:2eddc 0 days 0:00:33.000
146:6f690 0 days 0:00:32.015
161:3c4b4 0 days 0:00:30.421
167:3cde4 0 days 0:00:29.390
171:3979c 0 days 0:00:28.515
166:3cd40 0 days 0:00:28.312
168:68ef0 0 days 0:00:27.781
65:aad0 0 days 0:00:26.593
109:267f4 0 days 0:00:26.390
88:13624 0 days 0:00:26.000
173:5282c 0 days 0:00:24.640
153:71e14 0 days 0:00:23.390
112:322b4 0 days 0:00:22.812
110:9578 0 days 0:00:22.125
175:20230 0 days 0:00:20.250
79:b458 0 days 0:00:20.218
66:61b8 0 days 0:00:19.875
62:9498 0 days 0:00:19.562
156:d900 0 days 0:00:19.015
121:5106c 0 days 0:00:18.687
142:6bb28 0 days 0:00:18.562
46:2cbc 0 days 0:00:17.796
169:d920 0 days 0:00:16.875
154:720b4 0 days 0:00:16.484
170:4ac8c 0 days 0:00:15.968
73:b010 0 days 0:00:13.609
39:3224 0 days 0:00:13.406
172:722e4 0 days 0:00:12.375
63:9780 0 days 0:00:12.203
177:8464 0 days 0:00:11.906
184:22908 0 days 0:00:10.234
140:5765c 0 days 0:00:09.750
174:2f484 0 days 0:00:08.390
50:7230 0 days 0:00:07.125
187:3c324 0 days 0:00:06.765
125:46cf0 0 days 0:00:06.296
178:3a424 0 days 0:00:05.125
114:33d20 0 days 0:00:03.734
165:3ca74 0 days 0:00:01.203
189:3c358 0 days 0:00:01.000
164:3124c 0 days 0:00:00.578
25:be4 0 days 0:00:00.515
17:ba8 0 days 0:00:00.125
104:5cf8 0 days 0:00:00.109
26:e4c 0 days 0:00:00.109
96:5d44 0 days 0:00:00.093
99:5b18 0 days 0:00:00.078
56:8a6c 0 days 0:00:00.078
55:8a68 0 days 0:00:00.078
6:a08 0 days 0:00:00.078
4:a00 0 days 0:00:00.062
103:5cfc 0 days 0:00:00.046
100:5ab8 0 days 0:00:00.046
68:bf34 0 days 0:00:00.046
37:29d4 0 days 0:00:00.046
101:5ab4 0 days 0:00:00.031
98:5b44 0 days 0:00:00.031
97:5d40 0 days 0:00:00.031
57:8a70 0 days 0:00:00.031
53:8a60 0 days 0:00:00.031
36:29c0 0 days 0:00:00.031
16:ac4 0 days 0:00:00.031
1:9e4 0 days 0:00:00.031
60:880c 0 days 0:00:00.015
58:8a5c 0 days 0:00:00.015
24:be0 0 days 0:00:00.015
15:abc 0 days 0:00:00.015
188:13044 0 days 0:00:00.000
186:6530 0 days 0:00:00.000
185:2013c 0 days 0:00:00.000
183:6047c 0 days 0:00:00.000
182:65400 0 days 0:00:00.000
181:61560 0 days 0:00:00.000
180:2b7a4 0 days 0:00:00.000
179:56294 0 days 0:00:00.000
176:20300 0 days 0:00:00.000
163:2ab1c 0 days 0:00:00.000
159:276cc 0 days 0:00:00.000
158:72134 0 days 0:00:00.000
155:6a078 0 days 0:00:00.000
144:6ce98 0 days 0:00:00.000
141:5404 0 days 0:00:00.000
134:65718 0 days 0:00:00.000
124:4bed4 0 days 0:00:00.000
116:3c770 0 days 0:00:00.000
113:b08 0 days 0:00:00.000
111:28e54 0 days 0:00:00.000
108:25fbc 0 days 0:00:00.000
105:20504 0 days 0:00:00.000
102:5cf4 0 days 0:00:00.000
95:5c70 0 days 0:00:00.000
94:5ed4 0 days 0:00:00.000
93:18c2c 0 days 0:00:00.000
92:19fd8 0 days 0:00:00.000
90:c870 0 days 0:00:00.000
87:7994 0 days 0:00:00.000
86:124cc 0 days 0:00:00.000
84:eab8 0 days 0:00:00.000
82:f2a4 0 days 0:00:00.000
78:d5c0 0 days 0:00:00.000
76:cfd0 0 days 0:00:00.000
75:cf64 0 days 0:00:00.000
71:b4f8 0 days 0:00:00.000
70:c628 0 days 0:00:00.000
69:c484 0 days 0:00:00.000
67:be84 0 days 0:00:00.000
64:aa00 0 days 0:00:00.000
61:93f0 0 days 0:00:00.000
59:89e4 0 days 0:00:00.000
54:8a64 0 days 0:00:00.000
52:89a8 0 days 0:00:00.000
47:4c64 0 days 0:00:00.000
43:3fa0 0 days 0:00:00.000
40:2c88 0 days 0:00:00.000
38:2a28 0 days 0:00:00.000
34:1928 0 days 0:00:00.000
32:1668 0 days 0:00:00.000
31:8dc 0 days 0:00:00.000
30:15d4 0 days 0:00:00.000
29:1044 0 days 0:00:00.000
27:fb4 0 days 0:00:00.000
23:bd8 0 days 0:00:00.000
22:bd4 0 days 0:00:00.000
21:bd0 0 days 0:00:00.000
20:bc8 0 days 0:00:00.000
19:bc4 0 days 0:00:00.000
18:bc0 0 days 0:00:00.000
14:ab8 0 days 0:00:00.000
13:ab4 0 days 0:00:00.000
12:ab0 0 days 0:00:00.000
9:aa0 0 days 0:00:00.000
8:a9c 0 days 0:00:00.000
7:a98 0 days 0:00:00.000
5:a04 0 days 0:00:00.000
3:9f4 0 days 0:00:00.000
2:9f0 0 days 0:00:00.000
0:994 0 days 0:00:00.000
This is a real spike in the first service process as can be confirmed by a random non-waiting thread:
0:000> ~143k
ChildEBP RetAddr
050dfc68 7c82d6a4 ntdll!RtlEnterCriticalSection+0x1d
050dfc84 77c7bc50 ntdll!RtlInitializeCriticalSectionAndSpinCount+0x92
050dfc98 77c7bc7c rpcrt4!MUTEX::CommonConstructor+0x1b
050dfcac 77c7c000 rpcrt4!MUTEX::MUTEX+0x13
050dfcc8 77c6ff47 rpcrt4!BINDING_HANDLE::BINDING_HANDLE+0x2d
050dfcd8 77c6ff1f rpcrt4!SVR_BINDING_HANDLE::SVR_BINDING_HANDLE+0x10
050dfcfc 77c6d338 rpcrt4!RPC_ADDRESS::InquireBinding+0x8a
050dfd0c 77c6fd1d rpcrt4!LRPC_SCALL::ToStringBinding+0x16
050dfd1c 76554c83 rpcrt4!RpcBindingToStringBindingW+0x4d
050dfd5c 77c7c42a ServiceA!RpcSecurityCallback+0x1e
050dfdb4 77c7c4b0 rpcrt4!RPC_INTERFACE::CheckSecurityIfNecessary+0x6f
050dfdcc 77c7c46c rpcrt4!LRPC_SBINDING::CheckSecurity+0x4f
050dfdfc 77c812f0 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0x2bb
050dfe20 77c88678 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0x127
050dff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430
050dff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd
050dffac 77c7b110 rpcrt4!BaseCachedThreadRoutine+0x9d
050dffb8 77e64829 rpcrt4!ThreadStartRoutine+0x1b
050dffec 00000000 kernel32!BaseThreadStart+0x34
0:000> ~143r
eax=00000000 ebx=00000000 ecx=7c887784 edx=7c887780 esi=7c887784 edi=00163fb0
eip=7c81a37d esp=050dfc5c ebp=050dfc68 iopl=0 nv up ei ng nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000287
ntdll!RtlEnterCriticalSection+0x1d:
7c81a37d 0f92c0 setb al
0:000> .asm no_code_bytes
Assembly options: no_code_bytes
0:000> u 7c81a37d
ntdll!RtlEnterCriticalSection+0x1d:
7c81a37d setb al
7c81a380 test al,al
7c81a382 je ntdll!RtlEnterCriticalSection+0x28 (7c82b096)
7c81a388 mov ecx,dword ptr fs:[18h]
7c81a38f mov eax,dword ptr [ecx+24h]
7c81a392 pop edi
7c81a393 mov dword ptr [edx+0Ch],eax
7c81a396 mov dword ptr [edx+8],1
0:000> ub 7c81a37d
ntdll!RtlEnterCriticalSection+0x6:
7c81a366 mov edx,dword ptr [ebp+8]
7c81a369 push esi
7c81a36a lea esi,[edx+4]
7c81a36d push edi
7c81a36e mov dword ptr [ebp-4],esi
7c81a371 mov eax,0
7c81a376 mov ecx,dword ptr [ebp-4]
7c81a379 lock btr dword ptr [ecx],eax
The second service is weakly (waiting for event notifications) coupled to the first service above:
0:000> !runaway
User Mode Time
Thread Time
5:dbec 0 days 0:01:50.031
8:46008 0 days 0:01:46.062
11:ad0c 0 days 0:01:13.921
17:932c 0 days 0:01:03.234
14:45d78 0 days 0:00:58.109
15:6d4d0 0 days 0:00:00.015
2:725a4 0 days 0:00:00.015
0:6101c 0 days 0:00:00.015
18:d1c4 0 days 0:00:00.000
16:76bc 0 days 0:00:00.000
13:456a8 0 days 0:00:00.000
12:459e4 0 days 0:00:00.000
10:3c768 0 days 0:00:00.000
9:12d20 0 days 0:00:00.000
7:46010 0 days 0:00:00.000
6:4600c 0 days 0:00:00.000
4:dbf0 0 days 0:00:00.000
3:17ed4 0 days 0:00:00.000
1:61024 0 days 0:00:00.000
0:000> ~11k
ChildEBP RetAddr
0223fa68 7c82787b ntdll!KiFastSystemCallRet
0223fa6c 77c80a6e ntdll!NtRequestWaitReplyPort+0xc
0223fab8 77c7fcf0 rpcrt4!LRPC_CCALL::SendReceive+0x230
0223fac4 77c80673 rpcrt4!I_RpcSendReceive+0x24
0223fad8 77ce315a rpcrt4!NdrSendReceive+0x2b
0223fec0 771f4fbd rpcrt4!NdrClientCall2+0x22e
0223fed8 771f4f60 ServiceB!RpcWaitEvent+0x1c
[...]
0:000> ~17k
ChildEBP RetAddr
0283fa68 7c82787b ntdll!KiFastSystemCallRet
0283fa6c 77c80a6e ntdll!NtRequestWaitReplyPort+0xc
0283fab8 77c7fcf0 rpcrt4!LRPC_CCALL::SendReceive+0x230
0283fac4 77c80673 rpcrt4!I_RpcSendReceive+0x24
0283fad8 77ce315a rpcrt4!NdrSendReceive+0x2b
0283fec0 771f4fbd rpcrt4!NdrClientCall2+0x22e
0283fed8 771f4f60 ServiceB!RpcWaitEvent+0x1c
[...]
Sometimes, semantic process coupling also results in distributed spikes and most often it is possible to predict another spiking process in such cases. In our example above both spiking processes were semantically coupled with another service and it was confirmed that it was spiking too:
0:000> !runaway
User Mode Time
Thread Time
89:10d4 0 days 0:03:03.500
28:a94 0 days 0:00:39.562
73:c10 0 days 0:00:37.531
54:b88 0 days 0:00:37.140
29:a98 0 days 0:00:35.906
27:a90 0 days 0:00:35.500
75:c2c 0 days 0:00:28.812
90:10d8 0 days 0:00:27.000
93:10e4 0 days 0:00:24.265
32:aa4 0 days 0:00:12.906
41:ac8 0 days 0:00:11.890
35:ab0 0 days 0:00:11.875
58:bc4 0 days 0:00:10.218
42:acc 0 days 0:00:09.546
85:e74 0 days 0:00:08.859
36:ab4 0 days 0:00:08.578
72:c0c 0 days 0:00:05.890
70:c04 0 days 0:00:05.687
33:aa8 0 days 0:00:05.046
74:c14 0 days 0:00:04.953
40:ac4 0 days 0:00:04.953
38:abc 0 days 0:00:04.359
39:ac0 0 days 0:00:04.312
34:aac 0 days 0:00:04.140
64:bec 0 days 0:00:03.812
88:10d0 0 days 0:00:03.187
30:a9c 0 days 0:00:02.859
9:a10 0 days 0:00:01.968
37:ab8 0 days 0:00:01.953
92:10e0 0 days 0:00:01.718
83:d00 0 days 0:00:01.125
94:1150 0 days 0:00:01.031
77:c54 0 days 0:00:00.890
98:f2c0 0 days 0:00:00.265
97:eb1c 0 days 0:00:00.265
76:c50 0 days 0:00:00.265
21:a48 0 days 0:00:00.187
22:a4c 0 days 0:00:00.140
63:be8 0 days 0:00:00.093
23:a50 0 days 0:00:00.093
53:af8 0 days 0:00:00.078
24:a54 0 days 0:00:00.046
71:c08 0 days 0:00:00.031
65:bf0 0 days 0:00:00.031
87:e8c 0 days 0:00:00.015
57:bc0 0 days 0:00:00.015
104:6454c 0 days 0:00:00.000
103:63fb4 0 days 0:00:00.000
102:3c5ec 0 days 0:00:00.000
101:65178 0 days 0:00:00.000
100:5d0e4 0 days 0:00:00.000
99:5bae4 0 days 0:00:00.000
96:574 0 days 0:00:00.000
95:b84 0 days 0:00:00.000
91:10dc 0 days 0:00:00.000
86:e88 0 days 0:00:00.000
84:e70 0 days 0:00:00.000
82:c84 0 days 0:00:00.000
81:c68 0 days 0:00:00.000
80:c64 0 days 0:00:00.000
79:c60 0 days 0:00:00.000
78:c5c 0 days 0:00:00.000
69:c00 0 days 0:00:00.000
68:bfc 0 days 0:00:00.000
67:bf8 0 days 0:00:00.000
66:bf4 0 days 0:00:00.000
62:bd8 0 days 0:00:00.000
61:bd4 0 days 0:00:00.000
60:bd0 0 days 0:00:00.000
59:bcc 0 days 0:00:00.000
56:bbc 0 days 0:00:00.000
55:bb8 0 days 0:00:00.000
52:af4 0 days 0:00:00.000
51:af0 0 days 0:00:00.000
50:aec 0 days 0:00:00.000
49:ae8 0 days 0:00:00.000
48:ae4 0 days 0:00:00.000
47:ae0 0 days 0:00:00.000
46:adc 0 days 0:00:00.000
45:ad8 0 days 0:00:00.000
44:ad4 0 days 0:00:00.000
43:ad0 0 days 0:00:00.000
31:aa0 0 days 0:00:00.000
26:a8c 0 days 0:00:00.000
25:a64 0 days 0:00:00.000
20:a44 0 days 0:00:00.000
19:a40 0 days 0:00:00.000
18:a34 0 days 0:00:00.000
17:a30 0 days 0:00:00.000
16:a2c 0 days 0:00:00.000
15:a28 0 days 0:00:00.000
14:a24 0 days 0:00:00.000
13:a20 0 days 0:00:00.000
12:a1c 0 days 0:00:00.000
11:a18 0 days 0:00:00.000
10:a14 0 days 0:00:00.000
8:a0c 0 days 0:00:00.000
7:a08 0 days 0:00:00.000
6:a04 0 days 0:00:00.000
5:a00 0 days 0:00:00.000
4:9fc 0 days 0:00:00.000
3:9f8 0 days 0:00:00.000
2:9f4 0 days 0:00:00.000
1:9f0 0 days 0:00:00.000
0:9e4 0 days 0:00:00.000
- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -