Crash Dump Analysis Patterns (Part 106)

Abnormal CPU consumption detection usually goes at a process level when we detect it using Task Manager, for example. Sometimes that process has only one spiking thread among many but there are cases when CPU consumption is spread among many threads. I call this pattern Distributed Spike. This behavior could be a consequence of a weak process coupling, for example, in these 2 services (where, for simplicity, I highlight in red color threads with more than 1 second CPU time spent in user mode):

0:000> !runaway
 User Mode Time
  Thread       Time
 120:4e518      0 days 0:05:09.937
 126:531bc      0 days 0:03:56.546
  44:334c      0 days 0:03:40.765
 133:4fe1c      0 days 0:03:31.156
  45:42b4      0 days 0:03:27.328
 107:25ae0      0 days 0:03:19.921
  49:627c      0 days 0:02:48.250
 147:6b90c      0 days 0:02:33.046
 136:6620c      0 days 0:02:05.109
 127:4f2d0      0 days 0:02:04.046
 129:5bc30      0 days 0:02:02.171
  48:623c      0 days 0:02:01.796
 119:41f00      0 days 0:02:00.562
  74:cd18      0 days 0:01:59.453
  51:7a4c      0 days 0:01:54.234
  35:21d4      0 days 0:01:47.390
 148:326dc      0 days 0:01:32.640
 123:43c8c      0 days 0:01:32.515
 135:67b08      0 days 0:01:32.296
  11:aa8       0 days 0:01:30.906
 118:42f8c      0 days 0:01:20.265
  42:3a3c      0 days 0:01:20.000
  77:d024      0 days 0:01:19.734
 115:3a840      0 days 0:01:15.625
  89:145f4      0 days 0:01:10.500
 157:4e310      0 days 0:01:07.625
  80:d07c      0 days 0:01:07.468
  33:1ab0      0 days 0:01:00.593
 117:10bd4      0 days 0:00:59.421
 151:1aaa0      0 days 0:00:59.015
  28:17bc      0 days 0:00:58.796
  83:f3a4      0 days 0:00:55.828
 122:41964      0 days 0:00:55.578
 149:4101c      0 days 0:00:55.234
  10:aa4       0 days 0:00:52.453
 106:21b80      0 days 0:00:51.187
 132:62e5c      0 days 0:00:49.437
 160:3a3a8      0 days 0:00:48.875
 137:6bf90      0 days 0:00:48.687
 145:6f594      0 days 0:00:47.968
 143:58d60      0 days 0:00:45.703
  72:ba64      0 days 0:00:44.515
  41:19b0      0 days 0:00:44.000
 130:5d480      0 days 0:00:43.750
 139:6d090      0 days 0:00:42.062
 138:6d578      0 days 0:00:40.406
  91:17974      0 days 0:00:40.359
 152:37f80      0 days 0:00:39.781
  81:de68      0 days 0:00:39.265
 150:65b2c      0 days 0:00:36.625
 162:1f340      0 days 0:00:35.125
  85:10650      0 days 0:00:33.546
 131:614e8      0 days 0:00:33.093
 128:2eddc      0 days 0:00:33.000
 146:6f690      0 days 0:00:32.015
 161:3c4b4      0 days 0:00:30.421
 167:3cde4      0 days 0:00:29.390
 171:3979c      0 days 0:00:28.515
 166:3cd40      0 days 0:00:28.312
 168:68ef0      0 days 0:00:27.781
  65:aad0      0 days 0:00:26.593
 109:267f4      0 days 0:00:26.390
  88:13624      0 days 0:00:26.000
 173:5282c      0 days 0:00:24.640
 153:71e14      0 days 0:00:23.390
 112:322b4      0 days 0:00:22.812
 110:9578      0 days 0:00:22.125
 175:20230      0 days 0:00:20.250
  79:b458      0 days 0:00:20.218
  66:61b8      0 days 0:00:19.875
  62:9498      0 days 0:00:19.562
 156:d900      0 days 0:00:19.015
 121:5106c      0 days 0:00:18.687
 142:6bb28      0 days 0:00:18.562
  46:2cbc      0 days 0:00:17.796
 169:d920      0 days 0:00:16.875
 154:720b4      0 days 0:00:16.484
 170:4ac8c      0 days 0:00:15.968
  73:b010      0 days 0:00:13.609
  39:3224      0 days 0:00:13.406
 172:722e4      0 days 0:00:12.375
  63:9780      0 days 0:00:12.203
 177:8464      0 days 0:00:11.906
 184:22908      0 days 0:00:10.234
 140:5765c      0 days 0:00:09.750
 174:2f484      0 days 0:00:08.390
  50:7230      0 days 0:00:07.125
 187:3c324      0 days 0:00:06.765
 125:46cf0      0 days 0:00:06.296
 178:3a424      0 days 0:00:05.125
 114:33d20      0 days 0:00:03.734
 165:3ca74      0 days 0:00:01.203
 189:3c358      0 days 0:00:01.000

 164:3124c      0 days 0:00:00.578
  25:be4       0 days 0:00:00.515
  17:ba8       0 days 0:00:00.125
 104:5cf8      0 days 0:00:00.109
  26:e4c       0 days 0:00:00.109
  96:5d44      0 days 0:00:00.093
  99:5b18      0 days 0:00:00.078
  56:8a6c      0 days 0:00:00.078
  55:8a68      0 days 0:00:00.078
   6:a08       0 days 0:00:00.078
   4:a00       0 days 0:00:00.062
 103:5cfc      0 days 0:00:00.046
 100:5ab8      0 days 0:00:00.046
  68:bf34      0 days 0:00:00.046
  37:29d4      0 days 0:00:00.046
 101:5ab4      0 days 0:00:00.031
  98:5b44      0 days 0:00:00.031
  97:5d40      0 days 0:00:00.031
  57:8a70      0 days 0:00:00.031
  53:8a60      0 days 0:00:00.031
  36:29c0      0 days 0:00:00.031
  16:ac4       0 days 0:00:00.031
   1:9e4       0 days 0:00:00.031
  60:880c      0 days 0:00:00.015
  58:8a5c      0 days 0:00:00.015
  24:be0       0 days 0:00:00.015
  15:abc       0 days 0:00:00.015
 188:13044      0 days 0:00:00.000
 186:6530      0 days 0:00:00.000
 185:2013c      0 days 0:00:00.000
 183:6047c      0 days 0:00:00.000
 182:65400      0 days 0:00:00.000
 181:61560      0 days 0:00:00.000
 180:2b7a4      0 days 0:00:00.000
 179:56294      0 days 0:00:00.000
 176:20300      0 days 0:00:00.000
 163:2ab1c      0 days 0:00:00.000
 159:276cc      0 days 0:00:00.000
 158:72134      0 days 0:00:00.000
 155:6a078      0 days 0:00:00.000
 144:6ce98      0 days 0:00:00.000
 141:5404      0 days 0:00:00.000
 134:65718      0 days 0:00:00.000
 124:4bed4      0 days 0:00:00.000
 116:3c770      0 days 0:00:00.000
 113:b08       0 days 0:00:00.000
 111:28e54      0 days 0:00:00.000
 108:25fbc      0 days 0:00:00.000
 105:20504      0 days 0:00:00.000
 102:5cf4      0 days 0:00:00.000
  95:5c70      0 days 0:00:00.000
  94:5ed4      0 days 0:00:00.000
  93:18c2c      0 days 0:00:00.000
  92:19fd8      0 days 0:00:00.000
  90:c870      0 days 0:00:00.000
  87:7994      0 days 0:00:00.000
  86:124cc      0 days 0:00:00.000
  84:eab8      0 days 0:00:00.000
  82:f2a4      0 days 0:00:00.000
  78:d5c0      0 days 0:00:00.000
  76:cfd0      0 days 0:00:00.000
  75:cf64      0 days 0:00:00.000
  71:b4f8      0 days 0:00:00.000
  70:c628      0 days 0:00:00.000
  69:c484      0 days 0:00:00.000
  67:be84      0 days 0:00:00.000
  64:aa00      0 days 0:00:00.000
  61:93f0      0 days 0:00:00.000
  59:89e4      0 days 0:00:00.000
  54:8a64      0 days 0:00:00.000
  52:89a8      0 days 0:00:00.000
  47:4c64      0 days 0:00:00.000
  43:3fa0      0 days 0:00:00.000
  40:2c88      0 days 0:00:00.000
  38:2a28      0 days 0:00:00.000
  34:1928      0 days 0:00:00.000
  32:1668      0 days 0:00:00.000
  31:8dc       0 days 0:00:00.000
  30:15d4      0 days 0:00:00.000
  29:1044      0 days 0:00:00.000
  27:fb4       0 days 0:00:00.000
  23:bd8       0 days 0:00:00.000
  22:bd4       0 days 0:00:00.000
  21:bd0       0 days 0:00:00.000
  20:bc8       0 days 0:00:00.000
  19:bc4       0 days 0:00:00.000
  18:bc0       0 days 0:00:00.000
  14:ab8       0 days 0:00:00.000
  13:ab4       0 days 0:00:00.000
  12:ab0       0 days 0:00:00.000
   9:aa0       0 days 0:00:00.000
   8:a9c       0 days 0:00:00.000
   7:a98       0 days 0:00:00.000
   5:a04       0 days 0:00:00.000
   3:9f4       0 days 0:00:00.000
   2:9f0       0 days 0:00:00.000
   0:994       0 days 0:00:00.000

This is a real spike in the first service process as can be confirmed by a random non-waiting thread:

0:000> ~143k
ChildEBP RetAddr 
050dfc68 7c82d6a4 ntdll!RtlEnterCriticalSection+0x1d
050dfc84 77c7bc50 ntdll!RtlInitializeCriticalSectionAndSpinCount+0x92
050dfc98 77c7bc7c rpcrt4!MUTEX::CommonConstructor+0x1b
050dfcac 77c7c000 rpcrt4!MUTEX::MUTEX+0x13
050dfcc8 77c6ff47 rpcrt4!BINDING_HANDLE::BINDING_HANDLE+0x2d
050dfcd8 77c6ff1f rpcrt4!SVR_BINDING_HANDLE::SVR_BINDING_HANDLE+0x10
050dfcfc 77c6d338 rpcrt4!RPC_ADDRESS::InquireBinding+0x8a
050dfd0c 77c6fd1d rpcrt4!LRPC_SCALL::ToStringBinding+0x16
050dfd1c 76554c83 rpcrt4!RpcBindingToStringBindingW+0x4d
050dfd5c 77c7c42a ServiceA!RpcSecurityCallback+0x1e
050dfdb4 77c7c4b0 rpcrt4!RPC_INTERFACE::CheckSecurityIfNecessary+0x6f
050dfdcc 77c7c46c rpcrt4!LRPC_SBINDING::CheckSecurity+0x4f
050dfdfc 77c812f0 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0x2bb
050dfe20 77c88678 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0x127
050dff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430
050dff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd
050dffac 77c7b110 rpcrt4!BaseCachedThreadRoutine+0x9d
050dffb8 77e64829 rpcrt4!ThreadStartRoutine+0x1b
050dffec 00000000 kernel32!BaseThreadStart+0x34

0:000> ~143r
eax=00000000 ebx=00000000 ecx=7c887784 edx=7c887780 esi=7c887784 edi=00163fb0
eip=7c81a37d esp=050dfc5c ebp=050dfc68 iopl=0 nv up ei ng nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000287
ntdll!RtlEnterCriticalSection+0x1d:
7c81a37d 0f92c0          setb    al

0:000> .asm no_code_bytes
Assembly options: no_code_bytes

0:000> u 7c81a37d
ntdll!RtlEnterCriticalSection+0x1d:
7c81a37d setb    al
7c81a380 test    al,al
7c81a382 je      ntdll!RtlEnterCriticalSection+0x28 (7c82b096)
7c81a388 mov     ecx,dword ptr fs:[18h]
7c81a38f mov     eax,dword ptr [ecx+24h]
7c81a392 pop     edi
7c81a393 mov     dword ptr [edx+0Ch],eax
7c81a396 mov     dword ptr [edx+8],1

0:000> ub 7c81a37d
ntdll!RtlEnterCriticalSection+0x6:
7c81a366 mov     edx,dword ptr [ebp+8]
7c81a369 push    esi
7c81a36a lea     esi,[edx+4]
7c81a36d push    edi
7c81a36e mov     dword ptr [ebp-4],esi
7c81a371 mov     eax,0
7c81a376 mov     ecx,dword ptr [ebp-4]
7c81a379 lock btr dword ptr [ecx],eax

The second service is weakly (waiting for event notifications) coupled to the first service above:

0:000> !runaway
 User Mode Time
  Thread       Time
   5:dbec      0 days 0:01:50.031
   8:46008      0 days 0:01:46.062
  11:ad0c      0 days 0:01:13.921
  17:932c      0 days 0:01:03.234
  14:45d78      0 days 0:00:58.109

  15:6d4d0      0 days 0:00:00.015
   2:725a4      0 days 0:00:00.015
   0:6101c      0 days 0:00:00.015
  18:d1c4      0 days 0:00:00.000
  16:76bc      0 days 0:00:00.000
  13:456a8      0 days 0:00:00.000
  12:459e4      0 days 0:00:00.000
  10:3c768      0 days 0:00:00.000
   9:12d20      0 days 0:00:00.000
   7:46010      0 days 0:00:00.000
   6:4600c      0 days 0:00:00.000
   4:dbf0      0 days 0:00:00.000
   3:17ed4      0 days 0:00:00.000
   1:61024      0 days 0:00:00.000

0:000> ~11k
ChildEBP RetAddr 
0223fa68 7c82787b ntdll!KiFastSystemCallRet
0223fa6c 77c80a6e ntdll!NtRequestWaitReplyPort+0xc
0223fab8 77c7fcf0 rpcrt4!LRPC_CCALL::SendReceive+0x230
0223fac4 77c80673 rpcrt4!I_RpcSendReceive+0x24
0223fad8 77ce315a rpcrt4!NdrSendReceive+0x2b
0223fec0 771f4fbd rpcrt4!NdrClientCall2+0x22e
0223fed8 771f4f60 ServiceB!RpcWaitEvent+0x1c
[...]

0:000> ~17k
ChildEBP RetAddr 
0283fa68 7c82787b ntdll!KiFastSystemCallRet
0283fa6c 77c80a6e ntdll!NtRequestWaitReplyPort+0xc
0283fab8 77c7fcf0 rpcrt4!LRPC_CCALL::SendReceive+0x230
0283fac4 77c80673 rpcrt4!I_RpcSendReceive+0x24
0283fad8 77ce315a rpcrt4!NdrSendReceive+0x2b
0283fec0 771f4fbd rpcrt4!NdrClientCall2+0x22e
0283fed8 771f4f60 ServiceB!RpcWaitEvent+0x1c
[...]

Sometimes, semantic process coupling also results in distributed spikes and most often it is possible to predict another spiking process in such cases. In our example above both spiking processes were semantically coupled with another service and it was confirmed that it was spiking too:

0:000> !runaway
 User Mode Time
  Thread       Time
  89:10d4      0 days 0:03:03.500
  28:a94       0 days 0:00:39.562
  73:c10       0 days 0:00:37.531
  54:b88       0 days 0:00:37.140
  29:a98       0 days 0:00:35.906
  27:a90       0 days 0:00:35.500
  75:c2c       0 days 0:00:28.812
  90:10d8      0 days 0:00:27.000
  93:10e4      0 days 0:00:24.265
  32:aa4       0 days 0:00:12.906
  41:ac8       0 days 0:00:11.890
  35:ab0       0 days 0:00:11.875
  58:bc4       0 days 0:00:10.218
  42:acc       0 days 0:00:09.546
  85:e74       0 days 0:00:08.859
  36:ab4       0 days 0:00:08.578
  72:c0c       0 days 0:00:05.890
  70:c04       0 days 0:00:05.687
  33:aa8       0 days 0:00:05.046
  74:c14       0 days 0:00:04.953
  40:ac4       0 days 0:00:04.953
  38:abc       0 days 0:00:04.359
  39:ac0       0 days 0:00:04.312
  34:aac       0 days 0:00:04.140
  64:bec       0 days 0:00:03.812
  88:10d0      0 days 0:00:03.187
  30:a9c       0 days 0:00:02.859
   9:a10       0 days 0:00:01.968
  37:ab8       0 days 0:00:01.953
  92:10e0      0 days 0:00:01.718
  83:d00       0 days 0:00:01.125
  94:1150      0 days 0:00:01.031

  77:c54       0 days 0:00:00.890
  98:f2c0      0 days 0:00:00.265
  97:eb1c      0 days 0:00:00.265
  76:c50       0 days 0:00:00.265
  21:a48       0 days 0:00:00.187
  22:a4c       0 days 0:00:00.140
  63:be8       0 days 0:00:00.093
  23:a50       0 days 0:00:00.093
  53:af8       0 days 0:00:00.078
  24:a54       0 days 0:00:00.046
  71:c08       0 days 0:00:00.031
  65:bf0       0 days 0:00:00.031
  87:e8c       0 days 0:00:00.015
  57:bc0       0 days 0:00:00.015
 104:6454c      0 days 0:00:00.000
 103:63fb4      0 days 0:00:00.000
 102:3c5ec      0 days 0:00:00.000
 101:65178      0 days 0:00:00.000
 100:5d0e4      0 days 0:00:00.000
  99:5bae4      0 days 0:00:00.000
  96:574       0 days 0:00:00.000
  95:b84       0 days 0:00:00.000
  91:10dc      0 days 0:00:00.000
  86:e88       0 days 0:00:00.000
  84:e70       0 days 0:00:00.000
  82:c84       0 days 0:00:00.000
  81:c68       0 days 0:00:00.000
  80:c64       0 days 0:00:00.000
  79:c60       0 days 0:00:00.000
  78:c5c       0 days 0:00:00.000
  69:c00       0 days 0:00:00.000
  68:bfc       0 days 0:00:00.000
  67:bf8       0 days 0:00:00.000
  66:bf4       0 days 0:00:00.000
  62:bd8       0 days 0:00:00.000
  61:bd4       0 days 0:00:00.000
  60:bd0       0 days 0:00:00.000
  59:bcc       0 days 0:00:00.000
  56:bbc       0 days 0:00:00.000
  55:bb8       0 days 0:00:00.000
  52:af4       0 days 0:00:00.000
  51:af0       0 days 0:00:00.000
  50:aec       0 days 0:00:00.000
  49:ae8       0 days 0:00:00.000
  48:ae4       0 days 0:00:00.000
  47:ae0       0 days 0:00:00.000
  46:adc       0 days 0:00:00.000
  45:ad8       0 days 0:00:00.000
  44:ad4       0 days 0:00:00.000
  43:ad0       0 days 0:00:00.000
  31:aa0       0 days 0:00:00.000
  26:a8c       0 days 0:00:00.000
  25:a64       0 days 0:00:00.000
  20:a44       0 days 0:00:00.000
  19:a40       0 days 0:00:00.000
  18:a34       0 days 0:00:00.000
  17:a30       0 days 0:00:00.000
  16:a2c       0 days 0:00:00.000
  15:a28       0 days 0:00:00.000
  14:a24       0 days 0:00:00.000
  13:a20       0 days 0:00:00.000
  12:a1c       0 days 0:00:00.000
  11:a18       0 days 0:00:00.000
  10:a14       0 days 0:00:00.000
   8:a0c       0 days 0:00:00.000
   7:a08       0 days 0:00:00.000
   6:a04       0 days 0:00:00.000
   5:a00       0 days 0:00:00.000
   4:9fc       0 days 0:00:00.000
   3:9f8       0 days 0:00:00.000
   2:9f4       0 days 0:00:00.000
   1:9f0       0 days 0:00:00.000
   0:9e4       0 days 0:00:00.000
 

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

One Response to “Crash Dump Analysis Patterns (Part 106)”

  1. Dmitry Vostokov Says:

    If timing information is not available but random process memory snapshots or snapshots by procdump tool reveal different spiking threads then we might also have this pattern.

Leave a Reply