When a process dies silently
There are cases when default postmortem debugger doesn’t save a dump file. This is because the default postmortem debugger is called from the crashed application thread on Windows prior to Vista and if a thread stack is exhausted or critical thread data is corrupt there is no user dump. On Vista the default postmorten debugger is called from WER (Windows Error Reporting) process WerFault.exe so there is a chance that it can save a user dump. During my experiments today on Windows 2003 (x64) I found that if we have a stack overflow inside a 64-bit process then the process silently dies. This doesn’t happen for 32-bit processes on the same server on a native 32-bit OS. Here is the added code from the modified default Win32 API project created in Visual Studio 2005:
...
volatile DWORD dwSupressOptimization;
...
void SoFunction();
...
LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
...
case WM_PAINT:
hdc = BeginPaint(hWnd, &ps);
SoFunction();
EndPaint(hWnd, &ps);
break;
...
}
...
void SoFunction()
{
if (++dwSupressOptimization)
{
SoFunction();
WndProc(0,0,0,0);
}
}
Adding WndProc call to SoFunction is done to eliminate an optimization in Release build when a recursion call is transformed into a loop:
void SoFunction()
{
if (++dwSupressOptimization)
{
SoFunction();
}
}
0:001> uf SoFunction
00401300 mov eax,1
00401305 jmp StackOverflow!SoFunction+0x10 (00401310)
00401310 add dword ptr [StackOverflow!dwSupressOptimization (00403374)],eax
00401316 mov ecx,dword ptr [StackOverflow!dwSupressOptimization (00403374)]
0040131c jne StackOverflow!SoFunction+0x10 (00401310)
0040131e ret
Therefore without WndProc added or more complicated SoFunction there is no stack overflow but a loop with 4294967295 (0xFFFFFFFF) iterations.
If we compile an x64 project with WndProc call included in SoFunction and run it we would never get a dump from any default postmortem debugger although TestDefaultDebugger64 tool crashes with a dump. I also observed a strange behavior that the application disappears only during the second window repaint although it shall crash immediately when we launch it and the main window is shown. What I have seen is when I launch the application it is running and the main window is visible. When I force it to repaint by minimizing and then maximizing, for example, only then it disappears from the screen and the process list.
If we launch 64-bit WinDbg, load and run our application we would hit the first chance exception:
0:000> g
(159c.fc4): Stack overflow - code c00000fd (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
StackOverflow!SoFunction+0x22:
00000001`40001322 e8d9ffffff call StackOverflow!SoFunction (00000001`40001300)
Stack trace looks like normal stack overflow:
0:000> k
Child-SP RetAddr Call Site
00000000`00033fe0 00000001`40001327 StackOverflow!SoFunction+0x22
00000000`00034020 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034060 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000340a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000340e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034120 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034160 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000341a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000341e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034220 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034260 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000342a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000342e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034320 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034360 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000343a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000343e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034420 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034460 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000344a0 00000001`40001327 StackOverflow!SoFunction+0x27
RSP was inside stack guard page during the CALL instruction.
0:000> r
rax=0000000000003eed rbx=00000000000f26fe rcx=0000000077c4080a
rdx=0000000000000000 rsi=000000000000000f rdi=0000000000000000
rip=0000000140001322 rsp=0000000000033fe0 rbp=00000001400035f0
r8=000000000012fb18 r9=00000001400035f0 r10=0000000000000000
r11=0000000000000246 r12=000000000012fdd8 r13=000000000012fd50
r14=00000000000f26fe r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
StackOverflow!SoFunction+0×22:
00000001`40001322 e8d9ffffff call StackOverflow!SoFunction (00000001`40001300)
0:000> uf StackOverflow!SoFunction
00000001`40001300 sub rsp,38h
00000001`40001304 mov rax,qword ptr [StackOverflow!__security_cookie (00000001`40003000)]
00000001`4000130b xor rax,rsp
00000001`4000130e mov qword ptr [rsp+20h],rax
00000001`40001313 add dword ptr [StackOverflow!dwSupressOptimization (00000001`400035e4)],1
00000001`4000131a mov eax,dword ptr [StackOverflow!dwSupressOptimization (00000001`400035e4)]
00000001`40001320 je StackOverflow!SoFunction+0×37 (00000001`40001337)
00000001`40001322 call StackOverflow!SoFunction (00000001`40001300)
00000001`40001327 xor r9d,r9d
00000001`4000132a xor r8d,r8d
00000001`4000132d xor edx,edx
00000001`4000132f xor ecx,ecx
00000001`40001331 call qword ptr [StackOverflow!_imp_DefWindowProcW (00000001`40002198)]
00000001`40001337 mov rcx,qword ptr [rsp+20h]
00000001`4000133c xor rcx,rsp
00000001`4000133f call StackOverflow!__security_check_cookie (00000001`40001360)
00000001`40001344 add rsp,38h
00000001`40001348 ret
However this guard page is not the last stack page as can be seen from TEB and the current RSP address (0×33fe0):
0:000> !teb
TEB at 000007fffffde000
ExceptionList: 0000000000000000
StackBase: 0000000000130000
StackLimit: 0000000000031000
SubSystemTib: 0000000000000000
FiberData: 0000000000001e00
ArbitraryUserPointer: 0000000000000000
Self: 000007fffffde000
EnvironmentPointer: 0000000000000000
ClientId: 000000000000159c . 0000000000000fc4
RpcHandle: 0000000000000000
Tls Storage: 0000000000000000
PEB Address: 000007fffffd5000
LastErrorValue: 0
LastStatusValue: c0000135
Count Owned Locks: 0
HardErrorMode: 0
If we continue execution and force the main application window to invalidate (repaint) itself we get another first chance exception instead of second chance:
0:000> g
(159c.fc4): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
StackOverflow!SoFunction+0x22:
00000001`40001322 call StackOverflow!SoFunction (00000001`40001300)
What we see now is that RSP is outside the valid stack region (stack limit) 0×31000:
0:000> k
Child-SP RetAddr Call Site
00000000`00030ff0 00000001`40001327 StackOverflow!SoFunction+0×22
00000000`00031030 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031070 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000310b0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000310f0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031130 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031170 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000311b0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000311f0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031230 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031270 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000312b0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000312f0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031330 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031370 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000313b0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000313f0 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031430 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`00031470 00000001`40001327 StackOverflow!SoFunction+0×27
00000000`000314b0 00000001`40001327 StackOverflow!SoFunction+0×27
0:000> r
rax=0000000000007e98 rbx=00000000000f26fe rcx=0000000077c4080a
rdx=0000000000000000 rsi=000000000000000f rdi=0000000000000000
rip=0000000140001322 rsp=0000000000030ff0 rbp=00000001400035f0
r8=000000000012faa8 r9=00000001400035f0 r10=0000000000000000
r11=0000000000000246 r12=000000000012fd68 r13=000000000012fce0
r14=00000000000f26fe r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
StackOverflow!SoFunction+0×22:
00000001`40001322 call StackOverflow!SoFunction (00000001`40001300)
Therefore we expect the second chance exception at the same address here and we get it indeed when we continue execution:
0:000> g
(159c.fc4): Access violation - code c0000005 (!!! second chance !!!)
StackOverflow!SoFunction+0x22:
00000001`40001322 call StackOverflow!SoFunction (00000001`40001300)
Now we see why the process died silently. There was no stack space left for exception dispatch handler functions and therefore for the default unhandled exception filter that launches the default postmortem debugger to save a process dump. So it looks like on x64 Windows when our process had first chance stack overflow exception there was no second chance exception afterwards and after handling first chance stack overflow exception process execution resumed and finally hit its thread stack limit. This doesn’t happen with 32-bit processes even on x64 Windows where unhandled first chance stack overflow exception results in immediate second chance stack overflow exception at the same stack address and therefore there is a sufficient room for the local variables for exception handler and filter functions.
This is an example of what happened before exception handling changes in Vista.
- Dmitry Vostokov @ DumpAnalysis.org -
October 4th, 2007 at 12:00 am
Hi, I think if application has an exception handler(SEH) and calls _resetstkoflw() when exception code == STATUS_STACK_OVERFLOW. Then it’ll never crash silently.