When a process dies silently

There are cases when the default postmortem debugger doesn’t save a dump file. On Windows versions prior to Vista the default postmortem debugger is launched from the crashed application thread, so if the thread stack is exhausted or critical thread data is corrupt, no user dump is saved. On Vista the default postmortem debugger is called from the WER (Windows Error Reporting) process WerFault.exe, so there is a chance that it can save a user dump. During my experiments today on Windows 2003 (x64) I found that a stack overflow inside a 64-bit process makes the process die silently. This doesn’t happen for 32-bit processes on the same server or on a native 32-bit OS. Here is the code added to the default Win32 API project created in Visual Studio 2005:

...
volatile DWORD dwSupressOptimization;
...
void SoFunction();
...
LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
...
  case WM_PAINT:
     hdc = BeginPaint(hWnd, &ps);
     SoFunction();
     EndPaint(hWnd, &ps);
     break;
...
}
...
void SoFunction()
{
  if (++dwSupressOptimization)
  {
     SoFunction();
     WndProc(0,0,0,0);
  }
}

The WndProc call is added to SoFunction to defeat an optimization in the Release build that transforms the recursive call into a loop. Without that call, the function and its optimized disassembly look like this:

void SoFunction()
{
  if (++dwSupressOptimization)
  {
     SoFunction();
  }
}

0:001> uf SoFunction
00401300 mov     eax,1
00401305 jmp     StackOverflow!SoFunction+0x10 (00401310)
00401310 add     dword ptr [StackOverflow!dwSupressOptimization (00403374)],eax
00401316 mov     ecx,dword ptr [StackOverflow!dwSupressOptimization (00403374)]
0040131c jne     StackOverflow!SoFunction+0x10 (00401310)
0040131e ret

Therefore, without the added WndProc call or a more complicated SoFunction, there is no stack overflow, just a loop with 4294967295 (0xFFFFFFFF) iterations.
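
The iteration count can be checked with a small model of the optimized loop. This is only a sketch: the function name and the bits parameter are my own additions for illustration, so that narrow counters can be tried quickly instead of waiting for a full 32-bit wraparound.

```c
#include <stdint.h>

/* Models the optimized SoFunction loop for a counter of the given bit
   width (hypothetical parameter, 1..32). Returns how many times the
   "jne" branch is taken before the counter wraps around to zero. */
uint64_t back_jumps_until_wrap(unsigned bits)
{
    uint32_t mask = (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t counter = 0;   /* stands in for dwSupressOptimization */
    uint64_t jumps = 0;

    for (;;) {
        counter = (counter + 1u) & mask;  /* add dword ptr [...],eax */
        if (counter == 0)
            break;                        /* jne not taken, fall through to ret */
        ++jumps;                          /* jne taken, loop again */
    }
    return jumps;
}
```

For an 8-bit counter the function returns 255 and for a 16-bit counter 65535, that is 2^bits - 1, which for 32 bits is the 4294967295 mentioned above.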

If we compile an x64 project with the WndProc call included in SoFunction and run it, we never get a dump from any default postmortem debugger, although the TestDefaultDebugger64 tool crashes with a dump. I also observed strange behavior: the application disappears only during the second window repaint, although it should crash immediately after launch, when the main window is first shown. When I launch the application, it keeps running and the main window is visible. Only when I force it to repaint, for example by minimizing and then maximizing it, does it disappear from the screen and from the process list.

If we launch 64-bit WinDbg, load our application and run it, we hit the first chance exception:

0:000> g
(159c.fc4): Stack overflow - code c00000fd (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
StackOverflow!SoFunction+0x22:
00000001`40001322 e8d9ffffff call StackOverflow!SoFunction (00000001`40001300)

The stack trace looks like a normal stack overflow:

0:000> k
Child-SP          RetAddr           Call Site
00000000`00033fe0 00000001`40001327 StackOverflow!SoFunction+0x22
00000000`00034020 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034060 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000340a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000340e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034120 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034160 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000341a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000341e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034220 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034260 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000342a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000342e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034320 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034360 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000343a0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000343e0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034420 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00034460 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000344a0 00000001`40001327 StackOverflow!SoFunction+0x27

RSP was inside the stack guard page when the CALL instruction executed:

0:000> r
rax=0000000000003eed rbx=00000000000f26fe rcx=0000000077c4080a
rdx=0000000000000000 rsi=000000000000000f rdi=0000000000000000
rip=0000000140001322 rsp=0000000000033fe0 rbp=00000001400035f0
 r8=000000000012fb18 r9=00000001400035f0 r10=0000000000000000
r11=0000000000000246 r12=000000000012fdd8 r13=000000000012fd50
r14=00000000000f26fe r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
StackOverflow!SoFunction+0x22:
00000001`40001322 e8d9ffffff call StackOverflow!SoFunction (00000001`40001300)

0:000> uf StackOverflow!SoFunction
00000001`40001300 sub     rsp,38h
00000001`40001304 mov     rax,qword ptr [StackOverflow!__security_cookie (00000001`40003000)]
00000001`4000130b xor     rax,rsp
00000001`4000130e mov     qword ptr [rsp+20h],rax
00000001`40001313 add     dword ptr [StackOverflow!dwSupressOptimization (00000001`400035e4)],1
00000001`4000131a mov     eax,dword ptr [StackOverflow!dwSupressOptimization (00000001`400035e4)]
00000001`40001320 je      StackOverflow!SoFunction+0x37 (00000001`40001337)
00000001`40001322 call    StackOverflow!SoFunction (00000001`40001300)
00000001`40001327 xor     r9d,r9d
00000001`4000132a xor     r8d,r8d
00000001`4000132d xor     edx,edx
00000001`4000132f xor     ecx,ecx
00000001`40001331 call    qword ptr [StackOverflow!_imp_DefWindowProcW (00000001`40002198)]
00000001`40001337 mov     rcx,qword ptr [rsp+20h]
00000001`4000133c xor     rcx,rsp
00000001`4000133f call    StackOverflow!__security_check_cookie (00000001`40001360)
00000001`40001344 add     rsp,38h
00000001`40001348 ret
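
The 0x40 spacing between Child-SP values in the stack trace follows from this prologue: each call pushes an 8-byte return address and then subtracts 38h from RSP. A minimal sketch of that arithmetic (the helper names are mine, not part of the test project):

```c
#include <stdint.h>

/* Bytes consumed by one call: the 8-byte return address pushed by CALL
   plus the amount the prologue subtracts from RSP. */
uint64_t frame_bytes(uint64_t prologue_alloc)
{
    return prologue_alloc + 8u;
}

/* Number of whole frames between two Child-SP values of the same
   recursive function; returns 0 for nonsensical input. */
uint64_t frames_between(uint64_t upper_sp, uint64_t lower_sp, uint64_t frame)
{
    if (frame == 0 || upper_sp < lower_sp)
        return 0;
    return (upper_sp - lower_sp) / frame;
}
```

With the values from the listings, frame_bytes(0x38) is 0x40, and frames_between(0x344a0, 0x33fe0, 0x40) is 19, which matches the 20 trace lines shown by the k command.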

However, this guard page is not the last stack page, as can be seen from the TEB and the current RSP address (0x33fe0):

0:000> !teb
TEB at 000007fffffde000
    ExceptionList:        0000000000000000
    StackBase:            0000000000130000
    StackLimit:           0000000000031000
    SubSystemTib:         0000000000000000
    FiberData:            0000000000001e00
    ArbitraryUserPointer: 0000000000000000
    Self:                 000007fffffde000
    EnvironmentPointer:   0000000000000000
    ClientId:             000000000000159c . 0000000000000fc4
    RpcHandle:            0000000000000000
    Tls Storage:          0000000000000000
    PEB Address:          000007fffffd5000
    LastErrorValue:       0
    LastStatusValue:      c0000135
    Count Owned Locks:    0
    HardErrorMode:        0
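
To see how far RSP still is from StackLimit, the numbers can be subtracted directly. The sketch below assumes a 4 KB page size (not shown in the debugger output), and its helper names are made up for illustration:

```c
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 0x1000u   /* assumption: 4 KB pages */

/* Bytes between the stack pointer and the stack limit (0 if sp is below it). */
uint64_t bytes_above_limit(uint64_t sp, uint64_t limit)
{
    return (sp > limit) ? (sp - limit) : 0;
}

/* Whole pages lying below the page that contains sp, down to the limit. */
uint64_t whole_pages_below(uint64_t sp, uint64_t limit)
{
    uint64_t sp_page = sp & ~(uint64_t)(ASSUMED_PAGE_SIZE - 1u);
    return (sp_page > limit) ? (sp_page - limit) / ASSUMED_PAGE_SIZE : 0;
}
```

For RSP = 0x33fe0 and StackLimit = 0x31000, bytes_above_limit gives 0x2fe0 (12256 bytes) and whole_pages_below gives 2, so under the 4 KB assumption two more pages lie between the page RSP is in and the stack limit.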

If we continue execution and force the main application window to invalidate (repaint) itself, we get another first chance exception instead of a second chance one:

0:000> g
(159c.fc4): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
StackOverflow!SoFunction+0x22:
00000001`40001322 call StackOverflow!SoFunction (00000001`40001300)

Now RSP is outside the valid stack region, below the stack limit 0x31000:

0:000> k
Child-SP          RetAddr           Call Site
00000000`00030ff0 00000001`40001327 StackOverflow!SoFunction+0x22
00000000`00031030 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031070 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000310b0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000310f0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031130 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031170 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000311b0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000311f0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031230 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031270 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000312b0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000312f0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031330 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031370 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000313b0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000313f0 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031430 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`00031470 00000001`40001327 StackOverflow!SoFunction+0x27
00000000`000314b0 00000001`40001327 StackOverflow!SoFunction+0x27
0:000> r
rax=0000000000007e98 rbx=00000000000f26fe rcx=0000000077c4080a
rdx=0000000000000000 rsi=000000000000000f rdi=0000000000000000
rip=0000000140001322 rsp=0000000000030ff0 rbp=00000001400035f0
 r8=000000000012faa8  r9=00000001400035f0 r10=0000000000000000
r11=0000000000000246 r12=000000000012fd68 r13=000000000012fce0
r14=00000000000f26fe r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
StackOverflow!SoFunction+0x22:
00000001`40001322 call    StackOverflow!SoFunction (00000001`40001300)
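
The comparison between RSP and the TEB stack bounds can be written as a simple range check. This is a sketch with a hypothetical helper name, using StackBase = 0x130000 and StackLimit = 0x31000 from the !teb output:

```c
#include <stdint.h>

/* Returns 1 if addr lies inside [stack_limit, stack_base), else 0. */
int stack_contains(uint64_t stack_base, uint64_t stack_limit, uint64_t addr)
{
    return (addr >= stack_limit && addr < stack_base) ? 1 : 0;
}

/* Address of the 8-byte slot a CALL instruction writes its return address to. */
uint64_t call_push_slot(uint64_t rsp)
{
    return rsp - 8u;
}
```

At the first exception, call_push_slot(0x33fe0) is 0x33fd8 and stack_contains reports it as inside the bounds. At this exception, call_push_slot(0x30ff0) is 0x30fe8, which is below StackLimit, so the CALL has nowhere valid to store its return address.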

Therefore we expect a second chance exception at the same address, and indeed we get it when we continue execution:

0:000> g
(159c.fc4): Access violation - code c0000005 (!!! second chance !!!)
StackOverflow!SoFunction+0x22:
00000001`40001322 call    StackOverflow!SoFunction (00000001`40001300)

Now we see why the process died silently. There was no stack space left for the exception dispatch functions, and therefore none for the default unhandled exception filter that launches the default postmortem debugger to save a process dump. So it looks like on x64 Windows, when our process had the first chance stack overflow exception, no second chance exception followed; after the first chance stack overflow exception was handled, process execution resumed and finally hit the thread stack limit. This doesn’t happen with 32-bit processes, even on x64 Windows: there an unhandled first chance stack overflow exception results in an immediate second chance stack overflow exception at the same stack address, so there is sufficient room for the local variables of the exception handler and filter functions.

This is an example of what happened before the exception handling changes in Vista.

- Dmitry Vostokov @ DumpAnalysis.org -

One Response to “When a process dies silently”

  1. chae Says:

    Hi, I think that if an application has an exception handler (SEH) and calls _resetstkoflw() when the exception code == STATUS_STACK_OVERFLOW, then it’ll never crash silently.
