What is KiFastSystemCallRet?
Thursday, January 10th, 2008
This question was asked hundreds of times in 2007, and here is the short answer: it is the return address recorded in trap frames created for system calls on x86 post-Windows 2000 systems.
For processors starting with the Pentium II, Microsoft changed OS call dispatching from the interrupt-driven INT / IRETD mechanism used in Windows NT and Windows 2000 to a faster, optimized instruction sequence. This is the SYSENTER / SYSEXIT pair on 32-bit x86 Intel platforms and the SYSCALL / SYSRET pair on x64 Intel and AMD platforms.
The INT instruction saves a return address, but SYSENTER doesn’t. Let’s look at a typical thread call stack from a complete memory dump of an x86 Windows 2003 system:
1: kd> kL
ChildEBP RetAddr
a5a2ac64 80502d26 nt!KiSwapContext+0x2f
a5a2ac70 804faf20 nt!KiSwapThread+0x8a
a5a2ac98 805a4d6c nt!KeWaitForSingleObject+0x1c2
a5a2ad48 8054086c nt!NtReplyWaitReceivePortEx+0x3dc
a5a2ad48 7c91eb94 nt!KiFastCallEntry+0xfc
00a0fe18 7c91e399 ntdll!KiFastSystemCallRet
00a0fe1c 77e56703 ntdll!NtReplyWaitReceivePortEx+0xc
00a0ff80 77e56c22 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4
00a0ff88 77e56a3b RPCRT4!RecvLotsaCallsWrapper+0xd
00a0ffa8 77e56c0a RPCRT4!BaseCachedThreadRoutine+0x79
00a0ffb4 7c80b683 RPCRT4!ThreadStartRoutine+0x1a
00a0ffec 00000000 kernel32!BaseThreadStart+0x37
The RPC module calls the native function to wait for a reply from an LPC port. Note that we disassemble the raw return address instead of the symbolic address because of OMAP code optimization:
1: kd> ub RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4
^ Unable to find valid previous instruction for 'ub RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4'
1: kd> ub 77e56703
RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0xd9:
77e566e8 e8edfeffff call RPCRT4!RpcpPurgeEEInfoFromThreadIfNecessary (77e565da)
77e566ed ff75ec push dword ptr [ebp-14h]
77e566f0 8d45f0 lea eax,[ebp-10h]
77e566f3 ff75f4 push dword ptr [ebp-0Ch]
77e566f6 ff75fc push dword ptr [ebp-4]
77e566f9 50 push eax
77e566fa ff7658 push dword ptr [esi+58h]
77e566fd ff15b010e577 call dword ptr [RPCRT4!_imp__NtReplyWaitReceivePortEx (77e510b0)]
1: kd> dps 77e510b0 l1
77e510b0 7c91e38d ntdll!ZwReplyWaitReceivePortEx
The NTDLL stub for the native function is small and transitions to privilege level 0 (ring 0) via the shared SystemCallStub immediately:
1: kd> uf ntdll!NtReplyWaitReceivePortEx
ntdll!ZwReplyWaitReceivePortEx:
7c91e38d mov eax,0C4h
7c91e392 mov edx,offset SharedUserData!SystemCallStub (7ffe0300)
7c91e397 call dword ptr [edx]
7c91e399 ret 14h
1: kd> dps 7ffe0300 l3
7ffe0300 7c91eb8b ntdll!KiFastSystemCall
7ffe0304 7c91eb94 ntdll!KiFastSystemCallRet
7ffe0308 00000000
1: kd> uf ntdll!KiFastSystemCall
ntdll!KiFastSystemCall:
7c91eb8b mov edx,esp
7c91eb8d sysenter
7c91eb8f nop
7c91eb90 nop
7c91eb91 nop
7c91eb92 nop
7c91eb93 nop
7c91eb94 ret
Before SYSENTER executes, ESP points to the following return address:
1: kd> u 7c91e399
ntdll!NtReplyWaitReceivePortEx+0xc:
7c91e399 ret 14h
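The stack arithmetic up to this point can be checked with a small model. The following Python sketch is not debugger output; it only replays the pushes and calls seen in the listings above on a toy stack. The starting ESP value (0x00a0fe38) is an assumption obtained by working backwards from the addresses in the stack trace, and the argument values are placeholders.

```python
# Toy model of the user-mode stack just before SYSENTER.
# Addresses come from the listings above; INITIAL_ESP is inferred, not observed.

INITIAL_ESP = 0x00A0FE38       # assumed ESP before the five argument pushes
RET_TO_RPCRT4 = 0x77E56703     # instruction after the call in ReceiveLotsaCalls
RET_TO_STUB = 0x7C91E399       # 'ret 14h' in ntdll!NtReplyWaitReceivePortEx


class Stack:
    """A 32-bit stack that grows down, 4 bytes per push."""

    def __init__(self, esp):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def top(self):
        return self.mem[self.esp]


def run_until_sysenter():
    stack = Stack(INITIAL_ESP)
    # RPCRT4 pushes five arguments (placeholder values).
    for i in range(5):
        stack.push(0xA0 + i)
    stack.push(RET_TO_RPCRT4)  # call dword ptr [_imp__NtReplyWaitReceivePortEx]
    stack.push(RET_TO_STUB)    # call dword ptr [edx] -> ntdll!KiFastSystemCall
    edx = stack.esp            # mov edx,esp
    return stack, edx


stack, edx = run_until_sysenter()
print(hex(stack.esp), hex(stack.top()))
```

Under these assumptions the model stops with ESP at 00a0fe1c and 7c91e399 on top of the stack, which agrees with the ChildEBP and RetAddr columns of the stack trace.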
The SYSENTER instruction loads ESP and EIP with new values held in model-specific registers (MSRs). As a result, EIP points to nt!KiFastCallEntry. After saving a trap frame and checking parameters, it calls the nt!NtReplyWaitReceivePortEx address taken from the system function table. When the latter function returns, KiFastCallEntry proceeds to KiServiceExit and KiSystemCallExit2:
1: kd> ub 8054086c
nt!KiFastCallEntry+0xe2:
80540852 mov ebx,dword ptr [edi+eax*4]
80540855 sub esp,ecx
80540857 shr ecx,2
8054085a mov edi,esp
8054085c cmp esi,dword ptr [nt!MmUserProbeAddress (80561114)]
80540862 jae nt!KiSystemCallExit2+0x9f (80540a10)
80540868 rep movs dword ptr es:[edi],dword ptr [esi]
8054086a call ebx
1: kd> u
nt!KiFastCallEntry+0x105:
80540875 mov edx,dword ptr [ebp+3Ch]
80540878 mov dword ptr [ecx+134h],edx
nt!KiServiceExit:
8054087e cli
8054087f test dword ptr [ebp+70h],20000h
80540886 jne nt!KiServiceExit+0x10 (8054088e)
80540888 test byte ptr [ebp+6Ch],1
8054088c je nt!KiServiceExit+0x66 (805408e4)
8054088e mov ebx,dword ptr fs:[124h]
1: kd> u
nt!KiSystemCallExit2+0x12:
80540983 sti
80540984 sysexit
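The instructions leading up to the call through EBX amount to a table lookup followed by a bounds-checked copy of the caller's arguments onto the kernel stack. The Python sketch below paraphrases that sequence for illustration only. The table contents, the stand-in handler, the argument pointer, and the MM_USER_PROBE_ADDRESS value (0x7fff0000) are assumptions, not values read from this dump, and pairing the routine with its byte count in one dictionary is a simplification.

```python
# Sketch of the dispatch steps in nt!KiFastCallEntry shown above.
# All table contents and the probe limit are illustrative assumptions.

MM_USER_PROBE_ADDRESS = 0x7FFF0000  # assumed value of nt!MmUserProbeAddress


def nt_reply_wait_receive_port_ex(*args):
    """Stand-in for the real kernel routine; it just reports its arguments."""
    return ("NtReplyWaitReceivePortEx", args)


# Service number -> (routine, argument bytes). 0xC4 and 0x14 come from the
# ntdll stub: 'mov eax,0C4h' and 'ret 14h'.
SERVICE_TABLE = {0xC4: (nt_reply_wait_receive_port_ex, 0x14)}


def dispatch(service_number, user_args_ptr, user_memory):
    routine, arg_bytes = SERVICE_TABLE[service_number]  # mov ebx,[edi+eax*4]
    arg_dwords = arg_bytes >> 2                         # shr ecx,2
    if user_args_ptr >= MM_USER_PROBE_ADDRESS:          # cmp esi,[...]; jae
        raise MemoryError("argument pointer is not in user space")
    # rep movs: copy the arguments from the user stack to the kernel stack.
    kernel_copy = [user_memory[user_args_ptr + 4 * i] for i in range(arg_dwords)]
    return routine(*kernel_copy)                        # call ebx


# Five placeholder arguments laid out at an assumed user stack address.
ARGS_PTR = 0x00A0FE24
memory = {ARGS_PTR + 4 * i: value for i, value in enumerate([1, 2, 3, 4, 5])}
result = dispatch(0xC4, ARGS_PTR, memory)
print(result)
```

The probe check is the interesting design point: the argument pointer comes from user mode, so it is compared against the user address limit before anything is copied.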
Let’s inspect the trap frame:
1: kd> kv5
ChildEBP RetAddr Args to Child
a5a2ac64 80502d26 82ffc090 82ffc020 804faf20 nt!KiSwapContext+0x2f
a5a2ac70 804faf20 e12424b0 8055c0a0 e12424b0 nt!KiSwapThread+0x8a
a5a2ac98 805a4d6c 00000001 00000010 00000001 nt!KeWaitForSingleObject+0x1c2
a5a2ad48 8054086c 000000c8 00a0ff70 00000000 nt!NtReplyWaitReceivePortEx+0x3dc
a5a2ad48 7c91eb94 000000c8 00a0ff70 00000000 nt!KiFastCallEntry+0xfc (TrapFrame @ a5a2ad64)
1: kd> .trap a5a2ad64
ErrCode = 00000000
eax=00000000 ebx=00000000 ecx=00a0fd6c edx=7c91eb94 esi=00159b38 edi=00000100
eip=7c91eb94 esp=00a0fe1c ebp=00a0ff80 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
001b:7c91eb94 ret
1: kd> kL
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
00a0fe18 7c91e399 ntdll!KiFastSystemCallRet
00a0fe1c 77e56703 ntdll!NtReplyWaitReceivePortEx+0xc
00a0ff80 77e56c22 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4
00a0ff88 77e56a3b RPCRT4!RecvLotsaCallsWrapper+0xd
00a0ffa8 77e56c0a RPCRT4!BaseCachedThreadRoutine+0x79
00a0ffb4 7c80b683 RPCRT4!ThreadStartRoutine+0x1a
00a0ffec 00000000 kernel32!BaseThreadStart+0x37
Therefore, I believe the dummy ntdll!KiFastSystemCallRet function, with its single RET instruction, is used to create a uniform trap frame across system calls. Otherwise, trap frames for different native API calls would contain different return addresses.
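One way to picture this is a toy model of the return path. In the Python sketch below, the kernel records the same resume address for every system call, and the stub-specific return address is recovered from the user stack by the single RET. The second stub and its addresses are hypothetical, added only for contrast, and SYSEXIT is modeled simply as resuming at the saved EIP and ESP.

```python
# Toy model: every fast system call resumes at ntdll!KiFastSystemCallRet.
# The first stub's addresses come from the listings; the second is hypothetical.

KI_FAST_SYSTEM_CALL_RET = 0x7C91EB94


def make_trap_frame(user_esp):
    """What the kernel records: a fixed EIP plus the caller's stack pointer."""
    return {"eip": KI_FAST_SYSTEM_CALL_RET, "esp": user_esp}


def resume(trap_frame, user_stack):
    """Resume in user mode, then execute the lone RET at KiFastSystemCallRet."""
    esp = trap_frame["esp"]
    return_address = user_stack[esp]  # ret pops the stub's return address
    return return_address, esp + 4


# Stub 1: NtReplyWaitReceivePortEx, values taken from the dump.
stack1 = {0x00A0FE1C: 0x7C91E399}
frame1 = make_trap_frame(0x00A0FE1C)

# Stub 2: a hypothetical second native call on another thread's stack.
stack2 = {0x0012FE00: 0x7C90D000}
frame2 = make_trap_frame(0x0012FE00)

same_eip = frame1["eip"] == frame2["eip"]
back_in_stub1, esp1 = resume(frame1, stack1)
back_in_stub2, esp2 = resume(frame2, stack2)
print(same_eip, hex(back_in_stub1), hex(back_in_stub2))
```

In this model the two trap frames differ only in the saved stack pointer; the address that distinguishes one native call from another stays on the user stack.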
While writing this post I found two related articles. The first one explains the old mechanism for Windows NT and the second one explains the new one:
I’ll cover SYSCALL / SYSRET in another blog post.
- Dmitry Vostokov @ DumpAnalysis.org -