===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64, that protects programs against return address overwrites
(e.g. stack buffer overflows). It works by saving a function's return address
to a separately allocated 'shadow call stack' in the function prolog in
non-leaf functions and loading the return address from the shadow call stack
in the function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). An x86_64 implementation was evaluated using Chromium and was found
to have critical performance and security deficiencies; it was removed in
LLVM 9.0. Details on the x86_64 implementation can be found in the
`Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses.
This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade off
consuming more memory for shorter function prologs and epilogs with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
inherently racy due to the architecture's use of the stack for calls and
returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support to use a shadow stack to store/check
return addresses at call/return time. Being a hardware implementation, it
would not suffer from race conditions and would not incur the overhead of
function instrumentation, but it does require operating system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt, so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register.
This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_), but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind processes this unwind info correctly,
however. This means that if exceptions are used together with ShadowCallStack,
the program must use a compatible unwinder.
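
The register-saving workaround described earlier in this section might look
roughly like the following C sketch. It is a minimal illustration under stated
assumptions, not the Android change linked above: ``call_foreign``,
``foreign_fn`` and ``SCS_ENABLED`` are hypothetical names, and the sketch
assumes GCC/Clang-style inline assembly. On an aarch64 build with
ShadowCallStack enabled, it copies ``x18`` into a local variable before calling
a function that may clobber ``x18`` and writes it back afterwards; on any other
build it simply calls the function.

.. code-block:: c

  /* Detect ShadowCallStack with the same __has_feature check shown in the
     Low-level API section. SCS_ENABLED is a hypothetical helper macro. */
  #if defined(__has_feature)
  #  if __has_feature(shadow_call_stack)
  #    define SCS_ENABLED 1
  #  endif
  #endif

  /* Hypothetical type of a function from a library built without
     -ffixed-x18, which may therefore use x18 as a scratch register. */
  typedef int (*foreign_fn)(int);

  /* Call fn and restore x18 afterwards. On aarch64 with ShadowCallStack,
     x18 holds the shadow call stack pointer, which fn may clobber. */
  static int call_foreign(foreign_fn fn, int arg) {
  #if defined(__aarch64__) && defined(SCS_ENABLED)
    void *saved_x18;
    /* Copy x18 into a local; the compiler keeps it in a callee-saved
       register or on the regular stack across the call. */
    __asm__ volatile("mov %0, x18" : "=r"(saved_x18) : : "memory");
    int result = fn(arg);
    /* Restore x18 before this function's epilog loads its return
       address from the shadow call stack. */
    __asm__ volatile("mov x18, %0" : : "r"(saved_x18) : "memory");
    return result;
  #else
    /* Not an instrumented aarch64 build: x18 needs no special handling. */
    return fn(arg);
  #endif
  }

  /* Stand-in for a foreign function, for illustration only. */
  static int square(int x) { return x * x; }

For example, ``call_foreign(square, 7)`` returns 49 on any build. The sketch
only restores ``x18`` after ``fn`` returns: if ``fn`` calls back into code
compiled with ShadowCallStack, that callback runs with whatever value ``fn``
left in ``x18``.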

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects against non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``x18`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers who can
read arbitrary memory. However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the address of the shadow call stack and mark only that region as
read/write. This also partially mitigates processor side channels.
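
The guard-region approach can be sketched in C for Linux as follows. This is a
hypothetical allocator, not the Android runtime: the names ``scs_alloc``,
``scs_free`` and ``scs_base_from_pointer`` and the sizes ``SCS_SIZE`` (64 KiB)
and ``GUARD_SIZE`` (16 MiB) are assumptions, and the code relies on POSIX
``mmap``/``mprotect`` and Linux's ``getrandom``. It places the shadow call
stack at an offset aligned to its size, which is the property that the masking
and low-bits techniques discussed later in this section depend on.

.. code-block:: c

  #define _GNU_SOURCE
  #include <stddef.h>
  #include <stdint.h>
  #include <sys/mman.h>
  #include <sys/random.h>
  #include <sys/types.h>

  /* Assumed sizes, for illustration only. SCS_SIZE must be a power of two
     and a multiple of the page size (64 KiB covers 4/16/64 KiB pages). */
  #define SCS_SIZE ((size_t)64 * 1024)
  #define GUARD_SIZE ((size_t)16 * 1024 * 1024)

  /* Hypothetical bookkeeping for one thread's shadow call stack. */
  struct scs_region {
    void *guard_base; /* start of the PROT_NONE reservation */
    void *scs_base;   /* start of the read/write shadow call stack */
  };

  /* Reserve GUARD_SIZE bytes with no access, pick a random SCS_SIZE-aligned
     slot inside the reservation and make only that slot read/write. */
  static int scs_alloc(struct scs_region *r) {
    void *guard = mmap(NULL, GUARD_SIZE, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (guard == MAP_FAILED)
      return -1;

    /* First and last SCS_SIZE-aligned start addresses that fit. */
    uintptr_t first =
        ((uintptr_t)guard + SCS_SIZE - 1) & ~(uintptr_t)(SCS_SIZE - 1);
    uintptr_t last = (uintptr_t)guard + GUARD_SIZE - SCS_SIZE;
    size_t slots = (last - first) / SCS_SIZE + 1;

    uint32_t rnd;
    if (getrandom(&rnd, sizeof rnd, 0) != (ssize_t)sizeof rnd) {
      munmap(guard, GUARD_SIZE);
      return -1;
    }
    /* Modulo bias is negligible with only a few hundred slots. */
    uintptr_t scs = first + (uintptr_t)(rnd % slots) * SCS_SIZE;

    if (mprotect((void *)scs, SCS_SIZE, PROT_READ | PROT_WRITE) != 0) {
      munmap(guard, GUARD_SIZE);
      return -1;
    }
    r->guard_base = guard;
    r->scs_base = (void *)scs;
    return 0;
  }

  /* Because scs_base is aligned to SCS_SIZE, any pointer into the shadow
     call stack (such as the value of x18) yields the base by masking. */
  static uintptr_t scs_base_from_pointer(uintptr_t p) {
    return p & ~(uintptr_t)(SCS_SIZE - 1);
  }

  /* Release the whole guard region, including the shadow call stack. */
  static void scs_free(struct scs_region *r) {
    munmap(r->guard_base, GUARD_SIZE);
  }

A thread-creation hook in such a runtime would call ``scs_alloc`` before the
new thread runs any instrumented code, load ``scs_base`` into ``x18``, and
call ``scs_free`` when the thread exits.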
113 1.1 joerg The intent is that the Android runtime `will do this`_, but the platform will 114 1.1 joerg first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit 115 1.1 joerg memory allocations in certain processes, as this also limits the number of 116 1.1 joerg guard regions that can be allocated. 117 1.1 joerg 118 1.1 joerg .. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622 119 1.1 joerg .. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745 120 1.1 joerg 121 1.1 joerg The runtime will need the address of the shadow call stack in order to 122 1.1 joerg deallocate it when destroying the thread. If the entire program is compiled 123 1.1 joerg with ``-ffixed-x18``, this is trivial: the address can be derived from the 124 1.1 joerg value stored in ``x18`` (e.g. by masking out the lower bits). If a guard 125 1.1 joerg region is used, the address of the start of the guard region could then be 126 1.1 joerg stored at the start of the shadow call stack itself. But if it is possible 127 1.1 joerg for code compiled without ``-ffixed-x18`` to run on a thread managed by the 128 1.1 joerg runtime, which is the case on Android for example, the address must be stored 129 1.1 joerg somewhere else instead. On Android we store the address of the start of the 130 1.1 joerg guard region in TLS and deallocate the entire guard region including the 131 1.1 joerg shadow call stack at thread exit. This is considered acceptable given that 132 1.1 joerg the address of the start of the guard region is already somewhat guessable. 133 1.1 joerg 134 1.1 joerg One way in which the address of the shadow call stack could leak is in the 135 1.1 joerg ``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. 
The Android
runtime `avoids this`_ by only storing the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49

The architecture's call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack``
flag to both the compile and link command lines. On aarch64, you also need to
pass ``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

  #if defined(__has_feature)
  #  if __has_feature(shadow_call_stack)
  // code that builds only under ShadowCallStack
  #  endif
  #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.

Example
=======

The following example code:

.. code-block:: c++

  int bar();

  int foo() {
    return bar() + 1;
  }

generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

  stp x29, x30, [sp, #-16]!
  mov x29, sp
  bl bar
  add w0, w0, #1
  ldp x29, x30, [sp], #16
  ret

Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

.. code-block:: none

  str x30, [x18], #8
  stp x29, x30, [sp, #-16]!
  mov x29, sp
  bl bar
  add w0, w0, #1
  ldp x29, x30, [sp], #16
  ldr x30, [x18, #-8]!
  ret
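
To make the effect of the two added instructions concrete, the following C
sketch is a toy model rather than real instrumentation: a hypothetical array
``shadow_call_stack`` and pointer ``scs_ptr`` stand in for the shadow call stack
and ``x18``, and the hypothetical ``simulate_foo`` steps through the
instrumented ``foo`` one instruction at a time. It shows that even when the
copy of the return address on the regular stack is overwritten, the value that
``ret`` uses is the one reloaded from the shadow call stack.

.. code-block:: c

  #include <stdint.h>

  /* Toy model only: an array stands in for the shadow call stack and
     scs_ptr stands in for x18. */
  static uintptr_t shadow_call_stack[64];
  static uintptr_t *scs_ptr = shadow_call_stack;

  /* Models "str x30, [x18], #8": store, then advance the pointer. */
  static void scs_push(uintptr_t lr) { *scs_ptr++ = lr; }

  /* Models "ldr x30, [x18, #-8]!": step the pointer back, then load. */
  static uintptr_t scs_pop(void) { return *--scs_ptr; }

  /* Models the instrumented foo(): returns the value x30 holds at "ret"
     after an overflow has replaced the regular-stack copy with smashed. */
  static uintptr_t simulate_foo(uintptr_t lr, uintptr_t smashed) {
    scs_push(lr);               /* str x30, [x18], #8 */
    uintptr_t stack_slot = lr;  /* stp x29, x30, [sp, #-16]! */
    stack_slot = smashed;       /* a stack buffer overflow hits the copy */
    uintptr_t x30 = stack_slot; /* ldp x29, x30, [sp], #16 (corrupted) */
    x30 = scs_pop();            /* ldr x30, [x18, #-8]! restores it */
    return x30;                 /* ret */
  }

Because the shadow call stack load comes after ``ldp``, it overrides whatever
value was reloaded from the regular stack, and the push/pop pair leaves
``scs_ptr`` where it started.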