===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64, that protects programs against return address overwrites
(e.g. stack buffer overflows). It works by saving a function's return address
to a separately allocated 'shadow call stack' in the function prolog of
non-leaf functions and loading the return address from the shadow call stack
in the function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.
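
The prolog and epilog behaviour described above can be modelled in portable C.
The following is a minimal sketch for illustration only, not the
instrumentation itself: ``shadow_stack_model``, ``scs_push``, ``scs_pop`` and
``model_call`` are hypothetical names, and the bounds checks exist only in
this model.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define SCS_MODEL_DEPTH 64  /* arbitrary capacity chosen for this model */

    /* Hypothetical software model of a shadow call stack. */
    typedef struct {
        uintptr_t slots[SCS_MODEL_DEPTH];
        size_t top;  /* index of the next free slot */
    } shadow_stack_model;

    /* Prolog of a non-leaf function: save the return address. */
    static int scs_push(shadow_stack_model *s, uintptr_t return_address)
    {
        if (s->top == SCS_MODEL_DEPTH)
            return -1;  /* model-only check */
        s->slots[s->top++] = return_address;
        return 0;
    }

    /* Epilog: reload the return address from the shadow call stack. */
    static int scs_pop(shadow_stack_model *s, uintptr_t *return_address)
    {
        if (s->top == 0)
            return -1;  /* model-only check */
        *return_address = s->slots[--s->top];
        return 0;
    }

    /*
     * Models one call to a non-leaf function whose regular-stack copy of the
     * return address is overwritten with `clobber` before the function
     * returns. Returns the address the function would actually return to,
     * or 0 if the model's capacity is exceeded.
     */
    static uintptr_t model_call(shadow_stack_model *s,
                                uintptr_t return_address, uintptr_t clobber)
    {
        uintptr_t regular_stack_copy = return_address;  /* kept for unwinders */
        uintptr_t restored = 0;

        if (scs_push(s, return_address) != 0)
            return 0;
        regular_stack_copy = clobber;  /* simulated overwrite */
        (void)regular_stack_copy;      /* the epilog never reads this copy */
        scs_pop(s, &restored);
        return restored;
    }

In this model, ``model_call`` returns the original ``return_address``
regardless of ``clobber``, because the epilog takes its value from the shadow
call stack rather than from the regular stack.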

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). An x86_64 implementation was evaluated using Chromium and was found
to have critical performance and security deficiencies, so it was removed in
LLVM 9.0. Details on the x86_64 implementation can be found in the
`Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade
higher memory consumption for shorter function prologs and epilogs with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
inherently racy due to the architecture's use of the stack for calls and
returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support to use a shadow stack to store/check
return addresses at call/return time. Being a hardware implementation, it
would not suffer from race conditions and would not incur the overhead of
function instrumentation, but it does require operating system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt, so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register. This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_), but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind, however, processes this unwind info
correctly. This means that if exceptions are used together with
ShadowCallStack, the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects against non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``x18`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can
read arbitrary memory. However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the address of the shadow call stack and mark only that region as
read/write. This also mitigates somewhat against processor side channels.
The intent is that the Android runtime `will do this`_, but the platform will
first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
memory allocations in certain processes, as this also limits the number of
guard regions that can be allocated.
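
The guard-region approach described above can be sketched with POSIX
``mmap`` and ``mprotect``. This is a minimal illustration under stated
assumptions, not the Android runtime's code: the names ``scs_region``,
``scs_region_alloc`` and ``scs_region_free``, both sizes, and the use of
``/dev/urandom`` as the randomness source are all choices made for this
example. It also assumes that ``SCS_SIZE`` is a multiple of the page size.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>

    /* Both sizes are assumptions made for this sketch. */
    #define SCS_SIZE   ((size_t)64 * 1024)         /* read/write window */
    #define GUARD_SIZE ((size_t)16 * 1024 * 1024)  /* inaccessible reservation */

    typedef struct {
        void *guard_base;  /* start of the PROT_NONE reservation */
        void *scs_base;    /* start of the read/write window inside it */
    } scs_region;

    /* Fills *out from /dev/urandom; returns 0 on success, -1 on failure. */
    static int random_u64(uint64_t *out)
    {
        FILE *f = fopen("/dev/urandom", "rb");
        size_t n;

        if (f == NULL)
            return -1;
        n = fread(out, sizeof *out, 1, f);
        fclose(f);
        return n == 1 ? 0 : -1;
    }

    /* Reserves a guard region and opens a randomly placed window in it. */
    static int scs_region_alloc(scs_region *r)
    {
        uint64_t rnd;
        uintptr_t start, first, scs;
        size_t slots;
        void *guard;

        if (random_u64(&rnd) != 0)
            return -1;
        guard = mmap(NULL, GUARD_SIZE, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (guard == MAP_FAILED)
            return -1;

        /* Consider only SCS_SIZE-aligned slots that fit inside the region. */
        start = (uintptr_t)guard;
        first = (start + SCS_SIZE - 1) & ~(uintptr_t)(SCS_SIZE - 1);
        slots = (start + GUARD_SIZE - first) / SCS_SIZE;
        scs = first + (uintptr_t)(rnd % slots) * SCS_SIZE;

        /* Only the selected window becomes readable and writable. */
        if (mprotect((void *)scs, SCS_SIZE, PROT_READ | PROT_WRITE) != 0) {
            munmap(guard, GUARD_SIZE);
            return -1;
        }
        r->guard_base = guard;
        r->scs_base = (void *)scs;
        return 0;
    }

    /* Releases the whole reservation, including the shadow call stack. */
    static void scs_region_free(scs_region *r)
    {
        munmap(r->guard_base, GUARD_SIZE);
    }

The window is aligned to its own size here so that the sketch stays
compatible with schemes that recover the base address by masking ``x18``.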

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745

The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``-ffixed-x18``, this is trivial: the address can be derived from the
value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself. But if it is possible
for code compiled without ``-ffixed-x18`` to run on a thread managed by the
runtime, which is the case on Android for example, the address must be stored
somewhere else instead. On Android we store the address of the start of the
guard region in TLS and deallocate the entire guard region including the
shadow call stack at thread exit. This is considered acceptable given that
the address of the start of the guard region is already somewhat guessable.
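
Deriving the base address by masking, as mentioned above for the
``-ffixed-x18`` case, might look like the following sketch. The function name
``scs_base_from_x18`` is hypothetical, and the code assumes that the shadow
call stack's size is a power of two and that its base is aligned to that
size.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /*
     * Derives the base of the shadow call stack from the current x18 value
     * by clearing the low bits. Assumes scs_size is a power of two and the
     * stack's base address is a multiple of scs_size.
     */
    static uint64_t scs_base_from_x18(uint64_t x18, uint64_t scs_size)
    {
        assert(scs_size != 0 && (scs_size & (scs_size - 1)) == 0);
        return x18 & ~(scs_size - 1);
    }

Note that this simple mask yields the wrong base if ``x18`` has advanced to
exactly one past the end of the shadow call stack.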

One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by only storing the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.
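
The arithmetic behind storing only the low bits can be sketched as follows.
This is not the Android implementation (which is written in assembly); the
names ``scs_jmpbuf_save`` and ``scs_jmpbuf_restore`` and the 64 KiB size are
assumptions made for illustration.

.. code-block:: c

    #include <stdint.h>

    /* Assumed power-of-two size, which is also the required alignment. */
    #define SCS_SIZE UINT64_C(0x10000)
    #define SCS_MASK (SCS_SIZE - 1)

    /* setjmp side: keep only the offset within the shadow call stack. */
    static uint64_t scs_jmpbuf_save(uint64_t x18)
    {
        return x18 & SCS_MASK;
    }

    /* longjmp side: combine the live high bits of x18 with the saved offset. */
    static uint64_t scs_jmpbuf_restore(uint64_t live_x18, uint64_t saved_offset)
    {
        return (live_x18 & ~SCS_MASK) | saved_offset;
    }

Because the high bits always come from the live register, the value written
to the ``jmp_buf`` reveals only an offset, not where the shadow call stack is
located. This works only when the stack is aligned to its size, so that the
high bits are the same for every position within it.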

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49

The architecture's call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack`` flag on
both the compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.
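
As an illustration, the attribute can be wrapped in a macro so that the same
source builds with and without ShadowCallStack. This is only a sketch: the
macro name ``NO_SHADOW_CALL_STACK`` and the functions are hypothetical.

.. code-block:: c

    /* Hypothetical macro: expands to the attribute only under ShadowCallStack. */
    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    #    define NO_SHADOW_CALL_STACK \
           __attribute__((no_sanitize("shadow-call-stack")))
    #  endif
    #endif
    #ifndef NO_SHADOW_CALL_STACK
    #  define NO_SHADOW_CALL_STACK
    #endif

    static int helper(int x)
    {
        return x * 2;
    }

    /* Excluded from instrumentation even when it is enabled globally. */
    NO_SHADOW_CALL_STACK
    static int uninstrumented(int x)
    {
        return helper(x) + 1;
    }

    /* Instrumented as usual when ShadowCallStack is enabled. */
    static int instrumented(int x)
    {
        return uninstrumented(x) + 1;
    }

A non-leaf function marked this way keeps its return address only on the
regular stack, so it does not get the protection described in this document.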

Example
=======

The following example code:

.. code-block:: c++

    int bar();

    int foo() {
      return bar() + 1;
    }

generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

.. code-block:: none

    str     x30, [x18], #8          // push the return address, then advance x18
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!        // step x18 back, then reload the return address
    ret