.. Copyright (C) 2015-2022 Free Software Foundation, Inc.
   Originally contributed by David Malcolm <dmalcolm@redhat.com>

   This is free software: you can redistribute it and/or modify it
   under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see
   <https://www.gnu.org/licenses/>.

Tutorial part 5: Implementing an Ahead-of-Time compiler
-------------------------------------------------------

If you have a pre-existing language frontend that's compatible with
libgccjit's license, it's possible to hook it up to libgccjit as a
backend.  In the previous example we showed
how to do that for in-memory JIT-compilation, but libgccjit can also
compile code directly to a file, allowing you to implement a more
traditional ahead-of-time compiler ("JIT" is something of a misnomer
for this use-case).

The essential difference is to compile the context using
:c:func:`gcc_jit_context_compile_to_file` rather than
:c:func:`gcc_jit_context_compile`.

The "brainf" language
*********************

In this example we use libgccjit to construct an ahead-of-time compiler
for an esoteric programming language that we shall refer to as "brainf".

brainf scripts operate on an array of bytes, with a notional data pointer
within the array.

brainf is hard for humans to read, but it's trivial to write a parser for
it, as there is no lexing; just a stream of bytes.  The operations are:

====================== =============================
Character              Meaning
====================== =============================
``>``                  ``idx += 1``
``<``                  ``idx -= 1``
``+``                  ``data[idx] += 1``
``-``                  ``data[idx] -= 1``
``.``                  ``output (data[idx])``
``,``                  ``data[idx] = input ()``
``[``                  loop until ``data[idx] == 0``
``]``                  end of loop
Anything else          ignored
====================== =============================
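
The table above can be read as a tiny interpreter.  The following is a
minimal sketch in plain C, independent of libgccjit, intended only to make
the semantics concrete.  The function name ``bf_interpret``, the buffer-based
input and output, and the 30000-cell array size are assumptions made for
this illustration; they are not taken from the tutorial's example code.

```c
#include <stddef.h>
#include <string.h>

#define NUM_CELLS 30000 /* assumed array size, for illustration only */

/* Hypothetical helper: interpret SCRIPT.  Input bytes are read from IN
   (may be NULL; end of input reads as 0).  Output bytes are appended to
   OUT, which must have room for OUT_CAP >= 1 bytes; output that does not
   fit is dropped, and OUT is NUL-terminated on success.
   Returns 0 on success, or -1 on unbalanced brackets or if the data
   pointer would leave the array (OUT is then unspecified).  */
int
bf_interpret (const char *script, const char *in, char *out, size_t out_cap)
{
  static unsigned char data[NUM_CELLS];
  size_t idx = 0;
  size_t out_len = 0;

  memset (data, 0, sizeof data);

  for (size_t pc = 0; script[pc] != '\0'; pc++)
    {
      switch (script[pc])
        {
        case '>':
          if (idx + 1 >= NUM_CELLS)
            return -1;
          idx += 1;
          break;
        case '<':
          if (idx == 0)
            return -1;
          idx -= 1;
          break;
        case '+':
          data[idx] += 1;
          break;
        case '-':
          data[idx] -= 1;
          break;
        case '.':
          if (out_len + 1 < out_cap)
            out[out_len++] = (char) data[idx];
          break;
        case ',':
          data[idx] = (in && *in) ? (unsigned char) *in++ : 0;
          break;
        case '[':
          if (data[idx] == 0)
            {
              /* Skip forward to the matching ']'.  */
              int depth = 1;
              while (depth > 0)
                {
                  pc++;
                  if (script[pc] == '\0')
                    return -1;
                  if (script[pc] == '[')
                    depth++;
                  else if (script[pc] == ']')
                    depth--;
                }
            }
          break;
        case ']':
          if (data[idx] != 0)
            {
              /* Jump back to the matching '['.  */
              int depth = 1;
              while (depth > 0)
                {
                  if (pc == 0)
                    return -1;
                  pc--;
                  if (script[pc] == ']')
                    depth++;
                  else if (script[pc] == '[')
                    depth--;
                }
            }
          break;
        default:
          /* Anything else is ignored.  */
          break;
        }
    }

  out[out_len] = '\0';
  return 0;
}
```

An interpreter like this re-scans for matching brackets at run time; the
compiler built in this tutorial instead resolves the loop structure once,
while constructing the IR.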

Unlike the previous example, we'll implement an ahead-of-time compiler,
which reads ``.bf`` scripts and outputs executables (though it would
be trivial to have it run them JIT-compiled in-process).

Here's what a simple ``.bf`` script looks like:

   .. literalinclude:: ../examples/emit-alphabet.bf
    :lines: 1-

.. note::

   This example makes use of whitespace and comments for legibility, but
   could have been written as::

     ++++++++++++++++++++++++++
     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
     [>.+<-]

   It's not a particularly useful language, except for providing
   compiler-writers with a test case that's easy to parse.  The point
   is that you can use :c:func:`gcc_jit_context_compile_to_file`
   to use libgccjit as a backend for a pre-existing language frontend
   (provided that the pre-existing frontend is compatible with libgccjit's
   license).

Converting a brainf script to libgccjit IR
******************************************

As before we write simple code to populate a :c:type:`gcc_jit_context *`.

   .. literalinclude:: ../examples/tut05-bf.c
    :start-after: #define MAX_OPEN_PARENS 16
    :end-before: /* Entrypoint to the compiler.  */
    :language: c
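
The only structured part of the translation is pairing each ``[`` with its
``]``.  The ``MAX_OPEN_PARENS`` limit referenced above suggests a fixed-depth
stack of currently open loops.  The following libgccjit-free sketch shows
that bookkeeping in isolation; the function ``bf_match_brackets`` and its
``match`` output array are hypothetical names introduced for this
illustration, and the real example tracks IR blocks rather than script
offsets.

```c
#include <stddef.h>

#define MAX_OPEN_PARENS 16

/* Hypothetical helper: for each of the LEN bytes of SCRIPT, set MATCH[i]
   to the index of the bracket paired with SCRIPT[i], or to -1 if
   SCRIPT[i] is not a bracket.
   Returns 0 on success, or -1 if the brackets are unbalanced or nested
   more than MAX_OPEN_PARENS deep.  */
int
bf_match_brackets (const char *script, size_t len, long *match)
{
  size_t open_stack[MAX_OPEN_PARENS]; /* offsets of unmatched '[' */
  int num_open = 0;

  for (size_t i = 0; i < len; i++)
    {
      match[i] = -1;
      if (script[i] == '[')
        {
          if (num_open == MAX_OPEN_PARENS)
            return -1; /* nested too deeply */
          open_stack[num_open++] = i;
        }
      else if (script[i] == ']')
        {
          if (num_open == 0)
            return -1; /* ']' without a matching '[' */
          size_t open = open_stack[--num_open];
          match[open] = (long) i;
          match[i] = (long) open;
        }
    }

  /* Any '[' still on the stack was never closed.  */
  return num_open == 0 ? 0 : -1;
}
```

A compiler following this approach would do the same push on ``[`` and pop
on ``]``, but store whatever it needs to emit the loop's conditional jump
instead of a script offset.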

Compiling a context to a file
*****************************

Unlike the previous tutorial, this time we'll compile the context
directly to an executable, using :c:func:`gcc_jit_context_compile_to_file`:

.. code-block:: c

    gcc_jit_context_compile_to_file (ctxt,
                                     GCC_JIT_OUTPUT_KIND_EXECUTABLE,
                                     output_file);

Here's the top-level of the compiler, which is what actually calls into
:c:func:`gcc_jit_context_compile_to_file`:

 .. literalinclude:: ../examples/tut05-bf.c
    :start-after: /* Entrypoint to the compiler.  */
    :end-before: /* Use the built compiler to compile the example to an executable:
    :language: c

Note how once the context is populated you could trivially instead compile
it to memory using :c:func:`gcc_jit_context_compile` and run it in-process
as in the previous tutorial.

To create an executable, we need to export a ``main`` function.  Here's
how to create one from the JIT API:

 .. literalinclude:: ../examples/tut05-bf.c
    :start-after: #include "libgccjit.h"
    :end-before: #define MAX_OPEN_PARENS 16
    :language: c

.. note::

   The above implementation ignores ``argc`` and ``argv``, but you could
   make use of them by exposing ``param_argc`` and ``param_argv`` to the
   caller.

Upon compiling this C code, we obtain a bf-to-machine-code compiler;
let's call it ``bfc``:

.. code-block:: console

  $ gcc \
      tut05-bf.c \
      -o bfc \
      -lgccjit

We can now use ``bfc`` to compile ``.bf`` files into machine code executables:

.. code-block:: console

  $ ./bfc \
       emit-alphabet.bf \
       a.out

which we can run directly:

.. code-block:: console

  $ ./a.out
  ABCDEFGHIJKLMNOPQRSTUVWXYZ

Success!

We can also inspect the generated executable using standard tools:

.. code-block:: console

  $ objdump -d a.out | less

which shows that libgccjit has managed to optimize the function
somewhat (for example, the runs of 26 and 65 increment operations
have become integer constants 0x1a and 0x41):

.. code-block:: console

  0000000000400620 <main>:
    400620:     80 3d 39 0a 20 00 00    cmpb   $0x0,0x200a39(%rip)        # 601060 <data_cells>
    400627:     74 07                   je     400630 <main+0x10>
    400629:     eb fe                   jmp    400629 <main+0x9>
    40062b:     0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
    400630:     48 83 ec 08             sub    $0x8,%rsp
    400634:     0f b6 05 26 0a 20 00    movzbl 0x200a26(%rip),%eax        # 601061 <data_cells+0x1>
    40063b:     c6 05 1e 0a 20 00 1a    movb   $0x1a,0x200a1e(%rip)       # 601060 <data_cells>
    400642:     8d 78 41                lea    0x41(%rax),%edi
    400645:     40 88 3d 15 0a 20 00    mov    %dil,0x200a15(%rip)        # 601061 <data_cells+0x1>
    40064c:     0f 1f 40 00             nopl   0x0(%rax)
    400650:     40 0f b6 ff             movzbl %dil,%edi
    400654:     e8 87 fe ff ff          callq  4004e0 <putchar@plt>
    400659:     0f b6 05 01 0a 20 00    movzbl 0x200a01(%rip),%eax        # 601061 <data_cells+0x1>
    400660:     80 2d f9 09 20 00 01    subb   $0x1,0x2009f9(%rip)        # 601060 <data_cells>
    400667:     8d 78 01                lea    0x1(%rax),%edi
    40066a:     40 88 3d f0 09 20 00    mov    %dil,0x2009f0(%rip)        # 601061 <data_cells+0x1>
    400671:     75 dd                   jne    400650 <main+0x30>
    400673:     31 c0                   xor    %eax,%eax
    400675:     48 83 c4 08             add    $0x8,%rsp
    400679:     c3                      retq
    40067a:     66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
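
Read back into C, the disassembly corresponds roughly to the loop below.
This is a hand-written sketch of the same computation, not a decompilation
of the binary.  The function name ``emit_alphabet`` is hypothetical, and it
writes into a caller-supplied buffer rather than calling ``putchar`` so that
the result is easy to check.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the optimized loop: cell 0 starts at the
   folded constant 0x1a (26) and counts down, cell 1 starts at 0x41 (65,
   ASCII 'A') and counts up, emitting one byte per iteration.
   Writes at most OUT_CAP - 1 bytes plus a terminating NUL to OUT, which
   must have room for OUT_CAP >= 1 bytes.  Returns the number of bytes
   emitted.  */
size_t
emit_alphabet (char *out, size_t out_cap)
{
  unsigned char cell0 = 0x1a; /* the run of 26 '+' operations */
  unsigned char cell1 = 0x41; /* the run of 65 '+' operations */
  size_t n = 0;

  while (cell0 != 0 && n + 1 < out_cap)
    {
      out[n++] = (char) cell1; /* '.' on cell 1 */
      cell1 += 1;              /* '+' on cell 1 */
      cell0 -= 1;              /* '-' on cell 0 */
    }

  out[n] = '\0';
  return n;
}
```

With a buffer of at least 27 bytes this yields the 26 bytes
``ABCDEFGHIJKLMNOPQRSTUVWXYZ``, matching the output of ``a.out`` above.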

We also set up debugging information (via
:c:func:`gcc_jit_context_new_location` and
:c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO`), so it's possible to use ``gdb``
to single-step through the generated binary and inspect the internal
state ``idx`` and ``data_cells``:

.. code-block:: console

  (gdb) break main
  Breakpoint 1 at 0x400790
  (gdb) run
  Starting program: a.out

  Breakpoint 1, 0x0000000000400790 in main (argc=1, argv=0x7fffffffe448)
  (gdb) stepi
  0x0000000000400797 in main (argc=1, argv=0x7fffffffe448)
  (gdb) stepi
  0x00000000004007a0 in main (argc=1, argv=0x7fffffffe448)
  (gdb) stepi
  9     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
  (gdb) list
  4
  5     cell 0 = 26
  6     ++++++++++++++++++++++++++
  7
  8     cell 1 = 65
  9     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
  10
  11    while cell#0 != 0
  12    [
  13     >
  (gdb) n
  6     ++++++++++++++++++++++++++
  (gdb) n
  9     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
  (gdb) p idx
  $1 = 1
  (gdb) p data_cells
  $2 = "\032", '\000' <repeats 29998 times>
  (gdb) p data_cells[0]
  $3 = 26 '\032'
  (gdb) p data_cells[1]
  $4 = 0 '\000'
  (gdb) list
  4
  5     cell 0 = 26
  6     ++++++++++++++++++++++++++
  7
  8     cell 1 = 65
  9     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
  10
  11    while cell#0 != 0
  12    [
  13     >


Other forms of ahead-of-time-compilation
****************************************

The above demonstrates compiling a :c:type:`gcc_jit_context *` directly
to an executable.  It's also possible to compile it to an object file,
and to a dynamic library.  See the documentation of
:c:func:`gcc_jit_context_compile_to_file` for more information.
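
One way a driver such as ``bfc`` might expose these other forms is to pick
the output kind from the suffix of the output path.  The sketch below shows
only that selection logic and is not part of the tutorial's example.  So
that it builds without libgccjit, it declares a local stand-in enum,
``bf_output_kind``, whose members are annotated with the corresponding
``enum gcc_jit_output_kind`` values from ``libgccjit.h``.  The names
``bf_output_kind``, ``has_suffix`` and ``bf_kind_for_path``, and the suffix
conventions themselves, are assumptions made for this illustration.

```c
#include <string.h>

/* Local stand-in for enum gcc_jit_output_kind (hypothetical; a real
   driver would use the libgccjit enumerators named in the comments).  */
enum bf_output_kind
{
  BF_OUTPUT_ASSEMBLER,       /* GCC_JIT_OUTPUT_KIND_ASSEMBLER */
  BF_OUTPUT_OBJECT_FILE,     /* GCC_JIT_OUTPUT_KIND_OBJECT_FILE */
  BF_OUTPUT_DYNAMIC_LIBRARY, /* GCC_JIT_OUTPUT_KIND_DYNAMIC_LIBRARY */
  BF_OUTPUT_EXECUTABLE       /* GCC_JIT_OUTPUT_KIND_EXECUTABLE */
};

/* Return nonzero if PATH ends with SUFFIX.  */
static int
has_suffix (const char *path, const char *suffix)
{
  size_t path_len = strlen (path);
  size_t suffix_len = strlen (suffix);

  return path_len >= suffix_len
         && strcmp (path + path_len - suffix_len, suffix) == 0;
}

/* Choose an output kind from the suffix of OUTPUT_PATH, defaulting to an
   executable when no known suffix matches.  */
enum bf_output_kind
bf_kind_for_path (const char *output_path)
{
  if (has_suffix (output_path, ".s"))
    return BF_OUTPUT_ASSEMBLER;
  if (has_suffix (output_path, ".o"))
    return BF_OUTPUT_OBJECT_FILE;
  if (has_suffix (output_path, ".so"))
    return BF_OUTPUT_DYNAMIC_LIBRARY;
  return BF_OUTPUT_EXECUTABLE;
}
```

In a real driver the chosen kind would be passed as the second argument of
:c:func:`gcc_jit_context_compile_to_file`, in place of the hard-coded
``GCC_JIT_OUTPUT_KIND_EXECUTABLE`` used earlier.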