@c Copyright (C) 2010-2022 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.
@c Contributed by Jan Hubicka <jh@suse.cz> and
@c Diego Novillo <dnovillo@google.com>

@node LTO
@chapter Link Time Optimization
@cindex lto
@cindex whopr
@cindex wpa
@cindex ltrans

Link Time Optimization (LTO) gives GCC the capability of
dumping its internal representation (GIMPLE) to disk,
so that all the different compilation units that make up
a single executable can be optimized as a single module.
This expands the scope of inter-procedural optimizations
to encompass the whole program (or, rather, everything
that is visible at link time).

@menu
* LTO Overview::            Overview of LTO.
* LTO object file layout::  LTO file sections in ELF.
* IPA::                     Using summary information in IPA passes.
* WHOPR::                   Whole program assumptions,
                            linker plugin and symbol visibilities.
* Internal flags::          Internal flags controlling @code{lto1}.
@end menu

@node LTO Overview
@section Design Overview

Link time optimization is implemented as a GCC front end for a
bytecode representation of GIMPLE that is emitted in special sections
of @code{.o} files.  Currently, LTO support is enabled in most
ELF-based systems, as well as darwin, cygwin and mingw systems.

By default, object files generated with LTO support contain only GIMPLE
bytecode.  Such objects are called ``slim'', and they require that
tools like @code{ar} and @code{nm} understand symbol tables of LTO
sections.  For most targets these tools have been extended to use the
plugin infrastructure, so GCC can support ``slim'' objects consisting
of the intermediate code alone.

GIMPLE bytecode can also be saved alongside final object code if
the @option{-ffat-lto-objects} option is passed, or if no plugin support
is detected for @code{ar} and @code{nm} when GCC is configured.  This makes
the object files generated with LTO support larger than regular object
files.  This ``fat'' object format makes it possible to ship one set of
fat objects that can be used both for development and for the production
of optimized builds.  A perhaps surprising side effect of this feature
is that any mistake in the toolchain leads to LTO information not
being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
This is both an advantage, as the system is more robust, and a
disadvantage, as the user is not informed that the optimization has
been disabled.

At the highest level, LTO splits the compiler in two.  The first half
(the ``writer'') produces a streaming representation of all the
internal data structures needed to optimize and generate code.  This
includes declarations, types, the callgraph and the GIMPLE representation
of function bodies.

When @option{-flto} is given during compilation of a source file, the
pass manager executes all the passes in @code{all_lto_gen_passes}.
Currently, this phase is composed of two IPA passes:

@itemize @bullet
@item @code{pass_ipa_lto_gimple_out}
This pass executes the function @code{lto_output} in
@file{lto-streamer-out.cc}, which traverses the call graph encoding
every reachable declaration, type and function.  This generates a
memory representation of all the file sections described below.

@item @code{pass_ipa_lto_finish_out}
This pass executes the function @code{produce_asm_for_decls} in
@file{lto-streamer-out.cc}, which takes the memory image built in the
previous pass and encodes it in the corresponding ELF file sections.
@end itemize

The second half of LTO support is the ``reader''.  This is implemented
as the GCC front end @file{lto1} in @file{lto/lto.cc}.  When
@file{collect2} detects a link set of @code{.o}/@code{.a} files with
LTO information and @option{-flto} is enabled, it invokes
@file{lto1} which reads the set of files and aggregates them into a
single translation unit for optimization.  The main entry point for
the reader is @file{lto/lto.cc}:@code{lto_main}.

@subsection LTO modes of operation

One of the main goals of the GCC link-time infrastructure was to allow
effective compilation of large programs.  For this reason GCC implements two
link-time compilation modes.

@enumerate
@item @emph{LTO mode}, in which the whole program is read into the
compiler at link-time and optimized much as if it
were a single source-level compilation unit.

@item @emph{WHOPR or partitioned mode}, designed to utilize multiple
CPUs and/or a distributed compilation environment to quickly link
large applications.  WHOPR stands for WHOle Program optimizeR (not to
be confused with the semantics of @option{-fwhole-program}).  It
partitions the aggregated callgraph from many different @code{.o}
files and distributes the compilation of the sub-graphs to different
CPUs.

Note that distributed compilation is not implemented yet, but since
the parallelism is facilitated by generating a @code{Makefile}, it
would be easy to implement.
@end enumerate

WHOPR splits LTO into three main stages:
@enumerate
@item Local generation (LGEN)
This stage executes in parallel.  Every file in the program is compiled
into the intermediate language and packaged together with the local
call-graph and summary information.  This stage is the same for both
the LTO and WHOPR compilation modes.

@item Whole Program Analysis (WPA)
WPA is performed sequentially.  The global call-graph is generated, and
a global analysis procedure makes transformation decisions.  The global
call-graph is partitioned to facilitate parallel optimization during
phase 3.  The results of the WPA stage are stored into new object files
which contain the partitions of the program expressed in the intermediate
language and the optimization decisions.

@item Local transformations (LTRANS)
This stage executes in parallel.  All the decisions made during phase 2
are implemented locally in each partitioned object file, and the final
object code is generated.  Optimizations which cannot be decided
efficiently during phase 2 may be performed on the local
call-graph partitions.
@end enumerate

WHOPR can be seen as an extension of the usual LTO mode of
compilation.  In LTO, WPA and LTRANS are executed within a single
execution of the compiler, after the whole program has been read into
memory.

When compiling in WHOPR mode, the callgraph is partitioned during
the WPA stage.  The whole program is split into a given number of
partitions of roughly the same size.  The compiler tries to
minimize the number of references which cross partition boundaries.
The main advantage of WHOPR is to allow the parallel execution of
LTRANS stages, which are the most time-consuming part of the
compilation process.  Additionally, it avoids the need to load the
whole program into memory.
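
The balancing goal described above can be illustrated with a small
Python model.  This is only a sketch and is not GCC's partitioning
code: the names @code{partition} and @code{crossing_edges}, the function
sizes and the call list are invented for the example, and the greedy
fill strategy is an assumption chosen for brevity.

@smallexample
def partition(functions, n_parts):
    # functions: (name, size) pairs in the order they should be placed.
    target = sum(size for _, size in functions) / n_parts
    parts = [[]]
    filled = 0
    for name, size in functions:
        # Open a new partition once the current one reached the target.
        if filled >= target and len(parts) < n_parts:
            parts.append([])
            filled = 0
        parts[-1].append(name)
        filled += size
    return parts

def crossing_edges(parts, calls):
    # Count call edges whose caller and callee sit in different partitions.
    home = dict((name, i) for i, part in enumerate(parts) for name in part)
    return sum(1 for caller, callee in calls if home[caller] != home[callee])

functions = [("main", 10), ("parse", 40), ("lex", 50),
             ("emit", 60), ("opt", 40)]
calls = [("main", "parse"), ("parse", "lex"),
         ("main", "opt"), ("opt", "emit")]

parts = partition(functions, 2)
print(parts)
print(crossing_edges(parts, calls))
@end smallexample

With this input the model builds two partitions of equal total size, and
only the call from @code{main} to @code{opt} crosses the boundary.  A
real partitioner also has to choose a good placement order, which this
sketch takes as given.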


@node LTO object file layout
@section LTO file sections

LTO information is stored in several ELF sections inside object files.
Data structures and enum codes for sections are defined in
@file{lto-streamer.h}.

These sections are emitted from @file{lto-streamer-out.cc} and mapped
in all at once from @file{lto/lto.cc}:@code{lto_file_read}.  The
individual functions dealing with the reading/writing of each section
are described below.

@itemize @bullet
@item Command line options (@code{.gnu.lto_.opts})

This section contains the command line options used to generate the
object files.  This is used at link time to determine the optimization
level and other settings when they are not explicitly specified at the
linker command line.

Currently, GCC does not support combining LTO object files compiled
with different sets of command line options into a single binary.
At link time, the options given on the command line and the options
saved in all the files in a link-time set are applied globally.  No
attempt is made at validating the combination of flags (other than the
usual validation done by option processing).  This is implemented in
@file{lto/lto.cc}:@code{lto_read_all_file_options}.


@item Symbol table (@code{.gnu.lto_.symtab})

This table replaces the ELF symbol table for functions and variables
represented in the LTO IL.  Symbols used and exported by the optimized
assembly code of ``fat'' objects might not match the ones used and
exported by the intermediate code.  This table is necessary because
the intermediate code is less optimized and thus requires a separate
symbol table.

Additionally, the binary code in the ``fat'' object may lack a call
to a function when that call was optimized out at compilation time
after the intermediate language was streamed out.  In some special
cases, the same optimization may not happen during link-time
optimization.  This would lead to an undefined symbol if only one
symbol table was used.

The symbol table is emitted in
@file{lto-streamer-out.cc}:@code{produce_symtab}.


@item Global declarations and types (@code{.gnu.lto_.decls})

This section contains an intermediate language dump of all
declarations and types required to represent the callgraph, static
variables and top-level debug info.

The contents of this section are emitted in
@file{lto-streamer-out.cc}:@code{produce_asm_for_decls}.  Types and
symbols are emitted in a topological order that preserves the sharing
of pointers when the file is read back in
(@file{lto.cc}:@code{read_cgraph_and_symbols}).


@item The callgraph (@code{.gnu.lto_.cgraph})

This section contains the basic data structure used by the GCC
inter-procedural optimization infrastructure.  This section stores an
annotated multi-graph which represents the functions and call sites as
well as the variables, aliases and top-level @code{asm} statements.

This section is emitted in
@file{lto-streamer-out.cc}:@code{output_cgraph} and read in
@file{lto-cgraph.cc}:@code{input_cgraph}.


@item IPA references (@code{.gnu.lto_.refs})

This section contains references between functions and static
variables.  It is emitted by @file{lto-cgraph.cc}:@code{output_refs}
and read by @file{lto-cgraph.cc}:@code{input_refs}.


@item Function bodies (@code{.gnu.lto_.function_body.<name>})

This section contains function bodies in the intermediate language
representation.  Every function body is in a separate section to allow
copying of the section independently to different object files or
reading the function on demand.

Functions are emitted in
@file{lto-streamer-out.cc}:@code{output_function} and read in
@file{lto-streamer-in.cc}:@code{input_function}.


@item Static variable initializers (@code{.gnu.lto_.vars})

This section contains all the symbols in the global variable pool.  It
is emitted by @file{lto-cgraph.cc}:@code{output_varpool} and read in
@file{lto-cgraph.cc}:@code{input_cgraph}.

@item Summaries and optimization summaries used by IPA passes
(@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
@code{pureconst} or @code{reference})

These sections are used by IPA passes that need to emit summary
information during LTO generation to be read and aggregated at
link time.  Each pass is responsible for implementing two pass manager
hooks: one for writing the summary and another for reading it in.  The
format of these sections is entirely up to each individual pass.  The
only requirement is that the writer and reader hooks agree on the
format.
@end itemize


@node IPA
@section Using summary information in IPA passes

Programs are represented internally as a @emph{callgraph} (a
multi-graph where nodes are functions and edges are call sites)
and a @emph{varpool} (a list of static and external variables in
the program).

The inter-procedural optimization is organized as a sequence of
individual passes, which operate on the callgraph and the
varpool.  To make the implementation of WHOPR possible, every
inter-procedural optimization pass is split into several stages
that are executed at different times during WHOPR compilation:

@itemize @bullet
@item LGEN time
@enumerate
@item @emph{Generate summary} (@code{generate_summary} in
@code{struct ipa_opt_pass_d}).  This stage analyzes every function
body and variable initializer and stores relevant
information into a pass-specific data structure.

@item @emph{Write summary} (@code{write_summary} in
@code{struct ipa_opt_pass_d}).  This stage writes all the
pass-specific information generated by @code{generate_summary}.
Summaries go into their own @code{LTO_section_*} sections that
have to be declared in @file{lto-streamer.h}:@code{enum
lto_section_type}.  A new section is created by calling
@code{create_output_block} and data can be written using the
@code{lto_output_*} routines.
@end enumerate

@item WPA time
@enumerate
@item @emph{Read summary} (@code{read_summary} in
@code{struct ipa_opt_pass_d}).  This stage reads all the
pass-specific information in exactly the same order that it was
written by @code{write_summary}.

@item @emph{Execute} (@code{execute} in @code{struct
opt_pass}).  This performs inter-procedural propagation.  This
must be done without actual access to the individual function
bodies or variable initializers.  Typically, this results in a
transitive closure operation over the summary information of all
the nodes in the callgraph.

@item @emph{Write optimization summary}
(@code{write_optimization_summary} in @code{struct
ipa_opt_pass_d}).  This writes the result of the inter-procedural
propagation into the object file.  This can use the same data
structures and helper routines used in @code{write_summary}.
@end enumerate

@item LTRANS time
@enumerate
@item @emph{Read optimization summary}
(@code{read_optimization_summary} in @code{struct
ipa_opt_pass_d}).  The counterpart to
@code{write_optimization_summary}.  This reads the interprocedural
optimization decisions in exactly the same format emitted by
@code{write_optimization_summary}.

@item @emph{Transform} (@code{function_transform} and
@code{variable_transform} in @code{struct ipa_opt_pass_d}).
The actual function bodies and variable initializers are updated
based on the information passed down from the @emph{Execute} stage.
@end enumerate
@end itemize
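
The split between LGEN and WPA can be modelled in a few lines of Python.
The sketch below is not GCC code.  It borrows the hook names from the
list above, but the purity-like property it propagates, the JSON summary
format and the sample compilation units are all assumptions made for
illustration.

@smallexample
import json

def generate_summary(bodies):
    # LGEN: look at each body once and keep only what propagation needs.
    summary = []
    for name, (writes_memory, callees) in sorted(bodies.items()):
        summary.append([name, not writes_memory, sorted(callees)])
    return summary

def write_summary(summary):
    return json.dumps(summary)

def read_summary(blob):
    # Must agree with write_summary on the format.
    return json.loads(blob)

def execute(summaries):
    # WPA: propagate over the merged summaries; no bodies are available.
    calls = dict((name, callees) for name, _, callees in summaries)
    pure = dict((name, ok) for name, ok, _ in summaries)
    changed = True
    while changed:
        changed = False
        for name in pure:
            # Unknown callees are treated conservatively as impure.
            if pure[name] and not all(pure.get(c, False) for c in calls[name]):
                pure[name] = False
                changed = True
    return pure

# Each unit maps a function name to (writes_memory, callees).
unit_a = dict(f=(False, ["g"]), k=(False, []))
unit_b = dict(g=(False, ["h"]), h=(True, []))

blobs = [write_summary(generate_summary(u)) for u in (unit_a, unit_b)]
merged = [entry for blob in blobs for entry in read_summary(blob)]
pure = execute(merged)
print(sorted(name for name in pure if pure[name]))
@end smallexample

In this model only @code{k} keeps the property: @code{h} writes memory,
so @code{g} and then @code{f} lose it during propagation, even though
@code{execute} never inspects a function body.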

The implementation of the inter-procedural passes is shared
between LTO, WHOPR and classic non-LTO compilation.

@itemize
@item In the traditional file-by-file mode every pass executes its
own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
stages within the single execution context of the compiler.

@item In LTO compilation mode, every pass uses @emph{Generate
summary} and @emph{Write summary} stages at compilation time,
while the @emph{Read summary}, @emph{Execute}, and
@emph{Transform} stages are executed at link time.

@item In WHOPR mode all stages are used.
@end itemize

To simplify development, the GCC pass manager differentiates
between normal inter-procedural passes (@pxref{Regular IPA passes}),
small inter-procedural passes (@pxref{Small IPA passes})
and late inter-procedural passes (@pxref{Late IPA passes}).
A small or late IPA pass (@code{SIMPLE_IPA_PASS}) does
everything at once and thus cannot be executed during WPA in
WHOPR mode.  It defines only the @emph{Execute} stage and during
this stage it accesses and modifies the function bodies.  Such
passes are useful for optimization at LGEN or LTRANS time and are
used, for example, to implement early optimization before writing
object files.  The simple inter-procedural passes can also be used
for easier prototyping and development of a new inter-procedural
pass.


@subsection Virtual clones

One of the main challenges of introducing the WHOPR compilation
mode was addressing the interactions between optimization passes.
In LTO compilation mode, the passes are executed in a sequence,
each of which consists of analysis (or @emph{Generate summary}),
propagation (or @emph{Execute}) and @emph{Transform} stages.
Once the work of one pass is finished, the next pass sees the
updated program representation and can execute.  This makes the
individual passes dependent on each other.

In WHOPR mode all passes first execute their @emph{Generate
summary} stage.  Then summary writing marks the end of the LGEN
stage.  At WPA time,
the summaries are read back into memory and all passes run the
@emph{Execute} stage.  Optimization summaries are streamed and
sent to LTRANS, where all the passes execute the @emph{Transform}
stage.

Most optimization passes split naturally into analysis,
propagation and transformation stages.  But some do not.  The
main problem arises when one pass performs changes and the
following pass gets confused by seeing different callgraphs
between the @emph{Transform} stage and the @emph{Generate summary}
or @emph{Execute} stage.  This means that the passes are required
to communicate their decisions with each other.

To facilitate this communication, the GCC callgraph
infrastructure implements @emph{virtual clones}, a method of
representing the changes performed by the optimization passes in
the callgraph without needing to update function bodies.

A @emph{virtual clone} in the callgraph is a function that has no
associated body, just a description of how to create its body based
on a different function (which itself may be a virtual clone).

The description of function modifications includes adjustments to
the function's signature (which allows, for example, removing or
adding function arguments), substitutions to perform on the
function body, and, for inlined functions, a pointer to the
function that it will be inlined into.

It is also possible to redirect any edge of the callgraph from a
function to its virtual clone.  This implies updating the call
site to adjust for the new function signature.

Most of the transformations performed by inter-procedural
optimizations can be represented via virtual clones.  For
instance, a constant propagation pass can produce a virtual clone
of the function which replaces one of its arguments by a
constant.  The inliner can represent its decisions by producing a
clone of a function whose body will be later integrated into
a given function.
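
The constant propagation case can be sketched as a small Python model.
It is not the GCC callgraph API: the @code{Node} class, its fields, the
@code{materialize} helper and the token-list representation of a body
are hypothetical and exist only to show how a clone is described
relative to its origin.

@smallexample
class Node:
    def __init__(self, name, params=None, body=None,
                 clone_of=None, replace=None):
        self.name = name
        self.params = params          # None for a virtual clone
        self.body = body              # None for a virtual clone
        self.clone_of = clone_of      # origin, possibly itself virtual
        self.replace = replace or []  # (parameter, constant) pairs

def materialize(node):
    # Build (params, body) by walking back to a node with a real body
    # and applying each clone description on the way out.
    if node.clone_of is None:
        return list(node.params), list(node.body)
    params, body = materialize(node.clone_of)
    for param, constant in node.replace:
        params.remove(param)
        body = [constant if tok == param else tok for tok in body]
    return params, body

scale = Node("scale", params=["x", "k"], body=["x", "*", "k"])
by2 = Node("scale.constprop.0", clone_of=scale, replace=[("k", "2")])
five = Node("scale.constprop.1", clone_of=by2, replace=[("x", "5")])

print(materialize(by2))
print(materialize(five))
@end smallexample

The second clone is described in terms of the first, which is itself
virtual, and neither changes the body of @code{scale}.  Bodies are only
produced when @code{materialize} is called.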

Using @emph{virtual clones}, the program can be easily updated
during the @emph{Execute} stage, solving most of the pass interaction
problems that would otherwise occur during @emph{Transform}.

Virtual clones are later materialized in the LTRANS stage and
turned into real functions.  Passes executed after the virtual
clones were introduced also perform their @emph{Transform} stage
on new functions, so for a pass there is no significant
difference between operating on a real function or a virtual
clone introduced before its @emph{Execute} stage.

Optimization passes then work on virtual clones introduced before
their @emph{Execute} stage as if they were real functions.  The
only difference is that clones are not visible during the
@emph{Generate Summary} stage.

To keep function summaries updated, the callgraph interface
allows an optimizer to register a callback that is called every
time a new clone is introduced as well as when the actual
function or variable is generated or when a function or variable
is removed.  These hooks are registered in the @emph{Generate
summary} stage and allow the pass to keep its information intact
until the @emph{Execute} stage.  The same hooks can also be
registered during the @emph{Execute} stage to keep the
optimization summaries updated for the @emph{Transform} stage.

@subsection IPA references

GCC represents IPA references in the callgraph.  For a function
or variable @code{A}, the @emph{IPA reference} is a list of all
locations where the address of @code{A} is taken and, when
@code{A} is a variable, a list of all direct stores and reads
to/from @code{A}.  References represent an oriented multi-graph on
the union of nodes of the callgraph and the varpool.  See
@file{ipa-reference.cc}:@code{ipa_reference_write_optimization_summary}
and
@file{ipa-reference.cc}:@code{ipa_reference_read_optimization_summary}
for details.
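
As a rough illustration, the reference lists can be modelled as labelled
edges.  The Python sketch below is not the GCC data structure: the tuple
layout, the kind labels @code{addr}, @code{load} and @code{store}, and
the helpers @code{kinds} and @code{never_written} are assumptions made
for the example.

@smallexample
# (referring symbol, referred symbol, kind of reference)
refs = [
    ("main", "counter", "store"),
    ("main", "table", "load"),
    ("lookup", "table", "load"),
    ("main", "lookup", "addr"),
]

def kinds(symbol):
    # All distinct ways in which the symbol is referenced.
    return sorted(set(kind for _, target, kind in refs if target == symbol))

def never_written(variable):
    # A store, or an escaping address, means the value may change.
    return not set(kinds(variable)) & set(["store", "addr"])

print(kinds("table"))
print(never_written("table"))
print(never_written("counter"))
@end smallexample

A query such as @code{never_written} shows why the kind of each
reference matters: @code{table} is only ever read in this model, while
@code{counter} is stored to.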

@subsection Jump functions
Suppose that an optimization pass sees a function @code{A} and it
knows the values of (some of) its arguments.  The @emph{jump
function} describes the value of a parameter of a given function
call in function @code{A} based on this knowledge.

Jump functions are used by several optimizations, such as the
inter-procedural constant propagation pass and the
devirtualization pass.  The inliner also uses jump functions to
perform inlining of callbacks.
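
A minimal Python model may make the idea concrete.  It is a
simplification and not GCC's representation: the three kinds
@code{const}, @code{pass_through} and @code{unknown}, the tuple
encoding and the @code{evaluate} helper are assumptions made for this
sketch.

@smallexample
def evaluate(jump, known):
    # known maps an index of the caller's parameters to its constant
    # value, for those parameters whose value is known.
    kind = jump[0]
    if kind == "const":
        return jump[1]
    if kind == "pass_through":
        return known.get(jump[1])
    return None  # "unknown": nothing can be said about the argument

# Caller A(a, b) contains the call g(b, 7, h()).
call_g = [("pass_through", 1), ("const", 7), ("unknown",)]

known_in_a = dict([(1, 42)])  # only b is known on entry to A
print([evaluate(jump, known_in_a) for jump in call_g])
print([evaluate(jump, dict()) for jump in call_g])
@end smallexample

With @code{b} known, the first two arguments of the call to @code{g}
become constants, which is the kind of fact an inter-procedural constant
propagation pass can build on.  Without any knowledge about @code{A},
only the literal argument remains known.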

@node WHOPR
@section Whole program assumptions, linker plugin and symbol visibilities

Link-time optimization gives relatively minor benefits when used
alone.  The problem is that propagation of inter-procedural
information does not work well across functions and variables
that are called or referenced by other compilation units (such as
from a dynamically linked library).  We say that such functions
and variables are @emph{externally visible}.

To make the situation even more difficult, many applications
organize themselves as a set of shared libraries, and the default
ELF visibility rules allow one to overwrite any externally
visible symbol with a different symbol at runtime.  This
basically disables any optimizations across such functions and
variables, because the compiler cannot be sure that the function
body it is seeing is the same function body that will be used at
runtime.  Any function or variable not declared @code{static} in
the sources degrades the quality of inter-procedural
optimization.

To avoid this problem the compiler must assume that it sees the
whole program when doing link-time optimization.  Strictly
speaking, the whole program is rarely visible even at link-time.
Standard system libraries are usually linked dynamically or not
provided with the link-time information.  In GCC, the whole
program option (@option{-fwhole-program}) asserts that every
function and variable defined in the current compilation
unit is static, except for function @code{main} (note: at
link time, the current unit is the union of all objects compiled
with LTO).  Since some functions and variables need to
be referenced externally, for example by another DSO or from an
assembler file, GCC also provides the function and variable
attribute @code{externally_visible} which can be used to disable
the effect of @option{-fwhole-program} on a specific symbol.
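
The rule stated above can be written down as a small Python model.  It
only restates the paragraph and is not how GCC implements the option:
the @code{link_time_visibility} helper, the tuple layout and the sample
symbol names are invented for the example.

@smallexample
def link_time_visibility(symbols, whole_program):
    # symbols: (name, declared_static, has_externally_visible_attribute)
    result = []
    for name, declared_static, keep_visible in symbols:
        exempt = name == "main" or keep_visible
        local = declared_static or (whole_program and not exempt)
        result.append((name, "local" if local else "external"))
    return result

symbols = [
    ("main", False, False),
    ("helper", False, False),
    ("plugin_entry", False, True),
    ("counter", True, False),
]

print(link_time_visibility(symbols, whole_program=True))
print(link_time_visibility(symbols, whole_program=False))
@end smallexample

Under the whole program assumption @code{helper} becomes local and is
therefore open to inter-procedural optimization, while @code{main} and
the symbol carrying the attribute stay external.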

The whole program mode assumptions are slightly more complex in
C++, where inline functions in headers are put into @emph{COMDAT}
sections.  COMDAT functions and variables can be defined by
multiple object files and their bodies are unified at link-time
and dynamic link-time.  COMDAT functions are changed to local only
when their address is not taken and thus un-sharing them with a
library is not harmful.  COMDAT variables always remain externally
visible; however, for read-only variables it is assumed that their
initializers cannot be overwritten by a different value.

GCC provides the function and variable attribute
@code{visibility} that can be used to specify the visibility of
externally visible symbols (or alternatively the
@option{-fvisibility} command-line option).  ELF defines
the @code{default}, @code{protected}, @code{hidden} and
@code{internal} visibilities.

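A minimal sketch of the attribute in use follows.  The function
names @code{internal_sum} and @code{api_sum3} are hypothetical, and
the example assumes an ELF target where the @code{visibility}
attribute is supported.

```c
/* Hidden: usable from any object linked into the same shared
   library, but not exported from it.  */
__attribute__((visibility("hidden")))
int internal_sum(int a, int b)
{
  return a + b;
}

/* Default: exported from the shared library and therefore
   visible to other modules.  */
__attribute__((visibility("default")))
int api_sum3(int a, int b, int c)
{
  return internal_sum(internal_sum(a, b), c);
}
```
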
The most commonly used visibility is @code{hidden}.  It
specifies that the symbol cannot be referenced from outside of
the current shared library.  Unfortunately, this information
cannot be used directly by the link-time optimization in the
compiler, since the shared library as a whole might also contain
non-LTO objects and those are not visible to the compiler.

GCC solves this problem using linker plugins.  A @emph{linker
plugin} is an interface to the linker that allows an external
program to claim the ownership of a given object file.  The linker
then performs the linking procedure by querying the plugin about
the symbol table of the claimed objects and, once the linking
decisions are complete, the plugin is allowed to provide the
final object file before the actual linking is performed.  The linker
plugin obtains the symbol resolution information, which specifies
which symbols provided by the claimed objects are bound from the
rest of the binary being linked.

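The way resolution information can feed visibility decisions may be
sketched as follows.  The enumeration @code{resolution} and the
functions @code{can_localize} and @code{definition_discarded} are
hypothetical simplifications written for this illustration; they are
not the plugin interface itself, which distinguishes more cases.

```c
#include <stdbool.h>

/* Hypothetical, simplified resolution kinds reported for a symbol
   defined in a claimed (LTO) object.  */
enum resolution
{
  RES_UNKNOWN,            /* No information from the linker.  */
  RES_PREVAILING_IR_ONLY, /* Our definition is used and only LTO
                             objects refer to it.  */
  RES_PREVAILING,         /* Our definition is used and non-LTO
                             code refers to it too.  */
  RES_PREEMPTED           /* A definition from elsewhere is used.  */
};

/* A symbol may be turned into a local one only when the linker
   reports that nothing outside the claimed objects binds to it.  */
bool can_localize(enum resolution r)
{
  return r == RES_PREVAILING_IR_ONLY;
}

/* When another definition prevails, the body seen by the
   compiler is not the one used in the final binary.  */
bool definition_discarded(enum resolution r)
{
  return r == RES_PREEMPTED;
}
```

With no resolution information (@code{RES_UNKNOWN} in this sketch),
neither conclusion can be drawn and the symbol has to be treated
conservatively.
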
GCC is designed to be independent of the rest of the toolchain
and aims to support linkers without plugin support.  For this
reason it does not use the linker plugin by default.  Instead,
the object files are examined by @command{collect2} before being
passed to the linker and objects found to have LTO sections are
passed to @command{lto1} first.  This mode does not work for
library archives.  The decision on which object files from the
archive are needed depends on the actual linking and thus GCC
would have to implement the linker itself.  The resolution
information is missing too and thus GCC needs to make an educated
guess based on @option{-fwhole-program}.  Without the linker
plugin GCC also assumes that symbols are declared @code{hidden}
and not referred to by non-LTO code by default.

@node Internal flags
@section Internal flags controlling @code{lto1}

The following flags are passed into @command{lto1} and are not
meant to be used directly from the command line.

@itemize
@item -fwpa
@opindex fwpa
This option runs the serial part of the link-time optimizer
performing the inter-procedural propagation (WPA mode).  The
compiler reads in summary information from all inputs and
performs an analysis based on summary information only.  It
generates object files for subsequent runs of the link-time
optimizer where individual object files are optimized using both
summary information from the WPA mode and the actual function
bodies.  It then drives the LTRANS phase.

@item -fltrans
@opindex fltrans
This option runs the link-time optimizer in the
local-transformation (LTRANS) mode, which reads in output from a
previous run of the LTO in WPA mode.  In the LTRANS mode, LTO
optimizes an object and produces the final assembly.

@item -fltrans-output-list=@var{file}
@opindex fltrans-output-list
This option specifies a file to which the names of LTRANS output
files are written.  This option is only meaningful in conjunction
with @option{-fwpa}.

@item -fresolution=@var{file}
@opindex fresolution
This option specifies the linker resolution file.  This option is
only meaningful in conjunction with @option{-fwpa} and as an option
to pass through to the LTO linker plugin.
@end itemize