gcc/doc/lto.texi

1.10  mrg @c Copyright (C) 2010-2020 Free Software Foundation, Inc.
 1.1  mrg @c This is part of the GCC manual.
 1.1  mrg @c For copying conditions, see the file gcc.texi.
 1.1  mrg @c Contributed by Jan Hubicka <jh@@suse.cz> and
 1.1  mrg @c Diego Novillo <dnovillo@@google.com>
 1.1  mrg
 1.1  mrg @node LTO
 1.1  mrg @chapter Link Time Optimization
 1.1  mrg @cindex lto
 1.1  mrg @cindex whopr
 1.1  mrg @cindex wpa
 1.1  mrg @cindex ltrans
 1.1  mrg
 1.1  mrg Link Time Optimization (LTO) gives GCC the capability of
 1.1  mrg dumping its internal representation (GIMPLE) to disk,
 1.1  mrg so that all the different compilation units that make up
 1.1  mrg a single executable can be optimized as a single module.
 1.1  mrg This expands the scope of inter-procedural optimizations
 1.1  mrg to encompass the whole program (or, rather, everything
 1.1  mrg that is visible at link time).
 1.1  mrg
 1.1  mrg @menu
 1.1  mrg * LTO Overview::            Overview of LTO.
 1.1  mrg * LTO object file layout::  LTO file sections in ELF.
 1.1  mrg * IPA::                     Using summary information in IPA passes.
 1.1  mrg * WHOPR::                   Whole program assumptions,
 1.1  mrg                             linker plugin and symbol visibilities.
 1.1  mrg * Internal flags::          Internal flags controlling @code{lto1}.
 1.1  mrg @end menu
 1.1  mrg
 1.1  mrg @node LTO Overview
 1.1  mrg @section Design Overview
 1.1  mrg
 1.1  mrg Link time optimization is implemented as a GCC front end for a
 1.1  mrg bytecode representation of GIMPLE that is emitted in special sections
 1.1  mrg of @code{.o} files.  Currently, LTO support is enabled on most
 1.1  mrg ELF-based systems, as well as Darwin, Cygwin and MinGW systems.
 1.1  mrg
 1.1  mrg Since GIMPLE bytecode is saved alongside final object code, object
 1.1  mrg files generated with LTO support are larger than regular object files.
 1.1  mrg This ``fat'' object format makes it easy to integrate LTO into
 1.1  mrg existing build systems, as one can, for instance, produce archives of
 1.1  mrg the files.  Additionally, one might be able to ship one set of fat
 1.1  mrg objects which could be used both for development and the production of
 1.1  mrg optimized builds.  A perhaps surprising side effect of this feature
 1.4  mrg is that any mistake in the toolchain leads to LTO information not
 1.1  mrg being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
 1.1  mrg This is both an advantage, as the system is more robust, and a
 1.1  mrg disadvantage, as the user is not informed that the optimization has
 1.1  mrg been disabled.
 1.1  mrg
 1.1  mrg The current implementation only produces ``fat'' objects, effectively
 1.1  mrg doubling compilation time and increasing file sizes up to 5x the
 1.1  mrg original size.  This hides the problem that some tools, such as
 1.1  mrg @code{ar} and @code{nm}, need to understand symbol tables of LTO
 1.1  mrg sections.  These tools were extended to use the plugin infrastructure,
 1.1  mrg and with these problems solved, GCC will also support ``slim'' objects
 1.1  mrg consisting of the intermediate code alone.
 1.1  mrg
 1.1  mrg At the highest level, LTO splits the compiler in two.  The first half
 1.1  mrg (the ``writer'') produces a streaming representation of all the
 1.1  mrg internal data structures needed to optimize and generate code.  This
 1.1  mrg includes declarations, types, the callgraph and the GIMPLE representation
 1.1  mrg of function bodies.
 1.1  mrg
 1.1  mrg When @option{-flto} is given during compilation of a source file, the
 1.1  mrg pass manager executes all the passes in @code{all_lto_gen_passes}.
 1.1  mrg Currently, this phase is composed of two IPA passes:
 1.1  mrg
 1.1  mrg @itemize @bullet
 1.1  mrg @item @code{pass_ipa_lto_gimple_out}
 1.1  mrg This pass executes the function @code{lto_output} in
 1.1  mrg @file{lto-streamer-out.c}, which traverses the call graph encoding
 1.1  mrg every reachable declaration, type and function.  This generates a
 1.1  mrg memory representation of all the file sections described below.
 1.1  mrg
 1.1  mrg @item @code{pass_ipa_lto_finish_out}
 1.1  mrg This pass executes the function @code{produce_asm_for_decls} in
 1.1  mrg @file{lto-streamer-out.c}, which takes the memory image built in the
 1.1  mrg previous pass and encodes it in the corresponding ELF file sections.
 1.1  mrg @end itemize
 1.1  mrg
 1.1  mrg The second half of LTO support is the ``reader''.  This is implemented
 1.1  mrg as the GCC front end @file{lto1} in @file{lto/lto.c}.  When
 1.1  mrg @file{collect2} detects a link set of @code{.o}/@code{.a} files with
 1.1  mrg LTO information and @option{-flto} is enabled, it invokes
 1.1  mrg @file{lto1} which reads the set of files and aggregates them into a
 1.1  mrg single translation unit for optimization.  The main entry point for
 1.1  mrg the reader is @file{lto/lto.c}:@code{lto_main}.
 1.1  mrg
 1.1  mrg @subsection LTO modes of operation
 1.1  mrg
 1.1  mrg One of the main goals of the GCC link-time infrastructure was to allow
 1.1  mrg effective compilation of large programs.  For this reason GCC implements two
 1.1  mrg link-time compilation modes.
 1.1  mrg
 1.1  mrg @enumerate
 1.1  mrg @item	@emph{LTO mode}, in which the whole program is read into the
 1.1  mrg compiler at link-time and optimized in a similar way as if it
 1.1  mrg were a single source-level compilation unit.
 1.1  mrg
 1.1  mrg @item	@emph{WHOPR or partitioned mode}, designed to utilize multiple
 1.1  mrg CPUs and/or a distributed compilation environment to quickly link
 1.1  mrg large applications.  WHOPR stands for WHOle Program optimizeR (not to
 1.1  mrg be confused with the semantics of @option{-fwhole-program}).  It
 1.1  mrg partitions the aggregated callgraph from many different @code{.o}
 1.1  mrg files and distributes the compilation of the sub-graphs to different
 1.1  mrg CPUs.
 1.1  mrg
 1.1  mrg Note that distributed compilation is not implemented yet, but since
 1.1  mrg the parallelism is facilitated via generating a @code{Makefile}, it
 1.1  mrg would be easy to implement.
 1.1  mrg @end enumerate
 1.1  mrg
 1.1  mrg WHOPR splits LTO into three main stages:
 1.1  mrg @enumerate
 1.1  mrg @item Local generation (LGEN)
 1.1  mrg This stage executes in parallel.  Every file in the program is compiled
 1.1  mrg into the intermediate language and packaged together with the local
 1.1  mrg call-graph and summary information.  This stage is the same for both
 1.1  mrg the LTO and WHOPR compilation modes.
 1.1  mrg
 1.1  mrg @item Whole Program Analysis (WPA)
 1.1  mrg WPA is performed sequentially.  The global call-graph is generated, and
 1.1  mrg a global analysis procedure makes transformation decisions.  The global
 1.1  mrg call-graph is partitioned to facilitate parallel optimization during
 1.1  mrg phase 3.  The results of the WPA stage are stored into new object files
 1.1  mrg which contain the partitions of the program expressed in the intermediate
 1.1  mrg language and the optimization decisions.
 1.1  mrg
 1.1  mrg @item Local transformations (LTRANS)
 1.1  mrg This stage executes in parallel.  All the decisions made during phase 2
 1.1  mrg are implemented locally in each partitioned object file, and the final
 1.1  mrg object code is generated.  Optimizations which cannot be decided
 1.1  mrg efficiently during phase 2 may be performed on the local
 1.1  mrg call-graph partitions.
 1.1  mrg @end enumerate
 1.1  mrg
 1.1  mrg WHOPR can be seen as an extension of the usual LTO mode of
 1.1  mrg compilation.  In LTO, WPA and LTRANS are executed within a single
 1.1  mrg execution of the compiler, after the whole program has been read into
 1.1  mrg memory.
 1.1  mrg
 1.1  mrg When compiling in WHOPR mode, the callgraph is partitioned during
 1.1  mrg the WPA stage.  The whole program is split into a given number of
 1.1  mrg partitions of roughly the same size.  The compiler tries to
 1.1  mrg minimize the number of references which cross partition boundaries.
 1.1  mrg The main advantage of WHOPR is to allow the parallel execution of
 1.1  mrg LTRANS stages, which are the most time-consuming part of the
 1.1  mrg compilation process.  Additionally, it avoids the need to load the
 1.1  mrg whole program into memory.
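
The partitioning goal described above can be illustrated with a minimal
sketch.  It is not GCC's partitioner: the types and the names
@code{toy_edge}, @code{toy_partition} and @code{toy_count_crossing_edges}
are hypothetical, and the greedy assignment only balances partition sizes.
The second function merely measures the number of call edges that cross
partition boundaries, which is the quantity a real partitioner would try
to keep small.

```c
/* Toy callgraph edge from a caller to a callee, both given as function
   indices.  Illustrative only; these are not GCC's cgraph structures.  */
struct toy_edge
{
  int caller;
  int callee;
};

/* Assign functions, in index order, to N_PARTS partitions of roughly
   equal total size.  PART[i] receives the partition of function i.  */
void
toy_partition (const int *size, int n_funcs, int n_parts, int *part)
{
  long total = 0;
  for (int i = 0; i < n_funcs; i++)
    total += size[i];

  /* Target size of one partition, rounded up.  */
  long target = (total + n_parts - 1) / n_parts;
  long filled = 0;
  int current = 0;

  for (int i = 0; i < n_funcs; i++)
    {
      /* Open the next partition once the current one is full.  */
      if (filled >= target && current < n_parts - 1)
        {
          current++;
          filled = 0;
        }
      part[i] = current;
      filled += size[i];
    }
}

/* Count the edges whose endpoints land in different partitions.  */
int
toy_count_crossing_edges (const struct toy_edge *edges, int n_edges,
                          const int *part)
{
  int crossing = 0;
  for (int i = 0; i < n_edges; i++)
    if (part[edges[i].caller] != part[edges[i].callee])
      crossing++;
  return crossing;
}
```

For a chain of four equally sized functions split into two partitions,
only the call in the middle of the chain crosses a boundary.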
 1.1  mrg
 1.1  mrg
 1.1  mrg @node LTO object file layout
 1.1  mrg @section LTO file sections
 1.1  mrg
 1.1  mrg LTO information is stored in several ELF sections inside object files.
 1.1  mrg Data structures and enum codes for sections are defined in
 1.1  mrg @file{lto-streamer.h}.
 1.1  mrg
 1.1  mrg These sections are emitted from @file{lto-streamer-out.c} and mapped
 1.1  mrg in all at once from @file{lto/lto.c}:@code{lto_file_read}.  The
 1.1  mrg individual functions dealing with the reading/writing of each section
 1.1  mrg are described below.
 1.1  mrg
 1.1  mrg @itemize @bullet
 1.1  mrg @item Command line options (@code{.gnu.lto_.opts})
 1.1  mrg
 1.1  mrg This section contains the command line options used to generate the
 1.1  mrg object files.  This is used at link time to determine the optimization
 1.1  mrg level and other settings when they are not explicitly specified at the
 1.1  mrg linker command line.
 1.1  mrg
 1.1  mrg Currently, GCC does not support combining LTO object files compiled
 1.1  mrg with different sets of command line options into a single binary.
 1.1  mrg At link time, the options given on the command line and the options
 1.1  mrg saved on all the files in a link-time set are applied globally.  No
 1.1  mrg attempt is made at validating the combination of flags (other than the
 1.1  mrg usual validation done by option processing).  This is implemented in
 1.1  mrg @file{lto/lto.c}:@code{lto_read_all_file_options}.
 1.1  mrg
 1.1  mrg
 1.1  mrg @item Symbol table (@code{.gnu.lto_.symtab})
 1.1  mrg
 1.1  mrg This table replaces the ELF symbol table for functions and variables
 1.1  mrg represented in the LTO IL.  Symbols used and exported by the optimized
 1.1  mrg assembly code of ``fat'' objects might not match the ones used and
 1.1  mrg exported by the intermediate code.  This table is necessary because
 1.1  mrg the intermediate code is less optimized and thus requires a separate
 1.1  mrg symbol table.
 1.1  mrg
 1.1  mrg Additionally, the binary code in the ``fat'' object will lack a call
 1.1  mrg to a function, since the call was optimized out at compilation time
 1.1  mrg after the intermediate language was streamed out.  In some special
 1.1  mrg cases, the same optimization may not happen during link-time
 1.1  mrg optimization.  This would lead to an undefined symbol if only one
 1.1  mrg symbol table was used.
 1.1  mrg
 1.1  mrg The symbol table is emitted in
 1.1  mrg @file{lto-streamer-out.c}:@code{produce_symtab}.
 1.1  mrg
 1.1  mrg
 1.1  mrg @item Global declarations and types (@code{.gnu.lto_.decls})
 1.1  mrg
 1.1  mrg This section contains an intermediate language dump of all
 1.1  mrg declarations and types required to represent the callgraph, static
 1.1  mrg variables and top-level debug info.
 1.1  mrg
 1.1  mrg The contents of this section are emitted in
 1.1  mrg @file{lto-streamer-out.c}:@code{produce_asm_for_decls}.  Types and
 1.1  mrg symbols are emitted in a topological order that preserves the sharing
 1.1  mrg of pointers when the file is read back in
 1.1  mrg (@file{lto/lto.c}:@code{read_cgraph_and_symbols}).
 1.1  mrg
 1.1  mrg
 1.1  mrg @item The callgraph (@code{.gnu.lto_.cgraph})
 1.1  mrg
 1.1  mrg This section contains the basic data structure used by the GCC
 1.1  mrg inter-procedural optimization infrastructure.  This section stores an
 1.1  mrg annotated multi-graph which represents the functions and call sites as
 1.1  mrg well as the variables, aliases and top-level @code{asm} statements.
 1.1  mrg
 1.1  mrg This section is emitted in
 1.1  mrg @file{lto-streamer-out.c}:@code{output_cgraph} and read in
 1.1  mrg @file{lto-cgraph.c}:@code{input_cgraph}.
 1.1  mrg
 1.1  mrg
 1.1  mrg @item IPA references (@code{.gnu.lto_.refs})
 1.1  mrg
 1.1  mrg This section contains references between functions and static
 1.1  mrg variables.  It is emitted by @file{lto-cgraph.c}:@code{output_refs}
 1.1  mrg and read by @file{lto-cgraph.c}:@code{input_refs}.
 1.1  mrg
 1.1  mrg
 1.1  mrg @item Function bodies (@code{.gnu.lto_.function_body.<name>})
 1.1  mrg
 1.1  mrg This section contains function bodies in the intermediate language
 1.1  mrg representation.  Every function body is in a separate section to allow
 1.1  mrg copying of the section independently to different object files or
 1.1  mrg reading the function on demand.
 1.1  mrg
 1.1  mrg Functions are emitted in
 1.1  mrg @file{lto-streamer-out.c}:@code{output_function} and read in
 1.1  mrg @file{lto-streamer-in.c}:@code{input_function}.
 1.1  mrg
 1.1  mrg
 1.1  mrg @item Static variable initializers (@code{.gnu.lto_.vars})
 1.1  mrg
 1.1  mrg This section contains all the symbols in the global variable pool.  It
 1.1  mrg is emitted by @file{lto-cgraph.c}:@code{output_varpool} and read in
 1.1  mrg @file{lto-cgraph.c}:@code{input_cgraph}.
 1.1  mrg
 1.1  mrg @item Summaries and optimization summaries used by IPA passes
 1.1  mrg (@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
 1.1  mrg @code{pureconst} or @code{reference})
 1.1  mrg
 1.1  mrg These sections are used by IPA passes that need to emit summary
 1.1  mrg information during LTO generation to be read and aggregated at
 1.1  mrg link time.  Each pass is responsible for implementing two pass manager
 1.1  mrg hooks: one for writing the summary and another for reading it in.  The
 1.1  mrg format of these sections is entirely up to each individual pass.  The
 1.1  mrg only requirement is that the writer and reader hooks agree on the
 1.1  mrg format.
 1.1  mrg @end itemize
 1.1  mrg
 1.1  mrg
 1.1  mrg @node IPA
 1.1  mrg @section Using summary information in IPA passes
 1.1  mrg
 1.1  mrg Programs are represented internally as a @emph{callgraph} (a
 1.1  mrg multi-graph where nodes are functions and edges are call sites)
 1.1  mrg and a @emph{varpool} (a list of static and external variables in
 1.1  mrg the program).
 1.1  mrg
 1.1  mrg The inter-procedural optimization is organized as a sequence of
 1.1  mrg individual passes, which operate on the callgraph and the
 1.1  mrg varpool.  To make the implementation of WHOPR possible, every
 1.1  mrg inter-procedural optimization pass is split into several stages
 1.1  mrg that are executed at different times during WHOPR compilation:
 1.1  mrg
 1.1  mrg @itemize @bullet
 1.1  mrg @item LGEN time
 1.1  mrg @enumerate
 1.1  mrg @item @emph{Generate summary} (@code{generate_summary} in
 1.1  mrg @code{struct ipa_opt_pass_d}).  This stage analyzes every function
 1.1  mrg body and variable initializer and stores relevant
 1.1  mrg information into a pass-specific data structure.
 1.1  mrg
 1.1  mrg @item @emph{Write summary} (@code{write_summary} in
 1.1  mrg @code{struct ipa_opt_pass_d}).  This stage writes all the
 1.1  mrg pass-specific information generated by @code{generate_summary}.
 1.1  mrg Summaries go into their own @code{LTO_section_*} sections that
 1.1  mrg have to be declared in @file{lto-streamer.h}:@code{enum
 1.1  mrg lto_section_type}.  A new section is created by calling
 1.1  mrg @code{create_output_block} and data can be written using the
 1.1  mrg @code{lto_output_*} routines.
 1.1  mrg @end enumerate
 1.1  mrg
 1.1  mrg @item WPA time
 1.1  mrg @enumerate
 1.1  mrg @item @emph{Read summary} (@code{read_summary} in
 1.1  mrg @code{struct ipa_opt_pass_d}).  This stage reads all the
 1.1  mrg pass-specific information in exactly the same order that it was
 1.1  mrg written by @code{write_summary}.
 1.1  mrg
 1.1  mrg @item @emph{Execute} (@code{execute} in @code{struct
 1.1  mrg opt_pass}).  This performs inter-procedural propagation.  This
 1.1  mrg must be done without actual access to the individual function
 1.1  mrg bodies or variable initializers.  Typically, this results in a
 1.1  mrg transitive closure operation over the summary information of all
 1.1  mrg the nodes in the callgraph.
 1.1  mrg
 1.1  mrg @item @emph{Write optimization summary}
 1.1  mrg (@code{write_optimization_summary} in @code{struct
 1.1  mrg ipa_opt_pass_d}).  This writes the result of the inter-procedural
 1.1  mrg propagation into the object file.  This can use the same data
 1.1  mrg structures and helper routines used in @code{write_summary}.
 1.1  mrg @end enumerate
 1.1  mrg
 1.1  mrg @item LTRANS time
 1.1  mrg @enumerate
 1.1  mrg @item @emph{Read optimization summary}
 1.1  mrg (@code{read_optimization_summary} in @code{struct
 1.1  mrg ipa_opt_pass_d}).  The counterpart to
 1.1  mrg @code{write_optimization_summary}.  This reads the interprocedural
 1.1  mrg optimization decisions in exactly the same format emitted by
 1.1  mrg @code{write_optimization_summary}.
 1.1  mrg
 1.1  mrg @item @emph{Transform} (@code{function_transform} and
 1.1  mrg @code{variable_transform} in @code{struct ipa_opt_pass_d}).
 1.1  mrg The actual function bodies and variable initializers are updated
 1.1  mrg based on the information passed down from the @emph{Execute} stage.
 1.1  mrg @end enumerate
 1.1  mrg @end itemize
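
The order in which the stages above run can be sketched with a small
model.  This is an illustration under stated assumptions, not GCC code:
@code{struct toy_ipa_pass} is a hypothetical stand-in whose hooks take no
arguments, whereas the real @code{struct ipa_opt_pass_d} uses different
signatures.  The example pass only records which hook ran.

```c
#include <string.h>

/* Hypothetical, simplified model of the hooks named above.  The real
   struct ipa_opt_pass_d uses different signatures.  */
struct toy_ipa_pass
{
  void (*generate_summary) (void);
  void (*write_summary) (void);
  void (*read_summary) (void);
  void (*execute) (void);
  void (*write_optimization_summary) (void);
  void (*read_optimization_summary) (void);
  void (*function_transform) (void);
};

/* LGEN: analyze the local unit and stream the summary out.  */
void
toy_run_lgen (const struct toy_ipa_pass *pass)
{
  pass->generate_summary ();
  pass->write_summary ();
}

/* WPA: read summaries, propagate, stream the decisions out.  */
void
toy_run_wpa (const struct toy_ipa_pass *pass)
{
  pass->read_summary ();
  pass->execute ();
  pass->write_optimization_summary ();
}

/* LTRANS: read the decisions and apply them to function bodies.  */
void
toy_run_ltrans (const struct toy_ipa_pass *pass)
{
  pass->read_optimization_summary ();
  pass->function_transform ();
}

/* Trace of the hooks run so far, one character per hook.  */
char toy_trace[16];

static void
toy_record (char tag)
{
  size_t len = strlen (toy_trace);
  /* The buffer is zero-initialized, so it stays terminated.  */
  if (len + 1 < sizeof toy_trace)
    toy_trace[len] = tag;
}

static void toy_generate (void) { toy_record ('G'); }
static void toy_write (void) { toy_record ('W'); }
static void toy_read (void) { toy_record ('R'); }
static void toy_execute (void) { toy_record ('X'); }
static void toy_write_opt (void) { toy_record ('w'); }
static void toy_read_opt (void) { toy_record ('r'); }
static void toy_transform (void) { toy_record ('T'); }

/* A do-nothing example pass built from the recording hooks.  */
const struct toy_ipa_pass toy_pass = {
  toy_generate, toy_write, toy_read, toy_execute,
  toy_write_opt, toy_read_opt, toy_transform
};
```

Running @code{toy_run_lgen}, @code{toy_run_wpa} and @code{toy_run_ltrans}
in that order on @code{toy_pass} invokes the hooks in the sequence listed
above.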
 1.1  mrg
 1.1  mrg The implementation of the inter-procedural passes is shared
 1.1  mrg between LTO, WHOPR and classic non-LTO compilation.
 1.1  mrg
 1.1  mrg @itemize
 1.1  mrg @item During the traditional file-by-file mode every pass executes its
 1.1  mrg own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
 1.1  mrg stages within the single execution context of the compiler.
 1.1  mrg
 1.1  mrg @item In LTO compilation mode, every pass uses @emph{Generate
 1.1  mrg summary} and @emph{Write summary} stages at compilation time,
 1.1  mrg while the @emph{Read summary}, @emph{Execute}, and
 1.1  mrg @emph{Transform} stages are executed at link time.
 1.1  mrg
 1.1  mrg @item In WHOPR mode all stages are used.
 1.1  mrg @end itemize
 1.1  mrg
 1.1  mrg To simplify development, the GCC pass manager differentiates
1.10  mrg between normal inter-procedural passes (@pxref{Regular IPA passes}),
1.10  mrg small inter-procedural passes (@pxref{Small IPA passes})
1.10  mrg and late inter-procedural passes (@pxref{Late IPA passes}).
1.10  mrg A small or late IPA pass (@code{SIMPLE_IPA_PASS}) does
1.10  mrg everything at once and thus cannot be executed during WPA in
 1.1  mrg WHOPR mode.  It defines only the @emph{Execute} stage and during
 1.1  mrg this stage it accesses and modifies the function bodies.  Such
 1.1  mrg passes are useful for optimization at LGEN or LTRANS time and are
 1.1  mrg used, for example, to implement early optimization before writing
 1.1  mrg object files.  The simple inter-procedural passes can also be used
 1.1  mrg for easier prototyping and development of a new inter-procedural
 1.1  mrg pass.
 1.1  mrg
 1.1  mrg
 1.1  mrg @subsection Virtual clones
 1.1  mrg
 1.1  mrg One of the main challenges of introducing the WHOPR compilation
 1.1  mrg mode was addressing the interactions between optimization passes.
 1.1  mrg In LTO compilation mode, the passes are executed in a sequence,
 1.1  mrg each of which consists of analysis (or @emph{Generate summary}),
 1.1  mrg propagation (or @emph{Execute}) and @emph{Transform} stages.
 1.1  mrg Once the work of one pass is finished, the next pass sees the
 1.1  mrg updated program representation and can execute.  This makes the
 1.1  mrg individual passes dependent on each other.
 1.1  mrg
 1.1  mrg In WHOPR mode all passes first execute their @emph{Generate
 1.1  mrg summary} stage.  Then summary writing marks the end of the LGEN
 1.1  mrg stage.  At WPA time,
 1.1  mrg the summaries are read back into memory and all passes run the
 1.1  mrg @emph{Execute} stage.  Optimization summaries are streamed and
 1.1  mrg sent to LTRANS, where all the passes execute the @emph{Transform}
 1.1  mrg stage.
 1.1  mrg
 1.1  mrg Most optimization passes split naturally into analysis,
 1.1  mrg propagation and transformation stages.  But some do not.  The
 1.1  mrg main problem arises when one pass performs changes and the
 1.1  mrg following pass gets confused by seeing different callgraphs
 1.1  mrg between the @emph{Transform} stage and the @emph{Generate summary}
 1.1  mrg or @emph{Execute} stage.  This means that the passes are required
 1.1  mrg to communicate their decisions with each other.
 1.1  mrg
 1.1  mrg To facilitate this communication, the GCC callgraph
 1.1  mrg infrastructure implements @emph{virtual clones}, a method of
 1.1  mrg representing the changes performed by the optimization passes in
 1.1  mrg the callgraph without needing to update function bodies.
 1.1  mrg
 1.1  mrg A @emph{virtual clone} in the callgraph is a function that has no
 1.1  mrg associated body, just a description of how to create its body based
 1.1  mrg on a different function (which itself may be a virtual clone).
 1.1  mrg
 1.1  mrg The description of function modifications includes adjustments to
 1.1  mrg the function's signature (which allows, for example, removing or
 1.1  mrg adding function arguments), substitutions to perform on the
 1.1  mrg function body, and, for inlined functions, a pointer to the
 1.1  mrg function that it will be inlined into.
 1.1  mrg
 1.1  mrg It is also possible to redirect any edge of the callgraph from a
 1.1  mrg function to its virtual clone.  This implies updating of the call
 1.1  mrg site to adjust for the new function signature.
 1.1  mrg
 1.1  mrg Most of the transformations performed by inter-procedural
 1.1  mrg optimizations can be represented via virtual clones.  For
 1.1  mrg instance, a constant propagation pass can produce a virtual clone
 1.1  mrg of the function which replaces one of its arguments by a
 1.1  mrg constant.  The inliner can represent its decisions by producing a
 1.1  mrg clone of a function whose body will be later integrated into
 1.1  mrg a given function.
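
As a source-level illustration of the constant propagation case, the
following sketch shows a function and what a materialized clone of it
could be equivalent to.  The function names are hypothetical; GCC chooses
its own names for clones and performs this rewriting on its internal
representation, not on source code.

```c
/* Original function as written in the source.  */
int
scale (int value, int factor)
{
  return value * factor;
}

/* What a materialized clone could be equivalent to, assuming the
   propagation found callers that always pass factor == 4.  The
   signature was adjusted to drop the now-constant argument.  */
int
scale_factor_is_4 (int value)
{
  return value * 4;
}

/* A call site before redirection.  */
int
caller_before (int v)
{
  return scale (v, 4);
}

/* The same call site after its edge was redirected to the clone.  */
int
caller_after (int v)
{
  return scale_factor_is_4 (v);
}
```

Both callers compute the same result; only the redirected call site had
to be updated for the new signature.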
 1.1  mrg
 1.1  mrg Using @emph{virtual clones}, the program can be easily updated
 1.1  mrg during the @emph{Execute} stage, solving most of the pass interaction
 1.1  mrg problems that would otherwise occur during @emph{Transform}.
 1.1  mrg
 1.1  mrg Virtual clones are later materialized in the LTRANS stage and
 1.1  mrg turned into real functions.  Passes executed after the virtual
 1.1  mrg clones were introduced also perform their @emph{Transform} stage
 1.1  mrg on new functions, so for a pass there is no significant
 1.1  mrg difference between operating on a real function or a virtual
 1.1  mrg clone introduced before its @emph{Execute} stage.
 1.1  mrg
 1.1  mrg Optimization passes then work on virtual clones introduced before
 1.1  mrg their @emph{Execute} stage as if they were real functions.  The
 1.1  mrg only difference is that clones are not visible during the
 1.1  mrg @emph{Generate summary} stage.
 1.1  mrg
 1.1  mrg To keep function summaries updated, the callgraph interface
 1.1  mrg allows an optimizer to register a callback that is called every
 1.1  mrg time a new clone is introduced as well as when the actual
 1.1  mrg function or variable is generated or when a function or variable
 1.1  mrg is removed.  These hooks are registered in the @emph{Generate
 1.1  mrg summary} stage and allow the pass to keep its information intact
 1.1  mrg until the @emph{Execute} stage.  The same hooks can also be
 1.1  mrg registered during the @emph{Execute} stage to keep the
 1.1  mrg optimization summaries updated for the @emph{Transform} stage.
 1.1  mrg
 1.1  mrg @subsection IPA references
 1.1  mrg
 1.1  mrg GCC represents IPA references in the callgraph.  For a function
 1.1  mrg or variable @code{A}, the @emph{IPA reference} is a list of all
 1.1  mrg locations where the address of @code{A} is taken and, when
 1.1  mrg @code{A} is a variable, a list of all direct stores and reads
 1.1  mrg to/from @code{A}.  References represent a directed multi-graph on
 1.1  mrg the union of nodes of the callgraph and the varpool.  See
 1.1  mrg @file{ipa-reference.c}:@code{ipa_reference_write_optimization_summary}
 1.1  mrg and
 1.1  mrg @file{ipa-reference.c}:@code{ipa_reference_read_optimization_summary}
 1.1  mrg for details.
 1.1  mrg
 1.1  mrg @subsection Jump functions
 1.1  mrg Suppose that an optimization pass sees a function @code{A} and it
 1.1  mrg knows the values of (some of) its arguments.  The @emph{jump
 1.1  mrg function} describes the value of a parameter of a given function
 1.1  mrg call in function @code{A} based on this knowledge.
 1.1  mrg
 1.1  mrg Jump functions are used by several optimizations, such as the
 1.1  mrg inter-procedural constant propagation pass and the
 1.1  mrg devirtualization pass.  The inliner also uses jump functions to
 1.1  mrg perform inlining of callbacks.
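
A toy model may make the idea more concrete.  Suppose @code{A} is
@code{void A (int x) @{ B (x, 42); @}}: the first argument of the call to
@code{B} is the parameter @code{x} passed through unchanged, and the
second is a constant.  The sketch below encodes these two cases plus an
unknown case.  All names are hypothetical and much simpler than the
structures GCC actually uses.

```c
/* Kinds of toy jump function.  */
enum toy_jf_kind
{
  TOY_JF_UNKNOWN,       /* Nothing is known about the argument.  */
  TOY_JF_CONST,         /* The argument is a compile-time constant.  */
  TOY_JF_PASS_THROUGH   /* The argument is a parameter of A, unchanged.  */
};

struct toy_jump_func
{
  enum toy_jf_kind kind;
  int constant;       /* Used by TOY_JF_CONST.  */
  int formal_index;   /* Used by TOY_JF_PASS_THROUGH: parameter of A.  */
};

/* What is known about one value; KNOWN is zero when nothing is known.  */
struct toy_value
{
  int known;
  int value;
};

/* Evaluate JF given what is known about the parameters of A.  */
struct toy_value
toy_eval_jump_func (const struct toy_jump_func *jf,
                    const struct toy_value *caller_args, int n_caller_args)
{
  struct toy_value result = { 0, 0 };

  switch (jf->kind)
    {
    case TOY_JF_CONST:
      result.known = 1;
      result.value = jf->constant;
      break;
    case TOY_JF_PASS_THROUGH:
      if (jf->formal_index >= 0 && jf->formal_index < n_caller_args)
        result = caller_args[jf->formal_index];
      break;
    default:
      break;
    }
  return result;
}
```

If a pass knows that @code{A} is called with @code{x} equal to 7, the
pass-through jump function yields 7 for the first argument of @code{B}
and the constant jump function yields 42 for the second.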
 1.1  mrg
 1.1  mrg @node WHOPR
 1.1  mrg @section Whole program assumptions, linker plugin and symbol visibilities
 1.1  mrg
 1.1  mrg Link-time optimization gives relatively minor benefits when used
 1.1  mrg alone.  The problem is that propagation of inter-procedural
 1.1  mrg information does not work well across functions and variables
 1.1  mrg that are called or referenced by other compilation units (such as
 1.1  mrg from a dynamically linked library).  We say that such functions
 1.1  mrg and variables are @emph{externally visible}.
 1.1  mrg
 1.1  mrg To make the situation even more difficult, many applications
 1.1  mrg organize themselves as a set of shared libraries, and the default
 1.1  mrg ELF visibility rules allow one to overwrite any externally
 1.1  mrg visible symbol with a different symbol at runtime.  This
 1.1  mrg basically disables any optimizations across such functions and
 1.1  mrg variables, because the compiler cannot be sure that the function
 1.1  mrg body it is seeing is the same function body that will be used at
 1.1  mrg runtime.  Any function or variable not declared @code{static} in
 1.1  mrg the sources degrades the quality of inter-procedural
 1.1  mrg optimization.
 1.1  mrg
 1.1  mrg To avoid this problem the compiler must assume that it sees the
 1.1  mrg whole program when doing link-time optimization.  Strictly
 1.1  mrg speaking, the whole program is rarely visible even at link-time.
 1.1  mrg Standard system libraries are usually linked dynamically or not
 1.1  mrg provided with the link-time information.  In GCC, the whole
 1.1  mrg program option (@option{-fwhole-program}) asserts that every
 1.1  mrg function and variable defined in the current compilation
 1.1  mrg unit is static, except for function @code{main} (note: at
 1.1  mrg link time, the current unit is the union of all objects compiled
 1.1  mrg with LTO).  Since some functions and variables need to
 1.1  mrg be referenced externally, for example by another DSO or from an
 1.1  mrg assembler file, GCC also provides the function and variable
 1.1  mrg attribute @code{externally_visible} which can be used to disable
 1.1  mrg the effect of @option{-fwhole-program} on a specific symbol.
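
A minimal sketch of the attribute in use follows.  The function names are
hypothetical examples; only the @code{externally_visible} attribute
itself comes from GCC.

```c
/* Under the whole program assumption GCC may treat this function as if
   it were static, since it is only called from within the unit.  */
int
helper (int x)
{
  return x + 1;
}

/* Assumed to be referenced from outside the unit (for example from an
   assembler file or another DSO), so it is marked to stay visible.  */
__attribute__ ((externally_visible)) int
plugin_entry_point (int x)
{
  return helper (x) * 2;
}
```

The attribute changes only how the symbol is treated by the optimizer;
the behavior of the function is the same with or without it.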
 1.1  mrg
 1.1  mrg The whole program mode assumptions are slightly more complex in
 1.1  mrg C++, where inline functions in headers are put into @emph{COMDAT}
 1.1  mrg sections.  COMDAT functions and variables can be defined by
 1.1  mrg multiple object files and their bodies are unified at link-time
 1.1  mrg and dynamic link-time.  COMDAT functions are changed to local only
 1.1  mrg when their address is not taken and thus un-sharing them with a
 1.1  mrg library is not harmful.  COMDAT variables always remain externally
 1.1  mrg visible; however, for read-only variables it is assumed that their
 1.1  mrg initializers cannot be overwritten by a different value.
 1.1  mrg
 1.1  mrg GCC provides the function and variable attribute
 1.1  mrg @code{visibility} that can be used to specify the visibility of
 1.1  mrg externally visible symbols (or alternatively the
 1.1  mrg @option{-fvisibility} command line option).  ELF defines
 1.1  mrg the @code{default}, @code{protected}, @code{hidden} and
 1.1  mrg @code{internal} visibilities.
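
As a minimal sketch, the attribute can be attached to individual
definitions as shown below.  The function names are hypothetical and
assume the code is being built into a shared library.

```c
/* Intended for use only inside the shared library being built.  */
__attribute__ ((visibility ("hidden"))) int
internal_helper (int x)
{
  return x * 3;
}

/* Part of the library's exported interface.  */
__attribute__ ((visibility ("default"))) int
library_api (int x)
{
  return internal_helper (x) + 1;
}
```

Within the library both functions can be called normally; the attribute
affects only which symbols other modules can bind to.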
 1.1  mrg
 1.1  mrg The most commonly used visibility is @code{hidden}.  It
 1.1  mrg specifies that the symbol cannot be referenced from outside of
 1.1  mrg the current shared library.  Unfortunately, this information
 1.1  mrg cannot be used directly by the link-time optimization in the
 1.1  mrg compiler since the whole shared library also might contain
 1.1  mrg non-LTO objects and those are not visible to the compiler.
 1.1  mrg
 1.1  mrg GCC solves this problem using linker plugins.  A @emph{linker
 1.1  mrg plugin} is an interface to the linker that allows an external
 1.1  mrg program to claim the ownership of a given object file.  The linker
 1.1  mrg then performs the linking procedure by querying the plugin about
 1.1  mrg the symbol table of the claimed objects and once the linking
 1.1  mrg decisions are complete, the plugin is allowed to provide the
 1.1  mrg final object file before the actual linking is made.  The linker
 1.1  mrg plugin obtains the symbol resolution information which specifies
 1.1  mrg which symbols provided by the claimed objects are bound from the
 1.1  mrg rest of a binary being linked.
 1.1  mrg
 1.1  mrg GCC is designed to be independent of the rest of the toolchain
 1.1  mrg and aims to support linkers without plugin support.  For this
 1.1  mrg reason it does not use the linker plugin by default.  Instead,
 1.1  mrg the object files are examined by @command{collect2} before being
 1.1  mrg passed to the linker and objects found to have LTO sections are
 1.1  mrg passed to @command{lto1} first.  This mode does not work for
 1.1  mrg library archives.  The decision on what object files from the
 1.1  mrg archive are needed depends on the actual linking and thus GCC
 1.1  mrg would have to implement the linker itself.  The resolution
 1.1  mrg information is missing too and thus GCC needs to make an educated
 1.1  mrg guess based on @option{-fwhole-program}.  Without the linker
 1.1  mrg plugin GCC also assumes that symbols are declared @code{hidden}
 1.1  mrg and not referred to by non-LTO code by default.
 1.1  mrg
 1.1  mrg @node Internal flags
 1.1  mrg @section Internal flags controlling @code{lto1}
 1.1  mrg
 1.1  mrg The following flags are passed into @command{lto1} and are not
 1.1  mrg meant to be used directly from the command line.
 1.1  mrg
 1.1  mrg @itemize
 1.1  mrg @item -fwpa
 1.1  mrg @opindex fwpa
 1.1  mrg This option runs the serial part of the link-time optimizer
 1.1  mrg performing the inter-procedural propagation (WPA mode).  The
 1.1  mrg compiler reads in summary information from all inputs and
 1.1  mrg performs an analysis based on summary information only.  It
 1.1  mrg generates object files for subsequent runs of the link-time
 1.1  mrg optimizer where individual object files are optimized using both
 1.1  mrg summary information from the WPA mode and the actual function
 1.1  mrg bodies.  It then drives the LTRANS phase.
 1.1  mrg
 1.1  mrg @item -fltrans
 1.1  mrg @opindex fltrans
 1.1  mrg This option runs the link-time optimizer in the
 1.1  mrg local-transformation (LTRANS) mode, which reads in output from a
 1.1  mrg previous run of the LTO in WPA mode.  In the LTRANS mode, LTO
 1.1  mrg optimizes an object and produces the final assembly.
 1.1  mrg
 1.1  mrg @item -fltrans-output-list=@var{file}
 1.1  mrg @opindex fltrans-output-list
 1.1  mrg This option specifies a file to which the names of LTRANS output
 1.1  mrg files are written.  This option is only meaningful in conjunction
 1.1  mrg with @option{-fwpa}.
 1.3  mrg
 1.3  mrg @item -fresolution=@var{file}
 1.3  mrg @opindex fresolution
 1.3  mrg This option specifies the linker resolution file.  This option is
 1.3  mrg only meaningful in conjunction with @option{-fwpa} and as an option
 1.3  mrg to pass through to the LTO linker plugin.
 1.1  mrg @end itemize