Home | History | Annotate | Line # | Download | only in doc
lto.texi revision 1.10
      1 @c Copyright (C) 2010-2020 Free Software Foundation, Inc.
      2 @c This is part of the GCC manual.
      3 @c For copying conditions, see the file gcc.texi.
      4 @c Contributed by Jan Hubicka <jh (a] suse.cz> and
      5 @c Diego Novillo <dnovillo (a] google.com>
      6 
      7 @node LTO
      8 @chapter Link Time Optimization
      9 @cindex lto
     10 @cindex whopr
     11 @cindex wpa
     12 @cindex ltrans
     13 
     14 Link Time Optimization (LTO) gives GCC the capability of
     15 dumping its internal representation (GIMPLE) to disk,
     16 so that all the different compilation units that make up
     17 a single executable can be optimized as a single module.
     18 This expands the scope of inter-procedural optimizations
     19 to encompass the whole program (or, rather, everything
     20 that is visible at link time).
     21 
     22 @menu
     23 * LTO Overview::            Overview of LTO.
     24 * LTO object file layout::  LTO file sections in ELF.
     25 * IPA::                     Using summary information in IPA passes.
     26 * WHOPR::                   Whole program assumptions,
     27                             linker plugin and symbol visibilities.
     28 * Internal flags::          Internal flags controlling @code{lto1}.
     29 @end menu
     30 
     31 @node LTO Overview
     32 @section Design Overview
     33 
     34 Link time optimization is implemented as a GCC front end for a
     35 bytecode representation of GIMPLE that is emitted in special sections
     36 of @code{.o} files.  Currently, LTO support is enabled in most
     37 ELF-based systems, as well as darwin, cygwin and mingw systems.
     38 
     39 Since GIMPLE bytecode is saved alongside final object code, object
     40 files generated with LTO support are larger than regular object files.
     41 This ``fat'' object format makes it easy to integrate LTO into
     42 existing build systems, as one can, for instance, produce archives of
     43 the files.  Additionally, one might be able to ship one set of fat
     44 objects which could be used both for development and the production of
     45 optimized builds.  A, perhaps surprising, side effect of this feature
     46 is that any mistake in the toolchain leads to LTO information not
     47 being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
     48 This is both an advantage, as the system is more robust, and a
     49 disadvantage, as the user is not informed that the optimization has
     50 been disabled.
     51 
     52 The current implementation only produces ``fat'' objects, effectively
     53 doubling compilation time and increasing file sizes up to 5x the
     54 original size.  This hides the problem that some tools, such as
     55 @code{ar} and @code{nm}, need to understand symbol tables of LTO
     56 sections.  These tools were extended to use the plugin infrastructure,
     57 and with these problems solved, GCC will also support ``slim'' objects
     58 consisting of the intermediate code alone.
     59 
     60 At the highest level, LTO splits the compiler in two.  The first half
     61 (the ``writer'') produces a streaming representation of all the
     62 internal data structures needed to optimize and generate code.  This
     63 includes declarations, types, the callgraph and the GIMPLE representation
     64 of function bodies.
     65 
     66 When @option{-flto} is given during compilation of a source file, the
     67 pass manager executes all the passes in @code{all_lto_gen_passes}.
     68 Currently, this phase is composed of two IPA passes:
     69 
     70 @itemize @bullet
     71 @item @code{pass_ipa_lto_gimple_out}
     72 This pass executes the function @code{lto_output} in
     73 @file{lto-streamer-out.c}, which traverses the call graph encoding
     74 every reachable declaration, type and function.  This generates a
     75 memory representation of all the file sections described below.
     76 
     77 @item @code{pass_ipa_lto_finish_out}
     78 This pass executes the function @code{produce_asm_for_decls} in
     79 @file{lto-streamer-out.c}, which takes the memory image built in the
     80 previous pass and encodes it in the corresponding ELF file sections.
     81 @end itemize
     82 
     83 The second half of LTO support is the ``reader''.  This is implemented
     84 as the GCC front end @file{lto1} in @file{lto/lto.c}.  When
     85 @file{collect2} detects a link set of @code{.o}/@code{.a} files with
     86 LTO information and the @option{-flto} is enabled, it invokes
     87 @file{lto1} which reads the set of files and aggregates them into a
     88 single translation unit for optimization.  The main entry point for
     89 the reader is @file{lto/lto.c}:@code{lto_main}.
     90 
     91 @subsection LTO modes of operation
     92 
     93 One of the main goals of the GCC link-time infrastructure was to allow
     94 effective compilation of large programs.  For this reason GCC implements two
     95 link-time compilation modes.
     96 
     97 @enumerate
     98 @item	@emph{LTO mode}, in which the whole program is read into the
     99 compiler at link-time and optimized in a similar way as if it
    100 were a single source-level compilation unit.
    101 
    102 @item	@emph{WHOPR or partitioned mode}, designed to utilize multiple
    103 CPUs and/or a distributed compilation environment to quickly link
    104 large applications.  WHOPR stands for WHOle Program optimizeR (not to
    105 be confused with the semantics of @option{-fwhole-program}).  It
    106 partitions the aggregated callgraph from many different @code{.o}
    107 files and distributes the compilation of the sub-graphs to different
    108 CPUs.
    109 
    110 Note that distributed compilation is not implemented yet, but since
    111 the parallelism is facilitated via generating a @code{Makefile}, it
    112 would be easy to implement.
    113 @end enumerate
    114 
    115 WHOPR splits LTO into three main stages:
    116 @enumerate
    117 @item Local generation (LGEN)
    118 This stage executes in parallel.  Every file in the program is compiled
    119 into the intermediate language and packaged together with the local
    120 call-graph and summary information.  This stage is the same for both
    121 the LTO and WHOPR compilation mode.
    122 
    123 @item Whole Program Analysis (WPA)
    124 WPA is performed sequentially.  The global call-graph is generated, and
    125 a global analysis procedure makes transformation decisions.  The global
    126 call-graph is partitioned to facilitate parallel optimization during
    127 phase 3.  The results of the WPA stage are stored into new object files
    128 which contain the partitions of program expressed in the intermediate
    129 language and the optimization decisions.
    130 
    131 @item Local transformations (LTRANS)
    132 This stage executes in parallel.  All the decisions made during phase 2
    133 are implemented locally in each partitioned object file, and the final
    134 object code is generated.  Optimizations which cannot be decided
    135 efficiently during the phase 2 may be performed on the local
    136 call-graph partitions.
    137 @end enumerate
    138 
    139 WHOPR can be seen as an extension of the usual LTO mode of
    140 compilation.  In LTO, WPA and LTRANS are executed within a single
    141 execution of the compiler, after the whole program has been read into
    142 memory.
    143 
    144 When compiling in WHOPR mode, the callgraph is partitioned during
    145 the WPA stage.  The whole program is split into a given number of
    146 partitions of roughly the same size.  The compiler tries to
    147 minimize the number of references which cross partition boundaries.
    148 The main advantage of WHOPR is to allow the parallel execution of
    149 LTRANS stages, which are the most time-consuming part of the
    150 compilation process.  Additionally, it avoids the need to load the
    151 whole program into memory.
    152 
    153 
    154 @node LTO object file layout
    155 @section LTO file sections
    156 
    157 LTO information is stored in several ELF sections inside object files.
    158 Data structures and enum codes for sections are defined in
    159 @file{lto-streamer.h}.
    160 
    161 These sections are emitted from @file{lto-streamer-out.c} and mapped
    162 in all at once from @file{lto/lto.c}:@code{lto_file_read}.  The
    163 individual functions dealing with the reading/writing of each section
    164 are described below.
    165 
    166 @itemize @bullet
    167 @item Command line options (@code{.gnu.lto_.opts})
    168 
    169 This section contains the command line options used to generate the
    170 object files.  This is used at link time to determine the optimization
    171 level and other settings when they are not explicitly specified at the
    172 linker command line.
    173 
    174 Currently, GCC does not support combining LTO object files compiled
    175 with different set of the command line options into a single binary.
    176 At link time, the options given on the command line and the options
    177 saved on all the files in a link-time set are applied globally.  No
    178 attempt is made at validating the combination of flags (other than the
    179 usual validation done by option processing).  This is implemented in
    180 @file{lto/lto.c}:@code{lto_read_all_file_options}.
    181 
    182 
    183 @item Symbol table (@code{.gnu.lto_.symtab})
    184 
    185 This table replaces the ELF symbol table for functions and variables
    186 represented in the LTO IL.  Symbols used and exported by the optimized
    187 assembly code of ``fat'' objects might not match the ones used and
    188 exported by the intermediate code.  This table is necessary because
    189 the intermediate code is less optimized and thus requires a separate
    190 symbol table.
    191 
    192 Additionally, the binary code in the ``fat'' object will lack a call
    193 to a function, since the call was optimized out at compilation time
    194 after the intermediate language was streamed out.  In some special
    195 cases, the same optimization may not happen during link-time
    196 optimization.  This would lead to an undefined symbol if only one
    197 symbol table was used.
    198 
    199 The symbol table is emitted in
    200 @file{lto-streamer-out.c}:@code{produce_symtab}.
    201 
    202 
    203 @item Global declarations and types (@code{.gnu.lto_.decls})
    204 
    205 This section contains an intermediate language dump of all
    206 declarations and types required to represent the callgraph, static
    207 variables and top-level debug info.
    208 
    209 The contents of this section are emitted in
    210 @file{lto-streamer-out.c}:@code{produce_asm_for_decls}.  Types and
    211 symbols are emitted in a topological order that preserves the sharing
    212 of pointers when the file is read back in
    213 (@file{lto.c}:@code{read_cgraph_and_symbols}).
    214 
    215 
    216 @item The callgraph (@code{.gnu.lto_.cgraph})
    217 
    218 This section contains the basic data structure used by the GCC
    219 inter-procedural optimization infrastructure.  This section stores an
    220 annotated multi-graph which represents the functions and call sites as
    221 well as the variables, aliases and top-level @code{asm} statements.
    222 
    223 This section is emitted in
    224 @file{lto-streamer-out.c}:@code{output_cgraph} and read in
    225 @file{lto-cgraph.c}:@code{input_cgraph}.
    226 
    227 
    228 @item IPA references (@code{.gnu.lto_.refs})
    229 
    230 This section contains references between function and static
    231 variables.  It is emitted by @file{lto-cgraph.c}:@code{output_refs}
    232 and read by @file{lto-cgraph.c}:@code{input_refs}.
    233 
    234 
    235 @item Function bodies (@code{.gnu.lto_.function_body.<name>})
    236 
    237 This section contains function bodies in the intermediate language
    238 representation.  Every function body is in a separate section to allow
    239 copying of the section independently to different object files or
    240 reading the function on demand.
    241 
    242 Functions are emitted in
    243 @file{lto-streamer-out.c}:@code{output_function} and read in
    244 @file{lto-streamer-in.c}:@code{input_function}.
    245 
    246 
    247 @item Static variable initializers (@code{.gnu.lto_.vars})
    248 
    249 This section contains all the symbols in the global variable pool.  It
    250 is emitted by @file{lto-cgraph.c}:@code{output_varpool} and read in
    251 @file{lto-cgraph.c}:@code{input_cgraph}.
    252 
    253 @item Summaries and optimization summaries used by IPA passes
    254 (@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
    255 @code{pureconst} or @code{reference})
    256 
    257 These sections are used by IPA passes that need to emit summary
    258 information during LTO generation to be read and aggregated at
    259 link time.  Each pass is responsible for implementing two pass manager
    260 hooks: one for writing the summary and another for reading it in.  The
    261 format of these sections is entirely up to each individual pass.  The
    262 only requirement is that the writer and reader hooks agree on the
    263 format.
    264 @end itemize
    265 
    266 
    267 @node IPA
    268 @section Using summary information in IPA passes
    269 
    270 Programs are represented internally as a @emph{callgraph} (a
    271 multi-graph where nodes are functions and edges are call sites)
    272 and a @emph{varpool} (a list of static and external variables in
    273 the program).
    274 
    275 The inter-procedural optimization is organized as a sequence of
    276 individual passes, which operate on the callgraph and the
    277 varpool.  To make the implementation of WHOPR possible, every
    278 inter-procedural optimization pass is split into several stages
    279 that are executed at different times during WHOPR compilation:
    280 
    281 @itemize @bullet
    282 @item LGEN time
    283 @enumerate
    284 @item @emph{Generate summary} (@code{generate_summary} in
    285 @code{struct ipa_opt_pass_d}).  This stage analyzes every function
    286 body and variable initializer is examined and stores relevant
    287 information into a pass-specific data structure.
    288 
    289 @item @emph{Write summary} (@code{write_summary} in
    290 @code{struct ipa_opt_pass_d}).  This stage writes all the
    291 pass-specific information generated by @code{generate_summary}.
    292 Summaries go into their own @code{LTO_section_*} sections that
    293 have to be declared in @file{lto-streamer.h}:@code{enum
    294 lto_section_type}.  A new section is created by calling
    295 @code{create_output_block} and data can be written using the
    296 @code{lto_output_*} routines.
    297 @end enumerate
    298 
    299 @item WPA time
    300 @enumerate
    301 @item @emph{Read summary} (@code{read_summary} in
    302 @code{struct ipa_opt_pass_d}).  This stage reads all the
    303 pass-specific information in exactly the same order that it was
    304 written by @code{write_summary}.
    305 
    306 @item @emph{Execute} (@code{execute} in @code{struct
    307 opt_pass}).  This performs inter-procedural propagation.  This
    308 must be done without actual access to the individual function
    309 bodies or variable initializers.  Typically, this results in a
    310 transitive closure operation over the summary information of all
    311 the nodes in the callgraph.
    312 
    313 @item @emph{Write optimization summary}
    314 (@code{write_optimization_summary} in @code{struct
    315 ipa_opt_pass_d}).  This writes the result of the inter-procedural
    316 propagation into the object file.  This can use the same data
    317 structures and helper routines used in @code{write_summary}.
    318 @end enumerate
    319 
    320 @item LTRANS time
    321 @enumerate
    322 @item @emph{Read optimization summary}
    323 (@code{read_optimization_summary} in @code{struct
    324 ipa_opt_pass_d}).  The counterpart to
    325 @code{write_optimization_summary}.  This reads the interprocedural
    326 optimization decisions in exactly the same format emitted by
    327 @code{write_optimization_summary}.
    328 
    329 @item @emph{Transform} (@code{function_transform} and
    330 @code{variable_transform} in @code{struct ipa_opt_pass_d}).
    331 The actual function bodies and variable initializers are updated
    332 based on the information passed down from the @emph{Execute} stage.
    333 @end enumerate
    334 @end itemize
    335 
    336 The implementation of the inter-procedural passes are shared
    337 between LTO, WHOPR and classic non-LTO compilation.
    338 
    339 @itemize
    340 @item During the traditional file-by-file mode every pass executes its
    341 own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
    342 stages within the single execution context of the compiler.
    343 
    344 @item In LTO compilation mode, every pass uses @emph{Generate
    345 summary} and @emph{Write summary} stages at compilation time,
    346 while the @emph{Read summary}, @emph{Execute}, and
    347 @emph{Transform} stages are executed at link time.
    348 
    349 @item In WHOPR mode all stages are used.
    350 @end itemize
    351 
    352 To simplify development, the GCC pass manager differentiates
    353 between normal inter-procedural passes (@pxref{Regular IPA passes}),
    354 small inter-procedural passes (@pxref{Small IPA passes})
    355 and late inter-procedural passes (@pxref{Late IPA passes}).
    356 A small or late IPA pass (@code{SIMPLE_IPA_PASS}) does
    357 everything at once and thus cannot be executed during WPA in
    358 WHOPR mode.  It defines only the @emph{Execute} stage and during
    359 this stage it accesses and modifies the function bodies.  Such
    360 passes are useful for optimization at LGEN or LTRANS time and are
    361 used, for example, to implement early optimization before writing
    362 object files.  The simple inter-procedural passes can also be used
    363 for easier prototyping and development of a new inter-procedural
    364 pass.
    365 
    366 
    367 @subsection Virtual clones
    368 
    369 One of the main challenges of introducing the WHOPR compilation
    370 mode was addressing the interactions between optimization passes.
    371 In LTO compilation mode, the passes are executed in a sequence,
    372 each of which consists of analysis (or @emph{Generate summary}),
    373 propagation (or @emph{Execute}) and @emph{Transform} stages.
    374 Once the work of one pass is finished, the next pass sees the
    375 updated program representation and can execute.  This makes the
    376 individual passes dependent on each other.
    377 
    378 In WHOPR mode all passes first execute their @emph{Generate
    379 summary} stage.  Then summary writing marks the end of the LGEN
    380 stage.  At WPA time,
    381 the summaries are read back into memory and all passes run the
    382 @emph{Execute} stage.  Optimization summaries are streamed and
    383 sent to LTRANS, where all the passes execute the @emph{Transform}
    384 stage.
    385 
    386 Most optimization passes split naturally into analysis,
    387 propagation and transformation stages.  But some do not.  The
    388 main problem arises when one pass performs changes and the
    389 following pass gets confused by seeing different callgraphs
    390 between the @emph{Transform} stage and the @emph{Generate summary}
    391 or @emph{Execute} stage.  This means that the passes are required
    392 to communicate their decisions with each other.
    393 
    394 To facilitate this communication, the GCC callgraph
    395 infrastructure implements @emph{virtual clones}, a method of
    396 representing the changes performed by the optimization passes in
    397 the callgraph without needing to update function bodies.
    398 
    399 A @emph{virtual clone} in the callgraph is a function that has no
    400 associated body, just a description of how to create its body based
    401 on a different function (which itself may be a virtual clone).
    402 
    403 The description of function modifications includes adjustments to
    404 the function's signature (which allows, for example, removing or
    405 adding function arguments), substitutions to perform on the
    406 function body, and, for inlined functions, a pointer to the
    407 function that it will be inlined into.
    408 
    409 It is also possible to redirect any edge of the callgraph from a
    410 function to its virtual clone.  This implies updating of the call
    411 site to adjust for the new function signature.
    412 
    413 Most of the transformations performed by inter-procedural
    414 optimizations can be represented via virtual clones.  For
    415 instance, a constant propagation pass can produce a virtual clone
    416 of the function which replaces one of its arguments by a
    417 constant.  The inliner can represent its decisions by producing a
    418 clone of a function whose body will be later integrated into
    419 a given function.
    420 
    421 Using @emph{virtual clones}, the program can be easily updated
    422 during the @emph{Execute} stage, solving most of pass interactions
    423 problems that would otherwise occur during @emph{Transform}.
    424 
    425 Virtual clones are later materialized in the LTRANS stage and
    426 turned into real functions.  Passes executed after the virtual
    427 clone were introduced also perform their @emph{Transform} stage
    428 on new functions, so for a pass there is no significant
    429 difference between operating on a real function or a virtual
    430 clone introduced before its @emph{Execute} stage.
    431 
    432 Optimization passes then work on virtual clones introduced before
    433 their @emph{Execute} stage as if they were real functions.  The
    434 only difference is that clones are not visible during the
    435 @emph{Generate Summary} stage.
    436 
    437 To keep function summaries updated, the callgraph interface
    438 allows an optimizer to register a callback that is called every
    439 time a new clone is introduced as well as when the actual
    440 function or variable is generated or when a function or variable
    441 is removed.  These hooks are registered in the @emph{Generate
    442 summary} stage and allow the pass to keep its information intact
    443 until the @emph{Execute} stage.  The same hooks can also be
    444 registered during the @emph{Execute} stage to keep the
    445 optimization summaries updated for the @emph{Transform} stage.
    446 
    447 @subsection IPA references
    448 
    449 GCC represents IPA references in the callgraph.  For a function
    450 or variable @code{A}, the @emph{IPA reference} is a list of all
    451 locations where the address of @code{A} is taken and, when
    452 @code{A} is a variable, a list of all direct stores and reads
    453 to/from @code{A}.  References represent an oriented multi-graph on
    454 the union of nodes of the callgraph and the varpool.  See
    455 @file{ipa-reference.c}:@code{ipa_reference_write_optimization_summary}
    456 and
    457 @file{ipa-reference.c}:@code{ipa_reference_read_optimization_summary}
    458 for details.
    459 
    460 @subsection Jump functions
    461 Suppose that an optimization pass sees a function @code{A} and it
    462 knows the values of (some of) its arguments.  The @emph{jump
    463 function} describes the value of a parameter of a given function
    464 call in function @code{A} based on this knowledge.
    465 
    466 Jump functions are used by several optimizations, such as the
    467 inter-procedural constant propagation pass and the
    468 devirtualization pass.  The inliner also uses jump functions to
    469 perform inlining of callbacks.
    470 
    471 @node WHOPR
    472 @section Whole program assumptions, linker plugin and symbol visibilities
    473 
    474 Link-time optimization gives relatively minor benefits when used
    475 alone.  The problem is that propagation of inter-procedural
    476 information does not work well across functions and variables
    477 that are called or referenced by other compilation units (such as
    478 from a dynamically linked library).  We say that such functions
    479 and variables are @emph{externally visible}.
    480 
    481 To make the situation even more difficult, many applications
    482 organize themselves as a set of shared libraries, and the default
    483 ELF visibility rules allow one to overwrite any externally
    484 visible symbol with a different symbol at runtime.  This
    485 basically disables any optimizations across such functions and
    486 variables, because the compiler cannot be sure that the function
    487 body it is seeing is the same function body that will be used at
    488 runtime.  Any function or variable not declared @code{static} in
    489 the sources degrades the quality of inter-procedural
    490 optimization.
    491 
    492 To avoid this problem the compiler must assume that it sees the
    493 whole program when doing link-time optimization.  Strictly
    494 speaking, the whole program is rarely visible even at link-time.
    495 Standard system libraries are usually linked dynamically or not
    496 provided with the link-time information.  In GCC, the whole
    497 program option (@option{-fwhole-program}) asserts that every
    498 function and variable defined in the current compilation
    499 unit is static, except for function @code{main} (note: at
    500 link time, the current unit is the union of all objects compiled
    501 with LTO).  Since some functions and variables need to
    502 be referenced externally, for example by another DSO or from an
    503 assembler file, GCC also provides the function and variable
    504 attribute @code{externally_visible} which can be used to disable
    505 the effect of @option{-fwhole-program} on a specific symbol.
    506 
    507 The whole program mode assumptions are slightly more complex in
    508 C++, where inline functions in headers are put into @emph{COMDAT}
    509 sections.  COMDAT function and variables can be defined by
    510 multiple object files and their bodies are unified at link-time
    511 and dynamic link-time.  COMDAT functions are changed to local only
    512 when their address is not taken and thus un-sharing them with a
    513 library is not harmful.  COMDAT variables always remain externally
    514 visible, however for readonly variables it is assumed that their
    515 initializers cannot be overwritten by a different value.
    516 
    517 GCC provides the function and variable attribute
    518 @code{visibility} that can be used to specify the visibility of
    519 externally visible symbols (or alternatively an
    520 @option{-fdefault-visibility} command line option).  ELF defines
    521 the @code{default}, @code{protected}, @code{hidden} and
    522 @code{internal} visibilities.
    523 
    524 The most commonly used is visibility is @code{hidden}.  It
    525 specifies that the symbol cannot be referenced from outside of
    526 the current shared library.  Unfortunately, this information
    527 cannot be used directly by the link-time optimization in the
    528 compiler since the whole shared library also might contain
    529 non-LTO objects and those are not visible to the compiler.
    530 
    531 GCC solves this problem using linker plugins.  A @emph{linker
    532 plugin} is an interface to the linker that allows an external
    533 program to claim the ownership of a given object file.  The linker
    534 then performs the linking procedure by querying the plugin about
    535 the symbol table of the claimed objects and once the linking
    536 decisions are complete, the plugin is allowed to provide the
    537 final object file before the actual linking is made.  The linker
    538 plugin obtains the symbol resolution information which specifies
    539 which symbols provided by the claimed objects are bound from the
    540 rest of a binary being linked.
    541 
    542 GCC is designed to be independent of the rest of the toolchain
    543 and aims to support linkers without plugin support.  For this
    544 reason it does not use the linker plugin by default.  Instead,
    545 the object files are examined by @command{collect2} before being
    546 passed to the linker and objects found to have LTO sections are
    547 passed to @command{lto1} first.  This mode does not work for
    548 library archives.  The decision on what object files from the
    549 archive are needed depends on the actual linking and thus GCC
    550 would have to implement the linker itself.  The resolution
    551 information is missing too and thus GCC needs to make an educated
    552 guess based on @option{-fwhole-program}.  Without the linker
    553 plugin GCC also assumes that symbols are declared @code{hidden}
    554 and not referred by non-LTO code by default.
    555 
    556 @node Internal flags
    557 @section Internal flags controlling @code{lto1}
    558 
    559 The following flags are passed into @command{lto1} and are not
    560 meant to be used directly from the command line.
    561 
    562 @itemize
    563 @item -fwpa
    564 @opindex fwpa
    565 This option runs the serial part of the link-time optimizer
    566 performing the inter-procedural propagation (WPA mode).  The
    567 compiler reads in summary information from all inputs and
    568 performs an analysis based on summary information only.  It
    569 generates object files for subsequent runs of the link-time
    570 optimizer where individual object files are optimized using both
    571 summary information from the WPA mode and the actual function
    572 bodies.  It then drives the LTRANS phase.
    573 
    574 @item -fltrans
    575 @opindex fltrans
    576 This option runs the link-time optimizer in the
    577 local-transformation (LTRANS) mode, which reads in output from a
    578 previous run of the LTO in WPA mode.  In the LTRANS mode, LTO
    579 optimizes an object and produces the final assembly.
    580 
    581 @item -fltrans-output-list=@var{file}
    582 @opindex fltrans-output-list
    583 This option specifies a file to which the names of LTRANS output
    584 files are written.  This option is only meaningful in conjunction
    585 with @option{-fwpa}.
    586 
    587 @item -fresolution=@var{file}
    588 @opindex fresolution
    589 This option specifies the linker resolution file.  This option is
    590 only meaningful in conjunction with @option{-fwpa} and as option
    591 to pass through to the LTO linker plugin.
    592 @end itemize
    593