@c Copyright (C) 2010-2022 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.
@c Contributed by Jan Hubicka <jh@suse.cz> and
@c Diego Novillo <dnovillo@google.com>

@node LTO
@chapter Link Time Optimization
@cindex lto
@cindex whopr
@cindex wpa
@cindex ltrans

Link Time Optimization (LTO) gives GCC the capability of
dumping its internal representation (GIMPLE) to disk,
so that all the different compilation units that make up
a single executable can be optimized as a single module.
This expands the scope of inter-procedural optimizations
to encompass the whole program (or, rather, everything
that is visible at link time).

@menu
* LTO Overview::            Overview of LTO.
* LTO object file layout::  LTO file sections in ELF.
* IPA::                     Using summary information in IPA passes.
* WHOPR::                   Whole program assumptions,
                            linker plugin and symbol visibilities.
* Internal flags::          Internal flags controlling @code{lto1}.
@end menu

@node LTO Overview
@section Design Overview

Link time optimization is implemented as a GCC front end for a
bytecode representation of GIMPLE that is emitted in special sections
of @code{.o} files. Currently, LTO support is enabled in most
ELF-based systems, as well as darwin, cygwin and mingw systems.

By default, object files generated with LTO support contain only GIMPLE
bytecode. Such objects are called ``slim'', and they require that
tools like @code{ar} and @code{nm} understand symbol tables of LTO
sections.
For most targets these tools have been extended to use the
plugin infrastructure, so GCC can support ``slim'' objects consisting
of the intermediate code alone.

GIMPLE bytecode can also be saved alongside final object code if
the @option{-ffat-lto-objects} option is passed, or if no plugin support
is detected for @code{ar} and @code{nm} when GCC is configured. This makes
the object files generated with LTO support larger than regular object
files. The ``fat'' object format makes it possible to ship one set of fat
objects that can be used both for development and for the production of
optimized builds. A perhaps surprising side effect of this feature
is that any mistake in the toolchain leads to LTO information not
being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
This is both an advantage, as the system is more robust, and a
disadvantage, as the user is not informed that the optimization has
been disabled.

At the highest level, LTO splits the compiler in two. The first half
(the ``writer'') produces a streaming representation of all the
internal data structures needed to optimize and generate code. This
includes declarations, types, the callgraph and the GIMPLE representation
of function bodies.

When @option{-flto} is given during compilation of a source file, the
pass manager executes all the passes in @code{all_lto_gen_passes}.
Currently, this phase is composed of two IPA passes:

@itemize @bullet
@item @code{pass_ipa_lto_gimple_out}
This pass executes the function @code{lto_output} in
@file{lto-streamer-out.cc}, which traverses the call graph encoding
every reachable declaration, type and function.
This generates an
in-memory representation of all the file sections described below.

@item @code{pass_ipa_lto_finish_out}
This pass executes the function @code{produce_asm_for_decls} in
@file{lto-streamer-out.cc}, which takes the memory image built in the
previous pass and encodes it in the corresponding ELF file sections.
@end itemize

The second half of LTO support is the ``reader''. This is implemented
as the GCC front end @file{lto1} in @file{lto/lto.cc}. When
@file{collect2} detects a link set of @code{.o}/@code{.a} files with
LTO information and @option{-flto} is enabled, it invokes
@file{lto1}, which reads the set of files and aggregates them into a
single translation unit for optimization. The main entry point for
the reader is @file{lto/lto.cc}:@code{lto_main}.

@subsection LTO modes of operation

One of the main goals of the GCC link-time infrastructure was to allow
effective compilation of large programs. For this reason GCC implements two
link-time compilation modes.

@enumerate
@item @emph{LTO mode}, in which the whole program is read into the
compiler at link-time and optimized much as if it
were a single source-level compilation unit.

@item @emph{WHOPR or partitioned mode}, designed to utilize multiple
CPUs and/or a distributed compilation environment to quickly link
large applications. WHOPR stands for WHOle Program optimizeR (not to
be confused with the semantics of @option{-fwhole-program}). It
partitions the aggregated callgraph from many different @code{.o}
files and distributes the compilation of the sub-graphs to different
CPUs.

Note that distributed compilation is not implemented yet, but since
the parallelism is facilitated via generating a @code{Makefile}, it
would be easy to implement.
@end enumerate

WHOPR splits LTO into three main stages:
@enumerate
@item Local generation (LGEN)
This stage executes in parallel. Every file in the program is compiled
into the intermediate language and packaged together with the local
call-graph and summary information. This stage is the same for both
the LTO and WHOPR compilation modes.

@item Whole Program Analysis (WPA)
WPA is performed sequentially. The global call-graph is generated, and
a global analysis procedure makes transformation decisions. The global
call-graph is partitioned to facilitate parallel optimization during
phase 3. The results of the WPA stage are stored into new object files
which contain the partitions of the program expressed in the intermediate
language and the optimization decisions.

@item Local transformations (LTRANS)
This stage executes in parallel. All the decisions made during phase 2
are implemented locally in each partitioned object file, and the final
object code is generated. Optimizations which cannot be decided
efficiently during phase 2 may be performed on the local
call-graph partitions.
@end enumerate

WHOPR can be seen as an extension of the usual LTO mode of
compilation. In LTO mode, WPA and LTRANS are executed within a single
execution of the compiler, after the whole program has been read into
memory.
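
The three stages above can be pictured with a small toy model. The
Python sketch below is purely illustrative and is not how GCC is
implemented: the function names @code{lgen}, @code{wpa}, @code{ltrans}
and @code{whopr}, the dictionary-based packages, and the greedy
size-balancing partitioner are all assumptions made for the example.
It only shows which steps are independent (LGEN per unit, LTRANS per
partition) and which step needs the whole program (WPA).

```python
from concurrent.futures import ThreadPoolExecutor


def lgen(unit_name, functions):
    """LGEN: package one unit's functions as a local summary.

    `functions` maps a function name to (size, list of callees).
    """
    return dict(unit=unit_name, functions=dict(functions))


def wpa(packages, num_partitions):
    """WPA: merge all units, then split the functions into partitions."""
    merged = dict()
    for package in packages:
        merged.update(package["functions"])
    # Hypothetical partitioner (not GCC's algorithm): greedily place the
    # next-largest function into the currently smallest partition.
    partitions = [[] for _ in range(num_partitions)]
    sizes = [0] * num_partitions
    for name in sorted(merged, key=lambda n: (-merged[n][0], n)):
        target = sizes.index(min(sizes))
        partitions[target].append(name)
        sizes[target] += merged[name][0]
    return merged, partitions


def ltrans(merged, partition):
    """LTRANS: 'compile' one partition; here we only total its size."""
    return sum(merged[name][0] for name in partition)


def whopr(units, num_partitions=2):
    # LGEN is independent per unit, so it can run in parallel.
    with ThreadPoolExecutor() as pool:
        packages = list(pool.map(lambda item: lgen(*item), units.items()))
    # WPA is a single sequential step over the whole program.
    merged, partitions = wpa(packages, num_partitions)
    # LTRANS is independent per partition, so it can run in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: ltrans(merged, p), partitions))
    return partitions, results
```

In this model, plain LTO mode corresponds to calling @code{whopr}
with a single partition, so that the analysis and the final code
generation both operate on the whole program at once.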

When compiling in WHOPR mode, the callgraph is partitioned during
the WPA stage. The whole program is split into a given number of
partitions of roughly the same size. The compiler tries to
minimize the number of references which cross partition boundaries.
The main advantage of WHOPR is that it allows the parallel execution of
LTRANS stages, which are the most time-consuming part of the
compilation process. Additionally, it avoids the need to load the
whole program into memory.


@node LTO object file layout
@section LTO file sections

LTO information is stored in several ELF sections inside object files.
Data structures and enum codes for sections are defined in
@file{lto-streamer.h}.

These sections are emitted from @file{lto-streamer-out.cc} and mapped
in all at once from @file{lto/lto.cc}:@code{lto_file_read}. The
individual functions dealing with the reading/writing of each section
are described below.

@itemize @bullet
@item Command line options (@code{.gnu.lto_.opts})

This section contains the command line options used to generate the
object files. This is used at link time to determine the optimization
level and other settings when they are not explicitly specified on the
linker command line.

Currently, GCC does not support combining LTO object files compiled
with different sets of command line options into a single binary.
At link time, the options given on the command line and the options
saved in all the files in a link-time set are applied globally.
No
attempt is made to validate the combination of flags (other than the
usual validation done by option processing). This is implemented in
@file{lto/lto.cc}:@code{lto_read_all_file_options}.


@item Symbol table (@code{.gnu.lto_.symtab})

This table replaces the ELF symbol table for functions and variables
represented in the LTO IL. Symbols used and exported by the optimized
assembly code of ``fat'' objects might not match the ones used and
exported by the intermediate code. This table is necessary because
the intermediate code is less optimized and thus requires a separate
symbol table.

Additionally, the binary code in the ``fat'' object may lack a call
to a function when the call was optimized out at compilation time
after the intermediate language was streamed out. In some special
cases, the same optimization may not happen during link-time
optimization. This would lead to an undefined symbol if only one
symbol table were used.

The symbol table is emitted in
@file{lto-streamer-out.cc}:@code{produce_symtab}.


@item Global declarations and types (@code{.gnu.lto_.decls})

This section contains an intermediate language dump of all
declarations and types required to represent the callgraph, static
variables and top-level debug info.

The contents of this section are emitted in
@file{lto-streamer-out.cc}:@code{produce_asm_for_decls}. Types and
symbols are emitted in a topological order that preserves the sharing
of pointers when the file is read back in
(@file{lto.cc}:@code{read_cgraph_and_symbols}).


@item The callgraph (@code{.gnu.lto_.cgraph})

This section contains the basic data structure used by the GCC
inter-procedural optimization infrastructure. This section stores an
annotated multi-graph which represents the functions and call sites as
well as the variables, aliases and top-level @code{asm} statements.

This section is emitted in
@file{lto-streamer-out.cc}:@code{output_cgraph} and read in
@file{lto-cgraph.cc}:@code{input_cgraph}.


@item IPA references (@code{.gnu.lto_.refs})

This section contains references between functions and static
variables. It is emitted by @file{lto-cgraph.cc}:@code{output_refs}
and read by @file{lto-cgraph.cc}:@code{input_refs}.


@item Function bodies (@code{.gnu.lto_.function_body.<name>})

This section contains function bodies in the intermediate language
representation. Every function body is in a separate section so that
the section can be copied independently to different object files or
the function can be read on demand.

Functions are emitted in
@file{lto-streamer-out.cc}:@code{output_function} and read in
@file{lto-streamer-in.cc}:@code{input_function}.


@item Static variable initializers (@code{.gnu.lto_.vars})

This section contains all the symbols in the global variable pool. It
is emitted by @file{lto-cgraph.cc}:@code{output_varpool} and read in
@file{lto-cgraph.cc}:@code{input_cgraph}.

@item Summaries and optimization summaries used by IPA passes
(@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
@code{pureconst} or @code{reference})

These sections are used by IPA passes that need to emit summary
information during LTO generation to be read and aggregated at
link time. Each pass is responsible for implementing two pass manager
hooks: one for writing the summary and another for reading it in. The
format of these sections is entirely up to each individual pass. The
only requirement is that the writer and reader hooks agree on the
format.
@end itemize


@node IPA
@section Using summary information in IPA passes

Programs are represented internally as a @emph{callgraph} (a
multi-graph where nodes are functions and edges are call sites)
and a @emph{varpool} (a list of static and external variables in
the program).

Inter-procedural optimization is organized as a sequence of
individual passes, which operate on the callgraph and the
varpool. To make the implementation of WHOPR possible, every
inter-procedural optimization pass is split into several stages
that are executed at different times during WHOPR compilation:

@itemize @bullet
@item LGEN time
@enumerate
@item @emph{Generate summary} (@code{generate_summary} in
@code{struct ipa_opt_pass_d}). This stage examines every function
body and variable initializer and stores relevant
information in a pass-specific data structure.

@item @emph{Write summary} (@code{write_summary} in
@code{struct ipa_opt_pass_d}).
This stage writes all the
pass-specific information generated by @code{generate_summary}.
Summaries go into their own @code{LTO_section_*} sections that
have to be declared in @file{lto-streamer.h}:@code{enum
lto_section_type}. A new section is created by calling
@code{create_output_block} and data can be written using the
@code{lto_output_*} routines.
@end enumerate

@item WPA time
@enumerate
@item @emph{Read summary} (@code{read_summary} in
@code{struct ipa_opt_pass_d}). This stage reads all the
pass-specific information in exactly the same order in which it was
written by @code{write_summary}.

@item @emph{Execute} (@code{execute} in @code{struct
opt_pass}). This performs inter-procedural propagation. This
must be done without actual access to the individual function
bodies or variable initializers. Typically, this results in a
transitive closure operation over the summary information of all
the nodes in the callgraph.

@item @emph{Write optimization summary}
(@code{write_optimization_summary} in @code{struct
ipa_opt_pass_d}). This writes the result of the inter-procedural
propagation into the object file. This can use the same data
structures and helper routines used in @code{write_summary}.
@end enumerate

@item LTRANS time
@enumerate
@item @emph{Read optimization summary}
(@code{read_optimization_summary} in @code{struct
ipa_opt_pass_d}). This is the counterpart to
@code{write_optimization_summary}. It reads the inter-procedural
optimization decisions in exactly the same format emitted by
@code{write_optimization_summary}.

@item @emph{Transform} (@code{function_transform} and
@code{variable_transform} in @code{struct ipa_opt_pass_d}).
The actual function bodies and variable initializers are updated
based on the information passed down from the @emph{Execute} stage.
@end enumerate
@end itemize

The implementation of the inter-procedural passes is shared
between LTO, WHOPR and classic non-LTO compilation.

@itemize
@item In the traditional file-by-file mode every pass executes its
own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
stages within the single execution context of the compiler.

@item In LTO compilation mode, every pass uses @emph{Generate
summary} and @emph{Write summary} stages at compilation time,
while the @emph{Read summary}, @emph{Execute}, and
@emph{Transform} stages are executed at link time.

@item In WHOPR mode all stages are used.
@end itemize

To simplify development, the GCC pass manager differentiates
between normal inter-procedural passes (@pxref{Regular IPA passes}),
small inter-procedural passes (@pxref{Small IPA passes})
and late inter-procedural passes (@pxref{Late IPA passes}).
A small or late IPA pass (@code{SIMPLE_IPA_PASS}) does
everything at once and thus cannot be executed during WPA in
WHOPR mode. It defines only the @emph{Execute} stage and during
this stage it accesses and modifies the function bodies. Such
passes are useful for optimization at LGEN or LTRANS time and are
used, for example, to implement early optimization before writing
object files.
The simple inter-procedural passes can also be used
for easier prototyping and development of a new inter-procedural
pass.


@subsection Virtual clones

One of the main challenges of introducing the WHOPR compilation
mode was addressing the interactions between optimization passes.
In LTO compilation mode, the passes are executed in sequence, and
each pass consists of analysis (or @emph{Generate summary}),
propagation (or @emph{Execute}) and @emph{Transform} stages.
Once the work of one pass is finished, the next pass sees the
updated program representation and can execute. This makes the
individual passes dependent on each other.

In WHOPR mode all passes first execute their @emph{Generate
summary} stage. Then summary writing marks the end of the LGEN
stage. At WPA time,
the summaries are read back into memory and all passes run the
@emph{Execute} stage. Optimization summaries are streamed and
sent to LTRANS, where all the passes execute the @emph{Transform}
stage.

Most optimization passes split naturally into analysis,
propagation and transformation stages. But some do not. The
main problem arises when one pass performs changes and the
following pass gets confused by seeing different callgraphs
between the @emph{Transform} stage and the @emph{Generate summary}
or @emph{Execute} stage. This means that the passes are required
to communicate their decisions to each other.
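
The split into analysis, propagation and transformation can be
illustrated with a toy pass. The Python sketch below is a model only,
not GCC code: it imitates the shape of the @emph{Generate summary},
@emph{Execute} and @emph{Transform} stages for a hypothetical
``is this function free of side effects'' analysis. The class name
@code{ToyPass}, the tuple-based function bodies and the analysis
itself are assumptions made for the example. Note that
@code{execute} sees only summaries, never function bodies.

```python
class ToyPass:
    """Toy IPA pass: find functions that are free of side effects."""

    def generate_summary(self, bodies):
        # Analysis stage: inspect each body once and keep only a small
        # per-function summary (local side effects and callees).
        summaries = dict()
        for name, body in bodies.items():
            summaries[name] = dict(
                local_effects=any(op == "store_global" for op, _ in body),
                callees=[arg for op, arg in body if op == "call"],
            )
        return summaries

    def execute(self, summaries):
        # Propagation stage: a transitive closure over call edges using
        # summaries only. A function has side effects if it has local
        # ones or calls a function that has them. Callees without a
        # summary are conservatively assumed to have side effects.
        effects = dict((n, s["local_effects"]) for n, s in summaries.items())
        changed = True
        while changed:
            changed = False
            for name, summary in summaries.items():
                if effects[name]:
                    continue
                if any(effects.get(c, True) for c in summary["callees"]):
                    effects[name] = True
                    changed = True
        return sorted(n for n, has in effects.items() if not has)

    def transform(self, bodies, pure_functions):
        # Transformation stage: rewrite bodies using the decisions made
        # by execute. Here calls to side-effect-free functions are only
        # tagged, so that a later cleanup could remove unused ones.
        pure = set(pure_functions)
        return dict(
            (name, [("pure_call", arg) if op == "call" and arg in pure
                    else (op, arg) for op, arg in body])
            for name, body in bodies.items()
        )
```

In terms of this model, the summaries returned by
@code{generate_summary} are what would be written out at the end of
LGEN, and the list returned by @code{execute} plays the role of the
optimization summary that is sent on to LTRANS.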

To facilitate this communication, the GCC callgraph
infrastructure implements @emph{virtual clones}, a method of
representing the changes performed by the optimization passes in
the callgraph without needing to update function bodies.

A @emph{virtual clone} in the callgraph is a function that has no
associated body, just a description of how to create its body based
on a different function (which itself may be a virtual clone).

The description of function modifications includes adjustments to
the function's signature (which allows, for example, removing or
adding function arguments), substitutions to perform on the
function body, and, for inlined functions, a pointer to the
function that it will be inlined into.

It is also possible to redirect any edge of the callgraph from a
function to its virtual clone. This implies updating the call
site to adjust for the new function signature.

Most of the transformations performed by inter-procedural
optimizations can be represented via virtual clones. For
instance, a constant propagation pass can produce a virtual clone
of the function which replaces one of its arguments by a
constant. The inliner can represent its decisions by producing a
clone of a function whose body will be later integrated into
a given function.

Using @emph{virtual clones}, the program can be easily updated
during the @emph{Execute} stage, solving most of the pass interaction
problems that would otherwise occur during @emph{Transform}.

Virtual clones are later materialized in the LTRANS stage and
turned into real functions.
Passes executed after the virtual
clones were introduced also perform their @emph{Transform} stage
on the new functions, so for a pass there is no significant
difference between operating on a real function and operating on a
virtual clone introduced before its @emph{Execute} stage.

Optimization passes then work on virtual clones introduced before
their @emph{Execute} stage as if they were real functions. The
only difference is that clones are not visible during the
@emph{Generate summary} stage.

To keep function summaries updated, the callgraph interface
allows an optimizer to register a callback that is called every
time a new clone is introduced as well as when the actual
function or variable is generated or when a function or variable
is removed. These hooks are registered in the @emph{Generate
summary} stage and allow the pass to keep its information intact
until the @emph{Execute} stage. The same hooks can also be
registered during the @emph{Execute} stage to keep the
optimization summaries updated for the @emph{Transform} stage.

@subsection IPA references

GCC represents IPA references in the callgraph. For a function
or variable @code{A}, the @emph{IPA reference} is a list of all
locations where the address of @code{A} is taken and, when
@code{A} is a variable, a list of all direct stores and reads
to/from @code{A}. References represent a directed multi-graph on
the union of nodes of the callgraph and the varpool. See
@file{ipa-reference.cc}:@code{ipa_reference_write_optimization_summary}
and
@file{ipa-reference.cc}:@code{ipa_reference_read_optimization_summary}
for details.
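
As a rough mental model, IPA references can be viewed as labeled
edges between symbols. The Python sketch below is illustrative only
and does not mirror GCC's data structures: the @code{Ref} record, the
kind names @code{addr}, @code{load} and @code{store}, and the two
helper functions are assumptions chosen for the example.

```python
from collections import namedtuple

# One edge of the reference multi-graph: `referring` uses `referred`.
# `kind` is "addr" (address taken), "load" (read) or "store" (write).
Ref = namedtuple("Ref", ["referring", "referred", "kind"])


def references_to(refs, symbol):
    """Return every reference whose target is `symbol`."""
    return [ref for ref in refs if ref.referred == symbol]


def never_written(refs, variables):
    """Return variables with no store and no address-taken reference.

    Once the address of a variable escapes, writes through a pointer
    are possible, so such variables are conservatively excluded.
    """
    unsafe = set(ref.referred for ref in refs
                 if ref.kind in ("store", "addr"))
    return sorted(v for v in variables if v not in unsafe)
```

The list of @code{Ref} records forms a multi-graph because the same
pair of symbols may be connected by several edges, for example one
load and one store. A query such as @code{never_written} hints at the
kind of question an inter-procedural pass can answer from references
alone, without looking at function bodies.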

@subsection Jump functions
Suppose that an optimization pass sees a function @code{A} and it
knows the values of (some of) its arguments. A @emph{jump
function} describes the value of a parameter of a given function
call in function @code{A} based on this knowledge.

Jump functions are used by several optimizations, such as the
inter-procedural constant propagation pass and the
devirtualization pass. The inliner also uses jump functions to
perform inlining of callbacks.

@node WHOPR
@section Whole program assumptions, linker plugin and symbol visibilities

Link-time optimization gives relatively minor benefits when used
alone. The problem is that propagation of inter-procedural
information does not work well across functions and variables
that are called or referenced by other compilation units (such as
from a dynamically linked library). We say that such functions
and variables are @emph{externally visible}.

To make the situation even more difficult, many applications
are organized as a set of shared libraries, and the default
ELF visibility rules allow one to override any externally
visible symbol with a different symbol at runtime. This
effectively disables any optimizations across such functions and
variables, because the compiler cannot be sure that the function
body it is seeing is the same function body that will be used at
runtime. Any function or variable not declared @code{static} in
the sources degrades the quality of inter-procedural
optimization.

To avoid this problem the compiler must assume that it sees the
whole program when doing link-time optimization. Strictly
speaking, the whole program is rarely visible even at link-time.
Standard system libraries are usually linked dynamically or are not
provided with link-time information. In GCC, the whole
program option (@option{-fwhole-program}) asserts that every
function and variable defined in the current compilation
unit is static, except for function @code{main} (note: at
link time, the current unit is the union of all objects compiled
with LTO). Since some functions and variables need to
be referenced externally, for example by another DSO or from an
assembler file, GCC also provides the function and variable
attribute @code{externally_visible} which can be used to disable
the effect of @option{-fwhole-program} on a specific symbol.

The whole program mode assumptions are slightly more complex in
C++, where inline functions in headers are put into @emph{COMDAT}
sections. COMDAT functions and variables can be defined by
multiple object files and their bodies are unified at link-time
and dynamic link-time. COMDAT functions are changed to local only
when their address is not taken and thus un-sharing them with a
library is not harmful. COMDAT variables always remain externally
visible; however, for read-only variables it is assumed that their
initializers cannot be overwritten by a different value.

GCC provides the function and variable attribute
@code{visibility} that can be used to specify the visibility of
externally visible symbols (or alternatively the
@option{-fvisibility} command line option).
ELF defines
the @code{default}, @code{protected}, @code{hidden} and
@code{internal} visibilities.

The most commonly used visibility is @code{hidden}. It
specifies that the symbol cannot be referenced from outside of
the current shared library. Unfortunately, this information
cannot be used directly by the link-time optimization in the
compiler since the whole shared library also might contain
non-LTO objects and those are not visible to the compiler.

GCC solves this problem using linker plugins. A @emph{linker
plugin} is an interface to the linker that allows an external
program to claim ownership of a given object file. The linker
then performs the linking procedure by querying the plugin about
the symbol table of the claimed objects and, once the linking
decisions are complete, the plugin is allowed to provide the
final object file before the actual linking is performed. The linker
plugin obtains the symbol resolution information which specifies
which symbols provided by the claimed objects are bound from the
rest of the binary being linked.

GCC is designed to be independent of the rest of the toolchain
and aims to support linkers without plugin support. For this
reason it does not use the linker plugin by default. Instead,
the object files are examined by @command{collect2} before being
passed to the linker and objects found to have LTO sections are
passed to @command{lto1} first. This mode does not work for
library archives. The decision on what object files from the
archive are needed depends on the actual linking and thus GCC
would have to implement the linker itself.
The resolution
information is missing too and thus GCC needs to make an educated
guess based on @option{-fwhole-program}. Without the linker
plugin GCC also assumes that symbols are declared @code{hidden}
and not referred to by non-LTO code by default.

@node Internal flags
@section Internal flags controlling @code{lto1}

The following flags are passed into @command{lto1} and are not
meant to be used directly from the command line.

@itemize
@item -fwpa
@opindex fwpa
This option runs the serial part of the link-time optimizer
performing the inter-procedural propagation (WPA mode). The
compiler reads in summary information from all inputs and
performs an analysis based on summary information only. It
generates object files for subsequent runs of the link-time
optimizer where individual object files are optimized using both
summary information from the WPA mode and the actual function
bodies. It then drives the LTRANS phase.

@item -fltrans
@opindex fltrans
This option runs the link-time optimizer in the
local-transformation (LTRANS) mode, which reads in output from a
previous run of the LTO in WPA mode. In the LTRANS mode, LTO
optimizes an object and produces the final assembly.

@item -fltrans-output-list=@var{file}
@opindex fltrans-output-list
This option specifies a file to which the names of LTRANS output
files are written. This option is only meaningful in conjunction
with @option{-fwpa}.

@item -fresolution=@var{file}
@opindex fresolution
This option specifies the linker resolution file.
This option is
only meaningful in conjunction with @option{-fwpa} and as an option
to pass through to the LTO linker plugin.
@end itemize