@c Copyright (C) 2010-2020 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.
@c Contributed by Jan Hubicka <jh@@suse.cz> and
@c Diego Novillo <dnovillo@@google.com>
6 1.1 mrg
7 1.1 mrg @node LTO
8 1.1 mrg @chapter Link Time Optimization
9 1.1 mrg @cindex lto
10 1.1 mrg @cindex whopr
11 1.1 mrg @cindex wpa
12 1.1 mrg @cindex ltrans
13 1.1 mrg
14 1.1 mrg Link Time Optimization (LTO) gives GCC the capability of
15 1.1 mrg dumping its internal representation (GIMPLE) to disk,
16 1.1 mrg so that all the different compilation units that make up
17 1.1 mrg a single executable can be optimized as a single module.
18 1.1 mrg This expands the scope of inter-procedural optimizations
19 1.1 mrg to encompass the whole program (or, rather, everything
20 1.1 mrg that is visible at link time).
21 1.1 mrg
22 1.1 mrg @menu
23 1.1 mrg * LTO Overview:: Overview of LTO.
24 1.1 mrg * LTO object file layout:: LTO file sections in ELF.
25 1.1 mrg * IPA:: Using summary information in IPA passes.
26 1.1 mrg * WHOPR:: Whole program assumptions,
27 1.1 mrg linker plugin and symbol visibilities.
28 1.1 mrg * Internal flags:: Internal flags controlling @code{lto1}.
29 1.1 mrg @end menu
30 1.1 mrg
31 1.1 mrg @node LTO Overview
32 1.1 mrg @section Design Overview
33 1.1 mrg
34 1.1 mrg Link time optimization is implemented as a GCC front end for a
35 1.1 mrg bytecode representation of GIMPLE that is emitted in special sections
36 1.1 mrg of @code{.o} files. Currently, LTO support is enabled in most
37 1.1 mrg ELF-based systems, as well as darwin, cygwin and mingw systems.
38 1.1 mrg
39 1.1 mrg Since GIMPLE bytecode is saved alongside final object code, object
40 1.1 mrg files generated with LTO support are larger than regular object files.
41 1.1 mrg This ``fat'' object format makes it easy to integrate LTO into
42 1.1 mrg existing build systems, as one can, for instance, produce archives of
43 1.1 mrg the files. Additionally, one might be able to ship one set of fat
44 1.1 mrg objects which could be used both for development and the production of
optimized builds. A perhaps surprising side effect of this feature
46 1.4 mrg is that any mistake in the toolchain leads to LTO information not
47 1.1 mrg being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
48 1.1 mrg This is both an advantage, as the system is more robust, and a
49 1.1 mrg disadvantage, as the user is not informed that the optimization has
50 1.1 mrg been disabled.
51 1.1 mrg
52 1.1 mrg The current implementation only produces ``fat'' objects, effectively
53 1.1 mrg doubling compilation time and increasing file sizes up to 5x the
54 1.1 mrg original size. This hides the problem that some tools, such as
55 1.1 mrg @code{ar} and @code{nm}, need to understand symbol tables of LTO
56 1.1 mrg sections. These tools were extended to use the plugin infrastructure,
57 1.1 mrg and with these problems solved, GCC will also support ``slim'' objects
58 1.1 mrg consisting of the intermediate code alone.
59 1.1 mrg
60 1.1 mrg At the highest level, LTO splits the compiler in two. The first half
61 1.1 mrg (the ``writer'') produces a streaming representation of all the
62 1.1 mrg internal data structures needed to optimize and generate code. This
63 1.1 mrg includes declarations, types, the callgraph and the GIMPLE representation
64 1.1 mrg of function bodies.
65 1.1 mrg
66 1.1 mrg When @option{-flto} is given during compilation of a source file, the
67 1.1 mrg pass manager executes all the passes in @code{all_lto_gen_passes}.
68 1.1 mrg Currently, this phase is composed of two IPA passes:
69 1.1 mrg
70 1.1 mrg @itemize @bullet
71 1.1 mrg @item @code{pass_ipa_lto_gimple_out}
72 1.1 mrg This pass executes the function @code{lto_output} in
73 1.1 mrg @file{lto-streamer-out.c}, which traverses the call graph encoding
74 1.1 mrg every reachable declaration, type and function. This generates a
75 1.1 mrg memory representation of all the file sections described below.
76 1.1 mrg
77 1.1 mrg @item @code{pass_ipa_lto_finish_out}
78 1.1 mrg This pass executes the function @code{produce_asm_for_decls} in
79 1.1 mrg @file{lto-streamer-out.c}, which takes the memory image built in the
80 1.1 mrg previous pass and encodes it in the corresponding ELF file sections.
81 1.1 mrg @end itemize
82 1.1 mrg
83 1.1 mrg The second half of LTO support is the ``reader''. This is implemented
84 1.1 mrg as the GCC front end @file{lto1} in @file{lto/lto.c}. When
85 1.1 mrg @file{collect2} detects a link set of @code{.o}/@code{.a} files with
LTO information and @option{-flto} is enabled, it invokes
87 1.1 mrg @file{lto1} which reads the set of files and aggregates them into a
88 1.1 mrg single translation unit for optimization. The main entry point for
89 1.1 mrg the reader is @file{lto/lto.c}:@code{lto_main}.
90 1.1 mrg
91 1.1 mrg @subsection LTO modes of operation
92 1.1 mrg
93 1.1 mrg One of the main goals of the GCC link-time infrastructure was to allow
94 1.1 mrg effective compilation of large programs. For this reason GCC implements two
95 1.1 mrg link-time compilation modes.
96 1.1 mrg
97 1.1 mrg @enumerate
98 1.1 mrg @item @emph{LTO mode}, in which the whole program is read into the
99 1.1 mrg compiler at link-time and optimized in a similar way as if it
100 1.1 mrg were a single source-level compilation unit.
101 1.1 mrg
102 1.1 mrg @item @emph{WHOPR or partitioned mode}, designed to utilize multiple
103 1.1 mrg CPUs and/or a distributed compilation environment to quickly link
104 1.1 mrg large applications. WHOPR stands for WHOle Program optimizeR (not to
105 1.1 mrg be confused with the semantics of @option{-fwhole-program}). It
106 1.1 mrg partitions the aggregated callgraph from many different @code{.o}
107 1.1 mrg files and distributes the compilation of the sub-graphs to different
108 1.1 mrg CPUs.
109 1.1 mrg
110 1.1 mrg Note that distributed compilation is not implemented yet, but since
111 1.1 mrg the parallelism is facilitated via generating a @code{Makefile}, it
112 1.1 mrg would be easy to implement.
113 1.1 mrg @end enumerate
114 1.1 mrg
115 1.1 mrg WHOPR splits LTO into three main stages:
116 1.1 mrg @enumerate
117 1.1 mrg @item Local generation (LGEN)
118 1.1 mrg This stage executes in parallel. Every file in the program is compiled
119 1.1 mrg into the intermediate language and packaged together with the local
120 1.1 mrg call-graph and summary information. This stage is the same for both
the LTO and WHOPR compilation modes.
122 1.1 mrg
123 1.1 mrg @item Whole Program Analysis (WPA)
124 1.1 mrg WPA is performed sequentially. The global call-graph is generated, and
125 1.1 mrg a global analysis procedure makes transformation decisions. The global
126 1.1 mrg call-graph is partitioned to facilitate parallel optimization during
127 1.1 mrg phase 3. The results of the WPA stage are stored into new object files
which contain the partitions of the program expressed in the intermediate
129 1.1 mrg language and the optimization decisions.
130 1.1 mrg
131 1.1 mrg @item Local transformations (LTRANS)
132 1.1 mrg This stage executes in parallel. All the decisions made during phase 2
133 1.1 mrg are implemented locally in each partitioned object file, and the final
134 1.1 mrg object code is generated. Optimizations which cannot be decided
efficiently during phase 2 may be performed on the local
136 1.1 mrg call-graph partitions.
137 1.1 mrg @end enumerate
138 1.1 mrg
139 1.1 mrg WHOPR can be seen as an extension of the usual LTO mode of
140 1.1 mrg compilation. In LTO, WPA and LTRANS are executed within a single
141 1.1 mrg execution of the compiler, after the whole program has been read into
142 1.1 mrg memory.
143 1.1 mrg
144 1.1 mrg When compiling in WHOPR mode, the callgraph is partitioned during
145 1.1 mrg the WPA stage. The whole program is split into a given number of
146 1.1 mrg partitions of roughly the same size. The compiler tries to
147 1.1 mrg minimize the number of references which cross partition boundaries.
148 1.1 mrg The main advantage of WHOPR is to allow the parallel execution of
149 1.1 mrg LTRANS stages, which are the most time-consuming part of the
150 1.1 mrg compilation process. Additionally, it avoids the need to load the
151 1.1 mrg whole program into memory.
152 1.1 mrg
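The balanced split described above can be illustrated with a small C
sketch. It is not the algorithm GCC uses: it only keeps partition sizes
roughly equal and ignores the goal of minimizing references that cross
partition boundaries. The function @code{toy_partition} and its size
metric are hypothetical names introduced for this illustration.

```c
#include <stddef.h>

/* Hypothetical sketch: assign each function (described only by its
   size) to the currently smallest partition.  GCC's real partitioner
   also tries to minimize references crossing partition boundaries;
   that part is omitted here.  N_PARTS must be at least 1.  */
void
toy_partition (const unsigned *func_size, size_t n_funcs,
               unsigned *part_of, unsigned *part_size, size_t n_parts)
{
  for (size_t p = 0; p < n_parts; p++)
    part_size[p] = 0;

  for (size_t f = 0; f < n_funcs; f++)
    {
      /* Find the partition with the smallest accumulated size.  */
      size_t best = 0;
      for (size_t p = 1; p < n_parts; p++)
        if (part_size[p] < part_size[best])
          best = p;

      part_of[f] = (unsigned) best;
      part_size[best] += func_size[f];
    }
}
```

Each resulting partition would then be handed to a separate LTRANS
process, which is what allows that stage to run in parallel.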
153 1.1 mrg
154 1.1 mrg @node LTO object file layout
155 1.1 mrg @section LTO file sections
156 1.1 mrg
157 1.1 mrg LTO information is stored in several ELF sections inside object files.
158 1.1 mrg Data structures and enum codes for sections are defined in
159 1.1 mrg @file{lto-streamer.h}.
160 1.1 mrg
161 1.1 mrg These sections are emitted from @file{lto-streamer-out.c} and mapped
162 1.1 mrg in all at once from @file{lto/lto.c}:@code{lto_file_read}. The
163 1.1 mrg individual functions dealing with the reading/writing of each section
164 1.1 mrg are described below.
165 1.1 mrg
166 1.1 mrg @itemize @bullet
167 1.1 mrg @item Command line options (@code{.gnu.lto_.opts})
168 1.1 mrg
169 1.1 mrg This section contains the command line options used to generate the
170 1.1 mrg object files. This is used at link time to determine the optimization
171 1.1 mrg level and other settings when they are not explicitly specified at the
172 1.1 mrg linker command line.
173 1.1 mrg
174 1.1 mrg Currently, GCC does not support combining LTO object files compiled
with different sets of command line options into a single binary.
176 1.1 mrg At link time, the options given on the command line and the options
177 1.1 mrg saved on all the files in a link-time set are applied globally. No
178 1.1 mrg attempt is made at validating the combination of flags (other than the
179 1.1 mrg usual validation done by option processing). This is implemented in
180 1.1 mrg @file{lto/lto.c}:@code{lto_read_all_file_options}.
181 1.1 mrg
182 1.1 mrg
183 1.1 mrg @item Symbol table (@code{.gnu.lto_.symtab})
184 1.1 mrg
185 1.1 mrg This table replaces the ELF symbol table for functions and variables
186 1.1 mrg represented in the LTO IL. Symbols used and exported by the optimized
187 1.1 mrg assembly code of ``fat'' objects might not match the ones used and
188 1.1 mrg exported by the intermediate code. This table is necessary because
189 1.1 mrg the intermediate code is less optimized and thus requires a separate
190 1.1 mrg symbol table.
191 1.1 mrg
For example, the binary code in the ``fat'' object may lack a call
to a function because the call was optimized out at compilation time
after the intermediate language was streamed out. In some special
cases, the same optimization may not happen during link-time
optimization. This would lead to an undefined symbol if only one
symbol table was used.
198 1.1 mrg
199 1.1 mrg The symbol table is emitted in
200 1.1 mrg @file{lto-streamer-out.c}:@code{produce_symtab}.
201 1.1 mrg
202 1.1 mrg
203 1.1 mrg @item Global declarations and types (@code{.gnu.lto_.decls})
204 1.1 mrg
205 1.1 mrg This section contains an intermediate language dump of all
206 1.1 mrg declarations and types required to represent the callgraph, static
207 1.1 mrg variables and top-level debug info.
208 1.1 mrg
209 1.1 mrg The contents of this section are emitted in
210 1.1 mrg @file{lto-streamer-out.c}:@code{produce_asm_for_decls}. Types and
211 1.1 mrg symbols are emitted in a topological order that preserves the sharing
212 1.1 mrg of pointers when the file is read back in
213 1.1 mrg (@file{lto.c}:@code{read_cgraph_and_symbols}).
214 1.1 mrg
215 1.1 mrg
216 1.1 mrg @item The callgraph (@code{.gnu.lto_.cgraph})
217 1.1 mrg
218 1.1 mrg This section contains the basic data structure used by the GCC
219 1.1 mrg inter-procedural optimization infrastructure. This section stores an
220 1.1 mrg annotated multi-graph which represents the functions and call sites as
221 1.1 mrg well as the variables, aliases and top-level @code{asm} statements.
222 1.1 mrg
223 1.1 mrg This section is emitted in
224 1.1 mrg @file{lto-streamer-out.c}:@code{output_cgraph} and read in
225 1.1 mrg @file{lto-cgraph.c}:@code{input_cgraph}.
226 1.1 mrg
227 1.1 mrg
228 1.1 mrg @item IPA references (@code{.gnu.lto_.refs})
229 1.1 mrg
This section contains references between functions and static
231 1.1 mrg variables. It is emitted by @file{lto-cgraph.c}:@code{output_refs}
232 1.1 mrg and read by @file{lto-cgraph.c}:@code{input_refs}.
233 1.1 mrg
234 1.1 mrg
235 1.1 mrg @item Function bodies (@code{.gnu.lto_.function_body.<name>})
236 1.1 mrg
237 1.1 mrg This section contains function bodies in the intermediate language
238 1.1 mrg representation. Every function body is in a separate section to allow
239 1.1 mrg copying of the section independently to different object files or
240 1.1 mrg reading the function on demand.
241 1.1 mrg
242 1.1 mrg Functions are emitted in
243 1.1 mrg @file{lto-streamer-out.c}:@code{output_function} and read in
244 1.1 mrg @file{lto-streamer-in.c}:@code{input_function}.
245 1.1 mrg
246 1.1 mrg
247 1.1 mrg @item Static variable initializers (@code{.gnu.lto_.vars})
248 1.1 mrg
249 1.1 mrg This section contains all the symbols in the global variable pool. It
250 1.1 mrg is emitted by @file{lto-cgraph.c}:@code{output_varpool} and read in
251 1.1 mrg @file{lto-cgraph.c}:@code{input_cgraph}.
252 1.1 mrg
253 1.1 mrg @item Summaries and optimization summaries used by IPA passes
254 1.1 mrg (@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
255 1.1 mrg @code{pureconst} or @code{reference})
256 1.1 mrg
257 1.1 mrg These sections are used by IPA passes that need to emit summary
258 1.1 mrg information during LTO generation to be read and aggregated at
259 1.1 mrg link time. Each pass is responsible for implementing two pass manager
260 1.1 mrg hooks: one for writing the summary and another for reading it in. The
261 1.1 mrg format of these sections is entirely up to each individual pass. The
262 1.1 mrg only requirement is that the writer and reader hooks agree on the
263 1.1 mrg format.
264 1.1 mrg @end itemize
265 1.1 mrg
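All section names listed above share the prefix @code{.gnu.lto_}. As a
minimal sketch, a tool that wants to know whether an object file carries
LTO information could test section names against that prefix. The helper
@code{toy_is_lto_section} and the macro @code{TOY_LTO_PREFIX} are
hypothetical and are not part of GCC or binutils.

```c
#include <string.h>

/* Prefix shared by the section names listed above.  */
#define TOY_LTO_PREFIX ".gnu.lto_"

/* Hypothetical helper: return nonzero if NAME looks like the name of
   an LTO section, judging by its prefix only.  */
int
toy_is_lto_section (const char *name)
{
  return strncmp (name, TOY_LTO_PREFIX, strlen (TOY_LTO_PREFIX)) == 0;
}
```

A prefix test like this says nothing about whether the section contents
are valid; it only identifies candidate sections.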
266 1.1 mrg
267 1.1 mrg @node IPA
268 1.1 mrg @section Using summary information in IPA passes
269 1.1 mrg
270 1.1 mrg Programs are represented internally as a @emph{callgraph} (a
271 1.1 mrg multi-graph where nodes are functions and edges are call sites)
272 1.1 mrg and a @emph{varpool} (a list of static and external variables in
273 1.1 mrg the program).
274 1.1 mrg
275 1.1 mrg The inter-procedural optimization is organized as a sequence of
276 1.1 mrg individual passes, which operate on the callgraph and the
277 1.1 mrg varpool. To make the implementation of WHOPR possible, every
278 1.1 mrg inter-procedural optimization pass is split into several stages
279 1.1 mrg that are executed at different times during WHOPR compilation:
280 1.1 mrg
281 1.1 mrg @itemize @bullet
282 1.1 mrg @item LGEN time
283 1.1 mrg @enumerate
284 1.1 mrg @item @emph{Generate summary} (@code{generate_summary} in
@code{struct ipa_opt_pass_d}). This stage analyzes every function
body and variable initializer and stores relevant information in a
pass-specific data structure.
288 1.1 mrg
289 1.1 mrg @item @emph{Write summary} (@code{write_summary} in
290 1.1 mrg @code{struct ipa_opt_pass_d}). This stage writes all the
291 1.1 mrg pass-specific information generated by @code{generate_summary}.
292 1.1 mrg Summaries go into their own @code{LTO_section_*} sections that
293 1.1 mrg have to be declared in @file{lto-streamer.h}:@code{enum
294 1.1 mrg lto_section_type}. A new section is created by calling
295 1.1 mrg @code{create_output_block} and data can be written using the
296 1.1 mrg @code{lto_output_*} routines.
297 1.1 mrg @end enumerate
298 1.1 mrg
299 1.1 mrg @item WPA time
300 1.1 mrg @enumerate
301 1.1 mrg @item @emph{Read summary} (@code{read_summary} in
302 1.1 mrg @code{struct ipa_opt_pass_d}). This stage reads all the
303 1.1 mrg pass-specific information in exactly the same order that it was
304 1.1 mrg written by @code{write_summary}.
305 1.1 mrg
306 1.1 mrg @item @emph{Execute} (@code{execute} in @code{struct
307 1.1 mrg opt_pass}). This performs inter-procedural propagation. This
308 1.1 mrg must be done without actual access to the individual function
309 1.1 mrg bodies or variable initializers. Typically, this results in a
310 1.1 mrg transitive closure operation over the summary information of all
311 1.1 mrg the nodes in the callgraph.
312 1.1 mrg
313 1.1 mrg @item @emph{Write optimization summary}
314 1.1 mrg (@code{write_optimization_summary} in @code{struct
315 1.1 mrg ipa_opt_pass_d}). This writes the result of the inter-procedural
316 1.1 mrg propagation into the object file. This can use the same data
317 1.1 mrg structures and helper routines used in @code{write_summary}.
318 1.1 mrg @end enumerate
319 1.1 mrg
320 1.1 mrg @item LTRANS time
321 1.1 mrg @enumerate
322 1.1 mrg @item @emph{Read optimization summary}
323 1.1 mrg (@code{read_optimization_summary} in @code{struct
324 1.1 mrg ipa_opt_pass_d}). The counterpart to
325 1.1 mrg @code{write_optimization_summary}. This reads the interprocedural
326 1.1 mrg optimization decisions in exactly the same format emitted by
327 1.1 mrg @code{write_optimization_summary}.
328 1.1 mrg
329 1.1 mrg @item @emph{Transform} (@code{function_transform} and
330 1.1 mrg @code{variable_transform} in @code{struct ipa_opt_pass_d}).
331 1.1 mrg The actual function bodies and variable initializers are updated
332 1.1 mrg based on the information passed down from the @emph{Execute} stage.
333 1.1 mrg @end enumerate
334 1.1 mrg @end itemize
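
The staging above can be sketched as a table of hooks that a driver
calls in a fixed order. This is a simplified model, not the real
@code{struct ipa_opt_pass_d}: the hook signatures, the
@code{toy_} names and the trace letters are assumptions made so that
the call order can be observed.

```c
#include <string.h>

/* Hypothetical hook type: each hook appends one letter to TRACE so the
   order of calls can be observed.  Real GCC hooks have other
   signatures.  */
typedef void (*toy_hook) (char *trace);

struct toy_ipa_pass
{
  toy_hook generate_summary;
  toy_hook write_summary;
  toy_hook read_summary;
  toy_hook execute;
  toy_hook write_optimization_summary;
  toy_hook read_optimization_summary;
  toy_hook transform;
};

enum toy_stage { TOY_LGEN, TOY_WPA, TOY_LTRANS };

static void
toy_call (toy_hook hook, char *trace)
{
  if (hook)
    hook (trace);
}

/* Run the hooks that belong to STAGE, in the order listed above.  */
void
toy_run_stage (const struct toy_ipa_pass *pass, enum toy_stage stage,
               char *trace)
{
  switch (stage)
    {
    case TOY_LGEN:
      toy_call (pass->generate_summary, trace);
      toy_call (pass->write_summary, trace);
      break;
    case TOY_WPA:
      toy_call (pass->read_summary, trace);
      toy_call (pass->execute, trace);
      toy_call (pass->write_optimization_summary, trace);
      break;
    case TOY_LTRANS:
      toy_call (pass->read_optimization_summary, trace);
      toy_call (pass->transform, trace);
      break;
    }
}

/* Example pass whose hooks only record that they ran.  */
static void toy_gen (char *t) { strcat (t, "g"); }
static void toy_wsum (char *t) { strcat (t, "w"); }
static void toy_rsum (char *t) { strcat (t, "r"); }
static void toy_exec (char *t) { strcat (t, "e"); }
static void toy_wopt (char *t) { strcat (t, "W"); }
static void toy_ropt (char *t) { strcat (t, "R"); }
static void toy_xform (char *t) { strcat (t, "t"); }

const struct toy_ipa_pass toy_example_pass =
  { toy_gen, toy_wsum, toy_rsum, toy_exec, toy_wopt, toy_ropt, toy_xform };
```

In this model the three stages may run in three different processes;
the only state that survives from one to the next is what the write
hooks stream out and the read hooks stream back in.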
335 1.1 mrg
The implementation of the inter-procedural passes is shared
337 1.1 mrg between LTO, WHOPR and classic non-LTO compilation.
338 1.1 mrg
339 1.1 mrg @itemize
340 1.1 mrg @item During the traditional file-by-file mode every pass executes its
341 1.1 mrg own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
342 1.1 mrg stages within the single execution context of the compiler.
343 1.1 mrg
344 1.1 mrg @item In LTO compilation mode, every pass uses @emph{Generate
345 1.1 mrg summary} and @emph{Write summary} stages at compilation time,
346 1.1 mrg while the @emph{Read summary}, @emph{Execute}, and
347 1.1 mrg @emph{Transform} stages are executed at link time.
348 1.1 mrg
349 1.1 mrg @item In WHOPR mode all stages are used.
350 1.1 mrg @end itemize
351 1.1 mrg
352 1.1 mrg To simplify development, the GCC pass manager differentiates
353 1.10 mrg between normal inter-procedural passes (@pxref{Regular IPA passes}),
354 1.10 mrg small inter-procedural passes (@pxref{Small IPA passes})
355 1.10 mrg and late inter-procedural passes (@pxref{Late IPA passes}).
356 1.10 mrg A small or late IPA pass (@code{SIMPLE_IPA_PASS}) does
357 1.10 mrg everything at once and thus cannot be executed during WPA in
358 1.1 mrg WHOPR mode. It defines only the @emph{Execute} stage and during
359 1.1 mrg this stage it accesses and modifies the function bodies. Such
360 1.1 mrg passes are useful for optimization at LGEN or LTRANS time and are
361 1.1 mrg used, for example, to implement early optimization before writing
362 1.1 mrg object files. The simple inter-procedural passes can also be used
363 1.1 mrg for easier prototyping and development of a new inter-procedural
364 1.1 mrg pass.
365 1.1 mrg
366 1.1 mrg
367 1.1 mrg @subsection Virtual clones
368 1.1 mrg
369 1.1 mrg One of the main challenges of introducing the WHOPR compilation
370 1.1 mrg mode was addressing the interactions between optimization passes.
371 1.1 mrg In LTO compilation mode, the passes are executed in a sequence,
372 1.1 mrg each of which consists of analysis (or @emph{Generate summary}),
373 1.1 mrg propagation (or @emph{Execute}) and @emph{Transform} stages.
374 1.1 mrg Once the work of one pass is finished, the next pass sees the
375 1.1 mrg updated program representation and can execute. This makes the
376 1.1 mrg individual passes dependent on each other.
377 1.1 mrg
378 1.1 mrg In WHOPR mode all passes first execute their @emph{Generate
379 1.1 mrg summary} stage. Then summary writing marks the end of the LGEN
380 1.1 mrg stage. At WPA time,
381 1.1 mrg the summaries are read back into memory and all passes run the
382 1.1 mrg @emph{Execute} stage. Optimization summaries are streamed and
383 1.1 mrg sent to LTRANS, where all the passes execute the @emph{Transform}
384 1.1 mrg stage.
385 1.1 mrg
386 1.1 mrg Most optimization passes split naturally into analysis,
387 1.1 mrg propagation and transformation stages. But some do not. The
388 1.1 mrg main problem arises when one pass performs changes and the
389 1.1 mrg following pass gets confused by seeing different callgraphs
390 1.1 mrg between the @emph{Transform} stage and the @emph{Generate summary}
391 1.1 mrg or @emph{Execute} stage. This means that the passes are required
392 1.1 mrg to communicate their decisions with each other.
393 1.1 mrg
394 1.1 mrg To facilitate this communication, the GCC callgraph
395 1.1 mrg infrastructure implements @emph{virtual clones}, a method of
396 1.1 mrg representing the changes performed by the optimization passes in
397 1.1 mrg the callgraph without needing to update function bodies.
398 1.1 mrg
399 1.1 mrg A @emph{virtual clone} in the callgraph is a function that has no
400 1.1 mrg associated body, just a description of how to create its body based
401 1.1 mrg on a different function (which itself may be a virtual clone).
402 1.1 mrg
403 1.1 mrg The description of function modifications includes adjustments to
404 1.1 mrg the function's signature (which allows, for example, removing or
405 1.1 mrg adding function arguments), substitutions to perform on the
406 1.1 mrg function body, and, for inlined functions, a pointer to the
407 1.1 mrg function that it will be inlined into.
408 1.1 mrg
409 1.1 mrg It is also possible to redirect any edge of the callgraph from a
410 1.1 mrg function to its virtual clone. This implies updating of the call
411 1.1 mrg site to adjust for the new function signature.
412 1.1 mrg
413 1.1 mrg Most of the transformations performed by inter-procedural
414 1.1 mrg optimizations can be represented via virtual clones. For
415 1.1 mrg instance, a constant propagation pass can produce a virtual clone
416 1.1 mrg of the function which replaces one of its arguments by a
417 1.1 mrg constant. The inliner can represent its decisions by producing a
418 1.1 mrg clone of a function whose body will be later integrated into
419 1.1 mrg a given function.
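
As a rough illustration, a virtual clone can be modeled as a node that
has no body of its own and instead points at the node it is derived
from, together with a description of the change. The structure below is
a hypothetical simplification; GCC's callgraph nodes carry far more
information, and none of the @code{toy_} names exist in GCC.

```c
#include <stddef.h>

/* Hypothetical model of a callgraph node.  A real function has
   CLONE_OF == NULL.  A virtual clone has no body and records how to
   derive one from CLONE_OF.  */
struct toy_node
{
  const char *name;
  const struct toy_node *clone_of; /* NULL for a real function.  */
  int replaced_param;              /* Parameter index, or -1 if none.  */
  int constant;                    /* Value substituted for it.  */
};

/* Follow the chain of clones back to the function with a real body.  */
const struct toy_node *
toy_clone_origin (const struct toy_node *node)
{
  while (node->clone_of != NULL)
    node = node->clone_of;
  return node;
}

/* Count how many clone steps separate NODE from a real function.  */
int
toy_clone_depth (const struct toy_node *node)
{
  int depth = 0;
  for (; node->clone_of != NULL; node = node->clone_of)
    depth++;
  return depth;
}
```

A constant propagation decision would, in this model, be a node with
@code{replaced_param} set to the specialized argument, and a clone of
that clone is still resolved by walking @code{clone_of}.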
420 1.1 mrg
421 1.1 mrg Using @emph{virtual clones}, the program can be easily updated
during the @emph{Execute} stage, solving most of the pass interaction
423 1.1 mrg problems that would otherwise occur during @emph{Transform}.
424 1.1 mrg
425 1.1 mrg Virtual clones are later materialized in the LTRANS stage and
426 1.1 mrg turned into real functions. Passes executed after the virtual
clones were introduced also perform their @emph{Transform} stage
428 1.1 mrg on new functions, so for a pass there is no significant
429 1.1 mrg difference between operating on a real function or a virtual
430 1.1 mrg clone introduced before its @emph{Execute} stage.
431 1.1 mrg
432 1.1 mrg Optimization passes then work on virtual clones introduced before
433 1.1 mrg their @emph{Execute} stage as if they were real functions. The
434 1.1 mrg only difference is that clones are not visible during the
@emph{Generate summary} stage.
436 1.1 mrg
437 1.1 mrg To keep function summaries updated, the callgraph interface
438 1.1 mrg allows an optimizer to register a callback that is called every
439 1.1 mrg time a new clone is introduced as well as when the actual
440 1.1 mrg function or variable is generated or when a function or variable
441 1.1 mrg is removed. These hooks are registered in the @emph{Generate
442 1.1 mrg summary} stage and allow the pass to keep its information intact
443 1.1 mrg until the @emph{Execute} stage. The same hooks can also be
444 1.1 mrg registered during the @emph{Execute} stage to keep the
445 1.1 mrg optimization summaries updated for the @emph{Transform} stage.
446 1.1 mrg
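The callback mechanism can be sketched as follows. This is a toy model
under stated assumptions: nodes are plain integers, a single hook can
be registered, and every @code{toy_} name is hypothetical. GCC's actual
callgraph hook interface differs.

```c
#include <stddef.h>

#define TOY_MAX_NODES 8

/* Hypothetical hook type, called whenever node SRC is cloned to DST.  */
typedef void (*toy_dup_hook) (int src, int dst, void *data);

static toy_dup_hook toy_hook_fn;
static void *toy_hook_data;
static int toy_n_nodes;

/* Register FN to be called with DATA on every node duplication.  */
void
toy_register_duplication_hook (toy_dup_hook fn, void *data)
{
  toy_hook_fn = fn;
  toy_hook_data = data;
}

/* Create a new node and return its index, or -1 if the table is full.  */
int
toy_add_node (void)
{
  return toy_n_nodes < TOY_MAX_NODES ? toy_n_nodes++ : -1;
}

/* Clone SRC and notify the registered hook, if any.  */
int
toy_clone_node (int src)
{
  int dst = toy_add_node ();
  if (dst >= 0 && toy_hook_fn != NULL)
    toy_hook_fn (src, dst, toy_hook_data);
  return dst;
}

/* Example hook: DATA is a per-node summary array; the clone inherits
   the summary of the node it was copied from.  */
void
toy_copy_summary (int src, int dst, void *data)
{
  int *summary = (int *) data;
  summary[dst] = summary[src];
}
```

With such a hook in place, a pass does not need to re-analyze a clone:
its summary is filled in at the moment the clone is created.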
447 1.1 mrg @subsection IPA references
448 1.1 mrg
449 1.1 mrg GCC represents IPA references in the callgraph. For a function
450 1.1 mrg or variable @code{A}, the @emph{IPA reference} is a list of all
451 1.1 mrg locations where the address of @code{A} is taken and, when
452 1.1 mrg @code{A} is a variable, a list of all direct stores and reads
to/from @code{A}. References represent a directed multi-graph on
454 1.1 mrg the union of nodes of the callgraph and the varpool. See
455 1.1 mrg @file{ipa-reference.c}:@code{ipa_reference_write_optimization_summary}
456 1.1 mrg and
457 1.1 mrg @file{ipa-reference.c}:@code{ipa_reference_read_optimization_summary}
458 1.1 mrg for details.
459 1.1 mrg
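To make the three kinds of reference concrete, the following C fragment
contains one variable and three functions that each refer to it in a
different way. The names are hypothetical and the comments describe the
references one would expect to be recorded, not output from GCC.

```c
/* Variable A from the description above.  */
int toy_counter;

/* Takes the address of toy_counter: an address reference.  */
int *
toy_addr (void)
{
  return &toy_counter;
}

/* Writes toy_counter directly: a store reference.  */
void
toy_store (int value)
{
  toy_counter = value;
}

/* Reads toy_counter directly: a read (load) reference.  */
int
toy_load (void)
{
  return toy_counter;
}
```

Once the address has escaped through @code{toy_addr}, accesses made
through the returned pointer are no longer direct stores or reads of
the variable.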
460 1.1 mrg @subsection Jump functions
461 1.1 mrg Suppose that an optimization pass sees a function @code{A} and it
462 1.1 mrg knows the values of (some of) its arguments. The @emph{jump
463 1.1 mrg function} describes the value of a parameter of a given function
464 1.1 mrg call in function @code{A} based on this knowledge.
465 1.1 mrg
466 1.1 mrg Jump functions are used by several optimizations, such as the
467 1.1 mrg inter-procedural constant propagation pass and the
468 1.1 mrg devirtualization pass. The inliner also uses jump functions to
469 1.1 mrg perform inlining of callbacks.
470 1.1 mrg
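A jump function can be pictured as a small description attached to each
argument of a call. The sketch below models three kinds: unknown, a
constant, and a caller parameter passed through unchanged. It is loosely
inspired by the description above; the types and the function
@code{toy_eval_jump_func} are hypothetical and much simpler than what
GCC implements.

```c
enum toy_jf_kind { TOY_JF_UNKNOWN, TOY_JF_CONST, TOY_JF_PASS_THROUGH };

/* Hypothetical jump function for one argument of one call site.  */
struct toy_jump_func
{
  enum toy_jf_kind kind;
  int value;        /* Constant value, for TOY_JF_CONST.  */
  int formal_index; /* Caller parameter, for TOY_JF_PASS_THROUGH.  */
};

/* What is known about a value: either nothing, or its constant.  */
struct toy_known
{
  int is_known;
  int value;
};

/* Given what is known about the caller's own parameters, compute what
   is known about the argument described by JF.  */
struct toy_known
toy_eval_jump_func (const struct toy_jump_func *jf,
                    const struct toy_known *caller_args, int n_caller_args)
{
  struct toy_known result = { 0, 0 };

  switch (jf->kind)
    {
    case TOY_JF_CONST:
      result.is_known = 1;
      result.value = jf->value;
      break;
    case TOY_JF_PASS_THROUGH:
      if (jf->formal_index >= 0 && jf->formal_index < n_caller_args)
        result = caller_args[jf->formal_index];
      break;
    case TOY_JF_UNKNOWN:
      break;
    }
  return result;
}
```

For a caller @code{A (x, y)} that calls @code{B (x, 7, y)}, the three
arguments would be described as pass-through of parameter 0, the
constant 7, and pass-through of parameter 1.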
471 1.1 mrg @node WHOPR
472 1.1 mrg @section Whole program assumptions, linker plugin and symbol visibilities
473 1.1 mrg
474 1.1 mrg Link-time optimization gives relatively minor benefits when used
475 1.1 mrg alone. The problem is that propagation of inter-procedural
476 1.1 mrg information does not work well across functions and variables
477 1.1 mrg that are called or referenced by other compilation units (such as
478 1.1 mrg from a dynamically linked library). We say that such functions
479 1.1 mrg and variables are @emph{externally visible}.
480 1.1 mrg
481 1.1 mrg To make the situation even more difficult, many applications
482 1.1 mrg organize themselves as a set of shared libraries, and the default
483 1.1 mrg ELF visibility rules allow one to overwrite any externally
484 1.1 mrg visible symbol with a different symbol at runtime. This
485 1.1 mrg basically disables any optimizations across such functions and
486 1.1 mrg variables, because the compiler cannot be sure that the function
487 1.1 mrg body it is seeing is the same function body that will be used at
488 1.1 mrg runtime. Any function or variable not declared @code{static} in
489 1.1 mrg the sources degrades the quality of inter-procedural
490 1.1 mrg optimization.
491 1.1 mrg
492 1.1 mrg To avoid this problem the compiler must assume that it sees the
493 1.1 mrg whole program when doing link-time optimization. Strictly
494 1.1 mrg speaking, the whole program is rarely visible even at link-time.
495 1.1 mrg Standard system libraries are usually linked dynamically or not
496 1.1 mrg provided with the link-time information. In GCC, the whole
497 1.1 mrg program option (@option{-fwhole-program}) asserts that every
498 1.1 mrg function and variable defined in the current compilation
499 1.1 mrg unit is static, except for function @code{main} (note: at
500 1.1 mrg link time, the current unit is the union of all objects compiled
501 1.1 mrg with LTO). Since some functions and variables need to
502 1.1 mrg be referenced externally, for example by another DSO or from an
503 1.1 mrg assembler file, GCC also provides the function and variable
504 1.1 mrg attribute @code{externally_visible} which can be used to disable
505 1.1 mrg the effect of @option{-fwhole-program} on a specific symbol.
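
A minimal sketch of how the attribute might be used follows. The
function names are hypothetical, and the preprocessor guard is an
assumption added so that compilers that do not know the attribute
still accept the code.

```c
/* Under -fwhole-program this function may be treated as if it were
   static, so the compiler is free to inline or remove it.  */
int
toy_helper (int x)
{
  return x + 1;
}

/* This function is meant to be called from outside the LTO unit, for
   example from an assembler file, so it is marked externally visible
   and keeps its symbol even with -fwhole-program.  */
#if defined (__GNUC__) && !defined (__clang__)
__attribute__ ((externally_visible))
#endif
int
toy_plugin_entry (int x)
{
  return toy_helper (x) * 2;
}
```

Without the attribute, a reference to @code{toy_plugin_entry} from
non-LTO code could fail to link once the symbol has been made local.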
506 1.1 mrg
507 1.1 mrg The whole program mode assumptions are slightly more complex in
508 1.1 mrg C++, where inline functions in headers are put into @emph{COMDAT}
sections. COMDAT functions and variables can be defined by
510 1.1 mrg multiple object files and their bodies are unified at link-time
511 1.1 mrg and dynamic link-time. COMDAT functions are changed to local only
512 1.1 mrg when their address is not taken and thus un-sharing them with a
513 1.1 mrg library is not harmful. COMDAT variables always remain externally
visible; however, for read-only variables it is assumed that their
515 1.1 mrg initializers cannot be overwritten by a different value.
516 1.1 mrg
517 1.1 mrg GCC provides the function and variable attribute
518 1.1 mrg @code{visibility} that can be used to specify the visibility of
externally visible symbols (or alternatively the
@option{-fvisibility} command line option). ELF defines
521 1.1 mrg the @code{default}, @code{protected}, @code{hidden} and
522 1.1 mrg @code{internal} visibilities.
523 1.1 mrg
The most commonly used visibility is @code{hidden}. It
525 1.1 mrg specifies that the symbol cannot be referenced from outside of
526 1.1 mrg the current shared library. Unfortunately, this information
527 1.1 mrg cannot be used directly by the link-time optimization in the
528 1.1 mrg compiler since the whole shared library also might contain
529 1.1 mrg non-LTO objects and those are not visible to the compiler.
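
The following sketch shows the @code{visibility} attribute on two
functions of a hypothetical shared library. It assumes an ELF target;
the function names are invented for the example.

```c
/* Hidden: usable from any object inside the same shared library, but
   not exported from it.  */
__attribute__ ((visibility ("hidden")))
int
toy_internal_sum (int a, int b)
{
  return a + b;
}

/* Default: part of the library's exported interface.  */
__attribute__ ((visibility ("default")))
int
toy_api_sum3 (int a, int b, int c)
{
  return toy_internal_sum (toy_internal_sum (a, b), c);
}
```

Compiling with @option{-fvisibility=hidden} makes @code{hidden} the
default for symbols that carry no attribute, so only the functions
explicitly marked @code{default} remain exported.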
530 1.1 mrg
531 1.1 mrg GCC solves this problem using linker plugins. A @emph{linker
532 1.1 mrg plugin} is an interface to the linker that allows an external
533 1.1 mrg program to claim the ownership of a given object file. The linker
534 1.1 mrg then performs the linking procedure by querying the plugin about
535 1.1 mrg the symbol table of the claimed objects and once the linking
536 1.1 mrg decisions are complete, the plugin is allowed to provide the
537 1.1 mrg final object file before the actual linking is made. The linker
538 1.1 mrg plugin obtains the symbol resolution information which specifies
539 1.1 mrg which symbols provided by the claimed objects are bound from the
540 1.1 mrg rest of a binary being linked.
541 1.1 mrg
542 1.1 mrg GCC is designed to be independent of the rest of the toolchain
543 1.1 mrg and aims to support linkers without plugin support. For this
544 1.1 mrg reason it does not use the linker plugin by default. Instead,
545 1.1 mrg the object files are examined by @command{collect2} before being
546 1.1 mrg passed to the linker and objects found to have LTO sections are
547 1.1 mrg passed to @command{lto1} first. This mode does not work for
548 1.1 mrg library archives. The decision on what object files from the
549 1.1 mrg archive are needed depends on the actual linking and thus GCC
550 1.1 mrg would have to implement the linker itself. The resolution
551 1.1 mrg information is missing too and thus GCC needs to make an educated
552 1.1 mrg guess based on @option{-fwhole-program}. Without the linker
553 1.1 mrg plugin GCC also assumes that symbols are declared @code{hidden}
and not referred to by non-LTO code by default.
555 1.1 mrg
556 1.1 mrg @node Internal flags
557 1.1 mrg @section Internal flags controlling @code{lto1}
558 1.1 mrg
559 1.1 mrg The following flags are passed into @command{lto1} and are not
560 1.1 mrg meant to be used directly from the command line.
561 1.1 mrg
562 1.1 mrg @itemize
563 1.1 mrg @item -fwpa
564 1.1 mrg @opindex fwpa
565 1.1 mrg This option runs the serial part of the link-time optimizer
566 1.1 mrg performing the inter-procedural propagation (WPA mode). The
567 1.1 mrg compiler reads in summary information from all inputs and
568 1.1 mrg performs an analysis based on summary information only. It
569 1.1 mrg generates object files for subsequent runs of the link-time
570 1.1 mrg optimizer where individual object files are optimized using both
571 1.1 mrg summary information from the WPA mode and the actual function
572 1.1 mrg bodies. It then drives the LTRANS phase.
573 1.1 mrg
574 1.1 mrg @item -fltrans
575 1.1 mrg @opindex fltrans
576 1.1 mrg This option runs the link-time optimizer in the
577 1.1 mrg local-transformation (LTRANS) mode, which reads in output from a
578 1.1 mrg previous run of the LTO in WPA mode. In the LTRANS mode, LTO
579 1.1 mrg optimizes an object and produces the final assembly.
580 1.1 mrg
581 1.1 mrg @item -fltrans-output-list=@var{file}
582 1.1 mrg @opindex fltrans-output-list
583 1.1 mrg This option specifies a file to which the names of LTRANS output
584 1.1 mrg files are written. This option is only meaningful in conjunction
585 1.1 mrg with @option{-fwpa}.
586 1.3 mrg
587 1.3 mrg @item -fresolution=@var{file}
588 1.3 mrg @opindex fresolution
589 1.3 mrg This option specifies the linker resolution file. This option is
only meaningful in conjunction with @option{-fwpa} and as an option
591 1.3 mrg to pass through to the LTO linker plugin.
592 1.1 mrg @end itemize
593