/* Basic block reordering routines for the GNU compiler.
   Copyright (C) 2000-2022 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* This file contains the "reorder blocks" pass, which changes the control
   flow of a function to encounter fewer branches; the "partition blocks"
   pass, which divides the basic blocks into "hot" and "cold" partitions,
   which are kept separate; and the "duplicate computed gotos" pass, which
   duplicates blocks ending in an indirect jump.

   There are two algorithms for "reorder blocks": the "simple" algorithm,
   which just rearranges blocks, trying to minimize the number of executed
   unconditional branches; and the "software trace cache" algorithm, which
   also copies code, and in general tries a lot harder to have long linear
   pieces of machine code executed.  This algorithm is described next.  */

/* This (greedy) algorithm constructs traces in several rounds.
   The construction starts from "seeds".  The seed for the first round
   is the entry point of the function.  When there is more than one seed,
   the one with the lowest key in the heap is selected first (see bb_to_key).
   Then the algorithm repeatedly adds the most probable successor to the end
   of a trace.  Finally it connects the traces.

   There are two parameters: Branch Threshold and Exec Threshold.
   If the probability of an edge to a successor of the current basic block is
   lower than Branch Threshold or its count is lower than Exec Threshold,
   then the successor will be a seed in one of the next rounds.
   Each round has these parameters lower than the previous one.
   The last round has to have these parameters set to zero so that the
   remaining blocks are picked up.

   The algorithm selects the most probable successor from all unvisited
   successors and successors that have been added to this trace.
   The other successors (those that have not been "sent" to the next round)
   will be other seeds for this round and the secondary traces will start
   from them.
   If the successor has not been visited in this trace, it is added to the
   trace (however, there is some heuristic for simple branches).
   If the successor has been visited in this trace, a loop has been found.
   If the loop has many iterations, the loop is rotated so that the source
   block of the most probable edge going out of the loop is the last block
   of the trace.
   If the loop has few iterations and there is no edge from the last block of
   the loop going out of the loop, the loop header is duplicated.

   When connecting traces, the algorithm first checks whether there is an edge
   from the last block of a trace to the first block of another trace.
   When there are still some unconnected traces it checks whether there exists
   a basic block BB such that BB is a successor of the last block of a trace
   and BB is a predecessor of the first block of another trace.  In this case,
   BB is duplicated, added at the end of the first trace and the traces are
   connected through it.
   The remaining traces are simply connected in order, so there will be a
   jump to the beginning of each remaining trace.

   The above description is for the full algorithm, which is used when the
   function is optimized for speed.  When the function is optimized for size,
   in order to reduce long jumps and connect more fallthru edges, the
   algorithm is modified as follows:
   (1) Break long traces into short ones.  A trace is broken at a block that
   has multiple predecessors/successors during trace discovery.  When
   connecting traces, only connect Trace n with Trace n + 1.  This change
   reduces most long jumps compared with the above algorithm.
   (2) Ignore the edge probability and count for fallthru edges.
   (3) Keep the original order of blocks when there is no chance to fall
   through.  We rely on the results of cfg_cleanup.

   To implement the changes for code size optimization, the block's index is
   selected as the key and all traces are found in one round.

   References:

   "Software Trace Cache"
   A. Ramirez, J. Larriba-Pey, C. Navarro, J. Torrellas and M. Valero; 1999
   http://citeseer.nj.nec.com/15361.html

*/

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "target.h"
#include "rtl.h"
#include "tree.h"
#include "cfghooks.h"
#include "df.h"
#include "memmodel.h"
#include "optabs.h"
#include "regs.h"
#include "emit-rtl.h"
#include "output.h"
#include "expr.h"
#include "tree-pass.h"
#include "cfgrtl.h"
#include "cfganal.h"
#include "cfgbuild.h"
#include "cfgcleanup.h"
#include "bb-reorder.h"
#include "except.h"
#include "alloc-pool.h"
#include "fibonacci_heap.h"
#include "stringpool.h"
#include "attribs.h"
#include "common/common-target.h"

/* The number of rounds.  In most cases there will only be 4 rounds, but
   when partitioning hot and cold basic blocks into separate sections of
   the object file there will be an extra round.  */
#define N_ROUNDS 5

struct target_bb_reorder default_target_bb_reorder;
#if SWITCHABLE_TARGET
struct target_bb_reorder *this_target_bb_reorder = &default_target_bb_reorder;
#endif

#define uncond_jump_length \
  (this_target_bb_reorder->x_uncond_jump_length)

/* Branch thresholds in thousandths (per mille) of the REG_BR_PROB_BASE.  */
static const int branch_threshold[N_ROUNDS] = {400, 200, 100, 0, 0};

/* Exec thresholds in thousandths (per mille) of the count of bb 0.  */
static const int exec_threshold[N_ROUNDS] = {500, 200, 50, 0, 0};

/* If edge count is lower than DUPLICATION_THRESHOLD per mille of entry
   block the edge destination is not duplicated while connecting traces.  */
#define DUPLICATION_THRESHOLD 100

typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;

/* Structure to hold needed information for each basic block.  */
struct bbro_basic_block_data
{
  /* Which trace is the bb start of (-1 means it is not a start of any).  */
  int start_of_trace;

  /* Which trace is the bb end of (-1 means it is not an end of any).  */
  int end_of_trace;

  /* Which trace is the bb in?  */
  int in_trace;

  /* Which trace was this bb visited in?  */
  int visited;

  /* Cached maximum frequency of interesting incoming edges.
     Minus one means not yet computed.  */
  int priority;

  /* Which heap is BB in (if any)?  */
  bb_heap_t *heap;

  /* Which heap node is BB in (if any)?  */
  bb_heap_node_t *node;
};

/* The current size of the following dynamic array.  */
static int array_size;

/* The array which holds needed information for basic blocks.  */
static bbro_basic_block_data *bbd;

/* To avoid frequent reallocation the size of arrays is greater than needed,
   the number of elements is (not less than) 1.25 * size_wanted.  */
#define GET_ARRAY_SIZE(X) ((((X) / 4) + 1) * 5)

/* Free the memory and set the pointer to NULL.  */
#define FREE(P) (gcc_assert (P), free (P), P = 0)

/* Structure for holding information about a trace.  */
struct trace
{
  /* First and last basic block of the trace.  */
  basic_block first, last;

  /* The round of the STC creation which this trace was found in.  */
  int round;

  /* The length (i.e. the number of basic blocks) of the trace.  */
  int length;
};

/* Maximum count of one of the entry blocks.  */
static profile_count max_entry_count;

/* Local function prototypes.  */
static void find_traces_1_round (int, profile_count, struct trace *, int *,
                                 int, bb_heap_t **, int);
static basic_block copy_bb (basic_block, edge, basic_block, int);
static long bb_to_key (basic_block);
static bool better_edge_p (const_basic_block, const_edge, profile_probability,
                           profile_count, profile_probability, profile_count,
                           const_edge);
static bool copy_bb_p (const_basic_block, int);

/* Return the trace number in which BB was visited.  */

static int
bb_visited_trace (const_basic_block bb)
{
  gcc_assert (bb->index < array_size);
  return bbd[bb->index].visited;
}

/* Mark BB as visited in trace number TRACE.  */

static void
mark_bb_visited (basic_block bb, int trace)
{
  bbd[bb->index].visited = trace;
  if (bbd[bb->index].heap)
    {
      bbd[bb->index].heap->delete_node (bbd[bb->index].node);
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;
    }
}

/* Check whether BB should be pushed into the next round of trace
   collection or not.  Reasons for pushing the block forward are:
   1) the block is cold, we are doing partitioning, and there will be
   another round (cold partition blocks are not supposed to be
   collected into traces until the very last round); or
   2) there will be another round, and the basic block is not "hot
   enough" for the current round of trace collection.  */

static bool
push_to_next_round_p (const_basic_block bb, int round, int number_of_rounds,
                      profile_count count_th)
{
  bool there_exists_another_round = round < number_of_rounds - 1;

  bool block_not_hot_enough = (bb->count < count_th
                               || probably_never_executed_bb_p (cfun, bb));

  return there_exists_another_round && block_not_hot_enough;
}

/* Find the traces for Software Trace Cache.  Chain each trace through
   RBI()->next.  Store the number of traces to N_TRACES and description of
   traces to TRACES.  */

static void
find_traces (int *n_traces, struct trace *traces)
{
  int i;
  int number_of_rounds;
  edge e;
  edge_iterator ei;
  bb_heap_t *heap = new bb_heap_t (LONG_MIN);

  /* Add one extra round of trace collection when partitioning hot/cold
     basic blocks into separate sections.  The last round is for all the
     cold blocks (and ONLY the cold blocks).  */

  number_of_rounds = N_ROUNDS - 1;

  /* Insert entry points of function into heap.  */
  max_entry_count = profile_count::zero ();
  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
    {
      bbd[e->dest->index].heap = heap;
      bbd[e->dest->index].node = heap->insert (bb_to_key (e->dest), e->dest);
      if (e->dest->count > max_entry_count)
        max_entry_count = e->dest->count;
    }

  /* Find the traces.  */
  for (i = 0; i < number_of_rounds; i++)
    {
      profile_count count_threshold;

      if (dump_file)
        fprintf (dump_file, "STC - round %d\n", i + 1);

      count_threshold = max_entry_count.apply_scale (exec_threshold[i], 1000);

      find_traces_1_round (REG_BR_PROB_BASE * branch_threshold[i] / 1000,
                           count_threshold, traces, n_traces, i, &heap,
                           number_of_rounds);
    }
  delete heap;

  if (dump_file)
    {
      for (i = 0; i < *n_traces; i++)
        {
          basic_block bb;
          fprintf (dump_file, "Trace %d (round %d): ", i + 1,
                   traces[i].round + 1);
          for (bb = traces[i].first;
               bb != traces[i].last;
               bb = (basic_block) bb->aux)
            {
              fprintf (dump_file, "%d [", bb->index);
              bb->count.dump (dump_file);
              fprintf (dump_file, "] ");
            }
          fprintf (dump_file, "%d [", bb->index);
          bb->count.dump (dump_file);
          fprintf (dump_file, "]\n");
        }
      fflush (dump_file);
    }
}

/* Rotate loop whose back edge is BACK_EDGE in the tail of trace TRACE
   (with sequential number TRACE_N).  */

static basic_block
rotate_loop (edge back_edge, struct trace *trace, int trace_n)
{
  basic_block bb;

  /* Information about the best end (end after rotation) of the loop.  */
  basic_block best_bb = NULL;
  edge best_edge = NULL;
  profile_count best_count = profile_count::uninitialized ();
  /* The best edge is preferred when its destination is not visited yet
     or is a start block of some trace.  */
  bool is_preferred = false;

  /* Find the most frequent edge that goes out from current trace.  */
  bb = back_edge->dest;
  do
    {
      edge e;
      edge_iterator ei;

      FOR_EACH_EDGE (e, ei, bb->succs)
        if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
            && bb_visited_trace (e->dest) != trace_n
            && (e->flags & EDGE_CAN_FALLTHRU)
            && !(e->flags & EDGE_COMPLEX))
          {
            if (is_preferred)
              {
                /* The best edge is preferred.  */
                if (!bb_visited_trace (e->dest)
                    || bbd[e->dest->index].start_of_trace >= 0)
                  {
                    /* The current edge E is also preferred.  */
                    if (e->count () > best_count)
                      {
                        best_count = e->count ();
                        best_edge = e;
                        best_bb = bb;
                      }
                  }
              }
            else
              {
                if (!bb_visited_trace (e->dest)
                    || bbd[e->dest->index].start_of_trace >= 0)
                  {
                    /* The current edge E is preferred.  */
                    is_preferred = true;
                    best_count = e->count ();
                    best_edge = e;
                    best_bb = bb;
                  }
                else
                  {
                    if (!best_edge || e->count () > best_count)
                      {
                        best_count = e->count ();
                        best_edge = e;
                        best_bb = bb;
                      }
                  }
              }
          }
      bb = (basic_block) bb->aux;
    }
  while (bb != back_edge->dest);

  if (best_bb)
    {
      /* Rotate the loop so that the BEST_EDGE goes out from the last block of
         the trace.  */
      if (back_edge->dest == trace->first)
        {
          trace->first = (basic_block) best_bb->aux;
        }
      else
        {
          basic_block prev_bb;

          for (prev_bb = trace->first;
               prev_bb->aux != back_edge->dest;
               prev_bb = (basic_block) prev_bb->aux)
            ;
          prev_bb->aux = best_bb->aux;

          /* Try to get rid of uncond jump to cond jump.  */
          if (single_succ_p (prev_bb))
            {
              basic_block header = single_succ (prev_bb);

              /* Duplicate HEADER if it is a small block containing cond jump
                 in the end.  */
              if (any_condjump_p (BB_END (header)) && copy_bb_p (header, 0)
                  && !CROSSING_JUMP_P (BB_END (header)))
                copy_bb (header, single_succ_edge (prev_bb), prev_bb, trace_n);
            }
        }
    }
  else
    {
      /* We have not found suitable loop tail so do no rotation.  */
      best_bb = back_edge->src;
    }
  best_bb->aux = NULL;
  return best_bb;
}

/* One round of finding traces.  Find traces for BRANCH_TH and EXEC_TH i.e. do
   not include basic blocks whose probability is lower than BRANCH_TH or whose
   count is lower than EXEC_TH into traces (or whose count is lower than
   COUNT_TH).  Store the new traces into TRACES and modify the number of
   traces *N_TRACES.  Set the round (which the trace belongs to) to ROUND.
   The function expects starting basic blocks to be in *HEAP and will delete
   *HEAP and store starting points for the next round into new *HEAP.  */

static void
find_traces_1_round (int branch_th, profile_count count_th,
                     struct trace *traces, int *n_traces, int round,
                     bb_heap_t **heap, int number_of_rounds)
{
  /* Heap for discarded basic blocks which are possible starting points for
     the next round.  */
  bb_heap_t *new_heap = new bb_heap_t (LONG_MIN);
  bool for_size = optimize_function_for_size_p (cfun);

  while (!(*heap)->empty ())
    {
      basic_block bb;
      struct trace *trace;
      edge best_edge, e;
      long key;
      edge_iterator ei;

      bb = (*heap)->extract_min ();
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;

      if (dump_file)
        fprintf (dump_file, "Getting bb %d\n", bb->index);

      /* If the BB's count is too low, send BB to the next round.  When
         partitioning hot/cold blocks into separate sections, make sure all
         the cold blocks (and ONLY the cold blocks) go into the (extra) final
         round.  When optimizing for size, do not push to next round.  */

      if (!for_size
          && push_to_next_round_p (bb, round, number_of_rounds,
                                   count_th))
        {
          int key = bb_to_key (bb);
          bbd[bb->index].heap = new_heap;
          bbd[bb->index].node = new_heap->insert (key, bb);

          if (dump_file)
            fprintf (dump_file,
                     "  Possible start point of next round: %d (key: %d)\n",
                     bb->index, key);
          continue;
        }

      trace = traces + *n_traces;
      trace->first = bb;
      trace->round = round;
      trace->length = 0;
      bbd[bb->index].in_trace = *n_traces;
      (*n_traces)++;

      do
        {
          bool ends_in_call;

          /* The probability and count of the best edge.  */
          profile_probability best_prob = profile_probability::uninitialized ();
          profile_count best_count = profile_count::uninitialized ();

          best_edge = NULL;
          mark_bb_visited (bb, *n_traces);
          trace->length++;

          if (dump_file)
            fprintf (dump_file, "Basic block %d was visited in trace %d\n",
                     bb->index, *n_traces);

          ends_in_call = block_ends_with_call_p (bb);

          /* Select the successor that will be placed after BB.  */
          FOR_EACH_EDGE (e, ei, bb->succs)
            {
              gcc_assert (!(e->flags & EDGE_FAKE));

              if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
                continue;

              if (bb_visited_trace (e->dest)
                  && bb_visited_trace (e->dest) != *n_traces)
                continue;

              /* If partitioning hot/cold basic blocks, don't consider edges
                 that cross section boundaries.  */
              if (BB_PARTITION (e->dest) != BB_PARTITION (bb))
                continue;

              profile_probability prob = e->probability;
              profile_count count = e->dest->count;

              /* The only sensible preference for a call instruction is the
                 fallthru edge.  Don't bother selecting anything else.  */
              if (ends_in_call)
                {
                  if (e->flags & EDGE_CAN_FALLTHRU)
                    {
                      best_edge = e;
                      best_prob = prob;
                      best_count = count;
                    }
                  continue;
                }

              /* Edge that cannot be fallthru or improbable or infrequent
                 successor (i.e. it is unsuitable successor).  When optimizing
                 for size, ignore the probability and count.  */
              if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
                  || !prob.initialized_p ()
                  || ((prob.to_reg_br_prob_base () < branch_th
                       || e->count () < count_th) && (!for_size)))
                continue;

              if (better_edge_p (bb, e, prob, count, best_prob, best_count,
                                 best_edge))
                {
                  best_edge = e;
                  best_prob = prob;
                  best_count = count;
                }
            }

          /* If the best destination has multiple predecessors and can be
             duplicated cheaper than a jump, don't allow it to be added to
             a trace; we'll duplicate it when connecting the traces later.
             However, we need to check that this duplication wouldn't leave
             the best destination with only crossing predecessors, because
             this would change its effective partition from hot to cold.  */
          if (best_edge
              && EDGE_COUNT (best_edge->dest->preds) >= 2
              && copy_bb_p (best_edge->dest, 0))
            {
              bool only_crossing_preds = true;
              edge e;
              edge_iterator ei;
              FOR_EACH_EDGE (e, ei, best_edge->dest->preds)
                if (e != best_edge && !(e->flags & EDGE_CROSSING))
                  {
                    only_crossing_preds = false;
                    break;
                  }
              if (!only_crossing_preds)
                best_edge = NULL;
            }

          /* If the best destination has multiple successors or predecessors,
             don't allow it to be added when optimizing for size.  This makes
             sure predecessors with smaller index are handled before the best
             destination.  It breaks long traces and reduces long jumps.

             Take if-then-else as an example.
                A
               / \
              B   C
               \ /
                D
             If we do not remove the best edge B->D/C->D, the final order might
             be A B D ... C.  C is at the end of the program.  If D's successors
             and D are complicated, we might need long jumps for A->C and C->D.
             Similar issue for order: A C D ... B.

             After removing the best edge, the final result will be ABCD/ACBD.
             It does not add a jump compared with the previous order.  But it
             reduces the possibility of long jumps.  */
          if (best_edge && for_size
              && (EDGE_COUNT (best_edge->dest->succs) > 1
                  || EDGE_COUNT (best_edge->dest->preds) > 1))
            best_edge = NULL;

          /* Add all non-selected successors to the heaps.  */
          FOR_EACH_EDGE (e, ei, bb->succs)
            {
              if (e == best_edge
                  || e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
                  || bb_visited_trace (e->dest))
                continue;

              key = bb_to_key (e->dest);

              if (bbd[e->dest->index].heap)
                {
                  /* E->DEST is already in some heap.  */
                  if (key != bbd[e->dest->index].node->get_key ())
                    {
                      if (dump_file)
                        {
                          fprintf (dump_file,
                                   "Changing key for bb %d from %ld to %ld.\n",
                                   e->dest->index,
                                   (long) bbd[e->dest->index].node->get_key (),
                                   key);
                        }
                      bbd[e->dest->index].heap->replace_key
                        (bbd[e->dest->index].node, key);
                    }
                }
              else
                {
                  bb_heap_t *which_heap = *heap;

                  profile_probability prob = e->probability;

                  if (!(e->flags & EDGE_CAN_FALLTHRU)
                      || (e->flags & EDGE_COMPLEX)
                      || !prob.initialized_p ()
                      || prob.to_reg_br_prob_base () < branch_th
                      || e->count () < count_th)
                    {
                      /* When partitioning hot/cold basic blocks, make sure
                         the cold blocks (and only the cold blocks) all get
                         pushed to the last round of trace collection.  When
                         optimizing for size, do not push to next round.  */

                      if (!for_size && push_to_next_round_p (e->dest, round,
                                                             number_of_rounds,
                                                             count_th))
                        which_heap = new_heap;
                    }

                  bbd[e->dest->index].heap = which_heap;
                  bbd[e->dest->index].node = which_heap->insert (key, e->dest);

                  if (dump_file)
                    {
                      fprintf (dump_file,
                               "  Possible start of %s round: %d (key: %ld)\n",
                               (which_heap == new_heap) ? "next" : "this",
                               e->dest->index, (long) key);
                    }

                }
            }

          if (best_edge) /* Suitable successor was found.  */
            {
              if (bb_visited_trace (best_edge->dest) == *n_traces)
                {
                  /* We do nothing with one basic block loops.  */
                  if (best_edge->dest != bb)
                    {
                      if (best_edge->count ()
                          > best_edge->dest->count.apply_scale (4, 5))
                        {
                          /* The loop has at least 4 iterations.  If the loop
                             header is not the first block of the function
                             we can rotate the loop.  */

                          if (best_edge->dest
                              != ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
                            {
                              if (dump_file)
                                {
                                  fprintf (dump_file,
                                           "Rotating loop %d - %d\n",
                                           best_edge->dest->index, bb->index);
                                }
                              bb->aux = best_edge->dest;
                              bbd[best_edge->dest->index].in_trace
                                = (*n_traces) - 1;
                              bb = rotate_loop (best_edge, trace, *n_traces);
                            }
                        }
                      else
                        {
                          /* The loop has less than 4 iterations.  */

                          if (single_succ_p (bb)
                              && copy_bb_p (best_edge->dest,
                                            optimize_edge_for_speed_p
                                            (best_edge)))
                            {
                              bb = copy_bb (best_edge->dest, best_edge, bb,
                                            *n_traces);
                              trace->length++;
                            }
                        }
                    }

                  /* Terminate the trace.  */
                  break;
                }
              else
                {
                  /* Check for a situation

                      A
                     /|
                    B |
                     \|
                      C

                    where
                    AB->count () + BC->count () >= AC->count ().
                    (i.e. 2 * B->count >= AC->count )
                    Best ordering is then A B C.

                    When optimizing for size, A B C is always the best order.

                    This situation is created for example by:

                    if (A) B;
                    C;

                    */

                  FOR_EACH_EDGE (e, ei, bb->succs)
                    if (e != best_edge
                        && (e->flags & EDGE_CAN_FALLTHRU)
                        && !(e->flags & EDGE_COMPLEX)
                        && !bb_visited_trace (e->dest)
                        && single_pred_p (e->dest)
                        && !(e->flags & EDGE_CROSSING)
                        && single_succ_p (e->dest)
                        && (single_succ_edge (e->dest)->flags
                            & EDGE_CAN_FALLTHRU)
                        && !(single_succ_edge (e->dest)->flags & EDGE_COMPLEX)
                        && single_succ (e->dest) == best_edge->dest
                        && (e->dest->count.apply_scale (2, 1)
                            >= best_edge->count () || for_size))
                      {
                        best_edge = e;
                        if (dump_file)
                          fprintf (dump_file, "Selecting BB %d\n",
                                   best_edge->dest->index);
                        break;
                      }

                  bb->aux = best_edge->dest;
                  bbd[best_edge->dest->index].in_trace = (*n_traces) - 1;
                  bb = best_edge->dest;
                }
            }
        }
      while (best_edge);
      trace->last = bb;
      bbd[trace->first->index].start_of_trace = *n_traces - 1;
      if (bbd[trace->last->index].end_of_trace != *n_traces - 1)
        {
          bbd[trace->last->index].end_of_trace = *n_traces - 1;
          /* Update the cached maximum frequency for interesting predecessor
             edges for successors of the new trace end.  */
          FOR_EACH_EDGE (e, ei, trace->last->succs)
            if (EDGE_FREQUENCY (e) > bbd[e->dest->index].priority)
              bbd[e->dest->index].priority = EDGE_FREQUENCY (e);
        }

      /* The trace is terminated so we have to recount the keys in heap
         (some block can have a lower key because now one of its predecessors
         is an end of the trace).  */
      FOR_EACH_EDGE (e, ei, bb->succs)
        {
          if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
              || bb_visited_trace (e->dest))
            continue;

          if (bbd[e->dest->index].heap)
            {
              key = bb_to_key (e->dest);
              if (key != bbd[e->dest->index].node->get_key ())
                {
                  if (dump_file)
                    {
                      fprintf (dump_file,
                               "Changing key for bb %d from %ld to %ld.\n",
                               e->dest->index,
                               (long) bbd[e->dest->index].node->get_key (),
                               key);
                    }
                  bbd[e->dest->index].heap->replace_key
                    (bbd[e->dest->index].node, key);
                }
            }
        }
    }

  delete (*heap);

  /* "Return" the new heap.  */
  *heap = new_heap;
}
827 1.1 mrg
828 1.1 mrg /* Create a duplicate of the basic block OLD_BB and redirect edge E to it, add
829 1.1 mrg    it to the trace after BB, mark OLD_BB visited and update the pass's data
830 1.1 mrg    structures (TRACE is the number of the trace which OLD_BB is duplicated to).  */
831 1.1 mrg
832 1.1 mrg static basic_block
833 1.1 mrg copy_bb (basic_block old_bb, edge e, basic_block bb, int trace)
834 1.1 mrg {
835 1.1 mrg basic_block new_bb;
836 1.1 mrg
837 1.1 mrg new_bb = duplicate_block (old_bb, e, bb);
838 1.1 mrg BB_COPY_PARTITION (new_bb, old_bb);
839 1.1 mrg
840 1.1 mrg gcc_assert (e->dest == new_bb);
841 1.1 mrg
842 1.1 mrg if (dump_file)
843 1.1 mrg fprintf (dump_file,
844 1.1 mrg "Duplicated bb %d (created bb %d)\n",
845 1.1 mrg old_bb->index, new_bb->index);
846 1.1 mrg
847 1.1 mrg if (new_bb->index >= array_size
848 1.1 mrg || last_basic_block_for_fn (cfun) > array_size)
849 1.1 mrg {
850 1.1 mrg int i;
851 1.1 mrg int new_size;
852 1.1 mrg
853 1.1 mrg new_size = MAX (last_basic_block_for_fn (cfun), new_bb->index + 1);
854 1.1 mrg new_size = GET_ARRAY_SIZE (new_size);
855 1.1 mrg bbd = XRESIZEVEC (bbro_basic_block_data, bbd, new_size);
856 1.1 mrg for (i = array_size; i < new_size; i++)
857 1.1 mrg {
858 1.1 mrg bbd[i].start_of_trace = -1;
859 1.1 mrg bbd[i].end_of_trace = -1;
860 1.1 mrg bbd[i].in_trace = -1;
861 1.1 mrg bbd[i].visited = 0;
862 1.1 mrg bbd[i].priority = -1;
863 1.1 mrg bbd[i].heap = NULL;
864 1.1 mrg bbd[i].node = NULL;
865 1.1 mrg }
866 1.1 mrg array_size = new_size;
867 1.1 mrg
868 1.1 mrg if (dump_file)
869 1.1 mrg {
870 1.1 mrg fprintf (dump_file,
871 1.1 mrg "Growing the dynamic array to %d elements.\n",
872 1.1 mrg array_size);
873 1.1 mrg }
874 1.1 mrg }
875 1.1 mrg
876 1.1 mrg gcc_assert (!bb_visited_trace (e->dest));
877 1.1 mrg mark_bb_visited (new_bb, trace);
878 1.1 mrg new_bb->aux = bb->aux;
879 1.1 mrg bb->aux = new_bb;
880 1.1 mrg
881 1.1 mrg bbd[new_bb->index].in_trace = trace;
882 1.1 mrg
883 1.1 mrg return new_bb;
884 1.1 mrg }
885 1.1 mrg
886 1.1 mrg /* Compute and return the key (for the heap) of the basic block BB. */
887 1.1 mrg
888 1.1 mrg static long
889 1.1 mrg bb_to_key (basic_block bb)
890 1.1 mrg {
891 1.1 mrg edge e;
892 1.1 mrg edge_iterator ei;
893 1.1 mrg
894 1.1 mrg /* Use index as key to align with its original order. */
895 1.1 mrg if (optimize_function_for_size_p (cfun))
896 1.1 mrg return bb->index;
897 1.1 mrg
898 1.1 mrg /* Do not start in probably never executed blocks. */
899 1.1 mrg
900 1.1 mrg if (BB_PARTITION (bb) == BB_COLD_PARTITION
901 1.1 mrg || probably_never_executed_bb_p (cfun, bb))
902 1.1 mrg return BB_FREQ_MAX;
903 1.1 mrg
904 1.1 mrg /* Prefer blocks whose predecessor is an end of some trace
905 1.1 mrg or whose predecessor edge is EDGE_DFS_BACK. */
906 1.1 mrg int priority = bbd[bb->index].priority;
907 1.1 mrg if (priority == -1)
908 1.1 mrg {
909 1.1 mrg priority = 0;
910 1.1 mrg FOR_EACH_EDGE (e, ei, bb->preds)
911 1.1 mrg {
912 1.1 mrg if ((e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
913 1.1 mrg && bbd[e->src->index].end_of_trace >= 0)
914 1.1 mrg || (e->flags & EDGE_DFS_BACK))
915 1.1 mrg {
916 1.1 mrg int edge_freq = EDGE_FREQUENCY (e);
917 1.1 mrg
918 1.1 mrg if (edge_freq > priority)
919 1.1 mrg priority = edge_freq;
920 1.1 mrg }
921 1.1 mrg }
922 1.1 mrg bbd[bb->index].priority = priority;
923 1.1 mrg }
924 1.1 mrg
925 1.1 mrg if (priority)
926 1.1 mrg /* The block with priority should have significantly lower key. */
927 1.1 mrg return -(100 * BB_FREQ_MAX + 100 * priority + bb->count.to_frequency (cfun));
928 1.1 mrg
929 1.1 mrg return -bb->count.to_frequency (cfun);
930 1.1 mrg }
931 1.1 mrg
932 1.1 mrg /* Return true when the edge E from basic block BB is better than the temporary
933 1.1 mrg    best edge (details are in the function).  The probability of edge E is PROB.  The
934 1.1 mrg count of the successor is COUNT. The current best probability is
935 1.1 mrg BEST_PROB, the best count is BEST_COUNT.
936 1.1 mrg The edge is considered to be equivalent when PROB does not differ much from
937 1.1 mrg BEST_PROB; similarly for count. */
938 1.1 mrg
939 1.1 mrg static bool
940 1.1 mrg better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
941 1.1 mrg profile_count count, profile_probability best_prob,
942 1.1 mrg profile_count best_count, const_edge cur_best_edge)
943 1.1 mrg {
944 1.1 mrg bool is_better_edge;
945 1.1 mrg
946 1.1 mrg   /* The BEST_* values do not have to be the actual best, but can be a bit
947 1.1 mrg      smaller than the maximum values.  */
948 1.1 mrg profile_probability diff_prob = best_prob.apply_scale (1, 10);
949 1.1 mrg
950 1.1 mrg /* The smaller one is better to keep the original order. */
951 1.1 mrg if (optimize_function_for_size_p (cfun))
952 1.1 mrg return !cur_best_edge
953 1.1 mrg || cur_best_edge->dest->index > e->dest->index;
954 1.1 mrg
955 1.1 mrg   /* These edges are so expensive that continuing a trace is not useful,
956 1.1 mrg      performance-wise.  */
957 1.1 mrg if (e->flags & (EDGE_ABNORMAL | EDGE_EH))
958 1.1 mrg return false;
959 1.1 mrg
960 1.1 mrg if (prob > best_prob + diff_prob
961 1.1 mrg || (!best_prob.initialized_p ()
962 1.1 mrg && prob > profile_probability::guessed_never ()))
963 1.1 mrg /* The edge has higher probability than the temporary best edge. */
964 1.1 mrg is_better_edge = true;
965 1.1 mrg else if (prob < best_prob - diff_prob)
966 1.1 mrg /* The edge has lower probability than the temporary best edge. */
967 1.1 mrg is_better_edge = false;
968 1.1 mrg else
969 1.1 mrg {
970 1.1 mrg profile_count diff_count = best_count.apply_scale (1, 10);
971 1.1 mrg if (count < best_count - diff_count
972 1.1 mrg || (!best_count.initialized_p ()
973 1.1 mrg && count.nonzero_p ()))
974 1.1 mrg /* The edge and the temporary best edge have almost equivalent
975 1.1 mrg 	   probabilities.  The higher count of a successor now means
976 1.1 mrg 	   that there is another edge going into that successor.
977 1.1 mrg 	   This successor has a lower count, so it is better.  */
978 1.1 mrg is_better_edge = true;
979 1.1 mrg else if (count > best_count + diff_count)
980 1.1 mrg 	/* This successor has a higher count, so it is worse.  */
981 1.1 mrg is_better_edge = false;
982 1.1 mrg else if (e->dest->prev_bb == bb)
983 1.1 mrg /* The edges have equivalent probabilities and the successors
984 1.1 mrg have equivalent frequencies. Select the previous successor. */
985 1.1 mrg is_better_edge = true;
986 1.1 mrg else
987 1.1 mrg is_better_edge = false;
988 1.1 mrg }
989 1.1 mrg
990 1.1 mrg return is_better_edge;
991 1.1 mrg }
992 1.1 mrg
993 1.1 mrg /* Return true when the edge E is better than the temporary best edge
994 1.1 mrg CUR_BEST_EDGE. If SRC_INDEX_P is true, the function compares the src bb of
995 1.1 mrg E and CUR_BEST_EDGE; otherwise it will compare the dest bb.
996 1.1 mrg BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE.
997 1.1 mrg TRACES record the information about traces.
998 1.1 mrg When optimizing for size, the edge with smaller index is better.
999 1.1 mrg When optimizing for speed, the edge with bigger probability or longer trace
1000 1.1 mrg is better. */
1001 1.1 mrg
1002 1.1 mrg static bool
1003 1.1 mrg connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
1004 1.1 mrg const_edge cur_best_edge, struct trace *traces)
1005 1.1 mrg {
1006 1.1 mrg int e_index;
1007 1.1 mrg int b_index;
1008 1.1 mrg bool is_better_edge;
1009 1.1 mrg
1010 1.1 mrg if (!cur_best_edge)
1011 1.1 mrg return true;
1012 1.1 mrg
1013 1.1 mrg if (optimize_function_for_size_p (cfun))
1014 1.1 mrg {
1015 1.1 mrg e_index = src_index_p ? e->src->index : e->dest->index;
1016 1.1 mrg b_index = src_index_p ? cur_best_edge->src->index
1017 1.1 mrg : cur_best_edge->dest->index;
1018 1.1 mrg /* The smaller one is better to keep the original order. */
1019 1.1 mrg return b_index > e_index;
1020 1.1 mrg }
1021 1.1 mrg
1022 1.1 mrg if (src_index_p)
1023 1.1 mrg {
1024 1.1 mrg e_index = e->src->index;
1025 1.1 mrg
1026 1.1 mrg       /* We are looking for a predecessor, so probabilities are not that
1027 1.1 mrg 	 informative.  We do not want to connect A to B merely because A has
1028 1.1 mrg 	 only one successor (probability 100%) while there is an edge
1029 1.1 mrg 	 A' to B whose probability is 90% but which is much more frequent.  */
1030 1.1 mrg if (e->count () > cur_best_edge->count ())
1031 1.1 mrg 	/* The edge has a higher count than the temporary best edge.  */
1032 1.1 mrg is_better_edge = true;
1033 1.1 mrg else if (e->count () < cur_best_edge->count ())
1034 1.1 mrg 	/* The edge has a lower count than the temporary best edge.  */
1035 1.1 mrg is_better_edge = false;
1036 1.1 mrg else if (e->probability > cur_best_edge->probability)
1037 1.1 mrg /* The edge has higher probability than the temporary best edge. */
1038 1.1 mrg is_better_edge = true;
1039 1.1 mrg else if (e->probability < cur_best_edge->probability)
1040 1.1 mrg /* The edge has lower probability than the temporary best edge. */
1041 1.1 mrg is_better_edge = false;
1042 1.1 mrg else if (traces[bbd[e_index].end_of_trace].length > best_len)
1043 1.1 mrg /* The edge and the temporary best edge have equivalent probabilities.
1044 1.1 mrg The edge with longer trace is better. */
1045 1.1 mrg is_better_edge = true;
1046 1.1 mrg else
1047 1.1 mrg is_better_edge = false;
1048 1.1 mrg }
1049 1.1 mrg else
1050 1.1 mrg {
1051 1.1 mrg e_index = e->dest->index;
1052 1.1 mrg
1053 1.1 mrg if (e->probability > cur_best_edge->probability)
1054 1.1 mrg /* The edge has higher probability than the temporary best edge. */
1055 1.1 mrg is_better_edge = true;
1056 1.1 mrg else if (e->probability < cur_best_edge->probability)
1057 1.1 mrg /* The edge has lower probability than the temporary best edge. */
1058 1.1 mrg is_better_edge = false;
1059 1.1 mrg else if (traces[bbd[e_index].start_of_trace].length > best_len)
1060 1.1 mrg /* The edge and the temporary best edge have equivalent probabilities.
1061 1.1 mrg The edge with longer trace is better. */
1062 1.1 mrg is_better_edge = true;
1063 1.1 mrg else
1064 1.1 mrg is_better_edge = false;
1065 1.1 mrg }
1066 1.1 mrg
1067 1.1 mrg return is_better_edge;
1068 1.1 mrg }
1069 1.1 mrg
1070 1.1 mrg /* Connect traces in array TRACES, N_TRACES is the count of traces. */
1071 1.1 mrg
1072 1.1 mrg static void
1073 1.1 mrg connect_traces (int n_traces, struct trace *traces)
1074 1.1 mrg {
1075 1.1 mrg int i;
1076 1.1 mrg bool *connected;
1077 1.1 mrg bool two_passes;
1078 1.1 mrg int last_trace;
1079 1.1 mrg int current_pass;
1080 1.1 mrg int current_partition;
1081 1.1 mrg profile_count count_threshold;
1082 1.1 mrg bool for_size = optimize_function_for_size_p (cfun);
1083 1.1 mrg
1084 1.1 mrg count_threshold = max_entry_count.apply_scale (DUPLICATION_THRESHOLD, 1000);
1085 1.1 mrg
1086 1.1 mrg connected = XCNEWVEC (bool, n_traces);
1087 1.1 mrg last_trace = -1;
1088 1.1 mrg current_pass = 1;
1089 1.1 mrg current_partition = BB_PARTITION (traces[0].first);
1090 1.1 mrg two_passes = false;
1091 1.1 mrg
1092 1.1 mrg if (crtl->has_bb_partition)
1093 1.1 mrg for (i = 0; i < n_traces && !two_passes; i++)
1094 1.1 mrg if (BB_PARTITION (traces[0].first)
1095 1.1 mrg != BB_PARTITION (traces[i].first))
1096 1.1 mrg two_passes = true;
1097 1.1 mrg
1098 1.1 mrg for (i = 0; i < n_traces || (two_passes && current_pass == 1) ; i++)
1099 1.1 mrg {
1100 1.1 mrg int t = i;
1101 1.1 mrg int t2;
1102 1.1 mrg edge e, best;
1103 1.1 mrg int best_len;
1104 1.1 mrg
1105 1.1 mrg if (i >= n_traces)
1106 1.1 mrg {
1107 1.1 mrg gcc_assert (two_passes && current_pass == 1);
1108 1.1 mrg i = 0;
1109 1.1 mrg t = i;
1110 1.1 mrg current_pass = 2;
1111 1.1 mrg if (current_partition == BB_HOT_PARTITION)
1112 1.1 mrg current_partition = BB_COLD_PARTITION;
1113 1.1 mrg else
1114 1.1 mrg current_partition = BB_HOT_PARTITION;
1115 1.1 mrg }
1116 1.1 mrg
1117 1.1 mrg if (connected[t])
1118 1.1 mrg continue;
1119 1.1 mrg
1120 1.1 mrg if (two_passes
1121 1.1 mrg && BB_PARTITION (traces[t].first) != current_partition)
1122 1.1 mrg continue;
1123 1.1 mrg
1124 1.1 mrg connected[t] = true;
1125 1.1 mrg
1126 1.1 mrg /* Find the predecessor traces. */
1127 1.1 mrg for (t2 = t; t2 > 0;)
1128 1.1 mrg {
1129 1.1 mrg edge_iterator ei;
1130 1.1 mrg best = NULL;
1131 1.1 mrg best_len = 0;
1132 1.1 mrg FOR_EACH_EDGE (e, ei, traces[t2].first->preds)
1133 1.1 mrg {
1134 1.1 mrg int si = e->src->index;
1135 1.1 mrg
1136 1.1 mrg if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
1137 1.1 mrg && (e->flags & EDGE_CAN_FALLTHRU)
1138 1.1 mrg && !(e->flags & EDGE_COMPLEX)
1139 1.1 mrg && bbd[si].end_of_trace >= 0
1140 1.1 mrg && !connected[bbd[si].end_of_trace]
1141 1.1 mrg && (BB_PARTITION (e->src) == current_partition)
1142 1.1 mrg && connect_better_edge_p (e, true, best_len, best, traces))
1143 1.1 mrg {
1144 1.1 mrg best = e;
1145 1.1 mrg best_len = traces[bbd[si].end_of_trace].length;
1146 1.1 mrg }
1147 1.1 mrg }
1148 1.1 mrg if (best)
1149 1.1 mrg {
1150 1.1 mrg best->src->aux = best->dest;
1151 1.1 mrg t2 = bbd[best->src->index].end_of_trace;
1152 1.1 mrg connected[t2] = true;
1153 1.1 mrg
1154 1.1 mrg if (dump_file)
1155 1.1 mrg {
1156 1.1 mrg fprintf (dump_file, "Connection: %d %d\n",
1157 1.1 mrg best->src->index, best->dest->index);
1158 1.1 mrg }
1159 1.1 mrg }
1160 1.1 mrg else
1161 1.1 mrg break;
1162 1.1 mrg }
1163 1.1 mrg
1164 1.1 mrg if (last_trace >= 0)
1165 1.1 mrg traces[last_trace].last->aux = traces[t2].first;
1166 1.1 mrg last_trace = t;
1167 1.1 mrg
1168 1.1 mrg /* Find the successor traces. */
1169 1.1 mrg while (1)
1170 1.1 mrg {
1171 1.1 mrg /* Find the continuation of the chain. */
1172 1.1 mrg edge_iterator ei;
1173 1.1 mrg best = NULL;
1174 1.1 mrg best_len = 0;
1175 1.1 mrg FOR_EACH_EDGE (e, ei, traces[t].last->succs)
1176 1.1 mrg {
1177 1.1 mrg int di = e->dest->index;
1178 1.1 mrg
1179 1.1 mrg if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1180 1.1 mrg && (e->flags & EDGE_CAN_FALLTHRU)
1181 1.1 mrg && !(e->flags & EDGE_COMPLEX)
1182 1.1 mrg && bbd[di].start_of_trace >= 0
1183 1.1 mrg && !connected[bbd[di].start_of_trace]
1184 1.1 mrg && (BB_PARTITION (e->dest) == current_partition)
1185 1.1 mrg && connect_better_edge_p (e, false, best_len, best, traces))
1186 1.1 mrg {
1187 1.1 mrg best = e;
1188 1.1 mrg best_len = traces[bbd[di].start_of_trace].length;
1189 1.1 mrg }
1190 1.1 mrg }
1191 1.1 mrg
1192 1.1 mrg if (for_size)
1193 1.1 mrg {
1194 1.1 mrg if (!best)
1195 1.1 mrg /* Stop finding the successor traces. */
1196 1.1 mrg break;
1197 1.1 mrg
1198 1.1 mrg /* It is OK to connect block n with block n + 1 or a block
1199 1.1 mrg before n. For others, only connect to the loop header. */
1200 1.1 mrg if (best->dest->index > (traces[t].last->index + 1))
1201 1.1 mrg {
1202 1.1 mrg int count = EDGE_COUNT (best->dest->preds);
1203 1.1 mrg
1204 1.1 mrg FOR_EACH_EDGE (e, ei, best->dest->preds)
1205 1.1 mrg if (e->flags & EDGE_DFS_BACK)
1206 1.1 mrg count--;
1207 1.1 mrg
1208 1.1 mrg 	      /* If dest has multiple predecessors, skip it.  We expect
1209 1.1 mrg 		 that a predecessor with a smaller index will connect with
1210 1.1 mrg 		 it later.  */
1211 1.1 mrg if (count != 1)
1212 1.1 mrg break;
1213 1.1 mrg }
1214 1.1 mrg
1215 1.1 mrg 	      /* Only connect trace n with trace n + 1.  This conservatively
1216 1.1 mrg 		 keeps the order as close as possible to the original order.
1217 1.1 mrg 		 It also helps to reduce long jumps.  */
1218 1.1 mrg if (last_trace != bbd[best->dest->index].start_of_trace - 1)
1219 1.1 mrg break;
1220 1.1 mrg
1221 1.1 mrg if (dump_file)
1222 1.1 mrg fprintf (dump_file, "Connection: %d %d\n",
1223 1.1 mrg best->src->index, best->dest->index);
1224 1.1 mrg
1225 1.1 mrg t = bbd[best->dest->index].start_of_trace;
1226 1.1 mrg traces[last_trace].last->aux = traces[t].first;
1227 1.1 mrg connected[t] = true;
1228 1.1 mrg last_trace = t;
1229 1.1 mrg }
1230 1.1 mrg else if (best)
1231 1.1 mrg {
1232 1.1 mrg if (dump_file)
1233 1.1 mrg {
1234 1.1 mrg fprintf (dump_file, "Connection: %d %d\n",
1235 1.1 mrg best->src->index, best->dest->index);
1236 1.1 mrg }
1237 1.1 mrg t = bbd[best->dest->index].start_of_trace;
1238 1.1 mrg traces[last_trace].last->aux = traces[t].first;
1239 1.1 mrg connected[t] = true;
1240 1.1 mrg last_trace = t;
1241 1.1 mrg }
1242 1.1 mrg else
1243 1.1 mrg {
1244 1.1 mrg /* Try to connect the traces by duplication of 1 block. */
1245 1.1 mrg edge e2;
1246 1.1 mrg basic_block next_bb = NULL;
1247 1.1 mrg bool try_copy = false;
1248 1.1 mrg
1249 1.1 mrg FOR_EACH_EDGE (e, ei, traces[t].last->succs)
1250 1.1 mrg if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1251 1.1 mrg && (e->flags & EDGE_CAN_FALLTHRU)
1252 1.1 mrg && !(e->flags & EDGE_COMPLEX)
1253 1.1 mrg && (!best || e->probability > best->probability))
1254 1.1 mrg {
1255 1.1 mrg edge_iterator ei;
1256 1.1 mrg edge best2 = NULL;
1257 1.1 mrg int best2_len = 0;
1258 1.1 mrg
1259 1.1 mrg 		    /* If the destination is a start of a trace which is only
1260 1.1 mrg 		       one block long, then there is no need to search the
1261 1.1 mrg 		       successor blocks of the trace; accept it.  */
1262 1.1 mrg if (bbd[e->dest->index].start_of_trace >= 0
1263 1.1 mrg && traces[bbd[e->dest->index].start_of_trace].length
1264 1.1 mrg == 1)
1265 1.1 mrg {
1266 1.1 mrg best = e;
1267 1.1 mrg try_copy = true;
1268 1.1 mrg continue;
1269 1.1 mrg }
1270 1.1 mrg
1271 1.1 mrg FOR_EACH_EDGE (e2, ei, e->dest->succs)
1272 1.1 mrg {
1273 1.1 mrg int di = e2->dest->index;
1274 1.1 mrg
1275 1.1 mrg if (e2->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
1276 1.1 mrg || ((e2->flags & EDGE_CAN_FALLTHRU)
1277 1.1 mrg && !(e2->flags & EDGE_COMPLEX)
1278 1.1 mrg && bbd[di].start_of_trace >= 0
1279 1.1 mrg && !connected[bbd[di].start_of_trace]
1280 1.1 mrg && BB_PARTITION (e2->dest) == current_partition
1281 1.1 mrg && e2->count () >= count_threshold
1282 1.1 mrg && (!best2
1283 1.1 mrg || e2->probability > best2->probability
1284 1.1 mrg || (e2->probability == best2->probability
1285 1.1 mrg && traces[bbd[di].start_of_trace].length
1286 1.1 mrg > best2_len))))
1287 1.1 mrg {
1288 1.1 mrg best = e;
1289 1.1 mrg best2 = e2;
1290 1.1 mrg if (e2->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
1291 1.1 mrg best2_len = traces[bbd[di].start_of_trace].length;
1292 1.1 mrg else
1293 1.1 mrg best2_len = INT_MAX;
1294 1.1 mrg next_bb = e2->dest;
1295 1.1 mrg try_copy = true;
1296 1.1 mrg }
1297 1.1 mrg }
1298 1.1 mrg }
1299 1.1 mrg
1300 1.1 mrg /* Copy tiny blocks always; copy larger blocks only when the
1301 1.1 mrg edge is traversed frequently enough. */
1302 1.1 mrg if (try_copy
1303 1.1 mrg && BB_PARTITION (best->src) == BB_PARTITION (best->dest)
1304 1.1 mrg && copy_bb_p (best->dest,
1305 1.1 mrg optimize_edge_for_speed_p (best)
1306 1.1 mrg && (!best->count ().initialized_p ()
1307 1.1 mrg || best->count () >= count_threshold)))
1308 1.1 mrg {
1309 1.1 mrg basic_block new_bb;
1310 1.1 mrg
1311 1.1 mrg if (dump_file)
1312 1.1 mrg {
1313 1.1 mrg fprintf (dump_file, "Connection: %d %d ",
1314 1.1 mrg traces[t].last->index, best->dest->index);
1315 1.1 mrg if (!next_bb)
1316 1.1 mrg fputc ('\n', dump_file);
1317 1.1 mrg else if (next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
1318 1.1 mrg fprintf (dump_file, "exit\n");
1319 1.1 mrg else
1320 1.1 mrg fprintf (dump_file, "%d\n", next_bb->index);
1321 1.1 mrg }
1322 1.1 mrg
1323 1.1 mrg new_bb = copy_bb (best->dest, best, traces[t].last, t);
1324 1.1 mrg traces[t].last = new_bb;
1325 1.1 mrg if (next_bb && next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
1326 1.1 mrg {
1327 1.1 mrg t = bbd[next_bb->index].start_of_trace;
1328 1.1 mrg traces[last_trace].last->aux = traces[t].first;
1329 1.1 mrg connected[t] = true;
1330 1.1 mrg last_trace = t;
1331 1.1 mrg }
1332 1.1 mrg else
1333 1.1 mrg break; /* Stop finding the successor traces. */
1334 1.1 mrg }
1335 1.1 mrg else
1336 1.1 mrg break; /* Stop finding the successor traces. */
1337 1.1 mrg }
1338 1.1 mrg }
1339 1.1 mrg }
1340 1.1 mrg
1341 1.1 mrg if (dump_file)
1342 1.1 mrg {
1343 1.1 mrg basic_block bb;
1344 1.1 mrg
1345 1.1 mrg fprintf (dump_file, "Final order:\n");
1346 1.1 mrg for (bb = traces[0].first; bb; bb = (basic_block) bb->aux)
1347 1.1 mrg fprintf (dump_file, "%d ", bb->index);
1348 1.1 mrg fprintf (dump_file, "\n");
1349 1.1 mrg fflush (dump_file);
1350 1.1 mrg }
1351 1.1 mrg
1352 1.1 mrg FREE (connected);
1353 1.1 mrg }
1354 1.1 mrg
1355 1.1 mrg /* Return true when BB can and should be copied. CODE_MAY_GROW is true
1356 1.1 mrg when code size is allowed to grow by duplication. */
1357 1.1 mrg
1358 1.1 mrg static bool
1359 1.1 mrg copy_bb_p (const_basic_block bb, int code_may_grow)
1360 1.1 mrg {
1361 1.1 mrg unsigned int size = 0;
1362 1.1 mrg unsigned int max_size = uncond_jump_length;
1363 1.1 mrg rtx_insn *insn;
1364 1.1 mrg
1365 1.1 mrg if (EDGE_COUNT (bb->preds) < 2)
1366 1.1 mrg return false;
1367 1.1 mrg if (!can_duplicate_block_p (bb))
1368 1.1 mrg return false;
1369 1.1 mrg
1370 1.1 mrg /* Avoid duplicating blocks which have many successors (PR/13430). */
1371 1.1 mrg if (EDGE_COUNT (bb->succs) > 8)
1372 1.1 mrg return false;
1373 1.1 mrg
1374 1.1 mrg if (code_may_grow && optimize_bb_for_speed_p (bb))
1375 1.1 mrg max_size *= param_max_grow_copy_bb_insns;
1376 1.1 mrg
1377 1.1 mrg FOR_BB_INSNS (bb, insn)
1378 1.1 mrg {
1379 1.1 mrg if (INSN_P (insn))
1380 1.1 mrg {
1381 1.1 mrg size += get_attr_min_length (insn);
1382 1.1 mrg if (size > max_size)
1383 1.1 mrg break;
1384 1.1 mrg }
1385 1.1 mrg }
1386 1.1 mrg
1387 1.1 mrg if (size <= max_size)
1388 1.1 mrg return true;
1389 1.1 mrg
1390 1.1 mrg if (dump_file)
1391 1.1 mrg {
1392 1.1 mrg fprintf (dump_file,
1393 1.1 mrg "Block %d can't be copied because its size = %u.\n",
1394 1.1 mrg bb->index, size);
1395 1.1 mrg }
1396 1.1 mrg
1397 1.1 mrg return false;
1398 1.1 mrg }
1399 1.1 mrg
1400 1.1 mrg /* Return the length of an unconditional jump instruction.  */
1401 1.1 mrg
1402 1.1 mrg int
1403 1.1 mrg get_uncond_jump_length (void)
1404 1.1 mrg {
1405 1.1 mrg unsigned int length;
1406 1.1 mrg
1407 1.1 mrg start_sequence ();
1408 1.1 mrg rtx_code_label *label = emit_label (gen_label_rtx ());
1409 1.1 mrg rtx_insn *jump = emit_jump_insn (targetm.gen_jump (label));
1410 1.1 mrg length = get_attr_min_length (jump);
1411 1.1 mrg end_sequence ();
1412 1.1 mrg
1413 1.1 mrg gcc_assert (length < INT_MAX);
1414 1.1 mrg return length;
1415 1.1 mrg }
1416 1.1 mrg
1417 1.1 mrg /* Create a forwarder block to OLD_BB starting with NEW_LABEL and in the
1418 1.1 mrg other partition wrt OLD_BB. */
1419 1.1 mrg
1420 1.1 mrg static basic_block
1421 1.1 mrg create_eh_forwarder_block (rtx_code_label *new_label, basic_block old_bb)
1422 1.1 mrg {
1423 1.1 mrg   /* Split OLD_BB, so that EH pads always have only incoming EH edges;
1424 1.1 mrg      bb_has_eh_pred bbs are treated specially by the DF infrastructure.  */
1425 1.1 mrg old_bb = split_block_after_labels (old_bb)->dest;
1426 1.1 mrg
1427 1.1 mrg /* Put the new label and a jump in the new basic block. */
1428 1.1 mrg rtx_insn *label = emit_label (new_label);
1429 1.1 mrg rtx_code_label *old_label = block_label (old_bb);
1430 1.1 mrg rtx_insn *jump = emit_jump_insn (targetm.gen_jump (old_label));
1431 1.1 mrg JUMP_LABEL (jump) = old_label;
1432 1.1 mrg
1433 1.1 mrg /* Create the new basic block and put it in last position. */
1434 1.1 mrg basic_block last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
1435 1.1 mrg basic_block new_bb = create_basic_block (label, jump, last_bb);
1436 1.1 mrg new_bb->aux = last_bb->aux;
1437 1.1 mrg new_bb->count = old_bb->count;
1438 1.1 mrg last_bb->aux = new_bb;
1439 1.1 mrg
1440 1.1 mrg emit_barrier_after_bb (new_bb);
1441 1.1 mrg
1442 1.1 mrg make_single_succ_edge (new_bb, old_bb, 0);
1443 1.1 mrg
1444 1.1 mrg /* Make sure the new basic block is in the other partition. */
1445 1.1 mrg unsigned new_partition = BB_PARTITION (old_bb);
1446 1.1 mrg new_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
1447 1.1 mrg BB_SET_PARTITION (new_bb, new_partition);
1448 1.1 mrg
1449 1.1 mrg return new_bb;
1450 1.1 mrg }
1451 1.1 mrg
1452 1.1 mrg /* The common landing pad in block OLD_BB has edges from both partitions.
1453 1.1 mrg Add a new landing pad that will just jump to the old one and split the
1454 1.1 mrg edges so that no EH edge crosses partitions. */
1455 1.1 mrg
1456 1.1 mrg static void
1457 1.1 mrg sjlj_fix_up_crossing_landing_pad (basic_block old_bb)
1458 1.1 mrg {
1459 1.1 mrg const unsigned lp_len = cfun->eh->lp_array->length ();
1460 1.1 mrg edge_iterator ei;
1461 1.1 mrg edge e;
1462 1.1 mrg
1463 1.1 mrg /* Generate the new common landing-pad label. */
1464 1.1 mrg rtx_code_label *new_label = gen_label_rtx ();
1465 1.1 mrg LABEL_PRESERVE_P (new_label) = 1;
1466 1.1 mrg
1467 1.1 mrg /* Create the forwarder block. */
1468 1.1 mrg basic_block new_bb = create_eh_forwarder_block (new_label, old_bb);
1469 1.1 mrg
1470 1.1 mrg /* Create the map from old to new lp index and initialize it. */
1471 1.1 mrg unsigned *index_map = (unsigned *) alloca (lp_len * sizeof (unsigned));
1472 1.1 mrg memset (index_map, 0, lp_len * sizeof (unsigned));
1473 1.1 mrg
1474 1.1 mrg /* Fix up the edges. */
1475 1.1 mrg for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
1476 1.1 mrg if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
1477 1.1 mrg {
1478 1.1 mrg rtx_insn *insn = BB_END (e->src);
1479 1.1 mrg rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
1480 1.1 mrg
1481 1.1 mrg gcc_assert (note != NULL);
1482 1.1 mrg const unsigned old_index = INTVAL (XEXP (note, 0));
1483 1.1 mrg
1484 1.1 mrg /* Generate the new landing-pad structure. */
1485 1.1 mrg if (index_map[old_index] == 0)
1486 1.1 mrg {
1487 1.1 mrg eh_landing_pad old_lp = (*cfun->eh->lp_array)[old_index];
1488 1.1 mrg eh_landing_pad new_lp = gen_eh_landing_pad (old_lp->region);
1489 1.1 mrg new_lp->post_landing_pad = old_lp->post_landing_pad;
1490 1.1 mrg new_lp->landing_pad = new_label;
1491 1.1 mrg index_map[old_index] = new_lp->index;
1492 1.1 mrg }
1493 1.1 mrg XEXP (note, 0) = GEN_INT (index_map[old_index]);
1494 1.1 mrg
1495 1.1 mrg /* Adjust the edge to the new destination. */
1496 1.1 mrg redirect_edge_succ (e, new_bb);
1497 1.1 mrg }
1498 1.1 mrg else
1499 1.1 mrg ei_next (&ei);
1500 1.1 mrg }
1501 1.1 mrg
1502 1.1 mrg /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
1503 1.1 mrg Add a new landing pad that will just jump to the old one and split the
1504 1.1 mrg edges so that no EH edge crosses partitions. */
1505 1.1 mrg
1506 1.1 mrg static void
1507 1.1 mrg dw2_fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb)
1508 1.1 mrg {
1509 1.1 mrg eh_landing_pad new_lp;
1510 1.1 mrg edge_iterator ei;
1511 1.1 mrg edge e;
1512 1.1 mrg
1513 1.1 mrg /* Generate the new landing-pad structure. */
1514 1.1 mrg new_lp = gen_eh_landing_pad (old_lp->region);
1515 1.1 mrg new_lp->post_landing_pad = old_lp->post_landing_pad;
1516 1.1 mrg new_lp->landing_pad = gen_label_rtx ();
1517 1.1 mrg LABEL_PRESERVE_P (new_lp->landing_pad) = 1;
1518 1.1 mrg
1519 1.1 mrg /* Create the forwarder block. */
1520 1.1 mrg basic_block new_bb = create_eh_forwarder_block (new_lp->landing_pad, old_bb);
1521 1.1 mrg
1522 1.1 mrg /* Fix up the edges. */
1523 1.1 mrg for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
1524 1.1 mrg if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
1525 1.1 mrg {
1526 1.1 mrg rtx_insn *insn = BB_END (e->src);
1527 1.1 mrg rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
1528 1.1 mrg
1529 1.1 mrg gcc_assert (note != NULL);
1530 1.1 mrg gcc_checking_assert (INTVAL (XEXP (note, 0)) == old_lp->index);
1531 1.1 mrg XEXP (note, 0) = GEN_INT (new_lp->index);
1532 1.1 mrg
1533 1.1 mrg /* Adjust the edge to the new destination. */
1534 1.1 mrg redirect_edge_succ (e, new_bb);
1535 1.1 mrg }
1536 1.1 mrg else
1537 1.1 mrg ei_next (&ei);
1538 1.1 mrg }
1539 1.1 mrg
1540 1.1 mrg
1541 1.1 mrg /* Ensure that all hot bbs are included in a hot path through the
1542 1.1 mrg procedure. This is done by calling this function twice, once
1543 1.1 mrg with WALK_UP true (to look for paths from the entry to hot bbs) and
1544 1.1 mrg once with WALK_UP false (to look for paths from hot bbs to the exit).
1545 1.1 mrg Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
1546 1.1 mrg to BBS_IN_HOT_PARTITION. */
1547 1.1 mrg
1548 1.1 mrg static unsigned int
1549 1.1 mrg sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
1550 1.1 mrg vec<basic_block> *bbs_in_hot_partition)
1551 1.1 mrg {
1552 1.1 mrg /* Callers check this. */
1553 1.1 mrg gcc_checking_assert (cold_bb_count);
1554 1.1 mrg
1555 1.1 mrg /* Keep examining hot bbs while we still have some left to check
1556 1.1 mrg and there are remaining cold bbs. */
1557 1.1 mrg vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
1558 1.1 mrg while (! hot_bbs_to_check.is_empty ()
1559 1.1 mrg && cold_bb_count)
1560 1.1 mrg {
1561 1.1 mrg basic_block bb = hot_bbs_to_check.pop ();
1562 1.1 mrg vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
1563 1.1 mrg edge e;
1564 1.1 mrg edge_iterator ei;
1565 1.1 mrg profile_probability highest_probability
1566 1.1 mrg = profile_probability::uninitialized ();
1567 1.1 mrg profile_count highest_count = profile_count::uninitialized ();
1568 1.1 mrg bool found = false;
1569 1.1 mrg
1570 1.1 mrg /* Walk the preds/succs and check if there is at least one already
1571 1.1 mrg marked hot. Keep track of the most frequent pred/succ so that we
1572 1.1 mrg can mark it hot if we don't find one. */
1573 1.1 mrg FOR_EACH_EDGE (e, ei, edges)
1574 1.1 mrg {
1575 1.1 mrg basic_block reach_bb = walk_up ? e->src : e->dest;
1576 1.1 mrg
1577 1.1 mrg if (e->flags & EDGE_DFS_BACK)
1578 1.1 mrg continue;
1579 1.1 mrg
1580 1.1 mrg /* Do not expect profile insanities when profile was not adjusted. */
1581 1.1 mrg if (e->probability == profile_probability::never ()
1582 1.1 mrg || e->count () == profile_count::zero ())
1583 1.1 mrg continue;
1584 1.1 mrg
1585 1.1 mrg if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
1586 1.1 mrg {
1587 1.1 mrg found = true;
1588 1.1 mrg break;
1589 1.1 mrg }
1590 1.1 mrg 	  /* The code below looks for the hottest edge via the edge count
1591 1.1 mrg 	     if it is non-zero, falling back to the edge probability
1592 1.1 mrg 	     otherwise.  */
1593 1.1 mrg if (!(e->count () > highest_count))
1594 1.1 mrg highest_count = e->count ();
1595 1.1 mrg if (!highest_probability.initialized_p ()
1596 1.1 mrg || e->probability > highest_probability)
1597 1.1 mrg highest_probability = e->probability;
1598 1.1 mrg }
1599 1.1 mrg
1600 1.1 mrg /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
1601 1.1 mrg block (or unpartitioned, e.g. the entry block) then it is ok. If not,
1602 1.1 mrg then the most frequent pred (or succ) needs to be adjusted. In the
1603 1.1 mrg case where multiple preds/succs have the same frequency (e.g. a
1604 1.1 mrg 50-50 branch), then both will be adjusted. */
1605 1.1 mrg if (found)
1606 1.1 mrg continue;
1607 1.1 mrg
1608 1.1 mrg FOR_EACH_EDGE (e, ei, edges)
1609 1.1 mrg {
1610 1.1 mrg if (e->flags & EDGE_DFS_BACK)
1611 1.1 mrg continue;
1612 1.1 mrg /* Do not expect profile insanities when profile was not adjusted. */
1613 1.1 mrg if (e->probability == profile_probability::never ()
1614 1.1 mrg || e->count () == profile_count::zero ())
1615 1.1 mrg continue;
1616 1.1 mrg 	  /* Select the hottest edge using the edge count if it is non-zero;
1617 1.1 mrg 	     otherwise fall back to the edge probability.  */
1618 1.1 mrg if (highest_count.initialized_p ())
1619 1.1 mrg {
1620 1.1 mrg if (!(e->count () >= highest_count))
1621 1.1 mrg continue;
1622 1.1 mrg }
1623 1.1 mrg else if (!(e->probability >= highest_probability))
1624 1.1 mrg continue;
1625 1.1 mrg
1626 1.1 mrg basic_block reach_bb = walk_up ? e->src : e->dest;
1627 1.1 mrg
1628 1.1 mrg /* We have a hot bb with an immediate dominator that is cold.
1629 1.1 mrg The dominator needs to be re-marked hot. */
1630 1.1 mrg BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
1631 1.1 mrg if (dump_file)
1632 1.1 mrg fprintf (dump_file, "Promoting bb %i to hot partition to sanitize "
1633 1.1 mrg "profile of bb %i in %s walk\n", reach_bb->index,
1634 1.1 mrg bb->index, walk_up ? "backward" : "forward");
1635 1.1 mrg cold_bb_count--;
1636 1.1 mrg
1637 1.1 mrg /* Now we need to examine newly-hot reach_bb to see if it is also
1638 1.1 mrg dominated by a cold bb. */
1639 1.1 mrg bbs_in_hot_partition->safe_push (reach_bb);
1640 1.1 mrg hot_bbs_to_check.safe_push (reach_bb);
1641 1.1 mrg }
1642 1.1 mrg }
1643 1.1 mrg hot_bbs_to_check.release ();
1644 1.1 mrg
1645 1.1 mrg return cold_bb_count;
1646 1.1 mrg }
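
/* Note on the profile_count comparisons in sanitize_hot_paths above:
   profile_count is only partially ordered, because any comparison that
   involves an uninitialized count yields false.  A sketch of the idiom
   (assuming those semantics, which are the usual ones for this type):

     profile_count highest = profile_count::uninitialized ();
     // "highest = max (highest, c)" under a partial order:
     if (!(c < highest))
       highest = c;

   The negated form fires both when C is genuinely the larger count and
   when the two counts are incomparable, so an initialized C can replace
   an uninitialized HIGHEST; a plain "c > highest" would never be true
   against an uninitialized HIGHEST and the maximum would never be
   recorded.  */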


/* Find the basic blocks that are rarely executed and need to be moved to
   a separate section of the .o file (to cut down on paging and improve
   cache locality).  Return a vector of all edges that cross.  */

static vec<edge>
find_rarely_executed_basic_blocks_and_crossing_edges (void)
{
  vec<edge> crossing_edges = vNULL;
  basic_block bb;
  edge e;
  edge_iterator ei;
  unsigned int cold_bb_count = 0;
  auto_vec<basic_block> bbs_in_hot_partition;

  propagate_unlikely_bbs_forward ();

  /* Mark which partition (hot/cold) each basic block belongs in.  */
  FOR_EACH_BB_FN (bb, cfun)
    {
      bool cold_bb = false;

      if (probably_never_executed_bb_p (cfun, bb))
        {
          cold_bb = true;

          /* Handle profile insanities created by upstream optimizations
             by also checking the incoming edge weights.  If there is a
             non-cold incoming edge, conservatively prevent this block
             from being split into the cold section.  */
          if (!bb->count.precise_p ())
            FOR_EACH_EDGE (e, ei, bb->preds)
              if (!probably_never_executed_edge_p (cfun, e))
                {
                  cold_bb = false;
                  break;
                }
        }
      if (cold_bb)
        {
          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
          cold_bb_count++;
        }
      else
        {
          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
          bbs_in_hot_partition.safe_push (bb);
        }
    }

  /* Ensure that hot bbs are included along a hot path from the entry to exit.
     Several different possibilities may include cold bbs along all paths
     to/from a hot bb.  One is that there are edge weight insanities due to
     optimization phases that do not properly update basic block profile
     counts.  The second is that the entry of the function may not be hot,
     because it is entered fewer times than the number of profile training
     runs, but there is a loop inside the function that causes blocks within
     the function to be above the threshold for hotness.  This is fixed by
     walking up from hot bbs to the entry block, and then down from hot bbs
     to the exit, performing partitioning fixups as necessary.  */
  if (cold_bb_count)
    {
      mark_dfs_back_edges ();
      cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
                                          &bbs_in_hot_partition);
      if (cold_bb_count)
        sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);

      hash_set <basic_block> set;
      find_bbs_reachable_by_hot_paths (&set);
      FOR_EACH_BB_FN (bb, cfun)
        if (!set.contains (bb))
          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
    }

  /* The format of .gcc_except_table does not allow landing pads to
     be in a different partition from the throw.  Fix this by either
     moving the landing pads or inserting forwarder landing pads.  */
  if (cfun->eh->lp_array)
    {
      const bool sjlj
        = (targetm_common.except_unwind_info (&global_options) == UI_SJLJ);
      unsigned i;
      eh_landing_pad lp;

      FOR_EACH_VEC_ELT (*cfun->eh->lp_array, i, lp)
        {
          bool all_same, all_diff;

          if (lp == NULL
              || lp->landing_pad == NULL_RTX
              || !LABEL_P (lp->landing_pad))
            continue;

          all_same = all_diff = true;
          bb = BLOCK_FOR_INSN (lp->landing_pad);
          FOR_EACH_EDGE (e, ei, bb->preds)
            {
              gcc_assert (e->flags & EDGE_EH);
              if (BB_PARTITION (bb) == BB_PARTITION (e->src))
                all_diff = false;
              else
                all_same = false;
            }

          if (all_same)
            ;
          else if (all_diff)
            {
              int which = BB_PARTITION (bb);
              which ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
              BB_SET_PARTITION (bb, which);
            }
          else if (sjlj)
            sjlj_fix_up_crossing_landing_pad (bb);
          else
            dw2_fix_up_crossing_landing_pad (lp, bb);

          /* There is a single, common landing pad in SJLJ mode.  */
          if (sjlj)
            break;
        }
    }

  /* Mark every edge that crosses between sections.  */
  FOR_EACH_BB_FN (bb, cfun)
    FOR_EACH_EDGE (e, ei, bb->succs)
      {
        unsigned int flags = e->flags;

        /* We should never have EDGE_CROSSING set yet.  */
        gcc_checking_assert ((flags & EDGE_CROSSING) == 0);

        if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
            && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
            && BB_PARTITION (e->src) != BB_PARTITION (e->dest))
          {
            crossing_edges.safe_push (e);
            flags |= EDGE_CROSSING;
          }

        /* Now that we've split eh edges as appropriate, allow landing pads
           to be merged with the post-landing pads.  */
        flags &= ~EDGE_PRESERVE;

        e->flags = flags;
      }

  return crossing_edges;
}

/* Set the flag EDGE_CAN_FALLTHRU for edges that can be fallthru.  */

static void
set_edge_can_fallthru_flag (void)
{
  basic_block bb;

  FOR_EACH_BB_FN (bb, cfun)
    {
      edge e;
      edge_iterator ei;

      FOR_EACH_EDGE (e, ei, bb->succs)
        {
          e->flags &= ~EDGE_CAN_FALLTHRU;

          /* The FALLTHRU edge is also CAN_FALLTHRU edge.  */
          if (e->flags & EDGE_FALLTHRU)
            e->flags |= EDGE_CAN_FALLTHRU;
        }

      /* If the BB ends with an invertible condjump all (2) edges are
         CAN_FALLTHRU edges.  */
      if (EDGE_COUNT (bb->succs) != 2)
        continue;
      if (!any_condjump_p (BB_END (bb)))
        continue;

      rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb));
      if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0))
        continue;
      /* The jump was inverted successfully, so it is invertible; invert
         it back to restore the original condition.  */
      invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0);
      EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU;
      EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU;
    }
}

/* If any destination of a crossing edge does not have a label, add a label;
   convert any easy fall-through crossing edges to unconditional jumps.  */

static void
add_labels_and_missing_jumps (vec<edge> crossing_edges)
{
  size_t i;
  edge e;

  FOR_EACH_VEC_ELT (crossing_edges, i, e)
    {
      basic_block src = e->src;
      basic_block dest = e->dest;
      rtx_jump_insn *new_jump;

      if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
        continue;

      /* Make sure dest has a label.  */
      rtx_code_label *label = block_label (dest);

      /* Nothing to do for non-fallthru edges.  */
      if (src == ENTRY_BLOCK_PTR_FOR_FN (cfun))
        continue;
      if ((e->flags & EDGE_FALLTHRU) == 0)
        continue;

      /* If the block does not end with a control flow insn, then we
         can trivially add a jump to the end to fix up the crossing.
         Otherwise the jump will have to go in a new bb, which will
         be handled by the fix_up_fall_thru_edges function.  */
      if (control_flow_insn_p (BB_END (src)))
        continue;

      /* Make sure there's only one successor.  */
      gcc_assert (single_succ_p (src));

      new_jump = emit_jump_insn_after (targetm.gen_jump (label), BB_END (src));
      BB_END (src) = new_jump;
      JUMP_LABEL (new_jump) = label;
      LABEL_NUSES (label) += 1;

      emit_barrier_after_bb (src);

      /* Mark edge as non-fallthru.  */
      e->flags &= ~EDGE_FALLTHRU;
    }
}

/* Find any bb's where the fall-through edge is a crossing edge (note that
   these bb's must also contain a conditional jump or end with a call
   instruction; we've already dealt with fall-through edges for blocks
   that didn't have a conditional jump or didn't end with a call instruction
   in the call to add_labels_and_missing_jumps).  Convert the fall-through
   edge to a non-crossing edge by inserting a new bb to fall through into.
   The new bb will contain an unconditional jump (crossing edge) to the
   original fall-through destination.  */

static void
fix_up_fall_thru_edges (void)
{
  basic_block cur_bb;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      edge succ1;
      edge succ2;
      edge fall_thru = NULL;
      edge cond_jump = NULL;

      if (EDGE_COUNT (cur_bb->succs) > 0)
        succ1 = EDGE_SUCC (cur_bb, 0);
      else
        succ1 = NULL;

      if (EDGE_COUNT (cur_bb->succs) > 1)
        succ2 = EDGE_SUCC (cur_bb, 1);
      else
        succ2 = NULL;

      /* Find the fall-through edge.  */

      if (succ1
          && (succ1->flags & EDGE_FALLTHRU))
        {
          fall_thru = succ1;
          cond_jump = succ2;
        }
      else if (succ2
               && (succ2->flags & EDGE_FALLTHRU))
        {
          fall_thru = succ2;
          cond_jump = succ1;
        }
      else if (succ2 && EDGE_COUNT (cur_bb->succs) > 2)
        fall_thru = find_fallthru_edge (cur_bb->succs);

      if (fall_thru && (fall_thru->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)))
        {
          /* Check to see if the fall-thru edge is a crossing edge.  */

          if (fall_thru->flags & EDGE_CROSSING)
            {
              /* The fall_thru edge crosses; now check the cond jump edge, if
                 it exists.  */

              bool cond_jump_crosses = true;
              int invert_worked = 0;
              rtx_insn *old_jump = BB_END (cur_bb);

              /* Find the jump instruction, if there is one.  */

              if (cond_jump)
                {
                  if (!(cond_jump->flags & EDGE_CROSSING))
                    cond_jump_crosses = false;

                  /* We know the fall-thru edge crosses; if the cond
                     jump edge does NOT cross, and its destination is the
                     next block in the bb order, invert the jump
                     (i.e. fix it so the fall through does not cross and
                     the cond jump does).  */

                  if (!cond_jump_crosses)
                    {
                      /* Find label in fall_thru block.  We've already added
                         any missing labels, so there must be one.  */

                      rtx_code_label *fall_thru_label
                        = block_label (fall_thru->dest);

                      if (old_jump && fall_thru_label)
                        {
                          rtx_jump_insn *old_jump_insn
                            = dyn_cast <rtx_jump_insn *> (old_jump);
                          if (old_jump_insn)
                            invert_worked = invert_jump (old_jump_insn,
                                                         fall_thru_label, 0);
                        }

                      if (invert_worked)
                        {
                          fall_thru->flags &= ~EDGE_FALLTHRU;
                          cond_jump->flags |= EDGE_FALLTHRU;
                          update_br_prob_note (cur_bb);
                          std::swap (fall_thru, cond_jump);
                          cond_jump->flags |= EDGE_CROSSING;
                          fall_thru->flags &= ~EDGE_CROSSING;
                        }
                    }
                }

              if (cond_jump_crosses || !invert_worked)
                {
                  /* This is the case where both edges out of the basic
                     block are crossing edges.  Here we will fix up the
                     fall-through edge.  The jump edge will be taken care
                     of later.  The EDGE_CROSSING flag of the fall_thru edge
                     is unset before the call to the force_nonfallthru
                     function because if a new basic block is created
                     this edge remains in the current section boundary
                     while the edge between new_bb and fall_thru->dest
                     becomes EDGE_CROSSING.  */

                  fall_thru->flags &= ~EDGE_CROSSING;
                  unsigned old_count = EDGE_COUNT (cur_bb->succs);
                  basic_block new_bb = force_nonfallthru (fall_thru);

                  if (new_bb)
                    {
                      new_bb->aux = cur_bb->aux;
                      cur_bb->aux = new_bb;

                      /* This is done by force_nonfallthru_and_redirect.  */
                      gcc_assert (BB_PARTITION (new_bb)
                                  == BB_PARTITION (cur_bb));

                      edge e = single_succ_edge (new_bb);
                      e->flags |= EDGE_CROSSING;
                      if (EDGE_COUNT (cur_bb->succs) > old_count)
                        {
                          /* If an asm goto has a crossing fallthrough edge
                             and at least one of its labels points to the
                             same bb, force_nonfallthru can result in the
                             fallthrough edge being redirected and a new
                             edge added for the label or more labels to
                             e->dest.  As we've temporarily cleared the
                             EDGE_CROSSING flag on the fallthrough edge,
                             we need to restore it again.  See PR108596.  */
                          rtx_insn *j = BB_END (cur_bb);
                          gcc_checking_assert (JUMP_P (j)
                                               && asm_noperands (PATTERN (j)));
                          edge e2 = find_edge (cur_bb, e->dest);
                          if (e2)
                            e2->flags |= EDGE_CROSSING;
                        }
                    }
                  else
                    {
                      /* If a new basic block was not created, restore
                         the EDGE_CROSSING flag.  */
                      fall_thru->flags |= EDGE_CROSSING;
                    }

                  /* Add barrier after new jump.  */
                  emit_barrier_after_bb (new_bb ? new_bb : cur_bb);
                }
            }
        }
    }
}

/* This function checks the destination block of a "crossing jump" to
   see if it has any crossing predecessors that begin with a code label
   and end with an unconditional jump.  If so, it returns that predecessor
   block.  (This is to avoid creating lots of new basic blocks that all
   contain unconditional jumps to the same destination).  */

static basic_block
find_jump_block (basic_block jump_dest)
{
  basic_block source_bb = NULL;
  edge e;
  rtx_insn *insn;
  edge_iterator ei;

  FOR_EACH_EDGE (e, ei, jump_dest->preds)
    if (e->flags & EDGE_CROSSING)
      {
        basic_block src = e->src;

        /* Check each predecessor to see if it has a label, and contains
           only one executable instruction, which is an unconditional jump.
           If so, we can use it.  */

        if (LABEL_P (BB_HEAD (src)))
          for (insn = BB_HEAD (src);
               !INSN_P (insn) && insn != NEXT_INSN (BB_END (src));
               insn = NEXT_INSN (insn))
            {
              if (INSN_P (insn)
                  && insn == BB_END (src)
                  && JUMP_P (insn)
                  && !any_condjump_p (insn))
                {
                  source_bb = src;
                  break;
                }
            }

        if (source_bb)
          break;
      }

  return source_bb;
}

/* Find all BB's with conditional jumps that are crossing edges;
   insert a new bb and make the conditional jump branch to the new
   bb instead (make the new bb same color so conditional branch won't
   be a 'crossing' edge).  Insert an unconditional jump from the
   new bb to the original destination of the conditional jump.  */

static void
fix_crossing_conditional_branches (void)
{
  basic_block cur_bb;
  basic_block new_bb;
  basic_block dest;
  edge succ1;
  edge succ2;
  edge crossing_edge;
  edge new_edge;
  rtx set_src;
  rtx old_label = NULL_RTX;
  rtx_code_label *new_label;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      crossing_edge = NULL;
      if (EDGE_COUNT (cur_bb->succs) > 0)
        succ1 = EDGE_SUCC (cur_bb, 0);
      else
        succ1 = NULL;

      if (EDGE_COUNT (cur_bb->succs) > 1)
        succ2 = EDGE_SUCC (cur_bb, 1);
      else
        succ2 = NULL;

      /* We already took care of fall-through edges, so only one successor
         can be a crossing edge.  */

      if (succ1 && (succ1->flags & EDGE_CROSSING))
        crossing_edge = succ1;
      else if (succ2 && (succ2->flags & EDGE_CROSSING))
        crossing_edge = succ2;

      if (crossing_edge)
        {
          rtx_insn *old_jump = BB_END (cur_bb);

          /* Check to make sure the jump instruction is a
             conditional jump.  */

          set_src = NULL_RTX;

          if (any_condjump_p (old_jump))
            {
              if (GET_CODE (PATTERN (old_jump)) == SET)
                set_src = SET_SRC (PATTERN (old_jump));
              else if (GET_CODE (PATTERN (old_jump)) == PARALLEL)
                {
                  set_src = XVECEXP (PATTERN (old_jump), 0, 0);
                  if (GET_CODE (set_src) == SET)
                    set_src = SET_SRC (set_src);
                  else
                    set_src = NULL_RTX;
                }
            }

          if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE))
            {
              rtx_jump_insn *old_jump_insn
                = as_a <rtx_jump_insn *> (old_jump);

              if (GET_CODE (XEXP (set_src, 1)) == PC)
                old_label = XEXP (set_src, 2);
              else if (GET_CODE (XEXP (set_src, 2)) == PC)
                old_label = XEXP (set_src, 1);

              /* Check to see if new bb for jumping to that dest has
                 already been created; if so, use it; if not, create
                 a new one.  */

              new_bb = find_jump_block (crossing_edge->dest);

              if (new_bb)
                new_label = block_label (new_bb);
              else
                {
                  basic_block last_bb;
                  rtx_code_label *old_jump_target;
                  rtx_jump_insn *new_jump;

                  /* Create new basic block to be dest for
                     conditional jump.  */

                  /* Put appropriate instructions in new bb.  */

                  new_label = gen_label_rtx ();
                  emit_label (new_label);

                  gcc_assert (GET_CODE (old_label) == LABEL_REF);
                  old_jump_target = old_jump_insn->jump_target ();
                  new_jump = as_a <rtx_jump_insn *>
                    (emit_jump_insn (targetm.gen_jump (old_jump_target)));
                  new_jump->set_jump_target (old_jump_target);

                  last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
                  new_bb = create_basic_block (new_label, new_jump, last_bb);
                  new_bb->aux = last_bb->aux;
                  last_bb->aux = new_bb;

                  emit_barrier_after_bb (new_bb);

                  /* Make sure new bb is in same partition as source
                     of conditional branch.  */
                  BB_COPY_PARTITION (new_bb, cur_bb);
                }

              /* Make old jump branch to new bb.  */

              redirect_jump (old_jump_insn, new_label, 0);

              /* Remove crossing_edge as predecessor of 'dest'.  */

              dest = crossing_edge->dest;

              redirect_edge_succ (crossing_edge, new_bb);

              /* Make a new edge from new_bb to old dest; new edge
                 will be a successor for new_bb and a predecessor
                 for 'dest'.  */

              if (EDGE_COUNT (new_bb->succs) == 0)
                new_edge = make_single_succ_edge (new_bb, dest, 0);
              else
                new_edge = EDGE_SUCC (new_bb, 0);

              crossing_edge->flags &= ~EDGE_CROSSING;
              new_edge->flags |= EDGE_CROSSING;
            }
        }
    }
}

/* Find any unconditional branches that cross between hot and cold
   sections.  Convert them into indirect jumps instead.  */

static void
fix_crossing_unconditional_branches (void)
{
  basic_block cur_bb;
  rtx_insn *last_insn;
  rtx label;
  rtx label_addr;
  rtx_insn *indirect_jump_sequence;
  rtx_insn *jump_insn = NULL;
  rtx new_reg;
  rtx_insn *cur_insn;
  edge succ;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      last_insn = BB_END (cur_bb);

      if (EDGE_COUNT (cur_bb->succs) < 1)
        continue;

      succ = EDGE_SUCC (cur_bb, 0);

      /* Check to see if bb ends in a crossing (unconditional) jump.  At
         this point, no crossing jumps should be conditional.  */

      if (JUMP_P (last_insn)
          && (succ->flags & EDGE_CROSSING))
        {
          gcc_assert (!any_condjump_p (last_insn));

          /* Make sure the jump is not already an indirect or table jump.  */

          if (!computed_jump_p (last_insn)
              && !tablejump_p (last_insn, NULL, NULL)
              && asm_noperands (PATTERN (last_insn)) < 0)
            {
              /* We have found a "crossing" unconditional branch.  Now
                 we must convert it to an indirect jump.  First create a
                 reference to the label, as the target for the jump.  */

              label = JUMP_LABEL (last_insn);
              label_addr = gen_rtx_LABEL_REF (Pmode, label);
              LABEL_NUSES (label) += 1;

              /* Get a register to use for the indirect jump.  */

              new_reg = gen_reg_rtx (Pmode);

              /* Generate the indirect jump sequence.  */

              start_sequence ();
              emit_move_insn (new_reg, label_addr);
              emit_indirect_jump (new_reg);
              indirect_jump_sequence = get_insns ();
              end_sequence ();

              /* Make sure every instruction in the new jump sequence has
                 its basic block set to be cur_bb.  */

              for (cur_insn = indirect_jump_sequence; cur_insn;
                   cur_insn = NEXT_INSN (cur_insn))
                {
                  if (!BARRIER_P (cur_insn))
                    BLOCK_FOR_INSN (cur_insn) = cur_bb;
                  if (JUMP_P (cur_insn))
                    jump_insn = cur_insn;
                }

              /* Insert the new (indirect) jump sequence immediately before
                 the unconditional jump, then delete the unconditional
                 jump.  */

              emit_insn_before (indirect_jump_sequence, last_insn);
              delete_insn (last_insn);

              JUMP_LABEL (jump_insn) = label;
              LABEL_NUSES (label)++;

              /* Make BB_END for cur_bb be the jump instruction (NOT the
                 barrier instruction at the end of the sequence...).  */

              BB_END (cur_bb) = jump_insn;
            }
        }
    }
}

/* Update CROSSING_JUMP_P flags on all jump insns.  */

static void
update_crossing_jump_flags (void)
{
  basic_block bb;
  edge e;
  edge_iterator ei;

  FOR_EACH_BB_FN (bb, cfun)
    FOR_EACH_EDGE (e, ei, bb->succs)
      if (e->flags & EDGE_CROSSING)
        {
          if (JUMP_P (BB_END (bb)))
            CROSSING_JUMP_P (BB_END (bb)) = 1;
          break;
        }
}

/* Reorder basic blocks using the software trace cache (STC) algorithm.  */

static void
reorder_basic_blocks_software_trace_cache (void)
{
  if (dump_file)
    fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");

  int n_traces;
  int i;
  struct trace *traces;

  /* We are estimating the length of uncond jump insn only once since the code
     for getting the insn length always returns the minimal length now.  */
  if (uncond_jump_length == 0)
    uncond_jump_length = get_uncond_jump_length ();

  /* We need to know some information for each basic block.  */
  array_size = GET_ARRAY_SIZE (last_basic_block_for_fn (cfun));
  bbd = XNEWVEC (bbro_basic_block_data, array_size);
  for (i = 0; i < array_size; i++)
    {
      bbd[i].start_of_trace = -1;
      bbd[i].end_of_trace = -1;
      bbd[i].in_trace = -1;
      bbd[i].visited = 0;
      bbd[i].priority = -1;
      bbd[i].heap = NULL;
      bbd[i].node = NULL;
    }

  traces = XNEWVEC (struct trace, n_basic_blocks_for_fn (cfun));
  n_traces = 0;
  find_traces (&n_traces, traces);
  connect_traces (n_traces, traces);
  FREE (traces);
  FREE (bbd);
}

/* Order edges by execution frequency, higher first.  */

static int
edge_order (const void *ve1, const void *ve2)
{
  edge e1 = *(const edge *) ve1;
  edge e2 = *(const edge *) ve2;
  profile_count c1 = e1->count ();
  profile_count c2 = e2->count ();
  /* Since profile_count::operator< does not establish a strict weak order
     in presence of uninitialized counts, use 'max': this makes them appear
     as if having execution frequency less than any initialized count.  */
  profile_count m = c1.max (c2);
  return (m == c2) - (m == c1);
}
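
/* A worked example of the comparator above (a sketch; it assumes the usual
   profile_count semantics, where comparisons against an uninitialized count
   are false):

     c1 = 10, c2 = uninitialized  =>  m = c1.max (c2) = c1, so the result
       is (m == c2) - (m == c1) = 0 - 1 = -1 and the initialized edge
       sorts first;
     c1 = c2 = 10                 =>  both equalities hold, the result is 0,
       and gcc_stablesort keeps the original relative order.  */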

/* Reorder basic blocks using the "simple" algorithm.  This tries to
   maximize the dynamic number of branches that are fallthrough, without
   copying instructions.  The algorithm is greedy, looking at the most
   frequently executed branch first.  */

static void
reorder_basic_blocks_simple (void)
{
  if (dump_file)
    fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");

  edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];

  /* First, collect all edges that can be optimized by reordering blocks:
     simple jumps and conditional jumps, as well as the function entry edge.  */

  int n = 0;
  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);

  basic_block bb;
  FOR_EACH_BB_FN (bb, cfun)
    {
      rtx_insn *end = BB_END (bb);

      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
        continue;

      /* We cannot optimize asm goto.  */
      if (JUMP_P (end) && extract_asm_operands (end))
        continue;

      if (single_succ_p (bb))
        edges[n++] = EDGE_SUCC (bb, 0);
      else if (any_condjump_p (end))
        {
          edge e0 = EDGE_SUCC (bb, 0);
          edge e1 = EDGE_SUCC (bb, 1);
          /* When optimizing for size it is best to keep the original
             fallthrough edges.  */
          if (e1->flags & EDGE_FALLTHRU)
            std::swap (e0, e1);
          edges[n++] = e0;
          edges[n++] = e1;
        }
    }

  /* Sort the edges, the most desirable first.  When optimizing for size
     all edges are equally desirable.  */

  if (optimize_function_for_speed_p (cfun))
    gcc_stablesort (edges, n, sizeof *edges, edge_order);

  /* Now decide which of those edges to make fallthrough edges.  We set
     BB_VISITED if a block already has a fallthrough successor assigned
     to it.  We make ->AUX of an endpoint point to the opposite endpoint
     of a sequence of blocks that fall through, and ->AUX will be NULL
     for a block that is in such a sequence but not an endpoint anymore.

     To start with, everything points to itself, nothing is assigned yet.  */

  FOR_ALL_BB_FN (bb, cfun)
    {
      bb->aux = bb;
      bb->flags &= ~BB_VISITED;
    }

  EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;

  /* Now for all edges, the most desirable first, see if that edge can
     connect two sequences.  If it can, update AUX and BB_VISITED; if it
     cannot, zero out the edge in the table.  */

  for (int j = 0; j < n; j++)
    {
      edge e = edges[j];

      basic_block tail_a = e->src;
      basic_block head_b = e->dest;
      basic_block head_a = (basic_block) tail_a->aux;
      basic_block tail_b = (basic_block) head_b->aux;

      /* An edge cannot connect two sequences if:
         - it crosses partitions;
         - its src is not a current endpoint;
         - its dest is not a current endpoint;
         - or, it would create a loop.  */

      if (e->flags & EDGE_CROSSING
          || tail_a->flags & BB_VISITED
          || !tail_b
          || (!(head_b->flags & BB_VISITED) && head_b != tail_b)
          || tail_a == tail_b)
        {
          edges[j] = 0;
          continue;
        }

      tail_a->aux = 0;
      head_b->aux = 0;
      head_a->aux = tail_b;
      tail_b->aux = head_a;
      tail_a->flags |= BB_VISITED;
    }

  /* Put the pieces together, in the same order that the start blocks of
     the sequences already had.  The hot/cold partitioning gives a little
     complication: as a first pass only do this for blocks in the same
     partition as the start block, and (if there is anything left to do)
     in a second pass handle the other partition.  */

  basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;

  int current_partition
2510 1.1 mrg = BB_PARTITION (last_tail == ENTRY_BLOCK_PTR_FOR_FN (cfun)
2511 1.1 mrg ? EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0)->dest
2512 1.1 mrg : last_tail);
2513 1.1 mrg bool need_another_pass = true;
2514 1.1 mrg
2515 1.1 mrg for (int pass = 0; pass < 2 && need_another_pass; pass++)
2516 1.1 mrg {
2517 1.1 mrg need_another_pass = false;
2518 1.1 mrg
2519 1.1 mrg FOR_EACH_BB_FN (bb, cfun)
2520 1.1 mrg if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb)
2521 1.1 mrg {
2522 1.1 mrg if (BB_PARTITION (bb) != current_partition)
2523 1.1 mrg {
2524 1.1 mrg need_another_pass = true;
2525 1.1 mrg continue;
2526 1.1 mrg }
2527 1.1 mrg
2528 1.1 mrg last_tail->aux = bb;
2529 1.1 mrg last_tail = (basic_block) bb->aux;
2530 1.1 mrg }
2531 1.1 mrg
2532 1.1 mrg current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
2533 1.1 mrg }
2534 1.1 mrg
2535 1.1 mrg last_tail->aux = 0;
2536 1.1 mrg
2537 1.1 mrg /* Finally, link all the chosen fallthrough edges. */
2538 1.1 mrg
2539 1.1 mrg for (int j = 0; j < n; j++)
2540 1.1 mrg if (edges[j])
2541 1.1 mrg edges[j]->src->aux = edges[j]->dest;
2542 1.1 mrg
2543 1.1 mrg delete[] edges;
2544 1.1 mrg
2545 1.1 mrg /* If the entry edge no longer falls through we have to make a new
2546 1.1 mrg block so it can do so again. */
2547 1.1 mrg
2548 1.1 mrg edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
2549 1.1 mrg if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
2550 1.1 mrg {
2551 1.1 mrg force_nonfallthru (e);
2552 1.1 mrg e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
2553 1.1 mrg }
2554 1.1 mrg }
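The endpoint bookkeeping above (->AUX pairs the two ends of a fallthrough chain; BB_VISITED marks a block that already has a fallthrough successor) can be modeled on plain integers. The following is a standalone sketch, not GCC code: `chain_builder`, its members, and the index-based blocks are all invented for illustration, with `-1` standing in for a NULL ->AUX.

```cpp
#include <cassert>
#include <vector>

/* Toy model of the chain construction in reorder_basic_blocks_simple:
   blocks are ints, aux[] plays bb->aux (an endpoint holds the opposite
   endpoint of its fallthrough chain; interior blocks hold -1), and
   visited[] plays BB_VISITED (block already has a fallthrough successor).  */
struct chain_builder
{
  std::vector<int> aux;       /* -1 means "interior of a chain".  */
  std::vector<bool> visited;

  explicit chain_builder (int n) : aux (n), visited (n, false)
  {
    for (int i = 0; i < n; i++)
      aux[i] = i;             /* Every block starts as its own chain.  */
  }

  /* Try to make DST the fallthrough successor of SRC; returns whether
     the edge was accepted, mirroring the endpoint tests in the loop.  */
  bool connect (int src, int dst)
  {
    int tail_a = src, head_b = dst;
    if (visited[tail_a]                              /* src has a successor.  */
        || aux[head_b] == -1                         /* dst is interior.  */
        || (!visited[head_b] && head_b != aux[head_b]) /* dst is a chain tail.  */
        || tail_a == aux[head_b])                    /* would close a loop.  */
      return false;

    int head_a = aux[tail_a], tail_b = aux[head_b];
    aux[tail_a] = -1;
    aux[head_b] = -1;
    aux[head_a] = tail_b;     /* The merged chain's endpoints pair up.  */
    aux[tail_b] = head_a;
    visited[tail_a] = true;
    return true;
  }
};
```

Running the checks in edge-priority order reproduces the greedy behavior: once an edge claims a block's fallthrough slot, later (less desirable) edges into or out of that block are zeroed out.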

/* Reorder basic blocks.  The main entry point to this file.  */

static void
reorder_basic_blocks (void)
{
  gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);

  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
    return;

  set_edge_can_fallthru_flag ();
  mark_dfs_back_edges ();

  switch (flag_reorder_blocks_algorithm)
    {
    case REORDER_BLOCKS_ALGORITHM_SIMPLE:
      reorder_basic_blocks_simple ();
      break;

    case REORDER_BLOCKS_ALGORITHM_STC:
      reorder_basic_blocks_software_trace_cache ();
      break;

    default:
      gcc_unreachable ();
    }

  relink_block_chain (/*stay_in_cfglayout_mode=*/true);

  if (dump_file)
    {
      if (dump_flags & TDF_DETAILS)
        dump_reg_info (dump_file);
      dump_flow_info (dump_file, dump_flags);
    }

  /* Signal that rtl_verify_flow_info_1 can now verify that there
     is at most one switch between hot/cold sections.  */
  crtl->bb_reorder_complete = true;
}

/* Determine which partition the first basic block in the function
   belongs to, then find the first basic block in the current function
   that belongs to a different section, and insert a
   NOTE_INSN_SWITCH_TEXT_SECTIONS note immediately before it in the
   instruction stream.  When writing out the assembly code,
   encountering this note will make the compiler switch between the
   hot and cold text sections.  */

void
insert_section_boundary_note (void)
{
  basic_block bb;
  bool switched_sections = false;
  int current_partition = 0;

  if (!crtl->has_bb_partition)
    return;

  FOR_EACH_BB_FN (bb, cfun)
    {
      if (!current_partition)
        current_partition = BB_PARTITION (bb);
      if (BB_PARTITION (bb) != current_partition)
        {
          gcc_assert (!switched_sections);
          switched_sections = true;
          emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
          current_partition = BB_PARTITION (bb);
        }
    }

  /* Make sure crtl->has_bb_partition matches reality even if bbpart finds
     some hot and some cold basic blocks, but later one of those kinds is
     optimized away.  */
  crtl->has_bb_partition = switched_sections;
}
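The scan above reduces to a single walk over the block chain, remembering the current partition and asserting that at most one switch occurs. A minimal sketch, with blocks reduced to a vector of partition codes (`find_section_boundary` and its interface are invented for illustration; 0 stands in for an unset partition):

```cpp
#include <cassert>
#include <vector>

/* Toy version of the insert_section_boundary_note scan: returns the index
   of the first block whose partition differs from the first block's, or -1
   if the function never switches sections.  Like the real pass, it asserts
   there is at most one hot/cold switch once reordering is complete.  */
int
find_section_boundary (const std::vector<int> &partitions)
{
  int current = 0, boundary = -1;
  for (int i = 0; i < (int) partitions.size (); i++)
    {
      if (!current)
        current = partitions[i];
      if (partitions[i] != current)
        {
          assert (boundary == -1);  /* At most one section switch.  */
          boundary = i;
          current = partitions[i];
        }
    }
  return boundary;
}
```

In the real pass, the block found this way gets a NOTE_INSN_SWITCH_TEXT_SECTIONS note emitted before its first insn.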

namespace {

const pass_data pass_data_reorder_blocks =
{
  RTL_PASS, /* type */
  "bbro", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  0, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_reorder_blocks : public rtl_opt_pass
{
public:
  pass_reorder_blocks (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_reorder_blocks, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *)
    {
      if (targetm.cannot_modify_jumps_p ())
        return false;
      return (optimize > 0
              && (flag_reorder_blocks || flag_reorder_blocks_and_partition));
    }

  virtual unsigned int execute (function *);

}; // class pass_reorder_blocks

unsigned int
pass_reorder_blocks::execute (function *fun)
{
  basic_block bb;

  /* Last attempt to optimize CFG, as scheduling, peepholing and insn
     splitting possibly introduced more crossjumping opportunities.  */
  cfg_layout_initialize (CLEANUP_EXPENSIVE);

  reorder_basic_blocks ();
  cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_NO_PARTITIONING);

  FOR_EACH_BB_FN (bb, fun)
    if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
      bb->aux = bb->next_bb;
  cfg_layout_finalize ();

  FOR_EACH_BB_FN (bb, fun)
    df_recompute_luids (bb);
  return 0;
}

} // anon namespace

rtl_opt_pass *
make_pass_reorder_blocks (gcc::context *ctxt)
{
  return new pass_reorder_blocks (ctxt);
}
2698 1.1 mrg
2699 1.1 mrg /* Duplicate a block (that we already know ends in a computed jump) into its
2700 1.1 mrg predecessors, where possible. Return whether anything is changed. */
2701 1.1 mrg static bool
2702 1.1 mrg maybe_duplicate_computed_goto (basic_block bb, int max_size)
2703 1.1 mrg {
2704 1.1 mrg /* Make sure that the block is small enough. */
2705 1.1 mrg rtx_insn *insn;
2706 1.1 mrg FOR_BB_INSNS (bb, insn)
2707 1.1 mrg if (INSN_P (insn))
2708 1.1 mrg {
2709 1.1 mrg max_size -= get_attr_min_length (insn);
2710 1.1 mrg if (max_size < 0)
2711 1.1 mrg return false;
2712 1.1 mrg }
2713 1.1 mrg
2714 1.1 mrg bool changed = false;
2715 1.1 mrg edge e;
2716 1.1 mrg edge_iterator ei;
2717 1.1 mrg for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
2718 1.1 mrg {
2719 1.1 mrg basic_block pred = e->src;
2720 1.1 mrg
2721 1.1 mrg /* Do not duplicate BB into PRED if we cannot merge a copy of BB
2722 1.1 mrg with PRED. */
2723 1.1 mrg if (!single_succ_p (pred)
2724 1.1 mrg || e->flags & EDGE_COMPLEX
2725 1.1 mrg || pred->index < NUM_FIXED_BLOCKS
2726 1.1 mrg || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
2727 1.1 mrg || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
2728 1.1 mrg {
2729 1.1 mrg ei_next (&ei);
2730 1.1 mrg continue;
2731 1.1 mrg }
2732 1.1 mrg
2733 1.1 mrg if (dump_file)
2734 1.1 mrg fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
2735 1.1 mrg bb->index, e->src->index);
2736 1.1 mrg
2737 1.1 mrg /* Remember if PRED can be duplicated; if so, the copy of BB merged
2738 1.1 mrg with PRED can be duplicated as well. */
2739 1.1 mrg bool can_dup_more = can_duplicate_block_p (pred);
2740 1.1 mrg
2741 1.1 mrg /* Make a copy of BB, merge it into PRED. */
2742 1.1 mrg basic_block copy = duplicate_block (bb, e, NULL);
2743 1.1 mrg emit_barrier_after_bb (copy);
2744 1.1 mrg reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
2745 1.1 mrg merge_blocks (pred, copy);
2746 1.1 mrg
2747 1.1 mrg changed = true;
2748 1.1 mrg
2749 1.1 mrg /* Try to merge the resulting merged PRED into further predecessors. */
2750 1.1 mrg if (can_dup_more)
2751 1.1 mrg maybe_duplicate_computed_goto (pred, max_size);
2752 1.1 mrg }
2753 1.1 mrg
2754 1.1 mrg return changed;
2755 1.1 mrg }
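The shape of the transformation — copy the dispatch block's body onto the end of each eligible predecessor, subject to a size budget — can be shown on a toy CFG. This is a loose sketch, not GCC code: `toy_cfg`, `unfactor`, and the int-per-insn size model are all invented, and the real pass's eligibility tests (single successor, no complex edges, simple jump at the end) are collapsed into the caller choosing which predecessors to pass in.

```cpp
#include <cassert>
#include <vector>

/* Toy model of computed-goto unfactoring: block I's "insns" are a list of
   ints, and PREDS names the blocks that jump unconditionally into the
   dispatch block.  */
struct toy_cfg { std::vector<std::vector<int>> body; };

/* Append a copy of the dispatch block's body to each predecessor, so each
   one ends in its own computed goto.  BUDGET mirrors max_size: blocks too
   big to copy are left alone.  Returns the number of copies made.  */
int
unfactor (toy_cfg &g, int dispatch, const std::vector<int> &preds, int budget)
{
  if ((int) g.body[dispatch].size () > budget)
    return 0;
  int copies = 0;
  for (int p : preds)
    {
      /* Like merge_blocks: the copy replaces the jump at the end of P.  */
      for (int insn : g.body[dispatch])
        g.body[p].push_back (insn);
      copies++;
    }
  return copies;
}
```

After unfactoring, the predecessors no longer funnel through the shared dispatch block, which is what restores separate indirect-branch sites for interpreter-style code.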
2756 1.1 mrg
2757 1.1 mrg /* Duplicate the blocks containing computed gotos. This basically unfactors
2758 1.1 mrg computed gotos that were factored early on in the compilation process to
2759 1.1 mrg speed up edge based data flow. We used to not unfactor them again, which
2760 1.1 mrg can seriously pessimize code with many computed jumps in the source code,
2761 1.1 mrg such as interpreters. See e.g. PR15242. */
2762 1.1 mrg static void
2763 1.1 mrg duplicate_computed_gotos (function *fun)
2764 1.1 mrg {
2765 1.1 mrg /* We are estimating the length of uncond jump insn only once
2766 1.1 mrg since the code for getting the insn length always returns
2767 1.1 mrg the minimal length now. */
2768 1.1 mrg if (uncond_jump_length == 0)
2769 1.1 mrg uncond_jump_length = get_uncond_jump_length ();
2770 1.1 mrg
2771 1.1 mrg /* Never copy a block larger than this. */
2772 1.1 mrg int max_size
2773 1.1 mrg = uncond_jump_length * param_max_goto_duplication_insns;
2774 1.1 mrg
2775 1.1 mrg bool changed = false;
2776 1.1 mrg
2777 1.1 mrg /* Try to duplicate all blocks that end in a computed jump and that
2778 1.1 mrg can be duplicated at all. */
2779 1.1 mrg basic_block bb;
2780 1.1 mrg FOR_EACH_BB_FN (bb, fun)
2781 1.1 mrg if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
2782 1.1 mrg changed |= maybe_duplicate_computed_goto (bb, max_size);
2783 1.1 mrg
2784 1.1 mrg /* Some blocks may have become unreachable. */
2785 1.1 mrg if (changed)
2786 1.1 mrg cleanup_cfg (0);
2787 1.1 mrg
2788 1.1 mrg /* Duplicating blocks will redirect edges and may cause hot blocks
2789 1.1 mrg previously reached by both hot and cold blocks to become dominated
2790 1.1 mrg only by cold blocks. */
2791 1.1 mrg if (changed)
2792 1.1 mrg fixup_partitions ();
2793 1.1 mrg }
2794 1.1 mrg
2795 1.1 mrg namespace {
2796 1.1 mrg
2797 1.1 mrg const pass_data pass_data_duplicate_computed_gotos =
2798 1.1 mrg {
2799 1.1 mrg RTL_PASS, /* type */
2800 1.1 mrg "compgotos", /* name */
2801 1.1 mrg OPTGROUP_NONE, /* optinfo_flags */
2802 1.1 mrg TV_REORDER_BLOCKS, /* tv_id */
2803 1.1 mrg 0, /* properties_required */
2804 1.1 mrg 0, /* properties_provided */
2805 1.1 mrg 0, /* properties_destroyed */
2806 1.1 mrg 0, /* todo_flags_start */
2807 1.1 mrg 0, /* todo_flags_finish */
2808 1.1 mrg };
2809 1.1 mrg
2810 1.1 mrg class pass_duplicate_computed_gotos : public rtl_opt_pass
2811 1.1 mrg {
2812 1.1 mrg public:
2813 1.1 mrg pass_duplicate_computed_gotos (gcc::context *ctxt)
2814 1.1 mrg : rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt)
2815 1.1 mrg {}
2816 1.1 mrg
2817 1.1 mrg /* opt_pass methods: */
2818 1.1 mrg virtual bool gate (function *);
2819 1.1 mrg virtual unsigned int execute (function *);
2820 1.1 mrg
2821 1.1 mrg }; // class pass_duplicate_computed_gotos
2822 1.1 mrg
2823 1.1 mrg bool
2824 1.1 mrg pass_duplicate_computed_gotos::gate (function *fun)
2825 1.1 mrg {
2826 1.1 mrg if (targetm.cannot_modify_jumps_p ())
2827 1.1 mrg return false;
2828 1.1 mrg return (optimize > 0
2829 1.1 mrg && flag_expensive_optimizations
2830 1.1 mrg && ! optimize_function_for_size_p (fun));
2831 1.1 mrg }
2832 1.1 mrg
2833 1.1 mrg unsigned int
2834 1.1 mrg pass_duplicate_computed_gotos::execute (function *fun)
2835 1.1 mrg {
2836 1.1 mrg duplicate_computed_gotos (fun);
2837 1.1 mrg
2838 1.1 mrg return 0;
2839 1.1 mrg }
2840 1.1 mrg
2841 1.1 mrg } // anon namespace
2842 1.1 mrg
2843 1.1 mrg rtl_opt_pass *
2844 1.1 mrg make_pass_duplicate_computed_gotos (gcc::context *ctxt)
2845 1.1 mrg {
2846 1.1 mrg return new pass_duplicate_computed_gotos (ctxt);
2847 1.1 mrg }
2848 1.1 mrg
2849 1.1 mrg /* This function is the main 'entrance' for the optimization that
2850 1.1 mrg partitions hot and cold basic blocks into separate sections of the
2851 1.1 mrg .o file (to improve performance and cache locality). Ideally it
2852 1.1 mrg would be called after all optimizations that rearrange the CFG have
2853 1.1 mrg been called. However part of this optimization may introduce new
2854 1.1 mrg register usage, so it must be called before register allocation has
2855 1.1 mrg occurred. This means that this optimization is actually called
2856 1.1 mrg well before the optimization that reorders basic blocks (see
2857 1.1 mrg function above).
2858 1.1 mrg
2859 1.1 mrg This optimization checks the feedback information to determine
2860 1.1 mrg which basic blocks are hot/cold, updates flags on the basic blocks
2861 1.1 mrg to indicate which section they belong in. This information is
2862 1.1 mrg later used for writing out sections in the .o file. Because hot
2863 1.1 mrg and cold sections can be arbitrarily large (within the bounds of
2864 1.1 mrg memory), far beyond the size of a single function, it is necessary
2865 1.1 mrg to fix up all edges that cross section boundaries, to make sure the
2866 1.1 mrg instructions used can actually span the required distance. The
2867 1.1 mrg fixes are described below.
2868 1.1 mrg
2869 1.1 mrg Fall-through edges must be changed into jumps; it is not safe or
2870 1.1 mrg legal to fall through across a section boundary. Whenever a
2871 1.1 mrg fall-through edge crossing a section boundary is encountered, a new
2872 1.1 mrg basic block is inserted (in the same section as the fall-through
2873 1.1 mrg source), and the fall through edge is redirected to the new basic
2874 1.1 mrg block. The new basic block contains an unconditional jump to the
2875 1.1 mrg original fall-through target. (If the unconditional jump is
2876 1.1 mrg insufficient to cross section boundaries, that is dealt with a
2877 1.1 mrg little later, see below).
2878 1.1 mrg
2879 1.1 mrg In order to deal with architectures that have short conditional
2880 1.1 mrg branches (which cannot span all of memory) we take any conditional
2881 1.1 mrg jump that attempts to cross a section boundary and add a level of
2882 1.1 mrg indirection: it becomes a conditional jump to a new basic block, in
2883 1.1 mrg the same section. The new basic block contains an unconditional
2884 1.1 mrg jump to the original target, in the other section.
2885 1.1 mrg
2886 1.1 mrg For those architectures whose unconditional branch is also
2887 1.1 mrg incapable of reaching all of memory, those unconditional jumps are
2888 1.1 mrg converted into indirect jumps, through a register.
2889 1.1 mrg
2890 1.1 mrg IMPORTANT NOTE: This optimization causes some messy interactions
2891 1.1 mrg with the cfg cleanup optimizations; those optimizations want to
2892 1.1 mrg merge blocks wherever possible, and to collapse indirect jump
2893 1.1 mrg sequences (change "A jumps to B jumps to C" directly into "A jumps
2894 1.1 mrg to C"). Those optimizations can undo the jump fixes that
2895 1.1 mrg partitioning is required to make (see above), in order to ensure
2896 1.1 mrg that jumps attempting to cross section boundaries are really able
2897 1.1 mrg to cover whatever distance the jump requires (on many architectures
2898 1.1 mrg conditional or unconditional jumps are not able to reach all of
2899 1.1 mrg memory). Therefore tests have to be inserted into each such
2900 1.1 mrg optimization to make sure that it does not undo stuff necessary to
2901 1.1 mrg cross partition boundaries. This would be much less of a problem
2902 1.1 mrg if we could perform this optimization later in the compilation, but
2903 1.1 mrg unfortunately the fact that we may need to create indirect jumps
2904 1.1 mrg (through registers) requires that this optimization be performed
2905 1.1 mrg before register allocation.
2906 1.1 mrg
2907 1.1 mrg Hot and cold basic blocks are partitioned and put in separate
2908 1.1 mrg sections of the .o file, to reduce paging and improve cache
2909 1.1 mrg performance (hopefully). This can result in bits of code from the
2910 1.1 mrg same function being widely separated in the .o file. However this
2911 1.1 mrg is not obvious to the current bb structure. Therefore we must take
2912 1.1 mrg care to ensure that: 1). There are no fall_thru edges that cross
2913 1.1 mrg between sections; 2). For those architectures which have "short"
2914 1.1 mrg conditional branches, all conditional branches that attempt to
2915 1.1 mrg cross between sections are converted to unconditional branches;
2916 1.1 mrg and, 3). For those architectures which have "short" unconditional
2917 1.1 mrg branches, all unconditional branches that attempt to cross between
2918 1.1 mrg sections are converted to indirect jumps.
2919 1.1 mrg
2920 1.1 mrg The code for fixing up fall_thru edges that cross between hot and
2921 1.1 mrg cold basic blocks does so by creating new basic blocks containing
2922 1.1 mrg unconditional branches to the appropriate label in the "other"
2923 1.1 mrg section. The new basic block is then put in the same (hot or cold)
2924 1.1 mrg section as the original conditional branch, and the fall_thru edge
2925 1.1 mrg is modified to fall into the new basic block instead. By adding
2926 1.1 mrg this level of indirection we end up with only unconditional branches
2927 1.1 mrg crossing between hot and cold sections.
2928 1.1 mrg
2929 1.1 mrg Conditional branches are dealt with by adding a level of indirection.
2930 1.1 mrg A new basic block is added in the same (hot/cold) section as the
2931 1.1 mrg conditional branch, and the conditional branch is retargeted to the
2932 1.1 mrg new basic block. The new basic block contains an unconditional branch
2933 1.1 mrg to the original target of the conditional branch (in the other section).
2934 1.1 mrg
2935 1.1 mrg Unconditional branches are dealt with by converting them into
2936 1.1 mrg indirect jumps. */
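The three fix-up rules described above can be summarized as a small decision table. This is an illustrative sketch only: `fix_crossing` and its parameters are invented names, with `have_long_cond` / `have_long_uncond` standing in for the target's HAS_LONG_COND_BRANCH / HAS_LONG_UNCOND_BRANCH capabilities.

```cpp
#include <cassert>
#include <string>

enum branch_kind { FALLTHRU, COND_JUMP, UNCOND_JUMP };

/* Toy summary of the crossing-edge fixes: given the kind of control
   transfer and whether it crosses the hot/cold boundary, say what the
   edge must be rewritten into.  */
std::string
fix_crossing (branch_kind kind, bool crosses,
              bool have_long_cond, bool have_long_uncond)
{
  if (!crosses)
    return "unchanged";
  switch (kind)
    {
    case FALLTHRU:
      /* Never legal across a section boundary: add a trampoline block
         in the source section holding an unconditional jump.  */
      return "new block + uncond jump";
    case COND_JUMP:
      /* Short conditional branches get one level of indirection.  */
      return have_long_cond
             ? "unchanged" : "cond jump to new block + uncond jump";
    case UNCOND_JUMP:
      /* Short unconditional branches go through a register.  */
      return have_long_uncond ? "unchanged" : "indirect jump";
    }
  return "unchanged";
}
```

Note that the fallthrough rule applies regardless of branch reach, matching the pass below, where fix_up_fall_thru_edges runs unconditionally while the other two fixes are gated on the target macros.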
2937 1.1 mrg
2938 1.1 mrg namespace {
2939 1.1 mrg
2940 1.1 mrg const pass_data pass_data_partition_blocks =
2941 1.1 mrg {
2942 1.1 mrg RTL_PASS, /* type */
2943 1.1 mrg "bbpart", /* name */
2944 1.1 mrg OPTGROUP_NONE, /* optinfo_flags */
2945 1.1 mrg TV_REORDER_BLOCKS, /* tv_id */
2946 1.1 mrg PROP_cfglayout, /* properties_required */
2947 1.1 mrg 0, /* properties_provided */
2948 1.1 mrg 0, /* properties_destroyed */
2949 1.1 mrg 0, /* todo_flags_start */
2950 1.1 mrg 0, /* todo_flags_finish */
2951 1.1 mrg };
2952 1.1 mrg
2953 1.1 mrg class pass_partition_blocks : public rtl_opt_pass
2954 1.1 mrg {
2955 1.1 mrg public:
2956 1.1 mrg pass_partition_blocks (gcc::context *ctxt)
2957 1.1 mrg : rtl_opt_pass (pass_data_partition_blocks, ctxt)
2958 1.1 mrg {}
2959 1.1 mrg
2960 1.1 mrg /* opt_pass methods: */
2961 1.1 mrg virtual bool gate (function *);
2962 1.1 mrg virtual unsigned int execute (function *);
2963 1.1 mrg
2964 1.1 mrg }; // class pass_partition_blocks
2965 1.1 mrg
2966 1.1 mrg bool
2967 1.1 mrg pass_partition_blocks::gate (function *fun)
2968 1.1 mrg {
2969 1.1 mrg /* The optimization to partition hot/cold basic blocks into separate
2970 1.1 mrg sections of the .o file does not work well with linkonce or with
2971 1.1 mrg user defined section attributes or with naked attribute. Don't call
2972 1.1 mrg it if either case arises. */
2973 1.1 mrg return (flag_reorder_blocks_and_partition
2974 1.1 mrg && optimize
2975 1.1 mrg /* See pass_reorder_blocks::gate. We should not partition if
2976 1.1 mrg we are going to omit the reordering. */
2977 1.1 mrg && optimize_function_for_speed_p (fun)
2978 1.1 mrg && !DECL_COMDAT_GROUP (current_function_decl)
2979 1.1 mrg && !lookup_attribute ("section", DECL_ATTRIBUTES (fun->decl))
2980 1.1 mrg && !lookup_attribute ("naked", DECL_ATTRIBUTES (fun->decl))
2981 1.1 mrg /* Workaround a bug in GDB where read_partial_die doesn't cope
2982 1.1 mrg with DIEs with DW_AT_ranges, see PR81115. */
2983 1.1 mrg && !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl))));
2984 1.1 mrg }
2985 1.1 mrg
2986 1.1 mrg unsigned
2987 1.1 mrg pass_partition_blocks::execute (function *fun)
2988 1.1 mrg {
2989 1.1 mrg vec<edge> crossing_edges;
2990 1.1 mrg
2991 1.1 mrg if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
2992 1.1 mrg return 0;
2993 1.1 mrg
2994 1.1 mrg df_set_flags (DF_DEFER_INSN_RESCAN);
2995 1.1 mrg
2996 1.1 mrg crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges ();
2997 1.1 mrg if (!crossing_edges.exists ())
2998 1.1 mrg /* Make sure to process deferred rescans and clear changeable df flags. */
2999 1.1 mrg return TODO_df_finish;
3000 1.1 mrg
3001 1.1 mrg crtl->has_bb_partition = true;
3002 1.1 mrg
3003 1.1 mrg /* Make sure the source of any crossing edge ends in a jump and the
3004 1.1 mrg destination of any crossing edge has a label. */
3005 1.1 mrg add_labels_and_missing_jumps (crossing_edges);
3006 1.1 mrg
3007 1.1 mrg /* Convert all crossing fall_thru edges to non-crossing fall
3008 1.1 mrg thrus to unconditional jumps (that jump to the original fall
3009 1.1 mrg through dest). */
3010 1.1 mrg fix_up_fall_thru_edges ();
3011 1.1 mrg
3012 1.1 mrg /* If the architecture does not have conditional branches that can
3013 1.1 mrg span all of memory, convert crossing conditional branches into
3014 1.1 mrg crossing unconditional branches. */
3015 1.1 mrg if (!HAS_LONG_COND_BRANCH)
3016 1.1 mrg fix_crossing_conditional_branches ();
3017 1.1 mrg
3018 1.1 mrg /* If the architecture does not have unconditional branches that
3019 1.1 mrg can span all of memory, convert crossing unconditional branches
3020 1.1 mrg into indirect jumps. Since adding an indirect jump also adds
3021 1.1 mrg a new register usage, update the register usage information as
3022 1.1 mrg well. */
3023 1.1 mrg if (!HAS_LONG_UNCOND_BRANCH)
3024 1.1 mrg fix_crossing_unconditional_branches ();
3025 1.1 mrg
3026 1.1 mrg update_crossing_jump_flags ();
3027 1.1 mrg
3028 1.1 mrg /* Clear bb->aux fields that the above routines were using. */
3029 1.1 mrg clear_aux_for_blocks ();
3030 1.1 mrg
3031 1.1 mrg crossing_edges.release ();
3032 1.1 mrg
3033 1.1 mrg /* ??? FIXME: DF generates the bb info for a block immediately.
3034 1.1 mrg And by immediately, I mean *during* creation of the block.
3035 1.1 mrg
3036 1.1 mrg #0 df_bb_refs_collect
3037 1.1 mrg #1 in df_bb_refs_record
3038 1.1 mrg #2 in create_basic_block_structure
3039 1.1 mrg
3040 1.1 mrg Which means that the bb_has_eh_pred test in df_bb_refs_collect
3041 1.1 mrg will *always* fail, because no edges can have been added to the
3042 1.1 mrg block yet. Which of course means we don't add the right
3043 1.1 mrg artificial refs, which means we fail df_verify (much) later.
3044 1.1 mrg
3045 1.1 mrg Cleanest solution would seem to make DF_DEFER_INSN_RESCAN imply
3046 1.1 mrg that we also shouldn't grab data from the new blocks those new
3047 1.1 mrg insns are in either. In this way one can create the block, link
3048 1.1 mrg it up properly, and have everything Just Work later, when deferred
3049 1.1 mrg insns are processed.
3050 1.1 mrg
3051 1.1 mrg In the meantime, we have no other option but to throw away all
3052 1.1 mrg of the DF data and recompute it all. */
3053 1.1 mrg if (fun->eh->lp_array)
3054 1.1 mrg {
3055 1.1 mrg df_finish_pass (true);
3056 1.1 mrg df_scan_alloc (NULL);
3057 1.1 mrg df_scan_blocks ();
3058 1.1 mrg /* Not all post-landing pads use all of the EH_RETURN_DATA_REGNO
3059 1.1 mrg data. We blindly generated all of them when creating the new
3060 1.1 mrg landing pad. Delete those assignments we don't use. */
3061 1.1 mrg df_set_flags (DF_LR_RUN_DCE);
3062 1.1 mrg df_analyze ();
3063 1.1 mrg }
3064 1.1 mrg
3065 1.1 mrg /* Make sure to process deferred rescans and clear changeable df flags. */
3066 1.1 mrg return TODO_df_finish;
3067 1.1 mrg }
3068 1.1 mrg
3069 1.1 mrg } // anon namespace
3070 1.1 mrg
3071 1.1 mrg rtl_opt_pass *
3072 1.1 mrg make_pass_partition_blocks (gcc::context *ctxt)
3073 1.1 mrg {
3074 1.1 mrg return new pass_partition_blocks (ctxt);
3075 }
3076