/* Basic block reordering routines for the GNU compiler.
   Copyright (C) 2000-2022 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not, see
<http://www.gnu.org/licenses/>.  */

/* This file contains the "reorder blocks" pass, which changes the control
   flow of a function to encounter fewer branches; the "partition blocks"
   pass, which divides the basic blocks into "hot" and "cold" partitions,
   which are kept separate; and the "duplicate computed gotos" pass, which
   duplicates blocks ending in an indirect jump.

   There are two algorithms for "reorder blocks": the "simple" algorithm,
   which just rearranges blocks, trying to minimize the number of executed
   unconditional branches; and the "software trace cache" algorithm, which
   also copies code, and in general tries a lot harder to have long linear
   pieces of machine code executed.  This algorithm is described next.  */

/* This (greedy) algorithm constructs traces in several rounds.
   The construction starts from "seeds".  The seed for the first round
   is the entry point of the function.
   When there is more than one seed,
   the one with the lowest key in the heap is selected first (see bb_to_key).
   Then the algorithm repeatedly adds the most probable successor to the end
   of a trace.  Finally it connects the traces.

   There are two parameters: Branch Threshold and Exec Threshold.
   If the probability of an edge to a successor of the current basic block is
   lower than Branch Threshold or its count is lower than Exec Threshold,
   then the successor will be a seed in one of the next rounds.
   Each round has these parameters lower than the previous one.
   The last round has to have these parameters set to zero so that the
   remaining blocks are picked up.

   The algorithm selects the most probable successor from all unvisited
   successors and successors that have been added to this trace.
   The other successors (that have not been "sent" to the next round) will be
   other seeds for this round, and the secondary traces will start from them.
   If the successor has not been visited in this trace, it is added to the
   trace (however, there is some heuristic for simple branches).
   If the successor has been visited in this trace, a loop has been found.
   If the loop has many iterations, the loop is rotated so that the source
   block of the most probable edge going out of the loop is the last block
   of the trace.
   If the loop has few iterations and there is no edge from the last block of
   the loop going out of the loop, the loop header is duplicated.

   When connecting traces, the algorithm first checks whether there is an edge
   from the last block of a trace to the first block of another trace.
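   The per-round threshold test described above can be sketched as follows.
   This is an illustrative standalone model only, not GCC's implementation;
   every name in it (toy_edge, edge_suitable_p, pick_successor) is invented,
   and real probabilities/counts use profile_probability and profile_count
   rather than plain doubles and longs:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an outgoing CFG edge (illustrative only).
struct toy_edge
{
  int dest;            // index of the successor block
  double probability;  // branch probability, 0.0 .. 1.0
  long count;          // profiled execution count of the edge
};

// An edge may extend the current trace only if it clears both the
// round's Branch Threshold and its Exec Threshold.
static bool
edge_suitable_p (const toy_edge &edge, double branch_th, long exec_th)
{
  return edge.probability >= branch_th && edge.count >= exec_th;
}

// Pick the most probable suitable successor; return -1 when every
// successor must be deferred as a seed for a later, more permissive round.
static int
pick_successor (const std::vector<toy_edge> &succs,
		double branch_th, long exec_th)
{
  int best = -1;
  double best_prob = -1.0;
  for (const toy_edge &e : succs)
    if (edge_suitable_p (e, branch_th, exec_th) && e.probability > best_prob)
      {
	best = e.dest;
	best_prob = e.probability;
      }
  return best;
}
```

   Because the last round runs with both thresholds at zero, pick_successor
   in that round accepts any successor, which is how the remaining blocks
   get picked up.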
   When there are still some unconnected traces, it checks whether there
   exists a basic block BB such that BB is a successor of the last block of a
   trace and BB is a predecessor of the first block of another trace.  In
   this case, BB is duplicated, added at the end of the first trace and the
   traces are connected through it.
   The rest of the traces are simply concatenated, so there will be a jump to
   the beginning of each of them.

   The above description is for the full algorithm, which is used when the
   function is optimized for speed.  When the function is optimized for size,
   in order to reduce long jumps and connect more fallthru edges, the
   algorithm is modified as follows:
   (1) Break long traces into short ones.  A trace is broken at a block that
   has multiple predecessors/successors during trace discovery.  When
   connecting traces, only connect Trace n with Trace n + 1.  This change
   reduces most long jumps compared with the above algorithm.
   (2) Ignore the edge probability and count for fallthru edges.
   (3) Keep the original order of blocks when there is no chance to fall
   through.  We rely on the results of cfg_cleanup.

   To implement the change for code size optimization, a block's index is
   selected as the key and all traces are found in one round.

   References:

   "Software Trace Cache"
   A. Ramirez, J. Larriba-Pey, C. Navarro, J. Torrellas and M.
   Valero; 1999
   http://citeseer.nj.nec.com/15361.html

*/

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "target.h"
#include "rtl.h"
#include "tree.h"
#include "cfghooks.h"
#include "df.h"
#include "memmodel.h"
#include "optabs.h"
#include "regs.h"
#include "emit-rtl.h"
#include "output.h"
#include "expr.h"
#include "tree-pass.h"
#include "cfgrtl.h"
#include "cfganal.h"
#include "cfgbuild.h"
#include "cfgcleanup.h"
#include "bb-reorder.h"
#include "except.h"
#include "alloc-pool.h"
#include "fibonacci_heap.h"
#include "stringpool.h"
#include "attribs.h"
#include "common/common-target.h"

/* The number of rounds.  In most cases there will only be 4 rounds, but
   when partitioning hot and cold basic blocks into separate sections of
   the object file there will be an extra round.  */
#define N_ROUNDS 5

struct target_bb_reorder default_target_bb_reorder;
#if SWITCHABLE_TARGET
struct target_bb_reorder *this_target_bb_reorder = &default_target_bb_reorder;
#endif

#define uncond_jump_length \
  (this_target_bb_reorder->x_uncond_jump_length)

/* Branch thresholds in thousandths (per mille) of the REG_BR_PROB_BASE.  */
static const int branch_threshold[N_ROUNDS] = {400, 200, 100, 0, 0};

/* Exec thresholds in thousandths (per mille) of the count of bb 0.
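   For illustration only (a hypothetical helper, not GCC code): each round
   later turns these per-mille tables into absolute thresholds by scaling a
   base count, in the same spirit as the call
   max_entry_count.apply_scale (exec_threshold[i], 1000) below:

```cpp
#include <cassert>

// Scale BASE_COUNT by PER_MILLE thousandths, the way the per-mille
// threshold tables are interpreted (integer division truncates).
static long
apply_per_mille (long base_count, int per_mille)
{
  return base_count * per_mille / 1000;
}
```

   With an entry count of 1,000,000, round 0 therefore requires a block
   count of 500,000, while the final rounds (per-mille 0) accept everything.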
   */
static const int exec_threshold[N_ROUNDS] = {500, 200, 50, 0, 0};

/* If edge count is lower than DUPLICATION_THRESHOLD per mille of entry
   block the edge destination is not duplicated while connecting traces.  */
#define DUPLICATION_THRESHOLD 100

typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;

/* Structure to hold needed information for each basic block.  */
struct bbro_basic_block_data
{
  /* Which trace is the bb start of (-1 means it is not a start of any).  */
  int start_of_trace;

  /* Which trace is the bb end of (-1 means it is not an end of any).  */
  int end_of_trace;

  /* Which trace is the bb in?  */
  int in_trace;

  /* Which trace was this bb visited in?  */
  int visited;

  /* Cached maximum frequency of interesting incoming edges.
     Minus one means not yet computed.  */
  int priority;

  /* Which heap is BB in (if any)?  */
  bb_heap_t *heap;

  /* Which heap node is BB in (if any)?  */
  bb_heap_node_t *node;
};

/* The current size of the following dynamic array.  */
static int array_size;

/* The array which holds needed information for basic blocks.  */
static bbro_basic_block_data *bbd;

/* To avoid frequent reallocation the size of arrays is greater than needed,
   the number of elements is (not less than) 1.25 * size_wanted.  */
#define GET_ARRAY_SIZE(X) ((((X) / 4) + 1) * 5)

/* Free the memory and set the pointer to NULL.
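   As a standalone check (illustrative only; the function below merely
   mirrors the GET_ARRAY_SIZE macro above), the growth policy does return at
   least 1.25 * the wanted size: with X = 4k + r, the result is 5k + 5,
   while 1.25 * X is 5k + 1.25r with r <= 3.

```cpp
#include <cassert>

// Mirror of GET_ARRAY_SIZE: round the requested element count up so the
// array holds at least 1.25x the wanted elements (the "+ 1" makes up for
// the truncating integer division).
static int
get_array_size (int wanted)
{
  return ((wanted / 4) + 1) * 5;
}
```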
   */
#define FREE(P) (gcc_assert (P), free (P), P = 0)

/* Structure for holding information about a trace.  */
struct trace
{
  /* First and last basic block of the trace.  */
  basic_block first, last;

  /* The round of the STC creation which this trace was found in.  */
  int round;

  /* The length (i.e. the number of basic blocks) of the trace.  */
  int length;
};

/* Maximum count of one of the entry blocks.  */
static profile_count max_entry_count;

/* Local function prototypes.  */
static void find_traces_1_round (int, profile_count, struct trace *, int *,
				 int, bb_heap_t **, int);
static basic_block copy_bb (basic_block, edge, basic_block, int);
static long bb_to_key (basic_block);
static bool better_edge_p (const_basic_block, const_edge, profile_probability,
			   profile_count, profile_probability, profile_count,
			   const_edge);
static bool copy_bb_p (const_basic_block, int);

/* Return the trace number in which BB was visited.  */

static int
bb_visited_trace (const_basic_block bb)
{
  gcc_assert (bb->index < array_size);
  return bbd[bb->index].visited;
}

/* Mark that BB was visited in trace number TRACE.
   */

static void
mark_bb_visited (basic_block bb, int trace)
{
  bbd[bb->index].visited = trace;
  if (bbd[bb->index].heap)
    {
      bbd[bb->index].heap->delete_node (bbd[bb->index].node);
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;
    }
}

/* Check whether BB should be pushed into the next round of trace
   collection.  Reasons for pushing the block forward are: 1) the block is
   cold, we are doing partitioning, and there will be another round (cold
   partition blocks are not supposed to be collected into traces until the
   very last round); or 2) there will be another round, and the basic block
   is not "hot enough" for the current round of trace collection.  */

static bool
push_to_next_round_p (const_basic_block bb, int round, int number_of_rounds,
		      profile_count count_th)
{
  bool there_exists_another_round;
  bool block_not_hot_enough;

  there_exists_another_round = round < number_of_rounds - 1;

  block_not_hot_enough = (bb->count < count_th
			  || probably_never_executed_bb_p (cfun, bb));

  if (there_exists_another_round
      && block_not_hot_enough)
    return true;
  else
    return false;
}

/* Find the traces for Software Trace Cache.  Chain each trace through
   RBI()->next.  Store the number of traces to N_TRACES and description of
   traces to TRACES.
   */

static void
find_traces (int *n_traces, struct trace *traces)
{
  int i;
  int number_of_rounds;
  edge e;
  edge_iterator ei;
  bb_heap_t *heap = new bb_heap_t (LONG_MIN);

  /* Add one extra round of trace collection when partitioning hot/cold
     basic blocks into separate sections.  The last round is for all the
     cold blocks (and ONLY the cold blocks).  */

  number_of_rounds = N_ROUNDS - 1;

  /* Insert entry points of function into heap.  */
  max_entry_count = profile_count::zero ();
  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
    {
      bbd[e->dest->index].heap = heap;
      bbd[e->dest->index].node = heap->insert (bb_to_key (e->dest), e->dest);
      if (e->dest->count > max_entry_count)
	max_entry_count = e->dest->count;
    }

  /* Find the traces.
     */
  for (i = 0; i < number_of_rounds; i++)
    {
      profile_count count_threshold;

      if (dump_file)
	fprintf (dump_file, "STC - round %d\n", i + 1);

      count_threshold = max_entry_count.apply_scale (exec_threshold[i], 1000);

      find_traces_1_round (REG_BR_PROB_BASE * branch_threshold[i] / 1000,
			   count_threshold, traces, n_traces, i, &heap,
			   number_of_rounds);
    }
  delete heap;

  if (dump_file)
    {
      for (i = 0; i < *n_traces; i++)
	{
	  basic_block bb;
	  fprintf (dump_file, "Trace %d (round %d): ", i + 1,
		   traces[i].round + 1);
	  for (bb = traces[i].first;
	       bb != traces[i].last;
	       bb = (basic_block) bb->aux)
	    {
	      fprintf (dump_file, "%d [", bb->index);
	      bb->count.dump (dump_file);
	      fprintf (dump_file, "] ");
	    }
	  fprintf (dump_file, "%d [", bb->index);
	  bb->count.dump (dump_file);
	  fprintf (dump_file, "]\n");
	}
      fflush (dump_file);
    }
}

/* Rotate loop whose back edge is BACK_EDGE in the tail of trace TRACE
   (with sequential number TRACE_N).  */

static basic_block
rotate_loop (edge back_edge, struct trace *trace, int trace_n)
{
  basic_block bb;

  /* Information about the best end (end after rotation) of the loop.  */
  basic_block best_bb = NULL;
  edge best_edge = NULL;
  profile_count best_count = profile_count::uninitialized ();
  /* The best edge is preferred when its destination is not visited yet
     or is a start block of some trace.
     */
  bool is_preferred = false;

  /* Find the most frequent edge that goes out from current trace.  */
  bb = back_edge->dest;
  do
    {
      edge e;
      edge_iterator ei;

      FOR_EACH_EDGE (e, ei, bb->succs)
	if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
	    && bb_visited_trace (e->dest) != trace_n
	    && (e->flags & EDGE_CAN_FALLTHRU)
	    && !(e->flags & EDGE_COMPLEX))
	{
	  if (is_preferred)
	    {
	      /* The best edge is preferred.  */
	      if (!bb_visited_trace (e->dest)
		  || bbd[e->dest->index].start_of_trace >= 0)
		{
		  /* The current edge E is also preferred.  */
		  if (e->count () > best_count)
		    {
		      best_count = e->count ();
		      best_edge = e;
		      best_bb = bb;
		    }
		}
	    }
	  else
	    {
	      if (!bb_visited_trace (e->dest)
		  || bbd[e->dest->index].start_of_trace >= 0)
		{
		  /* The current edge E is preferred.  */
		  is_preferred = true;
		  best_count = e->count ();
		  best_edge = e;
		  best_bb = bb;
		}
	      else
		{
		  if (!best_edge || e->count () > best_count)
		    {
		      best_count = e->count ();
		      best_edge = e;
		      best_bb = bb;
		    }
		}
	    }
	}
      bb = (basic_block) bb->aux;
    }
  while (bb != back_edge->dest);

  if (best_bb)
    {
      /* Rotate the loop so that the BEST_EDGE goes out from the last block of
	 the trace.
	 */
      if (back_edge->dest == trace->first)
	{
	  trace->first = (basic_block) best_bb->aux;
	}
      else
	{
	  basic_block prev_bb;

	  for (prev_bb = trace->first;
	       prev_bb->aux != back_edge->dest;
	       prev_bb = (basic_block) prev_bb->aux)
	    ;
	  prev_bb->aux = best_bb->aux;

	  /* Try to get rid of uncond jump to cond jump.  */
	  if (single_succ_p (prev_bb))
	    {
	      basic_block header = single_succ (prev_bb);

	      /* Duplicate HEADER if it is a small block containing cond jump
		 in the end.  */
	      if (any_condjump_p (BB_END (header)) && copy_bb_p (header, 0)
		  && !CROSSING_JUMP_P (BB_END (header)))
		copy_bb (header, single_succ_edge (prev_bb), prev_bb, trace_n);
	    }
	}
    }
  else
    {
      /* We have not found a suitable loop tail, so do no rotation.  */
      best_bb = back_edge->src;
    }
  best_bb->aux = NULL;
  return best_bb;
}

/* One round of finding traces.  Find traces for BRANCH_TH and EXEC_TH i.e. do
   not include basic blocks whose probability is lower than BRANCH_TH or whose
   count is lower than EXEC_TH into traces (or whose count is lower than
   COUNT_TH).  Store the new traces into TRACES and modify the number of
   traces *N_TRACES.  Set the round (which the trace belongs to) to ROUND.
   The function expects starting basic blocks to be in *HEAP and will delete
   *HEAP and store starting points for the next round into new *HEAP.
   */

static void
find_traces_1_round (int branch_th, profile_count count_th,
		     struct trace *traces, int *n_traces, int round,
		     bb_heap_t **heap, int number_of_rounds)
{
  /* Heap for discarded basic blocks which are possible starting points for
     the next round.  */
  bb_heap_t *new_heap = new bb_heap_t (LONG_MIN);
  bool for_size = optimize_function_for_size_p (cfun);

  while (!(*heap)->empty ())
    {
      basic_block bb;
      struct trace *trace;
      edge best_edge, e;
      long key;
      edge_iterator ei;

      bb = (*heap)->extract_min ();
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;

      if (dump_file)
	fprintf (dump_file, "Getting bb %d\n", bb->index);

      /* If the BB's count is too low, send BB to the next round.  When
	 partitioning hot/cold blocks into separate sections, make sure all
	 the cold blocks (and ONLY the cold blocks) go into the (extra) final
	 round.  When optimizing for size, do not push to next round.
	 */

      if (!for_size
	  && push_to_next_round_p (bb, round, number_of_rounds,
				   count_th))
	{
	  int key = bb_to_key (bb);
	  bbd[bb->index].heap = new_heap;
	  bbd[bb->index].node = new_heap->insert (key, bb);

	  if (dump_file)
	    fprintf (dump_file,
		     "  Possible start point of next round: %d (key: %d)\n",
		     bb->index, key);
	  continue;
	}

      trace = traces + *n_traces;
      trace->first = bb;
      trace->round = round;
      trace->length = 0;
      bbd[bb->index].in_trace = *n_traces;
      (*n_traces)++;

      do
	{
	  bool ends_in_call;

	  /* The probability and count of the best edge.  */
	  profile_probability best_prob = profile_probability::uninitialized ();
	  profile_count best_count = profile_count::uninitialized ();

	  best_edge = NULL;
	  mark_bb_visited (bb, *n_traces);
	  trace->length++;

	  if (dump_file)
	    fprintf (dump_file, "Basic block %d was visited in trace %d\n",
		     bb->index, *n_traces);

	  ends_in_call = block_ends_with_call_p (bb);

	  /* Select the successor that will be placed after BB.  */
	  FOR_EACH_EDGE (e, ei, bb->succs)
	    {
	      gcc_assert (!(e->flags & EDGE_FAKE));

	      if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
		continue;

	      if (bb_visited_trace (e->dest)
		  && bb_visited_trace (e->dest) != *n_traces)
		continue;

	      /* If partitioning hot/cold basic blocks, don't consider edges
		 that cross section boundaries.
		 */
	      if (BB_PARTITION (e->dest) != BB_PARTITION (bb))
		continue;

	      profile_probability prob = e->probability;
	      profile_count count = e->dest->count;

	      /* The only sensible preference for a call instruction is the
		 fallthru edge.  Don't bother selecting anything else.  */
	      if (ends_in_call)
		{
		  if (e->flags & EDGE_CAN_FALLTHRU)
		    {
		      best_edge = e;
		      best_prob = prob;
		      best_count = count;
		    }
		  continue;
		}

	      /* Edge that cannot be fallthru or improbable or infrequent
		 successor (i.e. it is unsuitable successor).  When optimizing
		 for size, ignore the probability and count.  */
	      if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
		  || !prob.initialized_p ()
		  || ((prob.to_reg_br_prob_base () < branch_th
		       || e->count () < count_th) && (!for_size)))
		continue;

	      if (better_edge_p (bb, e, prob, count, best_prob, best_count,
				 best_edge))
		{
		  best_edge = e;
		  best_prob = prob;
		  best_count = count;
		}
	    }

	  /* If the best destination has multiple predecessors and can be
	     duplicated cheaper than a jump, don't allow it to be added to
	     a trace; we'll duplicate it when connecting the traces later.
	     However, we need to check that this duplication wouldn't leave
	     the best destination with only crossing predecessors, because
	     this would change its effective partition from hot to cold.
	     */
	  if (best_edge
	      && EDGE_COUNT (best_edge->dest->preds) >= 2
	      && copy_bb_p (best_edge->dest, 0))
	    {
	      bool only_crossing_preds = true;
	      edge e;
	      edge_iterator ei;
	      FOR_EACH_EDGE (e, ei, best_edge->dest->preds)
		if (e != best_edge && !(e->flags & EDGE_CROSSING))
		  {
		    only_crossing_preds = false;
		    break;
		  }
	      if (!only_crossing_preds)
		best_edge = NULL;
	    }

	  /* If the best destination has multiple successors or predecessors,
	     don't allow it to be added when optimizing for size.  This makes
	     sure predecessors with smaller index are handled before the best
	     destination.  It breaks long traces and reduces long jumps.

	     Take if-then-else as an example.
		A
	       / \
	      B   C
	       \ /
		D
	     If we do not remove the best edge B->D/C->D, the final order might
	     be A B D ... C.  C is at the end of the program.  If D's successors
	     and D are complicated, we might need long jumps for A->C and C->D.
	     Similar issue for order: A C D ... B.

	     After removing the best edge, the final result will be ABCD/ACBD.
	     It does not add a jump compared with the previous order, but it
	     reduces the possibility of long jumps.  */
	  if (best_edge && for_size
	      && (EDGE_COUNT (best_edge->dest->succs) > 1
		  || EDGE_COUNT (best_edge->dest->preds) > 1))
	    best_edge = NULL;

	  /* Add all non-selected successors to the heaps.
	     */
	  FOR_EACH_EDGE (e, ei, bb->succs)
	    {
	      if (e == best_edge
		  || e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
		  || bb_visited_trace (e->dest))
		continue;

	      key = bb_to_key (e->dest);

	      if (bbd[e->dest->index].heap)
		{
		  /* E->DEST is already in some heap.  */
		  if (key != bbd[e->dest->index].node->get_key ())
		    {
		      if (dump_file)
			{
			  fprintf (dump_file,
				   "Changing key for bb %d from %ld to %ld.\n",
				   e->dest->index,
				   (long) bbd[e->dest->index].node->get_key (),
				   key);
			}
		      bbd[e->dest->index].heap->replace_key
			(bbd[e->dest->index].node, key);
		    }
		}
	      else
		{
		  bb_heap_t *which_heap = *heap;

		  profile_probability prob = e->probability;

		  if (!(e->flags & EDGE_CAN_FALLTHRU)
		      || (e->flags & EDGE_COMPLEX)
		      || !prob.initialized_p ()
		      || prob.to_reg_br_prob_base () < branch_th
		      || e->count () < count_th)
		    {
		      /* When partitioning hot/cold basic blocks, make sure
			 the cold blocks (and only the cold blocks) all get
			 pushed to the last round of trace collection.  When
			 optimizing for size, do not push to next round.  */

		      if (!for_size && push_to_next_round_p (e->dest, round,
							     number_of_rounds,
							     count_th))
			which_heap = new_heap;
		    }

		  bbd[e->dest->index].heap = which_heap;
		  bbd[e->dest->index].node = which_heap->insert (key, e->dest);

		  if (dump_file)
		    {
		      fprintf (dump_file,
			       "  Possible start of %s round: %d (key: %ld)\n",
			       (which_heap == new_heap) ?
"next" : "this", 675 1.1 mrg e->dest->index, (long) key); 676 1.1 mrg } 677 1.1 mrg 678 1.1 mrg } 679 1.1 mrg } 680 1.1 mrg 681 1.1 mrg if (best_edge) /* Suitable successor was found. */ 682 1.1 mrg { 683 1.1 mrg if (bb_visited_trace (best_edge->dest) == *n_traces) 684 1.1 mrg { 685 1.1 mrg /* We do nothing with one basic block loops. */ 686 1.1 mrg if (best_edge->dest != bb) 687 1.1 mrg { 688 1.1 mrg if (best_edge->count () 689 1.1 mrg > best_edge->dest->count.apply_scale (4, 5)) 690 1.1 mrg { 691 1.1 mrg /* The loop has at least 4 iterations. If the loop 692 1.1 mrg header is not the first block of the function 693 1.1 mrg we can rotate the loop. */ 694 1.1 mrg 695 1.1 mrg if (best_edge->dest 696 1.1 mrg != ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb) 697 1.1 mrg { 698 1.1 mrg if (dump_file) 699 1.1 mrg { 700 1.1 mrg fprintf (dump_file, 701 1.1 mrg "Rotating loop %d - %d\n", 702 1.1 mrg best_edge->dest->index, bb->index); 703 1.1 mrg } 704 1.1 mrg bb->aux = best_edge->dest; 705 1.1 mrg bbd[best_edge->dest->index].in_trace = 706 1.1 mrg (*n_traces) - 1; 707 1.1 mrg bb = rotate_loop (best_edge, trace, *n_traces); 708 1.1 mrg } 709 1.1 mrg } 710 1.1 mrg else 711 1.1 mrg { 712 1.1 mrg /* The loop has less than 4 iterations. */ 713 1.1 mrg 714 1.1 mrg if (single_succ_p (bb) 715 1.1 mrg && copy_bb_p (best_edge->dest, 716 1.1 mrg optimize_edge_for_speed_p 717 1.1 mrg (best_edge))) 718 1.1 mrg { 719 1.1 mrg bb = copy_bb (best_edge->dest, best_edge, bb, 720 1.1 mrg *n_traces); 721 1.1 mrg trace->length++; 722 1.1 mrg } 723 1.1 mrg } 724 1.1 mrg } 725 1.1 mrg 726 1.1 mrg /* Terminate the trace. */ 727 1.1 mrg break; 728 1.1 mrg } 729 1.1 mrg else 730 1.1 mrg { 731 1.1 mrg /* Check for a situation 732 1.1 mrg 733 1.1 mrg A 734 1.1 mrg /| 735 1.1 mrg B | 736 1.1 mrg \| 737 1.1 mrg C 738 1.1 mrg 739 1.1 mrg where 740 1.1 mrg AB->count () + BC->count () >= AC->count (). 741 1.1 mrg (i.e. 2 * B->count >= AC->count ) 742 1.1 mrg Best ordering is then A B C. 

		  When optimizing for size, A B C is always the best order.

		  This situation is created for example by:

		  if (A) B;
		  C;

		  */

		  FOR_EACH_EDGE (e, ei, bb->succs)
		    if (e != best_edge
			&& (e->flags & EDGE_CAN_FALLTHRU)
			&& !(e->flags & EDGE_COMPLEX)
			&& !bb_visited_trace (e->dest)
			&& single_pred_p (e->dest)
			&& !(e->flags & EDGE_CROSSING)
			&& single_succ_p (e->dest)
			&& (single_succ_edge (e->dest)->flags
			    & EDGE_CAN_FALLTHRU)
			&& !(single_succ_edge (e->dest)->flags & EDGE_COMPLEX)
			&& single_succ (e->dest) == best_edge->dest
			&& (e->dest->count.apply_scale (2, 1)
			    >= best_edge->count () || for_size))
		      {
			best_edge = e;
			if (dump_file)
			  fprintf (dump_file, "Selecting BB %d\n",
				   best_edge->dest->index);
			break;
		      }

		  bb->aux = best_edge->dest;
		  bbd[best_edge->dest->index].in_trace = (*n_traces) - 1;
		  bb = best_edge->dest;
		}
	    }
	}
      while (best_edge);
      trace->last = bb;
      bbd[trace->first->index].start_of_trace = *n_traces - 1;
      if (bbd[trace->last->index].end_of_trace != *n_traces - 1)
	{
	  bbd[trace->last->index].end_of_trace = *n_traces - 1;
	  /* Update the cached maximum frequency for interesting predecessor
	     edges for successors of the new trace end.
	     */
	  FOR_EACH_EDGE (e, ei, trace->last->succs)
	    if (EDGE_FREQUENCY (e) > bbd[e->dest->index].priority)
	      bbd[e->dest->index].priority = EDGE_FREQUENCY (e);
	}

      /* The trace is terminated so we have to recount the keys in heap
	 (some block can have a lower key because now one of its predecessors
	 is an end of the trace).  */
      FOR_EACH_EDGE (e, ei, bb->succs)
	{
	  if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
	      || bb_visited_trace (e->dest))
	    continue;

	  if (bbd[e->dest->index].heap)
	    {
	      key = bb_to_key (e->dest);
	      if (key != bbd[e->dest->index].node->get_key ())
		{
		  if (dump_file)
		    {
		      fprintf (dump_file,
			       "Changing key for bb %d from %ld to %ld.\n",
			       e->dest->index,
			       (long) bbd[e->dest->index].node->get_key (), key);
		    }
		  bbd[e->dest->index].heap->replace_key
		    (bbd[e->dest->index].node, key);
		}
	    }
	}
    }

  delete (*heap);

  /* "Return" the new heap.  */
  *heap = new_heap;
}

/* Create a duplicate of the basic block OLD_BB and redirect edge E to it, add
   it to the trace after BB, mark OLD_BB visited and update the pass' data
   structures (TRACE is the number of the trace which OLD_BB is duplicated
   to).
*/

static basic_block
copy_bb (basic_block old_bb, edge e, basic_block bb, int trace)
{
  basic_block new_bb;

  new_bb = duplicate_block (old_bb, e, bb);
  BB_COPY_PARTITION (new_bb, old_bb);

  gcc_assert (e->dest == new_bb);

  if (dump_file)
    fprintf (dump_file,
	     "Duplicated bb %d (created bb %d)\n",
	     old_bb->index, new_bb->index);

  if (new_bb->index >= array_size
      || last_basic_block_for_fn (cfun) > array_size)
    {
      int i;
      int new_size;

      new_size = MAX (last_basic_block_for_fn (cfun), new_bb->index + 1);
      new_size = GET_ARRAY_SIZE (new_size);
      bbd = XRESIZEVEC (bbro_basic_block_data, bbd, new_size);
      for (i = array_size; i < new_size; i++)
	{
	  bbd[i].start_of_trace = -1;
	  bbd[i].end_of_trace = -1;
	  bbd[i].in_trace = -1;
	  bbd[i].visited = 0;
	  bbd[i].priority = -1;
	  bbd[i].heap = NULL;
	  bbd[i].node = NULL;
	}
      array_size = new_size;

      if (dump_file)
	{
	  fprintf (dump_file,
		   "Growing the dynamic array to %d elements.\n",
		   array_size);
	}
    }

  gcc_assert (!bb_visited_trace (e->dest));
  mark_bb_visited (new_bb, trace);
  new_bb->aux = bb->aux;
  bb->aux = new_bb;

  bbd[new_bb->index].in_trace = trace;

  return new_bb;
}

/* Compute and return the key (for the heap) of the basic block BB.
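
   The key computation below can be sketched with plain integers standing
   in for profile data.  This is a hedged, hypothetical condensation (the
   helper name mini_bb_to_key is invented here): blocks with a nonzero
   priority get a key far below any block keyed by frequency alone, so
   they come out of the min-heap first.

```c
#include <assert.h>

// Hypothetical miniature of bb_to_key's speed-mode keying: smaller key
// means the block is selected earlier as a trace seed.
static long
mini_bb_to_key (int priority, int freq, int bb_freq_max)
{
  if (priority > 0)
    // Prioritized blocks get a key dominated by the -100 * BB_FREQ_MAX
    // term, well below any plain -freq key.
    return -(100L * bb_freq_max + 100L * priority + freq);
  return -(long) freq;
}
```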
*/

static long
bb_to_key (basic_block bb)
{
  edge e;
  edge_iterator ei;

  /* Use index as key to align with its original order.  */
  if (optimize_function_for_size_p (cfun))
    return bb->index;

  /* Do not start in probably never executed blocks.  */
  if (BB_PARTITION (bb) == BB_COLD_PARTITION
      || probably_never_executed_bb_p (cfun, bb))
    return BB_FREQ_MAX;

  /* Prefer blocks whose predecessor is an end of some trace
     or whose predecessor edge is EDGE_DFS_BACK.  */
  int priority = bbd[bb->index].priority;
  if (priority == -1)
    {
      priority = 0;
      FOR_EACH_EDGE (e, ei, bb->preds)
	{
	  if ((e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
	       && bbd[e->src->index].end_of_trace >= 0)
	      || (e->flags & EDGE_DFS_BACK))
	    {
	      int edge_freq = EDGE_FREQUENCY (e);

	      if (edge_freq > priority)
		priority = edge_freq;
	    }
	}
      bbd[bb->index].priority = priority;
    }

  if (priority)
    /* The block with priority should have significantly lower key.  */
    return -(100 * BB_FREQ_MAX + 100 * priority
	     + bb->count.to_frequency (cfun));

  return -bb->count.to_frequency (cfun);
}

/* Return true when the edge E from basic block BB is better than the
   temporary best edge (details are in the function).  The probability of
   edge E is PROB.  The count of the successor is COUNT.  The current best
   probability is BEST_PROB, the best count is BEST_COUNT.
   The edge is considered to be equivalent when PROB does not differ much
   from BEST_PROB; similarly for count.
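
   The "does not differ much" tolerance can be sketched with plain
   integers.  This is a hypothetical condensation (classify_prob is an
   invented name, and integers stand in for profile_probability): values
   within 10% of the current best are treated as equivalent, and only
   then does the comparison fall back to the successor counts.

```c
#include <assert.h>

// Hypothetical sketch of better_edge_p's probability band: returns 1 if
// PROB is clearly better than BEST_PROB, 0 if clearly worse, and -1 if
// the two are within 10% of each other (equivalent; compare counts next).
static int
classify_prob (int prob, int best_prob)
{
  int diff = best_prob / 10;
  if (prob > best_prob + diff)
    return 1;
  if (prob < best_prob - diff)
    return 0;
  return -1;
}
```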
*/

static bool
better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
	       profile_count count, profile_probability best_prob,
	       profile_count best_count, const_edge cur_best_edge)
{
  bool is_better_edge;

  /* The BEST_* values do not have to be best, but can be a bit smaller than
     maximum values.  */
  profile_probability diff_prob = best_prob.apply_scale (1, 10);

  /* The smaller one is better to keep the original order.  */
  if (optimize_function_for_size_p (cfun))
    return !cur_best_edge
	   || cur_best_edge->dest->index > e->dest->index;

  /* Those edges are so expensive that continuing a trace is not useful
     performance-wise.  */
  if (e->flags & (EDGE_ABNORMAL | EDGE_EH))
    return false;

  if (prob > best_prob + diff_prob
      || (!best_prob.initialized_p ()
	  && prob > profile_probability::guessed_never ()))
    /* The edge has higher probability than the temporary best edge.  */
    is_better_edge = true;
  else if (prob < best_prob - diff_prob)
    /* The edge has lower probability than the temporary best edge.  */
    is_better_edge = false;
  else
    {
      profile_count diff_count = best_count.apply_scale (1, 10);
      if (count < best_count - diff_count
	  || (!best_count.initialized_p ()
	      && count.nonzero_p ()))
	/* The edge and the temporary best edge have almost equivalent
	   probabilities.  A higher count of a successor means that there
	   is another edge going into that successor.  This successor has
	   a lower count, so it is better.
*/
	is_better_edge = true;
      else if (count > best_count + diff_count)
	/* This successor has a higher count, so it is worse.  */
	is_better_edge = false;
      else if (e->dest->prev_bb == bb)
	/* The edges have equivalent probabilities and the successors
	   have equivalent frequencies.  Select the previous successor.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }

  return is_better_edge;
}

/* Return true when the edge E is better than the temporary best edge
   CUR_BEST_EDGE.  If SRC_INDEX_P is true, the function compares the src bb
   of E and CUR_BEST_EDGE; otherwise it compares the dest bb.
   BEST_LEN is the trace length of the src (or dest) bb in CUR_BEST_EDGE.
   TRACES records the information about traces.
   When optimizing for size, the edge with the smaller index is better.
   When optimizing for speed, the edge with the bigger probability or the
   longer trace is better.  */

static bool
connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
		       const_edge cur_best_edge, struct trace *traces)
{
  int e_index;
  int b_index;
  bool is_better_edge;

  if (!cur_best_edge)
    return true;

  if (optimize_function_for_size_p (cfun))
    {
      e_index = src_index_p ? e->src->index : e->dest->index;
      b_index = src_index_p ? cur_best_edge->src->index
			    : cur_best_edge->dest->index;
      /* The smaller one is better to keep the original order.
*/
      return b_index > e_index;
    }

  if (src_index_p)
    {
      e_index = e->src->index;

      /* We are looking for a predecessor, so probabilities are not that
	 informative.  We do not want to connect A to B because A has
	 only one successor (probability is 100%) while there is an edge
	 A' to B where the probability is 90% but which is much more
	 frequent.  */
      if (e->count () > cur_best_edge->count ())
	/* The edge has a higher count than the temporary best edge.  */
	is_better_edge = true;
      else if (e->count () < cur_best_edge->count ())
	/* The edge has a lower count than the temporary best edge.  */
	is_better_edge = false;
      else if (e->probability > cur_best_edge->probability)
	/* The edge has higher probability than the temporary best edge.  */
	is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
	/* The edge has lower probability than the temporary best edge.  */
	is_better_edge = false;
      else if (traces[bbd[e_index].end_of_trace].length > best_len)
	/* The edge and the temporary best edge have equivalent
	   probabilities.  The edge with the longer trace is better.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }
  else
    {
      e_index = e->dest->index;

      if (e->probability > cur_best_edge->probability)
	/* The edge has higher probability than the temporary best edge.  */
	is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
	/* The edge has lower probability than the temporary best edge.
*/
	is_better_edge = false;
      else if (traces[bbd[e_index].start_of_trace].length > best_len)
	/* The edge and the temporary best edge have equivalent
	   probabilities.  The edge with the longer trace is better.  */
	is_better_edge = true;
      else
	is_better_edge = false;
    }

  return is_better_edge;
}

/* Connect traces in array TRACES, N_TRACES is the count of traces.  */

static void
connect_traces (int n_traces, struct trace *traces)
{
  int i;
  bool *connected;
  bool two_passes;
  int last_trace;
  int current_pass;
  int current_partition;
  profile_count count_threshold;
  bool for_size = optimize_function_for_size_p (cfun);

  count_threshold = max_entry_count.apply_scale (DUPLICATION_THRESHOLD, 1000);

  connected = XCNEWVEC (bool, n_traces);
  last_trace = -1;
  current_pass = 1;
  current_partition = BB_PARTITION (traces[0].first);
  two_passes = false;

  if (crtl->has_bb_partition)
    for (i = 0; i < n_traces && !two_passes; i++)
      if (BB_PARTITION (traces[0].first)
	  != BB_PARTITION (traces[i].first))
	two_passes = true;

  for (i = 0; i < n_traces || (two_passes && current_pass == 1); i++)
    {
      int t = i;
      int t2;
      edge e, best;
      int best_len;

      if (i >= n_traces)
	{
	  gcc_assert (two_passes && current_pass == 1);
	  i = 0;
	  t = i;
	  current_pass = 2;
	  if (current_partition == BB_HOT_PARTITION)
	    current_partition =
BB_COLD_PARTITION;
	  else
	    current_partition = BB_HOT_PARTITION;
	}

      if (connected[t])
	continue;

      if (two_passes
	  && BB_PARTITION (traces[t].first) != current_partition)
	continue;

      connected[t] = true;

      /* Find the predecessor traces.  */
      for (t2 = t; t2 > 0;)
	{
	  edge_iterator ei;
	  best = NULL;
	  best_len = 0;
	  FOR_EACH_EDGE (e, ei, traces[t2].first->preds)
	    {
	      int si = e->src->index;

	      if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
		  && (e->flags & EDGE_CAN_FALLTHRU)
		  && !(e->flags & EDGE_COMPLEX)
		  && bbd[si].end_of_trace >= 0
		  && !connected[bbd[si].end_of_trace]
		  && (BB_PARTITION (e->src) == current_partition)
		  && connect_better_edge_p (e, true, best_len, best, traces))
		{
		  best = e;
		  best_len = traces[bbd[si].end_of_trace].length;
		}
	    }
	  if (best)
	    {
	      best->src->aux = best->dest;
	      t2 = bbd[best->src->index].end_of_trace;
	      connected[t2] = true;

	      if (dump_file)
		{
		  fprintf (dump_file, "Connection: %d %d\n",
			   best->src->index, best->dest->index);
		}
	    }
	  else
	    break;
	}

      if (last_trace >= 0)
	traces[last_trace].last->aux = traces[t2].first;
      last_trace = t;

      /* Find the successor traces.  */
      while (1)
	{
	  /* Find the continuation of the chain.
*/
	  edge_iterator ei;
	  best = NULL;
	  best_len = 0;
	  FOR_EACH_EDGE (e, ei, traces[t].last->succs)
	    {
	      int di = e->dest->index;

	      if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
		  && (e->flags & EDGE_CAN_FALLTHRU)
		  && !(e->flags & EDGE_COMPLEX)
		  && bbd[di].start_of_trace >= 0
		  && !connected[bbd[di].start_of_trace]
		  && (BB_PARTITION (e->dest) == current_partition)
		  && connect_better_edge_p (e, false, best_len, best, traces))
		{
		  best = e;
		  best_len = traces[bbd[di].start_of_trace].length;
		}
	    }

	  if (for_size)
	    {
	      if (!best)
		/* Stop finding the successor traces.  */
		break;

	      /* It is OK to connect block n with block n + 1 or a block
		 before n.  For others, only connect to the loop header.  */
	      if (best->dest->index > (traces[t].last->index + 1))
		{
		  int count = EDGE_COUNT (best->dest->preds);

		  FOR_EACH_EDGE (e, ei, best->dest->preds)
		    if (e->flags & EDGE_DFS_BACK)
		      count--;

		  /* If dest has multiple predecessors, skip it.  We expect
		     that one predecessor with smaller index connects with it
		     later.  */
		  if (count != 1)
		    break;
		}

	      /* Only connect Trace n with Trace n + 1.  It is conservative
		 to keep the order as close as possible to the original order.
		 It also helps to reduce long jumps.
*/
	      if (last_trace != bbd[best->dest->index].start_of_trace - 1)
		break;

	      if (dump_file)
		fprintf (dump_file, "Connection: %d %d\n",
			 best->src->index, best->dest->index);

	      t = bbd[best->dest->index].start_of_trace;
	      traces[last_trace].last->aux = traces[t].first;
	      connected[t] = true;
	      last_trace = t;
	    }
	  else if (best)
	    {
	      if (dump_file)
		{
		  fprintf (dump_file, "Connection: %d %d\n",
			   best->src->index, best->dest->index);
		}
	      t = bbd[best->dest->index].start_of_trace;
	      traces[last_trace].last->aux = traces[t].first;
	      connected[t] = true;
	      last_trace = t;
	    }
	  else
	    {
	      /* Try to connect the traces by duplication of 1 block.  */
	      edge e2;
	      basic_block next_bb = NULL;
	      bool try_copy = false;

	      FOR_EACH_EDGE (e, ei, traces[t].last->succs)
		if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
		    && (e->flags & EDGE_CAN_FALLTHRU)
		    && !(e->flags & EDGE_COMPLEX)
		    && (!best || e->probability > best->probability))
		  {
		    edge_iterator ei;
		    edge best2 = NULL;
		    int best2_len = 0;

		    /* If the destination is a start of a trace which is only
		       one block long, then there is no need to search the
		       successor blocks of the trace.  Accept it.
*/
		    if (bbd[e->dest->index].start_of_trace >= 0
			&& traces[bbd[e->dest->index].start_of_trace].length
			   == 1)
		      {
			best = e;
			try_copy = true;
			continue;
		      }

		    FOR_EACH_EDGE (e2, ei, e->dest->succs)
		      {
			int di = e2->dest->index;

			if (e2->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
			    || ((e2->flags & EDGE_CAN_FALLTHRU)
				&& !(e2->flags & EDGE_COMPLEX)
				&& bbd[di].start_of_trace >= 0
				&& !connected[bbd[di].start_of_trace]
				&& BB_PARTITION (e2->dest) == current_partition
				&& e2->count () >= count_threshold
				&& (!best2
				    || e2->probability > best2->probability
				    || (e2->probability == best2->probability
					&& traces[bbd[di].start_of_trace].length
					   > best2_len))))
			  {
			    best = e;
			    best2 = e2;
			    if (e2->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
			      best2_len
				= traces[bbd[di].start_of_trace].length;
			    else
			      best2_len = INT_MAX;
			    next_bb = e2->dest;
			    try_copy = true;
			  }
		      }
		  }

	      /* Copy tiny blocks always; copy larger blocks only when the
		 edge is traversed frequently enough.
*/
	      if (try_copy
		  && BB_PARTITION (best->src) == BB_PARTITION (best->dest)
		  && copy_bb_p (best->dest,
				optimize_edge_for_speed_p (best)
				&& (!best->count ().initialized_p ()
				    || best->count () >= count_threshold)))
		{
		  basic_block new_bb;

		  if (dump_file)
		    {
		      fprintf (dump_file, "Connection: %d %d ",
			       traces[t].last->index, best->dest->index);
		      if (!next_bb)
			fputc ('\n', dump_file);
		      else if (next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
			fprintf (dump_file, "exit\n");
		      else
			fprintf (dump_file, "%d\n", next_bb->index);
		    }

		  new_bb = copy_bb (best->dest, best, traces[t].last, t);
		  traces[t].last = new_bb;
		  if (next_bb && next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
		    {
		      t = bbd[next_bb->index].start_of_trace;
		      traces[last_trace].last->aux = traces[t].first;
		      connected[t] = true;
		      last_trace = t;
		    }
		  else
		    break;	/* Stop finding the successor traces.  */
		}
	      else
		break;	/* Stop finding the successor traces.  */
	    }
	}
    }

  if (dump_file)
    {
      basic_block bb;

      fprintf (dump_file, "Final order:\n");
      for (bb = traces[0].first; bb; bb = (basic_block) bb->aux)
	fprintf (dump_file, "%d ", bb->index);
      fprintf (dump_file, "\n");
      fflush (dump_file);
    }

  FREE (connected);
}

/* Return true when BB can and should be copied.  CODE_MAY_GROW is true
   when code size is allowed to grow by duplication.
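
   The size check that copy_bb_p performs below can be condensed into a
   small standalone sketch.  This is hypothetical (the name copy_ok and
   the plain array of instruction lengths are invented stand-ins for
   FOR_BB_INSNS and get_attr_min_length): sum the minimum lengths and
   give up as soon as the running total exceeds the budget.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical condensation of copy_bb_p's size accounting: return true
// when the total of the N instruction lengths stays within MAX_SIZE,
// bailing out early once the budget is exceeded.
static bool
copy_ok (const unsigned *insn_len, int n, unsigned max_size)
{
  unsigned size = 0;
  for (int i = 0; i < n; i++)
    {
      size += insn_len[i];
      if (size > max_size)
	return false;   // too big to duplicate
    }
  return true;
}
```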
*/

static bool
copy_bb_p (const_basic_block bb, int code_may_grow)
{
  unsigned int size = 0;
  unsigned int max_size = uncond_jump_length;
  rtx_insn *insn;

  if (EDGE_COUNT (bb->preds) < 2)
    return false;
  if (!can_duplicate_block_p (bb))
    return false;

  /* Avoid duplicating blocks which have many successors (PR/13430).  */
  if (EDGE_COUNT (bb->succs) > 8)
    return false;

  if (code_may_grow && optimize_bb_for_speed_p (bb))
    max_size *= param_max_grow_copy_bb_insns;

  FOR_BB_INSNS (bb, insn)
    {
      if (INSN_P (insn))
	{
	  size += get_attr_min_length (insn);
	  if (size > max_size)
	    break;
	}
    }

  if (size <= max_size)
    return true;

  if (dump_file)
    {
      fprintf (dump_file,
	       "Block %d can't be copied because its size = %u.\n",
	       bb->index, size);
    }

  return false;
}

/* Return the length of unconditional jump instruction.  */

int
get_uncond_jump_length (void)
{
  unsigned int length;

  start_sequence ();
  rtx_code_label *label = emit_label (gen_label_rtx ());
  rtx_insn *jump = emit_jump_insn (targetm.gen_jump (label));
  length = get_attr_min_length (jump);
  end_sequence ();

  gcc_assert (length < INT_MAX);
  return length;
}

/* Create a forwarder block to OLD_BB starting with NEW_LABEL and in the
   other partition wrt OLD_BB.
*/

static basic_block
create_eh_forwarder_block (rtx_code_label *new_label, basic_block old_bb)
{
  /* Split OLD_BB, so that EH pads have always only incoming EH edges;
     bb_has_eh_pred bbs are treated specially by the DF infrastructure.  */
  old_bb = split_block_after_labels (old_bb)->dest;

  /* Put the new label and a jump in the new basic block.  */
  rtx_insn *label = emit_label (new_label);
  rtx_code_label *old_label = block_label (old_bb);
  rtx_insn *jump = emit_jump_insn (targetm.gen_jump (old_label));
  JUMP_LABEL (jump) = old_label;

  /* Create the new basic block and put it in last position.  */
  basic_block last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
  basic_block new_bb = create_basic_block (label, jump, last_bb);
  new_bb->aux = last_bb->aux;
  new_bb->count = old_bb->count;
  last_bb->aux = new_bb;

  emit_barrier_after_bb (new_bb);

  make_single_succ_edge (new_bb, old_bb, 0);

  /* Make sure the new basic block is in the other partition.  */
  unsigned new_partition = BB_PARTITION (old_bb);
  new_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
  BB_SET_PARTITION (new_bb, new_partition);

  return new_bb;
}

/* The common landing pad in block OLD_BB has edges from both partitions.
   Add a new landing pad that will just jump to the old one and split the
   edges so that no EH edge crosses partitions.
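
   The partition flip used by create_eh_forwarder_block can be sketched
   on its own.  This is a hypothetical miniature (the enum values are
   invented stand-ins for BB_HOT_PARTITION and BB_COLD_PARTITION, which
   are distinct bit flags): XOR-ing with the union of both flags toggles
   one partition value into the other.

```c
#include <assert.h>

// Invented stand-ins for the two partition bit flags.
enum { MINI_HOT = 1, MINI_COLD = 2 };

// XOR with (HOT | COLD) maps HOT to COLD and COLD to HOT, the same
// toggle create_eh_forwarder_block applies to pick the other partition.
static unsigned
other_partition (unsigned p)
{
  return p ^ (MINI_HOT | MINI_COLD);
}
```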
*/

static void
sjlj_fix_up_crossing_landing_pad (basic_block old_bb)
{
  const unsigned lp_len = cfun->eh->lp_array->length ();
  edge_iterator ei;
  edge e;

  /* Generate the new common landing-pad label.  */
  rtx_code_label *new_label = gen_label_rtx ();
  LABEL_PRESERVE_P (new_label) = 1;

  /* Create the forwarder block.  */
  basic_block new_bb = create_eh_forwarder_block (new_label, old_bb);

  /* Create the map from old to new lp index and initialize it.  */
  unsigned *index_map = (unsigned *) alloca (lp_len * sizeof (unsigned));
  memset (index_map, 0, lp_len * sizeof (unsigned));

  /* Fix up the edges.  */
  for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
    if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
      {
	rtx_insn *insn = BB_END (e->src);
	rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);

	gcc_assert (note != NULL);
	const unsigned old_index = INTVAL (XEXP (note, 0));

	/* Generate the new landing-pad structure.  */
	if (index_map[old_index] == 0)
	  {
	    eh_landing_pad old_lp = (*cfun->eh->lp_array)[old_index];
	    eh_landing_pad new_lp = gen_eh_landing_pad (old_lp->region);
	    new_lp->post_landing_pad = old_lp->post_landing_pad;
	    new_lp->landing_pad = new_label;
	    index_map[old_index] = new_lp->index;
	  }
	XEXP (note, 0) = GEN_INT (index_map[old_index]);

	/* Adjust the edge to the new destination.
*/
	redirect_edge_succ (e, new_bb);
      }
    else
      ei_next (&ei);
}

/* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
   Add a new landing pad that will just jump to the old one and split the
   edges so that no EH edge crosses partitions.  */

static void
dw2_fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb)
{
  eh_landing_pad new_lp;
  edge_iterator ei;
  edge e;

  /* Generate the new landing-pad structure.  */
  new_lp = gen_eh_landing_pad (old_lp->region);
  new_lp->post_landing_pad = old_lp->post_landing_pad;
  new_lp->landing_pad = gen_label_rtx ();
  LABEL_PRESERVE_P (new_lp->landing_pad) = 1;

  /* Create the forwarder block.  */
  basic_block new_bb
    = create_eh_forwarder_block (new_lp->landing_pad, old_bb);

  /* Fix up the edges.  */
  for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
    if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
      {
	rtx_insn *insn = BB_END (e->src);
	rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);

	gcc_assert (note != NULL);
	gcc_checking_assert (INTVAL (XEXP (note, 0)) == old_lp->index);
	XEXP (note, 0) = GEN_INT (new_lp->index);

	/* Adjust the edge to the new destination.  */
	redirect_edge_succ (e, new_bb);
      }
    else
      ei_next (&ei);
}


/* Ensure that all hot bbs are included in a hot path through the
   procedure.
   This is done by calling this function twice, once
   with WALK_UP true (to look for paths from the entry to hot bbs) and
   once with WALK_UP false (to look for paths from hot bbs to the exit).
   Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
   to BBS_IN_HOT_PARTITION.  */

static unsigned int
sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
		    vec<basic_block> *bbs_in_hot_partition)
{
  /* Callers check this.  */
  gcc_checking_assert (cold_bb_count);

  /* Keep examining hot bbs while we still have some left to check
     and there are remaining cold bbs.  */
  vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
  while (! hot_bbs_to_check.is_empty ()
	 && cold_bb_count)
    {
      basic_block bb = hot_bbs_to_check.pop ();
      vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
      edge e;
      edge_iterator ei;
      profile_probability highest_probability
	= profile_probability::uninitialized ();
      profile_count highest_count = profile_count::uninitialized ();
      bool found = false;

      /* Walk the preds/succs and check if there is at least one already
	 marked hot.  Keep track of the most frequent pred/succ so that we
	 can mark it hot if we don't find one.  */
      FOR_EACH_EDGE (e, ei, edges)
	{
	  basic_block reach_bb = walk_up ? e->src : e->dest;

	  if (e->flags & EDGE_DFS_BACK)
	    continue;

	  /* Do not expect profile insanities when profile was not adjusted.
*/
	  if (e->probability == profile_probability::never ()
	      || e->count () == profile_count::zero ())
	    continue;

	  if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
	    {
	      found = true;
	      break;
	    }
	  /* Track the hottest edge via the edge count if it is non-zero,
	     falling back to the edge probability otherwise.  */
	  if (!(e->count () > highest_count))
	    highest_count = e->count ();
	  if (!highest_probability.initialized_p ()
	      || e->probability > highest_probability)
	    highest_probability = e->probability;
	}

      /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot
	 block (or unpartitioned, e.g. the entry block) then it is ok.  If not,
	 then the most frequent pred (or succ) needs to be adjusted.  In the
	 case where multiple preds/succs have the same frequency (e.g. a
	 50-50 branch), then both will be adjusted.  */
      if (found)
	continue;

      FOR_EACH_EDGE (e, ei, edges)
	{
	  if (e->flags & EDGE_DFS_BACK)
	    continue;
	  /* Do not expect profile insanities when profile was not
	     adjusted.  */
	  if (e->probability == profile_probability::never ()
	      || e->count () == profile_count::zero ())
	    continue;
	  /* Select the hottest edge using the edge count if it is non-zero,
	     falling back to the edge probability otherwise.  */
	  if (highest_count.initialized_p ())
	    {
	      if (!(e->count () >= highest_count))
		continue;
	    }
	  else if (!(e->probability >= highest_probability))
	    continue;

	  basic_block reach_bb = walk_up ?
e->src : e->dest;

	  /* We have a hot bb with an immediate dominator that is cold.
	     The dominator needs to be re-marked hot.  */
	  BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
	  if (dump_file)
	    fprintf (dump_file, "Promoting bb %i to hot partition to sanitize "
		     "profile of bb %i in %s walk\n", reach_bb->index,
		     bb->index, walk_up ? "backward" : "forward");
	  cold_bb_count--;

	  /* Now we need to examine newly-hot reach_bb to see if it is also
	     dominated by a cold bb.  */
	  bbs_in_hot_partition->safe_push (reach_bb);
	  hot_bbs_to_check.safe_push (reach_bb);
	}
    }
  hot_bbs_to_check.release ();

  return cold_bb_count;
}


/* Find the basic blocks that are rarely executed and need to be moved to
   a separate section of the .o file (to cut down on paging and improve
   cache locality).  Return a vector of all edges that cross.  */

static vec<edge>
find_rarely_executed_basic_blocks_and_crossing_edges (void)
{
  vec<edge> crossing_edges = vNULL;
  basic_block bb;
  edge e;
  edge_iterator ei;
  unsigned int cold_bb_count = 0;
  auto_vec<basic_block> bbs_in_hot_partition;

  propagate_unlikely_bbs_forward ();

  /* Mark which partition (hot/cold) each basic block belongs in.  */
  FOR_EACH_BB_FN (bb, cfun)
    {
      bool cold_bb = false;

      if (probably_never_executed_bb_p (cfun, bb))
	{
	  cold_bb = true;

	  /* Handle profile insanities created by upstream optimizations
	     by also checking the incoming edge weights.
If there is a non-cold 1676 1.1 mrg incoming edge, conservatively prevent this block from being split 1677 1.1 mrg into the cold section. */ 1678 1.1 mrg if (!bb->count.precise_p ()) 1679 1.1 mrg FOR_EACH_EDGE (e, ei, bb->preds) 1680 1.1 mrg if (!probably_never_executed_edge_p (cfun, e)) 1681 1.1 mrg { 1682 1.1 mrg cold_bb = false; 1683 1.1 mrg break; 1684 1.1 mrg } 1685 1.1 mrg } 1686 1.1 mrg if (cold_bb) 1687 1.1 mrg { 1688 1.1 mrg BB_SET_PARTITION (bb, BB_COLD_PARTITION); 1689 1.1 mrg cold_bb_count++; 1690 1.1 mrg } 1691 1.1 mrg else 1692 1.1 mrg { 1693 1.1 mrg BB_SET_PARTITION (bb, BB_HOT_PARTITION); 1694 1.1 mrg bbs_in_hot_partition.safe_push (bb); 1695 1.1 mrg } 1696 1.1 mrg } 1697 1.1 mrg 1698 1.1 mrg /* Ensure that hot bbs are included along a hot path from the entry to exit. 1699 1.1 mrg Several different possibilities may include cold bbs along all paths 1700 1.1 mrg to/from a hot bb. One is that there are edge weight insanities 1701 1.1 mrg due to optimization phases that do not properly update basic block profile 1702 1.1 mrg counts. The second is that the entry of the function may not be hot, because 1703 1.1 mrg it is entered fewer times than the number of profile training runs, but there 1704 1.1 mrg is a loop inside the function that causes blocks within the function to be 1705 1.1 mrg above the threshold for hotness. This is fixed by walking up from hot bbs 1706 1.1 mrg to the entry block, and then down from hot bbs to the exit, performing 1707 1.1 mrg partitioning fixups as necessary. 
*/ 1708 1.1 mrg if (cold_bb_count) 1709 1.1 mrg { 1710 1.1 mrg mark_dfs_back_edges (); 1711 1.1 mrg cold_bb_count = sanitize_hot_paths (true, cold_bb_count, 1712 1.1 mrg &bbs_in_hot_partition); 1713 1.1 mrg if (cold_bb_count) 1714 1.1 mrg sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition); 1715 1.1 mrg 1716 1.1 mrg hash_set <basic_block> set; 1717 1.1 mrg find_bbs_reachable_by_hot_paths (&set); 1718 1.1 mrg FOR_EACH_BB_FN (bb, cfun) 1719 1.1 mrg if (!set.contains (bb)) 1720 1.1 mrg BB_SET_PARTITION (bb, BB_COLD_PARTITION); 1721 1.1 mrg } 1722 1.1 mrg 1723 1.1 mrg /* The format of .gcc_except_table does not allow landing pads to 1724 1.1 mrg be in a different partition as the throw. Fix this by either 1725 1.1 mrg moving the landing pads or inserting forwarder landing pads. */ 1726 1.1 mrg if (cfun->eh->lp_array) 1727 1.1 mrg { 1728 1.1 mrg const bool sjlj 1729 1.1 mrg = (targetm_common.except_unwind_info (&global_options) == UI_SJLJ); 1730 1.1 mrg unsigned i; 1731 1.1 mrg eh_landing_pad lp; 1732 1.1 mrg 1733 1.1 mrg FOR_EACH_VEC_ELT (*cfun->eh->lp_array, i, lp) 1734 1.1 mrg { 1735 1.1 mrg bool all_same, all_diff; 1736 1.1 mrg 1737 1.1 mrg if (lp == NULL 1738 1.1 mrg || lp->landing_pad == NULL_RTX 1739 1.1 mrg || !LABEL_P (lp->landing_pad)) 1740 1.1 mrg continue; 1741 1.1 mrg 1742 1.1 mrg all_same = all_diff = true; 1743 1.1 mrg bb = BLOCK_FOR_INSN (lp->landing_pad); 1744 1.1 mrg FOR_EACH_EDGE (e, ei, bb->preds) 1745 1.1 mrg { 1746 1.1 mrg gcc_assert (e->flags & EDGE_EH); 1747 1.1 mrg if (BB_PARTITION (bb) == BB_PARTITION (e->src)) 1748 1.1 mrg all_diff = false; 1749 1.1 mrg else 1750 1.1 mrg all_same = false; 1751 1.1 mrg } 1752 1.1 mrg 1753 1.1 mrg if (all_same) 1754 1.1 mrg ; 1755 1.1 mrg else if (all_diff) 1756 1.1 mrg { 1757 1.1 mrg int which = BB_PARTITION (bb); 1758 1.1 mrg which ^= BB_HOT_PARTITION | BB_COLD_PARTITION; 1759 1.1 mrg BB_SET_PARTITION (bb, which); 1760 1.1 mrg } 1761 1.1 mrg else if (sjlj) 1762 1.1 mrg 
sjlj_fix_up_crossing_landing_pad (bb); 1763 1.1 mrg else 1764 1.1 mrg dw2_fix_up_crossing_landing_pad (lp, bb); 1765 1.1 mrg 1766 1.1 mrg /* There is a single, common landing pad in SJLJ mode. */ 1767 1.1 mrg if (sjlj) 1768 1.1 mrg break; 1769 1.1 mrg } 1770 1.1 mrg } 1771 1.1 mrg 1772 1.1 mrg /* Mark every edge that crosses between sections. */ 1773 1.1 mrg FOR_EACH_BB_FN (bb, cfun) 1774 1.1 mrg FOR_EACH_EDGE (e, ei, bb->succs) 1775 1.1 mrg { 1776 1.1 mrg unsigned int flags = e->flags; 1777 1.1 mrg 1778 1.1 mrg /* We should never have EDGE_CROSSING set yet. */ 1779 1.1 mrg gcc_checking_assert ((flags & EDGE_CROSSING) == 0); 1780 1.1 mrg 1781 1.1 mrg if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun) 1782 1.1 mrg && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun) 1783 1.1 mrg && BB_PARTITION (e->src) != BB_PARTITION (e->dest)) 1784 1.1 mrg { 1785 1.1 mrg crossing_edges.safe_push (e); 1786 1.1 mrg flags |= EDGE_CROSSING; 1787 1.1 mrg } 1788 1.1 mrg 1789 1.1 mrg /* Now that we've split eh edges as appropriate, allow landing pads 1790 1.1 mrg to be merged with the post-landing pads. */ 1791 1.1 mrg flags &= ~EDGE_PRESERVE; 1792 1.1 mrg 1793 1.1 mrg e->flags = flags; 1794 1.1 mrg } 1795 1.1 mrg 1796 1.1 mrg return crossing_edges; 1797 1.1 mrg } 1798 1.1 mrg 1799 1.1 mrg /* Set the flag EDGE_CAN_FALLTHRU for edges that can be fallthru. */ 1800 1.1 mrg 1801 1.1 mrg static void 1802 1.1 mrg set_edge_can_fallthru_flag (void) 1803 1.1 mrg { 1804 1.1 mrg basic_block bb; 1805 1.1 mrg 1806 1.1 mrg FOR_EACH_BB_FN (bb, cfun) 1807 1.1 mrg { 1808 1.1 mrg edge e; 1809 1.1 mrg edge_iterator ei; 1810 1.1 mrg 1811 1.1 mrg FOR_EACH_EDGE (e, ei, bb->succs) 1812 1.1 mrg { 1813 1.1 mrg e->flags &= ~EDGE_CAN_FALLTHRU; 1814 1.1 mrg 1815 1.1 mrg /* The FALLTHRU edge is also CAN_FALLTHRU edge. 
*/ 1816 1.1 mrg if (e->flags & EDGE_FALLTHRU) 1817 1.1 mrg e->flags |= EDGE_CAN_FALLTHRU; 1818 1.1 mrg } 1819 1.1 mrg 1820 1.1 mrg /* If the BB ends with an invertible condjump all (2) edges are 1821 1.1 mrg CAN_FALLTHRU edges. */ 1822 1.1 mrg if (EDGE_COUNT (bb->succs) != 2) 1823 1.1 mrg continue; 1824 1.1 mrg if (!any_condjump_p (BB_END (bb))) 1825 1.1 mrg continue; 1826 1.1 mrg 1827 1.1 mrg rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb)); 1828 1.1 mrg if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0)) 1829 1.1 mrg continue; 1830 1.1 mrg invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0); 1831 1.1 mrg EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU; 1832 1.1 mrg EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU; 1833 1.1 mrg } 1834 1.1 mrg } 1835 1.1 mrg 1836 1.1 mrg /* If any destination of a crossing edge does not have a label, add label; 1837 1.1 mrg Convert any easy fall-through crossing edges to unconditional jumps. */ 1838 1.1 mrg 1839 1.1 mrg static void 1840 1.1 mrg add_labels_and_missing_jumps (vec<edge> crossing_edges) 1841 1.1 mrg { 1842 1.1 mrg size_t i; 1843 1.1 mrg edge e; 1844 1.1 mrg 1845 1.1 mrg FOR_EACH_VEC_ELT (crossing_edges, i, e) 1846 1.1 mrg { 1847 1.1 mrg basic_block src = e->src; 1848 1.1 mrg basic_block dest = e->dest; 1849 1.1 mrg rtx_jump_insn *new_jump; 1850 1.1 mrg 1851 1.1 mrg if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) 1852 1.1 mrg continue; 1853 1.1 mrg 1854 1.1 mrg /* Make sure dest has a label. */ 1855 1.1 mrg rtx_code_label *label = block_label (dest); 1856 1.1 mrg 1857 1.1 mrg /* Nothing to do for non-fallthru edges. */ 1858 1.1 mrg if (src == ENTRY_BLOCK_PTR_FOR_FN (cfun)) 1859 1.1 mrg continue; 1860 1.1 mrg if ((e->flags & EDGE_FALLTHRU) == 0) 1861 1.1 mrg continue; 1862 1.1 mrg 1863 1.1 mrg /* If the block does not end with a control flow insn, then we 1864 1.1 mrg can trivially add a jump to the end to fixup the crossing. 
1865 1.1 mrg Otherwise the jump will have to go in a new bb, which will 1866 1.1 mrg be handled by fix_up_fall_thru_edges function. */ 1867 1.1 mrg if (control_flow_insn_p (BB_END (src))) 1868 1.1 mrg continue; 1869 1.1 mrg 1870 1.1 mrg /* Make sure there's only one successor. */ 1871 1.1 mrg gcc_assert (single_succ_p (src)); 1872 1.1 mrg 1873 1.1 mrg new_jump = emit_jump_insn_after (targetm.gen_jump (label), BB_END (src)); 1874 1.1 mrg BB_END (src) = new_jump; 1875 1.1 mrg JUMP_LABEL (new_jump) = label; 1876 1.1 mrg LABEL_NUSES (label) += 1; 1877 1.1 mrg 1878 1.1 mrg emit_barrier_after_bb (src); 1879 1.1 mrg 1880 1.1 mrg /* Mark edge as non-fallthru. */ 1881 1.1 mrg e->flags &= ~EDGE_FALLTHRU; 1882 1.1 mrg } 1883 1.1 mrg } 1884 1.1 mrg 1885 1.1 mrg /* Find any bb's where the fall-through edge is a crossing edge (note that 1886 1.1 mrg these bb's must also contain a conditional jump or end with a call 1887 1.1 mrg instruction; we've already dealt with fall-through edges for blocks 1888 1.1 mrg that didn't have a conditional jump or didn't end with call instruction 1889 1.1 mrg in the call to add_labels_and_missing_jumps). Convert the fall-through 1890 1.1 mrg edge to non-crossing edge by inserting a new bb to fall-through into. 1891 1.1 mrg The new bb will contain an unconditional jump (crossing edge) to the 1892 1.1 mrg original fall through destination. 
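
   Schematically (block numbers here are hypothetical), if hot block bb2
   currently falls through into cold block bb5:

     bb2 (hot): cond jump ...; fall through --> bb5 (cold)

   the fixup produces:

     bb2 (hot): fall through --> new_bb (hot): jump --> bb5 (cold)

   so the only section-crossing control transfer that remains is the
   unconditional jump at the end of new_bb.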
*/

static void
fix_up_fall_thru_edges (void)
{
  basic_block cur_bb;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      edge succ1;
      edge succ2;
      edge fall_thru = NULL;
      edge cond_jump = NULL;

      fall_thru = NULL;
      if (EDGE_COUNT (cur_bb->succs) > 0)
	succ1 = EDGE_SUCC (cur_bb, 0);
      else
	succ1 = NULL;

      if (EDGE_COUNT (cur_bb->succs) > 1)
	succ2 = EDGE_SUCC (cur_bb, 1);
      else
	succ2 = NULL;

      /* Find the fall-through edge.  */

      if (succ1
	  && (succ1->flags & EDGE_FALLTHRU))
	{
	  fall_thru = succ1;
	  cond_jump = succ2;
	}
      else if (succ2
	       && (succ2->flags & EDGE_FALLTHRU))
	{
	  fall_thru = succ2;
	  cond_jump = succ1;
	}
      else if (succ2 && EDGE_COUNT (cur_bb->succs) > 2)
	fall_thru = find_fallthru_edge (cur_bb->succs);

      if (fall_thru && (fall_thru->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)))
	{
	  /* Check to see if the fall-thru edge is a crossing edge.  */

	  if (fall_thru->flags & EDGE_CROSSING)
	    {
	      /* The fall_thru edge crosses; now check the cond jump edge,
		 if it exists.  */

	      bool cond_jump_crosses = true;
	      int invert_worked = 0;
	      rtx_insn *old_jump = BB_END (cur_bb);

	      /* Find the jump instruction, if there is one.  */

	      if (cond_jump)
		{
		  if (!(cond_jump->flags & EDGE_CROSSING))
		    cond_jump_crosses = false;

		  /* We know the fall-thru edge crosses; if the cond
		     jump edge does NOT cross, and its destination is the
		     next block in the bb order, invert the jump
		     (i.e. fix it so the fall through does not cross and
		     the cond jump does).  */

		  if (!cond_jump_crosses)
		    {
		      /* Find the label in the fall_thru block.  We've
			 already added any missing labels, so there must
			 be one.  */

		      rtx_code_label *fall_thru_label
			= block_label (fall_thru->dest);

		      if (old_jump && fall_thru_label)
			{
			  rtx_jump_insn *old_jump_insn
			    = dyn_cast <rtx_jump_insn *> (old_jump);
			  if (old_jump_insn)
			    invert_worked = invert_jump (old_jump_insn,
							 fall_thru_label, 0);
			}

		      if (invert_worked)
			{
			  fall_thru->flags &= ~EDGE_FALLTHRU;
			  cond_jump->flags |= EDGE_FALLTHRU;
			  update_br_prob_note (cur_bb);
			  std::swap (fall_thru, cond_jump);
			  cond_jump->flags |= EDGE_CROSSING;
			  fall_thru->flags &= ~EDGE_CROSSING;
			}
		    }
		}

	      if (cond_jump_crosses || !invert_worked)
		{
		  /* This is the case where both edges out of the basic
		     block are crossing edges.  Here we will fix up the
		     fall through edge.  The jump edge will be taken care
		     of later.  The EDGE_CROSSING flag of the fall_thru
		     edge is unset before the call to the force_nonfallthru
		     function because if a new basic block is created this
		     edge remains within the current section boundary while
		     the edge between new_bb and fall_thru->dest becomes
		     EDGE_CROSSING.  */

		  fall_thru->flags &= ~EDGE_CROSSING;
		  unsigned old_count = EDGE_COUNT (cur_bb->succs);
		  basic_block new_bb = force_nonfallthru (fall_thru);

		  if (new_bb)
		    {
		      new_bb->aux = cur_bb->aux;
		      cur_bb->aux = new_bb;

		      /* This is done by force_nonfallthru_and_redirect.  */
		      gcc_assert (BB_PARTITION (new_bb)
				  == BB_PARTITION (cur_bb));

		      edge e = single_succ_edge (new_bb);
		      e->flags |= EDGE_CROSSING;
		      if (EDGE_COUNT (cur_bb->succs) > old_count)
			{
			  /* If an asm goto has a crossing fallthrough edge
			     and at least one of its labels points to the
			     same bb, force_nonfallthru can result in the
			     fallthrough edge being redirected and a new
			     edge added for the label (or more labels) to
			     e->dest.  As we've temporarily cleared the
			     EDGE_CROSSING flag on the fallthrough edge,
			     we need to restore it again.  See PR108596.  */
			  rtx_insn *j = BB_END (cur_bb);
			  gcc_checking_assert (JUMP_P (j)
					       && asm_noperands (PATTERN (j)));
			  edge e2 = find_edge (cur_bb, e->dest);
			  if (e2)
			    e2->flags |= EDGE_CROSSING;
			}
		    }
		  else
		    {
		      /* If a new basic block was not created, restore
			 the EDGE_CROSSING flag.  */
		      fall_thru->flags |= EDGE_CROSSING;
		    }

		  /* Add a barrier after the new jump.  */
		  emit_barrier_after_bb (new_bb ? new_bb : cur_bb);
		}
	    }
	}
    }
}

/* This function checks the destination block of a "crossing jump" to
   see if it has any crossing predecessors that begin with a code label
   and end with an unconditional jump.  If so, it returns that predecessor
   block.  (This is to avoid creating lots of new basic blocks that all
   contain unconditional jumps to the same destination.)  */

static basic_block
find_jump_block (basic_block jump_dest)
{
  basic_block source_bb = NULL;
  edge e;
  rtx_insn *insn;
  edge_iterator ei;

  FOR_EACH_EDGE (e, ei, jump_dest->preds)
    if (e->flags & EDGE_CROSSING)
      {
	basic_block src = e->src;

	/* Check each predecessor to see if it has a label, and contains
	   only one executable instruction, which is an unconditional jump.
	   If so, we can use it.  */

	if (LABEL_P (BB_HEAD (src)))
	  for (insn = BB_HEAD (src);
	       !INSN_P (insn) && insn != NEXT_INSN (BB_END (src));
	       insn = NEXT_INSN (insn))
	    {
	      if (INSN_P (insn)
		  && insn == BB_END (src)
		  && JUMP_P (insn)
		  && !any_condjump_p (insn))
		{
		  source_bb = src;
		  break;
		}
	    }

	if (source_bb)
	  break;
      }

  return source_bb;
}

/* Find all BB's with conditional jumps that are crossing edges;
   insert a new bb and make the conditional jump branch to the new
   bb instead (make the new bb the same color so the conditional branch
   won't be a 'crossing' edge).  Insert an unconditional jump from the
   new bb to the original destination of the conditional jump.  */

static void
fix_crossing_conditional_branches (void)
{
  basic_block cur_bb;
  basic_block new_bb;
  basic_block dest;
  edge succ1;
  edge succ2;
  edge crossing_edge;
  edge new_edge;
  rtx set_src;
  rtx old_label = NULL_RTX;
  rtx_code_label *new_label;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      crossing_edge = NULL;
      if (EDGE_COUNT (cur_bb->succs) > 0)
	succ1 = EDGE_SUCC (cur_bb, 0);
      else
	succ1 = NULL;

      if (EDGE_COUNT (cur_bb->succs) > 1)
	succ2 = EDGE_SUCC (cur_bb, 1);
      else
	succ2 = NULL;

      /* We already took care of fall-through edges, so only one successor
	 can be a crossing edge.  */

      if (succ1 && (succ1->flags & EDGE_CROSSING))
	crossing_edge = succ1;
      else if (succ2 && (succ2->flags & EDGE_CROSSING))
	crossing_edge = succ2;

      if (crossing_edge)
	{
	  rtx_insn *old_jump = BB_END (cur_bb);

	  /* Check to make sure the jump instruction is a
	     conditional jump.  */

	  set_src = NULL_RTX;

	  if (any_condjump_p (old_jump))
	    {
	      if (GET_CODE (PATTERN (old_jump)) == SET)
		set_src = SET_SRC (PATTERN (old_jump));
	      else if (GET_CODE (PATTERN (old_jump)) == PARALLEL)
		{
		  set_src = XVECEXP (PATTERN (old_jump), 0, 0);
		  if (GET_CODE (set_src) == SET)
		    set_src = SET_SRC (set_src);
		  else
		    set_src = NULL_RTX;
		}
	    }

	  if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE))
	    {
	      rtx_jump_insn *old_jump_insn
		= as_a <rtx_jump_insn *> (old_jump);

	      if (GET_CODE (XEXP (set_src, 1)) == PC)
		old_label = XEXP (set_src, 2);
	      else if (GET_CODE (XEXP (set_src, 2)) == PC)
		old_label = XEXP (set_src, 1);

	      /* Check to see if a new bb for jumping to that dest has
		 already been created; if so, use it; if not, create
		 a new one.  */

	      new_bb = find_jump_block (crossing_edge->dest);

	      if (new_bb)
		new_label = block_label (new_bb);
	      else
		{
		  basic_block last_bb;
		  rtx_code_label *old_jump_target;
		  rtx_jump_insn *new_jump;

		  /* Create a new basic block to be the dest for the
		     conditional jump.  */

		  /* Put appropriate instructions in the new bb.  */

		  new_label = gen_label_rtx ();
		  emit_label (new_label);

		  gcc_assert (GET_CODE (old_label) == LABEL_REF);
		  old_jump_target = old_jump_insn->jump_target ();
		  new_jump = as_a <rtx_jump_insn *>
		    (emit_jump_insn (targetm.gen_jump (old_jump_target)));
		  new_jump->set_jump_target (old_jump_target);

		  last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
		  new_bb = create_basic_block (new_label, new_jump, last_bb);
		  new_bb->aux = last_bb->aux;
		  last_bb->aux = new_bb;

		  emit_barrier_after_bb (new_bb);

		  /* Make sure the new bb is in the same partition as the
		     source of the conditional branch.  */
		  BB_COPY_PARTITION (new_bb, cur_bb);
		}

	      /* Make the old jump branch to the new bb.  */

	      redirect_jump (old_jump_insn, new_label, 0);

	      /* Remove crossing_edge as a predecessor of 'dest'.  */

	      dest = crossing_edge->dest;

	      redirect_edge_succ (crossing_edge, new_bb);

	      /* Make a new edge from new_bb to the old dest; the new edge
		 will be a successor for new_bb and a predecessor
		 for 'dest'.  */

	      if (EDGE_COUNT (new_bb->succs) == 0)
		new_edge = make_single_succ_edge (new_bb, dest, 0);
	      else
		new_edge = EDGE_SUCC (new_bb, 0);

	      crossing_edge->flags &= ~EDGE_CROSSING;
	      new_edge->flags |= EDGE_CROSSING;
	    }
	}
    }
}

/* Find any unconditional branches that cross between hot and cold
   sections.  Convert them into indirect jumps instead.
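
   As a sketch (the register is hypothetical), a direct crossing jump

     jump .L_cold_label

   becomes an address load followed by an indirect jump:

     (set (reg:P tmp) (label_ref .L_cold_label))
     jump through (reg:P tmp)

   which frees the branch from any limited-displacement direct-jump
   encoding that might not span the distance between the two sections.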
*/

static void
fix_crossing_unconditional_branches (void)
{
  basic_block cur_bb;
  rtx_insn *last_insn;
  rtx label;
  rtx label_addr;
  rtx_insn *indirect_jump_sequence;
  rtx_insn *jump_insn = NULL;
  rtx new_reg;
  rtx_insn *cur_insn;
  edge succ;

  FOR_EACH_BB_FN (cur_bb, cfun)
    {
      last_insn = BB_END (cur_bb);

      if (EDGE_COUNT (cur_bb->succs) < 1)
	continue;

      succ = EDGE_SUCC (cur_bb, 0);

      /* Check to see if bb ends in a crossing (unconditional) jump.  At
	 this point, no crossing jumps should be conditional.  */

      if (JUMP_P (last_insn)
	  && (succ->flags & EDGE_CROSSING))
	{
	  gcc_assert (!any_condjump_p (last_insn));

	  /* Make sure the jump is not already an indirect or table jump.  */

	  if (!computed_jump_p (last_insn)
	      && !tablejump_p (last_insn, NULL, NULL)
	      && asm_noperands (PATTERN (last_insn)) < 0)
	    {
	      /* We have found a "crossing" unconditional branch.  Now
		 we must convert it to an indirect jump.  First create a
		 reference to the label, as the target for the jump.  */

	      label = JUMP_LABEL (last_insn);
	      label_addr = gen_rtx_LABEL_REF (Pmode, label);
	      LABEL_NUSES (label) += 1;

	      /* Get a register to use for the indirect jump.  */

	      new_reg = gen_reg_rtx (Pmode);

	      /* Generate the indirect jump sequence.  */

	      start_sequence ();
	      emit_move_insn (new_reg, label_addr);
	      emit_indirect_jump (new_reg);
	      indirect_jump_sequence = get_insns ();
	      end_sequence ();

	      /* Make sure every instruction in the new jump sequence has
		 its basic block set to be cur_bb.  */

	      for (cur_insn = indirect_jump_sequence; cur_insn;
		   cur_insn = NEXT_INSN (cur_insn))
		{
		  if (!BARRIER_P (cur_insn))
		    BLOCK_FOR_INSN (cur_insn) = cur_bb;
		  if (JUMP_P (cur_insn))
		    jump_insn = cur_insn;
		}

	      /* Insert the new (indirect) jump sequence immediately before
		 the unconditional jump, then delete the unconditional
		 jump.  */

	      emit_insn_before (indirect_jump_sequence, last_insn);
	      delete_insn (last_insn);

	      JUMP_LABEL (jump_insn) = label;
	      LABEL_NUSES (label)++;

	      /* Make BB_END for cur_bb be the jump instruction (NOT the
		 barrier instruction at the end of the sequence...).  */

	      BB_END (cur_bb) = jump_insn;
	    }
	}
    }
}

/* Update CROSSING_JUMP_P flags on all jump insns.  */

static void
update_crossing_jump_flags (void)
{
  basic_block bb;
  edge e;
  edge_iterator ei;

  FOR_EACH_BB_FN (bb, cfun)
    FOR_EACH_EDGE (e, ei, bb->succs)
      if (e->flags & EDGE_CROSSING)
	{
	  if (JUMP_P (BB_END (bb)))
	    CROSSING_JUMP_P (BB_END (bb)) = 1;
	  break;
	}
}

/* Reorder basic blocks using the software trace cache (STC) algorithm.  */

static void
reorder_basic_blocks_software_trace_cache (void)
{
  if (dump_file)
    fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");

  int n_traces;
  int i;
  struct trace *traces;

  /* We estimate the length of an uncond jump insn only once, since the
     code for getting the insn length always returns the minimal length
     now.  */
  if (uncond_jump_length == 0)
    uncond_jump_length = get_uncond_jump_length ();

  /* We need to know some information for each basic block.  */
  array_size = GET_ARRAY_SIZE (last_basic_block_for_fn (cfun));
  bbd = XNEWVEC (bbro_basic_block_data, array_size);
  for (i = 0; i < array_size; i++)
    {
      bbd[i].start_of_trace = -1;
      bbd[i].end_of_trace = -1;
      bbd[i].in_trace = -1;
      bbd[i].visited = 0;
      bbd[i].priority = -1;
      bbd[i].heap = NULL;
      bbd[i].node = NULL;
    }

  traces = XNEWVEC (struct trace, n_basic_blocks_for_fn (cfun));
  n_traces = 0;
  find_traces (&n_traces, traces);
  connect_traces (n_traces, traces);
  FREE (traces);
  FREE (bbd);
}

/* Order edges by execution frequency, higher first.
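
   For example (a sketch of the intended order): sorting edges whose
   counts are 7, uninitialized and 3 with this comparator yields
   7, 3, uninitialized -- per the note in the function body, 'max'
   prefers an initialized operand, so uninitialized counts sort after
   every initialized one.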
*/

static int
edge_order (const void *ve1, const void *ve2)
{
  edge e1 = *(const edge *) ve1;
  edge e2 = *(const edge *) ve2;
  profile_count c1 = e1->count ();
  profile_count c2 = e2->count ();
  /* Since profile_count::operator< does not establish a strict weak order
     in presence of uninitialized counts, use 'max': this makes them appear
     as if having execution frequency less than any initialized count.  */
  profile_count m = c1.max (c2);
  return (m == c2) - (m == c1);
}

/* Reorder basic blocks using the "simple" algorithm.  This tries to
   maximize the dynamic number of branches that are fallthrough, without
   copying instructions.  The algorithm is greedy, looking at the most
   frequently executed branch first.  */

static void
reorder_basic_blocks_simple (void)
{
  if (dump_file)
    fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");

  edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];

  /* First, collect all edges that can be optimized by reordering blocks:
     simple jumps and conditional jumps, as well as the function entry
     edge.  */

  int n = 0;
  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);

  basic_block bb;
  FOR_EACH_BB_FN (bb, cfun)
    {
      rtx_insn *end = BB_END (bb);

      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
	continue;

      /* We cannot optimize asm goto.  */
      if (JUMP_P (end) && extract_asm_operands (end))
	continue;

      if (single_succ_p (bb))
	edges[n++] = EDGE_SUCC (bb, 0);
      else if (any_condjump_p (end))
	{
	  edge e0 = EDGE_SUCC (bb, 0);
	  edge e1 = EDGE_SUCC (bb, 1);
	  /* When optimizing for size it is best to keep the original
	     fallthrough edges.  */
	  if (e1->flags & EDGE_FALLTHRU)
	    std::swap (e0, e1);
	  edges[n++] = e0;
	  edges[n++] = e1;
	}
    }

  /* Sort the edges, the most desirable first.  When optimizing for size
     all edges are equally desirable.  */

  if (optimize_function_for_speed_p (cfun))
    gcc_stablesort (edges, n, sizeof *edges, edge_order);

  /* Now decide which of those edges to make fallthrough edges.  We set
     BB_VISITED if a block already has a fallthrough successor assigned
     to it.  We make ->AUX of an endpoint point to the opposite endpoint
     of a sequence of blocks that fall through, and ->AUX will be NULL
     for a block that is in such a sequence but not an endpoint anymore.

     To start with, everything points to itself, nothing is assigned yet.  */

  FOR_ALL_BB_FN (bb, cfun)
    {
      bb->aux = bb;
      bb->flags &= ~BB_VISITED;
    }

  EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;

  /* Now for all edges, the most desirable first, see if that edge can
     connect two sequences.  If it can, update AUX and BB_VISITED; if it
     cannot, zero out the edge in the table.
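
   A small hypothetical illustration: suppose A and B are the endpoints
   of an existing chain (A->aux == B, B->aux == A) and C is still a
   singleton (C->aux == C).  Accepting an edge B->C clears B's AUX and
   C's old self-pointer, sets A->aux = C and C->aux = A (the new
   endpoints name each other), and marks B with BB_VISITED since B now
   has its fallthrough successor chosen.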
*/ 2468 1.1 mrg 2469 1.1 mrg for (int j = 0; j < n; j++) 2470 1.1 mrg { 2471 1.1 mrg edge e = edges[j]; 2472 1.1 mrg 2473 1.1 mrg basic_block tail_a = e->src; 2474 1.1 mrg basic_block head_b = e->dest; 2475 1.1 mrg basic_block head_a = (basic_block) tail_a->aux; 2476 1.1 mrg basic_block tail_b = (basic_block) head_b->aux; 2477 1.1 mrg 2478 1.1 mrg /* An edge cannot connect two sequences if: 2479 1.1 mrg - it crosses partitions; 2480 1.1 mrg - its src is not a current endpoint; 2481 1.1 mrg - its dest is not a current endpoint; 2482 1.1 mrg - or, it would create a loop. */ 2483 1.1 mrg 2484 1.1 mrg if (e->flags & EDGE_CROSSING 2485 1.1 mrg || tail_a->flags & BB_VISITED 2486 1.1 mrg || !tail_b 2487 1.1 mrg || (!(head_b->flags & BB_VISITED) && head_b != tail_b) 2488 1.1 mrg || tail_a == tail_b) 2489 1.1 mrg { 2490 1.1 mrg edges[j] = 0; 2491 1.1 mrg continue; 2492 1.1 mrg } 2493 1.1 mrg 2494 1.1 mrg tail_a->aux = 0; 2495 1.1 mrg head_b->aux = 0; 2496 1.1 mrg head_a->aux = tail_b; 2497 1.1 mrg tail_b->aux = head_a; 2498 1.1 mrg tail_a->flags |= BB_VISITED; 2499 1.1 mrg } 2500 1.1 mrg 2501 1.1 mrg /* Put the pieces together, in the same order that the start blocks of 2502 1.1 mrg the sequences already had. The hot/cold partitioning gives a little 2503 1.1 mrg complication: as a first pass only do this for blocks in the same 2504 1.1 mrg partition as the start block, and (if there is anything left to do) 2505 1.1 mrg in a second pass handle the other partition. */ 2506 1.1 mrg 2507 1.1 mrg basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux; 2508 1.1 mrg 2509 1.1 mrg int current_partition 2510 1.1 mrg = BB_PARTITION (last_tail == ENTRY_BLOCK_PTR_FOR_FN (cfun) 2511 1.1 mrg ? 
EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0)->dest 2512 1.1 mrg : last_tail); 2513 1.1 mrg bool need_another_pass = true; 2514 1.1 mrg 2515 1.1 mrg for (int pass = 0; pass < 2 && need_another_pass; pass++) 2516 1.1 mrg { 2517 1.1 mrg need_another_pass = false; 2518 1.1 mrg 2519 1.1 mrg FOR_EACH_BB_FN (bb, cfun) 2520 1.1 mrg if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb) 2521 1.1 mrg { 2522 1.1 mrg if (BB_PARTITION (bb) != current_partition) 2523 1.1 mrg { 2524 1.1 mrg need_another_pass = true; 2525 1.1 mrg continue; 2526 1.1 mrg } 2527 1.1 mrg 2528 1.1 mrg last_tail->aux = bb; 2529 1.1 mrg last_tail = (basic_block) bb->aux; 2530 1.1 mrg } 2531 1.1 mrg 2532 1.1 mrg current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION; 2533 1.1 mrg } 2534 1.1 mrg 2535 1.1 mrg last_tail->aux = 0; 2536 1.1 mrg 2537 1.1 mrg /* Finally, link all the chosen fallthrough edges. */ 2538 1.1 mrg 2539 1.1 mrg for (int j = 0; j < n; j++) 2540 1.1 mrg if (edges[j]) 2541 1.1 mrg edges[j]->src->aux = edges[j]->dest; 2542 1.1 mrg 2543 1.1 mrg delete[] edges; 2544 1.1 mrg 2545 1.1 mrg /* If the entry edge no longer falls through we have to make a new 2546 1.1 mrg block so it can do so again. */ 2547 1.1 mrg 2548 1.1 mrg edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0); 2549 1.1 mrg if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux) 2550 1.1 mrg { 2551 1.1 mrg force_nonfallthru (e); 2552 1.1 mrg e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux; 2553 1.1 mrg } 2554 1.1 mrg } 2555 1.1 mrg 2556 1.1 mrg /* Reorder basic blocks. The main entry point to this file. 
*/

static void
reorder_basic_blocks (void)
{
  gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);

  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
    return;

  set_edge_can_fallthru_flag ();
  mark_dfs_back_edges ();

  switch (flag_reorder_blocks_algorithm)
    {
    case REORDER_BLOCKS_ALGORITHM_SIMPLE:
      reorder_basic_blocks_simple ();
      break;

    case REORDER_BLOCKS_ALGORITHM_STC:
      reorder_basic_blocks_software_trace_cache ();
      break;

    default:
      gcc_unreachable ();
    }

  relink_block_chain (/*stay_in_cfglayout_mode=*/true);

  if (dump_file)
    {
      if (dump_flags & TDF_DETAILS)
	dump_reg_info (dump_file);
      dump_flow_info (dump_file, dump_flags);
    }

  /* Signal that rtl_verify_flow_info_1 can now verify that there
     is at most one switch between hot/cold sections.  */
  crtl->bb_reorder_complete = true;
}

/* Determine which partition the first basic block in the function
   belongs to, then find the first basic block in the current function
   that belongs to a different section, and insert a
   NOTE_INSN_SWITCH_TEXT_SECTIONS note immediately before it in the
   instruction stream.  When writing out the assembly code,
   encountering this note will make the compiler switch between the
   hot and cold text sections.
*/

void
insert_section_boundary_note (void)
{
  basic_block bb;
  bool switched_sections = false;
  int current_partition = 0;

  if (!crtl->has_bb_partition)
    return;

  FOR_EACH_BB_FN (bb, cfun)
    {
      if (!current_partition)
	current_partition = BB_PARTITION (bb);
      if (BB_PARTITION (bb) != current_partition)
	{
	  gcc_assert (!switched_sections);
	  switched_sections = true;
	  emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
	  current_partition = BB_PARTITION (bb);
	}
    }

  /* Make sure crtl->has_bb_partition matches reality even if bbpart finds
     some hot and some cold basic blocks, but later one of those kinds is
     optimized away.  */
  crtl->has_bb_partition = switched_sections;
}

namespace {

const pass_data pass_data_reorder_blocks =
{
  RTL_PASS, /* type */
  "bbro", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  0, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_reorder_blocks : public rtl_opt_pass
{
public:
  pass_reorder_blocks (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_reorder_blocks, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *)
    {
      if (targetm.cannot_modify_jumps_p ())
	return false;
      return
	     (optimize > 0
	      && (flag_reorder_blocks || flag_reorder_blocks_and_partition));
    }

  virtual unsigned int execute (function *);

}; // class pass_reorder_blocks

unsigned int
pass_reorder_blocks::execute (function *fun)
{
  basic_block bb;

  /* Last attempt to optimize CFG, as scheduling, peepholing and insn
     splitting possibly introduced more crossjumping opportunities.  */
  cfg_layout_initialize (CLEANUP_EXPENSIVE);

  reorder_basic_blocks ();
  cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_NO_PARTITIONING);

  FOR_EACH_BB_FN (bb, fun)
    if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
      bb->aux = bb->next_bb;
  cfg_layout_finalize ();

  FOR_EACH_BB_FN (bb, fun)
    df_recompute_luids (bb);
  return 0;
}

} // anon namespace

rtl_opt_pass *
make_pass_reorder_blocks (gcc::context *ctxt)
{
  return new pass_reorder_blocks (ctxt);
}

/* Duplicate a block (that we already know ends in a computed jump) into its
   predecessors, where possible.  Return whether anything is changed.  */
static bool
maybe_duplicate_computed_goto (basic_block bb, int max_size)
{
  /* Make sure that the block is small enough.
*/
  rtx_insn *insn;
  FOR_BB_INSNS (bb, insn)
    if (INSN_P (insn))
      {
	max_size -= get_attr_min_length (insn);
	if (max_size < 0)
	  return false;
      }

  bool changed = false;
  edge e;
  edge_iterator ei;
  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
    {
      basic_block pred = e->src;

      /* Do not duplicate BB into PRED if we cannot merge a copy of BB
	 with PRED.  */
      if (!single_succ_p (pred)
	  || e->flags & EDGE_COMPLEX
	  || pred->index < NUM_FIXED_BLOCKS
	  || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
	  || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
	{
	  ei_next (&ei);
	  continue;
	}

      if (dump_file)
	fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
		 bb->index, e->src->index);

      /* Remember if PRED can be duplicated; if so, the copy of BB merged
	 with PRED can be duplicated as well.  */
      bool can_dup_more = can_duplicate_block_p (pred);

      /* Make a copy of BB, merge it into PRED.  */
      basic_block copy = duplicate_block (bb, e, NULL);
      emit_barrier_after_bb (copy);
      reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
      merge_blocks (pred, copy);

      changed = true;

      /* Try to merge the resulting merged PRED into further predecessors.  */
      if (can_dup_more)
	maybe_duplicate_computed_goto (pred, max_size);
    }

  return changed;
}

/* Duplicate the blocks containing computed gotos.
   This basically unfactors
   computed gotos that were factored early on in the compilation process to
   speed up edge based data flow.  We used to not unfactor them again, which
   can seriously pessimize code with many computed jumps in the source code,
   such as interpreters.  See e.g. PR15242.  */
static void
duplicate_computed_gotos (function *fun)
{
  /* We estimate the length of an unconditional jump insn only once,
     since the code for getting the insn length always returns
     the minimal length now.  */
  if (uncond_jump_length == 0)
    uncond_jump_length = get_uncond_jump_length ();

  /* Never copy a block larger than this.  */
  int max_size
    = uncond_jump_length * param_max_goto_duplication_insns;

  bool changed = false;

  /* Try to duplicate all blocks that end in a computed jump and that
     can be duplicated at all.  */
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
      changed |= maybe_duplicate_computed_goto (bb, max_size);

  /* Some blocks may have become unreachable.  */
  if (changed)
    cleanup_cfg (0);

  /* Duplicating blocks will redirect edges and may cause hot blocks
     previously reached by both hot and cold blocks to become dominated
     only by cold blocks.
*/
  if (changed)
    fixup_partitions ();
}

namespace {

const pass_data pass_data_duplicate_computed_gotos =
{
  RTL_PASS, /* type */
  "compgotos", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  0, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_duplicate_computed_gotos : public rtl_opt_pass
{
public:
  pass_duplicate_computed_gotos (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *);
  virtual unsigned int execute (function *);

}; // class pass_duplicate_computed_gotos

bool
pass_duplicate_computed_gotos::gate (function *fun)
{
  if (targetm.cannot_modify_jumps_p ())
    return false;
  return (optimize > 0
	  && flag_expensive_optimizations
	  && !
optimize_function_for_size_p (fun));
}

unsigned int
pass_duplicate_computed_gotos::execute (function *fun)
{
  duplicate_computed_gotos (fun);

  return 0;
}

} // anon namespace

rtl_opt_pass *
make_pass_duplicate_computed_gotos (gcc::context *ctxt)
{
  return new pass_duplicate_computed_gotos (ctxt);
}

/* This function is the main 'entrance' for the optimization that
   partitions hot and cold basic blocks into separate sections of the
   .o file (to improve performance and cache locality).  Ideally it
   would be called after all optimizations that rearrange the CFG have
   been called.  However part of this optimization may introduce new
   register usage, so it must be called before register allocation has
   occurred.  This means that this optimization is actually called
   well before the optimization that reorders basic blocks (see
   function above).

   This optimization checks the feedback information to determine
   which basic blocks are hot/cold, and updates flags on the basic
   blocks to indicate which section they belong in.  This information
   is later used for writing out sections in the .o file.  Because hot
   and cold sections can be arbitrarily large (within the bounds of
   memory), far beyond the size of a single function, it is necessary
   to fix up all edges that cross section boundaries, to make sure the
   instructions used can actually span the required distance.  The
   fixes are described below.

   Fall-through edges must be changed into jumps; it is not safe or
   legal to fall through across a section boundary.  Whenever a
   fall-through edge crossing a section boundary is encountered, a new
   basic block is inserted (in the same section as the fall-through
   source), and the fall-through edge is redirected to the new basic
   block.  The new basic block contains an unconditional jump to the
   original fall-through target.  (If the unconditional jump is
   insufficient to cross section boundaries, that is dealt with a
   little later, see below.)

   In order to deal with architectures that have short conditional
   branches (which cannot span all of memory) we take any conditional
   jump that attempts to cross a section boundary and add a level of
   indirection: it becomes a conditional jump to a new basic block, in
   the same section.  The new basic block contains an unconditional
   jump to the original target, in the other section.

   For those architectures whose unconditional branch is also
   incapable of reaching all of memory, those unconditional jumps are
   converted into indirect jumps, through a register.

   IMPORTANT NOTE: This optimization causes some messy interactions
   with the cfg cleanup optimizations; those optimizations want to
   merge blocks wherever possible, and to collapse indirect jump
   sequences (change "A jumps to B jumps to C" directly into "A jumps
   to C").
   Those optimizations can undo the jump fixes that
   partitioning is required to make (see above) in order to ensure
   that jumps attempting to cross section boundaries are really able
   to cover whatever distance the jump requires (on many architectures
   conditional or unconditional jumps are not able to reach all of
   memory).  Therefore tests have to be inserted into each such
   optimization to make sure that it does not undo the fixes necessary
   to cross partition boundaries.  This would be much less of a problem
   if we could perform this optimization later in the compilation, but
   unfortunately the fact that we may need to create indirect jumps
   (through registers) requires that this optimization be performed
   before register allocation.

   Hot and cold basic blocks are partitioned and put in separate
   sections of the .o file, to reduce paging and improve cache
   performance (hopefully).  This can result in bits of code from the
   same function being widely separated in the .o file.  However this
   is not obvious to the current bb structure.  Therefore we must take
   care to ensure that: 1).  There are no fall_thru edges that cross
   between sections; 2).  For those architectures which have "short"
   conditional branches, all conditional branches that attempt to
   cross between sections are converted to unconditional branches;
   and, 3).  For those architectures which have "short" unconditional
   branches, all unconditional branches that attempt to cross between
   sections are converted to indirect jumps.

   The code for fixing up fall_thru edges that cross between hot and
   cold basic blocks does so by creating new basic blocks containing
   unconditional branches to the appropriate label in the "other"
   section.  The new basic block is then put in the same (hot or cold)
   section as the original conditional branch, and the fall_thru edge
   is modified to fall into the new basic block instead.  By adding
   this level of indirection we end up with only unconditional branches
   crossing between hot and cold sections.

   Conditional branches are dealt with by adding a level of indirection.
   A new basic block is added in the same (hot/cold) section as the
   conditional branch, and the conditional branch is retargeted to the
   new basic block.  The new basic block contains an unconditional branch
   to the original target of the conditional branch (in the other section).

   Unconditional branches are dealt with by converting them into
   indirect jumps.
*/

namespace {

const pass_data pass_data_partition_blocks =
{
  RTL_PASS, /* type */
  "bbpart", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  PROP_cfglayout, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_partition_blocks : public rtl_opt_pass
{
public:
  pass_partition_blocks (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_partition_blocks, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *);
  virtual unsigned int execute (function *);

}; // class pass_partition_blocks

bool
pass_partition_blocks::gate (function *fun)
{
  /* The optimization to partition hot/cold basic blocks into separate
     sections of the .o file does not work well with linkonce, with
     user defined section attributes, or with the naked attribute.
     Don't call it if any of these cases arises.  */
  return (flag_reorder_blocks_and_partition
	  && optimize
	  /* See pass_reorder_blocks::gate.  We should not partition if
	     we are going to omit the reordering.  */
	  && optimize_function_for_speed_p (fun)
	  && !DECL_COMDAT_GROUP (current_function_decl)
	  && !lookup_attribute ("section", DECL_ATTRIBUTES (fun->decl))
	  && !lookup_attribute ("naked", DECL_ATTRIBUTES (fun->decl))
	  /* Work around a bug in GDB where read_partial_die doesn't cope
	     with DIEs with DW_AT_ranges, see PR81115.
*/
	  && !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl))));
}

unsigned
pass_partition_blocks::execute (function *fun)
{
  vec<edge> crossing_edges;

  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
    return 0;

  df_set_flags (DF_DEFER_INSN_RESCAN);

  crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges ();
  if (!crossing_edges.exists ())
    /* Make sure to process deferred rescans and clear changeable df flags.  */
    return TODO_df_finish;

  crtl->has_bb_partition = true;

  /* Make sure the source of any crossing edge ends in a jump and the
     destination of any crossing edge has a label.  */
  add_labels_and_missing_jumps (crossing_edges);

  /* Convert all crossing fall_thru edges to non-crossing fall
     thrus to unconditional jumps (that jump to the original fall
     through dest).  */
  fix_up_fall_thru_edges ();

  /* If the architecture does not have conditional branches that can
     span all of memory, convert crossing conditional branches into
     crossing unconditional branches.  */
  if (!HAS_LONG_COND_BRANCH)
    fix_crossing_conditional_branches ();

  /* If the architecture does not have unconditional branches that
     can span all of memory, convert crossing unconditional branches
     into indirect jumps.  Since adding an indirect jump also adds
     a new register usage, update the register usage information as
     well.
*/
  if (!HAS_LONG_UNCOND_BRANCH)
    fix_crossing_unconditional_branches ();

  update_crossing_jump_flags ();

  /* Clear bb->aux fields that the above routines were using.  */
  clear_aux_for_blocks ();

  crossing_edges.release ();

  /* ??? FIXME: DF generates the bb info for a block immediately.
     And by immediately, I mean *during* creation of the block.

	#0  df_bb_refs_collect
	#1  in df_bb_refs_record
	#2  in create_basic_block_structure

     Which means that the bb_has_eh_pred test in df_bb_refs_collect
     will *always* fail, because no edges can have been added to the
     block yet.  Which of course means we don't add the right
     artificial refs, which means we fail df_verify (much) later.

     Cleanest solution would seem to make DF_DEFER_INSN_RESCAN imply
     that we also shouldn't grab data from the new blocks those new
     insns are in either.  In this way one can create the block, link
     it up properly, and have everything Just Work later, when deferred
     insns are processed.

     In the meantime, we have no other option but to throw away all
     of the DF data and recompute it all.  */
  if (fun->eh->lp_array)
    {
      df_finish_pass (true);
      df_scan_alloc (NULL);
      df_scan_blocks ();
      /* Not all post-landing pads use all of the EH_RETURN_DATA_REGNO
	 data.  We blindly generated all of them when creating the new
	 landing pad.  Delete those assignments we don't use.  */
      df_set_flags (DF_LR_RUN_DCE);
      df_analyze ();
    }

  /* Make sure to process deferred rescans and clear changeable df flags.
*/
  return TODO_df_finish;
}

} // anon namespace

rtl_opt_pass *
make_pass_partition_blocks (gcc::context *ctxt)
{
  return new pass_partition_blocks (ctxt);
}
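/* The reordering machinery above repeatedly borrows each block's `aux'
   field as a "next block in the chosen layout" link, which
   relink_block_chain then walks to rebuild the block list.  The sketch
   below illustrates that aux-chaining technique outside of GCC, as a
   minimal standalone example: the `block' struct and the helper names
   (chain_blocks, linearize) are hypothetical stand-ins, not GCC's real
   basic_block API.  */

#if 0 /* Standalone illustration, not part of the pass.  */
#include <cassert>
#include <cstddef>
#include <vector>

/* Hypothetical stand-in for a basic block: carries only an index and
   an `aux' pointer that we reuse as a layout-order link.  */
struct block
{
  int index = 0;
  block *aux = nullptr;	/* Next block in the chosen layout, or null.  */
};

/* Thread BLOCKS into an aux chain following ORDER, mimicking the
   `bb->aux = bb->next_bb' style loops in this file.  */
static void
chain_blocks (std::vector<block> &blocks, const std::vector<int> &order)
{
  for (size_t i = 0; i + 1 < order.size (); i++)
    blocks[order[i]].aux = &blocks[order[i + 1]];
  blocks[order.back ()].aux = nullptr;
}

/* Walk the aux chain from HEAD and collect indices, the way
   relink_block_chain consumes bb->aux to emit the final order.  */
static std::vector<int>
linearize (block *head)
{
  std::vector<int> out;
  for (block *b = head; b; b = b->aux)
    out.push_back (b->index);
  return out;
}

int
main ()
{
  std::vector<block> blocks (4);
  for (int i = 0; i < 4; i++)
    blocks[i].index = i;

  /* Lay block 2 right after block 0, e.g. because the 0->2 edge is
     the hottest fallthrough.  */
  chain_blocks (blocks, {0, 2, 1, 3});

  assert ((linearize (&blocks[0]) == std::vector<int>{0, 2, 1, 3}));
  return 0;
}
#endif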