/* Loop distribution.
   Copyright (C) 2006-2022 Free Software Foundation, Inc.
   Contributed by Georges-Andre Silber <Georges-Andre.Silber@ensmp.fr>
   and Sebastian Pop <sebastian.pop@amd.com>.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* This pass performs loop distribution: for example, the loop

   |DO I = 2, N
   |    A(I) = B(I) + C
   |    D(I) = A(I-1)*E
   |ENDDO

   is transformed to

   |DOALL I = 2, N
   |   A(I) = B(I) + C
   |ENDDO
   |
   |DOALL I = 2, N
   |   D(I) = A(I-1)*E
   |ENDDO

   Loop distribution is the dual of loop fusion.  It separates statements
   of a loop (or loop nest) into multiple loops (or loop nests) with the
   same loop header.  The major goal is to separate statements which may
   be vectorized from those that can't.  This pass implements distribution
   in the following steps:

     1) Seed partitions with statements of specific types.  For now we
        support two kinds of seed statements: statements defining a
        variable used outside of the loop, and statements storing to
        memory.
     2) Build a reduced dependence graph (RDG) for the loop to be
        distributed.  The vertices (RDG:V) model all statements in the
        loop and the edges (RDG:E) model flow and control dependencies
        between statements.
     3) Apart from the RDG, compute data dependencies between memory
        references.
     4) Starting from a seed statement, build up a partition by adding
        dependent statements according to the RDG's dependence
        information.  A partition is classified as parallel type if it
        can be executed in parallel, or as sequential type if it can't.
        A parallel type partition is further classified into different
        builtin kinds if it can be implemented as builtin function calls.
     5) Build a partition dependence graph (PG) based on data
        dependencies.  The vertices (PG:V) model all partitions and the
        edges (PG:E) model all data dependencies between every pair of
        partitions.  In general, a data dependence is either known or
        unknown at compilation time.  In C family languages, there are
        quite a few compile-time unknown dependencies because of possible
        aliasing between data references.  We categorize PG's edges into
        two types: "true" edges representing compile-time known data
        dependencies, and "alias" edges for all other data dependencies.
     6) Traverse the subgraph of PG obtained by ignoring all "alias"
        edges.  Merge the partitions in each strongly connected component
        (SCC) accordingly.  Build a new PG for the merged partitions.
     7) Traverse PG again, this time with both "true" and "alias" edges
        included.  We try to break SCCs by removing some edges.  Because
        SCCs formed by "true" edges were all fused in step 6), we can
        break SCCs by removing some "alias" edges.  It's NP-hard to
        choose an optimal edge set; fortunately a simple approximation is
        good enough for us given the small problem size.
     8) Collect all data dependencies of the removed "alias" edges.
        Create runtime alias checks for the collected data dependencies.
     9) Version the loop under the condition of the runtime alias checks.
        Since loop distribution generally introduces additional overhead,
        it is only useful if vectorization is achieved in a distributed
        loop.  We version the loop with the internal function call
        IFN_LOOP_DIST_ALIAS.  If no distributed loop can be vectorized,
        we simply remove the distributed loops and fall back to the
        original one.  (An illustrative example follows this comment.)

   TODO:
     1) We only distribute the innermost two-level loop nest now.  We
        should extend this to arbitrary loop nests in the future.
     2) We only fuse partitions in an SCC now.  A better fusion algorithm
        is desired to minimize loop overhead, maximize parallelism and
        maximize data reuse.  */
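
/* As a hedged illustration only (the pass works on GIMPLE, and A, C, D,
   E below are merely illustrative names), a C loop such as

   |for (i = 0; i < n; i++)
   |  {
   |    a[i] = 0;
   |    d[i] = c[i] * e;
   |  }

   may be split into a memset partition and a normal partition, roughly

   |memset (a, 0, n * sizeof (*a));
   |for (i = 0; i < n; i++)
   |  d[i] = c[i] * e;

   with the distributed version guarded by a runtime alias check (steps
   8) and 9) above) when the compiler cannot prove the accesses are
   independent.  */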

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "cfghooks.h"
#include "tree-pass.h"
#include "ssa.h"
#include "gimple-pretty-print.h"
#include "fold-const.h"
#include "cfganal.h"
#include "gimple-iterator.h"
#include "gimplify-me.h"
#include "stor-layout.h"
#include "tree-cfg.h"
#include "tree-ssa-loop-manip.h"
#include "tree-ssa-loop-ivopts.h"
#include "tree-ssa-loop.h"
#include "tree-into-ssa.h"
#include "tree-ssa.h"
#include "cfgloop.h"
#include "tree-scalar-evolution.h"
#include "tree-vectorizer.h"
#include "tree-eh.h"
#include "gimple-fold.h"
#include "tree-affine.h"
#include "intl.h"
#include "rtl.h"
#include "memmodel.h"
#include "optabs.h"


#define MAX_DATAREFS_NUM \
	((unsigned) param_loop_max_datarefs_for_datadeps)

/* Threshold controlling the number of distributed partitions.  Since it
   may become unnecessary if a memory stream cost model is invented in the
   future, we define it as a temporary macro rather than a parameter.  */
#define NUM_PARTITION_THRESHOLD (4)

/* Hashtable helpers.  */

struct ddr_hasher : nofree_ptr_hash <struct data_dependence_relation>
{
  static inline hashval_t hash (const data_dependence_relation *);
  static inline bool equal (const data_dependence_relation *,
			    const data_dependence_relation *);
};

/* Hash function for data dependence.  */

inline hashval_t
ddr_hasher::hash (const data_dependence_relation *ddr)
{
  inchash::hash h;
  h.add_ptr (DDR_A (ddr));
  h.add_ptr (DDR_B (ddr));
  return h.end ();
}

/* Hash table equality function for data dependence.  */

inline bool
ddr_hasher::equal (const data_dependence_relation *ddr1,
		   const data_dependence_relation *ddr2)
{
  return (DDR_A (ddr1) == DDR_A (ddr2) && DDR_B (ddr1) == DDR_B (ddr2));
}



#define DR_INDEX(dr)      ((uintptr_t) (dr)->aux)

/* A Reduced Dependence Graph (RDG) vertex representing a statement.  */
struct rdg_vertex
{
  /* The statement represented by this vertex.  */
  gimple *stmt;

  /* Vector of data-references in this statement.  */
  vec<data_reference_p> datarefs;

  /* True when the statement contains a write to memory.  */
  bool has_mem_write;

  /* True when the statement contains a read from memory.  */
  bool has_mem_reads;
};

#define RDGV_STMT(V)     ((struct rdg_vertex *) ((V)->data))->stmt
#define RDGV_DATAREFS(V) ((struct rdg_vertex *) ((V)->data))->datarefs
#define RDGV_HAS_MEM_WRITE(V) ((struct rdg_vertex *) ((V)->data))->has_mem_write
#define RDGV_HAS_MEM_READS(V) ((struct rdg_vertex *) ((V)->data))->has_mem_reads
#define RDG_STMT(RDG, I) RDGV_STMT (&(RDG->vertices[I]))
#define RDG_DATAREFS(RDG, I) RDGV_DATAREFS (&(RDG->vertices[I]))
#define RDG_MEM_WRITE_STMT(RDG, I) RDGV_HAS_MEM_WRITE (&(RDG->vertices[I]))
#define RDG_MEM_READS_STMT(RDG, I) RDGV_HAS_MEM_READS (&(RDG->vertices[I]))

/* Data dependence type.  */

enum rdg_dep_type
{
  /* Read After Write (RAW).  */
  flow_dd = 'f',

  /* Control dependence (execute conditional on).  */
  control_dd = 'c'
};

/* Dependence information attached to an edge of the RDG.  */

struct rdg_edge
{
  /* Type of the dependence.  */
  enum rdg_dep_type type;
};

#define RDGE_TYPE(E)        ((struct rdg_edge *) ((E)->data))->type

/* Kind of distributed loop.  */
enum partition_kind {
    PKIND_NORMAL,
    /* Partial memset means the partition can be distributed into a loop
       of memset calls, rather than a single memset call.  It's handled
       just like a normal partition, i.e., distributed as a separate loop;
       no memset call is generated.

       Note: This is a hackish fix trying to distribute a ZERO-ing stmt in
       a loop nest as deep as possible.  As a result, parloop achieves
       better parallelization by parallelizing a deeper loop nest.  This
       hack should be unnecessary and removed once distributed memset can
       be understood and analyzed in data reference analysis.  See PR82604
       for more.  */
    PKIND_PARTIAL_MEMSET,
    PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
};
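
/* A hedged example of the PKIND_PARTIAL_MEMSET case above, with
   illustrative names: in a nest like

   |for (i = 0; i < n; i++)
   |  for (j = 0; j < m; j++)
   |    a[i][j] = 0.0;

   where M is smaller than the row length, the stores do not form one
   contiguous region, so no single memset call covering A can be
   generated; the zeroing statement is nevertheless distributed into its
   own loop nest, which parloop may then parallelize.  */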

/* Type of distributed loop.  */
enum partition_type {
    /* The distributed loop can be executed in parallel.  */
    PTYPE_PARALLEL = 0,
    /* The distributed loop has to be executed sequentially.  */
    PTYPE_SEQUENTIAL
};

/* Builtin info for loop distribution.  */
struct builtin_info
{
  /* Data references that a kind != PKIND_NORMAL partition is about.  */
  data_reference_p dst_dr;
  data_reference_p src_dr;
  /* Base address and size of memory objects operated on by the builtin.
     Note both dest and source memory objects must have the same size.  */
  tree dst_base;
  tree src_base;
  tree size;
  /* Base and offset part of dst_base after stripping constant offset.  This
     is only used in memset builtin distribution for now.  */
  tree dst_base_base;
  unsigned HOST_WIDE_INT dst_base_offset;
};
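
/* A hedged sketch of how the fields above are used, with illustrative
   names: for a copying partition like

   |for (i = 0; i < n; i++)
   |  a[i] = b[i];

   dst_dr/src_dr describe the store to a[i] and the load from b[i],
   dst_base/src_base the start addresses of the regions written and read,
   and size the number of bytes copied, here N * sizeof (*a).  */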

/* Partition for loop distribution.  */
struct partition
{
  /* Statements of the partition.  */
  bitmap stmts;
  /* True if the partition defines a variable that is used outside of the
     loop.  */
  bool reduction_p;
  location_t loc;
  enum partition_kind kind;
  enum partition_type type;
  /* Data references in the partition.  */
  bitmap datarefs;
  /* Information for a builtin partition.  */
  struct builtin_info *builtin;
};

/* Partitions are fused for different reasons.  */
enum fuse_type
{
  FUSE_NON_BUILTIN = 0,
  FUSE_REDUCTION = 1,
  FUSE_SHARE_REF = 2,
  FUSE_SAME_SCC = 3,
  FUSE_FINALIZE = 4
};

/* Descriptions of the different fusing reasons.  */
static const char *fuse_message[] = {
  "they are non-builtins",
  "they have reductions",
  "they have shared memory refs",
  "they are in the same dependence scc",
  "there is no point in distributing the loop"};


/* Dump vertex I in RDG to FILE.  */

static void
dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
{
  struct vertex *v = &(rdg->vertices[i]);
  struct graph_edge *e;

  fprintf (file, "(vertex %d: (%s%s) (in:", i,
	   RDG_MEM_WRITE_STMT (rdg, i) ? "w" : "",
	   RDG_MEM_READS_STMT (rdg, i) ? "r" : "");

  if (v->pred)
    for (e = v->pred; e; e = e->pred_next)
      fprintf (file, " %d", e->src);

  fprintf (file, ") (out:");

  if (v->succ)
    for (e = v->succ; e; e = e->succ_next)
      fprintf (file, " %d", e->dest);

  fprintf (file, ")\n");
  print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
  fprintf (file, ")\n");
}

/* Call dump_rdg_vertex on stderr.  */

DEBUG_FUNCTION void
debug_rdg_vertex (struct graph *rdg, int i)
{
  dump_rdg_vertex (stderr, rdg, i);
}

/* Dump the reduced dependence graph RDG to FILE.  */

static void
dump_rdg (FILE *file, struct graph *rdg)
{
  fprintf (file, "(rdg\n");
  for (int i = 0; i < rdg->n_vertices; i++)
    dump_rdg_vertex (file, rdg, i);
  fprintf (file, ")\n");
}

/* Call dump_rdg on stderr.  */

DEBUG_FUNCTION void
debug_rdg (struct graph *rdg)
{
  dump_rdg (stderr, rdg);
}

static void
dot_rdg_1 (FILE *file, struct graph *rdg)
{
  int i;
  pretty_printer buffer;
  pp_needs_newline (&buffer) = false;
  buffer.buffer->stream = file;

  fprintf (file, "digraph RDG {\n");

  for (i = 0; i < rdg->n_vertices; i++)
    {
      struct vertex *v = &(rdg->vertices[i]);
      struct graph_edge *e;

      fprintf (file, "%d [label=\"[%d] ", i, i);
      pp_gimple_stmt_1 (&buffer, RDGV_STMT (v), 0, TDF_SLIM);
      pp_flush (&buffer);
      fprintf (file, "\"]\n");

      /* Highlight reads from memory.  */
      if (RDG_MEM_READS_STMT (rdg, i))
       fprintf (file, "%d [style=filled, fillcolor=green]\n", i);

      /* Highlight stores to memory.  */
      if (RDG_MEM_WRITE_STMT (rdg, i))
       fprintf (file, "%d [style=filled, fillcolor=red]\n", i);

      if (v->succ)
       for (e = v->succ; e; e = e->succ_next)
         switch (RDGE_TYPE (e))
           {
           case flow_dd:
             /* These are the most common dependences: don't print a label
		for these.  */
             fprintf (file, "%d -> %d \n", i, e->dest);
             break;

	   case control_dd:
             fprintf (file, "%d -> %d [label=control] \n", i, e->dest);
             break;

           default:
             gcc_unreachable ();
           }
    }

  fprintf (file, "}\n\n");
}

/* Display the Reduced Dependence Graph using dotty.  */

DEBUG_FUNCTION void
dot_rdg (struct graph *rdg)
{
  /* When debugging, you may want to enable the following code.  */
#ifdef HAVE_POPEN
  FILE *file = popen ("dot -Tx11", "w");
  if (!file)
    return;
  dot_rdg_1 (file, rdg);
  fflush (file);
  close (fileno (file));
  pclose (file);
#else
  dot_rdg_1 (stderr, rdg);
#endif
}

/* Returns the index of STMT in RDG.  */

static int
rdg_vertex_for_stmt (struct graph *rdg ATTRIBUTE_UNUSED, gimple *stmt)
{
  int index = gimple_uid (stmt);
  gcc_checking_assert (index == -1 || RDG_STMT (rdg, index) == stmt);
  return index;
}

/* Creates dependence edges in RDG for all the uses of DEF.  IDEF is
   the index of DEF in RDG.  */

static void
create_rdg_edges_for_scalar (struct graph *rdg, tree def, int idef)
{
  use_operand_p imm_use_p;
  imm_use_iterator iterator;

  FOR_EACH_IMM_USE_FAST (imm_use_p, iterator, def)
    {
      struct graph_edge *e;
      int use = rdg_vertex_for_stmt (rdg, USE_STMT (imm_use_p));

      if (use < 0)
	continue;

      e = add_edge (rdg, idef, use);
      e->data = XNEW (struct rdg_edge);
      RDGE_TYPE (e) = flow_dd;
    }
}

/* Creates an edge for the control dependences of BB to the vertex V.  */

static void
create_edge_for_control_dependence (struct graph *rdg, basic_block bb,
				    int v, control_dependences *cd)
{
  bitmap_iterator bi;
  unsigned edge_n;
  EXECUTE_IF_SET_IN_BITMAP (cd->get_edges_dependent_on (bb->index),
			    0, edge_n, bi)
    {
      basic_block cond_bb = cd->get_edge_src (edge_n);
      gimple *stmt = last_stmt (cond_bb);
      if (stmt && is_ctrl_stmt (stmt))
	{
	  struct graph_edge *e;
	  int c = rdg_vertex_for_stmt (rdg, stmt);
	  if (c < 0)
	    continue;

	  e = add_edge (rdg, c, v);
	  e->data = XNEW (struct rdg_edge);
	  RDGE_TYPE (e) = control_dd;
	}
    }
}

/* Creates the edges of the reduced dependence graph RDG.  */

static void
create_rdg_flow_edges (struct graph *rdg)
{
  int i;
  def_operand_p def_p;
  ssa_op_iter iter;

  for (i = 0; i < rdg->n_vertices; i++)
    FOR_EACH_PHI_OR_STMT_DEF (def_p, RDG_STMT (rdg, i),
			      iter, SSA_OP_DEF)
      create_rdg_edges_for_scalar (rdg, DEF_FROM_PTR (def_p), i);
}

/* Creates the control dependence edges of the reduced dependence graph
   RDG.  */

static void
create_rdg_cd_edges (struct graph *rdg, control_dependences *cd, loop_p loop)
{
  int i;

  for (i = 0; i < rdg->n_vertices; i++)
    {
      gimple *stmt = RDG_STMT (rdg, i);
      if (gimple_code (stmt) == GIMPLE_PHI)
	{
	  edge_iterator ei;
	  edge e;
	  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->preds)
	    if (flow_bb_inside_loop_p (loop, e->src))
	      create_edge_for_control_dependence (rdg, e->src, i, cd);
	}
      else
	create_edge_for_control_dependence (rdg, gimple_bb (stmt), i, cd);
    }
}


class loop_distribution
{
  private:
  /* The loop (nest) to be distributed.  */
  vec<loop_p> loop_nest;

  /* Vector of data references in the loop to be distributed.  */
  vec<data_reference_p> datarefs_vec;

  /* True if there is a nonaddressable data reference in the above vector.  */
  bool has_nonaddressable_dataref_p;

  /* Store index of data reference in aux field.  */

  /* Hash table for data dependence relation in the loop to be distributed.  */
  hash_table<ddr_hasher> *ddrs_table;

  /* Array mapping basic block's index to its topological order.  */
  int *bb_top_order_index;
  /* And size of the array.  */
  int bb_top_order_index_size;

  /* Build the vertices of the reduced dependence graph RDG.  Return false
     if that failed.  */
  bool create_rdg_vertices (struct graph *rdg, const vec<gimple *> &stmts,
			    loop_p loop);

  /* Initialize STMTS with all the statements of LOOP.  We use topological
     order to discover all statements.  The order is important because
     generate_loops_for_partition uses the same traversal to identify
     statements in loop copies.  */
  void stmts_from_loop (class loop *loop, vec<gimple *> *stmts);


  /* Build the Reduced Dependence Graph (RDG) with one vertex per statement of
     LOOP, and one edge per flow dependence or control dependence from control
     dependence CD.  While visiting each statement, data references are also
     collected and recorded in the global data DATAREFS_VEC.  */
  struct graph * build_rdg (class loop *loop, control_dependences *cd);

/* Merge PARTITION into the partition DEST.  RDG is the reduced dependence
   graph; we update the type of the result partition if RDG is non-NULL.  */
  void partition_merge_into (struct graph *rdg,
			     partition *dest, partition *partition,
			     enum fuse_type ft);


  /* Return the data dependence relation for data references A and B.  The
     two data references must be in lexicographic order w.r.t. the reduced
     dependence graph RDG.  We first try to find the ddr in the global ddr
     hash table.  If it doesn't exist, compute the ddr and cache it.  */
  data_dependence_relation * get_data_dependence (struct graph *rdg,
						  data_reference_p a,
						  data_reference_p b);


  /* In reduced dependence graph RDG for loop distribution, return true if
     dependence between references DR1 and DR2 leads to a dependence cycle
     and such dependence cycle can't be resolved by runtime alias check.  */
  bool data_dep_in_cycle_p (struct graph *rdg, data_reference_p dr1,
			    data_reference_p dr2);


  /* Given reduced dependence graph RDG, PARTITION1 and PARTITION2, update
     PARTITION1's type after merging PARTITION2 into PARTITION1.  */
  void update_type_for_merge (struct graph *rdg,
			      partition *partition1, partition *partition2);


  /* Returns a partition with all the statements needed for computing
     the vertex V of the RDG, also including the loop exit conditions.  */
  partition *build_rdg_partition_for_vertex (struct graph *rdg, int v);

  /* Given data references DST_DR and SRC_DR in loop nest LOOP and RDG,
     classify whether the partition forms a builtin memcpy or memmove
     call.  */
  void classify_builtin_ldst (loop_p loop, struct graph *rdg, partition *partition,
			      data_reference_p dst_dr, data_reference_p src_dr);

  /* Classifies the builtin kind we can generate for PARTITION of RDG and LOOP.
     For the moment we detect memset, memcpy and memmove patterns.  Bitmap
     STMT_IN_ALL_PARTITIONS contains statements belonging to all partitions.
     Returns true if there is a reduction in all partitions and we
     possibly did not mark PARTITION as having one for this reason.  */

  bool
  classify_partition (loop_p loop,
		      struct graph *rdg, partition *partition,
		      bitmap stmt_in_all_partitions);


  /* Returns true when PARTITION1 and PARTITION2 access the same memory
     object in RDG.  */
  bool share_memory_accesses (struct graph *rdg,
			      partition *partition1, partition *partition2);

  /* For each seed statement in STARTING_STMTS, this function builds a
     partition for it by adding dependent statements according to RDG.
     All partitions are recorded in PARTITIONS.  */
  void rdg_build_partitions (struct graph *rdg,
			     vec<gimple *> starting_stmts,
			     vec<partition *> *partitions);

  /* Compute the partition dependence created by the data references in
     DRS1 and DRS2, and modify and return DIR according to that.  If
     ALIAS_DDRS is not NULL, we record dependences introduced by possible
     aliasing between two data references in ALIAS_DDRS; otherwise, we
     simply ignore such dependences as if they didn't exist at all.  */
  int pg_add_dependence_edges (struct graph *rdg, int dir, bitmap drs1,
			       bitmap drs2, vec<ddr_p> *alias_ddrs);


  /* Build and return the partition dependence graph for PARTITIONS.  RDG
     is the reduced dependence graph for the loop to be distributed.  If
     IGNORE_ALIAS_P is true, data dependences caused by possible aliasing
     between references are ignored, as if they didn't exist at all;
     otherwise all dependences are considered.  */
  struct graph *build_partition_graph (struct graph *rdg,
				       vec<struct partition *> *partitions,
				       bool ignore_alias_p);

  /* Given reduced dependence graph RDG, merge the strongly connected
     components of PARTITIONS.  If IGNORE_ALIAS_P is true, data dependences
     caused by possible aliasing between references are ignored, as if they
     didn't exist at all; otherwise all dependences are considered.  */
  void merge_dep_scc_partitions (struct graph *rdg, vec<struct partition *>
				 *partitions, bool ignore_alias_p);

/* This is the main function breaking strongly connected components in
   PARTITIONS given the reduced dependence graph RDG.  Store data dependence
   relations for the runtime alias check in ALIAS_DDRS.  */
  void break_alias_scc_partitions (struct graph *rdg, vec<struct partition *>
				   *partitions, vec<ddr_p> *alias_ddrs);


  /* Fuse PARTITIONS of LOOP if necessary before finalizing distribution.
     ALIAS_DDRS contains ddrs which need runtime alias checks.  */
  void finalize_partitions (class loop *loop, vec<struct partition *>
			    *partitions, vec<ddr_p> *alias_ddrs);

  /* Distributes the code from LOOP in such a way that producer statements
     are placed before consumer statements.  Tries to separate only the
     statements from STMTS into separate loops.  Returns the number of
     distributed loops.  Set NB_CALLS to number of generated builtin calls.
     Set *DESTROY_P to whether LOOP needs to be destroyed.  */
  int distribute_loop (class loop *loop, const vec<gimple *> &stmts,
		       control_dependences *cd, int *nb_calls, bool *destroy_p,
		       bool only_patterns_p);

  /* Transform loops which mimic the effects of builtins rawmemchr or strlen and
     replace them accordingly.  */
  bool transform_reduction_loop (loop_p loop);

  /* Compute topological order for basic blocks.  Topological order is
     needed because data dependence is computed for data references in
     lexicographical order.  */
  void bb_top_order_init (void);

  void bb_top_order_destroy (void);

  public:

  /* Getter for bb_top_order.  */

  inline int get_bb_top_order_index_size (void)
    {
      return bb_top_order_index_size;
    }

  inline int get_bb_top_order_index (int i)
    {
      return bb_top_order_index[i];
    }

  unsigned int execute (function *fun);
};


/* If X has a smaller topological sort number than Y, returns a negative
   value; if greater, returns a positive value.  */
static int
bb_top_order_cmp_r (const void *x, const void *y, void *loop)
{
  loop_distribution *_loop =
    (loop_distribution *) loop;

  basic_block bb1 = *(const basic_block *) x;
  basic_block bb2 = *(const basic_block *) y;

  int bb_top_order_index_size = _loop->get_bb_top_order_index_size ();

  gcc_assert (bb1->index < bb_top_order_index_size
	      && bb2->index < bb_top_order_index_size);
  gcc_assert (bb1 == bb2
	      || _loop->get_bb_top_order_index(bb1->index)
		 != _loop->get_bb_top_order_index(bb2->index));

  return (_loop->get_bb_top_order_index(bb1->index) -
	  _loop->get_bb_top_order_index(bb2->index));
}

bool
loop_distribution::create_rdg_vertices (struct graph *rdg,
					const vec<gimple *> &stmts,
					loop_p loop)
{
  int i;
  gimple *stmt;

  FOR_EACH_VEC_ELT (stmts, i, stmt)
    {
      struct vertex *v = &(rdg->vertices[i]);

      /* Record statement to vertex mapping.  */
      gimple_set_uid (stmt, i);

      v->data = XNEW (struct rdg_vertex);
      RDGV_STMT (v) = stmt;
      RDGV_DATAREFS (v).create (0);
      RDGV_HAS_MEM_WRITE (v) = false;
      RDGV_HAS_MEM_READS (v) = false;
      if (gimple_code (stmt) == GIMPLE_PHI)
	continue;

      unsigned drp = datarefs_vec.length ();
      if (!find_data_references_in_stmt (loop, stmt, &datarefs_vec))
	return false;
      for (unsigned j = drp; j < datarefs_vec.length (); ++j)
	{
	  data_reference_p dr = datarefs_vec[j];
	  if (DR_IS_READ (dr))
	    RDGV_HAS_MEM_READS (v) = true;
	  else
	    RDGV_HAS_MEM_WRITE (v) = true;
	  RDGV_DATAREFS (v).safe_push (dr);
	  has_nonaddressable_dataref_p |= may_be_nonaddressable_p (dr->ref);
	}
    }
  return true;
}

void
loop_distribution::stmts_from_loop (class loop *loop, vec<gimple *> *stmts)
{
  unsigned int i;
  basic_block *bbs = get_loop_body_in_custom_order (loop, this, bb_top_order_cmp_r);

  for (i = 0; i < loop->num_nodes; i++)
    {
      basic_block bb = bbs[i];

      for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
	   gsi_next (&bsi))
	if (!virtual_operand_p (gimple_phi_result (bsi.phi ())))
	  stmts->safe_push (bsi.phi ());

      for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
	   gsi_next (&bsi))
	{
	  gimple *stmt = gsi_stmt (bsi);
	  if (gimple_code (stmt) != GIMPLE_LABEL && !is_gimple_debug (stmt))
	    stmts->safe_push (stmt);
	}
    }

  free (bbs);
}

/* Free the reduced dependence graph RDG.  */

static void
free_rdg (struct graph *rdg)
{
  int i;

  for (i = 0; i < rdg->n_vertices; i++)
    {
      struct vertex *v = &(rdg->vertices[i]);
      struct graph_edge *e;

      for (e = v->succ; e; e = e->succ_next)
	free (e->data);

      if (v->data)
	{
	  gimple_set_uid (RDGV_STMT (v), -1);
	  (RDGV_DATAREFS (v)).release ();
	  free (v->data);
	}
    }

  free_graph (rdg);
}

struct graph *
loop_distribution::build_rdg (class loop *loop, control_dependences *cd)
{
  struct graph *rdg;

  /* Create the RDG vertices from the stmts of the loop nest.  */
  auto_vec<gimple *, 10> stmts;
  stmts_from_loop (loop, &stmts);
  rdg = new_graph (stmts.length ());
  if (!create_rdg_vertices (rdg, stmts, loop))
    {
      free_rdg (rdg);
      return NULL;
    }
  stmts.release ();

  create_rdg_flow_edges (rdg);
  if (cd)
    create_rdg_cd_edges (rdg, cd, loop);

  return rdg;
}


/* Allocate and initialize a partition.  */

static partition *
partition_alloc (void)
{
  partition *partition = XCNEW (struct partition);
  partition->stmts = BITMAP_ALLOC (NULL);
  partition->reduction_p = false;
  partition->loc = UNKNOWN_LOCATION;
  partition->kind = PKIND_NORMAL;
  partition->type = PTYPE_PARALLEL;
  partition->datarefs = BITMAP_ALLOC (NULL);
  return partition;
}

/* Free PARTITION.  */

static void
partition_free (partition *partition)
{
  BITMAP_FREE (partition->stmts);
  BITMAP_FREE (partition->datarefs);
  if (partition->builtin)
    free (partition->builtin);

  free (partition);
}

/* Returns true if the partition can be generated as a single builtin call
   (memset, memcpy or memmove).  */

static bool
partition_builtin_p (partition *partition)
{
  return partition->kind > PKIND_PARTIAL_MEMSET;
}

/* Returns true if the partition contains a reduction.  */

static bool
partition_reduction_p (partition *partition)
{
  return partition->reduction_p;
}

void
loop_distribution::partition_merge_into (struct graph *rdg,
		      partition *dest, partition *partition, enum fuse_type ft)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
    {
      fprintf (dump_file, "Fuse partitions because %s:\n", fuse_message[ft]);
      fprintf (dump_file, "  Part 1: ");
      dump_bitmap (dump_file, dest->stmts);
      fprintf (dump_file, "  Part 2: ");
      dump_bitmap (dump_file, partition->stmts);
    }

  dest->kind = PKIND_NORMAL;
  if (dest->type == PTYPE_PARALLEL)
    dest->type = partition->type;

  bitmap_ior_into (dest->stmts, partition->stmts);
  if (partition_reduction_p (partition))
    dest->reduction_p = true;

  /* Further check if any data dependence prevents us from executing the
     new partition in parallel.  */
  if (dest->type == PTYPE_PARALLEL && rdg != NULL)
    update_type_for_merge (rdg, dest, partition);

  bitmap_ior_into (dest->datarefs, partition->datarefs);
}


/* Returns true when DEF is an SSA_NAME defined in LOOP and used after
   the LOOP.  */

static bool
ssa_name_has_uses_outside_loop_p (tree def, loop_p loop)
{
  imm_use_iterator imm_iter;
  use_operand_p use_p;

  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
    {
      if (is_gimple_debug (USE_STMT (use_p)))
	continue;

      basic_block use_bb = gimple_bb (USE_STMT (use_p));
      if (!flow_bb_inside_loop_p (loop, use_bb))
	return true;
    }

  return false;
}

/* Returns true when STMT defines a scalar variable used after the
   loop LOOP.  */

static bool
stmt_has_scalar_dependences_outside_loop (loop_p loop, gimple *stmt)
{
  def_operand_p def_p;
  ssa_op_iter op_iter;

  if (gimple_code (stmt) == GIMPLE_PHI)
    return ssa_name_has_uses_outside_loop_p (gimple_phi_result (stmt), loop);

  FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_DEF)
    if (ssa_name_has_uses_outside_loop_p (DEF_FROM_PTR (def_p), loop))
      return true;

  return false;
}

/* Return a copy of LOOP placed before LOOP.  */

static class loop *
copy_loop_before (class loop *loop)
{
  class loop *res;
  edge preheader = loop_preheader_edge (loop);

  initialize_original_copy_tables ();
  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
  gcc_assert (res != NULL);
  free_original_copy_tables ();
  delete_update_ssa ();

  return res;
}

/* Creates an empty basic block after LOOP.  */

static void
create_bb_after_loop (class loop *loop)
{
  edge exit = single_exit (loop);

  if (!exit)
    return;

  split_edge (exit);
}

/* Generate code for PARTITION from the code in LOOP.  The loop is
   copied when COPY_P is true.  All the statements not flagged in the
   PARTITION bitmap are removed from the loop or from its copy.  The
   statements are indexed in sequence inside a basic block, and the
   basic blocks of a loop are taken in dom order.  */

static void
generate_loops_for_partition (class loop *loop, partition *partition,
			      bool copy_p)
{
  unsigned i;
  basic_block *bbs;

  if (copy_p)
    {
      int orig_loop_num = loop->orig_loop_num;
      loop = copy_loop_before (loop);
      gcc_assert (loop != NULL);
      loop->orig_loop_num = orig_loop_num;
      create_preheader (loop, CP_SIMPLE_PREHEADERS);
      create_bb_after_loop (loop);
    }
  else
    {
      /* Origin number is set to the new versioned loop's num.  */
      gcc_assert (loop->orig_loop_num != loop->num);
    }

  /* Remove stmts not in the PARTITION bitmap.  */
  bbs = get_loop_body_in_dom_order (loop);

  if (MAY_HAVE_DEBUG_BIND_STMTS)
    for (i = 0; i < loop->num_nodes; i++)
      {
	basic_block bb = bbs[i];

	for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
	     gsi_next (&bsi))
	  {
	    gphi *phi = bsi.phi ();
	    if (!virtual_operand_p (gimple_phi_result (phi))
		&& !bitmap_bit_p (partition->stmts, gimple_uid (phi)))
	      reset_debug_uses (phi);
	  }

	for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
	  {
	    gimple *stmt = gsi_stmt (bsi);
	    if (gimple_code (stmt) != GIMPLE_LABEL
		&& !is_gimple_debug (stmt)
		&& !bitmap_bit_p (partition->stmts, gimple_uid (stmt)))
	      reset_debug_uses (stmt);
	  }
      }

  for (i = 0; i < loop->num_nodes; i++)
    {
      basic_block bb = bbs[i];
      edge inner_exit = NULL;

      if (loop != bb->loop_father)
	inner_exit = single_exit (bb->loop_father);

      for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);)
	{
	  gphi *phi = bsi.phi ();
	  if (!virtual_operand_p (gimple_phi_result (phi))
	      && !bitmap_bit_p (partition->stmts, gimple_uid (phi)))
	    remove_phi_node (&bsi, true);
	  else
	    gsi_next (&bsi);
	}

      for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);)
	{
	  gimple *stmt = gsi_stmt (bsi);
	  if (gimple_code (stmt) != GIMPLE_LABEL
	      && !is_gimple_debug (stmt)
	      && !bitmap_bit_p (partition->stmts, gimple_uid (stmt)))
	    {
	      /* In distribution of a loop nest, if bb is the inner loop's
		 exit_bb, we choose its exit edge/path in order to avoid
		 generating an infinite loop.  For all other cases, we
		 choose an arbitrary path through the empty CFG part that
		 this unnecessary control stmt controls.  */
	      if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
		{
		  if (inner_exit && inner_exit->flags & EDGE_TRUE_VALUE)
		    gimple_cond_make_true (cond_stmt);
		  else
		    gimple_cond_make_false (cond_stmt);
		  update_stmt (stmt);
		}
	      else if (gimple_code (stmt) == GIMPLE_SWITCH)
		{
		  gswitch *switch_stmt = as_a <gswitch *> (stmt);
		  gimple_switch_set_index
		      (switch_stmt, CASE_LOW (gimple_switch_label (switch_stmt, 1)));
		  update_stmt (stmt);
		}
	      else
		{
		  unlink_stmt_vdef (stmt);
		  gsi_remove (&bsi, true);
		  release_defs (stmt);
		  continue;
		}
	    }
	  gsi_next (&bsi);
	}
    }

  free (bbs);
}

/* If VAL memory representation contains the same value in all bytes,
   return that value, otherwise return -1.
   E.g. for 0x24242424 return 0x24, for IEEE double
   747708026454360457216.0 return 0x44, etc.  */

static int
const_with_all_bytes_same (tree val)
{
  unsigned char buf[64];
  int i, len;

  if (integer_zerop (val)
      || (TREE_CODE (val) == CONSTRUCTOR
          && !TREE_CLOBBER_P (val)
          && CONSTRUCTOR_NELTS (val) == 0))
    return 0;

  if (real_zerop (val))
    {
      /* Only return 0 for +0.0, not for -0.0, which doesn't have
	 an all bytes same memory representation.  Don't transform
	 -0.0 stores into +0.0 even for !HONOR_SIGNED_ZEROS.  */
      switch (TREE_CODE (val))
	{
	case REAL_CST:
	  if (!real_isneg (TREE_REAL_CST_PTR (val)))
	    return 0;
	  break;
	case COMPLEX_CST:
	  if (!const_with_all_bytes_same (TREE_REALPART (val))
	      && !const_with_all_bytes_same (TREE_IMAGPART (val)))
	    return 0;
	  break;
	case VECTOR_CST:
	  {
	    unsigned int count = vector_cst_encoded_nelts (val);
	    unsigned int j;
	    for (j = 0; j < count; ++j)
	      if (const_with_all_bytes_same (VECTOR_CST_ENCODED_ELT (val, j)))
		break;
	    if (j == count)
	      return 0;
	    break;
	  }
	default:
	  break;
	}
    }

  if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
    return -1;

  len = native_encode_expr (val, buf, sizeof (buf));
  if (len == 0)
    return -1;
  for (i = 1; i < len; i++)
    if (buf[i] != buf[0])
      return -1;
  return buf[0];
}

/* Generate a call to memset for PARTITION in LOOP.  */
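
/* A hedged sketch of the transformation, with illustrative names: a
   memset partition recognized from

   |for (i = 0; i < n; i++)
   |  a[i] = 0;

   is replaced by a single call emitted on the preheader edge, roughly

   |memset (a, 0, n * sizeof (*a));

   with the stored value narrowed to the single byte computed by
   const_with_all_bytes_same whenever possible.  */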
   1149  1.1  mrg 
   1150  1.1  mrg static void
   1151  1.1  mrg generate_memset_builtin (class loop *loop, partition *partition)
   1152  1.1  mrg {
   1153  1.1  mrg   gimple_stmt_iterator gsi;
   1154  1.1  mrg   tree mem, fn, nb_bytes;
   1155  1.1  mrg   tree val;
   1156  1.1  mrg   struct builtin_info *builtin = partition->builtin;
   1157  1.1  mrg   gimple *fn_call;
   1158  1.1  mrg 
   1159  1.1  mrg   /* The new statements will be placed before LOOP.  */
   1160  1.1  mrg   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
   1161  1.1  mrg 
   1162  1.1  mrg   nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
   1163  1.1  mrg   nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
   1164  1.1  mrg 				       false, GSI_CONTINUE_LINKING);
   1165  1.1  mrg   mem = rewrite_to_non_trapping_overflow (builtin->dst_base);
   1166  1.1  mrg   mem = force_gimple_operand_gsi (&gsi, mem, true, NULL_TREE,
   1167  1.1  mrg 				  false, GSI_CONTINUE_LINKING);
   1168  1.1  mrg 
   1169  1.1  mrg   /* This exactly matches the pattern recognition in classify_partition.  */
   1170  1.1  mrg   val = gimple_assign_rhs1 (DR_STMT (builtin->dst_dr));
   1171  1.1  mrg   /* Handle constants like 0x15151515 and similarly
   1172  1.1  mrg      floating point constants etc. where all bytes are the same.  */
   1173  1.1  mrg   int bytev = const_with_all_bytes_same (val);
   1174  1.1  mrg   if (bytev != -1)
   1175  1.1  mrg     val = build_int_cst (integer_type_node, bytev);
   1176  1.1  mrg   else if (TREE_CODE (val) == INTEGER_CST)
   1177  1.1  mrg     val = fold_convert (integer_type_node, val);
   1178  1.1  mrg   else if (!useless_type_conversion_p (integer_type_node, TREE_TYPE (val)))
   1179  1.1  mrg     {
   1180  1.1  mrg       tree tem = make_ssa_name (integer_type_node);
   1181  1.1  mrg       gimple *cstmt = gimple_build_assign (tem, NOP_EXPR, val);
   1182  1.1  mrg       gsi_insert_after (&gsi, cstmt, GSI_CONTINUE_LINKING);
   1183  1.1  mrg       val = tem;
   1184  1.1  mrg     }
   1185  1.1  mrg 
   1186  1.1  mrg   fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
   1187  1.1  mrg   fn_call = gimple_build_call (fn, 3, mem, val, nb_bytes);
   1188  1.1  mrg   gimple_set_location (fn_call, partition->loc);
   1189  1.1  mrg   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
   1190  1.1  mrg   fold_stmt (&gsi);
   1191  1.1  mrg 
   1192  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   1193  1.1  mrg     {
   1194  1.1  mrg       fprintf (dump_file, "generated memset");
   1195  1.1  mrg       if (bytev == 0)
   1196  1.1  mrg 	fprintf (dump_file, " zero\n");
   1197  1.1  mrg       else
   1198  1.1  mrg 	fprintf (dump_file, "\n");
   1199  1.1  mrg     }
   1200  1.1  mrg }
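
/* As an illustrative sketch (assuming A is a global char array and N an
   already gimplified loop bound), a partition for

     |for (i = 0; i < N; i++)
     |  A[i] = 0;

   is replaced by a single call inserted in the preheader, roughly

     |__builtin_memset (&A[0], 0, N);

   with the stored value converted to int as memset expects.  */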
   1201  1.1  mrg 
   1202  1.1  mrg /* Generate a call to memcpy for PARTITION in LOOP.  */
   1203  1.1  mrg 
   1204  1.1  mrg static void
   1205  1.1  mrg generate_memcpy_builtin (class loop *loop, partition *partition)
   1206  1.1  mrg {
   1207  1.1  mrg   gimple_stmt_iterator gsi;
   1208  1.1  mrg   gimple *fn_call;
   1209  1.1  mrg   tree dest, src, fn, nb_bytes;
   1210  1.1  mrg   enum built_in_function kind;
   1211  1.1  mrg   struct builtin_info *builtin = partition->builtin;
   1212  1.1  mrg 
   1213  1.1  mrg   /* The new statements will be placed before LOOP.  */
   1214  1.1  mrg   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
   1215  1.1  mrg 
   1216  1.1  mrg   nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
   1217  1.1  mrg   nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
   1218  1.1  mrg 				       false, GSI_CONTINUE_LINKING);
   1219  1.1  mrg   dest = rewrite_to_non_trapping_overflow (builtin->dst_base);
   1220  1.1  mrg   src = rewrite_to_non_trapping_overflow (builtin->src_base);
   1221  1.1  mrg   if (partition->kind == PKIND_MEMCPY
   1222  1.1  mrg       || ! ptr_derefs_may_alias_p (dest, src))
   1223  1.1  mrg     kind = BUILT_IN_MEMCPY;
   1224  1.1  mrg   else
   1225  1.1  mrg     kind = BUILT_IN_MEMMOVE;
   1226  1.1  mrg   /* Try harder if we're copying a constant size.  */
   1227  1.1  mrg   if (kind == BUILT_IN_MEMMOVE && poly_int_tree_p (nb_bytes))
   1228  1.1  mrg     {
   1229  1.1  mrg       aff_tree asrc, adest;
   1230  1.1  mrg       tree_to_aff_combination (src, ptr_type_node, &asrc);
   1231  1.1  mrg       tree_to_aff_combination (dest, ptr_type_node, &adest);
   1232  1.1  mrg       aff_combination_scale (&adest, -1);
   1233  1.1  mrg       aff_combination_add (&asrc, &adest);
   1234  1.1  mrg       if (aff_comb_cannot_overlap_p (&asrc, wi::to_poly_widest (nb_bytes),
   1235  1.1  mrg 				     wi::to_poly_widest (nb_bytes)))
   1236  1.1  mrg 	kind = BUILT_IN_MEMCPY;
   1237  1.1  mrg     }
   1238  1.1  mrg 
   1239  1.1  mrg   dest = force_gimple_operand_gsi (&gsi, dest, true, NULL_TREE,
   1240  1.1  mrg 				   false, GSI_CONTINUE_LINKING);
   1241  1.1  mrg   src = force_gimple_operand_gsi (&gsi, src, true, NULL_TREE,
   1242  1.1  mrg 				  false, GSI_CONTINUE_LINKING);
   1243  1.1  mrg   fn = build_fold_addr_expr (builtin_decl_implicit (kind));
   1244  1.1  mrg   fn_call = gimple_build_call (fn, 3, dest, src, nb_bytes);
   1245  1.1  mrg   gimple_set_location (fn_call, partition->loc);
   1246  1.1  mrg   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
   1247  1.1  mrg   fold_stmt (&gsi);
   1248  1.1  mrg 
   1249  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   1250  1.1  mrg     {
   1251  1.1  mrg       if (kind == BUILT_IN_MEMCPY)
   1252  1.1  mrg 	fprintf (dump_file, "generated memcpy\n");
   1253  1.1  mrg       else
   1254  1.1  mrg 	fprintf (dump_file, "generated memmove\n");
   1255  1.1  mrg     }
   1256  1.1  mrg }
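
/* As an illustrative sketch (assuming A and B are distinct global int
   arrays), a partition for

     |for (i = 0; i < N; i++)
     |  A[i] = B[i];

   becomes roughly

     |__builtin_memcpy (&A[0], &B[0], N * 4);

   whereas a copy through pointers that may reference the same object uses
   __builtin_memmove instead, unless the constant-size overlap check above
   proves memcpy is safe.  */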
   1257  1.1  mrg 
   1258  1.1  mrg /* Remove and destroy the loop LOOP.  */
   1259  1.1  mrg 
   1260  1.1  mrg static void
   1261  1.1  mrg destroy_loop (class loop *loop)
   1262  1.1  mrg {
   1263  1.1  mrg   unsigned nbbs = loop->num_nodes;
   1264  1.1  mrg   edge exit = single_exit (loop);
   1265  1.1  mrg   basic_block src = loop_preheader_edge (loop)->src, dest = exit->dest;
   1266  1.1  mrg   basic_block *bbs;
   1267  1.1  mrg   unsigned i;
   1268  1.1  mrg 
   1269  1.1  mrg   bbs = get_loop_body_in_dom_order (loop);
   1270  1.1  mrg 
   1271  1.1  mrg   gimple_stmt_iterator dst_gsi = gsi_after_labels (exit->dest);
   1272  1.1  mrg   bool safe_p = single_pred_p (exit->dest);
   1273  1.1  mrg   for (unsigned i = 0; i < nbbs; ++i)
   1274  1.1  mrg     {
    1275  1.1  mrg       /* We have made sure not to leave any dangling uses of SSA
    1276  1.1  mrg          names defined in the loop, with the exception of virtuals.
    1277  1.1  mrg 	 Make sure we replace all uses of virtual defs that will remain
    1278  1.1  mrg 	 outside of the loop with the bare symbol as delete_basic_block
    1279  1.1  mrg 	 will release them.  */
   1280  1.1  mrg       for (gphi_iterator gsi = gsi_start_phis (bbs[i]); !gsi_end_p (gsi);
   1281  1.1  mrg 	   gsi_next (&gsi))
   1282  1.1  mrg 	{
   1283  1.1  mrg 	  gphi *phi = gsi.phi ();
   1284  1.1  mrg 	  if (virtual_operand_p (gimple_phi_result (phi)))
   1285  1.1  mrg 	    mark_virtual_phi_result_for_renaming (phi);
   1286  1.1  mrg 	}
   1287  1.1  mrg       for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi);)
   1288  1.1  mrg 	{
   1289  1.1  mrg 	  gimple *stmt = gsi_stmt (gsi);
   1290  1.1  mrg 	  tree vdef = gimple_vdef (stmt);
   1291  1.1  mrg 	  if (vdef && TREE_CODE (vdef) == SSA_NAME)
   1292  1.1  mrg 	    mark_virtual_operand_for_renaming (vdef);
   1293  1.1  mrg 	  /* Also move and eventually reset debug stmts.  We can leave
   1294  1.1  mrg 	     constant values in place in case the stmt dominates the exit.
   1295  1.1  mrg 	     ???  Non-constant values from the last iteration can be
   1296  1.1  mrg 	     replaced with final values if we can compute them.  */
   1297  1.1  mrg 	  if (gimple_debug_bind_p (stmt))
   1298  1.1  mrg 	    {
   1299  1.1  mrg 	      tree val = gimple_debug_bind_get_value (stmt);
   1300  1.1  mrg 	      gsi_move_before (&gsi, &dst_gsi);
   1301  1.1  mrg 	      if (val
   1302  1.1  mrg 		  && (!safe_p
   1303  1.1  mrg 		      || !is_gimple_min_invariant (val)
   1304  1.1  mrg 		      || !dominated_by_p (CDI_DOMINATORS, exit->src, bbs[i])))
   1305  1.1  mrg 		{
   1306  1.1  mrg 		  gimple_debug_bind_reset_value (stmt);
   1307  1.1  mrg 		  update_stmt (stmt);
   1308  1.1  mrg 		}
   1309  1.1  mrg 	    }
   1310  1.1  mrg 	  else
   1311  1.1  mrg 	    gsi_next (&gsi);
   1312  1.1  mrg 	}
   1313  1.1  mrg     }
   1314  1.1  mrg 
   1315  1.1  mrg   redirect_edge_pred (exit, src);
   1316  1.1  mrg   exit->flags &= ~(EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
   1317  1.1  mrg   exit->flags |= EDGE_FALLTHRU;
   1318  1.1  mrg   cancel_loop_tree (loop);
   1319  1.1  mrg   rescan_loop_exit (exit, false, true);
   1320  1.1  mrg 
   1321  1.1  mrg   i = nbbs;
   1322  1.1  mrg   do
   1323  1.1  mrg     {
   1324  1.1  mrg       --i;
   1325  1.1  mrg       delete_basic_block (bbs[i]);
   1326  1.1  mrg     }
   1327  1.1  mrg   while (i != 0);
   1328  1.1  mrg 
   1329  1.1  mrg   free (bbs);
   1330  1.1  mrg 
   1331  1.1  mrg   set_immediate_dominator (CDI_DOMINATORS, dest,
   1332  1.1  mrg 			   recompute_dominator (CDI_DOMINATORS, dest));
   1333  1.1  mrg }
   1334  1.1  mrg 
   1335  1.1  mrg /* Generates code for PARTITION.  Return whether LOOP needs to be destroyed.  */
   1336  1.1  mrg 
   1337  1.1  mrg static bool
   1338  1.1  mrg generate_code_for_partition (class loop *loop,
   1339  1.1  mrg 			     partition *partition, bool copy_p)
   1340  1.1  mrg {
   1341  1.1  mrg   switch (partition->kind)
   1342  1.1  mrg     {
   1343  1.1  mrg     case PKIND_NORMAL:
   1344  1.1  mrg     case PKIND_PARTIAL_MEMSET:
   1345  1.1  mrg       /* Reductions all have to be in the last partition.  */
   1346  1.1  mrg       gcc_assert (!partition_reduction_p (partition)
   1347  1.1  mrg 		  || !copy_p);
   1348  1.1  mrg       generate_loops_for_partition (loop, partition, copy_p);
   1349  1.1  mrg       return false;
   1350  1.1  mrg 
   1351  1.1  mrg     case PKIND_MEMSET:
   1352  1.1  mrg       generate_memset_builtin (loop, partition);
   1353  1.1  mrg       break;
   1354  1.1  mrg 
   1355  1.1  mrg     case PKIND_MEMCPY:
   1356  1.1  mrg     case PKIND_MEMMOVE:
   1357  1.1  mrg       generate_memcpy_builtin (loop, partition);
   1358  1.1  mrg       break;
   1359  1.1  mrg 
   1360  1.1  mrg     default:
   1361  1.1  mrg       gcc_unreachable ();
   1362  1.1  mrg     }
   1363  1.1  mrg 
   1364  1.1  mrg   /* Common tail for partitions we turn into a call.  If this was the last
   1365  1.1  mrg      partition for which we generate code, we have to destroy the loop.  */
   1366  1.1  mrg   if (!copy_p)
   1367  1.1  mrg     return true;
   1368  1.1  mrg   return false;
   1369  1.1  mrg }
   1370  1.1  mrg 
   1371  1.1  mrg data_dependence_relation *
   1372  1.1  mrg loop_distribution::get_data_dependence (struct graph *rdg, data_reference_p a,
   1373  1.1  mrg 					data_reference_p b)
   1374  1.1  mrg {
   1375  1.1  mrg   struct data_dependence_relation ent, **slot;
   1376  1.1  mrg   struct data_dependence_relation *ddr;
   1377  1.1  mrg 
   1378  1.1  mrg   gcc_assert (DR_IS_WRITE (a) || DR_IS_WRITE (b));
   1379  1.1  mrg   gcc_assert (rdg_vertex_for_stmt (rdg, DR_STMT (a))
   1380  1.1  mrg 	      <= rdg_vertex_for_stmt (rdg, DR_STMT (b)));
   1381  1.1  mrg   ent.a = a;
   1382  1.1  mrg   ent.b = b;
   1383  1.1  mrg   slot = ddrs_table->find_slot (&ent, INSERT);
   1384  1.1  mrg   if (*slot == NULL)
   1385  1.1  mrg     {
   1386  1.1  mrg       ddr = initialize_data_dependence_relation (a, b, loop_nest);
   1387  1.1  mrg       compute_affine_dependence (ddr, loop_nest[0]);
   1388  1.1  mrg       *slot = ddr;
   1389  1.1  mrg     }
   1390  1.1  mrg 
   1391  1.1  mrg   return *slot;
   1392  1.1  mrg }
   1393  1.1  mrg 
   1394  1.1  mrg bool
   1395  1.1  mrg loop_distribution::data_dep_in_cycle_p (struct graph *rdg,
   1396  1.1  mrg 					data_reference_p dr1,
   1397  1.1  mrg 					data_reference_p dr2)
   1398  1.1  mrg {
   1399  1.1  mrg   struct data_dependence_relation *ddr;
   1400  1.1  mrg 
   1401  1.1  mrg   /* Re-shuffle data-refs to be in topological order.  */
   1402  1.1  mrg   if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
   1403  1.1  mrg       > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
   1404  1.1  mrg     std::swap (dr1, dr2);
   1405  1.1  mrg 
   1406  1.1  mrg   ddr = get_data_dependence (rdg, dr1, dr2);
   1407  1.1  mrg 
   1408  1.1  mrg   /* In case of no data dependence.  */
   1409  1.1  mrg   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
   1410  1.1  mrg     return false;
    1411  1.1  mrg   /* For unknown data dependence, or known data dependence which can't be
    1412  1.1  mrg      expressed as a classic distance vector, check whether it can be
    1413  1.1  mrg      resolved by runtime alias check.  If it can, we consider the data
    1414  1.1  mrg      dependence as one that won't introduce a data dependence cycle.  */
   1415  1.1  mrg   else if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
   1416  1.1  mrg 	   || DDR_NUM_DIST_VECTS (ddr) == 0)
   1417  1.1  mrg     return !runtime_alias_check_p (ddr, NULL, true);
   1418  1.1  mrg   else if (DDR_NUM_DIST_VECTS (ddr) > 1)
   1419  1.1  mrg     return true;
   1420  1.1  mrg   else if (DDR_REVERSED_P (ddr)
   1421  1.1  mrg 	   || lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1))
   1422  1.1  mrg     return false;
   1423  1.1  mrg 
   1424  1.1  mrg   return true;
   1425  1.1  mrg }
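
/* For example, a write and a read of A[i] in the same iteration,

     |A[i] = B[i];
     |C[i] = A[i];

   have an all-zero distance vector, so no dependence cycle is introduced
   and the statements may still end up in a parallel partition; an unknown
   dependence counts as a cycle only when it cannot be covered by a
   runtime alias check.  */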
   1426  1.1  mrg 
   1427  1.1  mrg void
   1428  1.1  mrg loop_distribution::update_type_for_merge (struct graph *rdg,
   1429  1.1  mrg 					   partition *partition1,
   1430  1.1  mrg 					   partition *partition2)
   1431  1.1  mrg {
   1432  1.1  mrg   unsigned i, j;
   1433  1.1  mrg   bitmap_iterator bi, bj;
   1434  1.1  mrg   data_reference_p dr1, dr2;
   1435  1.1  mrg 
   1436  1.1  mrg   EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
   1437  1.1  mrg     {
   1438  1.1  mrg       unsigned start = (partition1 == partition2) ? i + 1 : 0;
   1439  1.1  mrg 
   1440  1.1  mrg       dr1 = datarefs_vec[i];
   1441  1.1  mrg       EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, start, j, bj)
   1442  1.1  mrg 	{
   1443  1.1  mrg 	  dr2 = datarefs_vec[j];
   1444  1.1  mrg 	  if (DR_IS_READ (dr1) && DR_IS_READ (dr2))
   1445  1.1  mrg 	    continue;
   1446  1.1  mrg 
   1447  1.1  mrg 	  /* Partition can only be executed sequentially if there is any
   1448  1.1  mrg 	     data dependence cycle.  */
   1449  1.1  mrg 	  if (data_dep_in_cycle_p (rdg, dr1, dr2))
   1450  1.1  mrg 	    {
   1451  1.1  mrg 	      partition1->type = PTYPE_SEQUENTIAL;
   1452  1.1  mrg 	      return;
   1453  1.1  mrg 	    }
   1454  1.1  mrg 	}
   1455  1.1  mrg     }
   1456  1.1  mrg }
   1457  1.1  mrg 
   1458  1.1  mrg partition *
   1459  1.1  mrg loop_distribution::build_rdg_partition_for_vertex (struct graph *rdg, int v)
   1460  1.1  mrg {
   1461  1.1  mrg   partition *partition = partition_alloc ();
   1462  1.1  mrg   auto_vec<int, 3> nodes;
   1463  1.1  mrg   unsigned i, j;
   1464  1.1  mrg   int x;
   1465  1.1  mrg   data_reference_p dr;
   1466  1.1  mrg 
   1467  1.1  mrg   graphds_dfs (rdg, &v, 1, &nodes, false, NULL);
   1468  1.1  mrg 
   1469  1.1  mrg   FOR_EACH_VEC_ELT (nodes, i, x)
   1470  1.1  mrg     {
   1471  1.1  mrg       bitmap_set_bit (partition->stmts, x);
   1472  1.1  mrg 
   1473  1.1  mrg       for (j = 0; RDG_DATAREFS (rdg, x).iterate (j, &dr); ++j)
   1474  1.1  mrg 	{
   1475  1.1  mrg 	  unsigned idx = (unsigned) DR_INDEX (dr);
   1476  1.1  mrg 	  gcc_assert (idx < datarefs_vec.length ());
   1477  1.1  mrg 
   1478  1.1  mrg 	  /* Partition can only be executed sequentially if there is any
   1479  1.1  mrg 	     unknown data reference.  */
   1480  1.1  mrg 	  if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr)
   1481  1.1  mrg 	      || !DR_INIT (dr) || !DR_STEP (dr))
   1482  1.1  mrg 	    partition->type = PTYPE_SEQUENTIAL;
   1483  1.1  mrg 
   1484  1.1  mrg 	  bitmap_set_bit (partition->datarefs, idx);
   1485  1.1  mrg 	}
   1486  1.1  mrg     }
   1487  1.1  mrg 
   1488  1.1  mrg   if (partition->type == PTYPE_SEQUENTIAL)
   1489  1.1  mrg     return partition;
   1490  1.1  mrg 
    1491  1.1  mrg   /* Further check if any data dependence prevents us from executing the
    1492  1.1  mrg      partition in parallel.  */
   1493  1.1  mrg   update_type_for_merge (rdg, partition, partition);
   1494  1.1  mrg 
   1495  1.1  mrg   return partition;
   1496  1.1  mrg }
   1497  1.1  mrg 
    1498  1.1  mrg /* Given PARTITION of LOOP and RDG, record the single load/store data
    1499  1.1  mrg    references for a builtin partition in SRC_DR/DST_DR, returning false
    1500  1.1  mrg    if there are no such data references.  */
   1501  1.1  mrg 
   1502  1.1  mrg static bool
   1503  1.1  mrg find_single_drs (class loop *loop, struct graph *rdg, const bitmap &partition_stmts,
   1504  1.1  mrg 		 data_reference_p *dst_dr, data_reference_p *src_dr)
   1505  1.1  mrg {
   1506  1.1  mrg   unsigned i;
   1507  1.1  mrg   data_reference_p single_ld = NULL, single_st = NULL;
   1508  1.1  mrg   bitmap_iterator bi;
   1509  1.1  mrg 
   1510  1.1  mrg   EXECUTE_IF_SET_IN_BITMAP (partition_stmts, 0, i, bi)
   1511  1.1  mrg     {
   1512  1.1  mrg       gimple *stmt = RDG_STMT (rdg, i);
   1513  1.1  mrg       data_reference_p dr;
   1514  1.1  mrg 
   1515  1.1  mrg       if (gimple_code (stmt) == GIMPLE_PHI)
   1516  1.1  mrg 	continue;
   1517  1.1  mrg 
   1518  1.1  mrg       /* Any scalar stmts are ok.  */
   1519  1.1  mrg       if (!gimple_vuse (stmt))
   1520  1.1  mrg 	continue;
   1521  1.1  mrg 
   1522  1.1  mrg       /* Otherwise just regular loads/stores.  */
   1523  1.1  mrg       if (!gimple_assign_single_p (stmt))
   1524  1.1  mrg 	return false;
   1525  1.1  mrg 
   1526  1.1  mrg       /* But exactly one store and/or load.  */
   1527  1.1  mrg       for (unsigned j = 0; RDG_DATAREFS (rdg, i).iterate (j, &dr); ++j)
   1528  1.1  mrg 	{
   1529  1.1  mrg 	  tree type = TREE_TYPE (DR_REF (dr));
   1530  1.1  mrg 
    1531  1.1  mrg 	  /* The memset, memcpy and memmove library calls are only
    1532  1.1  mrg 	     able to deal with the generic address space.  */
   1533  1.1  mrg 	  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (type)))
   1534  1.1  mrg 	    return false;
   1535  1.1  mrg 
   1536  1.1  mrg 	  if (DR_IS_READ (dr))
   1537  1.1  mrg 	    {
   1538  1.1  mrg 	      if (single_ld != NULL)
   1539  1.1  mrg 		return false;
   1540  1.1  mrg 	      single_ld = dr;
   1541  1.1  mrg 	    }
   1542  1.1  mrg 	  else
   1543  1.1  mrg 	    {
   1544  1.1  mrg 	      if (single_st != NULL)
   1545  1.1  mrg 		return false;
   1546  1.1  mrg 	      single_st = dr;
   1547  1.1  mrg 	    }
   1548  1.1  mrg 	}
   1549  1.1  mrg     }
   1550  1.1  mrg 
   1551  1.1  mrg   if (!single_ld && !single_st)
   1552  1.1  mrg     return false;
   1553  1.1  mrg 
   1554  1.1  mrg   basic_block bb_ld = NULL;
   1555  1.1  mrg   basic_block bb_st = NULL;
   1556  1.1  mrg 
   1557  1.1  mrg   if (single_ld)
   1558  1.1  mrg     {
   1559  1.1  mrg       /* Bail out if this is a bitfield memory reference.  */
   1560  1.1  mrg       if (TREE_CODE (DR_REF (single_ld)) == COMPONENT_REF
   1561  1.1  mrg 	  && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_ld), 1)))
   1562  1.1  mrg 	return false;
   1563  1.1  mrg 
   1564  1.1  mrg       /* Data reference must be executed exactly once per iteration of each
   1565  1.1  mrg 	 loop in the loop nest.  We only need to check dominance information
   1566  1.1  mrg 	 against the outermost one in a perfect loop nest because a bb can't
   1567  1.1  mrg 	 dominate outermost loop's latch without dominating inner loop's.  */
   1568  1.1  mrg       bb_ld = gimple_bb (DR_STMT (single_ld));
   1569  1.1  mrg       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
   1570  1.1  mrg 	return false;
   1571  1.1  mrg     }
   1572  1.1  mrg 
   1573  1.1  mrg   if (single_st)
   1574  1.1  mrg     {
   1575  1.1  mrg       /* Bail out if this is a bitfield memory reference.  */
   1576  1.1  mrg       if (TREE_CODE (DR_REF (single_st)) == COMPONENT_REF
   1577  1.1  mrg 	  && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
   1578  1.1  mrg 	return false;
   1579  1.1  mrg 
   1580  1.1  mrg       /* Data reference must be executed exactly once per iteration.
   1581  1.1  mrg 	 Same as single_ld, we only need to check against the outermost
   1582  1.1  mrg 	 loop.  */
   1583  1.1  mrg       bb_st = gimple_bb (DR_STMT (single_st));
   1584  1.1  mrg       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
   1585  1.1  mrg 	return false;
   1586  1.1  mrg     }
   1587  1.1  mrg 
   1588  1.1  mrg   if (single_ld && single_st)
   1589  1.1  mrg     {
   1590  1.1  mrg       /* Load and store must be in the same loop nest.  */
   1591  1.1  mrg       if (bb_st->loop_father != bb_ld->loop_father)
   1592  1.1  mrg 	return false;
   1593  1.1  mrg 
   1594  1.1  mrg       edge e = single_exit (bb_st->loop_father);
   1595  1.1  mrg       bool dom_ld = dominated_by_p (CDI_DOMINATORS, e->src, bb_ld);
   1596  1.1  mrg       bool dom_st = dominated_by_p (CDI_DOMINATORS, e->src, bb_st);
   1597  1.1  mrg       if (dom_ld != dom_st)
   1598  1.1  mrg 	return false;
   1599  1.1  mrg     }
   1600  1.1  mrg 
   1601  1.1  mrg   *src_dr = single_ld;
   1602  1.1  mrg   *dst_dr = single_st;
   1603  1.1  mrg   return true;
   1604  1.1  mrg }
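
/* For example, a partition whose only memory operations come from

     |A[i] = B[i];

   has exactly one store and one load and succeeds, while a partition with
   two distinct stores, a bit-field reference, a non-generic address
   space, or a store guarded by a condition inside the loop (which fails
   the dominance test above) is rejected.  */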
   1605  1.1  mrg 
    1606  1.1  mrg /* Given data reference DR in LOOP_NEST, this function checks the enclosing
    1607  1.1  mrg    loops from inner to outer to see if the loop's step equals the access
    1608  1.1  mrg    size at each level of the nest.  Return 2 if we can prove this at all
    1609  1.1  mrg    loop levels; record the access base and size in BASE and SIZE; save the
    1610  1.1  mrg    loop's step at each level in STEPS if it is not null.  For example:
   1611  1.1  mrg 
   1612  1.1  mrg      int arr[100][100][100];
   1613  1.1  mrg      for (i = 0; i < 100; i++)       ;steps[2] = 40000
   1614  1.1  mrg        for (j = 100; j > 0; j--)     ;steps[1] = -400
   1615  1.1  mrg 	 for (k = 0; k < 100; k++)   ;steps[0] = 4
   1616  1.1  mrg 	   arr[i][j - 1][k] = 0;     ;base = &arr, size = 4000000
   1617  1.1  mrg 
    1618  1.1  mrg    Return 1 if we can prove the equality for the innermost loop, but not
    1619  1.1  mrg    at all loop levels.  In this case, no information is recorded.
    1620  1.1  mrg 
    1621  1.1  mrg    Return 0 if the equality cannot be proven at any loop level.  */
   1622  1.1  mrg 
   1623  1.1  mrg static int
   1624  1.1  mrg compute_access_range (loop_p loop_nest, data_reference_p dr, tree *base,
   1625  1.1  mrg 		      tree *size, vec<tree> *steps = NULL)
   1626  1.1  mrg {
   1627  1.1  mrg   location_t loc = gimple_location (DR_STMT (dr));
   1628  1.1  mrg   basic_block bb = gimple_bb (DR_STMT (dr));
   1629  1.1  mrg   class loop *loop = bb->loop_father;
   1630  1.1  mrg   tree ref = DR_REF (dr);
   1631  1.1  mrg   tree access_base = build_fold_addr_expr (ref);
   1632  1.1  mrg   tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (ref));
   1633  1.1  mrg   int res = 0;
   1634  1.1  mrg 
   1635  1.1  mrg   do {
   1636  1.1  mrg       tree scev_fn = analyze_scalar_evolution (loop, access_base);
   1637  1.1  mrg       if (TREE_CODE (scev_fn) != POLYNOMIAL_CHREC)
   1638  1.1  mrg 	return res;
   1639  1.1  mrg 
   1640  1.1  mrg       access_base = CHREC_LEFT (scev_fn);
   1641  1.1  mrg       if (tree_contains_chrecs (access_base, NULL))
   1642  1.1  mrg 	return res;
   1643  1.1  mrg 
   1644  1.1  mrg       tree scev_step = CHREC_RIGHT (scev_fn);
   1645  1.1  mrg       /* Only support constant steps.  */
   1646  1.1  mrg       if (TREE_CODE (scev_step) != INTEGER_CST)
   1647  1.1  mrg 	return res;
   1648  1.1  mrg 
   1649  1.1  mrg       enum ev_direction access_dir = scev_direction (scev_fn);
   1650  1.1  mrg       if (access_dir == EV_DIR_UNKNOWN)
   1651  1.1  mrg 	return res;
   1652  1.1  mrg 
   1653  1.1  mrg       if (steps != NULL)
   1654  1.1  mrg 	steps->safe_push (scev_step);
   1655  1.1  mrg 
   1656  1.1  mrg       scev_step = fold_convert_loc (loc, sizetype, scev_step);
   1657  1.1  mrg       /* Compute absolute value of scev step.  */
   1658  1.1  mrg       if (access_dir == EV_DIR_DECREASES)
   1659  1.1  mrg 	scev_step = fold_build1_loc (loc, NEGATE_EXPR, sizetype, scev_step);
   1660  1.1  mrg 
    1661  1.1  mrg       /* At each loop level, the scev step must equal the access size.  In
    1662  1.1  mrg 	 other words, DR must access consecutive memory between iterations.  */
   1663  1.1  mrg       if (!operand_equal_p (scev_step, access_size, 0))
   1664  1.1  mrg 	return res;
   1665  1.1  mrg 
    1666  1.1  mrg       /* The access stride can be computed for the data reference at least
    1667  1.1  mrg 	 for the innermost loop.  */
   1668  1.1  mrg       res = 1;
   1669  1.1  mrg 
   1670  1.1  mrg       /* Compute DR's execution times in loop.  */
   1671  1.1  mrg       tree niters = number_of_latch_executions (loop);
   1672  1.1  mrg       niters = fold_convert_loc (loc, sizetype, niters);
   1673  1.1  mrg       if (dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src, bb))
   1674  1.1  mrg 	niters = size_binop_loc (loc, PLUS_EXPR, niters, size_one_node);
   1675  1.1  mrg 
   1676  1.1  mrg       /* Compute DR's overall access size in loop.  */
   1677  1.1  mrg       access_size = fold_build2_loc (loc, MULT_EXPR, sizetype,
   1678  1.1  mrg 				     niters, scev_step);
   1679  1.1  mrg       /* Adjust base address in case of negative step.  */
   1680  1.1  mrg       if (access_dir == EV_DIR_DECREASES)
   1681  1.1  mrg 	{
   1682  1.1  mrg 	  tree adj = fold_build2_loc (loc, MINUS_EXPR, sizetype,
   1683  1.1  mrg 				      scev_step, access_size);
   1684  1.1  mrg 	  access_base = fold_build_pointer_plus_loc (loc, access_base, adj);
   1685  1.1  mrg 	}
   1686  1.1  mrg   } while (loop != loop_nest && (loop = loop_outer (loop)) != NULL);
   1687  1.1  mrg 
   1688  1.1  mrg   *base = access_base;
   1689  1.1  mrg   *size = access_size;
    1690  1.1  mrg   /* Access stride can be computed for the data reference at each loop level.  */
   1691  1.1  mrg   return 2;
   1692  1.1  mrg }
   1693  1.1  mrg 
   1694  1.1  mrg /* Allocate and return builtin struct.  Record information like DST_DR,
   1695  1.1  mrg    SRC_DR, DST_BASE, SRC_BASE and SIZE in the allocated struct.  */
   1696  1.1  mrg 
   1697  1.1  mrg static struct builtin_info *
   1698  1.1  mrg alloc_builtin (data_reference_p dst_dr, data_reference_p src_dr,
   1699  1.1  mrg 	       tree dst_base, tree src_base, tree size)
   1700  1.1  mrg {
   1701  1.1  mrg   struct builtin_info *builtin = XNEW (struct builtin_info);
   1702  1.1  mrg   builtin->dst_dr = dst_dr;
   1703  1.1  mrg   builtin->src_dr = src_dr;
   1704  1.1  mrg   builtin->dst_base = dst_base;
   1705  1.1  mrg   builtin->src_base = src_base;
   1706  1.1  mrg   builtin->size = size;
   1707  1.1  mrg   return builtin;
   1708  1.1  mrg }
   1709  1.1  mrg 
    1710  1.1  mrg /* Given data reference DR in loop nest LOOP, classify whether it forms a
    1711  1.1  mrg    builtin memset call.  */
   1712  1.1  mrg 
   1713  1.1  mrg static void
   1714  1.1  mrg classify_builtin_st (loop_p loop, partition *partition, data_reference_p dr)
   1715  1.1  mrg {
   1716  1.1  mrg   gimple *stmt = DR_STMT (dr);
   1717  1.1  mrg   tree base, size, rhs = gimple_assign_rhs1 (stmt);
   1718  1.1  mrg 
   1719  1.1  mrg   if (const_with_all_bytes_same (rhs) == -1
   1720  1.1  mrg       && (!INTEGRAL_TYPE_P (TREE_TYPE (rhs))
   1721  1.1  mrg 	  || (TYPE_MODE (TREE_TYPE (rhs))
   1722  1.1  mrg 	      != TYPE_MODE (unsigned_char_type_node))))
   1723  1.1  mrg     return;
   1724  1.1  mrg 
   1725  1.1  mrg   if (TREE_CODE (rhs) == SSA_NAME
   1726  1.1  mrg       && !SSA_NAME_IS_DEFAULT_DEF (rhs)
   1727  1.1  mrg       && flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (rhs))))
   1728  1.1  mrg     return;
   1729  1.1  mrg 
   1730  1.1  mrg   int res = compute_access_range (loop, dr, &base, &size);
   1731  1.1  mrg   if (res == 0)
   1732  1.1  mrg     return;
   1733  1.1  mrg   if (res == 1)
   1734  1.1  mrg     {
   1735  1.1  mrg       partition->kind = PKIND_PARTIAL_MEMSET;
   1736  1.1  mrg       return;
   1737  1.1  mrg     }
   1738  1.1  mrg 
   1739  1.1  mrg   poly_uint64 base_offset;
   1740  1.1  mrg   unsigned HOST_WIDE_INT const_base_offset;
   1741  1.1  mrg   tree base_base = strip_offset (base, &base_offset);
   1742  1.1  mrg   if (!base_offset.is_constant (&const_base_offset))
   1743  1.1  mrg     return;
   1744  1.1  mrg 
   1745  1.1  mrg   struct builtin_info *builtin;
   1746  1.1  mrg   builtin = alloc_builtin (dr, NULL, base, NULL_TREE, size);
   1747  1.1  mrg   builtin->dst_base_base = base_base;
   1748  1.1  mrg   builtin->dst_base_offset = const_base_offset;
   1749  1.1  mrg   partition->builtin = builtin;
   1750  1.1  mrg   partition->kind = PKIND_MEMSET;
   1751  1.1  mrg }
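
/* For example, assuming A is an int array,

     |for (i = 0; i < N; i++)
     |  A[i] = 0;

   is classified as PKIND_MEMSET with base &A[0] and size N * 4; if the
   access range can only be proven for the innermost loop of a deeper
   nest, the partition is marked PKIND_PARTIAL_MEMSET instead.  */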
   1752  1.1  mrg 
    1753  1.1  mrg /* Given data references DST_DR and SRC_DR in loop nest LOOP and RDG,
    1754  1.1  mrg    classify whether they form a builtin memcpy or memmove call.  */
   1755  1.1  mrg 
   1756  1.1  mrg void
   1757  1.1  mrg loop_distribution::classify_builtin_ldst (loop_p loop, struct graph *rdg,
   1758  1.1  mrg 					  partition *partition,
   1759  1.1  mrg 					  data_reference_p dst_dr,
   1760  1.1  mrg 					  data_reference_p src_dr)
   1761  1.1  mrg {
   1762  1.1  mrg   tree base, size, src_base, src_size;
   1763  1.1  mrg   auto_vec<tree> dst_steps, src_steps;
   1764  1.1  mrg 
   1765  1.1  mrg   /* Compute access range of both load and store.  */
   1766  1.1  mrg   int res = compute_access_range (loop, dst_dr, &base, &size, &dst_steps);
   1767  1.1  mrg   if (res != 2)
   1768  1.1  mrg     return;
   1769  1.1  mrg   res = compute_access_range (loop, src_dr, &src_base, &src_size, &src_steps);
   1770  1.1  mrg   if (res != 2)
   1771  1.1  mrg     return;
   1772  1.1  mrg 
   1773  1.1  mrg   /* They must have the same access size.  */
   1774  1.1  mrg   if (!operand_equal_p (size, src_size, 0))
   1775  1.1  mrg     return;
   1776  1.1  mrg 
   1777  1.1  mrg   /* They must have the same storage order.  */
   1778  1.1  mrg   if (reverse_storage_order_for_component_p (DR_REF (dst_dr))
   1779  1.1  mrg       != reverse_storage_order_for_component_p (DR_REF (src_dr)))
   1780  1.1  mrg     return;
   1781  1.1  mrg 
    1782  1.1  mrg   /* The load and store in the loop nest must access memory in the same
    1783  1.1  mrg      way, i.e., they must have the same steps in each loop of the nest.  */
   1784  1.1  mrg   if (dst_steps.length () != src_steps.length ())
   1785  1.1  mrg     return;
   1786  1.1  mrg   for (unsigned i = 0; i < dst_steps.length (); ++i)
   1787  1.1  mrg     if (!operand_equal_p (dst_steps[i], src_steps[i], 0))
   1788  1.1  mrg       return;
   1789  1.1  mrg 
    1790  1.1  mrg   /* Now check whether there is a dependence.  */
   1791  1.1  mrg   ddr_p ddr = get_data_dependence (rdg, src_dr, dst_dr);
   1792  1.1  mrg 
   1793  1.1  mrg   /* Classify as memmove if no dependence between load and store.  */
   1794  1.1  mrg   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
   1795  1.1  mrg     {
   1796  1.1  mrg       partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, size);
   1797  1.1  mrg       partition->kind = PKIND_MEMMOVE;
   1798  1.1  mrg       return;
   1799  1.1  mrg     }
   1800  1.1  mrg 
   1801  1.1  mrg   /* Can't do memmove in case of unknown dependence or dependence without
   1802  1.1  mrg      classical distance vector.  */
   1803  1.1  mrg   if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
   1804  1.1  mrg       || DDR_NUM_DIST_VECTS (ddr) == 0)
   1805  1.1  mrg     return;
   1806  1.1  mrg 
   1807  1.1  mrg   unsigned i;
   1808  1.1  mrg   lambda_vector dist_v;
   1809  1.1  mrg   int num_lev = (DDR_LOOP_NEST (ddr)).length ();
   1810  1.1  mrg   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
   1811  1.1  mrg     {
   1812  1.1  mrg       unsigned dep_lev = dependence_level (dist_v, num_lev);
   1813  1.1  mrg       /* Can't do memmove if load depends on store.  */
   1814  1.1  mrg       if (dep_lev > 0 && dist_v[dep_lev - 1] > 0 && !DDR_REVERSED_P (ddr))
   1815  1.1  mrg 	return;
   1816  1.1  mrg     }
   1817  1.1  mrg 
   1818  1.1  mrg   partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, size);
   1819  1.1  mrg   partition->kind = PKIND_MEMMOVE;
   1820  1.1  mrg   return;
   1821  1.1  mrg }
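
/* For example,

     |for (i = 0; i < N; i++)
     |  A[i] = B[i];

   has equal access sizes and identical steps, so it is classified as
   PKIND_MEMMOVE (generate_memcpy_builtin later picks memcpy when the
   references cannot alias).  By contrast

     |for (i = 0; i < N; i++)
     |  A[i + 1] = A[i];

   is not turned into a library call, because the load depends on the
   store from the previous iteration and a block copy would not preserve
   the loop's semantics.  */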
   1822  1.1  mrg 
   1823  1.1  mrg bool
   1824  1.1  mrg loop_distribution::classify_partition (loop_p loop,
   1825  1.1  mrg 				       struct graph *rdg, partition *partition,
   1826  1.1  mrg 				       bitmap stmt_in_all_partitions)
   1827  1.1  mrg {
   1828  1.1  mrg   bitmap_iterator bi;
   1829  1.1  mrg   unsigned i;
   1830  1.1  mrg   data_reference_p single_ld = NULL, single_st = NULL;
   1831  1.1  mrg   bool volatiles_p = false, has_reduction = false;
   1832  1.1  mrg 
   1833  1.1  mrg   EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, bi)
   1834  1.1  mrg     {
   1835  1.1  mrg       gimple *stmt = RDG_STMT (rdg, i);
   1836  1.1  mrg 
   1837  1.1  mrg       if (gimple_has_volatile_ops (stmt))
   1838  1.1  mrg 	volatiles_p = true;
   1839  1.1  mrg 
    1840  1.1  mrg       /* If the stmt is not included by all partitions and there are uses
    1841  1.1  mrg 	 outside of the loop, then mark the partition as reduction.  */
   1842  1.1  mrg       if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
   1843  1.1  mrg 	{
    1844  1.1  mrg 	  /* Due to a limitation in the transform phase we have to fuse all
    1845  1.1  mrg 	     reduction partitions.  As a result, this could cancel valid loop
    1846  1.1  mrg 	     distribution, especially for loops whose induction variable is
    1847  1.1  mrg 	     used outside of the loop.  To work around this issue, we skip
    1848  1.1  mrg 	     marking the partition as reduction if the reduction stmt belongs
    1849  1.1  mrg 	     to all partitions.  In such a case the reduction will be computed
    1850  1.1  mrg 	     correctly no matter how the partitions are fused/distributed.  */
   1851  1.1  mrg 	  if (!bitmap_bit_p (stmt_in_all_partitions, i))
   1852  1.1  mrg 	    partition->reduction_p = true;
   1853  1.1  mrg 	  else
   1854  1.1  mrg 	    has_reduction = true;
   1855  1.1  mrg 	}
   1856  1.1  mrg     }
   1857  1.1  mrg 
   1858  1.1  mrg   /* Simple workaround to prevent classifying the partition as builtin
   1859  1.1  mrg      if it contains any use outside of loop.  For the case where all
   1860  1.1  mrg      partitions have the reduction this simple workaround is delayed
   1861  1.1  mrg      to only affect the last partition.  */
   1862  1.1  mrg   if (partition->reduction_p)
   1863  1.1  mrg      return has_reduction;
   1864  1.1  mrg 
   1865  1.1  mrg   /* Perform general partition disqualification for builtins.  */
   1866  1.1  mrg   if (volatiles_p
   1867  1.1  mrg       || !flag_tree_loop_distribute_patterns)
   1868  1.1  mrg     return has_reduction;
   1869  1.1  mrg 
   1870  1.1  mrg   /* Find single load/store data references for builtin partition.  */
   1871  1.1  mrg   if (!find_single_drs (loop, rdg, partition->stmts, &single_st, &single_ld)
   1872  1.1  mrg       || !single_st)
   1873  1.1  mrg     return has_reduction;
   1874  1.1  mrg 
   1875  1.1  mrg   if (single_ld && single_st)
   1876  1.1  mrg     {
   1877  1.1  mrg       gimple *store = DR_STMT (single_st), *load = DR_STMT (single_ld);
   1878  1.1  mrg       /* Direct aggregate copy or via an SSA name temporary.  */
   1879  1.1  mrg       if (load != store
   1880  1.1  mrg 	  && gimple_assign_lhs (load) != gimple_assign_rhs1 (store))
   1881  1.1  mrg 	return has_reduction;
   1882  1.1  mrg     }
   1883  1.1  mrg 
   1884  1.1  mrg   partition->loc = gimple_location (DR_STMT (single_st));
   1885  1.1  mrg 
   1886  1.1  mrg   /* Classify the builtin kind.  */
   1887  1.1  mrg   if (single_ld == NULL)
   1888  1.1  mrg     classify_builtin_st (loop, partition, single_st);
   1889  1.1  mrg   else
   1890  1.1  mrg     classify_builtin_ldst (loop, rdg, partition, single_st, single_ld);
   1891  1.1  mrg   return has_reduction;
   1892  1.1  mrg }
   1893  1.1  mrg 
   1894  1.1  mrg bool
   1895  1.1  mrg loop_distribution::share_memory_accesses (struct graph *rdg,
   1896  1.1  mrg 		       partition *partition1, partition *partition2)
   1897  1.1  mrg {
   1898  1.1  mrg   unsigned i, j;
   1899  1.1  mrg   bitmap_iterator bi, bj;
   1900  1.1  mrg   data_reference_p dr1, dr2;
   1901  1.1  mrg 
    1902  1.1  mrg   /* First check whether the intersection of the two partitions contains
    1903  1.1  mrg      any loads or stores.  Common loads are the situation that happens
    1904  1.1  mrg      most often.  */
   1905  1.1  mrg   EXECUTE_IF_AND_IN_BITMAP (partition1->stmts, partition2->stmts, 0, i, bi)
   1906  1.1  mrg     if (RDG_MEM_WRITE_STMT (rdg, i)
   1907  1.1  mrg 	|| RDG_MEM_READS_STMT (rdg, i))
   1908  1.1  mrg       return true;
   1909  1.1  mrg 
   1910  1.1  mrg   /* Then check whether the two partitions access the same memory object.  */
   1911  1.1  mrg   EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
   1912  1.1  mrg     {
   1913  1.1  mrg       dr1 = datarefs_vec[i];
   1914  1.1  mrg 
   1915  1.1  mrg       if (!DR_BASE_ADDRESS (dr1)
   1916  1.1  mrg 	  || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1))
   1917  1.1  mrg 	continue;
   1918  1.1  mrg 
   1919  1.1  mrg       EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj)
   1920  1.1  mrg 	{
   1921  1.1  mrg 	  dr2 = datarefs_vec[j];
   1922  1.1  mrg 
   1923  1.1  mrg 	  if (!DR_BASE_ADDRESS (dr2)
   1924  1.1  mrg 	      || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2))
   1925  1.1  mrg 	    continue;
   1926  1.1  mrg 
   1927  1.1  mrg 	  if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0)
   1928  1.1  mrg 	      && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0)
   1929  1.1  mrg 	      && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0)
   1930  1.1  mrg 	      && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0))
   1931  1.1  mrg 	    return true;
   1932  1.1  mrg 	}
   1933  1.1  mrg     }
   1934  1.1  mrg 
   1935  1.1  mrg   return false;
   1936  1.1  mrg }
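
/* For example, two partitions seeded from the stores in

     |A[i] = B[i] + 1;
     |C[i] = B[i] * 2;

   share the read of B[i], so this predicate returns true and the caller
   can fuse them in the hope of better cache locality.  */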
   1937  1.1  mrg 
    1938  1.1  mrg /* For each seed statement in STARTING_STMTS, this function builds a
    1939  1.1  mrg    partition for it by adding dependent statements according to the RDG.
    1940  1.1  mrg    All partitions are recorded in PARTITIONS.  */
   1941  1.1  mrg 
   1942  1.1  mrg void
   1943  1.1  mrg loop_distribution::rdg_build_partitions (struct graph *rdg,
   1944  1.1  mrg 					 vec<gimple *> starting_stmts,
   1945  1.1  mrg 					 vec<partition *> *partitions)
   1946  1.1  mrg {
   1947  1.1  mrg   auto_bitmap processed;
   1948  1.1  mrg   int i;
   1949  1.1  mrg   gimple *stmt;
   1950  1.1  mrg 
   1951  1.1  mrg   FOR_EACH_VEC_ELT (starting_stmts, i, stmt)
   1952  1.1  mrg     {
   1953  1.1  mrg       int v = rdg_vertex_for_stmt (rdg, stmt);
   1954  1.1  mrg 
   1955  1.1  mrg       if (dump_file && (dump_flags & TDF_DETAILS))
   1956  1.1  mrg 	fprintf (dump_file,
   1957  1.1  mrg 		 "ldist asked to generate code for vertex %d\n", v);
   1958  1.1  mrg 
    1959  1.1  mrg       /* If the vertex is already contained in another partition, so
    1960  1.1  mrg          is the partition rooted at it.  */
   1961  1.1  mrg       if (bitmap_bit_p (processed, v))
   1962  1.1  mrg 	continue;
   1963  1.1  mrg 
   1964  1.1  mrg       partition *partition = build_rdg_partition_for_vertex (rdg, v);
   1965  1.1  mrg       bitmap_ior_into (processed, partition->stmts);
   1966  1.1  mrg 
   1967  1.1  mrg       if (dump_file && (dump_flags & TDF_DETAILS))
   1968  1.1  mrg 	{
   1969  1.1  mrg 	  fprintf (dump_file, "ldist creates useful %s partition:\n",
   1970  1.1  mrg 		   partition->type == PTYPE_PARALLEL ? "parallel" : "sequent");
   1971  1.1  mrg 	  bitmap_print (dump_file, partition->stmts, "  ", "\n");
   1972  1.1  mrg 	}
   1973  1.1  mrg 
   1974  1.1  mrg       partitions->safe_push (partition);
   1975  1.1  mrg     }
   1976  1.1  mrg 
   1977  1.1  mrg   /* All vertices should have been assigned to at least one partition now,
   1978  1.1  mrg      other than vertices belonging to dead code.  */
   1979  1.1  mrg }
   1980  1.1  mrg 
   1981  1.1  mrg /* Dump to FILE the PARTITIONS.  */
   1982  1.1  mrg 
   1983  1.1  mrg static void
   1984  1.1  mrg dump_rdg_partitions (FILE *file, const vec<partition *> &partitions)
   1985  1.1  mrg {
   1986  1.1  mrg   int i;
   1987  1.1  mrg   partition *partition;
   1988  1.1  mrg 
   1989  1.1  mrg   FOR_EACH_VEC_ELT (partitions, i, partition)
   1990  1.1  mrg     debug_bitmap_file (file, partition->stmts);
   1991  1.1  mrg }
   1992  1.1  mrg 
   1993  1.1  mrg /* Debug PARTITIONS.  */
   1994  1.1  mrg extern void debug_rdg_partitions (const vec<partition *> &);
   1995  1.1  mrg 
   1996  1.1  mrg DEBUG_FUNCTION void
   1997  1.1  mrg debug_rdg_partitions (const vec<partition *> &partitions)
   1998  1.1  mrg {
   1999  1.1  mrg   dump_rdg_partitions (stderr, partitions);
   2000  1.1  mrg }
   2001  1.1  mrg 
   2002  1.1  mrg /* Returns the number of read and write operations in the RDG.  */
   2003  1.1  mrg 
   2004  1.1  mrg static int
   2005  1.1  mrg number_of_rw_in_rdg (struct graph *rdg)
   2006  1.1  mrg {
   2007  1.1  mrg   int i, res = 0;
   2008  1.1  mrg 
   2009  1.1  mrg   for (i = 0; i < rdg->n_vertices; i++)
   2010  1.1  mrg     {
   2011  1.1  mrg       if (RDG_MEM_WRITE_STMT (rdg, i))
   2012  1.1  mrg 	++res;
   2013  1.1  mrg 
   2014  1.1  mrg       if (RDG_MEM_READS_STMT (rdg, i))
   2015  1.1  mrg 	++res;
   2016  1.1  mrg     }
   2017  1.1  mrg 
   2018  1.1  mrg   return res;
   2019  1.1  mrg }
   2020  1.1  mrg 
   2021  1.1  mrg /* Returns the number of read and write operations in a PARTITION of
   2022  1.1  mrg    the RDG.  */
   2023  1.1  mrg 
   2024  1.1  mrg static int
   2025  1.1  mrg number_of_rw_in_partition (struct graph *rdg, partition *partition)
   2026  1.1  mrg {
   2027  1.1  mrg   int res = 0;
   2028  1.1  mrg   unsigned i;
   2029  1.1  mrg   bitmap_iterator ii;
   2030  1.1  mrg 
   2031  1.1  mrg   EXECUTE_IF_SET_IN_BITMAP (partition->stmts, 0, i, ii)
   2032  1.1  mrg     {
   2033  1.1  mrg       if (RDG_MEM_WRITE_STMT (rdg, i))
   2034  1.1  mrg 	++res;
   2035  1.1  mrg 
   2036  1.1  mrg       if (RDG_MEM_READS_STMT (rdg, i))
   2037  1.1  mrg 	++res;
   2038  1.1  mrg     }
   2039  1.1  mrg 
   2040  1.1  mrg   return res;
   2041  1.1  mrg }
   2042  1.1  mrg 
   2043  1.1  mrg /* Returns true when one of the PARTITIONS contains all the read or
   2044  1.1  mrg    write operations of RDG.  */
   2045  1.1  mrg 
   2046  1.1  mrg static bool
   2047  1.1  mrg partition_contains_all_rw (struct graph *rdg,
   2048  1.1  mrg 			   const vec<partition *> &partitions)
   2049  1.1  mrg {
   2050  1.1  mrg   int i;
   2051  1.1  mrg   partition *partition;
   2052  1.1  mrg   int nrw = number_of_rw_in_rdg (rdg);
   2053  1.1  mrg 
   2054  1.1  mrg   FOR_EACH_VEC_ELT (partitions, i, partition)
   2055  1.1  mrg     if (nrw == number_of_rw_in_partition (rdg, partition))
   2056  1.1  mrg       return true;
   2057  1.1  mrg 
   2058  1.1  mrg   return false;
   2059  1.1  mrg }
   2060  1.1  mrg 
   2061  1.1  mrg int
   2062  1.1  mrg loop_distribution::pg_add_dependence_edges (struct graph *rdg, int dir,
   2063  1.1  mrg 			 bitmap drs1, bitmap drs2, vec<ddr_p> *alias_ddrs)
   2064  1.1  mrg {
   2065  1.1  mrg   unsigned i, j;
   2066  1.1  mrg   bitmap_iterator bi, bj;
   2067  1.1  mrg   data_reference_p dr1, dr2, saved_dr1;
   2068  1.1  mrg 
    2069  1.1  mrg   /* Dependence direction: 0 is no dependence, -1 is backward,
    2070  1.1  mrg      1 is forward, 2 is both (we can stop then, merging will occur).  */
   2071  1.1  mrg   EXECUTE_IF_SET_IN_BITMAP (drs1, 0, i, bi)
   2072  1.1  mrg     {
   2073  1.1  mrg       dr1 = datarefs_vec[i];
   2074  1.1  mrg 
   2075  1.1  mrg       EXECUTE_IF_SET_IN_BITMAP (drs2, 0, j, bj)
   2076  1.1  mrg 	{
   2077  1.1  mrg 	  int res, this_dir = 1;
   2078  1.1  mrg 	  ddr_p ddr;
   2079  1.1  mrg 
   2080  1.1  mrg 	  dr2 = datarefs_vec[j];
   2081  1.1  mrg 
   2082  1.1  mrg 	  /* Skip all <read, read> data dependence.  */
   2083  1.1  mrg 	  if (DR_IS_READ (dr1) && DR_IS_READ (dr2))
   2084  1.1  mrg 	    continue;
   2085  1.1  mrg 
   2086  1.1  mrg 	  saved_dr1 = dr1;
   2087  1.1  mrg 	  /* Re-shuffle data-refs to be in topological order.  */
   2088  1.1  mrg 	  if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1))
   2089  1.1  mrg 	      > rdg_vertex_for_stmt (rdg, DR_STMT (dr2)))
   2090  1.1  mrg 	    {
   2091  1.1  mrg 	      std::swap (dr1, dr2);
   2092  1.1  mrg 	      this_dir = -this_dir;
   2093  1.1  mrg 	    }
   2094  1.1  mrg 	  ddr = get_data_dependence (rdg, dr1, dr2);
   2095  1.1  mrg 	  if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
   2096  1.1  mrg 	    {
   2097  1.1  mrg 	      this_dir = 0;
   2098  1.1  mrg 	      res = data_ref_compare_tree (DR_BASE_ADDRESS (dr1),
   2099  1.1  mrg 					   DR_BASE_ADDRESS (dr2));
    2100  1.1  mrg 	      /* Be conservative.  If the data references are not well
    2101  1.1  mrg 		 analyzed, or the two data references have the same base
    2102  1.1  mrg 		 address and offset, add the dependence and consider them
    2103  1.1  mrg 		 aliases of each other.  In other words, the dependence
    2104  1.1  mrg 		 cannot be resolved by runtime alias check.  */
   2105  1.1  mrg 	      if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
   2106  1.1  mrg 		  || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
   2107  1.1  mrg 		  || !DR_INIT (dr1) || !DR_INIT (dr2)
   2108  1.1  mrg 		  || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1))
   2109  1.1  mrg 		  || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2))
   2110  1.1  mrg 		  || res == 0)
   2111  1.1  mrg 		this_dir = 2;
   2112  1.1  mrg 	      /* Data dependence could be resolved by runtime alias check,
   2113  1.1  mrg 		 record it in ALIAS_DDRS.  */
   2114  1.1  mrg 	      else if (alias_ddrs != NULL)
   2115  1.1  mrg 		alias_ddrs->safe_push (ddr);
   2116  1.1  mrg 	      /* Or simply ignore it.  */
   2117  1.1  mrg 	    }
   2118  1.1  mrg 	  else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE)
   2119  1.1  mrg 	    {
    2120  1.1  mrg 	      /* Known dependences can still be unordered throughout the
   2121  1.1  mrg 		 iteration space, see gcc.dg/tree-ssa/ldist-16.c and
   2122  1.1  mrg 		 gcc.dg/tree-ssa/pr94969.c.  */
   2123  1.1  mrg 	      if (DDR_NUM_DIST_VECTS (ddr) != 1)
   2124  1.1  mrg 		this_dir = 2;
   2125  1.1  mrg 	      else
   2126  1.1  mrg 		{
   2127  1.1  mrg 		  /* If the overlap is exact preserve stmt order.  */
   2128  1.1  mrg 		  if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0),
   2129  1.1  mrg 					   DDR_NB_LOOPS (ddr)))
   2130  1.1  mrg 		    ;
    2131  1.1  mrg 		  /* Else, as the distance vector is lexicographically
    2132  1.1  mrg 		     positive, swap the dependence direction.  */
   2133  1.1  mrg 		  else
   2134  1.1  mrg 		    {
   2135  1.1  mrg 		      if (DDR_REVERSED_P (ddr))
   2136  1.1  mrg 			this_dir = -this_dir;
   2137  1.1  mrg 		      this_dir = -this_dir;
   2138  1.1  mrg 		    }
    2139  1.1  mrg 		  /* When the dependence distance of the innermost common
   2140  1.1  mrg 		     loop of the DRs is zero we have a conflict.  This is
   2141  1.1  mrg 		     due to wonky dependence analysis which sometimes
   2142  1.1  mrg 		     ends up using a zero distance in place of unknown.  */
   2143  1.1  mrg 		  auto l1 = gimple_bb (DR_STMT (dr1))->loop_father;
   2144  1.1  mrg 		  auto l2 = gimple_bb (DR_STMT (dr2))->loop_father;
   2145  1.1  mrg 		  int idx = index_in_loop_nest (find_common_loop (l1, l2)->num,
   2146  1.1  mrg 						DDR_LOOP_NEST (ddr));
   2147  1.1  mrg 		  if (DDR_DIST_VECT (ddr, 0)[idx] == 0
   2148  1.1  mrg 		      /* Unless it is the outermost loop which is the one
   2149  1.1  mrg 			 we eventually distribute.  */
   2150  1.1  mrg 		      && idx != 0)
   2151  1.1  mrg 		    this_dir = 2;
   2152  1.1  mrg 		}
   2153  1.1  mrg 	    }
   2154  1.1  mrg 	  else
   2155  1.1  mrg 	    this_dir = 0;
   2156  1.1  mrg 	  if (this_dir == 2)
   2157  1.1  mrg 	    return 2;
   2158  1.1  mrg 	  else if (dir == 0)
   2159  1.1  mrg 	    dir = this_dir;
   2160  1.1  mrg 	  else if (this_dir != 0 && dir != this_dir)
   2161  1.1  mrg 	    return 2;
   2162  1.1  mrg 	  /* Shuffle "back" dr1.  */
   2163  1.1  mrg 	  dr1 = saved_dr1;
   2164  1.1  mrg 	}
   2165  1.1  mrg     }
   2166  1.1  mrg   return dir;
   2167  1.1  mrg }
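
/* For example, if partition 1 writes A[i], partition 2 reads A[i] in the
   same iteration, and partition 1's statement comes first,

     |A[i] = ...;      <- partition 1
     |... = A[i] + 1;  <- partition 2

   the overlap is exact, so statement order is preserved and 1 (forward)
   is returned; as soon as dependences in both directions are seen, the
   function returns 2 and the caller must fuse the two partitions.  */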
   2168  1.1  mrg 
   2169  1.1  mrg /* Compare postorder number of the partition graph vertices V1 and V2.  */
   2170  1.1  mrg 
   2171  1.1  mrg static int
   2172  1.1  mrg pgcmp (const void *v1_, const void *v2_)
   2173  1.1  mrg {
   2174  1.1  mrg   const vertex *v1 = (const vertex *)v1_;
   2175  1.1  mrg   const vertex *v2 = (const vertex *)v2_;
   2176  1.1  mrg   return v2->post - v1->post;
   2177  1.1  mrg }
   2178  1.1  mrg 
   2179  1.1  mrg /* Data attached to vertices of partition dependence graph.  */
   2180  1.1  mrg struct pg_vdata
   2181  1.1  mrg {
   2182  1.1  mrg   /* ID of the corresponding partition.  */
   2183  1.1  mrg   int id;
   2184  1.1  mrg   /* The partition.  */
   2185  1.1  mrg   struct partition *partition;
   2186  1.1  mrg };
   2187  1.1  mrg 
   2188  1.1  mrg /* Data attached to edges of partition dependence graph.  */
   2189  1.1  mrg struct pg_edata
   2190  1.1  mrg {
   2191  1.1  mrg   /* If the dependence edge can be resolved by runtime alias check,
   2192  1.1  mrg      this vector contains data dependence relations for runtime alias
   2193  1.1  mrg      check.  On the other hand, if the dependence edge is introduced
   2194  1.1  mrg      because of compilation time known data dependence, this vector
   2195  1.1  mrg      contains nothing.  */
   2196  1.1  mrg   vec<ddr_p> alias_ddrs;
   2197  1.1  mrg };
   2198  1.1  mrg 
   2199  1.1  mrg /* Callback data for traversing edges in graph.  */
   2200  1.1  mrg struct pg_edge_callback_data
   2201  1.1  mrg {
    2202  1.1  mrg   /* Bitmap of strongly connected components that should be merged.  */
   2203  1.1  mrg   bitmap sccs_to_merge;
    2204  1.1  mrg   /* Array containing component information for all vertices.  */
   2205  1.1  mrg   int *vertices_component;
   2206  1.1  mrg   /* Vector to record all data dependence relations which are needed
   2207  1.1  mrg      to break strong connected components by runtime alias checks.  */
   2208  1.1  mrg   vec<ddr_p> *alias_ddrs;
   2209  1.1  mrg };
   2210  1.1  mrg 
    2211  1.1  mrg /* Initialize the vertices' data for partition dependence graph PG with
   2212  1.1  mrg    PARTITIONS.  */
   2213  1.1  mrg 
   2214  1.1  mrg static void
   2215  1.1  mrg init_partition_graph_vertices (struct graph *pg,
   2216  1.1  mrg 			       vec<struct partition *> *partitions)
   2217  1.1  mrg {
   2218  1.1  mrg   int i;
   2219  1.1  mrg   partition *partition;
   2220  1.1  mrg   struct pg_vdata *data;
   2221  1.1  mrg 
   2222  1.1  mrg   for (i = 0; partitions->iterate (i, &partition); ++i)
   2223  1.1  mrg     {
   2224  1.1  mrg       data = new pg_vdata;
   2225  1.1  mrg       pg->vertices[i].data = data;
   2226  1.1  mrg       data->id = i;
   2227  1.1  mrg       data->partition = partition;
   2228  1.1  mrg     }
   2229  1.1  mrg }
   2230  1.1  mrg 
   2231  1.1  mrg /* Add edge <I, J> to partition dependence graph PG.  Attach vector of data
   2232  1.1  mrg    dependence relations to the EDGE if DDRS isn't NULL.  */
   2233  1.1  mrg 
   2234  1.1  mrg static void
   2235  1.1  mrg add_partition_graph_edge (struct graph *pg, int i, int j, vec<ddr_p> *ddrs)
   2236  1.1  mrg {
   2237  1.1  mrg   struct graph_edge *e = add_edge (pg, i, j);
   2238  1.1  mrg 
   2239  1.1  mrg   /* If the edge is attached with data dependence relations, it means this
   2240  1.1  mrg      dependence edge can be resolved by runtime alias checks.  */
   2241  1.1  mrg   if (ddrs != NULL)
   2242  1.1  mrg     {
   2243  1.1  mrg       struct pg_edata *data = new pg_edata;
   2244  1.1  mrg 
   2245  1.1  mrg       gcc_assert (ddrs->length () > 0);
   2246  1.1  mrg       e->data = data;
   2247  1.1  mrg       data->alias_ddrs = vNULL;
   2248  1.1  mrg       data->alias_ddrs.safe_splice (*ddrs);
   2249  1.1  mrg     }
   2250  1.1  mrg }
   2251  1.1  mrg 
    2252  1.1  mrg /* Callback function for the graph traversal algorithm.  It returns true
    2253  1.1  mrg    if edge E should be skipped when traversing the graph.  */
   2254  1.1  mrg 
   2255  1.1  mrg static bool
   2256  1.1  mrg pg_skip_alias_edge (struct graph_edge *e)
   2257  1.1  mrg {
   2258  1.1  mrg   struct pg_edata *data = (struct pg_edata *)e->data;
   2259  1.1  mrg   return (data != NULL && data->alias_ddrs.length () > 0);
   2260  1.1  mrg }
   2261  1.1  mrg 
   2262  1.1  mrg /* Callback function freeing data attached to edge E of graph.  */
   2263  1.1  mrg 
   2264  1.1  mrg static void
   2265  1.1  mrg free_partition_graph_edata_cb (struct graph *, struct graph_edge *e, void *)
   2266  1.1  mrg {
   2267  1.1  mrg   if (e->data != NULL)
   2268  1.1  mrg     {
   2269  1.1  mrg       struct pg_edata *data = (struct pg_edata *)e->data;
   2270  1.1  mrg       data->alias_ddrs.release ();
   2271  1.1  mrg       delete data;
   2272  1.1  mrg     }
   2273  1.1  mrg }
   2274  1.1  mrg 
    2275  1.1  mrg /* Free data attached to the vertices of partition dependence graph PG.  */
   2276  1.1  mrg 
   2277  1.1  mrg static void
   2278  1.1  mrg free_partition_graph_vdata (struct graph *pg)
   2279  1.1  mrg {
   2280  1.1  mrg   int i;
   2281  1.1  mrg   struct pg_vdata *data;
   2282  1.1  mrg 
   2283  1.1  mrg   for (i = 0; i < pg->n_vertices; ++i)
   2284  1.1  mrg     {
   2285  1.1  mrg       data = (struct pg_vdata *)pg->vertices[i].data;
   2286  1.1  mrg       delete data;
   2287  1.1  mrg     }
   2288  1.1  mrg }
   2289  1.1  mrg 
   2290  1.1  mrg /* Build and return partition dependence graph for PARTITIONS.  RDG is
   2291  1.1  mrg    reduced dependence graph for the loop to be distributed.  If IGNORE_ALIAS_P
   2292  1.1  mrg    is true, data dependence caused by possible alias between references
    2293  1.1  mrg    is ignored, as if it doesn't exist at all; otherwise all dependences
   2294  1.1  mrg    are considered.  */
   2295  1.1  mrg 
   2296  1.1  mrg struct graph *
   2297  1.1  mrg loop_distribution::build_partition_graph (struct graph *rdg,
   2298  1.1  mrg 					  vec<struct partition *> *partitions,
   2299  1.1  mrg 					  bool ignore_alias_p)
   2300  1.1  mrg {
   2301  1.1  mrg   int i, j;
   2302  1.1  mrg   struct partition *partition1, *partition2;
   2303  1.1  mrg   graph *pg = new_graph (partitions->length ());
   2304  1.1  mrg   auto_vec<ddr_p> alias_ddrs, *alias_ddrs_p;
   2305  1.1  mrg 
   2306  1.1  mrg   alias_ddrs_p = ignore_alias_p ? NULL : &alias_ddrs;
   2307  1.1  mrg 
   2308  1.1  mrg   init_partition_graph_vertices (pg, partitions);
   2309  1.1  mrg 
   2310  1.1  mrg   for (i = 0; partitions->iterate (i, &partition1); ++i)
   2311  1.1  mrg     {
   2312  1.1  mrg       for (j = i + 1; partitions->iterate (j, &partition2); ++j)
   2313  1.1  mrg 	{
    2314  1.1  mrg 	  /* Dependence direction: 0 is no dependence, -1 is backward,
    2315  1.1  mrg 	     1 is forward, 2 is both (we can stop then, merging will occur).  */
   2316  1.1  mrg 	  int dir = 0;
   2317  1.1  mrg 
   2318  1.1  mrg 	  /* If the first partition has reduction, add back edge; if the
   2319  1.1  mrg 	     second partition has reduction, add forth edge.  This makes
   2320  1.1  mrg 	     sure that reduction partition will be sorted as the last one.  */
   2321  1.1  mrg 	  if (partition_reduction_p (partition1))
   2322  1.1  mrg 	    dir = -1;
   2323  1.1  mrg 	  else if (partition_reduction_p (partition2))
   2324  1.1  mrg 	    dir = 1;
   2325  1.1  mrg 
   2326  1.1  mrg 	  /* Cleanup the temporary vector.  */
   2327  1.1  mrg 	  alias_ddrs.truncate (0);
   2328  1.1  mrg 
   2329  1.1  mrg 	  dir = pg_add_dependence_edges (rdg, dir, partition1->datarefs,
   2330  1.1  mrg 					 partition2->datarefs, alias_ddrs_p);
   2331  1.1  mrg 
    2332  1.1  mrg 	  /* Add an edge to the partition graph if there exists a dependence.
    2333  1.1  mrg 	     There are two types of edges.  One type is caused by a compilation
    2334  1.1  mrg 	     time known dependence and cannot be resolved by runtime alias
    2335  1.1  mrg 	     check.  The other type can be resolved by runtime alias
    2336  1.1  mrg 	     check.  */
   2337  1.1  mrg 	  if (dir == 1 || dir == 2
   2338  1.1  mrg 	      || alias_ddrs.length () > 0)
   2339  1.1  mrg 	    {
   2340  1.1  mrg 	      /* Attach data dependence relations to edge that can be resolved
   2341  1.1  mrg 		 by runtime alias check.  */
   2342  1.1  mrg 	      bool alias_edge_p = (dir != 1 && dir != 2);
   2343  1.1  mrg 	      add_partition_graph_edge (pg, i, j,
   2344  1.1  mrg 					(alias_edge_p) ? &alias_ddrs : NULL);
   2345  1.1  mrg 	    }
   2346  1.1  mrg 	  if (dir == -1 || dir == 2
   2347  1.1  mrg 	      || alias_ddrs.length () > 0)
   2348  1.1  mrg 	    {
   2349  1.1  mrg 	      /* Attach data dependence relations to edge that can be resolved
   2350  1.1  mrg 		 by runtime alias check.  */
   2351  1.1  mrg 	      bool alias_edge_p = (dir != -1 && dir != 2);
   2352  1.1  mrg 	      add_partition_graph_edge (pg, j, i,
   2353  1.1  mrg 					(alias_edge_p) ? &alias_ddrs : NULL);
   2354  1.1  mrg 	    }
   2355  1.1  mrg 	}
   2356  1.1  mrg     }
   2357  1.1  mrg   return pg;
   2358  1.1  mrg }
   2359  1.1  mrg 
   2360  1.1  mrg /* Sort partitions in PG in descending post order and store them in
   2361  1.1  mrg    PARTITIONS.  */
   2362  1.1  mrg 
   2363  1.1  mrg static void
   2364  1.1  mrg sort_partitions_by_post_order (struct graph *pg,
   2365  1.1  mrg 			       vec<struct partition *> *partitions)
   2366  1.1  mrg {
   2367  1.1  mrg   int i;
   2368  1.1  mrg   struct pg_vdata *data;
   2369  1.1  mrg 
   2370  1.1  mrg   /* Now order the remaining nodes in descending postorder.  */
   2371  1.1  mrg   qsort (pg->vertices, pg->n_vertices, sizeof (vertex), pgcmp);
   2372  1.1  mrg   partitions->truncate (0);
   2373  1.1  mrg   for (i = 0; i < pg->n_vertices; ++i)
   2374  1.1  mrg     {
   2375  1.1  mrg       data = (struct pg_vdata *)pg->vertices[i].data;
   2376  1.1  mrg       if (data->partition)
   2377  1.1  mrg 	partitions->safe_push (data->partition);
   2378  1.1  mrg     }
   2379  1.1  mrg }
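
                  /* For example (illustrative): if the vertices for partitions P0, P1
                     and P2 end up with post numbers 5, 1 and 3, the descending order
                     5, 3, 1 emits P0, P2, P1, i.e. a topological order of the acyclic
                     partition dependence graph.  */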
   2380  1.1  mrg 
   2381  1.1  mrg void
   2382  1.1  mrg loop_distribution::merge_dep_scc_partitions (struct graph *rdg,
   2383  1.1  mrg 					     vec<struct partition *> *partitions,
   2384  1.1  mrg 					     bool ignore_alias_p)
   2385  1.1  mrg {
   2386  1.1  mrg   struct partition *partition1, *partition2;
   2387  1.1  mrg   struct pg_vdata *data;
   2388  1.1  mrg   graph *pg = build_partition_graph (rdg, partitions, ignore_alias_p);
   2389  1.1  mrg   int i, j, num_sccs = graphds_scc (pg, NULL);
   2390  1.1  mrg 
   2391  1.1  mrg   /* A strongly connected component means a dependence cycle, which we
   2392  1.1  mrg      cannot distribute.  So fuse its partitions together.  */
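                    /* Illustrative example: if partition P0 stores to A[] that P1
                       loads from, while P1 stores to B[] that P0 loads from, both
                       vertices receive the same component number and P1 is fused
                       into P0 below as PTYPE_SEQUENTIAL.  */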
   2393  1.1  mrg   if ((unsigned) num_sccs < partitions->length ())
   2394  1.1  mrg     {
   2395  1.1  mrg       for (i = 0; i < num_sccs; ++i)
   2396  1.1  mrg 	{
   2397  1.1  mrg 	  for (j = 0; partitions->iterate (j, &partition1); ++j)
   2398  1.1  mrg 	    if (pg->vertices[j].component == i)
   2399  1.1  mrg 	      break;
   2400  1.1  mrg 	  for (j = j + 1; partitions->iterate (j, &partition2); ++j)
   2401  1.1  mrg 	    if (pg->vertices[j].component == i)
   2402  1.1  mrg 	      {
   2403  1.1  mrg 		partition_merge_into (NULL, partition1,
   2404  1.1  mrg 				      partition2, FUSE_SAME_SCC);
   2405  1.1  mrg 		partition1->type = PTYPE_SEQUENTIAL;
   2406  1.1  mrg 		(*partitions)[j] = NULL;
   2407  1.1  mrg 		partition_free (partition2);
   2408  1.1  mrg 		data = (struct pg_vdata *)pg->vertices[j].data;
   2409  1.1  mrg 		data->partition = NULL;
   2410  1.1  mrg 	      }
   2411  1.1  mrg 	}
   2412  1.1  mrg     }
   2413  1.1  mrg 
   2414  1.1  mrg   sort_partitions_by_post_order (pg, partitions);
   2415  1.1  mrg   gcc_assert (partitions->length () == (unsigned)num_sccs);
   2416  1.1  mrg   free_partition_graph_vdata (pg);
   2417  1.1  mrg   for_each_edge (pg, free_partition_graph_edata_cb, NULL);
   2418  1.1  mrg   free_graph (pg);
   2419  1.1  mrg }
   2420  1.1  mrg 
   2421  1.1  mrg /* Callback function for traversing edge E in graph G.  DATA is private
   2422  1.1  mrg    callback data.  */
   2423  1.1  mrg 
   2424  1.1  mrg static void
   2425  1.1  mrg pg_collect_alias_ddrs (struct graph *g, struct graph_edge *e, void *data)
   2426  1.1  mrg {
   2427  1.1  mrg   int i, j, component;
   2428  1.1  mrg   struct pg_edge_callback_data *cbdata;
   2429  1.1  mrg   struct pg_edata *edata = (struct pg_edata *) e->data;
   2430  1.1  mrg 
   2431  1.1  mrg   /* If the edge doesn't have attached data dependence relations, it
   2432  1.1  mrg      represents compilation time known dependences.  This type of
   2433  1.1  mrg      dependence cannot be resolved by a runtime alias check.  */
   2434  1.1  mrg   if (edata == NULL || edata->alias_ddrs.length () == 0)
   2435  1.1  mrg     return;
   2436  1.1  mrg 
   2437  1.1  mrg   cbdata = (struct pg_edge_callback_data *) data;
   2438  1.1  mrg   i = e->src;
   2439  1.1  mrg   j = e->dest;
   2440  1.1  mrg   component = cbdata->vertices_component[i];
   2441  1.1  mrg   /* Vertices are topologically sorted according to compilation time
   2442  1.1  mrg      known dependences, so we can break strongly connected components
   2443  1.1  mrg      by removing edges of the opposite direction, i.e., edges pointing
   2444  1.1  mrg      from a vertex with a smaller post number to a vertex with a bigger
   2445  1.1  mrg      post number.  */
   2446  1.1  mrg   if (g->vertices[i].post < g->vertices[j].post
   2447  1.1  mrg       /* We only need to remove edges connecting vertices in the same
   2448  1.1  mrg 	 strong connected component to break it.  */
   2449  1.1  mrg       && component == cbdata->vertices_component[j]
   2450  1.1  mrg       /* Check if we want to break the strong connected component or not.  */
   2451  1.1  mrg       && !bitmap_bit_p (cbdata->sccs_to_merge, component))
   2452  1.1  mrg     cbdata->alias_ddrs->safe_splice (edata->alias_ddrs);
   2453  1.1  mrg }
   2454  1.1  mrg 
   2455  1.1  mrg /* Callback function for traversing edge E.  DATA is private
   2456  1.1  mrg    callback data.  */
   2457  1.1  mrg 
   2458  1.1  mrg static void
   2459  1.1  mrg pg_unmark_merged_alias_ddrs (struct graph *, struct graph_edge *e, void *data)
   2460  1.1  mrg {
   2461  1.1  mrg   int i, j, component;
   2462  1.1  mrg   struct pg_edge_callback_data *cbdata;
   2463  1.1  mrg   struct pg_edata *edata = (struct pg_edata *) e->data;
   2464  1.1  mrg 
   2465  1.1  mrg   if (edata == NULL || edata->alias_ddrs.length () == 0)
   2466  1.1  mrg     return;
   2467  1.1  mrg 
   2468  1.1  mrg   cbdata = (struct pg_edge_callback_data *) data;
   2469  1.1  mrg   i = e->src;
   2470  1.1  mrg   j = e->dest;
   2471  1.1  mrg   component = cbdata->vertices_component[i];
   2472  1.1  mrg   /* Make sure to not skip vertices inside SCCs we are going to merge.  */
   2473  1.1  mrg   if (component == cbdata->vertices_component[j]
   2474  1.1  mrg       && bitmap_bit_p (cbdata->sccs_to_merge, component))
   2475  1.1  mrg     {
   2476  1.1  mrg       edata->alias_ddrs.release ();
   2477  1.1  mrg       delete edata;
   2478  1.1  mrg       e->data = NULL;
   2479  1.1  mrg     }
   2480  1.1  mrg }
   2481  1.1  mrg 
   2482  1.1  mrg /* This is the main function breaking strongly connected components in
   2483  1.1  mrg    PARTITIONS, given the reduced dependence graph RDG.  Store data
   2484  1.1  mrg    dependence relations for runtime alias checks in ALIAS_DDRS.  */
   2485  1.1  mrg void
   2486  1.1  mrg loop_distribution::break_alias_scc_partitions (struct graph *rdg,
   2487  1.1  mrg 					       vec<struct partition *> *partitions,
   2488  1.1  mrg 					       vec<ddr_p> *alias_ddrs)
   2489  1.1  mrg {
   2490  1.1  mrg   int i, j, k, num_sccs, num_sccs_no_alias = 0;
   2491  1.1  mrg   /* Build partition dependence graph.  */
   2492  1.1  mrg   graph *pg = build_partition_graph (rdg, partitions, false);
   2493  1.1  mrg 
   2494  1.1  mrg   alias_ddrs->truncate (0);
   2495  1.1  mrg   /* Find strongly connected components in the graph, with all dependence
   2496  1.1  mrg      edges considered.  */
   2497  1.1  mrg   num_sccs = graphds_scc (pg, NULL);
   2498  1.1  mrg   /* All SCCs can now be broken by runtime alias checks because SCCs caused
   2499  1.1  mrg      by compilation time known dependences are merged before this function.  */
   2500  1.1  mrg   if ((unsigned) num_sccs < partitions->length ())
   2501  1.1  mrg     {
   2502  1.1  mrg       struct pg_edge_callback_data cbdata;
   2503  1.1  mrg       auto_bitmap sccs_to_merge;
   2504  1.1  mrg       auto_vec<enum partition_type> scc_types;
   2505  1.1  mrg       struct partition *partition, *first;
   2506  1.1  mrg 
   2507  1.1  mrg       /* If all partitions in an SCC have the same type, we can simply merge
   2508  1.1  mrg 	 the SCC.  This loop finds out such SCCs and records them in a bitmap.  */
   2509  1.1  mrg       bitmap_set_range (sccs_to_merge, 0, (unsigned) num_sccs);
   2510  1.1  mrg       for (i = 0; i < num_sccs; ++i)
   2511  1.1  mrg 	{
   2512  1.1  mrg 	  for (j = 0; partitions->iterate (j, &first); ++j)
   2513  1.1  mrg 	    if (pg->vertices[j].component == i)
   2514  1.1  mrg 	      break;
   2515  1.1  mrg 
   2516  1.1  mrg 	  bool same_type = true, all_builtins = partition_builtin_p (first);
   2517  1.1  mrg 	  for (++j; partitions->iterate (j, &partition); ++j)
   2518  1.1  mrg 	    {
   2519  1.1  mrg 	      if (pg->vertices[j].component != i)
   2520  1.1  mrg 		continue;
   2521  1.1  mrg 
   2522  1.1  mrg 	      if (first->type != partition->type)
   2523  1.1  mrg 		{
   2524  1.1  mrg 		  same_type = false;
   2525  1.1  mrg 		  break;
   2526  1.1  mrg 		}
   2527  1.1  mrg 	      all_builtins &= partition_builtin_p (partition);
   2528  1.1  mrg 	    }
   2529  1.1  mrg 	  /* Merge an SCC if all partitions in the SCC have the same type,
   2530  1.1  mrg 	     even though the resulting partition is sequential, because the
   2531  1.1  mrg 	     vectorizer can do a better runtime alias check.  One exception
   2532  1.1  mrg 	     is when all partitions in the SCC are builtins.  */
   2533  1.1  mrg 	  if (!same_type || all_builtins)
   2534  1.1  mrg 	    bitmap_clear_bit (sccs_to_merge, i);
   2535  1.1  mrg 	}
   2536  1.1  mrg 
   2537  1.1  mrg       /* Initialize callback data for traversing.  */
   2538  1.1  mrg       cbdata.sccs_to_merge = sccs_to_merge;
   2539  1.1  mrg       cbdata.alias_ddrs = alias_ddrs;
   2540  1.1  mrg       cbdata.vertices_component = XNEWVEC (int, pg->n_vertices);
   2541  1.1  mrg       /* Record the component information, which will be corrupted by the
   2542  1.1  mrg 	 next SCC finding call on the graph.  */
   2543  1.1  mrg       for (i = 0; i < pg->n_vertices; ++i)
   2544  1.1  mrg 	cbdata.vertices_component[i] = pg->vertices[i].component;
   2545  1.1  mrg 
   2546  1.1  mrg       /* Collect data dependences for runtime alias checks to break SCCs.  */
   2547  1.1  mrg       if (bitmap_count_bits (sccs_to_merge) != (unsigned) num_sccs)
   2548  1.1  mrg 	{
   2549  1.1  mrg 	  /* For SCCs we want to merge, clear all alias_ddrs for edges
   2550  1.1  mrg 	     inside the component.  */
   2551  1.1  mrg 	  for_each_edge (pg, pg_unmark_merged_alias_ddrs, &cbdata);
   2552  1.1  mrg 
   2553  1.1  mrg 	  /* Run the SCC finding algorithm again, with alias dependence edges
   2554  1.1  mrg 	     skipped.  This is to topologically sort partitions according to
   2555  1.1  mrg 	     compilation time known dependences.  Note the topological order
   2556  1.1  mrg 	     is stored in the form of pg's post order numbers.  */
   2557  1.1  mrg 	  num_sccs_no_alias = graphds_scc (pg, NULL, pg_skip_alias_edge);
   2558  1.1  mrg 	  /* We cannot assert partitions->length () == num_sccs_no_alias
   2559  1.1  mrg 	     since we are not ignoring alias edges in cycles we are
   2560  1.1  mrg 	     going to merge.  That's required to compute correct postorder.  */
   2561  1.1  mrg 	  /* With topological order, we can construct two subgraphs L and R.
   2562  1.1  mrg 	     L contains edge <x, y> where x < y in terms of post order, while
   2563  1.1  mrg 	     R contains edge <x, y> where x > y.  Edges for compilation time
   2564  1.1  mrg 	     known dependence all fall in R, so we break SCCs by removing all
   2565  1.1  mrg 	     (alias) edges in subgraph L.  */
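                            /* For instance (illustrative): with post numbers post(x) == 2
                               and post(y) == 5, an alias edge <x, y> lies in L; its
                               dependence relations are collected below and later become a
                               runtime alias check, breaking the SCC containing x and y.  */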
   2566  1.1  mrg 	  for_each_edge (pg, pg_collect_alias_ddrs, &cbdata);
   2567  1.1  mrg 	}
   2568  1.1  mrg 
   2569  1.1  mrg       /* For each SCC that doesn't need to be broken, merge it.  */
   2570  1.1  mrg       for (i = 0; i < num_sccs; ++i)
   2571  1.1  mrg 	{
   2572  1.1  mrg 	  if (!bitmap_bit_p (sccs_to_merge, i))
   2573  1.1  mrg 	    continue;
   2574  1.1  mrg 
   2575  1.1  mrg 	  for (j = 0; partitions->iterate (j, &first); ++j)
   2576  1.1  mrg 	    if (cbdata.vertices_component[j] == i)
   2577  1.1  mrg 	      break;
   2578  1.1  mrg 	  for (k = j + 1; partitions->iterate (k, &partition); ++k)
   2579  1.1  mrg 	    {
   2580  1.1  mrg 	      struct pg_vdata *data;
   2581  1.1  mrg 
   2582  1.1  mrg 	      if (cbdata.vertices_component[k] != i)
   2583  1.1  mrg 		continue;
   2584  1.1  mrg 
   2585  1.1  mrg 	      partition_merge_into (NULL, first, partition, FUSE_SAME_SCC);
   2586  1.1  mrg 	      (*partitions)[k] = NULL;
   2587  1.1  mrg 	      partition_free (partition);
   2588  1.1  mrg 	      data = (struct pg_vdata *)pg->vertices[k].data;
   2589  1.1  mrg 	      gcc_assert (data->id == k);
   2590  1.1  mrg 	      data->partition = NULL;
   2591  1.1  mrg 	      /* The result partition of merged SCC must be sequential.  */
   2592  1.1  mrg 	      first->type = PTYPE_SEQUENTIAL;
   2593  1.1  mrg 	    }
   2594  1.1  mrg 	}
   2595  1.1  mrg       /* If the reduction partition's SCC is broken by runtime alias checks,
   2596  1.1  mrg 	 we force a negative post order on it, making sure it will be
   2597  1.1  mrg 	 scheduled last.  */
   2598  1.1  mrg       if (num_sccs_no_alias > 0)
   2599  1.1  mrg 	{
   2600  1.1  mrg 	  j = -1;
   2601  1.1  mrg 	  for (i = 0; i < pg->n_vertices; ++i)
   2602  1.1  mrg 	    {
   2603  1.1  mrg 	      struct pg_vdata *data = (struct pg_vdata *)pg->vertices[i].data;
   2604  1.1  mrg 	      if (data->partition && partition_reduction_p (data->partition))
   2605  1.1  mrg 		{
   2606  1.1  mrg 		  gcc_assert (j == -1);
   2607  1.1  mrg 		  j = i;
   2608  1.1  mrg 		}
   2609  1.1  mrg 	    }
   2610  1.1  mrg 	  if (j >= 0)
   2611  1.1  mrg 	    pg->vertices[j].post = -1;
   2612  1.1  mrg 	}
   2613  1.1  mrg 
   2614  1.1  mrg       free (cbdata.vertices_component);
   2615  1.1  mrg     }
   2616  1.1  mrg 
   2617  1.1  mrg   sort_partitions_by_post_order (pg, partitions);
   2618  1.1  mrg   free_partition_graph_vdata (pg);
   2619  1.1  mrg   for_each_edge (pg, free_partition_graph_edata_cb, NULL);
   2620  1.1  mrg   free_graph (pg);
   2621  1.1  mrg 
   2622  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   2623  1.1  mrg     {
   2624  1.1  mrg       fprintf (dump_file, "Possible alias data dependence to break:\n");
   2625  1.1  mrg       dump_data_dependence_relations (dump_file, *alias_ddrs);
   2626  1.1  mrg     }
   2627  1.1  mrg }
   2628  1.1  mrg 
   2629  1.1  mrg /* Compute and return an expression whose value is the segment length which
   2630  1.1  mrg    will be accessed by DR in NITERS iterations.  */
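                  /* For example (illustrative): for an `int' access with DR_STEP 4
                     and NITERS 16, the segment length is 4 * (16 - 1) == 60 bytes,
                     i.e. the distance between the first and the last accessed
                     address; the access size itself is added separately by the
                     caller.  */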
   2631  1.1  mrg 
   2632  1.1  mrg static tree
   2633  1.1  mrg data_ref_segment_size (struct data_reference *dr, tree niters)
   2634  1.1  mrg {
   2635  1.1  mrg   niters = size_binop (MINUS_EXPR,
   2636  1.1  mrg 		       fold_convert (sizetype, niters),
   2637  1.1  mrg 		       size_one_node);
   2638  1.1  mrg   return size_binop (MULT_EXPR,
   2639  1.1  mrg 		     fold_convert (sizetype, DR_STEP (dr)),
   2640  1.1  mrg 		     fold_convert (sizetype, niters));
   2641  1.1  mrg }
   2642  1.1  mrg 
   2643  1.1  mrg /* Return true if LOOP's latch is dominated by the statement for data
   2644  1.1  mrg    reference DR.  */
   2645  1.1  mrg 
   2646  1.1  mrg static inline bool
   2647  1.1  mrg latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
   2648  1.1  mrg {
   2649  1.1  mrg   return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
   2650  1.1  mrg 			 gimple_bb (DR_STMT (dr)));
   2651  1.1  mrg }
   2652  1.1  mrg 
   2653  1.1  mrg /* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
   2654  1.1  mrg    data dependence relations ALIAS_DDRS.  */
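                  /* Schematically (illustrative), each pair describes a condition
                     roughly of the form

                       addr_a + seg_len_a <= addr_b || addr_b + seg_len_b <= addr_a

                     i.e. the two accessed segments do not overlap.  A reference whose
                     statement dominates the loop's exit test executes NITERS + 1 times,
                     since NITERS only counts latch executions, hence the longer segment
                     for it.  */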
   2655  1.1  mrg 
   2656  1.1  mrg static void
   2657  1.1  mrg compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
   2658  1.1  mrg 			   vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
   2659  1.1  mrg {
   2660  1.1  mrg   unsigned int i;
   2661  1.1  mrg   unsigned HOST_WIDE_INT factor = 1;
   2662  1.1  mrg   tree niters_plus_one, niters = number_of_latch_executions (loop);
   2663  1.1  mrg 
   2664  1.1  mrg   gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
   2665  1.1  mrg   niters = fold_convert (sizetype, niters);
   2666  1.1  mrg   niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
   2667  1.1  mrg 
   2668  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   2669  1.1  mrg     fprintf (dump_file, "Creating alias check pairs:\n");
   2670  1.1  mrg 
   2671  1.1  mrg   /* Iterate all data dependence relations and compute alias check pairs.  */
   2672  1.1  mrg   for (i = 0; i < alias_ddrs->length (); i++)
   2673  1.1  mrg     {
   2674  1.1  mrg       ddr_p ddr = (*alias_ddrs)[i];
   2675  1.1  mrg       struct data_reference *dr_a = DDR_A (ddr);
   2676  1.1  mrg       struct data_reference *dr_b = DDR_B (ddr);
   2677  1.1  mrg       tree seg_length_a, seg_length_b;
   2678  1.1  mrg 
   2679  1.1  mrg       if (latch_dominated_by_data_ref (loop, dr_a))
   2680  1.1  mrg 	seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
   2681  1.1  mrg       else
   2682  1.1  mrg 	seg_length_a = data_ref_segment_size (dr_a, niters);
   2683  1.1  mrg 
   2684  1.1  mrg       if (latch_dominated_by_data_ref (loop, dr_b))
   2685  1.1  mrg 	seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
   2686  1.1  mrg       else
   2687  1.1  mrg 	seg_length_b = data_ref_segment_size (dr_b, niters);
   2688  1.1  mrg 
   2689  1.1  mrg       unsigned HOST_WIDE_INT access_size_a
   2690  1.1  mrg 	= tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
   2691  1.1  mrg       unsigned HOST_WIDE_INT access_size_b
   2692  1.1  mrg 	= tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
   2693  1.1  mrg       unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
   2694  1.1  mrg       unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
   2695  1.1  mrg 
   2696  1.1  mrg       dr_with_seg_len_pair_t dr_with_seg_len_pair
   2697  1.1  mrg 	(dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
   2698  1.1  mrg 	 dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
   2699  1.1  mrg 	 /* ??? Would WELL_ORDERED be safe?  */
   2700  1.1  mrg 	 dr_with_seg_len_pair_t::REORDERED);
   2701  1.1  mrg 
   2702  1.1  mrg       comp_alias_pairs->safe_push (dr_with_seg_len_pair);
   2703  1.1  mrg     }
   2704  1.1  mrg 
   2705  1.1  mrg   if (tree_fits_uhwi_p (niters))
   2706  1.1  mrg     factor = tree_to_uhwi (niters);
   2707  1.1  mrg 
   2708  1.1  mrg   /* Prune alias check pairs.  */
   2709  1.1  mrg   prune_runtime_alias_test_list (comp_alias_pairs, factor);
   2710  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   2711  1.1  mrg     fprintf (dump_file,
   2712  1.1  mrg 	     "Improved number of alias checks from %d to %d\n",
   2713  1.1  mrg 	     alias_ddrs->length (), comp_alias_pairs->length ());
   2714  1.1  mrg }
   2715  1.1  mrg 
   2716  1.1  mrg /* Given data dependence relations in ALIAS_DDRS, generate runtime alias
   2717  1.1  mrg    checks and version LOOP under the condition of these checks.  */
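                  /* Schematic shape of the result (illustrative only):

                       if (<runtime alias checks hold>)
                         <LOOP, which is then distributed and may be vectorized>
                       else
                         <NLOOP, an unmodified copy with dont_vectorize set>

                     When the check is cancelable, the guarding value is the result
                     of an IFN_LOOP_DIST_ALIAS call which the vectorizer folds away
                     later.  */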
   2718  1.1  mrg 
   2719  1.1  mrg static void
   2720  1.1  mrg version_loop_by_alias_check (vec<struct partition *> *partitions,
   2721  1.1  mrg 			     class loop *loop, vec<ddr_p> *alias_ddrs)
   2722  1.1  mrg {
   2723  1.1  mrg   profile_probability prob;
   2724  1.1  mrg   basic_block cond_bb;
   2725  1.1  mrg   class loop *nloop;
   2726  1.1  mrg   tree lhs, arg0, cond_expr = NULL_TREE;
   2727  1.1  mrg   gimple_seq cond_stmts = NULL;
   2728  1.1  mrg   gimple *call_stmt = NULL;
   2729  1.1  mrg   auto_vec<dr_with_seg_len_pair_t> comp_alias_pairs;
   2730  1.1  mrg 
   2731  1.1  mrg   /* Generate code for runtime alias checks if necessary.  */
   2732  1.1  mrg   gcc_assert (alias_ddrs->length () > 0);
   2733  1.1  mrg 
   2734  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   2735  1.1  mrg     fprintf (dump_file,
   2736  1.1  mrg 	     "Version loop <%d> with runtime alias check\n", loop->num);
   2737  1.1  mrg 
   2738  1.1  mrg   compute_alias_check_pairs (loop, alias_ddrs, &comp_alias_pairs);
   2739  1.1  mrg   create_runtime_alias_checks (loop, &comp_alias_pairs, &cond_expr);
   2740  1.1  mrg   cond_expr = force_gimple_operand_1 (cond_expr, &cond_stmts,
   2741  1.1  mrg 				      is_gimple_val, NULL_TREE);
   2742  1.1  mrg 
   2743  1.1  mrg   /* Depend on vectorizer to fold IFN_LOOP_DIST_ALIAS.  */
   2744  1.1  mrg   bool cancelable_p = flag_tree_loop_vectorize;
   2745  1.1  mrg   if (cancelable_p)
   2746  1.1  mrg     {
   2747  1.1  mrg       unsigned i = 0;
   2748  1.1  mrg       struct partition *partition;
   2749  1.1  mrg       for (; partitions->iterate (i, &partition); ++i)
   2750  1.1  mrg 	if (!partition_builtin_p (partition))
   2751  1.1  mrg 	  break;
   2752  1.1  mrg 
   2753  1.1  mrg       /* If all partitions are builtins, distributing the loop is profitable
   2754  1.1  mrg 	 and we don't want to cancel the runtime alias checks.  */
   2755  1.1  mrg       if (i == partitions->length ())
   2756  1.1  mrg 	cancelable_p = false;
   2757  1.1  mrg     }
   2758  1.1  mrg 
   2759  1.1  mrg   /* Generate an internal function call for the loop distribution alias
   2760  1.1  mrg      check if the runtime alias check should be cancelable.  */
   2761  1.1  mrg   if (cancelable_p)
   2762  1.1  mrg     {
   2763  1.1  mrg       call_stmt = gimple_build_call_internal (IFN_LOOP_DIST_ALIAS,
   2764  1.1  mrg 					      2, NULL_TREE, cond_expr);
   2765  1.1  mrg       lhs = make_ssa_name (boolean_type_node);
   2766  1.1  mrg       gimple_call_set_lhs (call_stmt, lhs);
   2767  1.1  mrg     }
   2768  1.1  mrg   else
   2769  1.1  mrg     lhs = cond_expr;
   2770  1.1  mrg 
   2771  1.1  mrg   prob = profile_probability::guessed_always ().apply_scale (9, 10);
   2772  1.1  mrg   initialize_original_copy_tables ();
   2773  1.1  mrg   nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
   2774  1.1  mrg 			prob, prob.invert (), true);
   2775  1.1  mrg   free_original_copy_tables ();
   2776  1.1  mrg   /* Record the original loop number in newly generated loops.  In case of
   2777  1.1  mrg      distribution, the original loop will be distributed and the new loop
   2778  1.1  mrg      is kept.  */
   2779  1.1  mrg   loop->orig_loop_num = nloop->num;
   2780  1.1  mrg   nloop->orig_loop_num = nloop->num;
   2781  1.1  mrg   nloop->dont_vectorize = true;
   2782  1.1  mrg   nloop->force_vectorize = false;
   2783  1.1  mrg 
   2784  1.1  mrg   if (call_stmt)
   2785  1.1  mrg     {
   2786  1.1  mrg       /* Record new loop's num in IFN_LOOP_DIST_ALIAS because the original
   2787  1.1  mrg 	 loop could be destroyed.  */
   2788  1.1  mrg       arg0 = build_int_cst (integer_type_node, loop->orig_loop_num);
   2789  1.1  mrg       gimple_call_set_arg (call_stmt, 0, arg0);
   2790  1.1  mrg       gimple_seq_add_stmt_without_update (&cond_stmts, call_stmt);
   2791  1.1  mrg     }
   2792  1.1  mrg 
   2793  1.1  mrg   if (cond_stmts)
   2794  1.1  mrg     {
   2795  1.1  mrg       gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
   2796  1.1  mrg       gsi_insert_seq_before (&cond_gsi, cond_stmts, GSI_SAME_STMT);
   2797  1.1  mrg     }
   2798  1.1  mrg   update_ssa (TODO_update_ssa);
   2799  1.1  mrg }
   2800  1.1  mrg 
   2801  1.1  mrg /* Return true if loop versioning is needed to distribute PARTITIONS.
   2802  1.1  mrg    ALIAS_DDRS are data dependence relations for runtime alias checks.  */
   2803  1.1  mrg 
   2804  1.1  mrg static inline bool
   2805  1.1  mrg version_for_distribution_p (vec<struct partition *> *partitions,
   2806  1.1  mrg 			    vec<ddr_p> *alias_ddrs)
   2807  1.1  mrg {
   2808  1.1  mrg   /* No need to version loop if we have only one partition.  */
   2809  1.1  mrg   if (partitions->length () == 1)
   2810  1.1  mrg     return false;
   2811  1.1  mrg 
   2812  1.1  mrg   /* Need to version loop if runtime alias check is necessary.  */
   2813  1.1  mrg   return (alias_ddrs->length () > 0);
   2814  1.1  mrg }
   2815  1.1  mrg 
   2816  1.1  mrg /* Compare base offset of builtin mem* partitions P1 and P2.  */
   2817  1.1  mrg 
   2818  1.1  mrg static int
   2819  1.1  mrg offset_cmp (const void *vp1, const void *vp2)
   2820  1.1  mrg {
   2821  1.1  mrg   struct partition *p1 = *(struct partition *const *) vp1;
   2822  1.1  mrg   struct partition *p2 = *(struct partition *const *) vp2;
   2823  1.1  mrg   unsigned HOST_WIDE_INT o1 = p1->builtin->dst_base_offset;
   2824  1.1  mrg   unsigned HOST_WIDE_INT o2 = p2->builtin->dst_base_offset;
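                    /* Overflow-safe three-way comparison, avoiding the wrap-around
                       that computing o1 - o2 directly could cause.  */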
   2825  1.1  mrg   return (o2 < o1) - (o1 < o2);
   2826  1.1  mrg }
   2827  1.1  mrg 
   2828  1.1  mrg /* Fuse adjacent memset builtin PARTITIONS if possible.  This is a special
   2829  1.1  mrg    case optimization transforming the code below:
   2830  1.1  mrg 
   2831  1.1  mrg      __builtin_memset (&obj, 0, 100);
   2832  1.1  mrg      _1 = &obj + 100;
   2833  1.1  mrg      __builtin_memset (_1, 0, 200);
   2834  1.1  mrg      _2 = &obj + 300;
   2835  1.1  mrg      __builtin_memset (_2, 0, 100);
   2836  1.1  mrg 
   2837  1.1  mrg    into:
   2838  1.1  mrg 
   2839  1.1  mrg      __builtin_memset (&obj, 0, 400);
   2840  1.1  mrg 
   2841  1.1  mrg    Note we don't have dependence information between different partitions
   2842  1.1  mrg    at this point; as a result, we can't handle nonadjacent memset builtin
   2843  1.1  mrg    partitions, since a dependence might be broken.  */
   2844  1.1  mrg 
   2845  1.1  mrg static void
   2846  1.1  mrg fuse_memset_builtins (vec<struct partition *> *partitions)
   2847  1.1  mrg {
   2848  1.1  mrg   unsigned i, j;
   2849  1.1  mrg   struct partition *part1, *part2;
   2850  1.1  mrg   tree rhs1, rhs2;
   2851  1.1  mrg 
   2852  1.1  mrg   for (i = 0; partitions->iterate (i, &part1);)
   2853  1.1  mrg     {
   2854  1.1  mrg       if (part1->kind != PKIND_MEMSET)
   2855  1.1  mrg 	{
   2856  1.1  mrg 	  i++;
   2857  1.1  mrg 	  continue;
   2858  1.1  mrg 	}
   2859  1.1  mrg 
   2860  1.1  mrg       /* Find sub-array of memset builtins of the same base.  Index range
   2861  1.1  mrg 	 of the sub-array is [i, j) with "j > i".  */
   2862  1.1  mrg       for (j = i + 1; partitions->iterate (j, &part2); ++j)
   2863  1.1  mrg 	{
   2864  1.1  mrg 	  if (part2->kind != PKIND_MEMSET
   2865  1.1  mrg 	      || !operand_equal_p (part1->builtin->dst_base_base,
   2866  1.1  mrg 				   part2->builtin->dst_base_base, 0))
   2867  1.1  mrg 	    break;
   2868  1.1  mrg 
   2869  1.1  mrg 	  /* Memset calls setting different values can't be merged.  */
   2870  1.1  mrg 	  rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
   2871  1.1  mrg 	  rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
   2872  1.1  mrg 	  if (!operand_equal_p (rhs1, rhs2, 0))
   2873  1.1  mrg 	    break;
   2874  1.1  mrg 	}
   2875  1.1  mrg 
   2876  1.1  mrg       /* Stable sort is required in order to avoid breaking dependence.  */
   2877  1.1  mrg       gcc_stablesort (&(*partitions)[i], j - i, sizeof (*partitions)[i],
   2878  1.1  mrg 		      offset_cmp);
   2879  1.1  mrg       /* Continue with next partition.  */
   2880  1.1  mrg       i = j;
   2881  1.1  mrg     }
   2882  1.1  mrg 
   2883  1.1  mrg   /* Merge all consecutive memset builtin partitions.  */
   2884  1.1  mrg   for (i = 0; i < partitions->length () - 1;)
   2885  1.1  mrg     {
   2886  1.1  mrg       part1 = (*partitions)[i];
   2887  1.1  mrg       if (part1->kind != PKIND_MEMSET)
   2888  1.1  mrg 	{
   2889  1.1  mrg 	  i++;
   2890  1.1  mrg 	  continue;
   2891  1.1  mrg 	}
   2892  1.1  mrg 
   2893  1.1  mrg       part2 = (*partitions)[i + 1];
   2894  1.1  mrg       /* Only merge memset partitions with the same base and with constant
   2895  1.1  mrg 	 access sizes.  */
   2896  1.1  mrg       if (part2->kind != PKIND_MEMSET
   2897  1.1  mrg 	  || TREE_CODE (part1->builtin->size) != INTEGER_CST
   2898  1.1  mrg 	  || TREE_CODE (part2->builtin->size) != INTEGER_CST
   2899  1.1  mrg 	  || !operand_equal_p (part1->builtin->dst_base_base,
   2900  1.1  mrg 			       part2->builtin->dst_base_base, 0))
   2901  1.1  mrg 	{
   2902  1.1  mrg 	  i++;
   2903  1.1  mrg 	  continue;
   2904  1.1  mrg 	}
   2905  1.1  mrg       rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
   2906  1.1  mrg       rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
   2907  1.1  mrg       int bytev1 = const_with_all_bytes_same (rhs1);
   2908  1.1  mrg       int bytev2 = const_with_all_bytes_same (rhs2);
   2909  1.1  mrg       /* Only merge memset partitions of the same value.  */
   2910  1.1  mrg       if (bytev1 != bytev2 || bytev1 == -1)
   2911  1.1  mrg 	{
   2912  1.1  mrg 	  i++;
   2913  1.1  mrg 	  continue;
   2914  1.1  mrg 	}
   2915  1.1  mrg       wide_int end1 = wi::add (part1->builtin->dst_base_offset,
   2916  1.1  mrg 			       wi::to_wide (part1->builtin->size));
   2917  1.1  mrg       /* Only merge adjacent memset partitions.  */
   2918  1.1  mrg       if (wi::ne_p (end1, part2->builtin->dst_base_offset))
   2919  1.1  mrg 	{
   2920  1.1  mrg 	  i++;
   2921  1.1  mrg 	  continue;
   2922  1.1  mrg 	}
   2923  1.1  mrg       /* Merge partitions[i] and partitions[i+1].  */
   2924  1.1  mrg       part1->builtin->size = fold_build2 (PLUS_EXPR, sizetype,
   2925  1.1  mrg 					  part1->builtin->size,
   2926  1.1  mrg 					  part2->builtin->size);
   2927  1.1  mrg       partition_free (part2);
   2928  1.1  mrg       partitions->ordered_remove (i + 1);
   2929  1.1  mrg     }
   2930  1.1  mrg }
   2931  1.1  mrg 
   2932  1.1  mrg void
   2933  1.1  mrg loop_distribution::finalize_partitions (class loop *loop,
   2934  1.1  mrg 					vec<struct partition *> *partitions,
   2935  1.1  mrg 					vec<ddr_p> *alias_ddrs)
   2936  1.1  mrg {
   2937  1.1  mrg   unsigned i;
   2938  1.1  mrg   struct partition *partition, *a;
   2939  1.1  mrg 
   2940  1.1  mrg   if (partitions->length () == 1
   2941  1.1  mrg       || alias_ddrs->length () > 0)
   2942  1.1  mrg     return;
   2943  1.1  mrg 
   2944  1.1  mrg   unsigned num_builtin = 0, num_normal = 0, num_partial_memset = 0;
   2945  1.1  mrg   bool same_type_p = true;
   2946  1.1  mrg   enum partition_type type = ((*partitions)[0])->type;
   2947  1.1  mrg   for (i = 0; partitions->iterate (i, &partition); ++i)
   2948  1.1  mrg     {
   2949  1.1  mrg       same_type_p &= (type == partition->type);
   2950  1.1  mrg       if (partition_builtin_p (partition))
   2951  1.1  mrg 	{
   2952  1.1  mrg 	  num_builtin++;
   2953  1.1  mrg 	  continue;
   2954  1.1  mrg 	}
   2955  1.1  mrg       num_normal++;
   2956  1.1  mrg       if (partition->kind == PKIND_PARTIAL_MEMSET)
   2957  1.1  mrg 	num_partial_memset++;
   2958  1.1  mrg     }
   2959  1.1  mrg 
   2960  1.1  mrg   /* Don't distribute the current loop into too many loops given we don't
   2961  1.1  mrg      have a memory stream cost model.  Be even more conservative in case
   2962  1.1  mrg      of loop nest distribution.  */
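                    /* For instance (illustrative): three normal partitions of the
                       same type with no builtin among them are fused back into a
                       single loop by the condition below.  */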
   2963  1.1  mrg   if ((same_type_p && num_builtin == 0
   2964  1.1  mrg        && (loop->inner == NULL || num_normal != 2 || num_partial_memset != 1))
   2965  1.1  mrg       || (loop->inner != NULL
   2966  1.1  mrg 	  && i >= NUM_PARTITION_THRESHOLD && num_normal > 1)
   2967  1.1  mrg       || (loop->inner == NULL
   2968  1.1  mrg 	  && i >= NUM_PARTITION_THRESHOLD && num_normal > num_builtin))
   2969  1.1  mrg     {
   2970  1.1  mrg       a = (*partitions)[0];
   2971  1.1  mrg       for (i = 1; partitions->iterate (i, &partition); ++i)
   2972  1.1  mrg 	{
   2973  1.1  mrg 	  partition_merge_into (NULL, a, partition, FUSE_FINALIZE);
   2974  1.1  mrg 	  partition_free (partition);
   2975  1.1  mrg 	}
   2976  1.1  mrg       partitions->truncate (1);
   2977  1.1  mrg     }
   2978  1.1  mrg 
   2979  1.1  mrg   /* Fuse memset builtins if possible.  */
   2980  1.1  mrg   if (partitions->length () > 1)
   2981  1.1  mrg     fuse_memset_builtins (partitions);
   2982  1.1  mrg }
   2983  1.1  mrg 
   2984  1.1  mrg /* Distributes the code from LOOP in such a way that producer statements
   2985  1.1  mrg    are placed before consumer statements.  Tries to separate only the
   2986  1.1  mrg    statements from STMTS into separate loops.  Returns the number of
   2987  1.1  mrg    distributed loops.  Set NB_CALLS to number of generated builtin calls.
   2988  1.1  mrg    Set *DESTROY_P to whether LOOP needs to be destroyed.  */
   2989  1.1  mrg 
   2990  1.1  mrg int
   2991  1.1  mrg loop_distribution::distribute_loop (class loop *loop,
   2992  1.1  mrg 		 const vec<gimple *> &stmts,
   2993  1.1  mrg 		 control_dependences *cd, int *nb_calls, bool *destroy_p,
   2994  1.1  mrg 		 bool only_patterns_p)
   2995  1.1  mrg {
   2996  1.1  mrg   ddrs_table = new hash_table<ddr_hasher> (389);
   2997  1.1  mrg   struct graph *rdg;
   2998  1.1  mrg   partition *partition;
   2999  1.1  mrg   int i, nbp;
   3000  1.1  mrg 
   3001  1.1  mrg   *destroy_p = false;
   3002  1.1  mrg   *nb_calls = 0;
   3003  1.1  mrg   loop_nest.create (0);
   3004  1.1  mrg   if (!find_loop_nest (loop, &loop_nest))
   3005  1.1  mrg     {
   3006  1.1  mrg       loop_nest.release ();
   3007  1.1  mrg       delete ddrs_table;
   3008  1.1  mrg       return 0;
   3009  1.1  mrg     }
   3010  1.1  mrg 
   3011  1.1  mrg   datarefs_vec.create (20);
   3012  1.1  mrg   has_nonaddressable_dataref_p = false;
   3013  1.1  mrg   rdg = build_rdg (loop, cd);
   3014  1.1  mrg   if (!rdg)
   3015  1.1  mrg     {
   3016  1.1  mrg       if (dump_file && (dump_flags & TDF_DETAILS))
   3017  1.1  mrg 	fprintf (dump_file,
   3018  1.1  mrg 		 "Loop %d not distributed: failed to build the RDG.\n",
   3019  1.1  mrg 		 loop->num);
   3020  1.1  mrg 
   3021  1.1  mrg       loop_nest.release ();
   3022  1.1  mrg       free_data_refs (datarefs_vec);
   3023  1.1  mrg       delete ddrs_table;
   3024  1.1  mrg       return 0;
   3025  1.1  mrg     }
   3026  1.1  mrg 
   3027  1.1  mrg   if (datarefs_vec.length () > MAX_DATAREFS_NUM)
   3028  1.1  mrg     {
   3029  1.1  mrg       if (dump_file && (dump_flags & TDF_DETAILS))
   3030  1.1  mrg 	fprintf (dump_file,
   3031  1.1  mrg 		 "Loop %d not distributed: too many memory references.\n",
   3032  1.1  mrg 		 loop->num);
   3033  1.1  mrg 
   3034  1.1  mrg       free_rdg (rdg);
   3035  1.1  mrg       loop_nest.release ();
   3036  1.1  mrg       free_data_refs (datarefs_vec);
   3037  1.1  mrg       delete ddrs_table;
   3038  1.1  mrg       return 0;
   3039  1.1  mrg     }
   3040  1.1  mrg 
   3041  1.1  mrg   data_reference_p dref;
   3042  1.1  mrg   for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
   3043  1.1  mrg     dref->aux = (void *) (uintptr_t) i;
   3044  1.1  mrg 
   3045  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   3046  1.1  mrg     dump_rdg (dump_file, rdg);
   3047  1.1  mrg 
   3048  1.1  mrg   auto_vec<struct partition *, 3> partitions;
   3049  1.1  mrg   rdg_build_partitions (rdg, stmts, &partitions);
   3050  1.1  mrg 
   3051  1.1  mrg   auto_vec<ddr_p> alias_ddrs;
   3052  1.1  mrg 
   3053  1.1  mrg   auto_bitmap stmt_in_all_partitions;
   3054  1.1  mrg   bitmap_copy (stmt_in_all_partitions, partitions[0]->stmts);
   3055  1.1  mrg   for (i = 1; partitions.iterate (i, &partition); ++i)
   3056  1.1  mrg     bitmap_and_into (stmt_in_all_partitions, partitions[i]->stmts);
   3057  1.1  mrg 
   3058  1.1  mrg   bool any_builtin = false;
   3059  1.1  mrg   bool reduction_in_all = false;
   3060  1.1  mrg   FOR_EACH_VEC_ELT (partitions, i, partition)
   3061  1.1  mrg     {
   3062  1.1  mrg       reduction_in_all
   3063  1.1  mrg 	|= classify_partition (loop, rdg, partition, stmt_in_all_partitions);
   3064  1.1  mrg       any_builtin |= partition_builtin_p (partition);
   3065  1.1  mrg     }
   3066  1.1  mrg 
   3067  1.1  mrg   /* If we are only distributing patterns but did not detect any,
   3068  1.1  mrg      simply bail out.  */
   3069  1.1  mrg   if (only_patterns_p
   3070  1.1  mrg       && !any_builtin)
   3071  1.1  mrg     {
   3072  1.1  mrg       nbp = 0;
   3073  1.1  mrg       goto ldist_done;
   3074  1.1  mrg     }
   3075  1.1  mrg 
   3076  1.1  mrg   /* If we are only distributing patterns, fuse all partitions that
   3077  1.1  mrg      were not classified as builtins.  This also avoids chopping
   3078  1.1  mrg      a loop into pieces, separated by builtin calls.  That is, we
   3079  1.1  mrg      only want no or a single loop body remaining.  */
   3080  1.1  mrg   struct partition *into;
   3081  1.1  mrg   if (only_patterns_p)
   3082  1.1  mrg     {
   3083  1.1  mrg       for (i = 0; partitions.iterate (i, &into); ++i)
   3084  1.1  mrg 	if (!partition_builtin_p (into))
   3085  1.1  mrg 	  break;
   3086  1.1  mrg       for (++i; partitions.iterate (i, &partition); ++i)
   3087  1.1  mrg 	if (!partition_builtin_p (partition))
   3088  1.1  mrg 	  {
   3089  1.1  mrg 	    partition_merge_into (NULL, into, partition, FUSE_NON_BUILTIN);
   3090  1.1  mrg 	    partitions.unordered_remove (i);
   3091  1.1  mrg 	    partition_free (partition);
   3092  1.1  mrg 	    i--;
   3093  1.1  mrg 	  }
   3094  1.1  mrg     }
   3095  1.1  mrg 
   3096  1.1  mrg   /* Due to limitations in the transform phase we have to fuse all
   3097  1.1  mrg      reduction partitions into the last partition so the existing
   3098  1.1  mrg      loop will contain all loop-closed PHI nodes.  */
   3099  1.1  mrg   for (i = 0; partitions.iterate (i, &into); ++i)
   3100  1.1  mrg     if (partition_reduction_p (into))
   3101  1.1  mrg       break;
   3102  1.1  mrg   for (i = i + 1; partitions.iterate (i, &partition); ++i)
   3103  1.1  mrg     if (partition_reduction_p (partition))
   3104  1.1  mrg       {
   3105  1.1  mrg 	partition_merge_into (rdg, into, partition, FUSE_REDUCTION);
   3106  1.1  mrg 	partitions.unordered_remove (i);
   3107  1.1  mrg 	partition_free (partition);
   3108  1.1  mrg 	i--;
   3109  1.1  mrg       }
   3110  1.1  mrg 
   3111  1.1  mrg   /* Apply our simple cost model - fuse partitions with similar
   3112  1.1  mrg      memory accesses.  */
   3113  1.1  mrg   for (i = 0; partitions.iterate (i, &into); ++i)
   3114  1.1  mrg     {
   3115  1.1  mrg       bool changed = false;
   3116  1.1  mrg       if (partition_builtin_p (into) || into->kind == PKIND_PARTIAL_MEMSET)
   3117  1.1  mrg 	continue;
   3118  1.1  mrg       for (int j = i + 1;
   3119  1.1  mrg 	   partitions.iterate (j, &partition); ++j)
   3120  1.1  mrg 	{
   3121  1.1  mrg 	  if (share_memory_accesses (rdg, into, partition))
   3122  1.1  mrg 	    {
   3123  1.1  mrg 	      partition_merge_into (rdg, into, partition, FUSE_SHARE_REF);
   3124  1.1  mrg 	      partitions.unordered_remove (j);
   3125  1.1  mrg 	      partition_free (partition);
   3126  1.1  mrg 	      j--;
   3127  1.1  mrg 	      changed = true;
   3128  1.1  mrg 	    }
   3129  1.1  mrg 	}
   3130  1.1  mrg       /* If in step 1 we fused 0 1 2 into 0,2 1 because 0 and 2 have
   3131  1.1  mrg 	 similar accesses, and 1 and 2 have similar accesses but 0 and 1
   3132  1.1  mrg 	 do not, then in the next iteration we would fail to consider
   3133  1.1  mrg 	 merging 1 into 0,2.  So try again if we did any merging into 0.  */
   3134  1.1  mrg       if (changed)
   3135  1.1  mrg 	i--;
   3136  1.1  mrg     }
   3137  1.1  mrg 
   3138  1.1  mrg   /* Put a non-builtin partition last if we need to preserve a reduction.
   3139  1.1  mrg      ???  This is a workaround that makes sort_partitions_by_post_order do
   3140  1.1  mrg      the correct thing while in reality it should sort each component
   3141  1.1  mrg      separately and then put the component with a reduction or a non-builtin
   3142  1.1  mrg      last.  */
   3143  1.1  mrg   if (reduction_in_all
   3144  1.1  mrg       && partition_builtin_p (partitions.last()))
   3145  1.1  mrg     FOR_EACH_VEC_ELT (partitions, i, partition)
   3146  1.1  mrg       if (!partition_builtin_p (partition))
   3147  1.1  mrg 	{
   3148  1.1  mrg 	  partitions.unordered_remove (i);
   3149  1.1  mrg 	  partitions.quick_push (partition);
   3150  1.1  mrg 	  break;
   3151  1.1  mrg 	}
   3152  1.1  mrg 
   3153  1.1  mrg   /* Build the partition dependence graph and fuse partitions in strongly
   3154  1.1  mrg      connected components.  */
   3155  1.1  mrg   if (partitions.length () > 1)
   3156  1.1  mrg     {
   3157  1.1  mrg       /* Don't support loop nest distribution under runtime alias checks
   3158  1.1  mrg 	 since it's not likely to enable many vectorization opportunities.
   3159  1.1  mrg 	 Also avoid it if the loop has a data reference that may not be
   3160  1.1  mrg 	 addressable, since the alias check needs to take and compare the
   3161  1.1  mrg 	 address of the object.  */
   3161  1.1  mrg       if (loop->inner || has_nonaddressable_dataref_p)
   3162  1.1  mrg 	merge_dep_scc_partitions (rdg, &partitions, false);
   3163  1.1  mrg       else
   3164  1.1  mrg 	{
   3165  1.1  mrg 	  merge_dep_scc_partitions (rdg, &partitions, true);
   3166  1.1  mrg 	  if (partitions.length () > 1)
   3167  1.1  mrg 	    break_alias_scc_partitions (rdg, &partitions, &alias_ddrs);
   3168  1.1  mrg 	}
   3169  1.1  mrg     }
   3170  1.1  mrg 
   3171  1.1  mrg   finalize_partitions (loop, &partitions, &alias_ddrs);
   3172  1.1  mrg 
   3173  1.1  mrg   /* If there is a reduction in all partitions, make sure the last one
   3174  1.1  mrg      is not classified for builtin code generation.  */
   3175  1.1  mrg   if (reduction_in_all)
   3176  1.1  mrg     {
   3177  1.1  mrg       partition = partitions.last ();
   3178  1.1  mrg       if (only_patterns_p
   3179  1.1  mrg 	  && partition_builtin_p (partition)
   3180  1.1  mrg 	  && !partition_builtin_p (partitions[0]))
   3181  1.1  mrg 	{
   3182  1.1  mrg 	  nbp = 0;
   3183  1.1  mrg 	  goto ldist_done;
   3184  1.1  mrg 	}
   3185  1.1  mrg       partition->kind = PKIND_NORMAL;
   3186  1.1  mrg     }
   3187  1.1  mrg 
   3188  1.1  mrg   nbp = partitions.length ();
   3189  1.1  mrg   if (nbp == 0
   3190  1.1  mrg       || (nbp == 1 && !partition_builtin_p (partitions[0]))
   3191  1.1  mrg       || (nbp > 1 && partition_contains_all_rw (rdg, partitions)))
   3192  1.1  mrg     {
   3193  1.1  mrg       nbp = 0;
   3194  1.1  mrg       goto ldist_done;
   3195  1.1  mrg     }
   3196  1.1  mrg 
   3197  1.1  mrg   if (version_for_distribution_p (&partitions, &alias_ddrs))
   3198  1.1  mrg     version_loop_by_alias_check (&partitions, loop, &alias_ddrs);
   3199  1.1  mrg 
   3200  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   3201  1.1  mrg     {
   3202  1.1  mrg       fprintf (dump_file,
   3203  1.1  mrg 	       "distribute loop <%d> into partitions:\n", loop->num);
   3204  1.1  mrg       dump_rdg_partitions (dump_file, partitions);
   3205  1.1  mrg     }
   3206  1.1  mrg 
   3207  1.1  mrg   FOR_EACH_VEC_ELT (partitions, i, partition)
   3208  1.1  mrg     {
   3209  1.1  mrg       if (partition_builtin_p (partition))
   3210  1.1  mrg 	(*nb_calls)++;
   3211  1.1  mrg       *destroy_p |= generate_code_for_partition (loop, partition, i < nbp - 1);
   3212  1.1  mrg     }
   3213  1.1  mrg 
   3214  1.1  mrg  ldist_done:
   3215  1.1  mrg   loop_nest.release ();
   3216  1.1  mrg   free_data_refs (datarefs_vec);
   3217  1.1  mrg   for (hash_table<ddr_hasher>::iterator iter = ddrs_table->begin ();
   3218  1.1  mrg        iter != ddrs_table->end (); ++iter)
   3219  1.1  mrg     {
   3220  1.1  mrg       free_dependence_relation (*iter);
   3221  1.1  mrg       *iter = NULL;
   3222  1.1  mrg     }
   3223  1.1  mrg   delete ddrs_table;
   3224  1.1  mrg 
   3225  1.1  mrg   FOR_EACH_VEC_ELT (partitions, i, partition)
   3226  1.1  mrg     partition_free (partition);
   3227  1.1  mrg 
   3228  1.1  mrg   free_rdg (rdg);
   3229  1.1  mrg   return nbp - *nb_calls;
   3230  1.1  mrg }
   3231  1.1  mrg 
   3232  1.1  mrg 
   3233  1.1  mrg void loop_distribution::bb_top_order_init (void)
   3234  1.1  mrg {
   3235  1.1  mrg   int rpo_num;
   3236  1.1  mrg   int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
   3237  1.1  mrg   edge entry = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   3238  1.1  mrg   bitmap exit_bbs = BITMAP_ALLOC (NULL);
   3239  1.1  mrg 
   3240  1.1  mrg   bb_top_order_index = XNEWVEC (int, last_basic_block_for_fn (cfun));
   3241  1.1  mrg   bb_top_order_index_size = last_basic_block_for_fn (cfun);
   3242  1.1  mrg 
   3243  1.1  mrg   entry->flags &= ~EDGE_DFS_BACK;
   3244  1.1  mrg   bitmap_set_bit (exit_bbs, EXIT_BLOCK);
   3245  1.1  mrg   rpo_num = rev_post_order_and_mark_dfs_back_seme (cfun, entry, exit_bbs, true,
   3246  1.1  mrg 						   rpo, NULL);
   3247  1.1  mrg   BITMAP_FREE (exit_bbs);
   3248  1.1  mrg 
   3249  1.1  mrg   for (int i = 0; i < rpo_num; i++)
   3250  1.1  mrg     bb_top_order_index[rpo[i]] = i;
   3251  1.1  mrg 
   3252  1.1  mrg   free (rpo);
   3253  1.1  mrg }
   3254  1.1  mrg 
   3255  1.1  mrg void loop_distribution::bb_top_order_destroy ()
   3256  1.1  mrg {
   3257  1.1  mrg   free (bb_top_order_index);
   3258  1.1  mrg   bb_top_order_index = NULL;
   3259  1.1  mrg   bb_top_order_index_size = 0;
   3260  1.1  mrg }
   3261  1.1  mrg 
   3262  1.1  mrg 
   3263  1.1  mrg /* Given LOOP, this function records seed statements for distribution in
   3264  1.1  mrg    WORK_LIST.  Return false if there is nothing for distribution.  */
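                  /* For example (illustrative): in

                       for (i = 0; i < n; ++i)
                         { a[i] = b[i]; sum += c[i]; }

                     with SUM used after the loop, both the store to a[i] and the
                     statement defining the loop-closed value of SUM are recorded
                     as seeds in WORK_LIST.  */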
   3265  1.1  mrg 
   3266  1.1  mrg static bool
   3267  1.1  mrg find_seed_stmts_for_distribution (class loop *loop, vec<gimple *> *work_list)
   3268  1.1  mrg {
   3269  1.1  mrg   basic_block *bbs = get_loop_body_in_dom_order (loop);
   3270  1.1  mrg 
   3271  1.1  mrg   /* Initialize the worklist with stmts we seed the partitions with.  */
   3272  1.1  mrg   for (unsigned i = 0; i < loop->num_nodes; ++i)
   3273  1.1  mrg     {
   3274  1.1  mrg       /* In irreducible sub-regions we don't know how to redirect
   3275  1.1  mrg 	 conditions, so fail.  See PR100492.  */
   3276  1.1  mrg       if (bbs[i]->flags & BB_IRREDUCIBLE_LOOP)
   3277  1.1  mrg 	{
   3278  1.1  mrg 	  if (dump_file && (dump_flags & TDF_DETAILS))
   3279  1.1  mrg 	    fprintf (dump_file, "loop %d contains an irreducible region.\n",
   3280  1.1  mrg 		     loop->num);
   3281  1.1  mrg 	  work_list->truncate (0);
   3282  1.1  mrg 	  break;
   3283  1.1  mrg 	}
   3284  1.1  mrg       for (gphi_iterator gsi = gsi_start_phis (bbs[i]);
   3285  1.1  mrg 	   !gsi_end_p (gsi); gsi_next (&gsi))
   3286  1.1  mrg 	{
   3287  1.1  mrg 	  gphi *phi = gsi.phi ();
   3288  1.1  mrg 	  if (virtual_operand_p (gimple_phi_result (phi)))
   3289  1.1  mrg 	    continue;
   3290  1.1  mrg 	  /* Distribute stmts which have defs that are used outside of
   3291  1.1  mrg 	     the loop.  */
   3292  1.1  mrg 	  if (!stmt_has_scalar_dependences_outside_loop (loop, phi))
   3293  1.1  mrg 	    continue;
   3294  1.1  mrg 	  work_list->safe_push (phi);
   3295  1.1  mrg 	}
   3296  1.1  mrg       for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]);
   3297  1.1  mrg 	   !gsi_end_p (gsi); gsi_next (&gsi))
   3298  1.1  mrg 	{
   3299  1.1  mrg 	  gimple *stmt = gsi_stmt (gsi);
   3300  1.1  mrg 
   3301  1.1  mrg 	  /* Ignore clobbers, they do not have true side effects.  */
   3302  1.1  mrg 	  if (gimple_clobber_p (stmt))
   3303  1.1  mrg 	    continue;
   3304  1.1  mrg 
   3305  1.1  mrg 	  /* If there is a stmt with side-effects bail out - we
   3306  1.1  mrg 	     cannot and should not distribute this loop.  */
   3307  1.1  mrg 	  if (gimple_has_side_effects (stmt))
   3308  1.1  mrg 	    {
   3309  1.1  mrg 	      free (bbs);
   3310  1.1  mrg 	      return false;
   3311  1.1  mrg 	    }
   3312  1.1  mrg 
   3313  1.1  mrg 	  /* Distribute stmts which have defs that are used outside of
   3314  1.1  mrg 	     the loop.  */
   3315  1.1  mrg 	  if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
   3316  1.1  mrg 	    ;
   3317  1.1  mrg 	  /* Otherwise only distribute stores for now.  */
   3318  1.1  mrg 	  else if (!gimple_vdef (stmt))
   3319  1.1  mrg 	    continue;
   3320  1.1  mrg 
   3321  1.1  mrg 	  work_list->safe_push (stmt);
   3322  1.1  mrg 	}
   3323  1.1  mrg     }
   3324  1.1  mrg   bool res = work_list->length () > 0;
   3325  1.1  mrg   if (res && !can_copy_bbs_p (bbs, loop->num_nodes))
   3326  1.1  mrg     {
   3327  1.1  mrg       if (dump_file && (dump_flags & TDF_DETAILS))
   3328  1.1  mrg 	fprintf (dump_file, "cannot copy loop %d.\n", loop->num);
   3329  1.1  mrg       res = false;
   3330  1.1  mrg     }
   3331  1.1  mrg   free (bbs);
   3332  1.1  mrg   return res;
   3333  1.1  mrg }
   3334  1.1  mrg 
   3335  1.1  mrg /* A helper function for the generate_{rawmemchr,strlen}_builtin functions
   3336  1.1  mrg    that places the new statements SEQ before LOOP and replaces the old
   3337  1.1  mrg    reduction variable with the new one.  */
   3338  1.1  mrg 
   3339  1.1  mrg static void
   3340  1.1  mrg generate_reduction_builtin_1 (loop_p loop, gimple_seq &seq,
   3341  1.1  mrg 			      tree reduction_var_old, tree reduction_var_new,
   3342  1.1  mrg 			      const char *info, machine_mode load_mode)
   3343  1.1  mrg {
   3344  1.1  mrg   gcc_assert (flag_tree_loop_distribute_patterns);
   3345  1.1  mrg 
   3346  1.1  mrg   /* Place new statements before LOOP.  */
   3347  1.1  mrg   gimple_stmt_iterator gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
   3348  1.1  mrg   gsi_insert_seq_after (&gsi, seq, GSI_CONTINUE_LINKING);
   3349  1.1  mrg 
   3350  1.1  mrg   /* Replace old reduction variable with new one.  */
   3351  1.1  mrg   imm_use_iterator iter;
   3352  1.1  mrg   gimple *stmt;
   3353  1.1  mrg   use_operand_p use_p;
   3354  1.1  mrg   FOR_EACH_IMM_USE_STMT (stmt, iter, reduction_var_old)
   3355  1.1  mrg     {
   3356  1.1  mrg       FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
   3357  1.1  mrg 	SET_USE (use_p, reduction_var_new);
   3358  1.1  mrg 
   3359  1.1  mrg       update_stmt (stmt);
   3360  1.1  mrg     }
   3361  1.1  mrg 
   3362  1.1  mrg   if (dump_file && (dump_flags & TDF_DETAILS))
   3363  1.1  mrg     fprintf (dump_file, info, GET_MODE_NAME (load_mode));
   3364  1.1  mrg }
   3365  1.1  mrg 
   3366  1.1  mrg /* Generate a call to rawmemchr and place it before LOOP.  REDUCTION_VAR is
   3367  1.1  mrg    replaced with a fresh SSA name representing the result of the call.  */
   3368  1.1  mrg 
   3369  1.1  mrg static void
   3370  1.1  mrg generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
   3371  1.1  mrg 			    data_reference_p store_dr, tree base, tree pattern,
   3372  1.1  mrg 			    location_t loc)
   3373  1.1  mrg {
   3374  1.1  mrg   gimple_seq seq = NULL;
   3375  1.1  mrg 
   3376  1.1  mrg   tree mem = force_gimple_operand (base, &seq, true, NULL_TREE);
   3377  1.1  mrg   gimple *fn_call = gimple_build_call_internal (IFN_RAWMEMCHR, 2, mem, pattern);
   3378  1.1  mrg   tree reduction_var_new = copy_ssa_name (reduction_var);
   3379  1.1  mrg   gimple_call_set_lhs (fn_call, reduction_var_new);
   3380  1.1  mrg   gimple_set_location (fn_call, loc);
   3381  1.1  mrg   gimple_seq_add_stmt (&seq, fn_call);
   3382  1.1  mrg 
   3383  1.1  mrg   if (store_dr)
   3384  1.1  mrg     {
   3385  1.1  mrg       gassign *g = gimple_build_assign (DR_REF (store_dr), reduction_var_new);
   3386  1.1  mrg       gimple_seq_add_stmt (&seq, g);
   3387  1.1  mrg     }
   3388  1.1  mrg 
   3389  1.1  mrg   generate_reduction_builtin_1 (loop, seq, reduction_var, reduction_var_new,
   3390  1.1  mrg 				"generated rawmemchr%s\n",
   3391  1.1  mrg 				TYPE_MODE (TREE_TYPE (TREE_TYPE (base))));
   3392  1.1  mrg }
   3393  1.1  mrg 
   3394  1.1  mrg /* Helper function for generate_strlen_builtin(,_using_rawmemchr)  */
   3395  1.1  mrg 
   3396  1.1  mrg static void
   3397  1.1  mrg generate_strlen_builtin_1 (loop_p loop, gimple_seq &seq,
   3398  1.1  mrg 			   tree reduction_var_old, tree reduction_var_new,
   3399  1.1  mrg 			   machine_mode mode, tree start_len)
   3400  1.1  mrg {
   3401  1.1  mrg   /* REDUCTION_VAR_NEW has either size type or ptrdiff type and must be
   3402  1.1  mrg      converted if the old and new reduction variable types are incompatible.  */
   3403  1.1  mrg   reduction_var_new = gimple_convert (&seq, TREE_TYPE (reduction_var_old),
   3404  1.1  mrg 				      reduction_var_new);
   3405  1.1  mrg 
   3406  1.1  mrg   /* Loops of the form `for (i=42; s[i]; ++i);` have an additional start
   3407  1.1  mrg      length.  */
   3408  1.1  mrg   if (!integer_zerop (start_len))
   3409  1.1  mrg     {
   3410  1.1  mrg       tree lhs = make_ssa_name (TREE_TYPE (reduction_var_new));
   3411  1.1  mrg       gimple *g = gimple_build_assign (lhs, PLUS_EXPR, reduction_var_new,
   3412  1.1  mrg 				       start_len);
   3413  1.1  mrg       gimple_seq_add_stmt (&seq, g);
   3414  1.1  mrg       reduction_var_new = lhs;
   3415  1.1  mrg     }
   3416  1.1  mrg 
   3417  1.1  mrg   generate_reduction_builtin_1 (loop, seq, reduction_var_old, reduction_var_new,
   3418  1.1  mrg 				"generated strlen%s\n", mode);
   3419  1.1  mrg }
   3420  1.1  mrg 
   3421  1.1  mrg /* Generate a call to strlen and place it before LOOP.  REDUCTION_VAR is
   3422  1.1  mrg    replaced with a fresh SSA name representing the result of the call.  */
   3423  1.1  mrg 
   3424  1.1  mrg static void
   3425  1.1  mrg generate_strlen_builtin (loop_p loop, tree reduction_var, tree base,
   3426  1.1  mrg 			 tree start_len, location_t loc)
   3427  1.1  mrg {
   3428  1.1  mrg   gimple_seq seq = NULL;
   3429  1.1  mrg 
   3430  1.1  mrg   tree reduction_var_new = make_ssa_name (size_type_node);
   3431  1.1  mrg 
   3432  1.1  mrg   tree mem = force_gimple_operand (base, &seq, true, NULL_TREE);
   3433  1.1  mrg   tree fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_STRLEN));
   3434  1.1  mrg   gimple *fn_call = gimple_build_call (fn, 1, mem);
   3435  1.1  mrg   gimple_call_set_lhs (fn_call, reduction_var_new);
   3436  1.1  mrg   gimple_set_location (fn_call, loc);
   3437  1.1  mrg   gimple_seq_add_stmt (&seq, fn_call);
   3438  1.1  mrg 
   3439  1.1  mrg   generate_strlen_builtin_1 (loop, seq, reduction_var, reduction_var_new,
   3440  1.1  mrg 			     QImode, start_len);
   3441  1.1  mrg }
   3442  1.1  mrg 
   3443  1.1  mrg /* Generate code in order to mimic the behaviour of strlen, but this time
   3444  1.1  mrg    over an array of elements with a mode different from QI.  REDUCTION_VAR is
   3445  1.1  mrg    replaced with a fresh SSA name representing the result, i.e., the length.  */
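                  /* For instance (illustrative), for an array of 16-bit elements
                     this emits roughly

                       end = rawmemchr<HI> (start, 0);
                       count = (end - start) / 2;

                     and COUNT, adjusted by any start length, replaces REDUCTION_VAR.  */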
   3446  1.1  mrg 
   3447  1.1  mrg static void
   3448  1.1  mrg generate_strlen_builtin_using_rawmemchr (loop_p loop, tree reduction_var,
   3449  1.1  mrg 					 tree base, tree load_type,
   3450  1.1  mrg 					 tree start_len, location_t loc)
   3451  1.1  mrg {
   3452  1.1  mrg   gimple_seq seq = NULL;
   3453  1.1  mrg 
   3454  1.1  mrg   tree start = force_gimple_operand (base, &seq, true, NULL_TREE);
   3455  1.1  mrg   tree zero = build_zero_cst (load_type);
   3456  1.1  mrg   gimple *fn_call = gimple_build_call_internal (IFN_RAWMEMCHR, 2, start, zero);
   3457  1.1  mrg   tree end = make_ssa_name (TREE_TYPE (base));
   3458  1.1  mrg   gimple_call_set_lhs (fn_call, end);
   3459  1.1  mrg   gimple_set_location (fn_call, loc);
   3460  1.1  mrg   gimple_seq_add_stmt (&seq, fn_call);
   3461  1.1  mrg 
   3462  1.1  mrg   /* Determine the number of elements between START and END by
   3463  1.1  mrg      evaluating (END - START) / sizeof (*START).  */
   3464  1.1  mrg   tree diff = make_ssa_name (ptrdiff_type_node);
   3465  1.1  mrg   gimple *diff_stmt = gimple_build_assign (diff, POINTER_DIFF_EXPR, end, start);
   3466  1.1  mrg   gimple_seq_add_stmt (&seq, diff_stmt);
   3467  1.1  mrg   /* Let SIZE be the size of each character.  */
   3468  1.1  mrg   tree size = gimple_convert (&seq, ptrdiff_type_node,
   3469  1.1  mrg 			      TYPE_SIZE_UNIT (load_type));
   3470  1.1  mrg   tree count = make_ssa_name (ptrdiff_type_node);
   3471  1.1  mrg   gimple *count_stmt = gimple_build_assign (count, TRUNC_DIV_EXPR, diff, size);
   3472  1.1  mrg   gimple_seq_add_stmt (&seq, count_stmt);
   3473  1.1  mrg 
   3474  1.1  mrg   generate_strlen_builtin_1 (loop, seq, reduction_var, count,
   3475  1.1  mrg 			     TYPE_MODE (load_type),
   3476  1.1  mrg 			     start_len);
   3477  1.1  mrg }

/* Return true if we can count at least as many characters by taking pointer
   difference as we can count via reduction_var without an overflow.  Thus
   compute 2^n < (2^(m-1) / s) where n = TYPE_PRECISION (reduction_var_type),
   m = TYPE_PRECISION (ptrdiff_type_node), and s = size of each character.  */
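/* For instance, with n = 32, m = 64, and s = 4 the test reads
   2^32 < 2^63 / 4 = 2^61, which holds: the reduction variable overflows
   before the pointer difference can.  */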
static bool
reduction_var_overflows_first (tree reduction_var_type, tree load_type)
{
  widest_int n2 = wi::lshift (1, TYPE_PRECISION (reduction_var_type));
  widest_int m2 = wi::lshift (1, TYPE_PRECISION (ptrdiff_type_node) - 1);
  widest_int s = wi::to_widest (TYPE_SIZE_UNIT (load_type));
  return wi::ltu_p (n2, wi::udiv_trunc (m2, s));
}

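/* Helper of determine_reduction_stmt.  Walk the basic blocks BBS of LOOP and
   return the single statement on which a scalar used outside of the loop
   depends, or NULL if there is no such statement, more than one, a volatile
   operation, or the loop body is too large.  */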
static gimple *
determine_reduction_stmt_1 (const loop_p loop, const basic_block *bbs)
{
  gimple *reduction_stmt = NULL;

  for (unsigned i = 0, ninsns = 0; i < loop->num_nodes; ++i)
    {
      basic_block bb = bbs[i];

      for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
	   gsi_next (&bsi))
	{
	  gphi *phi = bsi.phi ();
	  if (virtual_operand_p (gimple_phi_result (phi)))
	    continue;
	  if (stmt_has_scalar_dependences_outside_loop (loop, phi))
	    {
	      if (reduction_stmt)
		return NULL;
	      reduction_stmt = phi;
	    }
	}

      for (gimple_stmt_iterator bsi = gsi_start_nondebug_bb (bb);
	   !gsi_end_p (bsi); gsi_next_nondebug (&bsi), ++ninsns)
	{
	  /* Bail out early for loops which are unlikely to match.  */
	  if (ninsns > 16)
	    return NULL;
	  gimple *stmt = gsi_stmt (bsi);
	  if (gimple_clobber_p (stmt))
	    continue;
	  if (gimple_code (stmt) == GIMPLE_LABEL)
	    continue;
	  if (gimple_has_volatile_ops (stmt))
	    return NULL;
	  if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
	    {
	      if (reduction_stmt)
		return NULL;
	      reduction_stmt = stmt;
	    }
	}
    }

  return reduction_stmt;
}

/* If LOOP has a single non-volatile reduction statement, then return a pointer
   to it.  Otherwise return NULL.  */
static gimple *
determine_reduction_stmt (const loop_p loop)
{
  basic_block *bbs = get_loop_body (loop);
  gimple *reduction_stmt = determine_reduction_stmt_1 (loop, bbs);
  XDELETEVEC (bbs);
  return reduction_stmt;
}
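
/* For example, in

     int i = 0;
     while (s[i] != 0)
       ++i;
     use (i);

   the statement computing the exit value of I is the single reduction
   statement, since I is live after the loop.  */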

/* Transform loops which mimic the effects of the builtins rawmemchr or strlen
   and replace them accordingly.  For example, a loop of the form

     for (; *p != 42; ++p);

   is replaced by

     p = rawmemchr<MODE> (p, 42);

   under the assumption that rawmemchr is available for a particular MODE.
   Another example is

     int i;
     for (i = 42; s[i]; ++i);

   which is replaced by

     i = (int)strlen (&s[42]) + 42;

   for some character array S.  If S is not a character array, we end up with

     i = (int)(rawmemchr<MODE> (&s[42], 0) - &s[42]) + 42;

   assuming that rawmemchr is available for a particular MODE.  */

bool
loop_distribution::transform_reduction_loop (loop_p loop)
{
  gimple *reduction_stmt;
  data_reference_p load_dr = NULL, store_dr = NULL;

  edge e = single_exit (loop);
  gcond *cond = safe_dyn_cast <gcond *> (last_stmt (e->src));
  if (!cond)
    return false;
  /* Ensure loop condition is an (in)equality test and loop is exited either if
     the inequality test fails or the equality test succeeds.  */
  if (!(e->flags & EDGE_FALSE_VALUE && gimple_cond_code (cond) == NE_EXPR)
      && !(e->flags & EDGE_TRUE_VALUE && gimple_cond_code (cond) == EQ_EXPR))
    return false;
  /* A limitation of the current implementation is that we only support
     constant patterns in (in)equality tests.  */
  tree pattern = gimple_cond_rhs (cond);
  if (TREE_CODE (pattern) != INTEGER_CST)
    return false;

  reduction_stmt = determine_reduction_stmt (loop);

  /* A limitation of the current implementation is that we require a reduction
     statement.  Therefore, loops without a reduction statement as in the
     following are not recognized:
     int *p;
     void foo (void) { for (; *p; ++p); } */
  if (reduction_stmt == NULL)
    return false;

  /* Reduction variables are guaranteed to be SSA names.  */
  tree reduction_var;
  switch (gimple_code (reduction_stmt))
    {
    case GIMPLE_ASSIGN:
    case GIMPLE_PHI:
      reduction_var = gimple_get_lhs (reduction_stmt);
      break;
    default:
      /* Bail out e.g. for GIMPLE_CALL.  */
      return false;
    }

  struct graph *rdg = build_rdg (loop, NULL);
  if (rdg == NULL)
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
	fprintf (dump_file,
		 "Loop %d not transformed: failed to build the RDG.\n",
		 loop->num);

      return false;
    }
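  /* Treat all statements of the loop as a single partition and determine
     its single store (if any) and single load data reference.  */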
  auto_bitmap partition_stmts;
  bitmap_set_range (partition_stmts, 0, rdg->n_vertices);
  find_single_drs (loop, rdg, partition_stmts, &store_dr, &load_dr);
  free_rdg (rdg);

  /* Bail out if there is no single load.  */
  if (load_dr == NULL)
    return false;

  /* Reaching this point we have a loop with a single reduction variable,
     a single load, and an optional single store.  */

  tree load_ref = DR_REF (load_dr);
  tree load_type = TREE_TYPE (load_ref);
  tree load_access_base = build_fold_addr_expr (load_ref);
  tree load_access_size = TYPE_SIZE_UNIT (load_type);
  affine_iv load_iv, reduction_iv;

  if (!INTEGRAL_TYPE_P (load_type)
      || !type_has_mode_precision_p (load_type))
    return false;

  /* We already ensured that the loop condition tests for (in)equality where
     the rhs is a constant pattern.  Now ensure that the lhs is the result of
     the load.  */
  if (gimple_cond_lhs (cond) != gimple_assign_lhs (DR_STMT (load_dr)))
    return false;

  /* Bail out if no affine induction variable with constant step can be
     determined.  */
  if (!simple_iv (loop, loop, load_access_base, &load_iv, false))
    return false;

  /* Bail out if memory accesses are not consecutive or not growing.  */
  if (!operand_equal_p (load_iv.step, load_access_size, 0))
    return false;

  if (!simple_iv (loop, loop, reduction_var, &reduction_iv, false))
    return false;

  /* Handle rawmemchr-like loops.  */
  if (operand_equal_p (load_iv.base, reduction_iv.base)
      && operand_equal_p (load_iv.step, reduction_iv.step))
    {
      if (store_dr)
	{
	  /* Ensure that we store to X and load from X + 1.  */
	  if (TREE_CODE (load_iv.base) != POINTER_PLUS_EXPR
	      || !integer_onep (TREE_OPERAND (load_iv.base, 1)))
	    return false;
	  tree ptr_base = TREE_OPERAND (load_iv.base, 0);
	  if (TREE_CODE (ptr_base) != SSA_NAME)
	    return false;
	  gimple *def = SSA_NAME_DEF_STMT (ptr_base);
	  if (!gimple_assign_single_p (def)
	      || gimple_assign_rhs1 (def) != DR_REF (store_dr))
	    return false;
	  /* Ensure that the reduction value is stored.  */
	  if (gimple_assign_rhs1 (DR_STMT (store_dr)) != reduction_var)
	    return false;
	}
      /* Bail out if the target does not provide rawmemchr for this mode.  */
      machine_mode mode = TYPE_MODE (load_type);
      if (direct_optab_handler (rawmemchr_optab, mode) == CODE_FOR_nothing)
	return false;
      location_t loc = gimple_location (DR_STMT (load_dr));
      generate_rawmemchr_builtin (loop, reduction_var, store_dr, load_iv.base,
				  pattern, loc);
      return true;
    }

  /* Handle strlen-like loops.  */
  if (store_dr == NULL
      && integer_zerop (pattern)
      && INTEGRAL_TYPE_P (TREE_TYPE (reduction_var))
      && TREE_CODE (reduction_iv.base) == INTEGER_CST
      && TREE_CODE (reduction_iv.step) == INTEGER_CST
      && integer_onep (reduction_iv.step))
    {
      location_t loc = gimple_location (DR_STMT (load_dr));
      tree reduction_var_type = TREE_TYPE (reduction_var);
      /* While determining the length of a string an overflow might occur.
	 If an overflow only occurs in the loop implementation and not in the
	 strlen implementation, then either the overflow is undefined or the
	 truncated result of strlen equals the one of the loop.  Otherwise, if
	 an overflow may also occur in the strlen implementation, then
	 replacing a loop by a call to strlen is sound whenever we ensure that
	 an overflow in the strlen implementation implies an overflow in the
	 loop implementation, which is undefined.  It seems reasonable to
	 relax this and assume that the strlen implementation cannot overflow
	 if sizetype is big enough, in the sense that an overflow can only
	 happen for string objects bigger than half of the address space;
	 at least for 32-bit targets and up.

	 For strlen which makes use of rawmemchr the maximal length of a
	 string which can be determined without an overflow is PTRDIFF_MAX / S
	 where each character has size S.  Since an overflow for ptrdiff type
	 is undefined we have to make sure that if an overflow occurs, then an
	 overflow occurs in the loop implementation, too, and this is
	 undefined, too.  As before we relax this and assume that no string
	 object is larger than half of the address space; at least for 32-bit
	 targets and up.  */
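      /* For instance, on an LP64 target sizetype, ptrdiff_t, and pointers
	 are all 64 bits wide, so the precision guards in both branches below
	 are satisfied; otherwise we require an overflow-undefined reduction
	 variable that is narrow enough for the respective overflow check.  */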
      if (TYPE_MODE (load_type) == TYPE_MODE (char_type_node)
	  && TYPE_PRECISION (load_type) == TYPE_PRECISION (char_type_node)
	  && ((TYPE_PRECISION (sizetype) >= TYPE_PRECISION (ptr_type_node) - 1
	       && TYPE_PRECISION (ptr_type_node) >= 32)
	      || (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
		  && TYPE_PRECISION (reduction_var_type) <= TYPE_PRECISION (sizetype)))
	  && builtin_decl_implicit (BUILT_IN_STRLEN))
	generate_strlen_builtin (loop, reduction_var, load_iv.base,
				 reduction_iv.base, loc);
      else if (direct_optab_handler (rawmemchr_optab, TYPE_MODE (load_type))
	       != CODE_FOR_nothing
	       && ((TYPE_PRECISION (ptrdiff_type_node) == TYPE_PRECISION (ptr_type_node)
		    && TYPE_PRECISION (ptrdiff_type_node) >= 32)
		   || (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
		       && reduction_var_overflows_first (reduction_var_type, load_type))))
	generate_strlen_builtin_using_rawmemchr (loop, reduction_var,
						 load_iv.base,
						 load_type,
						 reduction_iv.base, loc);
      else
	return false;
      return true;
    }

  return false;
}

/* Given innermost LOOP, return the outermost enclosing loop that forms a
   perfect loop nest.  */

static class loop *
prepare_perfect_loop_nest (class loop *loop)
{
  class loop *outer = loop_outer (loop);
  tree niters = number_of_latch_executions (loop);

  /* TODO: We only support distribution of the innermost 3-level loop nest
     because of compile-time issues for now.  This should be relaxed in the
     future.  Note we only allow 3-level loop nest distribution when
     parallelizing loops.  */
  while ((loop->inner == NULL
	  || (loop->inner->inner == NULL && flag_tree_parallelize_loops > 1))
	 && loop_outer (outer)
	 && outer->inner == loop && loop->next == NULL
	 && single_exit (outer)
	 && !chrec_contains_symbols_defined_in_loop (niters, outer->num)
	 && (niters = number_of_latch_executions (outer)) != NULL_TREE
	 && niters != chrec_dont_know)
    {
      loop = outer;
      outer = loop_outer (loop);
    }

  return loop;
}
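
/* For example, for the innermost loop of

     for (i = 0; i < n; ++i)
       for (j = 0; j < m; ++j)
	 a[i][j] = b[i][j] + c;

   the outer I loop is returned, provided both loops have a single exit and
   computable iteration counts.  */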


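/* Distribute or transform all suitable innermost loops in FUN: reduction
   loops become calls to builtins, the remaining candidates are distributed.
   Return TODO flags for the pass manager.  */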
unsigned int
loop_distribution::execute (function *fun)
{
  bool changed = false;
  basic_block bb;
  control_dependences *cd = NULL;
  auto_vec<loop_p> loops_to_be_destroyed;

  if (number_of_loops (fun) <= 1)
    return 0;

  bb_top_order_init ();

  FOR_ALL_BB_FN (bb, fun)
    {
      gimple_stmt_iterator gsi;
      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
	gimple_set_uid (gsi_stmt (gsi), -1);
      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
	gimple_set_uid (gsi_stmt (gsi), -1);
    }

  /* At the moment we can only distribute non-nested loops, thus restrict
     the walk to innermost loops.  */
  for (auto loop : loops_list (cfun, LI_ONLY_INNERMOST))
    {
      /* Don't distribute loops with multiple exit edges, or cold loops when
	 not doing pattern detection.  */
      if (!single_exit (loop)
	  || (!flag_tree_loop_distribute_patterns
	      && !optimize_loop_for_speed_p (loop)))
	continue;

      /* If niters is unknown, don't distribute the loop but rather try to
	 transform it into a call to a builtin.  */
      tree niters = number_of_latch_executions (loop);
      if (niters == NULL_TREE || niters == chrec_dont_know)
	{
	  datarefs_vec.create (20);
	  if (flag_tree_loop_distribute_patterns
	      && transform_reduction_loop (loop))
	    {
	      changed = true;
	      loops_to_be_destroyed.safe_push (loop);
	      if (dump_enabled_p ())
		{
		  dump_user_location_t loc = find_loop_location (loop);
		  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
				   loc, "Loop %d transformed into a builtin.\n",
				   loop->num);
		}
	    }
	  free_data_refs (datarefs_vec);
	  continue;
	}

      /* Get the perfect loop nest for distribution.  */
      loop = prepare_perfect_loop_nest (loop);
      for (; loop; loop = loop->inner)
	{
	  auto_vec<gimple *> work_list;
	  if (!find_seed_stmts_for_distribution (loop, &work_list))
	    break;

	  const char *str = loop->inner ? " nest" : "";
	  dump_user_location_t loc = find_loop_location (loop);
	  if (!cd)
	    {
	      calculate_dominance_info (CDI_DOMINATORS);
	      calculate_dominance_info (CDI_POST_DOMINATORS);
	      cd = new control_dependences ();
	      free_dominance_info (CDI_POST_DOMINATORS);
	    }

	  bool destroy_p;
	  int nb_generated_loops, nb_generated_calls;
	  nb_generated_loops
	    = distribute_loop (loop, work_list, cd, &nb_generated_calls,
			       &destroy_p, (!optimize_loop_for_speed_p (loop)
					    || !flag_tree_loop_distribution));
	  if (destroy_p)
	    loops_to_be_destroyed.safe_push (loop);

	  if (nb_generated_loops + nb_generated_calls > 0)
	    {
	      changed = true;
	      if (dump_enabled_p ())
		dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
				 loc, "Loop%s %d distributed: split to %d loops "
				 "and %d library calls.\n", str, loop->num,
				 nb_generated_loops, nb_generated_calls);

	      break;
	    }

	  if (dump_file && (dump_flags & TDF_DETAILS))
	    fprintf (dump_file, "Loop%s %d not distributed.\n", str, loop->num);
	}
    }

  if (cd)
    delete cd;

  if (bb_top_order_index != NULL)
    bb_top_order_destroy ();

  if (changed)
    {
      /* Destroy loop bodies that could not be reused.  Do this late as we
	 otherwise can end up referring to stale data in control dependences.  */
      unsigned i;
      class loop *loop;
      FOR_EACH_VEC_ELT (loops_to_be_destroyed, i, loop)
	destroy_loop (loop);

      /* Cached scalar evolutions now may refer to wrong or non-existing
	 loops.  */
      scev_reset ();
      mark_virtual_operands_for_renaming (fun);
      rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
    }

  checking_verify_loop_structure ();

  return changed ? TODO_cleanup_cfg : 0;
}


/* Distribute all loops in the current function.  */

namespace {

const pass_data pass_data_loop_distribution =
{
  GIMPLE_PASS, /* type */
  "ldist", /* name */
  OPTGROUP_LOOP, /* optinfo_flags */
  TV_TREE_LOOP_DISTRIBUTION, /* tv_id */
  ( PROP_cfg | PROP_ssa ), /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_loop_distribution : public gimple_opt_pass
{
public:
  pass_loop_distribution (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_loop_distribution, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *)
    {
      return flag_tree_loop_distribution
	|| flag_tree_loop_distribute_patterns;
    }

  virtual unsigned int execute (function *);

}; // class pass_loop_distribution

unsigned int
pass_loop_distribution::execute (function *fun)
{
  return loop_distribution ().execute (fun);
}

} // anon namespace

gimple_opt_pass *
make_pass_loop_distribution (gcc::context *ctxt)
{
  return new pass_loop_distribution (ctxt);
}
   3967