      1 <chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
      2 	 xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
      3 <?dbhtml filename="parallel_mode.html"?>
      4 
      5 <info><title>Parallel Mode</title>
      6   <keywordset>
      7     <keyword>C++</keyword>
      8     <keyword>library</keyword>
      9     <keyword>parallel</keyword>
     10   </keywordset>
     11 </info>
     12 
     13 
     14 
     15 <para> The libstdc++ parallel mode is an experimental parallel
     16 implementation of many algorithms of the C++ Standard Library.
     17 </para>
     18 
     19 <para>
     20 Several of the standard algorithms, for instance
     21 <function>std::sort</function>, are made parallel using OpenMP
     22 annotations. These parallel mode constructs can be invoked by
     23 explicit source declaration or by compiling existing sources with a
     24 specific compiler flag.
     25 </para>
     26 
     27 <note>
     28   <para>
     29     The parallel mode has not been kept up to date with recent C++ standards
     30     and so it only conforms to the C++03 requirements.
     31     That means that move-only predicates may not work with parallel mode
     32     algorithms, and for C++20 most of the algorithms cannot be used in
     33     <code>constexpr</code> functions.
     34   </para>
     35   <para>
     36     For C++17 and above there are new overloads of the standard algorithms
     37     which take an execution policy argument. You should consider using those
     38     instead of the non-standard parallel mode extensions.
     39   </para>
     40 </note>
     41 
     42 <section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>
     43 
     44 
<para>The following library components in the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
     47 <itemizedlist>
     48   <listitem><para><function>std::accumulate</function></para></listitem>
     49   <listitem><para><function>std::adjacent_difference</function></para></listitem>
     50   <listitem><para><function>std::inner_product</function></para></listitem>
     51   <listitem><para><function>std::partial_sum</function></para></listitem>
     52 </itemizedlist>
     53 
<para>The following library components in the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
     56 <itemizedlist>
     57   <listitem><para><function>std::adjacent_find</function></para></listitem>
     58   <listitem><para><function>std::count</function></para></listitem>
     59   <listitem><para><function>std::count_if</function></para></listitem>
     60   <listitem><para><function>std::equal</function></para></listitem>
     61   <listitem><para><function>std::find</function></para></listitem>
     62   <listitem><para><function>std::find_if</function></para></listitem>
     63   <listitem><para><function>std::find_first_of</function></para></listitem>
     64   <listitem><para><function>std::for_each</function></para></listitem>
     65   <listitem><para><function>std::generate</function></para></listitem>
     66   <listitem><para><function>std::generate_n</function></para></listitem>
     67   <listitem><para><function>std::lexicographical_compare</function></para></listitem>
     68   <listitem><para><function>std::mismatch</function></para></listitem>
     69   <listitem><para><function>std::search</function></para></listitem>
     70   <listitem><para><function>std::search_n</function></para></listitem>
     71   <listitem><para><function>std::transform</function></para></listitem>
     72   <listitem><para><function>std::replace</function></para></listitem>
     73   <listitem><para><function>std::replace_if</function></para></listitem>
     74   <listitem><para><function>std::max_element</function></para></listitem>
     75   <listitem><para><function>std::merge</function></para></listitem>
     76   <listitem><para><function>std::min_element</function></para></listitem>
     77   <listitem><para><function>std::nth_element</function></para></listitem>
     78   <listitem><para><function>std::partial_sort</function></para></listitem>
     79   <listitem><para><function>std::partition</function></para></listitem>
     80   <listitem><para><function>std::random_shuffle</function></para></listitem>
     81   <listitem><para><function>std::set_union</function></para></listitem>
     82   <listitem><para><function>std::set_intersection</function></para></listitem>
     83   <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
     84   <listitem><para><function>std::set_difference</function></para></listitem>
     85   <listitem><para><function>std::sort</function></para></listitem>
     86   <listitem><para><function>std::stable_sort</function></para></listitem>
     87   <listitem><para><function>std::unique_copy</function></para></listitem>
     88 </itemizedlist>
     89 
     90 </section>
     91 
     92 <section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info>
     93 <?dbhtml filename="parallel_mode_semantics.html"?>
     94 
     95 
<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, the order of execution is not guaranteed for some functions.
Therefore, user-defined functors must not have side effects that could
interfere with one another when the functor is invoked concurrently.
</para>
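<para>
As a minimal sketch of these rules (the names <code>Square</code> and
<code>sum_of_squares</code> are illustrative and not part of the library),
the functor below never throws and modifies only the element it is handed,
and the total is formed by <function>std::accumulate</function> rather than
by updating a shared counter from inside the functor. The sketch assumes the
translation unit may be built in parallel mode, in which case both calls can
be dispatched to parallel implementations.
</para>

<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Safe for parallel mode: never throws and writes only to its own element.
struct Square
{
  void operator()(long& x) const { x *= x; }
};

// Squares every element in place, then sums the results.
long sum_of_squares(std::vector<long>& values)
{
  // The order in which elements are visited is unspecified in parallel mode,
  // which is harmless here because each call is independent of the others.
  std::for_each(values.begin(), values.end(), Square());

  // Reduce with accumulate instead of adding to a shared total inside the
  // functor, which would be a concurrent side effect.
  return std::accumulate(values.begin(), values.end(), 0L);
}
```
]]></programlisting>

<para>
A functor that instead incremented a variable shared by all invocations would
have exactly the kind of concurrent side effect described above.
</para>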
    101 
<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>
    107 
    108 </section>
    109 
    110 <section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
    111 <?dbhtml filename="parallel_mode_using.html"?>
    112 
    113 
    114 <section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>
    115 
    116 
    117 <para>
    118   Any use of parallel functionality requires additional compiler
    119   and runtime support, in particular support for OpenMP. Adding this support is
    120   not difficult: just compile your application with the compiler
    121   flag <literal>-fopenmp</literal>. This will link
    122   in <code>libgomp</code>, the
    123   <link xmlns:xlink="http://www.w3.org/1999/xlink"
    124     xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
    125     Multi Processing Runtime Library</link>,
    126   whose presence is mandatory.
    127 </para>
    128 
    129 <para>
    130 In addition, hardware that supports atomic operations and a compiler
    131   capable of producing atomic operations is mandatory: GCC defaults to no
    132   support for atomic operations on some common hardware
    133   architectures. Activating atomic operations may require explicit
    134   compiler flags on some targets (like sparc and x86), such
    135   as <literal>-march=i686</literal>,
    136   <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
    137   the GCC manual for more information.
    138 </para>
    139 
    140 </section>
    141 
    142 <section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>
    143 
    144 
    145 <para>
    146   To use the libstdc++ parallel mode, compile your application with
    147   the prerequisite flags as detailed above, and in addition
    148   add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
    149   use of the standard (sequential) algorithms to the appropriate parallel
    150   equivalents. Please note that this doesn't necessarily mean that
    151   everything will end up being executed in a parallel manner, but
    152   rather that the heuristics and settings coded into the parallel
    153   versions will be used to determine if all, some, or no algorithms
    154   will be executed using parallel variants.
    155 </para>
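<para>
For instance, the following source uses only standard names, so whether the
sort runs through the parallel mode depends purely on how it is compiled.
This is a sketch: the helper names <code>built_in_parallel_mode</code> and
<code>sorted_copy</code> are hypothetical.
</para>

<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <vector>

// True if this translation unit was compiled with -D_GLIBCXX_PARALLEL.
bool built_in_parallel_mode()
{
#ifdef _GLIBCXX_PARALLEL
  return true;
#else
  return false;
#endif
}

// Plain standard code: no parallel-mode names appear in the source.
std::vector<int> sorted_copy(std::vector<int> values)
{
  std::sort(values.begin(), values.end());
  return values;
}
```
]]></programlisting>

<para>
Built as <literal>g++ -fopenmp -D_GLIBCXX_PARALLEL app.cc</literal>, the call
in <code>sorted_copy</code> is routed to the parallel mode version of
<function>std::sort</function>; built without the define, the same source
uses the sequential algorithm. The file name <filename>app.cc</filename> is a
placeholder.
</para>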
    156 
<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard templates such as
  <function>std::search</function>, and therefore one can only link code
  compiled with parallel mode and code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
    165 </section>
    166 
    167 <section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>
    168 
    169 
    170 <para>When it is not feasible to recompile your entire application, or
    171   only specific algorithms need to be parallel-aware, individual
    172   parallel algorithms can be made available explicitly. These
    173   parallel algorithms are functionally equivalent to the standard
    174   drop-in algorithms used in parallel mode, but they are available in
    175   a separate namespace as GNU extensions and may be used in programs
    176   compiled with either release mode or with parallel mode.
    177 </para>
    178 
    179 
    180 <para>An example of using a parallel version
    181 of <function>std::sort</function>, but no other parallel algorithms, is:
    182 </para>
    183 
<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>
    199 
<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>
    205 
    206 <para> The following table provides the names and headers of all the
    207   parallel algorithms that can be used in a similar manner:
    208 </para>
    209 
    210 <table frame="all" xml:id="table.parallel_algos">
    211 <title>Parallel Algorithms</title>
    212 
    213 <tgroup cols="4" align="left" colsep="1" rowsep="1">
    214 <colspec colname="c1"/>
    215 <colspec colname="c2"/>
    216 <colspec colname="c3"/>
    217 <colspec colname="c4"/>
    218 
    219 <thead>
    220   <row>
    221     <entry>Algorithm</entry>
    222     <entry>Header</entry>
    223     <entry>Parallel algorithm</entry>
    224     <entry>Parallel header</entry>
    225   </row>
    226 </thead>
    227 
    228 <tbody>
    229   <row>
    230     <entry><function>std::accumulate</function></entry>
    231     <entry><filename class="headerfile">numeric</filename></entry>
    232     <entry><function>__gnu_parallel::accumulate</function></entry>
    233     <entry><filename class="headerfile">parallel/numeric</filename></entry>
    234   </row>
    235   <row>
    236     <entry><function>std::adjacent_difference</function></entry>
    237     <entry><filename class="headerfile">numeric</filename></entry>
    238     <entry><function>__gnu_parallel::adjacent_difference</function></entry>
    239     <entry><filename class="headerfile">parallel/numeric</filename></entry>
    240   </row>
    241   <row>
    242     <entry><function>std::inner_product</function></entry>
    243     <entry><filename class="headerfile">numeric</filename></entry>
    244     <entry><function>__gnu_parallel::inner_product</function></entry>
    245     <entry><filename class="headerfile">parallel/numeric</filename></entry>
    246   </row>
    247   <row>
    248     <entry><function>std::partial_sum</function></entry>
    249     <entry><filename class="headerfile">numeric</filename></entry>
    250     <entry><function>__gnu_parallel::partial_sum</function></entry>
    251     <entry><filename class="headerfile">parallel/numeric</filename></entry>
    252   </row>
    253   <row>
    254     <entry><function>std::adjacent_find</function></entry>
    255     <entry><filename class="headerfile">algorithm</filename></entry>
    256     <entry><function>__gnu_parallel::adjacent_find</function></entry>
    257     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    258   </row>
    259 
    260   <row>
    261     <entry><function>std::count</function></entry>
    262     <entry><filename class="headerfile">algorithm</filename></entry>
    263     <entry><function>__gnu_parallel::count</function></entry>
    264     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    265   </row>
    266 
    267   <row>
    268     <entry><function>std::count_if</function></entry>
    269     <entry><filename class="headerfile">algorithm</filename></entry>
    270     <entry><function>__gnu_parallel::count_if</function></entry>
    271     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    272   </row>
    273 
    274   <row>
    275     <entry><function>std::equal</function></entry>
    276     <entry><filename class="headerfile">algorithm</filename></entry>
    277     <entry><function>__gnu_parallel::equal</function></entry>
    278     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    279   </row>
    280 
    281   <row>
    282     <entry><function>std::find</function></entry>
    283     <entry><filename class="headerfile">algorithm</filename></entry>
    284     <entry><function>__gnu_parallel::find</function></entry>
    285     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    286   </row>
    287 
    288   <row>
    289     <entry><function>std::find_if</function></entry>
    290     <entry><filename class="headerfile">algorithm</filename></entry>
    291     <entry><function>__gnu_parallel::find_if</function></entry>
    292     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    293   </row>
    294 
    295   <row>
    296     <entry><function>std::find_first_of</function></entry>
    297     <entry><filename class="headerfile">algorithm</filename></entry>
    298     <entry><function>__gnu_parallel::find_first_of</function></entry>
    299     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    300   </row>
    301 
    302   <row>
    303     <entry><function>std::for_each</function></entry>
    304     <entry><filename class="headerfile">algorithm</filename></entry>
    305     <entry><function>__gnu_parallel::for_each</function></entry>
    306     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    307   </row>
    308 
    309   <row>
    310     <entry><function>std::generate</function></entry>
    311     <entry><filename class="headerfile">algorithm</filename></entry>
    312     <entry><function>__gnu_parallel::generate</function></entry>
    313     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    314   </row>
    315 
    316   <row>
    317     <entry><function>std::generate_n</function></entry>
    318     <entry><filename class="headerfile">algorithm</filename></entry>
    319     <entry><function>__gnu_parallel::generate_n</function></entry>
    320     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    321   </row>
    322 
    323   <row>
    324     <entry><function>std::lexicographical_compare</function></entry>
    325     <entry><filename class="headerfile">algorithm</filename></entry>
    326     <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
    327     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    328   </row>
    329 
    330   <row>
    331     <entry><function>std::mismatch</function></entry>
    332     <entry><filename class="headerfile">algorithm</filename></entry>
    333     <entry><function>__gnu_parallel::mismatch</function></entry>
    334     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    335   </row>
    336 
    337   <row>
    338     <entry><function>std::search</function></entry>
    339     <entry><filename class="headerfile">algorithm</filename></entry>
    340     <entry><function>__gnu_parallel::search</function></entry>
    341     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    342   </row>
    343 
    344   <row>
    345     <entry><function>std::search_n</function></entry>
    346     <entry><filename class="headerfile">algorithm</filename></entry>
    347     <entry><function>__gnu_parallel::search_n</function></entry>
    348     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    349   </row>
    350 
    351   <row>
    352     <entry><function>std::transform</function></entry>
    353     <entry><filename class="headerfile">algorithm</filename></entry>
    354     <entry><function>__gnu_parallel::transform</function></entry>
    355     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    356   </row>
    357 
    358   <row>
    359     <entry><function>std::replace</function></entry>
    360     <entry><filename class="headerfile">algorithm</filename></entry>
    361     <entry><function>__gnu_parallel::replace</function></entry>
    362     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    363   </row>
    364 
    365   <row>
    366     <entry><function>std::replace_if</function></entry>
    367     <entry><filename class="headerfile">algorithm</filename></entry>
    368     <entry><function>__gnu_parallel::replace_if</function></entry>
    369     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    370   </row>
    371 
    372   <row>
    373     <entry><function>std::max_element</function></entry>
    374     <entry><filename class="headerfile">algorithm</filename></entry>
    375     <entry><function>__gnu_parallel::max_element</function></entry>
    376     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    377   </row>
    378 
    379   <row>
    380     <entry><function>std::merge</function></entry>
    381     <entry><filename class="headerfile">algorithm</filename></entry>
    382     <entry><function>__gnu_parallel::merge</function></entry>
    383     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    384   </row>
    385 
    386   <row>
    387     <entry><function>std::min_element</function></entry>
    388     <entry><filename class="headerfile">algorithm</filename></entry>
    389     <entry><function>__gnu_parallel::min_element</function></entry>
    390     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    391   </row>
    392 
    393   <row>
    394     <entry><function>std::nth_element</function></entry>
    395     <entry><filename class="headerfile">algorithm</filename></entry>
    396     <entry><function>__gnu_parallel::nth_element</function></entry>
    397     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    398   </row>
    399 
    400   <row>
    401     <entry><function>std::partial_sort</function></entry>
    402     <entry><filename class="headerfile">algorithm</filename></entry>
    403     <entry><function>__gnu_parallel::partial_sort</function></entry>
    404     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    405   </row>
    406 
    407   <row>
    408     <entry><function>std::partition</function></entry>
    409     <entry><filename class="headerfile">algorithm</filename></entry>
    410     <entry><function>__gnu_parallel::partition</function></entry>
    411     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    412   </row>
    413 
    414   <row>
    415     <entry><function>std::random_shuffle</function></entry>
    416     <entry><filename class="headerfile">algorithm</filename></entry>
    417     <entry><function>__gnu_parallel::random_shuffle</function></entry>
    418     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    419   </row>
    420 
    421   <row>
    422     <entry><function>std::set_union</function></entry>
    423     <entry><filename class="headerfile">algorithm</filename></entry>
    424     <entry><function>__gnu_parallel::set_union</function></entry>
    425     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    426   </row>
    427 
    428   <row>
    429     <entry><function>std::set_intersection</function></entry>
    430     <entry><filename class="headerfile">algorithm</filename></entry>
    431     <entry><function>__gnu_parallel::set_intersection</function></entry>
    432     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    433   </row>
    434 
    435   <row>
    436     <entry><function>std::set_symmetric_difference</function></entry>
    437     <entry><filename class="headerfile">algorithm</filename></entry>
    438     <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
    439     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    440   </row>
    441 
    442   <row>
    443     <entry><function>std::set_difference</function></entry>
    444     <entry><filename class="headerfile">algorithm</filename></entry>
    445     <entry><function>__gnu_parallel::set_difference</function></entry>
    446     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    447   </row>
    448 
    449   <row>
    450     <entry><function>std::sort</function></entry>
    451     <entry><filename class="headerfile">algorithm</filename></entry>
    452     <entry><function>__gnu_parallel::sort</function></entry>
    453     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    454   </row>
    455 
    456   <row>
    457     <entry><function>std::stable_sort</function></entry>
    458     <entry><filename class="headerfile">algorithm</filename></entry>
    459     <entry><function>__gnu_parallel::stable_sort</function></entry>
    460     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    461   </row>
    462 
    463   <row>
    464     <entry><function>std::unique_copy</function></entry>
    465     <entry><filename class="headerfile">algorithm</filename></entry>
    466     <entry><function>__gnu_parallel::unique_copy</function></entry>
    467     <entry><filename class="headerfile">parallel/algorithm</filename></entry>
    468   </row>
    469 </tbody>
    470 </tgroup>
    471 </table>
    472 
    473 </section>
    474 
    475 </section>
    476 
    477 <section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
    478 <?dbhtml filename="parallel_mode_design.html"?>
    479 
    482 <section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info>
    483 
    484 
    485 <para>
    486 All parallel algorithms are intended to have signatures that are
    487 equivalent to the ISO C++ algorithms replaced. For instance, the
    488 <function>std::adjacent_find</function> function is declared as:
    489 </para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>
    498 
<para>
The parallel version should therefore have an equivalent declaration,
and indeed it does:
</para>
    503 
<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>
    517 
<para>But why the ellipsis?
</para>

<para> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.
</para>
    528 
<para> The available signature options are specific to the different
algorithms/algorithm classes.</para>

<para> The general view of overloads for the parallel algorithms looks like this:
</para>
    534 <itemizedlist>
    535    <listitem><para>ISO C++ signature</para></listitem>
    536    <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
    537    <listitem><para>ISO C++ signature + algorithm-specific tag type
    538     (several signatures)</para></listitem>
    539 </itemizedlist>
    540 
<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not have support for run-time conditions, so the last
overload is missing for them.
</para>
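<para>
The first two overload forms listed above can be sketched with
<function>adjacent_find</function>. The wrapper names
<code>first_repeat</code> and <code>first_repeat_sequential</code> are
hypothetical, and the <code>_OPENMP</code> guard is an assumption of this
sketch so that it still builds, using the sequential standard algorithm, when
<literal>-fopenmp</literal> is not given.
</para>

<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <vector>
#ifdef _OPENMP
#include <parallel/algorithm>  // GNU extension; needs -fopenmp to link
#endif

typedef std::vector<int>::const_iterator Iter;

// ISO C++ signature: the library decides whether to run in parallel.
Iter first_repeat(const std::vector<int>& v)
{
#ifdef _OPENMP
  return __gnu_parallel::adjacent_find(v.begin(), v.end());
#else
  return std::adjacent_find(v.begin(), v.end());
#endif
}

// ISO C++ signature + sequential_tag: always the sequential algorithm.
Iter first_repeat_sequential(const std::vector<int>& v)
{
#ifdef _OPENMP
  return __gnu_parallel::adjacent_find(v.begin(), v.end(),
                                       __gnu_parallel::sequential_tag());
#else
  return std::adjacent_find(v.begin(), v.end());
#endif
}
```
]]></programlisting>

<para>
Both wrappers return the same iterator for the same input; only the way the
call is dispatched differs.
</para>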
    547 
    548 
    549 </section>
    550 
    551 <section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>
    552 
    553 
    554 
    555 <section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>
    556 
    557 
    558 <para>
    559 Several aspects of the overall runtime environment can be manipulated
    560 by standard OpenMP function calls.
    561 </para>
    562 
    563 <para>
    564 To specify the number of threads to be used for the algorithms globally,
    565 use the function <function>omp_set_num_threads</function>. An example:
    566 </para>
    567 
<programlisting>
#include &lt;stdlib.h&gt;
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>
    584 
<para>
 Some algorithms allow the number of threads to be set for a particular call,
 by augmenting the algorithm variant.
 See the next section for further information.
</para>
    590 
<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>
    597 
    598 </section>
    599 
    600 <section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>
    601 
    602 
    603 <para>
    604 To force an algorithm to execute sequentially, even though parallelism
    605 is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
    606 add <classname>__gnu_parallel::sequential_tag()</classname> to the end
    607 of the algorithm's argument list.
    608 </para>
    609 
    610 <para>
    611 Like so:
    612 </para>
    613 
<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>
    617 
    618 <para>
    619 Some parallel algorithm variants can be excluded from compilation by
    620 preprocessor defines. See the doxygen documentation on
    621 <code>compiletime_settings.h</code> and <code>features.h</code> for details.
    622 </para>
    623 
    624 <para>
    625 For some algorithms, the desired variant can be chosen at compile-time by
    626 appending a tag object. The available options are specific to the particular
    627 algorithm (class).
    628 </para>
    629 
<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum <code>_Parallelism</code>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run-time.
(Choosing the variants at compile-time will come soon.)
</para>
    642 
<para>
For the following algorithms in general, the available tags are
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile-time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run-time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>
    655 
    656 <para>
    657 The <code>multiway_merge</code> algorithm comes with the additional choices,
    658 <code>__gnu_parallel::exact_tag</code> and
    659 <code>__gnu_parallel::sampling_tag</code>.
    660 Exact and sampling are the two available splitting strategies.
    661 </para>
    662 
    663 <para>
    664 For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
    665 several additional choices, namely
    666 <code>__gnu_parallel::multiway_mergesort_tag</code>,
    667 <code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
    668 <code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
    669 <code>__gnu_parallel::quicksort_tag</code>, and
    670 <code>__gnu_parallel::balanced_quicksort_tag</code>.
    671 Multiway mergesort comes with the two splitting strategies for multi-way
    672 merging. The quicksort options cannot be used for <code>stable_sort</code>.
    673 </para>
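<para>
A sketch of how these tags might be used follows. The wrapper names
<code>sort_with_quicksort</code> and <code>stable_sort_with_mergesort</code>
and the thread count of 4 are illustrative choices, and the
<code>_OPENMP</code> guard is an assumption of this sketch so that it falls
back to the sequential standard algorithms when <literal>-fopenmp</literal>
is not given.
</para>

<programlisting><![CDATA[
```cpp
#include <algorithm>
#include <vector>
#ifdef _OPENMP
#include <parallel/algorithm>  // GNU extension; needs -fopenmp to link
#endif

// Unstable sort: select balanced quicksort and ask for 4 threads
// for this call by passing the count to the tag's constructor.
void sort_with_quicksort(std::vector<int>& v)
{
#ifdef _OPENMP
  __gnu_parallel::sort(v.begin(), v.end(),
                       __gnu_parallel::balanced_quicksort_tag(4));
#else
  std::sort(v.begin(), v.end());
#endif
}

// Stable sort: the quicksort tags are not usable here, so select
// multiway mergesort instead.
void stable_sort_with_mergesort(std::vector<int>& v)
{
#ifdef _OPENMP
  __gnu_parallel::stable_sort(v.begin(), v.end(),
                              __gnu_parallel::multiway_mergesort_tag());
#else
  std::stable_sort(v.begin(), v.end());
#endif
}
```
]]></programlisting>

<para>
The tag only selects the variant; for small inputs the library's thresholds
may still cause the sequential algorithm to be used.
</para>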
    674 
    675 </section>
    676 
    677 <section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>
    678 
    679 
    680 <para>
    681 The default parallelization strategy, the choice of specific algorithm
    682 strategy, the minimum threshold limits for individual parallel
    683 algorithms, and aspects of the underlying hardware can be specified as
    684 desired via manipulation
    685 of <classname>__gnu_parallel::_Settings</classname> member data.
    686 </para>
    687 
    688 <para>
    689 First off, the choice of parallelization strategy: serial, parallel,
    690 or heuristically deduced. This corresponds
    691 to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
    692 value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
    693 type. Choices
    694 include: <type>heuristic</type>, <type>force_sequential</type>,
    695 and <type>force_parallel</type>. The default is <type>heuristic</type>.
    696 </para>
    697 
    698 
    699 <para>
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways: when this is the case,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default variant. For
example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
take any value of
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>
(multiway mergesort), <type>QS</type> (quicksort),
or <type>QS_BALANCED</type> (balanced quicksort).
    709 </para>
    710 
    711 <para>
The minimal threshold for algorithm parallelization is set in the same
way.  Parallelism always incurs some overhead, so it is
not helpful to parallelize operations on very small sets of
data. To avoid this, an algorithm is not parallelized below
a certain, pre-determined problem size. For each algorithm, this minimum
problem size is stored as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object.  The
threshold variables use the naming scheme
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>.  So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
    723 </para>
    724 
    725 <para>
    726 Finally, hardware details like L1/L2 cache size can be hardwired
    727 via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
    728 </para>
    729 
    733 <para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that <code>get</code> returns a reference to a const object, so
the settings cannot be modified through it directly: copy the object (or
construct a new one), change the copy, and pass it to <code>set</code>.
    742 See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/index.html">
    743   <filename class="headerfile">&lt;parallel/settings.h&gt;</filename></link>
    744 for complete details.
    745 </para>
    746 
    747 <para>
    748 A small example of tuning the default:
    749 </para>
    750 
    751 <programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
    765 </programlisting>
    766 
    767 </section>
    768 
    769 </section>
    770 
    771 <section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>
    772 
    773 
<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
    777 </para>
    778 
    779 <para> Two namespaces contain the parallel mode:
    780 <code>std::__parallel</code> and <code>__gnu_parallel</code>.
    781 </para>
    782 
    783 <para> Parallel implementations of standard components, including
    784 template helpers to select parallelism, are defined in <code>namespace
    785 std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
    786 <function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
    787 implementations are injected into <code>namespace
    788 __gnu_parallel</code> with using declarations.
    789 </para>
    790 
<para> Support and general infrastructure are in <code>namespace
__gnu_parallel</code>.
    793 </para>
    794 
    795 <para> More information, and an organized index of types and functions
    796 related to the parallel mode on a per-namespace basis, can be found in
    797 the generated source documentation.
    798 </para>
    799 
    800 </section>
    801 
    802 </section>
    803 
    804 <section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
    805 <?dbhtml filename="parallel_mode_test.html"?>
    806 
    807 
    808   <para>
    Both the normal conformance and regression tests and the
    supplemental performance tests can be run with the parallel mode
    active.
    811   </para>
    812 
    813   <para>
    To run the conformance and regression tests with the parallel mode
    active, use:
    816   </para>
    817 
    818   <screen>
    819   <userinput>make check-parallel</userinput>
    820   </screen>
    821 
    822   <para>
    823     The log and summary files for conformance testing are in the
    824     <filename class="directory">testsuite/parallel</filename> directory.
    825   </para>
    826 
    827   <para>
    To run the performance tests with the parallel mode active, use:
    829   </para>
    830 
    831   <screen>
    832   <userinput>make check-performance-parallel</userinput>
    833   </screen>
    834 
    835   <para>
    The results of performance testing are in the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
    843 </para>
    844 </section>
    845 
    846 <bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>
    847 
    848 
    849   <biblioentry>
    850     <citetitle>
    851       Parallelization of Bulk Operations for STL Dictionaries
    852     </citetitle>
    853 
    854     <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    855     <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>
    856 
    857     <copyright>
    858       <year>2007</year>
    859       <holder/>
    860     </copyright>
    861 
    862     <publisher>
    863       <publishername>
    864 	Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
    865       </publishername>
    866     </publisher>
    867   </biblioentry>
    868 
    869   <biblioentry>
    870     <citetitle>
    871       The Multi-Core Standard Template Library
    872     </citetitle>
    873 
    874     <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    875     <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
    876     <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>
    877 
    878     <copyright>
    879       <year>2007</year>
    880       <holder/>
    881     </copyright>
    882 
    883     <publisher>
    884       <publishername>
    885 	 Euro-Par 2007: Parallel Processing. (LNCS 4641)
    886       </publishername>
    887     </publisher>
    888   </biblioentry>
    889 
    890 </bibliography>
    891 
    892 </chapter>
    893