<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
	 xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
<?dbhtml filename="parallel_mode.html"?>

<info><title>Parallel Mode</title>
  <keywordset>
    <keyword>C++</keyword>
    <keyword>library</keyword>
    <keyword>parallel</keyword>
  </keywordset>
</info>



<para> The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
</para>

<para>
Several of the standard algorithms, for instance
<function>std::sort</function>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked by
explicit source declaration or by compiling existing sources with a
specific compiler flag.
</para>

<note>
  <para>
    The parallel mode has not been kept up to date with recent C++ standards
    and so it only conforms to the C++03 requirements.
    That means that move-only predicates may not work with parallel mode
    algorithms, and for C++20 most of the algorithms cannot be used in
    <code>constexpr</code> functions.
  </para>
  <para>
    For C++17 and above there are new overloads of the standard algorithms
    which take an execution policy argument. You should consider using those
    instead of the non-standard parallel mode extensions.
  </para>
</note>

<section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>


<para>The following library components from the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::accumulate</function></para></listitem>
  <listitem><para><function>std::adjacent_difference</function></para></listitem>
  <listitem><para><function>std::inner_product</function></para></listitem>
  <listitem><para><function>std::partial_sum</function></para></listitem>
</itemizedlist>

<para>The following library components from the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
<itemizedlist>
  <listitem><para><function>std::adjacent_find</function></para></listitem>
  <listitem><para><function>std::count</function></para></listitem>
  <listitem><para><function>std::count_if</function></para></listitem>
  <listitem><para><function>std::equal</function></para></listitem>
  <listitem><para><function>std::find</function></para></listitem>
  <listitem><para><function>std::find_if</function></para></listitem>
  <listitem><para><function>std::find_first_of</function></para></listitem>
  <listitem><para><function>std::for_each</function></para></listitem>
  <listitem><para><function>std::generate</function></para></listitem>
  <listitem><para><function>std::generate_n</function></para></listitem>
  <listitem><para><function>std::lexicographical_compare</function></para></listitem>
  <listitem><para><function>std::mismatch</function></para></listitem>
  <listitem><para><function>std::search</function></para></listitem>
  <listitem><para><function>std::search_n</function></para></listitem>
  <listitem><para><function>std::transform</function></para></listitem>
  <listitem><para><function>std::replace</function></para></listitem>
  <listitem><para><function>std::replace_if</function></para></listitem>
  <listitem><para><function>std::max_element</function></para></listitem>
  <listitem><para><function>std::merge</function></para></listitem>
  <listitem><para><function>std::min_element</function></para></listitem>
  <listitem><para><function>std::nth_element</function></para></listitem>
  <listitem><para><function>std::partial_sort</function></para></listitem>
  <listitem><para><function>std::partition</function></para></listitem>
  <listitem><para><function>std::random_shuffle</function></para></listitem>
  <listitem><para><function>std::set_union</function></para></listitem>
  <listitem><para><function>std::set_intersection</function></para></listitem>
  <listitem><para><function>std::set_symmetric_difference</function></para></listitem>
  <listitem><para><function>std::set_difference</function></para></listitem>
  <listitem><para><function>std::sort</function></para></listitem>
  <listitem><para><function>std::stable_sort</function></para></listitem>
  <listitem><para><function>std::unique_copy</function></para></listitem>
</itemizedlist>

</section>

<section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info>
<?dbhtml filename="parallel_mode_semantics.html"?>


<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, the order of execution is not guaranteed for some functions.
Therefore, user-defined functors should not have any concurrent side effects.
</para>

<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>

</section>

<section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info>
<?dbhtml filename="parallel_mode_using.html"?>


<section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info>


<para>
  Any use of parallel functionality requires additional compiler
  and runtime support, in particular support for OpenMP. Adding this support is
  not difficult: just compile your application with the compiler
  flag <literal>-fopenmp</literal>. This will link
  in <code>libgomp</code>, the
  <link xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and
  Multi Processing Runtime Library</link>,
  whose presence is mandatory.
</para>

<para>
In addition, hardware that supports atomic operations and a compiler
capable of producing atomic operations is mandatory: GCC defaults to no
support for atomic operations on some common hardware
architectures. Activating atomic operations may require explicit
compiler flags on some targets (like sparc and x86), such
as <literal>-march=i686</literal>,
<literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
the GCC manual for more information.
</para>

</section>

<section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info>


<para>
  To use the libstdc++ parallel mode, compile your application with
  the prerequisite flags as detailed above, and in addition
  add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all
  use of the standard (sequential) algorithms to the appropriate parallel
  equivalents.
  Please note that this doesn't necessarily mean that
  everything will end up being executed in a parallel manner, but
  rather that the heuristics and settings coded into the parallel
  versions will be used to determine if all, some, or no algorithms
  will be executed using parallel variants.
</para>

<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
  sizes and behavior of standard class templates such as
  <function>std::search</function>, and therefore one can only link code
  compiled with parallel mode and code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
</para>
</section>

<section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info>


<para>When it is not feasible to recompile your entire application, or
  only specific algorithms need to be parallel-aware, individual
  parallel algorithms can be made available explicitly. These
  parallel algorithms are functionally equivalent to the standard
  drop-in algorithms used in parallel mode, but they are available in
  a separate namespace as GNU extensions and may be used in programs
  compiled with either release mode or with parallel mode.
</para>


<para>An example of using a parallel version
of <function>std::sort</function>, but no other parallel algorithms, is:
</para>

<programlisting>
#include &lt;vector&gt;
#include &lt;parallel/algorithm&gt;

int main()
{
  std::vector&lt;int&gt; v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
</programlisting>

<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>

<para> The following table provides the names and headers of all the
  parallel algorithms that can be used in a similar manner:
</para>

<table frame="all" xml:id="table.parallel_algos">
<title>Parallel Algorithms</title>

<tgroup cols="4" align="left" colsep="1" rowsep="1">
<colspec colname="c1"/>
<colspec colname="c2"/>
<colspec colname="c3"/>
<colspec colname="c4"/>

<thead>
  <row>
    <entry>Algorithm</entry>
    <entry>Header</entry>
    <entry>Parallel algorithm</entry>
    <entry>Parallel header</entry>
  </row>
</thead>

<tbody>
  <row>
    <entry><function>std::accumulate</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::accumulate</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_difference</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::adjacent_difference</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::inner_product</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::inner_product</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::partial_sum</function></entry>
    <entry><filename class="headerfile">numeric</filename></entry>
    <entry><function>__gnu_parallel::partial_sum</function></entry>
    <entry><filename class="headerfile">parallel/numeric</filename></entry>
  </row>
  <row>
    <entry><function>std::adjacent_find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::adjacent_find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::count_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::count_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::equal</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::equal</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::find_first_of</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::find_first_of</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::for_each</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::for_each</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::generate_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::generate_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::lexicographical_compare</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::mismatch</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::mismatch</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::search_n</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::search_n</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::transform</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::transform</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::replace_if</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::replace_if</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::max_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::max_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::merge</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::merge</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::min_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::min_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::nth_element</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::nth_element</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partial_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partial_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::partition</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::partition</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::random_shuffle</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::random_shuffle</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_union</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_union</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_intersection</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_intersection</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_symmetric_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::set_difference</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::set_difference</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::stable_sort</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::stable_sort</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>

  <row>
    <entry><function>std::unique_copy</function></entry>
    <entry><filename class="headerfile">algorithm</filename></entry>
    <entry><function>__gnu_parallel::unique_copy</function></entry>
    <entry><filename class="headerfile">parallel/algorithm</filename></entry>
  </row>
</tbody>
</tgroup>
</table>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info>
<?dbhtml filename="parallel_mode_design.html"?>

<section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info>


<para>
All parallel algorithms are intended to have signatures that are
equivalent to those of the ISO C++ algorithms they replace.
For instance, the
<function>std::adjacent_find</function> function is declared as:
</para>
<programlisting>
namespace std
{
  template&lt;typename _FIter&gt;
    _FIter
    adjacent_find(_FIter, _FIter);
}
</programlisting>

<para>
This means that there should be something equivalent for the parallel
version. Indeed, this is the case:
</para>

<programlisting>
namespace std
{
  namespace __parallel
  {
    template&lt;typename _FIter&gt;
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}
</programlisting>

<para>But why the ellipsis?
</para>

<para> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.
</para>

<para> The available signature options are specific to the different
algorithms/algorithm classes.</para>

<para> The general view of overloads for the parallel algorithms looks like this:
</para>
<itemizedlist>
  <listitem><para>ISO C++ signature</para></listitem>
  <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem>
  <listitem><para>ISO C++ signature + algorithm-specific tag type
  (several signatures)</para></listitem>
</itemizedlist>

<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not have support for run-time conditions, so the last
overload is missing.
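The three kinds of overload listed above can be sketched in a
self-contained form. Everything in namespace <code>demo</code> below,
including the tag types and the size threshold, is hypothetical and only
illustrates the dispatch pattern. It is not the libstdc++ implementation,
and both tagged overloads simply forward to the sequential algorithm.
<programlisting>
#include &lt;algorithm&gt;
#include &lt;iterator&gt;

namespace demo  // hypothetical namespace, for illustration only
{
  // Tag that requests the sequential code path.
  struct sequential_tag { };

  // Hypothetical algorithm-specific tag.
  struct parallel_tag { };

  // ISO C++ signature + sequential_tag: always runs sequentially.
  template&lt;typename FIter&gt;
    FIter
    adjacent_find(FIter first, FIter last, sequential_tag)
    { return std::adjacent_find(first, last); }

  // ISO C++ signature + algorithm-specific tag. A real parallel variant
  // would split the range across threads; this sketch only forwards.
  template&lt;typename FIter&gt;
    FIter
    adjacent_find(FIter first, FIter last, parallel_tag)
    { return std::adjacent_find(first, last); }

  // ISO C++ signature: dispatches on a run-time condition, here an
  // arbitrary size threshold chosen for illustration.
  template&lt;typename FIter&gt;
    FIter
    adjacent_find(FIter first, FIter last)
    {
      if (std::distance(first, last) &lt; 1000)
        return demo::adjacent_find(first, last, sequential_tag());
      return demo::adjacent_find(first, last, parallel_tag());
    }
}
</programlisting>
A caller that passes no tag gets the run-time dispatch, while passing
<code>demo::sequential_tag()</code> as the last argument selects the
sequential overload at compile time.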
</para>


</section>

<section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info>



<section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info>


<para>
Several aspects of the overall runtime environment can be manipulated
by standard OpenMP function calls.
</para>

<para>
To specify the number of threads to be used for the algorithms globally,
use the function <function>omp_set_num_threads</function>. An example:
</para>

<programlisting>
#include &lt;stdlib.h&gt;
#include &lt;omp.h&gt;

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  // Disable dynamic adjustment of the thread count by the runtime.
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>

<para>
 Some algorithms allow the number of threads to be set for a particular call,
 by augmenting the algorithm variant.
 See the next section for further information.
</para>

<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info>


<para>
To force an algorithm to execute sequentially, even though parallelism
is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>,
add <classname>__gnu_parallel::sequential_tag()</classname> to the end
of the algorithm's argument list.
</para>

<para>
Like so:
</para>

<programlisting>
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
</programlisting>

<para>
Some parallel algorithm variants can be excluded from compilation by
preprocessor defines. See the doxygen documentation on
<code>compiletime_settings.h</code> and <code>features.h</code> for details.
</para>

<para>
For some algorithms, the desired variant can be chosen at compile-time by
appending a tag object. The available options are specific to the particular
algorithm (class).
</para>

<para>
For the "embarrassingly parallel" algorithms, there is only one "tag object
type", the enum <code>_Parallelism</code>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>,
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run-time.
(Choosing the variants at compile-time will come soon.)
</para>

<para>
For the following algorithms in general, we have
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>

<para>
The <code>multiway_merge</code> algorithm comes with two additional choices,
<code>__gnu_parallel::exact_tag</code> and
<code>__gnu_parallel::sampling_tag</code>.
Exact and sampling are the two available splitting strategies.
</para>

<para>
For the <code>sort</code> and <code>stable_sort</code> algorithms, there are
several additional choices, namely
<code>__gnu_parallel::multiway_mergesort_tag</code>,
<code>__gnu_parallel::multiway_mergesort_exact_tag</code>,
<code>__gnu_parallel::multiway_mergesort_sampling_tag</code>,
<code>__gnu_parallel::quicksort_tag</code>, and
<code>__gnu_parallel::balanced_quicksort_tag</code>.
Multiway mergesort comes with the two splitting strategies for multi-way
merging. The quicksort options cannot be used for <code>stable_sort</code>.
</para>

</section>

<section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info>


<para>
The default parallelization strategy, the choice of specific algorithm
strategy, the minimum threshold limits for individual parallel
algorithms, and aspects of the underlying hardware can be specified as
desired via manipulation
of <classname>__gnu_parallel::_Settings</classname> member data.
</para>

<para>
First off, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a
value of enum <type>__gnu_parallel::_AlgorithmStrategy</type>
type. Choices
include: <type>heuristic</type>, <type>force_sequential</type>,
and <type>force_parallel</type>. The default is <type>heuristic</type>.
</para>


<para>
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways: when this is the case,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default strategy.
For
example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
have any value of
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
or <type>QS_BALANCED</type>.
</para>

<para>
Likewise for setting the minimal threshold for algorithm
parallelization. Parallelism always incurs some overhead. Thus, it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object. This
threshold variable uses the following naming scheme:
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>

<para>
Finally, hardware details like L1/L2 cache size can be hardwired
via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends.
</para>

<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/index.html">
  <filename class="headerfile">&lt;parallel/settings.h&gt;</filename></link>
for complete details.
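Because <code>get</code> returns a const object, one way to change a
single setting is to copy the current settings, modify the copy, and
install it with <code>set</code>. The following is a minimal sketch of
that pattern; the helper name <function>prefer_quicksort</function> is
hypothetical, and the sketch assumes a libstdc++ build that provides the
parallel mode settings symbols.
<programlisting>
#include &lt;parallel/settings.h&gt;

// Hypothetical helper, named for illustration only.
inline void
prefer_quicksort(unsigned long min_n)
{
  // get() returns a const reference, so work on a copy.
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();

  s.sort_algorithm = __gnu_parallel::QS;  // default sort variant
  s.sort_minimal_n = min_n;               // minimum problem size for sort

  // Install the modified copy as the global settings object.
  __gnu_parallel::_Settings::set(s);
}
</programlisting>
All other members keep whatever values were active before the call,
which is the main difference from starting with a default-constructed
<classname>_Settings</classname> object.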
</para>

<para>
A small example of tuning the default:
</para>

<programlisting>
#include &lt;parallel/algorithm&gt;
#include &lt;parallel/settings.h&gt;

int main()
{
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>

</section>

</section>

<section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info>


<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>

<para> Two namespaces contain the parallel mode:
<code>std::__parallel</code> and <code>__gnu_parallel</code>.
</para>

<para> Parallel implementations of standard components, including
template helpers to select parallelism, are defined in <code>namespace
std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in
<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel
implementations are injected into <code>namespace
__gnu_parallel</code> with using declarations.
</para>

<para> Support and general infrastructure is in <code>namespace
__gnu_parallel</code>.
</para>

<para> More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.
</para>

</section>

</section>

<section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info>
<?dbhtml filename="parallel_mode_test.html"?>


  <para>
    Both the normal conformance and regression tests and the
    supplemental performance tests work.
  </para>

  <para>
    To run the conformance and regression tests with the parallel mode
    active,
  </para>

  <screen>
  <userinput>make check-parallel</userinput>
  </screen>

  <para>
    The log and summary files for conformance testing are in the
    <filename class="directory">testsuite/parallel</filename> directory.
  </para>

  <para>
    To run the performance tests with the parallel mode active,
  </para>

  <screen>
  <userinput>make check-performance-parallel</userinput>
  </screen>

  <para>
    The results of performance testing are in the
    <filename class="directory">testsuite</filename> directory, in the file
    <filename>libstdc++_performance.sum</filename>. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the <code>make
    doc-performance</code> rule in the testsuite's Makefile.
  </para>
</section>

<bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info>


  <biblioentry>
    <citetitle>
      Parallelization of Bulk Operations for STL Dictionaries
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
	Workshop on Highly Parallel Processing on a Chip (HPPC) 2007.
	(LNCS)
      </publishername>
    </publisher>
  </biblioentry>

  <biblioentry>
    <citetitle>
      The Multi-Core Standard Template Library
    </citetitle>

    <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author>
    <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author>
    <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author>

    <copyright>
      <year>2007</year>
      <holder/>
    </copyright>

    <publisher>
      <publishername>
	 Euro-Par 2007: Parallel Processing. (LNCS 4641)
      </publishername>
    </publisher>
  </biblioentry>

</bibliography>

</chapter>