Home | History | Annotate | Line # | Download | only in doc
      1      1.1  christos \input texinfo @c -*-texinfo-*-
      2      1.1  christos 
      3      1.1  christos @c ----------------------------------------------------------------------------
      4      1.1  christos @c This is the Texinfo source file for the GPROFNG manual.  This manual
      5      1.1  christos @c includes the man pages for the various tools.
      6      1.1  christos @c
      7      1.1  christos @c Author: Ruud van der Pas
      8      1.1  christos @c ----------------------------------------------------------------------------
      9      1.1  christos 
     10      1.1  christos @c %**start of header
     11      1.1  christos 
     12      1.1  christos @setfilename gprofng.info
     13      1.1  christos @settitle GNU gprofng
     14      1.1  christos 
     15      1.1  christos @c -- Set the indent for the @example command to 1 space, not 5 ---------------
     16      1.1  christos @exampleindent 1
     17      1.1  christos 
     18      1.1  christos @paragraphindent 3
     19      1.1  christos 
     20      1.1  christos @c %**end of header
     21      1.1  christos 
     22      1.1  christos @c -- Start a new chapter on a new, odd numbered, page ------------------------
     23      1.1  christos @setchapternewpage odd
     24      1.1  christos 
     25      1.1  christos @c -- Merge all index entries into the Concepts Index -------------------------
     26      1.1  christos @syncodeindex fn cp
     27      1.1  christos @syncodeindex ky cp
     28      1.1  christos @syncodeindex pg cp
     29      1.1  christos @syncodeindex vr cp
     30      1.1  christos 
     31      1.1  christos @c -- Macros specific to gprofng ----------------------------------------------
     32      1.1  christos @include gp-macros.texi
     33      1.1  christos 
     34      1.1  christos @c -- Get the version information ---------------------------------------------
     35      1.1  christos @include version.texi
     36      1.1  christos 
     37      1.1  christos @c -- Entry for the Info dir structure ----------------------------------------
     38      1.1  christos @ifnottex
     39      1.1  christos @dircategory Software development
     40      1.1  christos @direntry
     41      1.1  christos * gprofng: (gprofng).                    The next generation profiling tool for Linux
     42      1.1  christos @end direntry
     43      1.1  christos @end ifnottex
     44      1.1  christos 
     45      1.1  christos @c -- Copyright stuff ---------------------------------------------------------
     46      1.1  christos @copying
     47      1.1  christos This document is the manual for @ProductName{}, last updated @value{UPDATED}.
     48      1.1  christos 
     49  1.1.1.2  christos Copyright @copyright{} 2022-2025 Free Software Foundation, Inc.
     50      1.1  christos 
     51      1.1  christos @c -- @quotation
     52      1.1  christos Permission is granted to copy, distribute and/or modify this document
     53      1.1  christos under the terms of the GNU Free Documentation License,
     54      1.1  christos Version 1.3 or any later version published by the Free Software
     55      1.1  christos Foundation; with no Invariant Sections, with no Front-Cover texts,
     56      1.1  christos and with no Back-Cover Texts.  A copy of the license is included in the
     57      1.1  christos section entitled ``GNU Free Documentation License.''
     58      1.1  christos 
     59      1.1  christos @c -- @end quotation
     60      1.1  christos @end copying
     61      1.1  christos 
     62      1.1  christos @finalout
     63      1.1  christos @smallbook
     64      1.1  christos 
     65      1.1  christos @c -- Define the title page ---------------------------------------------------
     66      1.1  christos @titlepage
     67      1.1  christos @title GNU gprofng
     68      1.1  christos @subtitle The next generation profiling tool for Linux
     69      1.1  christos @subtitle version @value{VERSION} (last updated @value{UPDATED})
     70      1.1  christos @author Ruud van der Pas
     71      1.1  christos @page
     72      1.1  christos @vskip 0pt plus 1filll
     73      1.1  christos @insertcopying
     74      1.1  christos @end titlepage
     75      1.1  christos 
     76      1.1  christos @c -- Generate the Table of Contents ------------------------------------------
     77      1.1  christos @contents
     78      1.1  christos 
     79      1.1  christos @c -- The Top node ------------------------------------------------------------
     80      1.1  christos @c Should contain a short summary, copying permissions and a master menu.
     81      1.1  christos @c ----------------------------------------------------------------------------
     82      1.1  christos @ifnottex
     83      1.1  christos @node Top
     84      1.1  christos @top  GNU Gprofng
     85      1.1  christos 
     86      1.1  christos @insertcopying
     87      1.1  christos @end ifnottex
     88      1.1  christos 
     89      1.1  christos @ifinfo
     90      1.1  christos @c -- The menu entries --------------------------------------------------------
     91      1.1  christos 
     92      1.1  christos @c * Display Source Code::             Display the source code and disassembly.
     93      1.1  christos @c * Archive Experiment Data::         Archive an experiment.
     94      1.1  christos 
     95      1.1  christos @menu
     96      1.1  christos * Introduction::                      About this manual.
     97      1.1  christos * Overview::                          A brief overview of @ProductName{}.
     98  1.1.1.2  christos * A Mini Tutorial::                   A short tutorial with the key features.
     99      1.1  christos * The gprofng Tools::                 An overview of the tools supported.
    100      1.1  christos * Performance Data Collection::       Record the performance information.
    101      1.1  christos * View the Performance Information::  Different ways to view the data.
    102      1.1  christos * Terminology::                       Concepts and terminology explained.
    103      1.1  christos * Other Document Formats::            Create this document in other formats.
    104      1.1  christos * The gprofng Man Pages::             The gprofng man pages.
    105      1.1  christos * Index::                             The index.
    106      1.1  christos 
    107      1.1  christos @detailmenu
    108      1.1  christos 
    109      1.1  christos --- The Detailed Node Listing ---
    110      1.1  christos 
    111      1.1  christos Introduction
    112      1.1  christos 
    113      1.1  christos Overview
    114      1.1  christos 
    115      1.1  christos * Main Features::                     A high level overview.
    116  1.1.1.2  christos * Sampling versus Tracing::           The pros and cons of both approaches.
    117      1.1  christos * Steps Needed to Create a Profile::  How to create a profile.
    118      1.1  christos 
    119      1.1  christos A Mini Tutorial
    120      1.1  christos 
    121  1.1.1.2  christos * Getting Started::                 The basics of profiling with gprofng.
    122  1.1.1.2  christos * Support for Multithreading::      Commands for multithreaded applications.
    123  1.1.1.2  christos * View Multiple Experiments::       Analyze multiple experiments. 
    124      1.1  christos * Profile Hardware Event Counters:: How to use hardware event counters.
    125      1.1  christos * Java Profiling::                  How to profile a Java application.
    126      1.1  christos 
    127      1.1  christos The gprofng Tools
    128      1.1  christos 
    129      1.1  christos * Tools Overview::                            A brief description of the tools.
    130      1.1  christos * The gprofng.rc file with default settings:: Customize the settings.
    131      1.1  christos * Filters::                                   Filters.
    132      1.1  christos * Supported Environment Variables::           The supported environment variables.
    133      1.1  christos 
    134      1.1  christos Performance Data Collection
    135      1.1  christos 
    136      1.1  christos * The @command{gprofng collect app} Tool::    Collect application performance data.
    137      1.1  christos 
    138      1.1  christos View the Performance Information
    139      1.1  christos 
    140      1.1  christos * The @command{gprofng display text} Tool::   View the performance data in plain text.
    141      1.1  christos 
    142      1.1  christos Terminology
    143      1.1  christos 
    144      1.1  christos * The Program Counter::                    What is a Program Counter?
    145      1.1  christos * Inclusive and Exclusive Metrics::        An explanation of inclusive and exclusive metrics.
    146      1.1  christos * Metric Definitions::                     Definitions associated with metrics.
    147      1.1  christos * The Viewmode::                           Select the way call stacks are presented.
    148      1.1  christos * The Selection List::                     How to define a selection.
    149      1.1  christos * Load Objects and Functions::             The components in an application.
    150      1.1  christos * The Concept of a CPU in gprofng::        The definition of a CPU.
    151      1.1  christos * Hardware Event Counters Explained::      What are event counters?
    152      1.1  christos * apath::                                  Our generic definition of a path.
    153      1.1  christos 
    154      1.1  christos The gprofng Man Pages
    155      1.1  christos 
    156      1.1  christos * Man page for @command{gprofng collect app}::   The man page for gprofng collect app.
    157      1.1  christos * Man page for @command{gprofng display text}::  The man page for gprofng display text.
    158      1.1  christos * Man page for @command{gprofng display html}::  The man page for gprofng display html.
    159      1.1  christos * Man page for @command{gprofng display src}::   The man page for gprofng display src.
    160      1.1  christos * Man page for @command{gprofng archive}::       The man page for gprofng archive.
    161      1.1  christos 
    162      1.1  christos @c -- Index
    163      1.1  christos 
    164      1.1  christos @end detailmenu
    165      1.1  christos @end menu
    166      1.1  christos @end ifinfo
    167      1.1  christos 
    168      1.1  christos @c -- A new node --------------------------------------------------------------
    169      1.1  christos @node    Introduction
    170      1.1  christos @chapter Introduction
    171      1.1  christos @c ----------------------------------------------------------------------------
    172      1.1  christos The @ProductName{} tool is the next generation profiler for Linux.  It consists
    173      1.1  christos of various commands to generate and display profile information.
    174      1.1  christos 
    175      1.1  christos This manual starts with a tutorial how to create and interpret a profile.  This
    176      1.1  christos part is highly practical and has the goal to get users up to speed as quickly
    177      1.1  christos as possible.  As soon as possible, we would like to show you how to get your
    178      1.1  christos first profile on your screen.
    179      1.1  christos 
    180      1.1  christos This is followed by more examples, covering many of the features.  At the
    181      1.1  christos end of this tutorial, you should feel confident enough to tackle the more
    182      1.1  christos complex tasks.
    183      1.1  christos 
    184      1.1  christos In a future update a more formal reference manual will be included as well.
    185      1.1  christos Since even in this tutorial we use certain terminology, we have included a
    186      1.1  christos chapter with descriptions at the end.  In case you encounter unfamiliar
    187      1.1  christos wordings or terminology, please check this chapter.
    188      1.1  christos 
    189      1.1  christos One word of caution.  In several cases we had to somewhat tweak the screen
    190      1.1  christos output in order to make it fit.  This is why the output may look somewhat
    191      1.1  christos different when you try things yourself.
    192      1.1  christos 
    193      1.1  christos For now, we wish you a smooth profiling experience with @ProductName{} and
    194      1.1  christos good luck tackling performance bottlenecks.
    195      1.1  christos 
    196      1.1  christos @c -- A new node --------------------------------------------------------------
    197      1.1  christos @c cccccc @node    A Brief Overview of @ProductName{}
    198      1.1  christos @node    Overview
    199  1.1.1.2  christos @chapter A Brief Overview of gprofng
    200      1.1  christos @c ----------------------------------------------------------------------------
    201      1.1  christos 
    202      1.1  christos @menu
    203      1.1  christos * Main Features::                     A high level overview.
    204      1.1  christos * Sampling versus Tracing::           The pros and cons of sampling versus tracing.
    205      1.1  christos * Steps Needed to Create a Profile::  How to create a profile.
    206      1.1  christos @end menu
    207      1.1  christos 
    208      1.1  christos Before we cover this tool in quite some detail, we start with a brief overview
    209      1.1  christos of what it is, and the main features.  Since we know that many of you would
    210      1.1  christos like to get started rightaway, already in this first chapter we explain the
    211      1.1  christos basics of profiling with @ToolName{}.
    212      1.1  christos 
    213      1.1  christos @c ----------------------------------------------------------------------------
    214      1.1  christos @c TBD Review this text.  Probably be more specific on the gcc releases and
    215      1.1  christos @c processor specifics.
    216      1.1  christos @c ----------------------------------------------------------------------------
    217      1.1  christos 
    218      1.1  christos @c -- A new node --------------------------------------------------------------
    219      1.1  christos @node    Main Features
    220      1.1  christos @section Main Features
    221      1.1  christos @c ----------------------------------------------------------------------------
    222      1.1  christos 
    223      1.1  christos @noindent
    224      1.1  christos These are the main features of the @ProductName{} tool:
    225      1.1  christos 
    226      1.1  christos @itemize @bullet
    227      1.1  christos 
    228      1.1  christos @item
    229      1.1  christos Profiling is supported for an application written in C, C++, Java, or Scala.
    230      1.1  christos 
    231      1.1  christos @c TBD Java: up to 1.8 full support, support other than for modules
    232      1.1  christos 
    233      1.1  christos @item
    234  1.1.1.2  christos Shared libraries are supported.  The information is presented at the
    235  1.1.1.2  christos instruction level.
    236      1.1  christos 
    237      1.1  christos @item
    238      1.1  christos The following multithreading programming models are supported: Pthreads,
    239      1.1  christos OpenMP, and Java threads.
    240      1.1  christos 
    241      1.1  christos @item
    242  1.1.1.2  christos This tool works with unmodified production level executables.  There is no need
    243  1.1.1.2  christos to recompile the code, but if the @samp{-g} option has been used when building
    244      1.1  christos the application, source line level information is available.
    245      1.1  christos 
    246      1.1  christos @item
    247      1.1  christos The focus is on support for code generated with the @command{gcc} compiler, but
    248      1.1  christos there is some limited support for the @command{icc} compiler as well.  Future
    249      1.1  christos improvements and enhancements will focus on @command{gcc} though.
    250      1.1  christos 
    251      1.1  christos @item
    252      1.1  christos Processors from Intel, AMD, and Arm are supported, but the level of support
    253      1.1  christos depends on the architectural details.  In particular, hardware event counters
    254      1.1  christos may not be supported.  If this is the case, all views not related to these
    255      1.1  christos counters still ought to work though.
    256      1.1  christos 
    257      1.1  christos @item
    258      1.1  christos Several views into the data are supported.  For example, a function overview
    259      1.1  christos where the time is spent, but also a source line, disassembly, call tree and
    260      1.1  christos a caller-callees overview are available.
    261      1.1  christos 
    262      1.1  christos @item
    263      1.1  christos Through filters, the user can zoom in on an area of interest.
    264      1.1  christos 
    265      1.1  christos @item
    266  1.1.1.2  christos Two or more profiles can be aggregated, or used in a comparison.  This
    267  1.1.1.2  christos comparison can be obtained at the function, source line, and disassembly
    268  1.1.1.2  christos level.
    269      1.1  christos 
    270      1.1  christos @item
    271      1.1  christos Through a simple scripting language, and customization of the metrics shown,
    272      1.1  christos the generation and creation of a profile can be fully automated and provide
    273      1.1  christos tailored output.
    274      1.1  christos 
    275      1.1  christos @end itemize
    276      1.1  christos 
    277      1.1  christos @c -- A new node --------------------------------------------------------------
    278      1.1  christos @node    Sampling versus Tracing
    279      1.1  christos @section Sampling versus Tracing
    280      1.1  christos @c ----------------------------------------------------------------------------
    281      1.1  christos 
    282      1.1  christos A key difference with some other profiling tools is that the main data
    283      1.1  christos collection command @CollectApp{} mostly uses
    284      1.1  christos @cindex Program Counter sampling
    285      1.1  christos @cindex PC sampling
    286      1.1  christos Program Counter (PC) sampling
    287      1.1  christos under the hood.
    288      1.1  christos 
    289      1.1  christos With @emph{sampling}, the executable is interrupted at regular intervals.  Each
    290      1.1  christos time it is halted, key information is gathered and stored.  This includes the
    291      1.1  christos Program Counter that keeps track of where the execution is.  Hence the name.
    292      1.1  christos 
    293      1.1  christos Together with operational data, this information is stored in the experiment
    294      1.1  christos directory and can be viewed in the second phase.
    295      1.1  christos 
    296      1.1  christos For example, the PC information is used to derive where the program was when
    297      1.1  christos it was halted.  Since the sampling interval is known, it is relatively easy to
    298      1.1  christos derive how much time was spent in the various parts of the program.
    299      1.1  christos 
    300      1.1  christos The opposite technique is generally referred to as @emph{tracing}.  With
    301      1.1  christos tracing, the target is instrumented with specific calls that collect the
    302      1.1  christos requested information.
    303      1.1  christos 
    304      1.1  christos These are some of the pros and cons of PC sampling verus tracing:
    305      1.1  christos 
    306      1.1  christos @itemize
    307      1.1  christos 
    308      1.1  christos @item
    309      1.1  christos Since there is no need to recompile, existing executables can be used
    310      1.1  christos and the profile measures the behaviour of exactly the same executable that is
    311      1.1  christos used in production runs.
    312      1.1  christos 
    313  1.1.1.2  christos With tracing, one inherently profiles a different executable, because
    314  1.1.1.2  christos the calls to the instrumentation library may affect runtime behaviour.
    315      1.1  christos 
    316      1.1  christos @item
    317  1.1.1.2  christos With sampling, there are very few restrictions on what can be profiled and
    318  1.1.1.2  christos even without access to the source code, a basic profile can be made.
    319      1.1  christos 
    320      1.1  christos @item
    321      1.1  christos A downside of sampling is that, depending on the sampling frequency, small
    322      1.1  christos functions may be missed or not captured accurately.  Although this is rare,
    323  1.1.1.2  christos this may happen and is the reason why the user has control over the sampling
    324  1.1.1.2  christos rate.
    325      1.1  christos 
    326      1.1  christos @item
    327      1.1  christos While tracing produces precise information, sampling is statistical in nature.
    328      1.1  christos As a result, small variations may occur across seemingly identical runs.  We
    329      1.1  christos have not observed more than a few percent deviation though.  Especially if
    330      1.1  christos the target job executed for a sufficiently long time.
    331      1.1  christos 
    332      1.1  christos @item
    333      1.1  christos With sampling, it is not possible to get an accurate count how often
    334      1.1  christos functions are called.
    335      1.1  christos 
    336      1.1  christos @end itemize
    337      1.1  christos 
    338      1.1  christos @c -- A new node --------------------------------------------------------------
    339      1.1  christos @node    Steps Needed to Create a Profile
    340      1.1  christos @section Steps Needed to Create a Profile
    341      1.1  christos @c ----------------------------------------------------------------------------
    342      1.1  christos 
    343      1.1  christos Creating a profile takes two steps.  First the profile data needs to be
    344      1.1  christos generated.  This is followed by a viewing step to create a report from the
    345      1.1  christos information that has been gathered.
    346      1.1  christos 
    347      1.1  christos Every @ProductName{} command starts with @ToolName{}, the name of the driver.
    348  1.1.1.2  christos This is followed by a keyword to define the high level functionality.
    349  1.1.1.2  christos Depending on this keyword, a third qualifier may be needed to further narrow
    350  1.1.1.2  christos down the request.
    351  1.1.1.2  christos This combination is then followed by options that are specific to the
    352  1.1.1.2  christos functionality desired.
    353      1.1  christos 
    354      1.1  christos The command to gather, or ``collect'', the performance data is called
    355      1.1  christos @CollectApp{}.  Aside from numerous options, this command takes the name
    356      1.1  christos of the target executable as an input parameter.
    357      1.1  christos 
    358      1.1  christos Upon completion of the run, the performance data can be
    359      1.1  christos found in the newly created
    360      1.1  christos @cindex Experiment directory
    361      1.1  christos experiment directory.
    362      1.1  christos 
    363      1.1  christos Unless explicitly specified otherwise, a default
    364      1.1  christos name for this directory is chosen.  The name is @file{test.<n>.er} where
    365      1.1  christos @var{<n>} is the first integer number not in use yet for such a name.
    366      1.1  christos 
    367      1.1  christos For example, the first time @CollectApp{} is invoked, an experiment
    368      1.1  christos directory with the name @file{test.1.er} is created.
    369      1.1  christos Upon a subsequent invocation of @CollectApp{} in the same directory,
    370      1.1  christos an experiment directory with the name @file{test.2.er} will be created,
    371      1.1  christos and so forth.
    372      1.1  christos 
    373      1.1  christos Note that @CollectApp{} supports an option to explicitly name the experiment
    374      1.1  christos directory.
    375      1.1  christos Aside from the restriction that the name of this directory has to end
    376      1.1  christos with @samp{.er}, any valid directory name can be used for this.
    377      1.1  christos 
    378      1.1  christos Now that we have the performance data, the next step is to display it.
    379      1.1  christos 
    380      1.1  christos @IndexSubentry{@code{gprofng}, @code{display text}}
    381      1.1  christos The most commonly used command to view the performance information is
    382      1.1  christos @DisplayText{}.   This is a very extensive and customizable tool that
    383      1.1  christos produces the information in ASCII format.
    384      1.1  christos 
    385      1.1  christos @IndexSubentry{@code{gprofng}, @code{display html}}
    386      1.1  christos Another option is to use @DisplayHTML{}.  This tool generates a directory with
    387      1.1  christos files in html format.  These can be viewed in a browser, allowing for easy
    388      1.1  christos navigation through the profile data.
    389      1.1  christos 
    390      1.1  christos @c -- A new node --------------------------------------------------------------
    391      1.1  christos @node    A Mini Tutorial
    392      1.1  christos @chapter A Mini Tutorial
    393      1.1  christos @c ----------------------------------------------------------------------------
    394      1.1  christos 
    395      1.1  christos In this chapter we present and discuss the main functionality of @ToolName{}.
    396      1.1  christos This will be a practical approach, using an example code to generate profile
    397      1.1  christos data and show how to get various performance reports.
    398      1.1  christos 
    399      1.1  christos @menu
    400      1.1  christos * Getting Started::                 The basics of profiling with @ProductName().
    401      1.1  christos * Support for Multithreading::      Commands specific to multithreaded applications.
    402      1.1  christos * View Multiple Experiments::       Analyze multiple experiments simultaneously.
    403      1.1  christos * Profile Hardware Event Counters:: How to use hardware event counters.
    404      1.1  christos * Java Profiling::                  How to profile a Java application.
    405      1.1  christos @end menu
    406      1.1  christos 
    407      1.1  christos @c -- A new node --------------------------------------------------------------
    408      1.1  christos @node    Getting Started
    409      1.1  christos @section Getting Started
    410      1.1  christos @c ----------------------------------------------------------------------------
    411      1.1  christos 
    412      1.1  christos The information presented here provides a good and common basis for many
    413      1.1  christos profiling tasks, but there are more features that you may want to leverage.
    414      1.1  christos 
    415      1.1  christos These are covered in subsequent sections in this chapter.
    416      1.1  christos 
    417      1.1  christos @menu
    418      1.1  christos * The Example Program::                        A description of the example program used.
    419      1.1  christos * A First Profile::                            How to get the first profile.
    420      1.1  christos * The Source Code View::                       Display the metrics in the source code.
    421      1.1  christos * The Disassembly View::                       Display the metrics at the instruction level.
    422      1.1  christos * Display and Define the Metrics::             An example how to customize the metrics.
    423      1.1  christos * Customization of the Output::                An example how to customize the output.
    424      1.1  christos * Name the Experiment Directory::              Change the name of the experiment directory.
    425      1.1  christos * Control the Number of Lines in the Output::  Change the number of lines in the tables.
    426      1.1  christos * Sorting the Performance Data::               How to set the metric to sort by.
    427      1.1  christos * Scripting::                                  Use a script to execute the commands.
    428      1.1  christos * A More Elaborate Example::                   An example of customization.
    429      1.1  christos * The Call Tree::                              Display the dynamic call tree.
    430      1.1  christos * More Information on the Experiment::         How to get additional statistics.
    431      1.1  christos * Control the Sampling Frequency::             How to control the sampling granularity.
    432      1.1  christos * Information on Load Objects::                How to get more information on load objects.
    433      1.1  christos @end menu
    434      1.1  christos 
    435      1.1  christos @c -- A new node --------------------------------------------------------------
    436      1.1  christos @node       The Example Program
    437      1.1  christos @subsection The Example Program
    438      1.1  christos @c ----------------------------------------------------------------------------
    439      1.1  christos 
    440      1.1  christos Throughout this guide we use the same example C code that implements the
    441      1.1  christos multiplication of a vector of length @math{n} by an @math{m} by @math{n}
    442      1.1  christos matrix.  The result is stored in a vector of length @math{m}.
    443      1.1  christos @cindex Pthreads
    444      1.1  christos @cindex Posix Threads
    445      1.1  christos The algorithm has been parallelized using Posix Threads, or Pthreads for short.
    446      1.1  christos 
    447      1.1  christos The code was built using the @code{gcc} compiler and the name of the executable
    448      1.1  christos is
    449      1.1  christos @cindex mxv-pthreads
    450      1.1  christos @command{mxv-pthreads}.
    451      1.1  christos 
    452      1.1  christos The matrix sizes can be set through the @code{-m} and @code{-n} options.  The
    453  1.1.1.2  christos number of threads is set with the @code{-t} option.  These are additional
    454  1.1.1.2  christos threads that are used in the multiplication.  To increase the duration of
    455  1.1.1.2  christos the run, the computations are executed repeatedly.
    456      1.1  christos 
    457      1.1  christos This is an example that multiplies a @math{8000} by @math{4000} matrix with
    458      1.1  christos a vector of length @math{4000}.  Although this is a multithreaded application,
    459      1.1  christos initially we will be using @math{1} threads.  Later on we will show examples
    460      1.1  christos using multiple threads.
    461      1.1  christos 
    462      1.1  christos @smallexample
    463      1.1  christos @verbatim
    464      1.1  christos $ ./mxv-pthreads -m 8000 -n 4000 -t 1
    465      1.1  christos mxv: error check passed - rows = 8000 columns = 4000 threads = 1
    466      1.1  christos $
    467      1.1  christos @end verbatim
    468      1.1  christos @end smallexample
    469      1.1  christos 
    470      1.1  christos The program performs an internal check to verify that the computed results
    471      1.1  christos are correct.  The result of this check is printed, as well as the matrix
    472      1.1  christos sizes and the number of threads used.
    473      1.1  christos 
    474      1.1  christos @c -- A new node --------------------------------------------------------------
    475      1.1  christos @node       A First Profile
    476      1.1  christos @subsection A First Profile
    477      1.1  christos @c ----------------------------------------------------------------------------
    478      1.1  christos 
    479      1.1  christos The first step is to collect the performance data.  It is important to remember
    480      1.1  christos that much more information is gathered than may be shown by default.  Often a
    481      1.1  christos single data collection run is sufficient to get a lot of insight.
    482      1.1  christos 
    483      1.1  christos The @CollectApp{} command is used for the data collection.  Nothing needs to be
    484      1.1  christos changed in the way the application is executed.  The only difference is that it
    485      1.1  christos is now run under control of the tool, as shown below:
    486      1.1  christos 
    487      1.1  christos @cartouche
    488      1.1  christos @smallexample
    489      1.1  christos $ gprofng collect app ./mxv-pthreads -m 8000 -n 4000 -t 1
    490      1.1  christos @end smallexample
    491      1.1  christos @end cartouche
    492      1.1  christos 
    493      1.1  christos @noindent
    494      1.1  christos This produces the following output:
    495      1.1  christos 
    496      1.1  christos @smallexample
    497      1.1  christos @verbatim
    498      1.1  christos Creating experiment directory test.1.er (Process ID: 2749878) ...
    499      1.1  christos mxv: error check passed - rows = 8000 columns = 4000 threads = 1
    500      1.1  christos @end verbatim
    501      1.1  christos @end smallexample
    502      1.1  christos 
    503      1.1  christos We see a message that an experiment directory with the name @file{test.1.er}
    504  1.1.1.2  christos has been created.  The process id is also echoed.  The application completes
    505      1.1  christos as usual and we have our first experiment directory that can be analyzed.
    506      1.1  christos 
    507      1.1  christos The tool we use for this is called @DisplayText{}.  It takes the name of
    508      1.1  christos the experiment directory as an argument.
    509      1.1  christos 
    510      1.1  christos @cindex Interpreter mode
    511  1.1.1.2  christos If invoked this way, the tool starts in the interactive @emph{interpreter}
    512  1.1.1.2  christos mode.
    513  1.1.1.2  christos While in this environment, commands can be given and the tool responds.  This
    514  1.1.1.2  christos is illustrated below:
    515      1.1  christos 
    516      1.1  christos @smallexample
    517      1.1  christos @verbatim
    518      1.1  christos $ gprofng display text test.1.er
    519      1.1  christos Warning: History and command editing is not supported on this system.
    520  1.1.1.2  christos (gprofng-display-text) quit
    521      1.1  christos $
    522      1.1  christos @end verbatim
    523      1.1  christos @end smallexample
    524      1.1  christos 
    525      1.1  christos @cindex Command line mode
    526      1.1  christos While useful in certain cases, we prefer to use this tool in command line mode
    527      1.1  christos by specifying the commands to be issued when invoking the tool.  The way to do
    528      1.1  christos this is to prepend the command(s) with a hyphen (@samp{-}) if used on the
    529      1.1  christos command line.
    530      1.1  christos 
    531      1.1  christos Since this makes the command appear to be an option, they are also sometimes
    532      1.1  christos referred to as such, but technically they are commands.  This is the
    533      1.1  christos terminology we will use in this user guide, but for convenience the commands
    534      1.1  christos are also listed as options in the index.
    535      1.1  christos 
    536      1.1  christos For example,
    537      1.1  christos @IndexSubentry{Options,  @code{-functions}}
    538      1.1  christos @IndexSubentry{Commands, @code{functions}}
    539      1.1  christos below we use the @command{functions} command to request a list of the functions
    540      1.1  christos that have been executed, plus their respective CPU times:
    541      1.1  christos 
    542      1.1  christos @cartouche
    543      1.1  christos @smallexample
    544      1.1  christos $ gprofng display text -functions test.1.er
    545      1.1  christos @end smallexample
    546      1.1  christos @end cartouche
    547      1.1  christos 
    548      1.1  christos @smallexample
    549      1.1  christos @verbatim
    550      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
    551      1.1  christos 
    552      1.1  christos Excl. Total   Incl. Total    Name
    553      1.1  christos CPU           CPU
    554      1.1  christos  sec.      %   sec.      %
    555      1.1  christos 9.367 100.00  9.367 100.00   <Total>
    556      1.1  christos 8.926  95.30  8.926  95.30   mxv_core
    557      1.1  christos 0.210   2.24  0.420   4.49   init_data
    558      1.1  christos 0.080   0.85  0.210   2.24   drand48
    559      1.1  christos 0.070   0.75  0.130   1.39   erand48_r
    560      1.1  christos 0.060   0.64  0.060   0.64   __drand48_iterate
    561      1.1  christos 0.010   0.11  0.020   0.21   _int_malloc
    562      1.1  christos 0.010   0.11  0.010   0.11   sysmalloc
    563      1.1  christos 0.      0.    8.926  95.30   <static>@0x47960 (<libgp-collector.so>)
    564      1.1  christos 0.      0.    0.440   4.70   __libc_start_main
    565      1.1  christos 0.      0.    0.020   0.21   allocate_data
    566      1.1  christos 0.      0.    8.926  95.30   driver_mxv
    567      1.1  christos 0.      0.    0.440   4.70   main
    568      1.1  christos 0.      0.    0.020   0.21   malloc
    569      1.1  christos 0.      0.    8.926  95.30   start_thread
    570      1.1  christos @end verbatim
    571      1.1  christos @end smallexample
    572      1.1  christos 
    573  1.1.1.2  christos @noindent
    574  1.1.1.2  christos As easy and simple as these steps are, we do have a first profile of our
    575  1.1.1.2  christos program!
    576      1.1  christos 
    577      1.1  christos There are five columns.  The first four contain the
    578      1.1  christos @cindex Total CPU time
    579      1.1  christos ''Total CPU Time'', which
    580      1.1  christos is the sum of the user and system time. @xref{Inclusive and Exclusive Metrics}
    581      1.1  christos for an explanation of ``exclusive'' and ``inclusive'' times.
    582      1.1  christos 
    583      1.1  christos The first line echoes the metric that is used to sort the output.  By default,
    584  1.1.1.2  christos this is the exclusive CPU time, but through the @command{sort} command, the
    585  1.1.1.2  christos sort metric can be changed by the user.
    586      1.1  christos 
    587      1.1  christos Next, there are four columns with the exclusive and inclusive CPU times and the
    588      1.1  christos respective percentages.  This is followed by the name of the function.
    589      1.1  christos 
    590      1.1  christos @IndexSubentry{Miscellaneous, @code{<Total>}}
    591  1.1.1.2  christos The function with the name @code{<Total>} is not a user function.  It is a
    592      1.1  christos pseudo function introduced by @ToolName{}.  It is used to display the
    593      1.1  christos accumulated measured metric values.  In this example, we see that the total
    594      1.1  christos CPU time of this job was 9.367 seconds and it is scaled to 100%.  All
    595      1.1  christos other percentages in the same column are relative to this number.
    596      1.1  christos 
    597  1.1.1.2  christos @c -- If the metric is derived, for example the @code{IPC}, the value shown
    598  1.1.1.2  christos @c -- under
    599  1.1.1.2  christos @c -- @code{<Total>} is based upon the total values of the that are metrics
    600  1.1.1.2  christos @c -- used to
    601      1.1  christos @c -- compute the derived metric.
    602      1.1  christos @c -- @IndexSubentry{Hardware event counters, IPC}
    603      1.1  christos 
    604      1.1  christos With 8.926 seconds, function @code{mxv_core} takes 95.30% of the
    605  1.1.1.2  christos total time and is by far the most time consuming function.
    606  1.1.1.2  christos The exclusive and inclusive metrics are identical, which means that it
    607  1.1.1.2  christos is a leaf function not calling any other functions.
    608      1.1  christos 
    609      1.1  christos The next function in the list is @code{init_data}.  Although with 4.49%,
    610      1.1  christos the CPU time spent in this part is modest, this is an interesting entry because
    611      1.1  christos the inclusive CPU time of 0.420 seconds is twice the exclusive CPU time
    612      1.1  christos of 0.210 seconds.  Clearly this function is calling another function,
    613      1.1  christos or even more than one function and collectively this takes 0.210 seconds.
    614      1.1  christos Below we show the call tree feature that provides more details on the call
    615      1.1  christos structure of the application.
    616      1.1  christos 
    617      1.1  christos The function @code{<static>@@0x47960 (<libgp-collector.so>)} does odd and
    618      1.1  christos certainly not familiar.  It is one of the internal functions used by
    619      1.1  christos @CollectApp{} and can be ignored.  Also, while the inclusive time is high,
    620      1.1  christos the exclusive time is zero.  This means it doesn't contribute to the
    621      1.1  christos performance.
    622      1.1  christos 
    623  1.1.1.2  christos The question is how we know where this function originates from.  There are
    624      1.1  christos several commands to dig deeper an get more details on a function.
    625      1.1  christos @xref{Information on Load Objects}.
    626      1.1  christos 
    627      1.1  christos @c -- A new node --------------------------------------------------------------
    628      1.1  christos @node       The Source Code View
    629      1.1  christos @subsection The Source Code View
    630      1.1  christos @c ----------------------------------------------------------------------------
    631      1.1  christos 
    632      1.1  christos In general, the tuning efforts are best focused on the most time consuming
    633      1.1  christos part(s) of an application.  In this case that is easy, since over 95% of
    634      1.1  christos the total CPU time is spent in function @code{mxv_core}.
    635      1.1  christos It is now time to dig deeper and look
    636      1.1  christos @cindex Source level metrics
    637      1.1  christos at the metrics distribution at the source code level.  Since we measured
    638      1.1  christos CPU times, these are the metrics shown.
    639      1.1  christos 
    640      1.1  christos @IndexSubentry{Options,  @code{-source}}
    641      1.1  christos @IndexSubentry{Commands, @code{source}}
    642      1.1  christos The @code{source} command is used to accomplish this.  It takes the name of the
    643      1.1  christos function, not the source filename, as an argument.  This is demonstrated
    644      1.1  christos below, where the @DisplayText{} command is used to show the annotated
    645      1.1  christos source listing of function @code{mxv_core}.
    646      1.1  christos 
    647      1.1  christos Be aware that when using the @command{gcc} compiler, the source code has to
    648      1.1  christos be compiled with the @code{-g} option in order for the source code feature
    649      1.1  christos to work.  Otherwise the location(s) can not be determined.  For other compilers
    650      1.1  christos we recommend to check the documentation for such an option.
    651      1.1  christos 
    652      1.1  christos Below the command to display the source code of a function is shown.  Since at
    653      1.1  christos this point we are primarily interested in the timings only, we use the
    654      1.1  christos @code{metrics} command to request the exclusive and inclusive total CPU
    655      1.1  christos timings only. @xref{Display and Define the Metrics} for more information
    656      1.1  christos how to define the metrics to be displayed.
    657      1.1  christos 
    658      1.1  christos @cartouche
    659      1.1  christos @smallexample
    660      1.1  christos $ gprofng display text -metrics ei.totalcpu -source mxv_core test.1.er
    661      1.1  christos @end smallexample
    662      1.1  christos @end cartouche
    663      1.1  christos 
    664      1.1  christos The output is shown below.  It has been somewhat modified to fit the formatting
    665      1.1  christos constraints and reduce the number of lines.
    666      1.1  christos 
    667      1.1  christos @smallexample
    668      1.1  christos @verbatim
    669      1.1  christos Current metrics: e.totalcpu:i.totalcpu:name
    670      1.1  christos Current Sort Metric: Exclusive Total CPU Time ( e.totalcpu )
    671      1.1  christos Source file: <apath>/mxv.c
    672      1.1  christos Object file: mxv-pthreads (found as test.1.er/archives/...)
    673      1.1  christos Load Object: mxv-pthreads (found as test.1.er/archives/...)
    674      1.1  christos 
    675      1.1  christos    Excl.     Incl.
    676      1.1  christos    Total     Total
    677      1.1  christos    CPU sec.  CPU sec.
    678      1.1  christos 
    679      1.1  christos    <lines deleted>
    680      1.1  christos                                 <Function: mxv_core>
    681      1.1  christos                             43. void __attribute__ ((noinline)) 
    682      1.1  christos                                      mxv_core (int64_t row_index_start,
    683      1.1  christos                             44.                int64_t row_index_end,
    684      1.1  christos                             45.                int64_t m,
    685      1.1  christos                             46.                int64_t n,
    686      1.1  christos                             47.                double **restrict A,
    687      1.1  christos                             48.                double *restrict b,
    688      1.1  christos                             49.                double *restrict c)
    689      1.1  christos                             50. {
    690      1.1  christos    0.        0.             50. {
    691      1.1  christos    0.        0.             51.   for (int64_t i=row_index_start;
    692      1.1  christos                                                  i<=row_index_end; i++)
    693      1.1  christos                             52.     {
    694      1.1  christos    0.        0.             53.       double row_sum = 0.0;
    695      1.1  christos ## 4.613     4.613          54.       for (int64_t j=0; j<n; j++)
    696      1.1  christos ## 4.313     4.313          55.         row_sum += A[i][j] * b[j];
    697      1.1  christos    0.        0.             56.       c[i] = row_sum;
    698      1.1  christos                             57.     }
    699      1.1  christos    0.        0.             58. }
    700      1.1  christos @end verbatim
    701      1.1  christos @end smallexample
    702      1.1  christos 
    703      1.1  christos The first line echoes the metrics that have been selected.  The second line
    704      1.1  christos is not very meaningful when looking at the source code listing, but it shows
    705      1.1  christos the metric that is used to sort the data.
    706      1.1  christos 
    707      1.1  christos The next three lines provide information on the location of the source file,
    708      1.1  christos the object file and the load object (@xref{Load Objects and Functions}).
    709      1.1  christos 
    710      1.1  christos Function @code{mxv_core} is part of a source file that has other functions
    711      1.1  christos as well.  These functions will be shown with the values for the metrics, but
    712      1.1  christos for lay-out purposes they have been removed in the output shown above.
    713      1.1  christos 
    714      1.1  christos The header is followed by the annotated source code listing.  The selected
    715      1.1  christos metrics are shown first, followed by a source line number, and the source code.
    716      1.1  christos @IndexSubentry{Miscellaneous ,@code{##}}
    717      1.1  christos The most time consuming line(s) are marked with the @code{##} symbol.  In
    718      1.1  christos this way they are easier to identify and find with a search.
    719      1.1  christos 
    720      1.1  christos What we see is that all of the time is spent in lines 54-55.
    721      1.1  christos 
    722      1.1  christos @IndexSubentry{Options,  @code{-lines}}
    723      1.1  christos @IndexSubentry{Commands, @code{lines}}
    724      1.1  christos A related command sometimes comes handy as well.  It is called @code{lines}
    725      1.1  christos and displays a list of the source lines and their metrics, ordered according
    726      1.1  christos to the current sort metric (@xref{Sorting the Performance Data}).
    727      1.1  christos 
    728      1.1  christos Below the command and the output.  For lay-out reasons, only the top 10 is
    729      1.1  christos shown here and the last part of the text on some lines has been replaced
    730      1.1  christos by dots.  The full text is @samp{instructions without line numbers} and
    731      1.1  christos means that the line number information for that function was not found.
    732      1.1  christos 
    733      1.1  christos @cartouche
    734      1.1  christos @smallexample
    735      1.1  christos $ gprofng display text -lines test.1.er
    736      1.1  christos @end smallexample
    737      1.1  christos @end cartouche
    738      1.1  christos 
    739      1.1  christos @smallexample
    740      1.1  christos @verbatim
    741      1.1  christos Lines sorted by metric: Exclusive Total CPU Time
    742      1.1  christos 
    743      1.1  christos Excl. Total   Incl. Total    Name
    744      1.1  christos CPU           CPU
    745      1.1  christos  sec.      %   sec.      %
    746      1.1  christos 9.367 100.00  9.367 100.00   <Total>
    747      1.1  christos 4.613  49.25  4.613  49.25   mxv_core, line 54 in "mxv.c"
    748      1.1  christos 4.313  46.05  4.313  46.05   mxv_core, line 55 in "mxv.c"
    749      1.1  christos 0.160   1.71  0.370   3.95   init_data, line 118 in "manage_data.c"
    750      1.1  christos 0.080   0.85  0.210   2.24   <Function: drand48, instructions ...>
    751      1.1  christos 0.070   0.75  0.130   1.39   <Function: erand48_r, instructions ...>
    752      1.1  christos 0.060   0.64  0.060   0.64   <Function: __drand48_iterate, ...>
    753      1.1  christos 0.040   0.43  0.040   0.43   init_data, line 124 in "manage_data.c"
    754      1.1  christos 0.010   0.11  0.020   0.21   <Function: _int_malloc, instructions ...>
    755      1.1  christos 0.010   0.11  0.010   0.11   <Function: sysmalloc, instructions ...>
    756      1.1  christos @end verbatim
    757      1.1  christos @end smallexample
    758      1.1  christos 
    759      1.1  christos What this overview immediately highlights is that the third most time consuming
    760      1.1  christos source line takes 0.370 seconds only.  This means that the inclusive time is
    761      1.1  christos only 3.95% and clearly this branch of the code hardly impacts the performance.
    762      1.1  christos 
    763      1.1  christos @c -- A new node --------------------------------------------------------------
    764      1.1  christos @node       The Disassembly View
    765      1.1  christos @subsection The Disassembly View
    766      1.1  christos @c ----------------------------------------------------------------------------
    767      1.1  christos 
    768      1.1  christos The source view is very useful to obtain more insight where the time is spent,
    769      1.1  christos but sometimes this is not sufficient.  The disassembly view provides more
    770      1.1  christos details since it shows the metrics at the instruction level.
    771      1.1  christos 
    772      1.1  christos This view is displayed with the
    773      1.1  christos @IndexSubentry{Options,  @code{-disasm}}
    774      1.1  christos @IndexSubentry{Commands, @code{disasm}}
    775      1.1  christos @command{disasm}
    776      1.1  christos command and as with the source view, it displays an annotated listing.  In this
    777      1.1  christos @cindex Instruction level metrics
    778      1.1  christos case it shows the instructions with the metrics, interleaved with the
    779      1.1  christos source lines.  The
    780      1.1  christos instructions have a reference in square brackets (@code{[} and @code{]})
    781      1.1  christos to the source line they correspond to.
    782      1.1  christos 
    783      1.1  christos @noindent
    784      1.1  christos We again focus on the tmings only and set the metrics accordingly:
    785      1.1  christos 
    786      1.1  christos @cartouche
    787      1.1  christos @smallexample
    788      1.1  christos $ gprofng display text -metrics ei.totalcpu -disasm mxv_core test.1.er
    789      1.1  christos @end smallexample
    790      1.1  christos @end cartouche
    791      1.1  christos 
    792      1.1  christos @smallexample
    793      1.1  christos @verbatim
    794      1.1  christos Current metrics: e.totalcpu:i.totalcpu:name
    795      1.1  christos Current Sort Metric: Exclusive Total CPU Time ( e.totalcpu )
    796      1.1  christos Source file: <apath>/src/mxv.c
    797      1.1  christos Object file: mxv-pthreads (found as test.1.er/archives/...)
    798      1.1  christos Load Object: mxv-pthreads (found as test.1.er/archives/...)
    799      1.1  christos 
    800      1.1  christos    Excl.     Incl.
    801      1.1  christos    Total     Total
    802      1.1  christos    CPU sec.  CPU sec.
    803      1.1  christos 
    804      1.1  christos    <lines deleted>
    805      1.1  christos                       43. void __attribute__ ((noinline)) 
    806      1.1  christos                                mxv_core (int64_t row_index_start,
    807      1.1  christos                       44.                int64_t row_index_end,
    808      1.1  christos                       45.                int64_t m,
    809      1.1  christos                       46.                int64_t n,
    810      1.1  christos                       47.                double **restrict A,
    811      1.1  christos                       48.                double *restrict b,
    812      1.1  christos                       49.                double *restrict c)
    813      1.1  christos                       50. {
    814      1.1  christos                           <Function: mxv_core>
    815      1.1  christos    0.        0.           [50]   401d56:  mov    0x8(%rsp),%r10
    816      1.1  christos                       51.   for (int64_t i=row_index_start;
    817      1.1  christos                                            i<=row_index_end; i++)
    818      1.1  christos    0.        0.           [51]   401d5b:  cmp    %rsi,%rdi
    819      1.1  christos    0.        0.           [51]   401d5e:  jg     0x47
    820      1.1  christos    0.        0.           [51]   401d60:  add    $0x1,%rsi
    821      1.1  christos    0.        0.           [51]   401d64:  jmp    0x36
    822      1.1  christos                       52.     {
    823      1.1  christos                       53.       double row_sum = 0.0;
    824      1.1  christos                       54.       for (int64_t j=0; j<n; j++)
    825      1.1  christos                       55          row_sum += A[i][j] * b[j];
    826      1.1  christos    0.        0.           [55]   401d66:  mov    (%r8,%rdi,8),%rdx
    827      1.1  christos    0.        0.           [54]   401d6a:  mov    $0x0,%eax
    828      1.1  christos    0.        0.           [53]   401d6f:  pxor   %xmm1,%xmm1
    829      1.1  christos    0.110     0.110        [55]   401d73:  movsd  (%rdx,%rax,8),%xmm0
    830      1.1  christos    1.921     1.921        [55]   401d78:  mulsd  (%r9,%rax,8),%xmm0
    831      1.1  christos    2.282     2.282        [55]   401d7e:  addsd  %xmm0,%xmm1
    832      1.1  christos ## 4.613     4.613        [54]   401d82:  add    $0x1,%rax
    833      1.1  christos    0.        0.           [54]   401d86:  cmp    %rax,%rcx
    834      1.1  christos    0.        0.           [54]   401d89:  jne    0xffffffffffffffea
    835      1.1  christos                       56.       c[i] = row_sum;
    836      1.1  christos    0.        0.           [56]   401d8b:  movsd  %xmm1,(%r10,%rdi,8)
    837      1.1  christos    0.        0.           [51]   401d91:  add    $0x1,%rdi
    838      1.1  christos    0.        0.           [51]   401d95:  cmp    %rsi,%rdi
    839      1.1  christos    0.        0.           [51]   401d98:  je     0xd
    840      1.1  christos    0.        0.           [53]   401d9a:  pxor   %xmm1,%xmm1
    841      1.1  christos    0.        0.           [54]   401d9e:  test   %rcx,%rcx
    842      1.1  christos    0.        0.           [54]   401da1:  jg     0xffffffffffffffc5
    843      1.1  christos    0.        0.           [54]   401da3:  jmp    0xffffffffffffffe8
    844      1.1  christos                       57.     }
    845      1.1  christos                       58. }
    846      1.1  christos    0.        0.           [58]   401da5:  ret
    847      1.1  christos @end verbatim
    848      1.1  christos @end smallexample
    849      1.1  christos 
    850  1.1.1.2  christos For each instruction, the timing values are given and we can immediately
    851      1.1  christos identify the most expensive instructions.  As with the source level view,
    852      1.1  christos these are marked with the @code{##} symbol.
    853      1.1  christos 
    854      1.1  christos It comes as no surprise that the time consuming instructions originate from
    855      1.1  christos the source code at lines 54-55.
    856      1.1  christos One thing to note is that the source line numbers no longer appear in
    857  1.1.1.2  christos sequential order.
    858      1.1  christos This is because the compiler has re-ordered the instructions as part of
    859  1.1.1.2  christos the code optimizations it has performed.
    860      1.1  christos 
    861      1.1  christos As illustrated below and similar to the @command{lines} command, we can get
    862      1.1  christos an overview of the instructions executed by using the
    863      1.1  christos @IndexSubentry{Options,  @code{-pcs}}
    864      1.1  christos @IndexSubentry{Commands, @code{pcs}}
    865      1.1  christos @command{pcs}
    866      1.1  christos command.
    867      1.1  christos 
    868      1.1  christos @noindent
    869      1.1  christos Below the command and the output, which again has been restricted
    870      1.1  christos to 10 lines.  As before, some lines have been shortened for lay-out
    871      1.1  christos purposes.
    872      1.1  christos 
    873      1.1  christos @cartouche
    874      1.1  christos @smallexample
    875      1.1  christos $ gprofng display text -pcs test.1.er
    876      1.1  christos @end smallexample
    877      1.1  christos @end cartouche
    878      1.1  christos 
    879      1.1  christos @smallexample
    880      1.1  christos @verbatim
    881      1.1  christos PCs sorted by metric: Exclusive Total CPU Time
    882      1.1  christos 
    883      1.1  christos Excl. Total   Incl. Total    Name
    884      1.1  christos CPU           CPU
    885      1.1  christos  sec.      %   sec.      %
    886      1.1  christos 9.367 100.00  9.367 100.00   <Total>
    887      1.1  christos 4.613  49.25  4.613  49.25   mxv_core + 0x0000002C, line 54 in "mxv.c"
    888      1.1  christos 2.282  24.36  2.282  24.36   mxv_core + 0x00000028, line 55 in "mxv.c"
    889      1.1  christos 1.921  20.51  1.921  20.51   mxv_core + 0x00000022, line 55 in "mxv.c"
    890      1.1  christos 0.150   1.60  0.150   1.60   init_data + 0x000000AC, line 118 in ...
    891      1.1  christos 0.110   1.18  0.110   1.18   mxv_core + 0x0000001D, line 55 in "mxv.c"
    892      1.1  christos 0.040   0.43  0.040   0.43   drand48 + 0x00000022
    893      1.1  christos 0.040   0.43  0.040   0.43   init_data + 0x000000F1, line 124 in ...
    894      1.1  christos 0.030   0.32  0.030   0.32   __drand48_iterate + 0x0000001E
    895      1.1  christos 0.020   0.21  0.020   0.21   __drand48_iterate + 0x00000038
    896      1.1  christos @end verbatim
    897      1.1  christos @end smallexample
    898      1.1  christos 
    899      1.1  christos @noindent
    900      1.1  christos What we see is that the top three instructions take 94% of the total CPU time
    901      1.1  christos and any optimizations should focus on this part of the code..
    902      1.1  christos 
    903      1.1  christos @c -- A new node --------------------------------------------------------------
    904      1.1  christos @node       Display and Define the Metrics
    905      1.1  christos @subsection Display and Define the Metrics
    906      1.1  christos @c ----------------------------------------------------------------------------
    907      1.1  christos 
    908      1.1  christos The metrics shown by @DisplayText{} are useful, but there is more recorded
    909      1.1  christos than displayed by default.  We can customize the values shown by defining the
    910      1.1  christos metrics ourselves.
    911      1.1  christos 
    912      1.1  christos There are two commands related to changing the metrics shown:
    913      1.1  christos @IndexSubentry{Options,  @code{-metric_list}}
    914      1.1  christos @IndexSubentry{Commands, @code{metric_list}}
    915      1.1  christos @command{metric_list} and
    916      1.1  christos @IndexSubentry{Options,  @code{-metrics}}
    917      1.1  christos @IndexSubentry{Commands, @code{metrics}}
    918      1.1  christos @command{metrics}.
    919      1.1  christos 
    920      1.1  christos The first command shows the currently selected metrics, plus all the metrics
    921      1.1  christos that have been stored as part of the experiment.  The second command may be
    922      1.1  christos used to define the metric list.
    923      1.1  christos 
    924      1.1  christos @noindent
    925      1.1  christos This is the way to get the information about the metrics:
    926      1.1  christos 
    927      1.1  christos @IndexSubentry{Options,  @code{-metric_list}}
    928      1.1  christos @IndexSubentry{Commands, @code{metric_list}}
    929      1.1  christos @cartouche
    930      1.1  christos @smallexample
    931      1.1  christos $ gprofng display text -metric_list test.1.er
    932      1.1  christos @end smallexample
    933      1.1  christos @end cartouche
    934      1.1  christos 
    935      1.1  christos @noindent
    936      1.1  christos This is the output:
    937      1.1  christos 
    938      1.1  christos @smallexample
    939      1.1  christos @verbatim
    940      1.1  christos Current metrics: e.%totalcpu:i.%totalcpu:name
    941      1.1  christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu )
    942      1.1  christos Available metrics:
    943      1.1  christos Exclusive Total CPU Time: e.%totalcpu
    944      1.1  christos Inclusive Total CPU Time: i.%totalcpu
    945      1.1  christos                     Size: size
    946      1.1  christos               PC Address: address
    947      1.1  christos                     Name: name
    948      1.1  christos @end verbatim
    949      1.1  christos @end smallexample
    950      1.1  christos 
    951      1.1  christos This shows the metrics that are currently used, the metric that is used to sort
    952      1.1  christos the data and all the metrics that have been recorded, but are not necessarily
    953      1.1  christos shown.
    954      1.1  christos 
    955      1.1  christos @cindex Default metrics
    956      1.1  christos In this case, the current metrics are set to the exclusive and inclusive
    957      1.1  christos total CPU times, the respective percentages, and the name of the function,
    958      1.1  christos or load object.
    959      1.1  christos 
    960      1.1  christos @IndexSubentry{Options,  @code{-metrics}}
    961      1.1  christos @IndexSubentry{Commands, @code{metrics}}
    962      1.1  christos The @code{metrics} command is used to define the metrics that need to be
    963      1.1  christos displayed.
    964      1.1  christos 
    965      1.1  christos For example, to swap the exclusive and inclusive metrics, use the following
    966      1.1  christos metric definition: @code{i.%totalcpu:e.%totalcpu}.
    967      1.1  christos 
    968      1.1  christos Since the metrics can be tailored for different views, there is also a way
    969      1.1  christos to reset them to the default.  This is done through the special keyword
    970      1.1  christos @code{default} for the metrics definition (@command{-metrics default}).
    971      1.1  christos @IndexSubentry{Metrics, Reset to default}
    972      1.1  christos 
    973      1.1  christos @c -- A new node --------------------------------------------------------------
    974      1.1  christos @node       Customization of the Output
    975      1.1  christos @subsection Customization of the Output
    976      1.1  christos @c ----------------------------------------------------------------------------
    977      1.1  christos 
    978      1.1  christos With the information just given, the function overview can be customized.
    979      1.1  christos For sake of the example, we would like to display the name of the function
    980      1.1  christos first, only followed by the exclusive CPU time, given as an absolute number
    981      1.1  christos and a percentage.
    982      1.1  christos 
    983      1.1  christos Note that the commands are parsed in order of appearance.  This is why we
    984      1.1  christos need to define the metrics @emph{before} requesting the function overview:
    985      1.1  christos 
    986      1.1  christos @cartouche
    987      1.1  christos @smallexample
    988      1.1  christos $ gprofng display text -metrics name:e.%totalcpu -functions test.1.er
    989      1.1  christos @end smallexample
    990      1.1  christos @end cartouche
    991      1.1  christos 
    992      1.1  christos @smallexample
    993      1.1  christos @verbatim
    994      1.1  christos Current metrics: name:e.%totalcpu
    995      1.1  christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu )
    996      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
    997      1.1  christos 
    998      1.1  christos Name                                     Excl. Total
    999      1.1  christos                                          CPU
   1000      1.1  christos                                           sec.      %
   1001      1.1  christos  <Total>                                 9.367 100.00
   1002      1.1  christos  mxv_core                                8.926  95.30
   1003      1.1  christos  init_data                               0.210   2.24
   1004      1.1  christos  drand48                                 0.080   0.85
   1005      1.1  christos  erand48_r                               0.070   0.75
   1006      1.1  christos  __drand48_iterate                       0.060   0.64
   1007      1.1  christos  _int_malloc                             0.010   0.11
   1008      1.1  christos  sysmalloc                               0.010   0.11
   1009      1.1  christos  <static>@0x47960 (<libgp-collector.so>) 0.      0.
   1010      1.1  christos  __libc_start_main                       0.      0.
   1011      1.1  christos  allocate_data                           0.      0.
   1012      1.1  christos  driver_mxv                              0.      0.
   1013      1.1  christos  main                                    0.      0.
   1014      1.1  christos  malloc                                  0.      0.
   1015      1.1  christos  start_thread                            0.      0.
   1016      1.1  christos @end verbatim
   1017      1.1  christos @end smallexample
   1018      1.1  christos 
   1019      1.1  christos This was a first and simple example how to customize the output.  Note that we
   1020      1.1  christos did not rerun our profiling job and merely modified the display settings.
   1021      1.1  christos Below we will show other and also more advanced examples of customization.
   1022      1.1  christos 
   1023      1.1  christos @c -- A new node --------------------------------------------------------------
   1024      1.1  christos @node       Name the Experiment Directory
   1025      1.1  christos @subsection Name the Experiment Directory
   1026      1.1  christos @c ----------------------------------------------------------------------------
   1027      1.1  christos 
   1028      1.1  christos When using @CollectApp{}, the default names for experiments work fine, but
   1029      1.1  christos they are quite generic.  It is often more convenient to select a more
   1030      1.1  christos descriptive name.  For example, one that reflects conditions for the experiment
   1031      1.1  christos conducted, like the number of threads used.
   1032      1.1  christos 
   1033      1.1  christos For this, the mutually exclusive @code{-o} and @code{-O} options come in handy.
   1034      1.1  christos Both may be used to provide a name for the experiment directory, but the
   1035      1.1  christos behaviour of @CollectApp{} is different.
   1036      1.1  christos 
   1037      1.1  christos With the
   1038      1.1  christos @IndexSubentry{Options, @code{-o}}
   1039      1.1  christos @samp{-o}
   1040      1.1  christos option, an existing experiment directory is not overwritten.  Any directory
   1041      1.1  christos with the same name either needs to be renamed, moved, or removed, before the
   1042      1.1  christos experiment can be conducted.
   1043      1.1  christos 
   1044      1.1  christos This is in contrast with the behaviour for the
   1045      1.1  christos @IndexSubentry{Options, @code{-O}}
   1046      1.1  christos @samp{-O}
   1047      1.1  christos option.  Any existing directory with the same name is silently overwritten.
   1048      1.1  christos 
   1049      1.1  christos Be aware that the name of the experiment directory has to end with @file{.er}.
   1050      1.1  christos 
   1051      1.1  christos @c -- A new node --------------------------------------------------------------
   1052      1.1  christos @node       Control the Number of Lines in the Output
   1053      1.1  christos @subsection Control the Number of Lines in the Output
   1054      1.1  christos @c ----------------------------------------------------------------------------
   1055      1.1  christos 
   1056      1.1  christos @IndexSubentry{Options,  @code{-limit}}
   1057      1.1  christos @IndexSubentry{Commands, @code{limit}}
   1058  1.1.1.2  christos The @command{limit} @var{<n>} command can be used to control the number of
   1059  1.1.1.2  christos lines printed in various views.  For example it impacts the function view, but
   1060  1.1.1.2  christos also takes effect for other display commands, like @command{lines}.
   1061      1.1  christos 
   1062      1.1  christos The argument @var{<n>} should be a positive integer number.  It sets the number
   1063      1.1  christos of lines in the (function) view.  A value of zero resets the limit to the
   1064      1.1  christos default.
   1065      1.1  christos 
   1066      1.1  christos Be aware that the pseudo-function @code{<Total>} counts as a regular function.
   1067      1.1  christos For example @command{limit 10} displays nine user level functions.
   1068      1.1  christos 
   1069      1.1  christos @c -- A new node --------------------------------------------------------------
   1070      1.1  christos @node       Sorting the Performance Data
   1071      1.1  christos @subsection Sorting the Performance Data
   1072      1.1  christos @c ----------------------------------------------------------------------------
   1073      1.1  christos 
   1074      1.1  christos @IndexSubentry{Options,  @code{-sort}}
   1075      1.1  christos @IndexSubentry{Commands, @code{sort}}
   1076      1.1  christos The @command{sort} @var{<key>} command sets the key to be used when sorting the
   1077      1.1  christos performance data.
   1078      1.1  christos 
   1079      1.1  christos The key is a valid metric definition, but the
   1080      1.1  christos @IndexSubentry{Metrics, Visibility field}
   1081      1.1  christos visibility field
   1082      1.1  christos (@xref{Metric Definitions})
   1083      1.1  christos in the metric
   1084      1.1  christos definition is ignored, since this does not affect the outcome of the sorting
   1085      1.1  christos operation.
   1086      1.1  christos For example if the sort key is set to @code{e.totalcpu}, the values
   1087      1.1  christos will be sorted in descending order with respect to the exclusive total
   1088      1.1  christos CPU time.
   1089      1.1  christos 
   1090      1.1  christos @IndexSubentry{Sort, Reverse order}
   1091      1.1  christos The data can be sorted in reverse order by prepending the metric definition
   1092      1.1  christos with a minus (@samp{-}) sign.  For example @command{sort -e.totalcpu}.
   1093      1.1  christos 
   1094      1.1  christos @IndexSubentry{Sort, Reset to default}
   1095      1.1  christos A default metric for the sort operation has been defined and since this is
   1096      1.1  christos a persistent command, this default can be restored with @code{default} as
   1097      1.1  christos the key (@command{sort default}).
   1098      1.1  christos 
   1099      1.1  christos @c -- A new node --------------------------------------------------------------
   1100      1.1  christos @node       Scripting
   1101      1.1  christos @subsection Scripting
   1102      1.1  christos @c ----------------------------------------------------------------------------
   1103      1.1  christos 
   1104      1.1  christos @cindex Script files
   1105      1.1  christos The list with commands for @DisplayText{} can be very long.  This is tedious
   1106      1.1  christos and also error prone.  Luckily, there is an easier and elegant way to control
   1107      1.1  christos the output of this tool.
   1108      1.1  christos 
   1109      1.1  christos @IndexSubentry{Options,  @code{-script}}
   1110      1.1  christos @IndexSubentry{Commands, @code{script}}
   1111      1.1  christos Through the @command{script} command, the name of a file with commands can be
   1112      1.1  christos passed in.  These commands are parsed and executed as if they appeared on
   1113      1.1  christos the command line in the same order as encountered in the file.  The commands
   1114      1.1  christos in this script file can actually be mixed with commands on the command line
   1115      1.1  christos and multiple script files may be used.
   1116      1.1  christos The difference between the commands in the script file and those used on the
   1117      1.1  christos command line is that the latter require a leading dash (@samp{-}) symbol.
   1118      1.1  christos 
   1119      1.1  christos Comment lines in a script file are supported.  They need to start with the
   1120      1.1  christos @samp{#} symbol.
   1121      1.1  christos 
   1122      1.1  christos @c -- A new node --------------------------------------------------------------
   1123      1.1  christos @node       A More Elaborate Example
   1124      1.1  christos @subsection A More Elaborate Example
   1125      1.1  christos @c ----------------------------------------------------------------------------
   1126      1.1  christos 
   1127      1.1  christos With the information presented so far, we can customize our data
   1128      1.1  christos gathering and display commands.
   1129      1.1  christos 
   1130      1.1  christos As an example, we would like to use @file{mxv.1.thr.er} as the name for the
   1131      1.1  christos experiment directory.  In this way, the name of the algorithm and the
   1132      1.1  christos number of threads that were used are included in the name.
   1133      1.1  christos We also don't mind to overwrite an existing
   1134      1.1  christos experiment directory with the same name.
   1135      1.1  christos 
   1136      1.1  christos All that needs to be done is to use the
   1137      1.1  christos @IndexSubentry{Options, @code{-O}}
   1138      1.1  christos @samp{-O}
   1139      1.1  christos option, followed by the directory name of choice when running @CollectApp{}:
   1140      1.1  christos 
   1141      1.1  christos @cartouche
   1142      1.1  christos @smallexample
   1143      1.1  christos $ exe=mxv-pthreads
   1144      1.1  christos $ m=8000
   1145      1.1  christos $ n=4000
   1146      1.1  christos $ gprofng collect app -O mxv.1.thr.er ./$exe -m $m -n $n -t 1
   1147      1.1  christos @end smallexample
   1148      1.1  christos @end cartouche
   1149      1.1  christos 
   1150      1.1  christos Since we want to customize the profile and prefer to keep the command line
   1151      1.1  christos short, the commands to generate the profile are put into a file with the
   1152      1.1  christos name @file{my-script}:
   1153      1.1  christos 
   1154      1.1  christos @smallexample
   1155      1.1  christos @verbatim
   1156      1.1  christos $ cat my-script
   1157      1.1  christos # This is my first gprofng script
   1158      1.1  christos # Set the metrics
   1159      1.1  christos metrics i.%totalcpu:e.%totalcpu:name
   1160      1.1  christos # Use the exclusive time to sort
   1161      1.1  christos sort e.totalcpu
   1162      1.1  christos # Limit the function list to 5 lines
   1163      1.1  christos limit 5
   1164      1.1  christos # Show the function list
   1165      1.1  christos functions
   1166      1.1  christos @end verbatim
   1167      1.1  christos @end smallexample
   1168      1.1  christos 
   1169      1.1  christos This script file is specified as input to the @DisplayText{} command
   1170      1.1  christos that is used to display the performance information stored in experiment
   1171      1.1  christos directory @file{mxv.1.thr.er}:
   1172      1.1  christos 
   1173      1.1  christos @cartouche
   1174      1.1  christos @smallexample
   1175      1.1  christos $ gprofng display text -script my-script mxv.1.thr.er
   1176      1.1  christos @end smallexample
   1177      1.1  christos @end cartouche
   1178      1.1  christos 
   1179      1.1  christos This command produces the following output:
   1180      1.1  christos 
   1181      1.1  christos @smallexample
   1182      1.1  christos @verbatim
   1183      1.1  christos # This is my first gprofng script
   1184      1.1  christos # Set the metrics
   1185      1.1  christos Current metrics: i.%totalcpu:e.%totalcpu:name
   1186      1.1  christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu )
   1187      1.1  christos # Use the exclusive time to sort
   1188      1.1  christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu )
   1189      1.1  christos # Limit the function list to 5 lines
   1190      1.1  christos Print limit set to 5
   1191      1.1  christos # Show the function list
   1192      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   1193      1.1  christos 
   1194      1.1  christos Incl. Total   Excl. Total    Name
   1195      1.1  christos CPU           CPU
   1196      1.1  christos  sec.      %   sec.      %
   1197      1.1  christos 9.703 100.00  9.703 100.00   <Total>
   1198      1.1  christos 9.226  95.09  9.226  95.09   mxv_core
   1199      1.1  christos 0.455   4.69  0.210   2.17   init_data
   1200      1.1  christos 0.169   1.75  0.123   1.26   erand48_r
   1201      1.1  christos 0.244   2.52  0.075   0.77   drand48
   1202      1.1  christos @end verbatim
   1203      1.1  christos @end smallexample
   1204      1.1  christos 
   1205      1.1  christos In the first part of the output the comment lines in the script file are
   1206  1.1.1.2  christos echoed.  These are interleaved with an acknowledgement message for the
   1207  1.1.1.2  christos commands.
   1208      1.1  christos 
   1209      1.1  christos This is followed by a profile consisting of 5 lines only.  For both metrics,
   1210  1.1.1.2  christos the percentages plus the timings are given.  The numbers are sorted with
   1211  1.1.1.2  christos respect to the exclusive total CPU time.  Although this is the default, for
   1212      1.1  christos demonstration purposes we use the @command{sort} command to explicitly define
   1213      1.1  christos the metric for the sort.
   1214      1.1  christos 
   1215      1.1  christos While we executed the same job as before and only changed the name of the
   1216      1.1  christos experiment directory, the results are somewhat different.  This is sampling
   1217      1.1  christos in action.  The numbers are not all that different though.
   1218      1.1  christos It is seen that function @code{mxv_core} is responsbile for
   1219      1.1  christos 95% of the CPU time and @code{init_data} takes 4.5% only.
   1220      1.1  christos 
   1221      1.1  christos @c -- A new node --------------------------------------------------------------
   1222      1.1  christos @node       The Call Tree
   1223      1.1  christos @subsection The Call Tree
   1224      1.1  christos @c ----------------------------------------------------------------------------
   1225      1.1  christos 
   1226      1.1  christos The call tree shows the dynamic structure of the application by displaying the
   1227      1.1  christos functions executed and their parent.  The CPU time attributed to each function
   1228  1.1.1.2  christos is shown as well.  This view helps to find the most expensive
   1229      1.1  christos execution path in the program.
   1230      1.1  christos 
   1231      1.1  christos @IndexSubentry{Options,  @code{-calltree}}
   1232      1.1  christos @IndexSubentry{Commands, @code{calltree}}
   1233      1.1  christos This feature is enabled through the @command{calltree} command.  For example,
   1234      1.1  christos this is how to get the call tree for our current experiment:
   1235      1.1  christos 
   1236      1.1  christos @cartouche
   1237      1.1  christos @smallexample
   1238      1.1  christos $ gprofng display text -calltree mxv.1.thr.er
   1239      1.1  christos @end smallexample
   1240      1.1  christos @end cartouche
   1241      1.1  christos 
   1242      1.1  christos This displays the following structure:
   1243      1.1  christos 
   1244      1.1  christos @smallexample
   1245      1.1  christos @verbatim
   1246      1.1  christos Functions Call Tree. Metric: Attributed Total CPU Time
   1247      1.1  christos 
   1248      1.1  christos Attr. Total    Name
   1249      1.1  christos CPU
   1250      1.1  christos  sec.      %
   1251      1.1  christos 9.703 100.00   +-<Total>
   1252      1.1  christos 9.226  95.09     +-start_thread
   1253      1.1  christos 9.226  95.09     |  +-<static>@0x47960 (<libgp-collector.so>)
   1254      1.1  christos 9.226  95.09     |    +-driver_mxv
   1255      1.1  christos 9.226  95.09     |      +-mxv_core
   1256      1.1  christos 0.477   4.91     +-__libc_start_main
   1257      1.1  christos 0.477   4.91       +-main
   1258      1.1  christos 0.455   4.69         +-init_data
   1259      1.1  christos 0.244   2.52         |  +-drand48
   1260      1.1  christos 0.169   1.75         |    +-erand48_r
   1261      1.1  christos 0.047   0.48         |      +-__drand48_iterate
   1262      1.1  christos 0.021   0.22         +-allocate_data
   1263      1.1  christos 0.021   0.22         |  +-malloc
   1264      1.1  christos 0.021   0.22         |    +-_int_malloc
   1265      1.1  christos 0.006   0.06         |      +-sysmalloc
   1266      1.1  christos 0.003   0.03         |        +-__default_morecore
   1267      1.1  christos 0.003   0.03         |          +-sbrk
   1268      1.1  christos 0.003   0.03         |            +-brk
   1269      1.1  christos 0.001   0.01         +-pthread_create
   1270      1.1  christos 0.001   0.01           +-__pthread_create_2_1
   1271      1.1  christos @end verbatim
   1272      1.1  christos @end smallexample
   1273      1.1  christos 
   1274      1.1  christos At first sight this may not be what is expected and some explanation is in
   1275      1.1  christos place.
   1276      1.1  christos 
   1277      1.1  christos @c ----------------------------------------------------------------------------
   1278      1.1  christos @c TBD: Revise this text when we have user and machine mode.
   1279      1.1  christos @c ----------------------------------------------------------------------------
   1280      1.1  christos The top function is the pseudo-function @code{<Total>} that we have seen
   1281      1.1  christos before.  It is introduced and shown here to provide the total value of the
   1282      1.1  christos metric(s).
   1283      1.1  christos 
   1284      1.1  christos We also see function @code{<static>@@0x47960} in the call tree and apparently
   1285      1.1  christos it is from @code{libgp-collector.so}, a library that is internal to
   1286      1.1  christos @ToolName{}.
   1287      1.1  christos The @code{<static>} marker, followed by the program counter, is shown if the
   1288      1.1  christos name of the function cannot be found.  This function is part of the
   1289      1.1  christos implementation of the data collection process and should be hidden to the
   1290      1.1  christos user.  This is part of a planned future enhancement.
   1291      1.1  christos 
   1292      1.1  christos In general, if a view has a function that does not appear to be part of the
   1293      1.1  christos user code, or seems odd anyhow, the @command{objects} and @command{fsingle}
   1294      1.1  christos @IndexSubentry{Options,  @code{-objects}}
   1295      1.1  christos @IndexSubentry{Commands, @code{objects}}
   1296      1.1  christos @IndexSubentry{Options,  @code{-fsingle}}
   1297      1.1  christos @IndexSubentry{Commands, @code{fsingle}}
   1298      1.1  christos commands are very useful
   1299      1.1  christos to find out more about load objects in general, but also to help identify
   1300      1.1  christos an unknown entry in the function overview. @xref{Load Objects and Functions}.
   1301      1.1  christos 
   1302      1.1  christos Another thing to note is that there are two main branches.  The one under
   1303      1.1  christos @code{<static>@@0x47960} and the second one under @code{__libc_start_main}.
   1304      1.1  christos This reflects the fact that this is a multithreaded program and the
   1305      1.1  christos threaded part shows up as a separate branch in the call tree.
   1306      1.1  christos 
   1307      1.1  christos The way to interpret this structure is as follows.  The program starts
   1308      1.1  christos under control of @code{__libc_start_main}.  This executes the main program
   1309      1.1  christos called @code{main}, which at the top level executes functions
   1310      1.1  christos @code{init_data}, @code{allocate_data}, and @code{pthread_create}.
   1311      1.1  christos The latter function creates and executes the additional thread(s).
   1312      1.1  christos 
   1313      1.1  christos For this multithreaded part of the code, we need to look at the branch
   1314      1.1  christos under function @code{start_thread} that calls the driver code for the
   1315      1.1  christos matrix-vector multiplication (@code{driver_mxv}), which executes the function
   1316      1.1  christos that performs the actual multiplication (@code{mxv_core}).
   1317      1.1  christos 
   1318      1.1  christos There are two things worth noting for the call tree feature:
   1319      1.1  christos 
   1320      1.1  christos @itemize
   1321      1.1  christos 
   1322      1.1  christos @item
   1323      1.1  christos This is a dynamic tree and since sampling is used, it most likely looks
   1324      1.1  christos slighlty different across seemingly identical profile runs.  In case the
   1325      1.1  christos run times are short, it is worth considering to use a high resolution
   1326      1.1  christos through the
   1327      1.1  christos @IndexSubentry{Options, @code{-p}}
   1328      1.1  christos @samp{-p}
   1329      1.1  christos option.  For example use @samp{-p hi} to increase the sampling rate.
   1330      1.1  christos 
   1331      1.1  christos @item
   1332      1.1  christos In case hardware event counters have been enabled
   1333      1.1  christos (@xref{Profile Hardware Event Counters}), these values are also displayed
   1334      1.1  christos in the call tree view.
   1335      1.1  christos 
   1336      1.1  christos @end itemize
   1337      1.1  christos 
   1338      1.1  christos @c -- A new node --------------------------------------------------------------
   1339      1.1  christos @node       More Information on the Experiment
   1340      1.1  christos @subsection More Information on the Experiment
   1341      1.1  christos @c ----------------------------------------------------------------------------
   1342      1.1  christos 
   1343      1.1  christos The experiment directory not only contains performance related data.  Several
   1344      1.1  christos system characteristics, the profiling command executed, plus some global
   1345      1.1  christos performance statistics are stored and can be displayed.
   1346      1.1  christos 
   1347      1.1  christos @IndexSubentry{Options,  @code{-header}}
   1348      1.1  christos @IndexSubentry{Commands, @code{header}}
   1349      1.1  christos The @command{header} command displays information about the experiment(s).
   1350      1.1  christos For example, this is command is used to extract this data from for our
   1351      1.1  christos experiment directory:
   1352      1.1  christos 
   1353      1.1  christos @cartouche
   1354      1.1  christos @smallexample
   1355      1.1  christos $ gprofng display text -header mxv.1.thr.er
   1356      1.1  christos @end smallexample
   1357      1.1  christos @end cartouche
   1358      1.1  christos 
   1359      1.1  christos The above command prints the following information.  Note that some of the
   1360      1.1  christos lay-out and the information has been modified.  Directory paths have been
   1361      1.1  christos replaced @code{<apath>} for example.  Textual changes are
   1362      1.1  christos marked with the @samp{<} and @samp{>} symbols.
   1363      1.1  christos 
   1364      1.1  christos @smallexample
   1365      1.1  christos @verbatim
   1366      1.1  christos Experiment: mxv.1.thr.er
   1367      1.1  christos No errors
   1368      1.1  christos No warnings
   1369  1.1.1.2  christos Archive command ` /usr/bin/gprofng-archive -n -a on --outfile
   1370      1.1  christos                      <apath>/archive.log <apath>/mxv.1.thr.er'
   1371      1.1  christos 
   1372      1.1  christos Target command (64-bit): './mxv-pthreads -m 8000 -n 4000 -t 1'
   1373      1.1  christos Process pid 2750071, ppid 2750069, pgrp 2749860, sid 2742080
   1374      1.1  christos Current working directory: <apath>
   1375      1.1  christos Collector version: `2.40.00'; experiment version 12.4 (64-bit)
   1376      1.1  christos Host `<the-host-name>', OS `Linux <version>', page size 4096,
   1377      1.1  christos                      architecture `x86_64'
   1378      1.1  christos   4 CPUs, clock speed 2294 MHz.
   1379      1.1  christos   Memory: 3506491 pages @  4096 = 13697 MB.
   1380      1.1  christos Data collection parameters:
   1381      1.1  christos   Clock-profiling, interval = 997 microsecs.
   1382      1.1  christos   Periodic sampling, 1 secs.
   1383      1.1  christos   Follow descendant processes from: fork|exec|combo
   1384      1.1  christos 
   1385      1.1  christos Experiment started <date and time>
   1386      1.1  christos 
   1387      1.1  christos Experiment Ended: 9.801216173
   1388      1.1  christos Data Collection Duration: 9.801216173
   1389      1.1  christos @end verbatim
   1390      1.1  christos @end smallexample
   1391      1.1  christos 
   1392      1.1  christos The output above may assist in troubleshooting, or to verify some of the
   1393      1.1  christos operational conditions and we recommend to include this command when
   1394      1.1  christos generating a profile.
   1395      1.1  christos 
   1396      1.1  christos @IndexSubentry{Options, @code{-C}}
   1397      1.1  christos Related to this command there is a useful option to record comment(s) in
   1398      1.1  christos an experiment.
   1399      1.1  christos To this end, use the @samp{-C} option on the @CollectApp{} tool to
   1400      1.1  christos specify a comment string.  Up to ten comment lines can be included.
   1401      1.1  christos These comments are displayed with the @command{header} command on
   1402      1.1  christos the @DisplayText{} tool.
   1403      1.1  christos 
   1404      1.1  christos @IndexSubentry{Options,  @code{-overview}}
   1405      1.1  christos @IndexSubentry{Commands, @code{overview}}
   1406      1.1  christos The @command{overview} command displays information on the experiment(s) and
   1407      1.1  christos also shows a summary of the values for the metric(s) used.  This is an example
   1408      1.1  christos how to use it on the newly created experiment directory:
   1409      1.1  christos 
   1410      1.1  christos @cartouche
   1411      1.1  christos @smallexample
   1412      1.1  christos $ gprofng display text -overview mxv.1.thr.er
   1413      1.1  christos @end smallexample
   1414      1.1  christos @end cartouche
   1415      1.1  christos 
   1416      1.1  christos @smallexample
   1417      1.1  christos @verbatim
   1418      1.1  christos Experiment(s):
   1419      1.1  christos 
   1420      1.1  christos Experiment      :mxv.1.thr.er
   1421      1.1  christos   Target        : './mxv-pthreads -m 8000 -n 4000 -t 1'
   1422      1.1  christos   Host          : <hostname> (<ISA>, Linux <version>)
   1423      1.1  christos   Start Time    : <date and time>
   1424      1.1  christos   Duration      : 9.801 Seconds
   1425      1.1  christos 
   1426      1.1  christos Metrics:
   1427      1.1  christos 
   1428      1.1  christos   Experiment Duration (Seconds): [9.801]
   1429      1.1  christos   Clock Profiling
   1430      1.1  christos     [X]Total CPU Time - totalcpu (Seconds): [*9.703]
   1431      1.1  christos 
   1432      1.1  christos Notes: '*' indicates hot metrics, '[X]' indicates currently enabled
   1433      1.1  christos        metrics.
   1434      1.1  christos        The metrics command can be used to change selections. The
   1435      1.1  christos        metric_list command lists all available metrics.
   1436      1.1  christos @end verbatim
   1437      1.1  christos @end smallexample
   1438      1.1  christos 
   1439      1.1  christos This command provides a dashboard overview that helps to easily identify
   1440      1.1  christos where the time is spent and in case hardware event counters are used, it
   1441      1.1  christos shows their total values.
   1442      1.1  christos 
   1443      1.1  christos @c -- A new node --------------------------------------------------------------
   1444      1.1  christos @node       Control the Sampling Frequency
   1445      1.1  christos @subsection Control the Sampling Frequency
   1446      1.1  christos @c ----------------------------------------------------------------------------
   1447      1.1  christos 
   1448      1.1  christos @cindex Sampling frequency
   1449      1.1  christos So far we did not go into details on the frequency of the sampling process,
   1450      1.1  christos but in some cases it is useful to change the default of 10 milliseconds.
   1451      1.1  christos 
   1452      1.1  christos The advantage of increasing the sampling frequency is that functions that
   1453      1.1  christos do not take much time per invocation are more accurately captured.  The
   1454      1.1  christos downside is that more data is gathered.  This has an impact on the overhead
   1455      1.1  christos of the collection process and more disk space is required.
   1456      1.1  christos 
   1457      1.1  christos In general this is not an immediate concern, but with heavily threaded
   1458      1.1  christos applications that run for an extended period of time, increasing the
   1459      1.1  christos frequency may have a more noticeable impact.
   1460      1.1  christos 
   1461      1.1  christos @IndexSubentry{Options, @code{-p}}
   1462      1.1  christos The @code{-p} option on the @CollectApp{} tool is used to enable or disable
   1463      1.1  christos clock based profiling, or to explicitly set the sampling rate.
   1464      1.1  christos @cindex Sampling interval
   1465      1.1  christos This option takes one of the following keywords:
   1466      1.1  christos 
   1467      1.1  christos @table @code
   1468      1.1  christos 
   1469      1.1  christos @item off
   1470      1.1  christos Disable clock based profiling.
   1471      1.1  christos 
   1472      1.1  christos @item on
   1473      1.1  christos Enable clock based profiling with a per thread sampling interval of 10 ms.
   1474      1.1  christos This is the default.
   1475      1.1  christos 
   1476      1.1  christos @item lo
   1477      1.1  christos Enable clock based profiling with a per thread sampling interval of 100 ms.
   1478      1.1  christos 
   1479      1.1  christos @item hi
   1480      1.1  christos Enable clock based profiling with a per thread sampling interval of 1 ms.
   1481      1.1  christos 
   1482      1.1  christos @item @var{value}
   1483      1.1  christos @cindex Sampling interval
   1484      1.1  christos Enable clock based profiling with a per thread sampling interval of
   1485      1.1  christos @var{value}.
   1486      1.1  christos 
   1487      1.1  christos @end table
   1488      1.1  christos 
   1489      1.1  christos It may seem unnecessary to have an option to disable clock based profiling,
   1490      1.1  christos but there is a good reason to support this.
   1491      1.1  christos By default, clock profiling  is enabled when conducting hardware event counter
   1492      1.1  christos experiments (@xref{Profile Hardware Event Counters}).
   1493      1.1  christos With the @code{-p off} option, this can be disabled.
   1494      1.1  christos 
   1495  1.1.1.2  christos If an explicit value is set for the sampling, the number can be an integer or
   1496  1.1.1.2  christos a floating-point number.
   1497  1.1.1.2  christos A suffix of @samp{u} for microseconds, or @samp{m} for milliseconds is
   1498  1.1.1.2  christos supported.  If no suffix is used, the value is assumed to be in milliseconds.
   1499      1.1  christos For example, the following command sets the sampling rate to
   1500      1.1  christos 5123.4 microseconds:
   1501      1.1  christos 
   1502      1.1  christos @cartouche
   1503      1.1  christos @smallexample
   1504      1.1  christos $ gprofng collect app -p 5123.4u ./mxv-pthreads -m 8000 -n 4000 -t 1
   1505      1.1  christos @end smallexample
   1506      1.1  christos @end cartouche
   1507      1.1  christos 
   1508  1.1.1.2  christos If the value is smaller than the clock profiling minimum, a warning message
   1509  1.1.1.2  christos is issued and it is set to the minimum.
   1510  1.1.1.2  christos In case it is not a multiple of the clock profiling resolution, it is
   1511  1.1.1.2  christos silently rounded down to the nearest multiple of the clock resolution.
   1512  1.1.1.2  christos If the value exceeds the clock profiling maximum, is negative, or zero, an
   1513  1.1.1.2  christos error is reported.
   1514      1.1  christos 
   1515      1.1  christos @IndexSubentry{Options,  @code{-header}}
   1516      1.1  christos @IndexSubentry{Commands, @code{header}}
   1517      1.1  christos @noindent
   1518      1.1  christos Note that the @code{header} command echoes the sampling rate used.
   1519      1.1  christos 
   1520      1.1  christos @c -- A new node --------------------------------------------------------------
   1521      1.1  christos @node       Information on Load Objects
   1522      1.1  christos @subsection Information on Load Objects
   1523      1.1  christos @c ----------------------------------------------------------------------------
   1524      1.1  christos 
   1525      1.1  christos It may happen that the function view shows a function that is not known to
   1526      1.1  christos the user.  This can easily happen with library functions for example.
   1527      1.1  christos Luckily there are three commands that come in handy then.
   1528      1.1  christos 
   1529      1.1  christos @IndexSubentry{Options,  @code{-objects}}
   1530      1.1  christos @IndexSubentry{Commands, @code{objects}}
   1531      1.1  christos @IndexSubentry{Options,  @code{-fsingle}}
   1532      1.1  christos @IndexSubentry{Commands, @code{fsingle}}
   1533      1.1  christos @IndexSubentry{Options,  @code{-fsummary}}
   1534      1.1  christos @IndexSubentry{Commands, @code{fsummary}}
   1535  1.1.1.2  christos These commands are @command{objects}, @command{fsingle}, and
   1536  1.1.1.2  christos @command{fsummary}.  They provide details on
   1537      1.1  christos @cindex Load objects
   1538      1.1  christos load objects (@xref{Load Objects and Functions}).
   1539      1.1  christos 
   1540      1.1  christos The @command{objects} command lists all load objects that have been referenced
   1541      1.1  christos during the performance experiment.
   1542      1.1  christos Below we show the command and the result for our profile job.  Like before,
   1543      1.1  christos some path names in the output have been shortened and replaced by the
   1544      1.1  christos @IndexSubentry{Miscellaneous, @code{<apath>}}
   1545      1.1  christos @code{<apath>} symbol that represents an absolute directory path.
   1546      1.1  christos 
   1547      1.1  christos @cartouche
   1548      1.1  christos @smallexample
   1549      1.1  christos $ gprofng display text -objects mxv.1.thr.er
   1550      1.1  christos @end smallexample
   1551      1.1  christos @end cartouche
   1552      1.1  christos 
   1553      1.1  christos The output includes the name and path of the target executable:
   1554      1.1  christos 
   1555      1.1  christos @smallexample
   1556      1.1  christos @verbatim
   1557      1.1  christos <Unknown> (<Unknown>)
   1558      1.1  christos <mxv-pthreads> (<apath>/mxv-pthreads)
   1559      1.1  christos <libdl-2.28.so> (/usr/lib64/libdl-2.28.so)
   1560      1.1  christos <librt-2.28.so> (/usr/lib64/librt-2.28.so)
   1561      1.1  christos <libc-2.28.so> (/usr/lib64/libc-2.28.so)
   1562      1.1  christos <libpthread-2.28.so> (/usr/lib64/libpthread-2.28.so)
   1563      1.1  christos <libm-2.28.so> (/usr/lib64/libm-2.28.so)
   1564      1.1  christos <libgp-collector.so> (/usr/lib64/gprofng/libgp-collector.so)
   1565      1.1  christos <ld-2.28.so> (/usr/lib64/ld-2.28.so)
   1566      1.1  christos <DYNAMIC_FUNCTIONS> (DYNAMIC_FUNCTIONS)
   1567      1.1  christos @end verbatim
   1568      1.1  christos @end smallexample
   1569      1.1  christos 
   1570      1.1  christos @IndexSubentry{Options,  @code{-fsingle}}
   1571      1.1  christos @IndexSubentry{Commands, @code{fsingle}}
   1572  1.1.1.2  christos The @command{fsingle} command may be used to get more details on a specific
   1573  1.1.1.2  christos entry in the function view, say.  For example, the command below provides
   1574  1.1.1.2  christos additional information on the @code{pthread_create} function shown in the
   1575  1.1.1.2  christos function overview.
   1576      1.1  christos 
   1577      1.1  christos @cartouche
   1578      1.1  christos @smallexample
   1579      1.1  christos $ gprofng display text -fsingle pthread_create mxv.1.thr.er
   1580      1.1  christos @end smallexample
   1581      1.1  christos @end cartouche
   1582      1.1  christos 
   1583  1.1.1.2  christos Below the output from this command.  It has been somewhat modified to match
   1584  1.1.1.2  christos the display requirements.
   1585      1.1  christos 
   1586      1.1  christos @smallexample
   1587      1.1  christos @verbatim
   1588      1.1  christos + gprofng display text -fsingle pthread_create mxv.1.thr.er
   1589      1.1  christos pthread_create
   1590      1.1  christos         Exclusive Total CPU Time: 0.    (  0. %)
   1591      1.1  christos         Inclusive Total CPU Time: 0.001 (  0.0%)
   1592      1.1  christos                             Size:   258
   1593      1.1  christos                       PC Address: 8:0x00049f60
   1594      1.1  christos                      Source File: (unknown)
   1595      1.1  christos                      Object File: (unknown)
   1596      1.1  christos                      Load Object: /usr/lib64/gprofng/libgp-collector.so
   1597      1.1  christos                     Mangled Name:
   1598      1.1  christos                          Aliases:
   1599      1.1  christos @end verbatim
   1600      1.1  christos @end smallexample
   1601      1.1  christos 
   1602      1.1  christos In this table we not only see how much time was spent in this function, we
   1603      1.1  christos also see where it originates from.  In addition to this, the size and start
   1604      1.1  christos address are given as well.  If the source code location is known it is also
   1605      1.1  christos shown here.
   1606      1.1  christos 
   1607      1.1  christos @IndexSubentry{Options,  @code{-fsummary}}
   1608      1.1  christos @IndexSubentry{Commands, @code{fsummary}}
   1609      1.1  christos The related @code{fsummary} command displays the same information as
   1610      1.1  christos @code{fsingle}, but for all functions in the function overview,
   1611      1.1  christos including @code{<Total>}:
   1612      1.1  christos 
   1613      1.1  christos @cartouche
   1614      1.1  christos @smallexample
   1615      1.1  christos $ gprofng display text -fsummary mxv.1.thr.er
   1616      1.1  christos @end smallexample
   1617      1.1  christos @end cartouche
   1618      1.1  christos 
   1619      1.1  christos @smallexample
   1620      1.1  christos @verbatim
   1621      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   1622      1.1  christos 
   1623      1.1  christos <Total>
   1624  1.1.1.2  christos        Exclusive Total CPU Time: 9.703 (100.0%)
   1625  1.1.1.2  christos        Inclusive Total CPU Time: 9.703 (100.0%)
   1626  1.1.1.2  christos                            Size:     0
   1627  1.1.1.2  christos                      PC Address: 1:0x00000000
   1628  1.1.1.2  christos                     Source File: (unknown)
   1629  1.1.1.2  christos                     Object File: (unknown)
   1630  1.1.1.2  christos                     Load Object: <Total>
   1631  1.1.1.2  christos                    Mangled Name:
   1632  1.1.1.2  christos                         Aliases:
   1633      1.1  christos 
   1634      1.1  christos mxv_core
   1635  1.1.1.2  christos        Exclusive Total CPU Time: 9.226 ( 95.1%)
   1636  1.1.1.2  christos        Inclusive Total CPU Time: 9.226 ( 95.1%)
   1637  1.1.1.2  christos                            Size:    80
   1638  1.1.1.2  christos                      PC Address: 2:0x00001d56
   1639  1.1.1.2  christos                     Source File: <apath>/src/mxv.c
   1640  1.1.1.2  christos                     Object File: mxv.1.thr.er/archives/mxv-pthreads_ss_pf53V__5
   1641  1.1.1.2  christos                     Load Object: <apath>/mxv-pthreads
   1642  1.1.1.2  christos                    Mangled Name:
   1643  1.1.1.2  christos                         Aliases:
   1644      1.1  christos 
   1645      1.1  christos           ... etc ...
   1646      1.1  christos @end verbatim
   1647      1.1  christos @end smallexample
   1648      1.1  christos 
   1649      1.1  christos @c -- A new node --------------------------------------------------------------
   1650      1.1  christos @node    Support for Multithreading
   1651      1.1  christos @section Support for Multithreading
   1652      1.1  christos @c ----------------------------------------------------------------------------
   1653      1.1  christos 
   1654      1.1  christos In this chapter the support for multithreading is introduced and discussed.
   1655      1.1  christos As is shown below, nothing needs to be changed when collecting the performance
   1656      1.1  christos data.
   1657      1.1  christos 
   1658      1.1  christos The difference is that additional commands are available to get more
   1659      1.1  christos information on the multithreading details, plus that several filters allow
   1660      1.1  christos the user to zoom in on specific threads.
   1661      1.1  christos 
   1662      1.1  christos @c -- A new node --------------------------------------------------------------
   1663      1.1  christos @node       Creating a Multithreading Experiment
   1664      1.1  christos @subsection Creating a Multithreading Experiment
   1665      1.1  christos @c ----------------------------------------------------------------------------
   1666      1.1  christos 
   1667      1.1  christos We demonstrate the support for multithreading using the same code and settings
   1668      1.1  christos as before, but this time 2 threads are used:
   1669      1.1  christos 
   1670      1.1  christos @cartouche
   1671      1.1  christos @smallexample
   1672      1.1  christos $ exe=mxv-pthreads
   1673      1.1  christos $ m=8000
   1674      1.1  christos $ n=4000
   1675      1.1  christos $ gprofng collect app -O mxv.2.thr.er ./$exe -m $m -n $n -t 2
   1676      1.1  christos @end smallexample
   1677      1.1  christos @end cartouche
   1678      1.1  christos 
   1679      1.1  christos First of all, in as far as @ProductName{} is concerned, no changes are needed.
   1680  1.1.1.2  christos Nothing special is needed to profile a multithreaded job when using
   1681  1.1.1.2  christos @ToolName{}.
   1682      1.1  christos 
   1683      1.1  christos The same is true when displaying the performance results.  The same commands
   1684  1.1.1.2  christos that were used before work unmodified.  For example, this is all that is
   1685  1.1.1.2  christos needed to get a function overview:
   1686      1.1  christos 
   1687      1.1  christos @cartouche
   1688      1.1  christos @smallexample
   1689      1.1  christos $ gprofng display text -limit 5 -functions mxv.2.thr.er
   1690      1.1  christos @end smallexample
   1691      1.1  christos @end cartouche
   1692      1.1  christos 
   1693  1.1.1.2  christos @noindent
   1694      1.1  christos This produces the following familiar looking output:
   1695      1.1  christos 
   1696      1.1  christos @smallexample
   1697      1.1  christos @verbatim
   1698      1.1  christos Print limit set to 5
   1699      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   1700      1.1  christos 
   1701      1.1  christos Excl. Total   Incl. Total    Name
   1702      1.1  christos CPU           CPU
   1703      1.1  christos  sec.      %   sec.      %
   1704      1.1  christos 9.464 100.00  9.464 100.00   <Total>
   1705      1.1  christos 8.961  94.69  8.961  94.69   mxv_core
   1706      1.1  christos 0.224   2.37  0.469   4.95   init_data
   1707      1.1  christos 0.105   1.11  0.177   1.88   erand48_r
   1708      1.1  christos 0.073   0.77  0.073   0.77   __drand48_iterate
   1709      1.1  christos @end verbatim
   1710      1.1  christos @end smallexample
   1711      1.1  christos 
   1712      1.1  christos @c -- A new node --------------------------------------------------------------
   1713      1.1  christos @node       Commands Specific to Multithreading
   1714      1.1  christos @subsection Commands Specific to Multithreading
   1715      1.1  christos @c ----------------------------------------------------------------------------
   1716      1.1  christos 
   1717      1.1  christos The function overview shown above shows the results aggregated over all the
   1718      1.1  christos threads.  The interesting new element is that we can also look at the
   1719      1.1  christos performance data for the individual threads.
   1720      1.1  christos 
   1721      1.1  christos @IndexSubentry{Options,  @code{-thread_list}}
   1722      1.1  christos @IndexSubentry{Commands, @code{thread_list}}
   1723      1.1  christos The @command{thread_list} command displays how many threads have been used:
   1724      1.1  christos 
   1725      1.1  christos @cartouche
   1726      1.1  christos @smallexample
   1727      1.1  christos $ gprofng display text -thread_list mxv.2.thr.er
   1728      1.1  christos @end smallexample
   1729      1.1  christos @end cartouche
   1730      1.1  christos 
   1731      1.1  christos This produces the following output, showing that three threads have
   1732      1.1  christos been used:
   1733      1.1  christos 
   1734      1.1  christos @smallexample
   1735      1.1  christos @verbatim
   1736      1.1  christos Exp Sel Total
   1737      1.1  christos === === =====
   1738      1.1  christos   1 all     3
   1739      1.1  christos @end verbatim
   1740      1.1  christos @end smallexample
   1741      1.1  christos 
   1742      1.1  christos The output confirms there is one experiment and that by default all
   1743      1.1  christos threads are selected.
   1744      1.1  christos 
   1745      1.1  christos It may seem surprising to see three threads here, since we used the
   1746      1.1  christos @code{-t 2} option, but it is common for a Pthreads program to use one
   1747      1.1  christos additional thread.
   1748      1.1  christos Typically, there is one main thread that runs from start to finish.
   1749      1.1  christos It handles the sequential portions of the code, as well as thread
   1750      1.1  christos management related tasks.
   1751      1.1  christos It is no different in the example code.  At some point, the main thread
   1752      1.1  christos creates and activates the two threads that perform the multiplication
   1753      1.1  christos of the matrix with the vector.  Upon completion of this computation,
   1754      1.1  christos the main thread continues.
   1755      1.1  christos 
   1756      1.1  christos @IndexSubentry{Options,  @code{-threads}}
   1757      1.1  christos @IndexSubentry{Commands, @code{threads}}
   1758      1.1  christos The @command{threads} command is simple, yet very powerful.  It shows the
   1759      1.1  christos total value of the metrics for each thread.
   1760      1.1  christos 
   1761      1.1  christos @cartouche
   1762      1.1  christos @smallexample
   1763      1.1  christos $ gprofng display text -threads mxv.2.thr.er
   1764      1.1  christos @end smallexample
   1765      1.1  christos @end cartouche
   1766      1.1  christos 
   1767      1.1  christos @noindent
   1768      1.1  christos The command above produces the following overview:
   1769      1.1  christos 
   1770      1.1  christos @smallexample
   1771      1.1  christos @verbatim
   1772      1.1  christos Objects sorted by metric: Exclusive Total CPU Time
   1773      1.1  christos 
   1774      1.1  christos Excl. Total    Name
   1775      1.1  christos CPU
   1776      1.1  christos  sec.      %
   1777      1.1  christos 9.464 100.00   <Total>
   1778      1.1  christos 4.547  48.05   Process 1, Thread 3
   1779      1.1  christos 4.414  46.64   Process 1, Thread 2
   1780      1.1  christos 0.502   5.31   Process 1, Thread 1
   1781      1.1  christos @end verbatim
   1782      1.1  christos @end smallexample
   1783      1.1  christos 
   1784      1.1  christos The first line gives the total CPU time accumulated over the threads
   1785      1.1  christos selected.  This is followed by the metric value(s) for each thread.
   1786      1.1  christos 
   1787      1.1  christos From this it is clear that the main thread is responsible for a
   1788      1.1  christos little over 5% of the total CPU time, while the other two threads
   1789      1.1  christos take 47-48% each.
   1790      1.1  christos 
   1791      1.1  christos This view is ideally suited to verify if there are any load balancing
   1792      1.1  christos issues and also to find the most time consuming thread(s).
   1793      1.1  christos 
   1794      1.1  christos @IndexSubentry{Filters, Thread selection}
   1795      1.1  christos While useful, often more information than this is needed.  This is
   1796      1.1  christos @IndexSubentry{Options,  @code{-thread_select}}
   1797      1.1  christos @IndexSubentry{Commands, @code{thread_select}}
   1798      1.1  christos where the thread selection filter comes in.  Through the
   1799      1.1  christos @command{thread_select}
   1800      1.1  christos command, one or more threads may be selected.
   1801      1.1  christos @xref{The Selection List} how to define the selection list.
   1802      1.1  christos 
   1803      1.1  christos Since it is most common to use this command in a script, we do so as
   1804      1.1  christos well here.  Below the script we are using:
   1805      1.1  christos 
   1806      1.1  christos @cartouche
   1807      1.1  christos @smallexample
   1808      1.1  christos # Define the metrics
   1809      1.1  christos metrics e.%totalcpu
   1810      1.1  christos # Limit the output to 5 lines
   1811      1.1  christos limit 5
   1812      1.1  christos # Get the function overview for thread 1
   1813      1.1  christos thread_select 1
   1814      1.1  christos functions
   1815      1.1  christos # Get the function overview for thread 2
   1816      1.1  christos thread_select 2
   1817      1.1  christos functions
   1818      1.1  christos # Get the function overview for thread 3
   1819      1.1  christos thread_select 3
   1820      1.1  christos functions
   1821      1.1  christos @end smallexample
   1822      1.1  christos @end cartouche
   1823      1.1  christos 
   1824      1.1  christos The definition of the metrics and the output limit have been shown and
   1825      1.1  christos explained earlier.  The new command to focus on is @command{thread_select}.
   1826      1.1  christos 
   1827      1.1  christos This command takes a list (@xref{The Selection List}) to select specific
   1828      1.1  christos threads.  In this case, the individual thread numbers that were
   1829      1.1  christos obtained earlier with the @command{thread_list} command are selected.
   1830      1.1  christos 
   1831      1.1  christos This restricts the output of the @command{functions} command to the thread
   1832      1.1  christos number(s) specified.  This means that the script above shows which
   1833      1.1  christos function(s) each thread executes and how much CPU time they consumed.
   1834      1.1  christos Both the exclusive timings and their percentages are given.
   1835      1.1  christos 
   1836  1.1.1.2  christos Note that technically this command is a filter and persistent.  The
   1837      1.1  christos selection remains active until changed through another thread selection
   1838      1.1  christos command, or when it is reset with the @samp{all} selection list.
   1839      1.1  christos 
   1840      1.1  christos @noindent
   1841      1.1  christos This is the relevant part of the output for the first thread:
   1842      1.1  christos 
   1843      1.1  christos @smallexample
   1844      1.1  christos @verbatim
   1845      1.1  christos Exp Sel Total
   1846      1.1  christos === === =====
   1847      1.1  christos   1 1       3
   1848      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   1849      1.1  christos 
   1850      1.1  christos Excl. Total    Name
   1851      1.1  christos CPU
   1852      1.1  christos  sec.      %
   1853      1.1  christos 0.502 100.00   <Total>
   1854      1.1  christos 0.224  44.64   init_data
   1855      1.1  christos 0.105  20.83   erand48_r
   1856      1.1  christos 0.073  14.48   __drand48_iterate
   1857      1.1  christos 0.067  13.29   drand48
   1858      1.1  christos @end verbatim
   1859      1.1  christos @end smallexample
   1860      1.1  christos 
   1861      1.1  christos As usual, the comment lines are echoed.  This is followed by a confirmation
   1862      1.1  christos of the selection.  The first table shows that one experiment is loaded and
   1863      1.1  christos that thread 1 out of the three threads has been selected.  What is
   1864      1.1  christos displayed next is the function overview for this particular thread.  Due to
   1865      1.1  christos the @code{limit 5} command, there are only five functions in this list.
   1866      1.1  christos 
   1867      1.1  christos Clearly, this thread handles the data initialization part and as we know
   1868      1.1  christos from the call tree output, function @code{init_data} executes the 3 other
   1869      1.1  christos functions shown in this profile.
   1870      1.1  christos 
   1871      1.1  christos Below are the overviews for threads 2 and 3 respectively.  It is seen that all
   1872      1.1  christos of the CPU time is spent in function @code{mxv_core} and that this time
   1873      1.1  christos is approximately the same for both threads.
   1874      1.1  christos 
   1875      1.1  christos @smallexample
   1876      1.1  christos @verbatim
   1877      1.1  christos # Get the function overview for thread 2
   1878      1.1  christos Exp Sel Total
   1879      1.1  christos === === =====
   1880      1.1  christos   1 2       3
   1881      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   1882      1.1  christos 
   1883      1.1  christos Excl. Total    Name
   1884      1.1  christos CPU
   1885      1.1  christos  sec.      %
   1886      1.1  christos 4.414 100.00   <Total>
   1887      1.1  christos 4.414 100.00   mxv_core
   1888      1.1  christos 0.      0.     <static>@0x48630 (<libgp-collector.so>)
   1889      1.1  christos 0.      0.     driver_mxv
   1890      1.1  christos 0.      0.     start_thread
   1891      1.1  christos 
   1892      1.1  christos # Get the function overview for thread 3
   1893      1.1  christos Exp Sel Total
   1894      1.1  christos === === =====
   1895      1.1  christos   1 3       3
   1896      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   1897      1.1  christos 
   1898      1.1  christos Excl. Total    Name
   1899      1.1  christos CPU
   1900      1.1  christos  sec.      %
   1901      1.1  christos 4.547 100.00   <Total>
   1902      1.1  christos 4.547 100.00   mxv_core
   1903      1.1  christos 0.      0.     <static>@0x48630 (<libgp-collector.so>)
   1904      1.1  christos 0.      0.     driver_mxv
   1905      1.1  christos 0.      0.     start_thread
   1906      1.1  christos @end verbatim
   1907      1.1  christos @end smallexample
   1908      1.1  christos 
   1909      1.1  christos When analyzing the performance of a multithreaded application, it is sometimes
   1910      1.1  christos useful to know whether threads have mostly executed on the same core, say, or
   1911  1.1.1.2  christos if they have wandered across multiple cores.  This sort of stickiness is
   1912  1.1.1.2  christos usually referred to as
   1913      1.1  christos @cindex Thread affinity
   1914      1.1  christos @emph{thread affinity}.
   1915      1.1  christos 
   1916      1.1  christos Similar to the commands for the threads, there are several commands related
   1917      1.1  christos to the usage of the cores, or @emph{CPUs} as they are called in @ToolName{}
   1918      1.1  christos (@xref{The Concept of a CPU in gprofng}).
   1919      1.1  christos 
   1920      1.1  christos @IndexSubentry{Options,  @code{-cpu_list}}
   1921      1.1  christos @IndexSubentry{Commands, @code{cpu_list}}
   1922      1.1  christos Similar to the @command{thread_list} command, the @command{cpu_list} command
   1923      1.1  christos displays how many CPUs have been used.
   1924      1.1  christos @IndexSubentry{Options,  @code{-cpus}}
   1925      1.1  christos @IndexSubentry{Commands, @code{cpus}}
   1926      1.1  christos The equivalent of the @command{threads} threads command, is the @command{cpus}
   1927  1.1.1.2  christos command, which shows the numbers of the CPUs that were used and the metric
   1928  1.1.1.2  christos values for each one of them.  Both commands are demonstrated below.
   1929      1.1  christos 
   1930      1.1  christos @cartouche
   1931      1.1  christos @smallexample
   1932      1.1  christos $ gprofng display text -cpu_list -cpus mxv.2.thr.er
   1933      1.1  christos @end smallexample
   1934      1.1  christos @end cartouche
   1935      1.1  christos 
   1936      1.1  christos @noindent
   1937      1.1  christos This command produces the following output:
   1938      1.1  christos 
   1939      1.1  christos @smallexample
   1940      1.1  christos @verbatim
   1941      1.1  christos + gprofng display text -cpu_list -cpus mxv.2.thr.er
   1942      1.1  christos Exp Sel Total
   1943      1.1  christos === === =====
   1944      1.1  christos   1 all     4
   1945      1.1  christos Objects sorted by metric: Exclusive Total CPU Time
   1946      1.1  christos 
   1947      1.1  christos Excl. Total    Name
   1948      1.1  christos CPU
   1949      1.1  christos  sec.      %
   1950      1.1  christos 9.464 100.00   <Total>
   1951      1.1  christos 4.414  46.64   CPU 2
   1952      1.1  christos 2.696  28.49   CPU 0
   1953      1.1  christos 1.851  19.56   CPU 1
   1954      1.1  christos 0.502   5.31   CPU 3
   1955      1.1  christos @end verbatim
   1956      1.1  christos @end smallexample
   1957      1.1  christos 
   1958      1.1  christos The first table shows that there is only one experiment and that all of the
   1959      1.1  christos four CPUs have been selected.  The second table shows the exclusive metrics
   1960      1.1  christos for each of the CPUs that have been used.
   1961      1.1  christos 
   1962  1.1.1.2  christos As also echoed in the output, the data is sorted with respect to the
   1963      1.1  christos exclusive CPU time, but it is very easy to sort the data by the CPU id
   1964      1.1  christos @IndexSubentry{Options, -sort}
   1965      1.1  christos @IndexSubentry{Commands, sort}
   1966      1.1  christos by using the @command{sort} command:
   1967      1.1  christos 
   1968      1.1  christos @cartouche
   1969      1.1  christos @smallexample
   1970      1.1  christos $ gprofng display text -cpu_list -sort name -cpus mxv.2.thr.er
   1971      1.1  christos @end smallexample
   1972      1.1  christos @end cartouche
   1973      1.1  christos 
   1974      1.1  christos @noindent
   1975      1.1  christos With the @command{sort} added, the output is as follows:
   1976      1.1  christos 
   1977      1.1  christos @smallexample
   1978      1.1  christos @verbatim
   1979      1.1  christos Exp Sel Total
   1980      1.1  christos === === =====
   1981      1.1  christos   1 all     4
   1982      1.1  christos Current Sort Metric: Name ( name )
   1983      1.1  christos Objects sorted by metric: Name
   1984      1.1  christos 
   1985      1.1  christos Excl. Total    Name
   1986      1.1  christos CPU
   1987      1.1  christos  sec.      %
   1988      1.1  christos 9.464 100.00   <Total>
   1989      1.1  christos 2.696  28.49   CPU 0
   1990      1.1  christos 1.851  19.56   CPU 1
   1991      1.1  christos 4.414  46.64   CPU 2
   1992      1.1  christos 0.502   5.31   CPU 3
   1993      1.1  christos @end verbatim
   1994      1.1  christos @end smallexample
   1995      1.1  christos 
   1996      1.1  christos While the table with thread times shown earlier may point at a load imbalance
   1997      1.1  christos in the application, this overview has a different purpose.
   1998      1.1  christos 
   1999      1.1  christos For example, we see that 4 CPUs have been used, but we know that the
   2000      1.1  christos application uses 3 threads only.
   2001      1.1  christos We will now demonstrate how filters can be used to help answer the
   2002      1.1  christos question why 4 CPUs are used, while the application has 3 threads only.
   2003      1.1  christos This means that at least one thread has executed on more than one CPU.
   2004      1.1  christos 
   2005      1.1  christos Recall the thread level timings:
   2006      1.1  christos 
   2007      1.1  christos @smallexample
   2008      1.1  christos @verbatim
   2009      1.1  christos Excl. Total    Name
   2010      1.1  christos CPU
   2011      1.1  christos  sec.      %
   2012      1.1  christos 9.464 100.00   <Total>
   2013      1.1  christos 4.547  48.05   Process 1, Thread 3
   2014      1.1  christos 4.414  46.64   Process 1, Thread 2
   2015      1.1  christos 0.502   5.31   Process 1, Thread 1
   2016      1.1  christos @end verbatim
   2017      1.1  christos @end smallexample
   2018      1.1  christos 
   2019      1.1  christos Compared to the CPU timings above, it seems very likely that thread 3 has
   2020      1.1  christos used more than one CPU, because the thread and CPU timings are the same
   2021      1.1  christos for both other threads.
   2022      1.1  christos 
   2023      1.1  christos The command below selects thread number 3 and then requests the CPU
   2024      1.1  christos utilization for this thread:
   2025      1.1  christos 
   2026      1.1  christos @cartouche
   2027      1.1  christos @smallexample
   2028      1.1  christos $ gprofng display text -thread_select 3 -sort name -cpus mxv.2.thr.er
   2029      1.1  christos @end smallexample
   2030      1.1  christos @end cartouche
   2031      1.1  christos 
   2032      1.1  christos The output shown below confirms that thread 3 is selected and then displays
   2033      1.1  christos the CPU(s) that have been used by this thread:
   2034      1.1  christos 
   2035      1.1  christos @smallexample
   2036      1.1  christos @verbatim
   2037      1.1  christos Exp Sel Total
   2038      1.1  christos === === =====
   2039      1.1  christos   1 3       3
   2040      1.1  christos 
   2041      1.1  christos Objects sorted by metric: Exclusive Total CPU Time
   2042      1.1  christos 
   2043      1.1  christos Excl. Total    Name
   2044      1.1  christos CPU
   2045      1.1  christos  sec.      %
   2046      1.1  christos 4.547 100.00   <Total>
   2047      1.1  christos 2.696  59.29   CPU 0
   2048      1.1  christos 1.851  40.71   CPU 1
   2049      1.1  christos @end verbatim
   2050      1.1  christos @end smallexample
   2051      1.1  christos 
   2052      1.1  christos The results show that this thread has used CPU 0 nearly 60% of the time
   2053      1.1  christos and CPU 1 for the remaining 40%.
   2054      1.1  christos 
   2055      1.1  christos To confirm that this is the only thread that has used more than one CPU, the
   2056      1.1  christos same approach can be used for threads 1 and 2:
   2057      1.1  christos 
   2058      1.1  christos @smallexample
   2059      1.1  christos @verbatim
   2060      1.1  christos $ gprofng display text -thread_select 1 -cpus mxv.2.thr.er
   2061      1.1  christos Exp Sel Total
   2062      1.1  christos === === =====
   2063      1.1  christos   1 1       3
   2064      1.1  christos Objects sorted by metric: Exclusive Total CPU Time
   2065      1.1  christos 
   2066      1.1  christos Excl. Total    Name
   2067      1.1  christos CPU
   2068      1.1  christos  sec.      %
   2069      1.1  christos 0.502 100.00   <Total>
   2070      1.1  christos 0.502 100.00   CPU 3
   2071      1.1  christos @end verbatim
   2072      1.1  christos @end smallexample
   2073      1.1  christos 
   2074      1.1  christos @smallexample
   2075      1.1  christos @verbatim
   2076      1.1  christos $ gprofng display text -thread_select 2 -cpus mxv.2.thr.er
   2077      1.1  christos Exp Sel Total
   2078      1.1  christos === === =====
   2079      1.1  christos   1 2       3
   2080      1.1  christos Objects sorted by metric: Exclusive Total CPU Time
   2081      1.1  christos 
   2082      1.1  christos Excl. Total    Name
   2083      1.1  christos CPU
   2084      1.1  christos  sec.      %
   2085      1.1  christos 4.414 100.00   <Total>
   2086      1.1  christos 4.414 100.00   CPU 2
   2087      1.1  christos @end verbatim
   2088      1.1  christos @end smallexample
   2089      1.1  christos 
   2090      1.1  christos @noindent
   2091      1.1  christos The output above shows that indeed threads 1 and 2 each have used a single
   2092      1.1  christos CPU only.
   2093      1.1  christos 
   2094      1.1  christos @c -- A new node --------------------------------------------------------------
   2095      1.1  christos @node    View Multiple Experiments
   2096      1.1  christos @section View Multiple Experiments
   2097      1.1  christos @c ----------------------------------------------------------------------------
   2098      1.1  christos 
   2099  1.1.1.2  christos One thing we did not cover sofar is that @ToolName{} fully supports the
   2100  1.1.1.2  christos analysis of multiple experiments.  The @DisplayText{} tool accepts a list of
   2101  1.1.1.2  christos experiments.  The data can either be aggregated across the experiments, or
   2102  1.1.1.2  christos used in a comparison.
   2103      1.1  christos 
   2104      1.1  christos The default is to aggregate the metric values across the experiments that have
   2105      1.1  christos been loaded.  The @command{compare} command can be used to enable the
   2106      1.1  christos @IndexSubentry{Options, @code{-compare}}
   2107      1.1  christos @IndexSubentry{Commands, @code{compare}}
   2108      1.1  christos comparison of results.
   2109      1.1  christos 
   2110      1.1  christos In this section both modes are illustrated with an example.
   2111      1.1  christos 
   2112      1.1  christos @c -- A new node --------------------------------------------------------------
   2113      1.1  christos @node       Aggregation of Experiments
   2114      1.1  christos @subsection Aggregation of Experiments
   2115      1.1  christos @c ----------------------------------------------------------------------------
   2116      1.1  christos 
   2117      1.1  christos If the data for multiple experiments is aggregrated, the @DisplayText{} tool
   2118      1.1  christos shows the combined results.
   2119      1.1  christos For example, below is the script to show the function view for the data
   2120      1.1  christos aggregated over two experiments, drop the first experiment and then show
   2121  1.1.1.2  christos the function view for the second experiment only.
   2122      1.1  christos We will call it @file{my-script-agg}.
   2123      1.1  christos 
   2124      1.1  christos @cartouche
   2125      1.1  christos @smallexample
   2126      1.1  christos # Define the metrics
   2127      1.1  christos metrics e.%totalcpu
   2128      1.1  christos # Limit the output to 5 lines
   2129      1.1  christos limit 5
   2130      1.1  christos # Get the list with experiments
   2131      1.1  christos experiment_list
   2132      1.1  christos # Get the function overview for all
   2133      1.1  christos functions
   2134      1.1  christos # Drop the first experiment
   2135      1.1  christos drop_exp mxv.2.thr.er
   2136      1.1  christos # Get the function overview for exp #2
   2137      1.1  christos functions
   2138      1.1  christos @end smallexample
   2139      1.1  christos @end cartouche
   2140      1.1  christos 
   2141      1.1  christos @IndexSubentry{Options,  @code{-experiment_list}}
   2142      1.1  christos @IndexSubentry{Commands, @code{experiment_list}}
   2143      1.1  christos With the exception of the @command{experiment_list} command, all commands
   2144      1.1  christos used have been discussed earlier.
   2145      1.1  christos 
   2146      1.1  christos The @command{experiment_list} command provides a list of the experiments
   2147      1.1  christos that have been loaded.  This may be used to get the experiment IDs and
   2148      1.1  christos to verify the correct experiments are loaded for the aggregation.
   2149      1.1  christos 
   2150      1.1  christos @noindent
   2151      1.1  christos Below is an example that loads two experiments and uses the above
   2152      1.1  christos script to display different function views.
   2153      1.1  christos 
   2154      1.1  christos @cartouche
   2155      1.1  christos @smallexample
   2156      1.1  christos $ gprofng display text -script my-script-agg mxv.2.thr.er mxv.4.thr.er
   2157      1.1  christos @end smallexample
   2158      1.1  christos @end cartouche
   2159      1.1  christos 
   2160      1.1  christos @noindent
   2161      1.1  christos This produces the following output:
   2162      1.1  christos 
   2163      1.1  christos @smallexample
   2164      1.1  christos @verbatim
   2165      1.1  christos # Define the metrics
   2166      1.1  christos Current metrics: e.%totalcpu:name
   2167      1.1  christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu )
   2168      1.1  christos # Limit the output to 5 lines
   2169      1.1  christos Print limit set to 5
   2170      1.1  christos # Get the list with experiments
   2171      1.1  christos ID Sel     PID Experiment
   2172      1.1  christos == === ======= ============
   2173      1.1  christos  1 yes 1339450 mxv.2.thr.er
   2174      1.1  christos  2 yes 3579561 mxv.4.thr.er
   2175      1.1  christos # Get the function overview for all
   2176      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   2177      1.1  christos 
   2178      1.1  christos Excl. Total     Name
   2179      1.1  christos CPU
   2180      1.1  christos   sec.      %
   2181      1.1  christos 20.567 100.00   <Total>
   2182      1.1  christos 19.553  95.07   mxv_core
   2183      1.1  christos  0.474   2.30   init_data
   2184      1.1  christos  0.198   0.96   erand48_r
   2185      1.1  christos  0.149   0.72   drand48
   2186      1.1  christos 
   2187      1.1  christos # Drop the first experiment
   2188      1.1  christos Experiment mxv.2.thr.er has been dropped
   2189      1.1  christos # Get the function overview for exp #2
   2190      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   2191      1.1  christos 
   2192      1.1  christos Excl. Total     Name
   2193      1.1  christos CPU
   2194      1.1  christos   sec.      %
   2195      1.1  christos 11.104 100.00   <Total>
   2196      1.1  christos 10.592  95.39   mxv_core
   2197      1.1  christos  0.249   2.24   init_data
   2198      1.1  christos  0.094   0.84   erand48_r
   2199      1.1  christos  0.082   0.74   drand48
   2200      1.1  christos @end verbatim
   2201      1.1  christos @end smallexample
   2202      1.1  christos 
   2203      1.1  christos The first five lines should look familiar.  The five lines following echo
   2204      1.1  christos the comment line in the script and show the overview of the experiments.
   2205      1.1  christos This confirms two experiments have been loaded and that both are active.
   2206      1.1  christos This is followed by the function overview.  The timings have been summed
   2207      1.1  christos up and the percentages are adjusted accordingly.
   2208      1.1  christos 
   2209      1.1  christos @c -- A new node --------------------------------------------------------------
   2210      1.1  christos @node       Comparison of Experiments
   2211      1.1  christos @subsection Comparison of Experiments
   2212      1.1  christos @c ----------------------------------------------------------------------------
   2213      1.1  christos 
   2214  1.1.1.2  christos The support for multiple experiments really shines in comparison mode.
   2215      1.1  christos @cindex Compare experiments
   2216      1.1  christos In comparison mode, the data for the various experiments is shown side by
   2217      1.1  christos side, as illustrated below where we compare the results for the multithreaded
   2218      1.1  christos experiments using two and four threads respectively.
   2219      1.1  christos 
   2220      1.1  christos This
   2221      1.1  christos feature is controlled through the
   2222      1.1  christos @IndexSubentry{Options,  @code{-compare}}
   2223      1.1  christos @IndexSubentry{Commands, @code{compare}}
   2224      1.1  christos @code{compare} command.
   2225      1.1  christos 
   2226      1.1  christos The comparison mode is enabled through @command{compare on} and with
   2227      1.1  christos @command{compare off} it is disabled again.
   2228      1.1  christos In addition to @samp{on}, or @samp{off}, this command also supports
   2229      1.1  christos the @samp{delta} and @samp{ratio} keywords.
   2230      1.1  christos 
   2231      1.1  christos This is the script that will be used in our example.  It sets the comparison
   2232      1.1  christos mode to @samp{on}:
   2233      1.1  christos 
   2234      1.1  christos @smallexample
   2235      1.1  christos @verbatim
   2236      1.1  christos # Define the metrics
   2237      1.1  christos metrics e.%totalcpu
   2238      1.1  christos # Limit the output to 5 lines
   2239      1.1  christos limit 5
   2240      1.1  christos # Set the comparison mode to differences
   2241      1.1  christos compare on
   2242      1.1  christos # Get the function overview
   2243      1.1  christos functions
   2244      1.1  christos @end verbatim
   2245      1.1  christos @end smallexample
   2246      1.1  christos 
   2247      1.1  christos Assuming this script file is called @file{my-script-comp}, this is how
   2248      1.1  christos it is used to display the differences:
   2249      1.1  christos 
   2250      1.1  christos @cartouche
   2251      1.1  christos @smallexample
   2252      1.1  christos $ gprofng display text -script my-script-comp mxv.2.thr.er mxv.4.thr.er
   2253      1.1  christos @end smallexample
   2254      1.1  christos @end cartouche
   2255      1.1  christos 
   2256      1.1  christos @noindent
   2257      1.1  christos This produces the output shown below.  The data for the first experiment
   2258      1.1  christos is shown as absolute numbers.  The timings for the other experiment are
   2259      1.1  christos shown as a delta relative to these reference numbers:
   2260      1.1  christos 
   2261      1.1  christos @smallexample
   2262      1.1  christos @verbatim
   2263      1.1  christos 
   2264      1.1  christos mxv.2.thr.er  mxv.4.thr.er
   2265      1.1  christos Excl. Total   Excl. Total     Name
   2266      1.1  christos CPU           CPU
   2267      1.1  christos  sec.      %    sec.      %
   2268      1.1  christos 9.464 100.00  11.104 100.00   <Total>
   2269      1.1  christos 8.961  94.69  10.592  95.39   mxv_core
   2270      1.1  christos 0.224   2.37   0.249   2.24   init_data
   2271      1.1  christos 0.105   1.11   0.094   0.84   erand48_r
   2272      1.1  christos 0.073   0.77   0.060   0.54   __drand48_iterate
   2273      1.1  christos @end verbatim
   2274      1.1  christos @end smallexample
   2275      1.1  christos 
   2276      1.1  christos This table is already helpful to more easily compare (two) profiles, but
   2277      1.1  christos there is more that we can do here.
   2278      1.1  christos 
   2279      1.1  christos By default, in comparison mode, all measured values are shown.  Often
   2280      1.1  christos profiling is about comparing performance data.  It is therefore
   2281      1.1  christos sometimes more useful to look at differences or ratios, using one
   2282      1.1  christos experiment as a reference.
   2283      1.1  christos 
   2284      1.1  christos The values shown are relative to this difference.  For example if a ratio
   2285      1.1  christos is below one, it means the reference value was higher.
   2286      1.1  christos 
   2287      1.1  christos In the example below, we use the same two experiments used in the comparison
   2288  1.1.1.2  christos above.  The script is also nearly identical.  The only change is that we now
   2289      1.1  christos use the @samp{delta} keyword.
   2290      1.1  christos 
   2291      1.1  christos As before, the number of lines is restricted to 5 and we focus on
   2292      1.1  christos the exclusive timings plus percentages.  For the comparison part we are
   2293      1.1  christos interested in the differences.
   2294      1.1  christos 
   2295      1.1  christos This is the script that produces such an overview:
   2296      1.1  christos 
   2297      1.1  christos @smallexample
   2298      1.1  christos @verbatim
   2299      1.1  christos # Define the metrics
   2300      1.1  christos metrics e.%totalcpu
   2301      1.1  christos # Limit the output to 5 lines
   2302      1.1  christos limit 5
   2303      1.1  christos # Set the comparison mode to differences
   2304      1.1  christos compare delta
   2305      1.1  christos # Get the function overview
   2306      1.1  christos functions
   2307      1.1  christos @end verbatim
   2308      1.1  christos @end smallexample
   2309      1.1  christos 
   2310      1.1  christos Assuming this script file is called @file{my-script-comp2}, this is how we
   2311      1.1  christos get the table displayed on our screen:
   2312      1.1  christos 
   2313      1.1  christos @cartouche
   2314      1.1  christos @smallexample
   2315      1.1  christos $ gprofng display text -script my-script-comp2 mxv.2.thr.er mxv.4.thr.er
   2316      1.1  christos @end smallexample
   2317      1.1  christos @end cartouche
   2318      1.1  christos 
   2319      1.1  christos Leaving out some of the lines printed, but we have seen before, we get
   2320      1.1  christos the following table:
   2321      1.1  christos 
   2322      1.1  christos @smallexample
   2323      1.1  christos @verbatim
   2324      1.1  christos mxv.2.thr.er  mxv.4.thr.er
   2325      1.1  christos Excl. Total   Excl. Total     Name
   2326      1.1  christos CPU           CPU
   2327      1.1  christos  sec.      %   delta      %
   2328      1.1  christos 9.464 100.00  +1.640 100.00   <Total>
   2329      1.1  christos 8.961  94.69  +1.631  95.39   mxv_core
   2330      1.1  christos 0.224   2.37  +0.025   2.24   init_data
   2331      1.1  christos 0.105   1.11  -0.011   0.84   erand48_r
   2332      1.1  christos 0.073   0.77  -0.013   0.54   __drand48_iterate
   2333      1.1  christos @end verbatim
   2334      1.1  christos @end smallexample
   2335      1.1  christos 
   2336      1.1  christos It is now easier to see that the CPU times for the most time consuming
   2337      1.1  christos functions in this code are practically the same.
   2338      1.1  christos 
   2339      1.1  christos It is also possible to show ratio's through the @command{compare ratio}
   2340      1.1  christos @IndexSubentry{Options, @code{-compare}}
   2341      1.1  christos @IndexSubentry{Commands, @code{compare}}
   2342      1.1  christos command.  The first colum is used as a reference and the values for
   2343      1.1  christos the other columns with metrics are derived by dividing the value by
   2344      1.1  christos the reference.  The result for such a comparison is shown below:
   2345      1.1  christos 
   2346      1.1  christos @smallexample
   2347      1.1  christos @verbatim
   2348      1.1  christos mxv.2.thr.er  mxv.4.thr.er
   2349      1.1  christos Excl. Total   Excl. Total CPU    Name
   2350      1.1  christos CPU
   2351      1.1  christos  sec.      %      ratio      %
   2352      1.1  christos 9.464 100.00  x   1.173 100.00   <Total>
   2353      1.1  christos 8.961  94.69  x   1.182  95.39   mxv_core
   2354      1.1  christos 0.224   2.37  x   1.111   2.24   init_data
   2355      1.1  christos 0.105   1.11  x   0.895   0.84   erand48_r
   2356      1.1  christos 0.073   0.77  x   0.822   0.54   __drand48_iterate
   2357      1.1  christos @end verbatim
   2358      1.1  christos @end smallexample
   2359      1.1  christos 
   2360      1.1  christos Note that the comparison feature is supported at the function, source, and
   2361      1.1  christos disassembly level.  There is no practical limit on the number of experiments
   2362      1.1  christos that can be used in a comparison.
   2363      1.1  christos 
   2364      1.1  christos @c -- A new node --------------------------------------------------------------
   2365      1.1  christos @node    Profile Hardware Event Counters
   2366      1.1  christos @section Profile Hardware Event Counters
   2367      1.1  christos @c ----------------------------------------------------------------------------
   2368      1.1  christos 
   2369      1.1  christos Many processors provide a set of hardware event counters and @ToolName{}
   2370      1.1  christos provides support for this feature.
   2371      1.1  christos @xref{Hardware Event Counters Explained} for those readers that are not
   2372      1.1  christos familiar with such counters and like to learn more.
   2373      1.1  christos 
   2374      1.1  christos In this section we explain how to get the details on the event counter
   2375      1.1  christos support for the processor used in the experiment(s), and show several
   2376      1.1  christos examples.
   2377      1.1  christos 
   2378      1.1  christos @c -- A new node --------------------------------------------------------------
   2379      1.1  christos @node       Getting Information on the Counters Supported
   2380      1.1  christos @subsection Getting Information on the Counters Supported
   2381      1.1  christos @c ----------------------------------------------------------------------------
   2382      1.1  christos 
   2383      1.1  christos The first step is to check if the processor used for the experiments is
   2384      1.1  christos supported by @ToolName{}.
   2385      1.1  christos @IndexSubentry{Options, @code{-h}}
   2386      1.1  christos The @code{-h} option on @CollectApp{} will show the event counter
   2387      1.1  christos information:
   2388      1.1  christos 
   2389      1.1  christos @cartouche
   2390      1.1  christos @smallexample
   2391      1.1  christos $ gprofng collect app -h
   2392      1.1  christos @end smallexample
   2393      1.1  christos @end cartouche
   2394      1.1  christos 
   2395      1.1  christos In case the counters are supported, a list with the events is printed.
   2396      1.1  christos Otherwise, a warning message will be issued.
   2397      1.1  christos 
   2398      1.1  christos For example, below we show this command and the output on an Intel Xeon
   2399      1.1  christos Platinum 8167M (aka ``Skylake'') processor.  The output has been split
   2400      1.1  christos into several sections and each section is commented upon separately.
   2401      1.1  christos 
   2402      1.1  christos @smallexample
   2403      1.1  christos @verbatim
   2404      1.1  christos Run "gprofng collect app --help" for a usage message.
   2405      1.1  christos 
   2406      1.1  christos Specifying HW counters on `Intel Arch PerfMon v2 on Family 6 Model 85'
   2407      1.1  christos (cpuver=2499):
   2408      1.1  christos 
   2409      1.1  christos   -h {auto|lo|on|hi}
   2410      1.1  christos 	turn on default set of HW counters at the specified rate
   2411      1.1  christos   -h <ctr_def> [-h <ctr_def>]...
   2412      1.1  christos   -h <ctr_def>[,<ctr_def>]...
   2413      1.1  christos 	specify HW counter profiling for up to 4 HW counters
   2414      1.1  christos @end verbatim
   2415      1.1  christos @end smallexample
   2416      1.1  christos 
   2417      1.1  christos The first line shows how to get a usage overview.  This is followed by
   2418      1.1  christos some information on the target processor.
   2419      1.1  christos The next five lines explain in what ways the @code{-h} option can be
   2420      1.1  christos used to define the events to be monitored.
   2421      1.1  christos 
   2422      1.1  christos The first version shown above enables a default set of counters.  This
   2423      1.1  christos default depends on the processor this command is executed on.  The
   2424      1.1  christos keyword following the @code{-h} option defines the sampling rate:
   2425      1.1  christos 
   2426      1.1  christos @table @code
   2427      1.1  christos 
   2428      1.1  christos @item auto
   2429      1.1  christos Match the sample rate of used by clock profiling.  If the latter is disabled,
   2430      1.1  christos Use a per thread sampling rate of approximately 100 samples per second.
   2431      1.1  christos This setting is the default and preferred.
   2432      1.1  christos 
   2433      1.1  christos @item on
   2434      1.1  christos Use a per thread sampling rate of approximately 100 samples per second.
   2435      1.1  christos 
   2436      1.1  christos @item lo
   2437      1.1  christos Use a per thread sampling rate of approximately 10 samples per second.
   2438      1.1  christos 
   2439      1.1  christos @item hi
   2440      1.1  christos Use a per thread sampling rate of approximately 1000 samples per second.
   2441      1.1  christos 
   2442      1.1  christos @end table
   2443      1.1  christos 
   2444      1.1  christos The second and third variant define the events to be monitored.  Note
   2445      1.1  christos that the number of simultaneous events supported is printed.  In this
   2446      1.1  christos case we can monitor four events in a single profiling job.
   2447      1.1  christos 
   2448      1.1  christos It is a matter of preference whether you like to use the @code{-h}
   2449      1.1  christos option for each event, or use it once, followed by a comma separated
   2450      1.1  christos list.
   2451      1.1  christos 
   2452      1.1  christos There is one slight catch though.  The counter definition below has
   2453      1.1  christos mandatory comma (@code{,}) between the event and the rate.  While a
   2454      1.1  christos default can be used for the rate, the comma cannot be omitted.
   2455      1.1  christos This may result in a somewhat awkward counter definition in case
   2456      1.1  christos the default sampling rate is used.
   2457      1.1  christos 
   2458      1.1  christos For example, the following two commands are equivalent.  Note
   2459      1.1  christos the double comma in the second command.  This is not a typo.
   2460      1.1  christos 
   2461      1.1  christos @cartouche
   2462      1.1  christos @smallexample
   2463      1.1  christos $ gprofng collect app -h cycles -h insts ...
   2464      1.1  christos $ gprofng collect app -h cycles,,insts ...
   2465      1.1  christos @end smallexample
   2466      1.1  christos @end cartouche
   2467      1.1  christos 
   2468      1.1  christos In the first command this comma is not needed, because a
   2469      1.1  christos comma (``@code{,}'') immediately followed by white space may
   2470      1.1  christos be omitted.
   2471      1.1  christos 
   2472      1.1  christos This is why we prefer the this syntax and in the remainder will
   2473      1.1  christos use the first version of this command.
   2474      1.1  christos 
   2475      1.1  christos @IndexSubentry{Hardware event counters, counter definition}
   2476      1.1  christos The counter definition takes an event name, plus optionally one or
   2477      1.1  christos more attributes, followed by a comma, and optionally the sampling rate.
   2478      1.1  christos The output section below shows the formal definition.
   2479      1.1  christos 
   2480      1.1  christos @cartouche
   2481      1.1  christos @smallexample
   2482      1.1  christos   <ctr_def> == <ctr>[[~<attr>=<val>]...],[<rate>]
   2483      1.1  christos @end smallexample
   2484      1.1  christos @end cartouche
   2485      1.1  christos 
   2486      1.1  christos The printed help then explains this syntax.  Below we have summarized
   2487      1.1  christos and expanded this output:
   2488      1.1  christos 
   2489      1.1  christos @table @code
   2490      1.1  christos 
   2491      1.1  christos @item @var{<ctr>}
   2492      1.1  christos The counter name must be selected from the available counters listed
   2493      1.1  christos as part of the output printed with the @code{-h} option.
   2494      1.1  christos On most systems, if a counter is not listed, it may still be specified
   2495      1.1  christos by its numeric value.
   2496      1.1  christos 
   2497      1.1  christos @item @var{~<attr>=<val>}
   2498      1.1  christos This is an optional attribute that depends on the processor.  The list
   2499      1.1  christos of supported attributes is printed in the output.  Examples of
   2500      1.1  christos attributes are ``user'', or ``system''.  The value can given in decimal
   2501      1.1  christos or hexadecimal format.
   2502      1.1  christos Multiple attributes may be specified, and each must be preceded
   2503      1.1  christos by a ~.
   2504      1.1  christos 
   2505      1.1  christos @item @var{<rate>}
   2506      1.1  christos 
   2507      1.1  christos The sampling rate is one of the following:
   2508      1.1  christos 
   2509      1.1  christos @table @code
   2510      1.1  christos 
   2511      1.1  christos @item auto
   2512      1.1  christos This is the default and matches the rate used by clock profiling.
   2513      1.1  christos If clock profiling is disabled, use @samp{on}.
   2514      1.1  christos 
   2515      1.1  christos @item on
   2516      1.1  christos Set the per thread maximum sampling rate to ~100 samples/second
   2517      1.1  christos 
   2518      1.1  christos @item lo
   2519      1.1  christos Set the per thread maximum sampling rate to ~10 samples/second
   2520      1.1  christos 
   2521      1.1  christos @item hi
   2522      1.1  christos Set the per thread maximum sampling rate to ~1000 samples/second
   2523      1.1  christos 
   2524      1.1  christos @item @var{<interval>}
   2525      1.1  christos Define the sampling interval.
   2526      1.1  christos @xref{Control the Sampling Frequency} how to define this.
   2527      1.1  christos 
   2528      1.1  christos @end table
   2529      1.1  christos 
   2530      1.1  christos @end table
   2531      1.1  christos 
   2532      1.1  christos After the section with the formal definition of events and counters, a
   2533      1.1  christos processor specific list is displayed.  This part starts with an overview
   2534      1.1  christos of the default set of counters and the aliased names supported
   2535      1.1  christos @emph{on this specific processor}.
   2536      1.1  christos 
   2537      1.1  christos @smallexample
   2538      1.1  christos @verbatim
   2539      1.1  christos Default set of HW counters:
   2540      1.1  christos 
   2541      1.1  christos     -h cycles,,insts,,llm
   2542      1.1  christos 
   2543      1.1  christos Aliases for most useful HW counters:
   2544      1.1  christos 
   2545      1.1  christos  alias    raw name                   type units regs description
   2546      1.1  christos 
   2547      1.1  christos  cycles   unhalted-core-cycles   CPU-cycles 0123 CPU Cycles
   2548      1.1  christos  insts    instruction-retired        events 0123 Instructions Executed
   2549      1.1  christos  llm      llc-misses                 events 0123 Last-Level Cache Misses
   2550      1.1  christos  br_msp   branch-misses-retired      events 0123 Branch Mispredict
   2551      1.1  christos  br_ins   branch-instruction-retired events 0123 Branch Instructions
   2552      1.1  christos @end verbatim
   2553      1.1  christos @end smallexample
   2554      1.1  christos 
   2555      1.1  christos @noindent
   2556      1.1  christos The definitions given above may or may not be available on other processors.
   2557      1.1  christos 
   2558      1.1  christos The table above shows the default set of counters defined for this processor,
   2559      1.1  christos and the aliases.  For each alias the full ``raw'' name is given, plus the
   2560      1.1  christos unit of the number returned by the counter (CPU cycles, or a raw count),
   2561      1.1  christos the hardware counter the event is allowed to be mapped onto, and a short
   2562      1.1  christos description.
   2563      1.1  christos 
   2564      1.1  christos The last part of the output contains all the events that can be monitored:
   2565      1.1  christos 
   2566      1.1  christos @smallexample
   2567      1.1  christos @verbatim
   2568      1.1  christos Raw HW counters:
   2569      1.1  christos 
   2570      1.1  christos     name                                type      units regs description
   2571      1.1  christos 
   2572      1.1  christos     unhalted-core-cycles                     CPU-cycles 0123
   2573      1.1  christos     unhalted-reference-cycles                    events 0123
   2574      1.1  christos     instruction-retired                          events 0123
   2575      1.1  christos     llc-reference                                events 0123
   2576      1.1  christos     llc-misses                                   events 0123
   2577      1.1  christos     branch-instruction-retired                   events 0123
   2578      1.1  christos     branch-misses-retired                        events 0123
   2579      1.1  christos     ld_blocks.store_forward                      events 0123
   2580      1.1  christos     ld_blocks.no_sr                              events 0123
   2581      1.1  christos     ld_blocks_partial.address_alias              events 0123
   2582      1.1  christos     dtlb_load_misses.miss_causes_a_walk          events 0123
   2583      1.1  christos     dtlb_load_misses.walk_completed_4k           events 0123
   2584      1.1  christos 
   2585      1.1  christos     <many lines deleted>
   2586      1.1  christos 
   2587      1.1  christos     l2_lines_out.silent                          events 0123
   2588      1.1  christos     l2_lines_out.non_silent                      events 0123
   2589      1.1  christos     l2_lines_out.useless_hwpf                    events 0123
   2590      1.1  christos     sq_misc.split_lock                           events 0123
   2591      1.1  christos @end verbatim
   2592      1.1  christos @end smallexample
   2593      1.1  christos 
   2594      1.1  christos As can be seen, these names are not always easy to correlate to a specific
   2595      1.1  christos event of interest.  The processor manual should provide more clarity on this.
   2596      1.1  christos 
   2597      1.1  christos @c -- A new node --------------------------------------------------------------
   2598      1.1  christos @node       Examples Using Hardware Event Counters
   2599      1.1  christos @subsection Examples Using Hardware Event Counters
   2600      1.1  christos @c ----------------------------------------------------------------------------
   2601      1.1  christos 
   2602      1.1  christos The previous section may give the impression that these counters are hard to
   2603      1.1  christos use, but as we will show now, in practice it is quite simple.
   2604      1.1  christos 
   2605      1.1  christos With the information from the @code{-h} option, we can easily set up our first
   2606      1.1  christos event counter experiment.
   2607      1.1  christos 
   2608      1.1  christos We start by using the default set of counters defined for our processor and we
   2609      1.1  christos use 2 threads:
   2610      1.1  christos 
   2611      1.1  christos @cartouche
   2612      1.1  christos @smallexample
   2613      1.1  christos $ exe=mxv-pthreads
   2614      1.1  christos $ m=8000
   2615      1.1  christos $ n=4000
   2616      1.1  christos $ exp=mxv.hwc.def.2.thr.er
   2617      1.1  christos $ gprofng collect app -O $exp -h auto ./$exe -m $m -n $n -t 2
   2618      1.1  christos @end smallexample
   2619      1.1  christos @end cartouche
   2620      1.1  christos 
   2621      1.1  christos @IndexSubentry{Options, @code{-h}}
   2622      1.1  christos @IndexSubentry{Hardware event counters, @code{auto} option}
   2623      1.1  christos The new option here is @code{-h auto}.  The @code{auto} keyword enables
   2624      1.1  christos hardware event counter profiling and selects the default set of counters
   2625      1.1  christos defined for this processor.
   2626      1.1  christos 
   2627      1.1  christos As before, we can display the information, but there is one practical hurdle
   2628      1.1  christos to take.  Unless we like to view all metrics recorded, we would need to know
   2629      1.1  christos the names of the events that have been enabled.  This is tedious and also not
   2630      1.1  christos portable in case we would like to repeat this experiment on another processor.
   2631      1.1  christos 
   2632      1.1  christos @IndexSubentry{Hardware event counters, @code{hwc} metric}
   2633      1.1  christos This is where the special @code{hwc} metric comes very handy.  It
   2634      1.1  christos automatically expands to the active set of hardware event counters used
   2635      1.1  christos in the experiment(s).
   2636      1.1  christos 
   2637      1.1  christos With this, it is very easy to display the event counter values.  Note that
   2638      1.1  christos although the regular clock based profiling was enabled, we only want to see
   2639      1.1  christos the counter values.  We also request to see the percentages and limit the
   2640      1.1  christos output to the first 5 lines:
   2641      1.1  christos 
   2642      1.1  christos @cartouche
   2643      1.1  christos @smallexample
   2644      1.1  christos $ exp=mxv.hwc.def.2.thr.er
   2645      1.1  christos $ gprofng display text -metrics e.%hwc -limit 5 -functions $exp
   2646      1.1  christos @end smallexample
   2647      1.1  christos @end cartouche
   2648      1.1  christos 
   2649      1.1  christos @smallexample
   2650      1.1  christos @verbatim
   2651      1.1  christos Current metrics: e.%cycles:e+%insts:e+%llm:name
   2652      1.1  christos Current Sort Metric: Exclusive CPU Cycles ( e.%cycles )
   2653      1.1  christos Print limit set to 5
   2654      1.1  christos Functions sorted by metric: Exclusive CPU Cycles
   2655      1.1  christos 
   2656      1.1  christos Excl. CPU     Excl. Instructions  Excl. Last-Level   Name
   2657      1.1  christos Cycles        Executed            Cache Misses
   2658      1.1  christos  sec.      %                  %                 %
   2659      1.1  christos 2.691 100.00  7906475309 100.00   122658983 100.00   <Total>
   2660      1.1  christos 2.598  96.54  7432724378  94.01   121745696  99.26   mxv_core
   2661      1.1  christos 0.035   1.31   188860269   2.39       70084   0.06   erand48_r
   2662      1.1  christos 0.026   0.95    73623396   0.93      763116   0.62   init_data
   2663      1.1  christos 0.018   0.66    76824434   0.97       40040   0.03   drand48
   2664      1.1  christos @end verbatim
   2665      1.1  christos @end smallexample
   2666      1.1  christos 
   2667      1.1  christos As we have seen before, the first few lines echo the settings.
   2668      1.1  christos This includes a list with the hardware event counters used by
   2669      1.1  christos default.
   2670      1.1  christos 
   2671      1.1  christos The table that follows makes it very easy to get an overview where the
   2672      1.1  christos time is spent and how many of the target events have occurred.
   2673      1.1  christos 
   2674      1.1  christos As before, we can drill down deeper and see the same metrics at the source
   2675      1.1  christos line and instruction level.  Other than using @code{hwc} in the metrics
   2676      1.1  christos definitions, nothing has changed compared to the previous examples:
   2677      1.1  christos 
   2678      1.1  christos @cartouche
   2679      1.1  christos @smallexample
   2680      1.1  christos $ exp=mxv.hwc.def.2.thr.er
   2681      1.1  christos $ gprofng display text -metrics e.hwc -source mxv_core $exp
   2682      1.1  christos @end smallexample
   2683      1.1  christos @end cartouche
   2684      1.1  christos 
   2685      1.1  christos This is the relevant part of the output.  Since the lines get very long,
   2686      1.1  christos we have somewhat modified the lay-out:
   2687      1.1  christos 
   2688      1.1  christos @smallexample
   2689      1.1  christos @verbatim
   2690      1.1  christos    Excl. CPU Excl.        Excl.
   2691      1.1  christos    Cycles    Instructions Last-Level
   2692      1.1  christos     sec.     Executed     Cache Misses
   2693      1.1  christos                                          <Function: mxv_core>
   2694      1.1  christos    0.                 0          0   32. void __attribute__ ((noinline))
   2695      1.1  christos                                          mxv_core(...)
   2696      1.1  christos    0.                 0          0   33. {
   2697      1.1  christos    0.                 0          0   34.   for (uint64_t i=...) {
   2698      1.1  christos    0.                 0          0   35.     double row_sum = 0.0;
   2699      1.1  christos ## 1.872     7291879319   88150571   36.     for (int64_t j=0; j<n; j++)
   2700      1.1  christos    0.725      140845059   33595125   37.        row_sum += A[i][j]*b[j];
   2701      1.1  christos    0.                 0          0   38.     c[i] = row_sum;
   2702      1.1  christos                                      39.    }
   2703      1.1  christos    0.                 0          0   40. }
   2704      1.1  christos @end verbatim
   2705      1.1  christos @end smallexample
   2706      1.1  christos 
   2707      1.1  christos In a smiliar way we can display the event counter values at the instruction
   2708      1.1  christos level.  Again we have modified the lay-out due to page width limitations:
   2709      1.1  christos 
   2710      1.1  christos @cartouche
   2711      1.1  christos @smallexample
   2712      1.1  christos $ exp=mxv.hwc.def.2.thr.er
   2713      1.1  christos $ gprofng display text -metrics e.hwc -disasm mxv_core $exp
   2714      1.1  christos @end smallexample
   2715      1.1  christos @end cartouche
   2716      1.1  christos 
   2717      1.1  christos @smallexample
   2718      1.1  christos @verbatim
   2719      1.1  christos    Excl. CPU Excl.        Excl.
   2720      1.1  christos    Cycles    Instructions Last-Level
   2721      1.1  christos     sec.     Executed     Cache Misses
   2722      1.1  christos                                                 <Function: mxv_core>
   2723      1.1  christos    0.                 0          0  [33] 4021ba: mov   0x8(%rsp),%r10
   2724      1.1  christos                                     34.   for (uint64_t i=...) {
   2725      1.1  christos    0.                 0          0  [34] 4021bf: cmp   %rsi,%rdi
   2726      1.1  christos    0.                 0          0  [34] 4021c2: jbe   0x37
   2727      1.1  christos    0.                 0          0  [34] 4021c4: ret
   2728      1.1  christos                                     35.       double row_sum = 0.0;
   2729      1.1  christos                                     36.       for (int64_t j=0; j<n; j++)
   2730      1.1  christos                                     37.         row_sum += A[i][j]*b[j];
   2731      1.1  christos    0.                 0          0  [37] 4021c5: mov   (%r8,%rdi,8),%rdx
   2732      1.1  christos    0.                 0          0  [36] 4021c9: mov   $0x0,%eax
   2733      1.1  christos    0.                 0          0  [35] 4021ce: pxor  %xmm1,%xmm1
   2734      1.1  christos    0.002       12804230     321394  [37] 4021d2: movsd (%rdx,%rax,8),%xmm0
   2735      1.1  christos    0.141       60819025    3866677  [37] 4021d7: mulsd (%r9,%rax,8),%xmm0
   2736      1.1  christos    0.582       67221804   29407054  [37] 4021dd: addsd %xmm0,%xmm1
   2737      1.1  christos ## 1.871     7279075109   87989870  [36] 4021e1: add   $0x1,%rax
   2738      1.1  christos    0.002       12804210      80351  [36] 4021e5: cmp   %rax,%rcx
   2739      1.1  christos    0.                 0          0  [36] 4021e8: jne   0xffffffffffffffea
   2740      1.1  christos                                     38.       c[i] = row_sum;
   2741      1.1  christos    0.                 0          0  [38] 4021ea: movsd %xmm1,(%r10,%rdi,8)
   2742      1.1  christos    0.                 0          0  [34] 4021f0: add   $0x1,%rdi
   2743      1.1  christos    0.                 0          0  [34] 4021f4: cmp   %rdi,%rsi
   2744      1.1  christos    0.                 0          0  [34] 4021f7: jb    0xd
   2745      1.1  christos    0.                 0          0  [35] 4021f9: pxor  %xmm1,%xmm1
   2746      1.1  christos    0.                 0          0  [36] 4021fd: test  %rcx,%rcx
   2747      1.1  christos    0.                 0      80350  [36] 402200: jne   0xffffffffffffffc5
   2748      1.1  christos    0.                 0          0  [36] 402202: jmp   0xffffffffffffffe8
   2749      1.1  christos                                     39.   }
   2750      1.1  christos                                     40. }
   2751      1.1  christos    0.                 0          0  [40]  402204:  ret
   2752      1.1  christos @end verbatim
   2753      1.1  christos @end smallexample
   2754      1.1  christos 
   2755      1.1  christos So far we have used the default settings for the event counters.  It is
   2756      1.1  christos quite straightforward to select specific counters.  For sake of the
   2757      1.1  christos example, let's assume we would like to count how many branch instructions
   2758      1.1  christos and retired memory load instructions that missed in the L1 cache have been
   2759      1.1  christos executed.  We also want to count these events with a high resolution.
   2760      1.1  christos 
   2761      1.1  christos This is the command to do so:
   2762      1.1  christos 
   2763      1.1  christos @cartouche
   2764      1.1  christos @smallexample
   2765      1.1  christos $ exe=mxv-pthreads
   2766      1.1  christos $ m=8000
   2767      1.1  christos $ n=4000
   2768      1.1  christos $ exp=mxv.hwc.sel.2.thr.er
   2769      1.1  christos $ hwc1=br_ins,hi
   2770      1.1  christos $ hwc2=mem_load_retired.l1_miss,hi
   2771      1.1  christos $ gprofng collect app -O $exp -h $hwc1 -h $hwc2 $exe -m $m -n $n -t 2
   2772      1.1  christos @end smallexample
   2773      1.1  christos @end cartouche
   2774      1.1  christos 
   2775      1.1  christos As before, we get a table with the event counts.  Due to the very
   2776      1.1  christos long name for the second counter, we have somewhat modified the
   2777      1.1  christos output.
   2778      1.1  christos 
   2779      1.1  christos @cartouche
   2780      1.1  christos @smallexample
   2781      1.1  christos $ gprofng display text -limit 10 -functions mxv.hwc.sel.2.thr.er
   2782      1.1  christos @end smallexample
   2783      1.1  christos @end cartouche
   2784      1.1  christos 
   2785      1.1  christos @smallexample
   2786      1.1  christos @verbatim
   2787      1.1  christos Functions sorted by metric: Exclusive Total CPU Time
   2788      1.1  christos Excl.     Incl.     Excl. Branch  Excl.                 Name
   2789      1.1  christos Total     Total     Instructions  mem_load_retired.l1_miss
   2790      1.1  christos CPU sec.  CPU sec.                Events
   2791      1.1  christos 2.597     2.597     1305305319    4021340               <Total>
   2792      1.1  christos 2.481     2.481     1233233242    3982327               mxv_core
   2793      1.1  christos 0.040     0.107       19019012       9003               init_data
   2794      1.1  christos 0.028     0.052       23023048      15006               erand48_r
   2795      1.1  christos 0.024     0.024       19019008       9004               __drand48_iterate
   2796      1.1  christos 0.015     0.067       11011009       2998               drand48
   2797      1.1  christos 0.008     0.010              0       3002               _int_malloc
   2798      1.1  christos 0.001     0.001              0          0               brk
   2799      1.1  christos 0.001     0.002              0          0               sysmalloc
   2800      1.1  christos 0.        0.001              0          0               __default_morecore
   2801      1.1  christos @end verbatim
   2802      1.1  christos @end smallexample
   2803      1.1  christos 
   2804      1.1  christos @IndexSubentry{Options,  @code{-compare}}
   2805      1.1  christos @IndexSubentry{Commands, @code{compare}}
   2806      1.1  christos When using event counters, the values could be very large and it is not easy
   2807      1.1  christos to compare the numbers.  As we will show next, the @code{ratio} feature is
   2808      1.1  christos very useful when comparing such profiles.
   2809      1.1  christos 
   2810      1.1  christos To demonstrate this, we have set up another event counter experiment where
   2811      1.1  christos we would like to compare the number of last level cache miss and the number
   2812      1.1  christos of branch instructions executed when using a single thread, or two threads.
   2813      1.1  christos 
   2814      1.1  christos These are the commands used to generate the experiment directories:
   2815      1.1  christos 
   2816      1.1  christos @cartouche
   2817      1.1  christos @smallexample
   2818      1.1  christos $ exe=./mxv-pthreads
   2819      1.1  christos $ m=8000
   2820      1.1  christos $ n=4000
   2821      1.1  christos $ exp1=mxv.hwc.comp.1.thr.er
   2822      1.1  christos $ exp2=mxv.hwc.comp.2.thr.er
   2823      1.1  christos $ gprofng collect app -O $exp1 -h llm -h br_ins $exe -m $m -n $n -t 1
   2824      1.1  christos $ gprofng collect app -O $exp2 -h llm -h br_ins $exe -m $m -n $n -t 2
   2825      1.1  christos @end smallexample
   2826      1.1  christos @end cartouche
   2827      1.1  christos 
   2828      1.1  christos The following script has been used to get the tables.  Due to lay-out
   2829      1.1  christos restrictions, we have to create two tables, one for each counter.
   2830      1.1  christos 
   2831      1.1  christos @cartouche
   2832      1.1  christos @smallexample
   2833      1.1  christos # Limit the output to 5 lines
   2834      1.1  christos limit 5
   2835      1.1  christos # Define the metrics
   2836      1.1  christos metrics name:e.llm
   2837      1.1  christos # Set the comparison to ratio
   2838      1.1  christos compare ratio
   2839      1.1  christos functions
   2840      1.1  christos # Define the metrics
   2841      1.1  christos metrics name:e.br_ins
   2842      1.1  christos # Set the comparison to ratio
   2843      1.1  christos compare ratio
   2844      1.1  christos functions
   2845      1.1  christos @end smallexample
   2846      1.1  christos @end cartouche
   2847      1.1  christos 
   2848      1.1  christos Note that we print the name of the function first, followed by the counter
   2849      1.1  christos data.
   2850      1.1  christos The new element is that we set the comparison mode to @code{ratio}.  This
   2851      1.1  christos divides the data in a column by its counterpart in the reference experiment.
   2852      1.1  christos 
   2853      1.1  christos This is the command using this script and the two experiment directories as
   2854      1.1  christos input:
   2855      1.1  christos 
   2856      1.1  christos @cartouche
   2857      1.1  christos @smallexample
   2858      1.1  christos $ gprofng display text -script my-script-comp-counters \
   2859      1.1  christos   mxv.hwc.comp.1.thr.er \
   2860      1.1  christos   mxv.hwc.comp.2.thr.er
   2861      1.1  christos @end smallexample
   2862      1.1  christos @end cartouche
   2863      1.1  christos 
   2864      1.1  christos By design, we get two tables, one for each counter:
   2865      1.1  christos 
   2866      1.1  christos @smallexample
   2867      1.1  christos @verbatim
   2868      1.1  christos Functions sorted by metric: Exclusive Last-Level Cache Misses
   2869      1.1  christos 
   2870      1.1  christos                               mxv.hwc.comp.1.thr.er  mxv.hwc.comp.2.thr.er
   2871      1.1  christos Name                          Excl. Last-Level       Excl. Last-Level
   2872      1.1  christos                               Cache Misses           Cache Misses
   2873      1.1  christos                                                          ratio
   2874      1.1  christos  <Total>                      122709276              x   0.788
   2875      1.1  christos  mxv_core                     121796001              x   0.787
   2876      1.1  christos  init_data                       723064              x   1.055
   2877      1.1  christos  erand48_r                       100111              x   0.500
   2878      1.1  christos  drand48                          60065              x   1.167
   2879      1.1  christos 
   2880      1.1  christos Functions sorted by metric: Exclusive Branch Instructions
   2881      1.1  christos 
   2882      1.1  christos                               mxv.hwc.comp.1.thr.er  mxv.hwc.comp.2.thr.er
   2883      1.1  christos Name                          Excl. Branch           Excl. Branch
   2884      1.1  christos                               Instructions           Instructions
   2885      1.1  christos                                                        ratio
   2886      1.1  christos  <Total>                      1307307316             x 0.997
   2887      1.1  christos  mxv_core                     1235235239             x 0.997
   2888      1.1  christos  erand48_r                      23023033             x 0.957
   2889      1.1  christos  drand48                        20020009             x 0.600
   2890      1.1  christos  __drand48_iterate              17017028             x 0.882
   2891      1.1  christos @end verbatim
   2892      1.1  christos @end smallexample
   2893      1.1  christos 
   2894      1.1  christos A ratio less than one in the second column, means that this counter
   2895      1.1  christos value was smaller than the value from the reference experiment shown
   2896      1.1  christos in the first column.
   2897      1.1  christos 
   2898      1.1  christos This kind of presentation of the results makes it much easier to
   2899      1.1  christos quickly interpret the data.
   2900      1.1  christos 
   2901      1.1  christos We conclude this section with thread-level event counter overviews,
   2902      1.1  christos but before we go into this, there is an important metric we need to
   2903      1.1  christos mention.
   2904      1.1  christos 
   2905      1.1  christos @c -- TBD Explain <Total> for IPC
   2906      1.1  christos 
   2907      1.1  christos @IndexSubentry{Hardware event counters, IPC}
   2908      1.1  christos In case it is known how many instructions and CPU cycles have been executed,
   2909      1.1  christos the value for the IPC (``Instructions Per Clockycle'') can be computed.
   2910      1.1  christos @xref{Hardware Event Counters Explained}.
   2911      1.1  christos This is a derived metric that gives an indication how well the processor
   2912      1.1  christos is utilized.  The inverse of the IPC is called CPI.
   2913      1.1  christos @IndexSubentry{Hardware event counters, CPI}
   2914      1.1  christos 
   2915      1.1  christos The @DisplayText{} command automatically computes the IPC and CPI values
   2916      1.1  christos if an experiment contains the event counter values for the instructions
   2917      1.1  christos and CPU cycles executed.  These are part of the metric list and can be
   2918      1.1  christos displayed, just like any other metric.
   2919      1.1  christos 
   2920      1.1  christos @IndexSubentry{Options,  @code{-metric_list}}
   2921      1.1  christos @IndexSubentry{Commands, @code{metric_list}}
   2922      1.1  christos This can be verified through the @command{metric_list} command.  If we go
   2923      1.1  christos back to our earlier experiment with the default event counters, we get
   2924      1.1  christos the following result.
   2925      1.1  christos 
   2926      1.1  christos @cartouche
   2927      1.1  christos @smallexample
   2928      1.1  christos $ gprofng display text -metric_list mxv.hwc.def.2.thr.er
   2929      1.1  christos @end smallexample
   2930      1.1  christos @end cartouche
   2931      1.1  christos 
   2932      1.1  christos @smallexample
   2933      1.1  christos @verbatim
   2934      1.1  christos Current metrics: e.totalcpu:i.totalcpu:e.cycles:e+insts:e+llm:name
   2935      1.1  christos Current Sort Metric: Exclusive Total CPU Time ( e.totalcpu )
   2936      1.1  christos Available metrics:
   2937      1.1  christos          Exclusive Total CPU Time: e.%totalcpu
   2938      1.1  christos          Inclusive Total CPU Time: i.%totalcpu
   2939      1.1  christos              Exclusive CPU Cycles: e.+%cycles
   2940      1.1  christos              Inclusive CPU Cycles: i.+%cycles
   2941      1.1  christos   Exclusive Instructions Executed: e+%insts
   2942      1.1  christos   Inclusive Instructions Executed: i+%insts
   2943      1.1  christos Exclusive Last-Level Cache Misses: e+%llm
   2944      1.1  christos Inclusive Last-Level Cache Misses: i+%llm
   2945      1.1  christos  Exclusive Instructions Per Cycle: e+IPC
   2946      1.1  christos  Inclusive Instructions Per Cycle: i+IPC
   2947      1.1  christos  Exclusive Cycles Per Instruction: e+CPI
   2948      1.1  christos  Inclusive Cycles Per Instruction: i+CPI
   2949      1.1  christos                              Size: size
   2950      1.1  christos                        PC Address: address
   2951      1.1  christos                              Name: name
   2952      1.1  christos @end verbatim
   2953      1.1  christos @end smallexample
   2954      1.1  christos 
   2955      1.1  christos Among the other metrics, we see the new metrics for the IPC and CPI
   2956      1.1  christos listed.
   2957      1.1  christos 
   2958      1.1  christos In the script below, we use this information and add the IPC and CPI
   2959      1.1  christos to the metrics to be displayed.  We also use a the thread filter to
   2960      1.1  christos display these values for the individual threads.
   2961      1.1  christos 
   2962      1.1  christos This is the complete script we have used.  Other than a different selection
   2963      1.1  christos of the metrics, there are no new features.
   2964      1.1  christos 
   2965      1.1  christos @cartouche
   2966      1.1  christos @smallexample
   2967      1.1  christos # Define the metrics
   2968      1.1  christos metrics e.insts:e.%cycles:e.IPC:e.CPI
   2969      1.1  christos # Sort with respect to cycles
   2970      1.1  christos sort e.cycles
   2971      1.1  christos # Limit the output to 5 lines
   2972      1.1  christos limit 5
   2973      1.1  christos # Get the function overview for all threads
   2974      1.1  christos functions
   2975      1.1  christos # Get the function overview for thread 1
   2976      1.1  christos thread_select 1
   2977      1.1  christos functions
   2978      1.1  christos # Get the function overview for thread 2
   2979      1.1  christos thread_select 2
   2980      1.1  christos functions
   2981      1.1  christos # Get the function overview for thread 3
   2982      1.1  christos thread_select 3
   2983      1.1  christos functions
   2984      1.1  christos @end smallexample
   2985      1.1  christos @end cartouche
   2986      1.1  christos 
   2987      1.1  christos In the metrics definition on the second line, we explicitly request the
   2988      1.1  christos counter values for the instructions (@code{e.insts}) and CPU cycles
   2989      1.1  christos (@code{e.cycles}) executed.  These names can be found in output from the
   2990      1.1  christos @IndexSubentry{Options, @code{-metric_list}}
   2991      1.1  christos @IndexSubentry{Commands, @code{metric_list}}
   2992      1.1  christos @command{metric_list} command above.
   2993      1.1  christos In addition to these metrics, we also request the IPC and CPI to be shown.
   2994      1.1  christos 
   2995      1.1  christos @IndexSubentry{Options, @code{-limit}}
   2996      1.1  christos @IndexSubentry{Commands, @code{limit}}
   2997      1.1  christos As before, we used the @command{limit} command to control the number of
   2998      1.1  christos functions displayed.  We then request an overview for all the threads,
   2999      1.1  christos followed by three sets of two commands to select a thread and display the
   3000      1.1  christos function overview.
   3001      1.1  christos 
   3002      1.1  christos The script above is used as follows:
   3003      1.1  christos 
   3004      1.1  christos @cartouche
   3005      1.1  christos @smallexample
   3006      1.1  christos $ gprofng display text -script my-script-ipc mxv.hwc.def.2.thr.er
   3007      1.1  christos @end smallexample
   3008      1.1  christos @end cartouche
   3009      1.1  christos 
   3010      1.1  christos @noindent
   3011      1.1  christos This script produces four tables.  We list them separately below,
   3012      1.1  christos and have left out the additional output.
   3013      1.1  christos 
   3014      1.1  christos @noindent
   3015      1.1  christos The first table shows the accumulated values across the three
   3016      1.1  christos threads that have been active.
   3017      1.1  christos 
   3018      1.1  christos @smallexample
   3019      1.1  christos @verbatim
   3020      1.1  christos Functions sorted by metric: Exclusive CPU Cycles
   3021      1.1  christos 
   3022      1.1  christos Excl.         Excl. CPU     Excl.  Excl.   Name
   3023      1.1  christos Instructions  Cycles        IPC    CPI
   3024      1.1  christos Executed       sec.      %
   3025      1.1  christos 7906475309    2.691 100.00  1.473  0.679   <Total>
   3026      1.1  christos 7432724378    2.598  96.54  1.434  0.697   mxv_core
   3027      1.1  christos  188860269    0.035   1.31  2.682  0.373   erand48_r
   3028      1.1  christos   73623396    0.026   0.95  1.438  0.696   init_data
   3029      1.1  christos   76824434    0.018   0.66  2.182  0.458   drand48
   3030      1.1  christos @end verbatim
   3031      1.1  christos @end smallexample
   3032      1.1  christos 
   3033      1.1  christos @noindent
   3034      1.1  christos This shows that IPC of this program is completely dominated
   3035      1.1  christos by function @code{mxv_core}.  It has a fairly low IPC value
   3036      1.1  christos of 1.43.
   3037      1.1  christos 
   3038      1.1  christos @noindent
   3039      1.1  christos The next table is for thread 1 and shows the values for the
   3040      1.1  christos main thread.
   3041      1.1  christos 
   3042      1.1  christos @smallexample
   3043      1.1  christos @verbatim
   3044      1.1  christos Exp Sel Total
   3045      1.1  christos === === =====
   3046      1.1  christos   1 1       3
   3047      1.1  christos Functions sorted by metric: Exclusive CPU Cycles
   3048      1.1  christos 
   3049      1.1  christos Excl.         Excl. CPU     Excl.  Excl.   Name
   3050      1.1  christos Instructions  Cycles        IPC    CPI
   3051      1.1  christos Executed       sec.      %
   3052      1.1  christos 473750931     0.093 100.00  2.552  0.392   <Total>
   3053      1.1  christos 188860269     0.035  37.93  2.682  0.373   erand48_r
   3054      1.1  christos  73623396     0.026  27.59  1.438  0.696   init_data
   3055      1.1  christos  76824434     0.018  18.97  2.182  0.458   drand48
   3056      1.1  christos 134442832     0.013  13.79  5.250  0.190   __drand48_iterate
   3057      1.1  christos @end verbatim
   3058      1.1  christos @end smallexample
   3059      1.1  christos 
   3060      1.1  christos @noindent
   3061      1.1  christos Although this thread hardly uses any CPU cycles, the overall IPC
   3062      1.1  christos of 2.55 is not all that bad.
   3063      1.1  christos 
   3064      1.1  christos @noindent
   3065      1.1  christos Last, we show the tables for threads 2 and 3:
   3066      1.1  christos 
   3067      1.1  christos @smallexample
   3068      1.1  christos @verbatim
   3069      1.1  christos Exp Sel Total
   3070      1.1  christos === === =====
   3071      1.1  christos   1 2       3
   3072      1.1  christos Functions sorted by metric: Exclusive CPU Cycles
   3073      1.1  christos 
   3074      1.1  christos Excl.         Excl. CPU     Excl.  Excl.   Name
   3075      1.1  christos Instructions  Cycles        IPC    CPI
   3076      1.1  christos Executed       sec.      %
   3077      1.1  christos 3716362189    1.298 100.00  1.435  0.697   <Total>
   3078      1.1  christos 3716362189    1.298 100.00  1.435  0.697   mxv_core
   3079      1.1  christos          0    0.      0.    0.     0.      collector_root
   3080      1.1  christos          0    0.      0.    0.     0.      driver_mxv
   3081      1.1  christos 
   3082      1.1  christos Exp Sel Total
   3083      1.1  christos === === =====
   3084      1.1  christos   1 3       3
   3085      1.1  christos Functions sorted by metric: Exclusive CPU Cycles
   3086      1.1  christos 
   3087      1.1  christos Excl.         Excl. CPU     Excl.  Excl.   Name
   3088      1.1  christos Instructions  Cycles        IPC    CPI
   3089      1.1  christos Executed       sec.      %
   3090      1.1  christos 3716362189    1.300 100.00  1.433  0.698   <Total>
   3091      1.1  christos 3716362189    1.300 100.00  1.433  0.698   mxv_core
   3092      1.1  christos          0    0.      0.    0.     0.      collector_root
   3093      1.1  christos          0    0.      0.    0.     0.      driver_mxv
   3094      1.1  christos @end verbatim
   3095      1.1  christos @end smallexample
   3096      1.1  christos 
   3097      1.1  christos It is seen that both execute the same number of instructions and
   3098      1.1  christos take about the same number of CPU cycles.  As a result, the IPC is
   3099      1.1  christos the same for both threads.
   3100      1.1  christos 
   3101      1.1  christos @c -- A new node --------------------------------------------------------------
   3102      1.1  christos @c TBD @node    Additional Features
   3103      1.1  christos @c TBD @section Additional Features
   3104      1.1  christos @c ----------------------------------------------------------------------------
   3105      1.1  christos 
   3106      1.1  christos @c -- A new node --------------------------------------------------------------
   3107      1.1  christos @c TBD @node    More Filtering Capabilities
   3108      1.1  christos @c TBD @subsection More Filtering Capabilities
   3109      1.1  christos @c ----------------------------------------------------------------------------
   3110      1.1  christos 
   3111      1.1  christos @c TBD Cover @code{samples} and @code{seconds}
   3112      1.1  christos 
   3113      1.1  christos @c -- A new node --------------------------------------------------------------
   3114      1.1  christos @node    Java Profiling
   3115      1.1  christos @section Java Profiling
   3116      1.1  christos @c ----------------------------------------------------------------------------
   3117      1.1  christos 
   3118      1.1  christos @IndexSubentry{Options, @code{-j}}
   3119      1.1  christos @IndexSubentry{Java profiling, @code{-j on/off}}
   3120      1.1  christos The @CollectApp{} command supports Java profiling.  The @code{-j on} option
   3121      1.1  christos can be used for this, but since this feature is enabled by default, there is
   3122      1.1  christos no need to set this explicitly.  Java profiling may be disabled through the
   3123      1.1  christos @code{-j off} option.
   3124      1.1  christos 
   3125      1.1  christos The program is compiled as usual and the experiment directory is created
   3126      1.1  christos similar to what we have seen before.  The only difference with a C/C++
   3127      1.1  christos application is that the program has to be explicitly executed by java.
   3128      1.1  christos 
   3129      1.1  christos For example, this is how to generate the experiment data for a Java
   3130      1.1  christos program that has the source code stored in file @code{Pi.java}:
   3131      1.1  christos 
   3132      1.1  christos @cartouche
   3133      1.1  christos @smallexample
   3134      1.1  christos $ javac Pi.java
   3135      1.1  christos $ gprofng collect app -j on -O pi.demo.er java Pi < pi.in
   3136      1.1  christos @end smallexample
   3137      1.1  christos @end cartouche
   3138      1.1  christos 
   3139      1.1  christos Regarding which java is selected to generate the data, @ToolName{}
   3140      1.1  christos first looks for the JDK in the path set in either the
   3141      1.1  christos @IndexSubentry{Java profiling, @code{JDK_HOME}}
   3142      1.1  christos @code{JDK_HOME} environment variable, or in the
   3143      1.1  christos @IndexSubentry{Java profiling, @code{JAVA_PATH}}
   3144      1.1  christos @code{JAVA_PATH} environment variable.  If neither of these variables is
   3145      1.1  christos set, it checks for a JDK in the search path (set in the PATH
   3146      1.1  christos environment variable).  If there is no JDK in this path, it checks for
   3147      1.1  christos the java executable in @code{/usr/java/bin/java}.
   3148      1.1  christos 
   3149      1.1  christos In case additional options need to be passed on to the JVM, the
   3150      1.1  christos @IndexSubentry{Options, @code{-J}}
   3151      1.1  christos @IndexSubentry{Java profiling, @code{-J <string>}}
   3152      1.1  christos @code{-J <string>} option can be used.  The string with the
   3153      1.1  christos option(s) has to be delimited by quotation marks in case
   3154      1.1  christos there is more than one argument.
   3155      1.1  christos 
   3156      1.1  christos The @DisplayText{} command may be used to view the performance data.  There is
   3157      1.1  christos no need for any special options and the same commands as previously discussed
   3158      1.1  christos are supported.
   3159      1.1  christos 
   3160      1.1  christos @IndexSubentry{Options,  @code{-viewmode}}
   3161      1.1  christos @IndexSubentry{Commands, @code{viewmode}}
   3162      1.1  christos @IndexSubentry{Java profiling, different view modes}
   3163      1.1  christos The @code{viewmode} command
   3164      1.1  christos @xref{The Viewmode}
   3165      1.1  christos is very useful to examine the call stacks.
   3166      1.1  christos 
   3167      1.1  christos For example, this is how one can see the native call stacks.  For
   3168      1.1  christos lay-out purposes we have restricted the list to the first five entries:
   3169      1.1  christos 
   3170      1.1  christos @cartouche
   3171      1.1  christos @smallexample
   3172      1.1  christos $ gprofng display text -limit 5 -viewmode machine -calltree pi.demo.er
   3173      1.1  christos @end smallexample
   3174      1.1  christos @end cartouche
   3175      1.1  christos 
   3176      1.1  christos @smallexample
   3177      1.1  christos @verbatim
   3178      1.1  christos Print limit set to 5
   3179      1.1  christos Viewmode set to machine
   3180      1.1  christos Functions Call Tree. Metric: Attributed Total CPU Time
   3181      1.1  christos 
   3182      1.1  christos Attr.      Name
   3183      1.1  christos Total
   3184      1.1  christos CPU sec.
   3185      1.1  christos 1.381      +-<Total>
   3186      1.1  christos 1.171        +-Pi.calculatePi(double)
   3187      1.1  christos 0.110        +-collector_root
   3188      1.1  christos 0.110        |  +-JavaMain
   3189      1.1  christos 0.070        |    +-jni_CallStaticVoidMethod
   3190      1.1  christos @end verbatim
   3191      1.1  christos @end smallexample
   3192      1.1  christos 
   3193      1.1  christos @noindent
   3194      1.1  christos Note that the selection of the viewmode is echoed in the output.
   3195      1.1  christos 
   3196      1.1  christos @c -- A new node --------------------------------------------------------------
   3197      1.1  christos @node    The gprofng Tools
   3198      1.1  christos @chapter The gprofng Tools
   3199      1.1  christos @c ----------------------------------------------------------------------------
   3200      1.1  christos 
   3201      1.1  christos Several tools are included in @ProductName{}.  In subsequent chapters these
   3202      1.1  christos are discussed in detail.  Below a brief description is given, followed by an
   3203      1.1  christos overview of the environment variables that are supported.
   3204      1.1  christos 
   3205      1.1  christos @c -- A new node --------------------------------------------------------------
   3206      1.1  christos @node    Tools Overview
   3207      1.1  christos @section Tools Overview
   3208      1.1  christos @c ----------------------------------------------------------------------------
   3209      1.1  christos 
   3210      1.1  christos The following tools are supported by @ProductName{}:
   3211      1.1  christos 
   3212      1.1  christos @table @code
   3213      1.1  christos 
   3214      1.1  christos @item @CollectApp{}
   3215      1.1  christos @IndexSubentry{@code{gprofng}, @code{collect app}}
   3216      1.1  christos 
   3217      1.1  christos Collects the performance data and stores the results in an experiment
   3218      1.1  christos directory.  There are many options on this tool, but quite often the
   3219      1.1  christos defaults are sufficient.
   3220      1.1  christos An experiment directory is required for the subsequent analysis of
   3221      1.1  christos the results.
   3222      1.1  christos 
   3223      1.1  christos @item @DisplayText{}
   3224      1.1  christos @IndexSubentry{@code{gprofng}, @code{display text}}
   3225      1.1  christos 
   3226      1.1  christos Generates performance reports in ASCII format.  Commandline
   3227      1.1  christos options, and/or commands in a script file are used to control the contents
   3228      1.1  christos and lay-out of the generated report(s).
   3229      1.1  christos 
   3230      1.1  christos @item @DisplayHTML{}
   3231      1.1  christos @IndexSubentry{@code{gprofng}, @code{display html}}
   3232      1.1  christos 
   3233      1.1  christos Takes one or more experiment directories and generates a directory with
   3234      1.1  christos HTML files.  Starting from the index.html file, the performance data
   3235      1.1  christos may be examined in a browser.
   3236      1.1  christos 
   3237      1.1  christos @item @DisplaySRC{}
   3238      1.1  christos @IndexSubentry{@code{gprofng}, @code{display src}}
   3239      1.1  christos 
   3240      1.1  christos Displays the source code, interleaved with the disassembled instructions.
   3241      1.1  christos 
   3242      1.1  christos @item @Archive{}
   3243      1.1  christos @IndexSubentry{@code{gprofng}, @code{archive}}
   3244      1.1  christos 
   3245      1.1  christos Archives an experiment directory by (optionally) including source code and
   3246      1.1  christos object files, as well as the shared libraries that have been used.
   3247      1.1  christos 
   3248      1.1  christos @item @GUI{}
   3249      1.1  christos @IndexSubentry{@code{gprofng}, @code{display gui}}
   3250      1.1  christos 
   3251      1.1  christos This is an optional component that can be installed in addition to the
   3252      1.1  christos command line gprofng tools listed above.  It supports the graphical
   3253      1.1  christos analysis of one or more experiments that have been created using
   3254      1.1  christos @CollectApp{}.
   3255      1.1  christos 
   3256      1.1  christos The GUI part of @ProductName{} is a GNU project.  This is the link to the
   3257      1.1  christos @url{https://savannah.gnu.org/projects/gprofng-gui, gprofng GUI page}.
   3258      1.1  christos This page contains more information (e.g. how to clone the repo).
   3259  1.1.1.2  christos There is also a
   3260      1.1  christos @url{https://ftp.gnu.org/gnu/gprofng-gui, tar file distribution directory}
   3261      1.1  christos with tar files that include everything that is needed to build and install
   3262      1.1  christos the GUI code.  Various versions are available here.
   3263      1.1  christos Be aware that in order to build and use the gprofng GUI, Java needs to be
   3264      1.1  christos installed first.  The minimum Java version required is Java 8.
   3265      1.1  christos 
   3266      1.1  christos @end table
   3267      1.1  christos 
   3268      1.1  christos @c -- A new section -----------------------------------------------------------
   3269      1.1  christos @node    The gprofng.rc file with default settings
   3270      1.1  christos @section The gprofng.rc file with default settings
   3271      1.1  christos @c ----------------------------------------------------------------------------
   3272      1.1  christos The @file{gprofng.rc}
   3273      1.1  christos @cindex gprofng.rc
   3274      1.1  christos file is used to define default settings for the @DisplayText{}, @Archive{},
   3275      1.1  christos and @DisplaySRC{} tools, but the user can override these defaults through
   3276      1.1  christos local configuration settings when building and installing from the source
   3277  1.1.1.2  christos code.
   3278      1.1  christos 
   3279      1.1  christos There are three files that are checked when the tool starts up.  The first
   3280      1.1  christos file has pre-defined settings and comes with the installation, but through
   3281      1.1  christos a hidden file called @file{.gprofng.rc}, the user can (re)define the defaults.
   3282      1.1  christos 
   3283      1.1  christos These are the locations and files that are checked upon starting the above
   3284      1.1  christos mentioned tools:
   3285      1.1  christos 
   3286      1.1  christos @enumerate
   3287      1.1  christos 
   3288      1.1  christos @item
   3289      1.1  christos The system-wide filename is called @file{gprofng.rc} and is located in
   3290  1.1.1.2  christos the @file{/etc} subdirectory in case an RPM was used for the installation.
   3291      1.1  christos 
   3292      1.1  christos If @ProductName{} has been built from the source, this file is in
   3293      1.1  christos subdirectory @file{etc} in the top level installation directory.
   3294      1.1  christos 
   3295      1.1  christos @item
   3296      1.1  christos The user's home directory may have a hidden file called @file{.gprofng.rc}.
   3297      1.1  christos 
   3298      1.1  christos @item
   3299      1.1  christos The directory where @DisplayText{} (or @DisplaySRC{}) is invoked from, may
   3300      1.1  christos have a hidden file called @file{.gprofng.rc}.
   3301      1.1  christos 
   3302      1.1  christos @end enumerate
   3303      1.1  christos 
   3304      1.1  christos The settings of each file override the settings of the file(s) read before it.
   3305      1.1  christos Defaults in the system-wide file are overruled by the file in the user home
   3306      1.1  christos directory (if any) and any settings in the @file{.gprofng.rc} file in the
   3307      1.1  christos current directory override those.
   3308      1.1  christos 
   3309      1.1  christos Note that the settings in these files only affect the defaults.  Unlike
   3310      1.1  christos the commands used in a script file, they are not commands for the tools.
   3311      1.1  christos 
   3312      1.1  christos @c -- TBD indxobj_define, 
   3313      1.1  christos @c -- TBD object_show, 
   3314      1.1  christos @c -- TBD object_hide, 
   3315      1.1  christos @c -- TBD object_api, 
   3316      1.1  christos 
   3317      1.1  christos The @file{.gprofng.rc} configuration files can contain the
   3318      1.1  christos @command{addpath},
   3319      1.1  christos @command{compare}, 
   3320      1.1  christos @command{dthresh}, 
   3321      1.1  christos @command{name}, 
   3322      1.1  christos @command{pathmap},
   3323      1.1  christos @command{printmode}, 
   3324      1.1  christos @command{sthresh},
   3325      1.1  christos and
   3326      1.1  christos @command{viewmode} 
   3327      1.1  christos commands as described in this user guide.
   3328      1.1  christos 
   3329      1.1  christos They can also contain the following commands, @emph{which cannot be used on
   3330      1.1  christos either the command line, or in a script file}:
   3331      1.1  christos 
   3332      1.1  christos @table @code
   3333      1.1  christos 
   3334      1.1  christos @item dmetrics @var{metric-spec}
   3335      1.1  christos @IndexSubentry{Commands, @code{dmetrics}}
   3336      1.1  christos 
   3337      1.1  christos Specify the default metrics to be displayed or printed in the function list.
   3338      1.1  christos The syntax and use of the metric list is described in section
   3339      1.1  christos @ref{Metric Definitions}.
   3340      1.1  christos The order of the metric keywords in the list determines the order in which
   3341      1.1  christos the metrics are presented.
   3342      1.1  christos 
   3343      1.1  christos Default metrics for the @code{callers-callees} list are derived from the
   3344      1.1  christos function list default metrics by adding the corresponding attributed metric
   3345      1.1  christos before the first occurrence of each metric name in the list.
   3346      1.1  christos 
   3347      1.1  christos @item dsort @var{metric-spec}
   3348      1.1  christos @IndexSubentry{Commands, @code{dsort}}
   3349      1.1  christos 
   3350      1.1  christos Specify the default metric by which the function list is sorted.  The sort
   3351      1.1  christos metric is the first metric in this list that matches a metric in any loaded
   3352      1.1  christos experiment, subject to the following conditions:
   3353      1.1  christos 
   3354      1.1  christos @itemize @bullet
   3355      1.1  christos 
   3356      1.1  christos @item
   3357      1.1  christos If the entry in @var{metric-spec} has a visibility string of an exclamation
   3358      1.1  christos point (@samp{!}), the first metric whose name matches is used, regardless of
   3359      1.1  christos whether it is visible.
   3360      1.1  christos 
   3361      1.1  christos @item
   3362      1.1  christos If the entry in @var{metric-spec} has any other visibility string, the first
   3363      1.1  christos visible metric whose name matches is used.
   3364      1.1  christos 
   3365      1.1  christos @end itemize
   3366      1.1  christos 
   3367      1.1  christos The syntax and use of the metric list is described in section
   3368      1.1  christos @ref{Metric Definitions}.
   3369      1.1  christos The default sort metric for the @code{callers-callees} list is the attributed
   3370      1.1  christos metric corresponding to the default sort metric for the function list.
   3371      1.1  christos 
   3372      1.1  christos @item en_desc @{on | off | =@var{regex}@}
   3373      1.1  christos @IndexSubentry{Commands, @code{en_desc}}
   3374      1.1  christos 
   3375      1.1  christos Set the mode for reading descendant experiments to @samp{on} (enable all
   3376  1.1.1.2  christos descendants) or @samp{off} to disable all descendants.  If
   3377      1.1  christos @samp{=}@var{regex} is used, enable data from those experiments whose
   3378      1.1  christos executable name matches the regular expression.
   3379      1.1  christos 
   3380      1.1  christos The default setting is @samp{on} to follow all descendants.  In reading
   3381      1.1  christos experiments with descendants, any sub-experiments that contain little or
   3382      1.1  christos no performance data are ignored by @DisplayText{}.
   3383      1.1  christos 
   3384      1.1  christos @end table
   3385      1.1  christos 
   3386      1.1  christos @c -- A new section -----------------------------------------------------------
   3387      1.1  christos @node    Filters
   3388      1.1  christos @section Filters
   3389      1.1  christos @c ----------------------------------------------------------------------------
   3390      1.1  christos 
   3391      1.1  christos Various filter commands are supported by @DisplayText{}.
   3392  1.1.1.2  christos Thanks to the use of filters, the user can zoom in on a certain area of
   3393      1.1  christos interest.  With filters, it is possible to select one or more threads to
   3394      1.1  christos focus on, define a window in time, select specific call stacks, etc.
   3395      1.1  christos @IndexSubentry{Filters, Intro}
   3396      1.1  christos 
   3397      1.1  christos While already powerful by themselves, filters may be combined to further
   3398      1.1  christos narrow down the view into the data.
   3399      1.1  christos 
   3400      1.1  christos @IndexSubentry{Filters, Persistence}
   3401      1.1  christos It is important to note that filters are @emph{persistent}.  A filter is
   3402      1.1  christos active until it is reset.  This means that successive filter commands
   3403      1.1  christos increasingly narrow down the view until one or more are reset.
   3404      1.1  christos 
   3405      1.1  christos @noindent
   3406      1.1  christos An example is the following:
   3407      1.1  christos 
   3408      1.1  christos @cartouche
   3409      1.1  christos @smallexample
   3410      1.1  christos $ gprofng display text -thread_select 1 -functions \
   3411      1.1  christos                        -cpu_select 2 -functions @dots{}
   3412      1.1  christos @end smallexample
   3413      1.1  christos @end cartouche
   3414      1.1  christos 
   3415      1.1  christos This command selects thread 1 and requests the function view for this thread.
   3416      1.1  christos The third (@command{cpu_select 2}) command @emph{adds} the
   3417      1.1  christos constraint that only the events on CPU 2 are to be selected.  This means
   3418      1.1  christos that the next function view selects events that were executed by thread 1 and
   3419      1.1  christos have been running on CPU 2.
   3420      1.1  christos 
   3421      1.1  christos @noindent
   3422      1.1  christos In contrast with this single command line, the two commands below look similar,
   3423      1.1  christos but behave very differently:
   3424      1.1  christos 
   3425      1.1  christos @cartouche
   3426      1.1  christos @smallexample
   3427      1.1  christos $ gprofng display text -thread_select 1 -functions @dots{}
   3428      1.1  christos $ gprofng display text -cpu_select 2 -functions @dots{}
   3429      1.1  christos @end smallexample
   3430      1.1  christos @end cartouche
   3431      1.1  christos 
   3432      1.1  christos The first command displays the function view for thread 1.  The second
   3433      1.1  christos command shows the function view for CPU 2 for @emph{all} threads that have
   3434      1.1  christos been running on this CPU.
   3435      1.1  christos 
   3436      1.1  christos As the following example demonstrates, things get a little more tricky in
   3437      1.1  christos case a script file is used.  Consider the following script file:
   3438      1.1  christos 
   3439      1.1  christos @smallexample
   3440      1.1  christos @verbatim
   3441      1.1  christos thread_select 1
   3442      1.1  christos functions
   3443      1.1  christos cpu_select 2
   3444      1.1  christos functions
   3445      1.1  christos @end verbatim
   3446      1.1  christos @end smallexample
   3447      1.1  christos 
   3448      1.1  christos This script file displays the function view for thread 1 first.  This is
   3449      1.1  christos followed by those functions that were executed by thread 1 @emph{and} have
   3450      1.1  christos been run on CPU 2.
   3451      1.1  christos 
   3452      1.1  christos If however, the script should behave like the two command line invocations
   3453      1.1  christos shown above, the thread selection filter needs to be reset before CPU 2 is
   3454      1.1  christos selected:
   3455      1.1  christos 
   3456      1.1  christos @smallexample
   3457      1.1  christos @verbatim
   3458      1.1  christos thread_select 1
   3459      1.1  christos functions
   3460      1.1  christos # Reset the thread selection filter:
   3461      1.1  christos thread_select all
   3462      1.1  christos cpu_select 2
   3463      1.1  christos functions
   3464      1.1  christos @end verbatim
   3465      1.1  christos @end smallexample
   3466      1.1  christos 
   3467  1.1.1.2  christos In general, filters behave differently than commands or options.  In
   3468      1.1  christos particular there may be an interaction between different filter definitions.
   3469      1.1  christos 
   3470      1.1  christos For example, as explained above, in the first script file the
   3471      1.1  christos @command{thread_select} and @command{cpu_select} commands interact.
   3472      1.1  christos 
   3473      1.1  christos For a list of all the predefined filters see @ref{Predefined Filters}.
   3474      1.1  christos 
   3475      1.1  christos @c -- A new section -----------------------------------------------------------
   3476      1.1  christos @node    Supported Environment Variables
   3477      1.1  christos @section Supported Environment Variables
   3478      1.1  christos @c ----------------------------------------------------------------------------
   3479      1.1  christos 
   3480      1.1  christos Various environment variables are supported.  We refer to the man page for
   3481      1.1  christos gprofng(1) for an overview and description
   3482      1.1  christos (@xref{Man page for @command{gprofng}}).
   3483      1.1  christos 
   3484      1.1  christos @c -- A new chapter -----------------------------------------------------------
   3485      1.1  christos @node    Performance Data Collection
   3486      1.1  christos @chapter Performance Data Collection
   3487      1.1  christos @c ----------------------------------------------------------------------------
   3488      1.1  christos 
   3489      1.1  christos The @CollectApp{} command is used to gather the application performance data
   3490      1.1  christos while the application executes.
   3491      1.1  christos At regular intervals, program execution is halted and the required data is
   3492      1.1  christos recorded.
   3493      1.1  christos @cindex Experiment directory
   3494      1.1  christos An experiment directory is created when the tool starts.  This directory is
   3495      1.1  christos used to store the relevant information and forms the basis for a subsequent
   3496      1.1  christos analysis with one of the viewing tools.
   3497      1.1  christos 
   3498      1.1  christos @c -- A new section -----------------------------------------------------------
   3499      1.1  christos @node    The @command{gprofng collect app} Tool
   3500      1.1  christos @section The @command{gprofng collect app} Tool
   3501      1.1  christos @c ----------------------------------------------------------------------------
   3502      1.1  christos 
   3503      1.1  christos This is the command to collect the performance information for the target
   3504      1.1  christos application.  The usage is as follows:
   3505      1.1  christos 
   3506      1.1  christos @cartouche
   3507      1.1  christos @smallexample
   3508      1.1  christos $ gprofng collect app [OPTION(S)] TARGET [TARGET_ARGUMENTS]
   3509      1.1  christos @end smallexample
   3510      1.1  christos @end cartouche
   3511      1.1  christos 
   3512      1.1  christos @noindent
   3513      1.1  christos Options to the command are passed in first.  This is followed by the name of
   3514      1.1  christos the target, which is typically a binary executable or a script, followed by
   3515      1.1  christos any options that may be required by the target.
   3516      1.1  christos 
   3517      1.1  christos @c -- A new section -----------------------------------------------------------
   3518      1.1  christos @node    View the Performance Information
   3519      1.1  christos @chapter View the Performance Information
   3520      1.1  christos @c ----------------------------------------------------------------------------
   3521      1.1  christos Various tools to view the performance data stored in one or more experiment
   3522      1.1  christos directories are available.  In this chapter, these will all be covered in
   3523      1.1  christos detail.
   3524      1.1  christos 
   3525      1.1  christos @c -- A new section -----------------------------------------------------------
   3526      1.1  christos @node    The @command{gprofng display text} Tool
   3527      1.1  christos @section The @command{gprofng display text} Tool
   3528      1.1  christos @c ----------------------------------------------------------------------------
   3529      1.1  christos 
   3530      1.1  christos This tool displays the performance information in ASCII format.  It supports
   3531      1.1  christos a variety of views into the data recorded.
   3532      1.1  christos These views can be specified in two ways and both may be used simultaneously:
   3533      1.1  christos 
   3534      1.1  christos @itemize @bullet
   3535      1.1  christos 
   3536      1.1  christos @item
   3537      1.1  christos Command line options start with a dash (@samp{-}) symbol and may take an
   3538      1.1  christos argument.
   3539      1.1  christos 
   3540      1.1  christos @item
   3541      1.1  christos Options may also be included in a file, the ``script file''.  In this case,
   3542      1.1  christos the dash symbol should @emph{not} be included.  Multiple script files can
   3543      1.1  christos be used on the same command line.
   3544      1.1  christos 
   3545      1.1  christos @end itemize
   3546      1.1  christos 
   3547      1.1  christos While they may appear as an option, they are really commands and this is
   3548      1.1  christos why they will be referred to as @emph{commands}
   3549      1.1  christos @cindex Commands
   3550      1.1  christos in the documentation.
   3551      1.1  christos 
   3552      1.1  christos As a general rule, @emph{the order of options matters} and if the same option,
   3553      1.1  christos or command, occurs multiple times, the rightmost setting is selected.
   3554      1.1  christos 
   3555      1.1  christos @c -- A new sub section -------------------------------------------------------
   3556      1.1  christos @node       The @code{gprofng display text} Commands
   3557      1.1  christos @subsection The @code{gprofng display text} Commands
   3558      1.1  christos @c ----------------------------------------------------------------------------
   3559      1.1  christos 
   3560      1.1  christos The most commonly used commands are documented in the man page for this tool
   3561      1.1  christos (@xref{Man page for @command{gprofng display text}}).  In this section we list
   3562      1.1  christos and describe all other commands that are supported.
   3563      1.1  christos 
   3564      1.1  christos @c -- A new sub subsection ----------------------------------------------------
   3565      1.1  christos @node                Commands that List Experiment Details
   3566      1.1  christos @unnumberedsubsubsec Commands that List Experiment Details
   3567      1.1  christos @c ----------------------------------------------------------------------------
   3568      1.1  christos 
   3569      1.1  christos @table @code
   3570      1.1  christos 
   3571      1.1  christos @item experiment_ids
   3572      1.1  christos @ifclear man
   3573      1.1  christos @IndexSubentry{Options,  @code{-experiment_ids}}
   3574      1.1  christos @IndexSubentry{Commands, @code{experiment_ids}}
   3575      1.1  christos @end ifclear
   3576      1.1  christos 
   3577      1.1  christos For each experiment that has been loaded, show the totals of the metrics
   3578      1.1  christos recorded, plus some other operational characteristics like the name of
   3579      1.1  christos the executable, PID, etc.  The top line contains the accumulated totals
   3580      1.1  christos for the metrics.
   3581      1.1  christos 
   3582      1.1  christos @item experiment_list
   3583      1.1  christos @ifclear man
   3584      1.1  christos @IndexSubentry{Options,  @code{-experiment_list}}
   3585      1.1  christos @IndexSubentry{Commands, @code{experiment_list}}
   3586      1.1  christos @end ifclear
   3587      1.1  christos 
   3588      1.1  christos Display the list of experiments that are loaded.  Each experiment is listed
   3589      1.1  christos with an index, which is used when selecting samples, threads, or LWPs, and
   3590      1.1  christos a process id (PID), which can be used for advanced filtering.
   3591      1.1  christos 
   3592      1.1  christos @item cpu_list
   3593      1.1  christos @IndexSubentry{Options,  @code{-cpu_list}}
   3594      1.1  christos @IndexSubentry{Commands, @code{cpu_list}}
   3595      1.1  christos 
   3596      1.1  christos Display the total number of CPUs that have been used during the experiment(s).
   3597      1.1  christos 
   3598      1.1  christos @item cpus
   3599      1.1  christos @IndexSubentry{Options,  @code{-cpus}}
   3600      1.1  christos @IndexSubentry{Commands, @code{cpus}}
   3601      1.1  christos 
   3602      1.1  christos Show a list of CPUs that were used by the application, along with the metrics
   3603  1.1.1.2  christos that have been recorded.  The CPUs are represented by a CPU number and show
   3604  1.1.1.2  christos the Total CPU time by default.
   3605      1.1  christos 
   3606      1.1  christos Note that since the data is sorted with respect to the default metric, it may
   3607      1.1  christos be useful to use the @command{sort name} command to show the list sorted with
   3608      1.1  christos respect to the CPU id.
   3609      1.1  christos 
   3610      1.1  christos @item GCEvents
   3611      1.1  christos @IndexSubentry{Options,  @code{-GCEvents}}
   3612      1.1  christos @IndexSubentry{Commands, @code{GCEvents}}
   3613      1.1  christos 
   3614      1.1  christos This commands is for Java applications only.  It shows any Garbage Collection
   3615  1.1.1.2  christos (GC) events that have occurred while the application was executing.
   3616      1.1  christos 
   3617      1.1  christos @item lwp_list
   3618      1.1  christos @IndexSubentry{Options,  @code{-lwp_list}}
   3619      1.1  christos @IndexSubentry{Commands, @code{lwp_list}}
   3620      1.1  christos 
   3621      1.1  christos Displays the list of LWPs processed during the experiment(s).
   3622      1.1  christos 
   3623      1.1  christos @item processes
   3624      1.1  christos @IndexSubentry{Options,  @code{-processes}}
   3625      1.1  christos @IndexSubentry{Commands, @code{processes}}
   3626      1.1  christos 
   3627      1.1  christos For each experiment that has been loaded, this command displays a list of
   3628      1.1  christos processes that were created by the application, along with their metrics.
   3629      1.1  christos The processes are represented by process ID (PID) numbers and show the
   3630  1.1.1.2  christos Total CPU time metric by default.  If additional metrics are recorded in
   3631      1.1  christos an experiment, these are shown as well.
   3632      1.1  christos 
   3633      1.1  christos @item samples
   3634      1.1  christos @IndexSubentry{Options,  @code{-samples}}
   3635      1.1  christos @IndexSubentry{Commands, @code{samples}}
   3636      1.1  christos 
   3637  1.1.1.2  christos Display a list of sample points and their metrics, which reflect the
   3638      1.1  christos microstates recorded at each sample point in the loaded experiment.
   3639      1.1  christos The samples are represented by sample numbers and show the Total CPU time
   3640      1.1  christos by default.  Other metrics might also be displayed if enabled.
   3641      1.1  christos 
   3642      1.1  christos @item sample_list
   3643      1.1  christos @IndexSubentry{Options,  @code{-sample_list}}
   3644      1.1  christos @IndexSubentry{Commands, @code{sample_list}}
   3645      1.1  christos 
   3646      1.1  christos For each experiment loaded, display the list of samples currently selected.
   3647      1.1  christos 
   3648      1.1  christos @item seconds
   3649      1.1  christos @IndexSubentry{Options,  @code{-seconds}}
   3650      1.1  christos @IndexSubentry{Commands, @code{seconds}}
   3651      1.1  christos 
   3652      1.1  christos Show each second of the profiling run that was captured in the experiment,
   3653      1.1  christos along with the metrics collected in that second.  The seconds view differs
   3654      1.1  christos from the samples view in that it shows periodic samples that occur every
   3655      1.1  christos second beginning at 0 and the interval cannot be changed.
   3656      1.1  christos 
   3657      1.1  christos The seconds view lists the seconds of execution with the Total CPU time by
   3658      1.1  christos default.  Other metrics might also be displayed if the metrics are present
   3659      1.1  christos in the loaded experiments.
   3660      1.1  christos 
   3661      1.1  christos @item threads
   3662      1.1  christos @IndexSubentry{Options,  @code{-threads}}
   3663      1.1  christos @IndexSubentry{Commands, @code{threads}}
   3664      1.1  christos 
   3665      1.1  christos Show a list of threads and their metrics.  The threads are represented
   3666      1.1  christos by a process and thread pair and show the Total CPU time by default.
   3667      1.1  christos Other metrics might also be displayed by default if the metrics are
   3668      1.1  christos present in the loaded experiment.
   3669      1.1  christos 
   3670      1.1  christos @item thread_list
   3671      1.1  christos @IndexSubentry{Options,  @code{-thread_list}}
   3672      1.1  christos @IndexSubentry{Commands, @code{thread_list}}
   3673      1.1  christos 
   3674      1.1  christos Display the list of threads currently selected for the analysis.
   3675      1.1  christos 
   3676      1.1  christos @end table
   3677      1.1  christos 
   3678      1.1  christos @noindent
   3679      1.1  christos @emph{The commands below are for use in scripts and interactive mode only.
   3680      1.1  christos They are not allowed on the command line.}
   3681      1.1  christos 
   3682      1.1  christos @table @code
   3683      1.1  christos 
   3684      1.1  christos @item add_exp @var{exp-name}
   3685      1.1  christos @IndexSubentry{Commands, @code{add_exp}}
   3686      1.1  christos 
   3687      1.1  christos Add the named experiment to the current session.
   3688      1.1  christos 
   3689      1.1  christos @item drop_exp @var{exp-name}
   3690      1.1  christos @IndexSubentry{Commands, @code{drop_exp}}
   3691      1.1  christos 
   3692      1.1  christos Drop the named experiment from the current session.
   3693      1.1  christos 
   3694      1.1  christos @item open_exp @var{exp-name}
   3695      1.1  christos @IndexSubentry{Commands, @code{open_exp}}
   3696      1.1  christos 
   3697      1.1  christos Drop all loaded experiments from the session, and then load the named
   3698      1.1  christos experiment.
   3699      1.1  christos 
   3700      1.1  christos @end table
   3701      1.1  christos 
   3702      1.1  christos @c -- A new sub subsection ----------------------------------------------------
   3703      1.1  christos @node                Commands that Affect Listings and Output
   3704      1.1  christos @unnumberedsubsubsec Commands that Affect Listings and Output
   3705      1.1  christos @c ----------------------------------------------------------------------------
   3706      1.1  christos 
   3707      1.1  christos @table @code
   3708      1.1  christos 
   3709      1.1  christos @item dthresh @var{value}
   3710      1.1  christos @IndexSubentry{Options,  @code{-dthresh}}
   3711      1.1  christos @IndexSubentry{Commands, @code{dthresh}}
   3712      1.1  christos 
   3713      1.1  christos Specify the threshold percentage for highlighting metrics in the annotated
   3714      1.1  christos disassembly code.  If the value of any metric is equal to or greater than
   3715      1.1  christos @var{value} as a percentage of the maximum value of that metric for any
   3716      1.1  christos instruction line in the file, the line on which the metrics occur has a
   3717      1.1  christos @samp{##} marker inserted at the beginning of the line.  The default is 75.
   3718      1.1  christos 
   3719      1.1  christos @item printmode @{text | html | @var{single-char}@}
   3720      1.1  christos @IndexSubentry{Options,  @code{-printmode}}
   3721      1.1  christos @IndexSubentry{Commands, @code{printmode}}
   3722      1.1  christos 
   3723  1.1.1.2  christos Set the print mode.  If the keyword is @code{text}, printing will be done in
   3724      1.1  christos tabular form using plain text.  In case the @code{html} keyword is selected,
   3725      1.1  christos the output is formatted as an HTML table.
   3726      1.1  christos 
   3727      1.1  christos Alternatively, @var{single-char} may be used in a delimiter separated list,
   3728      1.1  christos with the single character @var{single-char} as the delimiter.
   3729      1.1  christos 
   3730      1.1  christos The printmode setting is used only for those commands that generate tables,
   3731      1.1  christos such as @command{functions}.  The setting is ignored for other printing
   3732      1.1  christos commands, including those showing source and disassembly listings.
   3733      1.1  christos 
   3734      1.1  christos @item sthresh @var{value}
   3735      1.1  christos @IndexSubentry{Options,  @code{-sthresh}}
   3736      1.1  christos @IndexSubentry{Commands, @code{sthresh}}
   3737      1.1  christos 
   3738      1.1  christos Specify the threshold percentage for highlighting metrics in the annotated
   3739      1.1  christos source code.  If the value of any metric is equal to or greater than
   3740      1.1  christos @var{value} (as a percentage) of the maximum value of that metric for any
   3741      1.1  christos source line in the file, the line on which the metrics occur has a @samp{##}
   3742      1.1  christos marker inserted at the beginning of the line.  The default is 75.
   3743      1.1  christos 
   3744      1.1  christos @end table
   3745      1.1  christos 
   3746      1.1  christos @c -- A new sub subsection ----------------------------------------------------
   3747      1.1  christos @node                Predefined Filters
   3748      1.1  christos @unnumberedsubsubsec Predefined Filters
   3749      1.1  christos @c ----------------------------------------------------------------------------
   3750      1.1  christos 
   3751      1.1  christos The filters below use a list, the selection list, to define a sequence of
   3752      1.1  christos numbers. @xref{The Selection List}.
   3753      1.1  christos Note that this selection is persistent, but the filter can be reset by using
   3754      1.1  christos @samp{all} as the @var{selection-list}.
   3755      1.1  christos 
   3756      1.1  christos @table @code
   3757      1.1  christos 
   3758      1.1  christos @item cpu_select @var{selection-list}
   3759      1.1  christos @IndexSubentry{Options,  @code{-cpu_select}}
   3760      1.1  christos @IndexSubentry{Commands, @code{cpu_select}}
   3761      1.1  christos 
   3762      1.1  christos Select the CPU ids specified in the @var{selection-list}.
   3763      1.1  christos 
   3764      1.1  christos @item lwp_select @var{selection-list}
   3765      1.1  christos @IndexSubentry{Options,  @code{-lwp_select}}
   3766      1.1  christos @IndexSubentry{Commands, @code{lwp_select}}
   3767      1.1  christos 
   3768      1.1  christos Select the LWPs specified in the @var{selection-list}.
   3769      1.1  christos 
   3770      1.1  christos @item sample_select @var{selection-list}
   3771      1.1  christos @IndexSubentry{Options,  @code{-sample-select}}
   3772      1.1  christos @IndexSubentry{Commands, @code{sample-select}}
   3773      1.1  christos 
   3774      1.1  christos @item thread_select @var{selection-list}
   3775      1.1  christos @IndexSubentry{Options,  @code{-thread_select}}
   3776      1.1  christos @IndexSubentry{Commands, @code{thread_select}}
   3777      1.1  christos 
   3778      1.1  christos Select a series of threads, or just one, to be used in subsequent views.
   3779      1.1  christos The @var{selection-list} consists of a sequence of comma separated numbers.
   3780      1.1  christos This may include a range of the form @samp{n-m}.
   3781      1.1  christos 
   3782      1.1  christos @end table
   3783      1.1  christos 
   3784      1.1  christos 
   3785      1.1  christos @c -- A new sub subsection ----------------------------------------------------
   3786      1.1  christos @node                Commands to Set and Change Search Paths
   3787      1.1  christos @unnumberedsubsubsec Commands to Set and Change Search Paths
   3788      1.1  christos @c ----------------------------------------------------------------------------
   3789      1.1  christos 
   3790      1.1  christos @table @code
   3791      1.1  christos 
   3792      1.1  christos @item addpath @var{path-list}
   3793      1.1  christos @IndexSubentry{Options,  @code{-addpath}}
   3794      1.1  christos @IndexSubentry{Commands, @code{addpath}}
   3795      1.1  christos 
   3796      1.1  christos Append @var{path-list} to the current setpath settings.  Note that multiple
   3797      1.1  christos @command{addpath} commands can be used in @file{.gprofng.rc} files, and will
   3798      1.1  christos be concatenated.
   3799      1.1  christos 
   3800      1.1  christos @item pathmap @var{old-prefix} @var{new-prefix}
   3801      1.1  christos @IndexSubentry{Options,  @code{-pathmap}}
   3802      1.1  christos @IndexSubentry{Commands, @code{pathmap}}
   3803      1.1  christos 
   3804      1.1  christos If a file cannot be found using the path list set by @command{addpath}, or
   3805  1.1.1.2  christos the @command{setpath} command, one or more path remappings may be set with the
   3806      1.1  christos @command{pathmap} command.
   3807      1.1  christos 
   3808      1.1  christos With path mapping, the user can specify how to replace the leading component
   3809      1.1  christos in a full path by a different string.
   3810      1.1  christos 
   3811      1.1  christos With this command, any path name for a source file, object file, or shared
   3812      1.1  christos object that begins with the prefix specified with @var{old-prefix}, the
   3813      1.1  christos old prefix is replaced by the prefix specified with @var{new-prefix}.
   3814      1.1  christos The resulting path is used to find the file.
   3815      1.1  christos 
   3816      1.1  christos For example, if a source file located in directory @file{/tmp}
   3817      1.1  christos is shown in the @DisplayText{} output, but should instead be taken from
   3818      1.1  christos @file{/home/demo}, the following @file{pathmap} command redefines the
   3819  1.1.1.2  christos path:
   3820      1.1  christos 
   3821      1.1  christos @smallexample
   3822      1.1  christos $ gprofng diplay text -pathmap /tmp /home/demo -source ...
   3823      1.1  christos @end smallexample
   3824      1.1  christos 
   3825      1.1  christos Note that multiple @command{pathmap} commands can be supplied, and each is
   3826      1.1  christos tried until the file is found.
   3827      1.1  christos 
   3828      1.1  christos @item setpath @var{path-list}
   3829      1.1  christos @IndexSubentry{Options,  @code{-setpath}}
   3830      1.1  christos @IndexSubentry{Commands, @code{setpath}}
   3831      1.1  christos 
   3832  1.1.1.2  christos Set the path used to find source and object files.  The path is defined
   3833  1.1.1.2  christos through the @var{path-list} keyword.  It is a colon separated list of
   3834      1.1  christos directories, jar files, or zip files.
   3835      1.1  christos If any directory has a colon character in it, escape it with a
   3836      1.1  christos backslash (@samp{\}).
   3837      1.1  christos 
   3838      1.1  christos The special directory name @code{$expts}, refers
   3839      1.1  christos to the set of current experiments in the order in which they were loaded.
   3840      1.1  christos You can abbreviate it with a single @samp{$} character.
   3841      1.1  christos 
   3842      1.1  christos The default path is @samp{$expts:..} which is the directories of the
   3843      1.1  christos loaded experiments and the current working directory.
   3844      1.1  christos 
   3845  1.1.1.2  christos Use @command{setpath} with no argument to display the current path.
   3846      1.1  christos 
   3847      1.1  christos Note that @command{setpath} commands @emph{are not allowed .gprofng.rc
   3848      1.1  christos configuration files}.
   3849      1.1  christos 
   3850      1.1  christos @end table
   3851      1.1  christos 
   3852      1.1  christos @c -- A new subsection --------------------------------------------------------
   3853      1.1  christos @c -- TBD @node       Usage examples for @code{gprofng display text}
   3854      1.1  christos @c -- TBD @subsection Usage examples for @code{gprofng display text}
   3855      1.1  christos @c ----------------------------------------------------------------------------
   3856      1.1  christos 
   3857      1.1  christos @c -- TBD In this section we present usage examples.
   3858      1.1  christos 
   3859      1.1  christos @c -- A new chapter -----------------------------------------------------------
   3860      1.1  christos @c  TBD @node    The @code{gprofng display html} Tool
   3861      1.1  christos @c  TBD @section The @code{gprofng display html} Tool
   3862      1.1  christos @c ----------------------------------------------------------------------------
   3863      1.1  christos @c  TBD The options are documented in the man page for this tool. In this section we
   3864      1.1  christos @c  TBD present usage examples.
   3865      1.1  christos 
   3866      1.1  christos @c -- A new chapter -----------------------------------------------------------
   3867      1.1  christos @c  TBD @node    Display Source Code
   3868      1.1  christos @c  TBD @chapter Display Source Code
   3869      1.1  christos @c ----------------------------------------------------------------------------
   3870      1.1  christos @c  TBD The options are documented in the man page for this tool. In this section we
   3871      1.1  christos @c  TBD present usage examples.
   3872      1.1  christos 
   3873      1.1  christos 
   3874      1.1  christos @c -- A new chapter -----------------------------------------------------------
   3875      1.1  christos @c  TBD @node    Archive Experiment Data
   3876      1.1  christos @c  TBD @chapter Archive Experiment Data
   3877      1.1  christos @c ----------------------------------------------------------------------------
   3878      1.1  christos @c  TBD The options are documented in the man page for this tool. In this section we
   3879      1.1  christos @c  TBD present usage examples.
   3880      1.1  christos 
   3881      1.1  christos @c -- A new chapter -----------------------------------------------------------
   3882      1.1  christos @node    Terminology
   3883      1.1  christos @chapter Terminology
   3884      1.1  christos @c ----------------------------------------------------------------------------
   3885      1.1  christos 
   3886      1.1  christos Throughout this manual, certain terminology specific to profiling tools,
   3887      1.1  christos or @ToolName{}, or even to this document only, is used.  In this chapter
   3888      1.1  christos this terminology is explained in detail.
   3889      1.1  christos 
   3890      1.1  christos @menu
   3891      1.1  christos * The Program Counter::                    What is a Program Counter?
   3892      1.1  christos * Inclusive and Exclusive Metrics::        An explanation of inclusive and exclusive metrics.
   3893      1.1  christos * Metric Definitions::                     Definitions associated with metrics.
   3894      1.1  christos * The Viewmode::                           Select the way call stacks are presented.
   3895      1.1  christos * The Selection List::                     How to define a selection.
   3896      1.1  christos * Load Objects and Functions::             The components in an application.
   3897      1.1  christos * The Concept of a CPU in gprofng::        The definition of a CPU.
   3898      1.1  christos * Hardware Event Counters Explained::      What are event counters?
   3899      1.1  christos * apath::                                  Our generic definition of a path.
   3900      1.1  christos @end menu
   3901      1.1  christos 
   3902      1.1  christos @c ----------------------------------------------------------------------------
   3903      1.1  christos @node    The Program Counter
   3904      1.1  christos @section The Program Counter
   3905      1.1  christos @c ----------------------------------------------------------------------------
   3906      1.1  christos 
   3907      1.1  christos @cindex PC
   3908      1.1  christos @cindex Program Counter
   3909  1.1.1.2  christos The @emph{Program Counter}, or PC for short, keeps track where program
   3910  1.1.1.2  christos execution is.  The address of the next instruction to be executed is stored
   3911  1.1.1.2  christos in a special purpose register in the processor, or core.
   3912      1.1  christos 
   3913      1.1  christos @cindex Instruction pointer
   3914      1.1  christos The PC is sometimes also referred to as the @emph{instruction pointer}, but
   3915      1.1  christos we will use Program Counter or PC throughout this document.
   3916      1.1  christos 
   3917      1.1  christos @c ----------------------------------------------------------------------------
   3918      1.1  christos @node    Inclusive and Exclusive Metrics
   3919      1.1  christos @section Inclusive and Exclusive Metrics
   3920      1.1  christos @c ----------------------------------------------------------------------------
   3921      1.1  christos 
   3922      1.1  christos In the remainder, these two concepts occur quite often and for lack of a better
   3923      1.1  christos place, they are explained here.
   3924      1.1  christos 
   3925      1.1  christos @cindex Inclusive metric
   3926      1.1  christos The @emph{inclusive} value for a metric includes all values that are part of
   3927      1.1  christos the dynamic extent of the target function.  For example if function @code{A}
   3928      1.1  christos calls functions @code{B} and @code{C}, the inclusive CPU time for @code{A}
   3929      1.1  christos includes the CPU time spent in @code{B} and @code{C}.
   3930      1.1  christos 
   3931      1.1  christos @cindex Exclusive metric
   3932      1.1  christos In contrast with this, the @emph{exclusive} value for a metric is computed
   3933  1.1.1.2  christos by excluding the metric values used by other functions called.  In our
   3934  1.1.1.2  christos imaginary example, the exclusive CPU time for function @code{A} is the
   3935  1.1.1.2  christos time spent outside calling functions @code{B} and @code{C}.
   3936      1.1  christos 
   3937      1.1  christos @cindex Leaf function
   3938      1.1  christos In case of a @emph{leaf function}, the inclusive and exclusive values for the
   3939      1.1  christos metric are the same since by definition, it is not calling any other
   3940      1.1  christos function(s).
   3941      1.1  christos 
   3942      1.1  christos Why do we use these two different values? The inclusive metric shows the most
   3943      1.1  christos expensive path, in terms of this metric, in the application.  For example, if
   3944      1.1  christos the metric is cache misses, the function with the highest inclusive metric
   3945      1.1  christos tells you where most of the cache misses come from.
   3946      1.1  christos 
   3947      1.1  christos Within this branch of the application, the exclusive metric points to the
   3948      1.1  christos functions that contribute and help to identify which part(s) to consider
   3949      1.1  christos for further analysis.
   3950      1.1  christos 
   3951      1.1  christos @c ----------------------------------------------------------------------------
   3952      1.1  christos @node    Metric Definitions
   3953      1.1  christos @section Metric Definitions
   3954      1.1  christos @c ----------------------------------------------------------------------------
   3955      1.1  christos The metrics displayed in the various views are highly customizable.  In this
   3956      1.1  christos section it is explained how to construct the metrics definition(s).
   3957      1.1  christos 
   3958      1.1  christos @IndexSubentry{Options,  @code{-metrics}}
   3959      1.1  christos @IndexSubentry{Commands, @code{metrics}}
   3960      1.1  christos The @command{metrics} command takes a colon (@samp{:}) separated list, where
   3961      1.1  christos each item in the list consists of the following three fields:
   3962      1.1  christos @var{<flavor>}@var{<visibility>}@var{<metric-name>}.
   3963      1.1  christos 
   3964      1.1  christos @cindex Flavor field
   3965      1.1  christos @cindex Visibility field
   3966      1.1  christos @cindex Metric name field
   3967      1.1  christos @IndexSubentry{Metrics, Flavor field}
   3968      1.1  christos @IndexSubentry{Metrics, Visibility field}
   3969      1.1  christos @IndexSubentry{Metrics, Metric name field}
   3970      1.1  christos The @var{<flavor>} field is either @samp{e} for ``exclusive'', and/or
   3971      1.1  christos @samp{i} for ``inclusive''.  The @var{<metric-name>} field is the name of
   3972      1.1  christos the metric and the @var{<visibility>} field consists of one ore more characters
   3973      1.1  christos from the following table:
   3974      1.1  christos 
   3975      1.1  christos @table @code
   3976      1.1  christos 
   3977      1.1  christos @item .
   3978      1.1  christos Show the metric as time.  This applies to timing metrics and hardware event
   3979      1.1  christos counters that measure cycles.  Interpret as @samp{+} for other metrics.
   3980      1.1  christos 
   3981      1.1  christos @item %
   3982      1.1  christos Show the metric as a percentage of the total value for this metric.
   3983      1.1  christos 
   3984      1.1  christos @item +
   3985      1.1  christos Show the metric as an absolute value.  For hardware event counters this is
   3986      1.1  christos the event count.  Interpret as @samp{.} for timing metrics.
   3987      1.1  christos 
   3988      1.1  christos @item !
   3989      1.1  christos Do not show any metric value.  Cannot be used with other visibility characters.
   3990      1.1  christos This visibility is meant to be used in a @command{dmetrics} command to set
   3991      1.1  christos default metrics that override the built-in visibility defaults
   3992      1.1  christos for each type of metric.
   3993      1.1  christos 
   3994      1.1  christos @end table
   3995      1.1  christos 
   3996      1.1  christos Both the @var{<flavor>} and @var{<visibility>} strings may have more than one
   3997      1.1  christos character.  If both strings have more than one character, the @var{<flavor>}
   3998      1.1  christos string is expanded first.  For example, @code{ie.%user} is first expanded to
   3999      1.1  christos @code{i.%user:e.%user}, which is then expanded into
   4000      1.1  christos @code{i.user:i%user:e.user:e%user}.
   4001      1.1  christos 
   4002      1.1  christos @c ----------------------------------------------------------------------------
   4003      1.1  christos @node    The Viewmode
   4004      1.1  christos @section The Viewmode
   4005      1.1  christos 
   4006      1.1  christos @cindex Viewmode
   4007      1.1  christos @IndexSubentry{Options,  @code{-viewmode}}
   4008      1.1  christos @IndexSubentry{Commands, @code{viewmode}}
   4009      1.1  christos 
   4010      1.1  christos There are different ways to view a call stack in Java.  In @ToolName{}, this
   4011      1.1  christos is called the @emph{viewmode} and the setting is controlled through a command
   4012      1.1  christos with the same name.
   4013      1.1  christos 
   4014      1.1  christos The @code{viewmode} command takes one of the following keywords:
   4015      1.1  christos 
   4016      1.1  christos @table @code
   4017      1.1  christos 
   4018      1.1  christos @item user
   4019      1.1  christos This is the default and shows the Java call stacks for Java threads.
   4020      1.1  christos No call stacks for any housekeeping threads are shown.  The function
   4021      1.1  christos list contains a function
   4022      1.1  christos @IndexSubentry{Java profiling, @code{<JVM-System>}}
   4023      1.1  christos @code{<JVM-System>} that represents the aggregated time from non-Java
   4024      1.1  christos threads.
   4025      1.1  christos When the JVM software does not report a Java call stack, time is reported
   4026      1.1  christos against the function
   4027      1.1  christos @IndexSubentry{Java profiling, @code{<no Java callstack recorded>}}
   4028      1.1  christos @code{<no Java callstack recorded>}.
   4029      1.1  christos 
   4030      1.1  christos @item expert
   4031      1.1  christos Show the Java call stacks for Java threads when the Java code from the
   4032      1.1  christos user is executed and machine call stacks when JVM code is executed, or
   4033      1.1  christos when the JVM software does not report a Java call stack.
   4034      1.1  christos Show the machine call stacks for housekeeping threads.
   4035      1.1  christos 
   4036      1.1  christos @item machine
   4037      1.1  christos Show the actual native call stacks for all threads.
   4038      1.1  christos 
   4039      1.1  christos @end table
   4040      1.1  christos 
   4041      1.1  christos @c ----------------------------------------------------------------------------
   4042      1.1  christos @node    The Selection List
   4043      1.1  christos @section The Selection List
   4044      1.1  christos @c ----------------------------------------------------------------------------
   4045      1.1  christos 
   4046      1.1  christos @cindex Selection list
   4047      1.1  christos @cindex List specification
   4048      1.1  christos Several commands allow the user to specify a sequence of numbers called the
   4049      1.1  christos @emph{selection list}.  Such a list may for example be used to select specific
   4050      1.1  christos threads from all the threads that have been used when conducting the
   4051      1.1  christos experiment(s).
   4052      1.1  christos 
   4053      1.1  christos A selection list (or ``list'' in the remainder of this section) can be a
   4054      1.1  christos single number, a contiguous range of numbers with the start and end numbers
   4055      1.1  christos separated by a hyphen (@samp{-}), a comma-separated list of numbers and
   4056      1.1  christos ranges, or the @code{all} keyword that resets the filter.
   4057      1.1  christos @IndexSubentry{Filters, Reset to default}
   4058      1.1  christos Lists must not contain spaces.
   4059      1.1  christos 
   4060      1.1  christos Each list can optionally be preceded by an experiment list with a similar
   4061      1.1  christos format, separated from the list by a colon (:).
   4062      1.1  christos If no experiment list is included, the list applies to all experiments.
   4063      1.1  christos 
   4064      1.1  christos Multiple lists can be concatenated by separating the individual lists
   4065      1.1  christos by a plus sign.
   4066      1.1  christos 
   4067      1.1  christos These are some examples of various filters using a list:
   4068      1.1  christos 
   4069      1.1  christos @table @code
   4070      1.1  christos 
   4071      1.1  christos @item thread_select 1
   4072      1.1  christos Select thread 1 from all experiments.
   4073      1.1  christos 
   4074      1.1  christos @item thread_select all:1
   4075      1.1  christos Select thread 1 from all experiments.
   4076      1.1  christos 
   4077      1.1  christos @item thread_select 1:all
   4078      1.1  christos Select all the threads from the first experiment loaded.
   4079      1.1  christos 
   4080      1.1  christos @item thread_select 1:2+3:4
   4081      1.1  christos Select thread 2 from experiment 1 and thread 4 from experiment 3.
   4082      1.1  christos 
   4083      1.1  christos @item cpu_select all:1,3,5
   4084      1.1  christos Selects cores 1, 3, and 5 from all experiments.
   4085      1.1  christos 
   4086      1.1  christos @item cpu_select 1,2:all
   4087      1.1  christos Select all cores from experiments 1 and 2.
   4088      1.1  christos 
   4089      1.1  christos @end table
   4090      1.1  christos 
   4091      1.1  christos Recall that there are several list commands that show the mapping between the
   4092  1.1.1.2  christos numbers and the targets.
   4093      1.1  christos 
   4094      1.1  christos @IndexSubentry{Options, @code{-experiment_list}}
   4095      1.1  christos @IndexSubentry{Commands, @code{experiment_list}}
   4096      1.1  christos For example, the @command{experiment_list} command shows the name(s) of the
   4097  1.1.1.2  christos experiment(s) loaded and the associated number.  In this example it is used
   4098      1.1  christos to get this information for a range of experiments:
   4099      1.1  christos 
   4100      1.1  christos @cartouche
   4101      1.1  christos @smallexample
   4102      1.1  christos $ gprofng display text -experiment_list mxv.?.thr.er
   4103      1.1  christos @end smallexample
   4104      1.1  christos @end cartouche
   4105      1.1  christos 
   4106      1.1  christos @noindent
   4107      1.1  christos This is the output, showing for each experiment the ID, the PID, and the name:
   4108      1.1  christos 
   4109      1.1  christos @smallexample
   4110      1.1  christos @verbatim
   4111      1.1  christos ID Sel     PID Experiment
   4112      1.1  christos == === ======= ============
   4113      1.1  christos  1 yes 2750071 mxv.1.thr.er
   4114      1.1  christos  2 yes 1339450 mxv.2.thr.er
   4115      1.1  christos  3 yes 3579561 mxv.4.thr.er
   4116      1.1  christos @end verbatim
   4117      1.1  christos @end smallexample
   4118      1.1  christos 
   4119      1.1  christos @c ----------------------------------------------------------------------------
   4120      1.1  christos @node    Load Objects and Functions
   4121      1.1  christos @section Load Objects and Functions
   4122      1.1  christos @c ----------------------------------------------------------------------------
   4123      1.1  christos 
   4124      1.1  christos An application consists of various components.  The source code files are
   4125      1.1  christos compiled into object files.  These are then glued together at link time to form
   4126      1.1  christos the executable.
   4127      1.1  christos During execution, the program may also dynamically load objects.
   4128      1.1  christos 
   4129      1.1  christos @cindex Load object
   4130  1.1.1.2  christos A @emph{load object} is defined to be an executable, or shared object.  A
   4131  1.1.1.2  christos shared library is an example of a load object in @ToolName{}.
   4132      1.1  christos 
   4133  1.1.1.2  christos Each load object, contains a text section with the instructions generated by
   4134  1.1.1.2  christos the compiler, a data section for data, and various symbol tables.
   4135      1.1  christos All load objects must contain an
   4136      1.1  christos @cindex ELF
   4137      1.1  christos ELF
   4138      1.1  christos symbol table, which gives the names and addresses of all the globally known
   4139      1.1  christos functions in that object.
   4140      1.1  christos 
   4141  1.1.1.2  christos Load objects compiled with the -g option contain additional symbolic
   4142  1.1.1.2  christos information that can augment the ELF symbol table and provide information
   4143  1.1.1.2  christos about functions that are not global, additional information about object
   4144  1.1.1.2  christos modules from which the functions came, and line number information relating
   4145  1.1.1.2  christos addresses to source lines.
   4146      1.1  christos 
   4147      1.1  christos The term
   4148      1.1  christos @cindex Function
   4149      1.1  christos @emph{function}
   4150      1.1  christos is used to describe a set of instructions that represent a high-level operation
   4151  1.1.1.2  christos described in the source code.  The term also covers methods as used in C++
   4152  1.1.1.2  christos and in the Java programming language.
   4153      1.1  christos 
   4154      1.1  christos In the @ToolName{} context, functions are provided in source code format.
   4155  1.1.1.2  christos Normally their names appear in the symbol table representing a set of
   4156  1.1.1.2  christos addresses.
   4157      1.1  christos @cindex Program Counter
   4158      1.1  christos @cindex PC
   4159  1.1.1.2  christos If the Program Counter (PC) is within that set, the program is executing within
   4160  1.1.1.2  christos that function.
   4161      1.1  christos 
   4162  1.1.1.2  christos In principle, any address within the text segment of a load object can be
   4163  1.1.1.2  christos mapped to a function.  Exactly the same mapping is used for the leaf PC and
   4164  1.1.1.2  christos all the other PCs on the call stack.
   4165  1.1.1.2  christos 
   4166  1.1.1.2  christos Most of the functions correspond directly to the source model of the program,
   4167  1.1.1.2  christos but there are exceptions.  This topic is however outside of the scope of this
   4168  1.1.1.2  christos guide.
   4169      1.1  christos 
   4170      1.1  christos @c ----------------------------------------------------------------------------
   4171      1.1  christos @node    The Concept of a CPU in gprofng
   4172      1.1  christos @section The Concept of a CPU in gprofng
   4173      1.1  christos @c ----------------------------------------------------------------------------
   4174      1.1  christos 
   4175      1.1  christos @cindex CPU
   4176      1.1  christos In @ProductName{}, there is the concept of a CPU.  Admittedly, this is not the
   4177      1.1  christos best word to describe what is meant here and may be replaced in the future.
   4178      1.1  christos 
   4179      1.1  christos The word CPU is used in many of the displays.
   4180      1.1  christos In the context of @ProductName{}, it is meant to denote a part of the
   4181      1.1  christos processor that is capable of executing instructions and with its own state,
   4182      1.1  christos like the program counter.
   4183      1.1  christos 
   4184      1.1  christos For example, on a contemporary processor, a CPU could be a core.  In case
   4185      1.1  christos hardware threads are supported within a core, a CPU is one of those
   4186      1.1  christos hardware threads.
   4187      1.1  christos 
   4188      1.1  christos To see which CPUs have been used in the experiment, use the @command{cpu}
   4189      1.1  christos command in @DisplayText{}.
   4190      1.1  christos 
   4191      1.1  christos @c ----------------------------------------------------------------------------
   4192      1.1  christos @node    Hardware Event Counters Explained
   4193      1.1  christos @section Hardware Event Counters Explained
   4194      1.1  christos @c ----------------------------------------------------------------------------
   4195      1.1  christos 
   4196      1.1  christos @IndexSubentry{Hardware event counters, description}
   4197      1.1  christos For quite a number of years now, many microprocessors have supported hardware
   4198      1.1  christos event counters.
   4199      1.1  christos 
   4200      1.1  christos On the hardware side, this means that in the processor there are one or more
   4201      1.1  christos registers dedicated to count certain activities, or ``events''.
   4202  1.1.1.2  christos Examples of such events are the number of instructions executed, or the
   4203  1.1.1.2  christos number of cache misses at level 2 in the memory hierarchy.
   4204      1.1  christos 
   4205      1.1  christos While there is a limited set of such registers, the user can map events onto
   4206      1.1  christos them.  In case more than one register is available, this allows for the
   4207      1.1  christos simultaenous measurement of various events.
   4208      1.1  christos 
   4209      1.1  christos A simple, yet powerful, example is to simultaneously count the number of CPU
   4210      1.1  christos cycles and the number of instructions excuted.  These two numbers can then be
   4211      1.1  christos used to compute the
   4212      1.1  christos @cindex IPC
   4213  1.1.1.2  christos @emph{IPC} value.  IPC stands for ``Instructions Per Clockcycle'' and each
   4214  1.1.1.2  christos processor has an architecturally defined maximum.  For example, if this
   4215  1.1.1.2  christos maximum number is 2, it means the processor is capable of executing two
   4216  1.1.1.2  christos instructions every clock cycle.
   4217      1.1  christos 
   4218      1.1  christos Whether this is actually achieved, depends on several factors, including the
   4219      1.1  christos instruction characteristics.
   4220      1.1  christos However, in case the IPC value is well below this maximum in a time critical
   4221      1.1  christos part of the application and this cannot be easily explained, further
   4222      1.1  christos investigation is probably warranted.
   4223      1.1  christos 
   4224      1.1  christos @cindex CPI
   4225      1.1  christos A related metric is called @emph{CPI}, or ``Clockcycles Per Instruction''.
   4226      1.1  christos It is the inverse of the CPI and can be compared against the theoretical
   4227      1.1  christos value(s) of the target instruction(s).  A significant difference may point
   4228      1.1  christos at a bottleneck.
   4229      1.1  christos 
   4230      1.1  christos One thing to keep in mind is that the value returned by a counter can either
   4231      1.1  christos be the number of times the event occured, or a CPU cycle count.  In case of
   4232      1.1  christos the latter it is possible to convert this number to time.
   4233      1.1  christos 
   4234      1.1  christos @IndexSubentry{Hardware event counters, variable CPU frequency}
   4235      1.1  christos This is often easier to interpret than a simple count, but there is one
   4236      1.1  christos caveat to keep in mind.  The CPU frequency may not have been constant while
   4237      1.1  christos the experimen was recorded and this impacts the time reported.
   4238      1.1  christos 
   4239      1.1  christos These event counters, or ``counters'' for short, provide great insight into
   4240      1.1  christos what happens deep inside the processor.  In case higher level information does
   4241      1.1  christos not provide the insight needed, the counters provide the information to get
   4242      1.1  christos to the bottom of a performance problem.
   4243      1.1  christos 
   4244      1.1  christos There are some things to consider though.
   4245      1.1  christos 
   4246      1.1  christos @itemize @bullet
   4247      1.1  christos 
   4248      1.1  christos @item
   4249      1.1  christos The event definitions and names vary across processors and it may even happen
   4250      1.1  christos that some events change with an update.
   4251      1.1  christos Unfortunately and this is luckily rare, there are sometimes bugs causing the
   4252      1.1  christos wrong count to be returned.
   4253      1.1  christos 
   4254      1.1  christos @IndexSubentry{Hardware event counters, alias name}
   4255      1.1  christos In @ToolName{}, some of the processor specific event names have an alias
   4256      1.1  christos name.  For example @code{insts} measures the instructions executed.
   4257      1.1  christos These aliases not only makes it easier to identify the functionality, but also
   4258      1.1  christos provide portability of certain events across processors.
   4259      1.1  christos 
   4260      1.1  christos @item
   4261      1.1  christos Another complexity is that there are typically many events one can monitor.
   4262      1.1  christos There may up to hundreds of events available and it could require several
   4263      1.1  christos experiments to zoom in on the root cause of a performance problem.
   4264      1.1  christos 
   4265      1.1  christos @item
   4266      1.1  christos There may be restrictions regarding the mapping of event(s) onto the
   4267      1.1  christos counters.  For example, certain events may be restricted to specific
   4268      1.1  christos counters only.  As a result, one may have to conduct additional experiments
   4269      1.1  christos to cover all the events of interest.
   4270      1.1  christos 
   4271      1.1  christos @item
   4272      1.1  christos The names of the events may also not be easy to interpret.  In such cases,
   4273      1.1  christos the description can be found in the architecture manual for the processor.
   4274      1.1  christos 
   4275      1.1  christos @end itemize
   4276      1.1  christos 
   4277      1.1  christos Despite these drawbacks, hardware event counters are extremely useful and
   4278      1.1  christos may even turn out to be indispensable.
   4279      1.1  christos 
   4280      1.1  christos @c ----------------------------------------------------------------------------
   4281      1.1  christos @node    apath
   4282      1.1  christos @section What is <apath>?
   4283      1.1  christos @c ----------------------------------------------------------------------------
   4284      1.1  christos 
   4285      1.1  christos In most cases, @ToolName{} shows the absolute pathnames of directories.  These
   4286      1.1  christos tend to be rather long, causing display issues in this document.
   4287      1.1  christos 
   4288      1.1  christos Instead of wrapping these long pathnames over multiple lines, we decided to
   4289      1.1  christos represent them by the @code{<apath>} symbol, which stands for ``an absolute
   4290      1.1  christos pathname''.
   4291      1.1  christos 
   4292      1.1  christos Note that different occurrences of @code{<apath>} may represent different
   4293      1.1  christos absolute pathnames.
   4294      1.1  christos 
   4295      1.1  christos @c -- A new node --------------------------------------------------------------
   4296      1.1  christos @node    Other Document Formats
   4297      1.1  christos @chapter Other Document Formats
   4298      1.1  christos @c ----------------------------------------------------------------------------
   4299      1.1  christos 
   4300      1.1  christos @emph{This chapter is applicable when building gprofng from the
   4301      1.1  christos binutils source.}
   4302      1.1  christos 
   4303      1.1  christos This document is written in Texinfo and the source text is made available as
   4304      1.1  christos part of the binutils distribution.  The file name is @code{gprofng.texi} and
   4305      1.1  christos can be found in subdirectory @code{gprofng/doc} of the top level binutils
   4306      1.1  christos directory.
   4307      1.1  christos 
   4308      1.1  christos The default installation procedure creates a file in the @code{info} format and
   4309      1.1  christos stores it in the documentation section of binutils.
   4310      1.1  christos This source file can however also be used to generate the document in the
   4311      1.1  christos @code{html} and @code{pdf} formats.  These may be easier to read and search.
   4312      1.1  christos 
   4313      1.1  christos To generate this documentation file in a different format, go to the directory
   4314      1.1  christos that was used to build the tools.  The make file to build the other formats is
   4315      1.1  christos in the @code{gprofng/doc} subdirectory.
   4316      1.1  christos 
   4317      1.1  christos For example, if you have set the build directory to be @var{<my-build-dir>},
   4318      1.1  christos go to subdirectory @var{<my-build-dir>/gprofng/doc}.
   4319      1.1  christos 
   4320  1.1.1.2  christos This subdirectory has a single file called @file{Makefile} that can be used to
   4321  1.1.1.2  christos build the documentation in various formats.  We recommend to use these
   4322  1.1.1.2  christos commands.
   4323      1.1  christos 
   4324      1.1  christos There are four commands to generate the documentation in the @code{html} or
   4325      1.1  christos @code{pdf} format.  It is assumed that you are in directory @code{gprofng/doc}
   4326      1.1  christos under the main directory @var{<my-build-dir>}.
   4327      1.1  christos 
   4328      1.1  christos @table @code
   4329      1.1  christos 
   4330      1.1  christos @item make html
   4331      1.1  christos Create the html file in the current directory.
   4332      1.1  christos 
   4333      1.1  christos @item make pdf
   4334      1.1  christos Create the pdf file in the current directory.
   4335      1.1  christos 
   4336      1.1  christos @item make install-html
   4337      1.1  christos Create and install the html file in the binutils documentation directory.
   4338      1.1  christos 
   4339      1.1  christos @item make install-pdf
   4340  1.1.1.2  christos Create and install the pdf file in the binutils documentation directory.
   4341      1.1  christos 
   4342      1.1  christos @end table
   4343      1.1  christos 
   4344  1.1.1.2  christos For example, to install this document in the binutils documentation directory,
   4345  1.1.1.2  christos the commands below may be executed.  In this notation, @var{<format>}
   4346      1.1  christos is one of @code{html}, or @code{pdf}:
   4347      1.1  christos 
   4348      1.1  christos @smallexample
   4349      1.1  christos @verbatim
   4350      1.1  christos $ cd <my-build-dir>/gprofng/doc
   4351      1.1  christos $ make install-<format>
   4352      1.1  christos @end verbatim
   4353      1.1  christos @end smallexample
   4354      1.1  christos 
   4355  1.1.1.2  christos The binutils installation directory is either the default @code{/usr/local} or
   4356  1.1.1.2  christos the one that has been set with the @code{--prefix} option as part of the
   4357  1.1.1.2  christos @code{configure} command.  In this example we symbolize this location with
   4358  1.1.1.2  christos @code{<install>}.
   4359      1.1  christos 
   4360      1.1  christos The documentation directory is @code{<install>/share/doc/gprofng} in case
   4361      1.1  christos @code{html} or @code{pdf} is selected and @code{<install>/share/info} for the
   4362      1.1  christos file in the @code{info} format.
   4363      1.1  christos 
   4364      1.1  christos @noindent
   4365      1.1  christos Some things to note:
   4366      1.1  christos 
   4367      1.1  christos @itemize
   4368      1.1  christos 
   4369      1.1  christos @item
   4370      1.1  christos For the @code{pdf} file to be generated, the @code{texi2dvi} tool is required.
   4371      1.1  christos @cindex texi2dvi
   4372      1.1  christos It is for example available as part of the @code{texinfo-tex} package.
   4373      1.1  christos 
   4374      1.1  christos @item
   4375      1.1  christos Instead of generating a single file in the @code{html} format, it is also
   4376      1.1  christos possible to create a directory with individual files for the various chapters.
   4377      1.1  christos To do so, remove the use of @code{--no-split} in variable @code{MAKEINFOHTML}
   4378      1.1  christos in the make file in the @code{<my-build-dir/gprofng/doc} directory.
   4379      1.1  christos 
   4380      1.1  christos @end itemize
   4381      1.1  christos 
   4382      1.1  christos @c -- An appendix -------------------------------------------------------------
   4383      1.1  christos @node     The gprofng Man Pages
   4384      1.1  christos @appendix The gprofng Man Pages
   4385      1.1  christos @c ----------------------------------------------------------------------------
   4386      1.1  christos 
   4387      1.1  christos In this appendix the man pages for the various @ProductName{} tools are listed.
   4388      1.1  christos 
   4389      1.1  christos @c -- A new node --------------------------------------------------------------
   4390      1.1  christos @c  @node    gprofng driver
   4391      1.1  christos @node    Man page for @command{gprofng}
   4392      1.1  christos @section Man page for @command{gprofng}
   4393      1.1  christos @c ----------------------------------------------------------------------------
   4394      1.1  christos 
   4395      1.1  christos @include gprofng.texi
   4396      1.1  christos 
   4397      1.1  christos @c -- A new node --------------------------------------------------------------
   4398      1.1  christos @page
   4399      1.1  christos @node    Man page for @command{gprofng collect app}
   4400      1.1  christos @section Man page for @command{gprofng collect app}
   4401      1.1  christos @c ----------------------------------------------------------------------------
   4402      1.1  christos 
   4403  1.1.1.2  christos @include gprofng-collect-app.texi
   4404      1.1  christos 
   4405      1.1  christos @c -- A new node --------------------------------------------------------------
   4406      1.1  christos @page
   4407      1.1  christos @node    Man page for @command{gprofng display text}
   4408      1.1  christos @section Man page for @command{gprofng display text}
   4409      1.1  christos @c ----------------------------------------------------------------------------
   4410      1.1  christos 
   4411  1.1.1.2  christos @include gprofng-display-text.texi
   4412      1.1  christos 
   4413      1.1  christos @c -- A new node --------------------------------------------------------------
   4414      1.1  christos @page
   4415      1.1  christos @node    Man page for @command{gprofng display html}
   4416      1.1  christos @section Man page for @command{gprofng display html}
   4417      1.1  christos @c ----------------------------------------------------------------------------
   4418      1.1  christos 
   4419  1.1.1.2  christos @include gprofng-display-html.texi
   4420      1.1  christos 
   4421      1.1  christos @c -- A new node --------------------------------------------------------------
   4422      1.1  christos @page
   4423      1.1  christos @node    Man page for @command{gprofng display src}
   4424      1.1  christos @section Man page for @command{gprofng display src}
   4425      1.1  christos @c ----------------------------------------------------------------------------
   4426      1.1  christos 
   4427  1.1.1.2  christos @include gprofng-display-src.texi
   4428      1.1  christos 
   4429      1.1  christos @c -- A new node --------------------------------------------------------------
   4430      1.1  christos @page
   4431      1.1  christos @node    Man page for @command{gprofng archive}
   4432      1.1  christos @section Man page for @command{gprofng archive}
   4433      1.1  christos @c ----------------------------------------------------------------------------
   4434      1.1  christos 
   4435  1.1.1.2  christos @include gprofng-archive.texi
   4436      1.1  christos 
   4437      1.1  christos @ifnothtml
   4438      1.1  christos @node       Index
   4439      1.1  christos @unnumbered Index
   4440      1.1  christos @printindex cp
   4441      1.1  christos @end ifnothtml
   4442      1.1  christos 
   4443      1.1  christos @bye
   4444