1 1.1 christos \input texinfo @c -*-texinfo-*- 2 1.1 christos 3 1.1 christos @c ---------------------------------------------------------------------------- 4 1.1 christos @c This is the Texinfo source file for the GPROFNG manual. This manual 5 1.1 christos @c includes the man pages for the various tools. 6 1.1 christos @c 7 1.1 christos @c Author: Ruud van der Pas 8 1.1 christos @c ---------------------------------------------------------------------------- 9 1.1 christos 10 1.1 christos @c %**start of header 11 1.1 christos 12 1.1 christos @setfilename gprofng.info 13 1.1 christos @settitle GNU gprofng 14 1.1 christos 15 1.1 christos @c -- Set the indent for the @example command to 1 space, not 5 --------------- 16 1.1 christos @exampleindent 1 17 1.1 christos 18 1.1 christos @paragraphindent 3 19 1.1 christos 20 1.1 christos @c %**end of header 21 1.1 christos 22 1.1 christos @c -- Start a new chapter on a new, odd numbered, page ------------------------ 23 1.1 christos @setchapternewpage odd 24 1.1 christos 25 1.1 christos @c -- Merge all index entries into the Concepts Index ------------------------- 26 1.1 christos @syncodeindex fn cp 27 1.1 christos @syncodeindex ky cp 28 1.1 christos @syncodeindex pg cp 29 1.1 christos @syncodeindex vr cp 30 1.1 christos 31 1.1 christos @c -- Macros specific to gprofng ---------------------------------------------- 32 1.1 christos @include gp-macros.texi 33 1.1 christos 34 1.1 christos @c -- Get the version information --------------------------------------------- 35 1.1 christos @include version.texi 36 1.1 christos 37 1.1 christos @c -- Entry for the Info dir structure ---------------------------------------- 38 1.1 christos @ifnottex 39 1.1 christos @dircategory Software development 40 1.1 christos @direntry 41 1.1 christos * gprofng: (gprofng). The next generation profiling tool for Linux 42 1.1 christos @end direntry 43 1.1 christos @end ifnottex 44 1.1 christos 45 1.1 christos @c -- Copyright stuff --------------------------------------------------------- 46 1.1 christos @copying 47 1.1 christos This document is the manual for @ProductName{}, last updated @value{UPDATED}. 48 1.1 christos 49 1.1.1.2 christos Copyright @copyright{} 2022-2025 Free Software Foundation, Inc. 50 1.1 christos 51 1.1 christos @c -- @quotation 52 1.1 christos Permission is granted to copy, distribute and/or modify this document 53 1.1 christos under the terms of the GNU Free Documentation License, 54 1.1 christos Version 1.3 or any later version published by the Free Software 55 1.1 christos Foundation; with no Invariant Sections, with no Front-Cover texts, 56 1.1 christos and with no Back-Cover Texts. A copy of the license is included in the 57 1.1 christos section entitled ``GNU Free Documentation License.'' 58 1.1 christos 59 1.1 christos @c -- @end quotation 60 1.1 christos @end copying 61 1.1 christos 62 1.1 christos @finalout 63 1.1 christos @smallbook 64 1.1 christos 65 1.1 christos @c -- Define the title page --------------------------------------------------- 66 1.1 christos @titlepage 67 1.1 christos @title GNU gprofng 68 1.1 christos @subtitle The next generation profiling tool for Linux 69 1.1 christos @subtitle version @value{VERSION} (last updated @value{UPDATED}) 70 1.1 christos @author Ruud van der Pas 71 1.1 christos @page 72 1.1 christos @vskip 0pt plus 1filll 73 1.1 christos @insertcopying 74 1.1 christos @end titlepage 75 1.1 christos 76 1.1 christos @c -- Generate the Table of Contents ------------------------------------------ 77 1.1 christos @contents 78 1.1 christos 79 1.1 christos @c -- The Top node ------------------------------------------------------------ 80 1.1 christos @c Should contain a short summary, copying permissions and a master menu. 81 1.1 christos @c ---------------------------------------------------------------------------- 82 1.1 christos @ifnottex 83 1.1 christos @node Top 84 1.1 christos @top GNU Gprofng 85 1.1 christos 86 1.1 christos @insertcopying 87 1.1 christos @end ifnottex 88 1.1 christos 89 1.1 christos @ifinfo 90 1.1 christos @c -- The menu entries -------------------------------------------------------- 91 1.1 christos 92 1.1 christos @c * Display Source Code:: Display the source code and disassembly. 93 1.1 christos @c * Archive Experiment Data:: Archive an experiment. 94 1.1 christos 95 1.1 christos @menu 96 1.1 christos * Introduction:: About this manual. 97 1.1 christos * Overview:: A brief overview of @ProductName{}. 98 1.1.1.2 christos * A Mini Tutorial:: A short tutorial with the key features. 99 1.1 christos * The gprofng Tools:: An overview of the tools supported. 100 1.1 christos * Performance Data Collection:: Record the performance information. 101 1.1 christos * View the Performance Information:: Different ways to view the data. 102 1.1 christos * Terminology:: Concepts and terminology explained. 103 1.1 christos * Other Document Formats:: Create this document in other formats. 104 1.1 christos * The gprofng Man Pages:: The gprofng man pages. 105 1.1 christos * Index:: The index. 106 1.1 christos 107 1.1 christos @detailmenu 108 1.1 christos 109 1.1 christos --- The Detailed Node Listing --- 110 1.1 christos 111 1.1 christos Introduction 112 1.1 christos 113 1.1 christos Overview 114 1.1 christos 115 1.1 christos * Main Features:: A high level overview. 116 1.1.1.2 christos * Sampling versus Tracing:: The pros and cons of both approaches. 117 1.1 christos * Steps Needed to Create a Profile:: How to create a profile. 118 1.1 christos 119 1.1 christos A Mini Tutorial 120 1.1 christos 121 1.1.1.2 christos * Getting Started:: The basics of profiling with gprofng. 122 1.1.1.2 christos * Support for Multithreading:: Commands for multithreaded applications. 123 1.1.1.2 christos * View Multiple Experiments:: Analyze multiple experiments. 124 1.1 christos * Profile Hardware Event Counters:: How to use hardware event counters. 125 1.1 christos * Java Profiling:: How to profile a Java application. 126 1.1 christos 127 1.1 christos The gprofng Tools 128 1.1 christos 129 1.1 christos * Tools Overview:: A brief description of the tools. 130 1.1 christos * The gprofng.rc file with default settings:: Customize the settings. 131 1.1 christos * Filters:: Filters. 132 1.1 christos * Supported Environment Variables:: The supported environment variables. 133 1.1 christos 134 1.1 christos Performance Data Collection 135 1.1 christos 136 1.1 christos * The @command{gprofng collect app} Tool:: Collect application performance data. 137 1.1 christos 138 1.1 christos View the Performance Information 139 1.1 christos 140 1.1 christos * The @command{gprofng display text} Tool:: View the performance data in plain text. 141 1.1 christos 142 1.1 christos Terminology 143 1.1 christos 144 1.1 christos * The Program Counter:: What is a Program Counter? 145 1.1 christos * Inclusive and Exclusive Metrics:: An explanation of inclusive and exclusive metrics. 146 1.1 christos * Metric Definitions:: Definitions associated with metrics. 147 1.1 christos * The Viewmode:: Select the way call stacks are presented. 148 1.1 christos * The Selection List:: How to define a selection. 149 1.1 christos * Load Objects and Functions:: The components in an application. 150 1.1 christos * The Concept of a CPU in gprofng:: The definition of a CPU. 151 1.1 christos * Hardware Event Counters Explained:: What are event counters? 152 1.1 christos * apath:: Our generic definition of a path. 153 1.1 christos 154 1.1 christos The gprofng Man Pages 155 1.1 christos 156 1.1 christos * Man page for @command{gprofng collect app}:: The man page for gprofng collect app. 157 1.1 christos * Man page for @command{gprofng display text}:: The man page for gprofng display text. 158 1.1 christos * Man page for @command{gprofng display html}:: The man page for gprofng display html. 159 1.1 christos * Man page for @command{gprofng display src}:: The man page for gprofng display src. 160 1.1 christos * Man page for @command{gprofng archive}:: The man page for gprofng archive. 161 1.1 christos 162 1.1 christos @c -- Index 163 1.1 christos 164 1.1 christos @end detailmenu 165 1.1 christos @end menu 166 1.1 christos @end ifinfo 167 1.1 christos 168 1.1 christos @c -- A new node -------------------------------------------------------------- 169 1.1 christos @node Introduction 170 1.1 christos @chapter Introduction 171 1.1 christos @c ---------------------------------------------------------------------------- 172 1.1 christos The @ProductName{} tool is the next generation profiler for Linux. It consists 173 1.1 christos of various commands to generate and display profile information. 174 1.1 christos 175 1.1 christos This manual starts with a tutorial how to create and interpret a profile. This 176 1.1 christos part is highly practical and has the goal to get users up to speed as quickly 177 1.1 christos as possible. As soon as possible, we would like to show you how to get your 178 1.1 christos first profile on your screen. 179 1.1 christos 180 1.1 christos This is followed by more examples, covering many of the features. At the 181 1.1 christos end of this tutorial, you should feel confident enough to tackle the more 182 1.1 christos complex tasks. 183 1.1 christos 184 1.1 christos In a future update a more formal reference manual will be included as well. 185 1.1 christos Since even in this tutorial we use certain terminology, we have included a 186 1.1 christos chapter with descriptions at the end. In case you encounter unfamiliar 187 1.1 christos wordings or terminology, please check this chapter. 188 1.1 christos 189 1.1 christos One word of caution. In several cases we had to somewhat tweak the screen 190 1.1 christos output in order to make it fit. This is why the output may look somewhat 191 1.1 christos different when you try things yourself. 192 1.1 christos 193 1.1 christos For now, we wish you a smooth profiling experience with @ProductName{} and 194 1.1 christos good luck tackling performance bottlenecks. 195 1.1 christos 196 1.1 christos @c -- A new node -------------------------------------------------------------- 197 1.1 christos @c cccccc @node A Brief Overview of @ProductName{} 198 1.1 christos @node Overview 199 1.1.1.2 christos @chapter A Brief Overview of gprofng 200 1.1 christos @c ---------------------------------------------------------------------------- 201 1.1 christos 202 1.1 christos @menu 203 1.1 christos * Main Features:: A high level overview. 204 1.1 christos * Sampling versus Tracing:: The pros and cons of sampling versus tracing. 205 1.1 christos * Steps Needed to Create a Profile:: How to create a profile. 206 1.1 christos @end menu 207 1.1 christos 208 1.1 christos Before we cover this tool in quite some detail, we start with a brief overview 209 1.1 christos of what it is, and the main features. Since we know that many of you would 210 1.1 christos like to get started rightaway, already in this first chapter we explain the 211 1.1 christos basics of profiling with @ToolName{}. 212 1.1 christos 213 1.1 christos @c ---------------------------------------------------------------------------- 214 1.1 christos @c TBD Review this text. Probably be more specific on the gcc releases and 215 1.1 christos @c processor specifics. 216 1.1 christos @c ---------------------------------------------------------------------------- 217 1.1 christos 218 1.1 christos @c -- A new node -------------------------------------------------------------- 219 1.1 christos @node Main Features 220 1.1 christos @section Main Features 221 1.1 christos @c ---------------------------------------------------------------------------- 222 1.1 christos 223 1.1 christos @noindent 224 1.1 christos These are the main features of the @ProductName{} tool: 225 1.1 christos 226 1.1 christos @itemize @bullet 227 1.1 christos 228 1.1 christos @item 229 1.1 christos Profiling is supported for an application written in C, C++, Java, or Scala. 230 1.1 christos 231 1.1 christos @c TBD Java: up to 1.8 full support, support other than for modules 232 1.1 christos 233 1.1 christos @item 234 1.1.1.2 christos Shared libraries are supported. The information is presented at the 235 1.1.1.2 christos instruction level. 236 1.1 christos 237 1.1 christos @item 238 1.1 christos The following multithreading programming models are supported: Pthreads, 239 1.1 christos OpenMP, and Java threads. 240 1.1 christos 241 1.1 christos @item 242 1.1.1.2 christos This tool works with unmodified production level executables. There is no need 243 1.1.1.2 christos to recompile the code, but if the @samp{-g} option has been used when building 244 1.1 christos the application, source line level information is available. 245 1.1 christos 246 1.1 christos @item 247 1.1 christos The focus is on support for code generated with the @command{gcc} compiler, but 248 1.1 christos there is some limited support for the @command{icc} compiler as well. Future 249 1.1 christos improvements and enhancements will focus on @command{gcc} though. 250 1.1 christos 251 1.1 christos @item 252 1.1 christos Processors from Intel, AMD, and Arm are supported, but the level of support 253 1.1 christos depends on the architectural details. In particular, hardware event counters 254 1.1 christos may not be supported. If this is the case, all views not related to these 255 1.1 christos counters still ought to work though. 256 1.1 christos 257 1.1 christos @item 258 1.1 christos Several views into the data are supported. For example, a function overview 259 1.1 christos where the time is spent, but also a source line, disassembly, call tree and 260 1.1 christos a caller-callees overview are available. 261 1.1 christos 262 1.1 christos @item 263 1.1 christos Through filters, the user can zoom in on an area of interest. 264 1.1 christos 265 1.1 christos @item 266 1.1.1.2 christos Two or more profiles can be aggregated, or used in a comparison. This 267 1.1.1.2 christos comparison can be obtained at the function, source line, and disassembly 268 1.1.1.2 christos level. 269 1.1 christos 270 1.1 christos @item 271 1.1 christos Through a simple scripting language, and customization of the metrics shown, 272 1.1 christos the generation and creation of a profile can be fully automated and provide 273 1.1 christos tailored output. 274 1.1 christos 275 1.1 christos @end itemize 276 1.1 christos 277 1.1 christos @c -- A new node -------------------------------------------------------------- 278 1.1 christos @node Sampling versus Tracing 279 1.1 christos @section Sampling versus Tracing 280 1.1 christos @c ---------------------------------------------------------------------------- 281 1.1 christos 282 1.1 christos A key difference with some other profiling tools is that the main data 283 1.1 christos collection command @CollectApp{} mostly uses 284 1.1 christos @cindex Program Counter sampling 285 1.1 christos @cindex PC sampling 286 1.1 christos Program Counter (PC) sampling 287 1.1 christos under the hood. 288 1.1 christos 289 1.1 christos With @emph{sampling}, the executable is interrupted at regular intervals. Each 290 1.1 christos time it is halted, key information is gathered and stored. This includes the 291 1.1 christos Program Counter that keeps track of where the execution is. Hence the name. 292 1.1 christos 293 1.1 christos Together with operational data, this information is stored in the experiment 294 1.1 christos directory and can be viewed in the second phase. 295 1.1 christos 296 1.1 christos For example, the PC information is used to derive where the program was when 297 1.1 christos it was halted. Since the sampling interval is known, it is relatively easy to 298 1.1 christos derive how much time was spent in the various parts of the program. 299 1.1 christos 300 1.1 christos The opposite technique is generally referred to as @emph{tracing}. With 301 1.1 christos tracing, the target is instrumented with specific calls that collect the 302 1.1 christos requested information. 303 1.1 christos 304 1.1 christos These are some of the pros and cons of PC sampling verus tracing: 305 1.1 christos 306 1.1 christos @itemize 307 1.1 christos 308 1.1 christos @item 309 1.1 christos Since there is no need to recompile, existing executables can be used 310 1.1 christos and the profile measures the behaviour of exactly the same executable that is 311 1.1 christos used in production runs. 312 1.1 christos 313 1.1.1.2 christos With tracing, one inherently profiles a different executable, because 314 1.1.1.2 christos the calls to the instrumentation library may affect runtime behaviour. 315 1.1 christos 316 1.1 christos @item 317 1.1.1.2 christos With sampling, there are very few restrictions on what can be profiled and 318 1.1.1.2 christos even without access to the source code, a basic profile can be made. 319 1.1 christos 320 1.1 christos @item 321 1.1 christos A downside of sampling is that, depending on the sampling frequency, small 322 1.1 christos functions may be missed or not captured accurately. Although this is rare, 323 1.1.1.2 christos this may happen and is the reason why the user has control over the sampling 324 1.1.1.2 christos rate. 325 1.1 christos 326 1.1 christos @item 327 1.1 christos While tracing produces precise information, sampling is statistical in nature. 328 1.1 christos As a result, small variations may occur across seemingly identical runs. We 329 1.1 christos have not observed more than a few percent deviation though. Especially if 330 1.1 christos the target job executed for a sufficiently long time. 331 1.1 christos 332 1.1 christos @item 333 1.1 christos With sampling, it is not possible to get an accurate count how often 334 1.1 christos functions are called. 335 1.1 christos 336 1.1 christos @end itemize 337 1.1 christos 338 1.1 christos @c -- A new node -------------------------------------------------------------- 339 1.1 christos @node Steps Needed to Create a Profile 340 1.1 christos @section Steps Needed to Create a Profile 341 1.1 christos @c ---------------------------------------------------------------------------- 342 1.1 christos 343 1.1 christos Creating a profile takes two steps. First the profile data needs to be 344 1.1 christos generated. This is followed by a viewing step to create a report from the 345 1.1 christos information that has been gathered. 346 1.1 christos 347 1.1 christos Every @ProductName{} command starts with @ToolName{}, the name of the driver. 348 1.1.1.2 christos This is followed by a keyword to define the high level functionality. 349 1.1.1.2 christos Depending on this keyword, a third qualifier may be needed to further narrow 350 1.1.1.2 christos down the request. 351 1.1.1.2 christos This combination is then followed by options that are specific to the 352 1.1.1.2 christos functionality desired. 353 1.1 christos 354 1.1 christos The command to gather, or ``collect'', the performance data is called 355 1.1 christos @CollectApp{}. Aside from numerous options, this command takes the name 356 1.1 christos of the target executable as an input parameter. 357 1.1 christos 358 1.1 christos Upon completion of the run, the performance data can be 359 1.1 christos found in the newly created 360 1.1 christos @cindex Experiment directory 361 1.1 christos experiment directory. 362 1.1 christos 363 1.1 christos Unless explicitly specified otherwise, a default 364 1.1 christos name for this directory is chosen. The name is @file{test.<n>.er} where 365 1.1 christos @var{<n>} is the first integer number not in use yet for such a name. 366 1.1 christos 367 1.1 christos For example, the first time @CollectApp{} is invoked, an experiment 368 1.1 christos directory with the name @file{test.1.er} is created. 369 1.1 christos Upon a subsequent invocation of @CollectApp{} in the same directory, 370 1.1 christos an experiment directory with the name @file{test.2.er} will be created, 371 1.1 christos and so forth. 372 1.1 christos 373 1.1 christos Note that @CollectApp{} supports an option to explicitly name the experiment 374 1.1 christos directory. 375 1.1 christos Aside from the restriction that the name of this directory has to end 376 1.1 christos with @samp{.er}, any valid directory name can be used for this. 377 1.1 christos 378 1.1 christos Now that we have the performance data, the next step is to display it. 379 1.1 christos 380 1.1 christos @IndexSubentry{@code{gprofng}, @code{display text}} 381 1.1 christos The most commonly used command to view the performance information is 382 1.1 christos @DisplayText{}. This is a very extensive and customizable tool that 383 1.1 christos produces the information in ASCII format. 384 1.1 christos 385 1.1 christos @IndexSubentry{@code{gprofng}, @code{display html}} 386 1.1 christos Another option is to use @DisplayHTML{}. This tool generates a directory with 387 1.1 christos files in html format. These can be viewed in a browser, allowing for easy 388 1.1 christos navigation through the profile data. 389 1.1 christos 390 1.1 christos @c -- A new node -------------------------------------------------------------- 391 1.1 christos @node A Mini Tutorial 392 1.1 christos @chapter A Mini Tutorial 393 1.1 christos @c ---------------------------------------------------------------------------- 394 1.1 christos 395 1.1 christos In this chapter we present and discuss the main functionality of @ToolName{}. 396 1.1 christos This will be a practical approach, using an example code to generate profile 397 1.1 christos data and show how to get various performance reports. 398 1.1 christos 399 1.1 christos @menu 400 1.1 christos * Getting Started:: The basics of profiling with @ProductName(). 401 1.1 christos * Support for Multithreading:: Commands specific to multithreaded applications. 402 1.1 christos * View Multiple Experiments:: Analyze multiple experiments simultaneously. 403 1.1 christos * Profile Hardware Event Counters:: How to use hardware event counters. 404 1.1 christos * Java Profiling:: How to profile a Java application. 405 1.1 christos @end menu 406 1.1 christos 407 1.1 christos @c -- A new node -------------------------------------------------------------- 408 1.1 christos @node Getting Started 409 1.1 christos @section Getting Started 410 1.1 christos @c ---------------------------------------------------------------------------- 411 1.1 christos 412 1.1 christos The information presented here provides a good and common basis for many 413 1.1 christos profiling tasks, but there are more features that you may want to leverage. 414 1.1 christos 415 1.1 christos These are covered in subsequent sections in this chapter. 416 1.1 christos 417 1.1 christos @menu 418 1.1 christos * The Example Program:: A description of the example program used. 419 1.1 christos * A First Profile:: How to get the first profile. 420 1.1 christos * The Source Code View:: Display the metrics in the source code. 421 1.1 christos * The Disassembly View:: Display the metrics at the instruction level. 422 1.1 christos * Display and Define the Metrics:: An example how to customize the metrics. 423 1.1 christos * Customization of the Output:: An example how to customize the output. 424 1.1 christos * Name the Experiment Directory:: Change the name of the experiment directory. 425 1.1 christos * Control the Number of Lines in the Output:: Change the number of lines in the tables. 426 1.1 christos * Sorting the Performance Data:: How to set the metric to sort by. 427 1.1 christos * Scripting:: Use a script to execute the commands. 428 1.1 christos * A More Elaborate Example:: An example of customization. 429 1.1 christos * The Call Tree:: Display the dynamic call tree. 430 1.1 christos * More Information on the Experiment:: How to get additional statistics. 431 1.1 christos * Control the Sampling Frequency:: How to control the sampling granularity. 432 1.1 christos * Information on Load Objects:: How to get more information on load objects. 433 1.1 christos @end menu 434 1.1 christos 435 1.1 christos @c -- A new node -------------------------------------------------------------- 436 1.1 christos @node The Example Program 437 1.1 christos @subsection The Example Program 438 1.1 christos @c ---------------------------------------------------------------------------- 439 1.1 christos 440 1.1 christos Throughout this guide we use the same example C code that implements the 441 1.1 christos multiplication of a vector of length @math{n} by an @math{m} by @math{n} 442 1.1 christos matrix. The result is stored in a vector of length @math{m}. 443 1.1 christos @cindex Pthreads 444 1.1 christos @cindex Posix Threads 445 1.1 christos The algorithm has been parallelized using Posix Threads, or Pthreads for short. 446 1.1 christos 447 1.1 christos The code was built using the @code{gcc} compiler and the name of the executable 448 1.1 christos is 449 1.1 christos @cindex mxv-pthreads 450 1.1 christos @command{mxv-pthreads}. 451 1.1 christos 452 1.1 christos The matrix sizes can be set through the @code{-m} and @code{-n} options. The 453 1.1.1.2 christos number of threads is set with the @code{-t} option. These are additional 454 1.1.1.2 christos threads that are used in the multiplication. To increase the duration of 455 1.1.1.2 christos the run, the computations are executed repeatedly. 456 1.1 christos 457 1.1 christos This is an example that multiplies a @math{8000} by @math{4000} matrix with 458 1.1 christos a vector of length @math{4000}. Although this is a multithreaded application, 459 1.1 christos initially we will be using @math{1} threads. Later on we will show examples 460 1.1 christos using multiple threads. 461 1.1 christos 462 1.1 christos @smallexample 463 1.1 christos @verbatim 464 1.1 christos $ ./mxv-pthreads -m 8000 -n 4000 -t 1 465 1.1 christos mxv: error check passed - rows = 8000 columns = 4000 threads = 1 466 1.1 christos $ 467 1.1 christos @end verbatim 468 1.1 christos @end smallexample 469 1.1 christos 470 1.1 christos The program performs an internal check to verify that the computed results 471 1.1 christos are correct. The result of this check is printed, as well as the matrix 472 1.1 christos sizes and the number of threads used. 473 1.1 christos 474 1.1 christos @c -- A new node -------------------------------------------------------------- 475 1.1 christos @node A First Profile 476 1.1 christos @subsection A First Profile 477 1.1 christos @c ---------------------------------------------------------------------------- 478 1.1 christos 479 1.1 christos The first step is to collect the performance data. It is important to remember 480 1.1 christos that much more information is gathered than may be shown by default. Often a 481 1.1 christos single data collection run is sufficient to get a lot of insight. 482 1.1 christos 483 1.1 christos The @CollectApp{} command is used for the data collection. Nothing needs to be 484 1.1 christos changed in the way the application is executed. The only difference is that it 485 1.1 christos is now run under control of the tool, as shown below: 486 1.1 christos 487 1.1 christos @cartouche 488 1.1 christos @smallexample 489 1.1 christos $ gprofng collect app ./mxv-pthreads -m 8000 -n 4000 -t 1 490 1.1 christos @end smallexample 491 1.1 christos @end cartouche 492 1.1 christos 493 1.1 christos @noindent 494 1.1 christos This produces the following output: 495 1.1 christos 496 1.1 christos @smallexample 497 1.1 christos @verbatim 498 1.1 christos Creating experiment directory test.1.er (Process ID: 2749878) ... 499 1.1 christos mxv: error check passed - rows = 8000 columns = 4000 threads = 1 500 1.1 christos @end verbatim 501 1.1 christos @end smallexample 502 1.1 christos 503 1.1 christos We see a message that an experiment directory with the name @file{test.1.er} 504 1.1.1.2 christos has been created. The process id is also echoed. The application completes 505 1.1 christos as usual and we have our first experiment directory that can be analyzed. 506 1.1 christos 507 1.1 christos The tool we use for this is called @DisplayText{}. It takes the name of 508 1.1 christos the experiment directory as an argument. 509 1.1 christos 510 1.1 christos @cindex Interpreter mode 511 1.1.1.2 christos If invoked this way, the tool starts in the interactive @emph{interpreter} 512 1.1.1.2 christos mode. 513 1.1.1.2 christos While in this environment, commands can be given and the tool responds. This 514 1.1.1.2 christos is illustrated below: 515 1.1 christos 516 1.1 christos @smallexample 517 1.1 christos @verbatim 518 1.1 christos $ gprofng display text test.1.er 519 1.1 christos Warning: History and command editing is not supported on this system. 520 1.1.1.2 christos (gprofng-display-text) quit 521 1.1 christos $ 522 1.1 christos @end verbatim 523 1.1 christos @end smallexample 524 1.1 christos 525 1.1 christos @cindex Command line mode 526 1.1 christos While useful in certain cases, we prefer to use this tool in command line mode 527 1.1 christos by specifying the commands to be issued when invoking the tool. The way to do 528 1.1 christos this is to prepend the command(s) with a hyphen (@samp{-}) if used on the 529 1.1 christos command line. 530 1.1 christos 531 1.1 christos Since this makes the command appear to be an option, they are also sometimes 532 1.1 christos referred to as such, but technically they are commands. This is the 533 1.1 christos terminology we will use in this user guide, but for convenience the commands 534 1.1 christos are also listed as options in the index. 535 1.1 christos 536 1.1 christos For example, 537 1.1 christos @IndexSubentry{Options, @code{-functions}} 538 1.1 christos @IndexSubentry{Commands, @code{functions}} 539 1.1 christos below we use the @command{functions} command to request a list of the functions 540 1.1 christos that have been executed, plus their respective CPU times: 541 1.1 christos 542 1.1 christos @cartouche 543 1.1 christos @smallexample 544 1.1 christos $ gprofng display text -functions test.1.er 545 1.1 christos @end smallexample 546 1.1 christos @end cartouche 547 1.1 christos 548 1.1 christos @smallexample 549 1.1 christos @verbatim 550 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 551 1.1 christos 552 1.1 christos Excl. Total Incl. Total Name 553 1.1 christos CPU CPU 554 1.1 christos sec. % sec. % 555 1.1 christos 9.367 100.00 9.367 100.00 <Total> 556 1.1 christos 8.926 95.30 8.926 95.30 mxv_core 557 1.1 christos 0.210 2.24 0.420 4.49 init_data 558 1.1 christos 0.080 0.85 0.210 2.24 drand48 559 1.1 christos 0.070 0.75 0.130 1.39 erand48_r 560 1.1 christos 0.060 0.64 0.060 0.64 __drand48_iterate 561 1.1 christos 0.010 0.11 0.020 0.21 _int_malloc 562 1.1 christos 0.010 0.11 0.010 0.11 sysmalloc 563 1.1 christos 0. 0. 8.926 95.30 <static>@0x47960 (<libgp-collector.so>) 564 1.1 christos 0. 0. 0.440 4.70 __libc_start_main 565 1.1 christos 0. 0. 0.020 0.21 allocate_data 566 1.1 christos 0. 0. 8.926 95.30 driver_mxv 567 1.1 christos 0. 0. 0.440 4.70 main 568 1.1 christos 0. 0. 0.020 0.21 malloc 569 1.1 christos 0. 0. 8.926 95.30 start_thread 570 1.1 christos @end verbatim 571 1.1 christos @end smallexample 572 1.1 christos 573 1.1.1.2 christos @noindent 574 1.1.1.2 christos As easy and simple as these steps are, we do have a first profile of our 575 1.1.1.2 christos program! 576 1.1 christos 577 1.1 christos There are five columns. The first four contain the 578 1.1 christos @cindex Total CPU time 579 1.1 christos ''Total CPU Time'', which 580 1.1 christos is the sum of the user and system time. @xref{Inclusive and Exclusive Metrics} 581 1.1 christos for an explanation of ``exclusive'' and ``inclusive'' times. 582 1.1 christos 583 1.1 christos The first line echoes the metric that is used to sort the output. By default, 584 1.1.1.2 christos this is the exclusive CPU time, but through the @command{sort} command, the 585 1.1.1.2 christos sort metric can be changed by the user. 586 1.1 christos 587 1.1 christos Next, there are four columns with the exclusive and inclusive CPU times and the 588 1.1 christos respective percentages. This is followed by the name of the function. 589 1.1 christos 590 1.1 christos @IndexSubentry{Miscellaneous, @code{<Total>}} 591 1.1.1.2 christos The function with the name @code{<Total>} is not a user function. It is a 592 1.1 christos pseudo function introduced by @ToolName{}. It is used to display the 593 1.1 christos accumulated measured metric values. In this example, we see that the total 594 1.1 christos CPU time of this job was 9.367 seconds and it is scaled to 100%. All 595 1.1 christos other percentages in the same column are relative to this number. 596 1.1 christos 597 1.1.1.2 christos @c -- If the metric is derived, for example the @code{IPC}, the value shown 598 1.1.1.2 christos @c -- under 599 1.1.1.2 christos @c -- @code{<Total>} is based upon the total values of the that are metrics 600 1.1.1.2 christos @c -- used to 601 1.1 christos @c -- compute the derived metric. 602 1.1 christos @c -- @IndexSubentry{Hardware event counters, IPC} 603 1.1 christos 604 1.1 christos With 8.926 seconds, function @code{mxv_core} takes 95.30% of the 605 1.1.1.2 christos total time and is by far the most time consuming function. 606 1.1.1.2 christos The exclusive and inclusive metrics are identical, which means that it 607 1.1.1.2 christos is a leaf function not calling any other functions. 608 1.1 christos 609 1.1 christos The next function in the list is @code{init_data}. Although with 4.49%, 610 1.1 christos the CPU time spent in this part is modest, this is an interesting entry because 611 1.1 christos the inclusive CPU time of 0.420 seconds is twice the exclusive CPU time 612 1.1 christos of 0.210 seconds. Clearly this function is calling another function, 613 1.1 christos or even more than one function and collectively this takes 0.210 seconds. 614 1.1 christos Below we show the call tree feature that provides more details on the call 615 1.1 christos structure of the application. 616 1.1 christos 617 1.1 christos The function @code{<static>@@0x47960 (<libgp-collector.so>)} does odd and 618 1.1 christos certainly not familiar. It is one of the internal functions used by 619 1.1 christos @CollectApp{} and can be ignored. Also, while the inclusive time is high, 620 1.1 christos the exclusive time is zero. This means it doesn't contribute to the 621 1.1 christos performance. 622 1.1 christos 623 1.1.1.2 christos The question is how we know where this function originates from. There are 624 1.1 christos several commands to dig deeper an get more details on a function. 625 1.1 christos @xref{Information on Load Objects}. 626 1.1 christos 627 1.1 christos @c -- A new node -------------------------------------------------------------- 628 1.1 christos @node The Source Code View 629 1.1 christos @subsection The Source Code View 630 1.1 christos @c ---------------------------------------------------------------------------- 631 1.1 christos 632 1.1 christos In general, the tuning efforts are best focused on the most time consuming 633 1.1 christos part(s) of an application. In this case that is easy, since over 95% of 634 1.1 christos the total CPU time is spent in function @code{mxv_core}. 635 1.1 christos It is now time to dig deeper and look 636 1.1 christos @cindex Source level metrics 637 1.1 christos at the metrics distribution at the source code level. Since we measured 638 1.1 christos CPU times, these are the metrics shown. 639 1.1 christos 640 1.1 christos @IndexSubentry{Options, @code{-source}} 641 1.1 christos @IndexSubentry{Commands, @code{source}} 642 1.1 christos The @code{source} command is used to accomplish this. It takes the name of the 643 1.1 christos function, not the source filename, as an argument. This is demonstrated 644 1.1 christos below, where the @DisplayText{} command is used to show the annotated 645 1.1 christos source listing of function @code{mxv_core}. 646 1.1 christos 647 1.1 christos Be aware that when using the @command{gcc} compiler, the source code has to 648 1.1 christos be compiled with the @code{-g} option in order for the source code feature 649 1.1 christos to work. Otherwise the location(s) can not be determined. For other compilers 650 1.1 christos we recommend to check the documentation for such an option. 651 1.1 christos 652 1.1 christos Below the command to display the source code of a function is shown. Since at 653 1.1 christos this point we are primarily interested in the timings only, we use the 654 1.1 christos @code{metrics} command to request the exclusive and inclusive total CPU 655 1.1 christos timings only. @xref{Display and Define the Metrics} for more information 656 1.1 christos how to define the metrics to be displayed. 657 1.1 christos 658 1.1 christos @cartouche 659 1.1 christos @smallexample 660 1.1 christos $ gprofng display text -metrics ei.totalcpu -source mxv_core test.1.er 661 1.1 christos @end smallexample 662 1.1 christos @end cartouche 663 1.1 christos 664 1.1 christos The output is shown below. It has been somewhat modified to fit the formatting 665 1.1 christos constraints and reduce the number of lines. 666 1.1 christos 667 1.1 christos @smallexample 668 1.1 christos @verbatim 669 1.1 christos Current metrics: e.totalcpu:i.totalcpu:name 670 1.1 christos Current Sort Metric: Exclusive Total CPU Time ( e.totalcpu ) 671 1.1 christos Source file: <apath>/mxv.c 672 1.1 christos Object file: mxv-pthreads (found as test.1.er/archives/...) 673 1.1 christos Load Object: mxv-pthreads (found as test.1.er/archives/...) 674 1.1 christos 675 1.1 christos Excl. Incl. 676 1.1 christos Total Total 677 1.1 christos CPU sec. CPU sec. 678 1.1 christos 679 1.1 christos <lines deleted> 680 1.1 christos <Function: mxv_core> 681 1.1 christos 43. void __attribute__ ((noinline)) 682 1.1 christos mxv_core (int64_t row_index_start, 683 1.1 christos 44. int64_t row_index_end, 684 1.1 christos 45. int64_t m, 685 1.1 christos 46. int64_t n, 686 1.1 christos 47. double **restrict A, 687 1.1 christos 48. double *restrict b, 688 1.1 christos 49. double *restrict c) 689 1.1 christos 50. { 690 1.1 christos 0. 0. 50. { 691 1.1 christos 0. 0. 51. for (int64_t i=row_index_start; 692 1.1 christos i<=row_index_end; i++) 693 1.1 christos 52. { 694 1.1 christos 0. 0. 53. double row_sum = 0.0; 695 1.1 christos ## 4.613 4.613 54. for (int64_t j=0; j<n; j++) 696 1.1 christos ## 4.313 4.313 55. row_sum += A[i][j] * b[j]; 697 1.1 christos 0. 0. 56. c[i] = row_sum; 698 1.1 christos 57. } 699 1.1 christos 0. 0. 58. } 700 1.1 christos @end verbatim 701 1.1 christos @end smallexample 702 1.1 christos 703 1.1 christos The first line echoes the metrics that have been selected. The second line 704 1.1 christos is not very meaningful when looking at the source code listing, but it shows 705 1.1 christos the metric that is used to sort the data. 706 1.1 christos 707 1.1 christos The next three lines provide information on the location of the source file, 708 1.1 christos the object file and the load object (@xref{Load Objects and Functions}). 709 1.1 christos 710 1.1 christos Function @code{mxv_core} is part of a source file that has other functions 711 1.1 christos as well. These functions will be shown with the values for the metrics, but 712 1.1 christos for lay-out purposes they have been removed in the output shown above. 713 1.1 christos 714 1.1 christos The header is followed by the annotated source code listing. The selected 715 1.1 christos metrics are shown first, followed by a source line number, and the source code. 716 1.1 christos @IndexSubentry{Miscellaneous ,@code{##}} 717 1.1 christos The most time consuming line(s) are marked with the @code{##} symbol. In 718 1.1 christos this way they are easier to identify and find with a search. 719 1.1 christos 720 1.1 christos What we see is that all of the time is spent in lines 54-55. 721 1.1 christos 722 1.1 christos @IndexSubentry{Options, @code{-lines}} 723 1.1 christos @IndexSubentry{Commands, @code{lines}} 724 1.1 christos A related command sometimes comes handy as well. It is called @code{lines} 725 1.1 christos and displays a list of the source lines and their metrics, ordered according 726 1.1 christos to the current sort metric (@xref{Sorting the Performance Data}). 727 1.1 christos 728 1.1 christos Below the command and the output. For lay-out reasons, only the top 10 is 729 1.1 christos shown here and the last part of the text on some lines has been replaced 730 1.1 christos by dots. The full text is @samp{instructions without line numbers} and 731 1.1 christos means that the line number information for that function was not found. 732 1.1 christos 733 1.1 christos @cartouche 734 1.1 christos @smallexample 735 1.1 christos $ gprofng display text -lines test.1.er 736 1.1 christos @end smallexample 737 1.1 christos @end cartouche 738 1.1 christos 739 1.1 christos @smallexample 740 1.1 christos @verbatim 741 1.1 christos Lines sorted by metric: Exclusive Total CPU Time 742 1.1 christos 743 1.1 christos Excl. Total Incl. Total Name 744 1.1 christos CPU CPU 745 1.1 christos sec. % sec. % 746 1.1 christos 9.367 100.00 9.367 100.00 <Total> 747 1.1 christos 4.613 49.25 4.613 49.25 mxv_core, line 54 in "mxv.c" 748 1.1 christos 4.313 46.05 4.313 46.05 mxv_core, line 55 in "mxv.c" 749 1.1 christos 0.160 1.71 0.370 3.95 init_data, line 118 in "manage_data.c" 750 1.1 christos 0.080 0.85 0.210 2.24 <Function: drand48, instructions ...> 751 1.1 christos 0.070 0.75 0.130 1.39 <Function: erand48_r, instructions ...> 752 1.1 christos 0.060 0.64 0.060 0.64 <Function: __drand48_iterate, ...> 753 1.1 christos 0.040 0.43 0.040 0.43 init_data, line 124 in "manage_data.c" 754 1.1 christos 0.010 0.11 0.020 0.21 <Function: _int_malloc, instructions ...> 755 1.1 christos 0.010 0.11 0.010 0.11 <Function: sysmalloc, instructions ...> 756 1.1 christos @end verbatim 757 1.1 christos @end smallexample 758 1.1 christos 759 1.1 christos What this overview immediately highlights is that the third most time consuming 760 1.1 christos source line takes 0.370 seconds only. This means that the inclusive time is 761 1.1 christos only 3.95% and clearly this branch of the code hardly impacts the performance. 762 1.1 christos 763 1.1 christos @c -- A new node -------------------------------------------------------------- 764 1.1 christos @node The Disassembly View 765 1.1 christos @subsection The Disassembly View 766 1.1 christos @c ---------------------------------------------------------------------------- 767 1.1 christos 768 1.1 christos The source view is very useful to obtain more insight where the time is spent, 769 1.1 christos but sometimes this is not sufficient. The disassembly view provides more 770 1.1 christos details since it shows the metrics at the instruction level. 771 1.1 christos 772 1.1 christos This view is displayed with the 773 1.1 christos @IndexSubentry{Options, @code{-disasm}} 774 1.1 christos @IndexSubentry{Commands, @code{disasm}} 775 1.1 christos @command{disasm} 776 1.1 christos command and as with the source view, it displays an annotated listing. In this 777 1.1 christos @cindex Instruction level metrics 778 1.1 christos case it shows the instructions with the metrics, interleaved with the 779 1.1 christos source lines. The 780 1.1 christos instructions have a reference in square brackets (@code{[} and @code{]}) 781 1.1 christos to the source line they correspond to. 782 1.1 christos 783 1.1 christos @noindent 784 1.1 christos We again focus on the tmings only and set the metrics accordingly: 785 1.1 christos 786 1.1 christos @cartouche 787 1.1 christos @smallexample 788 1.1 christos $ gprofng display text -metrics ei.totalcpu -disasm mxv_core test.1.er 789 1.1 christos @end smallexample 790 1.1 christos @end cartouche 791 1.1 christos 792 1.1 christos @smallexample 793 1.1 christos @verbatim 794 1.1 christos Current metrics: e.totalcpu:i.totalcpu:name 795 1.1 christos Current Sort Metric: Exclusive Total CPU Time ( e.totalcpu ) 796 1.1 christos Source file: <apath>/src/mxv.c 797 1.1 christos Object file: mxv-pthreads (found as test.1.er/archives/...) 798 1.1 christos Load Object: mxv-pthreads (found as test.1.er/archives/...) 799 1.1 christos 800 1.1 christos Excl. Incl. 801 1.1 christos Total Total 802 1.1 christos CPU sec. CPU sec. 803 1.1 christos 804 1.1 christos <lines deleted> 805 1.1 christos 43. void __attribute__ ((noinline)) 806 1.1 christos mxv_core (int64_t row_index_start, 807 1.1 christos 44. int64_t row_index_end, 808 1.1 christos 45. int64_t m, 809 1.1 christos 46. int64_t n, 810 1.1 christos 47. double **restrict A, 811 1.1 christos 48. double *restrict b, 812 1.1 christos 49. double *restrict c) 813 1.1 christos 50. { 814 1.1 christos <Function: mxv_core> 815 1.1 christos 0. 0. [50] 401d56: mov 0x8(%rsp),%r10 816 1.1 christos 51. for (int64_t i=row_index_start; 817 1.1 christos i<=row_index_end; i++) 818 1.1 christos 0. 0. [51] 401d5b: cmp %rsi,%rdi 819 1.1 christos 0. 0. [51] 401d5e: jg 0x47 820 1.1 christos 0. 0. [51] 401d60: add $0x1,%rsi 821 1.1 christos 0. 0. [51] 401d64: jmp 0x36 822 1.1 christos 52. { 823 1.1 christos 53. double row_sum = 0.0; 824 1.1 christos 54. for (int64_t j=0; j<n; j++) 825 1.1 christos 55 row_sum += A[i][j] * b[j]; 826 1.1 christos 0. 0. [55] 401d66: mov (%r8,%rdi,8),%rdx 827 1.1 christos 0. 0. [54] 401d6a: mov $0x0,%eax 828 1.1 christos 0. 0. [53] 401d6f: pxor %xmm1,%xmm1 829 1.1 christos 0.110 0.110 [55] 401d73: movsd (%rdx,%rax,8),%xmm0 830 1.1 christos 1.921 1.921 [55] 401d78: mulsd (%r9,%rax,8),%xmm0 831 1.1 christos 2.282 2.282 [55] 401d7e: addsd %xmm0,%xmm1 832 1.1 christos ## 4.613 4.613 [54] 401d82: add $0x1,%rax 833 1.1 christos 0. 0. [54] 401d86: cmp %rax,%rcx 834 1.1 christos 0. 0. [54] 401d89: jne 0xffffffffffffffea 835 1.1 christos 56. c[i] = row_sum; 836 1.1 christos 0. 0. [56] 401d8b: movsd %xmm1,(%r10,%rdi,8) 837 1.1 christos 0. 0. [51] 401d91: add $0x1,%rdi 838 1.1 christos 0. 0. [51] 401d95: cmp %rsi,%rdi 839 1.1 christos 0. 0. [51] 401d98: je 0xd 840 1.1 christos 0. 0. [53] 401d9a: pxor %xmm1,%xmm1 841 1.1 christos 0. 0. [54] 401d9e: test %rcx,%rcx 842 1.1 christos 0. 0. [54] 401da1: jg 0xffffffffffffffc5 843 1.1 christos 0. 0. [54] 401da3: jmp 0xffffffffffffffe8 844 1.1 christos 57. } 845 1.1 christos 58. } 846 1.1 christos 0. 0. [58] 401da5: ret 847 1.1 christos @end verbatim 848 1.1 christos @end smallexample 849 1.1 christos 850 1.1.1.2 christos For each instruction, the timing values are given and we can immediately 851 1.1 christos identify the most expensive instructions. As with the source level view, 852 1.1 christos these are marked with the @code{##} symbol. 853 1.1 christos 854 1.1 christos It comes as no surprise that the time consuming instructions originate from 855 1.1 christos the source code at lines 54-55. 856 1.1 christos One thing to note is that the source line numbers no longer appear in 857 1.1.1.2 christos sequential order. 858 1.1 christos This is because the compiler has re-ordered the instructions as part of 859 1.1.1.2 christos the code optimizations it has performed. 860 1.1 christos 861 1.1 christos As illustrated below and similar to the @command{lines} command, we can get 862 1.1 christos an overview of the instructions executed by using the 863 1.1 christos @IndexSubentry{Options, @code{-pcs}} 864 1.1 christos @IndexSubentry{Commands, @code{pcs}} 865 1.1 christos @command{pcs} 866 1.1 christos command. 867 1.1 christos 868 1.1 christos @noindent 869 1.1 christos Below the command and the output, which again has been restricted 870 1.1 christos to 10 lines. As before, some lines have been shortened for lay-out 871 1.1 christos purposes. 872 1.1 christos 873 1.1 christos @cartouche 874 1.1 christos @smallexample 875 1.1 christos $ gprofng display text -pcs test.1.er 876 1.1 christos @end smallexample 877 1.1 christos @end cartouche 878 1.1 christos 879 1.1 christos @smallexample 880 1.1 christos @verbatim 881 1.1 christos PCs sorted by metric: Exclusive Total CPU Time 882 1.1 christos 883 1.1 christos Excl. Total Incl. Total Name 884 1.1 christos CPU CPU 885 1.1 christos sec. % sec. % 886 1.1 christos 9.367 100.00 9.367 100.00 <Total> 887 1.1 christos 4.613 49.25 4.613 49.25 mxv_core + 0x0000002C, line 54 in "mxv.c" 888 1.1 christos 2.282 24.36 2.282 24.36 mxv_core + 0x00000028, line 55 in "mxv.c" 889 1.1 christos 1.921 20.51 1.921 20.51 mxv_core + 0x00000022, line 55 in "mxv.c" 890 1.1 christos 0.150 1.60 0.150 1.60 init_data + 0x000000AC, line 118 in ... 891 1.1 christos 0.110 1.18 0.110 1.18 mxv_core + 0x0000001D, line 55 in "mxv.c" 892 1.1 christos 0.040 0.43 0.040 0.43 drand48 + 0x00000022 893 1.1 christos 0.040 0.43 0.040 0.43 init_data + 0x000000F1, line 124 in ... 894 1.1 christos 0.030 0.32 0.030 0.32 __drand48_iterate + 0x0000001E 895 1.1 christos 0.020 0.21 0.020 0.21 __drand48_iterate + 0x00000038 896 1.1 christos @end verbatim 897 1.1 christos @end smallexample 898 1.1 christos 899 1.1 christos @noindent 900 1.1 christos What we see is that the top three instructions take 94% of the total CPU time 901 1.1 christos and any optimizations should focus on this part of the code.. 902 1.1 christos 903 1.1 christos @c -- A new node -------------------------------------------------------------- 904 1.1 christos @node Display and Define the Metrics 905 1.1 christos @subsection Display and Define the Metrics 906 1.1 christos @c ---------------------------------------------------------------------------- 907 1.1 christos 908 1.1 christos The metrics shown by @DisplayText{} are useful, but there is more recorded 909 1.1 christos than displayed by default. We can customize the values shown by defining the 910 1.1 christos metrics ourselves. 911 1.1 christos 912 1.1 christos There are two commands related to changing the metrics shown: 913 1.1 christos @IndexSubentry{Options, @code{-metric_list}} 914 1.1 christos @IndexSubentry{Commands, @code{metric_list}} 915 1.1 christos @command{metric_list} and 916 1.1 christos @IndexSubentry{Options, @code{-metrics}} 917 1.1 christos @IndexSubentry{Commands, @code{metrics}} 918 1.1 christos @command{metrics}. 919 1.1 christos 920 1.1 christos The first command shows the currently selected metrics, plus all the metrics 921 1.1 christos that have been stored as part of the experiment. The second command may be 922 1.1 christos used to define the metric list. 923 1.1 christos 924 1.1 christos @noindent 925 1.1 christos This is the way to get the information about the metrics: 926 1.1 christos 927 1.1 christos @IndexSubentry{Options, @code{-metric_list}} 928 1.1 christos @IndexSubentry{Commands, @code{metric_list}} 929 1.1 christos @cartouche 930 1.1 christos @smallexample 931 1.1 christos $ gprofng display text -metric_list test.1.er 932 1.1 christos @end smallexample 933 1.1 christos @end cartouche 934 1.1 christos 935 1.1 christos @noindent 936 1.1 christos This is the output: 937 1.1 christos 938 1.1 christos @smallexample 939 1.1 christos @verbatim 940 1.1 christos Current metrics: e.%totalcpu:i.%totalcpu:name 941 1.1 christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu ) 942 1.1 christos Available metrics: 943 1.1 christos Exclusive Total CPU Time: e.%totalcpu 944 1.1 christos Inclusive Total CPU Time: i.%totalcpu 945 1.1 christos Size: size 946 1.1 christos PC Address: address 947 1.1 christos Name: name 948 1.1 christos @end verbatim 949 1.1 christos @end smallexample 950 1.1 christos 951 1.1 christos This shows the metrics that are currently used, the metric that is used to sort 952 1.1 christos the data and all the metrics that have been recorded, but are not necessarily 953 1.1 christos shown. 954 1.1 christos 955 1.1 christos @cindex Default metrics 956 1.1 christos In this case, the current metrics are set to the exclusive and inclusive 957 1.1 christos total CPU times, the respective percentages, and the name of the function, 958 1.1 christos or load object. 959 1.1 christos 960 1.1 christos @IndexSubentry{Options, @code{-metrics}} 961 1.1 christos @IndexSubentry{Commands, @code{metrics}} 962 1.1 christos The @code{metrics} command is used to define the metrics that need to be 963 1.1 christos displayed. 964 1.1 christos 965 1.1 christos For example, to swap the exclusive and inclusive metrics, use the following 966 1.1 christos metric definition: @code{i.%totalcpu:e.%totalcpu}. 967 1.1 christos 968 1.1 christos Since the metrics can be tailored for different views, there is also a way 969 1.1 christos to reset them to the default. This is done through the special keyword 970 1.1 christos @code{default} for the metrics definition (@command{-metrics default}). 971 1.1 christos @IndexSubentry{Metrics, Reset to default} 972 1.1 christos 973 1.1 christos @c -- A new node -------------------------------------------------------------- 974 1.1 christos @node Customization of the Output 975 1.1 christos @subsection Customization of the Output 976 1.1 christos @c ---------------------------------------------------------------------------- 977 1.1 christos 978 1.1 christos With the information just given, the function overview can be customized. 979 1.1 christos For sake of the example, we would like to display the name of the function 980 1.1 christos first, only followed by the exclusive CPU time, given as an absolute number 981 1.1 christos and a percentage. 982 1.1 christos 983 1.1 christos Note that the commands are parsed in order of appearance. This is why we 984 1.1 christos need to define the metrics @emph{before} requesting the function overview: 985 1.1 christos 986 1.1 christos @cartouche 987 1.1 christos @smallexample 988 1.1 christos $ gprofng display text -metrics name:e.%totalcpu -functions test.1.er 989 1.1 christos @end smallexample 990 1.1 christos @end cartouche 991 1.1 christos 992 1.1 christos @smallexample 993 1.1 christos @verbatim 994 1.1 christos Current metrics: name:e.%totalcpu 995 1.1 christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu ) 996 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 997 1.1 christos 998 1.1 christos Name Excl. Total 999 1.1 christos CPU 1000 1.1 christos sec. % 1001 1.1 christos <Total> 9.367 100.00 1002 1.1 christos mxv_core 8.926 95.30 1003 1.1 christos init_data 0.210 2.24 1004 1.1 christos drand48 0.080 0.85 1005 1.1 christos erand48_r 0.070 0.75 1006 1.1 christos __drand48_iterate 0.060 0.64 1007 1.1 christos _int_malloc 0.010 0.11 1008 1.1 christos sysmalloc 0.010 0.11 1009 1.1 christos <static>@0x47960 (<libgp-collector.so>) 0. 0. 1010 1.1 christos __libc_start_main 0. 0. 1011 1.1 christos allocate_data 0. 0. 1012 1.1 christos driver_mxv 0. 0. 1013 1.1 christos main 0. 0. 1014 1.1 christos malloc 0. 0. 1015 1.1 christos start_thread 0. 0. 1016 1.1 christos @end verbatim 1017 1.1 christos @end smallexample 1018 1.1 christos 1019 1.1 christos This was a first and simple example how to customize the output. Note that we 1020 1.1 christos did not rerun our profiling job and merely modified the display settings. 1021 1.1 christos Below we will show other and also more advanced examples of customization. 1022 1.1 christos 1023 1.1 christos @c -- A new node -------------------------------------------------------------- 1024 1.1 christos @node Name the Experiment Directory 1025 1.1 christos @subsection Name the Experiment Directory 1026 1.1 christos @c ---------------------------------------------------------------------------- 1027 1.1 christos 1028 1.1 christos When using @CollectApp{}, the default names for experiments work fine, but 1029 1.1 christos they are quite generic. It is often more convenient to select a more 1030 1.1 christos descriptive name. For example, one that reflects conditions for the experiment 1031 1.1 christos conducted, like the number of threads used. 1032 1.1 christos 1033 1.1 christos For this, the mutually exclusive @code{-o} and @code{-O} options come in handy. 1034 1.1 christos Both may be used to provide a name for the experiment directory, but the 1035 1.1 christos behaviour of @CollectApp{} is different. 1036 1.1 christos 1037 1.1 christos With the 1038 1.1 christos @IndexSubentry{Options, @code{-o}} 1039 1.1 christos @samp{-o} 1040 1.1 christos option, an existing experiment directory is not overwritten. Any directory 1041 1.1 christos with the same name either needs to be renamed, moved, or removed, before the 1042 1.1 christos experiment can be conducted. 1043 1.1 christos 1044 1.1 christos This is in contrast with the behaviour for the 1045 1.1 christos @IndexSubentry{Options, @code{-O}} 1046 1.1 christos @samp{-O} 1047 1.1 christos option. Any existing directory with the same name is silently overwritten. 1048 1.1 christos 1049 1.1 christos Be aware that the name of the experiment directory has to end with @file{.er}. 1050 1.1 christos 1051 1.1 christos @c -- A new node -------------------------------------------------------------- 1052 1.1 christos @node Control the Number of Lines in the Output 1053 1.1 christos @subsection Control the Number of Lines in the Output 1054 1.1 christos @c ---------------------------------------------------------------------------- 1055 1.1 christos 1056 1.1 christos @IndexSubentry{Options, @code{-limit}} 1057 1.1 christos @IndexSubentry{Commands, @code{limit}} 1058 1.1.1.2 christos The @command{limit} @var{<n>} command can be used to control the number of 1059 1.1.1.2 christos lines printed in various views. For example it impacts the function view, but 1060 1.1.1.2 christos also takes effect for other display commands, like @command{lines}. 1061 1.1 christos 1062 1.1 christos The argument @var{<n>} should be a positive integer number. It sets the number 1063 1.1 christos of lines in the (function) view. A value of zero resets the limit to the 1064 1.1 christos default. 1065 1.1 christos 1066 1.1 christos Be aware that the pseudo-function @code{<Total>} counts as a regular function. 1067 1.1 christos For example @command{limit 10} displays nine user level functions. 1068 1.1 christos 1069 1.1 christos @c -- A new node -------------------------------------------------------------- 1070 1.1 christos @node Sorting the Performance Data 1071 1.1 christos @subsection Sorting the Performance Data 1072 1.1 christos @c ---------------------------------------------------------------------------- 1073 1.1 christos 1074 1.1 christos @IndexSubentry{Options, @code{-sort}} 1075 1.1 christos @IndexSubentry{Commands, @code{sort}} 1076 1.1 christos The @command{sort} @var{<key>} command sets the key to be used when sorting the 1077 1.1 christos performance data. 1078 1.1 christos 1079 1.1 christos The key is a valid metric definition, but the 1080 1.1 christos @IndexSubentry{Metrics, Visibility field} 1081 1.1 christos visibility field 1082 1.1 christos (@xref{Metric Definitions}) 1083 1.1 christos in the metric 1084 1.1 christos definition is ignored, since this does not affect the outcome of the sorting 1085 1.1 christos operation. 1086 1.1 christos For example if the sort key is set to @code{e.totalcpu}, the values 1087 1.1 christos will be sorted in descending order with respect to the exclusive total 1088 1.1 christos CPU time. 1089 1.1 christos 1090 1.1 christos @IndexSubentry{Sort, Reverse order} 1091 1.1 christos The data can be sorted in reverse order by prepending the metric definition 1092 1.1 christos with a minus (@samp{-}) sign. For example @command{sort -e.totalcpu}. 1093 1.1 christos 1094 1.1 christos @IndexSubentry{Sort, Reset to default} 1095 1.1 christos A default metric for the sort operation has been defined and since this is 1096 1.1 christos a persistent command, this default can be restored with @code{default} as 1097 1.1 christos the key (@command{sort default}). 1098 1.1 christos 1099 1.1 christos @c -- A new node -------------------------------------------------------------- 1100 1.1 christos @node Scripting 1101 1.1 christos @subsection Scripting 1102 1.1 christos @c ---------------------------------------------------------------------------- 1103 1.1 christos 1104 1.1 christos @cindex Script files 1105 1.1 christos The list with commands for @DisplayText{} can be very long. This is tedious 1106 1.1 christos and also error prone. Luckily, there is an easier and elegant way to control 1107 1.1 christos the output of this tool. 1108 1.1 christos 1109 1.1 christos @IndexSubentry{Options, @code{-script}} 1110 1.1 christos @IndexSubentry{Commands, @code{script}} 1111 1.1 christos Through the @command{script} command, the name of a file with commands can be 1112 1.1 christos passed in. These commands are parsed and executed as if they appeared on 1113 1.1 christos the command line in the same order as encountered in the file. The commands 1114 1.1 christos in this script file can actually be mixed with commands on the command line 1115 1.1 christos and multiple script files may be used. 1116 1.1 christos The difference between the commands in the script file and those used on the 1117 1.1 christos command line is that the latter require a leading dash (@samp{-}) symbol. 1118 1.1 christos 1119 1.1 christos Comment lines in a script file are supported. They need to start with the 1120 1.1 christos @samp{#} symbol. 1121 1.1 christos 1122 1.1 christos @c -- A new node -------------------------------------------------------------- 1123 1.1 christos @node A More Elaborate Example 1124 1.1 christos @subsection A More Elaborate Example 1125 1.1 christos @c ---------------------------------------------------------------------------- 1126 1.1 christos 1127 1.1 christos With the information presented so far, we can customize our data 1128 1.1 christos gathering and display commands. 1129 1.1 christos 1130 1.1 christos As an example, we would like to use @file{mxv.1.thr.er} as the name for the 1131 1.1 christos experiment directory. In this way, the name of the algorithm and the 1132 1.1 christos number of threads that were used are included in the name. 1133 1.1 christos We also don't mind to overwrite an existing 1134 1.1 christos experiment directory with the same name. 1135 1.1 christos 1136 1.1 christos All that needs to be done is to use the 1137 1.1 christos @IndexSubentry{Options, @code{-O}} 1138 1.1 christos @samp{-O} 1139 1.1 christos option, followed by the directory name of choice when running @CollectApp{}: 1140 1.1 christos 1141 1.1 christos @cartouche 1142 1.1 christos @smallexample 1143 1.1 christos $ exe=mxv-pthreads 1144 1.1 christos $ m=8000 1145 1.1 christos $ n=4000 1146 1.1 christos $ gprofng collect app -O mxv.1.thr.er ./$exe -m $m -n $n -t 1 1147 1.1 christos @end smallexample 1148 1.1 christos @end cartouche 1149 1.1 christos 1150 1.1 christos Since we want to customize the profile and prefer to keep the command line 1151 1.1 christos short, the commands to generate the profile are put into a file with the 1152 1.1 christos name @file{my-script}: 1153 1.1 christos 1154 1.1 christos @smallexample 1155 1.1 christos @verbatim 1156 1.1 christos $ cat my-script 1157 1.1 christos # This is my first gprofng script 1158 1.1 christos # Set the metrics 1159 1.1 christos metrics i.%totalcpu:e.%totalcpu:name 1160 1.1 christos # Use the exclusive time to sort 1161 1.1 christos sort e.totalcpu 1162 1.1 christos # Limit the function list to 5 lines 1163 1.1 christos limit 5 1164 1.1 christos # Show the function list 1165 1.1 christos functions 1166 1.1 christos @end verbatim 1167 1.1 christos @end smallexample 1168 1.1 christos 1169 1.1 christos This script file is specified as input to the @DisplayText{} command 1170 1.1 christos that is used to display the performance information stored in experiment 1171 1.1 christos directory @file{mxv.1.thr.er}: 1172 1.1 christos 1173 1.1 christos @cartouche 1174 1.1 christos @smallexample 1175 1.1 christos $ gprofng display text -script my-script mxv.1.thr.er 1176 1.1 christos @end smallexample 1177 1.1 christos @end cartouche 1178 1.1 christos 1179 1.1 christos This command produces the following output: 1180 1.1 christos 1181 1.1 christos @smallexample 1182 1.1 christos @verbatim 1183 1.1 christos # This is my first gprofng script 1184 1.1 christos # Set the metrics 1185 1.1 christos Current metrics: i.%totalcpu:e.%totalcpu:name 1186 1.1 christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu ) 1187 1.1 christos # Use the exclusive time to sort 1188 1.1 christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu ) 1189 1.1 christos # Limit the function list to 5 lines 1190 1.1 christos Print limit set to 5 1191 1.1 christos # Show the function list 1192 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 1193 1.1 christos 1194 1.1 christos Incl. Total Excl. Total Name 1195 1.1 christos CPU CPU 1196 1.1 christos sec. % sec. % 1197 1.1 christos 9.703 100.00 9.703 100.00 <Total> 1198 1.1 christos 9.226 95.09 9.226 95.09 mxv_core 1199 1.1 christos 0.455 4.69 0.210 2.17 init_data 1200 1.1 christos 0.169 1.75 0.123 1.26 erand48_r 1201 1.1 christos 0.244 2.52 0.075 0.77 drand48 1202 1.1 christos @end verbatim 1203 1.1 christos @end smallexample 1204 1.1 christos 1205 1.1 christos In the first part of the output the comment lines in the script file are 1206 1.1.1.2 christos echoed. These are interleaved with an acknowledgement message for the 1207 1.1.1.2 christos commands. 1208 1.1 christos 1209 1.1 christos This is followed by a profile consisting of 5 lines only. For both metrics, 1210 1.1.1.2 christos the percentages plus the timings are given. The numbers are sorted with 1211 1.1.1.2 christos respect to the exclusive total CPU time. Although this is the default, for 1212 1.1 christos demonstration purposes we use the @command{sort} command to explicitly define 1213 1.1 christos the metric for the sort. 1214 1.1 christos 1215 1.1 christos While we executed the same job as before and only changed the name of the 1216 1.1 christos experiment directory, the results are somewhat different. This is sampling 1217 1.1 christos in action. The numbers are not all that different though. 1218 1.1 christos It is seen that function @code{mxv_core} is responsbile for 1219 1.1 christos 95% of the CPU time and @code{init_data} takes 4.5% only. 1220 1.1 christos 1221 1.1 christos @c -- A new node -------------------------------------------------------------- 1222 1.1 christos @node The Call Tree 1223 1.1 christos @subsection The Call Tree 1224 1.1 christos @c ---------------------------------------------------------------------------- 1225 1.1 christos 1226 1.1 christos The call tree shows the dynamic structure of the application by displaying the 1227 1.1 christos functions executed and their parent. The CPU time attributed to each function 1228 1.1.1.2 christos is shown as well. This view helps to find the most expensive 1229 1.1 christos execution path in the program. 1230 1.1 christos 1231 1.1 christos @IndexSubentry{Options, @code{-calltree}} 1232 1.1 christos @IndexSubentry{Commands, @code{calltree}} 1233 1.1 christos This feature is enabled through the @command{calltree} command. For example, 1234 1.1 christos this is how to get the call tree for our current experiment: 1235 1.1 christos 1236 1.1 christos @cartouche 1237 1.1 christos @smallexample 1238 1.1 christos $ gprofng display text -calltree mxv.1.thr.er 1239 1.1 christos @end smallexample 1240 1.1 christos @end cartouche 1241 1.1 christos 1242 1.1 christos This displays the following structure: 1243 1.1 christos 1244 1.1 christos @smallexample 1245 1.1 christos @verbatim 1246 1.1 christos Functions Call Tree. Metric: Attributed Total CPU Time 1247 1.1 christos 1248 1.1 christos Attr. Total Name 1249 1.1 christos CPU 1250 1.1 christos sec. % 1251 1.1 christos 9.703 100.00 +-<Total> 1252 1.1 christos 9.226 95.09 +-start_thread 1253 1.1 christos 9.226 95.09 | +-<static>@0x47960 (<libgp-collector.so>) 1254 1.1 christos 9.226 95.09 | +-driver_mxv 1255 1.1 christos 9.226 95.09 | +-mxv_core 1256 1.1 christos 0.477 4.91 +-__libc_start_main 1257 1.1 christos 0.477 4.91 +-main 1258 1.1 christos 0.455 4.69 +-init_data 1259 1.1 christos 0.244 2.52 | +-drand48 1260 1.1 christos 0.169 1.75 | +-erand48_r 1261 1.1 christos 0.047 0.48 | +-__drand48_iterate 1262 1.1 christos 0.021 0.22 +-allocate_data 1263 1.1 christos 0.021 0.22 | +-malloc 1264 1.1 christos 0.021 0.22 | +-_int_malloc 1265 1.1 christos 0.006 0.06 | +-sysmalloc 1266 1.1 christos 0.003 0.03 | +-__default_morecore 1267 1.1 christos 0.003 0.03 | +-sbrk 1268 1.1 christos 0.003 0.03 | +-brk 1269 1.1 christos 0.001 0.01 +-pthread_create 1270 1.1 christos 0.001 0.01 +-__pthread_create_2_1 1271 1.1 christos @end verbatim 1272 1.1 christos @end smallexample 1273 1.1 christos 1274 1.1 christos At first sight this may not be what is expected and some explanation is in 1275 1.1 christos place. 1276 1.1 christos 1277 1.1 christos @c ---------------------------------------------------------------------------- 1278 1.1 christos @c TBD: Revise this text when we have user and machine mode. 1279 1.1 christos @c ---------------------------------------------------------------------------- 1280 1.1 christos The top function is the pseudo-function @code{<Total>} that we have seen 1281 1.1 christos before. It is introduced and shown here to provide the total value of the 1282 1.1 christos metric(s). 1283 1.1 christos 1284 1.1 christos We also see function @code{<static>@@0x47960} in the call tree and apparently 1285 1.1 christos it is from @code{libgp-collector.so}, a library that is internal to 1286 1.1 christos @ToolName{}. 1287 1.1 christos The @code{<static>} marker, followed by the program counter, is shown if the 1288 1.1 christos name of the function cannot be found. This function is part of the 1289 1.1 christos implementation of the data collection process and should be hidden to the 1290 1.1 christos user. This is part of a planned future enhancement. 1291 1.1 christos 1292 1.1 christos In general, if a view has a function that does not appear to be part of the 1293 1.1 christos user code, or seems odd anyhow, the @command{objects} and @command{fsingle} 1294 1.1 christos @IndexSubentry{Options, @code{-objects}} 1295 1.1 christos @IndexSubentry{Commands, @code{objects}} 1296 1.1 christos @IndexSubentry{Options, @code{-fsingle}} 1297 1.1 christos @IndexSubentry{Commands, @code{fsingle}} 1298 1.1 christos commands are very useful 1299 1.1 christos to find out more about load objects in general, but also to help identify 1300 1.1 christos an unknown entry in the function overview. @xref{Load Objects and Functions}. 1301 1.1 christos 1302 1.1 christos Another thing to note is that there are two main branches. The one under 1303 1.1 christos @code{<static>@@0x47960} and the second one under @code{__libc_start_main}. 1304 1.1 christos This reflects the fact that this is a multithreaded program and the 1305 1.1 christos threaded part shows up as a separate branch in the call tree. 1306 1.1 christos 1307 1.1 christos The way to interpret this structure is as follows. The program starts 1308 1.1 christos under control of @code{__libc_start_main}. This executes the main program 1309 1.1 christos called @code{main}, which at the top level executes functions 1310 1.1 christos @code{init_data}, @code{allocate_data}, and @code{pthread_create}. 1311 1.1 christos The latter function creates and executes the additional thread(s). 1312 1.1 christos 1313 1.1 christos For this multithreaded part of the code, we need to look at the branch 1314 1.1 christos under function @code{start_thread} that calls the driver code for the 1315 1.1 christos matrix-vector multiplication (@code{driver_mxv}), which executes the function 1316 1.1 christos that performs the actual multiplication (@code{mxv_core}). 1317 1.1 christos 1318 1.1 christos There are two things worth noting for the call tree feature: 1319 1.1 christos 1320 1.1 christos @itemize 1321 1.1 christos 1322 1.1 christos @item 1323 1.1 christos This is a dynamic tree and since sampling is used, it most likely looks 1324 1.1 christos slighlty different across seemingly identical profile runs. In case the 1325 1.1 christos run times are short, it is worth considering to use a high resolution 1326 1.1 christos through the 1327 1.1 christos @IndexSubentry{Options, @code{-p}} 1328 1.1 christos @samp{-p} 1329 1.1 christos option. For example use @samp{-p hi} to increase the sampling rate. 1330 1.1 christos 1331 1.1 christos @item 1332 1.1 christos In case hardware event counters have been enabled 1333 1.1 christos (@xref{Profile Hardware Event Counters}), these values are also displayed 1334 1.1 christos in the call tree view. 1335 1.1 christos 1336 1.1 christos @end itemize 1337 1.1 christos 1338 1.1 christos @c -- A new node -------------------------------------------------------------- 1339 1.1 christos @node More Information on the Experiment 1340 1.1 christos @subsection More Information on the Experiment 1341 1.1 christos @c ---------------------------------------------------------------------------- 1342 1.1 christos 1343 1.1 christos The experiment directory not only contains performance related data. Several 1344 1.1 christos system characteristics, the profiling command executed, plus some global 1345 1.1 christos performance statistics are stored and can be displayed. 1346 1.1 christos 1347 1.1 christos @IndexSubentry{Options, @code{-header}} 1348 1.1 christos @IndexSubentry{Commands, @code{header}} 1349 1.1 christos The @command{header} command displays information about the experiment(s). 1350 1.1 christos For example, this is command is used to extract this data from for our 1351 1.1 christos experiment directory: 1352 1.1 christos 1353 1.1 christos @cartouche 1354 1.1 christos @smallexample 1355 1.1 christos $ gprofng display text -header mxv.1.thr.er 1356 1.1 christos @end smallexample 1357 1.1 christos @end cartouche 1358 1.1 christos 1359 1.1 christos The above command prints the following information. Note that some of the 1360 1.1 christos lay-out and the information has been modified. Directory paths have been 1361 1.1 christos replaced @code{<apath>} for example. Textual changes are 1362 1.1 christos marked with the @samp{<} and @samp{>} symbols. 1363 1.1 christos 1364 1.1 christos @smallexample 1365 1.1 christos @verbatim 1366 1.1 christos Experiment: mxv.1.thr.er 1367 1.1 christos No errors 1368 1.1 christos No warnings 1369 1.1.1.2 christos Archive command ` /usr/bin/gprofng-archive -n -a on --outfile 1370 1.1 christos <apath>/archive.log <apath>/mxv.1.thr.er' 1371 1.1 christos 1372 1.1 christos Target command (64-bit): './mxv-pthreads -m 8000 -n 4000 -t 1' 1373 1.1 christos Process pid 2750071, ppid 2750069, pgrp 2749860, sid 2742080 1374 1.1 christos Current working directory: <apath> 1375 1.1 christos Collector version: `2.40.00'; experiment version 12.4 (64-bit) 1376 1.1 christos Host `<the-host-name>', OS `Linux <version>', page size 4096, 1377 1.1 christos architecture `x86_64' 1378 1.1 christos 4 CPUs, clock speed 2294 MHz. 1379 1.1 christos Memory: 3506491 pages @ 4096 = 13697 MB. 1380 1.1 christos Data collection parameters: 1381 1.1 christos Clock-profiling, interval = 997 microsecs. 1382 1.1 christos Periodic sampling, 1 secs. 1383 1.1 christos Follow descendant processes from: fork|exec|combo 1384 1.1 christos 1385 1.1 christos Experiment started <date and time> 1386 1.1 christos 1387 1.1 christos Experiment Ended: 9.801216173 1388 1.1 christos Data Collection Duration: 9.801216173 1389 1.1 christos @end verbatim 1390 1.1 christos @end smallexample 1391 1.1 christos 1392 1.1 christos The output above may assist in troubleshooting, or to verify some of the 1393 1.1 christos operational conditions and we recommend to include this command when 1394 1.1 christos generating a profile. 1395 1.1 christos 1396 1.1 christos @IndexSubentry{Options, @code{-C}} 1397 1.1 christos Related to this command there is a useful option to record comment(s) in 1398 1.1 christos an experiment. 1399 1.1 christos To this end, use the @samp{-C} option on the @CollectApp{} tool to 1400 1.1 christos specify a comment string. Up to ten comment lines can be included. 1401 1.1 christos These comments are displayed with the @command{header} command on 1402 1.1 christos the @DisplayText{} tool. 1403 1.1 christos 1404 1.1 christos @IndexSubentry{Options, @code{-overview}} 1405 1.1 christos @IndexSubentry{Commands, @code{overview}} 1406 1.1 christos The @command{overview} command displays information on the experiment(s) and 1407 1.1 christos also shows a summary of the values for the metric(s) used. This is an example 1408 1.1 christos how to use it on the newly created experiment directory: 1409 1.1 christos 1410 1.1 christos @cartouche 1411 1.1 christos @smallexample 1412 1.1 christos $ gprofng display text -overview mxv.1.thr.er 1413 1.1 christos @end smallexample 1414 1.1 christos @end cartouche 1415 1.1 christos 1416 1.1 christos @smallexample 1417 1.1 christos @verbatim 1418 1.1 christos Experiment(s): 1419 1.1 christos 1420 1.1 christos Experiment :mxv.1.thr.er 1421 1.1 christos Target : './mxv-pthreads -m 8000 -n 4000 -t 1' 1422 1.1 christos Host : <hostname> (<ISA>, Linux <version>) 1423 1.1 christos Start Time : <date and time> 1424 1.1 christos Duration : 9.801 Seconds 1425 1.1 christos 1426 1.1 christos Metrics: 1427 1.1 christos 1428 1.1 christos Experiment Duration (Seconds): [9.801] 1429 1.1 christos Clock Profiling 1430 1.1 christos [X]Total CPU Time - totalcpu (Seconds): [*9.703] 1431 1.1 christos 1432 1.1 christos Notes: '*' indicates hot metrics, '[X]' indicates currently enabled 1433 1.1 christos metrics. 1434 1.1 christos The metrics command can be used to change selections. The 1435 1.1 christos metric_list command lists all available metrics. 1436 1.1 christos @end verbatim 1437 1.1 christos @end smallexample 1438 1.1 christos 1439 1.1 christos This command provides a dashboard overview that helps to easily identify 1440 1.1 christos where the time is spent and in case hardware event counters are used, it 1441 1.1 christos shows their total values. 1442 1.1 christos 1443 1.1 christos @c -- A new node -------------------------------------------------------------- 1444 1.1 christos @node Control the Sampling Frequency 1445 1.1 christos @subsection Control the Sampling Frequency 1446 1.1 christos @c ---------------------------------------------------------------------------- 1447 1.1 christos 1448 1.1 christos @cindex Sampling frequency 1449 1.1 christos So far we did not go into details on the frequency of the sampling process, 1450 1.1 christos but in some cases it is useful to change the default of 10 milliseconds. 1451 1.1 christos 1452 1.1 christos The advantage of increasing the sampling frequency is that functions that 1453 1.1 christos do not take much time per invocation are more accurately captured. The 1454 1.1 christos downside is that more data is gathered. This has an impact on the overhead 1455 1.1 christos of the collection process and more disk space is required. 1456 1.1 christos 1457 1.1 christos In general this is not an immediate concern, but with heavily threaded 1458 1.1 christos applications that run for an extended period of time, increasing the 1459 1.1 christos frequency may have a more noticeable impact. 1460 1.1 christos 1461 1.1 christos @IndexSubentry{Options, @code{-p}} 1462 1.1 christos The @code{-p} option on the @CollectApp{} tool is used to enable or disable 1463 1.1 christos clock based profiling, or to explicitly set the sampling rate. 1464 1.1 christos @cindex Sampling interval 1465 1.1 christos This option takes one of the following keywords: 1466 1.1 christos 1467 1.1 christos @table @code 1468 1.1 christos 1469 1.1 christos @item off 1470 1.1 christos Disable clock based profiling. 1471 1.1 christos 1472 1.1 christos @item on 1473 1.1 christos Enable clock based profiling with a per thread sampling interval of 10 ms. 1474 1.1 christos This is the default. 1475 1.1 christos 1476 1.1 christos @item lo 1477 1.1 christos Enable clock based profiling with a per thread sampling interval of 100 ms. 1478 1.1 christos 1479 1.1 christos @item hi 1480 1.1 christos Enable clock based profiling with a per thread sampling interval of 1 ms. 1481 1.1 christos 1482 1.1 christos @item @var{value} 1483 1.1 christos @cindex Sampling interval 1484 1.1 christos Enable clock based profiling with a per thread sampling interval of 1485 1.1 christos @var{value}. 1486 1.1 christos 1487 1.1 christos @end table 1488 1.1 christos 1489 1.1 christos It may seem unnecessary to have an option to disable clock based profiling, 1490 1.1 christos but there is a good reason to support this. 1491 1.1 christos By default, clock profiling is enabled when conducting hardware event counter 1492 1.1 christos experiments (@xref{Profile Hardware Event Counters}). 1493 1.1 christos With the @code{-p off} option, this can be disabled. 1494 1.1 christos 1495 1.1.1.2 christos If an explicit value is set for the sampling, the number can be an integer or 1496 1.1.1.2 christos a floating-point number. 1497 1.1.1.2 christos A suffix of @samp{u} for microseconds, or @samp{m} for milliseconds is 1498 1.1.1.2 christos supported. If no suffix is used, the value is assumed to be in milliseconds. 1499 1.1 christos For example, the following command sets the sampling rate to 1500 1.1 christos 5123.4 microseconds: 1501 1.1 christos 1502 1.1 christos @cartouche 1503 1.1 christos @smallexample 1504 1.1 christos $ gprofng collect app -p 5123.4u ./mxv-pthreads -m 8000 -n 4000 -t 1 1505 1.1 christos @end smallexample 1506 1.1 christos @end cartouche 1507 1.1 christos 1508 1.1.1.2 christos If the value is smaller than the clock profiling minimum, a warning message 1509 1.1.1.2 christos is issued and it is set to the minimum. 1510 1.1.1.2 christos In case it is not a multiple of the clock profiling resolution, it is 1511 1.1.1.2 christos silently rounded down to the nearest multiple of the clock resolution. 1512 1.1.1.2 christos If the value exceeds the clock profiling maximum, is negative, or zero, an 1513 1.1.1.2 christos error is reported. 1514 1.1 christos 1515 1.1 christos @IndexSubentry{Options, @code{-header}} 1516 1.1 christos @IndexSubentry{Commands, @code{header}} 1517 1.1 christos @noindent 1518 1.1 christos Note that the @code{header} command echoes the sampling rate used. 1519 1.1 christos 1520 1.1 christos @c -- A new node -------------------------------------------------------------- 1521 1.1 christos @node Information on Load Objects 1522 1.1 christos @subsection Information on Load Objects 1523 1.1 christos @c ---------------------------------------------------------------------------- 1524 1.1 christos 1525 1.1 christos It may happen that the function view shows a function that is not known to 1526 1.1 christos the user. This can easily happen with library functions for example. 1527 1.1 christos Luckily there are three commands that come in handy then. 1528 1.1 christos 1529 1.1 christos @IndexSubentry{Options, @code{-objects}} 1530 1.1 christos @IndexSubentry{Commands, @code{objects}} 1531 1.1 christos @IndexSubentry{Options, @code{-fsingle}} 1532 1.1 christos @IndexSubentry{Commands, @code{fsingle}} 1533 1.1 christos @IndexSubentry{Options, @code{-fsummary}} 1534 1.1 christos @IndexSubentry{Commands, @code{fsummary}} 1535 1.1.1.2 christos These commands are @command{objects}, @command{fsingle}, and 1536 1.1.1.2 christos @command{fsummary}. They provide details on 1537 1.1 christos @cindex Load objects 1538 1.1 christos load objects (@xref{Load Objects and Functions}). 1539 1.1 christos 1540 1.1 christos The @command{objects} command lists all load objects that have been referenced 1541 1.1 christos during the performance experiment. 1542 1.1 christos Below we show the command and the result for our profile job. Like before, 1543 1.1 christos some path names in the output have been shortened and replaced by the 1544 1.1 christos @IndexSubentry{Miscellaneous, @code{<apath>}} 1545 1.1 christos @code{<apath>} symbol that represents an absolute directory path. 1546 1.1 christos 1547 1.1 christos @cartouche 1548 1.1 christos @smallexample 1549 1.1 christos $ gprofng display text -objects mxv.1.thr.er 1550 1.1 christos @end smallexample 1551 1.1 christos @end cartouche 1552 1.1 christos 1553 1.1 christos The output includes the name and path of the target executable: 1554 1.1 christos 1555 1.1 christos @smallexample 1556 1.1 christos @verbatim 1557 1.1 christos <Unknown> (<Unknown>) 1558 1.1 christos <mxv-pthreads> (<apath>/mxv-pthreads) 1559 1.1 christos <libdl-2.28.so> (/usr/lib64/libdl-2.28.so) 1560 1.1 christos <librt-2.28.so> (/usr/lib64/librt-2.28.so) 1561 1.1 christos <libc-2.28.so> (/usr/lib64/libc-2.28.so) 1562 1.1 christos <libpthread-2.28.so> (/usr/lib64/libpthread-2.28.so) 1563 1.1 christos <libm-2.28.so> (/usr/lib64/libm-2.28.so) 1564 1.1 christos <libgp-collector.so> (/usr/lib64/gprofng/libgp-collector.so) 1565 1.1 christos <ld-2.28.so> (/usr/lib64/ld-2.28.so) 1566 1.1 christos <DYNAMIC_FUNCTIONS> (DYNAMIC_FUNCTIONS) 1567 1.1 christos @end verbatim 1568 1.1 christos @end smallexample 1569 1.1 christos 1570 1.1 christos @IndexSubentry{Options, @code{-fsingle}} 1571 1.1 christos @IndexSubentry{Commands, @code{fsingle}} 1572 1.1.1.2 christos The @command{fsingle} command may be used to get more details on a specific 1573 1.1.1.2 christos entry in the function view, say. For example, the command below provides 1574 1.1.1.2 christos additional information on the @code{pthread_create} function shown in the 1575 1.1.1.2 christos function overview. 1576 1.1 christos 1577 1.1 christos @cartouche 1578 1.1 christos @smallexample 1579 1.1 christos $ gprofng display text -fsingle pthread_create mxv.1.thr.er 1580 1.1 christos @end smallexample 1581 1.1 christos @end cartouche 1582 1.1 christos 1583 1.1.1.2 christos Below the output from this command. It has been somewhat modified to match 1584 1.1.1.2 christos the display requirements. 1585 1.1 christos 1586 1.1 christos @smallexample 1587 1.1 christos @verbatim 1588 1.1 christos + gprofng display text -fsingle pthread_create mxv.1.thr.er 1589 1.1 christos pthread_create 1590 1.1 christos Exclusive Total CPU Time: 0. ( 0. %) 1591 1.1 christos Inclusive Total CPU Time: 0.001 ( 0.0%) 1592 1.1 christos Size: 258 1593 1.1 christos PC Address: 8:0x00049f60 1594 1.1 christos Source File: (unknown) 1595 1.1 christos Object File: (unknown) 1596 1.1 christos Load Object: /usr/lib64/gprofng/libgp-collector.so 1597 1.1 christos Mangled Name: 1598 1.1 christos Aliases: 1599 1.1 christos @end verbatim 1600 1.1 christos @end smallexample 1601 1.1 christos 1602 1.1 christos In this table we not only see how much time was spent in this function, we 1603 1.1 christos also see where it originates from. In addition to this, the size and start 1604 1.1 christos address are given as well. If the source code location is known it is also 1605 1.1 christos shown here. 1606 1.1 christos 1607 1.1 christos @IndexSubentry{Options, @code{-fsummary}} 1608 1.1 christos @IndexSubentry{Commands, @code{fsummary}} 1609 1.1 christos The related @code{fsummary} command displays the same information as 1610 1.1 christos @code{fsingle}, but for all functions in the function overview, 1611 1.1 christos including @code{<Total>}: 1612 1.1 christos 1613 1.1 christos @cartouche 1614 1.1 christos @smallexample 1615 1.1 christos $ gprofng display text -fsummary mxv.1.thr.er 1616 1.1 christos @end smallexample 1617 1.1 christos @end cartouche 1618 1.1 christos 1619 1.1 christos @smallexample 1620 1.1 christos @verbatim 1621 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 1622 1.1 christos 1623 1.1 christos <Total> 1624 1.1.1.2 christos Exclusive Total CPU Time: 9.703 (100.0%) 1625 1.1.1.2 christos Inclusive Total CPU Time: 9.703 (100.0%) 1626 1.1.1.2 christos Size: 0 1627 1.1.1.2 christos PC Address: 1:0x00000000 1628 1.1.1.2 christos Source File: (unknown) 1629 1.1.1.2 christos Object File: (unknown) 1630 1.1.1.2 christos Load Object: <Total> 1631 1.1.1.2 christos Mangled Name: 1632 1.1.1.2 christos Aliases: 1633 1.1 christos 1634 1.1 christos mxv_core 1635 1.1.1.2 christos Exclusive Total CPU Time: 9.226 ( 95.1%) 1636 1.1.1.2 christos Inclusive Total CPU Time: 9.226 ( 95.1%) 1637 1.1.1.2 christos Size: 80 1638 1.1.1.2 christos PC Address: 2:0x00001d56 1639 1.1.1.2 christos Source File: <apath>/src/mxv.c 1640 1.1.1.2 christos Object File: mxv.1.thr.er/archives/mxv-pthreads_ss_pf53V__5 1641 1.1.1.2 christos Load Object: <apath>/mxv-pthreads 1642 1.1.1.2 christos Mangled Name: 1643 1.1.1.2 christos Aliases: 1644 1.1 christos 1645 1.1 christos ... etc ... 1646 1.1 christos @end verbatim 1647 1.1 christos @end smallexample 1648 1.1 christos 1649 1.1 christos @c -- A new node -------------------------------------------------------------- 1650 1.1 christos @node Support for Multithreading 1651 1.1 christos @section Support for Multithreading 1652 1.1 christos @c ---------------------------------------------------------------------------- 1653 1.1 christos 1654 1.1 christos In this chapter the support for multithreading is introduced and discussed. 1655 1.1 christos As is shown below, nothing needs to be changed when collecting the performance 1656 1.1 christos data. 1657 1.1 christos 1658 1.1 christos The difference is that additional commands are available to get more 1659 1.1 christos information on the multithreading details, plus that several filters allow 1660 1.1 christos the user to zoom in on specific threads. 1661 1.1 christos 1662 1.1 christos @c -- A new node -------------------------------------------------------------- 1663 1.1 christos @node Creating a Multithreading Experiment 1664 1.1 christos @subsection Creating a Multithreading Experiment 1665 1.1 christos @c ---------------------------------------------------------------------------- 1666 1.1 christos 1667 1.1 christos We demonstrate the support for multithreading using the same code and settings 1668 1.1 christos as before, but this time 2 threads are used: 1669 1.1 christos 1670 1.1 christos @cartouche 1671 1.1 christos @smallexample 1672 1.1 christos $ exe=mxv-pthreads 1673 1.1 christos $ m=8000 1674 1.1 christos $ n=4000 1675 1.1 christos $ gprofng collect app -O mxv.2.thr.er ./$exe -m $m -n $n -t 2 1676 1.1 christos @end smallexample 1677 1.1 christos @end cartouche 1678 1.1 christos 1679 1.1 christos First of all, in as far as @ProductName{} is concerned, no changes are needed. 1680 1.1.1.2 christos Nothing special is needed to profile a multithreaded job when using 1681 1.1.1.2 christos @ToolName{}. 1682 1.1 christos 1683 1.1 christos The same is true when displaying the performance results. The same commands 1684 1.1.1.2 christos that were used before work unmodified. For example, this is all that is 1685 1.1.1.2 christos needed to get a function overview: 1686 1.1 christos 1687 1.1 christos @cartouche 1688 1.1 christos @smallexample 1689 1.1 christos $ gprofng display text -limit 5 -functions mxv.2.thr.er 1690 1.1 christos @end smallexample 1691 1.1 christos @end cartouche 1692 1.1 christos 1693 1.1.1.2 christos @noindent 1694 1.1 christos This produces the following familiar looking output: 1695 1.1 christos 1696 1.1 christos @smallexample 1697 1.1 christos @verbatim 1698 1.1 christos Print limit set to 5 1699 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 1700 1.1 christos 1701 1.1 christos Excl. Total Incl. Total Name 1702 1.1 christos CPU CPU 1703 1.1 christos sec. % sec. % 1704 1.1 christos 9.464 100.00 9.464 100.00 <Total> 1705 1.1 christos 8.961 94.69 8.961 94.69 mxv_core 1706 1.1 christos 0.224 2.37 0.469 4.95 init_data 1707 1.1 christos 0.105 1.11 0.177 1.88 erand48_r 1708 1.1 christos 0.073 0.77 0.073 0.77 __drand48_iterate 1709 1.1 christos @end verbatim 1710 1.1 christos @end smallexample 1711 1.1 christos 1712 1.1 christos @c -- A new node -------------------------------------------------------------- 1713 1.1 christos @node Commands Specific to Multithreading 1714 1.1 christos @subsection Commands Specific to Multithreading 1715 1.1 christos @c ---------------------------------------------------------------------------- 1716 1.1 christos 1717 1.1 christos The function overview shown above shows the results aggregated over all the 1718 1.1 christos threads. The interesting new element is that we can also look at the 1719 1.1 christos performance data for the individual threads. 1720 1.1 christos 1721 1.1 christos @IndexSubentry{Options, @code{-thread_list}} 1722 1.1 christos @IndexSubentry{Commands, @code{thread_list}} 1723 1.1 christos The @command{thread_list} command displays how many threads have been used: 1724 1.1 christos 1725 1.1 christos @cartouche 1726 1.1 christos @smallexample 1727 1.1 christos $ gprofng display text -thread_list mxv.2.thr.er 1728 1.1 christos @end smallexample 1729 1.1 christos @end cartouche 1730 1.1 christos 1731 1.1 christos This produces the following output, showing that three threads have 1732 1.1 christos been used: 1733 1.1 christos 1734 1.1 christos @smallexample 1735 1.1 christos @verbatim 1736 1.1 christos Exp Sel Total 1737 1.1 christos === === ===== 1738 1.1 christos 1 all 3 1739 1.1 christos @end verbatim 1740 1.1 christos @end smallexample 1741 1.1 christos 1742 1.1 christos The output confirms there is one experiment and that by default all 1743 1.1 christos threads are selected. 1744 1.1 christos 1745 1.1 christos It may seem surprising to see three threads here, since we used the 1746 1.1 christos @code{-t 2} option, but it is common for a Pthreads program to use one 1747 1.1 christos additional thread. 1748 1.1 christos Typically, there is one main thread that runs from start to finish. 1749 1.1 christos It handles the sequential portions of the code, as well as thread 1750 1.1 christos management related tasks. 1751 1.1 christos It is no different in the example code. At some point, the main thread 1752 1.1 christos creates and activates the two threads that perform the multiplication 1753 1.1 christos of the matrix with the vector. Upon completion of this computation, 1754 1.1 christos the main thread continues. 1755 1.1 christos 1756 1.1 christos @IndexSubentry{Options, @code{-threads}} 1757 1.1 christos @IndexSubentry{Commands, @code{threads}} 1758 1.1 christos The @command{threads} command is simple, yet very powerful. It shows the 1759 1.1 christos total value of the metrics for each thread. 1760 1.1 christos 1761 1.1 christos @cartouche 1762 1.1 christos @smallexample 1763 1.1 christos $ gprofng display text -threads mxv.2.thr.er 1764 1.1 christos @end smallexample 1765 1.1 christos @end cartouche 1766 1.1 christos 1767 1.1 christos @noindent 1768 1.1 christos The command above produces the following overview: 1769 1.1 christos 1770 1.1 christos @smallexample 1771 1.1 christos @verbatim 1772 1.1 christos Objects sorted by metric: Exclusive Total CPU Time 1773 1.1 christos 1774 1.1 christos Excl. Total Name 1775 1.1 christos CPU 1776 1.1 christos sec. % 1777 1.1 christos 9.464 100.00 <Total> 1778 1.1 christos 4.547 48.05 Process 1, Thread 3 1779 1.1 christos 4.414 46.64 Process 1, Thread 2 1780 1.1 christos 0.502 5.31 Process 1, Thread 1 1781 1.1 christos @end verbatim 1782 1.1 christos @end smallexample 1783 1.1 christos 1784 1.1 christos The first line gives the total CPU time accumulated over the threads 1785 1.1 christos selected. This is followed by the metric value(s) for each thread. 1786 1.1 christos 1787 1.1 christos From this it is clear that the main thread is responsible for a 1788 1.1 christos little over 5% of the total CPU time, while the other two threads 1789 1.1 christos take 47-48% each. 1790 1.1 christos 1791 1.1 christos This view is ideally suited to verify if there are any load balancing 1792 1.1 christos issues and also to find the most time consuming thread(s). 1793 1.1 christos 1794 1.1 christos @IndexSubentry{Filters, Thread selection} 1795 1.1 christos While useful, often more information than this is needed. This is 1796 1.1 christos @IndexSubentry{Options, @code{-thread_select}} 1797 1.1 christos @IndexSubentry{Commands, @code{thread_select}} 1798 1.1 christos where the thread selection filter comes in. Through the 1799 1.1 christos @command{thread_select} 1800 1.1 christos command, one or more threads may be selected. 1801 1.1 christos @xref{The Selection List} how to define the selection list. 1802 1.1 christos 1803 1.1 christos Since it is most common to use this command in a script, we do so as 1804 1.1 christos well here. Below the script we are using: 1805 1.1 christos 1806 1.1 christos @cartouche 1807 1.1 christos @smallexample 1808 1.1 christos # Define the metrics 1809 1.1 christos metrics e.%totalcpu 1810 1.1 christos # Limit the output to 5 lines 1811 1.1 christos limit 5 1812 1.1 christos # Get the function overview for thread 1 1813 1.1 christos thread_select 1 1814 1.1 christos functions 1815 1.1 christos # Get the function overview for thread 2 1816 1.1 christos thread_select 2 1817 1.1 christos functions 1818 1.1 christos # Get the function overview for thread 3 1819 1.1 christos thread_select 3 1820 1.1 christos functions 1821 1.1 christos @end smallexample 1822 1.1 christos @end cartouche 1823 1.1 christos 1824 1.1 christos The definition of the metrics and the output limit have been shown and 1825 1.1 christos explained earlier. The new command to focus on is @command{thread_select}. 1826 1.1 christos 1827 1.1 christos This command takes a list (@xref{The Selection List}) to select specific 1828 1.1 christos threads. In this case, the individual thread numbers that were 1829 1.1 christos obtained earlier with the @command{thread_list} command are selected. 1830 1.1 christos 1831 1.1 christos This restricts the output of the @command{functions} command to the thread 1832 1.1 christos number(s) specified. This means that the script above shows which 1833 1.1 christos function(s) each thread executes and how much CPU time they consumed. 1834 1.1 christos Both the exclusive timings and their percentages are given. 1835 1.1 christos 1836 1.1.1.2 christos Note that technically this command is a filter and persistent. The 1837 1.1 christos selection remains active until changed through another thread selection 1838 1.1 christos command, or when it is reset with the @samp{all} selection list. 1839 1.1 christos 1840 1.1 christos @noindent 1841 1.1 christos This is the relevant part of the output for the first thread: 1842 1.1 christos 1843 1.1 christos @smallexample 1844 1.1 christos @verbatim 1845 1.1 christos Exp Sel Total 1846 1.1 christos === === ===== 1847 1.1 christos 1 1 3 1848 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 1849 1.1 christos 1850 1.1 christos Excl. Total Name 1851 1.1 christos CPU 1852 1.1 christos sec. % 1853 1.1 christos 0.502 100.00 <Total> 1854 1.1 christos 0.224 44.64 init_data 1855 1.1 christos 0.105 20.83 erand48_r 1856 1.1 christos 0.073 14.48 __drand48_iterate 1857 1.1 christos 0.067 13.29 drand48 1858 1.1 christos @end verbatim 1859 1.1 christos @end smallexample 1860 1.1 christos 1861 1.1 christos As usual, the comment lines are echoed. This is followed by a confirmation 1862 1.1 christos of the selection. The first table shows that one experiment is loaded and 1863 1.1 christos that thread 1 out of the three threads has been selected. What is 1864 1.1 christos displayed next is the function overview for this particular thread. Due to 1865 1.1 christos the @code{limit 5} command, there are only five functions in this list. 1866 1.1 christos 1867 1.1 christos Clearly, this thread handles the data initialization part and as we know 1868 1.1 christos from the call tree output, function @code{init_data} executes the 3 other 1869 1.1 christos functions shown in this profile. 1870 1.1 christos 1871 1.1 christos Below are the overviews for threads 2 and 3 respectively. It is seen that all 1872 1.1 christos of the CPU time is spent in function @code{mxv_core} and that this time 1873 1.1 christos is approximately the same for both threads. 1874 1.1 christos 1875 1.1 christos @smallexample 1876 1.1 christos @verbatim 1877 1.1 christos # Get the function overview for thread 2 1878 1.1 christos Exp Sel Total 1879 1.1 christos === === ===== 1880 1.1 christos 1 2 3 1881 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 1882 1.1 christos 1883 1.1 christos Excl. Total Name 1884 1.1 christos CPU 1885 1.1 christos sec. % 1886 1.1 christos 4.414 100.00 <Total> 1887 1.1 christos 4.414 100.00 mxv_core 1888 1.1 christos 0. 0. <static>@0x48630 (<libgp-collector.so>) 1889 1.1 christos 0. 0. driver_mxv 1890 1.1 christos 0. 0. start_thread 1891 1.1 christos 1892 1.1 christos # Get the function overview for thread 3 1893 1.1 christos Exp Sel Total 1894 1.1 christos === === ===== 1895 1.1 christos 1 3 3 1896 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 1897 1.1 christos 1898 1.1 christos Excl. Total Name 1899 1.1 christos CPU 1900 1.1 christos sec. % 1901 1.1 christos 4.547 100.00 <Total> 1902 1.1 christos 4.547 100.00 mxv_core 1903 1.1 christos 0. 0. <static>@0x48630 (<libgp-collector.so>) 1904 1.1 christos 0. 0. driver_mxv 1905 1.1 christos 0. 0. start_thread 1906 1.1 christos @end verbatim 1907 1.1 christos @end smallexample 1908 1.1 christos 1909 1.1 christos When analyzing the performance of a multithreaded application, it is sometimes 1910 1.1 christos useful to know whether threads have mostly executed on the same core, say, or 1911 1.1.1.2 christos if they have wandered across multiple cores. This sort of stickiness is 1912 1.1.1.2 christos usually referred to as 1913 1.1 christos @cindex Thread affinity 1914 1.1 christos @emph{thread affinity}. 1915 1.1 christos 1916 1.1 christos Similar to the commands for the threads, there are several commands related 1917 1.1 christos to the usage of the cores, or @emph{CPUs} as they are called in @ToolName{} 1918 1.1 christos (@xref{The Concept of a CPU in gprofng}). 1919 1.1 christos 1920 1.1 christos @IndexSubentry{Options, @code{-cpu_list}} 1921 1.1 christos @IndexSubentry{Commands, @code{cpu_list}} 1922 1.1 christos Similar to the @command{thread_list} command, the @command{cpu_list} command 1923 1.1 christos displays how many CPUs have been used. 1924 1.1 christos @IndexSubentry{Options, @code{-cpus}} 1925 1.1 christos @IndexSubentry{Commands, @code{cpus}} 1926 1.1 christos The equivalent of the @command{threads} threads command, is the @command{cpus} 1927 1.1.1.2 christos command, which shows the numbers of the CPUs that were used and the metric 1928 1.1.1.2 christos values for each one of them. Both commands are demonstrated below. 1929 1.1 christos 1930 1.1 christos @cartouche 1931 1.1 christos @smallexample 1932 1.1 christos $ gprofng display text -cpu_list -cpus mxv.2.thr.er 1933 1.1 christos @end smallexample 1934 1.1 christos @end cartouche 1935 1.1 christos 1936 1.1 christos @noindent 1937 1.1 christos This command produces the following output: 1938 1.1 christos 1939 1.1 christos @smallexample 1940 1.1 christos @verbatim 1941 1.1 christos + gprofng display text -cpu_list -cpus mxv.2.thr.er 1942 1.1 christos Exp Sel Total 1943 1.1 christos === === ===== 1944 1.1 christos 1 all 4 1945 1.1 christos Objects sorted by metric: Exclusive Total CPU Time 1946 1.1 christos 1947 1.1 christos Excl. Total Name 1948 1.1 christos CPU 1949 1.1 christos sec. % 1950 1.1 christos 9.464 100.00 <Total> 1951 1.1 christos 4.414 46.64 CPU 2 1952 1.1 christos 2.696 28.49 CPU 0 1953 1.1 christos 1.851 19.56 CPU 1 1954 1.1 christos 0.502 5.31 CPU 3 1955 1.1 christos @end verbatim 1956 1.1 christos @end smallexample 1957 1.1 christos 1958 1.1 christos The first table shows that there is only one experiment and that all of the 1959 1.1 christos four CPUs have been selected. The second table shows the exclusive metrics 1960 1.1 christos for each of the CPUs that have been used. 1961 1.1 christos 1962 1.1.1.2 christos As also echoed in the output, the data is sorted with respect to the 1963 1.1 christos exclusive CPU time, but it is very easy to sort the data by the CPU id 1964 1.1 christos @IndexSubentry{Options, -sort} 1965 1.1 christos @IndexSubentry{Commands, sort} 1966 1.1 christos by using the @command{sort} command: 1967 1.1 christos 1968 1.1 christos @cartouche 1969 1.1 christos @smallexample 1970 1.1 christos $ gprofng display text -cpu_list -sort name -cpus mxv.2.thr.er 1971 1.1 christos @end smallexample 1972 1.1 christos @end cartouche 1973 1.1 christos 1974 1.1 christos @noindent 1975 1.1 christos With the @command{sort} added, the output is as follows: 1976 1.1 christos 1977 1.1 christos @smallexample 1978 1.1 christos @verbatim 1979 1.1 christos Exp Sel Total 1980 1.1 christos === === ===== 1981 1.1 christos 1 all 4 1982 1.1 christos Current Sort Metric: Name ( name ) 1983 1.1 christos Objects sorted by metric: Name 1984 1.1 christos 1985 1.1 christos Excl. Total Name 1986 1.1 christos CPU 1987 1.1 christos sec. % 1988 1.1 christos 9.464 100.00 <Total> 1989 1.1 christos 2.696 28.49 CPU 0 1990 1.1 christos 1.851 19.56 CPU 1 1991 1.1 christos 4.414 46.64 CPU 2 1992 1.1 christos 0.502 5.31 CPU 3 1993 1.1 christos @end verbatim 1994 1.1 christos @end smallexample 1995 1.1 christos 1996 1.1 christos While the table with thread times shown earlier may point at a load imbalance 1997 1.1 christos in the application, this overview has a different purpose. 1998 1.1 christos 1999 1.1 christos For example, we see that 4 CPUs have been used, but we know that the 2000 1.1 christos application uses 3 threads only. 2001 1.1 christos We will now demonstrate how filters can be used to help answer the 2002 1.1 christos question why 4 CPUs are used, while the application has 3 threads only. 2003 1.1 christos This means that at least one thread has executed on more than one CPU. 2004 1.1 christos 2005 1.1 christos Recall the thread level timings: 2006 1.1 christos 2007 1.1 christos @smallexample 2008 1.1 christos @verbatim 2009 1.1 christos Excl. Total Name 2010 1.1 christos CPU 2011 1.1 christos sec. % 2012 1.1 christos 9.464 100.00 <Total> 2013 1.1 christos 4.547 48.05 Process 1, Thread 3 2014 1.1 christos 4.414 46.64 Process 1, Thread 2 2015 1.1 christos 0.502 5.31 Process 1, Thread 1 2016 1.1 christos @end verbatim 2017 1.1 christos @end smallexample 2018 1.1 christos 2019 1.1 christos Compared to the CPU timings above, it seems very likely that thread 3 has 2020 1.1 christos used more than one CPU, because the thread and CPU timings are the same 2021 1.1 christos for both other threads. 2022 1.1 christos 2023 1.1 christos The command below selects thread number 3 and then requests the CPU 2024 1.1 christos utilization for this thread: 2025 1.1 christos 2026 1.1 christos @cartouche 2027 1.1 christos @smallexample 2028 1.1 christos $ gprofng display text -thread_select 3 -sort name -cpus mxv.2.thr.er 2029 1.1 christos @end smallexample 2030 1.1 christos @end cartouche 2031 1.1 christos 2032 1.1 christos The output shown below confirms that thread 3 is selected and then displays 2033 1.1 christos the CPU(s) that have been used by this thread: 2034 1.1 christos 2035 1.1 christos @smallexample 2036 1.1 christos @verbatim 2037 1.1 christos Exp Sel Total 2038 1.1 christos === === ===== 2039 1.1 christos 1 3 3 2040 1.1 christos 2041 1.1 christos Objects sorted by metric: Exclusive Total CPU Time 2042 1.1 christos 2043 1.1 christos Excl. Total Name 2044 1.1 christos CPU 2045 1.1 christos sec. % 2046 1.1 christos 4.547 100.00 <Total> 2047 1.1 christos 2.696 59.29 CPU 0 2048 1.1 christos 1.851 40.71 CPU 1 2049 1.1 christos @end verbatim 2050 1.1 christos @end smallexample 2051 1.1 christos 2052 1.1 christos The results show that this thread has used CPU 0 nearly 60% of the time 2053 1.1 christos and CPU 1 for the remaining 40%. 2054 1.1 christos 2055 1.1 christos To confirm that this is the only thread that has used more than one CPU, the 2056 1.1 christos same approach can be used for threads 1 and 2: 2057 1.1 christos 2058 1.1 christos @smallexample 2059 1.1 christos @verbatim 2060 1.1 christos $ gprofng display text -thread_select 1 -cpus mxv.2.thr.er 2061 1.1 christos Exp Sel Total 2062 1.1 christos === === ===== 2063 1.1 christos 1 1 3 2064 1.1 christos Objects sorted by metric: Exclusive Total CPU Time 2065 1.1 christos 2066 1.1 christos Excl. Total Name 2067 1.1 christos CPU 2068 1.1 christos sec. % 2069 1.1 christos 0.502 100.00 <Total> 2070 1.1 christos 0.502 100.00 CPU 3 2071 1.1 christos @end verbatim 2072 1.1 christos @end smallexample 2073 1.1 christos 2074 1.1 christos @smallexample 2075 1.1 christos @verbatim 2076 1.1 christos $ gprofng display text -thread_select 2 -cpus mxv.2.thr.er 2077 1.1 christos Exp Sel Total 2078 1.1 christos === === ===== 2079 1.1 christos 1 2 3 2080 1.1 christos Objects sorted by metric: Exclusive Total CPU Time 2081 1.1 christos 2082 1.1 christos Excl. Total Name 2083 1.1 christos CPU 2084 1.1 christos sec. % 2085 1.1 christos 4.414 100.00 <Total> 2086 1.1 christos 4.414 100.00 CPU 2 2087 1.1 christos @end verbatim 2088 1.1 christos @end smallexample 2089 1.1 christos 2090 1.1 christos @noindent 2091 1.1 christos The output above shows that indeed threads 1 and 2 each have used a single 2092 1.1 christos CPU only. 2093 1.1 christos 2094 1.1 christos @c -- A new node -------------------------------------------------------------- 2095 1.1 christos @node View Multiple Experiments 2096 1.1 christos @section View Multiple Experiments 2097 1.1 christos @c ---------------------------------------------------------------------------- 2098 1.1 christos 2099 1.1.1.2 christos One thing we did not cover sofar is that @ToolName{} fully supports the 2100 1.1.1.2 christos analysis of multiple experiments. The @DisplayText{} tool accepts a list of 2101 1.1.1.2 christos experiments. The data can either be aggregated across the experiments, or 2102 1.1.1.2 christos used in a comparison. 2103 1.1 christos 2104 1.1 christos The default is to aggregate the metric values across the experiments that have 2105 1.1 christos been loaded. The @command{compare} command can be used to enable the 2106 1.1 christos @IndexSubentry{Options, @code{-compare}} 2107 1.1 christos @IndexSubentry{Commands, @code{compare}} 2108 1.1 christos comparison of results. 2109 1.1 christos 2110 1.1 christos In this section both modes are illustrated with an example. 2111 1.1 christos 2112 1.1 christos @c -- A new node -------------------------------------------------------------- 2113 1.1 christos @node Aggregation of Experiments 2114 1.1 christos @subsection Aggregation of Experiments 2115 1.1 christos @c ---------------------------------------------------------------------------- 2116 1.1 christos 2117 1.1 christos If the data for multiple experiments is aggregrated, the @DisplayText{} tool 2118 1.1 christos shows the combined results. 2119 1.1 christos For example, below is the script to show the function view for the data 2120 1.1 christos aggregated over two experiments, drop the first experiment and then show 2121 1.1.1.2 christos the function view for the second experiment only. 2122 1.1 christos We will call it @file{my-script-agg}. 2123 1.1 christos 2124 1.1 christos @cartouche 2125 1.1 christos @smallexample 2126 1.1 christos # Define the metrics 2127 1.1 christos metrics e.%totalcpu 2128 1.1 christos # Limit the output to 5 lines 2129 1.1 christos limit 5 2130 1.1 christos # Get the list with experiments 2131 1.1 christos experiment_list 2132 1.1 christos # Get the function overview for all 2133 1.1 christos functions 2134 1.1 christos # Drop the first experiment 2135 1.1 christos drop_exp mxv.2.thr.er 2136 1.1 christos # Get the function overview for exp #2 2137 1.1 christos functions 2138 1.1 christos @end smallexample 2139 1.1 christos @end cartouche 2140 1.1 christos 2141 1.1 christos @IndexSubentry{Options, @code{-experiment_list}} 2142 1.1 christos @IndexSubentry{Commands, @code{experiment_list}} 2143 1.1 christos With the exception of the @command{experiment_list} command, all commands 2144 1.1 christos used have been discussed earlier. 2145 1.1 christos 2146 1.1 christos The @command{experiment_list} command provides a list of the experiments 2147 1.1 christos that have been loaded. This may be used to get the experiment IDs and 2148 1.1 christos to verify the correct experiments are loaded for the aggregation. 2149 1.1 christos 2150 1.1 christos @noindent 2151 1.1 christos Below is an example that loads two experiments and uses the above 2152 1.1 christos script to display different function views. 2153 1.1 christos 2154 1.1 christos @cartouche 2155 1.1 christos @smallexample 2156 1.1 christos $ gprofng display text -script my-script-agg mxv.2.thr.er mxv.4.thr.er 2157 1.1 christos @end smallexample 2158 1.1 christos @end cartouche 2159 1.1 christos 2160 1.1 christos @noindent 2161 1.1 christos This produces the following output: 2162 1.1 christos 2163 1.1 christos @smallexample 2164 1.1 christos @verbatim 2165 1.1 christos # Define the metrics 2166 1.1 christos Current metrics: e.%totalcpu:name 2167 1.1 christos Current Sort Metric: Exclusive Total CPU Time ( e.%totalcpu ) 2168 1.1 christos # Limit the output to 5 lines 2169 1.1 christos Print limit set to 5 2170 1.1 christos # Get the list with experiments 2171 1.1 christos ID Sel PID Experiment 2172 1.1 christos == === ======= ============ 2173 1.1 christos 1 yes 1339450 mxv.2.thr.er 2174 1.1 christos 2 yes 3579561 mxv.4.thr.er 2175 1.1 christos # Get the function overview for all 2176 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 2177 1.1 christos 2178 1.1 christos Excl. Total Name 2179 1.1 christos CPU 2180 1.1 christos sec. % 2181 1.1 christos 20.567 100.00 <Total> 2182 1.1 christos 19.553 95.07 mxv_core 2183 1.1 christos 0.474 2.30 init_data 2184 1.1 christos 0.198 0.96 erand48_r 2185 1.1 christos 0.149 0.72 drand48 2186 1.1 christos 2187 1.1 christos # Drop the first experiment 2188 1.1 christos Experiment mxv.2.thr.er has been dropped 2189 1.1 christos # Get the function overview for exp #2 2190 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 2191 1.1 christos 2192 1.1 christos Excl. Total Name 2193 1.1 christos CPU 2194 1.1 christos sec. % 2195 1.1 christos 11.104 100.00 <Total> 2196 1.1 christos 10.592 95.39 mxv_core 2197 1.1 christos 0.249 2.24 init_data 2198 1.1 christos 0.094 0.84 erand48_r 2199 1.1 christos 0.082 0.74 drand48 2200 1.1 christos @end verbatim 2201 1.1 christos @end smallexample 2202 1.1 christos 2203 1.1 christos The first five lines should look familiar. The five lines following echo 2204 1.1 christos the comment line in the script and show the overview of the experiments. 2205 1.1 christos This confirms two experiments have been loaded and that both are active. 2206 1.1 christos This is followed by the function overview. The timings have been summed 2207 1.1 christos up and the percentages are adjusted accordingly. 2208 1.1 christos 2209 1.1 christos @c -- A new node -------------------------------------------------------------- 2210 1.1 christos @node Comparison of Experiments 2211 1.1 christos @subsection Comparison of Experiments 2212 1.1 christos @c ---------------------------------------------------------------------------- 2213 1.1 christos 2214 1.1.1.2 christos The support for multiple experiments really shines in comparison mode. 2215 1.1 christos @cindex Compare experiments 2216 1.1 christos In comparison mode, the data for the various experiments is shown side by 2217 1.1 christos side, as illustrated below where we compare the results for the multithreaded 2218 1.1 christos experiments using two and four threads respectively. 2219 1.1 christos 2220 1.1 christos This 2221 1.1 christos feature is controlled through the 2222 1.1 christos @IndexSubentry{Options, @code{-compare}} 2223 1.1 christos @IndexSubentry{Commands, @code{compare}} 2224 1.1 christos @code{compare} command. 2225 1.1 christos 2226 1.1 christos The comparison mode is enabled through @command{compare on} and with 2227 1.1 christos @command{compare off} it is disabled again. 2228 1.1 christos In addition to @samp{on}, or @samp{off}, this command also supports 2229 1.1 christos the @samp{delta} and @samp{ratio} keywords. 2230 1.1 christos 2231 1.1 christos This is the script that will be used in our example. It sets the comparison 2232 1.1 christos mode to @samp{on}: 2233 1.1 christos 2234 1.1 christos @smallexample 2235 1.1 christos @verbatim 2236 1.1 christos # Define the metrics 2237 1.1 christos metrics e.%totalcpu 2238 1.1 christos # Limit the output to 5 lines 2239 1.1 christos limit 5 2240 1.1 christos # Set the comparison mode to differences 2241 1.1 christos compare on 2242 1.1 christos # Get the function overview 2243 1.1 christos functions 2244 1.1 christos @end verbatim 2245 1.1 christos @end smallexample 2246 1.1 christos 2247 1.1 christos Assuming this script file is called @file{my-script-comp}, this is how 2248 1.1 christos it is used to display the differences: 2249 1.1 christos 2250 1.1 christos @cartouche 2251 1.1 christos @smallexample 2252 1.1 christos $ gprofng display text -script my-script-comp mxv.2.thr.er mxv.4.thr.er 2253 1.1 christos @end smallexample 2254 1.1 christos @end cartouche 2255 1.1 christos 2256 1.1 christos @noindent 2257 1.1 christos This produces the output shown below. The data for the first experiment 2258 1.1 christos is shown as absolute numbers. The timings for the other experiment are 2259 1.1 christos shown as a delta relative to these reference numbers: 2260 1.1 christos 2261 1.1 christos @smallexample 2262 1.1 christos @verbatim 2263 1.1 christos 2264 1.1 christos mxv.2.thr.er mxv.4.thr.er 2265 1.1 christos Excl. Total Excl. Total Name 2266 1.1 christos CPU CPU 2267 1.1 christos sec. % sec. % 2268 1.1 christos 9.464 100.00 11.104 100.00 <Total> 2269 1.1 christos 8.961 94.69 10.592 95.39 mxv_core 2270 1.1 christos 0.224 2.37 0.249 2.24 init_data 2271 1.1 christos 0.105 1.11 0.094 0.84 erand48_r 2272 1.1 christos 0.073 0.77 0.060 0.54 __drand48_iterate 2273 1.1 christos @end verbatim 2274 1.1 christos @end smallexample 2275 1.1 christos 2276 1.1 christos This table is already helpful to more easily compare (two) profiles, but 2277 1.1 christos there is more that we can do here. 2278 1.1 christos 2279 1.1 christos By default, in comparison mode, all measured values are shown. Often 2280 1.1 christos profiling is about comparing performance data. It is therefore 2281 1.1 christos sometimes more useful to look at differences or ratios, using one 2282 1.1 christos experiment as a reference. 2283 1.1 christos 2284 1.1 christos The values shown are relative to this difference. For example if a ratio 2285 1.1 christos is below one, it means the reference value was higher. 2286 1.1 christos 2287 1.1 christos In the example below, we use the same two experiments used in the comparison 2288 1.1.1.2 christos above. The script is also nearly identical. The only change is that we now 2289 1.1 christos use the @samp{delta} keyword. 2290 1.1 christos 2291 1.1 christos As before, the number of lines is restricted to 5 and we focus on 2292 1.1 christos the exclusive timings plus percentages. For the comparison part we are 2293 1.1 christos interested in the differences. 2294 1.1 christos 2295 1.1 christos This is the script that produces such an overview: 2296 1.1 christos 2297 1.1 christos @smallexample 2298 1.1 christos @verbatim 2299 1.1 christos # Define the metrics 2300 1.1 christos metrics e.%totalcpu 2301 1.1 christos # Limit the output to 5 lines 2302 1.1 christos limit 5 2303 1.1 christos # Set the comparison mode to differences 2304 1.1 christos compare delta 2305 1.1 christos # Get the function overview 2306 1.1 christos functions 2307 1.1 christos @end verbatim 2308 1.1 christos @end smallexample 2309 1.1 christos 2310 1.1 christos Assuming this script file is called @file{my-script-comp2}, this is how we 2311 1.1 christos get the table displayed on our screen: 2312 1.1 christos 2313 1.1 christos @cartouche 2314 1.1 christos @smallexample 2315 1.1 christos $ gprofng display text -script my-script-comp2 mxv.2.thr.er mxv.4.thr.er 2316 1.1 christos @end smallexample 2317 1.1 christos @end cartouche 2318 1.1 christos 2319 1.1 christos Leaving out some of the lines printed, but we have seen before, we get 2320 1.1 christos the following table: 2321 1.1 christos 2322 1.1 christos @smallexample 2323 1.1 christos @verbatim 2324 1.1 christos mxv.2.thr.er mxv.4.thr.er 2325 1.1 christos Excl. Total Excl. Total Name 2326 1.1 christos CPU CPU 2327 1.1 christos sec. % delta % 2328 1.1 christos 9.464 100.00 +1.640 100.00 <Total> 2329 1.1 christos 8.961 94.69 +1.631 95.39 mxv_core 2330 1.1 christos 0.224 2.37 +0.025 2.24 init_data 2331 1.1 christos 0.105 1.11 -0.011 0.84 erand48_r 2332 1.1 christos 0.073 0.77 -0.013 0.54 __drand48_iterate 2333 1.1 christos @end verbatim 2334 1.1 christos @end smallexample 2335 1.1 christos 2336 1.1 christos It is now easier to see that the CPU times for the most time consuming 2337 1.1 christos functions in this code are practically the same. 2338 1.1 christos 2339 1.1 christos It is also possible to show ratio's through the @command{compare ratio} 2340 1.1 christos @IndexSubentry{Options, @code{-compare}} 2341 1.1 christos @IndexSubentry{Commands, @code{compare}} 2342 1.1 christos command. The first colum is used as a reference and the values for 2343 1.1 christos the other columns with metrics are derived by dividing the value by 2344 1.1 christos the reference. The result for such a comparison is shown below: 2345 1.1 christos 2346 1.1 christos @smallexample 2347 1.1 christos @verbatim 2348 1.1 christos mxv.2.thr.er mxv.4.thr.er 2349 1.1 christos Excl. Total Excl. Total CPU Name 2350 1.1 christos CPU 2351 1.1 christos sec. % ratio % 2352 1.1 christos 9.464 100.00 x 1.173 100.00 <Total> 2353 1.1 christos 8.961 94.69 x 1.182 95.39 mxv_core 2354 1.1 christos 0.224 2.37 x 1.111 2.24 init_data 2355 1.1 christos 0.105 1.11 x 0.895 0.84 erand48_r 2356 1.1 christos 0.073 0.77 x 0.822 0.54 __drand48_iterate 2357 1.1 christos @end verbatim 2358 1.1 christos @end smallexample 2359 1.1 christos 2360 1.1 christos Note that the comparison feature is supported at the function, source, and 2361 1.1 christos disassembly level. There is no practical limit on the number of experiments 2362 1.1 christos that can be used in a comparison. 2363 1.1 christos 2364 1.1 christos @c -- A new node -------------------------------------------------------------- 2365 1.1 christos @node Profile Hardware Event Counters 2366 1.1 christos @section Profile Hardware Event Counters 2367 1.1 christos @c ---------------------------------------------------------------------------- 2368 1.1 christos 2369 1.1 christos Many processors provide a set of hardware event counters and @ToolName{} 2370 1.1 christos provides support for this feature. 2371 1.1 christos @xref{Hardware Event Counters Explained} for those readers that are not 2372 1.1 christos familiar with such counters and like to learn more. 2373 1.1 christos 2374 1.1 christos In this section we explain how to get the details on the event counter 2375 1.1 christos support for the processor used in the experiment(s), and show several 2376 1.1 christos examples. 2377 1.1 christos 2378 1.1 christos @c -- A new node -------------------------------------------------------------- 2379 1.1 christos @node Getting Information on the Counters Supported 2380 1.1 christos @subsection Getting Information on the Counters Supported 2381 1.1 christos @c ---------------------------------------------------------------------------- 2382 1.1 christos 2383 1.1 christos The first step is to check if the processor used for the experiments is 2384 1.1 christos supported by @ToolName{}. 2385 1.1 christos @IndexSubentry{Options, @code{-h}} 2386 1.1 christos The @code{-h} option on @CollectApp{} will show the event counter 2387 1.1 christos information: 2388 1.1 christos 2389 1.1 christos @cartouche 2390 1.1 christos @smallexample 2391 1.1 christos $ gprofng collect app -h 2392 1.1 christos @end smallexample 2393 1.1 christos @end cartouche 2394 1.1 christos 2395 1.1 christos In case the counters are supported, a list with the events is printed. 2396 1.1 christos Otherwise, a warning message will be issued. 2397 1.1 christos 2398 1.1 christos For example, below we show this command and the output on an Intel Xeon 2399 1.1 christos Platinum 8167M (aka ``Skylake'') processor. The output has been split 2400 1.1 christos into several sections and each section is commented upon separately. 2401 1.1 christos 2402 1.1 christos @smallexample 2403 1.1 christos @verbatim 2404 1.1 christos Run "gprofng collect app --help" for a usage message. 2405 1.1 christos 2406 1.1 christos Specifying HW counters on `Intel Arch PerfMon v2 on Family 6 Model 85' 2407 1.1 christos (cpuver=2499): 2408 1.1 christos 2409 1.1 christos -h {auto|lo|on|hi} 2410 1.1 christos turn on default set of HW counters at the specified rate 2411 1.1 christos -h <ctr_def> [-h <ctr_def>]... 2412 1.1 christos -h <ctr_def>[,<ctr_def>]... 2413 1.1 christos specify HW counter profiling for up to 4 HW counters 2414 1.1 christos @end verbatim 2415 1.1 christos @end smallexample 2416 1.1 christos 2417 1.1 christos The first line shows how to get a usage overview. This is followed by 2418 1.1 christos some information on the target processor. 2419 1.1 christos The next five lines explain in what ways the @code{-h} option can be 2420 1.1 christos used to define the events to be monitored. 2421 1.1 christos 2422 1.1 christos The first version shown above enables a default set of counters. This 2423 1.1 christos default depends on the processor this command is executed on. The 2424 1.1 christos keyword following the @code{-h} option defines the sampling rate: 2425 1.1 christos 2426 1.1 christos @table @code 2427 1.1 christos 2428 1.1 christos @item auto 2429 1.1 christos Match the sample rate of used by clock profiling. If the latter is disabled, 2430 1.1 christos Use a per thread sampling rate of approximately 100 samples per second. 2431 1.1 christos This setting is the default and preferred. 2432 1.1 christos 2433 1.1 christos @item on 2434 1.1 christos Use a per thread sampling rate of approximately 100 samples per second. 2435 1.1 christos 2436 1.1 christos @item lo 2437 1.1 christos Use a per thread sampling rate of approximately 10 samples per second. 2438 1.1 christos 2439 1.1 christos @item hi 2440 1.1 christos Use a per thread sampling rate of approximately 1000 samples per second. 2441 1.1 christos 2442 1.1 christos @end table 2443 1.1 christos 2444 1.1 christos The second and third variant define the events to be monitored. Note 2445 1.1 christos that the number of simultaneous events supported is printed. In this 2446 1.1 christos case we can monitor four events in a single profiling job. 2447 1.1 christos 2448 1.1 christos It is a matter of preference whether you like to use the @code{-h} 2449 1.1 christos option for each event, or use it once, followed by a comma separated 2450 1.1 christos list. 2451 1.1 christos 2452 1.1 christos There is one slight catch though. The counter definition below has 2453 1.1 christos mandatory comma (@code{,}) between the event and the rate. While a 2454 1.1 christos default can be used for the rate, the comma cannot be omitted. 2455 1.1 christos This may result in a somewhat awkward counter definition in case 2456 1.1 christos the default sampling rate is used. 2457 1.1 christos 2458 1.1 christos For example, the following two commands are equivalent. Note 2459 1.1 christos the double comma in the second command. This is not a typo. 2460 1.1 christos 2461 1.1 christos @cartouche 2462 1.1 christos @smallexample 2463 1.1 christos $ gprofng collect app -h cycles -h insts ... 2464 1.1 christos $ gprofng collect app -h cycles,,insts ... 2465 1.1 christos @end smallexample 2466 1.1 christos @end cartouche 2467 1.1 christos 2468 1.1 christos In the first command this comma is not needed, because a 2469 1.1 christos comma (``@code{,}'') immediately followed by white space may 2470 1.1 christos be omitted. 2471 1.1 christos 2472 1.1 christos This is why we prefer the this syntax and in the remainder will 2473 1.1 christos use the first version of this command. 2474 1.1 christos 2475 1.1 christos @IndexSubentry{Hardware event counters, counter definition} 2476 1.1 christos The counter definition takes an event name, plus optionally one or 2477 1.1 christos more attributes, followed by a comma, and optionally the sampling rate. 2478 1.1 christos The output section below shows the formal definition. 2479 1.1 christos 2480 1.1 christos @cartouche 2481 1.1 christos @smallexample 2482 1.1 christos <ctr_def> == <ctr>[[~<attr>=<val>]...],[<rate>] 2483 1.1 christos @end smallexample 2484 1.1 christos @end cartouche 2485 1.1 christos 2486 1.1 christos The printed help then explains this syntax. Below we have summarized 2487 1.1 christos and expanded this output: 2488 1.1 christos 2489 1.1 christos @table @code 2490 1.1 christos 2491 1.1 christos @item @var{<ctr>} 2492 1.1 christos The counter name must be selected from the available counters listed 2493 1.1 christos as part of the output printed with the @code{-h} option. 2494 1.1 christos On most systems, if a counter is not listed, it may still be specified 2495 1.1 christos by its numeric value. 2496 1.1 christos 2497 1.1 christos @item @var{~<attr>=<val>} 2498 1.1 christos This is an optional attribute that depends on the processor. The list 2499 1.1 christos of supported attributes is printed in the output. Examples of 2500 1.1 christos attributes are ``user'', or ``system''. The value can given in decimal 2501 1.1 christos or hexadecimal format. 2502 1.1 christos Multiple attributes may be specified, and each must be preceded 2503 1.1 christos by a ~. 2504 1.1 christos 2505 1.1 christos @item @var{<rate>} 2506 1.1 christos 2507 1.1 christos The sampling rate is one of the following: 2508 1.1 christos 2509 1.1 christos @table @code 2510 1.1 christos 2511 1.1 christos @item auto 2512 1.1 christos This is the default and matches the rate used by clock profiling. 2513 1.1 christos If clock profiling is disabled, use @samp{on}. 2514 1.1 christos 2515 1.1 christos @item on 2516 1.1 christos Set the per thread maximum sampling rate to ~100 samples/second 2517 1.1 christos 2518 1.1 christos @item lo 2519 1.1 christos Set the per thread maximum sampling rate to ~10 samples/second 2520 1.1 christos 2521 1.1 christos @item hi 2522 1.1 christos Set the per thread maximum sampling rate to ~1000 samples/second 2523 1.1 christos 2524 1.1 christos @item @var{<interval>} 2525 1.1 christos Define the sampling interval. 2526 1.1 christos @xref{Control the Sampling Frequency} how to define this. 2527 1.1 christos 2528 1.1 christos @end table 2529 1.1 christos 2530 1.1 christos @end table 2531 1.1 christos 2532 1.1 christos After the section with the formal definition of events and counters, a 2533 1.1 christos processor specific list is displayed. This part starts with an overview 2534 1.1 christos of the default set of counters and the aliased names supported 2535 1.1 christos @emph{on this specific processor}. 2536 1.1 christos 2537 1.1 christos @smallexample 2538 1.1 christos @verbatim 2539 1.1 christos Default set of HW counters: 2540 1.1 christos 2541 1.1 christos -h cycles,,insts,,llm 2542 1.1 christos 2543 1.1 christos Aliases for most useful HW counters: 2544 1.1 christos 2545 1.1 christos alias raw name type units regs description 2546 1.1 christos 2547 1.1 christos cycles unhalted-core-cycles CPU-cycles 0123 CPU Cycles 2548 1.1 christos insts instruction-retired events 0123 Instructions Executed 2549 1.1 christos llm llc-misses events 0123 Last-Level Cache Misses 2550 1.1 christos br_msp branch-misses-retired events 0123 Branch Mispredict 2551 1.1 christos br_ins branch-instruction-retired events 0123 Branch Instructions 2552 1.1 christos @end verbatim 2553 1.1 christos @end smallexample 2554 1.1 christos 2555 1.1 christos @noindent 2556 1.1 christos The definitions given above may or may not be available on other processors. 2557 1.1 christos 2558 1.1 christos The table above shows the default set of counters defined for this processor, 2559 1.1 christos and the aliases. For each alias the full ``raw'' name is given, plus the 2560 1.1 christos unit of the number returned by the counter (CPU cycles, or a raw count), 2561 1.1 christos the hardware counter the event is allowed to be mapped onto, and a short 2562 1.1 christos description. 2563 1.1 christos 2564 1.1 christos The last part of the output contains all the events that can be monitored: 2565 1.1 christos 2566 1.1 christos @smallexample 2567 1.1 christos @verbatim 2568 1.1 christos Raw HW counters: 2569 1.1 christos 2570 1.1 christos name type units regs description 2571 1.1 christos 2572 1.1 christos unhalted-core-cycles CPU-cycles 0123 2573 1.1 christos unhalted-reference-cycles events 0123 2574 1.1 christos instruction-retired events 0123 2575 1.1 christos llc-reference events 0123 2576 1.1 christos llc-misses events 0123 2577 1.1 christos branch-instruction-retired events 0123 2578 1.1 christos branch-misses-retired events 0123 2579 1.1 christos ld_blocks.store_forward events 0123 2580 1.1 christos ld_blocks.no_sr events 0123 2581 1.1 christos ld_blocks_partial.address_alias events 0123 2582 1.1 christos dtlb_load_misses.miss_causes_a_walk events 0123 2583 1.1 christos dtlb_load_misses.walk_completed_4k events 0123 2584 1.1 christos 2585 1.1 christos <many lines deleted> 2586 1.1 christos 2587 1.1 christos l2_lines_out.silent events 0123 2588 1.1 christos l2_lines_out.non_silent events 0123 2589 1.1 christos l2_lines_out.useless_hwpf events 0123 2590 1.1 christos sq_misc.split_lock events 0123 2591 1.1 christos @end verbatim 2592 1.1 christos @end smallexample 2593 1.1 christos 2594 1.1 christos As can be seen, these names are not always easy to correlate to a specific 2595 1.1 christos event of interest. The processor manual should provide more clarity on this. 2596 1.1 christos 2597 1.1 christos @c -- A new node -------------------------------------------------------------- 2598 1.1 christos @node Examples Using Hardware Event Counters 2599 1.1 christos @subsection Examples Using Hardware Event Counters 2600 1.1 christos @c ---------------------------------------------------------------------------- 2601 1.1 christos 2602 1.1 christos The previous section may give the impression that these counters are hard to 2603 1.1 christos use, but as we will show now, in practice it is quite simple. 2604 1.1 christos 2605 1.1 christos With the information from the @code{-h} option, we can easily set up our first 2606 1.1 christos event counter experiment. 2607 1.1 christos 2608 1.1 christos We start by using the default set of counters defined for our processor and we 2609 1.1 christos use 2 threads: 2610 1.1 christos 2611 1.1 christos @cartouche 2612 1.1 christos @smallexample 2613 1.1 christos $ exe=mxv-pthreads 2614 1.1 christos $ m=8000 2615 1.1 christos $ n=4000 2616 1.1 christos $ exp=mxv.hwc.def.2.thr.er 2617 1.1 christos $ gprofng collect app -O $exp -h auto ./$exe -m $m -n $n -t 2 2618 1.1 christos @end smallexample 2619 1.1 christos @end cartouche 2620 1.1 christos 2621 1.1 christos @IndexSubentry{Options, @code{-h}} 2622 1.1 christos @IndexSubentry{Hardware event counters, @code{auto} option} 2623 1.1 christos The new option here is @code{-h auto}. The @code{auto} keyword enables 2624 1.1 christos hardware event counter profiling and selects the default set of counters 2625 1.1 christos defined for this processor. 2626 1.1 christos 2627 1.1 christos As before, we can display the information, but there is one practical hurdle 2628 1.1 christos to take. Unless we like to view all metrics recorded, we would need to know 2629 1.1 christos the names of the events that have been enabled. This is tedious and also not 2630 1.1 christos portable in case we would like to repeat this experiment on another processor. 2631 1.1 christos 2632 1.1 christos @IndexSubentry{Hardware event counters, @code{hwc} metric} 2633 1.1 christos This is where the special @code{hwc} metric comes very handy. It 2634 1.1 christos automatically expands to the active set of hardware event counters used 2635 1.1 christos in the experiment(s). 2636 1.1 christos 2637 1.1 christos With this, it is very easy to display the event counter values. Note that 2638 1.1 christos although the regular clock based profiling was enabled, we only want to see 2639 1.1 christos the counter values. We also request to see the percentages and limit the 2640 1.1 christos output to the first 5 lines: 2641 1.1 christos 2642 1.1 christos @cartouche 2643 1.1 christos @smallexample 2644 1.1 christos $ exp=mxv.hwc.def.2.thr.er 2645 1.1 christos $ gprofng display text -metrics e.%hwc -limit 5 -functions $exp 2646 1.1 christos @end smallexample 2647 1.1 christos @end cartouche 2648 1.1 christos 2649 1.1 christos @smallexample 2650 1.1 christos @verbatim 2651 1.1 christos Current metrics: e.%cycles:e+%insts:e+%llm:name 2652 1.1 christos Current Sort Metric: Exclusive CPU Cycles ( e.%cycles ) 2653 1.1 christos Print limit set to 5 2654 1.1 christos Functions sorted by metric: Exclusive CPU Cycles 2655 1.1 christos 2656 1.1 christos Excl. CPU Excl. Instructions Excl. Last-Level Name 2657 1.1 christos Cycles Executed Cache Misses 2658 1.1 christos sec. % % % 2659 1.1 christos 2.691 100.00 7906475309 100.00 122658983 100.00 <Total> 2660 1.1 christos 2.598 96.54 7432724378 94.01 121745696 99.26 mxv_core 2661 1.1 christos 0.035 1.31 188860269 2.39 70084 0.06 erand48_r 2662 1.1 christos 0.026 0.95 73623396 0.93 763116 0.62 init_data 2663 1.1 christos 0.018 0.66 76824434 0.97 40040 0.03 drand48 2664 1.1 christos @end verbatim 2665 1.1 christos @end smallexample 2666 1.1 christos 2667 1.1 christos As we have seen before, the first few lines echo the settings. 2668 1.1 christos This includes a list with the hardware event counters used by 2669 1.1 christos default. 2670 1.1 christos 2671 1.1 christos The table that follows makes it very easy to get an overview where the 2672 1.1 christos time is spent and how many of the target events have occurred. 2673 1.1 christos 2674 1.1 christos As before, we can drill down deeper and see the same metrics at the source 2675 1.1 christos line and instruction level. Other than using @code{hwc} in the metrics 2676 1.1 christos definitions, nothing has changed compared to the previous examples: 2677 1.1 christos 2678 1.1 christos @cartouche 2679 1.1 christos @smallexample 2680 1.1 christos $ exp=mxv.hwc.def.2.thr.er 2681 1.1 christos $ gprofng display text -metrics e.hwc -source mxv_core $exp 2682 1.1 christos @end smallexample 2683 1.1 christos @end cartouche 2684 1.1 christos 2685 1.1 christos This is the relevant part of the output. Since the lines get very long, 2686 1.1 christos we have somewhat modified the lay-out: 2687 1.1 christos 2688 1.1 christos @smallexample 2689 1.1 christos @verbatim 2690 1.1 christos Excl. CPU Excl. Excl. 2691 1.1 christos Cycles Instructions Last-Level 2692 1.1 christos sec. Executed Cache Misses 2693 1.1 christos <Function: mxv_core> 2694 1.1 christos 0. 0 0 32. void __attribute__ ((noinline)) 2695 1.1 christos mxv_core(...) 2696 1.1 christos 0. 0 0 33. { 2697 1.1 christos 0. 0 0 34. for (uint64_t i=...) { 2698 1.1 christos 0. 0 0 35. double row_sum = 0.0; 2699 1.1 christos ## 1.872 7291879319 88150571 36. for (int64_t j=0; j<n; j++) 2700 1.1 christos 0.725 140845059 33595125 37. row_sum += A[i][j]*b[j]; 2701 1.1 christos 0. 0 0 38. c[i] = row_sum; 2702 1.1 christos 39. } 2703 1.1 christos 0. 0 0 40. } 2704 1.1 christos @end verbatim 2705 1.1 christos @end smallexample 2706 1.1 christos 2707 1.1 christos In a smiliar way we can display the event counter values at the instruction 2708 1.1 christos level. Again we have modified the lay-out due to page width limitations: 2709 1.1 christos 2710 1.1 christos @cartouche 2711 1.1 christos @smallexample 2712 1.1 christos $ exp=mxv.hwc.def.2.thr.er 2713 1.1 christos $ gprofng display text -metrics e.hwc -disasm mxv_core $exp 2714 1.1 christos @end smallexample 2715 1.1 christos @end cartouche 2716 1.1 christos 2717 1.1 christos @smallexample 2718 1.1 christos @verbatim 2719 1.1 christos Excl. CPU Excl. Excl. 2720 1.1 christos Cycles Instructions Last-Level 2721 1.1 christos sec. Executed Cache Misses 2722 1.1 christos <Function: mxv_core> 2723 1.1 christos 0. 0 0 [33] 4021ba: mov 0x8(%rsp),%r10 2724 1.1 christos 34. for (uint64_t i=...) { 2725 1.1 christos 0. 0 0 [34] 4021bf: cmp %rsi,%rdi 2726 1.1 christos 0. 0 0 [34] 4021c2: jbe 0x37 2727 1.1 christos 0. 0 0 [34] 4021c4: ret 2728 1.1 christos 35. double row_sum = 0.0; 2729 1.1 christos 36. for (int64_t j=0; j<n; j++) 2730 1.1 christos 37. row_sum += A[i][j]*b[j]; 2731 1.1 christos 0. 0 0 [37] 4021c5: mov (%r8,%rdi,8),%rdx 2732 1.1 christos 0. 0 0 [36] 4021c9: mov $0x0,%eax 2733 1.1 christos 0. 0 0 [35] 4021ce: pxor %xmm1,%xmm1 2734 1.1 christos 0.002 12804230 321394 [37] 4021d2: movsd (%rdx,%rax,8),%xmm0 2735 1.1 christos 0.141 60819025 3866677 [37] 4021d7: mulsd (%r9,%rax,8),%xmm0 2736 1.1 christos 0.582 67221804 29407054 [37] 4021dd: addsd %xmm0,%xmm1 2737 1.1 christos ## 1.871 7279075109 87989870 [36] 4021e1: add $0x1,%rax 2738 1.1 christos 0.002 12804210 80351 [36] 4021e5: cmp %rax,%rcx 2739 1.1 christos 0. 0 0 [36] 4021e8: jne 0xffffffffffffffea 2740 1.1 christos 38. c[i] = row_sum; 2741 1.1 christos 0. 0 0 [38] 4021ea: movsd %xmm1,(%r10,%rdi,8) 2742 1.1 christos 0. 0 0 [34] 4021f0: add $0x1,%rdi 2743 1.1 christos 0. 0 0 [34] 4021f4: cmp %rdi,%rsi 2744 1.1 christos 0. 0 0 [34] 4021f7: jb 0xd 2745 1.1 christos 0. 0 0 [35] 4021f9: pxor %xmm1,%xmm1 2746 1.1 christos 0. 0 0 [36] 4021fd: test %rcx,%rcx 2747 1.1 christos 0. 0 80350 [36] 402200: jne 0xffffffffffffffc5 2748 1.1 christos 0. 0 0 [36] 402202: jmp 0xffffffffffffffe8 2749 1.1 christos 39. } 2750 1.1 christos 40. } 2751 1.1 christos 0. 0 0 [40] 402204: ret 2752 1.1 christos @end verbatim 2753 1.1 christos @end smallexample 2754 1.1 christos 2755 1.1 christos So far we have used the default settings for the event counters. It is 2756 1.1 christos quite straightforward to select specific counters. For sake of the 2757 1.1 christos example, let's assume we would like to count how many branch instructions 2758 1.1 christos and retired memory load instructions that missed in the L1 cache have been 2759 1.1 christos executed. We also want to count these events with a high resolution. 2760 1.1 christos 2761 1.1 christos This is the command to do so: 2762 1.1 christos 2763 1.1 christos @cartouche 2764 1.1 christos @smallexample 2765 1.1 christos $ exe=mxv-pthreads 2766 1.1 christos $ m=8000 2767 1.1 christos $ n=4000 2768 1.1 christos $ exp=mxv.hwc.sel.2.thr.er 2769 1.1 christos $ hwc1=br_ins,hi 2770 1.1 christos $ hwc2=mem_load_retired.l1_miss,hi 2771 1.1 christos $ gprofng collect app -O $exp -h $hwc1 -h $hwc2 $exe -m $m -n $n -t 2 2772 1.1 christos @end smallexample 2773 1.1 christos @end cartouche 2774 1.1 christos 2775 1.1 christos As before, we get a table with the event counts. Due to the very 2776 1.1 christos long name for the second counter, we have somewhat modified the 2777 1.1 christos output. 2778 1.1 christos 2779 1.1 christos @cartouche 2780 1.1 christos @smallexample 2781 1.1 christos $ gprofng display text -limit 10 -functions mxv.hwc.sel.2.thr.er 2782 1.1 christos @end smallexample 2783 1.1 christos @end cartouche 2784 1.1 christos 2785 1.1 christos @smallexample 2786 1.1 christos @verbatim 2787 1.1 christos Functions sorted by metric: Exclusive Total CPU Time 2788 1.1 christos Excl. Incl. Excl. Branch Excl. Name 2789 1.1 christos Total Total Instructions mem_load_retired.l1_miss 2790 1.1 christos CPU sec. CPU sec. Events 2791 1.1 christos 2.597 2.597 1305305319 4021340 <Total> 2792 1.1 christos 2.481 2.481 1233233242 3982327 mxv_core 2793 1.1 christos 0.040 0.107 19019012 9003 init_data 2794 1.1 christos 0.028 0.052 23023048 15006 erand48_r 2795 1.1 christos 0.024 0.024 19019008 9004 __drand48_iterate 2796 1.1 christos 0.015 0.067 11011009 2998 drand48 2797 1.1 christos 0.008 0.010 0 3002 _int_malloc 2798 1.1 christos 0.001 0.001 0 0 brk 2799 1.1 christos 0.001 0.002 0 0 sysmalloc 2800 1.1 christos 0. 0.001 0 0 __default_morecore 2801 1.1 christos @end verbatim 2802 1.1 christos @end smallexample 2803 1.1 christos 2804 1.1 christos @IndexSubentry{Options, @code{-compare}} 2805 1.1 christos @IndexSubentry{Commands, @code{compare}} 2806 1.1 christos When using event counters, the values could be very large and it is not easy 2807 1.1 christos to compare the numbers. As we will show next, the @code{ratio} feature is 2808 1.1 christos very useful when comparing such profiles. 2809 1.1 christos 2810 1.1 christos To demonstrate this, we have set up another event counter experiment where 2811 1.1 christos we would like to compare the number of last level cache miss and the number 2812 1.1 christos of branch instructions executed when using a single thread, or two threads. 2813 1.1 christos 2814 1.1 christos These are the commands used to generate the experiment directories: 2815 1.1 christos 2816 1.1 christos @cartouche 2817 1.1 christos @smallexample 2818 1.1 christos $ exe=./mxv-pthreads 2819 1.1 christos $ m=8000 2820 1.1 christos $ n=4000 2821 1.1 christos $ exp1=mxv.hwc.comp.1.thr.er 2822 1.1 christos $ exp2=mxv.hwc.comp.2.thr.er 2823 1.1 christos $ gprofng collect app -O $exp1 -h llm -h br_ins $exe -m $m -n $n -t 1 2824 1.1 christos $ gprofng collect app -O $exp2 -h llm -h br_ins $exe -m $m -n $n -t 2 2825 1.1 christos @end smallexample 2826 1.1 christos @end cartouche 2827 1.1 christos 2828 1.1 christos The following script has been used to get the tables. Due to lay-out 2829 1.1 christos restrictions, we have to create two tables, one for each counter. 2830 1.1 christos 2831 1.1 christos @cartouche 2832 1.1 christos @smallexample 2833 1.1 christos # Limit the output to 5 lines 2834 1.1 christos limit 5 2835 1.1 christos # Define the metrics 2836 1.1 christos metrics name:e.llm 2837 1.1 christos # Set the comparison to ratio 2838 1.1 christos compare ratio 2839 1.1 christos functions 2840 1.1 christos # Define the metrics 2841 1.1 christos metrics name:e.br_ins 2842 1.1 christos # Set the comparison to ratio 2843 1.1 christos compare ratio 2844 1.1 christos functions 2845 1.1 christos @end smallexample 2846 1.1 christos @end cartouche 2847 1.1 christos 2848 1.1 christos Note that we print the name of the function first, followed by the counter 2849 1.1 christos data. 2850 1.1 christos The new element is that we set the comparison mode to @code{ratio}. This 2851 1.1 christos divides the data in a column by its counterpart in the reference experiment. 2852 1.1 christos 2853 1.1 christos This is the command using this script and the two experiment directories as 2854 1.1 christos input: 2855 1.1 christos 2856 1.1 christos @cartouche 2857 1.1 christos @smallexample 2858 1.1 christos $ gprofng display text -script my-script-comp-counters \ 2859 1.1 christos mxv.hwc.comp.1.thr.er \ 2860 1.1 christos mxv.hwc.comp.2.thr.er 2861 1.1 christos @end smallexample 2862 1.1 christos @end cartouche 2863 1.1 christos 2864 1.1 christos By design, we get two tables, one for each counter: 2865 1.1 christos 2866 1.1 christos @smallexample 2867 1.1 christos @verbatim 2868 1.1 christos Functions sorted by metric: Exclusive Last-Level Cache Misses 2869 1.1 christos 2870 1.1 christos mxv.hwc.comp.1.thr.er mxv.hwc.comp.2.thr.er 2871 1.1 christos Name Excl. Last-Level Excl. Last-Level 2872 1.1 christos Cache Misses Cache Misses 2873 1.1 christos ratio 2874 1.1 christos <Total> 122709276 x 0.788 2875 1.1 christos mxv_core 121796001 x 0.787 2876 1.1 christos init_data 723064 x 1.055 2877 1.1 christos erand48_r 100111 x 0.500 2878 1.1 christos drand48 60065 x 1.167 2879 1.1 christos 2880 1.1 christos Functions sorted by metric: Exclusive Branch Instructions 2881 1.1 christos 2882 1.1 christos mxv.hwc.comp.1.thr.er mxv.hwc.comp.2.thr.er 2883 1.1 christos Name Excl. Branch Excl. Branch 2884 1.1 christos Instructions Instructions 2885 1.1 christos ratio 2886 1.1 christos <Total> 1307307316 x 0.997 2887 1.1 christos mxv_core 1235235239 x 0.997 2888 1.1 christos erand48_r 23023033 x 0.957 2889 1.1 christos drand48 20020009 x 0.600 2890 1.1 christos __drand48_iterate 17017028 x 0.882 2891 1.1 christos @end verbatim 2892 1.1 christos @end smallexample 2893 1.1 christos 2894 1.1 christos A ratio less than one in the second column, means that this counter 2895 1.1 christos value was smaller than the value from the reference experiment shown 2896 1.1 christos in the first column. 2897 1.1 christos 2898 1.1 christos This kind of presentation of the results makes it much easier to 2899 1.1 christos quickly interpret the data. 2900 1.1 christos 2901 1.1 christos We conclude this section with thread-level event counter overviews, 2902 1.1 christos but before we go into this, there is an important metric we need to 2903 1.1 christos mention. 2904 1.1 christos 2905 1.1 christos @c -- TBD Explain <Total> for IPC 2906 1.1 christos 2907 1.1 christos @IndexSubentry{Hardware event counters, IPC} 2908 1.1 christos In case it is known how many instructions and CPU cycles have been executed, 2909 1.1 christos the value for the IPC (``Instructions Per Clockycle'') can be computed. 2910 1.1 christos @xref{Hardware Event Counters Explained}. 2911 1.1 christos This is a derived metric that gives an indication how well the processor 2912 1.1 christos is utilized. The inverse of the IPC is called CPI. 2913 1.1 christos @IndexSubentry{Hardware event counters, CPI} 2914 1.1 christos 2915 1.1 christos The @DisplayText{} command automatically computes the IPC and CPI values 2916 1.1 christos if an experiment contains the event counter values for the instructions 2917 1.1 christos and CPU cycles executed. These are part of the metric list and can be 2918 1.1 christos displayed, just like any other metric. 2919 1.1 christos 2920 1.1 christos @IndexSubentry{Options, @code{-metric_list}} 2921 1.1 christos @IndexSubentry{Commands, @code{metric_list}} 2922 1.1 christos This can be verified through the @command{metric_list} command. If we go 2923 1.1 christos back to our earlier experiment with the default event counters, we get 2924 1.1 christos the following result. 2925 1.1 christos 2926 1.1 christos @cartouche 2927 1.1 christos @smallexample 2928 1.1 christos $ gprofng display text -metric_list mxv.hwc.def.2.thr.er 2929 1.1 christos @end smallexample 2930 1.1 christos @end cartouche 2931 1.1 christos 2932 1.1 christos @smallexample 2933 1.1 christos @verbatim 2934 1.1 christos Current metrics: e.totalcpu:i.totalcpu:e.cycles:e+insts:e+llm:name 2935 1.1 christos Current Sort Metric: Exclusive Total CPU Time ( e.totalcpu ) 2936 1.1 christos Available metrics: 2937 1.1 christos Exclusive Total CPU Time: e.%totalcpu 2938 1.1 christos Inclusive Total CPU Time: i.%totalcpu 2939 1.1 christos Exclusive CPU Cycles: e.+%cycles 2940 1.1 christos Inclusive CPU Cycles: i.+%cycles 2941 1.1 christos Exclusive Instructions Executed: e+%insts 2942 1.1 christos Inclusive Instructions Executed: i+%insts 2943 1.1 christos Exclusive Last-Level Cache Misses: e+%llm 2944 1.1 christos Inclusive Last-Level Cache Misses: i+%llm 2945 1.1 christos Exclusive Instructions Per Cycle: e+IPC 2946 1.1 christos Inclusive Instructions Per Cycle: i+IPC 2947 1.1 christos Exclusive Cycles Per Instruction: e+CPI 2948 1.1 christos Inclusive Cycles Per Instruction: i+CPI 2949 1.1 christos Size: size 2950 1.1 christos PC Address: address 2951 1.1 christos Name: name 2952 1.1 christos @end verbatim 2953 1.1 christos @end smallexample 2954 1.1 christos 2955 1.1 christos Among the other metrics, we see the new metrics for the IPC and CPI 2956 1.1 christos listed. 2957 1.1 christos 2958 1.1 christos In the script below, we use this information and add the IPC and CPI 2959 1.1 christos to the metrics to be displayed. We also use a the thread filter to 2960 1.1 christos display these values for the individual threads. 2961 1.1 christos 2962 1.1 christos This is the complete script we have used. Other than a different selection 2963 1.1 christos of the metrics, there are no new features. 2964 1.1 christos 2965 1.1 christos @cartouche 2966 1.1 christos @smallexample 2967 1.1 christos # Define the metrics 2968 1.1 christos metrics e.insts:e.%cycles:e.IPC:e.CPI 2969 1.1 christos # Sort with respect to cycles 2970 1.1 christos sort e.cycles 2971 1.1 christos # Limit the output to 5 lines 2972 1.1 christos limit 5 2973 1.1 christos # Get the function overview for all threads 2974 1.1 christos functions 2975 1.1 christos # Get the function overview for thread 1 2976 1.1 christos thread_select 1 2977 1.1 christos functions 2978 1.1 christos # Get the function overview for thread 2 2979 1.1 christos thread_select 2 2980 1.1 christos functions 2981 1.1 christos # Get the function overview for thread 3 2982 1.1 christos thread_select 3 2983 1.1 christos functions 2984 1.1 christos @end smallexample 2985 1.1 christos @end cartouche 2986 1.1 christos 2987 1.1 christos In the metrics definition on the second line, we explicitly request the 2988 1.1 christos counter values for the instructions (@code{e.insts}) and CPU cycles 2989 1.1 christos (@code{e.cycles}) executed. These names can be found in output from the 2990 1.1 christos @IndexSubentry{Options, @code{-metric_list}} 2991 1.1 christos @IndexSubentry{Commands, @code{metric_list}} 2992 1.1 christos @command{metric_list} command above. 2993 1.1 christos In addition to these metrics, we also request the IPC and CPI to be shown. 2994 1.1 christos 2995 1.1 christos @IndexSubentry{Options, @code{-limit}} 2996 1.1 christos @IndexSubentry{Commands, @code{limit}} 2997 1.1 christos As before, we used the @command{limit} command to control the number of 2998 1.1 christos functions displayed. We then request an overview for all the threads, 2999 1.1 christos followed by three sets of two commands to select a thread and display the 3000 1.1 christos function overview. 3001 1.1 christos 3002 1.1 christos The script above is used as follows: 3003 1.1 christos 3004 1.1 christos @cartouche 3005 1.1 christos @smallexample 3006 1.1 christos $ gprofng display text -script my-script-ipc mxv.hwc.def.2.thr.er 3007 1.1 christos @end smallexample 3008 1.1 christos @end cartouche 3009 1.1 christos 3010 1.1 christos @noindent 3011 1.1 christos This script produces four tables. We list them separately below, 3012 1.1 christos and have left out the additional output. 3013 1.1 christos 3014 1.1 christos @noindent 3015 1.1 christos The first table shows the accumulated values across the three 3016 1.1 christos threads that have been active. 3017 1.1 christos 3018 1.1 christos @smallexample 3019 1.1 christos @verbatim 3020 1.1 christos Functions sorted by metric: Exclusive CPU Cycles 3021 1.1 christos 3022 1.1 christos Excl. Excl. CPU Excl. Excl. Name 3023 1.1 christos Instructions Cycles IPC CPI 3024 1.1 christos Executed sec. % 3025 1.1 christos 7906475309 2.691 100.00 1.473 0.679 <Total> 3026 1.1 christos 7432724378 2.598 96.54 1.434 0.697 mxv_core 3027 1.1 christos 188860269 0.035 1.31 2.682 0.373 erand48_r 3028 1.1 christos 73623396 0.026 0.95 1.438 0.696 init_data 3029 1.1 christos 76824434 0.018 0.66 2.182 0.458 drand48 3030 1.1 christos @end verbatim 3031 1.1 christos @end smallexample 3032 1.1 christos 3033 1.1 christos @noindent 3034 1.1 christos This shows that IPC of this program is completely dominated 3035 1.1 christos by function @code{mxv_core}. It has a fairly low IPC value 3036 1.1 christos of 1.43. 3037 1.1 christos 3038 1.1 christos @noindent 3039 1.1 christos The next table is for thread 1 and shows the values for the 3040 1.1 christos main thread. 3041 1.1 christos 3042 1.1 christos @smallexample 3043 1.1 christos @verbatim 3044 1.1 christos Exp Sel Total 3045 1.1 christos === === ===== 3046 1.1 christos 1 1 3 3047 1.1 christos Functions sorted by metric: Exclusive CPU Cycles 3048 1.1 christos 3049 1.1 christos Excl. Excl. CPU Excl. Excl. Name 3050 1.1 christos Instructions Cycles IPC CPI 3051 1.1 christos Executed sec. % 3052 1.1 christos 473750931 0.093 100.00 2.552 0.392 <Total> 3053 1.1 christos 188860269 0.035 37.93 2.682 0.373 erand48_r 3054 1.1 christos 73623396 0.026 27.59 1.438 0.696 init_data 3055 1.1 christos 76824434 0.018 18.97 2.182 0.458 drand48 3056 1.1 christos 134442832 0.013 13.79 5.250 0.190 __drand48_iterate 3057 1.1 christos @end verbatim 3058 1.1 christos @end smallexample 3059 1.1 christos 3060 1.1 christos @noindent 3061 1.1 christos Although this thread hardly uses any CPU cycles, the overall IPC 3062 1.1 christos of 2.55 is not all that bad. 3063 1.1 christos 3064 1.1 christos @noindent 3065 1.1 christos Last, we show the tables for threads 2 and 3: 3066 1.1 christos 3067 1.1 christos @smallexample 3068 1.1 christos @verbatim 3069 1.1 christos Exp Sel Total 3070 1.1 christos === === ===== 3071 1.1 christos 1 2 3 3072 1.1 christos Functions sorted by metric: Exclusive CPU Cycles 3073 1.1 christos 3074 1.1 christos Excl. Excl. CPU Excl. Excl. Name 3075 1.1 christos Instructions Cycles IPC CPI 3076 1.1 christos Executed sec. % 3077 1.1 christos 3716362189 1.298 100.00 1.435 0.697 <Total> 3078 1.1 christos 3716362189 1.298 100.00 1.435 0.697 mxv_core 3079 1.1 christos 0 0. 0. 0. 0. collector_root 3080 1.1 christos 0 0. 0. 0. 0. driver_mxv 3081 1.1 christos 3082 1.1 christos Exp Sel Total 3083 1.1 christos === === ===== 3084 1.1 christos 1 3 3 3085 1.1 christos Functions sorted by metric: Exclusive CPU Cycles 3086 1.1 christos 3087 1.1 christos Excl. Excl. CPU Excl. Excl. Name 3088 1.1 christos Instructions Cycles IPC CPI 3089 1.1 christos Executed sec. % 3090 1.1 christos 3716362189 1.300 100.00 1.433 0.698 <Total> 3091 1.1 christos 3716362189 1.300 100.00 1.433 0.698 mxv_core 3092 1.1 christos 0 0. 0. 0. 0. collector_root 3093 1.1 christos 0 0. 0. 0. 0. driver_mxv 3094 1.1 christos @end verbatim 3095 1.1 christos @end smallexample 3096 1.1 christos 3097 1.1 christos It is seen that both execute the same number of instructions and 3098 1.1 christos take about the same number of CPU cycles. As a result, the IPC is 3099 1.1 christos the same for both threads. 3100 1.1 christos 3101 1.1 christos @c -- A new node -------------------------------------------------------------- 3102 1.1 christos @c TBD @node Additional Features 3103 1.1 christos @c TBD @section Additional Features 3104 1.1 christos @c ---------------------------------------------------------------------------- 3105 1.1 christos 3106 1.1 christos @c -- A new node -------------------------------------------------------------- 3107 1.1 christos @c TBD @node More Filtering Capabilities 3108 1.1 christos @c TBD @subsection More Filtering Capabilities 3109 1.1 christos @c ---------------------------------------------------------------------------- 3110 1.1 christos 3111 1.1 christos @c TBD Cover @code{samples} and @code{seconds} 3112 1.1 christos 3113 1.1 christos @c -- A new node -------------------------------------------------------------- 3114 1.1 christos @node Java Profiling 3115 1.1 christos @section Java Profiling 3116 1.1 christos @c ---------------------------------------------------------------------------- 3117 1.1 christos 3118 1.1 christos @IndexSubentry{Options, @code{-j}} 3119 1.1 christos @IndexSubentry{Java profiling, @code{-j on/off}} 3120 1.1 christos The @CollectApp{} command supports Java profiling. The @code{-j on} option 3121 1.1 christos can be used for this, but since this feature is enabled by default, there is 3122 1.1 christos no need to set this explicitly. Java profiling may be disabled through the 3123 1.1 christos @code{-j off} option. 3124 1.1 christos 3125 1.1 christos The program is compiled as usual and the experiment directory is created 3126 1.1 christos similar to what we have seen before. The only difference with a C/C++ 3127 1.1 christos application is that the program has to be explicitly executed by java. 3128 1.1 christos 3129 1.1 christos For example, this is how to generate the experiment data for a Java 3130 1.1 christos program that has the source code stored in file @code{Pi.java}: 3131 1.1 christos 3132 1.1 christos @cartouche 3133 1.1 christos @smallexample 3134 1.1 christos $ javac Pi.java 3135 1.1 christos $ gprofng collect app -j on -O pi.demo.er java Pi < pi.in 3136 1.1 christos @end smallexample 3137 1.1 christos @end cartouche 3138 1.1 christos 3139 1.1 christos Regarding which java is selected to generate the data, @ToolName{} 3140 1.1 christos first looks for the JDK in the path set in either the 3141 1.1 christos @IndexSubentry{Java profiling, @code{JDK_HOME}} 3142 1.1 christos @code{JDK_HOME} environment variable, or in the 3143 1.1 christos @IndexSubentry{Java profiling, @code{JAVA_PATH}} 3144 1.1 christos @code{JAVA_PATH} environment variable. If neither of these variables is 3145 1.1 christos set, it checks for a JDK in the search path (set in the PATH 3146 1.1 christos environment variable). If there is no JDK in this path, it checks for 3147 1.1 christos the java executable in @code{/usr/java/bin/java}. 3148 1.1 christos 3149 1.1 christos In case additional options need to be passed on to the JVM, the 3150 1.1 christos @IndexSubentry{Options, @code{-J}} 3151 1.1 christos @IndexSubentry{Java profiling, @code{-J <string>}} 3152 1.1 christos @code{-J <string>} option can be used. The string with the 3153 1.1 christos option(s) has to be delimited by quotation marks in case 3154 1.1 christos there is more than one argument. 3155 1.1 christos 3156 1.1 christos The @DisplayText{} command may be used to view the performance data. There is 3157 1.1 christos no need for any special options and the same commands as previously discussed 3158 1.1 christos are supported. 3159 1.1 christos 3160 1.1 christos @IndexSubentry{Options, @code{-viewmode}} 3161 1.1 christos @IndexSubentry{Commands, @code{viewmode}} 3162 1.1 christos @IndexSubentry{Java profiling, different view modes} 3163 1.1 christos The @code{viewmode} command 3164 1.1 christos @xref{The Viewmode} 3165 1.1 christos is very useful to examine the call stacks. 3166 1.1 christos 3167 1.1 christos For example, this is how one can see the native call stacks. For 3168 1.1 christos lay-out purposes we have restricted the list to the first five entries: 3169 1.1 christos 3170 1.1 christos @cartouche 3171 1.1 christos @smallexample 3172 1.1 christos $ gprofng display text -limit 5 -viewmode machine -calltree pi.demo.er 3173 1.1 christos @end smallexample 3174 1.1 christos @end cartouche 3175 1.1 christos 3176 1.1 christos @smallexample 3177 1.1 christos @verbatim 3178 1.1 christos Print limit set to 5 3179 1.1 christos Viewmode set to machine 3180 1.1 christos Functions Call Tree. Metric: Attributed Total CPU Time 3181 1.1 christos 3182 1.1 christos Attr. Name 3183 1.1 christos Total 3184 1.1 christos CPU sec. 3185 1.1 christos 1.381 +-<Total> 3186 1.1 christos 1.171 +-Pi.calculatePi(double) 3187 1.1 christos 0.110 +-collector_root 3188 1.1 christos 0.110 | +-JavaMain 3189 1.1 christos 0.070 | +-jni_CallStaticVoidMethod 3190 1.1 christos @end verbatim 3191 1.1 christos @end smallexample 3192 1.1 christos 3193 1.1 christos @noindent 3194 1.1 christos Note that the selection of the viewmode is echoed in the output. 3195 1.1 christos 3196 1.1 christos @c -- A new node -------------------------------------------------------------- 3197 1.1 christos @node The gprofng Tools 3198 1.1 christos @chapter The gprofng Tools 3199 1.1 christos @c ---------------------------------------------------------------------------- 3200 1.1 christos 3201 1.1 christos Several tools are included in @ProductName{}. In subsequent chapters these 3202 1.1 christos are discussed in detail. Below a brief description is given, followed by an 3203 1.1 christos overview of the environment variables that are supported. 3204 1.1 christos 3205 1.1 christos @c -- A new node -------------------------------------------------------------- 3206 1.1 christos @node Tools Overview 3207 1.1 christos @section Tools Overview 3208 1.1 christos @c ---------------------------------------------------------------------------- 3209 1.1 christos 3210 1.1 christos The following tools are supported by @ProductName{}: 3211 1.1 christos 3212 1.1 christos @table @code 3213 1.1 christos 3214 1.1 christos @item @CollectApp{} 3215 1.1 christos @IndexSubentry{@code{gprofng}, @code{collect app}} 3216 1.1 christos 3217 1.1 christos Collects the performance data and stores the results in an experiment 3218 1.1 christos directory. There are many options on this tool, but quite often the 3219 1.1 christos defaults are sufficient. 3220 1.1 christos An experiment directory is required for the subsequent analysis of 3221 1.1 christos the results. 3222 1.1 christos 3223 1.1 christos @item @DisplayText{} 3224 1.1 christos @IndexSubentry{@code{gprofng}, @code{display text}} 3225 1.1 christos 3226 1.1 christos Generates performance reports in ASCII format. Commandline 3227 1.1 christos options, and/or commands in a script file are used to control the contents 3228 1.1 christos and lay-out of the generated report(s). 3229 1.1 christos 3230 1.1 christos @item @DisplayHTML{} 3231 1.1 christos @IndexSubentry{@code{gprofng}, @code{display html}} 3232 1.1 christos 3233 1.1 christos Takes one or more experiment directories and generates a directory with 3234 1.1 christos HTML files. Starting from the index.html file, the performance data 3235 1.1 christos may be examined in a browser. 3236 1.1 christos 3237 1.1 christos @item @DisplaySRC{} 3238 1.1 christos @IndexSubentry{@code{gprofng}, @code{display src}} 3239 1.1 christos 3240 1.1 christos Displays the source code, interleaved with the disassembled instructions. 3241 1.1 christos 3242 1.1 christos @item @Archive{} 3243 1.1 christos @IndexSubentry{@code{gprofng}, @code{archive}} 3244 1.1 christos 3245 1.1 christos Archives an experiment directory by (optionally) including source code and 3246 1.1 christos object files, as well as the shared libraries that have been used. 3247 1.1 christos 3248 1.1 christos @item @GUI{} 3249 1.1 christos @IndexSubentry{@code{gprofng}, @code{display gui}} 3250 1.1 christos 3251 1.1 christos This is an optional component that can be installed in addition to the 3252 1.1 christos command line gprofng tools listed above. It supports the graphical 3253 1.1 christos analysis of one or more experiments that have been created using 3254 1.1 christos @CollectApp{}. 3255 1.1 christos 3256 1.1 christos The GUI part of @ProductName{} is a GNU project. This is the link to the 3257 1.1 christos @url{https://savannah.gnu.org/projects/gprofng-gui, gprofng GUI page}. 3258 1.1 christos This page contains more information (e.g. how to clone the repo). 3259 1.1.1.2 christos There is also a 3260 1.1 christos @url{https://ftp.gnu.org/gnu/gprofng-gui, tar file distribution directory} 3261 1.1 christos with tar files that include everything that is needed to build and install 3262 1.1 christos the GUI code. Various versions are available here. 3263 1.1 christos Be aware that in order to build and use the gprofng GUI, Java needs to be 3264 1.1 christos installed first. The minimum Java version required is Java 8. 3265 1.1 christos 3266 1.1 christos @end table 3267 1.1 christos 3268 1.1 christos @c -- A new section ----------------------------------------------------------- 3269 1.1 christos @node The gprofng.rc file with default settings 3270 1.1 christos @section The gprofng.rc file with default settings 3271 1.1 christos @c ---------------------------------------------------------------------------- 3272 1.1 christos The @file{gprofng.rc} 3273 1.1 christos @cindex gprofng.rc 3274 1.1 christos file is used to define default settings for the @DisplayText{}, @Archive{}, 3275 1.1 christos and @DisplaySRC{} tools, but the user can override these defaults through 3276 1.1 christos local configuration settings when building and installing from the source 3277 1.1.1.2 christos code. 3278 1.1 christos 3279 1.1 christos There are three files that are checked when the tool starts up. The first 3280 1.1 christos file has pre-defined settings and comes with the installation, but through 3281 1.1 christos a hidden file called @file{.gprofng.rc}, the user can (re)define the defaults. 3282 1.1 christos 3283 1.1 christos These are the locations and files that are checked upon starting the above 3284 1.1 christos mentioned tools: 3285 1.1 christos 3286 1.1 christos @enumerate 3287 1.1 christos 3288 1.1 christos @item 3289 1.1 christos The system-wide filename is called @file{gprofng.rc} and is located in 3290 1.1.1.2 christos the @file{/etc} subdirectory in case an RPM was used for the installation. 3291 1.1 christos 3292 1.1 christos If @ProductName{} has been built from the source, this file is in 3293 1.1 christos subdirectory @file{etc} in the top level installation directory. 3294 1.1 christos 3295 1.1 christos @item 3296 1.1 christos The user's home directory may have a hidden file called @file{.gprofng.rc}. 3297 1.1 christos 3298 1.1 christos @item 3299 1.1 christos The directory where @DisplayText{} (or @DisplaySRC{}) is invoked from, may 3300 1.1 christos have a hidden file called @file{.gprofng.rc}. 3301 1.1 christos 3302 1.1 christos @end enumerate 3303 1.1 christos 3304 1.1 christos The settings of each file override the settings of the file(s) read before it. 3305 1.1 christos Defaults in the system-wide file are overruled by the file in the user home 3306 1.1 christos directory (if any) and any settings in the @file{.gprofng.rc} file in the 3307 1.1 christos current directory override those. 3308 1.1 christos 3309 1.1 christos Note that the settings in these files only affect the defaults. Unlike 3310 1.1 christos the commands used in a script file, they are not commands for the tools. 3311 1.1 christos 3312 1.1 christos @c -- TBD indxobj_define, 3313 1.1 christos @c -- TBD object_show, 3314 1.1 christos @c -- TBD object_hide, 3315 1.1 christos @c -- TBD object_api, 3316 1.1 christos 3317 1.1 christos The @file{.gprofng.rc} configuration files can contain the 3318 1.1 christos @command{addpath}, 3319 1.1 christos @command{compare}, 3320 1.1 christos @command{dthresh}, 3321 1.1 christos @command{name}, 3322 1.1 christos @command{pathmap}, 3323 1.1 christos @command{printmode}, 3324 1.1 christos @command{sthresh}, 3325 1.1 christos and 3326 1.1 christos @command{viewmode} 3327 1.1 christos commands as described in this user guide. 3328 1.1 christos 3329 1.1 christos They can also contain the following commands, @emph{which cannot be used on 3330 1.1 christos either the command line, or in a script file}: 3331 1.1 christos 3332 1.1 christos @table @code 3333 1.1 christos 3334 1.1 christos @item dmetrics @var{metric-spec} 3335 1.1 christos @IndexSubentry{Commands, @code{dmetrics}} 3336 1.1 christos 3337 1.1 christos Specify the default metrics to be displayed or printed in the function list. 3338 1.1 christos The syntax and use of the metric list is described in section 3339 1.1 christos @ref{Metric Definitions}. 3340 1.1 christos The order of the metric keywords in the list determines the order in which 3341 1.1 christos the metrics are presented. 3342 1.1 christos 3343 1.1 christos Default metrics for the @code{callers-callees} list are derived from the 3344 1.1 christos function list default metrics by adding the corresponding attributed metric 3345 1.1 christos before the first occurrence of each metric name in the list. 3346 1.1 christos 3347 1.1 christos @item dsort @var{metric-spec} 3348 1.1 christos @IndexSubentry{Commands, @code{dsort}} 3349 1.1 christos 3350 1.1 christos Specify the default metric by which the function list is sorted. The sort 3351 1.1 christos metric is the first metric in this list that matches a metric in any loaded 3352 1.1 christos experiment, subject to the following conditions: 3353 1.1 christos 3354 1.1 christos @itemize @bullet 3355 1.1 christos 3356 1.1 christos @item 3357 1.1 christos If the entry in @var{metric-spec} has a visibility string of an exclamation 3358 1.1 christos point (@samp{!}), the first metric whose name matches is used, regardless of 3359 1.1 christos whether it is visible. 3360 1.1 christos 3361 1.1 christos @item 3362 1.1 christos If the entry in @var{metric-spec} has any other visibility string, the first 3363 1.1 christos visible metric whose name matches is used. 3364 1.1 christos 3365 1.1 christos @end itemize 3366 1.1 christos 3367 1.1 christos The syntax and use of the metric list is described in section 3368 1.1 christos @ref{Metric Definitions}. 3369 1.1 christos The default sort metric for the @code{callers-callees} list is the attributed 3370 1.1 christos metric corresponding to the default sort metric for the function list. 3371 1.1 christos 3372 1.1 christos @item en_desc @{on | off | =@var{regex}@} 3373 1.1 christos @IndexSubentry{Commands, @code{en_desc}} 3374 1.1 christos 3375 1.1 christos Set the mode for reading descendant experiments to @samp{on} (enable all 3376 1.1.1.2 christos descendants) or @samp{off} to disable all descendants. If 3377 1.1 christos @samp{=}@var{regex} is used, enable data from those experiments whose 3378 1.1 christos executable name matches the regular expression. 3379 1.1 christos 3380 1.1 christos The default setting is @samp{on} to follow all descendants. In reading 3381 1.1 christos experiments with descendants, any sub-experiments that contain little or 3382 1.1 christos no performance data are ignored by @DisplayText{}. 3383 1.1 christos 3384 1.1 christos @end table 3385 1.1 christos 3386 1.1 christos @c -- A new section ----------------------------------------------------------- 3387 1.1 christos @node Filters 3388 1.1 christos @section Filters 3389 1.1 christos @c ---------------------------------------------------------------------------- 3390 1.1 christos 3391 1.1 christos Various filter commands are supported by @DisplayText{}. 3392 1.1.1.2 christos Thanks to the use of filters, the user can zoom in on a certain area of 3393 1.1 christos interest. With filters, it is possible to select one or more threads to 3394 1.1 christos focus on, define a window in time, select specific call stacks, etc. 3395 1.1 christos @IndexSubentry{Filters, Intro} 3396 1.1 christos 3397 1.1 christos While already powerful by themselves, filters may be combined to further 3398 1.1 christos narrow down the view into the data. 3399 1.1 christos 3400 1.1 christos @IndexSubentry{Filters, Persistence} 3401 1.1 christos It is important to note that filters are @emph{persistent}. A filter is 3402 1.1 christos active until it is reset. This means that successive filter commands 3403 1.1 christos increasingly narrow down the view until one or more are reset. 3404 1.1 christos 3405 1.1 christos @noindent 3406 1.1 christos An example is the following: 3407 1.1 christos 3408 1.1 christos @cartouche 3409 1.1 christos @smallexample 3410 1.1 christos $ gprofng display text -thread_select 1 -functions \ 3411 1.1 christos -cpu_select 2 -functions @dots{} 3412 1.1 christos @end smallexample 3413 1.1 christos @end cartouche 3414 1.1 christos 3415 1.1 christos This command selects thread 1 and requests the function view for this thread. 3416 1.1 christos The third (@command{cpu_select 2}) command @emph{adds} the 3417 1.1 christos constraint that only the events on CPU 2 are to be selected. This means 3418 1.1 christos that the next function view selects events that were executed by thread 1 and 3419 1.1 christos have been running on CPU 2. 3420 1.1 christos 3421 1.1 christos @noindent 3422 1.1 christos In contrast with this single command line, the two commands below look similar, 3423 1.1 christos but behave very differently: 3424 1.1 christos 3425 1.1 christos @cartouche 3426 1.1 christos @smallexample 3427 1.1 christos $ gprofng display text -thread_select 1 -functions @dots{} 3428 1.1 christos $ gprofng display text -cpu_select 2 -functions @dots{} 3429 1.1 christos @end smallexample 3430 1.1 christos @end cartouche 3431 1.1 christos 3432 1.1 christos The first command displays the function view for thread 1. The second 3433 1.1 christos command shows the function view for CPU 2 for @emph{all} threads that have 3434 1.1 christos been running on this CPU. 3435 1.1 christos 3436 1.1 christos As the following example demonstrates, things get a little more tricky in 3437 1.1 christos case a script file is used. Consider the following script file: 3438 1.1 christos 3439 1.1 christos @smallexample 3440 1.1 christos @verbatim 3441 1.1 christos thread_select 1 3442 1.1 christos functions 3443 1.1 christos cpu_select 2 3444 1.1 christos functions 3445 1.1 christos @end verbatim 3446 1.1 christos @end smallexample 3447 1.1 christos 3448 1.1 christos This script file displays the function view for thread 1 first. This is 3449 1.1 christos followed by those functions that were executed by thread 1 @emph{and} have 3450 1.1 christos been run on CPU 2. 3451 1.1 christos 3452 1.1 christos If however, the script should behave like the two command line invocations 3453 1.1 christos shown above, the thread selection filter needs to be reset before CPU 2 is 3454 1.1 christos selected: 3455 1.1 christos 3456 1.1 christos @smallexample 3457 1.1 christos @verbatim 3458 1.1 christos thread_select 1 3459 1.1 christos functions 3460 1.1 christos # Reset the thread selection filter: 3461 1.1 christos thread_select all 3462 1.1 christos cpu_select 2 3463 1.1 christos functions 3464 1.1 christos @end verbatim 3465 1.1 christos @end smallexample 3466 1.1 christos 3467 1.1.1.2 christos In general, filters behave differently than commands or options. In 3468 1.1 christos particular there may be an interaction between different filter definitions. 3469 1.1 christos 3470 1.1 christos For example, as explained above, in the first script file the 3471 1.1 christos @command{thread_select} and @command{cpu_select} commands interact. 3472 1.1 christos 3473 1.1 christos For a list of all the predefined filters see @ref{Predefined Filters}. 3474 1.1 christos 3475 1.1 christos @c -- A new section ----------------------------------------------------------- 3476 1.1 christos @node Supported Environment Variables 3477 1.1 christos @section Supported Environment Variables 3478 1.1 christos @c ---------------------------------------------------------------------------- 3479 1.1 christos 3480 1.1 christos Various environment variables are supported. We refer to the man page for 3481 1.1 christos gprofng(1) for an overview and description 3482 1.1 christos (@xref{Man page for @command{gprofng}}). 3483 1.1 christos 3484 1.1 christos @c -- A new chapter ----------------------------------------------------------- 3485 1.1 christos @node Performance Data Collection 3486 1.1 christos @chapter Performance Data Collection 3487 1.1 christos @c ---------------------------------------------------------------------------- 3488 1.1 christos 3489 1.1 christos The @CollectApp{} command is used to gather the application performance data 3490 1.1 christos while the application executes. 3491 1.1 christos At regular intervals, program execution is halted and the required data is 3492 1.1 christos recorded. 3493 1.1 christos @cindex Experiment directory 3494 1.1 christos An experiment directory is created when the tool starts. This directory is 3495 1.1 christos used to store the relevant information and forms the basis for a subsequent 3496 1.1 christos analysis with one of the viewing tools. 3497 1.1 christos 3498 1.1 christos @c -- A new section ----------------------------------------------------------- 3499 1.1 christos @node The @command{gprofng collect app} Tool 3500 1.1 christos @section The @command{gprofng collect app} Tool 3501 1.1 christos @c ---------------------------------------------------------------------------- 3502 1.1 christos 3503 1.1 christos This is the command to collect the performance information for the target 3504 1.1 christos application. The usage is as follows: 3505 1.1 christos 3506 1.1 christos @cartouche 3507 1.1 christos @smallexample 3508 1.1 christos $ gprofng collect app [OPTION(S)] TARGET [TARGET_ARGUMENTS] 3509 1.1 christos @end smallexample 3510 1.1 christos @end cartouche 3511 1.1 christos 3512 1.1 christos @noindent 3513 1.1 christos Options to the command are passed in first. This is followed by the name of 3514 1.1 christos the target, which is typically a binary executable or a script, followed by 3515 1.1 christos any options that may be required by the target. 3516 1.1 christos 3517 1.1 christos @c -- A new section ----------------------------------------------------------- 3518 1.1 christos @node View the Performance Information 3519 1.1 christos @chapter View the Performance Information 3520 1.1 christos @c ---------------------------------------------------------------------------- 3521 1.1 christos Various tools to view the performance data stored in one or more experiment 3522 1.1 christos directories are available. In this chapter, these will all be covered in 3523 1.1 christos detail. 3524 1.1 christos 3525 1.1 christos @c -- A new section ----------------------------------------------------------- 3526 1.1 christos @node The @command{gprofng display text} Tool 3527 1.1 christos @section The @command{gprofng display text} Tool 3528 1.1 christos @c ---------------------------------------------------------------------------- 3529 1.1 christos 3530 1.1 christos This tool displays the performance information in ASCII format. It supports 3531 1.1 christos a variety of views into the data recorded. 3532 1.1 christos These views can be specified in two ways and both may be used simultaneously: 3533 1.1 christos 3534 1.1 christos @itemize @bullet 3535 1.1 christos 3536 1.1 christos @item 3537 1.1 christos Command line options start with a dash (@samp{-}) symbol and may take an 3538 1.1 christos argument. 3539 1.1 christos 3540 1.1 christos @item 3541 1.1 christos Options may also be included in a file, the ``script file''. In this case, 3542 1.1 christos the dash symbol should @emph{not} be included. Multiple script files can 3543 1.1 christos be used on the same command line. 3544 1.1 christos 3545 1.1 christos @end itemize 3546 1.1 christos 3547 1.1 christos While they may appear as an option, they are really commands and this is 3548 1.1 christos why they will be referred to as @emph{commands} 3549 1.1 christos @cindex Commands 3550 1.1 christos in the documentation. 3551 1.1 christos 3552 1.1 christos As a general rule, @emph{the order of options matters} and if the same option, 3553 1.1 christos or command, occurs multiple times, the rightmost setting is selected. 3554 1.1 christos 3555 1.1 christos @c -- A new sub section ------------------------------------------------------- 3556 1.1 christos @node The @code{gprofng display text} Commands 3557 1.1 christos @subsection The @code{gprofng display text} Commands 3558 1.1 christos @c ---------------------------------------------------------------------------- 3559 1.1 christos 3560 1.1 christos The most commonly used commands are documented in the man page for this tool 3561 1.1 christos (@xref{Man page for @command{gprofng display text}}). In this section we list 3562 1.1 christos and describe all other commands that are supported. 3563 1.1 christos 3564 1.1 christos @c -- A new sub subsection ---------------------------------------------------- 3565 1.1 christos @node Commands that List Experiment Details 3566 1.1 christos @unnumberedsubsubsec Commands that List Experiment Details 3567 1.1 christos @c ---------------------------------------------------------------------------- 3568 1.1 christos 3569 1.1 christos @table @code 3570 1.1 christos 3571 1.1 christos @item experiment_ids 3572 1.1 christos @ifclear man 3573 1.1 christos @IndexSubentry{Options, @code{-experiment_ids}} 3574 1.1 christos @IndexSubentry{Commands, @code{experiment_ids}} 3575 1.1 christos @end ifclear 3576 1.1 christos 3577 1.1 christos For each experiment that has been loaded, show the totals of the metrics 3578 1.1 christos recorded, plus some other operational characteristics like the name of 3579 1.1 christos the executable, PID, etc. The top line contains the accumulated totals 3580 1.1 christos for the metrics. 3581 1.1 christos 3582 1.1 christos @item experiment_list 3583 1.1 christos @ifclear man 3584 1.1 christos @IndexSubentry{Options, @code{-experiment_list}} 3585 1.1 christos @IndexSubentry{Commands, @code{experiment_list}} 3586 1.1 christos @end ifclear 3587 1.1 christos 3588 1.1 christos Display the list of experiments that are loaded. Each experiment is listed 3589 1.1 christos with an index, which is used when selecting samples, threads, or LWPs, and 3590 1.1 christos a process id (PID), which can be used for advanced filtering. 3591 1.1 christos 3592 1.1 christos @item cpu_list 3593 1.1 christos @IndexSubentry{Options, @code{-cpu_list}} 3594 1.1 christos @IndexSubentry{Commands, @code{cpu_list}} 3595 1.1 christos 3596 1.1 christos Display the total number of CPUs that have been used during the experiment(s). 3597 1.1 christos 3598 1.1 christos @item cpus 3599 1.1 christos @IndexSubentry{Options, @code{-cpus}} 3600 1.1 christos @IndexSubentry{Commands, @code{cpus}} 3601 1.1 christos 3602 1.1 christos Show a list of CPUs that were used by the application, along with the metrics 3603 1.1.1.2 christos that have been recorded. The CPUs are represented by a CPU number and show 3604 1.1.1.2 christos the Total CPU time by default. 3605 1.1 christos 3606 1.1 christos Note that since the data is sorted with respect to the default metric, it may 3607 1.1 christos be useful to use the @command{sort name} command to show the list sorted with 3608 1.1 christos respect to the CPU id. 3609 1.1 christos 3610 1.1 christos @item GCEvents 3611 1.1 christos @IndexSubentry{Options, @code{-GCEvents}} 3612 1.1 christos @IndexSubentry{Commands, @code{GCEvents}} 3613 1.1 christos 3614 1.1 christos This commands is for Java applications only. It shows any Garbage Collection 3615 1.1.1.2 christos (GC) events that have occurred while the application was executing. 3616 1.1 christos 3617 1.1 christos @item lwp_list 3618 1.1 christos @IndexSubentry{Options, @code{-lwp_list}} 3619 1.1 christos @IndexSubentry{Commands, @code{lwp_list}} 3620 1.1 christos 3621 1.1 christos Displays the list of LWPs processed during the experiment(s). 3622 1.1 christos 3623 1.1 christos @item processes 3624 1.1 christos @IndexSubentry{Options, @code{-processes}} 3625 1.1 christos @IndexSubentry{Commands, @code{processes}} 3626 1.1 christos 3627 1.1 christos For each experiment that has been loaded, this command displays a list of 3628 1.1 christos processes that were created by the application, along with their metrics. 3629 1.1 christos The processes are represented by process ID (PID) numbers and show the 3630 1.1.1.2 christos Total CPU time metric by default. If additional metrics are recorded in 3631 1.1 christos an experiment, these are shown as well. 3632 1.1 christos 3633 1.1 christos @item samples 3634 1.1 christos @IndexSubentry{Options, @code{-samples}} 3635 1.1 christos @IndexSubentry{Commands, @code{samples}} 3636 1.1 christos 3637 1.1.1.2 christos Display a list of sample points and their metrics, which reflect the 3638 1.1 christos microstates recorded at each sample point in the loaded experiment. 3639 1.1 christos The samples are represented by sample numbers and show the Total CPU time 3640 1.1 christos by default. Other metrics might also be displayed if enabled. 3641 1.1 christos 3642 1.1 christos @item sample_list 3643 1.1 christos @IndexSubentry{Options, @code{-sample_list}} 3644 1.1 christos @IndexSubentry{Commands, @code{sample_list}} 3645 1.1 christos 3646 1.1 christos For each experiment loaded, display the list of samples currently selected. 3647 1.1 christos 3648 1.1 christos @item seconds 3649 1.1 christos @IndexSubentry{Options, @code{-seconds}} 3650 1.1 christos @IndexSubentry{Commands, @code{seconds}} 3651 1.1 christos 3652 1.1 christos Show each second of the profiling run that was captured in the experiment, 3653 1.1 christos along with the metrics collected in that second. The seconds view differs 3654 1.1 christos from the samples view in that it shows periodic samples that occur every 3655 1.1 christos second beginning at 0 and the interval cannot be changed. 3656 1.1 christos 3657 1.1 christos The seconds view lists the seconds of execution with the Total CPU time by 3658 1.1 christos default. Other metrics might also be displayed if the metrics are present 3659 1.1 christos in the loaded experiments. 3660 1.1 christos 3661 1.1 christos @item threads 3662 1.1 christos @IndexSubentry{Options, @code{-threads}} 3663 1.1 christos @IndexSubentry{Commands, @code{threads}} 3664 1.1 christos 3665 1.1 christos Show a list of threads and their metrics. The threads are represented 3666 1.1 christos by a process and thread pair and show the Total CPU time by default. 3667 1.1 christos Other metrics might also be displayed by default if the metrics are 3668 1.1 christos present in the loaded experiment. 3669 1.1 christos 3670 1.1 christos @item thread_list 3671 1.1 christos @IndexSubentry{Options, @code{-thread_list}} 3672 1.1 christos @IndexSubentry{Commands, @code{thread_list}} 3673 1.1 christos 3674 1.1 christos Display the list of threads currently selected for the analysis. 3675 1.1 christos 3676 1.1 christos @end table 3677 1.1 christos 3678 1.1 christos @noindent 3679 1.1 christos @emph{The commands below are for use in scripts and interactive mode only. 3680 1.1 christos They are not allowed on the command line.} 3681 1.1 christos 3682 1.1 christos @table @code 3683 1.1 christos 3684 1.1 christos @item add_exp @var{exp-name} 3685 1.1 christos @IndexSubentry{Commands, @code{add_exp}} 3686 1.1 christos 3687 1.1 christos Add the named experiment to the current session. 3688 1.1 christos 3689 1.1 christos @item drop_exp @var{exp-name} 3690 1.1 christos @IndexSubentry{Commands, @code{drop_exp}} 3691 1.1 christos 3692 1.1 christos Drop the named experiment from the current session. 3693 1.1 christos 3694 1.1 christos @item open_exp @var{exp-name} 3695 1.1 christos @IndexSubentry{Commands, @code{open_exp}} 3696 1.1 christos 3697 1.1 christos Drop all loaded experiments from the session, and then load the named 3698 1.1 christos experiment. 3699 1.1 christos 3700 1.1 christos @end table 3701 1.1 christos 3702 1.1 christos @c -- A new sub subsection ---------------------------------------------------- 3703 1.1 christos @node Commands that Affect Listings and Output 3704 1.1 christos @unnumberedsubsubsec Commands that Affect Listings and Output 3705 1.1 christos @c ---------------------------------------------------------------------------- 3706 1.1 christos 3707 1.1 christos @table @code 3708 1.1 christos 3709 1.1 christos @item dthresh @var{value} 3710 1.1 christos @IndexSubentry{Options, @code{-dthresh}} 3711 1.1 christos @IndexSubentry{Commands, @code{dthresh}} 3712 1.1 christos 3713 1.1 christos Specify the threshold percentage for highlighting metrics in the annotated 3714 1.1 christos disassembly code. If the value of any metric is equal to or greater than 3715 1.1 christos @var{value} as a percentage of the maximum value of that metric for any 3716 1.1 christos instruction line in the file, the line on which the metrics occur has a 3717 1.1 christos @samp{##} marker inserted at the beginning of the line. The default is 75. 3718 1.1 christos 3719 1.1 christos @item printmode @{text | html | @var{single-char}@} 3720 1.1 christos @IndexSubentry{Options, @code{-printmode}} 3721 1.1 christos @IndexSubentry{Commands, @code{printmode}} 3722 1.1 christos 3723 1.1.1.2 christos Set the print mode. If the keyword is @code{text}, printing will be done in 3724 1.1 christos tabular form using plain text. In case the @code{html} keyword is selected, 3725 1.1 christos the output is formatted as an HTML table. 3726 1.1 christos 3727 1.1 christos Alternatively, @var{single-char} may be used in a delimiter separated list, 3728 1.1 christos with the single character @var{single-char} as the delimiter. 3729 1.1 christos 3730 1.1 christos The printmode setting is used only for those commands that generate tables, 3731 1.1 christos such as @command{functions}. The setting is ignored for other printing 3732 1.1 christos commands, including those showing source and disassembly listings. 3733 1.1 christos 3734 1.1 christos @item sthresh @var{value} 3735 1.1 christos @IndexSubentry{Options, @code{-sthresh}} 3736 1.1 christos @IndexSubentry{Commands, @code{sthresh}} 3737 1.1 christos 3738 1.1 christos Specify the threshold percentage for highlighting metrics in the annotated 3739 1.1 christos source code. If the value of any metric is equal to or greater than 3740 1.1 christos @var{value} (as a percentage) of the maximum value of that metric for any 3741 1.1 christos source line in the file, the line on which the metrics occur has a @samp{##} 3742 1.1 christos marker inserted at the beginning of the line. The default is 75. 3743 1.1 christos 3744 1.1 christos @end table 3745 1.1 christos 3746 1.1 christos @c -- A new sub subsection ---------------------------------------------------- 3747 1.1 christos @node Predefined Filters 3748 1.1 christos @unnumberedsubsubsec Predefined Filters 3749 1.1 christos @c ---------------------------------------------------------------------------- 3750 1.1 christos 3751 1.1 christos The filters below use a list, the selection list, to define a sequence of 3752 1.1 christos numbers. @xref{The Selection List}. 3753 1.1 christos Note that this selection is persistent, but the filter can be reset by using 3754 1.1 christos @samp{all} as the @var{selection-list}. 3755 1.1 christos 3756 1.1 christos @table @code 3757 1.1 christos 3758 1.1 christos @item cpu_select @var{selection-list} 3759 1.1 christos @IndexSubentry{Options, @code{-cpu_select}} 3760 1.1 christos @IndexSubentry{Commands, @code{cpu_select}} 3761 1.1 christos 3762 1.1 christos Select the CPU ids specified in the @var{selection-list}. 3763 1.1 christos 3764 1.1 christos @item lwp_select @var{selection-list} 3765 1.1 christos @IndexSubentry{Options, @code{-lwp_select}} 3766 1.1 christos @IndexSubentry{Commands, @code{lwp_select}} 3767 1.1 christos 3768 1.1 christos Select the LWPs specified in the @var{selection-list}. 3769 1.1 christos 3770 1.1 christos @item sample_select @var{selection-list} 3771 1.1 christos @IndexSubentry{Options, @code{-sample-select}} 3772 1.1 christos @IndexSubentry{Commands, @code{sample-select}} 3773 1.1 christos 3774 1.1 christos @item thread_select @var{selection-list} 3775 1.1 christos @IndexSubentry{Options, @code{-thread_select}} 3776 1.1 christos @IndexSubentry{Commands, @code{thread_select}} 3777 1.1 christos 3778 1.1 christos Select a series of threads, or just one, to be used in subsequent views. 3779 1.1 christos The @var{selection-list} consists of a sequence of comma separated numbers. 3780 1.1 christos This may include a range of the form @samp{n-m}. 3781 1.1 christos 3782 1.1 christos @end table 3783 1.1 christos 3784 1.1 christos 3785 1.1 christos @c -- A new sub subsection ---------------------------------------------------- 3786 1.1 christos @node Commands to Set and Change Search Paths 3787 1.1 christos @unnumberedsubsubsec Commands to Set and Change Search Paths 3788 1.1 christos @c ---------------------------------------------------------------------------- 3789 1.1 christos 3790 1.1 christos @table @code 3791 1.1 christos 3792 1.1 christos @item addpath @var{path-list} 3793 1.1 christos @IndexSubentry{Options, @code{-addpath}} 3794 1.1 christos @IndexSubentry{Commands, @code{addpath}} 3795 1.1 christos 3796 1.1 christos Append @var{path-list} to the current setpath settings. Note that multiple 3797 1.1 christos @command{addpath} commands can be used in @file{.gprofng.rc} files, and will 3798 1.1 christos be concatenated. 3799 1.1 christos 3800 1.1 christos @item pathmap @var{old-prefix} @var{new-prefix} 3801 1.1 christos @IndexSubentry{Options, @code{-pathmap}} 3802 1.1 christos @IndexSubentry{Commands, @code{pathmap}} 3803 1.1 christos 3804 1.1 christos If a file cannot be found using the path list set by @command{addpath}, or 3805 1.1.1.2 christos the @command{setpath} command, one or more path remappings may be set with the 3806 1.1 christos @command{pathmap} command. 3807 1.1 christos 3808 1.1 christos With path mapping, the user can specify how to replace the leading component 3809 1.1 christos in a full path by a different string. 3810 1.1 christos 3811 1.1 christos With this command, any path name for a source file, object file, or shared 3812 1.1 christos object that begins with the prefix specified with @var{old-prefix}, the 3813 1.1 christos old prefix is replaced by the prefix specified with @var{new-prefix}. 3814 1.1 christos The resulting path is used to find the file. 3815 1.1 christos 3816 1.1 christos For example, if a source file located in directory @file{/tmp} 3817 1.1 christos is shown in the @DisplayText{} output, but should instead be taken from 3818 1.1 christos @file{/home/demo}, the following @file{pathmap} command redefines the 3819 1.1.1.2 christos path: 3820 1.1 christos 3821 1.1 christos @smallexample 3822 1.1 christos $ gprofng diplay text -pathmap /tmp /home/demo -source ... 3823 1.1 christos @end smallexample 3824 1.1 christos 3825 1.1 christos Note that multiple @command{pathmap} commands can be supplied, and each is 3826 1.1 christos tried until the file is found. 3827 1.1 christos 3828 1.1 christos @item setpath @var{path-list} 3829 1.1 christos @IndexSubentry{Options, @code{-setpath}} 3830 1.1 christos @IndexSubentry{Commands, @code{setpath}} 3831 1.1 christos 3832 1.1.1.2 christos Set the path used to find source and object files. The path is defined 3833 1.1.1.2 christos through the @var{path-list} keyword. It is a colon separated list of 3834 1.1 christos directories, jar files, or zip files. 3835 1.1 christos If any directory has a colon character in it, escape it with a 3836 1.1 christos backslash (@samp{\}). 3837 1.1 christos 3838 1.1 christos The special directory name @code{$expts}, refers 3839 1.1 christos to the set of current experiments in the order in which they were loaded. 3840 1.1 christos You can abbreviate it with a single @samp{$} character. 3841 1.1 christos 3842 1.1 christos The default path is @samp{$expts:..} which is the directories of the 3843 1.1 christos loaded experiments and the current working directory. 3844 1.1 christos 3845 1.1.1.2 christos Use @command{setpath} with no argument to display the current path. 3846 1.1 christos 3847 1.1 christos Note that @command{setpath} commands @emph{are not allowed .gprofng.rc 3848 1.1 christos configuration files}. 3849 1.1 christos 3850 1.1 christos @end table 3851 1.1 christos 3852 1.1 christos @c -- A new subsection -------------------------------------------------------- 3853 1.1 christos @c -- TBD @node Usage examples for @code{gprofng display text} 3854 1.1 christos @c -- TBD @subsection Usage examples for @code{gprofng display text} 3855 1.1 christos @c ---------------------------------------------------------------------------- 3856 1.1 christos 3857 1.1 christos @c -- TBD In this section we present usage examples. 3858 1.1 christos 3859 1.1 christos @c -- A new chapter ----------------------------------------------------------- 3860 1.1 christos @c TBD @node The @code{gprofng display html} Tool 3861 1.1 christos @c TBD @section The @code{gprofng display html} Tool 3862 1.1 christos @c ---------------------------------------------------------------------------- 3863 1.1 christos @c TBD The options are documented in the man page for this tool. In this section we 3864 1.1 christos @c TBD present usage examples. 3865 1.1 christos 3866 1.1 christos @c -- A new chapter ----------------------------------------------------------- 3867 1.1 christos @c TBD @node Display Source Code 3868 1.1 christos @c TBD @chapter Display Source Code 3869 1.1 christos @c ---------------------------------------------------------------------------- 3870 1.1 christos @c TBD The options are documented in the man page for this tool. In this section we 3871 1.1 christos @c TBD present usage examples. 3872 1.1 christos 3873 1.1 christos 3874 1.1 christos @c -- A new chapter ----------------------------------------------------------- 3875 1.1 christos @c TBD @node Archive Experiment Data 3876 1.1 christos @c TBD @chapter Archive Experiment Data 3877 1.1 christos @c ---------------------------------------------------------------------------- 3878 1.1 christos @c TBD The options are documented in the man page for this tool. In this section we 3879 1.1 christos @c TBD present usage examples. 3880 1.1 christos 3881 1.1 christos @c -- A new chapter ----------------------------------------------------------- 3882 1.1 christos @node Terminology 3883 1.1 christos @chapter Terminology 3884 1.1 christos @c ---------------------------------------------------------------------------- 3885 1.1 christos 3886 1.1 christos Throughout this manual, certain terminology specific to profiling tools, 3887 1.1 christos or @ToolName{}, or even to this document only, is used. In this chapter 3888 1.1 christos this terminology is explained in detail. 3889 1.1 christos 3890 1.1 christos @menu 3891 1.1 christos * The Program Counter:: What is a Program Counter? 3892 1.1 christos * Inclusive and Exclusive Metrics:: An explanation of inclusive and exclusive metrics. 3893 1.1 christos * Metric Definitions:: Definitions associated with metrics. 3894 1.1 christos * The Viewmode:: Select the way call stacks are presented. 3895 1.1 christos * The Selection List:: How to define a selection. 3896 1.1 christos * Load Objects and Functions:: The components in an application. 3897 1.1 christos * The Concept of a CPU in gprofng:: The definition of a CPU. 3898 1.1 christos * Hardware Event Counters Explained:: What are event counters? 3899 1.1 christos * apath:: Our generic definition of a path. 3900 1.1 christos @end menu 3901 1.1 christos 3902 1.1 christos @c ---------------------------------------------------------------------------- 3903 1.1 christos @node The Program Counter 3904 1.1 christos @section The Program Counter 3905 1.1 christos @c ---------------------------------------------------------------------------- 3906 1.1 christos 3907 1.1 christos @cindex PC 3908 1.1 christos @cindex Program Counter 3909 1.1.1.2 christos The @emph{Program Counter}, or PC for short, keeps track where program 3910 1.1.1.2 christos execution is. The address of the next instruction to be executed is stored 3911 1.1.1.2 christos in a special purpose register in the processor, or core. 3912 1.1 christos 3913 1.1 christos @cindex Instruction pointer 3914 1.1 christos The PC is sometimes also referred to as the @emph{instruction pointer}, but 3915 1.1 christos we will use Program Counter or PC throughout this document. 3916 1.1 christos 3917 1.1 christos @c ---------------------------------------------------------------------------- 3918 1.1 christos @node Inclusive and Exclusive Metrics 3919 1.1 christos @section Inclusive and Exclusive Metrics 3920 1.1 christos @c ---------------------------------------------------------------------------- 3921 1.1 christos 3922 1.1 christos In the remainder, these two concepts occur quite often and for lack of a better 3923 1.1 christos place, they are explained here. 3924 1.1 christos 3925 1.1 christos @cindex Inclusive metric 3926 1.1 christos The @emph{inclusive} value for a metric includes all values that are part of 3927 1.1 christos the dynamic extent of the target function. For example if function @code{A} 3928 1.1 christos calls functions @code{B} and @code{C}, the inclusive CPU time for @code{A} 3929 1.1 christos includes the CPU time spent in @code{B} and @code{C}. 3930 1.1 christos 3931 1.1 christos @cindex Exclusive metric 3932 1.1 christos In contrast with this, the @emph{exclusive} value for a metric is computed 3933 1.1.1.2 christos by excluding the metric values used by other functions called. In our 3934 1.1.1.2 christos imaginary example, the exclusive CPU time for function @code{A} is the 3935 1.1.1.2 christos time spent outside calling functions @code{B} and @code{C}. 3936 1.1 christos 3937 1.1 christos @cindex Leaf function 3938 1.1 christos In case of a @emph{leaf function}, the inclusive and exclusive values for the 3939 1.1 christos metric are the same since by definition, it is not calling any other 3940 1.1 christos function(s). 3941 1.1 christos 3942 1.1 christos Why do we use these two different values? The inclusive metric shows the most 3943 1.1 christos expensive path, in terms of this metric, in the application. For example, if 3944 1.1 christos the metric is cache misses, the function with the highest inclusive metric 3945 1.1 christos tells you where most of the cache misses come from. 3946 1.1 christos 3947 1.1 christos Within this branch of the application, the exclusive metric points to the 3948 1.1 christos functions that contribute and help to identify which part(s) to consider 3949 1.1 christos for further analysis. 3950 1.1 christos 3951 1.1 christos @c ---------------------------------------------------------------------------- 3952 1.1 christos @node Metric Definitions 3953 1.1 christos @section Metric Definitions 3954 1.1 christos @c ---------------------------------------------------------------------------- 3955 1.1 christos The metrics displayed in the various views are highly customizable. In this 3956 1.1 christos section it is explained how to construct the metrics definition(s). 3957 1.1 christos 3958 1.1 christos @IndexSubentry{Options, @code{-metrics}} 3959 1.1 christos @IndexSubentry{Commands, @code{metrics}} 3960 1.1 christos The @command{metrics} command takes a colon (@samp{:}) separated list, where 3961 1.1 christos each item in the list consists of the following three fields: 3962 1.1 christos @var{<flavor>}@var{<visibility>}@var{<metric-name>}. 3963 1.1 christos 3964 1.1 christos @cindex Flavor field 3965 1.1 christos @cindex Visibility field 3966 1.1 christos @cindex Metric name field 3967 1.1 christos @IndexSubentry{Metrics, Flavor field} 3968 1.1 christos @IndexSubentry{Metrics, Visibility field} 3969 1.1 christos @IndexSubentry{Metrics, Metric name field} 3970 1.1 christos The @var{<flavor>} field is either @samp{e} for ``exclusive'', and/or 3971 1.1 christos @samp{i} for ``inclusive''. The @var{<metric-name>} field is the name of 3972 1.1 christos the metric and the @var{<visibility>} field consists of one ore more characters 3973 1.1 christos from the following table: 3974 1.1 christos 3975 1.1 christos @table @code 3976 1.1 christos 3977 1.1 christos @item . 3978 1.1 christos Show the metric as time. This applies to timing metrics and hardware event 3979 1.1 christos counters that measure cycles. Interpret as @samp{+} for other metrics. 3980 1.1 christos 3981 1.1 christos @item % 3982 1.1 christos Show the metric as a percentage of the total value for this metric. 3983 1.1 christos 3984 1.1 christos @item + 3985 1.1 christos Show the metric as an absolute value. For hardware event counters this is 3986 1.1 christos the event count. Interpret as @samp{.} for timing metrics. 3987 1.1 christos 3988 1.1 christos @item ! 3989 1.1 christos Do not show any metric value. Cannot be used with other visibility characters. 3990 1.1 christos This visibility is meant to be used in a @command{dmetrics} command to set 3991 1.1 christos default metrics that override the built-in visibility defaults 3992 1.1 christos for each type of metric. 3993 1.1 christos 3994 1.1 christos @end table 3995 1.1 christos 3996 1.1 christos Both the @var{<flavor>} and @var{<visibility>} strings may have more than one 3997 1.1 christos character. If both strings have more than one character, the @var{<flavor>} 3998 1.1 christos string is expanded first. For example, @code{ie.%user} is first expanded to 3999 1.1 christos @code{i.%user:e.%user}, which is then expanded into 4000 1.1 christos @code{i.user:i%user:e.user:e%user}. 4001 1.1 christos 4002 1.1 christos @c ---------------------------------------------------------------------------- 4003 1.1 christos @node The Viewmode 4004 1.1 christos @section The Viewmode 4005 1.1 christos 4006 1.1 christos @cindex Viewmode 4007 1.1 christos @IndexSubentry{Options, @code{-viewmode}} 4008 1.1 christos @IndexSubentry{Commands, @code{viewmode}} 4009 1.1 christos 4010 1.1 christos There are different ways to view a call stack in Java. In @ToolName{}, this 4011 1.1 christos is called the @emph{viewmode} and the setting is controlled through a command 4012 1.1 christos with the same name. 4013 1.1 christos 4014 1.1 christos The @code{viewmode} command takes one of the following keywords: 4015 1.1 christos 4016 1.1 christos @table @code 4017 1.1 christos 4018 1.1 christos @item user 4019 1.1 christos This is the default and shows the Java call stacks for Java threads. 4020 1.1 christos No call stacks for any housekeeping threads are shown. The function 4021 1.1 christos list contains a function 4022 1.1 christos @IndexSubentry{Java profiling, @code{<JVM-System>}} 4023 1.1 christos @code{<JVM-System>} that represents the aggregated time from non-Java 4024 1.1 christos threads. 4025 1.1 christos When the JVM software does not report a Java call stack, time is reported 4026 1.1 christos against the function 4027 1.1 christos @IndexSubentry{Java profiling, @code{<no Java callstack recorded>}} 4028 1.1 christos @code{<no Java callstack recorded>}. 4029 1.1 christos 4030 1.1 christos @item expert 4031 1.1 christos Show the Java call stacks for Java threads when the Java code from the 4032 1.1 christos user is executed and machine call stacks when JVM code is executed, or 4033 1.1 christos when the JVM software does not report a Java call stack. 4034 1.1 christos Show the machine call stacks for housekeeping threads. 4035 1.1 christos 4036 1.1 christos @item machine 4037 1.1 christos Show the actual native call stacks for all threads. 4038 1.1 christos 4039 1.1 christos @end table 4040 1.1 christos 4041 1.1 christos @c ---------------------------------------------------------------------------- 4042 1.1 christos @node The Selection List 4043 1.1 christos @section The Selection List 4044 1.1 christos @c ---------------------------------------------------------------------------- 4045 1.1 christos 4046 1.1 christos @cindex Selection list 4047 1.1 christos @cindex List specification 4048 1.1 christos Several commands allow the user to specify a sequence of numbers called the 4049 1.1 christos @emph{selection list}. Such a list may for example be used to select specific 4050 1.1 christos threads from all the threads that have been used when conducting the 4051 1.1 christos experiment(s). 4052 1.1 christos 4053 1.1 christos A selection list (or ``list'' in the remainder of this section) can be a 4054 1.1 christos single number, a contiguous range of numbers with the start and end numbers 4055 1.1 christos separated by a hyphen (@samp{-}), a comma-separated list of numbers and 4056 1.1 christos ranges, or the @code{all} keyword that resets the filter. 4057 1.1 christos @IndexSubentry{Filters, Reset to default} 4058 1.1 christos Lists must not contain spaces. 4059 1.1 christos 4060 1.1 christos Each list can optionally be preceded by an experiment list with a similar 4061 1.1 christos format, separated from the list by a colon (:). 4062 1.1 christos If no experiment list is included, the list applies to all experiments. 4063 1.1 christos 4064 1.1 christos Multiple lists can be concatenated by separating the individual lists 4065 1.1 christos by a plus sign. 4066 1.1 christos 4067 1.1 christos These are some examples of various filters using a list: 4068 1.1 christos 4069 1.1 christos @table @code 4070 1.1 christos 4071 1.1 christos @item thread_select 1 4072 1.1 christos Select thread 1 from all experiments. 4073 1.1 christos 4074 1.1 christos @item thread_select all:1 4075 1.1 christos Select thread 1 from all experiments. 4076 1.1 christos 4077 1.1 christos @item thread_select 1:all 4078 1.1 christos Select all the threads from the first experiment loaded. 4079 1.1 christos 4080 1.1 christos @item thread_select 1:2+3:4 4081 1.1 christos Select thread 2 from experiment 1 and thread 4 from experiment 3. 4082 1.1 christos 4083 1.1 christos @item cpu_select all:1,3,5 4084 1.1 christos Selects cores 1, 3, and 5 from all experiments. 4085 1.1 christos 4086 1.1 christos @item cpu_select 1,2:all 4087 1.1 christos Select all cores from experiments 1 and 2. 4088 1.1 christos 4089 1.1 christos @end table 4090 1.1 christos 4091 1.1 christos Recall that there are several list commands that show the mapping between the 4092 1.1.1.2 christos numbers and the targets. 4093 1.1 christos 4094 1.1 christos @IndexSubentry{Options, @code{-experiment_list}} 4095 1.1 christos @IndexSubentry{Commands, @code{experiment_list}} 4096 1.1 christos For example, the @command{experiment_list} command shows the name(s) of the 4097 1.1.1.2 christos experiment(s) loaded and the associated number. In this example it is used 4098 1.1 christos to get this information for a range of experiments: 4099 1.1 christos 4100 1.1 christos @cartouche 4101 1.1 christos @smallexample 4102 1.1 christos $ gprofng display text -experiment_list mxv.?.thr.er 4103 1.1 christos @end smallexample 4104 1.1 christos @end cartouche 4105 1.1 christos 4106 1.1 christos @noindent 4107 1.1 christos This is the output, showing for each experiment the ID, the PID, and the name: 4108 1.1 christos 4109 1.1 christos @smallexample 4110 1.1 christos @verbatim 4111 1.1 christos ID Sel PID Experiment 4112 1.1 christos == === ======= ============ 4113 1.1 christos 1 yes 2750071 mxv.1.thr.er 4114 1.1 christos 2 yes 1339450 mxv.2.thr.er 4115 1.1 christos 3 yes 3579561 mxv.4.thr.er 4116 1.1 christos @end verbatim 4117 1.1 christos @end smallexample 4118 1.1 christos 4119 1.1 christos @c ---------------------------------------------------------------------------- 4120 1.1 christos @node Load Objects and Functions 4121 1.1 christos @section Load Objects and Functions 4122 1.1 christos @c ---------------------------------------------------------------------------- 4123 1.1 christos 4124 1.1 christos An application consists of various components. The source code files are 4125 1.1 christos compiled into object files. These are then glued together at link time to form 4126 1.1 christos the executable. 4127 1.1 christos During execution, the program may also dynamically load objects. 4128 1.1 christos 4129 1.1 christos @cindex Load object 4130 1.1.1.2 christos A @emph{load object} is defined to be an executable, or shared object. A 4131 1.1.1.2 christos shared library is an example of a load object in @ToolName{}. 4132 1.1 christos 4133 1.1.1.2 christos Each load object, contains a text section with the instructions generated by 4134 1.1.1.2 christos the compiler, a data section for data, and various symbol tables. 4135 1.1 christos All load objects must contain an 4136 1.1 christos @cindex ELF 4137 1.1 christos ELF 4138 1.1 christos symbol table, which gives the names and addresses of all the globally known 4139 1.1 christos functions in that object. 4140 1.1 christos 4141 1.1.1.2 christos Load objects compiled with the -g option contain additional symbolic 4142 1.1.1.2 christos information that can augment the ELF symbol table and provide information 4143 1.1.1.2 christos about functions that are not global, additional information about object 4144 1.1.1.2 christos modules from which the functions came, and line number information relating 4145 1.1.1.2 christos addresses to source lines. 4146 1.1 christos 4147 1.1 christos The term 4148 1.1 christos @cindex Function 4149 1.1 christos @emph{function} 4150 1.1 christos is used to describe a set of instructions that represent a high-level operation 4151 1.1.1.2 christos described in the source code. The term also covers methods as used in C++ 4152 1.1.1.2 christos and in the Java programming language. 4153 1.1 christos 4154 1.1 christos In the @ToolName{} context, functions are provided in source code format. 4155 1.1.1.2 christos Normally their names appear in the symbol table representing a set of 4156 1.1.1.2 christos addresses. 4157 1.1 christos @cindex Program Counter 4158 1.1 christos @cindex PC 4159 1.1.1.2 christos If the Program Counter (PC) is within that set, the program is executing within 4160 1.1.1.2 christos that function. 4161 1.1 christos 4162 1.1.1.2 christos In principle, any address within the text segment of a load object can be 4163 1.1.1.2 christos mapped to a function. Exactly the same mapping is used for the leaf PC and 4164 1.1.1.2 christos all the other PCs on the call stack. 4165 1.1.1.2 christos 4166 1.1.1.2 christos Most of the functions correspond directly to the source model of the program, 4167 1.1.1.2 christos but there are exceptions. This topic is however outside of the scope of this 4168 1.1.1.2 christos guide. 4169 1.1 christos 4170 1.1 christos @c ---------------------------------------------------------------------------- 4171 1.1 christos @node The Concept of a CPU in gprofng 4172 1.1 christos @section The Concept of a CPU in gprofng 4173 1.1 christos @c ---------------------------------------------------------------------------- 4174 1.1 christos 4175 1.1 christos @cindex CPU 4176 1.1 christos In @ProductName{}, there is the concept of a CPU. Admittedly, this is not the 4177 1.1 christos best word to describe what is meant here and may be replaced in the future. 4178 1.1 christos 4179 1.1 christos The word CPU is used in many of the displays. 4180 1.1 christos In the context of @ProductName{}, it is meant to denote a part of the 4181 1.1 christos processor that is capable of executing instructions and with its own state, 4182 1.1 christos like the program counter. 4183 1.1 christos 4184 1.1 christos For example, on a contemporary processor, a CPU could be a core. In case 4185 1.1 christos hardware threads are supported within a core, a CPU is one of those 4186 1.1 christos hardware threads. 4187 1.1 christos 4188 1.1 christos To see which CPUs have been used in the experiment, use the @command{cpu} 4189 1.1 christos command in @DisplayText{}. 4190 1.1 christos 4191 1.1 christos @c ---------------------------------------------------------------------------- 4192 1.1 christos @node Hardware Event Counters Explained 4193 1.1 christos @section Hardware Event Counters Explained 4194 1.1 christos @c ---------------------------------------------------------------------------- 4195 1.1 christos 4196 1.1 christos @IndexSubentry{Hardware event counters, description} 4197 1.1 christos For quite a number of years now, many microprocessors have supported hardware 4198 1.1 christos event counters. 4199 1.1 christos 4200 1.1 christos On the hardware side, this means that in the processor there are one or more 4201 1.1 christos registers dedicated to count certain activities, or ``events''. 4202 1.1.1.2 christos Examples of such events are the number of instructions executed, or the 4203 1.1.1.2 christos number of cache misses at level 2 in the memory hierarchy. 4204 1.1 christos 4205 1.1 christos While there is a limited set of such registers, the user can map events onto 4206 1.1 christos them. In case more than one register is available, this allows for the 4207 1.1 christos simultaenous measurement of various events. 4208 1.1 christos 4209 1.1 christos A simple, yet powerful, example is to simultaneously count the number of CPU 4210 1.1 christos cycles and the number of instructions excuted. These two numbers can then be 4211 1.1 christos used to compute the 4212 1.1 christos @cindex IPC 4213 1.1.1.2 christos @emph{IPC} value. IPC stands for ``Instructions Per Clockcycle'' and each 4214 1.1.1.2 christos processor has an architecturally defined maximum. For example, if this 4215 1.1.1.2 christos maximum number is 2, it means the processor is capable of executing two 4216 1.1.1.2 christos instructions every clock cycle. 4217 1.1 christos 4218 1.1 christos Whether this is actually achieved, depends on several factors, including the 4219 1.1 christos instruction characteristics. 4220 1.1 christos However, in case the IPC value is well below this maximum in a time critical 4221 1.1 christos part of the application and this cannot be easily explained, further 4222 1.1 christos investigation is probably warranted. 4223 1.1 christos 4224 1.1 christos @cindex CPI 4225 1.1 christos A related metric is called @emph{CPI}, or ``Clockcycles Per Instruction''. 4226 1.1 christos It is the inverse of the CPI and can be compared against the theoretical 4227 1.1 christos value(s) of the target instruction(s). A significant difference may point 4228 1.1 christos at a bottleneck. 4229 1.1 christos 4230 1.1 christos One thing to keep in mind is that the value returned by a counter can either 4231 1.1 christos be the number of times the event occured, or a CPU cycle count. In case of 4232 1.1 christos the latter it is possible to convert this number to time. 4233 1.1 christos 4234 1.1 christos @IndexSubentry{Hardware event counters, variable CPU frequency} 4235 1.1 christos This is often easier to interpret than a simple count, but there is one 4236 1.1 christos caveat to keep in mind. The CPU frequency may not have been constant while 4237 1.1 christos the experimen was recorded and this impacts the time reported. 4238 1.1 christos 4239 1.1 christos These event counters, or ``counters'' for short, provide great insight into 4240 1.1 christos what happens deep inside the processor. In case higher level information does 4241 1.1 christos not provide the insight needed, the counters provide the information to get 4242 1.1 christos to the bottom of a performance problem. 4243 1.1 christos 4244 1.1 christos There are some things to consider though. 4245 1.1 christos 4246 1.1 christos @itemize @bullet 4247 1.1 christos 4248 1.1 christos @item 4249 1.1 christos The event definitions and names vary across processors and it may even happen 4250 1.1 christos that some events change with an update. 4251 1.1 christos Unfortunately and this is luckily rare, there are sometimes bugs causing the 4252 1.1 christos wrong count to be returned. 4253 1.1 christos 4254 1.1 christos @IndexSubentry{Hardware event counters, alias name} 4255 1.1 christos In @ToolName{}, some of the processor specific event names have an alias 4256 1.1 christos name. For example @code{insts} measures the instructions executed. 4257 1.1 christos These aliases not only makes it easier to identify the functionality, but also 4258 1.1 christos provide portability of certain events across processors. 4259 1.1 christos 4260 1.1 christos @item 4261 1.1 christos Another complexity is that there are typically many events one can monitor. 4262 1.1 christos There may up to hundreds of events available and it could require several 4263 1.1 christos experiments to zoom in on the root cause of a performance problem. 4264 1.1 christos 4265 1.1 christos @item 4266 1.1 christos There may be restrictions regarding the mapping of event(s) onto the 4267 1.1 christos counters. For example, certain events may be restricted to specific 4268 1.1 christos counters only. As a result, one may have to conduct additional experiments 4269 1.1 christos to cover all the events of interest. 4270 1.1 christos 4271 1.1 christos @item 4272 1.1 christos The names of the events may also not be easy to interpret. In such cases, 4273 1.1 christos the description can be found in the architecture manual for the processor. 4274 1.1 christos 4275 1.1 christos @end itemize 4276 1.1 christos 4277 1.1 christos Despite these drawbacks, hardware event counters are extremely useful and 4278 1.1 christos may even turn out to be indispensable. 4279 1.1 christos 4280 1.1 christos @c ---------------------------------------------------------------------------- 4281 1.1 christos @node apath 4282 1.1 christos @section What is <apath>? 4283 1.1 christos @c ---------------------------------------------------------------------------- 4284 1.1 christos 4285 1.1 christos In most cases, @ToolName{} shows the absolute pathnames of directories. These 4286 1.1 christos tend to be rather long, causing display issues in this document. 4287 1.1 christos 4288 1.1 christos Instead of wrapping these long pathnames over multiple lines, we decided to 4289 1.1 christos represent them by the @code{<apath>} symbol, which stands for ``an absolute 4290 1.1 christos pathname''. 4291 1.1 christos 4292 1.1 christos Note that different occurrences of @code{<apath>} may represent different 4293 1.1 christos absolute pathnames. 4294 1.1 christos 4295 1.1 christos @c -- A new node -------------------------------------------------------------- 4296 1.1 christos @node Other Document Formats 4297 1.1 christos @chapter Other Document Formats 4298 1.1 christos @c ---------------------------------------------------------------------------- 4299 1.1 christos 4300 1.1 christos @emph{This chapter is applicable when building gprofng from the 4301 1.1 christos binutils source.} 4302 1.1 christos 4303 1.1 christos This document is written in Texinfo and the source text is made available as 4304 1.1 christos part of the binutils distribution. The file name is @code{gprofng.texi} and 4305 1.1 christos can be found in subdirectory @code{gprofng/doc} of the top level binutils 4306 1.1 christos directory. 4307 1.1 christos 4308 1.1 christos The default installation procedure creates a file in the @code{info} format and 4309 1.1 christos stores it in the documentation section of binutils. 4310 1.1 christos This source file can however also be used to generate the document in the 4311 1.1 christos @code{html} and @code{pdf} formats. These may be easier to read and search. 4312 1.1 christos 4313 1.1 christos To generate this documentation file in a different format, go to the directory 4314 1.1 christos that was used to build the tools. The make file to build the other formats is 4315 1.1 christos in the @code{gprofng/doc} subdirectory. 4316 1.1 christos 4317 1.1 christos For example, if you have set the build directory to be @var{<my-build-dir>}, 4318 1.1 christos go to subdirectory @var{<my-build-dir>/gprofng/doc}. 4319 1.1 christos 4320 1.1.1.2 christos This subdirectory has a single file called @file{Makefile} that can be used to 4321 1.1.1.2 christos build the documentation in various formats. We recommend to use these 4322 1.1.1.2 christos commands. 4323 1.1 christos 4324 1.1 christos There are four commands to generate the documentation in the @code{html} or 4325 1.1 christos @code{pdf} format. It is assumed that you are in directory @code{gprofng/doc} 4326 1.1 christos under the main directory @var{<my-build-dir>}. 4327 1.1 christos 4328 1.1 christos @table @code 4329 1.1 christos 4330 1.1 christos @item make html 4331 1.1 christos Create the html file in the current directory. 4332 1.1 christos 4333 1.1 christos @item make pdf 4334 1.1 christos Create the pdf file in the current directory. 4335 1.1 christos 4336 1.1 christos @item make install-html 4337 1.1 christos Create and install the html file in the binutils documentation directory. 4338 1.1 christos 4339 1.1 christos @item make install-pdf 4340 1.1.1.2 christos Create and install the pdf file in the binutils documentation directory. 4341 1.1 christos 4342 1.1 christos @end table 4343 1.1 christos 4344 1.1.1.2 christos For example, to install this document in the binutils documentation directory, 4345 1.1.1.2 christos the commands below may be executed. In this notation, @var{<format>} 4346 1.1 christos is one of @code{html}, or @code{pdf}: 4347 1.1 christos 4348 1.1 christos @smallexample 4349 1.1 christos @verbatim 4350 1.1 christos $ cd <my-build-dir>/gprofng/doc 4351 1.1 christos $ make install-<format> 4352 1.1 christos @end verbatim 4353 1.1 christos @end smallexample 4354 1.1 christos 4355 1.1.1.2 christos The binutils installation directory is either the default @code{/usr/local} or 4356 1.1.1.2 christos the one that has been set with the @code{--prefix} option as part of the 4357 1.1.1.2 christos @code{configure} command. In this example we symbolize this location with 4358 1.1.1.2 christos @code{<install>}. 4359 1.1 christos 4360 1.1 christos The documentation directory is @code{<install>/share/doc/gprofng} in case 4361 1.1 christos @code{html} or @code{pdf} is selected and @code{<install>/share/info} for the 4362 1.1 christos file in the @code{info} format. 4363 1.1 christos 4364 1.1 christos @noindent 4365 1.1 christos Some things to note: 4366 1.1 christos 4367 1.1 christos @itemize 4368 1.1 christos 4369 1.1 christos @item 4370 1.1 christos For the @code{pdf} file to be generated, the @code{texi2dvi} tool is required. 4371 1.1 christos @cindex texi2dvi 4372 1.1 christos It is for example available as part of the @code{texinfo-tex} package. 4373 1.1 christos 4374 1.1 christos @item 4375 1.1 christos Instead of generating a single file in the @code{html} format, it is also 4376 1.1 christos possible to create a directory with individual files for the various chapters. 4377 1.1 christos To do so, remove the use of @code{--no-split} in variable @code{MAKEINFOHTML} 4378 1.1 christos in the make file in the @code{<my-build-dir/gprofng/doc} directory. 4379 1.1 christos 4380 1.1 christos @end itemize 4381 1.1 christos 4382 1.1 christos @c -- An appendix ------------------------------------------------------------- 4383 1.1 christos @node The gprofng Man Pages 4384 1.1 christos @appendix The gprofng Man Pages 4385 1.1 christos @c ---------------------------------------------------------------------------- 4386 1.1 christos 4387 1.1 christos In this appendix the man pages for the various @ProductName{} tools are listed. 4388 1.1 christos 4389 1.1 christos @c -- A new node -------------------------------------------------------------- 4390 1.1 christos @c @node gprofng driver 4391 1.1 christos @node Man page for @command{gprofng} 4392 1.1 christos @section Man page for @command{gprofng} 4393 1.1 christos @c ---------------------------------------------------------------------------- 4394 1.1 christos 4395 1.1 christos @include gprofng.texi 4396 1.1 christos 4397 1.1 christos @c -- A new node -------------------------------------------------------------- 4398 1.1 christos @page 4399 1.1 christos @node Man page for @command{gprofng collect app} 4400 1.1 christos @section Man page for @command{gprofng collect app} 4401 1.1 christos @c ---------------------------------------------------------------------------- 4402 1.1 christos 4403 1.1.1.2 christos @include gprofng-collect-app.texi 4404 1.1 christos 4405 1.1 christos @c -- A new node -------------------------------------------------------------- 4406 1.1 christos @page 4407 1.1 christos @node Man page for @command{gprofng display text} 4408 1.1 christos @section Man page for @command{gprofng display text} 4409 1.1 christos @c ---------------------------------------------------------------------------- 4410 1.1 christos 4411 1.1.1.2 christos @include gprofng-display-text.texi 4412 1.1 christos 4413 1.1 christos @c -- A new node -------------------------------------------------------------- 4414 1.1 christos @page 4415 1.1 christos @node Man page for @command{gprofng display html} 4416 1.1 christos @section Man page for @command{gprofng display html} 4417 1.1 christos @c ---------------------------------------------------------------------------- 4418 1.1 christos 4419 1.1.1.2 christos @include gprofng-display-html.texi 4420 1.1 christos 4421 1.1 christos @c -- A new node -------------------------------------------------------------- 4422 1.1 christos @page 4423 1.1 christos @node Man page for @command{gprofng display src} 4424 1.1 christos @section Man page for @command{gprofng display src} 4425 1.1 christos @c ---------------------------------------------------------------------------- 4426 1.1 christos 4427 1.1.1.2 christos @include gprofng-display-src.texi 4428 1.1 christos 4429 1.1 christos @c -- A new node -------------------------------------------------------------- 4430 1.1 christos @page 4431 1.1 christos @node Man page for @command{gprofng archive} 4432 1.1 christos @section Man page for @command{gprofng archive} 4433 1.1 christos @c ---------------------------------------------------------------------------- 4434 1.1 christos 4435 1.1.1.2 christos @include gprofng-archive.texi 4436 1.1 christos 4437 1.1 christos @ifnothtml 4438 1.1 christos @node Index 4439 1.1 christos @unnumbered Index 4440 1.1 christos @printindex cp 4441 1.1 christos @end ifnothtml 4442 1.1 christos 4443 1.1 christos @bye 4444