Perfetto Tracing
================

Mesa has experimental support for `Perfetto <https://perfetto.dev>`__ for
GPU performance monitoring.  Perfetto supports multiple
`producers <https://perfetto.dev/docs/concepts/service-model>`__, each with
one or more data-sources.  Perfetto already provides various producers and
data-sources for things like:

- CPU scheduling events (``linux.ftrace``)
- CPU frequency scaling (``linux.ftrace``)
- System calls (``linux.ftrace``)
- Process memory utilization (``linux.process_stats``)

It also provides various domain-specific producers.

Mesa's Perfetto support adds further producers so that GPU performance
(frequency, utilization, performance counters, etc.) can be visualized on the
same timeline, making it easier to understand, tune, and debug system-level
performance:

- pps-producer: A system-wide daemon that can collect global performance
  counters.
- mesa: Per-process producer within Mesa to capture render-stage traces
  on the GPU timeline, track events, etc.

The exact supported features vary per driver:

.. list-table:: Supported data-sources
   :header-rows: 1

   * - Driver
     - PPS Counters
     - Render Stages
   * - Freedreno
     - ``gpu.counters.msm``
     - ``gpu.renderstages.msm``
   * - Turnip
     - ``gpu.counters.msm``
     -
   * - Intel
     - ``gpu.counters.i915``
     -
   * - Panfrost
     - ``gpu.counters.panfrost``
     -

Run
---

To capture a trace with Perfetto, take the following steps:

1. Build Perfetto from the sources available at ``subprojects/perfetto``
   following
   `this guide <https://perfetto.dev/docs/quickstart/linux-tracing>`__.

2. Create a `trace config <https://perfetto.dev/#/trace-config.md>`__, which is
   a text file in protobuf text format (it resembles JSON) with extension
   ``.cfg``, or use one of the config files under the ``src/tool/pps/cfg``
   directory. More examples of config files can be found in
   ``subprojects/perfetto/test/configs``.

3. Change directory to ``subprojects/perfetto`` and run a
   `convenience script <https://perfetto.dev/#/running.md>`__ to start the
   tracing service:

   .. code-block:: console

      cd subprojects/perfetto
      CONFIG=<path/to/gpu.cfg> OUT=out/linux_clang_release ./tools/tmux -n

4. Start other producers you may need, e.g. ``pps-producer``.

5. Start ``perfetto`` under the tmux session initiated in step 3.

6. Once tracing has finished, you can detach from tmux with :kbd:`Ctrl+b`,
   :kbd:`d`, and the convenience script should automatically copy the trace
   files into ``$HOME/Downloads``.

7. Go to `ui.perfetto.dev <https://ui.perfetto.dev>`__ and upload
   ``$HOME/Downloads/trace.protobuf`` by clicking on **Open trace file**.

8. Alternatively, you can open the trace in `AGI <https://gpuinspector.dev/>`__
   (which, despite the name, can be used to view non-Android traces).
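
As a rough illustration of steps 2 and 5, the sketch below writes a small
trace config and passes it to ``perfetto``. It is not taken from the Mesa
sources: the file name ``example-gpu.cfg``, the buffer size, the sampling
period, the duration, and the binary location are all assumptions chosen for
the example, and the data-source name should be replaced with the one that
matches your driver. It also assumes the tracing service and any producers
are already running.

.. code-block:: sh

   # Write a minimal trace config in protobuf text format.
   # All numeric values here are illustrative, not recommendations.
   cat > example-gpu.cfg <<'EOF'
   buffers {
     size_kb: 16384
     fill_policy: RING_BUFFER
   }

   data_sources {
     config {
       name: "gpu.counters.i915"
       gpu_counter_config {
         counter_period_ns: 100000
       }
     }
   }

   duration_ms: 10000
   write_into_file: true
   file_write_period_ms: 500
   EOF

   # Assumed location of the perfetto binary built in step 1; override
   # PERFETTO if your build output lives elsewhere.
   PERFETTO="${PERFETTO:-./out/linux_clang_release/perfetto}"

   if [ -x "$PERFETTO" ]; then
       # --txt: the config is text rather than binary protobuf
       # -c: config file to use, -o: file the trace is written to
       "$PERFETTO" --txt -c example-gpu.cfg -o trace.protobuf
   else
       echo "perfetto binary not found at $PERFETTO" >&2
   fi

Writing the config from a heredoc keeps the example self-contained; in
practice you would more likely edit one of the files under
``src/tool/pps/cfg``.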

Driver Specifics
~~~~~~~~~~~~~~~~

Below is driver-specific information and instructions for the PPS producer.

Freedreno / Turnip
^^^^^^^^^^^^^^^^^^

The Freedreno PPS driver needs root access to read system-wide
performance counters, so run it with sudo:

.. code-block:: console

   sudo ./build/src/tool/pps/pps-producer

Intel
^^^^^

The Intel PPS driver needs root access to read system-wide
`RenderBasic <https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/gpu-metrics-reference.html>`__
performance counters, so run it with sudo:

.. code-block:: console

   sudo ./build/src/tool/pps/pps-producer

Another option, which enables access to system-wide data without root
permissions, is to run the following:

.. code-block:: console

   sudo sysctl dev.i915.perf_stream_paranoid=0

Alternatively, granting the ``CAP_PERFMON`` capability to the binary should
work too.
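
The capability-based option could look roughly like the sketch below. It
assumes a kernel new enough to know ``CAP_PERFMON`` (Linux 5.8 or later), the
``setcap`` and ``getcap`` tools from libcap, and the same build path used
above; it has not been verified against every driver and kernel combination.

.. code-block:: sh

   # Grant only the performance-monitoring capability to the producer binary.
   sudo setcap cap_perfmon+ep ./build/src/tool/pps/pps-producer

   # Print the capabilities now attached to the file.
   getcap ./build/src/tool/pps/pps-producer

   # The producer can then be started as a regular user.
   ./build/src/tool/pps/pps-producer

File capabilities are stored on the binary itself, so they need to be applied
again after every rebuild.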

Panfrost
^^^^^^^^

The Panfrost PPS driver uses unstable ioctls that behave correctly on
kernel versions `5.4.23+ <https://lwn.net/Articles/813601/>`__ and
`5.5.7+ <https://lwn.net/Articles/813600/>`__.

To run the producer, follow these two steps:

1. Enable the Panfrost unstable ioctls via the module parameter:

   .. code-block:: console

      modprobe panfrost unstable_ioctls=1

   Alternatively, you could add ``panfrost.unstable_ioctls=1`` to your kernel
   command line, or run
   ``echo 1 > /sys/module/panfrost/parameters/unstable_ioctls`` as root.

2. Run the producer:

   .. code-block:: console

      ./build/pps-producer
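
When working from a non-root shell, the sysfs variant of step 1 could be
sketched as follows. The ``/etc/modprobe.d/panfrost.conf`` file name is an
arbitrary choice for this example, and the last command is only needed if the
setting should persist across reboots.

.. code-block:: sh

   # Show the current value of the parameter (Y or N for a boolean).
   cat /sys/module/panfrost/parameters/unstable_ioctls

   # "sudo echo 1 > ..." would fail because the redirection is performed by
   # the calling shell; piping through "sudo tee" does the write as root.
   echo 1 | sudo tee /sys/module/panfrost/parameters/unstable_ioctls

   # Optional: apply the parameter whenever the module is loaded.
   echo "options panfrost unstable_ioctls=1" | sudo tee /etc/modprobe.d/panfrost.conf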

Troubleshooting
---------------

Tmux
~~~~

If the convenience script ``tools/tmux`` keeps copying artifacts to your
``SSH_TARGET`` without starting the tmux session, make sure you have ``tmux``
installed on your system.

.. code-block:: console

   apt install tmux

Missing counter names
~~~~~~~~~~~~~~~~~~~~~

If the trace viewer shows a list of counters with a description like
``gpu_counter(#)`` instead of their proper names, data may have been lost
because the trace buffer filled up and wrapped.

To prevent this loss of data, you can tweak the trace config file in one of
the following ways:

- Increase the size of the buffer in use:

  .. code-block:: javascript

      buffers {
          size_kb: 2048,
          fill_policy: RING_BUFFER,
      }

- Periodically flush the trace buffer into the output file:

  .. code-block:: javascript

      write_into_file: true
      file_write_period_ms: 250

- Discard new traces when the buffer fills:

  .. code-block:: javascript

      buffers {
          size_kb: 2048,
          fill_policy: DISCARD,
      }