History log of /src/sys/dev/tprof/tprof.c
Revision  Date  Author  Comments
 1.23  11-Apr-2023  msaitoh KNF. No functional change.
 1.22  16-Dec-2022  ryo tprof_lock is not a spin mutex. use mutex_{enter,exit}(). oops
 1.21  16-Dec-2022  ryo branches: 1.21.2;
- Add support for select(2)/poll(2) on /dev/tprof.
- Changed the sampling buffer switching interval (which is the period of tprof_worker()
calls and also the maximum blocking time of read(2) on /dev/tprof) from 1 sec to 125 ms.
This improves `tprof top` responsiveness.
- The maximum number of sampling buffers is now adjusted according to the number of CPUs.
Previously it was fixed at 100 and was insufficient if ncpu was greater than this.

The maximum number of samples per second per CPU is calculated by
"TPROF_MAX_SAMPLES_PER_BUF * (HZ of tprof_worker)".
Therefore, the current maximum is 10000 * (1000/125) = 80000 samples per second per CPU.
The actual value will vary slightly from this due to tprof_worker and read(2) timing.
This value may need further adjustment in the future.
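The bound above is simple arithmetic. The following is a minimal C sketch of it, assuming the worker period is given in milliseconds; the helper name tprof_max_samples_per_sec is hypothetical and not part of tprof.

```c
#include <stdint.h>

/*
 * Upper bound on samples per second per CPU, following the formula
 * "TPROF_MAX_SAMPLES_PER_BUF * (HZ of tprof_worker)".
 * worker_period_ms is the buffer switching interval (125 ms as of 1.21).
 * Integer division: periods that do not divide 1000 are truncated.
 */
static uint64_t
tprof_max_samples_per_sec(uint64_t max_samples_per_buf,
    uint64_t worker_period_ms)
{
	uint64_t worker_hz = 1000 / worker_period_ms;	/* 1000/125 = 8 */

	return max_samples_per_buf * worker_hz;
}
```

With the values from the log, `tprof_max_samples_per_sec(10000, 125)` yields 80000; with the previous 1 sec interval it would be 10000.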
 1.20  11-Dec-2022  chs make sure error is initialized before we return it.
 1.19  01-Dec-2022  ryo Improve tprof(4)

- Multiple events can now be handled simultaneously.
- Counters should be configured with TPROF_IOC_CONFIGURE_EVENT in advance,
instead of being configured at TPROF_IOC_START.
- The configured counters can be started and stopped repeatedly by
TPROF_IOC_START/TPROF_IOC_STOP.
- The value of the performance counter can be obtained at any time as a 64-bit
value with TPROF_IOC_GETCOUNTS.
- Backend common parts are handled in tprof.c as much as possible, and functions
on the tprof_backend side have been reimplemented to be more primitive.
- The reset value of counter overflows for profiling can now be adjusted.
It is calculated by default from the CPU clock (speed of the cycle counter) and
TPROF_HZ, but for some events this default may be too large to yield enough
samples for profiling. When configuring the event counter, the value can be
specified as a ratio to the default or as an absolute value.
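A minimal sketch of how such a period might be chosen, under stated assumptions: the default is taken to be the cycle-counter frequency divided by TPROF_HZ (the log only says it is "calculated from" those two values), and struct period_spec, default_period() and effective_period() are hypothetical names, not the tprof(4) interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how the user chose the counter period. */
struct period_spec {
	bool absolute;	/* true: value is an absolute event count */
	double value;	/* ratio to the default, or absolute count */
};

/* Assumed default: cycle-counter ticks per TPROF_HZ tick (tprof_hz > 0). */
static uint64_t
default_period(uint64_t cpu_hz, uint64_t tprof_hz)
{
	return cpu_hz / tprof_hz;
}

static uint64_t
effective_period(uint64_t cpu_hz, uint64_t tprof_hz,
    const struct period_spec *spec)
{
	uint64_t p;

	if (spec == NULL)		/* nothing specified: use the default */
		return default_period(cpu_hz, tprof_hz);
	if (spec->absolute)
		p = (uint64_t)spec->value;
	else
		p = (uint64_t)(default_period(cpu_hz, tprof_hz) * spec->value);
	return p == 0 ? 1 : p;		/* never program a zero period */
}
```

For a 2 GHz cycle counter and an assumed TPROF_HZ of 10000, the default is 200000 events per overflow; a ratio of 0.5 halves it, sampling twice as often for a rarely occurring event.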
- Due to overall changes, API and ABI have been changed. TPROF_VERSION and
TPROF_BACKEND_VERSION were updated.
 1.18  01-Dec-2022  ryo don't call kpreempt_{disable,enable}() from an interrupt handler.

Fixed a problem in which the system would freeze if a high load (e.g., build.sh -j20)
was applied while running `tprof monitor -e LsNotHaltedCyc ...' on x86.

This almost eliminates the problem, but is still not enough. tprof_x86 uses NMI
interrupts, which are delivered even at splhigh(), so an NMI can still arrive
inside the splhigh section of percpu_cpu_swap().
 1.17  28-Mar-2022  riastradh driver(9): devsw_detach never fails. Make it return void.

Prune a whole lotta dead branches as a result of this. (Some logic
calling this is also wrong for other reasons; devsw_detach is final
-- you should never have any reason to decide to roll it back. To be
cleaned up in subsequent commits...)

XXX kernel ABI change to devsw_detach signature requires bump
 1.16  01-Nov-2021  skrll Trailing whitespace
 1.15  27-Nov-2020  riastradh tprof: Use percpu rather than a MAXCPUS-element array.
 1.14  13-Jul-2018  maxv branches: 1.14.6; 1.14.14;
Revamp tprof.

Rewrite the Intel backend to use the generic PMC interface, which is
available on all Intel CPUs. Synchronize the AMD backend with the new
interface.

The kernel identifies the PMC interface, and gives its id to userland.
Userland then queries the events itself (via cpuid etc). These events
depend on the PMC interface.

The tprof utility is rewritten to allow the user to choose which event
to count (this was not possible until now; the event was hardcoded in
the backend). The command line format is based on usr.bin/pmc, e.g.:

tprof -e llc-misses:k -o output sleep 20
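The `name:option` argument above can be parsed roughly as follows. This is a hypothetical sketch, not the tprof(8) parser: it assumes `k` selects kernel mode, `u` selects user mode, and that omitting the option counts both.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of an "event[:option]" argument. */
struct event_arg {
	char name[64];
	bool user;	/* count in user mode */
	bool kernel;	/* count in kernel mode */
};

static bool
parse_event_arg(const char *arg, struct event_arg *ev)
{
	const char *colon = strchr(arg, ':');
	size_t len = colon ? (size_t)(colon - arg) : strlen(arg);

	if (len == 0 || len >= sizeof(ev->name))
		return false;
	memcpy(ev->name, arg, len);
	ev->name[len] = '\0';

	if (colon == NULL) {		/* no option: assume both modes */
		ev->user = ev->kernel = true;
		return true;
	}
	ev->user = ev->kernel = false;
	for (const char *p = colon + 1; *p != '\0'; p++) {
		if (*p == 'u')
			ev->user = true;
		else if (*p == 'k')
			ev->kernel = true;
		else
			return false;	/* unknown option letter */
	}
	return ev->user || ev->kernel;	/* reject an empty option */
}
```

Parsing `llc-misses:k` yields the name `llc-misses` with only kernel-mode counting enabled.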

The man page is updated too, but the arguments will likely change soon
anyway so it doesn't matter a lot.

The tprof utility has three tables:

Intel Architectural Version 1
Intel Skylake/Kabylake
AMD Family 10h

A CPU can support a combination of tables. For example Kabylake has
Intel-Architectural-Version-1 and its own Intel-Kabylake table.

For now the Intel Skylake/Kabylake table contains only one event, just
to demonstrate that the combination of tables works. Tested on an
Intel Core i5 Kabylake.

The code for AMD Family 10h is taken from the code I had written for
usr.bin/pmc. I haven't tested it yet, but it's the same as pmc(1), so
I guess it works as-is.

The whole thing is written in such a way that (I think) it is not
complicated to add more CPU models, and more architectures (other than
x86).
 1.13  20-Aug-2015  christos branches: 1.13.8; 1.13.16; 1.13.18;
include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.12  25-Jul-2014  dholland branches: 1.12.4;
Add d_discard to all struct cdevsw instances I could find.

All have been set to "nodiscard"; some should get a real implementation.
 1.11  16-Mar-2014  dholland branches: 1.11.2;
Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
 1.10  14-Apr-2011  yamt branches: 1.10.4; 1.10.14; 1.10.18;
for each sample, record and report cpuid and lwpid.
 1.9  25-Feb-2011  yamt tprof_start: don't forget to restore refcount when the backend fails to start.
 1.8  05-Feb-2011  yamt tprof: record pid and userland events.
 1.7  11-Aug-2010  pgoyette branches: 1.7.2; 1.7.4;
Keep condvar wmesg within 8-char limit
 1.6  13-Mar-2009  yamt branches: 1.6.2; 1.6.4;
tprof_stop1: add an assertion.
 1.5  11-Mar-2009  yamt fix breakage where db_regs_t != trapframe.
the problem pointed out by Martin Husemann on tech-kern@.
 1.4  10-Mar-2009  yamt - adapt to MODULAR.
- some preparations to have more backends.
- add some comments.
 1.3  20-Jan-2009  yamt branches: 1.3.2;
comment
 1.2  07-May-2008  yamt branches: 1.2.8;
tprof_start: fix workqueue's IPL.
 1.1  01-Jan-2008  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.14; 1.1.16; 1.1.18;
a simple performance-monitor-based profiler, inspired by Linux OProfile.
 1.1.18.3  09-Oct-2010  yamt sync with head
 1.1.18.2  04-May-2009  yamt sync with head.
 1.1.18.1  16-May-2008  yamt sync with head.
 1.1.16.1  18-May-2008  yamt sync with head.
 1.1.14.1  02-Jun-2008  mjf Sync with HEAD.
 1.1.8.2  18-Feb-2008  mjf Sync with HEAD.
 1.1.8.1  01-Jan-2008  mjf file tprof.c was added on branch mjf-devfs on 2008-02-18 21:06:25 +0000
 1.1.6.2  21-Jan-2008  yamt sync with head
 1.1.6.1  01-Jan-2008  yamt file tprof.c was added on branch yamt-lazymbuf on 2008-01-21 09:44:40 +0000
 1.1.4.2  09-Jan-2008  matt sync with HEAD
 1.1.4.1  01-Jan-2008  matt file tprof.c was added on branch matt-armv6 on 2008-01-09 01:54:36 +0000
 1.1.2.2  02-Jan-2008  bouyer Sync with HEAD
 1.1.2.1  01-Jan-2008  bouyer file tprof.c was added on branch bouyer-xeni386 on 2008-01-02 21:55:17 +0000
 1.2.8.2  28-Apr-2009  skrll Sync with HEAD.
 1.2.8.1  03-Mar-2009  skrll Sync with HEAD.
 1.3.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.6.4.2  21-Apr-2011  rmind sync with head
 1.6.4.1  05-Mar-2011  rmind sync with head
 1.6.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.7.4.2  05-Mar-2011  bouyer Sync with HEAD
 1.7.4.1  08-Feb-2011  bouyer Sync with HEAD
 1.7.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.10.18.1  18-May-2014  rmind sync with head
 1.10.14.2  03-Dec-2017  jdolecek update from HEAD
 1.10.14.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.4.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.11.2.1  10-Aug-2014  tls Rebase.
 1.12.4.1  22-Sep-2015  skrll Sync with HEAD
 1.13.18.1  10-Jun-2019  christos Sync with HEAD
 1.13.16.1  28-Jul-2018  pgoyette Sync with HEAD
 1.13.8.2  29-Apr-2017  pgoyette Revise previous. Rather than explicitly including <sys/localcount.h>
in all the places where {b,c}devsw is initialized, just include it
from <sys/conf.h>. This avoids an include-sequence dependency.
 1.13.8.1  29-Apr-2017  pgoyette Add DEVSW_MODULE_INIT to existing device-driver modules, so that they
will have a localcount defined and thus be permitted to load. Without
a localcount, loading the module will return EINVAL.

XXX the dtrace and drm stuff might need to be fed back upstream?
 1.14.14.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.14.6.1  01-Aug-2023  martin Pull up the following revisions, requested by msaitoh in ticket #1697:

usr.sbin/tprof/tprof.8 1.16,1.22,1.25,1.29 via patch
usr.sbin/tprof/tprof_analyze.c 1.4
usr.sbin/tprof/arch/tprof_x86.c 1.13-1.19
sys/dev/tprof/tprof.c 1.23 via patch
sys/dev/tprof/tprof_x86_amd.c 1.7-1.8 via patch
sys/dev/tprof/tprof_x86_intel.c 1.8 via patch

- Add AMD family 19h (zen3 and zen4) support.
- Add Intel Comet Lake support.
- Add support for Intel Skylake-X and Cascade Lake.
- Print the path that we failed to open on error.
- Use lowercase consistently for hexadecimal numbers.
- KNF
 1.21.2.2  21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #210):

usr.sbin/tprof/tprof.8: revision 1.30
sys/dev/tprof/tprof_x86_amd.c: revision 1.8
sys/dev/tprof/tprof_armv8.c: revision 1.20
sys/dev/tprof/tprof_types.h: revision 1.7
sys/dev/tprof/tprof_x86_intel.c: revision 1.6
sys/dev/tprof/tprof_x86_intel.c: revision 1.7
sys/dev/tprof/tprof_x86_intel.c: revision 1.8
sys/dev/tprof/tprof.c: revision 1.23
usr.sbin/tprof/tprof.8: revision 1.25
usr.sbin/tprof/tprof.8: revision 1.26
usr.sbin/tprof/arch/tprof_x86.c: revision 1.16
usr.sbin/tprof/tprof.8: revision 1.27
usr.sbin/tprof/arch/tprof_x86.c: revision 1.17
usr.sbin/tprof/tprof.8: revision 1.28
usr.sbin/tprof/tprof.h: revision 1.5
usr.sbin/tprof/tprof.8: revision 1.29
sys/dev/tprof/tprof_armv7.c: revision 1.13
usr.sbin/tprof/tprof_top.c: revision 1.9
usr.sbin/tprof/tprof.c: revision 1.21

Add Cometlake support.

Obtain the number of general counters from CPUID 0xa.

Test cpuid_level in tprof_intel_ncounters().
This function is called before tprof_intel_ident().
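The two changes above can be sketched as follows. Per the Intel SDM, CPUID leaf 0xA ("Architectural Performance Monitoring") reports the number of general-purpose counters per logical processor in EAX bits 15:8, and that leaf is only valid when the maximum basic leaf (CPUID.0:EAX, which NetBSD keeps as cpuid_level) is at least 0xA. The helper intel_ncounters() is hypothetical, not the actual tprof_intel_ncounters(), and it decodes raw register values so it runs on any host.

```c
#include <stdint.h>

/*
 * Hypothetical decoder: number of general-purpose PMCs from CPUID.0AH:EAX.
 * Returns 0 when leaf 0xA is beyond the CPU's maximum basic leaf,
 * since its contents are then undefined.
 */
static unsigned
intel_ncounters(uint32_t cpuid_level, uint32_t leaf_0a_eax)
{
	if (cpuid_level < 0xa)
		return 0;			/* leaf 0xA not available */
	return (leaf_0a_eax >> 8) & 0xff;	/* EAX[15:8] */
}
```

On x86 with GCC or Clang, the raw EAX value can be obtained with `__get_cpuid(0xa, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. For example, an EAX of 0x07300805 decodes to 8 counters (version 5, 48-bit width).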

KNF. No functional change.

Add two notes to the tprof(8) manual page.
- "list" command prints the maximum number of counters that can be used
simultaneously.
- multiple -e arguments can be specified.

Use the default counter if -e argument is not specified.
monitor command:
The default counter is selected if -e argument is not specified.
list command:
Print the name of the default counter for monitor and top command.

tprof.8: new sentence, new line

tprof(8): fix markup nits

tprof.8: fix typo, s/speficied/specified/
 1.21.2.1  23-Dec-2022  martin Pull up following revision(s) (requested by ryo in ticket #20):

sys/arch/arm/arm/cpufunc.c: revision 1.185
sys/dev/tprof/tprof.c: revision 1.22
sys/arch/arm/arm32/arm32_boot.c: revision 1.45
sys/dev/tprof/tprof_armv8.c: revision 1.19
sys/dev/tprof/tprof_armv7.c: revision 1.12
sys/arch/aarch64/aarch64/cpu.c: revision 1.71
sys/arch/aarch64/aarch64/cpu.c: revision 1.72

tprof_lock is not a spin mutex. use mutex_{enter,exit}(). oops

Explicitly disable overflow interrupts before enabling the cycle counter.

PMCR_EL0.LC should be set. ARM deprecates use of PMCR_EL0.LC=0

Even if an overflow interrupt occurs for a counter outside tprof management,
the bit in the overflow status register must be cleared to prevent an interrupt storm.
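A minimal model of that rule, assuming a write-1-to-clear status register such as ARMv8 PMOVSCLR_EL0 (bit 31 is the cycle counter). The struct ovf_action type and handle_overflow() are hypothetical names; the sketch only computes which bits to sample and which to acknowledge, without touching real hardware.

```c
#include <stdint.h>

/* Hypothetical result of examining the overflow status register. */
struct ovf_action {
	uint32_t to_sample;	/* overflowed counters tprof should record */
	uint32_t to_clear;	/* bits to write back to the W1C clear register */
};

/*
 * ovsr:    overflow status as read from the PMU
 * managed: mask of counters owned by tprof
 */
static struct ovf_action
handle_overflow(uint32_t ovsr, uint32_t managed)
{
	struct ovf_action a;

	a.to_sample = ovsr & managed;
	/*
	 * Acknowledge every pending bit, including counters tprof does not
	 * manage; leaving one set keeps the interrupt asserted (a storm).
	 */
	a.to_clear = ovsr;
	return a;
}
```

If the cycle counter (bit 31) and counter 2 overflow while tprof only owns counter 0, nothing is sampled, but both bits are still written back to the clear register.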
