History log of /src/sys/arch/aarch64/include/cpu.h
Revision  Date  Author  Comments
 1.53  30-Dec-2024  jmcneill arm64: Enable support for low power idle CPU states on ACPI platforms.

The ACPI CPU driver parses the _LPI package on each CPU and builds a
table of supported low power states. A custom cpu_idle() implementation
is registered that uses the time previously spent idle to select an
entry method for low power on the next idle entry.

A boot option, "nolpi", can be used to ignore _LPI and use the normal
WFI idle method.

This decreases the battery discharge rate on my Snapdragon X1E laptop from
~17W to ~10W when idle.
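The selection step described above can be illustrated with a minimal C sketch of one plausible policy: pick the deepest state whose minimum residency is covered by the previous idle period. It assumes each _LPI entry reduces to a name and a minimum residency in microseconds. `struct lpi_state`, `lpi_select` and the table values are hypothetical and are not taken from the NetBSD ACPI CPU driver or from any real firmware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-state record built from one CPU's _LPI package. */
struct lpi_state {
	const char *name;
	uint32_t min_residency_us;	/* idle time needed before the state pays off */
};

/* Illustrative table, shallowest first; index 0 stands for plain WFI. */
static const struct lpi_state example_lpi[] = {
	{ "wfi", 0 },
	{ "core-off", 100 },
	{ "cluster-off", 5000 },
};

/*
 * Choose the deepest state whose minimum residency is covered by the time
 * the CPU spent idle on its previous idle entry.
 */
static size_t
lpi_select(const struct lpi_state *states, size_t nstates,
    uint64_t last_idle_us)
{
	size_t best = 0;

	for (size_t i = 1; i < nstates; i++) {
		if (states[i].min_residency_us > last_idle_us)
			break;	/* sorted table: deeper states need even more */
		best = i;
	}
	return best;
}
```

Booting with "nolpi" would correspond to always using index 0 (WFI).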
 1.52  10-Dec-2024  jmcneill fix 32-bit arm builds
 1.51  10-Aug-2024  riastradh aarch64: Count RNDRRS failure events and add dtrace probe.

PR port-arm/58572: aarch64 RNDRRS failures should be evcounted and
dtraced
 1.50  09-May-2024  pho port-arm/58194: Resurrect vmt(4) from bitrot

On this architecture vmt(4) used to search for a node "/hypervisor" in the
FDT and probed the VMware hypervisor call only when the node was
found. However, things appear to have changed and VMware no longer provides
the FDT node.

Since vmt(4) doesn't actually need to read anything from FDT, and the
hypervisor call logically resides in virtual CPUs themselves, it would be
better to attach it directly to cpu, just like how it's probed on x86.
 1.49  25-Feb-2023  riastradh aarch64: curcpu() audit.

Sprinkle KASSERT (or KDASSERT in hot paths) for kpreempt_disabled()
when we use curcpu() and it's not immediately obvious that the caller
has preemption disabled but closer scrutiny suggests the caller has.

Note unsafe curcpu()s for syscall event counting. Not sure this is
worth changing.

Possible bugs fixed:

- cpu_irq and cpu_fiq could be preempted while trying to run softints
on this CPU.

- data_abort_handler might incorrectly think it was invoked in
interrupt context when it was only preempted and migrated to
another CPU.

- pmap_fault_fixup might report the wrong CPU in its logs.

(However, we don't currently run with kpreemption on aarch64, so
these are not yet real bugs unless you patch the kernel to build
with __HAVE_PREEMPTION.)
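The audit pattern above (assert that preemption is disabled before trusting curcpu()) can be modelled in userland C. This is a sketch under stated assumptions: a thread-local counter stands in for the kernel's preemption state, `assert()` stands in for KASSERT/KDASSERT, and every name here is hypothetical rather than a NetBSD API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's per-LWP preemption-disable depth. */
static _Thread_local int preempt_depth;

static void
preempt_disable(void)
{
	preempt_depth++;
}

static void
preempt_enable(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

static bool
preempt_disabled(void)
{
	return preempt_depth > 0;
}

struct cpu_sketch {
	int index;
};
static struct cpu_sketch cpu0 = { 0 };

/*
 * curcpu() is only meaningful while the caller cannot migrate, so check
 * that first, the way the audit adds KASSERT(kpreempt_disabled()).
 */
static struct cpu_sketch *
curcpu_checked(void)
{
	assert(preempt_disabled());
	return &cpu0;		/* single-CPU model */
}

/* Correct usage: bracket the curcpu() access with preemption disabled. */
static int
current_cpu_index(void)
{
	preempt_disable();
	int idx = curcpu_checked()->index;
	preempt_enable();
	return idx;
}
```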
 1.48  03-Nov-2022  skrll branches: 1.48.2;
Provide MI PMAP support on AARCH64
 1.47  25-Jun-2022  jmcneill Remove GIC_SPLFUNCS.
 1.46  25-Jun-2022  jmcneill pic: Update ci_cpl in pic_set_priority callback.

Not all ICs need interrupts disabled to update the priority. DAIF accesses
are not cheap, so push the update of ci_cpl from pic_set_priority to the
IC's pic_set_priority callback, and let the IC driver determine whether
or not it needs interrupts disabled.
 1.45  02-Nov-2021  ryo In order to prevent _mcount() from being recursively called when built with COPTS=-O0,
sprinkle `__always_inline' to make _mcount() be generated as a single function.
 1.44  01-Nov-2021  skrll Fix a last minute rebase/merge botch so that the cpu_hatch commit actually
works.
 1.43  31-Oct-2021  skrll Rework Arm (32bit and 64bit) AP startup so that cpu_hatch doesn't sleep.

The AP initialisation code in cpu_init_secondary_processor will read and
initialise the required system registers and state for the BP to attach
and report.

Rework the interrupt handler code for this new sequence. Thankfully,
this removes a bunch of code for bcm2836mp.

The VFP detection handler on <= armv7 relies on the global undefined
handler being in place until the BP attaches vfp. That is, after the
APs have been spun up.

gicv3_its.c has a serialisation issue which is protected against in
the gicv3_its_cpu_init, which is called from cpu_hatch, with a spin
lock. The serialisation issue needs addressing more completely.

Tested on RPI3, Apple M1, QEMU, and lx2k

Fixes PR port-arm/56264:
diagnostic assertion "l->l_stat == LSONPROC" failed on RPI3
 1.42  31-Oct-2021  skrll Annotate some cpu_info members
 1.41  26-Oct-2021  skrll Add a comment and adjust whitespace to match style in this file
 1.40  10-Oct-2021  skrll Use sys/uvm/pmap/pmap_tlb.c on Aarch64 in the same way that some Arm, MIPS,
and some PPC kernels do. This removes the limitation of 256 processes on
CPUs with 8bit ASID field, e.g. Apple M1.

Additionally the following changes have been made

- removed a couple of unnecessary aarch64_tlbi_all calls
- removed any invalidation after freeing page tables due to
_pmap_sweep_pdp. This was never necessary afaict.
- all kernel mappings are marked global and userland mappings non-global.
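The 256-process limit on CPUs with an 8-bit ASID field goes away once ASIDs are recycled instead of being owned for a pmap's lifetime. The sketch below models only the generation idea on a single CPU: hand out ASIDs until the space runs out, then flush the non-global TLB entries and start a new generation, so a pmap from an older generation gets a fresh ASID on its next activation. `struct asid_space`, `asid_activate` and the other names are hypothetical; the real sys/uvm/pmap/pmap_tlb.c is considerably more involved (per-TLB state, multiprocessor shootdowns).

```c
#include <stdint.h>

#define ASID_BITS	8
#define NASIDS		(1u << ASID_BITS)	/* 256 with an 8-bit ASID field */

struct asid_space {
	uint32_t generation;
	uint32_t next;		/* next ASID to hand out; 0 is reserved for the kernel */
	uint32_t tlb_flushes;	/* counts simulated "invalidate all non-global" ops */
};

struct pmap_sketch {
	uint32_t asid;
	uint32_t generation;	/* generation the ASID belongs to; 0 = never assigned */
};

static void
asid_space_init(struct asid_space *as)
{
	as->generation = 1;
	as->next = 1;
	as->tlb_flushes = 0;
}

/* Return a valid ASID for pm, recycling the whole space when it runs out. */
static uint32_t
asid_activate(struct asid_space *as, struct pmap_sketch *pm)
{
	if (pm->generation == as->generation)
		return pm->asid;	/* still valid in this generation */
	if (as->next == NASIDS) {
		/* Exhausted: flush non-global TLB entries, start a new generation. */
		as->tlb_flushes++;
		as->generation++;
		as->next = 1;
	}
	pm->asid = as->next++;
	pm->generation = as->generation;
	return pm->asid;
}

/* Activate nprocs distinct address spaces once each; report TLB flushes. */
static uint32_t
demo_flushes_for(unsigned nprocs)
{
	struct asid_space as;

	asid_space_init(&as);
	for (unsigned i = 0; i < nprocs; i++) {
		struct pmap_sketch pm = { 0, 0 };
		(void)asid_activate(&as, &pm);
	}
	return as.tlb_flushes;
}
```

With ASID 0 reserved, 255 address spaces fit before the first flush; beyond that the cost is an occasional full flush rather than a hard process limit.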

Performance testing hasn't shown a significant difference. The data here
is from building a kernel on an lx2k system with nvme.

before
1489.6u 400.4s 2:40.65 1176.5% 228+224k 0+32289io 57pf+0w
1482.6u 403.2s 2:38.49 1189.9% 228+222k 0+32274io 46pf+0w
1485.4u 402.2s 2:37.27 1200.2% 228+222k 0+32275io 12pf+0w

after
1493.9u 404.6s 2:37.50 1205.4% 227+221k 0+32265io 48pf+0w
1485.0u 408.0s 2:38.54 1194.0% 227+222k 0+32272io 36pf+0w
1484.3u 407.0s 2:35.88 1213.3% 228+224k 0+32268io 14pf+0w

>>> stats.ttest_ind([160.65,158.49,157.27], [157.5,158.54,155.88])
Ttest_indResult(statistic=1.1923622711296888, pvalue=0.2990182944606766)
>>>
 1.39  18-Sep-2021  jmcneill gic_splx: performance optimizations

Avoid any kind of register access (DAIF, PMR, etc), barriers, and atomic
operations in the common case where no interrupt fires between spl being
raised and lowered.

This introduces a per-CPU return address (ci_splx_restart) used by the
vector handler to restart a sequence in splx that compares the new ipl
with the per-CPU hardware priority state stored in ci_hwpl.
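One way to get the property described above (no register access when no interrupt arrives while the level is raised) is to update the hardware priority lazily. The following C model is a simplified sketch under that assumption. It tracks a software level and the level last written to hardware, in the spirit of ci_cpl and ci_hwpl, but it omits the ci_splx_restart mechanism and the replay of deferred interrupts. All function names are hypothetical.

```c
/* Simplified model of lazy interrupt-priority handling on one CPU. */
struct spl_cpu {
	int cpl;		/* software priority level, like ci_cpl */
	int hwpl;		/* level last written to the controller, like ci_hwpl */
	unsigned hw_writes;	/* counts simulated register accesses */
};

static void
hw_set_priority(struct spl_cpu *c, int ipl)
{
	c->hwpl = ipl;		/* stands in for a PMR or controller register write */
	c->hw_writes++;
}

/* Raising the level only updates the software copy. */
static int
spl_raise(struct spl_cpu *c, int ipl)
{
	int old = c->cpl;

	if (ipl > c->cpl)
		c->cpl = ipl;
	return old;
}

/*
 * An interrupt at irq_ipl arrives. If software has masked it, program the
 * hardware now so it stops delivering, and report it as deferred.
 */
static int
irq_arrive(struct spl_cpu *c, int irq_ipl)
{
	if (irq_ipl <= c->cpl) {
		hw_set_priority(c, c->cpl);
		return 0;	/* deferred until the level drops */
	}
	return 1;		/* handled immediately */
}

/* Lowering: touch the hardware only if it was actually raised above s. */
static void
spl_lower(struct spl_cpu *c, int s)
{
	c->cpl = s;
	if (c->hwpl > s)
		hw_set_priority(c, s);
}

/* Register writes for a raise/lower pair, with or without an interrupt. */
static unsigned
demo_hw_writes(int interrupt_in_between)
{
	struct spl_cpu c = { 0, 0, 0 };
	int s = spl_raise(&c, 6);

	if (interrupt_in_between)
		(void)irq_arrive(&c, 4);
	spl_lower(&c, s);
	return c.hw_writes;
}
```

The common case costs two plain memory updates; the hardware is programmed only when a masked interrupt actually shows up.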
 1.38  14-Aug-2021  ryo Improved the performance of kernel profiling on MULTIPROCESSOR, and possible to get profiling data for each CPU.

In the current implementation, locks are acquired at the entrance of the mcount
internal function, so the higher the number of cores, the more lock conflict
occurs, making profiling performance in a MULTIPROCESSOR environment unusable
and slow. Profiling buffers have been changed to be reserved for each CPU,
improving MP profiling performance by a factor of several to several dozen.

- Eliminated cpu_simple_lock in mcount internal function, using per-CPU buffers.
- Add ci_gmon member to struct cpu_info of each MP arch.
- Add kern.profiling.percpu node in sysctl tree.
- Add new -c <cpuid> option to kgmon(8) to specify the cpuid, like openbsd.
For compatibility, if the -c option is not specified, the entire system can be
operated as before, and the -p option will get the total profiling data for
all CPUs.
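The per-CPU buffer change can be sketched as one histogram row per CPU. Each CPU writes only its own row, so the hot path needs no lock, and a whole-system view is produced by summing the rows at read time. The array sizes, `kcount_percpu` and the `prof_*` functions are hypothetical. The sketch also assumes the caller cannot migrate to another CPU mid-update, which the real mcount path has to guarantee separately.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NCPU	4
#define SKETCH_NBUCKET	8

/* One histogram row per CPU; a CPU only ever writes its own row. */
static uint32_t kcount_percpu[SKETCH_NCPU][SKETCH_NBUCKET];

/* Record a sample for a PC bucket on this CPU; returns that CPU's new count. */
static uint32_t
prof_hit(unsigned cpu, size_t bucket)
{
	return ++kcount_percpu[cpu][bucket];
}

/* Per-CPU view, roughly what selecting a single cpuid with kgmon -c gives. */
static uint32_t
prof_cpu(unsigned cpu, size_t bucket)
{
	return kcount_percpu[cpu][bucket];
}

/* Whole-system view: sum every CPU's row on read instead of locking on write. */
static uint32_t
prof_total(size_t bucket)
{
	uint32_t sum = 0;

	for (unsigned cpu = 0; cpu < SKETCH_NCPU; cpu++)
		sum += kcount_percpu[cpu][bucket];
	return sum;
}
```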
 1.37  08-Aug-2021  skrll Re-apply

Move 'struct pic_pending' from percpu to struct cpu_info. Saves a few
instructions in splx.

There is(/was) no need to use atomic operations on the percpu / cpu_info
members, so don't.

Finally, removing the use of percpu should help avoid problems with "late"
attaching cpus.
 1.36  29-May-2021  skrll Deal with the pmap limitation of maxproc in a more complete way and
recognise CPUs with only 8bit ASIDs.
 1.35  29-May-2021  skrll Sort includes. NFCI.
 1.34  27-Mar-2021  jmcneill branches: 1.34.2; 1.34.4;
Revert recent pic optimizations until I have more time to work on this.
 1.33  21-Feb-2021  jmcneill branches: 1.33.2;
Add cpu_dosoftints_ci(). Like cpu_dosoftints(), but takes a cpu_info ptr
so we can avoid the extra tpidr_el1 access if cpu_info is already known.
 1.32  21-Feb-2021  jmcneill Keep current hardware priority value in struct cpu_info and use it instead
of reading icc_pmr_el1 in gicv3_set_priority.
 1.31  20-Feb-2021  jmcneill Move 'struct pic_pending' from percpu to struct cpu_info. Saves a few
instructions in splx.
 1.30  07-Dec-2020  jmcneill ACPI Processor UID is 32-bits (ci_acpiid).
 1.29  21-Nov-2020  jmcneill Add a per-CPU event counter that counts every time an interrupt handler is
preempted by a higher priority interrupt.
 1.28  01-Oct-2020  ryo branches: 1.28.2;
fix build error with LLVM
 1.27  14-Sep-2020  ryo PID_MAX is just an initial value (soft maximum). Don't use it for CTASSERT.
Define __HAVE_CPU_MAXPROC to use the function cpu_maxproc().

pointed out by mrg@, thanks.
 1.26  12-Aug-2020  skrll Part II of ad's aarch64 performance improvements (cpu_switch.S bugs are
all mine)

- Use tpidr_el1 to hold curlwp and not curcpu, because curlwp is accessed
much more often by MI code. It also makes curlwp preemption safe and
allows aarch64_curlwp() to be a const function (curcpu must be volatile).

- Make ASTs operate per-LWP rather than per-CPU, otherwise sometimes LWPs
can see spurious ASTs (which doesn't cause a problem, it just means some
time may be wasted).

- Use plain stores to set/clear ASTs. Make sure ASTs are always set on the
same CPU as the target LWP, and delivered via IPI if posted from a remote
CPU so that they are resolved quickly.

- Add some cache line padding to struct cpu_info, to match x86.

- Add a memory barrier in a couple of places where ci_curlwp is set. This
is needed whenever an LWP that is resuming on the CPU could hold an
adaptive mutex. The barrier needs to drain the CPU's store buffer, so
that the update to ci_curlwp becomes globally visible before the LWP can
resume and call mutex_exit(). By my reading of the ARM docs it looks like
the instruction I used will do the right thing, but I'm not 100% sure.
 1.25  01-Jul-2020  ryo - On some systems with a different cache line size (and DIC,IDC) per CPU, trap "mrs Xt,ctr_el0" instruction
to return the minimum cache line size of the system to userland.
- add CLIDR_EL1 and CTR_EL0 to struct aarch64_sysctl_cpu_id.

On most systems, cache line size is the same for all CPUs, so this mechanism won't be required.
Rather, this is primarily for errata support, which will be committed later.
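A value for the trapped `mrs Xt, ctr_el0` can be built by merging the per-CPU CTR_EL0 values field by field. In CTR_EL0, IminLine (bits [3:0]) and DminLine (bits [19:16]) hold log2 of the line size in 4-byte words, and IDC and DIC are bits 28 and 29. This sketch keeps the minimum of each line-size field and keeps IDC/DIC only if every CPU sets them; that conservative policy is an assumption, as is taking all other fields from CPU 0. The macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CTR_IMINLINE(ctr)	((uint32_t)(ctr) & 0xf)		/* bits [3:0] */
#define CTR_DMINLINE(ctr)	(((uint32_t)(ctr) >> 16) & 0xf)	/* bits [19:16] */
#define CTR_IDC			(UINT64_C(1) << 28)
#define CTR_DIC			(UINT64_C(1) << 29)

/* Line size in bytes: the fields hold log2 of the number of 4-byte words. */
static unsigned
ctr_line_bytes(unsigned field)
{
	return 4u << field;
}

/* Merge per-CPU CTR_EL0 values into the system-wide value to report. */
static uint64_t
ctr_system(const uint64_t *ctr, size_t ncpu)
{
	uint64_t sys = ctr[0];

	for (size_t i = 1; i < ncpu; i++) {
		uint64_t c = ctr[i];

		if (CTR_IMINLINE(c) < CTR_IMINLINE(sys))
			sys = (sys & ~UINT64_C(0xf)) | CTR_IMINLINE(c);
		if (CTR_DMINLINE(c) < CTR_DMINLINE(sys))
			sys = (sys & ~(UINT64_C(0xf) << 16)) |
			    ((uint64_t)CTR_DMINLINE(c) << 16);
		/* Keep IDC/DIC only if this CPU also has them. */
		sys &= c | ~(CTR_IDC | CTR_DIC);
	}
	return sys;
}

/* Illustrative pair: 64-byte lines with DIC/IDC, and 32-byte I-lines with IDC only. */
static const uint64_t example_ctr[2] = {
	CTR_DIC | CTR_IDC | (UINT64_C(4) << 16) | 4,
	CTR_IDC | (UINT64_C(4) << 16) | 3,
};
```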
 1.24  01-Jul-2020  ryo Switch the Icache sync operation to the necessary and sufficient one according to the CTR_EL0.DIC and CTR_EL0.IDC flags.

If CTR_EL0.DIC=1, Icache invalidation is not required.
If CTR_EL0.IDC=1, Dcache clean before Icache invalidation is not required.
If CLIDR_EL1.LoC is 0, or CLIDR_EL1.LoUIS and CLIDR_EL1.LoUU are both 0, Dcache clean is not required either.

SEE ALSO ARMARM, "CTR_EL0 Cache Type Register", and "CLIDR_EL1 Cache Level ID Register"
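The three rules above can be written as one decision function. The bit positions follow the Arm ARM (CTR_EL0.IDC is bit 28, DIC is bit 29; CLIDR_EL1.LoUIS is [23:21], LoC is [26:24], LoUU is [29:27]). The struct and function names are hypothetical and are not NetBSD's cache-sync code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTR_IDC_BIT	(UINT64_C(1) << 28)
#define CTR_DIC_BIT	(UINT64_C(1) << 29)
#define CLIDR_LOUIS(c)	(((c) >> 21) & 7)
#define CLIDR_LOC(c)	(((c) >> 24) & 7)
#define CLIDR_LOUU(c)	(((c) >> 27) & 7)

struct icache_sync_plan {
	bool dcache_clean;	/* clean D-cache to the PoU first */
	bool icache_inval;	/* then invalidate the I-cache */
};

static struct icache_sync_plan
icache_sync_decide(uint64_t ctr, uint64_t clidr)
{
	struct icache_sync_plan p = { true, true };

	if (ctr & CTR_DIC_BIT)
		p.icache_inval = false;
	if ((ctr & CTR_IDC_BIT) || CLIDR_LOC(clidr) == 0 ||
	    (CLIDR_LOUIS(clidr) == 0 && CLIDR_LOUU(clidr) == 0))
		p.dcache_clean = false;
	return p;
}

/* Illustrative CLIDR_EL1: LoUIS = 1, LoC = 2, LoUU = 1. */
static const uint64_t example_clidr =
    (UINT64_C(1) << 21) | (UINT64_C(2) << 24) | (UINT64_C(1) << 27);
```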
 1.23  29-Jun-2020  riastradh Draft fpu_kern_enter/leave on aarch64.
 1.22  10-Mar-2020  christos protect curcpu/curlwp from _KMEMUSER
 1.21  15-Feb-2020  skrll Various updates and improvements to cpu start up on arm/aarch64

- start sharing more code around the AP startup messaging.
- call arm_cpu_topology_set early so that ci_core_id is available for
drivers, e.g. bcm2835_intr.c
- both arm and aarch64 now have
  - a static cpu_info_store array
  - the same arm_cpu_{hatched,mbox}
 1.20  12-Feb-2020  riastradh Define the MULTIPROCESSOR cpu_number() for modules too.

Modules should work whether the main kernel is multiprocessor or not.
In particular, dtrace should not think cpu_number() is 0 while
cpu_index(curcpu()) and curcpu()->ci_index are nonzero, leading to
rather spectacularly bogus results...
 1.19  15-Jan-2020  mrg port the arm64 cpu topology setup for big.little to arm.

rename arm64 cpu_do_topology() to arm_cpu_do_topology() and
call it from both arm cpu_attach().

replace both aarch64_set_topology() inline code in arm
cpu_attach() with new arm_cpu_do_topology(), which is called
by the arm64 locore as well (possibly not needed, which would
allow it to become static.)

not yet tested on a real big.little armv7 system. tested
on rockpro64 and pinebook pro.
 1.18  12-Jan-2020  mrg provide some semblance of valid cpu topology for big.little systems.

while attaching cpus, if the FDT provides "capacity-dmips-mhz" track
the fastest set, and call cpu_topology_set() with slow=true for any
cpus that are not the fastest.

bug fix for cpu_topology_set(): actually set ci_is_slow for slow cpus.

with this change, and -current's recent scheduler changes, this means
that long running processes run on the faster cores. on RK3399 based
systems, i am seeing 20-50% speed ups for many tasks.


XXX: all this can be made common with armv7 big.little.
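The capacity comparison can be sketched as a two-pass scan: find the highest "capacity-dmips-mhz" value, then flag every CPU below it as slow. This assumes all capacities are known up front, whereas the commit tracks the fastest set incrementally while CPUs attach. `mark_slow_cpus` is a hypothetical helper, and the example values are illustrative for a 4+2 big.LITTLE layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Flag each CPU whose capacity is below the fastest one as slow.
 * Returns the number of slow CPUs.
 */
static size_t
mark_slow_cpus(const uint32_t *capacity, bool *is_slow, size_t ncpu)
{
	uint32_t fastest = 0;
	size_t nslow = 0;

	for (size_t i = 0; i < ncpu; i++)
		if (capacity[i] > fastest)
			fastest = capacity[i];
	for (size_t i = 0; i < ncpu; i++) {
		is_slow[i] = capacity[i] < fastest;
		nslow += is_slow[i];
	}
	return nslow;
}

/* Illustrative values: four little cores followed by two big cores. */
static const uint32_t example_cap[6] = { 485, 485, 485, 485, 1024, 1024 };
static bool example_slow[6];
```

On a homogeneous system every capacity equals the maximum, so no CPU is marked slow.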
 1.17  05-Jan-2020  ad branches: 1.17.2;
Give aarch64 a preemption safe cpu_intr_p().
 1.16  02-Dec-2019  ad + ci_onproc
 1.15  21-Nov-2019  ad mi_userret(): take care of calling preempt(), set spc_curpriority directly,
and remove MD code that does the same.
 1.14  19-Oct-2019  jmcneill Increase aarch64 MAXCPUS to 256.
 1.13  21-Dec-2018  ryo branches: 1.13.4;
- add workaround for Cavium ThunderX errata 27456.
- add cpufuncs table in cpu_info. each CPU cluster may have different errata. (e.g. big.LITTLE)
 1.12  24-Nov-2018  skrll Provide a LWP_PC for Taylor
 1.11  20-Nov-2018  mrg rewrite the CPU identification on arm64:

- publish per-cpu data
- publish a whole bunch of info in struct aarch64_sysctl_cpu_id
instead of various individual nodes (there are 16 total.)
- add MIDR extractor bits
- define ARMv8.2-A id_aa64mmfr2_el1 and id_aa64zfr0_el1 regs,
but avoid using them until we make sure they exist. (these
members are added to aarch64_sysctl_cpu_id to avoid future
compat issues.)

the arm32 and aarch32 version of these need to be adjusted as
well (and aarch32 data published at all.) still trying to
work out how to make the same userland binary work sanely
whether it runs on a real arm32 or an aarch32 system.

ok ryo@.
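MIDR_EL1 packs Implementer [31:24], Variant [23:20], Architecture [19:16], PartNum [15:4] and Revision [3:0]. Here is a minimal sketch of extractor macros plus the kind of variant/revision range match that errata code typically needs. The names are hypothetical rather than the ones in NetBSD's headers. The value 0x410fd034 is what a Cortex-A53 r0p4 reports (implementer 0x41 is Arm, part 0xd03 is Cortex-A53).

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field extractors. */
#define MIDR_IMPLEMENTER(m)	(((m) >> 24) & 0xff)
#define MIDR_VARIANT(m)		(((m) >> 20) & 0xf)
#define MIDR_ARCHITECTURE(m)	(((m) >> 16) & 0xf)
#define MIDR_PARTNUM(m)		(((m) >> 4) & 0xfff)
#define MIDR_REVISION(m)	((m) & 0xf)

/*
 * True when midr names the given implementer and part, and its
 * variant/revision lies in [lo, hi], each packed as (variant << 4) | revision.
 */
static bool
midr_in_range(uint32_t midr, uint32_t impl, uint32_t part,
    uint32_t lo, uint32_t hi)
{
	uint32_t vr = (MIDR_VARIANT(midr) << 4) | MIDR_REVISION(midr);

	return MIDR_IMPLEMENTER(midr) == impl &&
	    MIDR_PARTNUM(midr) == part && vr >= lo && vr <= hi;
}
```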
 1.10  18-Oct-2018  skrll Provide generic start code that assumes the MMU is off and caches are
disabled as per the linux booting protocol for ARMv6 and ARMv7 boards.
u-boot image type should be changed to 'linux' for correct behaviour.

The new start code builds a minimal "bootstrap" L1PT with cached access
disabled and uses the same table for all processors. AP startup is
performed in fewer steps and more code is written in C.

The bootstrap tables and stack are placed into an (orphaned) section
"_init_memory" which is given to uvm when it is no longer used.

Various kernels have been converted to use this code and tested. Some
boards were provided by TNF. Thanks!

The GENERIC kernel now boots on boards using the TEGRA, SUNXI and EXYNOS
kernels. The GENERIC kernel will also work on RPI2 using u-boot.

Thanks to martin@ and aymeric@ for testing on parallella and nanosoc
respectively
 1.9  12-Oct-2018  jmcneill Add ACPI Processor Unique ID (ci_acpiid) to struct cpu_info, required by
ACPI subsystem.
 1.8  10-Sep-2018  ryo cleanup aarch64 mpstart and fdt bootstrap
* arm_cpu_hatch_arg is a bad idea. avoid serializing CPU startup, and eliminate arm_cpu_hatch_arg.
in mpstart, each AP resolves its own cpu index using the cpu_mpidr[] array (aarch64)
* add support for fdt enable-method "spin-table"
* add support for fdt enable-method "brcm,bcm2836-smp" (for 32bit RaspberryPi)
* use arm_fdt_cpu_bootstrap() instead of psci_fdt_bootstrap()
* rename "arm/fdt/psci_fdt.h" to "arm/fdt/psci_fdtvar.h" to avoid a conflict with the include file generated for needs-flag
* add devmap for cpu spin-table of raspberrypi3/aarch64
* no need to force-hatch APs for raspberrypi3/arm32 when MULTIPROCESSOR is not defined.
* make pmap_extract(kerneltext/data/bss) work even before pmap_bootstrap is called

idea to use cpu_mpidr[] by jmcneill@. reviewed by skrll@. thanks.
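The cpu_mpidr[] approach lets every AP find its own index without a shared hand-off variable: it reads MPIDR_EL1 and searches a table filled in beforehand. This sketch compares only the affinity fields (Aff3 in bits [39:32], Aff2..Aff0 in bits [23:0]), masking out RES1 bit 31, the U bit (30) and the MT bit (24). The table layout and `cpu_index_from_mpidr` are assumptions, not the locore.S implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Aff3 [39:32] and Aff2..Aff0 [23:0] of MPIDR_EL1. */
#define MPIDR_AFF_MASK	UINT64_C(0xff00ffffff)

/* Return the CPU index whose recorded MPIDR matches, or -1 if unknown. */
static int
cpu_index_from_mpidr(const uint64_t *cpu_mpidr, size_t ncpu, uint64_t mpidr)
{
	mpidr &= MPIDR_AFF_MASK;
	for (size_t i = 0; i < ncpu; i++)
		if ((cpu_mpidr[i] & MPIDR_AFF_MASK) == mpidr)
			return (int)i;
	return -1;
}

/* Illustrative table: two clusters (Aff1 = 0, 1) of two cores each. */
static const uint64_t example_mpidr[4] = {
	0x80000000, 0x80000001, 0x80000100, 0x80000101,
};
```

Because each AP only reads the table, any number of them can run this lookup at the same time.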
 1.7  26-Aug-2018  ryo add support multiple cpu clusters.
* pass cpu index as an argument to secondary processors when hatching.
* keep the cpu cache configuration per cpu cluster.

Hello big.LITTLE!
 1.6  08-Aug-2018  jmcneill Add fields for per-cpu GICv3 state
 1.5  23-Jul-2018  ryo rather than using flags to resolve nested locks, reserve pool_cache before locking.
 1.4  21-Jul-2018  ryo * avoid deadlock. mutex_owned() works only for adaptive lock, therefore we cannot use it for spinlock...
* add more NULL check
* clear pte when pmap_enter() fails
 1.3  09-Jul-2018  ryo add MULTIPROCESSOR support
 1.2  01-Apr-2018  ryo branches: 1.2.2;
Add initial support for ARMv8 (AARCH64) (by nisimura@ and ryo@)

- sys/arch/evbarm64 is gone and integrated into sys/arch/evbarm. (by skrll@)
- add fdt support. evbarm/conf/GENERIC64 is an fdt-based (bcm2837, sunxi, tegra) generic 64-bit kernel config. (by skrll@, jmcneill@)
 1.1  10-Aug-2014  matt branches: 1.1.4; 1.1.28;
Preliminary files for AARCH64 (64-bit ARM) support.
Enough for a distribution build.
 1.1.28.7  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.1.28.6  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.1.28.5  20-Oct-2018  pgoyette Sync with head
 1.1.28.4  30-Sep-2018  pgoyette Sync with HEAD
 1.1.28.3  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.1.28.2  28-Jul-2018  pgoyette Sync with HEAD
 1.1.28.1  07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.1.4.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.4.1  10-Aug-2014  tls file cpu.h was added on branch tls-maxphys on 2014-08-20 00:02:39 +0000
 1.2.2.2  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.2.2.1  10-Jun-2019  christos Sync with HEAD
 1.13.4.2  12-Feb-2020  martin Pull up following revision(s) (requested by riastradh in ticket #701):

external/cddl/osnet/dev/dtrace/aarch64/dtrace_isa.c: revision 1.2
external/cddl/osnet/dist/lib/libdtrace/common/dt_open.c: revision 1.17
external/cddl/osnet/dist/lib/libdtrace/common/dt_module.c: revision 1.18
sys/modules/cyclic/Makefile: revision 1.5
external/cddl/osnet/dev/dtrace/aarch64/dtrace_subr.c: revision 1.2
external/cddl/osnet/dev/dtrace/aarch64/dtrace_subr.c: revision 1.3
sys/arch/aarch64/aarch64/vectors.S: revision 1.10
external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.2
external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.3
external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.4
external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.5
external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.6
sys/arch/aarch64/include/cpu.h: revision 1.20
external/cddl/osnet/dist/lib/libdtrace/common/dt_impl.h: revision 1.9

Create a buffer space of 512 bytes before the trapframe.

dtrace fbt needs enough space to emulate an

stp x29, x30, [sp,#-FRAMESIZE]!

instruction in a function prologue. In the aarch64 instruction
encoding, FRAMESIZE can be as large as 512 bytes, so reserve this
much space when KDTRACE_HOOKS is enabled.
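The 512-byte figure follows from the encoding: the 64-bit pre-indexed STP stores its offset as a signed 7-bit immediate scaled by 8, so the most negative offset is -64 * 8 = -512. This sketch matches `stp x29, x30, [sp, #-N]!` and returns N. `stp_fp_lr_frame_size` is a hypothetical helper, not the fbt_isa.c code.

```c
#include <stdint.h>

/*
 * Return N for "stp x29, x30, [sp, #-N]!", or -1 for anything else.
 * Encoding: 1010100110 imm7[21:15] Rt2[14:10] Rn[9:5] Rt[4:0], imm7 * 8.
 */
static int
stp_fp_lr_frame_size(uint32_t insn)
{
	/* Fixed opcode bits with Rt2 = x30, Rn = sp (31), Rt = x29; imm7 masked. */
	if ((insn & 0xffc07fffu) != 0xa9807bfdu)
		return -1;

	int32_t imm7 = (int32_t)((insn >> 15) & 0x7f);
	if (imm7 & 0x40)
		imm7 -= 0x80;	/* sign-extend the 7-bit field */
	if (imm7 >= 0)
		return -1;	/* not a stack allocation */
	return -imm7 * 8;
}
```

For example, 0xa9bf7bfd is the familiar `stp x29, x30, [sp, #-16]!` prologue, and setting imm7 to its minimum (0x40) gives the 512-byte worst case that the reserved buffer has to cover.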

Use db_write_bytes to overwrite kernel text.

Tidy up a bit. No functional change intended.

aarch64 fbt_invop doesn't actually use the argument, but it would
make more sense for it to be the return value and/or first argument
register. Certainly it's not `eax'!

Tidy up a bit: don't set things we won't use; assert nonzeroness.

Use /dev/ksyms, not /netbsd, for the running kernel's symbols.

Teach dtrace about el1_trap_exit frames on aarch64.

Implement dtrace_getarg and dtrace_getreg while here.

Count the number of artificial frames in aarch64 fbt probe correctly.

Change the address ranges that aarch64 considers toxic for dtrace.
`Toxic' means dtrace forbids D scripts from even attempting to read
or write at them.

Previously we considered [0, VM_MIN_KERNEL_ADDRESS) toxic, but
VM_MIN_KERNEL_ADDRESS is only the minimum address of the kernel map;
the direct-mapped region lies below it, and with PMAP_MAP_POOLPAGE we
allocate virtual pages for pool backing directly from physical pages
through the direct-mapped region. Also, this did not consider I/O
mappings to be toxic, which they probably should be.

Instead, treat:

[0, AARCH64_KSEG_START)
and
[VM_KERNEL_IO_ADDRESS, 0xfff...ff)

as toxic. (The upper bound for 0xfff...ff ought to be inclusive, not
exclusive, but I think we'll need another mechanism for expressing
that to dtrace!)
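The new toxicity rule can be sketched as a single predicate. The two constants are placeholders standing in for AARCH64_KSEG_START and VM_KERNEL_IO_ADDRESS. The real values come from the aarch64 headers and are not reproduced here. Written as a comparison rather than a half-open range, the check naturally includes the top address, which is the inclusive bound the message says dtrace cannot express directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only. */
#define SKETCH_KSEG_START	UINT64_C(0xffff000000000000)
#define SKETCH_KERNEL_IO_ADDR	UINT64_C(0xfffffff000000000)

/*
 * Toxic: [0, KSEG_START) and [KERNEL_IO_ADDR, 0xffff...ffff], top inclusive.
 * The direct map and the kernel map in between stay readable.
 */
static bool
addr_is_toxic(uint64_t va)
{
	return va < SKETCH_KSEG_START || va >= SKETCH_KERNEL_IO_ADDR;
}
```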

Switch from db_write_bytes to using direct-mapping.

This way there's no dependency on ddb.

Define the MULTIPROCESSOR cpu_number() for modules too.
Modules should work whether the main kernel is multiprocessor or not.
In particular, dtrace should not think cpu_number() is 0 while
cpu_index(curcpu()) and curcpu()->ci_index are nonzero, leading to
rather spectacularly bogus results...

cyclic.kmod needs -Wno-sign-compare for aarch64 CPU_INFO_FOREACH.
Provisional workaround; feel free to fix.
 1.13.4.1  23-Oct-2019  martin Pull up following revision(s) (requested by jmcneill in ticket #359):

sys/arch/aarch64/aarch64/locore.S: revision 1.42
sys/arch/aarch64/aarch64/locore.S: revision 1.43
sys/arch/aarch64/aarch64/locore.S: revision 1.44
sys/arch/arm/fdt/cpu_fdt.c: revision 1.28
sys/arch/aarch64/include/cpu.h: revision 1.14
sys/arch/aarch64/include/param.h: revision 1.12
sys/arch/arm/arm32/cpu.c: revision 1.133
sys/arch/arm/arm32/cpu.c: revision 1.134
sys/arch/arm/include/cpu.h: revision 1.101
sys/arch/arm/acpi/cpu_acpi.c: revision 1.7
sys/arch/aarch64/aarch64/cpu.c: revision 1.23
sys/arch/aarch64/aarch64/cpu.c: revision 1.24
sys/arch/aarch64/aarch64/cpu.c: revision 1.25

Increase aarch64 MAXCPUS to 256.

-

Invalidate dcache before polling AP hatched status

-

Avoid overlap between BP and last AP stack. AP stacks are now laid out in
increasing address order.

Spotted by and idea from mlelstv.

-

Use separate cacheline aligned arrays for mbox and hatched as before.

-

cpu_hatched_p only for MULTIPROCESSOR
 1.17.2.2  29-Feb-2020  ad Sync with head.
 1.17.2.1  17-Jan-2020  ad Sync with head.
 1.28.2.2  03-Apr-2021  thorpej Sync with HEAD.
 1.28.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.33.2.1  03-Apr-2021  thorpej Sync with HEAD.
 1.34.4.1  31-May-2021  cjep sync with head
 1.34.2.1  17-Jun-2021  thorpej Sync w/ HEAD.
 1.48.2.1  13-Oct-2024  martin Pull up following revision(s) (requested by riastradh in ticket #955):

sys/arch/aarch64/aarch64/cpu.c: revision 1.78
sys/arch/aarch64/include/cpu.h: revision 1.51

aarch64: Count RNDRRS failure events and add dtrace probe.

PR port-arm/58572: aarch64 RNDRRS failures should be evcounted and
dtraced
