Cross Reference: /src/sys/arch/aarch64/include/cpu.h

History log of /src/sys/arch/aarch64/include/cpu.h
Revision	Date	Author	Comments
1.53	30-Dec-2024	jmcneill	arm64: Enable support for low power idle CPU states on ACPI platforms. The ACPI CPU driver parses the _LPI package on each CPU and builds a table of supported low power states. A custom cpu_idle() implementation is registered that uses the time previously spent idle to select an entry method for low power on the next idle entry. A boot option, "nolpi", can be used to ignore _LPI and use the normal WFI idle method. This decreases the battery discharge rate on my Snapdragon X1E laptop from ~17W to ~10W when idle.
1.52	10-Dec-2024	jmcneill	fix 32-bit arm builds
1.51	10-Aug-2024	riastradh	aarch64: Count RNDRRS failure events and add dtrace probe. PR port-arm/58572: aarch64 RNDRRS failures should be evcounted and dtraced
1.50	09-May-2024	pho	port-arm/58194: Resurrect vmt(4) from bitrot On this architecture vmt(4) used to search for a node "/hypervisor" in the FDT and probed the VMware hypervisor call only when the node was found. However, things appear to have changed and VMware no longer provides the FDT node. Since vmt(4) doesn't actually need to read anything from FDT, and the hypervisor call logically resides in virtual CPUs themselves, it would be better to attach it directly to cpu, just like how it's probed on x86.
1.49	25-Feb-2023	riastradh	aarch64: curcpu() audit. Sprinkle KASSERT (or KDASSERT in hot paths) for kpreempt_disabled() when we use curcpu() and it's not immediately obvious that the caller has preemption disabled but closer scrutiny suggests the caller has. Note unsafe curcpu()s for syscall event counting. Not sure this is worth changing. Possible bugs fixed: - cpu_irq and cpu_fiq could be preempted while trying to run softints on this CPU. - data_abort_handler might incorrectly think it was invoked in interrupt context when it was only preempted and migrated to another CPU. - pmap_fault_fixup might report the wrong CPU logs. (However, we don't currently run with kpreemption on aarch64, so these are not yet real bugs fixed except if you patch it to build with __HAVE_PREEMPTION.)
1.48	03-Nov-2022	skrll	branches: 1.48.2; Provide MI PMAP support on AARCH64
1.47	25-Jun-2022	jmcneill	Remove GIC_SPLFUNCS.
1.46	25-Jun-2022	jmcneill	pic: Update ci_cpl in pic_set_priority callback. Not all ICs need interrupts disabled to update the priority. DAIF accesses are not cheap, so push the update of ci_cpl from pic_set_priority to the IC's pic_set_priority callback, and let the IC driver determine whether or not it needs interrupts disabled.
1.45	02-Nov-2021	ryo	In order to prevent _mcount() from being recursively called when built with COPTS=-O0, sprinkle `__always_inline' to make _mcount() be generated as a single function.
1.44	01-Nov-2021	skrll	Fix a last minute rebase/merge botch so that the cpu_hatch commit actually works.
1.43	31-Oct-2021	skrll	Rework Arm (32bit and 64bit) AP startup so that cpu_hatch doesn't sleep. The AP initialisation code in cpu_init_secondary_processor will read and initialise the required system registers and state for the BP to attach and report. Rework the interrupt handler code for this new sequence. Thankfully, this removes a bunch of code for bcm2836mp. The VFP detection handler on <= armv7 relies on the global undefined handler being in place until the BP attaches vfp. That is, after the APs have been spun up. gicv3_its.c has a serialisation issue which is protected against in the gicv3_its_cpu_init, which is called from cpu_hatch, with a spin lock. The serialisation issue needs addressing more completely. Tested on RPI3, Apple M1, QEMU, and lx2k Fixes PR port-arm/56264: diagnostic assertion "l->l_stat == LSONPROC" failed on RPI3
1.42	31-Oct-2021	skrll	Annotate some cpu_info members
1.41	26-Oct-2021	skrll	Add a comment and adjust whitespace to match style in this file
1.40	10-Oct-2021	skrll	Use sys/uvm/pmap/pmap_tlb.c on Aarch64 in the same way that some Arm, MIPS, and PPC kernels do. This removes the limitation of 256 processes on CPUs with an 8-bit ASID field, e.g. Apple M1. Additionally, the following changes have been made: - removed a couple of unnecessary aarch64_tlbi_all calls - removed any invalidation after freeing page tables due to _pmap_sweep_pdp. This was never necessary afaict. - all kernel mappings are marked global and userland mappings not-global. Performance testing hasn't shown a significant difference. The data here is from building a kernel on an lx2k system with nvme. before: 1489.6u 400.4s 2:40.65 1176.5% 228+224k 0+32289io 57pf+0w 1482.6u 403.2s 2:38.49 1189.9% 228+222k 0+32274io 46pf+0w 1485.4u 402.2s 2:37.27 1200.2% 228+222k 0+32275io 12pf+0w after: 1493.9u 404.6s 2:37.50 1205.4% 227+221k 0+32265io 48pf+0w 1485.0u 408.0s 2:38.54 1194.0% 227+222k 0+32272io 36pf+0w 1484.3u 407.0s 2:35.88 1213.3% 228+224k 0+32268io 14pf+0w >>> stats.ttest_ind([160.65,158.49,157.27], [157.5,158.54,155.88]) Ttest_indResult(statistic=1.1923622711296888, pvalue=0.2990182944606766)
1.39	18-Sep-2021	jmcneill	gic_splx: performance optimizations Avoid any kind of register access (DAIF, PMR, etc), barriers, and atomic operations in the common case where no interrupt fires between spl being raised and lowered. This introduces a per-CPU return address (ci_splx_restart) used by the vector handler to restart a sequence in splx that compares the new ipl with the per-CPU hardware priority state stored in ci_hwpl.
1.38	14-Aug-2021	ryo	Improve the performance of kernel profiling on MULTIPROCESSOR, and make it possible to get profiling data for each CPU. In the current implementation, locks are acquired at the entrance of the mcount internal function, so the more cores there are, the more lock contention occurs, making profiling in a MULTIPROCESSOR environment slow to the point of being unusable. Profiling buffers are now reserved per CPU, improving profiling performance on MP by several to several dozen times. - Eliminated cpu_simple_lock in the mcount internal function, using per-CPU buffers. - Add ci_gmon member to struct cpu_info of each MP arch. - Add kern.profiling.percpu node in sysctl tree. - Add new -c <cpuid> option to kgmon(8) to specify the cpuid, as in OpenBSD. For compatibility, if the -c option is not specified, the entire system can be operated as before, and the -p option will get the total profiling data for all CPUs.
1.37	08-Aug-2021	skrll	Re-apply "Move 'struct pic_pending' from percpu to struct cpu_info. Saves a few instructions in splx." There is (or was) no need to use atomic operations on the percpu / cpu_info members, so don't. Finally, removing the use of percpu should help avoid problems with "late" attaching cpus.
1.36	29-May-2021	skrll	Deal with the pmap limitation of maxproc in a more complete way and recognise CPUs with only 8bit ASIDs.
1.35	29-May-2021	skrll	Sort includes. NFCI.
1.34	27-Mar-2021	jmcneill	branches: 1.34.2; 1.34.4; Revert recent pic optimizations until I have more time to work on this.
1.33	21-Feb-2021	jmcneill	branches: 1.33.2; Add cpu_dosoftints_ci(). Like cpu_dosoftints(), but takes a cpu_info ptr so we can avoid the extra tpidr_el1 access if cpu_info is already known.
1.32	21-Feb-2021	jmcneill	Keep current hardware priority value in struct cpu_info and use it instead of reading icc_pmr_el1 in gicv3_set_priority.
1.31	20-Feb-2021	jmcneill	Move 'struct pic_pending' from percpu to struct cpu_info. Saves a few instructions in splx.
1.30	07-Dec-2020	jmcneill	ACPI Processor UID is 32-bits (ci_acpiid).
1.29	21-Nov-2020	jmcneill	Add a per-CPU event counter that counts every time an interrupt handler is preempted by a higher priority interrupt.
1.28	01-Oct-2020	ryo	branches: 1.28.2; fix build error with LLVM
1.27	14-Sep-2020	ryo	PID_MAX is just an initial value (soft maximum), so don't use it in a CTASSERT. Define __HAVE_CPU_MAXPROC to use the cpu_maxproc() function instead. Pointed out by mrg@, thanks.
1.26	12-Aug-2020	skrll	Part II of ad's aarch64 performance improvements (cpu_switch.S bugs are all mine) - Use tpidr_el1 to hold curlwp and not curcpu, because curlwp is accessed much more often by MI code. It also makes curlwp preemption safe and allows aarch64_curlwp() to be a const function (curcpu must be volatile). - Make ASTs operate per-LWP rather than per-CPU, otherwise sometimes LWPs can see spurious ASTs (which doesn't cause a problem, it just means some time may be wasted). - Use plain stores to set/clear ASTs. Make sure ASTs are always set on the same CPU as the target LWP, and delivered via IPI if posted from a remote CPU so that they are resolved quickly. - Add some cache line padding to struct cpu_info, to match x86. - Add a memory barrier in a couple of places where ci_curlwp is set. This is needed whenever an LWP that is resuming on the CPU could hold an adaptive mutex. The barrier needs to drain the CPU's store buffer, so that the update to ci_curlwp becomes globally visible before the LWP can resume and call mutex_exit(). By my reading of the ARM docs it looks like the instruction I used will do the right thing, but I'm not 100% sure.
1.25	01-Jul-2020	ryo	- On some systems with a different cache line size (and DIC,IDC) per CPU, trap "mrs Xt,ctr_el0" instruction to return the minimum cache line size of the system to userland. - add CLIDR_EL1 and CTR_EL0 to struct aarch64_sysctl_cpu_id. On most systems, cache line size is the same for all CPUs, so this mechanism won't be required. Rather, this is primarily for errata support, which will be committed later.
1.24	01-Jul-2020	ryo	Switch the Icache sync operation to the necessary and sufficient one according to the CTR_EL0.DIC and CTR_EL0.IDC flags. If CTR_EL0.DIC=1, Icache invalidation is not required. If CTR_EL0.IDC=1, Dcache clean before Icache invalidation is not required. If CLIDR_EL1.LoC is 0, or CLIDR_EL1.LoUIS and CLIDR_EL1.LoUU are both 0, Dcache clean is not required either. SEE ALSO ARMARM, "CTR_EL0 Cache Type Register", and "CLIDR_EL1 Cache Level ID Register"
1.23	29-Jun-2020	riastradh	Draft fpu_kern_enter/leave on aarch64.
1.22	10-Mar-2020	christos	protect curcpu/curlwp from _KMEMUSER
1.21	15-Feb-2020	skrll	Various updates and improvements to cpu start up on arm/aarch64 - start sharing more code around the AP startup messaging. - call arm_cpu_topology_set early so that ci_core_id is available for drivers, e.g. bcm2835_intr.c - both arm and aarch64 now have - a static cpu_info_store array - the same arm_cpu_{hatched,mbox}
1.20	12-Feb-2020	riastradh	Define the MULTIPROCESSOR cpu_number() for modules too. Modules should work whether the main kernel is multiprocessor or not. In particular, dtrace should not think cpu_number() is 0 while cpu_index(curcpu()) and curcpu()->ci_index are nonzero, leading to rather spectacularly bogus results...
1.19	15-Jan-2020	mrg	port the arm64 cpu topology setup for big.little to arm. rename arm64 cpu_do_topology() to arm_cpu_do_topology() and call it from both arm cpu_attach(). replace both aarch64_set_topology() inline code in arm cpu_attach() with new arm_cpu_do_topology(), which is called by the arm64 locore as well (possibly not needed, which would allow it to become static.) not yet tested on a real big.little armv7 system. tested on rockpro64 and pinebook pro.
1.18	12-Jan-2020	mrg	provide some semblance of valid cpu topology for big.little systems. while attaching cpus, if the FDT provides "capacity-dmips-mhz" track the fastest set, and call cpu_topology_set() with slow=true for any cpus that are not the fastest. bug fix for cpu_topology_set(): actually set ci_is_slow for slow cpus. with this change, and -current's recent scheduler changes, this means that long running processes run on the faster cores. on RK3399 based systems, i am seeing 20-50% speed ups for many tasks. XXX: all this can be made common with armv7 big.little.
1.17	05-Jan-2020	ad	branches: 1.17.2; Give aarch64 a preemption safe cpu_intr_p().
1.16	02-Dec-2019	ad	+ ci_onproc
1.15	21-Nov-2019	ad	mi_userret(): take care of calling preempt(), set spc_curpriority directly, and remove MD code that does the same.
1.14	19-Oct-2019	jmcneill	Increase aarch64 MAXCPUS to 256.
1.13	21-Dec-2018	ryo	branches: 1.13.4; - add workaround for Cavium ThunderX erratum 27456. - add a cpufuncs table to cpu_info, since each cpu cluster may have different errata (e.g. big.LITTLE).
1.12	24-Nov-2018	skrll	Provide a LWP_PC for Taylor
1.11	20-Nov-2018	mrg	rewrite the CPU identification on arm64: - publish per-cpu data - publish a whole bunch of info in struct aarch64_sysctl_cpu_id instead of various individual nodes (there are 16 total.) - add MIDR extractor bits - define ARMv8.2-A id_aa64mmfr2_el1 and id_aa64zfr0_el1 regs, but avoid using them until we make sure they exist. (these members are added to aarch64_sysctl_cpu_id to avoid future compat issues.) the arm32 and aarch32 version of these need to be adjusted as well (and aarch32 data published at all.) still trying to work out how to make the same userland binary running on a real arm32 or an aarch32 system can work sanely here. ok ryo@.
1.10	18-Oct-2018	skrll	Provide generic start code that assumes the MMU is off and caches are disabled as per the linux booting protocol for ARMv6 and ARMv7 boards. u-boot image type should be changed to 'linux' for correct behaviour. The new start code builds a minimal "bootstrap" L1PT with cached access disabled and uses the same table for all processors. AP startup is performed in fewer steps and more code is written in C. The bootstrap tables and stack are placed into an (orphaned) section "_init_memory" which is given to uvm when it is no longer used. Various kernels have been converted to use this code and tested. Some boards were provided by TNF. Thanks! The GENERIC kernel now boots on boards using the TEGRA, SUNXI and EXYNOS kernels. The GENERIC kernel will also work on RPI2 using u-boot. Thanks to martin@ and aymeric@ for testing on parallella and nanosoc respectively
1.9	12-Oct-2018	jmcneill	Add ACPI Processor Unique ID (ci_acpiid) to struct cpu_info, required by ACPI subsystem.
1.8	10-Sep-2018	ryo	cleanup aarch64 mpstart and fdt bootstrap * arm_cpu_hatch_arg is a bad idea. avoid serializing CPU startup, and eliminate arm_cpu_hatch_arg. in mpstart, resolve own cpu index using array of cpu_mpidr[] (aarch64) * add support for fdt enable-method "spin-table" * add support for fdt enable-method "brcm,bcm2836-smp" (for 32bit RaspberryPi) * use arm_fdt_cpu_bootstrap() instead of psci_fdt_bootstrap() * rename "arm/fdt/psci_fdt.h" to "arm/fdt/psci_fdtvar.h" because of conflict of include file for needs-flag * add devmap for cpu spin-table of raspberrypi3/aarch64 * no need to force hatch APs for raspberrypi3/arm32 ifndef MULTIPROCESSOR. * make pmap_extract(kerneltext/data/bss) work even before pmap_bootstrap is called idea to use cpu_mpidr[] by jmcneill@. reviewed by skrll@. thanks.
1.7	26-Aug-2018	ryo	add support for multiple cpu clusters. * pass the cpu index as an argument to secondary processors when hatching. * keep cpu cache configuration per cpu cluster. Hello big.LITTLE!
1.6	08-Aug-2018	jmcneill	Add fields for per-cpu GICv3 state
1.5	23-Jul-2018	ryo	rather than using flags to resolve nested locks, reserve pool_cache before locking.
1.4	21-Jul-2018	ryo	* avoid deadlock. mutex_owned() works only for adaptive lock, therefore we cannot use it for spinlock... * add more NULL check * clear pte when pmap_enter() fails
1.3	09-Jul-2018	ryo	add MULTIPROCESSOR support
1.2	01-Apr-2018	ryo	branches: 1.2.2; Add initial support for ARMv8 (AARCH64) (by nisimura@ and ryo@) - sys/arch/evbarm64 is gone and integrated into sys/arch/evbarm. (by skrll@) - add fdt support. evbarm/conf/GENERIC64 fdt (bcm2837,sunxi,tegra) based generic 64bit kernel config. (by skrll@, jmcneill@)
1.1	10-Aug-2014	matt	branches: 1.1.4; 1.1.28; Preliminary files for AARCH64 (64-bit ARM) support. Enough for a distribution build.
1.1.28.7	26-Dec-2018	pgoyette	Sync with HEAD, resolve a few conflicts
1.1.28.6	26-Nov-2018	pgoyette	Sync with HEAD, resolve a couple of conflicts
1.1.28.5	20-Oct-2018	pgoyette	Sync with head
1.1.28.4	30-Sep-2018	pgoyette	Sync with HEAD
1.1.28.3	06-Sep-2018	pgoyette	Sync with HEAD Resolve a couple of conflicts (result of the uimin/uimax changes)
1.1.28.2	28-Jul-2018	pgoyette	Sync with HEAD
1.1.28.1	07-Apr-2018	pgoyette	Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
1.1.4.2	20-Aug-2014	tls	Rebase to HEAD as of a few days ago.
1.1.4.1	10-Aug-2014	tls	file cpu.h was added on branch tls-maxphys on 2014-08-20 00:02:39 +0000
1.2.2.2	13-Apr-2020	martin	Mostly merge changes from HEAD upto 20200411
1.2.2.1	10-Jun-2019	christos	Sync with HEAD
1.13.4.2	12-Feb-2020	martin	Pull up following revision(s) (requested by riastradh in ticket #701): external/cddl/osnet/dev/dtrace/aarch64/dtrace_isa.c: revision 1.2 external/cddl/osnet/dist/lib/libdtrace/common/dt_open.c: revision 1.17 external/cddl/osnet/dist/lib/libdtrace/common/dt_module.c: revision 1.18 sys/modules/cyclic/Makefile: revision 1.5 external/cddl/osnet/dev/dtrace/aarch64/dtrace_subr.c: revision 1.2 external/cddl/osnet/dev/dtrace/aarch64/dtrace_subr.c: revision 1.3 sys/arch/aarch64/aarch64/vectors.S: revision 1.10 external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.2 external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.3 external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.4 external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.5 external/cddl/osnet/dev/fbt/aarch64/fbt_isa.c: revision 1.6 sys/arch/aarch64/include/cpu.h: revision 1.20 external/cddl/osnet/dist/lib/libdtrace/common/dt_impl.h: revision 1.9 Create a buffer space of 512 bytes before the trapframe. dtrace fbt needs enough space to emulate an stp x29, x30, [sp,#-FRAMESIZE]! instruction in a function prologue. In the aarch64 instruction encoding, FRAMESIZE can be as large as 512 bytes, so reserve this much space when KDTRACE_HOOKS is enabled. Use db_write_bytes to overwrite kernel text. Tidy up a bit. No functional change intended. aarch64 fbt_invop doesn't actually use the argument, but it would make more sense for it to be the return value and/or first argument register. Certainly it's not `eax'! Tidy up a bit: don't set things we won't use; assert nonzeroness. Use /dev/ksyms, not /netbsd, for the running kernel's symbols. Teach dtrace about el1_trap_exit frames on aarch64. Implement dtrace_getarg and dtrace_getreg while here. Count the number of artificial frames in aarch64 fbt probe correctly. Change the address ranges that aarch64 considers toxic for dtrace. `Toxic' means dtrace forbids D scripts from even attempting to read or write at them. 
Previously we considered [0, VM_MIN_KERNEL_ADDRESS) toxic, but VM_MIN_KERNEL_ADDRESS is only the minimum address of the kernel map; the direct-mapped region lies below it, and with PMAP_MAP_POOLPAGE we allocate virtual pages for pool backing directly from physical pages through the direct-mapped region. Also, this did not consider I/O mappings to be toxic, which they probably should be. Instead, treat: [0, AARCH64_KSEG_START) and [VM_KERNEL_IO_ADDRESS, 0xfff...ff) as toxic. (The upper bound for 0xfff...ff ought to be inclusive, not exclusive, but I think we'll need another mechanism for expressing that to dtrace!) Switch from db_write_bytes to using direct-mapping. This way there's no dependency on ddb. Define the MULTIPROCESSOR cpu_number() for modules too. Modules should work whether the main kernel is multiprocessor or not. In particular, dtrace should not think cpu_number() is 0 while cpu_index(curcpu()) and curcpu()->ci_index are nonzero, leading to rather spectacularly bogus results... cyclic.kmod needs -Wno-sign-compare for aarch64 CPU_INFO_FOREACH. Provisional workaround; feel free to fix.
1.13.4.1	23-Oct-2019	martin	Pull up following revision(s) (requested by jmcneill in ticket #359): sys/arch/aarch64/aarch64/locore.S: revision 1.42 sys/arch/aarch64/aarch64/locore.S: revision 1.43 sys/arch/aarch64/aarch64/locore.S: revision 1.44 sys/arch/arm/fdt/cpu_fdt.c: revision 1.28 sys/arch/aarch64/include/cpu.h: revision 1.14 sys/arch/aarch64/include/param.h: revision 1.12 sys/arch/arm/arm32/cpu.c: revision 1.133 sys/arch/arm/arm32/cpu.c: revision 1.134 sys/arch/arm/include/cpu.h: revision 1.101 sys/arch/arm/acpi/cpu_acpi.c: revision 1.7 sys/arch/aarch64/aarch64/cpu.c: revision 1.23 sys/arch/aarch64/aarch64/cpu.c: revision 1.24 sys/arch/aarch64/aarch64/cpu.c: revision 1.25 Increase aarch64 MAXCPUS to 256. - Invalidate dcache before polling AP hatched status - Avoid overlap between BP and last AP stack. AP stacks are now in order of increasing address order. Spotted by and idea from mlelstv. - Use separate cacheline aligned arrays for mbox and hatched as before. - cpu_hatched_p only for MULTIPROCESSOR
1.17.2.2	29-Feb-2020	ad	Sync with head.
1.17.2.1	17-Jan-2020	ad	Sync with head.
1.28.2.2	03-Apr-2021	thorpej	Sync with HEAD.
1.28.2.1	14-Dec-2020	thorpej	Sync w/ HEAD.
1.33.2.1	03-Apr-2021	thorpej	Sync with HEAD.
1.34.4.1	31-May-2021	cjep	sync with head
1.34.2.1	17-Jun-2021	thorpej	Sync w/ HEAD.
1.48.2.1	13-Oct-2024	martin	Pull up following revision(s) (requested by riastradh in ticket #955): sys/arch/aarch64/aarch64/cpu.c: revision 1.78 sys/arch/aarch64/include/cpu.h: revision 1.51 aarch64: Count RNDRRS failure events and add dtrace probe. PR port-arm/58572: aarch64 RNDRRS failures should be evcounted and dtraced
