History log of /src/sys/arch/arm/arm32/cpuswitch.S
Revision | Date | Author | Comments
 1.108  07-Oct-2025  skrll Retire the locore.h assembly versions of IRQ{enable,disable} in favour of
the "experimental" cpuswitch.S versions, but rename them

IRQ_{DISABLE,ENABLE}

and have them take a temporary register which is only used for < armv6.
 1.107  01-Mar-2023  riastradh arm32: Optimization: Omit needless membar when triggering softint.

When we are triggering a softint, it can't already hold any mutexes.
So any path to mutex_exit(mtx) must go via mutex_enter(mtx), which is
always done with atomic r/m/w, and we need not issue any explicit
barrier between ci->ci_curlwp = softlwp and a potential load of
mtx->mtx_owner in mutex_exit.

PR kern/57240

XXX pullup-8
XXX pullup-9
XXX pullup-10
 1.106  23-Feb-2023  riastradh arm32: Add missing barriers in cpu_switchto.

Details in comments.

PR kern/57240

XXX pullup-8
XXX pullup-9
XXX pullup-10
 1.105  30-May-2021  dholland branches: 1.105.12;
typo in comment
 1.104  21-Nov-2020  skrll branches: 1.104.4; 1.104.6;
Ensure that r5 contains curlwp before DO_AST_AND_RESTORE_ALIGNMENT_FAULTS
in lwp_trampoline as required by the move to make ASTs operate per-LWP
rather than per-CPU.

Thanks to martin@ for bisecting the amap corruption he was seeing and
testing this fix.
 1.103  15-Aug-2020  skrll branches: 1.103.2;
#ifdef _ARM_ARCH_7 the dmbs
 1.102  14-Aug-2020  skrll Mirror the changes to aarch64 and

- Switch to TPIDRPRW_IS_CURLWP, because curlwp is accessed much more often
by MI code. It also makes curlwp preemption safe,

- Make ASTs operate per-LWP rather than per-CPU, otherwise sometimes LWPs
can see spurious ASTs (which doesn't cause a problem, it just means some
time may be wasted).

- Make sure ASTs are always set on the same CPU as the target LWP, and
delivered via IPI if posted from a remote CPU so that they are resolved
quickly.

- Add some cache line padding to struct cpu_info.

- Add a memory barrier in a couple of places where ci_curlwp is set. This
is needed whenever an LWP that is resuming on the CPU could hold an
adaptive mutex. The barrier needs to drain the CPU's store buffer, so
that the update to ci_curlwp becomes globally visible before the LWP can
resume and call mutex_exit().
 1.101  10-Jul-2020  skrll Add support for KASAN on ARMv[67]

Thanks to maxv for many pointers and reviews.
 1.100  06-Jul-2020  skrll Whitespace
 1.99  03-Jul-2020  skrll KNF (sort #includes)
 1.98  11-Feb-2020  skrll G/C
 1.97  08-Jan-2020  skrll oldlwp is always non-NULL in cpu_switchto so remove the test for NULL.
 1.96  08-Jan-2020  ad Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.
 1.95  29-Oct-2019  joerg branches: 1.95.2;
Explicitly annotate FPU requirements for LLVM MC.

When using GCC, these annotations change the global state, but there is
no push/pop functionality for .fpu to avoid this problem. The state is
local to each inline assembler block with LLVM MC.
 1.94  13-Sep-2019  skrll Typo in comment
 1.93  22-Nov-2018  skrll branches: 1.93.4;
Typo in comment
 1.92  01-Jul-2017  skrll branches: 1.92.4; 1.92.6;
Whitespace (align comments)
 1.91  01-Jul-2017  skrll Trailing whitespace
 1.90  08-Apr-2015  matt branches: 1.90.10;
Make TPIDRPRW_IS_CURLWP work for MULTIPROCESSOR.
get curcpu() from new lwp.
don't set lwp l_cpu (already done).
Remove support for __HAVE_UNNESTED_INTRS
don't set curlwp until after we are done saving the oldlwp.
disable interrupts when setting curlwp/kernel stack pointer.
Overall, these changes simplify cpu_switchto even more.
 1.89  24-Mar-2015  skrll There is no need to save/restore l_private in softint_switch now that
cpu_switchto is fixed
 1.88  24-Mar-2015  matt Rework register usage in cpu_switchto so curcpu() is preserved across
ras_lookup. Only set vfp & tpid registers and do ras lookups if new lwp
is not LW_SYSTEM. (tested on RPI and atf tests on BPI by skrll).
 1.87  22-Mar-2015  matt Fix register usage in softint_switch. load / restore l_private across
softint_dispatch
 1.86  22-Mar-2015  matt Make sure to save the user thread pointer in softint_switch in case it was
set just before we got an interrupt. Otherwise if the softint blocks, the
old value would be restored and change lost.
 1.85  18-Oct-2014  snj branches: 1.85.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.84  15-Jun-2014  ozaki-r branches: 1.84.2;
Fix wrong instruction; mcr => mrc
 1.83  28-Mar-2014  matt branches: 1.83.2;
ARM_MMU_EXTENDED support.
 1.82  26-Feb-2014  matt Move pmap_recent_user to ci->ci_pmap_lastuser and
pmap_previous_active_lwp to ci->ci_lastlwp. Fix some comments.
 1.81  26-Dec-2013  joerg Replicate mcr with the equivalent vmsr instruction.
 1.80  01-Dec-2013  joerg For load/store double, name the second register explicitly.
 1.79  18-Aug-2013  matt Move parts of cpu.h that are not needed by MI code in <arm/locore.h>
Don't include <machine/cpu.h> or <machine/frame.h>, use <arm/locore.h>
Use <arm/asm.h> instead of <machine/arm.h>
 1.78  18-Aug-2013  matt Move parts of cpu.h that are not needed by MI code in <arm/locore.h>
Don't include <machine/cpu.h> or <machine/frame.h>, use <arm/locore.h>
Use <arm/asm.h> instead of <machine/arm.h>
 1.77  27-Feb-2013  matt branches: 1.77.6;
Don't include <machine/param.h> since we should be getting that stuff from
"assym.h"
 1.76  17-Dec-2012  matt Make sure to load the FPEXC context on context switch (if there is a VFP) so
that the VFP state will be what the LWP expects. (This isn't needed on
PPC or MIPS since their FPU/VEC state is reflected in the PSL/CP0_STATUS,
which is handled automatically.)
 1.75  10-Dec-2012  matt Rename pcb_sp/PCB_SP to pcb_ksp/PCB_KSP so that ipsec.c will compile.
 1.74  05-Dec-2012  matt ARMFPE hasn't compiled since NetBSD 4. Remove it.
Complete support for FPU_VFP.
fpregs now contains vfpreg.
XXX vfpreg only has space for 16 64-bit FP registers though VFPv3 and later
have 32 64-bit FP registers.
 1.73  08-Nov-2012  skrll Use ENTRY_NP for lwp_trampoline
 1.72  05-Sep-2012  matt branches: 1.72.2;
After calling lwp_startup, set fp to 0 to terminate call stack.
 1.71  01-Sep-2012  matt Need to do a GET_CURCPU(r4) before invoking DO_AST_AND_RESTORE_ALIGNMENT_FAULTS
 1.70  01-Sep-2012  matt blx reg is V5, not V4T
 1.69  31-Aug-2012  skrll DO_AST_AND_RESTORE_ALIGNMENT_FAULTS needs AST_ALIGNMENT_FAULT_LOCALS
 1.68  29-Aug-2012  matt Fix typo.
 1.67  29-Aug-2012  matt Rename ARM options PROCESS_ID_IS_CUR{CPU,LWP} to TPIDRPRW_IS_CUR{CPU,LWP}
since TPIDRPRW is the cp15 register name.
Initialize it early in start along with CI_ARM_CPUID.
Remove other initializations.
We always have ci_curlwp.
Enable TPIDRPRW_IS_CURCPU in std.beagle.
[tested on a beagleboard (cortex-a8)]
 1.66  16-Aug-2012  matt small rototill.
pcb_flags is dead. PCB_NOALIGNFLT is now stored in l_md.md_flags as
MDLWP_NOALIGNFLT. This avoids a few loads of the PCB in exception handling.
pcb_tf has been moved to l_md.md_tf. Again this avoids a lot of pcb
references just to access or set this. It also means that pcb doesn't
need to accessed by MI code.
Move pcb_onfault to after the pcb union.
Add pcb_sp macro to make code prettier.
Add lwp_settrapframe(l, tf) to set the l_md.md_tf field.
Use lwp_trapframe to access it (was process_frame but that name was changed
in a previous commit).
Kill off curpcb in acorn26.
Kill the checks for curlwp being NULL.
Move TRAP_USERMODE from arm32/fault.c to frame.h and add a __PROG26 version.
Replace tests for usermode with that macro.
 1.65  14-Aug-2012  matt Kill curpcb/ci_curpcb. Use device_t in cpu_info.
Add ci_softc (where ci_curpcb was so cpu_info doesn't change).
 1.64  12-Aug-2012  matt Rework VFP support to use PCU.
Add emulation of instruction which save/restore the VFP FPSCR.
Add a sysarch hook to VFP FPSCR manipulation.

[The emulation will be used by libc to store/fetch exception modes and
rounding mode on a per-thread basis.]
 1.63  07-Apr-2011  matt branches: 1.63.4; 1.63.12;
Fetch user read-only thread and process id from l->l_private, not the pcb.
(need to g/c the pcb field formerly used for this).
 1.62  01-Feb-2011  matt include "assym.h" instead of pte.h
 1.61  14-Jan-2011  rmind branches: 1.61.2; 1.61.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.
 1.60  10-Dec-2009  rmind branches: 1.60.4;
Rename L_ADDR to L_PCB and amend some comments accordingly.
 1.59  19-Nov-2008  matt Use IF32_bits instead of I32_bit. Only disable irqs if __HAVE_UNNESTED_IRQS
is undefined.
 1.58  27-Apr-2008  matt branches: 1.58.6; 1.58.8; 1.58.14;
Merge kernel changes in matt-armv6 to HEAD.
 1.57  20-Apr-2008  scw branches: 1.57.2;
There's really no need to switch VM contexts within cpu_switchto() as
MI code always calls pmap_deactivate/pmap_activate on context switch.

Instead, just record the last active lwp (or NULL if it exited) and
defer switching VM context to pmap_activate(). This saves an additional
function call overhead in cpu_switchto().

While here, g/c unused cpuswitch.S local .Lblock_userspace_access.
 1.56  15-Mar-2008  rearnsha branches: 1.56.2;
VFP support.
 1.55  19-Jan-2008  chris branches: 1.55.2; 1.55.6;
Optimize cpu_switchto to store the new PCB address in r7, rather than
loading it from memory in 3 places.

Also adjust ordering of a few loads to try and avoid stalling.
 1.54  19-Jan-2008  chris With the removal of IPKDB on arm, the undefined stack is only used to
bounce into SVC32 mode, there is no per-process data stored on it.

We can therefore use the undefined stack setup by the platform machdep.c
as a system wide undefined stack.

This removes the need for a per-process undefined stack, and the processor
mode switching overhead it causes in cpu_switchto.

The space freed in the USPACE is used to increase the per process kernel
stack size.
 1.53  13-Jan-2008  chris Take a micro-optimization from FreeBSD/arm.

When switching from SVC32->UND32 to read/write R13_und we don't need to clear
the mode bits as:
PSR_SVC32_MODE | PSR_UND32_MODE = PSR_UND32_MODE

While reading the code I also noted that interrupts are enabled for most of
the function as pmap_switch returns with interrupts in the state they are on
entry. This appears to be different to what the code after pmap_switch
expects, in that the behaviour suggests they should be disabled.

Because of this I've made the writing of R13_und explicitly disable
interrupts as part of the mode switch.

This also means that the IRQenableALL call is now redundant as the
interrupts are already enabled.

XXX: it's not clear if arm_fpe_core_changecontext should be called with
interrupts disabled.

Remove unused items: IRQdisableALL, IRQenableALL & Lcpufuncs.

Tested on cats. lmbench shows no performance change.
 1.52  12-Jan-2008  skrll Add and fix a couple of comments.
 1.51  12-Jan-2008  skrll Push a switchframe in dumpsys and cpu_switchto, but as dumpsys calls
other funcs a switchframe needs to be a multiple of 8 bytes. Stash sp as
well in the switchframe to bump it to 24 bytes.

Setup the switchframe appropriately in cpu_lwp_fork.

Remove savectx - nothing uses it.

All of this make gdb's life much easier when dealing with crash dumps and
live kernels.

Reviewed by chris.
 1.50  17-Oct-2007  garbled branches: 1.50.2; 1.50.8;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.49  15-Sep-2007  scw ARM cpu_switchto() has been partially broken since yamt-idlelwp was merged
as its cache/tlb management smarts relied too heavily on pre-merge context-
switch behaviour. See PR kern/36548 for one manifestation of the breakage.

To address this:
- Ditch the shadow pmap variables in the PCB (pagedir, l1vec, dacr, cstate)
as it was too easy for them to get out of sync with the pmap.
- Re-write (and fix) the convoluted cpuswitch.S cache/tlb ASM code in C.
It's only slightly less efficient, but is much more readable/maintainable.
- Document cpufuncs.cf_context_switch() as being C-callable.
- pmap_activate() becomes a no-op if the lwp's vmspace is already active.
(Good performance win, since pmap_activate() is now invoked on every
context-switch, even though ARM's cpu_switchto() already does all the
grunt work)

XXX: Some CPU-specific armXX_context_switch() implementations (arm67,
arm7tdmi, arm8) always flush the I+D caches. This should not be necessary.
Someone with access to hardware (acorn32?) needs to deal with this.
 1.48  25-May-2007  skrll branches: 1.48.6; 1.48.8; 1.48.10; 1.48.12;
No need to check if oldl == newl in cpu_switchto. All the callers ensure
this is never the case.

Fixup a few comments while I'm here.
 1.47  17-May-2007  yamt merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.46  19-Feb-2007  briggs branches: 1.46.4; 1.46.6; 1.46.12;
Get DOMAIN_CLIENT directly from arm/arm32/pte.h instead of from genassym
to avoid redefinition when both assym.h and pte.h are included (as in
INTEGRATOR's intmmu.S, which uses more macros from pte.h).
 1.45  09-Feb-2007  ad branches: 1.45.2;
Merge newlock2 to head.
 1.44  13-May-2006  skrll branches: 1.44.8; 1.44.12;
Fix some more comments.
 1.43  10-May-2006  skrll Fix some comments.
 1.42  11-Dec-2005  christos branches: 1.42.4; 1.42.6; 1.42.8; 1.42.10; 1.42.12;
merge ktrace-lwp.
 1.41  15-Nov-2003  scw branches: 1.41.16;
- Add LOCKDEBUG-protected calls to sched_lock_idle() to cpu_switchto and
the idle loop. They seem to have gone AWOL sometime in the past.
Fixes port-arm/23390.
- While here, tidy up the idle loop.
- Add a cheap DIAGNOSTIC check for run queue sanity.
 1.40  04-Nov-2003  scw Fix a braino introduced in r1.37. Thanks to Tom Spindler for spotting it.
 1.39  04-Nov-2003  dsl Remove p_nras from struct proc - use LIST_EMPTY(&p->p_raslist) instead.
Remove p_raslock and rename p_lwplock p_lock (one lock is enough).
Simplify window test when adding a ras and correct test on VM_MAXUSER_ADDRESS.
Avoid unpredictable branch in i386 locore.S
(pad fields left in struct proc to avoid kernel bump)
 1.38  23-Oct-2003  scw Don't drop to spl0 in cpu_switch/cpu_switchto. Do it in the idle loop
instead.

With this change, we no longer need to save the current interrupt level
in the switchframe. This is no great loss since both cpu_switch and
cpu_switchto are always called at splsched, so the process' spl is
effectively saved somewhere in the callstack.

This fixes an evbarm problem reported by Allen Briggs:

lwp gets into sa_switch -> mi_switch with newl != NULL
when it's the last element on the runqueue, so it
hits the second bit of:
    if (newl == NULL) {
            retval = cpu_switch(l, NULL);
    } else {
            remrunqueue(newl);
            cpu_switchto(l, newl);
            retval = 0;
    }

mi_switch calls remrunqueue() and cpu_switchto()

cpu_switchto unlocks the sched lock
cpu_switchto drops CPU priority
softclock is received
schedcpu is called from softclock
schedcpu hits the first if () {} block here:
    if (l->l_priority >= PUSER) {
            if (l->l_stat == LSRUN &&
                (l->l_flag & L_INMEM) &&
                (l->l_priority / PPQ) != (l->l_usrpri / PPQ)) {
                    remrunqueue(l);
                    l->l_priority = l->l_usrpri;
                    setrunqueue(l);
            } else
                    l->l_priority = l->l_usrpri;
    }

Since mi_switch has already run remrunqueue, the LWP has been
removed, but it's not been put back on any queue, so the
remrunqueue panics.
 1.37  13-Oct-2003  scw A couple of Xscale tweaks:

- Use the "clz" instruction to pick a run-queue, instead of using the
ffs-by-table-lookup method.
- Use strd instead of stmia where possible.
- Use multiple ldr instructions instead of ldmia where possible.
 1.36  23-Jun-2003  martin branches: 1.36.2;
Make sure to include opt_foo.h if a defflag option FOO is used.
 1.35  23-Jun-2003  chris Fix for port-arm/21962. Rather than fixing the #ifndef spl0, I removed
the test as spl0 is actually a macro for splx(0). The code now calls
splx(0)

(Note: building with the #ifdef fixed caused the build to fail on a
GENERIC acorn32 kernel.)
 1.34  31-May-2003  kristerw Fix LINTSTUB comments.
 1.33  21-May-2003  thorpej Remove #ifdefs supporting the old pmap, switching fully to the new.
 1.32  26-Apr-2003  chris Remove a strh. I don't think it's available on archv3 and it doesn't work
on acorn32's with an SA110 in them as the bus doesn't support halfword
transfers.
 1.31  22-Apr-2003  thorpej Some ARM32_PMAP_NEW-related cleanup:
* Define a new "MMU type", ARM_MMU_SA1. While the SA-1's MMU is basically
compatible with the generic, the SA-1 cache does not have a write-through
mode, and it is useful to have an indication of this.
* Add a new PMAP_NEEDS_PTE_SYNC indicator, and try to evaluate it at
compile time. We evaluate it like so:
- If SA-1-style MMU is the only type configured -> 1
- If SA-1-style MMU is not configured -> 0
- Otherwise, defer to a run-time variable.
If PMAP_NEEDS_PTE_SYNC might evaluate to true (SA-1 only or run-time
check), then we also define PMAP_INCLUDE_PTE_SYNC so that e.g. assembly
code can include the necessary run-time support. PMAP_INCLUDE_PTE_SYNC
largely replaces the ARM32_PMAP_NEEDS_PTE_SYNC manual setting Steve
included with the original new pmap.
* In the new pmap, make pmap_pte_init_generic() check to see if the CPU
has a write-back cache. If so, init the PT cache mode to C=1,B=0 to get
write-through mode. Otherwise, init the PT cache mode to C=1,B=1.
* Add a new pmap_pte_init_arm8(). Old pmap, same as generic. New pmap,
sets page table cacheability to 0 (ARM8 has a write-back cache, but
flushing it is quite expensive).
* In the new pmap, make pmap_pte_init_arm9() reset the PT cache mode to
C=1,B=0, since the write-back check in generic gets it wrong for ARM9,
since we use write-through mode all the time on ARM9 right now. (What
this really tells me is that the test for write-through cache is less
than perfect, but we can fix that later.)
* Add a new pmap_pte_init_sa1(). Old pmap, same as generic. New pmap,
does generic initialization, then resets page table cache mode to
C=1,B=1, since C=1,B=0 does not produce write-through on the SA-1.
 1.30  18-Apr-2003  scw Add the generic arm32 bits of the new pmap, contributed by Wasabi Systems.

Some features of the new pmap are:

- It allows L1 descriptor tables to be shared efficiently between
multiple processes. A typical "maxusers 32" kernel, where NPROC is set
to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
with just 4 L1s. This completely solves the problem of running out
of contiguous physical memory for allocating new L1s at runtime on a
busy system.

- Much improved cache/TLB management "smarts". This change ripples
out to encompass the low-level context switch code, which is also
much smarter about when to flush the cache/TLB, and when not to.

- Faster allocation of L2 page tables and associated metadata thanks,
in part, to the pool_cache enhancements recently contributed to
NetBSD by Wasabi Systems.

- Faster VM space teardown due to accurate referenced tracking of L2
page tables.

- Better/faster cache-alias tracking.

The new pmap is enabled by adding options ARM32_PMAP_NEW to the kernel
config file, and making the necessary changes to the port-specific
initarm() function. Several ports have already been converted and will
be committed shortly.
 1.29  17-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.28  19-Oct-2002  bjh21 branches: 1.28.2;
Undo recent cpu_switch register usage changes in order to decrease nathanw_sa
merge pain.
 1.27  18-Oct-2002  bjh21 The grand cpu_switch register reshuffle!

In particular, use r8 to hold the old process, and r7 for medium-term
scratch, saving r0-r3 for things we don't need saved over function
calls. This gets rid of five register-to-register MOVs.
 1.26  18-Oct-2002  bjh21 In cpu_switch(), stack more registers at the start of the function,
and hence save fewer into the PCB. This should give me enough free
registers in cpu_switch to tidy things up and support MULTIPROCESSOR
properly. While we're here, make the stacked registers into an
APCS stack frame, so that DDB backtraces through cpu_switch() will
work.

This also affects cpu_fork(), which has to fabricate a switchframe and
PCB for the new process.
 1.25  15-Oct-2002  bjh21 Switch to using the MI C versions of setrunqueue() and remrunqueue().
GCC produces almost exactly the same instructions as the hand-assembled
versions, albeit in a different order. It even found one place where it
could shave one off. Its insistence on creating a stack frame might slow
things down marginally, but not, I think, enough to matter.
 1.24  14-Oct-2002  bjh21 Continue the " - . - 8" purge. Specifically:

add rd, pc, #foo - . - 8 -> adr rd, foo
ldr rd, [pc, #foo - . - 8] -> ldr rd, foo

Also, when saving the return address for a function pointer call, use
"mov lr, pc" just before the call unless the return address is somewhere
other than just after the call site.

Finally, a few obvious little micro-optimisations like using LDR directly
rather than ADR followed by LDR, and loading directly into PC rather than
bouncing via R0.
 1.23  13-Oct-2002  bjh21 Instead of "add rd, pc, #foo - . - 8", use either "adr rd, foo" or (where
appropriate) "mov lr, pc". This makes things slightly less confusing and
ugly.
 1.22  12-Oct-2002  bjh21 Move curpcb into struct cpu_info in MULTIPROCESSOR kernels.
 1.21  09-Oct-2002  bjh21 Use ADR rather than an explicit ADD from PC.
 1.20  08-Oct-2002  bjh21 Remove an outdated register assignment comment.
 1.19  05-Oct-2002  bjh21 Minimal changes to allow a kernel with "options MULTIPROCESSOR" to compile
and boot multi-user on a single-processor machine. Many of these changes
are wildly inappropriate for actual multi-processor operation, and correcting
this will be my next task.
 1.18  31-Aug-2002  thorpej Add machine-dependent bits of RAS for arm32.
 1.17  17-Aug-2002  thorpej More local label fixups.
 1.16  17-Aug-2002  thorpej Must ... micro ... optimize!

* Save an instruction in the transition from idle to have-process-to-
switch-to, and eliminate two instructions that cause datadep-stalls
on StrongARM And XScale (one in each idle block).
* Rearrange some other instructions to avoid datadep-stalls on StrongARM
and XScale.
* Since cpu_do_powersave == 0 is by far the common case, avoid a
pipeline flush by reordering the two idle blocks.
 1.15  16-Aug-2002  thorpej * Add a new machdep.powersave sysctl, which controls the use of
the CPU's "sleep" function in the idle loop.
* Default all CPUs to not use powersave, except for the PDA processors
(SA11x0 and PXA2x0).

This significantly reduces interrupt latency in high-performance
applications (and was good to squeeze another ~10% out of an XScale
IOP on a Gig-E benchmark).
 1.14  15-Aug-2002  briggs * Use local label names (.Lfoo vs. (Lfoo or foo))
* When moving from cpsr, use "cpsr" instead of "cpsr_all" (which is
provided, but doesn't make sense since mrs doesn't support fields
like msr does).
 1.13  14-Aug-2002  thorpej We only need to modify the CPSR's control field, so use cpsr_c rather
than cpsr_all.
 1.12  14-Aug-2002  chris Tweak asm to avoid a couple of stalls.
 1.11  12-Aug-2002  thorpej Rearrange the beginning of cpu_switch() slightly to reduce data-dep
stalls on StrongARM and XScale.
 1.10  12-Aug-2002  thorpej Make a slight tweak to register usage to save an instruction.
 1.9  06-Aug-2002  thorpej Rearrange the exit path so that we don't do a idcache_wbinv_all *twice*
when a process exits.
 1.8  06-Aug-2002  thorpej * Pass proc0 to switch_exit(), to make this a little more like the
nathanw_sa branch.
* In switch_exit(), set the outgoing-proc register to NULL (rather than
proc0) so that we actually use the "exiting process" optimization in
cpu_switch().
 1.7  14-May-2002  chris branches: 1.7.2;
Implement scheduler lock protocol, this fixes PR arm/10863.

Also add correct locking when freeing pages in pmap_destroy (fix from potr)

This now means that arm32 kernels can be built with LOCKDEBUG enabled. (only tested on cats though)
 1.6  25-Jan-2002  thorpej Overhaul of the ARM cache code. This is mostly a simplification
pass. Rather than providing a whole slew of cache operations that
aren't ever used, distill them down to some useful primitives:

icache_sync_all Synchronize I-cache
icache_sync_range Synchronize I-cache range

dcache_wbinv_all Write-back and Invalidate D-cache
dcache_wbinv_range Write-back and Invalidate D-cache range
dcache_inv_range Invalidate D-cache range
dcache_wb_range Write-back D-cache range

idcache_wbinv_all Write-back and Invalidate D-cache,
Invalidate I-cache
idcache_wbinv_range Write-back and Invalidate D-cache,
Invalidate I-cache range

Note: This does not yet include an overhaul of the actual asm files
that implement the primitives. Instead, we've provided a safe default
for each CPU type, and the individual CPU types can now be optimized
one at a time.
 1.5  29-Nov-2001  thorpej Provide a way for platforms to move away from the old RiscPC-centric
interrupt code. Garbage-collect some unused stuff.
 1.4  19-Nov-2001  chris Give the idle loop a non-profiled entry, so it appears in profile info correctly (rather than all its time being under remrunqueue)
switch_exit only needs to take 1 parameter, it loads the value of proc0 into R1 itself
Fixup some comments to reflect the real state of things.
Tweak a couple of bits of asm to avoid a load delay.
remove excess code for setting curpcb and curproc.
 1.3  11-Nov-2001  chris branches: 1.3.2;
Correct comments for ffs algorithm (it isn't using register r0)
 1.2  16-Sep-2001  matt branches: 1.2.2;
Fix .type which uses wrong symbol name.
 1.1  28-Jul-2001  chris branches: 1.1.2; 1.1.4;
Move the generic arm32 files into arm/arm32 from arm32/arm32, tested kernel builds on cats and riscpc.
 1.1.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.1.2.6  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.1.2.5  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.2.4  11-Feb-2002  jdolecek Sync w/ -current.
 1.1.2.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.1.2.2  03-Aug-2001  lukem update to -current
 1.1.2.1  28-Jul-2001  lukem file cpuswitch.S was added on branch kqueue on 2001-08-03 04:10:58 +0000
 1.2.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.3.2.28  07-Jan-2003  thorpej Shave a couple of cycles off the beginning of cpu_switch().
 1.3.2.27  03-Jan-2003  thorpej Merge switch_exit() and switch_lwp_exit(), and hand-schedule the
resulting function to avoid stalls on StrongARM and XScale.
 1.3.2.26  31-Dec-2002  thorpej Rename cpu_preempt() to cpu_switchto(), and make the caller remove the
new process from its run queue before calling cpu_switchto().

While here, make a few cpu_switch() and cpu_switchto() implementations
get the outgoing LWP from the args, rather than looking at the curlwp
variable.
 1.3.2.25  18-Oct-2002  nathanw Catch up to -current.
 1.3.2.24  18-Sep-2002  thorpej Fix cpu_switch() after RAS integration from trunk.
 1.3.2.23  17-Sep-2002  nathanw Rearrange slightly and pass p, not l, to ras_lookup().
 1.3.2.22  17-Sep-2002  nathanw Catch up to -current.
 1.3.2.21  20-Aug-2002  thorpej Check to see if the incoming LWP has the same L1 table as the
outgoing LWP. If so, then we can skip the cache purge and TTB
reload. This results in a ~40% reduction in cache purges called
from cpu_switch() in my test using two threaded applications which
communicate with each other.
 1.3.2.20  19-Aug-2002  thorpej Partial (ARM only) sync with trunk -- significant performance improvements
for XScale-based systems.
 1.3.2.19  12-Aug-2002  thorpej Rearrange the beginning of cpu_switch() slightly to reduce data-dep
stalls on StrongARM and XScale.
 1.3.2.18  12-Aug-2002  thorpej Add the requisite calls to sched_lock_idle() and sched_unlock_idle() if
LOCKDEBUG is defined, as is done on the trunk.
 1.3.2.17  12-Aug-2002  thorpej More register usage tweaks to reduce differences with trunk.
 1.3.2.16  12-Aug-2002  thorpej Tweak register usage in cpu_switch() slightly to reduce differences
with the trunk.
 1.3.2.15  12-Aug-2002  thorpej Tweak register usage in cpu_preempt() slightly.
 1.3.2.14  12-Aug-2002  thorpej Reduce some differences with the trunk.
 1.3.2.13  06-Aug-2002  thorpej Rearrange the exit path so that we don't do a idcache_wbinv_all *twice*
when a process or lwp exits.
 1.3.2.12  06-Aug-2002  thorpej * In switch_exit()/switch_lwp_exit(), set the outgoing-lwp register to
NULL (rather than lwp0) so that we actually use the "exiting process"
optimization in cpu_switch().
* Correct some comments.
 1.3.2.11  05-Aug-2002  thorpej Fix cpu_preempt() for the __NEWINTR case.
 1.3.2.10  05-Aug-2002  thorpej Back out the changes that implement the scheduler locking protocol.
The register usage in this file is very different than on the trunk,
and so the changes made to the trunk don't really apply here.

Fix up some comments while here.
 1.3.2.9  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.3.2.8  21-Jun-2002  nathanw switch_search -> Lswitch_search in switch_exit().
 1.3.2.7  20-Jun-2002  nathanw Catch up to -current.
 1.3.2.6  11-Apr-2002  thorpej Make this compile again.
 1.3.2.5  28-Feb-2002  nathanw Catch up to -current.
 1.3.2.4  08-Jan-2002  nathanw Catch up to -current.
 1.3.2.3  29-Nov-2001  thorpej l_stat is an int now instead of a u_char. Pointed out by pk in a
commit message.
 1.3.2.2  15-Nov-2001  thorpej Machine-dependent kernel mods for scheduler activations on
32-bit ARM processors. Kernel boots multi-user on an XScale,
but upcalls not yet tested.
 1.3.2.1  11-Nov-2001  thorpej file cpuswitch.S was added on branch nathanw_sa on 2001-11-15 06:39:21 +0000
 1.7.2.2  31-Aug-2002  gehenna catch up with -current.
 1.7.2.1  30-Aug-2002  gehenna catch up with -current.
 1.28.2.7  24-Oct-2002  bjh21 Flush the cache before reading sched_whichqs. This is entirely the wrong
way to do this, but it should work (very slowly) until I work out the
right way.
 1.28.2.6  19-Oct-2002  bjh21 Switch to the idle PCB in the idle loop.
 1.28.2.5  19-Oct-2002  bjh21 In cpu_switch and cpu_exit, use curcpu to find curproc and curpcb, rather
than assuming CPU 0. Also fix a register-shuffling botch in cpu_exit.
 1.28.2.4  19-Oct-2002  bjh21 Reshuffle register usage in cpu_exit along the lines of cpu_switch to
reduce saving and restoring of registers.
 1.28.2.3  19-Oct-2002  bjh21 Redo the following revision:
syssrc/sys/arch/arm/arm32/cpuswitch.S 1.27

Original log message:

The grand cpu_switch register reshuffle!

In particular, use r8 to hold the old process, and r7 for medium-term
scratch, saving r0-r3 for things we don't need saved over function
calls. This gets rid of five register-to-register MOVs.
 1.28.2.2  19-Oct-2002  bjh21 Re-do the following revisions, this time on a branch where they won't
interfere with the nathanw_sa merge:

syssrc/sys/arch/arm/arm32/cpuswitch.S 1.26
syssrc/sys/arch/arm/arm32/genassym.cf 1.18
syssrc/sys/arch/arm/arm32/vm_machdep.c 1.21
syssrc/sys/arch/arm/include/pcb.h 1.5

Original commit message:

In cpu_switch(), stack more registers at the start of the function,
and hence save fewer into the PCB. This should give me enough free
registers in cpu_switch to tidy things up and support MULTIPROCESSOR
properly. While we're here, make the stacked registers into an
APCS stack frame, so that DDB backtraces through cpu_switch() will
work.

This also affects cpu_fork(), which has to fabricate a switchframe and
PCB for the new process.
 1.28.2.1  19-Oct-2002  bjh21 file cpuswitch.S was added on branch bjh21-hydra on 2002-10-19 11:59:36 +0000
 1.36.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.36.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.36.2.1  03-Aug-2004  skrll Sync with HEAD
 1.41.16.6  17-Mar-2008  yamt sync with head.
 1.41.16.5  21-Jan-2008  yamt sync with head
 1.41.16.4  27-Oct-2007  yamt sync with head.
 1.41.16.3  03-Sep-2007  yamt sync with head.
 1.41.16.2  26-Feb-2007  yamt sync with head.
 1.41.16.1  21-Jun-2006  yamt sync with head.
 1.42.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.42.10.1  11-May-2006  elad sync with head
 1.42.8.1  24-May-2006  yamt sync with head.
 1.42.6.1  01-Jun-2006  kardel Sync with head.
 1.42.4.1  09-Sep-2006  rpaulo sync with head
 1.44.12.1  04-Mar-2007  bouyer branches: 1.44.12.1.4;
Pull up following revision(s) (requested by matt in ticket #470):
sys/arch/evbarm/iq80310/iq80310_timer.c: revision 1.20
sys/arch/evbarm/ifpga/pl030_rtc.c: revision 1.8
sys/arch/evbarm/include/types.h: revision 1.7
sys/arch/arm/arm32/genassym.cf: revision 1.30
sys/arch/arm/arm32/cpuswitch.S: revision 1.46
Get DOMAIN_CLIENT directly from arm/arm32/pte.h instead of from genassym
to avoid redefinition when both assym.h and pte.h are included (as in
INTEGRATOR's intmmu.S, which uses more macros from pte.h).
Convert evbarm to __HAVE_GENERIC_TODR.
 1.44.12.1.4.1  10-Nov-2007  matt Add AT91 support from Sami Kantoluoto
Add TI OMAP2430 support from Marty Fouts @ Danger Inc
 1.44.8.1  30-Jan-2007  ad For now always call sched_unlock_idle/sched_lock_idle. They will be
removed by yamt's cpu_switchto() changes.
 1.45.2.3  08-Apr-2007  skrll Set curlwp in cpu_switchto and provide a cpu_did_resched.

Who is copying who now?
 1.45.2.2  29-Mar-2007  skrll Adapt arm32. Thanks to scw for helping out.

Tested on my cats (SA1)

XXX hydra should die. i've made some changes, but no guarantees.
 1.45.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.46.12.3  03-Oct-2007  garbled Sync with HEAD
 1.46.12.2  26-Jun-2007  garbled Sync with HEAD.
 1.46.12.1  22-May-2007  matt Update to HEAD.
 1.46.6.1  11-Jul-2007  mjf Sync with head.
 1.46.4.2  09-Oct-2007  ad Sync with head.
 1.46.4.1  27-May-2007  ad Sync with head.
 1.48.12.6  23-Mar-2008  matt sync with HEAD
 1.48.12.5  28-Jan-2008  matt Add fast softint switch support.
 1.48.12.4  07-Nov-2007  matt Make sure instructions are aligned after .asciz
 1.48.12.3  06-Nov-2007  matt sync with HEAD
 1.48.12.2  11-Sep-2007  matt Add a diagnostic check for cpu_switchto(x, NULL);
 1.48.12.1  29-Aug-2007  matt Reworked cpuswitch for armv6 and new world order
 1.48.10.3  21-Mar-2008  chris Sync with head.
 1.48.10.2  20-Jan-2008  chris Sync to HEAD.
 1.48.10.1  01-Jan-2008  chris Sync with HEAD.
 1.48.8.1  02-Oct-2007  joerg Sync with HEAD.
 1.48.6.2  28-Feb-2008  rjs Sync with HEAD.
 1.48.6.1  01-Nov-2007  rjs Sync with HEAD.
 1.50.8.2  20-Jan-2008  bouyer Sync with HEAD
 1.50.8.1  19-Jan-2008  bouyer Sync with HEAD
 1.50.2.1  18-Feb-2008  mjf Sync with HEAD.
 1.55.6.3  17-Jan-2009  mjf Sync with HEAD.
 1.55.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.55.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.55.2.1  24-Mar-2008  keiichi sync with head.
 1.56.2.1  18-May-2008  yamt sync with head.
 1.57.2.3  11-Mar-2010  yamt sync with head
 1.57.2.2  04-May-2009  yamt sync with head.
 1.57.2.1  16-May-2008  yamt sync with head.
 1.58.14.1  15-Feb-2014  matt Merge armv7 support from HEAD, specifically support for the BCM5301X
and BCM56340 evbarm kernels.
 1.58.8.1  19-Jan-2009  skrll Sync with HEAD.
 1.58.6.1  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.60.4.2  21-Apr-2011  rmind sync with head
 1.60.4.1  05-Mar-2011  rmind sync with head
 1.61.4.1  08-Feb-2011  bouyer Sync with HEAD
 1.61.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.63.12.1  28-Nov-2012  matt Merge improved arm support (especially Cortex) from HEAD
including OMAP and BCM53xx support.
 1.63.4.4  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.63.4.3  23-Jan-2013  yamt sync with head
 1.63.4.2  16-Jan-2013  yamt sync with (a bit old) head
 1.63.4.1  30-Oct-2012  yamt sync with head
 1.72.2.5  03-Dec-2017  jdolecek update from HEAD
 1.72.2.4  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.72.2.3  23-Jun-2013  tls resync from head
 1.72.2.2  25-Feb-2013  tls resync with head
 1.72.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.77.6.2  18-May-2014  rmind sync with head
 1.77.6.1  28-Aug-2013  rmind sync with head
 1.83.2.1  10-Aug-2014  tls Rebase.
 1.84.2.1  27-Mar-2015  martin Pull up following revision(s) (requested by skrll in ticket #646):
sys/arch/arm/arm32/genassym.cf: revision 1.70
sys/arch/arm/arm32/cpuswitch.S: revision 1.86-1.89

Only set vfp & tpid registers and do ras lookups if new
lwp is not LW_SYSTEM.
 1.85.2.3  28-Aug-2017  skrll Sync with HEAD
 1.85.2.2  06-Jun-2015  skrll Sync with HEAD
 1.85.2.1  06-Apr-2015  skrll Sync with HEAD
 1.90.10.1  31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1859):

sys/arch/ia64/ia64/vm_machdep.c: revision 1.18
sys/arch/powerpc/powerpc/locore_subr.S: revision 1.67
sys/arch/aarch64/aarch64/locore.S: revision 1.91
sys/arch/mips/include/asm.h: revision 1.74
sys/arch/hppa/include/cpu.h: revision 1.13
sys/arch/arm/arm/armv6_start.S: revision 1.38
(applied also to sys/arch/arm/cortex/a9_mpsubr.S,
sys/arch/arm/cortex/cortex_init.S)
sys/arch/evbmips/ingenic/cpu_startup.S: revision 1.2
sys/arch/mips/mips/locore.S: revision 1.229
sys/arch/alpha/include/asm.h: revision 1.45
(applied to sys/arch/alpha/alpha/multiproc.s)
sys/arch/sparc64/sparc64/locore.s: revision 1.432
sys/arch/vax/vax/subr.S: revision 1.42
sys/arch/mips/mips/locore_mips3.S: revision 1.116
sys/arch/ia64/ia64/machdep.c: revision 1.44
sys/arch/arm/arm32/cpuswitch.S: revision 1.106
sys/arch/sparc/sparc/locore.s: revision 1.284
(all via patch)

aarch64: Add missing barriers in cpu_switchto.
Details in comments.

Note: This is a conservative change that inserts a barrier where
there was a comment saying none is needed, which is probably correct.
The goal of this change is to systematically add barriers to be
confident in correctness; subsequent changes may remove some barriers,
as an optimization, with an explanation of why each barrier is not
needed.

PR kern/57240

alpha: Add missing barriers in cpu_switchto.
Details in comments.

arm32: Add missing barriers in cpu_switchto.
Details in comments.

hppa: Add missing barriers in cpu_switchto.
Not sure hppa has ever had working MULTIPROCESSOR, so maybe no
pullups needed?

ia64: Add missing barriers in cpu_switchto.
(ia64 has never really worked, so no pullups needed, right?)

mips: Add missing barriers in cpu_switchto.
Details in comments.

powerpc: Add missing barriers in cpu_switchto.
Details in comments.

sparc: Add missing barriers in cpu_switchto.

sparc64: Add missing barriers in cpu_switchto.
Details in comments.

vax: Note where cpu_switchto needs barriers.

Not sure vax has ever had working MULTIPROCESSOR, though, and I'm not
even sure how to spell store-before-load barriers on VAX, so no
functional change for now.
 1.92.6.3  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.92.6.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.92.6.1  10-Jun-2019  christos Sync with HEAD
 1.92.4.1  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.93.4.1  31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1676):

sys/arch/ia64/ia64/vm_machdep.c: revision 1.18
sys/arch/powerpc/powerpc/locore_subr.S: revision 1.67
sys/arch/aarch64/aarch64/locore.S: revision 1.91
sys/arch/mips/include/asm.h: revision 1.74
sys/arch/hppa/include/cpu.h: revision 1.13
sys/arch/arm/arm/armv6_start.S: revision 1.38
sys/arch/evbmips/ingenic/cpu_startup.S: revision 1.2
sys/arch/mips/mips/locore.S: revision 1.229
sys/arch/aarch64/aarch64/cpuswitch.S: revision 1.40
sys/arch/alpha/include/asm.h: revision 1.45
sys/arch/sparc64/sparc64/locore.s: revision 1.432
sys/arch/vax/vax/subr.S: revision 1.42
sys/arch/mips/mips/locore_mips3.S: revision 1.116
sys/arch/ia64/ia64/machdep.c: revision 1.44
sys/arch/arm/arm32/cpuswitch.S: revision 1.106
sys/arch/sparc/sparc/locore.s: revision 1.284
(all via patch)

aarch64: Add missing barriers in cpu_switchto.
Details in comments.

Note: This is a conservative change that inserts a barrier where
there was a comment saying none is needed, which is probably correct.
The goal of this change is to systematically add barriers to be
confident in correctness; subsequent changes may remove some barriers,
as an optimization, with an explanation of why each barrier is not
needed.

PR kern/57240

alpha: Add missing barriers in cpu_switchto.
Details in comments.

arm32: Add missing barriers in cpu_switchto.
Details in comments.

hppa: Add missing barriers in cpu_switchto.
Not sure hppa has ever had working MULTIPROCESSOR, so maybe no
pullups needed?

ia64: Add missing barriers in cpu_switchto.
(ia64 has never really worked, so no pullups needed, right?)

mips: Add missing barriers in cpu_switchto.
Details in comments.

powerpc: Add missing barriers in cpu_switchto.
Details in comments.

sparc: Add missing barriers in cpu_switchto.

sparc64: Add missing barriers in cpu_switchto.
Details in comments.

vax: Note where cpu_switchto needs barriers.

Not sure vax has ever had working MULTIPROCESSOR, though, and I'm not
even sure how to spell store-before-load barriers on VAX, so no
functional change for now.
 1.95.2.2  29-Feb-2020  ad Sync with head.
 1.95.2.1  17-Jan-2020  ad Sync with head.
 1.103.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.104.6.1  31-May-2021  cjep sync with head
 1.104.4.1  17-Jun-2021  thorpej Sync w/ HEAD.
 1.105.12.1  31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #264):

sys/arch/ia64/ia64/vm_machdep.c: revision 1.18
sys/arch/powerpc/powerpc/locore_subr.S: revision 1.67
sys/arch/aarch64/aarch64/locore.S: revision 1.91
sys/arch/mips/include/asm.h: revision 1.74
sys/arch/hppa/include/cpu.h: revision 1.13
sys/arch/arm/arm/armv6_start.S: revision 1.38
sys/arch/evbmips/ingenic/cpu_startup.S: revision 1.2
sys/arch/mips/mips/locore.S: revision 1.229
sys/arch/aarch64/aarch64/cpuswitch.S: revision 1.40
sys/arch/alpha/include/asm.h: revision 1.45
sys/arch/sparc64/sparc64/locore.s: revision 1.432
sys/arch/vax/vax/subr.S: revision 1.42
sys/arch/mips/mips/locore_mips3.S: revision 1.116
sys/arch/riscv/riscv/cpu_switch.S: revision 1.3
sys/arch/ia64/ia64/machdep.c: revision 1.44
sys/arch/arm/arm32/cpuswitch.S: revision 1.106
sys/arch/sparc/sparc/locore.s: revision 1.284

aarch64: Add missing barriers in cpu_switchto.
Details in comments.

Note: This is a conservative change that inserts a barrier where
there was a comment saying none is needed, which is probably correct.
The goal of this change is to systematically add barriers to be
confident in correctness; subsequent changes may remove some barriers,
as an optimization, with an explanation of why each barrier is not
needed.

PR kern/57240

alpha: Add missing barriers in cpu_switchto.
Details in comments.

arm32: Add missing barriers in cpu_switchto.
Details in comments.

hppa: Add missing barriers in cpu_switchto.
Not sure hppa has ever had working MULTIPROCESSOR, so maybe no
pullups needed?

ia64: Add missing barriers in cpu_switchto.
(ia64 has never really worked, so no pullups needed, right?)

mips: Add missing barriers in cpu_switchto.
Details in comments.

powerpc: Add missing barriers in cpu_switchto.
Details in comments.

riscv: Add missing barriers in cpu_switchto.
Details in comments.

sparc: Add missing barriers in cpu_switchto.

sparc64: Add missing barriers in cpu_switchto.
Details in comments.

vax: Note where cpu_switchto needs barriers.

Not sure vax has ever had working MULTIPROCESSOR, though, and I'm not
even sure how to spell store-before-load barriers on VAX, so no
functional change for now.