History log of /src/common/lib/libc/arch/i386/atomic/atomic.S
Revision | Date | Author | Comments

1.38 | 06-Sep-2025 | riastradh
paravirt_membar_sync(9): New memory barrier.
For use in paravirtualized drivers which require store-before-load ordering -- irrespective of whether the kernel is built for a single processor, or whether the (virtual) machine is booted with a single processor.
This is required even on architectures that don't have a store-before-load ordering barrier at all, like m68k; adding, e.g., a virtio bus is _as if_ the architecture has been extended with relaxed memory ordering when talking with that new bus. Such architectures need some way to request that the hypervisor enforce that ordering -- on m68k, that's done by issuing a CASL instruction, which qemu maps to an atomic r/m/w with sequential consistency ordering in the host.
PR kern/59618: occasional virtio block device lock ups/hangs

1.37 | 16-Jul-2024 | riastradh
xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.
Both xen_mb and membar_sync are designed to provide store-before-load ordering, but xen_mb has to provide it in synchronizing guest with hypervisor, while membar_sync only has to provide it in synchronizing one (guest) CPU with another (guest) CPU.
It is safe to hotpatch away the LOCK prefix in membar_sync on a uniprocessor boot because membar_sync is only designed to coordinate between normal memory on multiple CPUs, and is never necessary when there's only one CPU involved.
But xen_mb is used to coordinate between the guest and the `device' implemented by a hypervisor, which might be running on another _physical_ CPU even if the NetBSD guest only sees one `CPU', i.e., one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must still issue an instruction with store-before-load ordering on multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is costlier for no benefit here).
No need to change xen_wmb (release ordering, load/store-before-store) or xen_rmb (acquire ordering, load-before-load/store) because every x86 store is a store-release and every x86 load is a load-acquire, even on multiprocessor systems, so there's no hotpatching involved anyway.
PR kern/57199

1.36 | 30-Jul-2022 | riastradh | branches: 1.36.2; 1.36.8
x86: Eliminate mfence hotpatch for membar_sync.
The more-compatible LOCK ADD $0,-N(%rsp) turns out to be cheaper than MFENCE anyway. Let's save some space and maintenance and rip out the hotpatching for it.

1.35 | 09-Apr-2022 | riastradh
Introduce membar_acquire/release. Deprecate membar_enter/exit.
The names membar_enter/exit were unclear, and the documentation of membar_enter has disagreed with the implementations on sparc, powerpc, and even x86(!) for the entire time it has been in NetBSD.
The terms `acquire' and `release' are ubiquitous in the literature today, and have been adopted in the C and C++ standards to mean load-before-load/store and load/store-before-store, respectively, which are exactly the orderings required by acquiring and releasing a mutex, as well as other useful applications like decrementing a reference count and then freeing the underlying object if it went to zero.
Originally I proposed changing one word in the documentation for membar_enter to make it load-before-load/store instead of store-before-load/store, i.e., to make it an acquire barrier. I proposed this on the grounds that
(a) all implementations guarantee load-before-load/store, (b) some implementations fail to guarantee store-before-load/store, and (c) all uses in-tree assume load-before-load/store.
I verified parts (a) and (b) (except, for (a), powerpc didn't even guarantee load-before-load/store -- isync isn't necessarily enough; need lwsync in general -- but it _almost_ did, and it certainly didn't guarantee store-before-load/store).
Part (c) might not be correct, however: under the mistaken assumption that atomic-r/m/w then membar-w/rw is equivalent to atomic-r/m/w then membar-r/rw, I only audited the cases of membar_enter that _aren't_ immediately after an atomic-r/m/w. All of those cases assume load-before-load/store. But my assumption was wrong -- there are cases of atomic-r/m/w then membar-w/rw that would be broken by changing to atomic-r/m/w then membar-r/rw:
https://mail-index.netbsd.org/tech-kern/2022/03/29/msg028044.html
Furthermore, the name membar_enter has been adopted in other places like OpenBSD where it actually does follow the documentation and guarantee store-before-load/store, even if that order is not useful. So the name membar_enter currently lives in a bad place where it means either of two things -- r/rw or w/rw.
With this change, we deprecate membar_enter/exit, introduce membar_acquire/release as better names for the useful pair (r/rw and rw/w), and make sure the implementation of membar_enter guarantees both what was documented _and_ what was implemented, making it an alias for membar_sync.
While here, rework all of the membar_* definitions and aliases. The new logic follows a rule to make it easier to audit:
membar_X is defined as an alias for membar_Y iff membar_X is guaranteed by membar_Y.
The `no stronger than' relation is (the transitive closure of):
- membar_consumer (r/r) is guaranteed by membar_acquire (r/rw)
- membar_producer (w/w) is guaranteed by membar_release (rw/w)
- membar_acquire (r/rw) is guaranteed by membar_sync (rw/rw)
- membar_release (rw/w) is guaranteed by membar_sync (rw/rw)
And, for the deprecated membars:
- membar_enter (whether r/rw, w/rw, or rw/rw) is guaranteed by membar_sync (rw/rw)
- membar_exit (rw/w) is guaranteed by membar_release (rw/w)
(membar_exit is identical to membar_release, but the name is deprecated.)
Finally, while here, annotate some of the instructions with their semantics. For powerpc, leave an essay with citations on the unfortunate but -- as far as I can tell -- necessary decision to use lwsync, not isync, for membar_acquire and membar_consumer.
Also add membar(3) and atomic(3) man page links.

1.34 | 09-Apr-2022 | riastradh
i386/membar_ops: Upgrade membar_enter from R/RW to RW/RW.
This will be deprecated soon but let's avoid leaving rakes to trip on with it arising from disagreement over the documentation (W/RW) and implementation and usage (R/RW).

1.33 | 09-Apr-2022 | riastradh
x86: Add a note on membar_sync and mfence.

1.32 | 09-Apr-2022 | riastradh
x86: Omit needless store in membar_producer/exit.
On x86, every store is a store-release, so there is no need for any barrier. But this wasn't a barrier anyway; it was just a store, which was redundant with the store of the return address to the stack implied by CALL even if issuing a store made a difference.

1.31 | 09-Apr-2022 | riastradh
x86: Every load is a load-acquire, so membar_consumer is a noop.
lfence is only needed for MD logic, such as operations on I/O memory rather than normal cacheable memory, or special instructions like RDTSC -- never for MI synchronization between threads/CPUs. No need for hot-patching to do lfence here.
(The x86_lfence function might reasonably be patched on i386 to do lfence for MD logic, but it isn't now and this doesn't change that.)

1.30 | 06-Apr-2022 | riastradh
Nix trailing whitespace in files of membars, atomics, and lock stubs.
Will be touching many of these files soon for functional changes.
No functional change intended.

1.29 | 01-May-2020 | maxv
Use the hotpatch framework when patching _atomic_cas_64.

1.28 | 26-Apr-2020 | maxv
Use the hotpatch framework for LFENCE/MFENCE.

1.27 | 26-Apr-2020 | maxv
Remove unused argument in macro.

1.26 | 26-Apr-2020 | maxv
Remove unused.

1.25 | 26-Apr-2020 | maxv
Drop the hardcoded array, use the hotpatch section.

1.24 | 25-Apr-2020 | bouyer
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM guests in GENERIC. Xen support can be disabled at runtime with:
boot -c
disable hypervisor

1.23 | 18-Jul-2018 | bouyer | branches: 1.23.8
On Xen, always alias _atomic_cas_64 to _atomic_cas_cx8. AFAIK Xen doesn't support CPUs that don't support cx8. i386 XENPAE_DOMU boots again.

1.22 | 23-May-2014 | uebayasi | branches: 1.22.22; 1.22.24
Put missing END() markers to set ELF symbol size.

1.21 | 22-Apr-2014 | christos
The kernel uses 64 bit atomic ops.

1.20 | 18-Feb-2014 | martin | branches: 1.20.2
Provide most missing __sync_*64 primitives for i386

1.19 | 12-Jan-2011 | joerg | branches: 1.19.6; 1.19.12
Allow use of traditional CPP to be set on a per platform base in sys.mk. Honour this for dependency processing in bsd.dep.mk. Switch i386 and amd64 assembly to use ISO C90 preprocessor concat and drop the -traditional-cpp on this platform.

1.18 | 26-Nov-2009 | pooka
Use strong alias within the kernel namespace regardless of whether we're dealing with a hard or soft kernel (the kernel linker doesn't support weak symbols).

1.17 | 02-Apr-2009 | enami
So that profiled kernels run again:
- Adjust the size of functions used to patch.
- Fix the jump offset of the mcount call when patching functions.
Approved by Andrew Doran.

1.16 | 12-Jan-2009 | pooka | branches: 1.16.2
include sys/param.h for _HARDKERNEL instead of homegrown def.

1.15 | 04-Jan-2009 | pooka
Opt for libc versions in case of _KERNEL && !_RUMPKERNEL. (kernel version uses sti/cli and is not PIC)

1.14 | 19-Dec-2008 | ad
PR kern/40213 my i386 machine can't boot because of tsc
- Patch in atomic_cas_64() twice. The first patch is early and makes the MP-atomic version available if we have cmpxchg8b. The second patch strips the lock prefix if ncpu==1.
- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.

1.13 | 25-May-2008 | chs | branches: 1.13.4
enable profiling of assembly functions.

1.12 | 03-May-2008 | yamt | branches: 1.12.2
whitespace.

1.11 | 03-May-2008 | yamt
rename END to ENDLABEL. i'll use END for other purpose. ok by Andrew Doran.

1.10 | 28-Apr-2008 | martin
Remove clause 3 and 4 from TNF licenses

1.9 | 10-Feb-2008 | ad | branches: 1.9.4
Add atomic_cas_foo_ni().

1.8 | 09-Feb-2008 | ad
membar_enter was doing the wrong thing. For x86 we can alias:
membar_enter -> membar_consumer
membar_exit -> membar_producer

1.7 | 20-Dec-2007 | ad | branches: 1.7.2
- Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.

1.6 | 20-Dec-2007 | ad
64-bit atomic ops for i386.

1.5 | 09-Dec-2007 | ad
Add missing strong aliases.

1.4 | 29-Nov-2007 | ad
atomic_add_* takes signed integers, the others take unsigned.

1.3 | 28-Nov-2007 | ad
A lock prefix on xchg is meaningless.

1.2 | 28-Nov-2007 | ad
Fix up a few minor problems.

1.1 | 28-Nov-2007 | ad
x86 atomic ops.

1.7.2.3 | 23-Mar-2008 | matt
sync with HEAD

1.7.2.2 | 09-Jan-2008 | matt
sync with HEAD

1.7.2.1 | 20-Dec-2007 | matt
file atomic.S was added on branch matt-armv6 on 2008-01-09 01:20:53 +0000

1.9.4.2 | 04-Jun-2008 | yamt
sync with head

1.9.4.1 | 18-May-2008 | yamt
sync with head.

1.12.2.1 | 23-Jun-2008 | wrstuden
Sync w/ -current. 34 merge conflicts to follow.

1.13.4.2 | 03-Apr-2009 | snj
Pull up following revision(s) (requested by enami in ticket #645):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.17
sys/arch/amd64/amd64/spl.S: revision 1.21
sys/arch/x86/x86/patch.c: revision 1.17
So that profiled kernels run again:
- Adjust the size of functions used to patch.
- Fix the jump offset of the mcount call when patching functions.
Approved by Andrew Doran.

1.13.4.1 | 02-Feb-2009 | snj
Pull up following revision(s) (requested by ad in ticket #343):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.14
sys/arch/x86/include/cpufunc.h: revision 1.9
sys/arch/x86/x86/identcpu.c: revision 1.12
sys/arch/x86/x86/cpu.c: revision 1.60
sys/arch/x86/x86/patch.c: revision 1.15
PR kern/40213 my i386 machine can't boot because of tsc
- Patch in atomic_cas_64() twice. The first patch is early and makes the MP-atomic version available if we have cmpxchg8b. The second patch strips the lock prefix if ncpu==1.
- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.

1.16.2.1 | 13-May-2009 | jym
Sync with HEAD.
Commit is split, to avoid a "too many arguments" protocol error.

1.19.12.1 | 19-Aug-2014 | tls
Rebase to HEAD as of a few days ago.

1.19.6.1 | 22-May-2014 | yamt
sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")

1.20.2.1 | 10-Aug-2014 | tls
Rebase.

1.22.24.3 | 21-Apr-2020 | martin
Oops, restore accidentally removed files from merge mishap

1.22.24.2 | 21-Apr-2020 | martin
Sync with HEAD

1.22.24.1 | 10-Jun-2019 | christos
Sync with HEAD

1.22.22.1 | 28-Jul-2018 | pgoyette
Sync with HEAD

1.23.8.1 | 14-Apr-2020 | bouyer
Force _atomic_cas_cx8 only for XENPV; x86_patch works fine for (PV)HVM

1.36.8.1 | 02-Aug-2025 | perseant
Sync with HEAD

1.36.2.1 | 20-Jul-2024 | martin
Pull up following revision(s) (requested by riastradh in ticket #764):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.37
sys/arch/xen/include/xenring.h: revision 1.8
sys/arch/i386/i386/cpufunc.S: revision 1.52
sys/arch/amd64/amd64/cpufunc.S: revision 1.68
sys/arch/xen/include/hypervisor.h: revision 1.60
common/lib/libc/arch/x86_64/atomic/atomic.S: revision 1.30
xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.
Both xen_mb and membar_sync are designed to provide store-before-load ordering, but xen_mb has to provide it in synchronizing guest with hypervisor, while membar_sync only has to provide it in synchronizing one (guest) CPU with another (guest) CPU.
It is safe to hotpatch away the LOCK prefix in membar_sync on a uniprocessor boot because membar_sync is only designed to coordinate between normal memory on multiple CPUs, and is never necessary when there's only one CPU involved.
But xen_mb is used to coordinate between the guest and the `device' implemented by a hypervisor, which might be running on another _physical_ CPU even if the NetBSD guest only sees one `CPU', i.e., one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must still issue an instruction with store-before-load ordering on multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is costlier for no benefit here).
No need to change xen_wmb (release ordering, load/store-before-store) or xen_rmb (acquire ordering, load-before-load/store) because every x86 store is a store-release and every x86 load is a load-acquire, even on multiprocessor systems, so there's no hotpatching involved anyway.
PR kern/57199