History log of /src/common/lib/libc/arch/i386/atomic
Revision  Date  Author  Comments
 1.8 18-Feb-2014  martin branches: 1.8.26;
Provide most missing __sync_*64 primitives for i386
 1.7 04-Jan-2009  pooka branches: 1.7.8; 1.7.14;
allow inclusion of atomic ops in librump
 1.6 29-Sep-2008  ad Allow atomic ops to be built as part of libpthread.
 1.5 11-Feb-2008  ad Only build atomic ops for libkern/libc.
 1.4 10-Feb-2008  ad Enable the atomic ops in userspace.
 1.3 20-Dec-2007  ad branches: 1.3.2;
64-bit atomic ops for i386.
 1.2 28-Nov-2007  ad x86 atomic ops.
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file Makefile.inc was initially added on branch thorpej-atomic.
 1.1.2.1 17-Apr-2007  thorpej Add build glue for i386 atomic ops.
 1.3.2.3 23-Mar-2008  matt sync with HEAD
 1.3.2.2 09-Jan-2008  matt sync with HEAD
 1.3.2.1 20-Dec-2007  matt file Makefile.inc was added on branch matt-armv6 on 2008-01-09 01:20:52 +0000
 1.7.14.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.8.1 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.26.2 21-Apr-2020  martin Oops, restore files accidentally removed in merge mishap
 1.8.26.1 21-Apr-2020  martin Sync with HEAD
 1.38 06-Sep-2025  riastradh paravirt_membar_sync(9): New memory barrier.

For use in paravirtualized drivers which require store-before-load
ordering -- irrespective of whether the kernel is built for a single
processor, or whether the (virtual) machine is booted with a single
processor.

This is even required on architectures that don't even have a
store-before-load ordering barrier, like m68k; adding, e.g., a virtio
bus is _as if_ the architecture has been extended with relaxed memory
ordering when talking with that new bus. Such architectures need
some way to request the hypervisor enforce that ordering -- on m68k,
that's done by issuing a CASL instruction, which qemu maps to an
atomic r/m/w with sequential consistency ordering in the host.
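
As an illustration of why store-before-load ordering matters when talking to a paravirtualized device, here is a minimal sketch in portable C11. It is not the NetBSD implementation: `struct ring`, its fields, and `ring_publish` are hypothetical names, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the role that paravirt_membar_sync(9) plays in a driver. The guest stores a new producer index and then loads a host-written flag to decide whether a notification is needed; without the barrier the load could be satisfied before the store becomes visible, and a wakeup could be lost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared ring state; the layout is illustrative only. */
struct ring {
	_Atomic uint32_t prod;		/* written by guest, read by host */
	_Atomic uint32_t host_idle;	/* written by host, read by guest */
};

/*
 * Publish one request and report whether the host must be notified.
 * The guest is assumed to be the only writer of r->prod.
 */
bool
ring_publish(struct ring *r)
{
	uint32_t p;

	assert(r != NULL);

	/* Store: advance the producer index. */
	p = atomic_load_explicit(&r->prod, memory_order_relaxed);
	atomic_store_explicit(&r->prod, p + 1, memory_order_relaxed);

	/*
	 * Store-before-load barrier (assumed stand-in for
	 * paravirt_membar_sync): the store above must be visible
	 * before the load below is performed.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	/* Load: does the host say it has gone idle? */
	return atomic_load_explicit(&r->host_idle,
	    memory_order_relaxed) != 0;
}
```

The host side would mirror this with a store to `host_idle`, a barrier, and then a load of `prod`, so that at least one side always observes the other's store.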

PR kern/59618: occasional virtio block device lock ups/hangs
 1.37 16-Jul-2024  riastradh xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
 1.36 30-Jul-2022  riastradh branches: 1.36.2; 1.36.8;
x86: Eliminate mfence hotpatch for membar_sync.

The more-compatible LOCK ADD $0,-N(%rsp) turns out to be cheaper
than MFENCE anyway. Let's save some space and maintenance and rip
out the hotpatching for it.
 1.35 09-Apr-2022  riastradh Introduce membar_acquire/release. Deprecate membar_enter/exit.

The names membar_enter/exit were unclear, and the documentation of
membar_enter has disagreed with the implementations on sparc,
powerpc, and even x86(!) for the entire time it has been in NetBSD.

The terms `acquire' and `release' are ubiquitous in the literature
today, and have been adopted in the C and C++ standards to mean
load-before-load/store and load/store-before-store, respectively,
which are exactly the orderings required by acquiring and releasing a
mutex, as well as other useful applications like decrementing a
reference count and then freeing the underlying object if it went to
zero.
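
The reference-count case mentioned above can be sketched in portable C11 as follows. This is an illustration under stated assumptions, not code from the NetBSD tree: `struct obj`, `obj_create`, `obj_hold`, and `obj_rele` are hypothetical names, and the C11 `memory_order_release` decrement plus `memory_order_acquire` fence stand in for the membar_release/membar_acquire pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a NetBSD type. */
struct obj {
	atomic_uint refcnt;
	int payload;
};

struct obj *
obj_create(int payload)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);	/* creator holds one reference */
	o->payload = payload;
	return o;
}

void
obj_hold(struct obj *o)
{
	assert(o != NULL);
	/* Taking a reference needs atomicity only, not ordering. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Returns true if this call dropped the last reference and freed o. */
bool
obj_rele(struct obj *o)
{
	assert(o != NULL);
	/*
	 * Release (load/store-before-store): everything this thread did
	 * to *o is ordered before the decrement becomes visible.
	 */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;
	/*
	 * Acquire (load-before-load/store): the free below is ordered
	 * after the decrements, and hence the prior accesses, of all
	 * other threads that released their references.
	 */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Only the thread that observes the count reach zero pays for the acquire fence; every other caller of `obj_rele` needs just the release ordering on its decrement.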

Originally I proposed changing one word in the documentation for
membar_enter to make it load-before-load/store instead of
store-before-load/store, i.e., to make it an acquire barrier. I
proposed this on the grounds that

(a) all implementations guarantee load-before-load/store,
(b) some implementations fail to guarantee store-before-load/store,
and
(c) all uses in-tree assume load-before-load/store.

I verified parts (a) and (b) (except, for (a), powerpc didn't even
guarantee load-before-load/store -- isync isn't necessarily enough;
need lwsync in general -- but it _almost_ did, and it certainly didn't
guarantee store-before-load/store).

Part (c) might not be correct, however: under the mistaken assumption
that atomic-r/m/w then membar-w/rw is equivalent to atomic-r/m/w then
membar-r/rw, I only audited the cases of membar_enter that _aren't_
immediately after an atomic-r/m/w. All of those cases assume
load-before-load/store. But my assumption was wrong -- there are
cases of atomic-r/m/w then membar-w/rw that would be broken by
changing to atomic-r/m/w then membar-r/rw:

https://mail-index.netbsd.org/tech-kern/2022/03/29/msg028044.html

Furthermore, the name membar_enter has been adopted in other places
like OpenBSD where it actually does follow the documentation and
guarantee store-before-load/store, even if that order is not useful.
So the name membar_enter currently lives in a bad place where it
means either of two things -- r/rw or w/rw.

With this change, we deprecate membar_enter/exit, introduce
membar_acquire/release as better names for the useful pair (r/rw and
rw/w), and make sure the implementation of membar_enter guarantees
both what was documented _and_ what was implemented, making it an
alias for membar_sync.

While here, rework all of the membar_* definitions and aliases. The
new logic follows a rule to make it easier to audit:

membar_X is defined as an alias for membar_Y iff membar_X is
guaranteed by membar_Y.

The `no stronger than' relation is (the transitive closure of):

- membar_consumer (r/r) is guaranteed by membar_acquire (r/rw)
- membar_producer (w/w) is guaranteed by membar_release (rw/w)
- membar_acquire (r/rw) is guaranteed by membar_sync (rw/rw)
- membar_release (rw/w) is guaranteed by membar_sync (rw/rw)

And, for the deprecated membars:

- membar_enter (whether r/rw, w/rw, or rw/rw) is guaranteed by
membar_sync (rw/rw)
- membar_exit (rw/w) is guaranteed by membar_release (rw/w)

(membar_exit is identical to membar_release, but the name is
deprecated.)
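
One way to audit the aliasing rule is to model each barrier as the set of orderings it guarantees and check that an alias target is a superset. The sketch below is an assumption-laden illustration, not code from the tree: the `ORD_*` bits, the `MEMBAR_*` masks, and `guaranteed_by` are hypothetical names, with r/rw read as {load-before-load, load-before-store} and so on.

```c
#include <assert.h>
#include <stdbool.h>

/* One bit per ordering "earlier access before later access". */
#define ORD_RR	0x1u	/* load-before-load */
#define ORD_RW	0x2u	/* load-before-store */
#define ORD_WR	0x4u	/* store-before-load */
#define ORD_WW	0x8u	/* store-before-store */

#define MEMBAR_CONSUMER		(ORD_RR)			/* r/r */
#define MEMBAR_PRODUCER		(ORD_WW)			/* w/w */
#define MEMBAR_ACQUIRE		(ORD_RR | ORD_RW)		/* r/rw */
#define MEMBAR_RELEASE		(ORD_RW | ORD_WW)		/* rw/w */
#define MEMBAR_SYNC		(ORD_RR | ORD_RW | ORD_WR | ORD_WW)
#define MEMBAR_EXIT		(ORD_RW | ORD_WW)		/* rw/w */
#define MEMBAR_ENTER_DOC	(ORD_WR | ORD_WW)	/* documented w/rw */
#define MEMBAR_ENTER_IMPL	(ORD_RR | ORD_RW)	/* implemented r/rw */

/* membar_X may alias membar_Y iff Y guarantees every ordering in X. */
bool
guaranteed_by(unsigned x, unsigned y)
{
	return (x & ~y) == 0;
}

/* Assert the relations listed in the commit message. */
void
check_alias_rule(void)
{
	assert(guaranteed_by(MEMBAR_CONSUMER, MEMBAR_ACQUIRE));
	assert(guaranteed_by(MEMBAR_PRODUCER, MEMBAR_RELEASE));
	assert(guaranteed_by(MEMBAR_ACQUIRE, MEMBAR_SYNC));
	assert(guaranteed_by(MEMBAR_RELEASE, MEMBAR_SYNC));
	assert(guaranteed_by(MEMBAR_EXIT, MEMBAR_RELEASE));
	/* Either reading of membar_enter is covered only by membar_sync. */
	assert(guaranteed_by(MEMBAR_ENTER_DOC | MEMBAR_ENTER_IMPL,
	    MEMBAR_SYNC));
	/* The documented w/rw meaning is not covered by an acquire. */
	assert(!guaranteed_by(MEMBAR_ENTER_DOC, MEMBAR_ACQUIRE));
}
```

In this model the transitive closure falls out of set inclusion, so membar_consumer is also guaranteed by membar_sync without listing that pair explicitly.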

Finally, while here, annotate some of the instructions with their
semantics. For powerpc, leave an essay with citations on the
unfortunate but -- as far as I can tell -- necessary decision to use
lwsync, not isync, for membar_acquire and membar_consumer.

Also add membar(3) and atomic(3) man page links.
 1.34 09-Apr-2022  riastradh i386/membar_ops: Upgrade membar_enter from R/RW to RW/RW.

This will be deprecated soon but let's avoid leaving rakes to trip on
with it arising from disagreement over the documentation (W/RW) and
implementation and usage (R/RW).
 1.33 09-Apr-2022  riastradh x86: Add a note on membar_sync and mfence.
 1.32 09-Apr-2022  riastradh x86: Omit needless store in membar_producer/exit.

On x86, every store is a store-release, so there is no need for any
barrier. But this wasn't a barrier anyway; it was just a store,
which was redundant with the store of the return address to the stack
implied by CALL even if issuing a store made a difference.
 1.31 09-Apr-2022  riastradh x86: Every load is a load-acquire, so membar_consumer is a noop.

lfence is only needed for MD logic, such as operations on I/O memory
rather than normal cacheable memory, or special instructions like
RDTSC -- never for MI synchronization between threads/CPUs. No need
for hot-patching to do lfence here.

(The x86_lfence function might reasonably be patched on i386 to do
lfence for MD logic, but it isn't now and this doesn't change that.)
 1.30 06-Apr-2022  riastradh Nix trailing whitespace in files of membars, atomics, and lock stubs.

Will be touching many of these files soon for functional changes.

No functional change intended.
 1.29 01-May-2020  maxv Use the hotpatch framework when patching _atomic_cas_64.
 1.28 26-Apr-2020  maxv Use the hotpatch framework for LFENCE/MFENCE.
 1.27 26-Apr-2020  maxv Remove unused argument in macro.
 1.26 26-Apr-2020  maxv Remove unused.
 1.25 26-Apr-2020  maxv Drop the hardcoded array, use the hotpatch section.
 1.24 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.23 18-Jul-2018  bouyer branches: 1.23.8;
On Xen, always alias _atomic_cas_64 to _atomic_cas_cx8. AFAIK Xen doesn't
support CPUs that don't support cx8.
i386 XENPAE_DOMU boots again.
 1.22 23-May-2014  uebayasi branches: 1.22.22; 1.22.24;
Put missing END() markers to set ELF symbol size.
 1.21 22-Apr-2014  christos The kernel uses 64 bit atomic ops.
 1.20 18-Feb-2014  martin branches: 1.20.2;
Provide most missing __sync_*64 primitives for i386
 1.19 12-Jan-2011  joerg branches: 1.19.6; 1.19.12;
Allow use of traditional CPP to be set on a per-platform basis in sys.mk.
Honour this for dependency processing in bsd.dep.mk. Switch i386 and
amd64 assembly to use ISO C90 preprocessor concat and drop the
-traditional-cpp on this platform.
 1.18 26-Nov-2009  pooka Use strong alias within the kernel namespace regardless of whether we're
dealing with a hard or soft kernel (kernel linker doesn't support
weak symbols).
 1.17 02-Apr-2009  enami So that profile kernel runs again,
- Adjust the size of functions used to patch.
- Fix the jump offset of mcount call when patching functions.

Approved by Andrew Doran.
 1.16 12-Jan-2009  pooka branches: 1.16.2;
include sys/param.h for _HARDKERNEL instead of homegrown def.
 1.15 04-Jan-2009  pooka Opt for libc versions in case of _KERNEL && !_RUMPKERNEL.
(kernel version uses sti/cli and is not PIC)
 1.14 19-Dec-2008  ad PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.13 25-May-2008  chs branches: 1.13.4;
enable profiling of assembly functions.
 1.12 03-May-2008  yamt branches: 1.12.2;
whitespace.
 1.11 03-May-2008  yamt rename END to ENDLABEL. i'll use END for other purpose. ok by Andrew Doran.
 1.10 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.9 10-Feb-2008  ad branches: 1.9.4;
Add atomic_cas_foo_ni().
 1.8 09-Feb-2008  ad membar_enter was doing the wrong thing. For x86 we can alias:

membar_enter -> membar_consumer
membar_exit -> membar_producer
 1.7 20-Dec-2007  ad branches: 1.7.2;
- Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.6 20-Dec-2007  ad 64-bit atomic ops for i386.
 1.5 09-Dec-2007  ad Add missing strong aliases.
 1.4 29-Nov-2007  ad atomic_add_* takes signed integers, the others take unsigned.
 1.3 28-Nov-2007  ad A lock prefix on xchg is meaningless.
 1.2 28-Nov-2007  ad Fix up a few minor problems.
 1.1 28-Nov-2007  ad x86 atomic ops.
 1.7.2.3 23-Mar-2008  matt sync with HEAD
 1.7.2.2 09-Jan-2008  matt sync with HEAD
 1.7.2.1 20-Dec-2007  matt file atomic.S was added on branch matt-armv6 on 2008-01-09 01:20:53 +0000
 1.9.4.2 04-Jun-2008  yamt sync with head
 1.9.4.1 18-May-2008  yamt sync with head.
 1.12.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.13.4.2 03-Apr-2009  snj Pull up following revision(s) (requested by enami in ticket #645):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.17
sys/arch/amd64/amd64/spl.S: revision 1.21
sys/arch/x86/x86/patch.c: revision 1.17
So that profile kernel runs again,
- Adjust the size of functions used to patch.
- Fix the jump offset of mcount call when patching functions.
Approved by Andrew Doran.
 1.13.4.1 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #343):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.14
sys/arch/x86/include/cpufunc.h: revision 1.9
sys/arch/x86/x86/identcpu.c: revision 1.12
sys/arch/x86/x86/cpu.c: revision 1.60
sys/arch/x86/x86/patch.c: revision 1.15
PR kern/40213 my i386 machine can't boot because of tsc
- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.
- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.16.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.19.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.19.6.1 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.20.2.1 10-Aug-2014  tls Rebase.
 1.22.24.3 21-Apr-2020  martin Oops, restore files accidentally removed in merge mishap
 1.22.24.2 21-Apr-2020  martin Sync with HEAD
 1.22.24.1 10-Jun-2019  christos Sync with HEAD
 1.22.22.1 28-Jul-2018  pgoyette Sync with HEAD
 1.23.8.1 14-Apr-2020  bouyer Force _atomic_cas_cx8 only for XENPV; x86_patch works fine for (PV)HVM
 1.36.8.1 02-Aug-2025  perseant Sync with HEAD
 1.36.2.1 20-Jul-2024  martin Pull up following revision(s) (requested by riastradh in ticket #764):

common/lib/libc/arch/i386/atomic/atomic.S: revision 1.37
sys/arch/xen/include/xenring.h: revision 1.8
sys/arch/i386/i386/cpufunc.S: revision 1.52
sys/arch/amd64/amd64/cpufunc.S: revision 1.68
sys/arch/xen/include/hypervisor.h: revision 1.60
common/lib/libc/arch/x86_64/atomic/atomic.S: revision 1.30

xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file atomic_add.S was initially added on branch thorpej-atomic.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file atomic_and.S was initially added on branch thorpej-atomic.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file atomic_cas_32.S was initially added on branch thorpej-atomic.
 1.1.2.5 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.4 17-Apr-2007  thorpej Fix the end-of-function padding so both versions of cas-32 end up the
same size.
 1.1.2.3 17-Apr-2007  thorpej Give the namespace treatment to _atomic_cas_32_486(), too.
 1.1.2.2 17-Apr-2007  thorpej Add the necessary aliases for _atomic_cas_32().
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file atomic_dec.S was initially added on branch thorpej-atomic.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file atomic_inc.S was initially added on branch thorpej-atomic.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file atomic_op_asm.h was initially added on branch thorpej-atomic.
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file atomic_or.S was initially added on branch thorpej-atomic.
 1.1.2.3 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.2 17-Apr-2007  thorpej Fix copy-and-pasto.
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file atomic_swap.S was initially added on branch thorpej-atomic.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
 1.1 16-Apr-2007  thorpej branches: 1.1.2;
file membar_ops.S was initially added on branch thorpej-atomic.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 16-Apr-2007  thorpej Add atomic op implementations for x86.
