History log of /src/common/lib/libc/arch/x86_64/atomic/atomic.S
Revision  Date         Author     Comments
 1.32  06-Sep-2025  riastradh paravirt_membar_sync(9): New memory barrier.

For use in paravirtualized drivers which require store-before-load
ordering -- irrespective of whether the kernel is built for a single
processor, or whether the (virtual) machine is booted with a single
processor.

This is required even on architectures that don't have a
store-before-load ordering barrier, like m68k; adding, e.g., a virtio
bus is _as if_ the architecture had been extended with relaxed memory
ordering when talking with that new bus. Such architectures need
some way to request that the hypervisor enforce the ordering -- on m68k,
that's done by issuing a CASL instruction, which qemu maps to an
atomic r/m/w with sequential consistency ordering in the host.

PR kern/59618: occasional virtio block device lock ups/hangs
 1.31  16-Jul-2024  riastradh amd64: Fix performance regression in uniprocessor atomics/membars.

Back in 2022, I eliminated the MFENCE hotpatch in membar_sync because
it's essentially always more expensive than LOCK ADD with no benefit
for CPU/CPU store-before-load ordering. (It is relevant only for
non-temporal stores or write-combining memory.)

https://mail-index.netbsd.org/source-changes/2022/07/30/msg140047.html

But in that change, I made a mistake and _also_ eliminated the LOCK
hotpatch on uniprocessor amd64. And our assembler gas helpfully
interprets uppercase LOCK just like lowercase lock and assembles them
the same way, so I didn't notice.

This change restores the LOCK hotpatch, so that when booting on a
uniprocessor system (or a uniprocessor guest on a multicore host),
the LOCK prefix is replaced by NOP for a cheaper instruction.

Found by puzzling over how my explanation for PR kern/57199 could
possibly be correct when (on an amd64 guest) ddb x/i membar_sync kept
showing the lock prefix even in uniprocessor boots.
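
As a rough sketch of the shape this takes (illustrative only: the
HOTPATCH and HP_NAME_NOLOCK macro names are assumptions about the
kernel's hotpatch framework, not quotes from atomic.S):

	/* Sketch, assuming a frameasm.h-style HOTPATCH macro. */
	#ifdef _HARDKERNEL
	#define	LOCK	HOTPATCH(HP_NAME_NOLOCK, 1); lock  /* patched to NOP on UP boot */
	#else
	#define	LOCK	lock	/* userland: keep the prefix unconditionally */
	#endif

	ENTRY(_membar_sync)
		/* LOCK ADD of zero to the stack orders store-before-load. */
		LOCK
		addq	$0,-8(%rsp)
		ret
	END(_membar_sync)

On a multiprocessor boot the LOCK prefix stays; on a uniprocessor boot
it is overwritten with a NOP, leaving a plain, cheaper ADD.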
 1.30  16-Jul-2024  riastradh xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
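
For contrast with the hotpatched membar_sync sketched above, a minimal
sketch of the unconditional form (assuming the same LOCK ADD idiom;
not the literal cpufunc.S source):

	ENTRY(xen_mb)
		/*
		 * Unconditional LOCK: the hypervisor's backend may run on
		 * another physical CPU even when the guest sees only one
		 * virtual CPU, so this prefix must never be patched to NOP.
		 */
		lock
		addq	$0,-8(%rsp)
		ret
	END(xen_mb)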
 1.29  30-Jul-2022  riastradh branches: 1.29.2; 1.29.8;
x86: Eliminate mfence hotpatch for membar_sync.

The more-compatible LOCK ADD $0,-N(%rsp) turns out to be cheaper
than MFENCE anyway. Let's save some space and maintenance and rip
out the hotpatching for it.
 1.28  09-Apr-2022  riastradh Introduce membar_acquire/release. Deprecate membar_enter/exit.

The names membar_enter/exit were unclear, and the documentation of
membar_enter has disagreed with the implementations on sparc,
powerpc, and even x86(!) for the entire time it has been in NetBSD.

The terms `acquire' and `release' are ubiquitous in the literature
today, and have been adopted in the C and C++ standards to mean
load-before-load/store and load/store-before-store, respectively,
which are exactly the orderings required by acquiring and releasing a
mutex, as well as other useful applications like decrementing a
reference count and then freeing the underlying object if it went to
zero.

Originally I proposed changing one word in the documentation for
membar_enter to make it load-before-load/store instead of
store-before-load/store, i.e., to make it an acquire barrier. I
proposed this on the grounds that

(a) all implementations guarantee load-before-load/store,
(b) some implementations fail to guarantee store-before-load/store,
and
(c) all uses in-tree assume load-before-load/store.

I verified parts (a) and (b) (except, for (a), powerpc didn't even
guarantee load-before-load/store -- isync isn't necessarily enough;
need lwsync in general -- but it _almost_ did, and it certainly didn't
guarantee store-before-load/store).

Part (c) might not be correct, however: under the mistaken assumption
that atomic-r/m/w then membar-w/rw is equivalent to atomic-r/m/w then
membar-r/rw, I only audited the cases of membar_enter that _aren't_
immediately after an atomic-r/m/w. All of those cases assume
load-before-load/store. But my assumption was wrong -- there are
cases of atomic-r/m/w then membar-w/rw that would be broken by
changing to atomic-r/m/w then membar-r/rw:

https://mail-index.netbsd.org/tech-kern/2022/03/29/msg028044.html

Furthermore, the name membar_enter has been adopted in other places
like OpenBSD where it actually does follow the documentation and
guarantee store-before-load/store, even if that order is not useful.
So the name membar_enter currently lives in a bad place where it
means either of two things -- r/rw or w/rw.

With this change, we deprecate membar_enter/exit, introduce
membar_acquire/release as better names for the useful pair (r/rw and
rw/w), and make sure the implementation of membar_enter guarantees
both what was documented _and_ what was implemented, making it an
alias for membar_sync.

While here, rework all of the membar_* definitions and aliases. The
new logic follows a rule to make it easier to audit:

membar_X is defined as an alias for membar_Y iff membar_X is
guaranteed by membar_Y.

The `no stronger than' relation is (the transitive closure of):

- membar_consumer (r/r) is guaranteed by membar_acquire (r/rw)
- membar_producer (w/w) is guaranteed by membar_release (rw/w)
- membar_acquire (r/rw) is guaranteed by membar_sync (rw/rw)
- membar_release (rw/w) is guaranteed by membar_sync (rw/rw)

And, for the deprecated membars:

- membar_enter (whether r/rw, w/rw, or rw/rw) is guaranteed by
membar_sync (rw/rw)
- membar_exit (rw/w) is guaranteed by membar_release (rw/w)

(membar_exit is identical to membar_release, but the name is
deprecated.)
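
On x86_64 this rule might be expressed with strong aliases roughly as
follows (a sketch assuming the STRONG_ALIAS macro from <machine/asm.h>;
the actual alias list in atomic.S may differ):

	STRONG_ALIAS(_membar_consumer,_membar_acquire)	/* r/r  guaranteed by r/rw */
	STRONG_ALIAS(_membar_producer,_membar_release)	/* w/w  guaranteed by rw/w */
	STRONG_ALIAS(_membar_enter,_membar_sync)	/* r/rw, w/rw guaranteed by rw/rw */
	STRONG_ALIAS(_membar_exit,_membar_release)	/* rw/w identical to rw/w */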

Finally, while here, annotate some of the instructions with their
semantics. For powerpc, leave an essay with citations on the
unfortunate but -- as far as I can tell -- necessary decision to use
lwsync, not isync, for membar_acquire and membar_consumer.

Also add membar(3) and atomic(3) man page links.
 1.27  09-Apr-2022  riastradh x86_64/membar_ops: Upgrade membar_enter from R/RW to RW/RW.

This will be deprecated soon, but let's avoid leaving rakes to trip
on arising from the disagreement between the documentation (W/RW) and
the implementation and usage (R/RW).
 1.26  09-Apr-2022  riastradh x86: Add a note on membar_sync and mfence.
 1.25  09-Apr-2022  riastradh x86: Omit needless store in membar_producer/exit.

On x86, every store is a store-release, so there is no need for any
barrier. But this wasn't a barrier anyway; it was just a store,
which was redundant with the store of the return address to the stack
implied by CALL even if issuing a store made a difference.
 1.24  09-Apr-2022  riastradh x86: Every load is a load-acquire, so membar_consumer is a noop.

lfence is only needed for MD logic, such as operations on I/O memory
rather than normal cacheable memory, or special instructions like
RDTSC -- never for MI synchronization between threads/CPUs. No need
for hot-patching to do lfence here.

(The x86_lfence function might reasonably be patched on i386 to do
lfence for MD logic, but it isn't now and this doesn't change that.)
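
In this file's style, `no barrier instruction needed' reduces both
routines to bare returns, roughly (a sketch, not the literal source):

	ENTRY(_membar_producer)
		/* Every x86 store is a store-release: nothing to do for w/w. */
		ret
	END(_membar_producer)

	ENTRY(_membar_consumer)
		/* Every x86 load is a load-acquire: nothing to do for r/r. */
		ret
	END(_membar_consumer)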
 1.23  06-Apr-2022  riastradh Nix trailing whitespace in files of membars, atomics, and lock stubs.

Will be touching many of these files soon for functional changes.

No functional change intended.
 1.22  26-Apr-2020  maxv Use the hotpatch framework for LFENCE/MFENCE.
 1.21  26-Apr-2020  maxv Remove unused argument in macro.
 1.20  26-Apr-2020  maxv Remove unused.
 1.19  26-Apr-2020  maxv Drop the hardcoded array, use the hotpatch section.
 1.18  17-Feb-2019  isaki Add missing export of atomic_or_64 (since rev1.1).
 1.17  22-May-2014  uebayasi branches: 1.17.24;
Put missing END() markers to set ELF symbol size.
 1.16  12-Jan-2011  joerg branches: 1.16.12; 1.16.24;
Allow use of traditional CPP to be set on a per-platform basis in sys.mk.
Honour this for dependency processing in bsd.dep.mk. Switch i386 and
amd64 assembly to use ISO C90 preprocessor concatenation and drop
-traditional-cpp on these platforms.
 1.15  26-Nov-2009  pooka Use a strong alias within the kernel namespace regardless of whether
we're dealing with a hard or soft kernel (the kernel linker doesn't
support weak symbols).
 1.14  12-Jan-2009  pooka include sys/param.h for _HARDKERNEL instead of homegrown def.
 1.13  04-Jan-2009  pooka Do not use lockpatches with _RUMPKERNEL (non-PIC ... and pointless).
 1.12  25-May-2008  chs enable profiling of assembly functions.
 1.11  03-May-2008  yamt branches: 1.11.2;
Rename END to ENDLABEL; I'll use END for another purpose. OK'd by Andrew Doran.
 1.10  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.9  10-Feb-2008  ad branches: 1.9.4;
Add atomic_cas_foo_ni().
 1.8  09-Feb-2008  ad membar_enter was doing the wrong thing. For x86 we can alias:

membar_enter -> membar_consumer
membar_exit -> membar_producer
 1.7  10-Dec-2007  ad branches: 1.7.4;
Fix _atomic_cas_64. Noted by bouyer@.
 1.6  09-Dec-2007  ad Add missing strong aliases; I was sure I did this before?
 1.5  29-Nov-2007  ad Fix ia32 -> amd64 thinko.
 1.4  29-Nov-2007  ad atomic_add_* takes signed integers, the others take unsigned.
 1.3  28-Nov-2007  ad A lock prefix on xchg is meaningless.
 1.2  28-Nov-2007  ad Fix up a few minor problems.
 1.1  28-Nov-2007  ad x86 atomic ops.
 1.7.4.3  23-Mar-2008  matt sync with HEAD
 1.7.4.2  09-Jan-2008  matt sync with HEAD
 1.7.4.1  10-Dec-2007  matt file atomic.S was added on branch matt-armv6 on 2008-01-09 01:21:14 +0000
 1.9.4.2  04-Jun-2008  yamt sync with head
 1.9.4.1  18-May-2008  yamt sync with head.
 1.11.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.16.24.1  10-Aug-2014  tls Rebase.
 1.16.12.1  19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.24.3  21-Apr-2020  martin Oops, restore accidentally removed files from merge mishap
 1.17.24.2  21-Apr-2020  martin Sync with HEAD
 1.17.24.1  10-Jun-2019  christos Sync with HEAD
 1.29.8.1  02-Aug-2025  perseant Sync with HEAD
 1.29.2.1  20-Jul-2024  martin Pull up following revision(s) (requested by riastradh in ticket #764):

common/lib/libc/arch/i386/atomic/atomic.S: revision 1.37
sys/arch/xen/include/xenring.h: revision 1.8
sys/arch/i386/i386/cpufunc.S: revision 1.52
sys/arch/amd64/amd64/cpufunc.S: revision 1.68
sys/arch/xen/include/hypervisor.h: revision 1.60
common/lib/libc/arch/x86_64/atomic/atomic.S: revision 1.30

xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
