History log of /src/common/lib/libc/arch/x86_64
Revision  Date  Author  Comments
 1.6 04-Jan-2009  pooka branches: 1.6.50;
allow inclusion of atomic ops in librump
 1.5 29-Sep-2008  ad Allow atomic ops to be built as part of libpthread.
 1.4 11-Feb-2008  ad Only build atomic ops for libkern/libc.
 1.3 10-Feb-2008  ad Enable the atomic ops in userspace.
 1.2 28-Nov-2007  ad branches: 1.2.4;
x86 atomic ops.
 1.1 18-Apr-2007  thorpej branches: 1.1.2;
file Makefile.inc was initially added on branch thorpej-atomic.
 1.1.2.2 26-Apr-2007  thorpej *_nv() ops implemented in terms of CMPXCHG directly. This is much tighter
code than what GCC can generate for the generic-in-C versions.
 1.1.2.1 18-Apr-2007  thorpej Build glue for amd64.
 1.2.4.3 23-Mar-2008  matt sync with HEAD
 1.2.4.2 09-Jan-2008  matt sync with HEAD
 1.2.4.1 28-Nov-2007  matt file Makefile.inc was added on branch matt-armv6 on 2008-01-09 01:21:14 +0000
 1.6.50.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.6.50.1 21-Apr-2020  martin Sync with HEAD
 1.32 06-Sep-2025  riastradh paravirt_membar_sync(9): New memory barrier.

For use in paravirtualized drivers which require store-before-load
ordering -- irrespective of whether the kernel is built for a single
processor, or whether the (virtual) machine is booted with a single
processor.

This is even required on architectures that don't even have a
store-before-load ordering barrier, like m68k; adding, e.g., a virtio
bus is _as if_ the architecture has been extended with relaxed memory
ordering when talking with that new bus. Such architectures need
some way to request the hypervisor enforce that ordering -- on m68k,
that's done by issuing a CASL instruction, which qemu maps to an
atomic r/m/w with sequential consistency ordering in the host.

PR kern/59618: occasional virtio block device lock ups/hangs
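
The kind of code that needs this ordering can be sketched as follows. This is only an illustration: the ring structure and the names ring_publish, avail_idx, and host_sleeping are hypothetical, and a C11 sequentially consistent fence stands in for the barrier described above. The guest publishes work with a store, then loads a host-written flag to decide whether to notify; without a store-before-load barrier the load may be satisfied before the store becomes visible to the host.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring shared with a host device; not a real virtio layout. */
struct ring {
	_Atomic uint16_t avail_idx;	/* written by guest, read by host */
	_Atomic uint16_t host_sleeping;	/* written by host, read by guest */
};

/*
 * Publish a new index, then decide whether the host must be notified.
 * Returns true if a notification (e.g. a doorbell write) is needed.
 */
static bool
ring_publish(struct ring *r, uint16_t new_idx)
{
	atomic_store_explicit(&r->avail_idx, new_idx, memory_order_relaxed);

	/*
	 * Store-before-load barrier (assumed stand-in).  Unlike a CPU/CPU
	 * barrier, this must not be elided on a uniprocessor guest, because
	 * the host side runs concurrently with the single guest CPU.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&r->host_sleeping,
	    memory_order_relaxed) != 0;
}
```

If the barrier were dropped, the guest could read a stale host_sleeping value of zero and skip the notification while the host goes to sleep without seeing the new index, which is one plausible shape for the hangs mentioned in the PR.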
 1.31 16-Jul-2024  riastradh branches: 1.31.2;
amd64: Fix performance regression in uniprocessor atomics/membars.

Back in 2022, I eliminated the MFENCE hotpatch in membar_sync because
it's essentially always more expensive than LOCK ADD with no benefit
for CPU/CPU store-before-load ordering. (It is relevant only for
non-temporal stores or write-combining memory.)

https://mail-index.netbsd.org/source-changes/2022/07/30/msg140047.html

But in that change, I made a mistake and _also_ eliminated the LOCK
hotpatch on uniprocessor amd64. And our assembler gas helpfully
interprets uppercase LOCK just like lowercase lock and assembles them
the same way, so I didn't notice.

This change restores the LOCK hotpatch, so that when booting on a
uniprocessor system (or a uniprocessor guest on a multicore host),
the LOCK prefix is replaced by NOP for a cheaper instruction.

Found by puzzling over how my explanation for PR kern/57199 could
possibly be correct when (on an amd64 guest) ddb x/i membar_sync kept
showing the lock prefix even in uniprocessor boots.
 1.30 16-Jul-2024  riastradh xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
 1.29 30-Jul-2022  riastradh branches: 1.29.2; 1.29.8;
x86: Eliminate mfence hotpatch for membar_sync.

The more-compatible LOCK ADD $0,-N(%rsp) turns out to be cheaper
than MFENCE anyway. Let's save some space and maintenance and rip
out the hotpatching for it.
 1.28 09-Apr-2022  riastradh Introduce membar_acquire/release. Deprecate membar_enter/exit.

The names membar_enter/exit were unclear, and the documentation of
membar_enter has disagreed with the implementations on sparc,
powerpc, and even x86(!) for the entire time it has been in NetBSD.

The terms `acquire' and `release' are ubiquitous in the literature
today, and have been adopted in the C and C++ standards to mean
load-before-load/store and load/store-before-store, respectively,
which are exactly the orderings required by acquiring and releasing a
mutex, as well as other useful applications like decrementing a
reference count and then freeing the underlying object if it went to
zero.
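
A minimal sketch of the reference-count case, using C11 atomics as a portable stand-in for the membar_release/membar_acquire pair (the struct and function names are hypothetical, not NetBSD code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int payload;
};

static void
obj_init(struct obj *o, unsigned refs, int payload)
{
	o->payload = payload;
	atomic_init(&o->refcnt, refs);
}

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and may now destroy the object.
 */
static bool
obj_release(struct obj *o)
{
	/* Release: our earlier loads/stores to *o are ordered before the decrement. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;

	/* Acquire: the destructor's accesses are ordered after seeing the last drop. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

The release half keeps every thread's use of the object before its own decrement; the acquire half keeps the eventual free after all of those decrements have been observed.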

Originally I proposed changing one word in the documentation for
membar_enter to make it load-before-load/store instead of
store-before-load/store, i.e., to make it an acquire barrier. I
proposed this on the grounds that

(a) all implementations guarantee load-before-load/store,
(b) some implementations fail to guarantee store-before-load/store,
and
(c) all uses in-tree assume load-before-load/store.

I verified parts (a) and (b) (except, for (a), powerpc didn't even
guarantee load-before-load/store -- isync isn't necessarily enough;
need lwsync in general -- but it _almost_ did, and it certainly didn't
guarantee store-before-load/store).

Part (c) might not be correct, however: under the mistaken assumption
that atomic-r/m/w then membar-w/rw is equivalent to atomic-r/m/w then
membar-r/rw, I only audited the cases of membar_enter that _aren't_
immediately after an atomic-r/m/w. All of those cases assume
load-before-load/store. But my assumption was wrong -- there are
cases of atomic-r/m/w then membar-w/rw that would be broken by
changing to atomic-r/m/w then membar-r/rw:

https://mail-index.netbsd.org/tech-kern/2022/03/29/msg028044.html

Furthermore, the name membar_enter has been adopted in other places
like OpenBSD where it actually does follow the documentation and
guarantee store-before-load/store, even if that order is not useful.
So the name membar_enter currently lives in a bad place where it
means either of two things -- r/rw or w/rw.

With this change, we deprecate membar_enter/exit, introduce
membar_acquire/release as better names for the useful pair (r/rw and
rw/w), and make sure the implementation of membar_enter guarantees
both what was documented _and_ what was implemented, making it an
alias for membar_sync.

While here, rework all of the membar_* definitions and aliases. The
new logic follows a rule to make it easier to audit:

membar_X is defined as an alias for membar_Y iff membar_X is
guaranteed by membar_Y.

The `no stronger than' relation is (the transitive closure of):

- membar_consumer (r/r) is guaranteed by membar_acquire (r/rw)
- membar_producer (w/w) is guaranteed by membar_release (rw/w)
- membar_acquire (r/rw) is guaranteed by membar_sync (rw/rw)
- membar_release (rw/w) is guaranteed by membar_sync (rw/rw)

And, for the deprecated membars:

- membar_enter (whether r/rw, w/rw, or rw/rw) is guaranteed by
membar_sync (rw/rw)
- membar_exit (rw/w) is guaranteed by membar_release (rw/w)

(membar_exit is identical to membar_release, but the name is
deprecated.)
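
One way to audit the relation above, offered only as a sketch: model each barrier as the set of orderings it enforces, so that membar_X is guaranteed by membar_Y exactly when X's set is a subset of Y's. The bitmask encoding and the guaranteed_by helper are assumptions made for illustration; only the barrier names and their r/w annotations come from the text.

```c
#include <stdbool.h>

/* The four basic orderings, written "earlier-before-later". */
enum {
	ORD_RR = 1 << 0,	/* load-before-load */
	ORD_RW = 1 << 1,	/* load-before-store */
	ORD_WR = 1 << 2,	/* store-before-load */
	ORD_WW = 1 << 3		/* store-before-store */
};

#define MEMBAR_CONSUMER	(ORD_RR)				/* r/r */
#define MEMBAR_PRODUCER	(ORD_WW)				/* w/w */
#define MEMBAR_ACQUIRE	(ORD_RR | ORD_RW)			/* r/rw */
#define MEMBAR_RELEASE	(ORD_RW | ORD_WW)			/* rw/w */
#define MEMBAR_SYNC	(ORD_RR | ORD_RW | ORD_WR | ORD_WW)	/* rw/rw */

/* membar_enter as historically documented: w/rw. */
#define MEMBAR_ENTER_DOCUMENTED	(ORD_WR | ORD_WW)

/* True iff every ordering required by x is also enforced by y. */
static bool
guaranteed_by(unsigned x, unsigned y)
{
	return (x & ~y) == 0;
}
```

Under this model the documented membar_enter (w/rw) is not a subset of membar_acquire (r/rw), which is consistent with aliasing it to membar_sync instead.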

Finally, while here, annotate some of the instructions with their
semantics. For powerpc, leave an essay with citations on the
unfortunate but -- as far as I can tell -- necessary decision to use
lwsync, not isync, for membar_acquire and membar_consumer.

Also add membar(3) and atomic(3) man page links.
 1.27 09-Apr-2022  riastradh x86_64/membar_ops: Upgrade membar_enter from R/RW to RW/RW.

This will be deprecated soon but let's avoid leaving rakes to trip on
with it arising from disagreement over the documentation (W/RW) and
implementation and usage (R/RW).
 1.26 09-Apr-2022  riastradh x86: Add a note on membar_sync and mfence.
 1.25 09-Apr-2022  riastradh x86: Omit needless store in membar_producer/exit.

On x86, every store is a store-release, so there is no need for any
barrier. But this wasn't a barrier anyway; it was just a store,
which was redundant with the store of the return address to the stack
implied by CALL even if issuing a store made a difference.
 1.24 09-Apr-2022  riastradh x86: Every load is a load-acquire, so membar_consumer is a noop.

lfence is only needed for MD logic, such as operations on I/O memory
rather than normal cacheable memory, or special instructions like
RDTSC -- never for MI synchronization between threads/CPUs. No need
for hot-patching to do lfence here.

(The x86_lfence function might reasonably be patched on i386 to do
lfence for MD logic, but it isn't now and this doesn't change that.)
 1.23 06-Apr-2022  riastradh Nix trailing whitespace in files of membars, atomics, and lock stubs.

Will be touching many of these files soon for functional changes.

No functional change intended.
 1.22 26-Apr-2020  maxv Use the hotpatch framework for LFENCE/MFENCE.
 1.21 26-Apr-2020  maxv Remove unused argument in macro.
 1.20 26-Apr-2020  maxv Remove unused.
 1.19 26-Apr-2020  maxv Drop the hardcoded array, use the hotpatch section.
 1.18 17-Feb-2019  isaki Add missing export of atomic_or_64 (since rev1.1).
 1.17 22-May-2014  uebayasi branches: 1.17.24;
Put missing END() markers to set ELF symbol size.
 1.16 12-Jan-2011  joerg branches: 1.16.12; 1.16.24;
Allow use of traditional CPP to be set on a per-platform basis in sys.mk.
Honour this for dependency processing in bsd.dep.mk. Switch i386 and
amd64 assembly to use ISO C90 preprocessor concat and drop the
-traditional-cpp on this platform.
 1.15 26-Nov-2009  pooka Use strong alias within the kernel namespace regardless of whether we're
dealing with a hard or soft kernel (kernel linker doesn't support
weak symbols).
 1.14 12-Jan-2009  pooka include sys/param.h for _HARDKERNEL instead of homegrown def.
 1.13 04-Jan-2009  pooka Do not use lockpatches with _RUMPKERNEL (non-PIC ... and pointless).
 1.12 25-May-2008  chs enable profiling of assembly functions.
 1.11 03-May-2008  yamt branches: 1.11.2;
Rename END to ENDLABEL; I'll use END for another purpose. OK'd by Andrew Doran.
 1.10 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.9 10-Feb-2008  ad branches: 1.9.4;
Add atomic_cas_foo_ni().
 1.8 09-Feb-2008  ad membar_enter was doing the wrong thing. For x86 we can alias:

membar_enter -> membar_consumer
membar_exit -> membar_producer
 1.7 10-Dec-2007  ad branches: 1.7.4;
Fix _atomic_cas_64. Noted by bouyer@.
 1.6 09-Dec-2007  ad Add missing strong aliases; sure I did this before?
 1.5 29-Nov-2007  ad Fix ia32 -> amd64 thinko.
 1.4 29-Nov-2007  ad atomic_add_* takes signed integers, the others take unsigned.
 1.3 28-Nov-2007  ad A lock prefix on xchg is meaningless.
 1.2 28-Nov-2007  ad Fix up a few minor problems.
 1.1 28-Nov-2007  ad x86 atomic ops.
 1.7.4.3 23-Mar-2008  matt sync with HEAD
 1.7.4.2 09-Jan-2008  matt sync with HEAD
 1.7.4.1 10-Dec-2007  matt file atomic.S was added on branch matt-armv6 on 2008-01-09 01:21:14 +0000
 1.9.4.2 04-Jun-2008  yamt sync with head
 1.9.4.1 18-May-2008  yamt sync with head.
 1.11.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.16.24.1 10-Aug-2014  tls Rebase.
 1.16.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.24.3 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.17.24.2 21-Apr-2020  martin Sync with HEAD
 1.17.24.1 10-Jun-2019  christos Sync with HEAD
 1.29.8.1 02-Aug-2025  perseant Sync with HEAD
 1.29.2.2 19-Oct-2025  martin Pull up following revision(s) (requested by riastradh in ticket #60):

sys/arch/sparc/sparc/locore.s: revision 1.287
share/man/man9/Makefile: revision 1.475
sys/arch/mips/mips/cpu_subr.c: revision 1.65
sys/arch/mips/mips/cpu_subr.c: revision 1.66
sys/arch/amd64/amd64/cpufunc.S: revision 1.70
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.38
common/lib/libc/arch/sparc/atomic/membar_ops.S: revision 1.9
sys/arch/hppa/hppa/support.S: revision 1.9
sys/arch/alpha/alpha/locore.s: revision 1.145
share/man/man9/paravirt_membar_sync.9: revision 1.1
sys/arch/sparc64/sparc64/locore.s: revision 1.436
distrib/sets/lists/comp/mi: revision 1.2499
sys/arch/i386/i386/cpufunc.S: revision 1.54
common/lib/libc/arch/sparc64/atomic/membar_ops.S: revision 1.10
sys/sys/paravirt_membar.h: revision 1.1
sys/arch/arm/arm/cpu_subr.c: revision 1.6
common/lib/libc/arch/x86_64/atomic/atomic.S: revision 1.32
(all via patch)

paravirt_membar_sync(9): New memory barrier.

For use in paravirtualized drivers which require store-before-load
ordering -- irrespective of whether the kernel is built for a single
processor, or whether the (virtual) machine is booted with a single
processor.

This is even required on architectures that don't even have a
store-before-load ordering barrier, like m68k; adding, e.g., a virtio
bus is _as if_ the architecture has been extended with relaxed memory
ordering when talking with that new bus. Such architectures need
some way to request the hypervisor enforce that ordering -- on m68k,
that's done by issuing a CASL instruction, which qemu maps to an
atomic r/m/w with sequential consistency ordering in the host.

PR kern/59618: occasional virtio block device lock ups/hangs

mips: Fix asm arch options in new paravirt_membar_sync.
Need to explicitly enable mips2 (MIPS-II) instructions in order to
use sync. Fixes:
/tmp/ccxgOmXc.s: Assembler messages:
/tmp/ccxgOmXc.s:3576: Error: opcode not supported on this processor: mips1 (mips1) `sync'
--- cpu_subr.o ---
*** Failed target: cpu_subr.o

PR kern/59618: occasional virtio block device lock ups/hangs
 1.29.2.1 20-Jul-2024  martin Pull up following revision(s) (requested by riastradh in ticket #764):

common/lib/libc/arch/i386/atomic/atomic.S: revision 1.37
sys/arch/xen/include/xenring.h: revision 1.8
sys/arch/i386/i386/cpufunc.S: revision 1.52
sys/arch/amd64/amd64/cpufunc.S: revision 1.68
sys/arch/xen/include/hypervisor.h: revision 1.60
common/lib/libc/arch/x86_64/atomic/atomic.S: revision 1.30

xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
 1.31.2.1 19-Oct-2025  martin Pull up following revision(s) (requested by riastradh in ticket #60):

sys/arch/sparc/sparc/locore.s: revision 1.287
share/man/man9/Makefile: revision 1.475
sys/arch/mips/mips/cpu_subr.c: revision 1.65
sys/arch/riscv/riscv/cpu_subr.c: revision 1.6
sys/arch/mips/mips/cpu_subr.c: revision 1.66
sys/arch/amd64/amd64/cpufunc.S: revision 1.70
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.38
common/lib/libc/arch/sparc/atomic/membar_ops.S: revision 1.9
sys/arch/hppa/hppa/support.S: revision 1.9
sys/arch/alpha/alpha/locore.s: revision 1.145
share/man/man9/paravirt_membar_sync.9: revision 1.1
sys/arch/sparc64/sparc64/locore.s: revision 1.436
distrib/sets/lists/comp/mi: revision 1.2499
sys/arch/i386/i386/cpufunc.S: revision 1.54
common/lib/libc/arch/sparc64/atomic/membar_ops.S: revision 1.10
sys/sys/paravirt_membar.h: revision 1.1
sys/arch/arm/arm/cpu_subr.c: revision 1.6
sys/arch/virt68k/virt68k/locore.s: revision 1.17
common/lib/libc/arch/x86_64/atomic/atomic.S: revision 1.32

paravirt_membar_sync(9): New memory barrier.

For use in paravirtualized drivers which require store-before-load
ordering -- irrespective of whether the kernel is built for a single
processor, or whether the (virtual) machine is booted with a single
processor.

This is even required on architectures that don't even have a
store-before-load ordering barrier, like m68k; adding, e.g., a virtio
bus is _as if_ the architecture has been extended with relaxed memory
ordering when talking with that new bus. Such architectures need
some way to request the hypervisor enforce that ordering -- on m68k,
that's done by issuing a CASL instruction, which qemu maps to an
atomic r/m/w with sequential consistency ordering in the host.

PR kern/59618: occasional virtio block device lock ups/hangs

mips: Fix asm arch options in new paravirt_membar_sync.
Need to explicitly enable mips2 (MIPS-II) instructions in order to
use sync. Fixes:
/tmp/ccxgOmXc.s: Assembler messages:
/tmp/ccxgOmXc.s:3576: Error: opcode not supported on this processor: mips1 (mips1) `sync'
--- cpu_subr.o ---
*** Failed target: cpu_subr.o

PR kern/59618: occasional virtio block device lock ups/hangs
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file atomic_add.S was initially added on branch thorpej-atomic.
 1.1.2.3 26-Apr-2007  thorpej *_nv() ops implemented in terms of CMPXCHG directly. This is much tighter
code than what GCC can generate for the generic-in-C versions.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file atomic_and.S was initially added on branch thorpej-atomic.
 1.1.2.3 26-Apr-2007  thorpej *_nv() ops implemented in terms of CMPXCHG directly. This is much tighter
code than what GCC can generate for the generic-in-C versions.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file atomic_cas.S was initially added on branch thorpej-atomic.
 1.1.2.3 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.2 17-Apr-2007  thorpej Fix typo.
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file atomic_dec.S was initially added on branch thorpej-atomic.
 1.1.2.3 26-Apr-2007  thorpej *_nv() ops implemented in terms of CMPXCHG directly. This is much tighter
code than what GCC can generate for the generic-in-C versions.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file atomic_inc.S was initially added on branch thorpej-atomic.
 1.1.2.3 26-Apr-2007  thorpej *_nv() ops implemented in terms of CMPXCHG directly. This is much tighter
code than what GCC can generate for the generic-in-C versions.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file atomic_op_asm.h was initially added on branch thorpej-atomic.
 1.1.2.2 17-Apr-2007  thorpej amd64 doesn't have opt_cputype.h
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file atomic_or.S was initially added on branch thorpej-atomic.
 1.1.2.3 26-Apr-2007  thorpej *_nv() ops implemented in terms of CMPXCHG directly. This is much tighter
code than what GCC can generate for the generic-in-C versions.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file atomic_swap.S was initially added on branch thorpej-atomic.
 1.1.2.2 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.1 17-Apr-2007  thorpej branches: 1.1.2;
file membar_ops.S was initially added on branch thorpej-atomic.
 1.1.2.4 22-Apr-2007  thorpej Make sure namespace-cleansed aliases are available for all atomic ops.
 1.1.2.3 17-Apr-2007  thorpej Tidy up a comment.
 1.1.2.2 17-Apr-2007  thorpej amd64 doesn't have opt_cputype.h
 1.1.2.1 17-Apr-2007  thorpej First draft implementation of atomic ops for amd64
 1.3 22-May-2014  uebayasi branches: 1.3.24;
Put missing END() markers to set ELF symbol size.
 1.2 04-Feb-2006  uwe branches: 1.2.56; 1.2.70;
libc wants __bswapNN, kernel wants bswapNN. That was not accounted
for during the merge of kernel and libc versions. Fix to match
e.g. i386 code.
 1.1 20-Dec-2005  christos Merge libkern + libc common files. As requested by core.
 1.2.70.1 10-Aug-2014  tls Rebase.
 1.2.56.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.24.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.3.24.1 21-Apr-2020  martin Sync with HEAD
 1.3 22-May-2014  uebayasi branches: 1.3.24;
Put missing END() markers to set ELF symbol size.
 1.2 04-Feb-2006  uwe branches: 1.2.56; 1.2.70;
libc wants __bswapNN, kernel wants bswapNN. That was not accounted
for during the merge of kernel and libc versions. Fix to match
e.g. i386 code.
 1.1 20-Dec-2005  christos Merge libkern + libc common files. As requested by core.
 1.2.70.1 10-Aug-2014  tls Rebase.
 1.2.56.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.24.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.3.24.1 21-Apr-2020  martin Sync with HEAD
 1.2 22-May-2014  uebayasi branches: 1.2.24;
Put missing END() markers to set ELF symbol size.
 1.1 14-Jan-2010  joerg branches: 1.1.12; 1.1.24;
Move AMD64's bswap64 implementation from libc to src/common and share it
with the kernel.
 1.1.24.1 10-Aug-2014  tls Rebase.
 1.1.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.24.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.2.24.1 21-Apr-2020  martin Sync with HEAD
 1.5 27-Jan-2020  ad x86 uses the C versions of bcmp() and memcmp() now.
 1.4 15-Jan-2020  ad Rewrite bcmp() & memcmp() to not use REP CMPS. Seems about 5-10x faster for
small strings on modern hardware.
 1.3 22-Mar-2014  jakllsch branches: 1.3.26; 1.3.30;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.2 12-Nov-2007  ad branches: 1.2.28; 1.2.34;
Don't unconditionally clear the direction flag. The ABI says it must always
be clear when making a function call, and 'cld' takes about 50 clock cycles
on the P4.
 1.1 20-Dec-2005  christos branches: 1.1.18;
Merge libkern + libc common files. As requested by core.
 1.1.18.1 09-Jan-2008  matt sync with HEAD
 1.2.34.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.28.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.30.1 29-Feb-2020  ad Sync with head.
 1.3.26.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.5 22-Mar-2014  jakllsch branches: 1.5.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.4 22-Nov-2009  dsl branches: 1.4.6; 1.4.12;
Align to the destination buffer.
This probably costs 1 clock (on modern cpus) in the normal case.
But gives a big benefit when the destination is misaligned.
In particular when the source has the same misalignment - although
that may not be a gain on Nehalem!
Fixes PR/35535
 1.3 21-Nov-2009  dsl Avoid doing two 'rep movs' operations.
 1.2 12-Nov-2007  ad Don't unconditionally clear the direction flag. The ABI says it must always
be clear when making a function call, and 'cld' takes about 50 clock cycles
on the P4.
 1.1 20-Dec-2005  christos branches: 1.1.18;
Merge libkern + libc common files. As requested by core.
 1.1.18.1 09-Jan-2008  matt sync with HEAD
 1.4.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.5.26.1 21-Apr-2020  martin Sync with HEAD
 1.4 23-Nov-2013  jakllsch Remove x86_64 bzero.S, which since 2009 has contained only a note
that it should be removed in 2010.
 1.3 01-Aug-2009  dsl branches: 1.3.6; 1.3.12;
Remove some long dependent instruction sequences (i.e., allow parallel code).
Since 'rep stos' will have a long setup time, avoid doing it more than once.
For misaligned (start address or length) write an unaligned word at both
ends of the buffer then aligned 'rep stosd' the middle.
Use the same code for bzero().
bzero.S is left being compiled for a while (empty) - to avoid issues with
duplicate symbols in libc.a after update builds.
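
The two-ends-then-middle strategy described above can be sketched in C. This is not the assembly from the commit: sketch_memset is a hypothetical name, an 8-byte word is assumed, and a plain loop of word stores stands in for the single 'rep stos' over the aligned middle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void *
sketch_memset(void *dst, int c, size_t len)
{
	unsigned char *p = dst;
	/* Replicate the fill byte into all 8 bytes of a word. */
	uint64_t pattern = 0x0101010101010101ULL * (unsigned char)c;

	if (len < sizeof(pattern)) {
		/* Too short for the two-ends trick; store bytes. */
		for (size_t i = 0; i < len; i++)
			p[i] = (unsigned char)c;
		return dst;
	}

	/* One unaligned word store at each end of the buffer. */
	memcpy(p, &pattern, sizeof(pattern));
	memcpy(p + len - sizeof(pattern), &pattern, sizeof(pattern));

	/* Aligned word stores over the middle; overlap with the ends is harmless. */
	size_t head = (size_t)(-(uintptr_t)p & 7);	/* bytes up to alignment */
	size_t tail = (size_t)(((uintptr_t)p + len) & 7);	/* bytes past last aligned word */
	unsigned char *q = p + head;
	unsigned char *end = p + len - tail;
	for (; q < end; q += sizeof(pattern))
		memcpy(q, &pattern, sizeof(pattern));

	return dst;
}
```

Because the two end stores already cover any misaligned head and tail bytes, the middle pass never needs a byte-granular prologue or epilogue, which is what lets the bulk fill be issued once.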
 1.2 12-Nov-2007  ad Don't unconditionally clear the direction flag. The ABI says it must always
be clear when making a function call, and 'cld' takes about 50 clock cycles
on the P4.
 1.1 20-Dec-2005  christos branches: 1.1.18;
Merge libkern + libc common files. As requested by core.
 1.1.18.1 09-Jan-2008  matt sync with HEAD
 1.3.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5 22-Mar-2014  jakllsch branches: 1.5.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.4 20-Jul-2009  christos branches: 1.4.6; 1.4.12;
Put back dsl's string changes, but fix memchr.S to use cmp so that the
condition code is set (and fix the comments 0x10->0x01). From Anon Ymous
We need a test for memchr(x, -1)...
 1.3 19-Jul-2009  christos revert changes that made new kernels hang in ACPI detection
 1.2 18-Jul-2009  dsl Remove a pointless _ALIGN_TEXT.
XXX ffs() ought to be a gcc inline asm.
 1.1 20-Dec-2005  christos branches: 1.1.36;
Merge libkern + libc common files. As requested by core.
 1.1.36.1 23-Jul-2009  jym Sync with HEAD.
 1.4.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.5.26.1 21-Apr-2020  martin Sync with HEAD
 1.2 17-Jul-2009  dsl Delete files that are no longer needed.
 1.1 20-Dec-2005  christos branches: 1.1.36;
Merge libkern + libc common files. As requested by core.
 1.1.36.1 23-Jul-2009  jym Sync with HEAD.
 1.6 22-Mar-2014  jakllsch branches: 1.6.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.5 01-Aug-2009  dsl branches: 1.5.6; 1.5.12;
In the misaligned case, xor the read word with the target pattern
before making the unwanted bytes non-zero.
Means that memchr(buf, 0xff) is no longer a special case.
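
A sketch of why the order matters, written in C rather than the assembly of the commit (first_match and the constants are illustrative names, and a little-endian 8-byte word is assumed). XORing with the replicated target byte turns matches into zero bytes; if the unwanted leading bytes were forced to 0xff before that XOR, a search for 0xff would turn them into zeros and report a false match. Forcing them non-zero after the XOR avoids that.

```c
#include <stdint.h>

#define ONES	0x0101010101010101ULL
#define HIGHS	0x8080808080808080ULL

/*
 * Return the index (0 = least significant byte) of the first byte of
 * `word` at or after position `skip` that equals `c`, or -1 if none.
 * `skip` must be in the range 0..7.
 */
static int
first_match(uint64_t word, unsigned char c, unsigned skip)
{
	uint64_t x = word ^ (ONES * c);	/* matching bytes become 0x00 */

	/* Only now make the unwanted low bytes non-zero. */
	if (skip > 0)
		x |= (UINT64_C(1) << (8 * skip)) - 1;

	/* The lowest 0x80 bit left set marks the first zero byte of x. */
	uint64_t hit = (x - ONES) & ~x & HIGHS;
	if (hit == 0)
		return -1;

	int i = 0;
	while ((hit & 0x80) == 0) {
		hit >>= 8;
		i++;
	}
	return i;
}
```

With this ordering, searching an all-0xff word for 0xff with three bytes skipped reports index 3 rather than index 0.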
 1.4 20-Jul-2009  christos Put back dsl's string changes, but fix memchr.S to use cmp so that the
condition code is set (and fix the comments 0x10->0x01). From Anon Ymous
We need a test for memchr(x, -1)...
 1.3 19-Jul-2009  christos revert changes that made new kernels hang in ACPI detection
 1.2 18-Jul-2009  dsl A better memchr().
Always read aligned words, invalidating unwanted bytes in first word,
and checking that any match in the last word is before the buffer end.
No loops apart from the one through the data.
 1.1 20-Dec-2005  christos branches: 1.1.36;
Merge libkern + libc common files. As requested by core.
 1.1.36.1 23-Jul-2009  jym Sync with HEAD.
 1.5.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.6.26.1 21-Apr-2020  martin Sync with HEAD
 1.6 27-Jan-2020  ad x86 uses the C versions of bcmp() and memcmp() now.
 1.5 16-Jan-2020  ad Back out previous, it's broken.
 1.4 15-Jan-2020  ad Rewrite bcmp() & memcmp() to not use REP CMPS. Seems about 5-10x faster for
small strings on modern hardware.
 1.3 22-Mar-2014  jakllsch branches: 1.3.26; 1.3.30;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.2 12-Nov-2007  ad branches: 1.2.28; 1.2.34;
Don't unconditionally clear the direction flag. The ABI says it must always
be clear when making a function call, and 'cld' takes about 50 clock cycles
on the P4.
 1.1 20-Dec-2005  christos branches: 1.1.18;
Merge libkern + libc common files. As requested by core.
 1.1.18.1 09-Jan-2008  matt sync with HEAD
 1.2.34.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.28.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.30.1 29-Feb-2020  ad Sync with head.
 1.3.26.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1 20-Dec-2005  christos branches: 1.1.94;
Merge libkern + libc common files. As requested by core.
 1.1.94.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.1.94.1 21-Apr-2020  martin Sync with HEAD
 1.1 20-Dec-2005  christos branches: 1.1.94;
Merge libkern + libc common files. As requested by core.
 1.1.94.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.1.94.1 21-Apr-2020  martin Sync with HEAD
 1.5 22-May-2014  pooka branches: 1.5.24;
fix build for _KERNEL
 1.4 22-May-2014  uebayasi Put missing END() markers to set ELF symbol size.
 1.3 01-Aug-2009  dsl branches: 1.3.12; 1.3.24;
Remove some long dependent instruction sequences (i.e., allow parallel code).
Since 'rep stos' will have a long setup time, avoid doing it more than once.
For misaligned (start address or length) write an unaligned word at both
ends of the buffer then aligned 'rep stosd' the middle.
Use the same code for bzero().
bzero.S is left being compiled for a while (empty) - to avoid issues with
duplicate symbols in libc.a after update builds.
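The approach described in revision 1.3 above (one unaligned word at each end, then an aligned fill of the middle) can be sketched in portable C. This is only an illustration of the idea, not the committed assembly: the name sketch_memset is hypothetical, 8-byte words are assumed, and a plain loop stands in for 'rep stos'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration; the real routine is x86_64 assembly. */
static void *sketch_memset(void *dst, int c, size_t len)
{
    unsigned char *p = dst;
    /* Replicate the fill byte into all eight bytes of a word. */
    uint64_t pattern = 0x0101010101010101ULL * (unsigned char)c;

    assert(dst != NULL || len == 0);

    /* Short buffers: a word-sized store would overrun, so fill bytewise. */
    if (len < 8) {
        for (size_t i = 0; i < len; i++)
            p[i] = (unsigned char)c;
        return dst;
    }

    /* One (possibly unaligned) word at each end of the buffer. */
    memcpy(p, &pattern, 8);
    memcpy(p + len - 8, &pattern, 8);

    /* Aligned words in the middle; this loop stands in for 'rep stos'. */
    unsigned char *q = p + ((8 - ((uintptr_t)p & 7)) & 7);
    unsigned char *end = p + len - (((uintptr_t)p + len) & 7);
    for (; q < end; q += 8)
        memcpy(q, &pattern, 8);

    return dst;
}
```

The two end stores may overlap the aligned middle. That is harmless, since every store writes the same byte value, and it is what lets the fill run once regardless of alignment.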
 1.2 12-Nov-2007  ad Don't unconditionally clear the direction flag. The ABI says it must always
be clear when making a function call, and 'cld' takes about 50 clock cycles
on the P4.
 1.1 20-Dec-2005  christos branches: 1.1.18;
Merge libkern + libc common files. As requested by core.
 1.1.18.1 09-Jan-2008  matt sync with HEAD
 1.3.24.1 10-Aug-2014  tls Rebase.
 1.3.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.24.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.5.24.1 21-Apr-2020  martin Sync with HEAD
 1.2 17-Jul-2009  dsl Delete files that are no longer needed.
 1.1 20-Dec-2005  christos branches: 1.1.36;
Merge libkern + libc common files. As requested by core.
 1.1.36.1 23-Jul-2009  jym Sync with HEAD.
 1.2 22-Mar-2014  jakllsch branches: 1.2.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.1 20-Dec-2005  christos branches: 1.1.50; 1.1.56;
Merge libkern + libc common files. As requested by core.
 1.1.56.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.50.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.2.26.1 21-Apr-2020  martin Sync with HEAD
 1.7 22-Mar-2014  jakllsch branches: 1.7.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.6 20-Jul-2009  christos branches: 1.6.6; 1.6.12;
Put back dsl's string changes, but fix memchr.S to use cmp (rather
than test) so that the condition code is set correctly (and fix the
comments: 0x10->0x01 and ^->&). From Anon Ymous

XXX: There are similar comment errors in some of the other string code.

XXX: We really need a regression test that includes misaligned memory
with searches designed to catch corner cases such as searching for 0,
-1, etc, and search length limit violations. Searching for 0 on
misaligned memory would have caught this problem.
 1.5 19-Jul-2009  christos revert changes that made new kernels hang in ACPI detection
 1.4 18-Jul-2009  dsl Shorten a dependency chain by using 'sbb, xor' (at a time when carry is set)
instead of 'mov, neg, dec'.
('mov, not' can't be used because it doesn't set the flags.)
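One reading of the 1.4 message above is that both instruction sequences compute the bitwise complement of a register while leaving the flags set. The C sketch below only mirrors the arithmetic; the function names are hypothetical, the actual assembly is not shown in this log, and C has no notion of the carry or condition flags.

```c
#include <assert.h>
#include <stdint.h>

/* 'mov, neg, dec': copy x, negate it, then subtract one: -x - 1 == ~x. */
static uint64_t invert_neg_dec(uint64_t x)
{
    uint64_t r = x;   /* mov */
    r = 0 - r;        /* neg */
    r = r - 1;        /* dec */
    return r;
}

/* 'sbb, xor': 'sbb r,r' with carry set yields all ones; xor with x gives ~x. */
static uint64_t invert_sbb_xor(uint64_t x)
{
    uint64_t r = UINT64_MAX;  /* stands in for 'sbb r,r' when carry is set */
    r ^= x;                   /* xor */
    return r;
}

/* Both sequences should agree with the bitwise complement. */
static void check_invert(uint64_t x)
{
    assert(invert_neg_dec(x) == (uint64_t)~x);
    assert(invert_sbb_xor(x) == (uint64_t)~x);
}
```

In the second form the all-ones value does not depend on x, so only the xor sits on the chain that starts at x. That is presumably the shortening the message refers to.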
 1.3 18-Jul-2009  dsl Replace with a version that:
1) doesn't do byte compares to find which byte matched
2) doesn't do byte compares if any top bits are set
3) doesn't use a loop when the input is misaligned
4) has fewer mispredicted branches
Passes regression tests and 'build.sh' doesn't explode (any more than usual).
 1.2 17-Jul-2009  dsl Change all archs so that strchr.[cS] and strrchr.[cS] exist and generate
duplicate symbols for index() and rindex().
 1.1 20-Dec-2005  christos branches: 1.1.36;
Merge libkern + libc common files. As requested by core.
 1.1.36.1 23-Jul-2009  jym Sync with HEAD.
 1.6.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.7.26.1 21-Apr-2020  martin Sync with HEAD
 1.2 22-Mar-2014  jakllsch branches: 1.2.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.1 20-Dec-2005  christos branches: 1.1.50; 1.1.56;
Merge libkern + libc common files. As requested by core.
 1.1.56.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.50.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.2.26.1 21-Apr-2020  martin Sync with HEAD
 1.2 22-Mar-2014  jakllsch branches: 1.2.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.1 20-Dec-2005  christos branches: 1.1.50; 1.1.56;
Merge libkern + libc common files. As requested by core.
 1.1.56.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.50.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.2.26.1 21-Apr-2020  martin Sync with HEAD
 1.8 30-Mar-2024  andvar s/Westley/Wesley/ in a book reference (in comments).
 1.7 08-Dec-2021  andvar s/efficent/efficient/ in comments.
 1.6 22-Mar-2014  jakllsch branches: 1.6.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.5 12-Jul-2009  dsl branches: 1.5.6; 1.5.12;
Add NetBSD copyright.
Reorder a few instructions to interleave a dependency chain.
(I'm really not sure of the best order for those instructions!)
 1.4 12-Jul-2009  dsl Correct some comments
 1.3 11-Jul-2009  dsl After alg 2 triggers, mask with ~x (alg 3) to ignore bytes with top bit set.
Then use bit scan to work out which byte is zero.
If the source is misaligned read the aligned word and make the unwanted
(low order) bytes non-zero.
Passes regression test - which probably tests just enough cases.
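The word-at-a-time zero-byte search outlined in revision 1.3 above can be sketched in C for a single 64-bit word. This is a sketch under assumptions, not the committed code: the helper names are hypothetical, little-endian byte order is assumed (low-order bytes are the lower addresses), the expression used is the commonly cited (x - 0x01..01) & ~x & 0x80..80 test, and a portable loop stands in for the bit-scan instruction.

```c
#include <assert.h>
#include <stdint.h>

#define ONES 0x0101010101010101ULL
#define HIGH 0x8080808080808080ULL

/*
 * Index (0..7) of the lowest-order zero byte in x, or 8 if there is none.
 * Masking with ~x drops bytes whose top bit is set, so the lowest marker
 * bit that survives belongs to the first zero byte.
 */
static unsigned first_zero_byte(uint64_t x)
{
    uint64_t m = (x - ONES) & ~x & HIGH;
    unsigned i = 0;

    if (m == 0)
        return 8;
    /* Stand-in for a bit scan: find the lowest byte whose marker bit is set. */
    while ((m & 0x80) == 0) {
        m >>= 8;
        i++;
    }
    return i;
}

/*
 * Misaligned start: after reading the aligned word, force the 'skip'
 * unwanted low-order bytes to be non-zero so they cannot match.
 */
static uint64_t mask_low_bytes(uint64_t x, unsigned skip)
{
    assert(skip < 8);
    return x | ((1ULL << (8 * skip)) - 1);
}
```

For example, a word whose two lowest bytes are zero reports index 0. After mask_low_bytes(word, 2) those bytes are ignored and the search reports the next zero byte, if any.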
 1.2 11-Jul-2009  dsl Change comments about algorithms, 99.6% for random data isn't 'rare' in my book!
(The 'rare' case is any byte values 0x80-0xff.)
 1.1 20-Dec-2005  christos branches: 1.1.36;
Merge libkern + libc common files. As requested by core.
 1.1.36.1 23-Jul-2009  jym Sync with HEAD.
 1.5.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.6.26.1 21-Apr-2020  martin Sync with HEAD
 1.3 22-Mar-2014  jakllsch branches: 1.3.26;
For all x86_64 string assembly functions that don't overlap (i.e. every
one except memset and bzero) use END() so that symbol size information
is available.
 1.2 17-Jul-2009  dsl branches: 1.2.6; 1.2.12;
Change all archs so that strchr.[cS] and strrchr.[cS] exist and generate
duplicate symbols for index() and rindex().
 1.1 20-Dec-2005  christos branches: 1.1.36;
Merge libkern + libc common files. As requested by core.
 1.1.36.1 23-Jul-2009  jym Sync with HEAD.
 1.2.12.1 19-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.26.2 21-Apr-2020  martin Ooops, restore accidentally removed files from merge mishap
 1.3.26.1 21-Apr-2020  martin Sync with HEAD
