History log of /src/sys/arch/amd64/amd64/cpufunc.S
Revision  Date  Author  Comments
 1.70  06-Sep-2025  riastradh paravirt_membar_sync(9): New memory barrier.

For use in paravirtualized drivers which require store-before-load
ordering -- irrespective of whether the kernel is built for a single
processor, or whether the (virtual) machine is booted with a single
processor.

This is required even on architectures that don't have a
store-before-load ordering barrier, like m68k; adding, e.g., a virtio
bus is _as if_ the architecture had been extended with relaxed memory
ordering when talking to that new bus. Such architectures need
some way to request the hypervisor enforce that ordering -- on m68k,
that's done by issuing a CASL instruction, which qemu maps to an
atomic r/m/w with sequential consistency ordering in the host.

PR kern/59618: occasional virtio block device lock ups/hangs
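
For illustration, a minimal sketch of the intended usage pattern in a
virtio-style driver; the ring layout, field names and notify_host() hook
below are assumptions, not the actual NetBSD virtio structures:

/*
 * Hypothetical sketch: publish a buffer to a paravirtualized ring and
 * decide whether to notify the host.  The stores that publish the
 * buffer must be ordered before the load of the host's "no notify"
 * flag, even on a uniprocessor guest, hence paravirt_membar_sync()
 * rather than membar_sync().  All names except paravirt_membar_sync()
 * are illustrative.
 */
static void
ring_publish(struct pvring *ring, struct pvdesc *desc)
{
	ring->slots[ring->avail_idx % RING_SIZE] = desc; /* store: publish buffer */
	ring->avail_idx++;                               /* store: advance index */
	paravirt_membar_sync();                          /* store-before-load vs. hypervisor */
	if ((ring->host_flags & HOST_NO_NOTIFY) == 0)    /* load: host's flag */
		notify_host(ring);                       /* hypothetical notify hook */
}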
 1.69  23-May-2025  riastradh x86: Expose dtrace_smap_enable/disable symbols even under XENPV.

They're no-ops in that case, but this enables the dtrace modules to
load and work.

PR port-xen/58373: Missing KDTRACE_HOOKS in Xen kernels
 1.68  16-Jul-2024  riastradh xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
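
As a rough illustration of the ordering primitive being discussed (not the
actual hotpatched assembly in this file), a store-before-load barrier can be
expressed in C as a LOCK'd add of zero to the stack top:

/*
 * Sketch only: a LOCK'd read-modify-write of the top of the stack
 * orders all earlier stores before all later loads, which is the
 * property xen_mb must keep even on uniprocessor boots, and it is
 * cheaper than MFENCE.
 */
static inline void
store_before_load_barrier(void)
{
	__asm volatile("lock; addq $0,(%%rsp)" ::: "memory", "cc");
}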
 1.67  03-Nov-2023  chs branches: 1.67.6;
dtrace: add support for SMAP

Add support in dtrace for SMAP, so that actions like copyinstr() work.
It would be better if dtrace could use the SMAP_* hotpatch macros directly,
but the hotpatching code does not currently operate on kernel modules,
so we'll use some tiny functions in the base kernel for now.
 1.66  04-Oct-2023  ad Eliminate l->l_ncsw and l->l_nivcsw. From memory I think they were added
before we had per-LWP struct rusage; the same is now tracked there.
 1.65  30-Nov-2020  bouyer branches: 1.65.18;
Introduce smap_enable()/smap_disable() functions, to be used from
C code.
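
For illustration, a minimal sketch of what such C-callable wrappers boil
down to on a CPU with SMAP (on CPUs without SMAP the real functions are
effectively no-ops); the _sketch names are assumptions:

/*
 * Sketch: STAC temporarily permits supervisor access to user pages,
 * CLAC forbids it again.  Assumes the assembler and CPU support SMAP.
 */
static inline void
smap_disable_sketch(void)		/* open a window for user-memory access */
{
	__asm volatile("stac" ::: "memory");
}

static inline void
smap_enable_sketch(void)		/* close the window again */
{
	__asm volatile("clac" ::: "memory");
}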
 1.64  19-Jul-2020  maxv branches: 1.64.2;
Revert most of ad's movs/stos change. Instead do something a lot simpler:
declare svs_quad_copy(), used by SVS only, with no need for instrumentation,
because SVS is disabled when sanitizers are on.
 1.63  24-Jun-2020  maxv remove unused x86_stos
 1.62  15-Jun-2020  riastradh Nix trailing whitespace.
 1.61  15-Jun-2020  msaitoh Serialize rdtsc with lfence, mfence or cpuid to read the TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and brought a big improvement,
but there is still room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If the CPU has
no SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc's, so I suspect the difference
in the results comes from that.
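
For illustration, a sketch of the Intel-preferred variant discussed above
(an AMD build would use mfence in place of lfence); the helper name is an
assumption:

#include <stdint.h>

/* Read the TSC with a preceding LFENCE so the read is not reordered
 * with earlier instructions. */
static inline uint64_t
rdtsc_lfence(void)
{
	uint32_t lo, hi;

	__asm volatile("lfence; rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}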
 1.60  13-Jun-2020  ad Print a rate limited warning if the TSC timecounter goes backwards from the
viewpoint of any single LWP.
 1.59  01-Jun-2020  ad Reported-by: syzbot+6dd5a230d19f0cbc7814@syzkaller.appspotmail.com

Instrument STOS/MOVS for KMSAN to unbreak it.
 1.58  27-May-2020  ad - mismatched END pointed out by maxv@
- ditch the frame; the tracer should be able to cope without it in a leaf function on x86_64
 1.57  27-May-2020  ad - Add a couple of wrapper functions around STOS and MOVS and use them to zero
and copy PTEs in preference to memset()/memcpy().

- Remove related SSE / pageidlezero stuff.
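
For illustration, a sketch of what such wrappers amount to; the names and
the quad-count calling convention are assumptions, not the committed
interface:

#include <stddef.h>
#include <stdint.h>

/* Zero/copy "n" 8-byte quads with REP STOSQ / REP MOVSQ. */
static inline void
quad_zero_sketch(void *dst, size_t n)
{
	__asm volatile("rep stosq"
	    : "+D"(dst), "+c"(n)
	    : "a"((uint64_t)0)
	    : "memory");
}

static inline void
quad_copy_sketch(void *dst, const void *src, size_t n)
{
	__asm volatile("rep movsq"
	    : "+D"(dst), "+S"(src), "+c"(n)
	    :
	    : "memory");
}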
 1.56  20-May-2020  maxv this is kmsan
 1.55  20-May-2020  ad Deal with KMSAN fussiness. Pointed out by msaitoh@.
 1.54  19-May-2020  ad xen needs the TSC funcs too
 1.53  19-May-2020  ad Make cpu_counter(), cpu_counter32() and tsc_get_timecount() into a single
preemption-safe routine.
 1.52  08-May-2020  ad cpu_counter: only need to clear %eax (zero extends).
 1.51  02-May-2020  bouyer Move x86_hotpatch() in !XENPV section. Fixes XEN3* builds.
 1.50  02-May-2020  maxv Modify the hotpatch mechanism, in order to make it much less ROP-friendly.

Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.

- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling in a separate x86_hotpatch_cleanup() function,
that gets called after we closed the window.

The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).

With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.

The only ~problem is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However dtrace is only in a module that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
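
For illustration, a hypothetical sketch of the descriptor/link-set shape
described above; the struct layout, field names and the example template
are assumptions, not the exact NetBSD definitions:

#include <sys/cdefs.h>
#include <stdint.h>

struct hotpatch_source {
	const uint8_t	*hps_start;	/* first byte of a read-only template */
	const uint8_t	*hps_end;	/* one past the last byte */
};

struct hotpatch_descriptor {
	uint8_t				 hpd_name;	/* identifies the patch site */
	uint8_t				 hpd_nsrc;	/* number of templates, at most 2 */
	const struct hotpatch_source	*hpd_src[2];	/* the allowed templates */
};

/* Example template: a 5-byte NOP. */
static const uint8_t nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const struct hotpatch_source nop5_src = {
	.hps_start = nop5, .hps_end = nop5 + sizeof(nop5),
};
static const struct hotpatch_descriptor example_desc = {
	.hpd_name = 1, .hpd_nsrc = 1, .hpd_src = { &nop5_src, NULL },
};

/* All descriptors are gathered in a read-only link-set at build time;
 * run-time callers only pass a name and a selector to the patcher. */
__link_set_add_rodata(hotpatch_descriptors, example_desc);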
 1.49  21-Nov-2019  ad mi_userret(): take care of calling preempt(), set spc_curpriority directly,
and remove MD code that does the same.
 1.48  15-Nov-2019  maxv Remove the ins* and outs* functions. Not sanitizer-friendly, and unused
anyway.
 1.47  14-Nov-2019  maxv Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is significant, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
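
For illustration, a hypothetical sketch of packing a type code and a
compressed pointer into one 32-bit orig cell; the bit widths and names are
assumptions, not the committed encoding:

#include <stdint.h>

#define ORIG_TYPE_BITS	4				/* memory type: stack, pool, ... */
#define ORIG_PTR_BITS	(32 - ORIG_TYPE_BITS)		/* compressed pointer */
#define ORIG_PTR_MASK	((1u << ORIG_PTR_BITS) - 1)

static inline uint32_t
orig_encode(uint8_t type, uint32_t compressed_ptr)
{
	return ((uint32_t)type << ORIG_PTR_BITS) | (compressed_ptr & ORIG_PTR_MASK);
}

static inline uint8_t
orig_type(uint32_t cell)
{
	return cell >> ORIG_PTR_BITS;
}

static inline uint32_t
orig_ptr(uint32_t cell)
{
	return cell & ORIG_PTR_MASK;
}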
 1.46  30-Oct-2019  maxv More inlined ASM.
 1.45  07-Sep-2019  maxv Merge amd64func.S into cpufunc.S, and clean up.
 1.44  07-Sep-2019  maxv Convert rdmsr_locked and wrmsr_locked to inlines.
 1.43  05-Jul-2019  maxv More inlines, prerequisites for future changes. Also, remove fngetsw(),
which was a duplicate of fnstsw().
 1.42  03-Jul-2019  maxv Inline x86_cpuid2(), prerequisite for future changes. Also, add "memory"
on certain other inlines, to make sure GCC does not reorder.
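
For illustration, a sketch of such an inline (the _sketch name is an
assumption); the "memory" clobber is what keeps the compiler from
reordering surrounding accesses across the asm:

#include <stdint.h>

static inline void
cpuid2_sketch(uint32_t eax, uint32_t ecx, uint32_t regs[4])
{
	__asm volatile("cpuid"
	    : "=a"(regs[0]), "=b"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
	    : "a"(eax), "c"(ecx)
	    : "memory");
}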
 1.41  29-May-2019  maxv Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.
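
For illustration, a sketch of an INVPCID wrapper of the sort SVS+PCID
relies on; the wrapper name is an assumption, and the descriptor layout
follows the SDM (PCID in the low 12 bits of the first quadword, linear
address in the second):

#include <stdint.h>

/* Invalidation type 0: one linear address in one PCID. */
static inline void
invpcid_one_sketch(uint64_t pcid, void *va)
{
	struct {
		uint64_t pcid;		/* bits 0-11 used, rest must be zero */
		uint64_t addr;		/* linear address to invalidate */
	} desc = { .pcid = pcid & 0xfff, .addr = (uintptr_t)va };
	uint64_t type = 0;

	__asm volatile("invpcid %0, %1" : : "m"(desc), "r"(type) : "memory");
}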
 1.40  19-May-2019  maxv Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.
 1.39  04-May-2019  maxv More inlined ASM. While here switch to proper types.
 1.38  01-May-2019  maxv Start converting the x86 CPU functions to inlined ASM. Matters for NVMM,
where some are invoked millions of times.
 1.37  01-May-2019  maxv Remove unused functions and reorder a little.
 1.36  11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.35  06-Jan-2019  cherry Rollback http://mail-index.netbsd.org/source-changes/2018/12/22/msg101629.html

This change breaks module loading due to weak alias being unsupported
in the kernel module linker.

Requested by maxv@ and others as it affects their work.

No immediate decision on a replacement method is available, but other options
suggested include pre-processing, conditional compilation (#ifdef etc) and other
source level methods to avoid linktime decision making.
 1.34  22-Dec-2018  cherry Introduce a weak alias method of exporting different implementations
of the same API.

For example, the amd64 native implementation of invlpg() now becomes
amd64_invlpg() with a weak symbol export of invlpg(), while the XEN
implementation becomes xen_invlpg(), also weakly exported as invlpg().

Note that linking in both together without having an override function
named invlpg() would be a mistake, as we have limited control over
which of the two options would emerge as the finally exported invlpg(),
potentially leaving the wrong function exported. This change avoids
that situation.

We should however include an override function invlpg() in that case,
such that it is able to then pass on the call to the appropriate
backing function (amd64_invlpg() in the case of native, and
xen_invlpg() in the case of under XEN virtualisation) at runtime.

This change does not introduce such a function and therefore does not
alter builds to include native as well as XEN implementations in the
same binary. This will be done later, with the introduction of XEN
PVHVM mode, where precisely such a runtime switch is required.

There are no operational changes introduced by this change.
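
For illustration, a sketch of the weak-alias export pattern described
above, written as a C function for readability (the commit applied it to
the assembly entry points); __weak_alias is the NetBSD macro, the rest is
a sketch:

#include <sys/cdefs.h>
#include <sys/types.h>

/* The native implementation keeps a distinct strong name... */
void
amd64_invlpg(vaddr_t va)
{
	__asm volatile("invlpg (%0)" : : "r"(va) : "memory");
}

/* ...and exports the generic name only as a weak alias, so a later
 * strong invlpg() (e.g. a runtime native/Xen dispatcher) can override it. */
__weak_alias(invlpg, amd64_invlpg);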
 1.33  21-Jul-2018  maxv More ASLR. Randomize the location of the direct map at boot time on amd64.
This doesn't need "options KASLR" and works on GENERIC. Will soon be
enabled by default.

The location of the areas is abstracted in a slotspace structure. Ideally
we should always use this structure when touching the L4 slots, instead of
the current cocktail of global variables and constants.

machdep initializes the structure with the default values, and we then
randomize its dmap entry. Ideally machdep should randomize everything at
once, but in the case of the direct map its size is determined a little
later in the boot procedure, so we're forced to randomize its location
later too.
 1.32  14-Jul-2018  maxv Drop NENTRY() from the x86 kernels, use ENTRY(). With PMCs (and other hardware
tracing facilities) we have much better ways of monitoring CPU activity
than GPROF, without software modification.

Also I think GPROF has never worked, because the 'start' functions of both
i386 and amd64 use ENTRY(), and it would have caused a function call while the
kernel was not yet relocated.
 1.31  01-Nov-2017  maxv branches: 1.31.2; 1.31.4;
Don't fall through functions, explicitly jump instead. While here don't
call smap_enable twice (harmless), and add END() markers.
 1.30  30-Oct-2017  maxv Always use END() markers when declaring functions in assembly, so that ld
can compute the size of the functions. A few remain.

While here, fix a bug in the INTRSTUB macro: we are falling through
resume_, but it is aligned, so it looks like we're executing the inter-
function padding - which probably happens to contain NOPs, but that's
still bad.
 1.29  15-Oct-2017  maxv Add setds and setes, will be useful in the future.
 1.28  15-Oct-2017  maxv Add setusergs on Xen, and simplify.
 1.27  27-Nov-2016  kamil branches: 1.27.8;
Add accessors for available x86 Debug Registers

There are 8 Debug Registers on i386 (available at least since 80386) and 16
on AMD64. Currently DR4 and DR5 are reserved on both cpu-families and
DR9-DR15 are still reserved on AMD64. Therefore add accessors for DR0-DR3,
DR6-DR7 for all ports.

Debug Registers x86:
* DR0-DR3 Debug Address Registers
* DR4-DR5 Reserved
* DR6 Debug Status Register
* DR7 Debug Control Register
* DR8-DR15 Reserved

Access to the registers is available only from the kernel (ring 0), as
protected access is required. For this reason special XEN functions are
needed to get and set the registers in XEN3 kernels.

XEN specific functions as defined in NetBSD:
- HYPERVISOR_get_debugreg()
- HYPERVISOR_set_debugreg()

This code extends the existing rdr6() and ldr6() accessor for additional:
- rdr0() & ldr0()
- rdr1() & ldr1()
- rdr2() & ldr2()
- rdr3() & ldr3()
- rdr7() & ldr7()

Traditionally accessors for DR6 were passing a vaddr_t argument; while that is
an appropriate type for DR0-DR3, DR6-DR7 should arguably use u_long, but it's
not a big deal. The resulting functionality is equivalent, so stick to this
convention and use the vaddr_t type for all DR accessors.

There was already a function defined for rdr6() in XEN, but it had a nit on
AMD64 as it was casting HYPERVISOR_get_debugreg() to u_int (32-bit on
AMD64), truncating the result. It still works for DR6, but for the sake of
simplicity always return the full 64-bit value.

The new accessors duplicate functionality of the dr0() function available on
i386 within the KSTACK_CHECK_DR0 option. dr0() is a specialized layer with
logic to set appropriate types of interrupts; the new accessors are designed
to pass verbatim values from user-land (with simple sanity checks in the
kernel). At the moment there are no plans to make it possible for
KSTACK_CHECK_DR0 to coexist with debug registers for user applications
(debuggers).

options KSTACK_CHECK_DR0
Detect kernel stack overflow using DR0 register. This option uses DR0
register exclusively so you can't use DR0 register for other purpose
(e.g., hardware breakpoint) if you turn this on.

The KSTACK_CHECK_DR0 functionality was designed for i386 and never ported
to amd64.

Code tested on i386 and amd64 with kernels: GENERIC, XEN3_DOMU, XEN3_DOM0.

Sponsored by <The NetBSD Foundation>
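
For illustration, a sketch of what the native (non-Xen) accessor pair boils
down to; the _sketch names are assumptions, and XEN3 kernels go through
HYPERVISOR_get_debugreg()/HYPERVISOR_set_debugreg() instead:

#include <sys/types.h>

static inline vaddr_t
rdr6_sketch(void)
{
	vaddr_t val;

	__asm volatile("movq %%dr6,%0" : "=r"(val));
	return val;
}

static inline void
ldr6_sketch(vaddr_t val)
{
	__asm volatile("movq %0,%%dr6" : : "r"(val));
}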
 1.26  27-Nov-2016  kamil Fix rdr6() function on amd64

According to the AMD64 SysV ABI the first returned value is passed in RAX,
not in RDI. Actually RDI is used for the first argument passed to a
function.

So far this function was dead code, it will be used for a ptrace(2)
feature to support CPU watchpoints.

The rdr6() function reads state of the DR6 register and returns its value.

Sponsored by <The NetBSD Foundation>
 1.25  12-Feb-2014  dsl branches: 1.25.6; 1.25.10;
Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.24  09-Feb-2014  dsl Best if x86_stmxcsr executes stmxcsr.
 1.23  09-Feb-2014  dsl Add x86_stmxcsr for amd64.
 1.22  08-Dec-2013  dsl Add some definitions for cpu 'extended state'.
These are needed for support of the AVX SIMD instructions.
Nothing yet uses them.
 1.21  24-Sep-2011  jym branches: 1.21.2; 1.21.12; 1.21.16;
White space fix.
 1.20  24-Sep-2011  jym Import rdmsr_safe(msr, *value) for x86 world. It allows reading MSRs
in a safe way by handling the fault that might trigger for certain
register <> CPU/arch combos.

Requested by Jukka. Patch adapted from one found in DragonflyBSD.
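
For illustration, a usage sketch; the helper name is an assumption, and the
return convention (0 on success, non-zero if the read faulted) is assumed
rather than quoted from the commit:

#include <sys/types.h>

/* Probe an MSR that may not exist on this CPU without risking an
 * unhandled fault; return 0 if the read faults. */
static uint64_t
read_msr_or_zero(u_int msr)
{
	uint64_t val;

	if (rdmsr_safe(msr, &val) != 0)
		return 0;
	return val;
}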
 1.19  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.18  22-Feb-2011  joerg branches: 1.18.2;
Be explicit about the member of the fld family wanted here.
 1.17  07-Jul-2010  chs branches: 1.17.2; 1.17.4;
add the guts of TLS support on amd64. based on joerg's patch,
reworked by me to support 32-bit processes as well.
we now keep %fs and %gs loaded with the user values
while in the kernel, which means we don't need to
reload them when returning to user mode.
 1.16  01-Oct-2009  skrll branches: 1.16.2; 1.16.4;
Fix up mwait/monitor now that gas has been fixed.
 1.15  24-Jun-2008  ad branches: 1.15.6; 1.15.10; 1.15.16; 1.15.20;
getss -> x86_getss
 1.14  25-May-2008  chs branches: 1.14.2;
enable profiling of assembly functions, except for x86_pause().
profiling that one causes the system with profiling on to become so slow
that we get spinlock timeouts.
 1.13  11-May-2008  ad Don't reload LDTR unless there is a new value, which only happens for USER_LDT.
 1.12  10-May-2008  ad Take skew into account for cpu_counter().
 1.11  10-May-2008  ad Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.
 1.10  28-Apr-2008  ad branches: 1.10.2;
Add support for kernel preemption to the i386 and amd64 ports. Notes:

- I have seen one isolated panic in the x86 pmap, but otherwise i386
seems stable with preemption enabled.

- amd64 is missing the FPU handling changes and it's not yet safe to
enable it there.

- The usual level for kern.sched.kpreempt_pri will be 128 once enabled
by default. For testing, setting it to 0 helps to shake out bugs.
 1.9  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.8  27-Apr-2008  ad branches: 1.8.2;
+lcr2
 1.7  08-Feb-2008  ad branches: 1.7.6; 1.7.8;
breakpoint: set up a stack frame so not to confuse gdb.
 1.6  01-Jan-2008  yamt add x86_cpuid2, which can specify ecx register.
 1.5  20-Dec-2007  ad - Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.4  06-Dec-2007  ad branches: 1.4.4;
Correct argument shuffling in the string I/O functions.
 1.3  22-Nov-2007  bouyer branches: 1.3.2;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.2  12-Nov-2007  ad Don't unconditionally clear the direction flag. The ABI says it must always
be clear when making a function call, and 'cld' takes about 50 clock cycles
on the P4.
 1.1  26-Sep-2007  ad branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.12; 1.1.14;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.1.14.4  23-Mar-2008  matt sync with HEAD
 1.1.14.3  09-Jan-2008  matt sync with HEAD
 1.1.14.2  06-Nov-2007  matt sync with HEAD
 1.1.14.1  26-Sep-2007  matt file cpufunc.S was added on branch matt-armv6 on 2007-11-06 23:14:01 +0000
 1.1.12.4  18-Feb-2008  mjf Sync with HEAD.
 1.1.12.3  27-Dec-2007  mjf Sync with HEAD.
 1.1.12.2  08-Dec-2007  mjf Sync with HEAD.
 1.1.12.1  19-Nov-2007  mjf Sync with HEAD.
 1.1.10.6  11-Feb-2008  yamt sync with head.
 1.1.10.5  21-Jan-2008  yamt sync with head
 1.1.10.4  07-Dec-2007  yamt sync with head
 1.1.10.3  15-Nov-2007  yamt sync with head.
 1.1.10.2  27-Oct-2007  yamt sync with head.
 1.1.10.1  26-Sep-2007  yamt file cpufunc.S was added on branch yamt-lazymbuf on 2007-10-27 11:25:01 +0000
 1.1.8.2  13-Nov-2007  bouyer Sync with HEAD
 1.1.8.1  17-Oct-2007  bouyer amd64 (aka x86-64) support for Xen. Based on the OpenBSD port done by
Mathieu Ropert in 2006.
DomU-only for now. An INSTALL_XEN3_DOMU kernel with a ramdisk will boot to
sysinst if you're lucky. Often it panics because a runnable LWP has
a NULL stack (really, all of l->l_addr has been zeroed out
while the process was on the queue!)
TODO:
- bug fixes :)
- Most of the xpq_* functions should be shared with xen/i386
- The xen/i386 assembly bootstrap code should be replaced with the C
version in xenamd64/amd64/xpmap.c
- see if a config(5) trick could allow merging xenamd64 back into xen or amd64.
 1.1.6.4  03-Dec-2007  ad Sync with HEAD.
 1.1.6.3  03-Dec-2007  ad Sync with HEAD.
 1.1.6.2  09-Oct-2007  ad Sync with head.
 1.1.6.1  26-Sep-2007  ad file cpufunc.S was added on branch vmlocking on 2007-10-09 13:37:14 +0000
 1.1.4.2  07-Oct-2007  yamt sync with head.
 1.1.4.1  26-Sep-2007  yamt file cpufunc.S was added on branch yamt-x86pmap on 2007-10-07 08:33:20 +0000
 1.1.2.4  09-Dec-2007  jmcneill Sync with HEAD.
 1.1.2.3  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.1.2.2  14-Nov-2007  joerg Sync with HEAD.
 1.1.2.1  26-Sep-2007  joerg file cpufunc.S was added on branch jmcneill-pm on 2007-11-14 19:04:01 +0000
 1.3.2.2  26-Dec-2007  ad Sync with head.
 1.3.2.1  08-Dec-2007  ad Sync with head.
 1.4.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.7.8.2  04-Jun-2008  yamt sync with head
 1.7.8.1  18-May-2008  yamt sync with head.
 1.7.6.2  29-Jun-2008  mjf Sync with HEAD.
 1.7.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.8.2.4  11-Aug-2010  yamt sync with head.
 1.8.2.3  11-Mar-2010  yamt sync with head
 1.8.2.2  04-May-2009  yamt sync with head.
 1.8.2.1  16-May-2008  yamt sync with head.
 1.10.2.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.10.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.14.2.1  27-Jun-2008  simonb Sync with head.
 1.15.20.1  01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.15.16.1  01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.15.10.4  27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.15.10.3  28-Mar-2011  jym Cure sync hiccups. Code with compile errors is not really useful, heh.
 1.15.10.2  28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.15.10.1  01-Nov-2009  jym Sync with HEAD.
 1.15.6.1  01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.16.4.2  17-Mar-2011  rmind - Fix tlbflushg() to behave like tlbflush(), if page global extension (PGE)
is not (yet) enabled. This fixes the issue of stale TLB entries experienced
early during boot, when PGE is not yet set on the primary CPU.
- Rewrite i386/amd64 TLB interrupt handlers in C (only stubs are in assembly),
which simplifies and unifies (under x86) code, plus fixes few bugs.
- cpu_attach: remove assignment to cpus_running, as primary CPU might not be
attached first, which causes reset (and thus missed secondary CPUs).
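
For illustration, a sketch of the fixed behaviour, assuming the usual
NetBSD x86 helpers rcr3()/lcr3()/rcr4()/lcr4() and the CR4_PGE flag; the
_sketch name and header locations are assumptions:

#include <sys/types.h>
#include <machine/cpufunc.h>	/* rcr3(), lcr3(), rcr4(), lcr4() (assumed location) */
#include <machine/specialreg.h>	/* CR4_PGE (assumed location) */

static inline void
tlbflushg_sketch(void)
{
	u_long cr4 = rcr4();

	if ((cr4 & CR4_PGE) == 0) {
		lcr3(rcr3());		/* PGE not enabled: behave like tlbflush() */
		return;
	}
	lcr4(cr4 & ~CR4_PGE);		/* dropping PGE flushes global entries too */
	lcr4(cr4);
}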
 1.16.4.1  05-Mar-2011  rmind sync with head
 1.16.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.17.4.1  05-Mar-2011  bouyer Sync with HEAD
 1.17.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.18.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.21.16.1  18-May-2014  rmind sync with head
 1.21.12.2  03-Dec-2017  jdolecek update from HEAD
 1.21.12.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.21.2.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.25.10.1  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.25.6.1  05-Dec-2016  skrll Sync with HEAD
 1.27.8.1  26-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #575):
sys/arch/amd64/amd64/copy.S: 1.28 via patch
sys/arch/amd64/amd64/cpufunc.S: 1.31
Don't fall through functions, explicitly jump instead.
 1.31.4.3  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.31.4.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.31.4.1  10-Jun-2019  christos Sync with HEAD
 1.31.2.3  18-Jan-2019  pgoyette Synch with HEAD
 1.31.2.2  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.31.2.1  28-Jul-2018  pgoyette Sync with HEAD
 1.64.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.65.18.1  20-Jul-2024  martin Pull up following revision(s) (requested by riastradh in ticket #764):

common/lib/libc/arch/i386/atomic/atomic.S: revision 1.37
sys/arch/xen/include/xenring.h: revision 1.8
sys/arch/i386/i386/cpufunc.S: revision 1.52
sys/arch/amd64/amd64/cpufunc.S: revision 1.68
sys/arch/xen/include/hypervisor.h: revision 1.60
common/lib/libc/arch/x86_64/atomic/atomic.S: revision 1.30

xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
 1.67.6.1  02-Aug-2025  perseant Sync with HEAD
