History log of /src/sys/arch/amd64/amd64/cpufunc.S
Revision  Date  Author  Comments
 1.70  06-Sep-2025  riastradh paravirt_membar_sync(9): New memory barrier.

For use in paravirtualized drivers which require store-before-load
ordering -- irrespective of whether the kernel is built for a single
processor, or whether the (virtual) machine is booted with a single
processor.

This is required even on architectures that don't have a
store-before-load ordering barrier, like m68k; adding, e.g., a virtio
bus is _as if_ the architecture had been extended with relaxed memory
ordering when talking to that new bus. Such architectures need
some way to request the hypervisor enforce that ordering -- on m68k,
that's done by issuing a CASL instruction, which qemu maps to an
atomic r/m/w with sequential consistency ordering in the host.

PR kern/59618: occasional virtio block device lock ups/hangs
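
For illustration, a minimal sketch of the intended usage pattern in a
virtio-style driver; the ring layout, field names and notify_host() hook
below are assumptions, not the actual NetBSD virtio structures:

/*
 * Hypothetical sketch: publish a buffer to a paravirtualized ring and
 * decide whether to notify the host.  The stores that publish the
 * buffer must be ordered before the load of the host's "no notify"
 * flag, even on a uniprocessor guest, hence paravirt_membar_sync()
 * rather than membar_sync().  All names except paravirt_membar_sync()
 * are illustrative.
 */
static void
ring_publish(struct pvring *ring, struct pvdesc *desc)
{
	ring->slots[ring->avail_idx % RING_SIZE] = desc; /* store: publish buffer */
	ring->avail_idx++;                               /* store: advance index */
	paravirt_membar_sync();                          /* store-before-load vs. hypervisor */
	if ((ring->host_flags & HOST_NO_NOTIFY) == 0)    /* load: host's flag */
		notify_host(ring);                       /* hypothetical notify hook */
}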
 1.69  23-May-2025  riastradh x86: Expose dtrace_smap_enable/disable symbols even under XENPV.

They're no-ops in that case, but this enables the dtrace modules to
load and work.

PR port-xen/58373: Missing KDTRACE_HOOKS in Xen kernels
 1.68  16-Jul-2024  riastradh xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
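
As a rough illustration of the ordering primitive being discussed (not the
actual hotpatched assembly in this file), a store-before-load barrier can be
expressed in C as a LOCK'd add of zero to the stack top:

/*
 * Sketch only: a LOCK'd read-modify-write of the top of the stack
 * orders all earlier stores before all later loads, which is the
 * property xen_mb must keep even on uniprocessor boots, and it is
 * cheaper than MFENCE.
 */
static inline void
store_before_load_barrier(void)
{
	__asm volatile("lock; addq $0,(%%rsp)" ::: "memory", "cc");
}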
 1.67  03-Nov-2023  chs branches: 1.67.6;
dtrace: add support for SMAP

Add support in dtrace for SMAP, so that actions like copyinstr() work.
It would be better if dtrace could use the SMAP_* hotpatch macros directly,
but the hotpatching code does not currently operate on kernel modules,
so we'll use some tiny functions in the base kernel for now.
 1.66  04-Oct-2023  ad Eliminate l->l_ncsw and l->l_nivcsw. From memory I think they were added
before we had per-LWP struct rusage; the same is now tracked there.
 1.65  30-Nov-2020  bouyer branches: 1.65.18;
Introduce smap_enable()/smap_disable() functions, to be used from
C code.
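
For illustration, a minimal sketch of what such C-callable wrappers boil
down to on a CPU with SMAP (on CPUs without SMAP the real functions are
effectively no-ops); the _sketch names are assumptions:

/*
 * Sketch: STAC temporarily permits supervisor access to user pages,
 * CLAC forbids it again.  Assumes the assembler and CPU support SMAP.
 */
static inline void
smap_disable_sketch(void)		/* open a window for user-memory access */
{
	__asm volatile("stac" ::: "memory");
}

static inline void
smap_enable_sketch(void)		/* close the window again */
{
	__asm volatile("clac" ::: "memory");
}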
 1.64  19-Jul-2020  maxv branches: 1.64.2;
Revert most of ad's movs/stos change. Instead do something a lot simpler:
declare svs_quad_copy(), used by SVS only, with no need for instrumentation,
because SVS is disabled when sanitizers are on.
 1.63  24-Jun-2020  maxv remove unused x86_stos
 1.62  15-Jun-2020  riastradh Nix trailing whitespace.
 1.61  15-Jun-2020  msaitoh Serialize rdtsc with lfence, mfence or cpuid to read the TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and brought a big improvement,
but there is still room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If the CPU has
no SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc's, so I suspect the difference
in the results comes from that.
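
For illustration, a sketch of the Intel-preferred variant discussed above
(an AMD build would use mfence in place of lfence); the helper name is an
assumption:

#include <stdint.h>

/* Read the TSC with a preceding LFENCE so the read is not reordered
 * with earlier instructions. */
static inline uint64_t
rdtsc_lfence(void)
{
	uint32_t lo, hi;

	__asm volatile("lfence; rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}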
 1.60  13-Jun-2020  ad Print a rate limited warning if the TSC timecounter goes backwards from the
viewpoint of any single LWP.
 1.59  01-Jun-2020  ad Reported-by: syzbot+6dd5a230d19f0cbc7814@syzkaller.appspotmail.com

Instrument STOS/MOVS for KMSAN to unbreak it.
 1.58  27-May-2020  ad - mismatched END pointed out by maxv@
- ditch the frame; the tracer should be able to cope without it in a leaf function on x86_64
 1.57  27-May-2020  ad - Add a couple of wrapper functions around STOS and MOVS and use them to zero
and copy PTEs in preference to memset()/memcpy().

- Remove related SSE / pageidlezero stuff.
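
For illustration, a sketch of what such wrappers amount to; the names and
the quad-count calling convention are assumptions, not the committed
interface:

#include <stddef.h>
#include <stdint.h>

/* Zero/copy "n" 8-byte quads with REP STOSQ / REP MOVSQ. */
static inline void
quad_zero_sketch(void *dst, size_t n)
{
	__asm volatile("rep stosq"
	    : "+D"(dst), "+c"(n)
	    : "a"((uint64_t)0)
	    : "memory");
}

static inline void
quad_copy_sketch(void *dst, const void *src, size_t n)
{
	__asm volatile("rep movsq"
	    : "+D"(dst), "+S"(src), "+c"(n)
	    :
	    : "memory");
}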
 1.56  20-May-2020  maxv this is kmsan
 1.55  20-May-2020  ad Deal with KMSAN fussiness. Pointed out by msaitoh@.
 1.54  19-May-2020  ad xen needs the TSC funcs too
 1.53  19-May-2020  ad Make cpu_counter(), cpu_counter32() and tsc_get_timecount() into a single
preemption-safe routine.
 1.52  08-May-2020  ad cpu_counter: only need to clear %eax (zero extends).
 1.51  02-May-2020  bouyer Move x86_hotpatch() in !XENPV section. Fixes XEN3* builds.
 1.50  02-May-2020  maxv Modify the hotpatch mechanism, in order to make it much less ROP-friendly.

Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.

- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling in a separate x86_hotpatch_cleanup() function,
that gets called after we closed the window.

The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).

With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.

The only ~problem is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However dtrace is only in a module that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
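
For illustration, a hypothetical sketch of the descriptor/link-set shape
described above; the struct layout, field names and the example template
are assumptions, not the exact NetBSD definitions:

#include <sys/cdefs.h>
#include <stdint.h>

struct hotpatch_source {
	const uint8_t	*hps_start;	/* first byte of a read-only template */
	const uint8_t	*hps_end;	/* one past the last byte */
};

struct hotpatch_descriptor {
	uint8_t				 hpd_name;	/* identifies the patch site */
	uint8_t				 hpd_nsrc;	/* number of templates, at most 2 */
	const struct hotpatch_source	*hpd_src[2];	/* the allowed templates */
};

/* Example template: a 5-byte NOP. */
static const uint8_t nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const struct hotpatch_source nop5_src = {
	.hps_start = nop5, .hps_end = nop5 + sizeof(nop5),
};
static const struct hotpatch_descriptor example_desc = {
	.hpd_name = 1, .hpd_nsrc = 1, .hpd_src = { &nop5_src, NULL },
};

/* All descriptors are gathered in a read-only link-set at build time;
 * run-time callers only pass a name and a selector to the patcher. */
__link_set_add_rodata(hotpatch_descriptors, example_desc);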
 1.49  21-Nov-2019  ad mi_userret(): take care of calling preempt(), set spc_curpriority directly,
and remove MD code that does the same.
 1.48  15-Nov-2019  maxv Remove the ins* and outs* functions. Not sanitizer-friendly, and unused
anyway.
 1.47  14-Nov-2019  maxv Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is significant, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
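
For illustration, a hypothetical sketch of packing a type code and a
compressed pointer into one 32-bit orig cell; the bit widths and names are
assumptions, not the committed encoding:

#include <stdint.h>

#define ORIG_TYPE_BITS	4				/* memory type: stack, pool, ... */
#define ORIG_PTR_BITS	(32 - ORIG_TYPE_BITS)		/* compressed pointer */
#define ORIG_PTR_MASK	((1u << ORIG_PTR_BITS) - 1)

static inline uint32_t
orig_encode(uint8_t type, uint32_t compressed_ptr)
{
	return ((uint32_t)type << ORIG_PTR_BITS) | (compressed_ptr & ORIG_PTR_MASK);
}

static inline uint8_t
orig_type(uint32_t cell)
{
	return cell >> ORIG_PTR_BITS;
}

static inline uint32_t
orig_ptr(uint32_t cell)
{
	return cell & ORIG_PTR_MASK;
}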
 1.46  30-Oct-2019  maxv More inlined ASM.
 1.45  07-Sep-2019  maxv Merge amd64func.S into cpufunc.S, and clean up.
 1.44  07-Sep-2019  maxv Convert rdmsr_locked and wrmsr_locked to inlines.
 1.43  05-Jul-2019  maxv More inlines, prerequisites for future changes. Also, remove fngetsw(),
which was a duplicate of fnstsw().
 1.42  03-Jul-2019  maxv Inline x86_cpuid2(), prerequisite for future changes. Also, add "memory"
on certain other inlines, to make sure GCC does not reorder.
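
For illustration, a sketch of such an inline (the _sketch name is an
assumption); the "memory" clobber is what keeps the compiler from
reordering surrounding accesses across the asm:

#include <stdint.h>

static inline void
cpuid2_sketch(uint32_t eax, uint32_t ecx, uint32_t regs[4])
{
	__asm volatile("cpuid"
	    : "=a"(regs[0]), "=b"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
	    : "a"(eax), "c"(ecx)
	    : "memory");
}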
 1.41  29-May-2019  maxv Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.
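
For illustration, a sketch of an INVPCID wrapper of the sort SVS+PCID
relies on; the wrapper name is an assumption, and the descriptor layout
follows the SDM (PCID in the low 12 bits of the first quadword, linear
address in the second):

#include <stdint.h>

/* Invalidation type 0: one linear address in one PCID. */
static inline void
invpcid_one_sketch(uint64_t pcid, void *va)
{
	struct {
		uint64_t pcid;		/* bits 0-11 used, rest must be zero */
		uint64_t addr;		/* linear address to invalidate */
	} desc = { .pcid = pcid & 0xfff, .addr = (uintptr_t)va };
	uint64_t type = 0;

	__asm volatile("invpcid %0, %1" : : "m"(desc), "r"(type) : "memory");
}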
 1.40  19-May-2019  maxv Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.
 1.39  04-May-2019  maxv More inlined ASM. While here switch to proper types.
 1.38  01-May-2019  maxv Start converting the x86 CPU functions to inlined ASM. Matters for NVMM,
where some are invoked millions of times.
 1.37  01-May-2019  maxv Remove unused functions and reorder a little.
 1.36  11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.35  06-Jan-2019  cherry Rollback http://mail-index.netbsd.org/source-changes/2018/12/22/msg101629.html

This change breaks module loading due to weak alias being unsupported
in the kernel module linker.

Requested by maxv@ and others as it affects their work.

No immediate decision on a replacement method is available, but other options
suggested include pre-processing, conditional compilation (#ifdef etc) and other
source level methods to avoid linktime decision making.
 1.34  22-Dec-2018  cherry Introduce a weak alias method of exporting different implementations
of the same API.

For example, the amd64 native implementation of invlpg() now becomes
amd64_invlpg() with a weak symbol export of invlpg(), while the XEN
implementation becomes xen_invlpg(), also weakly exported as invlpg().

Note that linking in both together without having an override function
named invlpg() would be a mistake, as we have limited control over
which of the two options would emerge as the finally exported invlpg(),
potentially leaving the wrong function exported. This change avoids
that situation.

We should however include an override function invlpg() in that case,
such that it is able to then pass on the call to the appropriate
backing function (amd64_invlpg() in the case of native, and
xen_invlpg() in the case of under XEN virtualisation) at runtime.

This change does not introduce such a function and therefore does not
alter builds to include native as well as XEN implementations in the
same binary. This will be done later, with the introduction of XEN
PVHVM mode, where precisely such a runtime switch is required.

There are no operational changes introduced by this change.
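
For illustration, a sketch of the weak-alias export pattern described
above, written as a C function for readability (the commit applied it to
the assembly entry points); __weak_alias is the NetBSD macro, the rest is
a sketch:

#include <sys/cdefs.h>
#include <sys/types.h>

/* The native implementation keeps a distinct strong name... */
void
amd64_invlpg(vaddr_t va)
{
	__asm volatile("invlpg (%0)" : : "r"(va) : "memory");
}

/* ...and exports the generic name only as a weak alias, so a later
 * strong invlpg() (e.g. a runtime native/Xen dispatcher) can override it. */
__weak_alias(invlpg, amd64_invlpg);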
 1.33  21-Jul-2018  maxv More ASLR. Randomize the location of the direct map at boot time on amd64.
This doesn't need "options KASLR" and works on GENERIC. Will soon be
enabled by default.

The location of the areas is abstracted in a slotspace structure. Ideally
we should always use this structure when touching the L4 slots, instead of
the current cocktail of global variables and constants.

machdep initializes the structure with the default values, and we then
randomize its dmap entry. Ideally machdep should randomize everything at
once, but in the case of the direct map its size is determined a little
later in the boot procedure, so we're forced to randomize its location
later too.
 1.32  14-Jul-2018  maxv Drop NENTRY() from the x86 kernels, use ENTRY(). With PMCs (and other hardware
tracing facilities) we have much better ways of monitoring CPU activity
than GPROF, without software modification.

Also I think GPROF has never worked, because the 'start' functions of both
i386 and amd64 use ENTRY(), and it would have caused a function call while the
kernel was not yet relocated.
 1.31  01-Nov-2017  maxv branches: 1.31.2; 1.31.4;
Don't fall through functions, explicitly jump instead. While here don't
call smap_enable twice (harmless), and add END() markers.
 1.30  30-Oct-2017  maxv Always use END() markers when declaring functions in assembly, so that ld
can compute the size of the functions. A few remain.

While here, fix a bug in the INTRSTUB macro: we are falling through
resume_, but it is aligned, so it looks like we're executing the inter-
function padding - which probably happens to contain NOPs, but that's
still bad.
 1.29  15-Oct-2017  maxv Add setds and setes, will be useful in the future.
 1.28  15-Oct-2017  maxv Add setusergs on Xen, and simplify.
 1.27  27-Nov-2016  kamil branches: 1.27.8;
Add accessors for available x86 Debug Registers

There are 8 Debug Registers on i386 (available at least since 80386) and 16
on AMD64. Currently DR4 and DR5 are reserved on both cpu-families and
DR9-DR15 are still reserved on AMD64. Therefore add accessors for DR0-DR3,
DR6-DR7 for all ports.

Debug Registers x86:
* DR0-DR3 Debug Address Registers
* DR4-DR5 Reserved
* DR6 Debug Status Register
* DR7 Debug Control Register
* DR8-DR15 Reserved

Access to the registers is available only from the kernel (ring 0), as
protected access is required. For this reason special XEN functions are
needed to get and set the registers in XEN3 kernels.

XEN specific functions as defined in NetBSD:
- HYPERVISOR_get_debugreg()
- HYPERVISOR_set_debugreg()

This code extends the existing rdr6() and ldr6() accessor for additional:
- rdr0() & ldr0()
- rdr1() & ldr1()
- rdr2() & ldr2()
- rdr3() & ldr3()
- rdr7() & ldr7()

Traditionally accessors for DR6 were passing a vaddr_t argument; while that is
an appropriate type for DR0-DR3, DR6-DR7 should arguably use u_long, but it's
not a big deal. The resulting functionality is equivalent, so stick to this
convention and use the vaddr_t type for all DR accessors.

There was already a function defined for rdr6() in XEN, but it had a nit on
AMD64 as it was casting HYPERVISOR_get_debugreg() to u_int (32-bit on
AMD64), truncating the result. It still works for DR6, but for the sake of
simplicity always return the full 64-bit value.

The new accessors duplicate functionality of the dr0() function available on
i386 within the KSTACK_CHECK_DR0 option. dr0() is a specialized layer with
logic to set appropriate types of interrupts; the new accessors are designed
to pass verbatim values from user-land (with simple sanity checks in the
kernel). At the moment there are no plans to make it possible for
KSTACK_CHECK_DR0 to coexist with debug registers for user applications
(debuggers).

options KSTACK_CHECK_DR0
Detect kernel stack overflow using DR0 register. This option uses DR0
register exclusively so you can't use DR0 register for other purpose
(e.g., hardware breakpoint) if you turn this on.

The KSTACK_CHECK_DR0 functionality was designed for i386 and never ported
to amd64.

Code tested on i386 and amd64 with kernels: GENERIC, XEN3_DOMU, XEN3_DOM0.

Sponsored by <The NetBSD Foundation>
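
For illustration, a sketch of what the native (non-Xen) accessor pair boils
down to; the _sketch names are assumptions, and XEN3 kernels go through
HYPERVISOR_get_debugreg()/HYPERVISOR_set_debugreg() instead:

#include <sys/types.h>

static inline vaddr_t
rdr6_sketch(void)
{
	vaddr_t val;

	__asm volatile("movq %%dr6,%0" : "=r"(val));
	return val;
}

static inline void
ldr6_sketch(vaddr_t val)
{
	__asm volatile("movq %0,%%dr6" : : "r"(val));
}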
 1.26  27-Nov-2016  kamil Fix rdr6() function on amd64

According to the AMD64 SysV ABI the first returned value is passed in RAX,
not in RDI. Actually RDI is used for the first argument passed to a
function.

So far this function was dead code, it will be used for a ptrace(2)
feature to support CPU watchpoints.

The rdr6() function reads state of the DR6 register and returns its value.

Sponsored by <The NetBSD Foundation>
 1.25  12-Feb-2014  dsl branches: 1.25.6; 1.25.10;
Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.24  09-Feb-2014  dsl Best if x86_stmxcsr executes stmxcsr.
 1.23  09-Feb-2014  dsl Add x86_stmxcsr for amd64.
 1.22  08-Dec-2013  dsl Add some definitions for cpu 'extended state'.
These are needed for support of the AVX SIMD instructions.
Nothing yet uses them.
 1.21  24-Sep-2011  jym branches: 1.21.2; 1.21.12; 1.21.16;
White space fix.
 1.20  24-Sep-2011  jym Import rdmsr_safe(msr, *value) for x86 world. It allows reading MSRs
in a safe way by handling the fault that might trigger for certain
register <> CPU/arch combos.

Requested by Jukka. Patch adapted from one found in DragonflyBSD.
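
For illustration, a usage sketch; the helper name is an assumption, and the
return convention (0 on success, non-zero if the read faulted) is assumed
rather than quoted from the commit:

#include <sys/types.h>

/* Probe an MSR that may not exist on this CPU without risking an
 * unhandled fault; return 0 if the read faults. */
static uint64_t
read_msr_or_zero(u_int msr)
{
	uint64_t val;

	if (rdmsr_safe(msr, &val) != 0)
		return 0;
	return val;
}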
 1.19  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.18  22-Feb-2011  joerg branches: 1.18.2;
Be explicit about the member of the fld family wanted here.
 1.17  07-Jul-2010  chs branches: 1.17.2; 1.17.4;
add the guts of TLS support on amd64. based on joerg's patch,
reworked by me to support 32-bit processes as well.
we now keep %fs and %gs loaded with the user values
while in the kernel, which means we don't need to
reload them when returning to user mode.
 1.16  01-Oct-2009  skrll branches: 1.16.2; 1.16.4;
Fix up mwait/monitor now that gas has been fixed.
 1.15  24-Jun-2008  ad branches: 1.15.6; 1.15.10; 1.15.16; 1.15.20;
getss -> x86_getss
 1.14  25-May-2008  chs branches: 1.14.2;
enable profiling of assembly functions, except for x86_pause().
profiling that one causes the system with profiling on to become so slow
that we get spinlock timeouts.
 1.13  11-May-2008  ad Don't reload LDTR unless there is a new value, which only happens for USER_LDT.
 1.12  10-May-2008  ad Take skew into account for cpu_counter().
 1.11  10-May-2008  ad Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.
 1.10  28-Apr-2008  ad branches: 1.10.2;
Add support for kernel preemption to the i386 and amd64 ports. Notes:

- I have seen one isolated panic in the x86 pmap, but otherwise i386
seems stable with preemption enabled.

- amd64 is missing the FPU handling changes and it's not yet safe to
enable it there.

- The usual level for kern.sched.kpreempt_pri will be 128 once enabled
by default. For testing, setting it to 0 helps to shake out bugs.
 1.9  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.8  27-Apr-2008  ad branches: 1.8.2;
+lcr2
 1.7  08-Feb-2008  ad branches: 1.7.6; 1.7.8;
breakpoint: set up a stack frame so not to confuse gdb.
 1.6  01-Jan-2008  yamt add x86_cpuid2, which can specify ecx register.
 1.5  20-Dec-2007  ad - Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.4  06-Dec-2007  ad branches: 1.4.4;
Correct argument shuffling in the string I/O functions.
 1.3  22-Nov-2007  bouyer branches: 1.3.2;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.2  12-Nov-2007  ad Don't unconditionally clear the direction flag. The ABI says it must always
be clear when making a function call, and 'cld' takes about 50 clock cycles
on the P4.
 1.1  26-Sep-2007  ad branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.12; 1.1.14;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.1.14.4  23-Mar-2008  matt sync with HEAD
 1.1.14.3  09-Jan-2008  matt sync with HEAD
 1.1.14.2  06-Nov-2007  matt sync with HEAD
 1.1.14.1  26-Sep-2007  matt file cpufunc.S was added on branch matt-armv6 on 2007-11-06 23:14:01 +0000
 1.1.12.4  18-Feb-2008  mjf Sync with HEAD.
 1.1.12.3  27-Dec-2007  mjf Sync with HEAD.
 1.1.12.2  08-Dec-2007  mjf Sync with HEAD.
 1.1.12.1  19-Nov-2007  mjf Sync with HEAD.
 1.1.10.6  11-Feb-2008  yamt sync with head.
 1.1.10.5  21-Jan-2008  yamt sync with head
 1.1.10.4  07-Dec-2007  yamt sync with head
 1.1.10.3  15-Nov-2007  yamt sync with head.
 1.1.10.2  27-Oct-2007  yamt sync with head.
 1.1.10.1  26-Sep-2007  yamt file cpufunc.S was added on branch yamt-lazymbuf on 2007-10-27 11:25:01 +0000
 1.1.8.2  13-Nov-2007  bouyer Sync with HEAD
 1.1.8.1  17-Oct-2007  bouyer amd64 (aka x86-64) support for Xen. Based on the OpenBSD port done by
Mathieu Ropert in 2006.
DomU-only for now. An INSTALL_XEN3_DOMU kernel with a ramdisk will boot to
sysinst if you're lucky. Often it panics because a runnable LWP has
a NULL stack (really, all of l->l_addr has been zeroed out
while the process was on the queue!)
TODO:
- bug fixes :)
- Most of the xpq_* functions should be shared with xen/i386
- The xen/i386 assembly bootstrap code should be replaced with the C
version in xenamd64/amd64/xpmap.c
- see if a config(5) trick could allow merging xenamd64 back into xen or amd64.
 1.1.6.4  03-Dec-2007  ad Sync with HEAD.
 1.1.6.3  03-Dec-2007  ad Sync with HEAD.
 1.1.6.2  09-Oct-2007  ad Sync with head.
 1.1.6.1  26-Sep-2007  ad file cpufunc.S was added on branch vmlocking on 2007-10-09 13:37:14 +0000
 1.1.4.2  07-Oct-2007  yamt sync with head.
 1.1.4.1  26-Sep-2007  yamt file cpufunc.S was added on branch yamt-x86pmap on 2007-10-07 08:33:20 +0000
 1.1.2.4  09-Dec-2007  jmcneill Sync with HEAD.
 1.1.2.3  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.1.2.2  14-Nov-2007  joerg Sync with HEAD.
 1.1.2.1  26-Sep-2007  joerg file cpufunc.S was added on branch jmcneill-pm on 2007-11-14 19:04:01 +0000
 1.3.2.2  26-Dec-2007  ad Sync with head.
 1.3.2.1  08-Dec-2007  ad Sync with head.
 1.4.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.7.8.2  04-Jun-2008  yamt sync with head
 1.7.8.1  18-May-2008  yamt sync with head.
 1.7.6.2  29-Jun-2008  mjf Sync with HEAD.
 1.7.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.8.2.4  11-Aug-2010  yamt sync with head.
 1.8.2.3  11-Mar-2010  yamt sync with head
 1.8.2.2  04-May-2009  yamt sync with head.
 1.8.2.1  16-May-2008  yamt sync with head.
 1.10.2.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.10.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.14.2.1  27-Jun-2008  simonb Sync with head.
 1.15.20.1  01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.15.16.1  01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.15.10.4  27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.15.10.3  28-Mar-2011  jym Cure sync hiccups. Code with compile errors is not really useful, heh.
 1.15.10.2  28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.15.10.1  01-Nov-2009  jym Sync with HEAD.
 1.15.6.1  01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.16.4.2  17-Mar-2011  rmind - Fix tlbflushg() to behave like tlbflush(), if page global extension (PGE)
is not (yet) enabled. This fixes the issue of stale TLB entries experienced
early during boot, when PGE is not yet set on the primary CPU.
- Rewrite i386/amd64 TLB interrupt handlers in C (only stubs are in assembly),
which simplifies and unifies (under x86) code, plus fixes few bugs.
- cpu_attach: remove assignment to cpus_running, as primary CPU might not be
attached first, which causes reset (and thus missed secondary CPUs).
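
For illustration, a sketch of the fixed behaviour, assuming the usual
NetBSD x86 helpers rcr3()/lcr3()/rcr4()/lcr4() and the CR4_PGE flag; the
_sketch name and header locations are assumptions:

#include <sys/types.h>
#include <machine/cpufunc.h>	/* rcr3(), lcr3(), rcr4(), lcr4() (assumed location) */
#include <machine/specialreg.h>	/* CR4_PGE (assumed location) */

static inline void
tlbflushg_sketch(void)
{
	u_long cr4 = rcr4();

	if ((cr4 & CR4_PGE) == 0) {
		lcr3(rcr3());		/* PGE not enabled: behave like tlbflush() */
		return;
	}
	lcr4(cr4 & ~CR4_PGE);		/* dropping PGE flushes global entries too */
	lcr4(cr4);
}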
 1.16.4.1  05-Mar-2011  rmind sync with head
 1.16.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.17.4.1  05-Mar-2011  bouyer Sync with HEAD
 1.17.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.18.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.21.16.1  18-May-2014  rmind sync with head
 1.21.12.2  03-Dec-2017  jdolecek update from HEAD
 1.21.12.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.21.2.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.25.10.1  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.25.6.1  05-Dec-2016  skrll Sync with HEAD
 1.27.8.1  26-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #575):
sys/arch/amd64/amd64/copy.S: 1.28 via patch
sys/arch/amd64/amd64/cpufunc.S: 1.31
Don't fall through functions, explicitly jump instead.
 1.31.4.3  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.31.4.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.31.4.1  10-Jun-2019  christos Sync with HEAD
 1.31.2.3  18-Jan-2019  pgoyette Synch with HEAD
 1.31.2.2  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.31.2.1  28-Jul-2018  pgoyette Sync with HEAD
 1.64.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.65.18.1  20-Jul-2024  martin Pull up following revision(s) (requested by riastradh in ticket #764):

common/lib/libc/arch/i386/atomic/atomic.S: revision 1.37
sys/arch/xen/include/xenring.h: revision 1.8
sys/arch/i386/i386/cpufunc.S: revision 1.52
sys/arch/amd64/amd64/cpufunc.S: revision 1.68
sys/arch/xen/include/hypervisor.h: revision 1.60
common/lib/libc/arch/x86_64/atomic/atomic.S: revision 1.30

xen: Don't hotpatch away LOCK prefix in xen_mb, even on UP boots.

Both xen_mb and membar_sync are designed to provide store-before-load
ordering, but xen_mb has to provide it in synchronizing guest with
hypervisor, while membar_sync only has to provide it in synchronizing
one (guest) CPU with another (guest) CPU.

It is safe to hotpatch away the LOCK prefix in membar_sync on a
uniprocessor boot because membar_sync is only designed to coordinate
between normal memory on multiple CPUs, and is never necessary when
there's only one CPU involved.

But xen_mb is used to coordinate between the guest and the `device'
implemented by a hypervisor, which might be running on another
_physical_ CPU even if the NetBSD guest only sees one `CPU', i.e.,
one _virtual_ CPU. So even on `uniprocessor' boots, xen_mb must
still issue an instruction with store-before-load ordering on
multiprocessor systems, such as a LOCK ADD (or MFENCE, but MFENCE is
costlier for no benefit here).

No need to change xen_wmb (release ordering, load/store-before-store)
or xen_rmb (acquire ordering, load-before-load/store) because every
x86 store is a store-release and every x86 load is a load-acquire,
even on multiprocessor systems, so there's no hotpatching involved
anyway.

PR kern/57199
 1.67.6.1  02-Aug-2025  perseant Sync with HEAD
