Home | History | Annotate | Download | only in x86
History log of /src/sys/arch/xen/x86/cpu.c
RevisionDateAuthorComments
 1.145  25-Feb-2023  riastradh xen/x86/cpu.c: Nix trailing whitespace.

No functional change intended.
 1.144  25-Feb-2023  riastradh xen/x86/cpu.c: Membar audit.

I see no reason for store-before-load ordering here; as far as I'm
aware, evtchn_upcall_mask is only shared between a (v)CPU and its
(hypervisor) interrupts, not other (v)CPUs.
 1.143  25-Feb-2023  riastradh x86: Assert kpreempt_disabled() in cpu_load_pmap.

No functional change intended. Just makes it easier to audit
curcpu() usage.
 1.142  20-Aug-2022  riastradh branches: 1.142.4;
x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.141  07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.140  24-Apr-2021  thorpej branches: 1.140.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuiration
situations, making is visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribte)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle trough to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
 1.139  14-Jul-2020  yamaguchi branches: 1.139.4;
Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.138  08-Jul-2020  jdolecek initalize ci_kfpu_spl, to avoid triggering KASSERT() in fpu_kern_enter()

Follows revision 1.177 of sys/arch/x86/x86/cpu.c:
"""
Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
"""

With this DomU kernel boots with:
[ 2.0000571] aes: Intel AES-NI
 1.137  27-Jun-2020  jdolecek avoid excessive stack usage in mp_cpu_start(), this is called after VM
init so kmem(9) can be used
 1.136  21-May-2020  ad - Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.
 1.135  25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.134  21-Apr-2020  ad Follow convention and put entire predicate inside __predict_false()
 1.133  24-Feb-2020  rin branches: 1.133.4;
0x%p --> %p for non-external codes.
 1.132  13-Jan-2020  bouyer Don't call cpu_switchto() before idle_loop(), it should not be needed any more.
While there, assume (and KASSERT) that curlwp == ci->ci_data.cpu_idlelwp,
this saves a lwp_getpcb() call.
 1.131  23-Nov-2019  ad branches: 1.131.2;
cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
 1.130  12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.129  09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.128  02-Feb-2019  cherry Switch NetBSD/xen to use XEN api tag RELEASE-4.11.1

The headers for this api are in sys/external/mit/xen-include-public/dist/
 1.127  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.126  12-Aug-2018  maxv Introduce PDIR_SLOT_USERLIM, which indicates the limit of the user slots.
Use it instead of PDIR_SLOT_PTE when we just want to iterate over the
user slots. Also use it in SVS, I had hardcoded 255 because there was no
proper define (which there now is).
 1.125  27-Jul-2018  maxv Reduce the size of the blocks. No functional change.
 1.124  26-Jul-2018  maxv Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.
 1.123  24-Jul-2018  bouyer Sync cpu_boot_secondary_processors() with x86/x86/cpu.c:
explicitely wait for all CPUs to be registered in kcpuset_running.
 1.122  23-Jun-2018  jdolecek branches: 1.122.2;
re-do the XEN XSAVE support, this time to leave all probe code in
cpu_probe_fpu(), and have XEN cpu_init() just act

the xen probe is now guarded by XEN_USE_XSAVE option and XSAVE
support is thus still disabled by default (same as before), so it
wouldn't interfere with maxv's eager fpu rototil, while still
allowing testing for others

PR kern/50332
 1.121  23-Jun-2018  maxv Revert the rest of jdolecek's changes. This puts us back in a clean,
sensical state.
 1.120  22-Jun-2018  maxv Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
 1.119  20-Jun-2018  jdolecek as a stop-gap, make fpuinit_mxcsr_mask() for native independant of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv
 1.118  19-Jun-2018  jdolecek fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8
 1.117  13-Jan-2018  bouyer branches: 1.117.2;
Needs cpu_init_tss() for application processor too.
 1.116  11-Nov-2017  maxv Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.
 1.115  11-Nov-2017  bouyer Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
 1.114  11-Nov-2017  riastradh No externs in .c files!
 1.113  08-Nov-2017  maxv Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.
 1.112  17-Sep-2017  maxv Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.
 1.111  06-Jul-2017  bouyer gdt_prepframes() is called with a number of pages, don't convert to a number
of pages again. This didn't fail because we're called with only one page, and
the conversion from '1' to pages resulted in 1 again.
 1.110  23-Mar-2017  maxv branches: 1.110.6;
Remove PG_k completely.
 1.109  11-Feb-2017  maxv Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
 1.108  05-Feb-2017  maxv Rename ldt->ldtstore and gdt->gdtstore on i386. It reduces the diff with
amd64, and makes it easier to track down these variables on nxr - 'ldt'
and 'gdt' being common keywords.
 1.107  02-Feb-2017  maxv Use __read_mostly on these variables, to reduce the probability of false
sharing.
 1.106  22-Jan-2017  maxv Import xpmap_pg_nx, and put it in the per-cpu recursive slot on amd64.
 1.105  25-Nov-2016  maxv branches: 1.105.2;
KNF a little
 1.104  07-Jul-2016  msaitoh branches: 1.104.2;
KNF. Remove extra spaces. No functional change.
 1.103  13-Dec-2015  christos need definition
 1.102  13-Dec-2015  christos fix the build.
 1.101  08-Dec-2014  msaitoh Modify around cpu_identify() to not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.
 1.100  27-Nov-2014  bouyer branches: 1.100.2;
Revert sys/arch/x86/x86/pmap.c 1.185; a CPU needs to get pmap updates,
especially for pmap_kernel(), as soon as it is up.
Instead move all pmap-related cpu_info initialisations, including
initializing ci_kpm_mtx, in cpu_attach_common() from cpu_init()
(ci_pmap and ci_tlbstate as already initialized in cpu_attach_common()).
 1.99  18-Oct-2014  snj src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.98  12-Feb-2014  dsl branches: 1.98.4;
Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code thate appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.97  11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.96  26-Jan-2014  dsl Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for teh fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu regsiters if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of teh changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
 1.95  01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.94  23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.93  24-Jun-2012  jym branches: 1.93.2; 1.93.4;
Update comment: we stopped using xcall to sync PTP between CPUs.
pmap_kpm_sync_xcall => xen_kpm_sync
 1.92  06-Jun-2012  rmind Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.
 1.91  20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.90  11-Mar-2012  jym Typo fix.
 1.89  25-Feb-2012  bouyer The code assumes that ci_index is also the Xen's cpunum, and that
cpunum is less than XEN_LEGACY_MAX_VCPUS. KASSERT both.
 1.88  24-Feb-2012  bouyer Don't maintain ci_cpumask for physical CPUs, it's not used.
 1.87  24-Feb-2012  bouyer Get rid of phycpus_attached bitmask; it's maintained but not used and
will limit the number of physical CPUs to 32 without good reasons.
 1.86  24-Feb-2012  cherry (xen) - remove the (*xpq_cpu)() shim.We hasten the %fs/%gs setup process during boot.Although this is hacky, it lets us use the non-xen specificpmap_pte_xxx() functions in pmap code (and others).
 1.85  23-Feb-2012  cherry Cleanup.

- Remove cruft from native x86 origin.
- Remove access to privileged MSRs.
- Cleanup stale comments.
 1.84  23-Feb-2012  cherry cpu_load_pmap() should not be used to load pmap_kernel(), since in the
x86 model, its mappings are shared across pmaps. KASSERT() for this
and remove unused codepaths.
 1.83  22-Feb-2012  bouyer use pmap_protect() instead of pmap_kenter_pa() to remap R/O an exiting
page. This gets rid of the last "mapping already present" warnings.
 1.82  21-Feb-2012  bouyer Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.

Makes LOCKDEBUG kernels boot again
 1.81  17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.80  13-Feb-2012  jym branches: 1.80.2;
PAT flags are not under control of Xen domains currently, so there is no
point in enabling them.

Avoids:
- a warning logged by hypervisor when a domain attempts to modify the PAT
MSR.
- an error during domain resuming, where a PAT flag has been set on a page
while the hypervisor does not allow it.

ok releng@.
 1.79  28-Jan-2012  cherry Update comments to remove references to alternate pte space.
 1.78  28-Jan-2012  cherry stop using alternate pde mapping in xen pmap
 1.77  09-Jan-2012  cherry revert previous commit. DIAGNOSTIC should only do strict checks, not muffle current ones
 1.76  06-Jan-2012  cherry Address those pesky DIAGNOSTIC messages. \n
Take a performance hit at fork() for not DTRT. \n
Note: Only applicable for kernels built with "options DIAGNOSTIC" \n
 1.75  04-Jan-2012  cherry Use macro PDP_SIZE instead of numeric constant, for unshared PAE L3 entries.
Thanks jym@
 1.74  30-Dec-2011  cherry Never cut-paste code from email!
Use the right count (0 -> 2) of l3 unshared userland entries for per-cpu initialisation.
 1.73  30-Dec-2011  cherry Force pae l3 page allocation for new vcpus to be < 4G, so they fit in 32bits
 1.72  30-Dec-2011  cherry per-cpu shadow directory pages should be updated locally via cross-calls. Do this.
 1.71  07-Dec-2011  cegger switch from xen3-public to xen-public.
 1.70  06-Nov-2011  cherry branches: 1.70.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.
 1.69  06-Nov-2011  cherry [merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
 1.68  20-Oct-2011  jruoho branches: 1.68.2;
Remove code that is commented out and out-of-sync with x86. If Xen needs to
use cpu_resume(), cpu_suspend(), or cpu_shutdown() in the future, it is
better to expose these from x86 rather than duplicate code.
 1.67  06-Oct-2011  mrg remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.
 1.66  28-Sep-2011  jruoho Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.
 1.65  20-Sep-2011  jym Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.64  16-Aug-2011  dholland Fix broken build.
 1.63  15-Aug-2011  cherry Do not panic() on xen_send_ipi() sent to a cpu not yet running.
x86 MP boot depends on this strange behaviour.
 1.62  13-Aug-2011  cherry MP probing and startup code
 1.61  11-Aug-2011  cherry Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs
 1.60  16-Jul-2011  rmind Initialise cpus_running to 1 on Xen, as it was done on x86.

Problem analysed by hannken@. Fixes PR/45062.
 1.59  15-Jun-2011  rmind Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.
 1.58  15-Jun-2011  rmind - cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.
 1.57  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.56  26-Feb-2011  jruoho branches: 1.56.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.
 1.55  24-Feb-2011  jruoho Catch up with x86 on cpufeaturebus.
 1.54  24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.53  24-Feb-2011  jruoho Add cpufeaturebus and est(4) for Xen.
 1.52  14-Nov-2010  bouyer branches: 1.52.2; 1.52.4;
Boot vs AP processors don't make sense for physical CPUs, these are
handled by the hypervisor and all CPUs are running when the dom0 is started.
In addition, we don't have a reliable way to determine the boot CPU as
- we may not be running on the boot CPU
- we don't have access to the lapic id
So simplify by ignoring the information and assign phycpu_info_primary to the
first attached CPU.
 1.51  06-Nov-2010  uebayasi Machine dependent code is considered as part of UVM. Include
internal API header.
 1.50  03-Nov-2010  jruoho Fill cpu_info::ci_acpiid also on Xen.
 1.49  20-Aug-2010  jruoho Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
 1.48  09-Aug-2010  jruoho Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attachs, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.
 1.47  24-Jul-2010  jym Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
 1.46  06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.45  28-Jun-2010  rmind mp_cpu_start: although fragment is commented out, add pmap_update(), just
in case somebody would come up with a clever idea to copy-paste that.
 1.44  04-May-2010  jym Enable the NX bit feature for Xen i386pae and amd64 kernels.

Tested with Xen 3.1 and Xen 3.3, dom0 and domU, by bouyer@ and jym@.

Ok bouyer@.
 1.43  18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.42  03-Mar-2010  jym branches: 1.42.2;
Use roundup2() instead of hardcoding the CACHE_LINE_SIZE rounding
operation.
 1.41  24-Feb-2010  dyoung A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
 1.40  08-Jan-2010  dyoung branches: 1.40.2;
Expand PMF_FN_* macros.
 1.39  27-Nov-2009  rmind - Use uvm_lwp_setuarea() instead of directly setting address to lwp_t::l_addr.
- Replace most remaining uses of l_addr with uvm_lwp_getuarea() or lwp_getpcb().
- Amend assembly in ports where it accesses PCB via struct user.
- Rename L_ADDR to L_PCB in few places. Reduce sys/user.h inclusions.
 1.38  24-Nov-2009  cegger Remove X86_MAXPROCS. This fixes PR port-xen/41755.
This also reduces diff to x86/x86/cpu.c as a nice side effect.
'looks good' bouyer@
 1.37  21-Nov-2009  rmind Catch-up Xen and usermode with lwp_getpcb() and unbreak Xen build.
 1.36  07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.35  22-Sep-2009  cegger fix botch with merging in changes from x86/x86/cpu.c:

don't use wbinvd(). Xen flushes the cache for us.
This makes DomU boot again.
Spotted by bouyer@.
 1.34  30-Jul-2009  cegger from x86/x86/cpu.c:
- use atomic operations to set flags
- Align struct cpu_info to 64b.
 1.33  29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.32  08-Jun-2009  cegger from sys/arch/x86/x86/cpu.c:

Implement -1 (RB_MD1) for physical CPUs in the Dom0.
 1.31  23-Dec-2008  cegger branches: 1.31.2;
catch up with x86/x86/cpu.c: move from malloc to kmem
 1.30  06-Nov-2008  cegger Link cpus in the order they are attaching and not in inverse order.
 1.29  31-Oct-2008  rmind - Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.
 1.28  22-Aug-2008  bouyer branches: 1.28.2; 1.28.4;
printf()->aprint_debug_dev() to match x86/cpu.c
 1.27  28-May-2008  ad branches: 1.27.4;
Give it a private X86_MAXPROCS def. XXX
 1.26  16-May-2008  bouyer call x86_cpu_idle_init(), avoid null function pointer call (cpu_idle()) when
scheduling starts.
cleanup printfs of vcpu
 1.25  11-May-2008  ad Fix typo.
 1.24  11-May-2008  ad Don't reload LDTR unless a new value, which only happens for USER_LDT.
 1.23  11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.22  11-May-2008  ad Share cpu.h between the x86 ports.
 1.21  11-May-2008  ad Update xen for identcpu changes.
 1.20  10-May-2008  ad Make xen build after tsc changes.
 1.19  09-May-2008  joerg Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
 1.18  28-Apr-2008  martin branches: 1.18.2;
Remove clause 3 and 4 from TNF licenses
 1.17  24-Apr-2008  cegger branches: 1.17.2;
keep up with x86/x86/cpu.c, rev. 1.33:
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than panicing.
 1.16  21-Apr-2008  cegger Access Xen's vcpu info structure per-CPU.
Tested on i386 and amd64 (both dom0 and domU) by me.
Xen2 tested (both dom0 and domU) by bouyer.
OK bouyer
 1.15  18-Apr-2008  cegger branches: 1.15.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.
 1.14  17-Apr-2008  bouyer Do not set ioapic_bsp_id in cpu_attach_common(). It's already initialized
in cpu_attach(), and doing it here will overwrite the cpu_number of the
physical CPU with the one from the virtual CPU (which is always 0).
XXX is ioapic_bsp_id read somewhere ?
 1.13  17-Apr-2008  yamt cpu_debug_dump: s/curproc/curlwp/ in a message.
 1.12  17-Apr-2008  cegger reduce diff to x86/x86/cpu.c
 1.11  13-Apr-2008  cegger reduce diff to x86/x86/cpu.c
 1.10  13-Apr-2008  cegger - device_t/softc split
- ansify
 1.9  06-Apr-2008  cegger use aprint_*_dev and device_xname
 1.8  16-Jan-2008  dogcow branches: 1.8.6;
cargo-cult copy cpu_offline_md; fixes compile on i386/x86_64
 1.7  11-Jan-2008  bouyer Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Sumary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.6  04-Jan-2008  yamt branches: 1.6.2;
i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.
 1.5  18-Dec-2007  joerg Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
 1.4  12-Dec-2007  bouyer Initialize ci_idepth in cpu_info_primary, makes LOCKDEBUG kernels boot.
 1.3  10-Dec-2007  bouyer branches: 1.3.2;
Make Xen kernels build again.
 1.2  22-Nov-2007  bouyer branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8; 1.2.10;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1  17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file cpu.c was initially added on branch bouyer-xenamd64.
 1.1.4.3  18-Feb-2008  mjf Sync with HEAD.
 1.1.4.2  27-Dec-2007  mjf Sync with HEAD.
 1.1.4.1  08-Dec-2007  mjf Sync with HEAD.
 1.1.2.4  18-Nov-2007  bouyer Ignore MTRRs for now, make kernel build again.
 1.1.2.3  18-Nov-2007  bouyer Sync with HEAD
 1.1.2.2  25-Oct-2007  bouyer Finish sync with HEAD. Especially use the new x86 pmap for xenamd64.
For this:
- rename pmap_pte_set() to pmap_pte_testset()
- make pmap_pte_set() a function or macro for non-atomic PTE write
- define and use pmap_pa2pte()/pmap_pte2pa() to read/write PTE entries
- define pmap_pte_flush() which is a nop in x86 case, and flush the
MMUops queue in the Xen case
 1.1.2.1  17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independant include files.

more work needed here, i386-specific files should got out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.10.2  13-Dec-2007  yamt sync with head.
 1.2.10.1  11-Dec-2007  yamt sync with head.
 1.2.8.3  21-Jan-2008  yamt sync with head
 1.2.8.2  07-Dec-2007  yamt sync with head
 1.2.8.1  22-Nov-2007  yamt file cpu.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:16 +0000
 1.2.6.1  26-Dec-2007  ad Sync with head.
 1.2.4.2  03-Dec-2007  ad Sync with HEAD.
 1.2.4.1  22-Nov-2007  ad file cpu.c was added on branch vmlocking on 2007-12-03 19:04:38 +0000
 1.2.2.2  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1  22-Nov-2007  joerg file cpu.c was added on branch jmcneill-pm on 2007-11-27 19:36:18 +0000
 1.3.2.5  19-Jan-2008  bouyer Sync with HEAD
 1.3.2.4  08-Jan-2008  bouyer Sync with HEAD
 1.3.2.3  06-Jan-2008  bouyer Merge needed changes to genassym.cf and locore.S for xeni386 back to
arch/i386. Switch xeni386 to use the arch/i386 cpu.h.
 1.3.2.2  02-Jan-2008  bouyer Sync with HEAD
 1.3.2.1  13-Dec-2007  bouyer Sync with HEAD
 1.6.2.3  23-Mar-2008  matt sync with HEAD
 1.6.2.2  09-Jan-2008  matt sync with HEAD
 1.6.2.1  04-Jan-2008  matt file cpu.c was added on branch matt-armv6 on 2008-01-09 01:50:13 +0000
 1.8.6.3  17-Jan-2009  mjf Sync with HEAD.
 1.8.6.2  28-Sep-2008  mjf Sync with HEAD.
 1.8.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.15.2.2  04-Jun-2008  yamt sync with head
 1.15.2.1  18-May-2008  yamt sync with head.
 1.17.2.7  09-Oct-2010  yamt sync with head
 1.17.2.6  11-Aug-2010  yamt sync with head.
 1.17.2.5  11-Mar-2010  yamt sync with head
 1.17.2.4  19-Aug-2009  yamt sync with head.
 1.17.2.3  20-Jun-2009  yamt sync with head
 1.17.2.2  04-May-2009  yamt sync with head.
 1.17.2.1  16-May-2008  yamt sync with head.
 1.18.2.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.18.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.27.4.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.27.4.1  19-Oct-2008  haad Sync with HEAD.
 1.28.4.3  22-Nov-2010  riz Pull up following revision(s) (requested by bouyer in ticket #1475):
sys/arch/xen/x86/cpu.c: revision 1.52
Boot vs AP processors don't make sense for physical CPUs, these are
handled by the hypervisor and all CPUs are running when the dom0 is started.
In addition, we don't have a reliable way to determine the boot CPU as
- we may not be running on the boot CPU
- we don't have access to the lapic id
So simplify by ignoring the information and assign phycpu_info_primary to the
first attached CPU.
 1.28.4.2  22-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.28.4.1  13-Nov-2008  snj branches: 1.28.4.1.2; 1.28.4.1.4;
Pull up following revision(s) (requested by rmind in ticket #48):
sys/kern/kern_cpu.c: revision 1.37
sys/arch/x86/x86/cpu.c: revision 1.58
sys/arch/xen/x86/cpu.c: revision 1.29
sys/sys/cpu.h: revision 1.24
sys/kern/sys_sched.c: revision 1.31
- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().
Should fix PR/39349.
 1.28.4.1.4.1  20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.28.4.1.2.1  23-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.28.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.31.2.9  27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.31.2.8  28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.31.2.7  10-Jan-2011  jym Sync with HEAD
 1.31.2.6  24-Oct-2010  jym Sync with HEAD
 1.31.2.5  01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.31.2.4  01-Nov-2009  jym Sync with HEAD.
 1.31.2.3  23-Jul-2009  jym Sync with HEAD.
 1.31.2.2  18-Jun-2009  cegger register physical CPUs with pmf.
No suspend/resume handlers needed since the hypervisor itself handles them.
ok @jym
 1.31.2.1  09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends in two half: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no-other parts
in kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontends drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.40.2.7  09-Nov-2010  uebayasi Sync with HEAD.
 1.40.2.6  09-Nov-2010  uebayasi Sync with HEAD.
 1.40.2.5  06-Nov-2010  uebayasi Sync with HEAD.
 1.40.2.4  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.40.2.3  26-Aug-2010  uebayasi Fix build.
 1.40.2.2  17-Aug-2010  uebayasi Sync with HEAD.
 1.40.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.42.2.4  05-Mar-2011  rmind sync with head
 1.42.2.3  03-Jul-2010  rmind sync with head
 1.42.2.2  31-May-2010  rmind - Split off Xen versions of pmap_map_ptes/pmap_unmap_ptes into Xen pmap,
also move pmap_apte_flush() with pmap_unmap_apdp() there.
- Make Xen buildable.
 1.42.2.1  30-May-2010  rmind sync with head
 1.52.4.1  05-Mar-2011  bouyer Sync with HEAD
 1.52.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.56.2.12  20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.56.2.11  09-Sep-2011  cherry fix amd64 boot.
 1.56.2.10  01-Sep-2011  cherry fix %cr3 init. from mhitch@, tested by riz@ & mhitch@
 1.56.2.9  30-Aug-2011  cherry Add per-cpu mmu queues
 1.56.2.8  26-Aug-2011  cherry Name the L4 per-cpu pointer appropriately.
User cr3 should point to the per-cpu L4, not the user pmap pdir
 1.56.2.7  20-Aug-2011  cherry PAE MP support (preliminary), amd64 per-cpu L4 model redesigned, i386 pmap_pa_start/end fixup
 1.56.2.6  17-Aug-2011  cherry Pullup relevant changes from -current
 1.56.2.5  07-Aug-2011  cherry Fix XEN3PAE_DOMx build
 1.56.2.4  31-Jul-2011  cherry grow MP support for i386. boots to single user
 1.56.2.3  16-Jul-2011  cherry Introduce a per-cpu "shadow" for pmap_kernel()'s L4 page
 1.56.2.2  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.56.2.1  03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source priview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.68.2.5  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.68.2.4  30-Oct-2012  yamt sync with head
 1.68.2.3  23-May-2012  yamt sync with head.
 1.68.2.2  17-Apr-2012  yamt sync with head
 1.68.2.1  10-Nov-2011  yamt sync with head
 1.70.4.5  29-Apr-2012  mrg sync to latest -current.
 1.70.4.4  05-Apr-2012  mrg sync to latest -current.
 1.70.4.3  04-Mar-2012  mrg sync to latest -current.
 1.70.4.2  24-Feb-2012  mrg sync to -current.
 1.70.4.1  18-Feb-2012  mrg merge to -current.
 1.80.2.5  12-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #314):
sys/arch/xen/x86/cpu.c: revision 1.92
sys/kern/subr_kcpuset.c: revision 1.6
sys/sys/kcpuset.h: revision 1.6
sys/arch/xen/x86/x86_xpmap.c: revision 1.44
Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.
Tested by chs@.
 1.80.2.4  09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.80.2.3  24-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #45):
sys/arch/xen/x86/cpu.c: revision 1.87
sys/arch/xen/x86/cpu.c: revision 1.88
Get rid of phycpus_attached bitmask; it's maintained but not used and
will limit the number of physical CPUs to 32 without good reasons.
Don't maintain ci_cpumask for physical CPUs, it's not used.
 1.80.2.2  22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #31):
sys/arch/x86/x86/pmap.c: revision 1.166
sys/arch/xen/x86/cpu.c: revision 1.83
- Make pmap_write_protect() work with pmap_kernel() too ((va & L2_FRAME)
strips the high bits of a LP64 address)
- use pmap_protect() in pmap_pdp_ctor() to remap the PDP read-only instead
of (ab)using pmap_kenter_pa(). No more "mapping already present" on
console with DIAGNOSTIC kernels
- make sure to zero the whole PDP (NTOPLEVEL_PDES doens't include
high-level entries on i386 and i386PAE, reserved by Xen). Not sure
how it has worked before
- remove an always-true test (&& pmap != pmap_kernel(); we KASSERT that
at the function entry).
use pmap_protect() instead of pmap_kenter_pa() to remap R/O an exiting
page. This gets rid of the last "mapping already present" warnings.
 1.80.2.1  22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunably, all addresses dissaemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.93.4.1  18-May-2014  rmind sync with head
 1.93.2.2  03-Dec-2017  jdolecek update from HEAD
 1.93.2.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.98.4.2  07-Mar-2016  msaitoh Pullup the following revision to fix build break caused by ticket #1118:

sys/arch/xen/x86/cpu.c 1.102-1.103

Increase the number of entries of cpu_features from 5 to 7.
 1.98.4.1  04-Aug-2015  snj branches: 1.98.4.1.2;
Pull up following revision(s) (requested by prlw1 in ticket #934):
sys/arch/xen/x86/cpu.c: revision 1.100
Move all pmap-related cpu_info initialisations, including
initializing ci_kpm_mtx, in cpu_attach_common() from cpu_init()
(ci_pmap and ci_tlbstate as already initialized in cpu_attach_common()).
 1.98.4.1.2.1  20-Mar-2018  martin Additionally pull up the following for ticket #1118:

sys/arch/xen/x86/cpu.c 1.102-1.103

to unbreak the build (adjust cpu_feature declaration to changes in generic
x86 code).
 1.100.2.6  28-Aug-2017  skrll Sync with HEAD
 1.100.2.5  05-Feb-2017  skrll Sync with HEAD
 1.100.2.4  05-Dec-2016  skrll Sync with HEAD
 1.100.2.3  09-Jul-2016  skrll Sync with HEAD
 1.100.2.2  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.100.2.1  06-Apr-2015  skrll Sync with HEAD
 1.104.2.3  26-Apr-2017  pgoyette Sync with HEAD
 1.104.2.2  20-Mar-2017  pgoyette Sync with HEAD
 1.104.2.1  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.105.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.110.6.1  13-Mar-2018  martin Pullup the following revisions via patch, requested by maxv in ticket #629:

sys/arch/amd64/amd64/genassym.cf 1.63,1.64
sys/arch/amd64/amd64/locore.S 1.144
sys/arch/amd64/amd64/machdep.c 1.281-1.283
sys/arch/i386/i386/genassym.cf 1.105-1.106
sys/arch/i386/i386/locore.S 1.155
sys/arch/i386/i386/machdep.c 1.802 (adapted),1.803
sys/arch/x86/include/cpu.h 1.85
sys/arch/x86/x86/intr.c 1.115-1.116
sys/arch/x86/x86/pmap.c 1.275
sys/arch/x86/x86/sys_machdep.c 1.45
sys/arch/xen/x86/cpu.c 1.117

Stop sharing the double-fault stack.
Merge the TSS structures into one single cpu_tss structure, and
allocate it dynamically.
 1.117.2.3  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.117.2.2  28-Jul-2018  pgoyette Sync with HEAD
 1.117.2.1  25-Jun-2018  pgoyette Sync with HEAD
 1.122.2.3  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.122.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.122.2.1  10-Jun-2019  christos Sync with HEAD
 1.131.2.2  29-Feb-2020  ad Sync with head.
 1.131.2.1  17-Jan-2020  ad Sync with head.
 1.133.4.1  18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before cpus attaches.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beggining of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
For HVM, ci_cpuid doens't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in in its context, which initialize ci_vcpuid and
ci_vcpu, and setup the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
 1.139.4.1  02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.140.8.1  04-Aug-2021  thorpej Adapt to CFARGS().
 1.142.4.1  31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #268):

sys/arch/xen/xenbus/xenbus_comms.c: revision 1.25
sys/arch/xen/xenbus/xenbus_comms.c: revision 1.26
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.110
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.111
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.112
sys/arch/xen/x86/cpu.c: revision 1.144
sys/arch/xen/x86/cpu.c: revision 1.145
sys/arch/xen/include/hypervisor.h: revision 1.56
sys/arch/xen/include/hypervisor.h: revision 1.57
sys/arch/xen/xen/xbdback_xenbus.c: revision 1.102
sys/arch/xen/xen/xbdback_xenbus.c: revision 1.103
sys/arch/xen/include/xenring.h: revision 1.7
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.109
sys/arch/xen/xen/xengnt.c: revision 1.40
sys/arch/xen/xen/xengnt.c: revision 1.41
sys/arch/xen/xen/if_xennet_xenbus.c: revision 1.129
sys/arch/xen/xen/xencons.c: revision 1.51
sys/arch/xen/xen/xencons.c: revision 1.52
sys/arch/xen/xen/xencons.c: revision 1.53
sys/arch/xen/xen/xbd_xenbus.c: revision 1.130 (patch)
sys/arch/xen/xen/xbd_xenbus.c: revision 1.131 (patch)

xen: Fix sense of xen_rmb/wmb to make sense.

Use membar_acquire and membar_release, not membar_consumer and
membar_producer, out of paranoia -- that better matches Linux's
rmb/wmb (at least for non-I/O loads and stores).

Proposed on port-xen:
https://mail-index.netbsd.org/port-xen/2022/07/13/msg010248.html

xen/x86/cpu.c: Membar audit.

I see no reason for store-before-load ordering here; as far as I'm
aware, evtchn_upcall_mask is only shared between a (v)CPU and its
(hypervisor) interrupts, not other (v)CPUs.

xennet(4): Membar audit.
- xennet_tx_complete: Other side owns rsp_prod, giving us responses
to tx commands. We own rsp_cons, recording which responess we've
processed already.
1. Other side initializes responses before advancing rsp_prod, so
we must observe rsp_prod before trying to examine the responses.
Hence load from rsp_prod must be followed by xen_rmb.
(Can this just use atomic_load_acquire?)
2. As soon as other side observes rsp_event, it may start to
overwrite now-unused response slots, so we must finish using the
response before advancing rsp_cons. Hence we must issue xen_wmb
before store to rsp_event.
(Can this just use atomic_store_release?)
(Should this use RING_FINAL_CHECK_FOR_RESPONSES?)
3. When loop is done and we set rsp_event, we must ensure the other
side has had a chance to see that we want more before we check
whether there is more to consume; otherwise the other side might
not bother to send us an interrupt. Hence after setting
rsp_event, we must issue xen_mb (store-before-load) before
re-checking rsp_prod.
- xennet_handler (rx): Same deal, except the xen_mb is buried in
RING_FINAL_CHECK_FOR_RESPONSES. Unclear why xennet_tx_complete has
this open-coded while xennet_handler (rx) uses the macro.

xbd(4): Membar audit.
After consuming slots, must issue xen_wmb before notifying the other
side that we've consumed them in RING_FINAL_CHECK_FOR_RESPONSES.
xbdback(4): Membar audit.

After consuming request slots, must issue xen_wmb notifying the other
side that we've consumed them in RING_FINAL_CHECK_FOR_REQUESTS.

xencons(4): Membar audit.
- xenconscn_getc: Once we have consumed an input slot, it is clearer
to issue xen_wmb (release, i.e., load/store-before-store) before
advancing in_cons so that the update becomes a store-release
freeing the input slot for the other side to reuse.
- xenconscn_putc: After filling an output slot, must issue xen_wmb
(release, i.e., load/store-before-store) before advancing out_prod,
and another one before notifying the other side of the advance.

xencons(4): Reduce unnecessary membars.
- xencons_handler: After advancing in_cons, only need one xen_wmb
before notifying the hypervisor that we're ready for more.
(XXX Should this do xen_mb and re-check in_prod at that point, or
does hypervisor_notify_via_evtchn obviate the need for this?)
- xenvonscn_getc: After reading in_prod, only need one xen_rmb before
using the slots it is telling us are now ready.

xengnt(4): Membar audit.
This had the sense of membars reversed, presumably because xen_rmb
and xen_wmb had gotten reversed at some point.
xenbus_comms.c: Membar audit.

This had the sense of membars reversed, presumably because xen_rmb
and xen_wmb had gotten reversed at some point.

xennetback(4): Fix xennetback_evthandler loop.
- After observing the other side has produced pending tx requests by
reading sring->req_prod, must issue xen_rmb before touching them.
Despite all the effort to use the heavy-weight
RING_FINAL_CHECK_FOR_REQUESTS on each request in the loop, this
barrier was missing.
- No need to update req_cons at each iteration in the loop. It's
private. Just update it once at the end.
- After consuming requests, must issue xen_wmb before releasing the
slots with RING_FINAL_CHECK_FOR_REQUEST for the other side to
reuse.

xennetback(4): Fix membars in xennetback_rx_copy_process.
- No need for barrier around touching req_cons and rsp_prod_pvt,
which are private.
- RING_PUSH_RESPONSES_AND_CHECK_NOTIFY already issues xen_wmb, no
need to add one explicitly.
- After pushing responses, must issue xen_wmb (not xen_rmb) before
hypervisor_notify_via_evtchn.

xennetback(4): Omit needless membars in xennetback_connect.
xneti is a private data structure to which we have exclusive access
here; ordering the stores doesn't make sense.

xen/hypervisor.h: Nix trailing whitespace.
No functional change intended.

xen/x86/cpu.c: Nix trailing whitespace.
No functional change intended.

xbd(4): Nix trailing whitespace.

xbdback(4): Nix trailing whitespace.
No functional change intended.

xencons(4): Nix trailing whitespace.
No functional change intended.

xengnt(4): Nix trailing whitespace.
No functional change intended.

xenbus_comms.c: Nix trailing whitespace.
No functional change intended.

xennetback(4): Nix trailing whitespace.
No functional change intended.

RSS XML Feed