History log of /src/sys/arch/xen/x86/x86_xpmap.c
Revision  Date  Author  Comments
 1.93  11-May-2024  andvar s/boostrap/bootstrap/ in comment, warning message and documentation.
 1.92  20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.91  11-May-2022  bouyer In bootstrap, after switching to a new page table make sure that
now-unused memory is unmapped.
 1.90  06-Sep-2020  riastradh Fix fallout from previous uvm.h cleanup.

- pmap(9) needs uvm/uvm_extern.h.

- x86/pmap.h is not usable on its own; it is only usable if included
via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).

- Make nvmm.h and nvmm_internal.h standalone.
 1.89  26-May-2020  bouyer Adjust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which causes xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt(), which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grant entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().
 1.88  06-May-2020  bouyer xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.
 1.87  06-May-2020  bouyer KASSERT() that the per-cpu queues are run at IPL_VM after boot.
 1.86  02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.85  30-Oct-2019  maxv Switch to new PTE bits.
 1.84  09-Mar-2019  maxv branches: 1.84.4;
Start replacing the x86 PTE bits.
 1.83  07-Mar-2019  maxv Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.
 1.82  04-Feb-2019  cherry Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Note that API version 0x00030201 is the lowest version we support now.
 1.81  29-Jul-2018  maxv Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.
 1.80  27-Jul-2018  maxv Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.
 1.79  26-Jul-2018  maxv Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.
 1.78  26-Jul-2018  maxv Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.
 1.77  26-Jul-2018  maxv Merge the blocks. No functional change.
 1.76  26-Jul-2018  maxv Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.
 1.75  24-Jun-2018  jdolecek branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls
 1.74  16-Sep-2017  maxv branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.
 1.73  18-Mar-2017  maxv Style, and remove debug code that does not work anyway.
 1.72  08-Mar-2017  maxv A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.
 1.71  02-Feb-2017  maxv Use __read_mostly on these variables, to reduce the probability of false
sharing.
 1.70  22-Jan-2017  maxv Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.
 1.69  06-Jan-2017  maxv branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.
 1.68  16-Dec-2016  maxv The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.
 1.67  15-Nov-2016  maxv Mmh, apparently I didn't properly test my previous change since it does not
compile anymore
 1.66  15-Nov-2016  maxv Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.
 1.65  11-Nov-2016  maxv Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.
 1.64  11-Nov-2016  maxv Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.
 1.63  01-Nov-2016  maxv Map the PTE space as non-executable on PAE. The same is already done on
amd64.
 1.62  01-Nov-2016  maxv Map the remaining pages as non-executable. Only text should have X.
 1.61  25-Aug-2016  bouyer Revert to 1.59 (adding back the W^X kernel mappings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.
 1.60  23-Aug-2016  bouyer Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2
 1.59  11-Aug-2016  maxv Make the I/O area non-executable on Xen.
 1.58  03-Aug-2016  maxv Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.
 1.57  02-Aug-2016  maxv Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.
 1.56  02-Aug-2016  maxv Use PG_RO instead of a magic zero.
 1.55  02-Aug-2016  maxv KNF, and use PAGE_SIZE instead of NBPG.
 1.54  29-May-2016  bouyer branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features supported by the hypervisor for this
guest.
 1.53  06-May-2014  cherry branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.
 1.52  10-Nov-2013  jnemeth branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.
 1.51  08-Nov-2013  christos fix unused variable warnings
 1.50  06-Nov-2013  mrg - move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.
 1.49  16-Sep-2012  rmind branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.
 1.48  21-Aug-2012  bouyer branches: 1.48.2;
Redo the previous change the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.
 1.47  21-Aug-2012  rmind Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).
 1.46  30-Jun-2012  jym Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity
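To make the roles of these three calls concrete, here is a minimal sketch of a guest-side PFN-to-MFN table that takes addresses (not frame numbers) as parameters, as the commit describes. All sk_* names, the 16-page toy table and the separate validity array are assumptions for illustration only; they mirror the roles of xpmap_ptom_map(), xpmap_ptom_unmap(), xpmap_ptom_isvalid() and xpmap_ptom(), not the actual NetBSD implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit address type standing in for paddr_t. */
typedef uint64_t sk_addr_t;

#define SK_PAGE_SHIFT 12
#define SK_PAGE_MASK  (((sk_addr_t)1 << SK_PAGE_SHIFT) - 1)
#define SK_NPAGES     16 /* toy guest with 16 pseudo-physical pages */

/* PFN -> MFN table (the domain-controlled direction), plus a validity
 * flag per PFN. Static storage starts zeroed, so every PFN is unmapped. */
static uint64_t sk_p2m[SK_NPAGES];
static bool sk_p2m_valid[SK_NPAGES];

static uint64_t sk_pfn(sk_addr_t pa)
{
	uint64_t pfn = pa >> SK_PAGE_SHIFT;
	assert(pfn < SK_NPAGES);
	return pfn;
}

/* Role of xpmap_ptom_map(): record the machine page backing pa. */
static void sk_ptom_map(sk_addr_t pa, sk_addr_t ma)
{
	uint64_t pfn = sk_pfn(pa);
	sk_p2m[pfn] = ma >> SK_PAGE_SHIFT;
	sk_p2m_valid[pfn] = true;
}

/* Role of xpmap_ptom_unmap(): invalidate the entry for pa. */
static void sk_ptom_unmap(sk_addr_t pa)
{
	sk_p2m_valid[sk_pfn(pa)] = false;
}

/* Role of xpmap_ptom_isvalid(): is pa currently backed? */
static bool sk_ptom_isvalid(sk_addr_t pa)
{
	return sk_p2m_valid[sk_pfn(pa)];
}

/* Role of xpmap_ptom(): translate a full address, keeping the page
 * offset. The MFN is widened before shifting so it is not truncated. */
static sk_addr_t sk_ptom(sk_addr_t pa)
{
	uint64_t pfn = sk_pfn(pa);
	assert(sk_p2m_valid[pfn]);
	return ((sk_addr_t)sk_p2m[pfn] << SK_PAGE_SHIFT) | (pa & SK_PAGE_MASK);
}
```

Taking addresses rather than frame numbers keeps callers in the same units as bus_dma/bus_space(9); the shift into frame numbers happens in one place.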

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.
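The truncation risk can be shown in a few lines. This sketch assumes an i386 PAE-like setup where the frame number lives in a 32-bit u_long and the machine address in 64 bits; the sk_* names are hypothetical and only illustrate why a ptoa()-style conversion widens before shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for an i386 PAE build: frame numbers held in a
 * 32-bit u_long, machine addresses held in 64 bits. */
typedef uint32_t sk_mfn_t;
typedef uint64_t sk_maddr_t;

#define SK_PAGE_SHIFT 12

static_assert(sizeof(sk_mfn_t) == 4, "frame numbers are 32-bit here");
static_assert(sizeof(sk_maddr_t) == 8, "machine addresses are 64-bit here");

/* Shift first: the shift is done in 32-bit arithmetic, so the high bits
 * of any frame at or above 4 GiB are lost before the value is widened. */
static sk_maddr_t sk_mfn_to_ma_truncating(sk_mfn_t mfn)
{
	return (sk_maddr_t)(mfn << SK_PAGE_SHIFT);
}

/* ptoa()-style: widen to 64 bits first, then shift, so nothing is lost. */
static sk_maddr_t sk_mfn_to_ma(sk_mfn_t mfn)
{
	return (sk_maddr_t)mfn << SK_PAGE_SHIFT;
}
```

For frame 0x123456 (just above 4 GiB), the widened version yields 0x123456000 while the truncating one yields 0x23456000, a different and wrong machine page.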

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.
 1.45  27-Jun-2012  jym Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).
 1.44  06-Jun-2012  rmind Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.
 1.43  20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.42  02-Mar-2012  bouyer MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.
 1.41  24-Feb-2012  cherry (xen) - remove the (*xpq_cpu)() shim.
We hasten the %fs/%gs setup process during boot.
Although this is hacky, it lets us use the non-xen specific
pmap_pte_xxx() functions in pmap code (and others).
 1.40  23-Feb-2012  bouyer On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this include the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause one of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediately or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and take its VA from there.

May fix PR port-xen/38699
 1.39  17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPUs' ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.38  12-Jan-2012  cherry branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath
 1.37  09-Jan-2012  cherry Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()
 1.36  06-Nov-2011  cherry branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.
 1.35  06-Nov-2011  cherry [merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
 1.34  20-Sep-2011  jym branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.33  21-Aug-2011  jym Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!
 1.32  13-Aug-2011  cherry Call the right function
(fix for an egregious error)
 1.31  13-Aug-2011  cherry Add locking around ops to the hypervisor MMU "queue".
 1.30  13-Aug-2011  cherry remove unnecessary locking overhead for UP
 1.29  10-Aug-2011  cherry Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops
 1.28  15-Jun-2011  rmind Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.
 1.27  15-Jun-2011  rmind - cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.
 1.26  08-May-2011  jym branches: 1.26.2;
Print the PGD address in the debug message.
 1.25  29-Mar-2011  jym Typo fix.
 1.24  10-Feb-2011  jym Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for choosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.
 1.23  20-Dec-2010  jym branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...
 1.22  19-Dec-2010  jym Need the successful count (for AMI debugging)
 1.21  24-Jul-2010  jym Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
 1.20  15-Jul-2010  jym With Xen, PDPpaddr should contain a guest physical address (== PFN).
 1.19  26-Feb-2010  jym branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no more needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html
 1.18  12-Feb-2010  jym Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.
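The ordering argument can be sketched with a toy queue: ordinary PTE updates are batched and sent in one mmu_update-style hypercall, while an extended operation (here a TLB flush) has its own hypercall and therefore flushes the pending queue first. Because that flush is built into the extended operation, a caller-side flush afterwards would be redundant. All sk_* names, the queue size and the hypercall stubs (which only count calls) are assumptions for illustration, not the NetBSD or Xen API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued page-table update request. */
struct sk_mmu_update {
	uint64_t ptr; /* machine address of the PTE to write */
	uint64_t val; /* new PTE value */
};

#define SK_QUEUE_SIZE 8

static struct sk_mmu_update sk_queue[SK_QUEUE_SIZE];
static size_t sk_qidx;         /* number of pending updates */
static unsigned sk_hypercalls; /* counts simulated hypercalls */

/* Stub for the batched update hypercall: one call per flush. */
static void sk_hyp_mmu_update(const struct sk_mmu_update *req, size_t n)
{
	(void)req;
	(void)n;
	sk_hypercalls++;
}

/* Stub for an extended-op hypercall; these are never queued. */
static void sk_hyp_mmuext_tlb_flush(void)
{
	sk_hypercalls++;
}

static void sk_flush_queue(void)
{
	if (sk_qidx == 0)
		return;
	sk_hyp_mmu_update(sk_queue, sk_qidx);
	sk_qidx = 0;
}

/* Queue a PTE update; flush automatically when the queue fills up. */
static void sk_queue_pte_update(uint64_t ptr, uint64_t val)
{
	assert(sk_qidx < SK_QUEUE_SIZE);
	sk_queue[sk_qidx].ptr = ptr;
	sk_queue[sk_qidx].val = val;
	if (++sk_qidx == SK_QUEUE_SIZE)
		sk_flush_queue();
}

/* Extended op: flush pending updates so they land before it, then issue
 * its own hypercall. Callers need no extra sk_flush_queue() afterwards. */
static void sk_queue_tlb_flush(void)
{
	sk_flush_queue();
	sk_hyp_mmuext_tlb_flush();
}
```

Two queued updates followed by a TLB flush cost two hypercalls in total (one batch, one extended op), and the updates are guaranteed to reach the hypervisor before the flush.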

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.
 1.17  23-Oct-2009  snj branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).
 1.16  19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.15  29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.14  23-Jul-2009  jym Fix typos in comments and __PRINTKs.
 1.13  20-Jun-2009  cegger sprintf -> snprintf. Wrap long lines.
 1.12  13-Nov-2008  cegger branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.
 1.11  24-Oct-2008  jym branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).
 1.10  21-Oct-2008  cegger introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.
 1.9  05-Sep-2008  tron Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.
 1.8  14-Apr-2008  cegger branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions
 1.7  17-Feb-2008  bouyer branches: 1.7.6;
The console and store page numbers are of type long, so avoid
overflow on i386PAE when converting to machine address. Fix booting
XEN3PAE kernels when xen maps it above 4Gb.
 1.6  23-Jan-2008  bouyer Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handles the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost support native
PAE, what's missing is bootstrap code in locore.S).
 1.5  15-Jan-2008  bouyer Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additional things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functional, and if the loaded kernel is close to the end of
the last L2 slot we lose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.
 1.4  11-Jan-2008  bouyer Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support developement).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.3  23-Nov-2007  bouyer branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.
 1.2  22-Nov-2007  bouyer Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1  21-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.
 1.1.4.2  18-Feb-2008  mjf Sync with HEAD.
 1.1.4.1  08-Dec-2007  mjf Sync with HEAD.
 1.1.2.6  22-Nov-2007  bouyer Disable debug messages
 1.1.2.5  21-Nov-2007  bouyer When HYPERVISOR_mmu_update_self() fails in xpq_flush_queue(), dump the content
of the queue.
 1.1.2.4  19-Nov-2007  bouyer Get rid of arch/xenamd64, step 3: merge xenamd64/amd64/xpmap.c in
xen/x86/x86_xpmap.c
 1.1.2.3  26-Oct-2007  bouyer Make amd64, i386 and xen kernels build and work again.
 1.1.2.2  25-Oct-2007  bouyer Finish sync with HEAD. Especially use the new x86 pmap for xenamd64.
For this:
- rename pmap_pte_set() to pmap_pte_testset()
- make pmap_pte_set() a function or macro for non-atomic PTE write
- define and use pmap_pa2pte()/pmap_pte2pa() to read/write PTE entries
- define pmap_pte_flush() which is a nop in x86 case, and flush the
MMUops queue in the Xen case
 1.1.2.1  21-Oct-2007  bouyer Factorise some Xen pmap code in x86_xpmap.c.
More xpmap_{ptom,mtop} -> xpmap_{ptom,mtop}_masked

The xenamd64 kernel is now good enough to complete a sysinst install from
xennet to xbd.
 1.3.16.3  23-Mar-2008  matt sync with HEAD
 1.3.16.2  09-Jan-2008  matt sync with HEAD
 1.3.16.1  23-Nov-2007  matt file x86_xpmap.c was added on branch matt-armv6 on 2008-01-09 01:50:15 +0000
 1.3.12.14  20-Jan-2008  bouyer Remove debug printk()
 1.3.12.13  19-Jan-2008  bouyer Sync with HEAD
 1.3.12.12  18-Jan-2008  bouyer Fix APDP handling. A XEN i386PAE kernel now boots multiuser
 1.3.12.11  17-Jan-2008  bouyer - Fix L2_SLOT_APTE value (not sure how I got this value but it was definitely
wrong)
- Use global variable for the PAE L3 page addresses, so that pmap.c can get it
from the bootstrap code
- Extend the size of our virtual PDP from 3 to 4 pages, so that pmap->pm_pdir[]
is contiguous for the whole VA range. The last page is a shadow of
the kernel's real PDP (L3[3]).
- make pm_pdirpa an array of 4 paddr_t if using PAE. introduce a
pmap_pdirpa macro to get the physical address of a given PD entry.
- fix pmap_map_pte

The kernel now boots single-user. fsck will cause a kernel fault in
pmap_pdes_invalid() on exit.
 1.3.12.10  15-Jan-2008  bouyer Snapshot of work in progress: an Xen i386PAE kernel boots and start init
on a amd64 dom0, but panics when init forks.
This code needs a lot of cleanup, and the pmap handling is minimal to
allow init to start. It's a proof of concept of how PAE on Xen can work.

For PAE guest, the Xen MMU handling differs in some significant way
from the i386 or amd64 Xen.
The L3 page has only 4 entries, the last one mapping 0xc0000000->0xffffffff
(which happens to be our kernel VM range, that's cool). The L2 page
pointed to by this last entry is handled specially by Xen because it
contains some Xen private mapping, including a recursive mapping. So this
page can only be pointed to by exactly one L3 entry, and nothing else
(it can't be part of a recursive mapping for example). In addition, it
would waste too much VA space to do recursive mapping at the L3 level.

We do pmap switching at the L3 level, instead of doing it through %cr3.
%cr3 is static, as is L3[3], which contains only kernel mappings.
pmap_load() does pmap switching through the first 3 entries of L3.

PTE mapping is done through 4 contiguous L2 entries; the last one pointing
to a shadow of L3[3]. This way we can consider we have a 2-level VM system,
but with the L2 being 4 pages in size instead of one. The plx_i()
macros can be used with it to access the PTE without changes.

This can be reused as is for native PAE support (without the L3[3] shadow
which wouldn't be needed here)
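The "4 contiguous L2 pages" idea above can be illustrated with a small C sketch. It is hypothetical: the helper names are illustrative and are not the NetBSD pl*_i() macros. It assumes standard 32-bit PAE paging with 4 KiB pages, where each L3 entry covers 1 GiB, each L2 entry 2 MiB, and each page holds 512 8-byte entries. Once the four L2 pages sit back to back, a single shift (va >> 21) indexes the combined 2048-entry array, and the page number within that array equals the L3 slot.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for 32-bit PAE paging with 4 KiB pages:
 * 1 GiB per L3 entry, 2 MiB per L2 entry, 4 KiB per L1 (PTE) entry,
 * 512 8-byte entries per page.
 */
#define PAE_L1_SHIFT		12
#define PAE_L2_SHIFT		21
#define PAE_L3_SHIFT		30
#define PAE_NPTE_PER_PAGE	512u

/* Which of the 4 L3 slots covers va (L3[3] covers 0xc0000000-0xffffffff). */
static inline unsigned
pae_l3_index(uint32_t va)
{
	return va >> PAE_L3_SHIFT;
}

/* Index into the 4 L2 pages viewed as one contiguous 2048-entry array. */
static inline unsigned
pae_l2_flat_index(uint32_t va)
{
	return va >> PAE_L2_SHIFT;
}

/* Index inside the single 512-entry L2 page selected by the L3 slot. */
static inline unsigned
pae_l2_page_index(uint32_t va)
{
	return pae_l2_flat_index(va) % PAE_NPTE_PER_PAGE;
}

/* Index into the flat PTE array, as seen through a recursive mapping. */
static inline uint32_t
pae_l1_flat_index(uint32_t va)
{
	return va >> PAE_L1_SHIFT;
}

/* The flat L2 index splits back into (L3 slot, index within that L2 page). */
static inline void
pae_check_flat_l2(uint32_t va)
{
	assert(pae_l2_flat_index(va) / PAE_NPTE_PER_PAGE == pae_l3_index(va));
	assert(pae_l2_flat_index(va) ==
	    pae_l3_index(va) * PAE_NPTE_PER_PAGE + pae_l2_page_index(va));
}
```

For example, va 0xc0200000 falls in L3 slot 3 and entry 1 of that L2 page, which is entry 1537 (3 * 512 + 1) of the flat 2048-entry view.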
 1.3.12.9  13-Jan-2008  bouyer Add i386PAE support for bootstrap.
 1.3.12.8  13-Jan-2008  bouyer Work in progress on xeni386 PAE support:
Make xeni386 build with a 64bit paddr_t. For this vaddr_t vs paddr_t vs
pointers usages had to be clarified.
If 'options PAE' is present in a Xen3 kernel, switch paddr_t, pd_entry_t
and pt_entry_t to 64bits, and add the PAE entry in the __xen_guest ELF section.
 1.3.12.7  11-Jan-2008  bouyer Oops, fix XENPRINTK usage.
 1.3.12.6  11-Jan-2008  bouyer printk -> XENPRINTK
 1.3.12.5  09-Jan-2008  bouyer Merge xen bits to i386/i386/gdt.c. Convert remaining uses of PTE_* macros to
pmap_pte_* macros/inlines.
Fix think-o in pmap.c for native i386.
 1.3.12.4  05-Jan-2008  bouyer Make it build for XEN2_*
 1.3.12.3  15-Dec-2007  bouyer Switch xen/i386 to the x86 xen_pmap_bootstrap().
 1.3.12.2  15-Dec-2007  bouyer Cleanup xen_pmap_bootstrap() and make it build on i386.
 1.3.12.1  11-Dec-2007  bouyer Switch i386 to x86/x86/pmap.c
 1.3.8.5  27-Feb-2008  yamt sync with head.
 1.3.8.4  04-Feb-2008  yamt sync with head.
 1.3.8.3  21-Jan-2008  yamt sync with head
 1.3.8.2  07-Dec-2007  yamt sync with head
 1.3.8.1  23-Nov-2007  yamt file x86_xpmap.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:18 +0000
 1.3.4.2  03-Dec-2007  ad Sync with HEAD.
 1.3.4.1  23-Nov-2007  ad file x86_xpmap.c was added on branch vmlocking on 2007-12-03 19:04:42 +0000
 1.3.2.2  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.3.2.1  23-Nov-2007  joerg file x86_xpmap.c was added on branch jmcneill-pm on 2007-11-27 19:36:22 +0000
 1.7.6.3  17-Jan-2009  mjf Sync with HEAD.
 1.7.6.2  28-Sep-2008  mjf Sync with HEAD.
 1.7.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.8.10.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.8.10.1  19-Oct-2008  haad Sync with HEAD.
 1.8.6.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.8.4.5  11-Aug-2010  yamt sync with head.
 1.8.4.4  11-Mar-2010  yamt sync with head
 1.8.4.3  19-Aug-2009  yamt sync with head.
 1.8.4.2  18-Jul-2009  yamt sync with head.
 1.8.4.1  04-May-2009  yamt sync with head.
 1.11.4.1  24-Feb-2012  sborrill Pull up the following revision(s) (requested by bouyer in ticket #1729):
sys/arch/x86/x86/pmap.c: revision 1.170 via patch
sys/arch/xen/x86/x86_xpmap.c: revision 1.40 via patch

Fix random kernel panic on domains with large memory.
May fix PR port-xen/38699
 1.11.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.12.4.15  27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.12.4.14  02-May-2011  jym Sync with head.
 1.12.4.13  30-Mar-2011  jym Sync with my commits in HEAD.
 1.12.4.12  29-Mar-2011  jym More sync fixes. And add the mbr_gpt files.
 1.12.4.11  28-Mar-2011  jym Cure sync hiccups. Code with compile errors is not really useful, heh.
 1.12.4.10  28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently trashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.12.4.9  10-Jan-2011  jym Sync with HEAD
 1.12.4.8  24-Oct-2010  jym Sync with HEAD
 1.12.4.7  01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.12.4.6  01-Nov-2009  jym Sync with HEAD.
 1.12.4.5  24-Jul-2009  jym - rework the page pinning API, so that now a function is provided for
each level of indirection encountered during virtual memory translations. Update
pmap accordingly. Pinning looks cleaner that way, and it offers the possibility
to pin lower level pages if necessary (NetBSD does not do it currently).

- some fixes and comments to explain how page validation/invalidation take
place during save/restore/migrate under Xen. L2 shadow entries from PAE are now
handled, so basically, suspend/resume works with PAE.

- fixes an issue reported by Christoph (cegger@) for xencons suspend/resume
in dom0.

TODO:

- PAE save/restore is currently limited to single-user only, multi-user
support requires modifications in PAE pmap that should be discussed first. See
the comments about the L2 shadow pages cached in pmap_pdp_cache in this commit.

- grant table bug is still there; do not use the kernels of this branch
to test suspend/resume, unless you want to experience bad crashes in dom0,
and push the big red button.

Now there is light at the end of the tunnel :)

Note: XEN2 kernels will neither build nor work with this branch.
 1.12.4.4  23-Jul-2009  jym Sync with HEAD.
 1.12.4.3  31-May-2009  jym Modifications for the Xen suspend/migrate/resume branch:

- introduce xenbus_device_{suspend,resume}() functions. These are routines
used to suspend/resume MI parts of the Xenbus device interfaces, like updating
frontend/backend devices' paths found in XenStore.

- introduce HYPERVISOR_sysctl(), a hypercall used only by Xentools to obtain
information from hypervisor (listing VMs, printing console, etc.). I use it
to query xenconsole from ddb(), as a last resort in case of a panic() in
dom0 (xm being not available). Currently unused in the branch; could be, if
requested.

- disable the rwlock(9) used to protect code that could use transient MFNs.
It could trigger nasty context switches in places where it should not.

- fix some bugs in the xennet/xbd suspend/resume pmf(9) handlers.

- following XenSource's design, talk_to_otherend() is now called
watch_otherend(), and free_otherend_details() is used by Xenbus device
suspend/resume routines.

- some slight modifications in pmap regarding APDP. Introduce an inline
function (pmap_unmap_apdp_pde()) that clears APDP entry for the current pmap.

- similarly, implement pmap_unmap_all_apdp_pdes() that iterates through all
pmaps and tears down APDP, as Xen does not handle them properly.

TODO/XXX:

- pmap_unmap_apdp_pde() does not handle APDP shadow entry of PAE. It will,
once I figure out how PAE uses it.

- revisit the pmap locking issue regarding transient MFNs. As NetBSD does not
use kernel preemption and MP for Xen, this could be skipped momentarily. See
http://mail-index.netbsd.org/port-xen/2009/04/27/msg004903.html for details.

- fix a bug regarding grant tables which could technically DoS a dom0 if
ridiculously high consumer/producer indexes are passed down in the ring during
a resume.

All in all, once the grant table index issue and APDP PAE are fixed, next step
is to torture test this branch.

Tested under i386 PAE and non-PAE, Xen3 dom0 and domU. amd64 is only compile
tested.
 1.12.4.2  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.12.4.1  09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends in two halves: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no other parts
of the kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontends drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.17.2.2  17-Aug-2010  uebayasi Sync with HEAD.
 1.17.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.19.2.3  31-May-2011  rmind sync with head
 1.19.2.2  21-Apr-2011  rmind sync with head
 1.19.2.1  05-Mar-2011  rmind sync with head
 1.23.4.1  17-Feb-2011  bouyer Sync with HEAD
 1.23.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.26.2.12  21-Oct-2011  bouyer Make this build without 'options MULTIPROCESSOR'
 1.26.2.11  20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.26.2.10  18-Sep-2011  cherry Make the xpq locking per-cpu
 1.26.2.9  09-Sep-2011  cherry fix amd64 boot.
 1.26.2.8  30-Aug-2011  cherry Add per-cpu mmu queues
 1.26.2.7  20-Aug-2011  cherry PAE MP support (preliminary), amd64 per-cpu L4 model redesigned, i386 pmap_pa_start/end fixup
 1.26.2.6  17-Aug-2011  cherry Pullup relevant changes from -current
 1.26.2.5  31-Jul-2011  cherry grow MP support for i386. boots to single user
 1.26.2.4  16-Jul-2011  cherry Introduce a per-cpu "shadow" for pmap_kernel()'s L4 page
 1.26.2.3  27-Jun-2011  cherry Add xpq locking around xpq_queue_tlb_flush()
 1.26.2.2  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.26.2.1  03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.34.2.5  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.34.2.4  30-Oct-2012  yamt sync with head
 1.34.2.3  23-May-2012  yamt sync with head.
 1.34.2.2  17-Apr-2012  yamt sync with head
 1.34.2.1  10-Nov-2011  yamt sync with head
 1.36.4.6  29-Apr-2012  mrg sync to latest -current.
 1.36.4.5  06-Mar-2012  mrg sync to -current
 1.36.4.4  06-Mar-2012  mrg sync to -current
 1.36.4.3  04-Mar-2012  mrg sync to latest -current.
 1.36.4.2  24-Feb-2012  mrg sync to -current.
 1.36.4.1  18-Feb-2012  mrg merge to -current.
 1.38.2.5  12-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #314):
sys/arch/xen/x86/cpu.c: revision 1.92
sys/kern/subr_kcpuset.c: revision 1.6
sys/sys/kcpuset.h: revision 1.6
sys/arch/xen/x86/x86_xpmap.c: revision 1.44
Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.
Tested by chs@.
 1.38.2.4  09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
- i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.38.2.3  05-Mar-2012  sborrill Pull up the following revision(s) (requested by bouyer in ticket #80):
sys/arch/xen/x86/x86_xpmap.c: revision 1.42
sys/arch/x86/include/specialreg.h: revision 1.56
sys/arch/amd64/amd64/machdep.c: revision 1.179
sys/arch/i386/i386/locore.S: revision 1.97
sys/arch/i386/i386/machdep.c: revision 1.723 via patch
sys/arch/x86/include/cpu.h: revision 1.49

Fix possible FPU registers corruption on context switches.
Fix type of pointers passed to some hypercalls.
 1.38.2.2  23-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #39):
sys/arch/x86/x86/pmap.c: revision 1.170
sys/arch/xen/x86/x86_xpmap.c: revision 1.40
On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this includes the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause some of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediately or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and take its VA from there.
May fix PR port-xen/38699
 1.38.2.1  22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or locks.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep, and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, this window can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPUs' ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page; a tlbflush IPI will
happen later. As a side effect, we don't need different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small header reorganisation.
To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
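The ordering requirement in the uvm_km_pgremove_intrsafe() change above (remove the mappings first, then return the pages to the free pool, in batches of at most 16 entries) can be sketched in C as follows. This is a hypothetical illustration, not the actual uvm code: unmap_fn and free_fn stand in for pmap_kremove() and uvm_pagefree(), and the demo_* bookkeeping exists only to check that no page is freed while it still has a mapping, which under Xen could lead the hypervisor to refuse the page when another CPU reuses it as a PDP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX	16	/* "at most 16 entries", per the commit message */
#define DEMO_NPAGES	64	/* size of the toy page pool below */

/* Hypothetical stand-ins for pmap_kremove() and uvm_pagefree(). */
typedef void (*unmap_fn)(size_t page);
typedef void (*free_fn)(size_t page);

/*
 * Release pages [start, end) in batches of at most BATCH_MAX:
 * every page of a batch is unmapped before any page of that batch
 * is returned to the free pool.
 */
static void
release_pages_batched(size_t start, size_t end, unmap_fn unmap,
    free_fn release)
{
	size_t batch[BATCH_MAX];

	while (start < end) {
		size_t n = 0;

		while (n < BATCH_MAX && start < end)
			batch[n++] = start++;
		for (size_t i = 0; i < n; i++)
			unmap(batch[i]);
		/* Assumption: a real kernel would also flush the TLB here. */
		for (size_t i = 0; i < n; i++)
			release(batch[i]);
	}
}

/* Toy bookkeeping used to check the ordering invariant. */
static bool demo_mapped[DEMO_NPAGES];
static size_t demo_freed, demo_pending, demo_max_pending;

static void
demo_unmap(size_t page)
{
	demo_mapped[page] = false;
	if (++demo_pending > demo_max_pending)
		demo_max_pending = demo_pending;
}

static void
demo_free(size_t page)
{
	/* Freeing a page that is still mapped is the bug being avoided. */
	assert(!demo_mapped[page]);
	demo_pending--;
	demo_freed++;
}
```

With 40 pages this runs three batches (16, 16 and 8), so at most 16 pages are ever unmapped but not yet freed at the same time.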
 1.48.2.3  03-Dec-2017  jdolecek update from HEAD
 1.48.2.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.48.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.49.2.1  18-May-2014  rmind sync with head
 1.52.2.1  10-Aug-2014  tls Rebase.
 1.53.4.5  28-Aug-2017  skrll Sync with HEAD
 1.53.4.4  05-Feb-2017  skrll Sync with HEAD
 1.53.4.3  05-Dec-2016  skrll Sync with HEAD
 1.53.4.2  05-Oct-2016  skrll Sync with HEAD
 1.53.4.1  09-Jul-2016  skrll Sync with HEAD
 1.54.2.4  20-Mar-2017  pgoyette Sync with HEAD
 1.54.2.3  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.54.2.2  04-Nov-2016  pgoyette Sync with HEAD
 1.54.2.1  06-Aug-2016  pgoyette Sync with HEAD
 1.69.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.74.2.3  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.74.2.2  28-Jul-2018  pgoyette Sync with HEAD
 1.74.2.1  25-Jun-2018  pgoyette Sync with HEAD
 1.75.2.2  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.75.2.1  10-Jun-2019  christos Sync with HEAD
 1.84.4.2  13-May-2022  martin Pull up following revision(s) (requested by bouyer in ticket #1444):

sys/arch/xen/x86/x86_xpmap.c: revision 1.91

In bootstrap, after switching to a new page table make sure that
now-unused memory is unmapped.
 1.84.4.1  31-May-2020  martin Pull up following revision(s) (requested by bouyer in ticket #935):

sys/arch/xen/x86/x86_xpmap.c: revision 1.89
sys/arch/x86/include/pmap.h: revision 1.121
sys/arch/xen/xen/privcmd.c: revision 1.58
sys/external/mit/xen-include-public/dist/xen/include/public/memory.h: revision 1.2
sys/arch/xen/include/xenpmap.h: revision 1.44
sys/arch/xen/include/xenio.h: revision 1.12
sys/arch/x86/x86/pmap.c: revision 1.394
(all via patch)

Adjust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()

Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which causes xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
Make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt(), which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grant entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().

Implement new ioctls, needed by Xen 4.13:
IOCTL_PRIVCMD_MMAPBATCH_V2
IOCTL_PRIVCMD_MMAP_RESOURCE
IOCTL_GNTDEV_MMAP_GRANT_REF
IOCTL_GNTDEV_ALLOC_GRANT_REF

Always enable declarations needed by privcmd.c
