History log of /src/sys/arch/xen/x86
Revision  Date  Author  Comments
 1.26 17-Oct-2023  bouyer Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.
Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen.
When running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.
x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console
xen/x86/consinit.c: support genfb as possible console
xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.
xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.
 1.25 02-May-2020  bouyer branches: 1.25.20;
Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
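
The delay_func() indirection described in the 1.25 entry can be pictured as a function pointer that defaults to the native routine and is redirected when the PVH entry point runs. The following is only a minimal sketch under that assumption: every name prefixed with sketch_ is hypothetical, the call counters stand in for real busy-waiting, and it is not the code from the commit.

```c
/* Hypothetical stand-ins: the real x86_delay() and xen_delay() busy-wait. */
static unsigned int sketch_native_calls;
static unsigned int sketch_xen_calls;

static void sketch_x86_delay(unsigned int usec)
{
	(void)usec;
	sketch_native_calls++;
}

static void sketch_xen_delay(unsigned int usec)
{
	(void)usec;
	sketch_xen_calls++;
}

/* Defaults to the native routine, as described for native/HVM boots. */
static void (*sketch_delay_func)(unsigned int) = sketch_x86_delay;

/* Assumed to be called once from a PVH-specific entry path. */
static void sketch_pvh_entry(void)
{
	sketch_delay_func = sketch_xen_delay;
}
```

Callers only ever invoke the pointer, so code shared between native and PVH boots does not need to know which implementation is active.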
 1.24 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.23 24-May-2019  nonaka branches: 1.23.8;
Added drivers for Hyper-V Synthetic Keyboard and Video device.
 1.22 28-Jan-2019  bad Sprinkle DPRINTF #ifdef DEBUG_GEOM and set booted_method like arch/x86/x86/x86_autoconf.c

As discussed 1 week ago on port-xen.
 1.21 22-Dec-2018  cherry This change modifies the mainbus(4) entry point for all x86 sub-archs
in the following way:

i) It provides a unified entry point in
x86/x86/mainbus.c:mainbus_attach()
ii) It carves out the preliminary bus attachment sequence that is
common to all sub-archs into
x86/x86/mainbus.c: x86_cpubus_attach()
iii) It consolidates the remaining pathways as internal callee
functions so that these may be called piecemeal if required. A
special use case of this is XEN PVHVM which may need to call the
native configure path, the xen configure path, or both.
iv) It moves the driver private data structures from
i386/i386_mainbus.c to an x86/ level one. This allows for other
sub-archs to do similar, if needed. (They do not at the moment).
v) For dom0 kernels, it enables 'acpi0 at mainbus?' and
'acpi0 at hypervisorbus'. This serves two purposes:
a) To demonstrate the possibility of dynamic configuration tree
traversal ordering changes.
b) To allow for the common acpi_check(self, "acpibus") call in
x86/mainbus.c to not barf when it is called from the dom0 attach
path. We allow for the acpi0 device to be a child of mainbus with
the changes to amd64/conf/XEN3_DOM0 and i386/conf/XEN3PAE_DOM0
without actually probing further in the code. This path will later
be pursued in a PVHVM boot codepath.

There should be no operative changes with this change. If there are,
please complain loudly.
 1.20 07-Oct-2018  mlelstv Support bootspec.
 1.19 29-Jul-2017  maxv branches: 1.19.2; 1.19.4;
Remove the remaining parts of compat_oldboot.
 1.18 23-May-2017  nonaka branches: 1.18.2;
x86: hypervisor detection from FreeBSD for x2APIC support.
 1.17 03-Apr-2014  christos branches: 1.17.6;
Change findroot() to cpu_bootconf() since this is what it does. Remove bogus
comment.
 1.16 03-Oct-2012  dsl branches: 1.16.2;
Remove all references to KVM86.
It was only ever used by APMBIOS - and then only if an option was selected.
Probably didn't work well at all!
 1.15 29-Jul-2012  mlelstv branches: 1.15.2;
Do not call setroot() from MD code and from MI code, which has
unwanted side effects in the RB_ASKNAME case. This fixes PR/46732.

No longer wrap MD cpu_rootconf(), as hp300 port stores reboot information
as a side effect. Instead call MI rootconf() from MD code which makes
rootconf() now a wrapper to setroot().

Adjust several MD routines to set the global booted_device, booted_partition
variables instead of passing partial information to setroot().

Make cpu_rootconf(9) describe the calling order.
 1.14 10-Jun-2012  mlelstv Make detection of root on wedges (dk(4)) machine independent. Remove
MD code for x86, xen, sparc64.
 1.13 27-Nov-2009  rmind branches: 1.13.12; 1.13.18;
- Use uvm_lwp_setuarea() instead of directly setting address to lwp_t::l_addr.
- Replace most remaining uses of l_addr with uvm_lwp_getuarea() or lwp_getpcb().
- Amend assembly in ports where it accesses PCB via struct user.
- Rename L_ADDR to L_PCB in few places. Reduce sys/user.h inclusions.
 1.12 21-Nov-2009  rmind Catch-up Xen and usermode with lwp_getpcb() and unbreak Xen build.
 1.11 06-Nov-2009  dyoung Use deviter(9) instead of accessing alldevs directly.
 1.10 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.9 12-Feb-2009  cegger Make Dom0/DomU boot with root-on-nfs when 'bootdev' parameter is missing or wrong.
In this case, we get prompted for the root device.
Make sure that nfs_bootstatic_callback is initialized or we will miss the nfsroot bootparameter,
causing a boot failure even when root device is correct.
 1.8 18-Dec-2008  cegger branches: 1.8.2;
remove unused malloc.h
 1.7 27-Oct-2008  cegger branches: 1.7.2;
Change NFS boot behaviour to automatically try the next boot method if the boot information is too incomplete to succeed.
That way, it is possible to combine static and dhcp boot:
For example, to boot diskless you can specify the nfs-server and the rootpath statically. All other information will be taken via dhcp.

Patch has been presented on port-xen, tech-kern and tech-net:
http://mail-index.netbsd.org/port-xen/2008/10/24/msg004488.html
http://mail-index.netbsd.org/tech-kern/2008/10/24/msg003255.html
http://mail-index.netbsd.org/tech-net/2008/10/24/msg000864.html

No comments, no objections.
 1.6 24-Oct-2008  cegger branches: 1.6.2;
findroot(): set booted_device also when specifying a network device to bootdev.
Useful for booting with root on nfs.
 1.5 24-Oct-2008  cegger struct device * -> device_t
 1.4 21-Oct-2008  cegger introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.
 1.3 06-Apr-2008  cegger branches: 1.3.4; 1.3.10;
use aprint_*_dev and device_xname
 1.2 22-Nov-2007  bouyer branches: 1.2.2; 1.2.4; 1.2.8; 1.2.16; 1.2.22;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file autoconf.c was initially added on branch bouyer-xenamd64.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.22.2 17-Jan-2009  mjf Sync with HEAD.
 1.2.22.1 02-Jun-2008  mjf Sync with HEAD.
 1.2.16.2 09-Jan-2008  matt sync with HEAD
 1.2.16.1 22-Nov-2007  matt file autoconf.c was added on branch matt-armv6 on 2008-01-09 01:50:13 +0000
 1.2.8.2 07-Dec-2007  yamt sync with head
 1.2.8.1 22-Nov-2007  yamt file autoconf.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:15 +0000
 1.2.4.2 03-Dec-2007  ad Sync with HEAD.
 1.2.4.1 22-Nov-2007  ad file autoconf.c was added on branch vmlocking on 2007-12-03 19:04:37 +0000
 1.2.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1 22-Nov-2007  joerg file autoconf.c was added on branch jmcneill-pm on 2007-11-27 19:36:17 +0000
 1.3.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.3.4.3 11-Mar-2010  yamt sync with head
 1.3.4.2 19-Aug-2009  yamt sync with head.
 1.3.4.1 04-May-2009  yamt sync with head.
 1.6.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.6.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.7.2.1 23-Feb-2009  snj Pull up following revision(s) (requested by cegger in ticket #460):
sys/arch/xen/x86/autoconf.c: revision 1.9
Make Dom0/DomU boot with root-on-nfs when 'bootdev' parameter is missing
or wrong.
In this case, we get prompted for the root device.
Make sure that nfs_bootstatic_callback is initialized or we will miss
the nfsroot bootparameter, causing a boot failure even when root device
is correct.
 1.8.2.3 24-Oct-2010  jym Sync with HEAD
 1.8.2.2 01-Nov-2009  jym Sync with HEAD.
 1.8.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.13.18.2 08-Aug-2012  martin Pull up following revision(s) (requested by mlelstv in ticket #466):
sys/arch/amiga/amiga/autoconf.c: revision 1.113
sys/arch/rs6000/rs6000/autoconf.c: revision 1.4
sys/arch/emips/emips/autoconf.c: revision 1.6
sys/arch/sandpoint/sandpoint/autoconf.c: revision 1.27
sys/arch/evbmips/alchemy/autoconf.c: revision 1.18
sys/arch/sgimips/sgimips/autoconf.c: revision 1.43
sys/arch/atari/atari/autoconf.c: revision 1.63
sys/arch/powerpc/oea/ofw_autoconf.c: revision 1.17
sys/arch/mmeye/mmeye/autoconf.c: revision 1.9
distrib/sets/lists/comp/mi: revision 1.1771
sys/arch/mipsco/mipsco/autoconf.c: revision 1.25
sys/arch/iyonix/iyonix/autoconf.c: revision 1.14
sys/arch/hp300/hp300/autoconf.c: revision 1.100
sys/kern/init_main.c: revision 1.445
sys/arch/pmax/pmax/autoconf.c: revision 1.79
sys/arch/netwinder/netwinder/autoconf.c: revision 1.11
sys/arch/dreamcast/dreamcast/autoconf.c: revision 1.10
sys/arch/ibmnws/ibmnws/autoconf.c: revision 1.12
sys/arch/evbppc/ev64260/autoconf.c: revision 1.17
sys/arch/evbmips/gdium/autoconf.c: revision 1.5
sys/arch/algor/algor/autoconf.c: revision 1.21
share/man/man9/Makefile: revision 1.367
sys/arch/ews4800mips/ews4800mips/autoconf.c: revision 1.9
sys/arch/amigappc/amigappc/autoconf.c: revision 1.5
sys/arch/x86/x86/x86_autoconf.c: revision 1.65
sys/arch/acorn26/acorn26/autoconf.c: revision 1.9
sys/arch/mvmeppc/mvmeppc/autoconf.c: revision 1.13
sys/arch/vax/vax/autoconf.c: revision 1.94
sys/arch/usermode/dev/cpu.c: revision 1.72
sys/arch/evbppc/virtex/autoconf.c: revision 1.5
sys/arch/next68k/next68k/autoconf.c: revision 1.26
sys/arch/mac68k/mac68k/autoconf.c: revision 1.73
sys/arch/ia64/ia64/autoconf.c: revision 1.6
sys/arch/evbppc/obs405/obs405_autoconf.c: revision 1.6
share/man/man9/cpu_rootconf.9: revision 1.7
sys/arch/landisk/landisk/autoconf.c: revision 1.6
sys/arch/evbmips/malta/autoconf.c: revision 1.16
sys/arch/sun3/sun3/autoconf.c: revision 1.76
sys/arch/evbppc/explora/autoconf.c: revision 1.13
sys/arch/sun3/sun3/autoconf.c: revision 1.77
sys/arch/evbmips/loongson/autoconf.c: revision 1.3
sys/arch/evbmips/atheros/autoconf.c: revision 1.11
sys/arch/sparc64/sparc64/autoconf.c: revision 1.188
sys/arch/acorn32/acorn32/autoconf.c: revision 1.18
sys/arch/evbarm/evbarm/autoconf.c: revision 1.13
sys/arch/cobalt/cobalt/autoconf.c: revision 1.30
sys/arch/mvme68k/mvme68k/autoconf.c: revision 1.46
sys/arch/hp700/hp700/autoconf.c: revision 1.48
sys/arch/evbmips/adm5120/autoconf.c: revision 1.5
sys/arch/hpcmips/hpcmips/autoconf.c: revision 1.25
sys/arch/alpha/alpha/autoconf.c: revision 1.52
sys/arch/sparc/sparc/autoconf.c: revision 1.244
sys/arch/evbppc/pmppc/autoconf.c: revision 1.7
sys/arch/bebox/bebox/autoconf.c: revision 1.25
sys/arch/luna68k/luna68k/autoconf.c: revision 1.13
sys/arch/hpcarm/hpcarm/autoconf.c: revision 1.20
sys/arch/evbppc/walnut/autoconf.c: revision 1.21
sys/arch/cesfic/cesfic/autoconf.c: revision 1.26
sys/arch/cats/cats/autoconf.c: revision 1.17
sys/arch/x68k/x68k/autoconf.c: revision 1.67
sys/arch/news68k/news68k/autoconf.c: revision 1.21
sys/arch/arc/arc/autoconf.c: revision 1.34
sys/arch/evbsh3/evbsh3/autoconf.c: revision 1.11
sys/sys/conf.h: revision 1.143
sys/arch/evbmips/rasoc/autoconf.c: revision 1.3
sys/arch/hpcsh/hpcsh/autoconf.c: revision 1.26
sys/arch/sun68k/sun68k/autoconf.c: revision 1.29
sys/arch/evbmips/rmixl/autoconf.c: revision 1.6
sys/arch/zaurus/zaurus/autoconf.c: revision 1.12
sys/arch/xen/x86/autoconf.c: revision 1.15
sys/arch/evbppc/mpc85xx/autoconf.c: revision 1.6
sys/arch/shark/shark/autoconf.c: revision 1.18
sys/arch/prep/prep/autoconf.c: revision 1.25
sys/arch/newsmips/newsmips/autoconf.c: revision 1.36
sys/arch/sbmips/sbmips/autoconf.c: revision 1.8
Do not call setroot() from MD code and from MI code, which has
unwanted side effects in the RB_ASKNAME case. This fixes PR/46732.
No longer wrap MD cpu_rootconf(), as hp300 port stores reboot information
as a side effect. Instead call MI rootconf() from MD code which makes
rootconf() now a wrapper to setroot().
Adjust several MD routines to set the global booted_device, booted_partition
variables instead of passing partial information to setroot().
Make cpu_rootconf(9) describe the calling order.
add rootconf(9) as a link to cpu_rootconf(9)
make this compile again
 1.13.18.1 05-Jul-2012  riz Pull up following revision(s) (requested by mlelstv in ticket #402):
sys/dev/vnd.c: revision 1.221
sys/kern/init_main.c: revision 1.443
sys/kern/init_main.c: revision 1.444
sys/dev/dkwedge/dk.c: revision 1.64
sys/arch/x86/x86/x86_autoconf.c: revision 1.63
sys/arch/sparc64/sparc64/autoconf.c: revision 1.187
sys/sys/device.h: revision 1.141
sys/dev/dkwedge/dkwedge_bsdlabel.c: revision 1.17
sys/kern/kern_subr.c: revision 1.213
sys/arch/zaurus/zaurus/autoconf.c: revision 1.11
sys/arch/xen/x86/autoconf.c: revision 1.14
sys/sys/disk.h: revision 1.57
Use the label's packname to create wedge names instead of the classic
device names. Fall back to classic device names when the label has an
empty name or the default name 'fictitious'.
autodiscover wedges
Make detection of root on wedges (dk(4)) machine independent. Remove
MD code for x86, xen, sparc64.
Make detection of root on wedges (dk(4)) machine independent. Remove
MD code for zaurus.
Do not try to find the wedge we booted from if opendisk(booted_device)
failed.
 1.13.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.13.12.1 30-Oct-2012  yamt sync with head
 1.15.2.3 03-Dec-2017  jdolecek update from HEAD
 1.15.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.15.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.16.2.1 18-May-2014  rmind sync with head
 1.17.6.1 28-Aug-2017  skrll Sync with HEAD
 1.18.2.2 12-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1280):

sys/arch/x86/x86/consinit.c: revision 1.29
sys/dev/hyperv/vmbusvar.h: revision 1.2
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/arch/x86/x86/x86_autoconf.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.91
sys/arch/x86/x86/hyperv.c: revision 1.2
sys/arch/x86/x86/hyperv.c: revision 1.3
sys/arch/x86/x86/hyperv.c: revision 1.4
sys/arch/i386/conf/GENERIC: revision 1.1207
sys/dev/wscons/wsconsio.h: revision 1.123
sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.528
sys/dev/hyperv/files.hyperv: revision 1.2
sys/arch/x86/include/autoconf.h: revision 1.6
sys/dev/hyperv/hyperv_common.c: revision 1.2
sys/arch/xen/x86/autoconf.c: revision 1.23
sys/arch/x86/pci/pci_machdep.c: revision 1.86
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hypervvar.h: revision 1.2
sys/dev/acpi/vmbus_acpi.c: revision 1.2
sys/dev/hyperv/vmbus.c: revision 1.3
sys/dev/hyperv/hvkbdvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.18.2.1 13-Oct-2018  martin Pull up following revision(s) (requested by mlelstv in ticket #1057):

sys/arch/xen/x86/autoconf.c: revision 1.20
sys/arch/xen/include/xen.h: revision 1.40

Support bootspec.
 1.19.4.1 10-Jun-2019  christos Sync with HEAD
 1.19.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.19.2.1 20-Oct-2018  pgoyette Sync with head
 1.23.8.2 12-Apr-2020  bouyer remove stray 'else'
 1.23.8.1 08-Apr-2020  bouyer Remove VM_GUEST_XEN and define only Xen subtypes:
VM_GUEST_XENPV
VM_GUEST_XENPVH
VM_GUEST_XENHVM
VM_GUEST_XENPVHVM

Set vm_guest in the start routine, if it is hypervisor-specific (e.g. Xen PV).
If vm_guest was not set early and we detect Xen in identify_hypervisor(),
assume it is VM_GUEST_XENHVM. Refine to VM_GUEST_XENPVHVM in
hypervisor_match().
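
The two-stage selection of the guest type described in 1.23.8.1 can be sketched as below. This is an illustration under stated assumptions, not the kernel code: the enum and function names carry a sketch_ or SK_ prefix because they are hypothetical, and the condition that triggers the refinement (here a pv_drivers_usable flag) is an assumption, since the log does not say what hypervisor_match() checks.

```c
/* Hypothetical mirror of the Xen subtypes listed in the log entry. */
enum sketch_vm_guest {
	SK_VM_GUEST_NO = 0,
	SK_VM_GUEST_XENPV,
	SK_VM_GUEST_XENPVH,
	SK_VM_GUEST_XENHVM,
	SK_VM_GUEST_XENPVHVM
};

/*
 * Stage 1: keep a value set early by a hypervisor-specific start routine;
 * otherwise, if Xen is detected, assume plain HVM.
 */
static enum sketch_vm_guest
sketch_identify(enum sketch_vm_guest early, int xen_detected)
{
	if (early != SK_VM_GUEST_NO)
		return early;
	return xen_detected ? SK_VM_GUEST_XENHVM : SK_VM_GUEST_NO;
}

/*
 * Stage 2: refine HVM to PVHVM. The pv_drivers_usable flag is an
 * assumed condition for illustration only.
 */
static enum sketch_vm_guest
sketch_match(enum sketch_vm_guest guest, int pv_drivers_usable)
{
	if (guest == SK_VM_GUEST_XENHVM && pv_drivers_usable)
		return SK_VM_GUEST_XENPVHVM;
	return guest;
}
```

A value set early (for example by a PV or PVH entry point) passes through both stages unchanged; only the late-detected HVM case is ever refined.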
 1.25.20.1 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #428):

sys/arch/xen/xen/xen_machdep.c: revision 1.28
sys/arch/x86/pci/pci_machdep.c: revision 1.97
sys/arch/xen/xen/genfb_xen.c: revision 1.1
sys/arch/xen/xen/genfb_xen.c: revision 1.2
sys/arch/xen/include/hypervisor.h: revision 1.59
sys/arch/i386/conf/XEN3PAE_DOM0: revision 1.41 (patch)
sys/arch/x86/x86/genfb_machdep.c: revision 1.22
sys/arch/xen/x86/consinit.c: revision 1.18
sys/arch/xen/x86/autoconf.c: revision 1.26
sys/external/mit/xen-include-public/dist/xen/include/public/platform.h: revision 1.2
sys/arch/xen/conf/files.xen: revision 1.188
sys/arch/x86/x86/consinit.c: revision 1.37
sys/arch/xen/conf/files.xen: revision 1.189
sys/arch/x86/x86/consinit.c: revision 1.38
sys/external/mit/xen-include-public/dist/xen/include/public/xen.h: revision 1.2
sys/arch/x86/include/genfb_machdep.h: revision 1.7
sys/arch/xen/x86/pvh_consinit.c: revision 1.5
sys/arch/xen/x86/pvh_consinit.c: revision 1.6
sys/arch/amd64/conf/XEN3_DOM0: revision 1.201

Move the pvh_xencons to xen_machdep.c as early_xencons, so it can be
used in the future as early output for plain PV guests too.

Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.

Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen

When running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.

x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console

xen/x86/consinit.c: support genfb as possible console

xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.

xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.

Make sure to always fall back to xen_early_console, even for dom0

Enable genfb in DOM0 kernels

Add ext_lfb_base to dom0_vga_console_info, from recent Xen. We know if it's
present or not by checking dom0.info_size

Add XENPF_get_dom0_console, which gets a dom0_vga_console_info structure
from the hypervisor. To be used by PVH dom0 kernels.

XENPVH option is not used. Fix consinit.c to use XENPVHVM as intended
and XENPVH from defflag
For a dom0 PVH, the dom0_vga_console_info structure has to be retrieved
using a platform hypercall; do so in the XENPVHVM case.

Now genfb works in a PVH dom0 running on Xen 4.18 (Xen 4.15 doesn't support
this platform op, so no way to make it work here).
 1.5 16-Apr-2005  yamt tweak x86 bus_dma code so that it can be used by xen port.

- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.4 01-Apr-2005  yamt branches: 1.4.2;
merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.3 10-Mar-2005  matt branches: 1.3.2;
Update to new bus_dma semantics.
 1.2 09-Mar-2005  bouyer Merge the bouyer-xen2 branch. This adds support for the Xen 2.0 virtual
machine kernel (both privileged and non-privileged domains), and removes support
for the old xen 1.2.
 1.1 20-Jan-2005  bouyer branches: 1.1.2; 1.1.4;
file bus_dma.c was initially added on branch bouyer-xen2.
 1.1.4.2 19-Mar-2005  yamt (re-)convert arch/xen to the new API.
XXX except for xbdback.c and xennetback.c, because i'm not sure
what they're doing.
 1.1.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.2 16-Feb-2005  bouyer Support functions to map/unmap guest's memory into our kernel VM space.
This will be used by the block and network back-end.
The caller provides a list of machine address pages, and we return the
virtual address they've been mapped to. These mappings are not
registered to the pmap (we don't have physical addresses for these pages),
so change bus_dma(9) to handle these mappings as well.
 1.1.2.1 20-Jan-2005  bouyer bus_dma(9) for xen. Derived from arch/x86/x86/bus_dma.c.
bounce buffers not supported yet, because xen doesn't have an interface
to request memory in a specific range (this means that DMA on ISA won't
be supported), but I've left the code commented out because xen will
likely provide an appropriate hypercall in the future.
 1.3.2.1 21-Apr-2005  tron Pull up file removal (requested by yamt in ticket #175):
tweak x86 bus_dma code so that it can be used by xen port.
- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.4.2.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.4.2.2 01-Apr-2005  skrll Sync with HEAD.
 1.4.2.1 01-Apr-2005  skrll file bus_dma.c was added on branch ktrace-lwp on 2005-04-01 14:29:11 +0000
 1.9 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.8 04-Mar-2007  christos branches: 1.8.2; 1.8.10; 1.8.18; 1.8.20; 1.8.22;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.7 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.6 15-Jan-2006  bouyer branches: 1.6.24;
Snapshot of work in progress on NetBSD port to Xen3:
- kernel (both dom0 and domU) boots, console is functional and it can start
software from a ramdisk
- there is no driver front-end except console for domU yet.
- dom0 can probe devices and ex(4) works when Xen3 is booted without acpi
and apic support. But the on-board IDE doesn't get interrupts.
The PCI code still needs work (it's hardcoded to mode 1). Some of this
code should be shared with ../x86
The physical interrupt code needs to get MPBIOS and ACPI support, and
do interrupt routing to properly interact with Xen.
To enable Xen-3.0 support, add
options XEN3
to your kernel config file (this will disable Xen2 support)
Changes affecting Xen-2.0 support (no functional changes intended):
- get more constants from genassym for assembly code
- remove some unneeded register moves from start()
- map the shared info page from start(), and remove the pte = 0xffffffff hack
- vector.S: in hypervisor_callback() make sure %esi points to
HYPERVISOR_shared_info before accessing the info page. Replace some
hand-written assembly with the equivalent macro defined in frameasm.h
- more debug code, disabled by default.

while here added my copyright on some files I worked on in 2005.
 1.5 24-Nov-2005  yamt branches: 1.5.2;
bus_dmamem_map: honour BUS_DMA_NOWAIT. noted by Manuel Bouyer.
bus_space_map: always do NOWAIT allocation as it used to be before yamt-km.

we have too many copies!
 1.4 01-Apr-2005  yamt branches: 1.4.2; 1.4.8;
merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.3 09-Mar-2005  bouyer Merge the bouyer-xen2 branch. This adds support for the Xen 2.0 virtual
machine kernel (both privileged and non-privileged domains), and removes support
for the old xen 1.2.
 1.2 26-Apr-2004  cl branches: 1.2.2; 1.2.4; 1.2.8; 1.2.10; 1.2.12;
Rework the physical<->machine memory mapping: offset physical addresses
by 0x100000 (above the I/O Memory "hole") leaving all physical addresses
below unused, don't perform phys<->mach mapping for addresses below 0x100000
or beyond the real hardware's physical memory.

-> /dev/mem works now as expected and X works in domain0.
 1.1 24-Apr-2004  cl Make bus_space map machine addresses instead of physical addresses.
 1.2.12.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.2.12.1 13-Feb-2005  yamt - use new apis.
- simplify bootstrap and pvpage allocation.
- remove no longer needed .globl decls.
 1.2.10.1 29-Apr-2005  kent sync with -current
 1.2.8.2 09-Mar-2005  bouyer bus_space_map/unmap() should map machine addresses, not physical addresses,
for memory-mapped I/O. So use pmap_kenter_ma() instead of pmap_kenter_pa(),
and introduce pmap_extract_ma() to get the machine address of a mapping.
Now memory-mapped I/O should work outside of the ISA hole (note that PCI
devices are usually mapped in the ISA memory hole anyway).
 1.2.8.1 13-Dec-2004  bouyer Commit files from netbsd-2.0-xen-sparse/sys/arch/xen in the Xen-2.0
distribution. These are the files modified from the 2.0 tree to get
NetBSD/xen working with Xen 2.
 1.2.4.6 11-Dec-2005  christos Sync with head.
 1.2.4.5 01-Apr-2005  skrll Sync with HEAD.
 1.2.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.4.2 03-Aug-2004  skrll Sync with HEAD
 1.2.4.1 26-Apr-2004  skrll file bus_space.c was added on branch ktrace-lwp on 2004-08-03 10:43:19 +0000
 1.2.2.2 22-May-2004  he Pull up revisions 1.1-1.2 (requested by cl in ticket #337):
Upgrade xen support:
- add block device driver
- network device driver bug fixes
- support for vga/keyboard/mouse
- support for domain0 operations
- fix /dev/mem and i386_iopl, reboot, event dispatch
- fix clock support, cpu speed report, lazy fpu switching
- add xen12load loader
- sys/arch/xen parts of build.sh release support
[cl, ticket #337]
 1.2.2.1 26-Apr-2004  he file bus_space.c was added on branch netbsd-2-0 on 2004-05-22 15:57:25 +0000
 1.4.8.1 29-Nov-2005  yamt sync with head.
 1.4.2.4 27-Oct-2007  yamt sync with head.
 1.4.2.3 03-Sep-2007  yamt sync with head.
 1.4.2.2 26-Feb-2007  yamt sync with head.
 1.4.2.1 21-Jun-2006  yamt sync with head.
 1.5.2.1 01-Feb-2006  yamt sync with head.
 1.6.24.2 12-Mar-2007  rmind Sync with HEAD.
 1.6.24.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.8.22.1 06-Oct-2007  yamt sync with head.
 1.8.20.1 06-Nov-2007  matt sync with HEAD
 1.8.18.1 02-Oct-2007  joerg Sync with HEAD.
 1.8.10.1 03-Oct-2007  garbled Sync with HEAD
 1.8.2.1 12-Oct-2007  ad Fix merge errors.
 1.18 17-Oct-2023  bouyer Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.
Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen
When running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.
x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console
xen/x86/consinit.c: support genfb as possible console
xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.
xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.
 1.17 22-Jul-2023  mrg xen: declare 'default_consinfo' as extern in a header

this makes pvh_consinit.c actually compile with CONS_OVERRIDE set.
i didn't see any good header to add to (bootinfo.h and cpu.h both
seem to be poor choices but were considered), hence the new one
with just this definition.
 1.16 13-Oct-2012  jdc branches: 1.16.68;
Adapt to the changed signature of pckbc_cnattach().
 1.15 01-Jul-2011  dyoung branches: 1.15.2; 1.15.12;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.14 28-Apr-2010  dyoung On x86, change the bus_space_tag_t to a pointer to a struct
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).

Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.

Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
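
The tag layout described in the 1.14 entry (a pointer to a struct whose only member is bst_type, plus an equality check on that member) can be sketched as follows. All identifiers are prefixed with sketch_ or SKETCH_ to mark them as hypothetical, and the numeric values chosen for the two space types are assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumed values; only their distinctness matters for this sketch. */
#define SKETCH_BUS_SPACE_IO	0
#define SKETCH_BUS_SPACE_MEM	1

/* The tag is a struct whose only member is the type of space. */
struct sketch_bus_space_tag {
	int bst_type;
};

/* The tag type handed to drivers is a pointer to that struct. */
typedef const struct sketch_bus_space_tag *sketch_bus_space_tag_t;

static const struct sketch_bus_space_tag sketch_io_tag = {
	SKETCH_BUS_SPACE_IO
};
static const struct sketch_bus_space_tag sketch_mem_tag = {
	SKETCH_BUS_SPACE_MEM
};

/* Pointers to the constant tags, used in place of the bare constants. */
static const sketch_bus_space_tag_t sketch_bus_space_io = &sketch_io_tag;
static const sketch_bus_space_tag_t sketch_bus_space_mem = &sketch_mem_tag;

/* Two tags are equal when they describe the same type of space. */
static bool
sketch_bus_space_is_equal(sketch_bus_space_tag_t t1,
    sketch_bus_space_tag_t t2)
{
	return t1->bst_type == t2->bst_type;
}
```

Comparing bst_type rather than the pointers themselves means that two distinct tag objects of the same type still compare equal, which matters once tags can be created dynamically.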
 1.13 18-Mar-2009  cegger branches: 1.13.2; 1.13.4;
Ansify function definitions w/o arguments. Generated with sed.
 1.12 16-Mar-2009  cegger ansify function definitions
 1.11 21-Oct-2008  cegger branches: 1.11.2; 1.11.8;
introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.
 1.10 22-Nov-2007  bouyer branches: 1.10.14; 1.10.18; 1.10.24;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.9 14-Nov-2007  ad Remove pccons.
 1.8 29-Jan-2007  hubertf branches: 1.8.6; 1.8.22; 1.8.24; 1.8.28; 1.8.30;
Remove more duplicate headers.
Patch by Slava Semushin <slava.semushin@gmail.com>

Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
 1.7 09-Dec-2006  bouyer Remove extra ) causing compile failure when CONS_OVERRIDE is defined.
From Hideo Masuda in PR port-xen/35217.
 1.6 11-Dec-2005  christos branches: 1.6.20; 1.6.22; 1.6.24;
merge ktrace-lwp.
 1.5 16-Jun-2005  bouyer branches: 1.5.2;
Allow compiling a domain0 kernel with vga but without pckbc, and add
console support for USB keyboard. Problem pointed out by Karl Janmar on
port-xen.
 1.4 09-Mar-2005  bouyer branches: 1.4.2;
Merge the bouyer-xen2 branch. This adds support for the Xen 2.0 virtual
machine kernel (both privileged and non-privileged domains), and removes support
for the old xen 1.2.
 1.3 24-Apr-2004  cl branches: 1.3.2; 1.3.6; 1.3.8; 1.3.10;
Enable keyboard and vga display as console when running as domain-0.
 1.2 24-Apr-2004  cl Consistently use xencons for everything referring to Xen's virtual console.

rename arch/xen/xen/console.c -> arch/xen/xen/xencons.c
 1.1 11-Mar-2004  cl branches: 1.1.2;
Add port to the Xen virtual machine monitor.
(see http://www.cl.cam.ac.uk/Research/SRG/netos/xen/)
 1.1.2.1 22-May-2004  he Pull up revisions 1.2-1.3 (requested by cl in ticket #337):
Upgrade xen support:
- add block device driver
- network device driver bug fixes
- support for vga/keyboard/mouse
- support for domain0 operations
- fix /dev/mem and i386_iopl, reboot, event dispatch
- fix clock support, cpu speed report, lazy fpu switching
- add xen12load loader
- sys/arch/xen parts of build.sh release support
[cl, ticket #337]
 1.3.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.3.8.1 29-Apr-2005  kent sync with -current
 1.3.6.1 08-Mar-2005  bouyer Add support for ISA bus.
Clean up console attachment, and add support for VGA/pckbc console.
Add support for USB devices, including USB audio (which means other audio
devices should work too)
Add some more generic options to XEN0.
 1.3.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.5 01-Apr-2005  skrll Sync with HEAD.
 1.3.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.2 03-Aug-2004  skrll Sync with HEAD
 1.3.2.1 24-Apr-2004  skrll file consinit.c was added on branch ktrace-lwp on 2004-08-03 10:43:19 +0000
 1.4.2.2 11-Dec-2006  tron Pull up following revision(s) (requested by bouyer in ticket #1600):
sys/arch/xen/x86/consinit.c: revision 1.7
Remove extra ) causing compile failure when CONS_OVERRIDE is defined.
From Hideo Masuda in PR port-xen/35217.
 1.4.2.1 28-Jun-2005  tron branches: 1.4.2.1.2; 1.4.2.1.4;
Pull up revision 1.5 (requested by bouyer in ticket #482):
Allow compiling a domain0 kernel with vga but without pckbc, and add
console support for USB keyboard. Problem pointed out by Karl Janmar on
port-xen.
 1.4.2.1.4.1 11-Dec-2006  tron Pull up following revision(s) (requested by bouyer in ticket #1600):
sys/arch/xen/x86/consinit.c: revision 1.7
Remove extra ) causing compile failure when CONS_OVERRIDE is defined.
From Hideo Masuda in PR port-xen/35217.
 1.4.2.1.2.1 11-Dec-2006  tron Pull up following revision(s) (requested by bouyer in ticket #1600):
sys/arch/xen/x86/consinit.c: revision 1.7
Remove extra ) causing compile failure when CONS_OVERRIDE is defined.
From Hideo Masuda in PR port-xen/35217.
 1.5.2.4 07-Dec-2007  yamt sync with head
 1.5.2.3 15-Nov-2007  yamt sync with head.
 1.5.2.2 26-Feb-2007  yamt sync with head.
 1.5.2.1 30-Dec-2006  yamt sync with head.
 1.6.24.1 09-Dec-2006  riz Pull up following revision(s) (requested by bouyer in ticket #266):
sys/arch/xen/x86/consinit.c: revision 1.7
Remove extra ) causing compile failure when CONS_OVERRIDE is defined.
From Hideo Masuda in PR port-xen/35217.
 1.6.22.1 10-Dec-2006  yamt sync with head.
 1.6.20.2 01-Feb-2007  ad Sync with head.
 1.6.20.1 12-Jan-2007  ad Sync with head.
 1.8.30.2 08-Dec-2007  mjf Sync with HEAD.
 1.8.30.1 19-Nov-2007  mjf Sync with HEAD.
 1.8.28.2 18-Nov-2007  bouyer Sync with HEAD
 1.8.28.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.8.24.1 09-Jan-2008  matt sync with HEAD
 1.8.22.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.8.22.1 21-Nov-2007  joerg Sync with HEAD.
 1.8.6.1 03-Dec-2007  ad Sync with HEAD.
 1.10.24.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.10.18.2 11-Aug-2010  yamt sync with head.
 1.10.18.1 04-May-2009  yamt sync with head.
 1.10.14.1 17-Jan-2009  mjf Sync with HEAD.
 1.11.8.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.11.8.3 24-Oct-2010  jym Sync with HEAD
 1.11.8.2 01-Nov-2009  jym Sync with HEAD.
 1.11.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.13.4.1 30-May-2010  rmind sync with head
 1.13.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.15.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.15.2.1 30-Oct-2012  yamt sync with head
 1.16.68.1 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #428):

sys/arch/xen/xen/xen_machdep.c: revision 1.28
sys/arch/x86/pci/pci_machdep.c: revision 1.97
sys/arch/xen/xen/genfb_xen.c: revision 1.1
sys/arch/xen/xen/genfb_xen.c: revision 1.2
sys/arch/xen/include/hypervisor.h: revision 1.59
sys/arch/i386/conf/XEN3PAE_DOM0: revision 1.41 (patch)
sys/arch/x86/x86/genfb_machdep.c: revision 1.22
sys/arch/xen/x86/consinit.c: revision 1.18
sys/arch/xen/x86/autoconf.c: revision 1.26
sys/external/mit/xen-include-public/dist/xen/include/public/platform.h: revision 1.2
sys/arch/xen/conf/files.xen: revision 1.188
sys/arch/x86/x86/consinit.c: revision 1.37
sys/arch/xen/conf/files.xen: revision 1.189
sys/arch/x86/x86/consinit.c: revision 1.38
sys/external/mit/xen-include-public/dist/xen/include/public/xen.h: revision 1.2
sys/arch/x86/include/genfb_machdep.h: revision 1.7
sys/arch/xen/x86/pvh_consinit.c: revision 1.5
sys/arch/xen/x86/pvh_consinit.c: revision 1.6
sys/arch/amd64/conf/XEN3_DOM0: revision 1.201

Move the pvh_xencons to xen_machdep.c as early_xencons, so it can be
used in the future as early output for plain PV guests too.

Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.

Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen

when running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.

x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console

xen/x86/consinit.c: support genfb as possible console

xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.

xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.

Make sure to always fall back to xen_early_console, even for dom0

Enable genfb in DOM0 kernels

Add ext_lfb_base to dom0_vga_console_info, from recent Xen. We know if it's
present or not by checking dom0.info_size

Add XENPF_get_dom0_console, which gets a dom0_vga_console_info structure
from the hypervisor. To be used by PVH dom0 kernels.

XENPVH option is not used. Fix consinit.c to use XENPVHVM as intended
and XENPVH from defflag
for a dom0 PVH, the dom0_vga_console_info structure has to be retrieved
using a platform hypercall; do so in the XENPVHVM case.

Now genfb works in a PVH dom0 running on Xen 4.18 (Xen 4.15 doesn't support
this platform op, so no way to make it work here).
 1.145 25-Feb-2023  riastradh xen/x86/cpu.c: Nix trailing whitespace.

No functional change intended.
 1.144 25-Feb-2023  riastradh xen/x86/cpu.c: Membar audit.

I see no reason for store-before-load ordering here; as far as I'm
aware, evtchn_upcall_mask is only shared between a (v)CPU and its
(hypervisor) interrupts, not other (v)CPUs.
 1.143 25-Feb-2023  riastradh x86: Assert kpreempt_disabled() in cpu_load_pmap.

No functional change intended. Just makes it easier to audit
curcpu() usage.
 1.142 20-Aug-2022  riastradh branches: 1.142.4;
x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.141 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.140 24-Apr-2021  thorpej branches: 1.140.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
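
The "single call that takes a variadic list of tag-value arguments" can be sketched roughly as follows. This is not the config_found() implementation: the sk_* tags, struct and function are hypothetical stand-ins, and only the general shape (tags followed by values, terminated by a sentinel) is taken from the message above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; SK_ARG_EOL plays the role of the sentinel. */
enum sk_arg_tag { SK_ARG_EOL = 0, SK_ARG_IATTR, SK_ARG_LOCATORS };

struct sk_args {
	const char *iattr;	/* interface attribute name, or NULL */
	const int *locators;	/* locators array, or NULL */
};

/* Collect tag-value pairs until the sentinel tag is seen. */
static void
sk_collect_args(struct sk_args *out, int tag, ...)
{
	va_list ap;

	assert(out != NULL);
	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, tag);
	while (tag != SK_ARG_EOL) {
		switch (tag) {
		case SK_ARG_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case SK_ARG_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			/* Unknown tag: its value type is unknown, so stop. */
			va_end(ap);
			return;
		}
		tag = va_arg(ap, int);
	}
	va_end(ap);
}
```

A caller passes only the pairs it needs, for example sk_collect_args(&args, SK_ARG_IATTR, name, SK_ARG_EOL), which is the property that lets new tags be added without new function variants.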
 1.139 14-Jul-2020  yamaguchi branches: 1.139.4;
Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.138 08-Jul-2020  jdolecek initialize ci_kfpu_spl, to avoid triggering KASSERT() in fpu_kern_enter()

Follows revision 1.177 of sys/arch/x86/x86/cpu.c:
"""
Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
"""

With this DomU kernel boots with:
[ 2.0000571] aes: Intel AES-NI
 1.137 27-Jun-2020  jdolecek avoid excessive stack usage in mp_cpu_start(), this is called after VM
init so kmem(9) can be used
 1.136 21-May-2020  ad - Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.
 1.135 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.134 21-Apr-2020  ad Follow convention and put entire predicate inside __predict_false()
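
As an illustration of the convention only (the changed code itself is not shown in this log), the sketch below wraps a whole compound condition in a branch hint. sk_predict_false and sk_checked_div are hypothetical names, and the macro assumes the GCC/Clang __builtin_expect builtin.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local stand-in for a branch hint; relies on the GCC/Clang builtin. */
#define sk_predict_false(exp)	__builtin_expect((exp) != 0, 0)

/* Divide num by den, returning -1 for inputs that cannot be divided. */
static int
sk_checked_div(int num, int den, int *out)
{
	assert(out != NULL);

	/* The whole compound condition sits inside the hint. */
	if (sk_predict_false(den == 0 || (num == INT_MIN && den == -1)))
		return -1;

	*out = num / den;
	return 0;
}
```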
 1.133 24-Feb-2020  rin branches: 1.133.4;
0x%p --> %p for non-external codes.
 1.132 13-Jan-2020  bouyer Don't call cpu_switchto() before idle_loop(), it should not be needed any more.
While there, assume (and KASSERT) that curlwp == ci->ci_data.cpu_idlelwp,
this saves a lwp_getpcb() call.
 1.131 23-Nov-2019  ad branches: 1.131.2;
cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
 1.130 12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.129 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.128 02-Feb-2019  cherry Switch NetBSD/xen to use XEN api tag RELEASE-4.11.1

The headers for this api are in sys/external/mit/xen-include-public/dist/
 1.127 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
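
The truncation hazard described in this entry can be reproduced in a few lines of userland C. This is a sketch, assuming a 32-bit unsigned int; uimin here is a local stand-in with the same shape as the renamed function, and the sk_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Local stand-in with the same shape as the renamed function. */
static unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro; note that it may evaluate an argument twice. */
#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Arguments are silently narrowed to unsigned int at the call. */
static uint64_t
sk_min_narrow(uint64_t a, uint64_t b)
{
	return uimin(a, b);
}

/* The macro compares the full 64-bit values. */
static uint64_t
sk_min_wide(uint64_t a, uint64_t b)
{
	return MIN(a, b);
}
```

With a = 0x100000001 and b = 16, the narrowing version compares 1 against 16 and returns 1, while the macro version returns 16.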
 1.126 12-Aug-2018  maxv Introduce PDIR_SLOT_USERLIM, which indicates the limit of the user slots.
Use it instead of PDIR_SLOT_PTE when we just want to iterate over the
user slots. Also use it in SVS, I had hardcoded 255 because there was no
proper define (which there now is).
 1.125 27-Jul-2018  maxv Reduce the size of the blocks. No functional change.
 1.124 26-Jul-2018  maxv Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.
 1.123 24-Jul-2018  bouyer Sync cpu_boot_secondary_processors() with x86/x86/cpu.c:
explicitly wait for all CPUs to be registered in kcpuset_running.
 1.122 23-Jun-2018  jdolecek branches: 1.122.2;
re-do the XEN XSAVE support, this time to leave all probe code in
cpu_probe_fpu(), and have XEN cpu_init() just act

the xen probe is now guarded by XEN_USE_XSAVE option and XSAVE
support is thus still disabled by default (same as before), so it
wouldn't interfere with maxv's eager fpu rototil, while still
allowing testing for others

PR kern/50332
 1.121 23-Jun-2018  maxv Revert the rest of jdolecek's changes. This puts us back in a clean,
sensical state.
 1.120 22-Jun-2018  maxv Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
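
For context, FXSAVE requires its memory operand to be 16-byte aligned, so an address with rdi % 16 = 8 is off by half that. A minimal sketch of the check follows; the sk_* names are hypothetical and this is not the kernel's fpuinit_mxcsr_mask() code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_FXSAVE_ALIGN	16u	/* FXSAVE needs a 16-byte aligned operand */
#define SK_FXSAVE_SIZE	512u	/* size of the legacy FXSAVE area */

/* A save area whose alignment is requested from the compiler. */
struct sk_fxsave_area {
	alignas(SK_FXSAVE_ALIGN) uint8_t bytes[SK_FXSAVE_SIZE];
};
static_assert(sizeof(struct sk_fxsave_area) == SK_FXSAVE_SIZE,
    "unexpected padding");

/* True when p could be used as an FXSAVE operand. */
static bool
sk_is_fxsave_aligned(const void *p)
{
	return ((uintptr_t)p % SK_FXSAVE_ALIGN) == 0;
}
```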
 1.119 20-Jun-2018  jdolecek as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv
 1.118 19-Jun-2018  jdolecek fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8
 1.117 13-Jan-2018  bouyer branches: 1.117.2;
Needs cpu_init_tss() for application processor too.
 1.116 11-Nov-2017  maxv Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.
 1.115 11-Nov-2017  bouyer Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
 1.114 11-Nov-2017  riastradh No externs in .c files!
 1.113 08-Nov-2017  maxv Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.
 1.112 17-Sep-2017  maxv Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.
 1.111 06-Jul-2017  bouyer gdt_prepframes() is called with a number of pages, don't convert to a number
of pages again. This didn't fail because we're called with only one page, and
the conversion from '1' to pages resulted in 1 again.
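
Why the double conversion went unnoticed can be shown with a small sketch. It assumes the conversion rounds a byte count up to whole pages, which is consistent with 1 mapping to 1 above; the page size and the sk_* names are assumptions, not the gdt_prepframes() code.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE	4096u	/* assumed page size for the sketch */

/* Round a byte count up to whole pages. */
static size_t
sk_bytes_to_pages(size_t bytes)
{
	return (bytes + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;
}

/* Buggy shape: the argument is already a page count, converted again. */
static size_t
sk_frames_converted_twice(size_t npages)
{
	assert(npages > 0);
	return sk_bytes_to_pages(npages);
}

/* Fixed shape: use the page count as given. */
static size_t
sk_frames(size_t npages)
{
	assert(npages > 0);
	return npages;
}
```

Under these assumptions a count of 1 survives the extra conversion, but a count of 2 would also come out as 1, so the bug would only show up with more than one page.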
 1.110 23-Mar-2017  maxv branches: 1.110.6;
Remove PG_k completely.
 1.109 11-Feb-2017  maxv Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
 1.108 05-Feb-2017  maxv Rename ldt->ldtstore and gdt->gdtstore on i386. It reduces the diff with
amd64, and makes it easier to track down these variables on nxr - 'ldt'
and 'gdt' being common keywords.
 1.107 02-Feb-2017  maxv Use __read_mostly on these variables, to reduce the probability of false
sharing.
 1.106 22-Jan-2017  maxv Import xpmap_pg_nx, and put it in the per-cpu recursive slot on amd64.
 1.105 25-Nov-2016  maxv branches: 1.105.2;
KNF a little
 1.104 07-Jul-2016  msaitoh branches: 1.104.2;
KNF. Remove extra spaces. No functional change.
 1.103 13-Dec-2015  christos need definition
 1.102 13-Dec-2015  christos fix the build.
 1.101 08-Dec-2014  msaitoh Modify the code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.
 1.100 27-Nov-2014  bouyer branches: 1.100.2;
Revert sys/arch/x86/x86/pmap.c 1.185; a CPU needs to get pmap updates,
especially for pmap_kernel(), as soon as it is up.
Instead move all pmap-related cpu_info initialisations, including
initializing ci_kpm_mtx, in cpu_attach_common() from cpu_init()
(ci_pmap and ci_tlbstate are already initialized in cpu_attach_common()).
 1.99 18-Oct-2014  snj src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.98 12-Feb-2014  dsl branches: 1.98.4;
Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.97 11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.96 26-Jan-2014  dsl Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
 1.95 01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.94 23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.93 24-Jun-2012  jym branches: 1.93.2; 1.93.4;
Update comment: we stopped using xcall to sync PTP between CPUs.
pmap_kpm_sync_xcall => xen_kpm_sync
 1.92 06-Jun-2012  rmind Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.
 1.91 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
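
The limitation being removed can be illustrated with a hypothetical dynamically sized CPU set. A single 32-bit mask can only name CPUs 0 to 31, whereas a set sized at run time has no such ceiling. This sketch is not the kcpuset(9) API; every sk_* name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_WORD_BITS	32u

/* Hypothetical dynamically sized CPU set; not the kcpuset(9) API. */
struct sk_cpuset {
	size_t ncpu;		/* number of CPUs the set can name */
	uint32_t *words;	/* one bit per CPU */
};

/* Allocate an empty set for ncpu CPUs; returns NULL on failure. */
static struct sk_cpuset *
sk_cpuset_create(size_t ncpu)
{
	struct sk_cpuset *s = malloc(sizeof(*s));

	if (s == NULL)
		return NULL;
	s->ncpu = ncpu;
	s->words = calloc((ncpu + SK_WORD_BITS - 1) / SK_WORD_BITS,
	    sizeof(*s->words));
	if (s->words == NULL) {
		free(s);
		return NULL;
	}
	return s;
}

static void
sk_cpuset_set(struct sk_cpuset *s, size_t cpu)
{
	assert(cpu < s->ncpu);
	s->words[cpu / SK_WORD_BITS] |= UINT32_C(1) << (cpu % SK_WORD_BITS);
}

static bool
sk_cpuset_isset(const struct sk_cpuset *s, size_t cpu)
{
	assert(cpu < s->ncpu);
	return (s->words[cpu / SK_WORD_BITS] >> (cpu % SK_WORD_BITS)) & 1u;
}

static void
sk_cpuset_destroy(struct sk_cpuset *s)
{
	free(s->words);
	free(s);
}
```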
 1.90 11-Mar-2012  jym Typo fix.
 1.89 25-Feb-2012  bouyer The code assumes that ci_index is also the Xen's cpunum, and that
cpunum is less than XEN_LEGACY_MAX_VCPUS. KASSERT both.
 1.88 24-Feb-2012  bouyer Don't maintain ci_cpumask for physical CPUs, it's not used.
 1.87 24-Feb-2012  bouyer Get rid of phycpus_attached bitmask; it's maintained but not used and
will limit the number of physical CPUs to 32 without good reasons.
 1.86 24-Feb-2012  cherry (xen) - remove the (*xpq_cpu)() shim. We hasten the %fs/%gs setup process during boot. Although this is hacky, it lets us use the non-xen specific pmap_pte_xxx() functions in pmap code (and others).
 1.85 23-Feb-2012  cherry Cleanup.

- Remove cruft from native x86 origin.
- Remove access to privileged MSRs.
- Cleanup stale comments.
 1.84 23-Feb-2012  cherry cpu_load_pmap() should not be used to load pmap_kernel(), since in the
x86 model, its mappings are shared across pmaps. KASSERT() for this
and remove unused codepaths.
 1.83 22-Feb-2012  bouyer use pmap_protect() instead of pmap_kenter_pa() to remap R/O an existing
page. This gets rid of the last "mapping already present" warnings.
 1.82 21-Feb-2012  bouyer Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.

Makes LOCKDEBUG kernels boot again
 1.81 17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.80 13-Feb-2012  jym branches: 1.80.2;
PAT flags are not under control of Xen domains currently, so there is no
point in enabling them.

Avoids:
- a warning logged by hypervisor when a domain attempts to modify the PAT
MSR.
- an error during domain resuming, where a PAT flag has been set on a page
while the hypervisor does not allow it.

ok releng@.
 1.79 28-Jan-2012  cherry Update comments to remove references to alternate pte space.
 1.78 28-Jan-2012  cherry stop using alternate pde mapping in xen pmap
 1.77 09-Jan-2012  cherry revert previous commit. DIAGNOSTIC should only do strict checks, not muffle current ones
 1.76 06-Jan-2012  cherry Address those pesky DIAGNOSTIC messages.
Take a performance hit at fork() for not DTRT.
Note: Only applicable for kernels built with "options DIAGNOSTIC"
 1.75 04-Jan-2012  cherry Use macro PDP_SIZE instead of numeric constant, for unshared PAE L3 entries.
Thanks jym@
 1.74 30-Dec-2011  cherry Never cut-paste code from email!
Use the right count (0 -> 2) of l3 unshared userland entries for per-cpu initialisation.
 1.73 30-Dec-2011  cherry Force pae l3 page allocation for new vcpus to be < 4G, so they fit in 32bits
 1.72 30-Dec-2011  cherry per-cpu shadow directory pages should be updated locally via cross-calls. Do this.
 1.71 07-Dec-2011  cegger switch from xen3-public to xen-public.
 1.70 06-Nov-2011  cherry branches: 1.70.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.
 1.69 06-Nov-2011  cherry [merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
 1.68 20-Oct-2011  jruoho branches: 1.68.2;
Remove code that is commented out and out-of-sync with x86. If Xen needs to
use cpu_resume(), cpu_suspend(), or cpu_shutdown() in the future, it is
better to expose these from x86 rather than duplicate code.
 1.67 06-Oct-2011  mrg remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.
 1.66 28-Sep-2011  jruoho Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.
 1.65 20-Sep-2011  jym Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
the kernel from blindly validating user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.64 16-Aug-2011  dholland Fix broken build.
 1.63 15-Aug-2011  cherry Do not panic() on xen_send_ipi() sent to a cpu not yet running.
x86 MP boot depends on this strange behaviour.
 1.62 13-Aug-2011  cherry MP probing and startup code
 1.61 11-Aug-2011  cherry Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs
 1.60 16-Jul-2011  rmind Initialise cpus_running to 1 on Xen, as it was done on x86.

Problem analysed by hannken@. Fixes PR/45062.
 1.59 15-Jun-2011  rmind Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.
 1.58 15-Jun-2011  rmind - cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.
 1.57 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.56 26-Feb-2011  jruoho branches: 1.56.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.
 1.55 24-Feb-2011  jruoho Catch up with x86 on cpufeaturebus.
 1.54 24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.53 24-Feb-2011  jruoho Add cpufeaturebus and est(4) for Xen.
 1.52 14-Nov-2010  bouyer branches: 1.52.2; 1.52.4;
Boot vs AP processors don't make sense for physical CPUs, these are
handled by the hypervisor and all CPUs are running when the dom0 is started.
In addition, we don't have a reliable way to determine the boot CPU as
- we may not be running on the boot CPU
- we don't have access to the lapic id
So simplify by ignoring the information and assign phycpu_info_primary to the
first attached CPU.
 1.51 06-Nov-2010  uebayasi Machine dependent code is considered as part of UVM. Include
internal API header.
 1.50 03-Nov-2010  jruoho Fill cpu_info::ci_acpiid also on Xen.
 1.49 20-Aug-2010  jruoho Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
 1.48 09-Aug-2010  jruoho Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.
 1.47 24-Jul-2010  jym Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits wide (64 GiB). Virtual address space remains at 32 bits (4 GiB). To
cope with the increased size of physical addresses, they are manipulated as
64-bit variables by the kernel and the MMU.
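
As a rough illustration of the address widths described above, the following
C sketch computes the sizes involved. The names pae_paddr_t, pae_vaddr_t,
addr_space_size() and needs_64bit_storage() are hypothetical and exist only
for this example; they are not the kernel's own types or functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's address types under PAE. */
typedef uint64_t pae_paddr_t;	/* physical: 36 bits used, stored in 64 */
typedef uint32_t pae_vaddr_t;	/* virtual: stays 32 bits wide */

#define PAE_PADDR_BITS	36
#define GiB		(UINT64_C(1) << 30)

/* Size in bytes of an address space that is `bits` wide. */
static uint64_t
addr_space_size(unsigned bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}

/* Nonzero if a physical address no longer fits in a 32-bit variable. */
static int
needs_64bit_storage(pae_paddr_t pa)
{
	return pa > UINT32_MAX;
}
```

With these definitions, addr_space_size(PAE_PADDR_BITS) is 64 GiB and
addr_space_size(32) is 4 GiB, which is why a 32-bit variable cannot hold
every PAE physical address.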

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64-bit atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64-bit
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
 1.46 06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.45 28-Jun-2010  rmind mp_cpu_start: although fragment is commented out, add pmap_update(), just
in case somebody would come up with a clever idea to copy-paste that.
 1.44 04-May-2010  jym Enable the NX bit feature for Xen i386pae and amd64 kernels.

Tested with Xen 3.1 and Xen 3.3, dom0 and domU, by bouyer@ and jym@.

Ok bouyer@.
 1.43 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.42 03-Mar-2010  jym branches: 1.42.2;
Use roundup2() instead of hardcoding the CACHE_LINE_SIZE rounding
operation.
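
A minimal sketch of the difference, assuming a 64-byte cache line and a
power-of-two alignment. sketch_roundup2() and SKETCH_CACHE_LINE_SIZE are
illustrative names made up for this example; the sketch only mirrors the
general idea of a power-of-two rounding helper and is not the kernel's
definition.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size for this sketch; the real value is MD. */
#define SKETCH_CACHE_LINE_SIZE	64

/* Round x up to the next multiple of m; m must be a power of two. */
static size_t
sketch_roundup2(size_t x, size_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + (m - 1)) & ~(m - 1);
}

/* The open-coded form that a rounding helper replaces. */
static size_t
hardcoded_round(size_t x)
{
	return (x + SKETCH_CACHE_LINE_SIZE - 1) &
	    ~(size_t)(SKETCH_CACHE_LINE_SIZE - 1);
}
```

Both forms give the same result for a given size; the helper states the
intent once and keeps the mask arithmetic out of the callers.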
 1.41 24-Feb-2010  dyoung A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
 1.40 08-Jan-2010  dyoung branches: 1.40.2;
Expand PMF_FN_* macros.
 1.39 27-Nov-2009  rmind - Use uvm_lwp_setuarea() instead of directly setting address to lwp_t::l_addr.
- Replace most remaining uses of l_addr with uvm_lwp_getuarea() or lwp_getpcb().
- Amend assembly in ports where it accesses PCB via struct user.
- Rename L_ADDR to L_PCB in few places. Reduce sys/user.h inclusions.
 1.38 24-Nov-2009  cegger Remove X86_MAXPROCS. This fixes PR port-xen/41755.
This also reduces diff to x86/x86/cpu.c as a nice side effect.
'looks good' bouyer@
 1.37 21-Nov-2009  rmind Catch-up Xen and usermode with lwp_getpcb() and unbreak Xen build.
 1.36 07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.35 22-Sep-2009  cegger fix botch with merging in changes from x86/x86/cpu.c:

don't use wbinvd(). Xen flushes the cache for us.
This makes DomU boot again.
Spotted by bouyer@.
 1.34 30-Jul-2009  cegger from x86/x86/cpu.c:
- use atomic operations to set flags
- Align struct cpu_info to 64b.
 1.33 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.32 08-Jun-2009  cegger from sys/arch/x86/x86/cpu.c:

Implement -1 (RB_MD1) for physical CPUs in the Dom0.
 1.31 23-Dec-2008  cegger branches: 1.31.2;
catch up with x86/x86/cpu.c: move from malloc to kmem
 1.30 06-Nov-2008  cegger Link cpus in the order they are attaching and not in inverse order.
 1.29 31-Oct-2008  rmind - Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.
 1.28 22-Aug-2008  bouyer branches: 1.28.2; 1.28.4;
printf()->aprint_debug_dev() to match x86/cpu.c
 1.27 28-May-2008  ad branches: 1.27.4;
Give it a private X86_MAXPROCS def. XXX
 1.26 16-May-2008  bouyer call x86_cpu_idle_init(), avoid null function pointer call (cpu_idle()) when
scheduling starts.
cleanup printfs of vcpu
 1.25 11-May-2008  ad Fix typo.
 1.24 11-May-2008  ad Don't reload LDTR unless a new value, which only happens for USER_LDT.
 1.23 11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.22 11-May-2008  ad Share cpu.h between the x86 ports.
 1.21 11-May-2008  ad Update xen for identcpu changes.
 1.20 10-May-2008  ad Make xen build after tsc changes.
 1.19 09-May-2008  joerg Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
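
The selection rule described above can be sketched in C as follows. All
names here (idle_xen, idle_mwait, idle_halt, select_idle, sketch_cpu_idle)
are hypothetical and the routine bodies are empty placeholders; a real
kernel would issue a hypercall, MWAIT or HLT respectively.

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*idle_fn_t)(void);

/* Hypothetical idle routines; bodies are placeholders only. */
static void idle_xen(void) { }
static void idle_mwait(void) { }
static void idle_halt(void) { }

/* Pointer consulted on every idle entry; defaults to halt. */
static idle_fn_t sketch_cpu_idle_fn = idle_halt;

/* cpu_idle becomes a macro that calls through the pointer. */
#define sketch_cpu_idle()	((*sketch_cpu_idle_fn)())

/* Pick the idle routine: Xen first, then MWAIT unless AMD, else halt. */
static idle_fn_t
select_idle(bool on_xen, bool has_mwait, bool is_amd)
{
	if (on_xen)
		return idle_xen;
	if (has_mwait && !is_amd)
		return idle_mwait;
	return idle_halt;
}
```

The choice is made once at startup and stored in the pointer, so the idle
path itself carries no per-call feature checks.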
 1.18 28-Apr-2008  martin branches: 1.18.2;
Remove clause 3 and 4 from TNF licenses
 1.17 24-Apr-2008  cegger branches: 1.17.2;
keep up with x86/x86/cpu.c, rev. 1.33:
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than panicking.
 1.16 21-Apr-2008  cegger Access Xen's vcpu info structure per-CPU.
Tested on i386 and amd64 (both dom0 and domU) by me.
Xen2 tested (both dom0 and domU) by bouyer.
OK bouyer
 1.15 18-Apr-2008  cegger branches: 1.15.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.
 1.14 17-Apr-2008  bouyer Do not set ioapic_bsp_id in cpu_attach_common(). It's already initialized
in cpu_attach(), and doing it here will overwrite the cpu_number of the
physical CPU with the one from the virtual CPU (which is always 0).
XXX is ioapic_bsp_id read somewhere ?
 1.13 17-Apr-2008  yamt cpu_debug_dump: s/curproc/curlwp/ in a message.
 1.12 17-Apr-2008  cegger reduce diff to x86/x86/cpu.c
 1.11 13-Apr-2008  cegger reduce diff to x86/x86/cpu.c
 1.10 13-Apr-2008  cegger - device_t/softc split
- ansify
 1.9 06-Apr-2008  cegger use aprint_*_dev and device_xname
 1.8 16-Jan-2008  dogcow branches: 1.8.6;
cargo-cult copy cpu_offline_md; fixes compile on i386/x86_64
 1.7 11-Jan-2008  bouyer Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support development).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.6 04-Jan-2008  yamt branches: 1.6.2;
i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.
 1.5 18-Dec-2007  joerg Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
 1.4 12-Dec-2007  bouyer Initialize ci_idepth in cpu_info_primary, makes LOCKDEBUG kernels boot.
 1.3 10-Dec-2007  bouyer branches: 1.3.2;
Make Xen kernels build again.
 1.2 22-Nov-2007  bouyer branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8; 1.2.10;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file cpu.c was initially added on branch bouyer-xenamd64.
 1.1.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.4 18-Nov-2007  bouyer Ignore MTRRs for now, make kernel build again.
 1.1.2.3 18-Nov-2007  bouyer Sync with HEAD
 1.1.2.2 25-Oct-2007  bouyer Finish sync with HEAD. Especially use the new x86 pmap for xenamd64.
For this:
- rename pmap_pte_set() to pmap_pte_testset()
- make pmap_pte_set() a function or macro for non-atomic PTE write
- define and use pmap_pa2pte()/pmap_pte2pa() to read/write PTE entries
- define pmap_pte_flush() which is a nop in x86 case, and flush the
MMUops queue in the Xen case
 1.1.2.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should move out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.10.2 13-Dec-2007  yamt sync with head.
 1.2.10.1 11-Dec-2007  yamt sync with head.
 1.2.8.3 21-Jan-2008  yamt sync with head
 1.2.8.2 07-Dec-2007  yamt sync with head
 1.2.8.1 22-Nov-2007  yamt file cpu.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:16 +0000
 1.2.6.1 26-Dec-2007  ad Sync with head.
 1.2.4.2 03-Dec-2007  ad Sync with HEAD.
 1.2.4.1 22-Nov-2007  ad file cpu.c was added on branch vmlocking on 2007-12-03 19:04:38 +0000
 1.2.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1 22-Nov-2007  joerg file cpu.c was added on branch jmcneill-pm on 2007-11-27 19:36:18 +0000
 1.3.2.5 19-Jan-2008  bouyer Sync with HEAD
 1.3.2.4 08-Jan-2008  bouyer Sync with HEAD
 1.3.2.3 06-Jan-2008  bouyer Merge needed changes to genassym.cf and locore.S for xeni386 back to
arch/i386. Switch xeni386 to use the arch/i386 cpu.h.
 1.3.2.2 02-Jan-2008  bouyer Sync with HEAD
 1.3.2.1 13-Dec-2007  bouyer Sync with HEAD
 1.6.2.3 23-Mar-2008  matt sync with HEAD
 1.6.2.2 09-Jan-2008  matt sync with HEAD
 1.6.2.1 04-Jan-2008  matt file cpu.c was added on branch matt-armv6 on 2008-01-09 01:50:13 +0000
 1.8.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.8.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.8.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.15.2.2 04-Jun-2008  yamt sync with head
 1.15.2.1 18-May-2008  yamt sync with head.
 1.17.2.7 09-Oct-2010  yamt sync with head
 1.17.2.6 11-Aug-2010  yamt sync with head.
 1.17.2.5 11-Mar-2010  yamt sync with head
 1.17.2.4 19-Aug-2009  yamt sync with head.
 1.17.2.3 20-Jun-2009  yamt sync with head
 1.17.2.2 04-May-2009  yamt sync with head.
 1.17.2.1 16-May-2008  yamt sync with head.
 1.18.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.18.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.27.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.27.4.1 19-Oct-2008  haad Sync with HEAD.
 1.28.4.3 22-Nov-2010  riz Pull up following revision(s) (requested by bouyer in ticket #1475):
sys/arch/xen/x86/cpu.c: revision 1.52
Boot vs AP processors don't make sense for physical CPUs, these are
handled by the hypervisor and all CPUs are running when the dom0 is started.
In addition, we don't have a reliable way to determine the boot CPU as
- we may not be running on the boot CPU
- we don't have access to the lapic id
So simplify by ignoring the information and assign phycpu_info_primary to the
first attached CPU.
 1.28.4.2 22-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.28.4.1 13-Nov-2008  snj branches: 1.28.4.1.2; 1.28.4.1.4;
Pull up following revision(s) (requested by rmind in ticket #48):
sys/kern/kern_cpu.c: revision 1.37
sys/arch/x86/x86/cpu.c: revision 1.58
sys/arch/xen/x86/cpu.c: revision 1.29
sys/sys/cpu.h: revision 1.24
sys/kern/sys_sched.c: revision 1.31
- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().
Should fix PR/39349.
 1.28.4.1.4.1 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.28.4.1.2.1 23-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.28.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.31.2.9 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.31.2.8 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.31.2.7 10-Jan-2011  jym Sync with HEAD
 1.31.2.6 24-Oct-2010  jym Sync with HEAD
 1.31.2.5 01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.31.2.4 01-Nov-2009  jym Sync with HEAD.
 1.31.2.3 23-Jul-2009  jym Sync with HEAD.
 1.31.2.2 18-Jun-2009  cegger register physical CPUs with pmf.
No suspend/resume handlers needed since the hypervisor itself handles them.
ok @jym
 1.31.2.1 09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends in two halves: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no other parts
in kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontend drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.40.2.7 09-Nov-2010  uebayasi Sync with HEAD.
 1.40.2.6 09-Nov-2010  uebayasi Sync with HEAD.
 1.40.2.5 06-Nov-2010  uebayasi Sync with HEAD.
 1.40.2.4 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.40.2.3 26-Aug-2010  uebayasi Fix build.
 1.40.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.40.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.42.2.4 05-Mar-2011  rmind sync with head
 1.42.2.3 03-Jul-2010  rmind sync with head
 1.42.2.2 31-May-2010  rmind - Split off Xen versions of pmap_map_ptes/pmap_unmap_ptes into Xen pmap,
also move pmap_apte_flush() with pmap_unmap_apdp() there.
- Make Xen buildable.
 1.42.2.1 30-May-2010  rmind sync with head
 1.52.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.52.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.56.2.12 20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.56.2.11 09-Sep-2011  cherry fix amd64 boot.
 1.56.2.10 01-Sep-2011  cherry fix %cr3 init. from mhitch@, tested by riz@ & mhitch@
 1.56.2.9 30-Aug-2011  cherry Add per-cpu mmu queues
 1.56.2.8 26-Aug-2011  cherry Name the L4 per-cpu pointer appropriately.
User cr3 should point to the per-cpu L4, not the user pmap pdir
 1.56.2.7 20-Aug-2011  cherry PAE MP support (preliminary), amd64 per-cpu L4 model redesigned, i386 pmap_pa_start/end fixup
 1.56.2.6 17-Aug-2011  cherry Pullup relevant changes from -current
 1.56.2.5 07-Aug-2011  cherry Fix XEN3PAE_DOMx build
 1.56.2.4 31-Jul-2011  cherry grow MP support for i386. boots to single user
 1.56.2.3 16-Jul-2011  cherry Introduce a per-cpu "shadow" for pmap_kernel()'s L4 page
 1.56.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.56.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.68.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.68.2.4 30-Oct-2012  yamt sync with head
 1.68.2.3 23-May-2012  yamt sync with head.
 1.68.2.2 17-Apr-2012  yamt sync with head
 1.68.2.1 10-Nov-2011  yamt sync with head
 1.70.4.5 29-Apr-2012  mrg sync to latest -current.
 1.70.4.4 05-Apr-2012  mrg sync to latest -current.
 1.70.4.3 04-Mar-2012  mrg sync to latest -current.
 1.70.4.2 24-Feb-2012  mrg sync to -current.
 1.70.4.1 18-Feb-2012  mrg merge to -current.
 1.80.2.5 12-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #314):
sys/arch/xen/x86/cpu.c: revision 1.92
sys/kern/subr_kcpuset.c: revision 1.6
sys/sys/kcpuset.h: revision 1.6
sys/arch/xen/x86/x86_xpmap.c: revision 1.44
Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.
Tested by chs@.
 1.80.2.4 09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.80.2.3 24-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #45):
sys/arch/xen/x86/cpu.c: revision 1.87
sys/arch/xen/x86/cpu.c: revision 1.88
Get rid of phycpus_attached bitmask; it's maintained but not used and
will limit the number of physical CPUs to 32 without good reasons.
Don't maintain ci_cpumask for physical CPUs, it's not used.
 1.80.2.2 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #31):
sys/arch/x86/x86/pmap.c: revision 1.166
sys/arch/xen/x86/cpu.c: revision 1.83
- Make pmap_write_protect() work with pmap_kernel() too ((va & L2_FRAME)
strips the high bits of a LP64 address)
- use pmap_protect() in pmap_pdp_ctor() to remap the PDP read-only instead
of (ab)using pmap_kenter_pa(). No more "mapping already present" on
console with DIAGNOSTIC kernels
- make sure to zero the whole PDP (NTOPLEVEL_PDES doesn't include
high-level entries on i386 and i386PAE, reserved by Xen). Not sure
how it has worked before
- remove an always-true test (&& pmap != pmap_kernel(); we KASSERT that
at the function entry).
use pmap_protect() instead of pmap_kenter_pa() to remap R/O an existing
page. This gets rid of the last "mapping already present" warnings.
 1.80.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
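
The batching described above (unmap first, then free, at most 16 entries at
a time) can be sketched in C as follows. This is an illustration only:
sketch_unmap() and sketch_free_page() are hypothetical stand-ins that merely
count pages, not the real pmap_kremove() or uvm_pagefree(), and the loop is
not the committed implementation.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX	16	/* at most 16 entries per batch */

/* Counters standing in for real side effects. */
static size_t unmapped_pages;
static size_t freed_pages;

/* Hypothetical stand-in for unmapping n pages in one call. */
static void
sketch_unmap(size_t n)
{
	unmapped_pages += n;
}

/* Hypothetical stand-in for freeing one page. */
static void
sketch_free_page(void)
{
	/* A page must never be freed while it is still mapped. */
	assert(freed_pages < unmapped_pages);
	freed_pages++;
}

/* Remove npages pages in batches; returns the number of batches used. */
static size_t
sketch_pgremove(size_t npages)
{
	size_t batches = 0;

	while (npages > 0) {
		size_t n = npages < BATCH_MAX ? npages : BATCH_MAX;

		sketch_unmap(n);		/* mappings go away first */
		for (size_t i = 0; i < n; i++)
			sketch_free_page();	/* then pages are released */
		npages -= n;
		batches++;
	}
	return batches;
}
```

The ordering is the point of the sketch: within each batch every mapping is
removed before any of its pages returns to the free pool, so another CPU
cannot be handed a page that still has a writable mapping.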
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.93.4.1 18-May-2014  rmind sync with head
 1.93.2.2 03-Dec-2017  jdolecek update from HEAD
 1.93.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.98.4.2 07-Mar-2016  msaitoh Pullup the following revision to fix build break caused by ticket #1118:

sys/arch/xen/x86/cpu.c 1.102-1.103

Increase the number of entries of cpu_features from 5 to 7.
 1.98.4.1 04-Aug-2015  snj branches: 1.98.4.1.2;
Pull up following revision(s) (requested by prlw1 in ticket #934):
sys/arch/xen/x86/cpu.c: revision 1.100
Move all pmap-related cpu_info initialisations, including
initializing ci_kpm_mtx, in cpu_attach_common() from cpu_init()
(ci_pmap and ci_tlbstate are already initialized in cpu_attach_common()).
 1.98.4.1.2.1 20-Mar-2018  martin Additionally pull up the following for ticket #1118:

sys/arch/xen/x86/cpu.c 1.102-1.103

to unbreak the build (adjust cpu_feature declaration to changes in generic
x86 code).
 1.100.2.6 28-Aug-2017  skrll Sync with HEAD
 1.100.2.5 05-Feb-2017  skrll Sync with HEAD
 1.100.2.4 05-Dec-2016  skrll Sync with HEAD
 1.100.2.3 09-Jul-2016  skrll Sync with HEAD
 1.100.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.100.2.1 06-Apr-2015  skrll Sync with HEAD
 1.104.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.104.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.104.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.105.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.110.6.1 13-Mar-2018  martin Pullup the following revisions via patch, requested by maxv in ticket #629:

sys/arch/amd64/amd64/genassym.cf 1.63,1.64
sys/arch/amd64/amd64/locore.S 1.144
sys/arch/amd64/amd64/machdep.c 1.281-1.283
sys/arch/i386/i386/genassym.cf 1.105-1.106
sys/arch/i386/i386/locore.S 1.155
sys/arch/i386/i386/machdep.c 1.802 (adapted),1.803
sys/arch/x86/include/cpu.h 1.85
sys/arch/x86/x86/intr.c 1.115-1.116
sys/arch/x86/x86/pmap.c 1.275
sys/arch/x86/x86/sys_machdep.c 1.45
sys/arch/xen/x86/cpu.c 1.117

Stop sharing the double-fault stack.
Merge the TSS structures into one single cpu_tss structure, and
allocate it dynamically.
 1.117.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.117.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.117.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.122.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.122.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.122.2.1 10-Jun-2019  christos Sync with HEAD
 1.131.2.2 29-Feb-2020  ad Sync with head.
 1.131.2.1 17-Jan-2020  ad Sync with head.
 1.133.4.1 18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before CPUs attach.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beginning of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
For HVM, ci_cpuid doesn't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in its context, which initializes ci_vcpuid and
ci_vcpu, and sets up the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.
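
The per-CPU hook arrangement described above can be sketched as below. This
is a userland illustration only: the function and variable names are
hypothetical stand-ins modelled on the names in the commit message, and the
counter replaces real timer programming.

```c
#include <assert.h>

int lapic_clocks_started;	/* counts simulated per-CPU clock starts */

static void
initclock_nop(void)
{
	/* default: nothing to do until a timer source registers itself */
}

/* Stand-in for x86_cpu_initclock_func. */
void (*cpu_initclock_func)(void) = initclock_nop;

/* Stand-in for lapic_initclocks(). */
void
lapic_initclocks_sketch(void)
{
	lapic_clocks_started++;
}

/* Stand-in for lapic_calibrate_timer(): registers the per-CPU hook. */
void
lapic_calibrate_timer_sketch(void)
{
	cpu_initclock_func = lapic_initclocks_sketch;
}

/* Stand-in for cpu_hatch()/cpu_attach(): each CPU calls the hook. */
void
cpu_hatch_sketch(void)
{
	(*cpu_initclock_func)();
}
```

When calibration is skipped, as the message says is done for
VM_GUEST_XENPVHVM, the hook is never pointed at the lapic routine and a
different clock source can install its own function instead.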

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
 1.139.4.1 02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.140.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.142.4.1 31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #268):

sys/arch/xen/xenbus/xenbus_comms.c: revision 1.25
sys/arch/xen/xenbus/xenbus_comms.c: revision 1.26
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.110
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.111
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.112
sys/arch/xen/x86/cpu.c: revision 1.144
sys/arch/xen/x86/cpu.c: revision 1.145
sys/arch/xen/include/hypervisor.h: revision 1.56
sys/arch/xen/include/hypervisor.h: revision 1.57
sys/arch/xen/xen/xbdback_xenbus.c: revision 1.102
sys/arch/xen/xen/xbdback_xenbus.c: revision 1.103
sys/arch/xen/include/xenring.h: revision 1.7
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.109
sys/arch/xen/xen/xengnt.c: revision 1.40
sys/arch/xen/xen/xengnt.c: revision 1.41
sys/arch/xen/xen/if_xennet_xenbus.c: revision 1.129
sys/arch/xen/xen/xencons.c: revision 1.51
sys/arch/xen/xen/xencons.c: revision 1.52
sys/arch/xen/xen/xencons.c: revision 1.53
sys/arch/xen/xen/xbd_xenbus.c: revision 1.130 (patch)
sys/arch/xen/xen/xbd_xenbus.c: revision 1.131 (patch)

xen: Fix sense of xen_rmb/wmb to make sense.

Use membar_acquire and membar_release, not membar_consumer and
membar_producer, out of paranoia -- that better matches Linux's
rmb/wmb (at least for non-I/O loads and stores).

Proposed on port-xen:
https://mail-index.netbsd.org/port-xen/2022/07/13/msg010248.html

xen/x86/cpu.c: Membar audit.

I see no reason for store-before-load ordering here; as far as I'm
aware, evtchn_upcall_mask is only shared between a (v)CPU and its
(hypervisor) interrupts, not other (v)CPUs.

xennet(4): Membar audit.
- xennet_tx_complete: Other side owns rsp_prod, giving us responses
to tx commands. We own rsp_cons, recording which responses we've
processed already.
1. Other side initializes responses before advancing rsp_prod, so
we must observe rsp_prod before trying to examine the responses.
Hence load from rsp_prod must be followed by xen_rmb.
(Can this just use atomic_load_acquire?)
2. As soon as other side observes rsp_event, it may start to
overwrite now-unused response slots, so we must finish using the
response before advancing rsp_cons. Hence we must issue xen_wmb
before store to rsp_event.
(Can this just use atomic_store_release?)
(Should this use RING_FINAL_CHECK_FOR_RESPONSES?)
3. When loop is done and we set rsp_event, we must ensure the other
side has had a chance to see that we want more before we check
whether there is more to consume; otherwise the other side might
not bother to send us an interrupt. Hence after setting
rsp_event, we must issue xen_mb (store-before-load) before
re-checking rsp_prod.
- xennet_handler (rx): Same deal, except the xen_mb is buried in
RING_FINAL_CHECK_FOR_RESPONSES. Unclear why xennet_tx_complete has
this open-coded while xennet_handler (rx) uses the macro.
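
The three ordering rules above can be sketched with C11 fences. This is a
userland illustration, not the xennet code: struct ring and ring_consume()
are hypothetical, the fences are assumed equivalents of xen_rmb (acquire),
xen_wmb (release) and xen_mb (sequentially consistent), and setting
rsp_event to rsp_cons + 1 is an assumption about the ring convention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE	8	/* hypothetical ring size */

struct ring {
	_Atomic uint32_t rsp_prod;	/* advanced by the other side */
	_Atomic uint32_t rsp_event;	/* written by us: "wake me at" */
	uint32_t rsp_cons;		/* private to us */
	int rsp[RING_SIZE];		/* response slots */
};

/* Consume all available responses, adding them to *sum. */
unsigned
ring_consume(struct ring *r, int *sum)
{
	unsigned handled = 0;

	for (;;) {
		uint32_t prod = atomic_load_explicit(&r->rsp_prod,
		    memory_order_relaxed);

		/* 1. observe rsp_prod before reading the responses */
		atomic_thread_fence(memory_order_acquire);

		while (r->rsp_cons != prod) {
			*sum += r->rsp[r->rsp_cons % RING_SIZE];
			r->rsp_cons++;
			handled++;
		}

		/* 2. finish using the slots before publishing rsp_event */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&r->rsp_event, r->rsp_cons + 1,
		    memory_order_relaxed);

		/* 3. store-before-load: publish, then re-check rsp_prod */
		atomic_thread_fence(memory_order_seq_cst);
		if (atomic_load_explicit(&r->rsp_prod,
		    memory_order_relaxed) == r->rsp_cons)
			break;
	}
	return handled;
}
```

Without the full fence in step 3, the re-check could be satisfied from a
stale rsp_prod while the other side has not yet seen the new rsp_event, and
no interrupt would arrive for the responses produced in between.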

xbd(4): Membar audit.
After consuming slots, must issue xen_wmb before notifying the other
side that we've consumed them in RING_FINAL_CHECK_FOR_RESPONSES.
xbdback(4): Membar audit.

After consuming request slots, must issue xen_wmb before notifying the other
side that we've consumed them in RING_FINAL_CHECK_FOR_REQUESTS.

xencons(4): Membar audit.
- xenconscn_getc: Once we have consumed an input slot, it is clearer
to issue xen_wmb (release, i.e., load/store-before-store) before
advancing in_cons so that the update becomes a store-release
freeing the input slot for the other side to reuse.
- xenconscn_putc: After filling an output slot, must issue xen_wmb
(release, i.e., load/store-before-store) before advancing out_prod,
and another one before notifying the other side of the advance.

xencons(4): Reduce unnecessary membars.
- xencons_handler: After advancing in_cons, only need one xen_wmb
before notifying the hypervisor that we're ready for more.
(XXX Should this do xen_mb and re-check in_prod at that point, or
does hypervisor_notify_via_evtchn obviate the need for this?)
- xenconscn_getc: After reading in_prod, only need one xen_rmb before
using the slots it is telling us are now ready.

xengnt(4): Membar audit.
This had the sense of membars reversed, presumably because xen_rmb
and xen_wmb had gotten reversed at some point.
xenbus_comms.c: Membar audit.

This had the sense of membars reversed, presumably because xen_rmb
and xen_wmb had gotten reversed at some point.

xennetback(4): Fix xennetback_evthandler loop.
- After observing the other side has produced pending tx requests by
reading sring->req_prod, must issue xen_rmb before touching them.
Despite all the effort to use the heavy-weight
RING_FINAL_CHECK_FOR_REQUESTS on each request in the loop, this
barrier was missing.
- No need to update req_cons at each iteration in the loop. It's
private. Just update it once at the end.
- After consuming requests, must issue xen_wmb before releasing the
slots with RING_FINAL_CHECK_FOR_REQUESTS for the other side to
reuse.

xennetback(4): Fix membars in xennetback_rx_copy_process.
- No need for barrier around touching req_cons and rsp_prod_pvt,
which are private.
- RING_PUSH_RESPONSES_AND_CHECK_NOTIFY already issues xen_wmb, no
need to add one explicitly.
- After pushing responses, must issue xen_wmb (not xen_rmb) before
hypervisor_notify_via_evtchn.

xennetback(4): Omit needless membars in xennetback_connect.
xneti is a private data structure to which we have exclusive access
here; ordering the stores doesn't make sense.

xen/hypervisor.h: Nix trailing whitespace.
No functional change intended.

xen/x86/cpu.c: Nix trailing whitespace.
No functional change intended.

xbd(4): Nix trailing whitespace.

xbdback(4): Nix trailing whitespace.
No functional change intended.

xencons(4): Nix trailing whitespace.
No functional change intended.

xengnt(4): Nix trailing whitespace.
No functional change intended.

xenbus_comms.c: Nix trailing whitespace.
No functional change intended.

xennetback(4): Nix trailing whitespace.
No functional change intended.
 1.46 01-Mar-2023  riastradh xen/x86: Need kpreempt_disable/enable around curcpu() access.

This is called with `hardware' interrupts enabled (between sti and
cli), so presumably preemption is possible here.

XXX pullup-8
XXX pullup-9
XXX pullup-10
 1.45 07-Sep-2022  knakahara branches: 1.45.4;
NetBSD/x86: Raise the number of interrupt sources per CPU from 32 to 56.

There has been no objection for three years.
https://mail-index.netbsd.org/port-amd64/2019/09/22/msg003012.html
Implemented by nonaka@n.o, updated by me.
 1.44 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.43 31-May-2022  bouyer When we have pending events in stipending(), evt_set_pending() has to set
the ih_pending flag for each handler too. Xen/i386 should be stable again.
 1.42 31-May-2022  bouyer Revert previous; evt_set_pending() will set ret to 1 if needed so this was
not our bug.
 1.41 31-May-2022  bouyer stipending(): if we're going to process some interrupts don't return 0.
Hopefully fixes random hang seen in i386 Xen PV.

The bug has been there ~forever but was masked by the fact that spllower()
did call event handlers much more often.
 1.40 19-May-2022  bouyer Restore the EOI mechanism for pirq, using the newer hypervisor interface.
It is needed.
Hopefully fixes kern/56291, kern/56793, kern/55667
 1.39 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.38 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.37 21-Apr-2020  jdolecek convert to newer HYPERVISOR_physdev_op() interface, now command and the
arg are separate arguments - this is needed for newer physdev_op commands

remove code for PHYSDEVOP_IRQ_UNMASK_NOTIFY, it is obsolete since
interface version 0x00030202 and is unsupported by newer versions of Xen

confirmed working on amd64 Dom0, i386 compile-tested only
 1.36 09-May-2019  bouyer branches: 1.36.2; 1.36.8;
sti/cli are not allowed on Xen, we have to clear/set a bit in the
shared page. Revert x86_disable_intr/x86_enable_intr to plain function
calls on XENPV.
While there, clean up unused functions and macros, and change cli()/sti()
macros to x86_disable_intr/x86_enable_intr.
Makes Xen domU boot again
(http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/)
 1.35 12-Feb-2019  cherry conditionally include XENPV specific code.

This explicitly excludes PV only functionality that would be wrong to
attempt to use in other modes, for eg: p2m table management.
 1.34 25-Dec-2018  cherry Excise XEN specific code out of x86/x86/intr.c into xen/x86/xen_intr.c

While at it, separate the source function tracking so that the interrupt
paths are truly independent.

Use weak symbol exporting to provision for future PVHVM co-existence
of both files, but with independent paths. Introduce assembler code
such that in a unified scenario, native interrupts get first priority
in spllower(), followed by XEN event callbacks. IPL management and
semantics are unchanged - native handlers and xen callbacks are
expected to maintain their ipl related semantics.

In summary, after this commit, native and XEN now have completely
unrelated interrupt handling mechanisms, including
intr_establish_xname() and assembler stubs and intr handler
management.

Happy Christmas!
 1.33 19-Nov-2018  kre Hide differences between i386 and amd64 interrupt frames so XEN does
not need to know there is one. Hopefully unbreak i386 build.
 1.32 18-Nov-2018  cherry On Xen, copy just the bits we need from the trapframe for hardclock(9)
and statclock(9).

Currently, the macros that use the trapframe are:
CLKF_USERMODE()
CLKF_PC()
CLKF_INTR()

Of these, CLKF_INTR() already ignores the frame and uses the ci_idepth
variable to do its job.

Convert the two remaining ones to do this, but only for XEN.
 1.31 18-Nov-2018  cherry Save the interrupt trap/clockframe to a per-cpu copy.

We can use this copy to pass on the trapframe to hardclock(9) from
within the xen timer handler. This delinks the current dependency
between MD code and the handler, which is specially prototyped to take
the clockframe unlike any other handler.

This change has performance implications, as each interrupt entry will
copy the entire trapframe over to the per-cpu cached copy. This can be
mitigated by selectively copying just the parts of the clockframe that
are used by hardclock() et al.

Tested on amd64 XEN domU
 1.30 17-Nov-2018  cherry Use hypervisor provided interface to unmask specific ports.

Although at first glance this looks suboptimal, the unmask operation
fast path does not use hypervisor_unmask_event(). Instead, it directly
operates on the mask and pending bit arrays to provide what would
effectively be an "auto mask/eoi" semantic.

This change is thus not in the fast path, and has the advantage of
performance improvements since cross CPU state updates etc. is handled
within the hypervisor instead of domU IPIs.
 1.29 26-Oct-2018  cherry Decompose hypervisor_enable_event() into functional steps.

The hypervisor_unmask_event() step is relevant for any event.

The pirq related step is only relevant for pirq bound events.

Prune blanket usage of this, so that usage is semantically appropriate.
 1.28 21-Sep-2014  bouyer branches: 1.28.12; 1.28.18; 1.28.20;
Make Xen kernels compile without DIAGNOSTIC
 1.27 13-Jan-2013  bouyer branches: 1.27.12;
Re-apply
http://mail-index.netbsd.org/source-changes/2012/11/25/msg039125.html
http://mail-index.netbsd.org/source-changes/2012/11/25/msg039126.html
they're not involved in i386 domU hang shown by ATF.
 1.26 12-Jan-2013  bouyer Revert these commits from November 2012:
http://mail-index.netbsd.org/source-changes/2012/11/25/msg039125.html
http://mail-index.netbsd.org/source-changes/2012/11/25/msg039126.html
http://mail-index.netbsd.org/source-changes/2012/11/25/msg039142.html

they cause an i386PAE domU to hang while running ATF tests, as shown in
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/

(we should pay more attention to test results, myself first).
 1.25 12-Jan-2013  bouyer Back out this commit:
http://mail-index.netbsd.org/source-changes/2012/12/28/msg039950.html
which causes a panic when running tests on amd64, as shown on:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/
(i386 hangs for unrelated reasons).
 1.24 28-Dec-2012  cherry Simplify the xen event handler callback by:
- moving the interrupt handler callback traversal into a separate
function.
- using evt_iterate_bits() to scan through the pending bitfield
- removing cross-cpu pending actions - events received on the wrong
vcpu are re-routed via hypervisor_send_event().
- simplifying nested while() loops by encapsulating them in
equivalent functions.

Many thanks for multiple reviews by bouyer@ and jym@
 1.23 25-Nov-2012  cherry Make hypervisor_set_ipending() and its consumers cpu unaware. This syncs syntax with semantics
 1.22 10-Nov-2012  cherry Remove a redundant layer of function calling in the event handling path
 1.21 27-Dec-2011  cherry branches: 1.21.6;
Optimise branch predict hint for the intended use-case (cross cpu event notification)
 1.20 27-Dec-2011  cherry Do not touch pending flags across vcpus
 1.19 26-Dec-2011  cherry Do not fiddle with the event masks of non-local vcpus when unmasking events across vcpus
 1.18 03-Dec-2011  bouyer branches: 1.18.2;
hypervisor_unmask_event(): don't check/update evtchn_pending_sel for the
current CPU, but for any CPU which may accept this event.
xen/xenevt.c: more use of atomic ops and locks where appropriate, and some
other SMP fixes. Handle all events on the primary CPU (may be revisited
later). Set/clear ci_evtmask[] for watched events.

This should fix the problems on dom0 kernels reported by jym@
 1.17 19-Nov-2011  cherry [merging from cherry-xenmp] bring in bouyer@'s changes via:
http://mail-index.netbsd.org/source-changes/2011/10/22/msg028271.html
From the Log:
Log Message:
Various interrupt fixes, mainly:
keep a per-cpu mask of enabled events, and use it to get pending events.
A cpu-specific event (all of them at this time) should not be ever masked
by another CPU, because it may prevent the target CPU from seeing it
(the clock events all fire at once for example).
 1.16 20-Sep-2011  jym branches: 1.16.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
the kernel from blindly validating user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.15 10-Aug-2011  cherry refactor the bitstring/mask operations to be behind an API. Make pending interrupt marking cpu aware.
 1.14 30-Mar-2011  jym branches: 1.14.2;
Fix a year old bug that was only fixed in jym-xensuspend branch, but
not in HEAD:
- use uvm_km_alloc() instead of kmem_alloc() to enforce alignment when
allocating p2m_frame pages (xentools can only deal with page-aligned
addresses)
- do not use paddr_t for p2m_frame_list_list with PAE, xentools expect
32 bits PFNs even with 64 bits PTE.

Required to make ``xm dump-core'' work as expected.
 1.13 23-Oct-2009  snj branches: 1.13.4; 1.13.6;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).
 1.12 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.11 21-Oct-2008  cegger branches: 1.11.8;
introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.
 1.10 16-Sep-2008  bouyer Implement the arch-dependent p2m frame lists list. This adds support for
'xm dump-core' for NetBSD domUs.
From Jean-Yves Migeon (jean-yves dot migeon at espci dot fr)
 1.9 01-Jul-2008  bouyer branches: 1.9.2;
Raise ci_idepth (and switch to interrupt stack on i386) before calling
xenevt_event().
 1.8 21-Apr-2008  cegger branches: 1.8.2; 1.8.4; 1.8.6;
Access Xen's vcpu info structure per-CPU.
Tested on i386 and amd64 (both dom0 and domU) by me.
Xen2 tested (both dom0 and domU) by bouyer.
OK bouyer
 1.7 14-Apr-2008  cegger branches: 1.7.2;
- use POSIX integer types
- ansify functions
 1.6 19-Feb-2008  bouyer branches: 1.6.6;
The event bitmasks provided by the hypervisor are unsigned long (so 64bits
on amd64). Make sure to use the right type to store and manipulate them.
This fixes amd64, where basically any event channel > 31 was not working
(and you get there after starting/stopping a domU a few times). Things
would occasionally unwedge through the spllower() callbacks.
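
A small sketch of why the word type matters. The helpers below are
hypothetical, not the kernel's: they index an event bitmap in units of
unsigned long, the type the commit message says the hypervisor uses, so a
port number above 31 lands in the right word and bit whatever the width of
long is on the platform.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define NR_PORTS	256	/* hypothetical number of event channels */
#define NR_WORDS	((NR_PORTS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Mark an event channel as pending. */
void
evt_set(unsigned long *mask, unsigned port)
{
	assert(port < NR_PORTS);
	mask[port / BITS_PER_LONG] |= 1UL << (port % BITS_PER_LONG);
}

/* Test whether an event channel is pending. */
int
evt_isset(const unsigned long *mask, unsigned port)
{
	assert(port < NR_PORTS);
	return (mask[port / BITS_PER_LONG] >>
	    (port % BITS_PER_LONG)) & 1UL;
}

/* Return the lowest pending event channel, or -1 if none. */
int
evt_first(const unsigned long *mask)
{
	for (size_t w = 0; w < NR_WORDS; w++) {
		if (mask[w] == 0)
			continue;
		for (size_t b = 0; b < BITS_PER_LONG; b++)
			if (mask[w] & (1UL << b))
				return (int)(w * BITS_PER_LONG + b);
	}
	return -1;
}
```

Storing such a word in a 32-bit variable on amd64 would silently drop bits
32 to 63, which matches the symptom described above for channels above 31.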
 1.5 19-Feb-2008  bouyer Fix xenevt to not call softint_schedule() above IPL_HIGH:
Register an IPL callback for IPL_HIGH.
If the current ipl level is too high, just record the event in a bitmap,
and record IPL_HIGH as pending. The callback will process the pending events.
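
The deferral described above, sketched in userland. All names and the IPL
values are hypothetical; deliver() stands in for the work that is not
allowed at high IPL (softint_schedule() in the message), and a 64-bit word
stands in for the bitmap.

```c
#include <assert.h>
#include <stdint.h>

enum { IPL_NONE = 0, IPL_HIGH = 7 };	/* hypothetical level values */

int cur_ipl = IPL_NONE;		/* simulated current priority level */
uint64_t deferred;		/* bitmap of events recorded while too high */
int high_pending;		/* set when the IPL_HIGH callback must run */
unsigned delivered;		/* counts events actually processed */

static void
deliver(unsigned port)
{
	(void)port;
	delivered++;
}

/* Called when an event fires; defers the work if the IPL is too high. */
void
event_post(unsigned port)
{
	assert(port < 64);
	if (cur_ipl >= IPL_HIGH) {
		deferred |= (uint64_t)1 << port;	/* just record it */
		high_pending = 1;			/* mark IPL_HIGH pending */
		return;
	}
	deliver(port);
}

/* The registered callback: runs once the level drops, drains the bitmap. */
void
ipl_high_callback(void)
{
	for (unsigned port = 0; port < 64; port++) {
		if (deferred & ((uint64_t)1 << port)) {
			deferred &= ~((uint64_t)1 << port);
			deliver(port);
		}
	}
	high_pending = 0;
}
```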
 1.4 20-Dec-2007  ad branches: 1.4.2;
- Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.3 12-Dec-2007  bouyer cleanup the debug event handler to not use the IPL system at all. Fix
debug event storm on XEN2.
 1.2 22-Nov-2007  bouyer branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8; 1.2.10; 1.2.12;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file hypervisor_machdep.c was initially added on branch bouyer-xenamd64.
 1.1.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.12.2 02-Jan-2008  bouyer Sync with HEAD
 1.2.12.1 13-Dec-2007  bouyer cleanup the way debug event is handled: make it bypass the IPL system
completely, it's called by shortcuts in the normal path because we want it to
be always called, even if the IPL is high.
Fix debug event recursion on XEN2
 1.2.10.1 13-Dec-2007  yamt sync with head.
 1.2.8.4 27-Feb-2008  yamt sync with head.
 1.2.8.3 21-Jan-2008  yamt sync with head
 1.2.8.2 07-Dec-2007  yamt sync with head
 1.2.8.1 22-Nov-2007  yamt file hypervisor_machdep.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:17 +0000
 1.2.6.1 26-Dec-2007  ad Sync with head.
 1.2.4.2 03-Dec-2007  ad Sync with HEAD.
 1.2.4.1 22-Nov-2007  ad file hypervisor_machdep.c was added on branch vmlocking on 2007-12-03 19:04:40 +0000
 1.2.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1 22-Nov-2007  joerg file hypervisor_machdep.c was added on branch jmcneill-pm on 2007-11-27 19:36:19 +0000
 1.4.2.3 23-Mar-2008  matt sync with HEAD
 1.4.2.2 09-Jan-2008  matt sync with HEAD
 1.4.2.1 20-Dec-2007  matt file hypervisor_machdep.c was added on branch matt-armv6 on 2008-01-09 01:50:14 +0000
 1.6.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.6.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.6.6.2 02-Jul-2008  mjf Sync with HEAD.
 1.6.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.7.2.1 18-May-2008  yamt sync with head.
 1.8.6.1 03-Jul-2008  simonb Sync with head.
 1.8.4.2 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.8.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.8.2.3 11-Mar-2010  yamt sync with head
 1.8.2.2 19-Aug-2009  yamt sync with head.
 1.8.2.1 04-May-2009  yamt sync with head.
 1.9.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.9.2.1 19-Oct-2008  haad Sync with HEAD.
 1.11.8.10 27-Aug-2011  jym Rename the functions for suspend to reflect that Xen does not hijack
the ACPI "sleepstate" sysctl(7) node anymore.

Add a boolean value to mark that the save/suspend operation has been
notified by dom0, so as to avoid possible errors where admin would like
to schedule the domain for sleep without dom0 being prepared for that. Fail
with EAGAIN in this case.

Sprinkle some KNF.
 1.11.8.9 27-Aug-2011  jym Further sync with HEAD.
 1.11.8.8 07-May-2011  jym KNF.
 1.11.8.7 02-May-2011  jym Sync with head.
 1.11.8.6 30-Mar-2011  jym Sync with my commits in HEAD.
 1.11.8.5 29-Mar-2011  jym More sync fixes. And add the mbr_gpt files.
 1.11.8.4 01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.11.8.3 01-Nov-2009  jym Sync with HEAD.
 1.11.8.2 29-May-2009  jym - use uvm_km_alloc() instead of kmem_alloc() to enforce alignment when
allocating p2m_frame pages (xentools can only deal with page-aligned addresses)
- *sigh* do not use paddr_t for p2m_frame_list_list with PAE, xentools expect
32 bits addresses even with 64 bits PTE...
 1.11.8.1 09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends in two halves: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no other parts
of the kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontends drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.13.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.13.4.1 21-Apr-2011  rmind sync with head
 1.14.2.6 22-Oct-2011  bouyer Various interrupt fixes, mainly:
keep a per-cpu mask of enabled events, and use it to get pending events.
A cpu-specific event (all of them at this time) should not be ever masked
by another CPU, because it may prevent the target CPU from seeing it
(the clock events all fire at once for example).
 1.14.2.5 18-Sep-2011  cherry Use an IPI to re-route events to the cpu where the handler has been registered
 1.14.2.4 22-Aug-2011  cherry Do not trust the hypervisor to route events to the right cpu. Enforce this in stipending()
 1.14.2.3 17-Aug-2011  cherry Pullup relevant changes from -current
 1.14.2.2 04-Aug-2011  cherry first cut at per-cpu event handling
 1.14.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.16.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.16.2.1 17-Apr-2012  yamt sync with head
 1.18.2.1 18-Feb-2012  mrg merge to -current.
 1.21.6.3 03-Dec-2017  jdolecek update from HEAD
 1.21.6.2 25-Feb-2013  tls resync with head
 1.21.6.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.27.12.1 22-Sep-2014  martin Pull up following revision(s) (requested by bouyer in ticket #115):
sys/arch/xen/x86/hypervisor_machdep.c: revision 1.28
sys/arch/xen/xenbus/xenbus_client.c: revision 1.13
sys/arch/xen/xen/xbdback_xenbus.c: revision 1.60
sys/arch/xen/xen/clock.c: revision 1.63
Make Xen kernels compile without DIAGNOSTIC
 1.28.20.1 10-Jun-2019  christos Sync with HEAD
 1.28.18.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.28.18.1 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.28.12.1 31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1864):

sys/arch/xen/x86/hypervisor_machdep.c: revision 1.46 (patch)

xen/x86: Need kpreempt_disable/enable around curcpu() access.

This is called with `hardware' interrupts enabled (between sti and
cli), so presumably preemption is possible here.
 1.36.8.7 25-Apr-2020  bouyer sync with bouyer-xenpvh-base2 (HEAD)
 1.36.8.6 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.36.8.5 19-Apr-2020  bouyer Move xen_ipi.c to XENPV only.
Make sure we don't need to send events to remote CPUs (outside of IPIs)
 1.36.8.4 18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before CPUs attach.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beginning of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
For HVM, ci_cpuid doesn't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in its context, which initializes ci_vcpuid and
ci_vcpu, and sets up the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
 1.36.8.3 16-Apr-2020  bouyer amd64: Xhypervisor_pvhvm_callback has to be in text.user for SVS.
Thanks to maxv@ for helping me with this.
Enable SVS again.
While there, increase ci_idepth before calling do_hypervisor_callback,
and don't touch ci_idepth while looping over pending events.
 1.36.8.2 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and adds Xen PV
drivers.
 1.36.8.1 12-Apr-2020  bouyer Get rid of xen-specific ci_x* interrupt handling:
- use the general SIR mechanism, reserving 3 more slots for IPL_VM, IPL_SCHED
and IPL_HIGH
- remove specific handling from C sources, or change to ipending
- convert IPL number to SIR number in various places
- Remove XUNMASK/XPENDING in assembly or change to IUNMASK/IPENDING
- remove Xen-specific ci_xsources, ci_xmask, ci_xunmask, ci_xpending from
struct cpu_info
- for now remove a KASSERT that there are no pending interrupts in
idle_block(). We can get there with some software interrupts pending
in autoconf XXX needs to be looked at.
 1.36.2.1 31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1681):

sys/arch/xen/x86/hypervisor_machdep.c: revision 1.46 (patch)

xen/x86: Need kpreempt_disable/enable around curcpu() access.

This is called with `hardware' interrupts enabled (between sti and
cli), so presumably preemption is possible here.
 1.45.4.1 31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #269):

sys/arch/xen/x86/hypervisor_machdep.c: revision 1.46

xen/x86: Need kpreempt_disable/enable around curcpu() access.

This is called with `hardware' interrupts enabled (between sti and
cli), so presumably preemption is possible here.
 1.4 09-May-2008  joerg Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
 1.3 02-Dec-2007  ad branches: 1.3.2; 1.3.6; 1.3.14; 1.3.20; 1.3.22; 1.3.24; 1.3.26;
Don't clear ci_want_resched in MD code; it's done in mi_switch().
 1.2 22-Nov-2007  bouyer branches: 1.2.2;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file idle_machdep.c was initially added on branch bouyer-xenamd64.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.2.3 03-Dec-2007  joerg Sync with HEAD.
 1.2.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1 22-Nov-2007  joerg file idle_machdep.c was added on branch jmcneill-pm on 2007-11-27 19:36:20 +0000
 1.3.26.1 23-Jun-2008  wrstuden Remove files removed on branch. Updating using patch has its
drawbacks. :-)
 1.3.24.1 16-May-2008  yamt sync with head.
 1.3.22.1 17-Jun-2008  yamt fix merge botches
 1.3.20.1 02-Jun-2008  mjf Sync with HEAD.
 1.3.14.2 09-Jan-2008  matt sync with HEAD
 1.3.14.1 02-Dec-2007  matt file idle_machdep.c was added on branch matt-armv6 on 2008-01-09 01:50:14 +0000
 1.3.6.2 07-Dec-2007  yamt sync with head
 1.3.6.1 02-Dec-2007  yamt file idle_machdep.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:17 +0000
 1.3.2.2 03-Dec-2007  ad Sync with HEAD.
 1.3.2.1 02-Dec-2007  ad file idle_machdep.c was added on branch vmlocking on 2007-12-03 19:04:40 +0000
 1.34 04-Nov-2017  cherry Retire xen/x86/intr.c and use the new xen specific glue in x86/x86/intr.c

The purpose of this change is to expose the x86/include/intr.h API
to drivers. Specifically the following functions:

void *intr_establish_xname(...);
void *intr_establish(...);
void intr_disestablish(...);

while maintaining the old API from xen/include/evtchn.h, specifically
the following functions:

int event_set_handler(...);
int event_remove_handler(...);

This is so that if things break, we can keep using the old API until
everything stabilises. This is a stepping stone towards getting the
actual XEN event callback path rework code in place - which can be
done opaquely behind the intr.h API - NetBSD/XEN specific drivers that
have been ported to the intr.h API should then work without
significant further modifications.
 1.33 04-Nov-2017  cherry On XEN dom0, the function xen/x86/intr.c:xen_intr_map() is used to map
hardware interrupts to XEN callbacks called 'events'. This function
combines both the allocation and the binding.

This change is the first part of breaking up that combination into
xen_pirq_alloc() and the binding will happen as part of the
pic_addroute() callback of a new pseudo PIC_XEN

This code will be added later on.
 1.32 16-Jul-2017  cherry branches: 1.32.2;
Remove the xen specific interrupt type for the x86 intr_handle_t
For this to work, we use the evtchn.c:get_pirq_to_evtchn() glue
function to make things easier.
 1.31 23-May-2017  nonaka x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.30 17-Oct-2016  jdolecek provide stub intr xname establish for xen
 1.29 13-Aug-2011  cherry branches: 1.29.12; 1.29.30; 1.29.34;
Remove spurious header.
Thanks rmind@
 1.28 11-Aug-2011  cherry Make event/interrupt handling MP aware
 1.27 19-Mar-2010  dyoung branches: 1.27.6;
Cosmetic: abbreviate: use `pc' instead of `pci_chipset_tag'.
 1.26 18-Aug-2009  jmcneill branches: 1.26.2; 1.26.4;
Switch to ACPICA 20090730, and update for API changes.
 1.25 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.24 03-Jun-2009  cegger Interrupt handling in Xen 3.5 changed. There's no longer
a hardcoded upper limit. So *our* upper limit of 200 may be different from machine to machine now.
So just retry if the hypercall failed.
 1.23 22-Apr-2009  ad Make xen kernels build again.
 1.22 10-Mar-2009  bouyer When ioapic is used, for ISA interrupts, reuse the legacy ISA interrupt
number instead of allocating a new one. Force allocating a new interrupt number
for PCI devices, as the number stored in the PCI interrupt register
may be wrong.
This should help using a pciide controller in compat mode or ISA devices
in a non-0 domain.
 1.21 05-Sep-2008  tron branches: 1.21.2; 1.21.4; 1.21.8; 1.21.12;
Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.
 1.20 03-Jul-2008  drochner branches: 1.20.2;
Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.19 30-May-2008  ad branches: 1.19.2;
Add a 'known_mpsafe' argument to intr_establish().
 1.18 11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.17 11-Jan-2008  bouyer branches: 1.17.6; 1.17.8; 1.17.10; 1.17.12;
Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support development).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.16 20-Dec-2007  ad - Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.15 03-Dec-2007  ad branches: 1.15.2; 1.15.6;
Interrupt handling changes, in discussion since February:

- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
 1.14 22-Nov-2007  bouyer Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.13 29-Jan-2007  hubertf branches: 1.13.6; 1.13.12; 1.13.22; 1.13.24; 1.13.28; 1.13.30;
Remove more duplicate headers.
Patch by Slava Semushin <slava.semushin@gmail.com>

Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
 1.12 08-Dec-2006  yamt - pass intrframe by-pointer, not by-value.
- make i386 and xen use per-cpu interrupt stack.

xen part is reviewed by Manuel Bouyer.
 1.11 15-Oct-2006  yamt include machine/mpconfig.h so that these files can be compiled
with ACPI but without MPBIOS.
 1.10 12-Oct-2006  yamt intr_establish: add a missing ";" in the case of NIOAPIC==0.
 1.9 28-Sep-2006  bouyer Add Xen3 support for ACPI and/or MPBIOS + IOAPIC. To help with this, physical
CPUs are now configured on mainbus only in dom0, and only to know about
their APIC id. virtual CPUs are attached to hypervisor as:
vcpu* at hypervisor?
and this is what's used as curcpu(). The kernel config files need to be
updated for this, see XEN3_DOM0 or XEN3_DOMU for examples.
XEN3_DOM0 now has acpi, MPBIOS and ioapic by default.
Note that a Xen dom0 kernel doesn't have access to the lapic.
 1.8 09-Apr-2006  bouyer branches: 1.8.8; 1.8.10;
Add support for ACPI in xen-3 dom0 support. We can now boot a xen-3 dom0
kernel with a default xen command line.
 1.7 11-Dec-2005  christos branches: 1.7.4; 1.7.6; 1.7.8; 1.7.10; 1.7.12;
merge ktrace-lwp.
 1.6 16-Apr-2005  bouyer branches: 1.6.2;
Get rid of the event to pseudo-irq mapping. We are limited to 32 pseudo-irq,
including soft interrupt, and this is way too low in some use (lots of domains,
or domains with lots of xennet, or even hardware with lots of devices at
different interrupts).
Based on an idea from YAMAMOTO Takashi, keep one list of handlers per event and
one per IPL (so the same handler is now in 2 lists). In the common case where
an event is received at low IPL, we can call the handlers quickly (there
is usually only one handler per event, unless the event is mapped to a
physical interrupt and this interrupt is shared by different devices).
Deferred events and software interrupts are handled by a bitmask (as before)
with one bit per IPL. When one IPL has an event pending all handlers for
this IPL will be called.
With this change, it is now possible to have all the 1024 events active.

While here, handle debug event in a special way: the handler is always called,
regardless of the current IPL. Make the handler print useful information
about events and IPL states.
Also remove code not used on Xen in files inherited from the x86 port.
 1.5 11-Apr-2005  yamt fix a bug which corrupts runqueue.
when dealing with events, which are handed to xenevt pseudo device,
don't call wakeup(9)/selnotify(9) at too high IPL. PR/29792.
 1.4 09-Mar-2005  bouyer branches: 1.4.2;
Merge the bouyer-xen2 branch. This adds support for the Xen 2.0 virtual
machine kernel (both privileged and non-privileged domains), and removes support
for the old xen 1.2.
 1.3 23-Oct-2004  yamt branches: 1.3.4; 1.3.6; 1.3.8;
don't reference kernel_lock directly.
 1.2 11-Apr-2004  cl branches: 1.2.2;
catch up with arch/x86/x86/intr.c
1.15/kochi
use designated initializer for struct pic initializers.
just for readability.

update the xenev_pic initializer as well
 1.1 11-Mar-2004  cl branches: 1.1.2;
Add port to the Xen virtual machine monitor.
(see http://www.cl.cam.ac.uk/Research/SRG/netos/xen/)
 1.1.2.1 22-May-2004  he Pull up revision 1.2 (requested by cl in ticket #337):
Upgrade xen support:
- add block device driver
- network device driver bug fixes
- support for vga/keyboard/mouse
- support for domain0 operations
- fix /dev/mem and i386_iopl, reboot, event dispatch
- fix clock support, cpu speed report, lazy fpu switching
- add xen12load loader
- sys/arch/xen parts of build.sh release support
[cl, ticket #337]
 1.2.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.2.2.5 02-Nov-2004  skrll Sync with HEAD.
 1.2.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.2.2.1 11-Apr-2004  skrll file intr.c was added on branch ktrace-lwp on 2004-08-03 10:43:19 +0000
 1.3.8.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.3.6.1 29-Apr-2005  kent sync with -current
 1.3.4.2 18-Jan-2005  bouyer Remove unused code.
 1.3.4.1 17-Dec-2004  bouyer Sync with arch/x86/x86/intr.c 1.20.
 1.4.2.2 28-Apr-2005  tron Pull up revision 1.6 (requested by bouyer in ticket #192):
Get rid of the event to pseudo-irq mapping. We are limited to 32 pseudo-irq,
including soft interrupt, and this is way too low in some use (lots of domains,
or domains with lots of xennet, or even hardware with lots of devices at
different interrupts).
Based on an idea from YAMAMOTO Takashi, keep one list of handlers per event and
one per IPL (so the same handler is now in 2 lists). In the common case where
an event is received at low IPL, we can call the handlers quickly (there
is usually only one handler per event, unless the event is mapped to a
physical interrupt and this interrupt is shared by different devices).
Deferred events and software interrupts are handled by a bitmask (as before)
with one bit per IPL. When one IPL has an event pending all handlers for
this IPL will be called.
With this change, it is now possible to have all the 1024 events active.
While here, handle debug event in a special way: the handler is always called,
regardless of the current IPL. Make the handler print useful information
about events and IPL states.
Also remove code not used on Xen in files inherited from the x86 port.
 1.4.2.1 13-Apr-2005  tron Pull up revision 1.5 (requested by yamt in ticket #146):
fix a bug which corrupts runqueue.
when dealing with events, which are handed to xenevt pseudo device,
don't call wakeup(9)/selnotify(9) at too high IPL. PR/29792.
 1.6.2.5 21-Jan-2008  yamt sync with head
 1.6.2.4 07-Dec-2007  yamt sync with head
 1.6.2.3 26-Feb-2007  yamt sync with head.
 1.6.2.2 30-Dec-2006  yamt sync with head.
 1.6.2.1 21-Jun-2006  yamt sync with head.
 1.7.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.7.10.1 19-Apr-2006  elad sync with head - hopefully this will work
 1.7.8.1 11-Apr-2006  yamt sync with head
 1.7.6.1 22-Apr-2006  simonb Sync with head.
 1.7.4.1 09-Sep-2006  rpaulo sync with head
 1.8.10.2 10-Dec-2006  yamt sync with head.
 1.8.10.1 22-Oct-2006  yamt sync with head
 1.8.8.3 01-Feb-2007  ad Sync with head.
 1.8.8.2 12-Jan-2007  ad Sync with head.
 1.8.8.1 18-Nov-2006  ad Sync with head.
 1.13.30.3 18-Feb-2008  mjf Sync with HEAD.
 1.13.30.2 27-Dec-2007  mjf Sync with HEAD.
 1.13.30.1 08-Dec-2007  mjf Sync with HEAD.
 1.13.28.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.13.24.2 23-Mar-2008  matt sync with HEAD
 1.13.24.1 09-Jan-2008  matt sync with HEAD
 1.13.22.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.13.22.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.13.12.1 18-Apr-2007  thorpej Convert i386 and amd64 to the new atomic ops API.
 1.13.6.1 03-Dec-2007  ad Sync with HEAD.
 1.15.6.2 06-Jan-2008  bouyer Merge needed changes to genassym.cf and locore.S for xeni386 back to
arch/i386. Switch xeni386 to use the arch/i386 cpu.h.
 1.15.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.15.2.1 26-Dec-2007  ad Sync with head.
 1.17.12.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.17.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.17.10.5 11-Aug-2010  yamt sync with head.
 1.17.10.4 19-Aug-2009  yamt sync with head.
 1.17.10.3 20-Jun-2009  yamt sync with head
 1.17.10.2 04-May-2009  yamt sync with head.
 1.17.10.1 16-May-2008  yamt sync with head.
 1.17.8.2 04-Jun-2008  yamt sync with head
 1.17.8.1 18-May-2008  yamt sync with head.
 1.17.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.17.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.19.2.1 03-Jul-2008  simonb Sync with head.
 1.20.2.1 19-Oct-2008  haad Sync with HEAD.
 1.21.12.1 21-Apr-2010  matt sync to netbsd-5
 1.21.8.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.21.8.4 24-Oct-2010  jym Sync with HEAD
 1.21.8.3 01-Nov-2009  jym Sync with HEAD.
 1.21.8.2 06-Jun-2009  jym As requested by cegger@, apply the following patch to jym-xensuspend branch:

Interrupt handling in Xen 3.5 changed. There's no longer
a hardcoded upper limit. So *our* upper limit of 200 may be different from machine to machine now.
So just retry if the hypercall failed.
 1.21.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.21.4.1 04-Oct-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1054):
sys/arch/xen/x86/intr.c: revision 1.22
sys/arch/xen/xen/isa_machdep.c: revision 1.14
sys/arch/xen/xen/pci_intr_machdep.c: revision 1.9
sys/arch/xen/xen/pciide_machdep.c: revision 1.12
When ioapic is used, for ISA interrupts, reuse the legacy ISA interrupt
number instead of allocating a new one. Force allocating a new interrupt number
for PCI devices, as the number stored in the PCI interrupt register
may be wrong.
This should help using a pciide controller in compat mode or ISA devices
in a non-0 domain.
 1.21.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.26.4.1 30-May-2010  rmind sync with head
 1.26.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.27.6.2 17-Aug-2011  cherry Pullup relevant changes from -current
 1.27.6.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.29.34.1 04-Nov-2016  pgoyette Sync with HEAD
 1.29.30.2 28-Aug-2017  skrll Sync with HEAD
 1.29.30.1 05-Dec-2016  skrll Sync with HEAD
 1.29.12.1 03-Dec-2017  jdolecek update from HEAD
 1.32.2.2 16-Jul-2017  cherry 2739767
 1.32.2.1 16-Jul-2017  cherry file intr.c was added on branch perseant-stdc-iso10646 on 2017-07-16 06:14:25 +0000
 1.20 22-Dec-2018  cherry Move mainbus(4) driver files in various x86 sub-archs to name prefixed
versions. This allows us to further modularise them by unifying common
bus probe code in x86/x86/mainbus.c to be introduced next.

This commit has no functional changes. It is done for ease of
visibility of newer diffs in the queue.
 1.19 23-May-2017  nonaka branches: 1.19.8; 1.19.10;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.18 03-Mar-2014  dsl branches: 1.18.6;
Use the global pci_mode to avoid 'set but not used' warnings from gcc 4.8.3.
 1.17 31-Jan-2014  bouyer Move back call to pci_mode_detect() outside of #ifdef PCI_BUS_FIXUP.
Even if mode is not used, the call to pci_mode_detect() is mandatory to
initialize the PCI subsystem.
Fix "panic booting -current DOM0" reported by Patrick Welche on port-xen.
 1.16 06-Nov-2013  mrg - move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.
 1.15 20-Sep-2011  jym branches: 1.15.2; 1.15.12; 1.15.16;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
the kernel from blindly validating user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.14 01-Jul-2011  dyoung #include <sys/bus.h> instead of <machine/bus.h>.
 1.13 12-Nov-2010  dholland Build fix for xen domu + PCI, from Juho Salminen in PR 44083.
 1.12 07-Aug-2010  cegger acpi_madt.h is gone
 1.11 28-Apr-2010  dyoung On x86, change the bus_space_tag_t to a pointer to a struct
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).

Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.

Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
 1.10 15-Feb-2010  dyoung branches: 1.10.2;
Don't use the global variable pci_mode, but use a local copy of
the return value of pci_mode_detect(), instead.
 1.9 18-Aug-2009  jmcneill branches: 1.9.2;
Switch to ACPICA 20090730, and update for API changes.
 1.8 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.7 18-Jan-2009  bouyer branches: 1.7.2;
The Xen PCI_BUS_FIXUP/PCI_ADDR_FIXUP has rotted, catch up with x86 changes
in this area. Patch provided by FUKAUMI Naoki in PR#40356.
 1.6 09-Nov-2008  cegger Nuke last parameter from mpacpi_scan_apics() and mpbios_scan().
It is unused.
 1.5 21-Oct-2008  cegger branches: 1.5.2; 1.5.4;
introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.
 1.4 16-Apr-2008  cegger branches: 1.4.4; 1.4.10;
device_t / softc split
reviewed, tested and approved by bouyer
 1.3 11-Jan-2008  bouyer branches: 1.3.6;
Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support development).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.2 22-Nov-2007  bouyer branches: 1.2.2; 1.2.4; 1.2.8; 1.2.12; 1.2.16;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file mainbus.c was initially added on branch bouyer-xenamd64.
 1.1.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.16.3 23-Mar-2008  matt sync with HEAD
 1.2.16.2 09-Jan-2008  matt sync with HEAD
 1.2.16.1 22-Nov-2007  matt file mainbus.c was added on branch matt-armv6 on 2008-01-09 01:50:15 +0000
 1.2.12.1 05-Jan-2008  bouyer Remove files that just include the x86 counterpart.
 1.2.8.3 21-Jan-2008  yamt sync with head
 1.2.8.2 07-Dec-2007  yamt sync with head
 1.2.8.1 22-Nov-2007  yamt file mainbus.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:18 +0000
 1.2.4.2 03-Dec-2007  ad Sync with HEAD.
 1.2.4.1 22-Nov-2007  ad file mainbus.c was added on branch vmlocking on 2007-12-03 19:04:41 +0000
 1.2.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1 22-Nov-2007  joerg file mainbus.c was added on branch jmcneill-pm on 2007-11-27 19:36:21 +0000
 1.3.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.3.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.4.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.4.4.4 11-Aug-2010  yamt sync with head.
 1.4.4.3 11-Mar-2010  yamt sync with head
 1.4.4.2 19-Aug-2009  yamt sync with head.
 1.4.4.1 04-May-2009  yamt sync with head.
 1.5.4.1 22-Jan-2009  snj Pull up following revision(s) (requested by bouyer in ticket #286):
sys/arch/xen/conf/files.xen: revision 1.92
sys/arch/xen/x86/mainbus.c: revision 1.7 via patch
sys/arch/xen/xen/hypervisor.c: revision 1.43
The Xen PCI_BUS_FIXUP/PCI_ADDR_FIXUP has rotted, catch up with x86 changes
in this area. Patch provided by FUKAUMI Naoki in PR#40356.
 1.5.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.7.2.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.7.2.5 10-Jan-2011  jym Sync with HEAD
 1.7.2.4 24-Oct-2010  jym Sync with HEAD
 1.7.2.3 01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.7.2.2 01-Nov-2009  jym Sync with HEAD.
 1.7.2.1 09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends in two halves: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no other parts
of the kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontends drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.9.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.9.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.10.2.2 05-Mar-2011  rmind sync with head
 1.10.2.1 30-May-2010  rmind sync with head
 1.15.16.1 18-May-2014  rmind sync with head
 1.15.12.2 03-Dec-2017  jdolecek update from HEAD
 1.15.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.15.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.18.6.1 28-Aug-2017  skrll Sync with HEAD
 1.19.10.1 10-Jun-2019  christos Sync with HEAD
 1.19.8.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.25 18-Aug-2025  andvar Fix various typos, mainly in comments:
s/invaid/invalid/
s/instad/instead/
s/wich/with/
s/tranform/transform/
s/tranmist/transmit/
s/tranceiver/transceiver/
s/Tranparent/Transparent/
s/tranlated/translated/
s/tranfer/transfer/
s/tranmissions/transmissions/
s/condtions/conditions/
s/Recient/Recent/
 1.24 26-May-2022  bouyer aprint_debug(): if a hypercall fails, print the return code.
 1.23 24-May-2022  bouyer Remove useless info from debug printf, fix format warning on i386
 1.22 24-May-2022  bouyer - msipic_construct_msix_pic(): set mp_table_base to memaddr (without
table_offset), this is what Xen wants
while there use pci_conf_write16() in msi_set_msictl_enablebit() too,
for consistency (it seems that Xen accepts the 32bit write at this point,
but this may change).

- xen_map_msix_pirq(): don't forget to set map_irq.table_base in the
MSI-X case, otherwise Xen maps it as MSI
- call pic_hwunmask() after pirq_establish() in msi/msix case, to make sure
the msi-x vector is unmasked.

Now MSI-X works with Xen so stop disabling it in pci_attach_hook().
 1.21 23-May-2022  bouyer Work in progress on MSI/MSI-X on Xen (MSI works on my hardware, more work
needed for MSI-X):
- Xen silently rejects 32 bits writes to MSI configuration registers
(especially when setting PCI_MSI_CTL_MSI_ENABLE/PCI_MSIX_CTL_ENABLE),
it expects 16 bits writes. So introduce a pci_conf_write16(),
only available on XENPV (and working only for mode 1 without
PCI_OVERRIDE_CONF_WRITE) and use it to enable MSI or MSI-X on XENPV.
- for multi-MSI vectors, Xen allocates all of them in a single hypercall,
so it's not convenient to do it at intr_establish() time.
So do it at alloc() time and register the pirqs in the msipic structure.
xen_pic_to_gsi() now just returns the values cached in the msipic.
As a bonus, if the PHYSDEVOP_map_pirq hypercall fails we can fail
the alloc() and we don't need the xen_pci_msi*_probe() hacks.

options NO_PCI_MSI_MSIX still on by default for XEN3_DOM0.
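
To illustrate why a 16-bit write needs its own path in configuration mode 1:
the address register selects a 32-bit aligned dword, and a narrower access
picks its bytes through an offset on the data port. The sketch below only
computes the two port values using the standard PCI configuration mechanism
#1 layout. The function names are hypothetical and this is not the
pci_conf_write16() implementation from the commit.

```c
#include <stdint.h>

#define MODE1_ADDRESS_REG	0x0cf8		/* CONFIG_ADDRESS port */
#define MODE1_DATA_REG		0x0cfc		/* CONFIG_DATA port */
#define MODE1_ENABLE		0x80000000U	/* enable bit in CONFIG_ADDRESS */

/* Value to write to the address port: selects the dword holding `reg'. */
uint32_t
mode1_address(unsigned int bus, unsigned int dev, unsigned int func,
    unsigned int reg)
{
	return MODE1_ENABLE | (bus << 16) | (dev << 11) | (func << 8) |
	    (reg & 0xfc);
}

/* Data port for a 16-bit access: offset by the halfword within the dword. */
uint16_t
mode1_data_port16(unsigned int reg)
{
	return (uint16_t)(MODE1_DATA_REG + (reg & 2));
}
```

For an MSI capability at a hypothetical offset 0x50, the message control word
sits at 0x52, so the enable bit would be written through port 0xcfe as a
single 16-bit access instead of a 32-bit write to 0xcfc.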
 1.20 01-Aug-2020  jdolecek adjust includes to pull __HAVE_PCI_MSI_MSIX properly
 1.19 19-Jul-2020  jdolecek add #ifdef __HAVE_PCI_MSI_MSIX so this still compiles with NO_PCI_MSI_MSIX
 1.18 19-Jul-2020  jdolecek for Xen MSI, fallback to INTx when PHYSDEVOP_map_pirq fails for the device

apparently Xen requires VT-d to be enabled in BIOS for PHYSDEVOP_map_pirq
to work, this change makes it work on systems with VT-d disabled or missing

addresses the panic part of PR port-xen/55285 by Patrick Welche
 1.17 23-May-2020  jdolecek switch back to PHYSDEVOP_alloc_irq_vector for non-MSI interrupts - on my
computer it works the same as PHYSDEVOP_map_pirq, but seems it doesn't
on other systems

fixes PR port-xen/55285 for Patrick Welche, but not yet for another system
by Frank Kardel
 1.16 15-May-2020  jdolecek use short for irq2port[] to save memory (4KB), it only needs to store
numbers <= NR_EVENT_CHANNELS (2048)
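
The 4KB figure follows from the element size. A minimal check, assuming the
array has NR_EVENT_CHANNELS entries and that int and short are 4 and 2 bytes
(modeled here with fixed-width types); irq2port_bytes_saved() is a
hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>

#define NR_EVENT_CHANNELS 2048

/* Bytes saved by shrinking each array element from 32 to 16 bits. */
size_t
irq2port_bytes_saved(void)
{
	return NR_EVENT_CHANNELS * (sizeof(int32_t) - sizeof(int16_t));
}
```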
 1.15 15-May-2020  jdolecek only call PHYSDEVOP_map_pirq for a shared interrupt once, same as previous code

fixes boot problem reported privately by Frank Kardel and Patrick Welche
 1.14 04-May-2020  jdolecek add support for using MSI for XenPV Dom0

use PHYSDEVOP_map_pirq to get the pirq/gsi for MSI/MSI-X, switch also INTx
to use it instead of PHYSDEVOP_alloc_irq_vector

MSI confirmed working with single-vector MSI for wm(4), ahcisata(4), bge(4)

XXX added some provision for MSI-X, but it doesn't actually work (no interrupts
delivered), needs some further investigation; disable MSI-X for XENPV
via flag in x86/pci/pci_machdep.c
 1.13 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.12 21-Apr-2020  jdolecek convert to newer HYPERVISOR_physdev_op() interface, now command and the
arg are separate arguments - this is needed for newer physdev_op commands

remove code for PHYSDEVOP_IRQ_UNMASK_NOTIFY, it is obsolete since
interface version 0x00030202 and is unsupported by newer versions of Xen

confirmed working on amd64 Dom0, i386 compile-tested only
 1.11 07-Apr-2020  jdolecek branches: 1.11.2;
remove <sys/malloc.h> include, not used here
 1.10 13-Feb-2019  cherry Catchup with struct intrstub; unification.

This should fix dom0 build breakage.
 1.9 10-Oct-2018  cherry Do not export the 'irq<->vector' abstraction outside of pintr.c
anymore. We now think of them as a unified thing called 'gsi',
which is generated by mpacpi/mpbios
 1.8 10-Oct-2018  cherry Since GSIs are invented by the mpbios/mpacpi interrupt routing probe code,
it's possible for shared GSIs to pop up even outside the original
legacy_irq range.

Relax this latter, older assumption.

Thanks to Brad Spencer for extensive trialing on interesting hardware.
 1.7 07-Oct-2018  cherry Switch over to a "GSI" concept for guest irqs.

On XEN there is a namespace called GSI which includes:

i) legacy_irq (0 - 16)
ii) "gsi" (16-nr_irqs_gsi)
iii) msi

We try to mirror this in guest space, but are mindful that legacy_irq
is 1:1 bound to actual hardware legacy_irq. Apart from this, XEN doesn't
really care what number scheme we use, as long as it doesn't encroach
on the MSI space, which is TBD for us.

Thus we trust the mpbios.c/mpacpi.c code to correctly map the pic,pin
tuples into the correct global gsi space, which we then register with
xen. As we now do, we allow for duplicate gsi registrations, in case
any hardware shares the same (pic,pin);

This enables us to now use the (pic,pin) tuple as the canonical reference
for device interrupt addresses, and leave any global mappings to specific
code. Thus xen_pic_to_gsi().

Note that this requires separate support for MSI, which I will get around to
once things stabilise - however the API change facilitates this nicely.

I note that the msi addroute() function does not use the "pin" parameter.
This can be made use of, to encode the gsi number, for XEN. This is however
TBD.

We further tweak the xen_vec_alloc() code to be uniform for the NIOAPICS
and other cases, and ensure that i8259.c DTRT wrt to route().

This will allow us to use pic->pic_addroute() without needing to worry about
pic specific issues.

The next step is to consolidate the pic_addroute() XEN related #ifdefs into
a -DXEN specific file, so that we don't clutter x86/ code with #ifdef XENs.

This change has functional implications, and there is likely breakage coming
especially on bespoke platforms that I haven't been able to test yet.

I am especially interested in bug reports from platforms with legacy (esp. i386)
and with multiple ioapics.
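
The three-way GSI namespace listed in this commit can be sketched as a range
check. This is an assumption-laden illustration: the boundary at 16 is read
as a half-open range [0, 16), nr_irqs_gsi is passed in as a parameter, and
the enum and function names are hypothetical.

```c
#include <stdint.h>

#define NUM_LEGACY_IRQS 16	/* assumption: legacy range is [0, 16) */

enum gsi_class { GSI_LEGACY, GSI_IOAPIC, GSI_MSI };

/* Classify a GSI number into one of the three ranges named above. */
enum gsi_class
gsi_classify(unsigned int gsi, unsigned int nr_irqs_gsi)
{
	if (gsi < NUM_LEGACY_IRQS)
		return GSI_LEGACY;	/* 1:1 with hardware legacy_irq */
	if (gsi < nr_irqs_gsi)
		return GSI_IOAPIC;	/* "gsi" range from mpbios/mpacpi */
	return GSI_MSI;			/* everything above is MSI space */
}
```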
 1.6 06-Oct-2018  cherry Change the name of xen_pirq_alloc() to xen_vec_alloc() to reflect
its actual job.

The idea is that we will strip this down until it is as close to
idt_vec_alloc() as possible.
 1.5 06-Oct-2018  cherry Move the pic->pic_addroute() call from within pintr.c:xen_pirq_alloc() to
intr.c:intr_establish_xname()

xen_pirq_alloc() now returns a vector value, as is intended by
the semantics of the call to the hypervisor, PHYSDEVOP_ASSIGN_VECTOR.

This also brings our usage closer to native.
 1.4 06-Oct-2018  cherry Teach intr_establish_xname() for XEN to tolerate shared legacy_irq
registrations.

The current XEN code has not been able to tolerate shared legacy_irq
requests in xen_pirq_alloc(). This was never a problem because:

i) The only potential callpath with shared legacy_irq was
isa_intr_establish_xname().
ii) The other callpath, namely pci_intr_establish_xname() passed
legacy_irq to intr_establish_xname(). However, this was ignored,
and a value of zero was passed to xen_pirq_alloc() which in
turn, allocated a new irq value, thus effectively demultiplexing
any shared legacy_irq value intended across randomly allocated
new irq values.
iii) Presumably on all platforms that XEN runs on, the isa callpath
mentioned in i) never had shared irqs, and thus this was never
a problem.

Except:
We now use a unified path for both isa_intr_establish_xname() and
pci_intr_establish_xname(). This means that now, intr_establish_xname()
which is a callee of both, needs to have a way to discern who the caller
was, and decide to pass on or discard the legacy_irq value, to preserve
the old semantics. However, this is impossible to do so, because the
callpath doesn't explicitly provide a mechanism for this discernment.

This is however not a problem, because from XEN's point of view, a
repeat registration of an irq is only a warning. legacy_irq is the only
case in which this repeat should occur, per the current implementation of
xen_pirq_alloc(). We thus tweak the KASSERT()s to tolerate a repeat value
in the legacy_irq, while maintaining the original intent of the KASSERT()
which was to ensure that existing unrelated irq registrations should never
be overwritten.

Once we re-organise XEN specific code and unify with the native
intr_establish_xname() path, we will not run into this problem, because
legacy_irq will be treated as the pin number of the i8259 pic
exactly as it is now treated in native.

In short, this commit should fix some of the panics being seen on
-current for certain configurations of hardware on which dom0 runs.
 1.3 17-Feb-2018  maxv branches: 1.3.2; 1.3.4;
Rename i8259_stubs -> legacy_stubs. We will want the entries to have the
same name, eg:

legacy_stubs
-> Xintr_legacy0, Xrecurse_legacy0, Xresume_legacy0
-> Xintr_legacy1, Xrecurse_legacy1, Xresume_legacy1
...
 1.2 13-Dec-2017  bouyer Fixes for physical interrupts on Xen:
- do not cast int * to intr_handle_t *, they're not the same size
- legacy_irq is not always -1 for ioapic interrupts, test pic_type instead
- change irq2port[] to hold (port + 1) so that 0 is an invalid value
- add KASSERTs to make sure vect, port or irq values extracted from arrays are
valid (or that they are invalid before write)
- for the !ioapic case, we still need to do PHYSDEVOP_ASSIGN_VECTOR and
bind_pirq_to_evtch().

now XEN3_DOM0 boots again
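
The (port + 1) encoding mentioned above keeps port 0 usable while letting a
zero-initialized array mean "nothing bound". A minimal sketch, with a
hypothetical table size and hypothetical helper names:

```c
#include <stdint.h>

#define NIRQ 32	/* hypothetical table size */

/* Stores (port + 1); 0 means "no port bound", so port 0 stays valid. */
static int irq2port[NIRQ];

/* Bind a port to an irq; refuse to overwrite an existing binding. */
int
irq_bind_port(unsigned int irq, unsigned int port)
{
	if (irq >= NIRQ || irq2port[irq] != 0)
		return -1;	/* out of range or already bound */
	irq2port[irq] = (int)port + 1;
	return 0;
}

/* Return the bound port, or -1 if the irq has no port. */
int
irq_lookup_port(unsigned int irq)
{
	if (irq >= NIRQ || irq2port[irq] == 0)
		return -1;
	return irq2port[irq] - 1;
}
```

The "already bound" test plays the role of the KASSERTs in the commit: a
slot must be invalid before it is written.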
 1.1 04-Nov-2017  cherry branches: 1.1.2;
On XEN dom0, the function xen/x86/intr.c:xen_intr_map() is used to map
hardware interrupts to XEN callbacks called 'events'. This function
combines both the allocation and the binding.

This change is the first part of breaking up that combination into
xen_pirq_alloc() and the binding will happen as part of the
pic_addroute() callback of a new pseudo PIC_XEN

This code will be added later on.
 1.1.2.2 03-Dec-2017  jdolecek update from HEAD
 1.1.2.1 04-Nov-2017  jdolecek file pintr.c was added on branch tls-maxphys on 2017-12-03 11:36:51 +0000
 1.3.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.3.4.1 10-Jun-2019  christos Sync with HEAD
 1.3.2.1 20-Oct-2018  pgoyette Sync with head
 1.11.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.6 17-Oct-2023  bouyer Make sure to always fall back to xen_early_console, even for dom0
 1.5 16-Oct-2023  bouyer Move the pvh_xencons to xen_machdep.c as early_xencons, so it can be
used in the future as early output for plain PV guests too.
 1.4 22-Jul-2023  mrg xen: declare 'default_consinfo' as extern in a header

this makes pvh_consinit.c actually compile with CONS_OVERRIDE set.
i didn't see any good header to add to (bootinfo.h and cpu.h both
seem to be poor choices but were considered), hence the new one
with just this definition.
 1.3 24-Mar-2023  bouyer Allow a PVH dom0 to use VGA as console: make xen_pvh_consinit() return 1 if
it handles the console and 0 otherwise (especially when console=tty0 or
console=pc is present on the command line).
In consinit() fallback to native console selection if xen_pvh_consinit()
returns 0.
 1.2 03-May-2020  bouyer branches: 1.2.20;
Handle dom0 console. This one doesn't need a ring to be mapped, and
can be used earlier.
 1.1 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.2.20.2 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #428):

sys/arch/xen/xen/xen_machdep.c: revision 1.28
sys/arch/x86/pci/pci_machdep.c: revision 1.97
sys/arch/xen/xen/genfb_xen.c: revision 1.1
sys/arch/xen/xen/genfb_xen.c: revision 1.2
sys/arch/xen/include/hypervisor.h: revision 1.59
sys/arch/i386/conf/XEN3PAE_DOM0: revision 1.41 (patch)
sys/arch/x86/x86/genfb_machdep.c: revision 1.22
sys/arch/xen/x86/consinit.c: revision 1.18
sys/arch/xen/x86/autoconf.c: revision 1.26
sys/external/mit/xen-include-public/dist/xen/include/public/platform.h: revision 1.2
sys/arch/xen/conf/files.xen: revision 1.188
sys/arch/x86/x86/consinit.c: revision 1.37
sys/arch/xen/conf/files.xen: revision 1.189
sys/arch/x86/x86/consinit.c: revision 1.38
sys/external/mit/xen-include-public/dist/xen/include/public/xen.h: revision 1.2
sys/arch/x86/include/genfb_machdep.h: revision 1.7
sys/arch/xen/x86/pvh_consinit.c: revision 1.5
sys/arch/xen/x86/pvh_consinit.c: revision 1.6
sys/arch/amd64/conf/XEN3_DOM0: revision 1.201

Move the pvh_xencons to xen_machdep.c as early_xencons, so it can be
used in the future as early output for plain PV guests too.

Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.

Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen

when running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.

x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console

xen/x86/consinit.c: support genfb as possible console

xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.

xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.

Make sure to always fall back to xen_early_console, even for dom0

Enable genfb in DOM0 kernels

Add ext_lfb_base to dom0_vga_console_info, from recent Xen. We know if it's
present or not by checking dom0.info_size

Add XENPF_get_dom0_console, which gets a dom0_vga_console_info structure
from the hypervisor. To be used by PVH dom0 kernels.

XENPVH option is not used. Fix consinit.c to use XENPVHVM as intended
and XENPVH from defflag
for a dom0 PVH, the dom0_vga_console_info structure has to be retrieved
using a platform hypercall; do so in the XENPVHVM case.

Now genfb works in a PVH dom0 running on Xen 4.18 (Xen 4.15 doesn't support
this platform op, so no way to make it work here).
 1.2.20.1 30-Mar-2023  martin Pull up following revision(s) (requested by bouyer in ticket #131):

sys/arch/x86/x86/consinit.c: revision 1.36
sys/arch/xen/x86/pvh_consinit.c: revision 1.3
sys/arch/xen/include/xen.h: revision 1.48

Allow a PVH dom0 to use VGA as console: make xen_pvh_consinit() return 1 if
it handles the console and 0 otherwise (especially when console=tty0 or
console=pc is present on the command line).

In consinit() fallback to native console selection if xen_pvh_consinit()
returns 0.
 1.93 11-May-2024  andvar s/boostrap/bootstrap/ in comment, warning message and documentation.
 1.92 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.91 11-May-2022  bouyer In bootstrap, after switching to a new page table make sure that
now-unused memory is unmapped.
 1.90 06-Sep-2020  riastradh Fix fallout from previous uvm.h cleanup.

- pmap(9) needs uvm/uvm_extern.h.

- x86/pmap.h is not usable on its own; it is only usable if included
via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).

- Make nvmm.h and nvmm_internal.h standalone.
 1.89 26-May-2020  bouyer Adjust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which causes xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt(), which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grant entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().
 1.88 06-May-2020  bouyer xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.
 1.87 06-May-2020  bouyer KASSERT() that the per-cpu queues are run at IPL_VM after boot.
 1.86 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.85 30-Oct-2019  maxv Switch to new PTE bits.
 1.84 09-Mar-2019  maxv branches: 1.84.4;
Start replacing the x86 PTE bits.
 1.83 07-Mar-2019  maxv Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.
 1.82 04-Feb-2019  cherry Bump up XEN source API compatibility to 0x00030208 from 0x00030201,

but maintain backwards source API compilation compatibility.

ie; sources with config(5)
options __XEN_INTERFACE_VERSION__=0x00030201 # Xen 3.1 interface

should compile and run without problems.

Note that API version 0x00030201 is the lowest version we support now.
 1.81 29-Jul-2018  maxv Reduce the confusion, rename a bunch of variables and reorg a little.
Tested on i386PAE-domU and amd64-dom0.
 1.80 27-Jul-2018  maxv Try to reduce the confusion, rename:

l2_4_count -> PDIRSZ
count -> nL2
bootstrap_tables -> our_tables
init_tables -> xen_tables

No functional change.
 1.79 26-Jul-2018  maxv Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.
 1.78 26-Jul-2018  maxv Retire XENDEBUG_LOW, and switch its only user to XENDEBUG.
 1.77 26-Jul-2018  maxv Merge the blocks. No functional change.
 1.76 26-Jul-2018  maxv Simplify the conditions; (PTP_LEVELS > 3) and (PTP_LEVELS > 2) are for
amd64, so use ifdef __x86_64__. No functional change.
 1.75 24-Jun-2018  jdolecek branches: 1.75.2;
mark with XXXSMP all remaining spl*() and tsleep() calls
 1.74 16-Sep-2017  maxv branches: 1.74.2;
Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.
 1.73 18-Mar-2017  maxv Style, and remove debug code that does not work anyway.
 1.72 08-Mar-2017  maxv A few changes:
* Use markers to reduce false sharing.
* Remove XENDEBUG_SYNC and several debug messages, they are just useless.
* Remove xen_vcpu_*. They are unused and not optimized: if we really
wanted to flush ranges we should pack the VAs in a mmuext_op array
instead of performing several hypercalls in a loop.
* Start removing PG_k.
* KNF, reorder, simplify and remove stupid comments.
 1.71 02-Feb-2017  maxv Use __read_mostly on these variables, to reduce the probability of false
sharing.
 1.70 22-Jan-2017  maxv Export xpmap_pg_nx, and put it in the page table pages. It does not change
anything, since Xen removes the X bit on these; but it is better for
consistency.
 1.69 06-Jan-2017  maxv branches: 1.69.2;
Remove a few #if 0s, and explain what we are doing on PAE: the last two PAs
are entered in reversed order.
 1.68 16-Dec-2016  maxv The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
to machdep and pmap.
 1.67 15-Nov-2016  maxv Mmh, apparently I didn't properly test my previous change since it does not
compile anymore
 1.66 15-Nov-2016  maxv Keep simplifying that stuff. Also, replace plX_pi(KERNTEXTOFF) by
LX_SLOT_KERNBASE: the base address is KERNBASE, and we just start mapping
from KERNTEXTOFF. For symmetry with the normal amd64, does not change
anything.
 1.65 11-Nov-2016  maxv Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.
 1.64 11-Nov-2016  maxv Start simplifying the Xen locore: rename and reorder several things, remove
awful debug messages, use unsigned counters, fix typos and KNF.
 1.63 01-Nov-2016  maxv Map the PTE space as non-executable on PAE. The same is already done on
amd64.
 1.62 01-Nov-2016  maxv Map the remaining pages as non-executable. Only text should have X.
 1.61 25-Aug-2016  bouyer Revert to 1.59 (adding back the W^X kernel mappings), and move the data+bss
mapping late so that mappings that should be RO (such as page tables) won't
be made RW by accident.
 1.60 23-Aug-2016  bouyer Stopgap measure: revert to rev 1.56. starting with 1.57 an i386PAE Xen
kernel doesn't boot:
(XEN) mm.c:2394:d139v0 Bad type (saw 5400000000000001 != exp 7000000000000000) for mfn 1136f5 (pfn 621)
(XEN) mm.c:887:d139v0 Could not get page type PGT_writable_page
(XEN) mm.c:939:d139v0 Error getting mfn 1136f5 (pfn 621) from L1 entry 00000001136f5003 for l1e_owner=139, pg_owner=139
(XEN) mm.c:1254:d139v0 Failure in alloc_l1_table: entry 33
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f57 (pfn dbf) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:947:d139v0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d139v0 Failure in alloc_l2_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f5b (pfn dbb) for type 2200000000000000: caf=8000000000000003 taf=2200000000000001
(XEN) mm.c:1412:d139v0 Failure in alloc_l3_table: entry 3
(XEN) mm.c:2141:d139v0 Error while validating mfn 112f60 (pfn db6) for type 3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:3044:d139v0 Error while pinning mfn 112f60
(XEN) traps.c:459:d139v0 Unhandled bkpt fault/trap [#3] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080231894 compat_create_bounce_frame+0xda/0xf2
 1.59 11-Aug-2016  maxv Make the I/O area non-executable on Xen.
 1.58 03-Aug-2016  maxv Map the recursive slot and page table pages as non-executable on Xen. Same
as normal x86.
 1.57 02-Aug-2016  maxv Map the kernel text, rodata and data+bss independently on Xen, with
respectively RX, R and RW.
 1.56 02-Aug-2016  maxv Use PG_RO instead of a magic zero.
 1.55 02-Aug-2016  maxv KNF, and use PAGE_SIZE instead of NBPG.
 1.54 29-May-2016  bouyer branches: 1.54.2;
Switch to elf notes for amd64 instead of the old key=value list to describe the
guest requirements and support.
Add infrastructure to query the hypervisor about features support.
For verbose boot, print the features supported by the hypervisor for this
guest.
 1.53 06-May-2014  cherry branches: 1.53.4;
Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.
 1.52 10-Nov-2013  jnemeth branches: 1.52.2;
Change xpq_flush_cache to just do WBINVD letting the hypervisor trap and
handle it as MMUEXT_FLUSH_CACHE is a privileged hypervisor operation.
 1.51 08-Nov-2013  christos fix unused variable warnings
 1.50 06-Nov-2013  mrg - move variables inside their #ifdef use
- remove unused and set-but-unused variables
- use __USE() in a particularly ugly case

with these, and a couple of other changes, amd64 gcc 4.8.1 world
is able to complete build.sh release.
 1.49 16-Sep-2012  rmind branches: 1.49.2;
Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific
about the interface.
 1.48 21-Aug-2012  bouyer branches: 1.48.2;
Redo previous the correct way: Xen expects a u_long * for vcpumask,
so use 2 uint32_t on LP64.
 1.47 21-Aug-2012  rmind Fix Xen build. Make xcpumask uint32_t, fits 32 CPUs (can increase).
 1.46 30-Jun-2012  jym Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.
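
The truncation that motivates ptoa() here can be shown in a few lines. This
is a sketch under stated assumptions: mfn32_t models a 32-bit u_long as on
i386, maddr_t models a 64-bit PAE machine address, and both helper names are
hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

typedef uint32_t mfn32_t;	/* models a 32-bit u_long, as on i386 */
typedef uint64_t maddr_t;	/* models a 64-bit PAE machine address */

/* The shift happens in 32 bits, so frames at or above 4GB wrap around. */
maddr_t
mfn_to_ma_truncating(mfn32_t mfn)
{
	return mfn << PAGE_SHIFT;
}

/* Widen first, as a ptoa()-style conversion does, then shift. */
maddr_t
mfn_to_ma(mfn32_t mfn)
{
	return (maddr_t)mfn << PAGE_SHIFT;
}
```

With frame number 0x100000 (the first frame at 4GB), the first helper yields
0 while the second yields 0x100000000.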
 1.45 27-Jun-2012  jym Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).
 1.44 06-Jun-2012  rmind Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.

Tested by chs@.
 1.43 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.42 02-Mar-2012  bouyer MMUEXT_INVLPG_MULTI and MMUEXT_TLB_FLUSH_MULTI use a long as cpu mask,
not uint32_t, so pass a pointer of the right type.
While there, cleanup includes and delete local, redundant define of PG_k.
 1.41 24-Feb-2012  cherry (xen) - remove the (*xpq_cpu)() shim.
We hasten the %fs/%gs setup process during boot.
Although this is hacky, it lets us use the non-xen specific
pmap_pte_xxx() functions in pmap code (and others).
 1.40 23-Feb-2012  bouyer On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this includes the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause one of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediately or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699
 1.39 17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPUs' ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page; a tlbflush IPI will
happen later. As a side effect, we don't need different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.38 12-Jan-2012  cherry branches: 1.38.2;
relocate pte_lock initialisation to the earliest points after %fs is first usable in the XEN bootpath
 1.37 09-Jan-2012  cherry Make cross-cpu pte access MP safe.
XXX: review cases of use of pmap_set_pte() vs direct use of xpq_queue_pte_update()
 1.36 06-Nov-2011  cherry branches: 1.36.4;
[merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.
 1.35 06-Nov-2011  cherry [merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
 1.34 20-Sep-2011  jym branches: 1.34.2;
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
the kernel from blindly validating user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.33 21-Aug-2011  jym Merge err printf with the panic(9) message.

Also fix the if () {...} statement with braces, to avoid calling panic()
every time. Hi cherry!
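The brace fix can be illustrated with a small stand-alone sketch. The names
below (report_buggy, report_fixed, fake_panic) are hypothetical and are not
taken from the kernel source; the sketch only shows why an unbraced if
followed by two statements runs the second statement unconditionally.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for panic(9): counts calls instead of halting. */
static int panic_calls;

static void fake_panic(const char *msg)
{
	assert(msg != NULL);
	panic_calls++;
}

/* Buggy shape: only the printf() is guarded by the if. */
static void report_buggy(int err)
{
	if (err != 0)
		printf("operation failed: %d\n", err);
	fake_panic("operation failed");	/* not guarded: runs even when err == 0 */
}

/* Fixed shape: braces guard the call, and the message lives in the panic. */
static void report_fixed(int err)
{
	if (err != 0) {
		fake_panic("operation failed");
	}
}
```

Calling report_buggy(0) still reaches fake_panic(), while report_fixed(0)
does not.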
 1.32 13-Aug-2011  cherry Call the right function
(fix for an egregious error)
 1.31 13-Aug-2011  cherry Add locking around ops to the hypervisor MMU "queue".
 1.30 13-Aug-2011  cherry remove unnecessary locking overhead for UP
 1.29 10-Aug-2011  cherry Introduce locking primitives for Xen pte operations, and xen helper calls for MP related MMU ops
 1.28 15-Jun-2011  rmind Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.
 1.27 15-Jun-2011  rmind - cpu_hatch: call tlbflushg(), just to make sure that TLB is clean.
- xen_bootstrap_tables: call xpq_queue_tlb_flush() for safety.
- Initialise cpus_attached and ci_cpumask for primary CPU.
 1.26 08-May-2011  jym branches: 1.26.2;
Print the PGD address in the debug message.
 1.25 29-Mar-2011  jym Typo fix.
 1.24 10-Feb-2011  jym Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for choosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.
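
The "one function plus per-level macros" shape described above can be
sketched as follows. This is a minimal illustration, not the kernel code:
pin_table(), the pin_lN_table() macros and the PIN_L* values are hypothetical
names chosen for the sketch, and the function only records its arguments
instead of issuing a hypercall.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pin levels; the real values come from the Xen headers. */
enum pin_level { PIN_L1 = 1, PIN_L2, PIN_L3, PIN_L4 };

/* Records the last request so the sketch can be checked without Xen. */
static uint64_t last_pa;
static enum pin_level last_level;

/* The single function: the only place to add logging or error handling. */
static void pin_table(uint64_t pa, enum pin_level level)
{
	assert(level >= PIN_L1 && level <= PIN_L4);
	last_pa = pa;
	last_level = level;
}

/* Per-level wrappers make the pin level explicit at each call site. */
#define pin_l1_table(pa)	pin_table((pa), PIN_L1)
#define pin_l2_table(pa)	pin_table((pa), PIN_L2)
#define pin_l3_table(pa)	pin_table((pa), PIN_L3)
#define pin_l4_table(pa)	pin_table((pa), PIN_L4)
```

A caller then writes pin_l2_table(pa) or pin_l4_table(pa) depending on the
architecture, and the level is visible where the call is made.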

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.
 1.23 20-Dec-2010  jym branches: 1.23.2; 1.23.4;
Now, get the return error too, in case that could help with EC2
troubleshooting...
 1.22 19-Dec-2010  jym Need the successful count (for AMI debugging)
 1.21 24-Jul-2010  jym Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits wide (64 GiB). Virtual address space remains at 32 bits (4 GiB). To
cope with the increased size of physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.
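
The sizes quoted above follow directly from the address widths. A small
worked check, with hypothetical helper names that are not part of the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an address space that is `bits` wide (bits < 64). */
static uint64_t addr_space_bytes(unsigned bits)
{
	assert(bits < 64);
	return (uint64_t)1 << bits;
}

/* Convert a byte count to GiB (2^30 bytes). */
static uint64_t bytes_to_gib(uint64_t bytes)
{
	return bytes >> 30;
}
```

With these helpers, 36 bits gives 64 GiB and 32 bits gives 4 GiB; the 36-bit
value no longer fits in a 32-bit integer, which is why a 64-bit type is
needed for physical addresses.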

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
 1.20 15-Jul-2010  jym With Xen, PDPpaddr should contain a guest physical address (== PFN).
 1.19 26-Feb-2010  jym branches: 1.19.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no longer needed now that Xen2 support has been dropped
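
The printf type fix in the list above can be sketched as follows. The type
and macro here (demo_paddr_t, DEMO_PRIxPADDR) are local stand-ins that model
a PAE build with a 64-bit physical address type; they are assumptions for the
sketch, not the kernel's definitions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Local stand-ins: model a PAE build where physical addresses are 64 bits. */
typedef uint64_t demo_paddr_t;
#define DEMO_PRIxPADDR PRIx64

/* Format a physical address with a conversion that matches its width. */
static int format_paddr(char *buf, size_t len, demo_paddr_t pa)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "pa=0x%" DEMO_PRIxPADDR, pa);
}
```

Using a format macro tied to the type keeps the same source line correct
whether the type is 32 or 64 bits wide, which a hardcoded %lx cannot do.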

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html
 1.18 12-Feb-2010  jym Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Briefly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.
 1.17 23-Oct-2009  snj branches: 1.17.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).
 1.16 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.15 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.14 23-Jul-2009  jym Fix typos in comments and __PRINTKs.
 1.13 20-Jun-2009  cegger sprintf -> snprintf. Wrap long lines.
 1.12 13-Nov-2008  cegger branches: 1.12.4;
Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.
 1.11 24-Oct-2008  jym branches: 1.11.2; 1.11.4;
- rename init_events() to events_init(), to better reflect netbsd semantics

- change unbind_[pv]irq_from_evtch() so that they now return the event
channel the [PV]IRQ was bound to. It reflects the opposite behaviour of the
bind_[pv]irq_to_evtch() functions.

- remove xenbus_suspend() and xenbus_resume() prototypes, as they are not
used anywhere else, and will conflict with the xenbus pmf(9) handlers.

- make start_info aligned on a page boundary, as Xen expects it to be so.

- mask event channel during xbd detach before removing its handler (can
avoid spurious events).

- add the "protocol" entry in xenstore during xbd initialization. Normally
created during domU's boot by xentools, it is under domU's responsibility
in all other cases (save/restore, hot plugging, etc.).

- modifications to xs_init(), so that it can properly return an error.

Reviewed by Christoph (cegger@).
 1.10 21-Oct-2008  cegger introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.
 1.9 05-Sep-2008  tron Compile NetBSD/amd64 kernels with "-Wextra". Patches contributed by
Juan RP in PR port-amd64/39266.
 1.8 14-Apr-2008  cegger branches: 1.8.4; 1.8.6; 1.8.10;
- use POSIX integer types
- ansify functions
 1.7 17-Feb-2008  bouyer branches: 1.7.6;
The console and store page numbers are stored as long, so avoid
overflow on i386PAE when converting them to machine addresses. Fix booting
XEN3PAE kernels when Xen maps them above 4GB.
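The overflow described here is the usual one when a 32-bit page number is
shifted into an address. A minimal sketch, using hypothetical names and a
uint32_t to model a 32-bit long on i386:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* 4 KiB pages */

/* Truncating form: the shift is done in 32 bits, so high bits are lost. */
static uint64_t pfn_to_addr_32(uint32_t pfn)
{
	return (uint32_t)(pfn << DEMO_PAGE_SHIFT);
}

/* Widening form: cast first so the shift is done in 64 bits. */
static uint64_t pfn_to_addr_64(uint32_t pfn)
{
	/* Assumption for the sketch: 36-bit addresses, so 24-bit frame numbers. */
	assert(pfn <= 0xFFFFFFu);
	return (uint64_t)pfn << DEMO_PAGE_SHIFT;
}
```

For a page number of 0x100000 (the first page at 4 GiB), the truncating form
yields 0 while the widening form yields 0x100000000.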
 1.6 23-Jan-2008  bouyer Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handles the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost support native
PAE, what's missing is bootstrap code in locore.S).
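
The "2-level MMU with a 4-page L2" view can be checked with a little index
arithmetic. The sketch below uses hypothetical names and the standard PAE
layout (8-byte entries, 2 MiB per L2 entry, 1 GiB per L3 entry); it is not
the kernel's macro set.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_L2_SHIFT		21	/* each PAE L2 entry maps 2 MiB */
#define DEMO_L3_SHIFT		30	/* each PAE L3 entry maps 1 GiB */
#define DEMO_PDES_PER_PAGE	512	/* 4096-byte page / 8-byte entry */

/* Index into the flat, 4-page (2048-entry) L2 view. */
static unsigned flat_l2_index(uint32_t va)
{
	return va >> DEMO_L2_SHIFT;
}

/* Which of the 4 L2 pages (equivalently, which L3 entry) covers va. */
static unsigned l3_index(uint32_t va)
{
	unsigned idx = flat_l2_index(va) / DEMO_PDES_PER_PAGE;

	assert(idx == (va >> DEMO_L3_SHIFT));
	return idx;
}
```

The range 0xc0000000 - 0xffffffff falls entirely in the fourth L2 page
(L3 index 3, flat L2 indexes 1536 to 2047), which is the page Xen treats
specially.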
 1.5 15-Jan-2008  bouyer Allocate one more L2 slot in xen_pmap_bootstrap() for i386.
pmap_bootstrap()/init386() wants to map a few additional things after
first_avail that we didn't account for, before pmap_growkernel() is
used/functional, and if the loaded kernel is close to the end of
the last L2 slot we lose. Should fix port-xen/37761 by YAMAMOTO Takashi.

Fix a XENPRINTF() so that low debug builds again.
 1.4 11-Jan-2008  bouyer Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support development).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.3 23-Nov-2007  bouyer branches: 1.3.2; 1.3.4; 1.3.8; 1.3.12; 1.3.16;
xpq_flush_queue(): cast values to u_int64_t and use PRIx64 in printf().
Fix build of i386 Xen kernels, reported by Hisashi T Fujinaka.
 1.2 22-Nov-2007  bouyer Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 21-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file x86_xpmap.c was initially added on branch bouyer-xenamd64.
 1.1.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.6 22-Nov-2007  bouyer Disable debug messages
 1.1.2.5 21-Nov-2007  bouyer When HYPERVISOR_mmu_update_self() fails in xpq_flush_queue(), dump the content
of the queue.
 1.1.2.4 19-Nov-2007  bouyer Get rid of arch/xenamd64, step 3: merge xenamd64/amd64/xpmap.c in
xen/x86/x86_xpmap.c
 1.1.2.3 26-Oct-2007  bouyer Make amd64, i386 and xen kernels build and work again.
 1.1.2.2 25-Oct-2007  bouyer Finish sync with HEAD. Especially use the new x86 pmap for xenamd64.
For this:
- rename pmap_pte_set() to pmap_pte_testset()
- make pmap_pte_set() a function or macro for non-atomic PTE write
- define and use pmap_pa2pte()/pmap_pte2pa() to read/write PTE entries
- define pmap_pte_flush() which is a nop in x86 case, and flush the
MMUops queue in the Xen case
 1.1.2.1 21-Oct-2007  bouyer Factorise some Xen pmap code in x86_xpmap.c.
More xpmap_{ptom,mtop} -> xpmap_{ptom,mtop}_masked

The xenamd64 kernel is now good enough to complete a sysinst install from
xennet to xbd.
 1.3.16.3 23-Mar-2008  matt sync with HEAD
 1.3.16.2 09-Jan-2008  matt sync with HEAD
 1.3.16.1 23-Nov-2007  matt file x86_xpmap.c was added on branch matt-armv6 on 2008-01-09 01:50:15 +0000
 1.3.12.14 20-Jan-2008  bouyer Remove debug printk()
 1.3.12.13 19-Jan-2008  bouyer Sync with HEAD
 1.3.12.12 18-Jan-2008  bouyer Fix APDP handling. A XEN i386PAE kernel now boots multiuser
 1.3.12.11 17-Jan-2008  bouyer - Fix L2_SLOT_APTE value (not sure how I got this value but it was definitely
wrong)
- Use global variable for the PAE L3 page addresses, so that pmap.c can get it
from the bootstrap code
- Extend the size of our virtual PDP from 3 to 4 pages, so that pmap->pm_pdir[]
is contiguous for the whole VA range. The last page is a shadow of
the kernel's real PDP (L3[3]).
- make pm_pdirpa an array of 4 paddr_t if using PAE. introduce a
pmap_pdirpa macro to get the physical address of a given PD entry.
- fix pmap_map_pte

The kernel now boots single-user. fsck will cause a kernel fault in
pmap_pdes_invalid() on exit.
 1.3.12.10 15-Jan-2008  bouyer Snapshot of work in progress: a Xen i386PAE kernel boots and starts init
on an amd64 dom0, but panics when init forks.
This code needs a lot of cleanup, and the pmap handling is minimal to
allow init to start. It's a proof of concept of how PAE on Xen can work.

For PAE guest, the Xen MMU handling differs in some significant way
from the i386 or amd64 Xen.
The L3 page has only 4 entries, the last one mapping 0xc0000000->0xffffffff
(which happens to be our kernel VM range, that's cool). The L2 page
pointed to by this last entry is handled specially by Xen because it
contains some Xen private mapping, including a recursive mapping. So this
page can only be pointed to by exactly one L3 entry, and nothing else
(it can't be part of a recursive mapping for example). In addition, it
would waste too much VA space to do recursive mapping at the L3 level.

We do pmap switching at the L3 level, instead of doing it through %cr3.
%cr3 is static, as is L3[3] which contains only kernel mappings.
pmap_load() does pmap switching through the first 3 entries for L3.

PTE mapping is done through 4 contiguous L2 entries; the last one pointing
to a shadow of L3[3]. This way we can consider we have a 2-level VM system,
but with the L2 being 4 pages in size instead of one. The plx_i()
macros can be used with it to access the PTE without changes.

This can be reused as is for native PAE support (without the L3[3] shadow
which wouldn't be needed here)
 1.3.12.9 13-Jan-2008  bouyer Add i386PAE support for bootstrap.
 1.3.12.8 13-Jan-2008  bouyer Work in progress on xeni386 PAE support:
Make xeni386 build with a 64bit paddr_t. For this vaddr_t vs paddr_t vs
pointers usages had to be clarified.
If 'options PAE' is present in a Xen3 kernel, switch paddr_t, pd_entry_t
and pt_entry_t to 64bits, and add the PAE entry in the __xen_guest ELF section.
 1.3.12.7 11-Jan-2008  bouyer Oops, fix XENPRINTK usage.
 1.3.12.6 11-Jan-2008  bouyer printk -> XENPRINTK
 1.3.12.5 09-Jan-2008  bouyer Merge xen bits to i386/i386/gdt.c. Convert remaining uses of PTE_* macros to
pmap_pte_* macros/inlines.
Fix think-o in pmap.c for native i386.
 1.3.12.4 05-Jan-2008  bouyer Make it build for XEN2_*
 1.3.12.3 15-Dec-2007  bouyer Switch xen/i386 to the x86 xen_pmap_bootstrap().
 1.3.12.2 15-Dec-2007  bouyer Cleanup xen_pmap_bootstrap() and make it build on i386.
 1.3.12.1 11-Dec-2007  bouyer Switch i386 to x86/x86/pmap.c
 1.3.8.5 27-Feb-2008  yamt sync with head.
 1.3.8.4 04-Feb-2008  yamt sync with head.
 1.3.8.3 21-Jan-2008  yamt sync with head
 1.3.8.2 07-Dec-2007  yamt sync with head
 1.3.8.1 23-Nov-2007  yamt file x86_xpmap.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:18 +0000
 1.3.4.2 03-Dec-2007  ad Sync with HEAD.
 1.3.4.1 23-Nov-2007  ad file x86_xpmap.c was added on branch vmlocking on 2007-12-03 19:04:42 +0000
 1.3.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.3.2.1 23-Nov-2007  joerg file x86_xpmap.c was added on branch jmcneill-pm on 2007-11-27 19:36:22 +0000
 1.7.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.7.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.7.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.8.10.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.8.10.1 19-Oct-2008  haad Sync with HEAD.
 1.8.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.8.4.5 11-Aug-2010  yamt sync with head.
 1.8.4.4 11-Mar-2010  yamt sync with head
 1.8.4.3 19-Aug-2009  yamt sync with head.
 1.8.4.2 18-Jul-2009  yamt sync with head.
 1.8.4.1 04-May-2009  yamt sync with head.
 1.11.4.1 24-Feb-2012  sborrill Pull up the following revision(s) (requested by bouyer in ticket #1729):
sys/arch/x86/x86/pmap.c: revision 1.170 via patch
sys/arch/xen/x86/x86_xpmap.c: revision 1.40 via patch

Fix random kernel panic on domains with large memory.
May fix PR port-xen/38699
 1.11.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.12.4.15 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.12.4.14 02-May-2011  jym Sync with head.
 1.12.4.13 30-Mar-2011  jym Sync with my commits in HEAD.
 1.12.4.12 29-Mar-2011  jym More sync fixes. And add the mbr_gpt files.
 1.12.4.11 28-Mar-2011  jym Cure sync hiccups. Code with compile errors is not really useful, heh.
 1.12.4.10 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.12.4.9 10-Jan-2011  jym Sync with HEAD
 1.12.4.8 24-Oct-2010  jym Sync with HEAD
 1.12.4.7 01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.12.4.6 01-Nov-2009  jym Sync with HEAD.
 1.12.4.5 24-Jul-2009  jym - rework the page pinning API, so that now a function is provided for
each level of indirection encountered during virtual memory translations. Update
pmap accordingly. Pinning looks cleaner that way, and it offers the possibility
to pin lower level pages if necessary (NetBSD does not do it currently).

- some fixes and comments to explain how page validation/invalidation take
place during save/restore/migrate under Xen. L2 shadow entries from PAE are now
handled, so basically, suspend/resume works with PAE.

- fixes an issue reported by Christoph (cegger@) for xencons suspend/resume
in dom0.

TODO:

- PAE save/restore is currently limited to single-user only, multi-user
support requires modifications in PAE pmap that should be discussed first. See
the comments about the L2 shadow pages cached in pmap_pdp_cache in this commit.

- grant table bug is still there; do not use the kernels of this branch
to test suspend/resume, unless you want to experience bad crashes in dom0,
and push the big red button.

Now there is light at the end of the tunnel :)

Note: XEN2 kernels will neither build nor work with this branch.
 1.12.4.4 23-Jul-2009  jym Sync with HEAD.
 1.12.4.3 31-May-2009  jym Modifications for the Xen suspend/migrate/resume branch:

- introduce xenbus_device_{suspend,resume}() functions. These are routines
used to suspend/resume MI parts of the Xenbus device interfaces, like updating
frontend/backend devices' paths found in XenStore.

- introduce HYPERVISOR_sysctl(), a hypercall used only by Xentools to obtain
information from hypervisor (listing VMs, printing console, etc.). I use it
to query xenconsole from ddb(), as a last resort in case of a panic() in
dom0 (xm being not available). Currently unused in the branch; could be, if
requested.

- disable the rwlock(9) used to protect code that could use transient MFNs.
It could trigger nasty context switches in places where it should not.

- fix some bugs in the xennet/xbd suspend/resume pmf(9) handlers.

- following XenSource's design, talk_to_otherend() is now called
watch_otherend(), and free_otherend_details() is used by Xenbus device
suspend/resume routines.

- some slight modifications in pmap regarding APDP. Introduce an inline
function (pmap_unmap_apdp_pde()) that clears APDP entry for the current pmap.

- similarly, implement pmap_unmap_all_apdp_pdes() that iterates through all
pmaps and tears down APDP, as Xen does not handle them properly.

TODO/XXX:

- pmap_unmap_apdp_pde() does not handle APDP shadow entry of PAE. It will,
once I figure out how PAE uses it.

- revisit the pmap locking issue regarding transient MFNs. As NetBSD does not
use kernel preemption and MP for Xen, this could be skipped momentarily. See
http://mail-index.netbsd.org/port-xen/2009/04/27/msg004903.html for details.

- fix a bug regarding grant tables which could technically DoS a dom0 if
ridiculously high consumer/producer indexes are passed down in the ring during
a resume.

All in all, once the grant table index issue and APDP PAE are fixed, next step
is to torture test this branch.

Tested under i386 PAE and non-PAE, Xen3 dom0 and domU. amd64 is only compile
tested.
 1.12.4.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.12.4.1 09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends in two halves: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no other parts
of the kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontends drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.17.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.17.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.19.2.3 31-May-2011  rmind sync with head
 1.19.2.2 21-Apr-2011  rmind sync with head
 1.19.2.1 05-Mar-2011  rmind sync with head
 1.23.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.23.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.26.2.12 21-Oct-2011  bouyer Make this build without 'options MULTIPROCESSOR'
 1.26.2.11 20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.26.2.10 18-Sep-2011  cherry Make the xpq locking per-cpu
 1.26.2.9 09-Sep-2011  cherry fix amd64 boot.
 1.26.2.8 30-Aug-2011  cherry Add per-cpu mmu queues
 1.26.2.7 20-Aug-2011  cherry PAE MP support (preliminary), amd64 per-cpu L4 model redesigned, i386 pmap_pa_start/end fixup
 1.26.2.6 17-Aug-2011  cherry Pullup relevant changes from -current
 1.26.2.5 31-Jul-2011  cherry grow MP support for i386. boots to single user
 1.26.2.4 16-Jul-2011  cherry Introduce a per-cpu "shadow" for pmap_kernel()'s L4 page
 1.26.2.3 27-Jun-2011  cherry Add xpq locking around xpq_queue_tlb_flush()
 1.26.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.26.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.34.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.34.2.4 30-Oct-2012  yamt sync with head
 1.34.2.3 23-May-2012  yamt sync with head.
 1.34.2.2 17-Apr-2012  yamt sync with head
 1.34.2.1 10-Nov-2011  yamt sync with head
 1.36.4.6 29-Apr-2012  mrg sync to latest -current.
 1.36.4.5 06-Mar-2012  mrg sync to -current
 1.36.4.4 06-Mar-2012  mrg sync to -current
 1.36.4.3 04-Mar-2012  mrg sync to latest -current.
 1.36.4.2 24-Feb-2012  mrg sync to -current.
 1.36.4.1 18-Feb-2012  mrg merge to -current.
 1.38.2.5 12-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #314):
sys/arch/xen/x86/cpu.c: revision 1.92
sys/kern/subr_kcpuset.c: revision 1.6
sys/sys/kcpuset.h: revision 1.6
sys/arch/xen/x86/x86_xpmap.c: revision 1.44
Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect
ncpu problem in early boot. Also, micro-optimises xen_mcast_invlpg() and
xen_mcast_tlbflush() routines.
Tested by chs@.
 1.38.2.4 09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.38.2.3 05-Mar-2012  sborrill Pull up the following revision(s) (requested by bouyer in ticket #80):
sys/arch/xen/x86/x86_xpmap.c: revision 1.42
sys/arch/x86/include/specialreg.h: revision 1.56
sys/arch/amd64/amd64/machdep.c: revision 1.179
sys/arch/i386/i386/locore.S: revision 1.97
sys/arch/i386/i386/machdep.c: revision 1.723 via patch
sys/arch/x86/include/cpu.h: revision 1.49

Fix possible FPU registers corruption on context switches.
Fix type of pointers passed to some hypercalls.
 1.38.2.2 23-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #39):
sys/arch/x86/x86/pmap.c: revision 1.170
sys/arch/xen/x86/x86_xpmap.c: revision 1.40
On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this includes the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause one of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediately or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and get its VA from there.
May fix PR port-xen/38699
 1.38.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batches of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.48.2.3 03-Dec-2017  jdolecek update from HEAD
 1.48.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.48.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.49.2.1 18-May-2014  rmind sync with head
 1.52.2.1 10-Aug-2014  tls Rebase.
 1.53.4.5 28-Aug-2017  skrll Sync with HEAD
 1.53.4.4 05-Feb-2017  skrll Sync with HEAD
 1.53.4.3 05-Dec-2016  skrll Sync with HEAD
 1.53.4.2 05-Oct-2016  skrll Sync with HEAD
 1.53.4.1 09-Jul-2016  skrll Sync with HEAD
 1.54.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.54.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.54.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.54.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.69.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.74.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.74.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.74.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.75.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.75.2.1 10-Jun-2019  christos Sync with HEAD
 1.84.4.2 13-May-2022  martin Pull up following revision(s) (requested by bouyer in ticket #1444):

sys/arch/xen/x86/x86_xpmap.c: revision 1.91

In bootstrap, after switching to a new page table make sure that
now-unused memory is unmapped.
 1.84.4.1 31-May-2020  martin Pull up following revision(s) (requested by bouyer in ticket #935):

sys/arch/xen/x86/x86_xpmap.c: revision 1.89
sys/arch/x86/include/pmap.h: revision 1.121
sys/arch/xen/xen/privcmd.c: revision 1.58
sys/external/mit/xen-include-public/dist/xen/include/public/memory.h: revision 1.2
sys/arch/xen/include/xenpmap.h: revision 1.44
sys/arch/xen/include/xenio.h: revision 1.12
sys/arch/x86/x86/pmap.c: revision 1.394
(all via patch)

Adjust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()

Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which causes xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
Make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt(), which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grant entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().

Implement new ioctl, needed by Xen 4.13:
IOCTL_PRIVCMD_MMAPBATCH_V2
IOCTL_PRIVCMD_MMAP_RESOURCE
IOCTL_GNTDEV_MMAP_GRANT_REF
IOCTL_GNTDEV_ALLOC_GRANT_REF

Always enable declarations needed by privcmd.c
 1.34 14-May-2024  andvar fix recently committed typos by msaitoh in few more places, as well as few more.
mainly s/contigous/contiguous/ and s/miliseconds/milliseconds/ in comments.
 1.33 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.32 06-May-2020  bouyer Make MP-safe: make sure the xpq_queue* are flushed before making the
pages visible to UVM.
 1.31 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.30 10-Apr-2020  jdolecek add and pass dma tag to PV drivers attached to xenbus, so they can
use bus_dmamap_load_mbuf() et al.

due to XENPV override, _BUS_BUS_TO_PHYS() dmamap segment ds_addr
gets filled with ma, so value can be directly used for e.g. grant calls
 1.29 09-Apr-2020  jdolecek update to __XEN_INTERFACE_VERSION__ 0x0003020a aka Xen 3.2.10

this brings grant memory v2 support:
- status separated from flags - revoking access needs just memory barrier,
no need for expensive cmpxchg16 any more
- sub-page hypervisor copy-only grants, to be used by xennet(4)
- 64-bit frame, i.e. support for DomU RAM >16TB

the grant table is now always allocated on boot to maximum size, it's now
never grown in runtime; switch back to regular kmem_alloc()/kmem_free()

code now requires v2 support, no compatibility for grant version 1 retained -
Xen v2 support predates all currently supported Xen versions

also the interface for balloon changed slightly, code updated
 1.28 03-Sep-2018  riastradh branches: 1.28.10;
Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
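The truncation hazard described above can be reproduced in a few lines. This is a minimal sketch, assuming a 32-bit unsigned int and 64-bit lengths; uimin() and MIN() follow the shapes quoted in the log, while clamp_narrow() and clamp_wide() are hypothetical callers invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the renamed helper: defined on unsigned int only. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: may evaluate an argument twice, but keeps the operand type. */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical caller that clamps a 64-bit length through uimin(). */
static uint64_t
clamp_narrow(uint64_t len, uint64_t limit)
{
	/* The sketch only makes its point if unsigned int is narrower. */
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	/* Both arguments are implicitly converted to unsigned int here. */
	return uimin(len, limit);
}

/* Hypothetical caller that clamps with the type-preserving macro. */
static uint64_t
clamp_wide(uint64_t len, uint64_t limit)
{
	return MIN(len, limit);
}
```

With len = 4 GiB + 1 and limit = 8 GiB, the narrow variant compares the low 32 bits only (1 against 0) and returns 0, while the macro variant returns the full length. The new name makes that narrowing visible at the call site.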
 1.27 24-Jun-2018  jdolecek branches: 1.27.2;
mark with XXXSMP all remaining spl*() and tsleep() calls
 1.26 30-Jun-2012  jym branches: 1.26.38;
Use setter to set xenguest_handles.
 1.25 30-Jun-2012  jym Extend the xpmap API, as described in [1]. This change is mechanical and
avoids exposing the MD phys_to_machine/machine_to_phys tables directly.
Added:

- xpmap_ptom handles PFN (pseudo physical) to MFN (machine frame number)
translations, and is under control of the domain.
- xpmap_mtop is its counterpart (MFN to PFN), and is under control of
hypervisor.

xpmap_ptom_map() map a pseudo-phys address to a machine address
xpmap_ptom_unmap() unmap a pseudo-phys address (invalidation)
xpmap_ptom_isvalid() check for pseudo-phys address validity

The parameters are physical/machine addresses, like bus_dma/bus_space(9).
As x86 MFNs are tracked by u_long (Xen's choice) while machine addresses
can be 64 bits entities (PAE), use ptoa() to avoid truncation when bit
shifting by PAGE_SHIFT.

I kept the same namespace (xpmap_) to avoid code churn.

[1] http://mail-index.netbsd.org/port-xen/2009/05/09/msg004951.html

XXX will document ptoa/atop/trunc_page separately.
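The truncation that ptoa() is meant to avoid can be sketched as follows. All names here are hypothetical stand-ins: sketch_mfn_t models a 32-bit u_long frame number on i386, sketch_paddr_t models a 64-bit PAE machine address, and the widening cast is only assumed to match what a ptoa()-style macro does.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

typedef uint32_t sketch_mfn_t;   /* stand-in for a 32-bit u_long MFN */
typedef uint64_t sketch_paddr_t; /* stand-in for a 64-bit machine address */

/* Shift done in the narrow type: the high bits are lost before widening. */
static sketch_paddr_t
mfn_to_ma_naive(sketch_mfn_t mfn)
{
	return mfn << SKETCH_PAGE_SHIFT;
}

/* Widen first, then shift: assumed to be the ptoa()-style behaviour. */
static sketch_paddr_t
mfn_to_ma_wide(sketch_mfn_t mfn)
{
	return (sketch_paddr_t)mfn << SKETCH_PAGE_SHIFT;
}

/* Reverse direction (atop()-style): the frame number must fit the MFN type. */
static sketch_mfn_t
ma_to_mfn(sketch_paddr_t ma)
{
	assert((ma >> SKETCH_PAGE_SHIFT) <= UINT32_MAX);
	return (sketch_mfn_t)(ma >> SKETCH_PAGE_SHIFT);
}
```

Frame 0x100000 sits exactly at 4 GiB: the naive shift wraps to 0, whereas the widened shift yields 0x100000000.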
 1.24 27-Jun-2012  jym Retire XEN_COMPAT_030001 as detailed on port-xen@:

http://mail-index.netbsd.org/port-xen/2012/06/25/msg007431.html

The xen_p2m API comes next.

ok bouyer@.
Tested on i386 PAE and amd64 (Xen 3.3 on private test bed, and
Xen 3.4 for Amazon EC2).

FWIW, Amazon always reported:

hypervisor0 at mainbus0: Xen version 3.4.3-kaos_t1micro

multiple times for Europe and US West-1, so I guess they are now at
3.4 (32 and 64 bits).
 1.23 01-Jul-2011  dyoung branches: 1.23.2;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.22 12-Nov-2010  njoly branches: 1.22.6;
Include uvm.h not uvm_extern.h following recent changes.
 1.21 22-Mar-2010  bouyer bus_dmamem_alloc() may not get a boundary smaller than size, but
it's perfectly valid for bus_dmamap_create() to do so (a contiguous
transfer will then be split into multiple segments).
Fix _xen_bus_dmamem_alloc_range() and _bus_dmamem_alloc_range() to
allow a boundary limit smaller than size:
- compute appropriate boundary for uvm_pglistalloc(), which doesn't
accept boundary < size
- also take care of boundary when deciding to start a new segment.
While there, remove useless boundary argument to _xen_alloc_contig().
Fix the boundary-related issue of PR port-amd64/42980
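The two boundary rules in this entry can be sketched in isolation. This is an illustration under stated assumptions, not the committed code: alloc_boundary() and crosses_boundary() are hypothetical helper names, the boundary is assumed to be zero or a power of two, and size is assumed to be far below 2^63 so the doubling loop terminates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Boundary to hand to an allocator that rejects boundary < size:
 * double the caller's boundary until it is at least size.
 * A boundary of 0 means "no restriction" and is passed through.
 */
static uint64_t
alloc_boundary(uint64_t size, uint64_t boundary)
{
	uint64_t b;

	if (boundary == 0)
		return 0;
	assert((boundary & (boundary - 1)) == 0); /* power of two */
	b = boundary;
	while (b < size)
		b <<= 1;
	return b;
}

/*
 * Segment-splitting test: does [addr, addr + len) span a multiple of the
 * caller's original boundary? If so, a new segment has to start there.
 */
static int
crosses_boundary(uint64_t addr, uint64_t len, uint64_t boundary)
{
	uint64_t mask;

	if (boundary == 0 || len == 0)
		return 0;
	mask = ~(boundary - 1);
	return (addr & mask) != ((addr + len - 1) & mask);
}
```

For a 12 KiB request with a 4 KiB boundary, the allocator would be given a 16 KiB boundary, and the resulting memory would still be cut into segments wherever crosses_boundary() reports a 4 KiB crossing.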
 1.20 09-Mar-2010  jym branches: 1.20.2;
Although Xen's documentation states that the address_bits field is not used
by XENMEM_decrease_reservation, it is checked by the hypervisor. In certain
circumstances (stack leak), the field could have an improper value, leading
to a failure of the hypercall.

Set it to 0 ("no addressing restriction") to avoid that.

Patch tested by Sam Fourman and haad@.

This should fix the rare "failed allocating DMA memory" encountered
under NetBSD dom0. Will ask for a pull-up.
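A minimal sketch of the fix pattern, assuming a request structure shaped like Xen's memory reservation argument. struct sketch_reservation and init_reservation() are hypothetical; only the address_bits field name and the "0 means no addressing restriction" convention come from the log above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the hypercall argument; the layout is illustrative. */
struct sketch_reservation {
	unsigned long *extent_start;
	unsigned long nr_extents;
	unsigned int extent_order;
	unsigned int address_bits; /* 0: no addressing restriction */
	unsigned short domid;
};

/*
 * Clear the whole structure before filling it in, so that fields the
 * caller does not care about never carry leftover stack contents.
 */
static void
init_reservation(struct sketch_reservation *res, unsigned long *extents,
    unsigned long nr_extents, unsigned short domid)
{
	assert(res != NULL);
	memset(res, 0, sizeof(*res));
	res->extent_start = extents;
	res->nr_extents = nr_extents;
	res->domid = domid;
	res->address_bits = 0; /* explicit, even though memset() covers it */
}
```

Setting the field explicitly after the memset() is redundant, but it documents at the call site that the hypervisor does inspect it.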
 1.19 02-Mar-2010  jym Catch the return value from the XENMEM_decrease_reservation hypercall,
and not some error value stored earlier.

While here, fix a typo in a comment.
 1.18 27-Feb-2010  jym Make a XENMEM_decrease_reservation DEBUG printf() more meaningful.
 1.17 12-Feb-2010  jym Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.
 1.16 23-Jan-2010  cegger branches: 1.16.2;
fix address overflow with 32bit PAE.
Reported and tested by Mark Davies on port-xen@.
 1.15 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.14 24-Jan-2009  bouyer branches: 1.14.2;
Properly check the return value of HYPERVISOR_memory_op(): it returns
the number of successful operations, so a return value of 0 is also
a failure.
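The return-value convention described here can be captured in a small predicate. This is a sketch only: reservation_ok() is a hypothetical helper, and it assumes the caller knows how many extents it asked for and treats anything short of that count as a failure.

```c
#include <assert.h>

/*
 * ret is the value returned by the memory operation: a negative error,
 * or the number of extents that were actually processed.
 * Success means every requested extent was processed, so 0 is a failure
 * whenever at least one extent was requested.
 */
static int
reservation_ok(long ret, unsigned long nr_extents)
{
	assert(nr_extents > 0); /* an empty request is not meaningful here */
	return ret >= 0 && (unsigned long)ret == nr_extents;
}
```

A check of the form "ret < 0" alone would accept a return value of 0 for a one-extent request, which is the case this revision fixes.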
 1.13 18-Dec-2008  cegger remove unused malloc.h
 1.12 13-Nov-2008  cegger Finish preparation to new interface.
New interface not yet used by default. It needs some testing first.
 1.11 04-Jun-2008  ad branches: 1.11.4; 1.11.6; 1.11.8; 1.11.12;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.10 28-Apr-2008  martin branches: 1.10.2;
Remove clause 3 and 4 from TNF licenses
 1.9 24-Feb-2007  bouyer branches: 1.9.42; 1.9.44; 1.9.46;
Properly honor bus_dma bus address range restriction for Xen3 (the Xen2
interface doesn't allow it), so that e.g. bus_dma_subregion() has a chance
to work. Unfortunately a stock Xen hypervisor won't allow an upper bound less
than 2^31 (2GB) so devices like bce(4) will need a hacked hypervisor to
work properly.
 1.8 03-Sep-2006  bouyer branches: 1.8.6; 1.8.8;
Wrap some printfs in #ifdef DEBUG, as we should not leak memory any more when
bus_dma memory allocation fails.
 1.7 28-Aug-2006  bouyer Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where machine address would in some case be freed at physical address, or
mapped as physical address.
 1.6 15-Jan-2006  bouyer branches: 1.6.2; 1.6.6; 1.6.16; 1.6.18;
Snapshot of work in progress on NetBSD port to Xen3:
- kernel (both dom0 and domU) boot, console is functional and it can start
software from a ramdisk
- there is no driver front-end except console for domU yet.
- dom0 can probe devices and ex(4) works when Xen3 is booted without acpi
and apic support. But the on-board IDE doesn't get interrupts.
The PCI code still needs work (it's hardcoded to mode 1). Some of this
code should be shared with ../x86
The physical interrupt code needs to get MPBIOS and ACPI support, and
do interrupt routing to properly interact with Xen.
To enable Xen-3.0 support, add
options XEN3
to your kernel config file (this will disable Xen2 support)
Changes affecting Xen-2.0 support (no functional changes intended):
- get more constants from genassym for assembly code
- remove some unneeded registers move from start()
- map the shared info page from start(), and remove the pte = 0xffffffff hack
- vector.S: in hypervisor_callback() make sure %esi points to
HYPERVISOR_shared_info before accessing the info page. Replace some
hand-written assembly with the equivalent macro defined in frameasm.h
- more debug code, disabled by default.

while here added my copyright on some files I worked on in 2005.
 1.5 24-Dec-2005  perry branches: 1.5.2;
__asm__ -> __asm
__const__ -> const
__inline__ -> inline
__volatile__ -> volatile
 1.4 11-Dec-2005  christos merge ktrace-lwp.
 1.3 22-Aug-2005  bouyer branches: 1.3.2; 1.3.8;
Fix a memory leak. Thanks to YAMAMOTO Takashi for the notice.
 1.2 20-Aug-2005  bouyer Also properly check the alignment and boundary constraints.
 1.1 20-Aug-2005  bouyer Deal with the machine address space being non-contiguous in bus_dmamem_alloc():
- Define _BUS_AVAIL_END to 0xffffffff, as we don't have an easy way to
find the upper bound for our machine address space (and this can change
when we swap pages with the hypervisor).
- implement _xen_bus_dmamem_alloc_range(), which will request a contiguous
set of pages to the hypervisor if the pages returned by uvm_pglistalloc()
don't fit the constraints.
We can't deal with the low/high constraints yet, because Xen doesn't offer a
way to get pages in a specific range of addresses.

Based on patches from Dave Thompson (in private mail), with heavy hacking
by me.
 1.3.8.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.8.1 22-Aug-2005  skrll file xen_bus_dma.c was added on branch ktrace-lwp on 2005-11-10 14:00:34 +0000
 1.3.2.5 16-Sep-2006  ghen Pull up following revision(s) (requested by bouyer in ticket #1510):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.7
sys/arch/xen/x86/xen_bus_dma.c: revision 1.8
sys/arch/x86/include/bus_private.h: revision 1.6
sys/arch/x86/x86/bus_dma.c: revision 1.30
sys/arch/xen/include/bus_private.h: revision 1.7
Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where machine address would in some case be freed at physical address, or
mapped as physical address.
Wrap some printfs in #ifdef DEBUG, as we should not leak memory any more when
bus_dma memory allocation fails.
 1.3.2.4 25-Aug-2005  tron Pull up following revision(s) (requested by bouyer in ticket #696):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.3
Fix a memory leak. Thanks to YAMAMOTO Takashi for the notice.
 1.3.2.3 25-Aug-2005  tron Pull up following revision(s) (requested by bouyer in ticket #696):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.2
Also properly check the alignment and boundary constraints.
 1.3.2.2 25-Aug-2005  tron Pull up following revision(s) (requested by bouyer in ticket #696):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.1
sys/arch/xen/include/bus_private.h: revision 1.2
sys/arch/xen/conf/files.xen: revision 1.28
Deal with the machine address space being non-contiguous in bus_dmamem_alloc():
- Define _BUS_AVAIL_END to 0xffffffff, as we don't have an easy way to
find the upper bound for our machine address space (and this can change
when we swap pages with the hypervisor).
- implement _xen_bus_dmamem_alloc_range(), which will request a contiguous
set of pages to the hypervisor if the pages returned by uvm_pglistalloc()
don't fit the constraints.
We can't deal with the low/high constraints yet, because Xen doesn't offer a
way to get pages in a specific range of addresses.
Based on patches from Dave Thompson (in private mail), with heavy hacking
by me.
 1.3.2.1 22-Aug-2005  tron file xen_bus_dma.c was added on branch netbsd-3 on 2005-08-25 20:49:54 +0000
 1.5.2.1 01-Feb-2006  yamt sync with head.
 1.6.18.1 14-Sep-2006  riz Pull up following revision(s) (requested by bouyer in ticket #150):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.7
sys/arch/xen/x86/xen_bus_dma.c: revision 1.8
sys/arch/x86/include/bus_private.h: revision 1.6
sys/arch/x86/x86/bus_dma.c: revision 1.30
sys/arch/xen/include/bus_private.h: revision 1.7
Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where machine address would in some case be freed at physical address, or
mapped as physical address.
Wrap some printfs in #ifdef DEBUG, as we should not leak memory any more when
bus_dma memory allocation fails.
 1.6.16.4 26-Feb-2007  yamt sync with head.
 1.6.16.3 30-Dec-2006  yamt sync with head.
 1.6.16.2 21-Jun-2006  yamt sync with head.
 1.6.16.1 15-Jan-2006  yamt file xen_bus_dma.c was added on branch yamt-lazymbuf on 2006-06-21 14:58:23 +0000
 1.6.6.2 14-Sep-2006  yamt sync with head.
 1.6.6.1 03-Sep-2006  yamt sync with head.
 1.6.2.1 09-Sep-2006  rpaulo sync with head
 1.8.8.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.8.6.1 13-Jun-2010  riz Pull up following revision(s) (requested by jym in ticket #1388):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.20
Although Xen's documentation states that the address_bits field is not used
by XENMEM_decrease_reservation, it is checked by the hypervisor. In certain
circumstances (stack leak), the field could have an improper value, leading
to a failure of the hypercall.
Set it to 0 ("no addressing restriction") to avoid that.
Patch tested by Sam Fourman and haad@.
This should fix the rare "failed allocating DMA memory" encountered
under NetBSD dom0. Will ask for a pull-up.
 1.9.46.5 11-Aug-2010  yamt sync with head.
 1.9.46.4 11-Mar-2010  yamt sync with head
 1.9.46.3 19-Aug-2009  yamt sync with head.
 1.9.46.2 04-May-2009  yamt sync with head.
 1.9.46.1 16-May-2008  yamt sync with head.
 1.9.44.2 17-Jun-2008  yamt sync with head.
 1.9.44.1 18-May-2008  yamt sync with head.
 1.9.42.3 17-Jan-2009  mjf Sync with HEAD.
 1.9.42.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.9.42.1 02-Jun-2008  mjf Sync with HEAD.
 1.10.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.11.12.1 21-Apr-2010  matt sync to netbsd-5
 1.11.8.3 19-Nov-2010  riz Pull up following revision(s) (requested by bouyer in ticket #1348):
sys/arch/x86/x86/bus_dma.c: revision 1.54
sys/arch/xen/x86/xen_bus_dma.c: revision 1.21
bus_dmamem_alloc() may not get a boundary smaller than size, but
it's perfectly valid for bus_dmamap_create() to do so (a contiguous
transfer will then be split into multiple segments).
Fix _xen_bus_dmamem_alloc_range() and _bus_dmamem_alloc_range() to
allow a boundary limit smaller than size:
- compute appropriate boundary for uvm_pglistalloc(), which doesn't
accept boundary < size
- also take care of boundary when deciding to start a new segment.
While there, remove useless boundary argument to _xen_alloc_contig().
Fix the boundary-related issue of PR port-amd64/42980
 1.11.8.2 29-Mar-2010  snj Pull up following revision(s) (requested by jym in ticket #1334):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.20
Although Xen's documentation states that the address_bits field is not used
by XENMEM_decrease_reservation, it is checked by the hypervisor. In certain
circumstances (stack leak), the field could have an improper value, leading
to a failure of the hypercall.
Set it to 0 ("no addressing restriction") to avoid that.
Patch tested by Sam Fourman and haad@.
This should fix the rare "failed allocating DMA memory" encountered
under NetBSD dom0.
 1.11.8.1 30-Jan-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1271):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.16
sys/arch/xen/xen/xengnt.c: revision 1.17 via patch
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.33
fix address overflow with 32bit PAE.
Reported and tested by Mark Davies on port-xen@.
 1.11.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.11.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.11.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.14.2.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.14.2.5 10-Jan-2011  jym Sync with HEAD
 1.14.2.4 24-Oct-2010  jym Sync with HEAD
 1.14.2.3 01-Nov-2009  jym Sync with HEAD.
 1.14.2.2 31-May-2009  jym Modifications for the Xen suspend/migrate/resume branch:

- introduce xenbus_device_{suspend,resume}() functions. These are routines
used to suspend/resume MI parts of the Xenbus device interfaces, like updating
frontend/backend devices' paths found in XenStore.

- introduce HYPERVISOR_sysctl(), a hypercall used only by Xentools to obtain
information from hypervisor (listing VMs, printing console, etc.). I use it
to query xenconsole from ddb(), as a last resort in case of a panic() in
dom0 (xm being not available). Currently unused in the branch; could be, if
requested.

- disable the rwlock(9) used to protect code that could use transient MFNs.
It could trigger nasty context switches in places where it should not.

- fix some bugs in the xennet/xbd suspend/resume pmf(9) handlers.

- following XenSource's design, talk_to_otherend() is now called
watch_otherend(), and free_otherend_details() is used by Xenbus device
suspend/resume routines.

- some slight modifications in pmap regarding APDP. Introduce an inline
function (pmap_unmap_apdp_pde()) that clears APDP entry for the current pmap.

- similarly, implement pmap_unmap_all_apdp_pdes() that iterates through all
pmaps and tears down APDP, as Xen does not handle them properly.

TODO/XXX:

- pmap_unmap_apdp_pde() does not handle APDP shadow entry of PAE. It will,
once I figure out how PAE uses it.

- revisit the pmap locking issue regarding transient MFNs. As NetBSD does not
use kernel preemption and MP for Xen, this could be skipped momentarily. See
http://mail-index.netbsd.org/port-xen/2009/04/27/msg004903.html for details.

- fix a bug regarding grant tables which could technically DoS a dom0 if
ridiculously high consumer/producer indexes are passed down in the ring during
a resume.

All in all, once the grant table index issue and APDP PAE are fixed, next step
is to torture test this branch.

Tested under i386 PAE and non-PAE, Xen3 dom0 and domU. amd64 is only compile
tested.
 1.14.2.1 09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends in two halves: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no other parts
in the kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontends drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.16.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.20.2.2 05-Mar-2011  rmind sync with head
 1.20.2.1 30-May-2010  rmind sync with head
 1.22.6.2 20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.22.6.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.23.2.1 30-Oct-2012  yamt sync with head
 1.26.38.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.26.38.1 25-Jun-2018  pgoyette Sync with HEAD
 1.27.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.27.2.1 10-Jun-2019  christos Sync with HEAD
 1.28.10.2 20-Apr-2020  bouyer We need xenbus_bus_dma_tag for PVHVM too, but without phys->machine translation
 1.28.10.1 20-Apr-2020  bouyer Sync with HEAD
 1.1 22-Jul-2023  mrg xen: declare 'default_consinfo' as extern in a header

this makes pvh_consinit.c actually compile with CONS_OVERRIDE set.
i didn't see any good header to add to (bootinfo.h and cpu.h both
seem to be poor choices but were considered), hence the new one
with just this definition.
 1.31 25-Feb-2023  riastradh xen_intr.c: Use kpreempt_disable/enable around access to curcpu().

curcpu() is not otherwise guaranteed to be stable at these points.

While here, nix nonsensical membars. This need only be synchronized
with interrupts on the same CPU.

Proposed on port-xen:
https://mail-index.netbsd.org/port-xen/2022/07/13/msg010250.html

XXX pullup-8 (in __sti/__cli, __save/restore_flags in include/xen.h)
XXX pullup-9
XXX pullup-10
 1.30 24-May-2022  bouyer branches: 1.30.4;
- msipic_construct_msix_pic(): set mp_table_base to memaddr (without
table_offset), this is what Xen wants
while there use pci_conf_write16() in msi_set_msictl_enablebit() too,
for consistency (it seems that Xen accepts the 32bit write at this point,
but this may change).

- xen_map_msix_pirq(): don't forget to set map_irq.table_base in the
MSI-X case, otherwise Xen maps it as MSI
- call pic_hwunmask() after pirq_establish() in msi/msix case, to make sure
the msi-x vector is unmasked.

Now MSI-X works with Xen so stop disabling it in pci_attach_hook().
 1.29 09-Aug-2021  andvar s/alway /always/
 1.28 01-Aug-2020  jdolecek adjust includes to pull __HAVE_PCI_MSI_MSIX properly
 1.27 07-May-2020  bouyer Change event_set_handler() to take the target CPU parameter. If ci is NULL,
event_set_handler() will choose the CPU and bind the event.
If ci is not NULL the caller is responsible for binding the event.
Use an IPI xcall to register the handlers if needed.
Pull in a hack from x86 to force pirq handlers to be mpsafe if registered at
a level != IPL_VM. This is for the com at isa interrupt handler, which
registers at IPL_HIGH and has no way to tell it's mpsafe (taking
KERNEL_LOCK at IPL_HIGH causes deadlocks on MP systems).
 1.26 05-May-2020  bouyer Make DOM0OPS build for PVH/PVHVM too
 1.25 04-May-2020  jdolecek add support for using MSI for XenPV Dom0

use PHYSDEVOP_map_pirq to get the pirq/gsi for MSI/MSI-X, switch also INTx
to use it instead of PHYSDEVOP_alloc_irq_vector

MSI confirmed working with single-vector MSI for wm(4), ahcisata(4), bge(4)

XXX added some provision for MSI-X, but it doesn't actually work (no interrupts
delivered), needs some further investigation; disable MSI-X for XENPV
via flag in x86/pci/pci_machdep.c
 1.24 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.23 21-Apr-2020  jdolecek adjust so that this at least compiles and links with __HAVE_PCI_MSI_MSIX
 1.22 13-Apr-2020  bouyer By default, events are bound to CPU 0 (except for IPIs and VTIMERs which
are bound to a different CPU at creation time).
Recent MI changes caused the scheduler to choose a different CPU when
probing and attaching xennet devices (I guess it's the xenbus thread which
runs on a different CPU). This causes the callback to be called on a different
CPU than the one expected by the kernel, and the event is ignored.
It is handled when the clock causes the callback to be called on the right
CPU, which is why xennet still runs, but slowly.

Change event_set_handler() to do a EVTCHNOP_bind_vcpu if requested to,
and make sure we don't do it for IPIs and VIRQs (for theses, the op fails).
 1.21 06-Apr-2020  jdolecek branches: 1.21.2;
add known_mpsafe parameter also to pirq_establish(), and pass the parameter
to underlying event_set_handler()
 1.20 06-Apr-2020  jdolecek remove restriction on interrupt level for MP-safe interrupt handlers
 1.19 03-Apr-2020  ad Attach xen IPI event counters.
 1.18 23-Dec-2019  thorpej Provide XEN stubs for intr_mask() / intr_unmask().
 1.17 07-Jun-2019  cherry branches: 1.17.2;
Fix build for the XEN3_PVHVM kernel by conditionally compiling
redundant functions x86_enable_intr()/x86_disable_intr()
 1.16 09-May-2019  bouyer sti/cli are not allowed on Xen, we have to clear/set a bit in the
shared page. Revert x86_disable_intr/x86_enable_intr to plain function
calls on XENPV.
While there, clean up unused functions and macros, and change cli()/sti()
macros to x86_disable_intr/x86_enable_intr.
Makes Xen domU boot again
(http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/)
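The 1.16 entry above rests on the fact that a paravirtualised guest cannot execute sti/cli and instead toggles a flag in memory shared with the hypervisor. The following is a minimal sketch of that idea, not the NetBSD code: the struct, its field names, and every sk_ function are hypothetical, loosely modelled on a per-vCPU "mask" and "pending" flag pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU part of the shared page. */
struct sk_vcpu_info {
	volatile uint8_t upcall_pending; /* set when an event is waiting */
	volatile uint8_t upcall_mask;    /* set by the guest to block delivery */
};

static int sk_delivered; /* number of events handled by the sketch */

/* Counterpart of "cli": block event delivery by setting the mask flag. */
static void
sk_disable_intr(struct sk_vcpu_info *vi)
{
	assert(vi != NULL);
	vi->upcall_mask = 1;
}

/* Counterpart of "sti": clear the mask, then handle what arrived while masked. */
static void
sk_enable_intr(struct sk_vcpu_info *vi)
{
	assert(vi != NULL);
	vi->upcall_mask = 0;
	if (vi->upcall_pending) {
		vi->upcall_pending = 0;
		sk_delivered++;
	}
}

/* Simulates the hypervisor raising an event on this vCPU. */
static void
sk_raise_event(struct sk_vcpu_info *vi)
{
	assert(vi != NULL);
	vi->upcall_pending = 1;
	if (!vi->upcall_mask) {
		vi->upcall_pending = 0;
		sk_delivered++;
	}
}
```

An event raised between sk_disable_intr() and sk_enable_intr() stays pending and is only handled once the mask is cleared, which is the behaviour a plain function call has to provide where the native instructions are unavailable.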
 1.15 14-Feb-2019  cherry Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.14 12-Feb-2019  cherry Move xen event related code which interfaces with the NetBSD interrupt
subsystem into a separate namespace where it can co-exist with the
native equivalent in PVHVM mode.

On PV, we alias and export the native symbols - this means that
although the namespace is different, the semantics must be identical.

Eg: xen_intr_establish_xname() vs. intr_establish_xname().

The specific functions we need in PVHVM are:

- spllower, xen_spllower (for native as well as XEN event spl
despatch/defer)
- xen_disable_intr()/xen_enable_intr() ,
x86_disable_intr()/x86_enable_intr()
- xen_read_psl()/xen_write_psl(),
x86_read_psl()/x86_write_psl()
- intr_establish() et al., xen_intr_establish() et al.

This gives us the ability to manage paravirtualised drivers such as
xbd(4) as well as fully emulated ones such as wd(4), for example.
 1.13 26-Dec-2018  cherry Xen can use the native splraise(9) functions.

There is no need for a slower C version.
 1.12 25-Dec-2018  cherry fix i386 build - missed sources migration in previous commit.

allow xen_intr.c to build by bringing in static support functions for
-D INTRSTACKSIZE

This should fix the i386 build now.
 1.11 25-Dec-2018  cherry Excise XEN specific code out of x86/x86/intr.c into xen/x86/xen_intr.c

While at it, separate the source function tracking so that the interrupt
paths are truly independent.

Use weak symbol exporting to provision for future PVHVM co-existence
of both files, but with independent paths. Introduce assembler code
such that in a unified scenario, native interrupts get first priority
in spllower(), followed by XEN event callbacks. IPL management and
semantics are unchanged - native handlers and xen callbacks are
expected to maintain their ipl related semantics.

In summary, after this commit, native and XEN now have completely
unrelated interrupt handling mechanisms, including
intr_establish_xname() and assembler stubs and intr handler
management.

Happy Christmas!
 1.10 24-Dec-2018  cherry Bifurcate the interrupt establish functions between XEN and non-XEN

Thus intr_establish_xname() becomes xen_intr_establish_xname() etc.

One consequence of this is that dom0 devices expect the native
function calls to be available and we thus provide weak aliasing for
dom0 builds to succeed. XEN and non-XEN devices are distinguished by
the PIC they are established on. XEN interrupts are exclusively
established on xen_pic, while dom0 interrupts are established on
natively available PICs.

This allows us an orthogonal path to xen device management (eg:
xenstore events) in XENPVHVM, without having to worry about unifying
the vector entry paths, etc., which is quite challenging.
 1.9 16-Jan-2009  jym branches: 1.9.58; 1.9.64; 1.9.66;
Replace x86 memory fences in Xen drivers by their Xen equivalents, to reduce
MD dependency:

x86_lfence() => xen_rmb()
x86_sfence() => xen_wmb()
x86_mfence() => xen_mb()

Discussed in
http://mail-index.netbsd.org/port-xen/2009/01/15/msg004655.html

Ok by bouyer@.
 1.8 01-Jul-2008  bouyer branches: 1.8.4;
spllower(): return immediately if ci->ci_ilevel <= nlevel, as the asm
versions do.
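The early return described in the 1.8 entry above can be sketched as follows. This is an illustrative model only: struct sk_cpu, its fields, and sk_spllower() are hypothetical names, and the pending-work loop is a simplification of what a real spllower() does.

```c
#include <assert.h>
#include <stdint.h>

struct sk_cpu {
	int ilevel;        /* current interrupt priority level */
	uint32_t ipending; /* bit n set: work deferred at level n */
	int handled;       /* number of deferred items that were run */
};

static void
sk_spllower(struct sk_cpu *ci, int nlevel)
{
	assert(nlevel >= 0 && nlevel < 32);

	/* Already at or below the requested level: nothing to do. */
	if (ci->ilevel <= nlevel)
		return;

	ci->ilevel = nlevel;

	/* Run deferred work that the new level no longer blocks, highest first. */
	for (int lvl = 31; lvl > nlevel; lvl--) {
		if (ci->ipending & (UINT32_C(1) << lvl)) {
			ci->ipending &= ~(UINT32_C(1) << lvl);
			ci->handled++;
		}
	}
}
```

Without the early return, a call that asks for a level higher than the current one would raise the level and process pending work as a side effect, which is not what "lower" is meant to do.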
 1.7 25-May-2008  bouyer branches: 1.7.2;
Add a KASSERT(): Xspllower() will reenable interrupts, so make sure it's
not wrong to do so.
 1.6 24-May-2008  bouyer G/C dead code: remove now-unused softintr-related code.
 1.5 28-Apr-2008  martin branches: 1.5.2;
Remove clause 3 and 4 from TNF licenses
 1.4 21-Apr-2008  cegger branches: 1.4.2;
Access Xen's vcpu info structure per-CPU.
Tested on i386 and amd64 (both dom0 and domU) by me.
Xen2 tested (both dom0 and domU) by bouyer.
OK bouyer
 1.3 14-Apr-2008  cegger branches: 1.3.2;
- use POSIX integer types
- ansify functions
 1.2 22-Nov-2007  bouyer branches: 1.2.2; 1.2.4; 1.2.8; 1.2.16; 1.2.22;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file xen_intr.c was initially added on branch bouyer-xenamd64.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.22.3 17-Jan-2009  mjf Sync with HEAD.
 1.2.22.2 02-Jul-2008  mjf Sync with HEAD.
 1.2.22.1 02-Jun-2008  mjf Sync with HEAD.
 1.2.16.2 09-Jan-2008  matt sync with HEAD
 1.2.16.1 22-Nov-2007  matt file xen_intr.c was added on branch matt-armv6 on 2008-01-09 01:50:16 +0000
 1.2.8.2 07-Dec-2007  yamt sync with head
 1.2.8.1 22-Nov-2007  yamt file xen_intr.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:18 +0000
 1.2.4.2 03-Dec-2007  ad Sync with HEAD.
 1.2.4.1 22-Nov-2007  ad file xen_intr.c was added on branch vmlocking on 2007-12-03 19:04:44 +0000
 1.2.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1 22-Nov-2007  joerg file xen_intr.c was added on branch jmcneill-pm on 2007-11-27 19:36:22 +0000
 1.3.2.2 04-Jun-2008  yamt sync with head
 1.3.2.1 18-May-2008  yamt sync with head.
 1.4.2.2 04-May-2009  yamt sync with head.
 1.4.2.1 16-May-2008  yamt sync with head.
 1.5.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.5.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.7.2.1 03-Jul-2008  simonb Sync with head.
 1.8.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.9.66.4 21-Apr-2020  martin Sync with HEAD
 1.9.66.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.9.66.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.9.66.1 10-Jun-2019  christos Sync with HEAD
 1.9.64.2 18-Jan-2019  pgoyette Synch with HEAD
 1.9.64.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.9.58.1 31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1862):

sys/arch/xen/x86/xen_intr.c: revision 1.31 (patch)
sys/arch/xen/include/xen.h (apply patch)

xen_intr.c: Use kpreempt_disable/enable around access to curcpu().

curcpu() is not otherwise guaranteed to be stable at these points.

While here, nix nonsensical membars. This need only be synchronized
with interrupts on the same CPU.

Proposed on port-xen:
https://mail-index.netbsd.org/port-xen/2022/07/13/msg010250.html
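The pattern named in the pull-up above, bracketing the use of curcpu() with kpreempt_disable()/kpreempt_enable(), can be modelled in user space as below. Everything prefixed sk_ is a hypothetical stand-in written for this sketch; in particular the stand-in for curcpu() asserts that preemption is disabled, which the real function does not necessarily do.

```c
#include <assert.h>

struct sk_cpu {
	int id;
	unsigned long events; /* per-CPU counter touched by the example */
};

static struct sk_cpu sk_cpus[2];
static int sk_current;   /* index of the CPU we are "running" on */
static int sk_nopreempt; /* preemption-disable nesting count */

/* Stand-ins for kpreempt_disable() and kpreempt_enable(). */
static void
sk_kpreempt_disable(void)
{
	sk_nopreempt++;
}

static void
sk_kpreempt_enable(void)
{
	assert(sk_nopreempt > 0);
	sk_nopreempt--;
}

/* Stand-in for curcpu(): the result is only stable while preemption is off. */
static struct sk_cpu *
sk_curcpu(void)
{
	assert(sk_nopreempt > 0);
	return &sk_cpus[sk_current];
}

/* Per-CPU update done entirely inside the non-preemptible window. */
static void
sk_count_event(void)
{
	struct sk_cpu *ci;

	sk_kpreempt_disable();
	ci = sk_curcpu();
	ci->events++;
	sk_kpreempt_enable();
}
```

The point of the bracket is that the thread cannot migrate between reading the CPU pointer and using it, so the update lands on the CPU the pointer was taken from.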
 1.17.2.1 31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1679):

sys/arch/xen/x86/xen_intr.c: revision 1.31

xen_intr.c: Use kpreempt_disable/enable around access to curcpu().

curcpu() is not otherwise guaranteed to be stable at these points.

While here, nix nonsensical membars. This need only be synchronized
with interrupts on the same CPU.

Proposed on port-xen:
https://mail-index.netbsd.org/port-xen/2022/07/13/msg010250.html
 1.21.2.10 25-Apr-2020  bouyer sync with bouyer-xenpvh-base2 (HEAD)
 1.21.2.9 20-Apr-2020  bouyer channel %d -> chan %d, for the benefit of 'systat vm'
 1.21.2.8 20-Apr-2020  bouyer Sync with HEAD
 1.21.2.7 19-Apr-2020  bouyer Add per-PIC callbacks for interrupt_get_devname(), interrupt_get_assigned()
and interrupt_get_count(). Implement Xen-specific callbacks for
PIC_XEN and use the x86 one for others.
In event_set_handler(), call intr_allocate_io_intrsource() so that
events appear in interrupt list (intrctl list).
 1.21.2.6 19-Apr-2020  bouyer Add a struct pic * member to struct intrhand.
This will be used for interrupt_get_count()
For Xen replace pic_type with a pointer to the pic, and add a pointer
to intrhand, in struct pintrhand
Make event_set_handler return the pointer to struct intrhand.
Don't allocate a fake intrhand in xen_intr_establish_xname(), use the
one returned by event_set_handler().
 1.21.2.5 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.21.2.4 14-Apr-2020  bouyer Remove spllower alias, xen_spllower is gone
 1.21.2.3 12-Apr-2020  bouyer We need to call x86_init_preempt() for all CPUs now.
 1.21.2.2 12-Apr-2020  bouyer Get rid of xen-specific ci_x* interrupt handling:
- use the general SIR mechanism, reserving 3 more slots for IPL_VM, IPL_SCHED
and IPL_HIGH
- remove specific handling from C sources, or change to ipending
- convert IPL number to SIR number in various places
- Remove XUNMASK/XPENDING in assembly or change to IUNMASK/IPENDING
- remove Xen-specific ci_xsources, ci_xmask, ci_xunmask, ci_xpending from
struct cpu_info
- for now remove a KASSERT that there are no pending interrupts in
idle_block(). We can get there with some software interrupts pending
in autoconf XXX needs to be looked at.
 1.21.2.1 11-Apr-2020  bouyer Include ci_isources[] for XenPV too.
Adjust spllower() to XenPV needs, and switch XenPV to the native spllower().
Remove xen_spllower().
 1.30.4.1 31-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #267):

sys/arch/xen/x86/xen_intr.c: revision 1.31

xen_intr.c: Use kpreempt_disable/enable around access to curcpu().

curcpu() is not otherwise guaranteed to be stable at these points.

While here, nix nonsensical membars. This need only be synchronized
with interrupts on the same CPU.

Proposed on port-xen:
https://mail-index.netbsd.org/port-xen/2022/07/13/msg010250.html
 1.42 06-Nov-2023  rin xen_ipi: valid_ipimask: Sprinkle __diagused to fix clang !DIAGNOSTIC build
 1.41 06-Aug-2023  riastradh xen/x86: Get the right intrframe pointer in ddb ipi.

This was broken with the transition from evtchn_set_handler to
intr_establish_xname in 2017, remained broken with the transition
from intr_establish_xname to xen_intr_establish_xname in 2018, and
still remained broken when xen_intr_establish_xname was changed back
to evtchn_set_handler in 2020.

The mechanism is grody -- instead of a secret second argument to the
interrupt handler, the intrframe pointer should be replaced by a
struct cpu_info member that is saved and restored by the interrupt
handler calling logic. But we should make sure the replacement
actually works first -- which is not trivial in part because the
users are hidden behind sketchy function pointer casts.

With any luck, this will make `mach cpu N' work in ddb on Xen.

XXX pullup-10
XXX pullup-9 (by patch)
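The replacement suggested in the 1.41 entry above, a struct cpu_info member that the handler-calling logic saves and restores instead of a hidden second handler argument, might look roughly like this. It is only a sketch of the save/restore shape: struct sk_cpu, cur_frame, sk_dispatch() and sk_ddb_ipi() are hypothetical and do not correspond to existing NetBSD symbols.

```c
#include <assert.h>
#include <stddef.h>

struct sk_frame {
	int vector; /* placeholder for saved register state */
};

struct sk_cpu {
	struct sk_frame *cur_frame; /* frame of the interrupt in progress, or NULL */
};

typedef int (*sk_handler_t)(struct sk_cpu *, void *);

/* Calling logic: publish the frame for the handler, restore the old one after. */
static int
sk_dispatch(struct sk_cpu *ci, struct sk_frame *frame, sk_handler_t fn, void *arg)
{
	struct sk_frame *saved = ci->cur_frame;
	int rv;

	ci->cur_frame = frame;
	rv = fn(ci, arg);
	ci->cur_frame = saved; /* keeps nested interrupts consistent */
	return rv;
}

/* Example handler: reads the frame from the CPU structure, not a secret argument. */
static int
sk_ddb_ipi(struct sk_cpu *ci, void *arg)
{
	(void)arg;
	assert(ci->cur_frame != NULL);
	return ci->cur_frame->vector;
}
```

With this shape the handler keeps an ordinary, fully typed signature, so no function pointer cast is needed to smuggle the frame through.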
 1.40 05-Jan-2022  christos branches: 1.40.4;
remove DIAGNOSTIC so that function is defined for KASSERTMSG. Hope that the
compiler removes it.
 1.39 07-May-2020  bouyer Change event_set_handler() to take the target CPU parameter. If ci is NULL,
event_set_handler() will choose the CPU and bind the event.
If ci is not NULL the caller is responsible for binding the event.
Use an IPI xcall to register the handlers if needed.
Pull in a hack from x86 to force pirq handlers to be mpsafe if registered at
a level != IPL_VM. This is for the com at isa interrupt handler, which
registers at IPL_HIGH and has no way to tell it's mpsafe (taking
KERNEL_LOCK at IPL_HIGH causes deadlocks on MP systems).
 1.38 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.37 21-Apr-2020  ad Remove spurious reference to XEN_IPI_KICK - it represents the absence of
a specific IPI type.
 1.36 13-Apr-2020  bouyer By default, events are bound to CPU 0 (except for IPIs and VTIMERs which
are bound to a different CPU at creation time).
Recent MI changes caused the scheduler to choose a different CPU when
probing and attaching xennet devices (I guess it's the xenbus thread which
runs on a different CPU). This causes the callback to be called on a different
CPU than the one expected by the kernel, and the event is ignored.
It is handled when the clock causes the callback to be called on the right
CPU, which is why xennet still runs, but slowly.

Change event_set_handler() to do an EVTCHNOP_bind_vcpu if requested to,
and make sure we don't do it for IPIs and VIRQs (for these, the op fails).
 1.35 01-Dec-2019  ad branches: 1.35.6;
Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
 1.34 23-Nov-2019  ad cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
 1.33 12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.32 02-Feb-2019  cherry branches: 1.32.4;
Switch NetBSD/xen to use XEN api tag RELEASE-4.11.1

The headers for this api are in sys/external/mit/xen-include-public/dist/
 1.31 27-Jan-2019  dholland fix duplicated chunk from merge
 1.30 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.29 24-Dec-2018  cherry Bifurcate the interrupt establish functions between XEN and non-XEN

Thus intr_establish_xname() becomes xen_intr_establish_xname() etc.

One consequence of this is that dom0 devices expect the native
function calls to be available and we thus provide weak aliasing for
dom0 builds to succeed. XEN and non-XEN devices are distinguished by
the PIC they are established on. XEN interrupts are exclusively
established on xen_pic, while dom0 interrupts are established on
natively available PICs.

This allows us an orthogonal path to xen device management (eg:
xenstore events) in XENPVHVM, without having to worry about unifying
the vector entry paths, etc., which is quite challenging.
 1.28 26-Oct-2018  cherry Decompose hypervisor_enable_event() into functional steps.

The hypervisor_unmask_event() step is relevant for any event.

The pirq related step is only relevant for pirq bound events.

Prune blanket usage of this, so that usage is semantically appropriate.
 1.27 24-Oct-2018  cherry When using the intr_establish_xname() interface to register
XEN events, follow established x86/intr.c conventions - set
the 'legacy' irq value to -1, to indicate that the pic, pin
combination (&xen_pic, port) is used for registration.
 1.26 24-Jul-2018  bouyer Fix what looks like a typo in xen_send_ipi():
ci != NULL || ci != curcpu()
is always true
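To see why the condition quoted in the 1.26 entry above cannot fail, consider the two predicates below. The names are hypothetical, and the "fixed" form assumes the intended check was a conjunction (a real target CPU that is not the caller's own); the log does not show the replacement expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_cpu {
	int id;
};

/* The quoted condition: if ci is NULL it differs from a non-NULL self,
 * and if ci is non-NULL the first operand is already true. */
static bool
sk_check_buggy(const struct sk_cpu *ci, const struct sk_cpu *self)
{
	return ci != NULL || ci != self;
}

/* Assumed intent: ci must exist and must not be the calling CPU. */
static bool
sk_check_fixed(const struct sk_cpu *ci, const struct sk_cpu *self)
{
	return ci != NULL && ci != self;
}

/* Hypothetical sender that relies on the corrected precondition. */
static int
sk_send_ipi(const struct sk_cpu *ci, const struct sk_cpu *self)
{
	assert(sk_check_fixed(ci, self));
	return ci->id;
}
```

The argument assumes the current-CPU pointer is never NULL; under that assumption the disjunction accepts a NULL target and the caller's own CPU alike, so an assertion built on it checks nothing.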
 1.25 24-Jun-2018  jdolecek branches: 1.25.2;
add support for kern.intr.list aka intrctl(8) 'list' for xen

event_set_handler() and pirq_establish() now have extra intrname
parameter; shared intr_create_intrid() is used to provide the value

xen drivers were changed to pass the specific driver instance
name as the xname, e.g. 'vcpu0 clock' instead just 'clock', or
'xencons0' instead of 'xencons'

associated evcnt is now changed to use intrname - this matches native x86
 1.24 23-Jun-2018  jdolecek make compile without DDB

PR port-xen/50282
 1.23 06-Nov-2017  cherry branches: 1.23.2;
Switch XEN drivers to use intr_establish_xname()/intr_disestablish()

This completes the API transition.
 1.22 15-Aug-2017  maxv Remove unused arg, to have the same definition as amd64.
 1.21 12-Aug-2017  maxv Remove vm86.

Pass 3.
 1.20 07-Jul-2016  msaitoh branches: 1.20.10;
KNF. Remove extra spaces. No functional change.
 1.19 07-Feb-2015  joerg valid_ipimask is only used under DIAGNOSTIC, so only define it then.
 1.18 19-May-2014  rmind branches: 1.18.2; 1.18.4;
Implement MI IPI interface with cross-call support.
 1.17 12-Feb-2014  dsl branches: 1.17.2;
Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.16 11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.15 26-Jan-2014  dsl Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
 1.14 01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.13 23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.12 14-Sep-2013  joerg GC max_cpus.
 1.11 27-Dec-2012  cherry branches: 1.11.2;
Remove unused header evtchn.h from intr.h
 1.10 17-Feb-2012  bouyer branches: 1.10.2;
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.9 30-Dec-2011  cherry branches: 1.9.2;
Remove spurious (debug) printf()
 1.8 28-Dec-2011  cherry Remove temporary variable definition that is unused in non DIAGNOSTIC builds.
 1.7 07-Dec-2011  cegger switch from xen3-public to xen-public.
 1.6 07-Nov-2011  cherry branches: 1.6.4;
Add an ipi callback to force hypervisor callback. this is useful to "re-route" interrupts to a given vcpu
 1.5 27-Sep-2011  jym branches: 1.5.2;
Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
 1.4 15-Aug-2011  cherry branches: 1.4.2;
invert buggy ci_flag test
 1.3 10-Aug-2011  cherry KNF police (rmind@ :-)
 1.2 10-Aug-2011  cherry xen ipi infrastructure
 1.1 03-Jun-2011  cherry branches: 1.1.2;
file xen_ipi.c was initially added on branch cherry-xenmp.
 1.1.2.4 18-Sep-2011  cherry Use an IPI to re-route events to the cpu where the handler has been registered
 1.1.2.3 17-Aug-2011  cherry Pullup relevant changes from -current
 1.1.2.2 31-Jul-2011  cherry grow MP support for i386. boots to single user
 1.1.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.4.2.2 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.4.2.1 15-Aug-2011  jym file xen_ipi.c was added on branch jym-xensuspend on 2011-08-27 15:59:49 +0000
 1.5.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.2.3 23-Jan-2013  yamt sync with head
 1.5.2.2 17-Apr-2012  yamt sync with head
 1.5.2.1 10-Nov-2011  yamt sync with head
 1.6.4.1 18-Feb-2012  mrg merge to -current.
 1.9.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
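The batching mentioned in the pull-up above (remove the mappings first, then free the pages, in groups of at most 16) can be sketched as a simple loop. The helper names and the bookkeeping struct are hypothetical; the real work is done by pmap_kremove() and uvm_pagefree() inside uvm_km_pgremove_intrsafe(), whose actual code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define SK_BATCH 16 /* at most this many pages per unmap/free round */

struct sk_stats {
	size_t unmapped; /* pages whose mapping has been removed */
	size_t freed;    /* pages returned to the free pool */
	size_t rounds;   /* number of batches processed */
	int order_ok;    /* cleared if a page is freed while still mapped */
};

static void
sk_unmap(struct sk_stats *st, size_t n)
{
	st->unmapped += n;
}

static void
sk_free(struct sk_stats *st, size_t n)
{
	/* A page may only go back to the pool once its mapping is gone. */
	if (st->freed + n > st->unmapped)
		st->order_ok = 0;
	st->freed += n;
}

static void
sk_pgremove(struct sk_stats *st, size_t npages)
{
	assert(st != NULL);
	while (npages > 0) {
		size_t n = npages < SK_BATCH ? npages : SK_BATCH;

		sk_unmap(st, n); /* first remove this batch's mappings */
		sk_free(st, n);  /* then release the same pages */
		st->rounds++;
		npages -= n;
	}
}
```

Doing the unmap before the free within each batch is what closes the window in which another CPU could be handed a page that still has a writable mapping.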
 1.10.2.3 03-Dec-2017  jdolecek update from HEAD
 1.10.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.2.1 25-Feb-2013  tls resync with head
 1.11.2.1 18-May-2014  rmind sync with head
 1.17.2.1 10-Aug-2014  tls Rebase.
 1.18.4.2 28-Aug-2017  skrll Sync with HEAD
 1.18.4.1 06-Apr-2015  skrll Sync with HEAD
 1.18.2.1 08-Feb-2015  snj Pull up following revision(s) (requested by joerg in ticket #498):
sys/arch/xen/x86/xen_ipi.c: revision 1.19
valid_ipimask is only used under DIAGNOSTIC, so only define it then.
 1.20.10.1 30-Apr-2021  martin Pull up following revision(s) (requested by kre in ticket #1675):

sys/arch/xen/x86/xen_ipi.c: revision 1.24 (patch)

make compile without DDB
PR port-xen/50282
 1.23.2.4 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.23.2.3 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.23.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.23.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.25.2.4 21-Apr-2020  martin Sync with HEAD
 1.25.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.25.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.25.2.1 10-Jun-2019  christos Sync with HEAD
 1.32.4.1 10-Aug-2023  sborrill Pull up the following revision(s) (requested by riastradh in ticket #1713):
sys/arch/xen/x86/xen_ipi.c: revision 1.41 via patch

xen/x86: Fix 'mach cpu N' in ddb by passing the right pointer to a
struct intrframe to IPI handlers.
 1.35.6.7 25-Apr-2020  bouyer sync with bouyer-xenpvh-base2 (HEAD)
 1.35.6.6 20-Apr-2020  bouyer Misc fixes after merge
 1.35.6.5 20-Apr-2020  bouyer Sync with HEAD
 1.35.6.4 18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before cpus attach.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beginning of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
For HVM, ci_cpuid doesn't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in its context, which initializes ci_vcpuid and
ci_vcpu, and sets up the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
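The clock-initialisation hook described in the 1.35.6.4 entry above is a function pointer that calibration installs and that every CPU calls when it comes up. A minimal model follows, with hypothetical sk_ names standing in for x86_cpu_initclock_func, lapic_calibrate_timer(), lapic_initclocks() and cpu_hatch(); the default no-op hook is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

static int sk_lapic_inits; /* how many CPUs ran the lapic hook */

static void
sk_nop_initclock(void)
{
}

static void
sk_lapic_initclocks(void)
{
	sk_lapic_inits++;
}

/* Per-CPU clock hook; remains a no-op unless calibration installs another. */
static void (*sk_cpu_initclock_func)(void) = sk_nop_initclock;

/* Calibration installs the lapic hook. A guest using Xen timers would
 * simply never call this function. */
static void
sk_calibrate_timer(void)
{
	sk_cpu_initclock_func = sk_lapic_initclocks;
}

/* Called once for each CPU, boot CPU included. */
static void
sk_cpu_hatch(void)
{
	assert(sk_cpu_initclock_func != NULL);
	(*sk_cpu_initclock_func)();
}
```

Routing the call through a pointer lets the per-CPU start-up path stay identical for native and Xen guests; only the code that decides which hook to install differs.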
 1.35.6.3 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.35.6.2 12-Apr-2020  bouyer Add XEN_IPI_KPREEMPT to list of valid IPIs
 1.35.6.1 11-Apr-2020  bouyer Move softint and preemption-related functions out of x86/x86/intr.c to
their own file, x86/x86/x86_softintr.c
Add x86/x86/x86_softintr.c for native and XenPV
Make sure XenPV also checks ci_ipending, which is used for softints.
Switch XenPV to fast softints and allow kernel preemption.
kpreempt_disable() before calling pmap_changeprot_local()
run xen_wallclock_time() and xen_global_systime_ns() at splsched() to
avoid being interrupted.

XXX amd64 lock stubs are racy for XPENDING
 1.40.4.2 14-Dec-2023  martin Pull up following revision(s) (requested by rin in ticket #494):

sys/arch/xen/x86/xen_ipi.c: revision 1.42
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.86

xen_ipi: valid_ipimask: Sprinkle __diagused to fix clang !DIAGNOSTIC build

nvmm_x86_vmx: vmx_vmptrst: Sprinkle __diagused to fix clang !DIAGNOSTIC build
 1.40.4.1 10-Aug-2023  sborrill Pull up the following revision(s) (requested by riastradh in ticket #318):
sys/arch/xen/x86/xen_ipi.c: revision 1.41

xen/x86: Fix 'mach cpu N' in ddb by passing the right pointer to a
struct intrframe to IPI handlers.
 1.10 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.9 24-Apr-2021  thorpej branches: 1.9.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
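
As a rough illustration of the tag-value calling convention described
above, the following C sketch parses a variadic argument list that ends
with a sentinel. It is not the NetBSD implementation: parse_cfargs(),
struct cfargs_parsed and the enum values are hypothetical, and only two
of the tags named in the log are modelled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags named after those in the log; the values are made up. */
enum cfarg_tag {
	CFARG_EOL = 0,		/* sentinel: end of list */
	CFARG_IATTR,		/* followed by a const char * */
	CFARG_LOCATORS		/* followed by a const int * */
};

/* Hypothetical result record; not a NetBSD structure. */
struct cfargs_parsed {
	const char *iattr;
	const int *locators;
};

/* Walk tag-value pairs until the sentinel (or an unknown tag) is seen. */
static struct cfargs_parsed
parse_cfargs(int tag, ...)
{
	struct cfargs_parsed out = { NULL, NULL };
	va_list ap;

	va_start(ap, tag);
	while (tag != CFARG_EOL) {
		if (tag == CFARG_IATTR)
			out.iattr = va_arg(ap, const char *);
		else if (tag == CFARG_LOCATORS)
			out.locators = va_arg(ap, const int *);
		else
			break;	/* unknown tag: its value type is unknown, stop */
		tag = va_arg(ap, int);
	}
	va_end(ap);
	return out;
}
```

A caller passes only the tags it needs, for example
parse_cfargs(CFARG_IATTR, "pci", CFARG_EOL), which is the property that
lets a single entry point replace several fixed-signature variants.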
 1.8 02-May-2020  bouyer branches: 1.8.4;
Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.7 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.6 14-Feb-2019  cherry branches: 1.6.4; 1.6.12;
Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.5 13-Feb-2019  cherry Conditionally compile a conditionally used variable.
 1.4 22-Dec-2018  maxv branches: 1.4.2;
Style, once again.
 1.3 22-Dec-2018  cherry This change modifies the mainbus(4) entry point for all x86 sub-archs
in the following way:

i) It provides a unified entry point in
x86/x86/mainbus.c:mainbus_attach()
ii) It carves out the preliminary bus attachment sequence that is
common to all sub-archs into
x86/x86/mainbus.c: x86_cpubus_attach()
iii) It consolidates the remaining pathways as internal callee
functions so that these may be called piecemeal if required. A
special usecase of this is XEN PVHVM which may need to call the
native configure path, the xen configure path, or both.
iv) It moves the driver private data structures from
i386/i386_mainbus.c to an x86/ level one. This allows for other
sub-archs to do something similar, if needed. (They do not at the moment).
v) For dom0 kernels, it enables 'acpi0 at mainbus?' and
'acpi0 at hypervisorbus'. This serves two purposes:
a) To demonstrate the possibility of dynamic configuration tree
traversal ordering changes.
b) To allow for the common acpi_check(self, "acpibus") call in
x86/mainbus.c to not barf when it is called from the dom0 attach
path. We allow for the acpi0 device to be a child of mainbus with
the changes to amd64/conf/XEN3_DOM0 and i386/conf/XEN3PAE_DOM0
without actually probing further in the code. This path will later
be pursued in a PVHVM boot codepath.

There should be no operative changes with this change. If there are,
please complain loudly.
 1.2 22-Dec-2018  cherry Don't forget pedigree. Re-introduce old RCS Id tags from the originals
 1.1 22-Dec-2018  cherry Move mainbus(4) driver files in various x86 sub-archs to name prefixed
versions. This allows us to further modularise them by unifying common
bus probe code in x86/x86/mainbus.c to be introduced next.

This commit has no functional changes. It is done for ease of
visibility of newer diffs in the queue.
 1.4.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.4.2.1 22-Dec-2018  pgoyette file xen_mainbus.c was added on branch pgoyette-compat on 2018-12-26 14:01:46 +0000
 1.6.12.3 18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before cpus attach.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beginning of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
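
The lookup described above can be pictured with a small C sketch that
scans a configuration table for an enabled entry. The structure and
function names here are hypothetical stand-ins, not the kernel's cfdata
layout or the actual xen_hvm_init() code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a configuration table entry. */
struct cf_entry {
	const char *name;	/* driver name; NULL terminates the table */
	int disabled;		/* nonzero if disabled, e.g. from userconf */
};

/* Return 1 if an enabled entry with the given name exists, else 0. */
static int
device_enabled(const struct cf_entry *tab, const char *name)
{
	for (; tab->name != NULL; tab++) {
		if (strcmp(tab->name, name) == 0 && !tab->disabled)
			return 1;
	}
	return 0;
}
```

With such a check, early setup code can skip all PV initialisation when
the entry has been disabled, before any device has attached.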
For HVM, ci_cpuid doesn't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in its context, which initializes ci_vcpuid and
ci_vcpu, and sets up the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
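
The per-CPU clock hook described above follows a common function-pointer
pattern, sketched here in C. All names ending in _sketch, and the hook
variable itself, are hypothetical; the sketch only mirrors the idea that
calibration selects the routine and each CPU calls through the pointer.

```c
#include <assert.h>

static int lapic_clock_inits;	/* counts calls, for illustration only */

static void nop_initclock(void) { }
static void lapic_initclocks_sketch(void) { lapic_clock_inits++; }

/* Hook invoked once per CPU; defaults to a no-op. */
static void (*cpu_initclock_hook)(void) = nop_initclock;

/* Stand-in for timer calibration: selects the per-CPU clock setup. */
static void
calibrate_timer_sketch(void)
{
	cpu_initclock_hook = lapic_initclocks_sketch;
}

/* Stand-in for per-CPU startup: calls whatever routine was selected. */
static void
cpu_start_sketch(void)
{
	(*cpu_initclock_hook)();
}
```

If calibration is skipped, as the log describes for VM_GUEST_XENPVHVM,
the hook can instead be pointed at a different timer setup routine
without touching the per-CPU startup path.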
 1.6.12.2 16-Apr-2020  bouyer Don't try to attach hypervisor on non Xen; no more error messages about
hypervisor when booting on bare metal.
 1.6.12.1 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and adds Xen PV
drivers.
 1.6.4.2 10-Jun-2019  christos Sync with HEAD
 1.6.4.1 14-Feb-2019  christos file xen_mainbus.c was added on branch phil-wifi on 2019-06-10 22:06:56 +0000
 1.8.4.1 02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.9.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.41 25-Aug-2023  riastradh xen: Provide definitions or ifdefs to make drm build in XEN3_DOM0.

No idea if it works, but it builds now.

PR port-xen/49330
 1.40 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.39 06-Sep-2020  riastradh Fix fallout from previous uvm.h cleanup.

- pmap(9) needs uvm/uvm_extern.h.

- x86/pmap.h is not usable on its own; it is only usable if included
via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).

- Make nvmm.h and nvmm_internal.h standalone.
 1.38 19-Jul-2020  maxv don't include opt_user_ldt.h when it is not needed
 1.37 07-Jan-2020  ad Correction to previous.
 1.36 07-Jan-2020  ad pmap_extract_ma(): don't need to take pm_lock for pmap_kernel().
 1.35 04-Jan-2020  ad x86 pmap improvements, reducing system time during a build by about 15% on
my test machine:

- Replace the global pv_hash with a per-pmap record of dynamically allocated
pv entries. The data structure used for this can be changed easily, and
has no special concurrency requirements. For now go with radixtree.

- Change pmap_pdp_cache back into a pool; cache the page directory with the
pmap, and avoid contention on pmaps_lock by adjusting the global list in
the pool_cache ctor & dtor. Align struct pmap and its lock, and update
some comments.

- Simplify pv_entry lists slightly. Allow both PP_EMBEDDED and dynamically
allocated entries to co-exist on a single page. This adds a pointer to
struct vm_page on x86, but shrinks pv_entry to 32 bytes (which also gets
it nicely aligned).

- More elegantly solve the chicken-and-egg problem introduced into the pmap
with radixtree lookup for pages, where we need PTEs mapped and page
allocations to happen under a single hold of the pmap's lock. While here
undo some cut-n-paste.

- Don't adjust pmap_kernel's stats with atomics, because its mutex is now
held in the places the stats are changed.
 1.34 15-Dec-2019  ad uvm_pagerealloc() can now block because of radixtree manipulation, so defer
freeing PTPs until pmap_unmap_ptes(), where we still have the pmap locked
but can finally tolerate context switches again.

To be revisited soon: pmap_map_ptes() seems broken WRT other pmap load.

Reported-by: syzbot+689fb7dab41abff8e75a@syzkaller.appspotmail.com
Reported-by: syzbot+3e7bbf37d37d451b25d7@syzkaller.appspotmail.com
 1.33 08-Dec-2019  ad Merge x86 pmap changes from yamt-pagecache:

- Deal better with the multi-level pmap object locking kludge.
- Handle uvm_pagealloc() being able to block.
 1.32 30-Oct-2019  maxv Switch to new PTE bits.
 1.31 10-Mar-2019  maxv Two changes:

* Allow large pages to be passed in pmap_pdes_valid, this happens under
DDB when it reads RIP (.text), called via pmap_extract.

* Invert a branch in pmap_extract, so that 'l_cpu' is not touched if we're
dealing with the kernel pmap.

This fixes 'boot -d'.
 1.30 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.29 07-Mar-2019  maxv Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.
 1.28 02-Feb-2019  cherry Switch NetBSD/xen to use XEN api tag RELEASE-4.11.1

The headers for this api are in sys/external/mit/xen-include-public/dist/
 1.27 26-Jul-2018  maxv Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.
 1.26 23-Mar-2017  maxv branches: 1.26.12; 1.26.14;
Remove PG_k completely.
 1.25 26-Dec-2016  cherry branches: 1.25.2;
In the MP case,
do not attempt to pmap_tlb_shootdown() after a pmap_kenter_ma() during
boot. pmap_tlb_shootdown() assumes post boot. Instead invalidate the
entry on the local CPU only.

XXX: to DTRT, probably this assumption needs re-examination.
XXX: The tradeoff is a (predicted) single word size comparison
penalty, so perhaps a decision needs performance stats.
 1.24 13-Dec-2016  kamil Tear down KSTACK_CHECK_DR0, an i386-only feature to detect stack overflow

This feature was intended to detect stack overflow with CPU Debug Registers
(x86). It was never ported to other ports, not even amd64, and would need
to be adapted for SMP...

Currently there might be better ways to detect stack overflows like page
mapping protection. Since the number of Debug Registers is restricted
(4 on x86), tear it down completely.

This interface introduced helper functions for Debug Registers, they will
be replaced with the new <x86/dbregs.h> interface.

KSTACK_CHECK_DR0 was disabled by default and won't affect ordinary users.

Sponsored by <The NetBSD Foundation>
 1.23 21-Nov-2016  ozaki-r Sweep unnecessary xcall.h inclusions
 1.22 24-Jun-2012  jym branches: 1.22.2; 1.22.14; 1.22.16; 1.22.20;
Enable the map/unmap recursive mapping functions for all Xen ports for
save/restore.

For an unknown reason (to me) Xen refuses to update VM translations
when the entry is pointing back to itself (which is precisely
what our recursive VM model does). So enable the functions that take
care of this, which will avoid all sorts of memory corruption upon restore
leading domU to trample upon itself.

Save/restore works again for amd64. The occasional domU frontend corruption is
still present, but is harmless to dom0. Now we have a working shell and
ddb inside domU, that helps debugging a tiny bit.

XXX pull-up to -6.
 1.21 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.20 11-Mar-2012  jym Split the map/unmap code from the sync/flush code: move xpq_flush_queue()
calls after pmap_{,un}map_recursive_entries() so that pmap's handlers
handle the flush themselves.

Now pmap_{,un}map_recursive_entries() do what their names imply, nothing more.

Fix pmap_xen_suspend()'s comment: APDPs are now gone.

pmap's handlers are called deep during kernel save/restore. We already
are at IPL_VM + kpreemption disabled. No need to wrap the xpq_flush_queue()
with splvm/splx.
 1.19 02-Mar-2012  bouyer Add some more KASSERT()
 1.18 24-Feb-2012  cherry (xen) - remove the (*xpq_cpu)() shim. We hasten the %fs/%gs setup process during boot. Although this is hacky, it lets us use the non-xen specific pmap_pte_xxx() functions in pmap code (and others).
 1.17 17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.16 28-Jan-2012  cherry branches: 1.16.2;
stop using alternate pde mapping in xen pmap
 1.15 22-Jan-2012  cherry Do not clobber pmap_kernel()'s pdir unnecessarily while syncing per-cpu pdirs
 1.14 19-Jan-2012  bouyer add a missing splvm()/splx() to protect the xpq queue.
 1.13 09-Jan-2012  cherry Harden cross-cpu L3 sync - avoid optimisations that may race.
Update ci->ci_kpm_pdir from user pmap, not global pmap_kernel() entry which may get clobbered by other CPUs.
XXX: Look into why we use pmap_kernel() userspace entries at all.
 1.12 30-Dec-2011  cherry per-cpu shadow directory pages should be updated locally via cross-calls. Do this.
 1.11 07-Dec-2011  cegger switch from xen3-public to xen-public.
 1.10 23-Nov-2011  jym branches: 1.10.2;
Move Xen-specific functions to Xen pmap. Requested by cherry@.

Un'ifdef XEN in xen_pmap.c, it is always defined there.
 1.9 20-Nov-2011  jym Expose pmap_pdp_cache publicly to x86/xen pmap. Provide suspend/resume
callbacks for Xen pmap.

Turn static internal callbacks of pmap_pdp_cache.

XXX the implementation of pool_cache_invalidate(9) is still wrong, and
IMHO this needs fixing before -6. See
http://mail-index.netbsd.org/tech-kern/2011/11/18/msg011924.html
 1.8 08-Nov-2011  cherry Expose the PG_k #define pt/pd bit to both xen and "baremetal" x86. This is required, since kernel pages are mapped with user permissions in XEN/amd64 since the VM kernel runs in ring3. Since XEN/i386 (including PAE) runs in ring1, supervisor mode is appropriate for these ports. We need to share this since the pmap implementation is still shared. Once the xen implementation is sufficiently independent of the x86 one, this can be made private to xen/include/xenpmap.h
 1.7 06-Nov-2011  cherry [merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
 1.6 18-Oct-2011  jym branches: 1.6.2;
Move Xen specific functions out of x86 native pmap to xen_pmap.c.

Provide a wrapper to trigger pmap pool_cache(9) invalidations without
exposing the caches to outside world.
 1.5 20-Sep-2011  jym Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
the kernel from blindly validating user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.4 13-Aug-2011  cherry Add locking around ops to the hypervisor MMU "queue".
 1.3 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.2 01-Feb-2011  chuck branches: 1.2.2;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.1 10-May-2010  dyoung branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.12;
Provide pmap_enter_ma(), pmap_extract_ma(), pmap_kenter_ma() in all x86
kernels, and use them in the bus_space(9) implementation instead of ugly
Xen #ifdef-age. In a non-Xen kernel, the _ma() functions either call or
alias the equivalent _pa() functions.

Reviewed on port-xen@netbsd.org and port-i386@netbsd.org. Passes
rmind@'s and bouyer@'s inspection. Tested on i386 and on Xen DOMU /
DOM0.
 1.1.12.1 08-Feb-2011  bouyer Sync with HEAD
 1.1.10.1 06-Jun-2011  jruoho Sync with HEAD.
 1.1.8.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.1.8.3 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.1.8.2 24-Oct-2010  jym Sync with HEAD
 1.1.8.1 10-May-2010  jym file xen_pmap.c was added on branch jym-xensuspend on 2010-10-24 22:48:22 +0000
 1.1.6.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.1.6.1 10-May-2010  uebayasi file xen_pmap.c was added on branch uebayasi-xip on 2010-08-17 06:45:35 +0000
 1.1.4.2 11-Aug-2010  yamt sync with head.
 1.1.4.1 10-May-2010  yamt file xen_pmap.c was added on branch yamt-nfs-mp on 2010-08-11 22:52:59 +0000
 1.1.2.4 05-Mar-2011  rmind sync with head
 1.1.2.3 31-May-2010  rmind - Split off Xen versions of pmap_map_ptes/pmap_unmap_ptes into Xen pmap,
also move pmap_apte_flush() with pmap_unmap_apdp() there.
- Make Xen buildable.
 1.1.2.2 30-May-2010  rmind sync with head
 1.1.2.1 10-May-2010  rmind file xen_pmap.c was added on branch rmind-uvmplock on 2010-05-30 05:17:14 +0000
 1.2.2.5 20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.2.2.4 22-Aug-2011  cherry Remove spurious locks
 1.2.2.3 20-Aug-2011  cherry PAE MP support (preliminary), amd64 per-cpu L4 model redesigned, i386 pmap_pa_start/end fixup
 1.2.2.2 16-Jul-2011  cherry Introduce a per-cpu "shadow" for pmap_kernel()'s L4 page
 1.2.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.6.2.4 30-Oct-2012  yamt sync with head
 1.6.2.3 23-May-2012  yamt sync with head.
 1.6.2.2 17-Apr-2012  yamt sync with head
 1.6.2.1 10-Nov-2011  yamt sync with head
 1.10.2.6 29-Apr-2012  mrg sync to latest -current.
 1.10.2.5 05-Apr-2012  mrg sync to latest -current.
 1.10.2.4 06-Mar-2012  mrg sync to -current
 1.10.2.3 06-Mar-2012  mrg sync to -current
 1.10.2.2 04-Mar-2012  mrg sync to latest -current.
 1.10.2.1 18-Feb-2012  mrg merge to -current.
 1.16.2.3 02-Jul-2012  jdc Pull up revisions:
src/sys/arch/xen/include/xenpmap.h revision 1.35 via patch
src/sys/arch/xen/x86/xen_pmap.c revision 1.22 via patch
(requested by jym in ticket #372).

Enable the map/unmap recursive mapping functions for all Xen ports for
save/restore.

For an unknown reason (to me) Xen refuses to update VM translations
when the entry is pointing back to itself (which is precisely
what our recursive VM model does). So enable the functions that take
care of this, which will avoid all sorts of memory corruption upon restore
leading domU to trample upon itself.

Save/restore works again for amd64. The occasional domU frontend corruption is
still present, but is harmless to dom0. Now we have a working shell and
ddb inside domU, that helps debugging a tiny bit.

XXX pull-up to -6.
 1.16.2.2 09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.16.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
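
The unmap-before-free ordering and the batching described above can be
sketched in C as follows. This is a toy model under stated assumptions:
struct page_sketch and release_pages() are hypothetical, the flag updates
merely stand in for pmap_kremove() and uvm_pagefree(), and the batch size
is a parameter here whereas the log mentions at most 16 entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page record used only for this sketch. */
struct page_sketch {
	int mapped;		/* still has a kernel mapping */
	int freed;		/* returned to the free pool */
	int freed_while_mapped;	/* set if the ordering rule is violated */
};

/*
 * Release n pages in batches of at most `batch` entries: unmap the whole
 * batch first, then free it. Returns the number of batches processed.
 */
static size_t
release_pages(struct page_sketch *pg, size_t n, size_t batch)
{
	size_t start, end, i, nbatches = 0;

	if (batch == 0)
		return 0;
	for (start = 0; start < n; start = end) {
		end = (n - start > batch) ? start + batch : n;
		for (i = start; i < end; i++)
			pg[i].mapped = 0;	/* stands in for pmap_kremove() */
		for (i = start; i < end; i++) {
			if (pg[i].mapped)
				pg[i].freed_while_mapped = 1;
			pg[i].freed = 1;	/* stands in for uvm_pagefree() */
		}
		nbatches++;
	}
	return nbatches;
}
```

Because every page in a batch is unmapped before any page in that batch
is freed, no page reaches the free pool while a writable mapping to it
still exists, which is the condition the log says the hypervisor rejects.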
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.22.20.2 26-Apr-2017  pgoyette Sync with HEAD
 1.22.20.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.22.16.3 28-Aug-2017  skrll Sync with HEAD
 1.22.16.2 05-Feb-2017  skrll Sync with HEAD
 1.22.16.1 05-Dec-2016  skrll Sync with HEAD
 1.22.14.1 17-Apr-2017  snj Pull up following revision(s) (requested by khorben in ticket #1367):
sys/arch/amd64/conf/XEN3_DOM0: revision 1.126
sys/arch/i386/conf/XEN3_DOM0: revision 1.104
sys/arch/xen/x86/xen_pmap.c: revision 1.25
In the MP case,
do not attempt to pmap_tlb_shootdown() after a pmap_kenter_ma() during
boot. pmap_tlb_shootdown() assumes post boot. Instead invalidate the
entry on the local CPU only.
XXX: to DTRT, probably this assumption needs re-examination.
XXX: The tradeoff is a (predicted) single word size comparison
penalty, so perhaps a decision needs performance stats.
xen dom0 SMP is now bootable again.
--
add the 'options MULTIPROCESSOR' in respective configs, but mark them
experimental - and thus disabled by default.
 1.22.2.1 03-Dec-2017  jdolecek update from HEAD
 1.25.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.26.14.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.26.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.26.14.1 10-Jun-2019  christos Sync with HEAD
 1.26.12.1 28-Jul-2018  pgoyette Sync with HEAD
 1.18 01-Sep-2022  bouyer Add PVH support for backend drivers grant operation.
Now a domU in a PVH dom0 boots multiuser.
 1.17 21-Feb-2021  jdolecek in xen_shm_map(), make sure to unmap any successfully mapped pages
before returning failure if there is partial failure

fix detection of partial failure - GNTTABOP_map_grant_ref can actually return
zero for partial failure, so we need to always check all the entries
to detect it

previously, DIAGNOSTIC kernel triggered panic() for partial failure,
and non-DIAGNOSTIC kernel did not detect it at all, leading to Dom0 page
fault later; since the mapping failure can be triggered by malicious
DomU via bad grant reference, it's important to expect the calls
to fail, and handle it gracefully without crashing Dom0

part of fixes for XSA-362
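
The partial-failure handling described in revision 1.17 can be sketched roughly as follows. This is a minimal illustration, not the code from xen_shm_machdep.c: struct demo_map_op, demo_unmap() and demo_check_batch() are hypothetical stand-ins, and the real grant-table structures and hypercalls are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical stand-in for one grant-map request; not Xen's real struct. */
struct demo_map_op {
	int status;	/* 0 = entry mapped, nonzero = this entry failed */
	int mapped;	/* 1 while the mapping is live */
};

/* Hypothetical unmap helper: only clears the bookkeeping flag. */
static void
demo_unmap(struct demo_map_op *op)
{
	op->mapped = 0;
}

/*
 * Inspect every entry after the batch call, whatever the batch return
 * value was, because a zero return may still hide per-entry failures.
 * On any failure, undo the entries that did succeed.
 * Returns 0 on full success, -1 otherwise.
 */
static int
demo_check_batch(struct demo_map_op *ops, size_t n, int batch_ret)
{
	int failed = (batch_ret != 0);
	size_t i;

	for (i = 0; i < n; i++) {
		if (ops[i].status != 0)
			failed = 1;
	}
	if (!failed)
		return 0;

	/* Roll back the successful mappings before reporting failure. */
	for (i = 0; i < n; i++) {
		if (ops[i].status == 0 && ops[i].mapped)
			demo_unmap(&ops[i]);
	}
	return -1;
}
```

The point of the sketch is that the caller receives an error it can handle instead of a panic or a later page fault.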
 1.16 25-Apr-2020  bouyer branches: 1.16.2;
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.15 19-Apr-2020  jdolecek change interface for xen_shm_map() so that caller always supplies the VA,
it now fails only if the Xen hypercall fails, in which case the failure
is final

change xbdback to pre-allocate KVA on xbdback attach (and free on detach),
so it always has KVA to map the request pages

remove no longer needed KVA allocation failure handling
 1.14 13-Apr-2020  chs slightly change and fix the semantics of pool_set*wat(), pool_sethardlimit()
and pool_prime() (and their pool_cache_* counterparts):

- the pool_set*wat() APIs are supposed to specify thresholds for the count of
free items in the pool before pool pages are automatically allocated or freed
during pool_get() / pool_put(), whereas pool_sethardlimit() and pool_prime()
are supposed to specify minimum and maximum numbers of total items
in the pool (both free and allocated). these were somewhat conflated
in the existing code, so separate them as they were intended.

- change pool_prime() to take an absolute number of items to preallocate
rather than an increment over whatever was done before, and wait for
any memory allocations to succeed. since pool_prime() can no longer fail
after this, change its return value to void and adjust all callers.

- pool_setlowat() is documented as not immediately attempting to allocate
any memory, but it was changed some time ago to immediately try to allocate
up to the lowat level, so just fix the manpage to describe the current
behaviour.

- add a pool_cache_prime() to complete the API set.
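
The difference between the incremental and absolute pool_prime() semantics in revision 1.14 can be illustrated with a toy model. This is only a sketch: struct demo_pool and both functions are hypothetical, and reading "absolute" as "ensure at least n items exist in total" is an assumption, not something the log message spells out.

```c
#include <stddef.h>

/* Hypothetical toy pool: only tracks how many items exist in total. */
struct demo_pool {
	size_t nitems;
};

/* Incremental semantics: add n items on top of whatever is there. */
static void
demo_prime_increment(struct demo_pool *p, size_t n)
{
	p->nitems += n;
}

/* Absolute semantics (assumed reading): ensure n items exist in total. */
static void
demo_prime_absolute(struct demo_pool *p, size_t n)
{
	if (p->nitems < n)
		p->nitems = n;
}
```

Under the incremental model, priming twice with the same count doubles the preallocation; under the absolute model the second call is a no-op.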
 1.13 27-Jan-2019  pgoyette branches: 1.13.4; 1.13.10;
Merge the [pgoyette-compat] branch
 1.12 27-Jul-2018  maxv style, localify global variables, etc, no real functional change
 1.11 24-Jun-2018  jdolecek branches: 1.11.2;
mark with XXXSMP all remaining spl*() and tsleep() calls
 1.10 02-Sep-2011  dyoung branches: 1.10.52;
Report vmem(9) errors out-of-band so that we can use vmem(9) to manage
ranges that include the least and the greatest vmem_addr_t. Update
vmem(9) uses throughout the kernel. Slightly expand on the tests in
subr_vmem.c, which still pass. I've been running a kernel with this
patch without any trouble.
 1.9 31-Jul-2011  jym Fix typo in comment.
 1.8 28-Mar-2010  snj Spell "enough" properly.
 1.7 19-Oct-2009  bouyer branches: 1.7.2; 1.7.4;
Remove clauses 3 & 4 from my licence. Many thanks to Soren Jacobsen
for the boring work!
 1.6 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.5 16-Mar-2009  cegger ansify function definitions
 1.4 18-Dec-2008  cegger branches: 1.4.2;
remove unused malloc.h
 1.3 17-Feb-2008  bouyer branches: 1.3.6; 1.3.10; 1.3.18;
Add missing __KERNEL_RCSID()
 1.2 22-Nov-2007  bouyer branches: 1.2.2; 1.2.4; 1.2.8; 1.2.16;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file xen_shm_machdep.c was initially added on branch bouyer-xenamd64.
 1.1.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.16.3 23-Mar-2008  matt sync with HEAD
 1.2.16.2 09-Jan-2008  matt sync with HEAD
 1.2.16.1 22-Nov-2007  matt file xen_shm_machdep.c was added on branch matt-armv6 on 2008-01-09 01:50:16 +0000
 1.2.8.3 27-Feb-2008  yamt sync with head.
 1.2.8.2 07-Dec-2007  yamt sync with head
 1.2.8.1 22-Nov-2007  yamt file xen_shm_machdep.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:19 +0000
 1.2.4.2 03-Dec-2007  ad Sync with HEAD.
 1.2.4.1 22-Nov-2007  ad file xen_shm_machdep.c was added on branch vmlocking on 2007-12-03 19:04:45 +0000
 1.2.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1 22-Nov-2007  joerg file xen_shm_machdep.c was added on branch jmcneill-pm on 2007-11-27 19:36:23 +0000
 1.3.18.2 28-Apr-2009  skrll Sync with HEAD.
 1.3.18.1 19-Jan-2009  skrll Sync with HEAD.
 1.3.10.4 11-Aug-2010  yamt sync with head.
 1.3.10.3 11-Mar-2010  yamt sync with head
 1.3.10.2 19-Aug-2009  yamt sync with head.
 1.3.10.1 04-May-2009  yamt sync with head.
 1.3.6.1 17-Jan-2009  mjf Sync with HEAD.
 1.4.2.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.4.2.3 24-Oct-2010  jym Sync with HEAD
 1.4.2.2 01-Nov-2009  jym Sync with HEAD.
 1.4.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.7.4.1 30-May-2010  rmind sync with head
 1.7.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.10.52.2 28-Jul-2018  pgoyette Sync with HEAD
 1.10.52.1 25-Jun-2018  pgoyette Sync with HEAD
 1.11.2.2 21-Apr-2020  martin Sync with HEAD
 1.11.2.1 10-Jun-2019  christos Sync with HEAD
 1.13.10.1 20-Apr-2020  bouyer Sync with HEAD
 1.13.4.1 23-Feb-2021  martin Pull up following revision(s) (requested by jdolecek in ticket #1210):

sys/arch/xen/x86/xen_shm_machdep.c: revision 1.17 (via patch)

in xen_shm_map(), make sure to unmap any successfully mapped pages
before returning failure if there is partial failure
fix detection of partial failure - GNTTABOP_map_grant_ref can actually return
zero for partial failure, so we need to always check all the entries
to detect it

previously, kernel triggered panic() for partial failure, leading to
Dom0 page fault later; since the mapping failure can be triggered by
malicious DomU via bad grant reference, it's important to expect
the calls to fail, and handle it gracefully without crashing Dom0

part of fixes for XSA-362
 1.16.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.29 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.28 06-May-2020  bouyer xpq_queue_* use per-cpu queue; splvm() is enough to protect them.
remove the XXX SMP comments.
 1.27 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.26 04-May-2019  kre branches: 1.26.8;

More of maxv's "switch to proper types" - hopefully unbreak i386 build.
 1.25 04-May-2019  maxv More inlined ASM. While here switch to proper types.
 1.24 06-Jan-2019  cherry Rollback http://mail-index.netbsd.org/source-changes/2018/12/22/msg101629.html

This change breaks module loading due to weak alias being unsupported
in the kernel module linker.

Requested by maxv@ and others as it affects their work.

No immediate decision on a replacement method is available, but other options
suggested include pre-processing, conditional compilation (#ifdef etc) and other
source level methods to avoid linktime decision making.
 1.23 22-Dec-2018  cherry Introduce a weak alias method of exporting different implementations
of the same API.

For example: the amd64 native implementation of invlpg() now becomes
amd64_invlpg() with a weak symbol export of invlpg(), while the XEN
implementation becomes xen_invlpg(), also weakly exported as invlpg()

Note that linking in both together without having an override function
named invlpg() would be a mistake, as we have limited control over
which of the two options would emerge as the finally exported invlpg()
resulting in a potential situation where the wrong function is finally
exported. This change avoids this situation.

We should however include an override function invlpg() in that case,
such that it is able to then pass on the call to the appropriate
backing function (amd64_invlpg() in the case of native, and
xen_invlpg() in the case of under XEN virtualisation) at runtime.

This change does not introduce such a function and therefore does not
alter builds to include native as well as XEN implementations in the
same binary. This will be done later, with the introduction of XEN
PVHVM mode, where precisely such a runtime switch is required.

There are no operational changes introduced by this change.
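
The weak-alias arrangement described in revision 1.23 might look roughly like this in isolation. It is a sketch under stated assumptions: the names carry a _demo suffix to mark them as hypothetical, only the native side is shown, and the attribute syntax assumes GCC or Clang on an ELF target.

```c
/* Counter used only so the sketch has an observable effect. */
static int demo_calls;

/* Hypothetical "native" implementation with its own strong name. */
void
amd64_invlpg_demo(unsigned long addr)
{
	(void)addr;
	demo_calls++;
}

/*
 * Export the generic name as a weak alias of the native implementation.
 * A strong definition of invlpg_demo() elsewhere would take precedence
 * at link time.
 */
void invlpg_demo(unsigned long addr)
    __attribute__((weak, alias("amd64_invlpg_demo")));
```

Callers use the generic name and reach whichever implementation the linker resolved. Revision 1.24 reports that the kernel module linker did not support this, which is why the change was rolled back.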
 1.22 18-Oct-2018  cherry Zero out page table memory for IDT before use.
To copy the IDT entry before registration, de-reference the indexed
value, not the first entry.
Add a MAX_XEN_IDT value for max entries we expect and KASSERT() for
this as a sanity check.
 1.21 23-Sep-2018  cherry Fix for i386, functionality intended in:
http://mail-index.netbsd.org/source-changes/2018/09/23/msg099357.html

This should fix the build for both GENERIC and XEN3PAE_DOM0

This has not been boot tested on native or xen3pae

Notes: pmap_changeprot_local() seems to be x86_64 only.
I was a bit surprised by this initially, but I suspect that the table
protections are enforced via ring0/ring1 fencing rather than page protections

the gdt registration code in i386 is still messy. I will leave it as is
for now - to avoid a rabbit hole.
 1.20 23-Sep-2018  cherry Make XEN use the same api as native, for idt vector allocation
and registration.

lidt() placed in xenfunc() at maxv@'s suggestion.

There should be no functional change due to this commit.

Tested on amd64 native and XEN.
 1.19 26-Jul-2018  maxv Remove dead code.
 1.18 24-Jun-2018  jdolecek branches: 1.18.2;
mark with XXXSMP all remaining spl*() and tsleep() calls
 1.17 15-Oct-2017  maxv branches: 1.17.2;
Add setusergs on Xen, and simplify.
 1.16 05-Feb-2017  maxv Rename ldt->ldtstore and gdt->gdtstore on i386. It reduces the diff with
amd64, and makes it easier to track down these variables on nxr - 'ldt'
and 'gdt' being common keywords.
 1.15 13-Dec-2016  kamil branches: 1.15.2;
Switch x86 CPU Debug Register types from vaddr_t to register_t

This is a more opaque and appropriate type, as vaddr_t is meant to be used
for virtual address values. Not all DRs on x86 are used to represent virtual
addresses (DR6 and DR7 are definitely not).

No functional change intended.

Change suggested by <christos>

Sponsored by <The NetBSD Foundation>
 1.14 27-Nov-2016  kamil Add accessors for available x86 Debug Registers

There are 8 Debug Registers on i386 (available at least since 80386) and 16
on AMD64. Currently DR4 and DR5 are reserved on both cpu-families and
DR9-DR15 are still reserved on AMD64. Therefore add accessors for DR0-DR3,
DR6-DR7 for all ports.

Debug Registers x86:
* DR0-DR3 Debug Address Registers
* DR4-DR5 Reserved
* DR6 Debug Status Register
* DR7 Debug Control Register
* DR8-DR15 Reserved

Access to the registers is available only from the kernel (ring 0), as it
requires the most privileged access level. For this reason, special XEN
functions are needed to get and set the registers in the XEN3 kernels.

XEN specific functions as defined in NetBSD:
- HYPERVISOR_get_debugreg()
- HYPERVISOR_set_debugreg()

This code extends the existing rdr6() and ldr6() accessor for additional:
- rdr0() & ldr0()
- rdr1() & ldr1()
- rdr2() & ldr2()
- rdr3() & ldr3()
- rdr7() & ldr7()

Traditionally, the accessors for DR6 took a vaddr_t argument. That type is
appropriate for DR0-DR3, while DR6-DR7 should use u_long; however, it is
not a big deal. The resulting functionality should be equivalent, so stick
to this convention and use the vaddr_t type for all DR accessors.

There was already a function defined for rdr6() in XEN, but it had a nit on
AMD64 as it was casting HYPERVISOR_get_debugreg() to u_int (32-bit on
AMD64), truncating result. It still works for DR6, but for the sake of
simplicity always return full 64-bit value.
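
The truncation described above can be reproduced in a small sketch. The function names are hypothetical, and demo_get_debugreg() merely stands in for a hypercall returning a 64-bit register value; it is not the real HYPERVISOR_get_debugreg() interface.

```c
#include <stdint.h>

/* Hypothetical stand-in for a call returning a 64-bit register value. */
static uint64_t
demo_get_debugreg(uint64_t raw)
{
	return raw;
}

/* Buggy shape: the result passes through a 32-bit type, losing high bits. */
static uint64_t
demo_rdr_truncating(uint64_t raw)
{
	return (uint32_t)demo_get_debugreg(raw);
}

/* Fixed shape: return the full-width value unchanged. */
static uint64_t
demo_rdr_full(uint64_t raw)
{
	return demo_get_debugreg(raw);
}
```

A value whose meaningful bits all sit in the low 32 bits survives the cast, which matches the observation that the old code still worked for DR6.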

New accessors duplicate functionality of the dr0() function available on
i386 within the KSTACK_CHECK_DR0 option. dr0() is a specialized layer with
logic to set appropriate types of interrupts, now accessors are designed to
pass verbatim values from user-land (with simple sanity checks in the
kernel). At the moment there are no plans to make possible to coexist
KSTACK_CHECK_DR0 with debug registers for user applications (debuggers).

options KSTACK_CHECK_DR0
Detect kernel stack overflow using DR0 register. This option uses DR0
register exclusively so you can't use DR0 register for other purpose
(e.g., hardware breakpoint) if you turn this on.

The KSTACK_CHECK_DR0 functionality was designed for i386 and never ported
to amd64.

Code tested on i386 and amd64 with kernels: GENERIC, XEN3_DOMU, XEN3_DOM0.

Sponsored by <The NetBSD Foundation>
 1.13 06-Nov-2011  cherry branches: 1.13.10; 1.13.28; 1.13.32;
[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
 1.12 13-Aug-2011  cherry branches: 1.12.2;
Add locking around ops to the hypervisor MMU "queue".
 1.11 24-Jul-2010  jym branches: 1.11.6;
Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.
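
The address-space sizes quoted above follow directly from the bit widths. A small sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Size in bytes of an address space that is `bits` wide (bits < 64). */
static uint64_t
demo_addr_space_bytes(unsigned bits)
{
	return (uint64_t)1 << bits;
}

/* Convert a byte count to GiB (1 GiB = 2^30 bytes). */
static uint64_t
demo_bytes_to_gib(uint64_t bytes)
{
	return bytes >> 30;
}
```

With 36 physical address bits this gives 2^36 bytes = 64 GiB, and with 32 virtual address bits 2^32 bytes = 4 GiB. A 32-bit variable cannot hold a 36-bit address, hence the move to 64-bit variables.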

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
 1.10 12-Feb-2010  jym branches: 1.10.2;
Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Briefly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.
 1.9 23-Oct-2009  snj branches: 1.9.2;
Remove 3rd and 4th clauses. OK cl@ (copyright holder).
 1.8 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.7 11-May-2008  ad branches: 1.7.12;
Don't reload LDTR unless a new value, which only happens for USER_LDT.
 1.6 30-Apr-2008  cegger branches: 1.6.2;
AMD's APM Volume 2 says 'All control registers are 64bit in long mode'.
Fix the CR0 prototype to match this (the asm implementation is correct though).
OK ad
 1.5 21-Apr-2008  cegger branches: 1.5.2;
Access Xen's vcpu info structure per-CPU.
Tested on i386 and amd64 (both dom0 and domU) by me.
Xen2 tested (both dom0 and domU) by bouyer.
OK bouyer
 1.4 17-Feb-2008  bouyer branches: 1.4.6; 1.4.8;
Add missing __KERNEL_RCSID()
 1.3 11-Jan-2008  bouyer Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support development).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.2 22-Nov-2007  bouyer branches: 1.2.2; 1.2.4; 1.2.8; 1.2.12; 1.2.16;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.1 17-Oct-2007  bouyer branches: 1.1.2; 1.1.4;
file xenfunc.c was initially added on branch bouyer-xenamd64.
 1.1.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.1.2.2 21-Oct-2007  bouyer Factorise some Xen pmap code in x86_xpmap.c.
More xpmap_{ptom,mtop} -> xpmap_{ptom,mtop}_masked

The xenamd64 kernel is now good enough to complete a sysinst install from
xennet to xbd.
 1.1.2.1 17-Oct-2007  bouyer Prepare for xenamd64:
- kill xen/i386/identcpu.c, use i386/i386/identcpu.c instead (with a few
#ifndef XEN)
- move some files that can be shared between i386 and amd64 from
xen/i386 to xen/x86 (or to xen/xen for non-cpu-specific code)
- split assembly out of xen/include/hypervisor.h to xen/include/hypercalls.h
- use <xen/...> instead of <machine/...> for cpu-independent include files.

more work needed here, i386-specific files should go out of arch/xen to
arch/xeni386, and more code shared with arch/i386.
 1.2.16.3 23-Mar-2008  matt sync with HEAD
 1.2.16.2 09-Jan-2008  matt sync with HEAD
 1.2.16.1 22-Nov-2007  matt file xenfunc.c was added on branch matt-armv6 on 2008-01-09 01:50:17 +0000
 1.2.12.1 09-Jan-2008  bouyer Merge xen bits to i386/i386/gdt.c. Convert remaining uses of PTE_* macros to
pmap_pte_* macros/inlines.
Fix think-o in pmap.c for native i386.
 1.2.8.4 27-Feb-2008  yamt sync with head.
 1.2.8.3 21-Jan-2008  yamt sync with head
 1.2.8.2 07-Dec-2007  yamt sync with head
 1.2.8.1 22-Nov-2007  yamt file xenfunc.c was added on branch yamt-lazymbuf on 2007-12-07 17:27:19 +0000
 1.2.4.2 03-Dec-2007  ad Sync with HEAD.
 1.2.4.1 22-Nov-2007  ad file xenfunc.c was added on branch vmlocking on 2007-12-03 19:04:45 +0000
 1.2.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.2.1 22-Nov-2007  joerg file xenfunc.c was added on branch jmcneill-pm on 2007-11-27 19:36:24 +0000
 1.4.8.1 18-May-2008  yamt sync with head.
 1.4.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.2.4 11-Aug-2010  yamt sync with head.
 1.5.2.3 11-Mar-2010  yamt sync with head
 1.5.2.2 19-Aug-2009  yamt sync with head.
 1.5.2.1 16-May-2008  yamt sync with head.
 1.6.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.7.12.3 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.7.12.2 24-Oct-2010  jym Sync with HEAD
 1.7.12.1 01-Nov-2009  jym Sync with HEAD.
 1.9.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.9.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.10.2.1 05-Mar-2011  rmind sync with head
 1.11.6.3 20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.11.6.2 31-Jul-2011  cherry grow MP support for i386. boots to single user
 1.11.6.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.12.2.1 10-Nov-2011  yamt sync with head
 1.13.32.2 20-Mar-2017  pgoyette Sync with HEAD
 1.13.32.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.13.28.3 28-Aug-2017  skrll Sync with HEAD
 1.13.28.2 05-Feb-2017  skrll Sync with HEAD
 1.13.28.1 05-Dec-2016  skrll Sync with HEAD
 1.13.10.1 03-Dec-2017  jdolecek update from HEAD
 1.15.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.17.2.6 18-Jan-2019  pgoyette Synch with HEAD
 1.17.2.5 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.17.2.4 20-Oct-2018  pgoyette Sync with head
 1.17.2.3 30-Sep-2018  pgoyette Sync with HEAD
 1.17.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.17.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.18.2.1 10-Jun-2019  christos Sync with HEAD
 1.26.8.2 12-Apr-2020  bouyer kpreempt_disable() only for x86_64 (which calls pmap_changeprot_local()).
On i386 curcpu() is not valid yet and we don't need preemption disabled.
 1.26.8.1 11-Apr-2020  bouyer Move softint and preemption-related functions out of x86/x86/intr.c to
its own file, x86/x86/x86_softintr.c
Add x86/x86/x86_softintr.c for native and XenPV
Make sure XenPV also checks ci_ioending, which is used for softints.
Switch XenPV to fast softints and allow kernel preemption.
kpreempt_disable() before calling pmap_changeprot_local()
run xen_wallclock_time() and xen_global_systime_ns() at splsched() to
avoid being interrupted.

XXX amd64 lock stubs are racy for XPENDING
