History log of /src/sys/arch/x86
Revision  Date  Author  Comments
 1.9 15-Jan-2014  joerg Reduce amount of -no-integrated-as on x86 as .code16 is now supported by
LLVM.
 1.8 15-Sep-2011  christos branches: 1.8.2; 1.8.12; 1.8.16;
fix typo, revert to previous version
 1.7 14-Sep-2011  christos revert previous; bug was in the position of the inclusion of the file.
 1.6 14-Sep-2011  christos Don't depend on the .d file here; since this is the only rule, acpi_wakeup.d
will never be built!
 1.5 20-May-2011  joerg LLVM's assembler parser doesn't support .code32 yet, so disable it as
needed.
 1.4 18-Jan-2009  hans branches: 1.4.2; 1.4.6; 1.4.8;
Use sed, awk and hexdump from tools to make this work on Solaris. Ok by apb.
 1.3 11-Dec-2007  lukem branches: 1.3.4; 1.3.6; 1.3.16; 1.3.24; 1.3.26;
MAKEVERBOSE support
 1.2 09-Dec-2007  jmcneill branches: 1.2.2;
How did these get lost?
 1.1 07-Sep-2007  jmcneill branches: 1.1.2; 1.1.8; 1.1.10; 1.1.12;
file Makefile.wakecode.inc was initially added on branch jmcneill-pm.
 1.1.12.2 13-Dec-2007  yamt sync with head.
 1.1.12.1 11-Dec-2007  yamt sync with head.
 1.1.10.1 26-Dec-2007  ad Sync with head.
 1.1.8.1 27-Dec-2007  mjf Sync with HEAD.
 1.1.2.2 24-Sep-2007  joerg Generate the ACPI wakecode image dynamically at build time.
 1.1.2.1 07-Sep-2007  jmcneill Share ACPI wakecode generation between i386 and amd64, and convert amd64
to use joerg's new build scripts for generating wakecode.
 1.2.2.1 13-Dec-2007  bouyer Sync with HEAD
 1.3.26.1 27-Mar-2009  msaitoh Pull up following revision(s) (requested by sketch in ticket #536):
etc/Makefile: revision 1.364
Makefile: revision 1.267
usr.sbin/postinstall/postinstall: revision 1.90
usr.bin/hexdump/parse.c: revision 1.25
sys/arch/x86/acpi/genwakecode.sh: revision 1.3
usr.sbin/postinstall/postinstall: revision 1.87
usr.sbin/postinstall/postinstall: revision 1.88
usr.sbin/postinstall/postinstall: revision 1.89
sys/arch/x86/acpi/Makefile.wakecode.inc: revision 1.4
sys/conf/Makefile.kern.inc: revision 1.120
Use ll instead of non-standard q as length modifier in format strings. Makes
this work on Solaris. OK by apb.
Not every grep knows -q. Ok by apb.
Use sed, awk and hexdump from tools to make this work on Solaris. Ok by apb.
Use awk and grep host tools where required. 'build.sh release' now
works on Solaris (but only with HOST_CC=/usr/sfw/bin/gcc for now).
"grep -q" is not portable; use "grep >/dev/null" instead. Also add a
comment saying that postinstall is invoked during a cross build.
In file_exists_exact(), fix an incorrect test of "1" instead of "$1",
and improve the comment explaining what this function does.
As long as we don't yet have a working TOOL_GREP, fgrep is more portable than grep -F.
 1.3.24.1 19-Jan-2009  skrll Sync with HEAD.
 1.3.16.1 04-May-2009  yamt sync with head.
 1.3.6.2 21-Jan-2008  yamt sync with head
 1.3.6.1 11-Dec-2007  yamt file Makefile.wakecode.inc was added on branch yamt-lazymbuf on 2008-01-21 09:40:05 +0000
 1.3.4.2 09-Jan-2008  matt sync with HEAD
 1.3.4.1 11-Dec-2007  matt file Makefile.wakecode.inc was added on branch matt-armv6 on 2008-01-09 01:49:45 +0000
 1.4.8.1 06-Jun-2011  jruoho Sync with HEAD.
 1.4.6.1 31-May-2011  rmind sync with head
 1.4.2.1 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.8.16.1 18-May-2014  rmind sync with head
 1.8.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.8.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.84 25-Oct-2020  nia Normalize some machine dependent CPU frequency sysctl variables.

This moves machdep.*.frequency.* to machdep.cpu.frequency.*.

This was proposed on tech-kern some time ago. The intention is to allow
third-party tools such as estd and conky to more easily and reliably
fetch or modify the current CPU frequency without iterating through
various machine-dependent variables to check their presence.
 1.83 19-Mar-2020  ad PR kern/55080: current does not boot

Back out previous. To be addressed differently.
 1.82 14-Mar-2020  ad Put ACPI idle under ACPICPU_ENABLE_C3 until the wrinkles are ironed out.
This seems well written and basically all good, but currently doesn't enter
a low power state, and imposes a big performance penalty. Proposed on
port-i386 & port-amd64.
 1.81 05-Nov-2019  maxv Add the __nocsan attribute on this function. Races on ci_want_resched are
accepted (part of the design).
 1.80 06-Oct-2019  uwe xc_barrier - convenience function to xc_broadcast() a nop.

Make the intent more clear and also avoid a bunch of (xcfunc_t)nullop
casts that gcc 8 -Wcast-function-type is not happy about.
 1.79 10-Nov-2018  maxv Remove unused cpu_msr.h includes.
 1.78 08-Dec-2016  nat branches: 1.78.14; 1.78.16;
Add a synthesized pc beeper and keyboard bell for platforms with an audio
device.
 1.77 17-Apr-2014  christos branches: 1.77.4; 1.77.8;
CID/1203191: Out of bounds read
 1.76 27-Mar-2014  christos branches: 1.76.2;
correct/add protection against snprintf overflow.
 1.75 11-Dec-2013  msaitoh Make new function named tsc_is_invariant() to avoid code duplication.
The behavior of acpicpu_md_flags() will change on some CPUs because
the detecting code of invariant TSC is replaced with newer code.
 1.74 20-Nov-2013  jruoho Allow 4-bit range for MSR_THERM_CONTROL.
 1.73 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
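
The difference between the base, extended, and display fields can be illustrated with a minimal C sketch. It assumes the standard layout of the CPUID leaf 1 %eax signature and the common convention that the extended family is added only when the base family is 0xf, while the extended model supplies the high nibble when the base family is 0x6 or 0xf. The `sig_*` helper names are hypothetical and are not the macros named in this commit.

```c
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not the NetBSD macros. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family counts only when the base family is 0xf. */
static inline uint32_t
sig_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/* Display model: the extended model is the high nibble for base family 0x6 or 0xf. */
static inline uint32_t
sig_model(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

With this convention, a signature of 0x00100F42 has base family 0xf and extended family 0x01, so the display family is 0x10, which is how names such as "family 10h" arise.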
 1.72 06-Dec-2012  jruoho branches: 1.72.2;
Disable C1E also on K8, if present. From Imre Vadasz <imre@vdsz.com>
in PR install/47224.
 1.71 11-Feb-2012  jruoho branches: 1.71.2; 1.71.6; 1.71.8;
Fix missing case for AMD 0x15.
 1.70 11-Feb-2012  jruoho Add non-XPSS support for AMD family 15h a.k.a. "Bulldozer". Ok releng@.
 1.69 15-Nov-2011  jruoho branches: 1.69.4;
Add support for AMD family 12h. Also revert revision 1.67, as it implies
maintenance burden for limited value. XXX: Need to add family 15h too.
 1.68 18-Oct-2011  jruoho branches: 1.68.2;
Convert to use cpufreq(9).
 1.67 24-Sep-2011  jruoho Try to obtain reliable MHz values for AMD families 10h and 11h.
 1.66 24-Sep-2011  jruoho Be more intelligent; read the MSR_CMPHALT with rdmsr_safe() and set the
C1E-flag based on this. Pointed out by jmcneill@.
 1.65 24-Sep-2011  jruoho As the detection of C1E is not entirely clear-cut, use rdmsr_safe()
when reading the AMD "interrupt pending and CMP-halt register".
 1.64 13-Jul-2011  jruoho Do not disable interrupts at machine-level in the MI idle-loop entry.
 1.63 23-Jun-2011  jruoho Fix bug pointed out by njoly@.
 1.62 22-Jun-2011  jruoho Get rid of RUN_ONCE(9). Should fix PR # kern/44043.
 1.61 12-Jun-2011  jruoho Move the evaluation of the _PDC control method out from the acpicpu(4)
driver to the main acpi(4) stack. Follow Linux and evaluate it early.
Should fix PR port-amd64/42895, possibly also PR kern/42583, and many
other comparable bugs.

A common sense explanation is that Intel supplies additional CPU tables to
OEMs. BIOS writers do not bother to modify their DSDTs, but instead load
these extra tables dynamically as secondary SSDT tables. The actual Load()
happens when the _PDC method is invoked, and thus namespace errors occur
when the CPU-specific ACPI methods are not yet present but referenced in the
AML by various drivers, including, but not limited to, acpitz(4).
 1.60 06-Jun-2011  jruoho When getting the frequency, use APERF/MPERF as a fallback method.
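
The log does not say how the driver derives a frequency from these counters. As a rough sketch of the general idea, assuming APERF counts at the actual clock and MPERF at a fixed reference clock, the ratio of their deltas over an interval scales a known base frequency. The function name and the use of MHz are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the average frequency over an interval
 * from the APERF and MPERF deltas measured across that interval.
 * base_mhz is the frequency at which MPERF is assumed to count.
 */
static uint64_t
effective_mhz(uint64_t base_mhz, uint64_t aperf_delta, uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return 0;	/* no reference cycles elapsed; no estimate */

	/* May overflow for very large deltas; keep sampling intervals short. */
	return base_mhz * aperf_delta / mperf_delta;
}
```

For example, a base of 2000 MHz with an APERF delta half the size of the MPERF delta would yield an estimate of 1000 MHz.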
 1.59 31-May-2011  jruoho branches: 1.59.2;
Remove the sanity check that tested the internal consistency of the "FID/VID
algorithm" used by K8. Tested by cegger@. The check is still included in the
original powernow(4) (where possible failures have probably gone unnoticed
because the driver is less noisy).
 1.58 04-Apr-2011  dyoung Neither pci_dma64_available(), pci_probe_device(), pci_mapreg_map(9),
pci_find_rom(), pci_intr_map(9), pci_enumerate_bus(), nor the match
predicate passed to pciide_compat_intr_establish() should ever modify
their pci_attach_args argument, so make their pci_attach_args arguments
const and deal with the fallout throughout the kernel.

For the most part, these changes add a 'const' where there was no
'const' before, however, some drivers and MD code used to modify
pci_attach_args. Now those drivers either copy their pci_attach_args
and modify the copy, or refrain from modifying pci_attach_args:

Xen: according to Manuel Bouyer, writing to pci_attach_args in
pci_intr_map() was a leftover from Xen 2. Probably a bug. I
stopped writing it. I have not tested this change.

siside(4): sis_hostbr_match() needlessly wrote to pci_attach_args.
Probably a bug. I use a temporary variable. I have not tested this
change.

slide(4): sl82c105_chip_map() overwrote the caller's pci_attach_args.
Probably a bug. Use a local pci_attach_args. I have not tested
this change.

viaide(4): via_sata_chip_map() and via_sata_chip_map_new() overwrote the
caller's pci_attach_args. Probably a bug. Make a local copy of the
caller's pci_attach_args and modify the copy. I have not tested
this change.

While I'm here, make pci_mapreg_submap() static.

With these changes in place, I have tested the compilation of these
kernels:

alpha GENERIC
amd64 GENERIC XEN3_DOM0
arc GENERIC
atari HADES MILAN-PCIIDE
bebox GENERIC
cats GENERIC
cobalt GENERIC
evbarm-eb NSLU2
evbarm-el ADI_BRH ARMADILLO9 CP3100 GEMINI GEMINI_MASTER GEMINI_SLAVE GUMSTIX
HDL_G IMX31LITE INTEGRATOR IQ31244 IQ80310 IQ80321 IXDP425 IXM1200
KUROBOX_PRO LUBBOCK MARVELL_NAS NAPPI SHEEVAPLUG SMDK2800 TEAMASA_NPWR
TEAMASA_NPWR_FC TS7200 TWINTAIL ZAO425
evbmips-el AP30 DBAU1500 DBAU1550 MALTA MERAKI MTX-1 OMSAL400 RB153 WGT624V3
evbmips64-el XLSATX
evbppc EV64260 MPC8536DS MPC8548CDS OPENBLOCKS200 OPENBLOCKS266
OPENBLOCKS266_OPT P2020RDB PMPPC RB800 WALNUT
hp700 GENERIC
i386 ALL XEN3_DOM0 XEN3_DOMU
ibmnws GENERIC
macppc GENERIC
mvmeppc GENERIC
netwinder GENERIC
ofppc GENERIC
prep GENERIC
sandpoint GENERIC
sgimips GENERIC32_IP2x
sparc GENERIC_SUN4U KRUPS
sparc64 GENERIC

As of Sun Apr 3 15:26:26 CDT 2011, I could not compile these kernels
with or without my patches in place:

### evbmips-el GDIUM

nbmake: nbmake: don't know how to make /home/dyoung/pristine-nbsd/src/sys/arch/mips/mips/softintr.c. Stop

### evbarm-el MPCSA_GENERIC
src/sys/arch/evbarm/conf/MPCSA_GENERIC:318: ds1672rtc*: unknown device `ds1672rtc'

### ia64 GENERIC

/tmp/genassym.28085/assym.c: In function 'f111':
/tmp/genassym.28085/assym.c:67: error: invalid application of 'sizeof' to incomplete type 'struct pcb'
/tmp/genassym.28085/assym.c:76: error: dereferencing pointer to incomplete type

### sgimips GENERIC32_IP3x

crmfb.o: In function `crmfb_attach':
crmfb.c:(.text+0x2304): undefined reference to `ddc_read_edid'
crmfb.c:(.text+0x2304): relocation truncated to fit: R_MIPS_26 against `ddc_read_edid'
crmfb.c:(.text+0x234c): undefined reference to `edid_parse'
crmfb.c:(.text+0x234c): relocation truncated to fit: R_MIPS_26 against `edid_parse'
crmfb.c:(.text+0x2354): undefined reference to `edid_print'
crmfb.c:(.text+0x2354): relocation truncated to fit: R_MIPS_26 against `edid_print'
 1.57 24-Mar-2011  jruoho Reset APERF and MPERF only after interrupts have been enabled.
 1.56 24-Mar-2011  jruoho Remove the "simple CPU lock" that was unnecessary.
Thanks to rmind@ for clarifications.
 1.55 05-Mar-2011  jruoho branches: 1.55.2;
Add __cpu_simple_lock_t. Use it, x86_read_psl(), and x86_disable_intr() to
disable interrupts locally and protect the access to APERF and MPERF. Also
rationalize the MD initialization sequence.
 1.54 05-Mar-2011  jruoho If the P-state control mask is set, do a proper read-modify-write.
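
A read-modify-write under a mask generally looks like the sketch below: bits outside the mask keep their old value and only the masked bits take the new one. The helper name is hypothetical, and the exact meaning of the P-state control mask in the driver is not stated in this log.

```c
#include <stdint.h>

/*
 * Hypothetical helper: merge `val` into `old`, changing only the bits
 * selected by `mask`. In a driver, `old` would come from reading the
 * register and the result would be written back.
 */
static inline uint64_t
masked_merge(uint64_t old, uint64_t val, uint64_t mask)
{
	return (old & ~mask) | (val & mask);
}
```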
 1.53 04-Mar-2011  jruoho Rename a badly named constant. Make it correspond with <x86/specialreg.h>.
 1.52 02-Mar-2011  jruoho Adjust the detection of Turbo Boost to prevent a theoretical array OOB access.
 1.51 02-Mar-2011  jruoho Append Intel's Turbo Boost to the debug printfs if we detect it.
 1.50 01-Mar-2011  jruoho Remove the cross-call from the APERF/MPERF function.
 1.49 01-Mar-2011  jruoho Move the xcall(9) that does the P- and T-state transformations from the MD
layer to the main code. Makes the caches coherent and provides consistent
vmstat(1) output. This is still not quite right, given that most of the
cross-calls are typically unnecessary with the dependency coordination.
 1.48 27-Feb-2011  jruoho Provide MD wrappers for match and attach.
 1.47 27-Feb-2011  jruoho Claim to support the dependency coordination during the _PDC/_OSC query.
(Although we do not actually support it.) Only after these bits are set,
many Intel-based BIOSes are willing to relinquish the necessary information.
 1.46 25-Feb-2011  jruoho Fix an oversight; the APERF and MPERF counters are per-CPU, so also reset
these by broadcasting to all CPUs with x86_msr_xcall(9).
 1.45 25-Feb-2011  jruoho Add a couple of comments.
 1.44 25-Feb-2011  jruoho Also declare support for APERF/MPERF during the BIOS _PDC/_OSC query.
 1.43 25-Feb-2011  jruoho Rename a couple of badly named functions for consistency. No functional change.
 1.42 25-Feb-2011  jruoho Add support for APERF and MPERF on AMD processors.
 1.41 25-Feb-2011  jruoho Add preliminary support for the IA32_APERF and IA32_MPERF frequency counters.
These are not yet used for anything and only Intel is supported at the moment.
 1.40 24-Feb-2011  jmcneill add support for Family 14h (AMD Fusion)
 1.39 15-Feb-2011  jruoho Fix and add comments.
 1.38 13-Jan-2011  jruoho branches: 1.38.2; 1.38.4;
Move the function that counts the CPUs from acpicpu(4) to the MD layer.
 1.37 30-Dec-2010  jruoho Add an additional assertion for the control MSR address.
 1.36 30-Nov-2010  jruoho Fix boolean brain freeze.
 1.35 30-Nov-2010  jruoho Add AMD C1E quirk. Tested by cegger@.

(a) This should be removed once C-states are supported.

(b) As there seems to be no reliable way to detect whether C1E is present,
the quirk blindly assumes that C1E is used on families 10h and 11h.
 1.34 25-Aug-2010  jruoho branches: 1.34.2;
Add definitions for Intel Digital Thermal Sensor and Power Management, at
CPUID Fn0000_0006, %eax, %ecx. Use these instead of magic numbers.
 1.33 24-Aug-2010  jruoho As all reported P-state failures so far have centered around the
status-check (today it was christos@'s laptop), follow Linux and disable this rather
expensive sanity-check for the time being. A hypothesis about the cause of
the failures relates to the absence of cross-CPU coordination in the current
implementation.
 1.32 24-Aug-2010  jruoho Add native support for AMD family 0Fh processors. This is the furthest we
will go backwards; K7 will not be supported due to doubts about
availability and reliability of ACPI during that era. Some unfortunate code
duplication is present (but not overly much). Thanks to cegger@ and jakllsch@
for patiently testing this.
 1.31 23-Aug-2010  jruoho Other entry points beyond x86_cpu_idle_halt() may use HLT as the
idle-mechanism. Send an IPI also for these in cpu_need_resched().
 1.30 22-Aug-2010  jruoho Still DELAY(9) a little even when we do not do the status-check.
 1.29 21-Aug-2010  jruoho After discussion with jakllsch@ and jmcneill@, revert the previous and only
do the status-check when the comparison value reported by BIOS is not zero.
The uncertainty noted in the previous commit still applies. But if we ever
see a timeout again, it will likely be either a firmware bug or a special
case like the Intel Turbo Boost.
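
The rule described here, checking the status only when the BIOS reports a non-zero comparison value, might be sketched as follows. This is an illustration and not the driver's code: the `samples` array stands in for successive reads of the status register, and the function name is hypothetical. EAGAIN is the timeout error mentioned in revision 1.28.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: `expected` is the comparison value reported by
 * the BIOS, and `samples` models repeated reads of the status register
 * taken while waiting for the transition to complete.
 */
static int
check_transition(uint64_t expected, const uint64_t *samples, size_t nsamples)
{
	if (expected == 0)
		return 0;	/* no comparison value; skip the check */

	for (size_t i = 0; i < nsamples; i++) {
		if (samples[i] == expected)
			return 0;	/* transition confirmed */
	}
	return EAGAIN;		/* expected value never seen; treat as a timeout */
}
```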
 1.28 21-Aug-2010  jruoho When we do the sanity check that a P- or T-state transition was successful,
compare also against the control-field. There appear to be many BIOSes in
the field that report a zero value in the status-field. It is unclear whether
this should be taken as a hint that the status-check is not necessary also
during P-state transitions. If we still see timeouts (EAGAIN), this should
be reverted and the status-check should be bypassed if ps->ps_status is 0.
 1.27 21-Aug-2010  jruoho Use an inverse logic when filling the (X)PSS structures -- if we know
the addresses, we trust ourselves more than a random BIOS in the field.
 1.26 21-Aug-2010  jruoho Add a comment.
 1.25 21-Aug-2010  jruoho Check from CPUID 0x06 %eax (on Intel) whether we might actually have an
invariant APIC timer or an "ARAT" ("always running APIC timer"). This means
that the APIC timer may keep ticking at the same rate also in deep C-states
with some new or forthcoming Intel CPUs.
 1.24 21-Aug-2010  jruoho Add a quirk for Turbo Boost.

It was observed that at least Sverre Froyen's ThinkPad T500 reports values
that do not match readings from the IA32_PERF_STATUS register. This only
applied to the P0-state. Thus, for now, skip the status check if Turbo
Boost has been detected and the requested state is P0.

This needs to be revisited once Turbo Boost actually works in NetBSD. It is
unclear whether this is a BIOS flaw or not; these values may well be what we
get from IA32_PERF_STATUS once the CPU actually uses the +133.33 MHz boost.
 1.23 21-Aug-2010  jruoho Detect Intel's Turbo Boost and presence of IA32_APERF/IA32_MPERF. The former
is required for a quirk, and the latter is needed for hardware P-state
coordination (once acpicpu(4) will support fine-grained coordination).
 1.22 21-Aug-2010  jruoho Detect whether TSC is invariant, which may be the case on both new AMD and
Intel processors. The invariance means that TSC runs at a constant rate
during all ACPI state changes. If it is variant, skew may occur and TSC is
generally unsuitable for wall clock services. This is especially relevant
with C-states; with variant TSC, the whole counter may be stopped with states
larger than C1. All x86 CPUs before circa mid-2000s can be assumed to have a
variant time stamp counter.
 1.21 21-Aug-2010  jruoho Properly detect AMD hardware P-state support. Also detect "core boost" (only
present in some models of family 10h).
 1.20 20-Aug-2010  jruoho Check if SpeedStep is enabled. If it is disabled, try to enable it.
 1.19 20-Aug-2010  jruoho Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
 1.18 19-Aug-2010  jruoho Properly calculate the AMD CPU family.
 1.17 19-Aug-2010  jruoho Add native P-state support for AMD family 10h and 11h processors. Both are
supported irrespective of XPSS. Family 10h tested by jakllsch@.
 1.16 19-Aug-2010  jmcneill VIA CPUs can have EST as well, so treat them the same as Intel
 1.15 18-Aug-2010  jruoho Use the idea from cegger@ and fill the (X)PSS structure during initialization.
 1.14 18-Aug-2010  jruoho Check the status of P- and T-state transformations on all CPUs. This is
still not ideal, as ACPI gives us information about "cross logical processor
dependencies". For instance, a single MSR call on one CPU may cause all other
CPUs in the same domain to follow the state shift. Thus, rather than using
xc_broadcast(9), we should xc_unicast(9) on per-domain or per-CPU-set basis.
 1.13 18-Aug-2010  jruoho Add MD support for the vendor-independent extended PSS. Some conforming AMD
systems are known to work. Alas, not all of them. We still need to deal with
the variety of different PowerNow! revisions.
 1.12 14-Aug-2010  jruoho branches: 1.12.2;
Move the PIIX4-quirk to the MD file and disable T-states for PIIX4.
 1.11 13-Aug-2010  jruoho Remove some unnecessary locking. Mainly a leftover from previous revisions
where the dynamic maximum/minimum was used also when retrieving the current
state. The state-array itself changes only in C-states.
 1.10 13-Aug-2010  jruoho Merge T-state a.k.a. throttling support for acpicpu(4).

Remarks:

1. Native instructions are supported only on Intel. Native support for
other x86 vendors will be investigated. By assumption, AMD and others
use the I/O based approach.

2. The existing code, INTEL_ONDEMAND_CLOCKMOD, must be disabled in
order to use acpicpu(4). Otherwise fatal MSR races may occur.
Unlike with P-states, no attempt is made to disable the existing
implementation.

3. There is no rationale to export controls to user land.

4. Throttling is an artefact from the past. T-states will not be used for
power management per se. For CPU frequency management, P-states are
preferred in all circumstances. No noticeable additional power savings
were observed in various experiments. When the system has been scaled
to the highest (i.e. lowest power) P-state, it is preferable to move
from C0 to deeper C-states than it is to actively throttle the CPU.

5. But T-states need to be implemented for passive cooling via acpitz(4).
As specified by ACPI and Intel documents, these can be used as the
last line of defence against critical thermal conditions. Support
for this will be added later.
 1.9 09-Aug-2010  jruoho branches: 1.9.2;
Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.
 1.8 09-Aug-2010  jruoho Remove a redundant function.
 1.7 09-Aug-2010  jruoho When retrieving the current frequency, scan all available P-states.
Only use the dynamic maximum when setting a frequency.
 1.6 09-Aug-2010  jruoho Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.
 1.5 08-Aug-2010  jruoho Merge P-state support for acpicpu(4).

Remarks:

1. All processors (x86 or not) for which the vendor has implemented
ACPI I/O access routines are supported. Native instructions are
currently supported only for Intel's "Enhanced Speedstep". Code for
"PowerNow!" (AMD) will be merged later. Native support for VIA's
"PowerSaver" will be investigated.

2. Backwards compatibility with existing userland code is maintained.
Comparable to the case with cpu_idle(9), the ACPI CPU driver
installs alternative functions for the existing sysctl(8) controls.
The "native" behavior (if any) is restored upon detachment.

3. The dynamic nature of ACPI-provided P-states needs more investigation.
The maximum frequency induced (but not forced) by the firmware may
change dynamically. Currently, the sysctl(8) controls error out with
a value larger than the dynamic maximum. The code itself does not
however yet react to the notifications from the firmware by changing
the frequencies in-place. Presumably the system administrator should
be able to choose whether to use dynamic or static frequencies.
 1.4 04-Aug-2010  jruoho Run a xcall(9) to ensure that all CPUs are out from the ACPI idle-loop
before detachment.
 1.3 23-Jul-2010  jruoho Make sure we use MWAIT with MONITOR.

Also clarify when we have interrupts disabled.
 1.2 18-Jul-2010  jruoho Add missing CVS identifiers.
 1.1 18-Jul-2010  jruoho Merge a driver for ACPI CPUs with basic support for processor power states,
also known as C-states. The code is modular and provides an easy way to add
the remaining functionality later (namely throttling and P-states).

Remarks:

1. Commented out in the GENERICs; more testing exposure is needed.

2. The C3-state is disabled for the time being because it turns off
timers, among them the local APIC timer. This may not be universally
true on all x86 processors; define ACPICPU_ENABLE_C3 to test.

3. The algorithm used to choose a power state may need tuning. When
evaluating the appropriate state, the implementation uses the
previous sleep time as an indicator. Additional hints would include
for example the system load.

Also bus master activity is evaluated when choosing a state. The
usb(4) stack is notorious for such activity even when unused.
Typically it must be disabled in order to reach the C3-state,
but it may also prevent the use of C2.

4. While no extensive empirical measurements have been carried out, the
power savings are somewhere between 1-2 W with C1 and C2, depending
on the processor, firmware, and load. With C3 even up to 4 W can be
saved. The less something ticks, the more power is saved.

ok jmcneill@, joerg@, and discussed with various people.
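
Remark 3 describes the selection heuristic only in outline. One way such a heuristic could look is sketched below, under stated assumptions: states are ordered from shallow to deep, a deeper state is chosen only if the previous sleep lasted longer than that state's exit latency, and bus master activity blocks C3. The structure, field names, and thresholds are all hypothetical, and the sketch ignores the case where bus mastering also prevents C2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one C-state. */
struct cstate {
	int      level;		/* 1 for C1, 2 for C2, 3 for C3 */
	uint32_t latency_us;	/* assumed worst-case exit latency */
};

/*
 * Return the index of the state to enter. `states` must be sorted from
 * shallow to deep; index 0 is the fallback.
 */
static size_t
pick_cstate(const struct cstate *states, size_t nstates,
    uint32_t prev_sleep_us, bool bus_master_active)
{
	size_t best = 0;

	for (size_t i = 1; i < nstates; i++) {
		if (states[i].latency_us >= prev_sleep_us)
			break;	/* previous sleep too short to amortize this state */
		if (states[i].level >= 3 && bus_master_active)
			break;	/* bus mastering blocks C3 in this sketch */
		best = i;
	}
	return best;
}
```

A real implementation would likely combine further hints, such as the system load mentioned in the remark.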
 1.9.2.3 09-Oct-2010  yamt sync with head
 1.9.2.2 11-Aug-2010  yamt sync with head.
 1.9.2.1 09-Aug-2010  yamt file acpi_cpu_md.c was added on branch yamt-nfs-mp on 2010-08-11 22:52:54 +0000
 1.12.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.12.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.12.2.1 14-Aug-2010  uebayasi file acpi_cpu_md.c was added on branch uebayasi-xip on 2010-08-17 06:45:29 +0000
 1.34.2.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.34.2.5 02-May-2011  jym Sync with head.
 1.34.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.34.2.3 10-Jan-2011  jym Sync with HEAD
 1.34.2.2 24-Oct-2010  jym Sync with HEAD
 1.34.2.1 25-Aug-2010  jym file acpi_cpu_md.c was added on branch jym-xensuspend on 2010-10-24 22:48:16 +0000
 1.38.4.2 05-Mar-2011  bouyer Sync with HEAD
 1.38.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.38.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.55.2.5 12-Jun-2011  rmind sync with head
 1.55.2.4 21-Apr-2011  rmind sync with head
 1.55.2.3 06-Mar-2011  rmind sync with head (and fix few botches with this)
 1.55.2.2 05-Mar-2011  rmind sync with head
 1.55.2.1 05-Mar-2011  rmind file acpi_cpu_md.c was added on branch rmind-uvmplock on 2011-03-05 20:52:27 +0000
 1.59.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.68.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.68.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.68.2.1 17-Apr-2012  yamt sync with head
 1.69.4.1 18-Feb-2012  mrg merge to -current.
 1.71.8.1 13-Dec-2012  riz Pull up following revision(s) (requested by jruoho in ticket #741):
sys/arch/x86/acpi/acpi_cpu_md.c: revision 1.72
Disable C1E also on K8, if present. From Imre Vadasz <imre@vdsz.com>
in PR install/47224.
 1.71.6.3 03-Dec-2017  jdolecek update from HEAD
 1.71.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.71.6.1 25-Feb-2013  tls resync with head
 1.71.2.2 25-Nov-2013  bouyer Pull up following revision(s) (requested by jruoho in ticket #987):
sys/arch/x86/acpi/acpi_cpu_md.c: revision 1.74
sys/dev/acpi/acpi_cpu_tstate.c: revision 1.32
As discussed with bouyer@, fix a too eager T-state validation check to
accommodate new Intel CPUs.
Allow 4-bit range for MSR_THERM_CONTROL.
 1.71.2.1 13-Dec-2012  riz Pull up following revision(s) (requested by jruoho in ticket #741):
sys/arch/x86/acpi/acpi_cpu_md.c: revision 1.72
Disable C1E also on K8, if present. From Imre Vadasz <imre@vdsz.com>
in PR install/47224.
 1.72.2.1 18-May-2014  rmind sync with head
 1.76.2.1 10-Aug-2014  tls Rebase.
 1.77.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.77.4.1 05-Feb-2017  skrll Sync with HEAD
 1.78.16.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.78.16.1 10-Jun-2019  christos Sync with HEAD
 1.78.14.1 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.40 06-Oct-2025  riastradh x86: Wire up PCI resource manager if enabled.

Enable in your kernel config with `options PCI_RESOURCE'.

Adapted from a patch by mlelstv@.

PR port-amd64/59118: Thinkpad T495s - iwm PCI BAR is zero
 1.39 30-Apr-2025  imil branches: 1.39.2;
Introduce pvh_boot boolean to identify the real hypervisor when booting in PVH
mode.

As of now, identify_hypervisor() in sys/arch/x86/x86/identcpu.c returns when
vm_guest is VM_GUEST_GENPVH, yet this VM type is not an actual hypervisor but
a marker recorded in locore.S to select the boot method.
We need to determine which hypervisor is really running the VM in order to
apply hypervisor-specific handling. So instead of relying on vm_guest_is_pvh(),
which only checks for VM_GUEST_XENPVH || VM_GUEST_GENPVH, pvh_boot records the
boot method while still allowing the real hypervisor to be identified.

Idea ok'd by bouyer@, tested on Xen domU, Xen dom0 with GENERIC PVH and
qemu GENERIC PVH boot.
 1.38 06-Dec-2024  bouyer Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
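
A predicate of this shape could be written as the small sketch below. Only the two PVH constants and the comparison come from the commit message. The enum definition, its values, and the `VM_GUEST_OTHER` member are stand-ins so that the example is self-contained.

```c
#include <stdbool.h>

/* Stand-in enum; the real type in the tree has more members and its own values. */
enum vm_guest_type {
	VM_GUEST_OTHER,		/* hypothetical placeholder for every other case */
	VM_GUEST_XENPVH,
	VM_GUEST_GENPVH
};

/* Stand-in for the kernel's global guest-type variable. */
static enum vm_guest_type vm_guest = VM_GUEST_OTHER;

/* True when the kernel was entered through a PVH boot path, Xen or generic. */
static inline bool
vm_guest_is_pvh(void)
{
	return vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH;
}
```

Wrapping the comparison in one helper means a later PVH variant only needs to be added in a single place.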
 1.37 02-Dec-2024  bouyer Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.
 1.36 16-Oct-2023  bouyer branches: 1.36.6;
Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.35 24-Jan-2023  riastradh x86/acpi/acpi_machdep.c: Nix trailing whitespace.

No functional change intended.
 1.34 28-Oct-2022  riastradh branches: 1.34.2;
x86/acpi: Mark acpica interrupt handlers MP-safe.

acpica has its own internal locking, and the interrupt handlers we
install with AcpiInstall*Handler (gpe, notify, &c.) also have their
own locking.
 1.33 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.32 12-May-2021  thorpej - Define a device call for PCI bus instances to fetch a direct child's
device handle given the device's device/function #s (extracted from
a pcitag_t). Use it to associate the handle with the child device
at config_found() time.
- Implement this device call for ACPI and OpenFirmware.
- Enable the OpenFirmware variant for evbarm FDT, macppc, ofppc, sparc64.
- Obsolete acpi_device_register(); it is no longer needed.
- Obsolete setting the OpenFirmware handle in PCI devices in the
sparc64 device_register(); it is no longer needed.
 1.31 04-Feb-2021  thorpej branches: 1.31.4; 1.31.6;
Call acpi_device_register() as appropriate.
 1.30 02-May-2020  bouyer branches: 1.30.2;
Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
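
The delay_func() indirection described in 1.30 can be pictured with a small stand-alone sketch. It is only an illustration under stated assumptions: x86_delay_stub(), xen_delay_stub() and select_pvh_delay() are hypothetical names, and the counters exist only so the effect of switching the pointer is observable.

```c
/* Hypothetical stand-ins for the native and Xen delay routines. */
static unsigned int native_calls, xen_calls;

static void
x86_delay_stub(unsigned int us)
{
	(void)us;
	native_calls++;
}

static void
xen_delay_stub(unsigned int us)
{
	(void)us;
	xen_calls++;
}

/* Callers go through this pointer; it defaults to the native routine. */
static void (*delay_func)(unsigned int) = x86_delay_stub;

/* Hypothetical helper: a PVH entry path would repoint delay_func once. */
static void
select_pvh_delay(void)
{
	delay_func = xen_delay_stub;
}
```

Callers that invoke delay_func(n) do not need to know which environment they run in; only the early boot path that knows the guest type touches the pointer.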
 1.29 22-Dec-2019  thorpej Add acpi_intr_mask() and acpi_intr_unmask() which, following the pre-existing
ACPI software layering model, are wrappers around acpi_md_intr_mask() and
acpi_md_intr_unmask(), which in turn are wrappers around intr_mask() and
intr_unmask().

XXX ARM and IA64 implementations of acpi_md_intr_mask() and
acpi_md_intr_unmask() are just stubs for now.
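
The layering named in 1.29 (MI wrapper, then MD wrapper, then the interrupt code) might look roughly like the sketch below. All function names carry a _sketch or _stub suffix because they are hypothetical stand-ins, not the kernel's definitions, and the real functions' argument types are not reproduced here.

```c
/* Hypothetical lowest layer: records whether the handler is masked. */
static int masked;

static void
intr_mask_stub(void *ih)
{
	(void)ih;
	masked = 1;
}

static void
intr_unmask_stub(void *ih)
{
	(void)ih;
	masked = 0;
}

/* Machine-dependent layer: forwards to the port's interrupt code. */
static void
acpi_md_intr_mask_sketch(void *ih)
{
	intr_mask_stub(ih);
}

static void
acpi_md_intr_unmask_sketch(void *ih)
{
	intr_unmask_stub(ih);
}

/* Machine-independent layer: what ACPI drivers would call. */
static void
acpi_intr_mask_sketch(void *ih)
{
	acpi_md_intr_mask_sketch(ih);
}

static void
acpi_intr_unmask_sketch(void *ih)
{
	acpi_md_intr_unmask_sketch(ih);
}
```

A port without a working implementation can leave the MD layer as an empty stub, as the log notes for ARM and IA64, without touching callers of the MI layer.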
 1.28 12-Sep-2019  martin Cast physical addresses via uintptr_t to ACPI_PHYSICAL_ADDRESS to deal
with all size variants of the types used here in different builds.
Patch from manu@.
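
The two-step cast mentioned in 1.28 can be sketched as follows. This is a minimal illustration, not the code from acpi_machdep.c: phys_addr_t is a hypothetical stand-in for ACPI_PHYSICAL_ADDRESS and is assumed to be a 64-bit integer here, and ptr_to_phys() is an invented helper name.

```c
#include <stdint.h>

/* Hypothetical stand-in for ACPI_PHYSICAL_ADDRESS; assumed 64-bit here. */
typedef uint64_t phys_addr_t;

/*
 * Convert a pointer-typed value to the physical-address type.
 * The first cast turns the pointer into an integer of the pointer's
 * own width; the second cast then converts between integer types.
 * This avoids a direct pointer-to-wider-integer cast, which compilers
 * commonly warn about when the sizes differ (e.g. on 32-bit builds).
 */
static phys_addr_t
ptr_to_phys(const void *p)
{
	return (phys_addr_t)(uintptr_t)p;
}
```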
 1.27 12-Sep-2019  manu Attempt to obtain ACPI RSDP from the hypervisor for Xen PV

There are three possible ways of obtaining the ACPI RSDP
- From Extended BIOS Data Area (EBDA) when kernel or Xen was booted from
BIOS bootstrap
- From EFI SystemTable when kernel is booted from EFI bootstrap
- When Xen is booted from EFI bootstrap, EBDA is not mapped, and EFI
SystemTable is not passed to the kernel. The only way to go is to
obtain ACPI RSDP through a hypercall.

Note: EFI bootstrap support for booting Xen has not yet been committed.
 1.26 01-May-2019  mlelstv branches: 1.26.2;
Handle ISA/EISA interrupts like isa_machdep.c.
 1.25 09-Mar-2019  kre In acpi_md_OsRemoveInterruptHandler() redir and mpflags are only
relevant to the NIOAPIC > 0 case (not used without that). Rearrange
#if's slightly to make that happen (avoid "set but not used" warnings
(aka errors) when NIOAPIC == 0 (or undefined)).
 1.24 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.23 03-Mar-2019  maxv Fix bug, PG_W is 'wired', not 'writable'.
 1.22 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.21 22-Nov-2018  jmcneill Apply MADT interrupt source overrides to interrupts established via
acpi_md_intr_establish.
 1.20 16-Nov-2018  jmcneill Add MD functions for establishing and disestablishing interrupt handlers.
 1.19 20-Mar-2018  bouyer branches: 1.19.2;
Allow registering ACPI interrupt handlers with a xname.
AcpiOsInstallInterruptHandler(), part of ACPICA API, doesn't allow passing
the xname. I extend the API with AcpiOsInstallInterruptHandler_xname()
for this purpose, and change acpi_md_OsInstallInterruptHandler() to
accept and use the xname (ia64 doesn't use it).
The xname was hardcoded to "acpi SCI" in the
x86 acpi_md_OsInstallInterruptHandler(), so I make
AcpiOsInstallInterruptHandler() call
AcpiOsInstallInterruptHandler_xname with xname = "acpi SCI".

Now 'vmstat -i' shows the device's name instead of "acpi SCI" for i2c HID
interrupts.

Proposed on tech-kern@ on Dec 29.
 1.18 14-Feb-2017  nonaka branches: 1.18.6; 1.18.12;
Handle persistent memory. Currently only debug output.
 1.17 14-Feb-2017  nonaka x86: make btinfo_memmap from btinfo_efimemmap to reduce mem_cluster_cnt.

should fix PR/51953.
 1.16 09-Feb-2017  nonaka efi_md::md_virt always uses uint64_t.
 1.15 24-Jan-2017  nonaka Initial commit of native amd64 EFI boot loader.
 1.14 15-Oct-2016  jdolecek branches: 1.14.2;
provide intr xname
 1.13 21-Sep-2016  jmcneill Set hw.acpi.sleep.vbios when a non-HW accelerated VGA driver attaches.
If the VGA_POST option is present in the kernel the default value is 2,
otherwise 1. PR kern/50781

Reviewed by: agc, mrg
 1.12 28-Jan-2016  htodd branches: 1.12.2;
Fix build break.
 1.11 28-Jan-2016  christos Add support for grub to find the ACPI root table pointer via a bootinfo entry
from grub.
From: https://mail-index.netbsd.org/tech-kern/2014/05/22/msg017119.html
 1.10 06-Oct-2015  christos CID/1325751: Avoid possible 32 bit overflow.
 1.9 02-Oct-2015  msaitoh PCI Extended Configuration stuff written by nonaka@:
- Add PCI Extended Configuration Space support into x86.
- Check register offset of pci_conf_read() in MD part. It returns (pcireg_t)-1
if it isn't accessible.
- Decode Extended Capability in PCI Extended Configuration Space.
Currently the following extended capabilities are decoded:
- Advanced Error Reporting
- Virtual Channel
- Device Serial Number
- Power Budgeting
- Root Complex Link Declaration
- Root Complex Event Collector Association
- Access Control Services
- Alternative Routing-ID Interpretation
- Address Translation Services
- Single Root IO Virtualization
- Page Request
- TPH Requester
- Latency Tolerance Reporting
- Secondary PCI Express
- Process Address Space ID
- LN Requester
- L1 PM Substates
The following extended capabilities are not decoded yet:
- Root Complex Internal Link Control
- Multi-Function Virtual Channel
- RCRB Header
- Vendor Unique
- Configuration Access Correction
- Multiple Root IO Virtualization
- Multicast
- Resizable BAR
- Dynamic Power Allocation
- Protocol Multiplexing
- Downstream Port Containment
- Precision Time Management
- M-PCIe
- Function Reading Status Queueing
- Readiness Time Reporting
- Designated Vendor-Specific
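
The register-offset check mentioned in 1.9 (return (pcireg_t)-1 when the offset is not accessible) can be illustrated with a stand-alone sketch. Everything below is an assumption made for illustration: conf_read_checked() is an invented name, configuration space is modelled as a plain array, the local pcireg_t typedef is a 32-bit stand-in, and the alignment test is an extra assumption rather than something the log states. The 256-byte and 4096-byte sizes are the conventional and extended configuration space sizes.

```c
#include <stdint.h>

typedef uint32_t pcireg_t;	/* 32-bit stand-in for this sketch */

#define CONF_SIZE_SKETCH	256	/* conventional configuration space */
#define EXT_CONF_SIZE_SKETCH	4096	/* extended configuration space */

/*
 * Read a 32-bit register at byte offset 'reg' from a configuration
 * space of 'size' bytes, modelled here as an array of 32-bit words.
 * Offsets outside the accessible range (or not 4-byte aligned, an
 * assumption of this sketch) yield all ones.
 */
static pcireg_t
conf_read_checked(const uint32_t *space, int size, int reg)
{
	if (reg < 0 || reg >= size || (reg & 3) != 0)
		return (pcireg_t)-1;
	return space[reg / 4];
}
```

With size set to CONF_SIZE_SKETCH, an offset such as 0x100 falls outside the accessible range and reads back as all ones; with EXT_CONF_SIZE_SKETCH the same offset is a valid extended-capability location.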
 1.8 12-May-2014  joerg branches: 1.8.4;
acpi_md_findoverride is only used when NIOAPIC > 0, so don't provide it
otherwise.
 1.7 06-Oct-2013  jakllsch branches: 1.7.2;
Correct acpi_md_OsWritable() logic so that it can return TRUE.
From Masanori Kanaoka in PR 47571.
 1.6 31-Mar-2013  chs branches: 1.6.4;
yet more fixes for PR 47648 / PR 47016:
when using a temporary mp_intr_map, initialize the "flags" field
as well as "redir" since apic_set_redir() uses both. fix how
the flags field is changed when applying an override, the trigger
and polarity sub-fields aren't just one bit like they are in redir.
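
The point in 1.6 about multi-bit sub-fields can be sketched as below. The bit layout is hypothetical (polarity in bits 0-1, trigger in bits 2-3) and apply_override() is an invented name; the sketch only shows why each field has to be cleared with its full mask before the new value is stored, rather than toggling a single bit.

```c
#include <stdint.h>

/* Hypothetical layout: polarity in bits 0-1, trigger in bits 2-3. */
#define POL_MASK	0x3u
#define TRIG_SHIFT	2
#define TRIG_MASK	(0x3u << TRIG_SHIFT)

/*
 * Replace the polarity and trigger sub-fields of 'flags' while
 * leaving every other bit untouched.
 */
static uint32_t
apply_override(uint32_t flags, uint32_t pol, uint32_t trig)
{
	flags &= ~(POL_MASK | TRIG_MASK);	/* clear both two-bit fields */
	flags |= pol & POL_MASK;
	flags |= (trig << TRIG_SHIFT) & TRIG_MASK;
	return flags;
}
```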
 1.5 25-Mar-2013  chs redo the ACPI interrupt handler setup again, this time handling
MADT overrides that change the pin as well as the polarity.
fixes PR 47648.
 1.4 23-Sep-2012  chs locate PCI buses and determine their bus numbers using the info
previously extracted from ACPICA rather than trying to figure it out again.
allow PCI buses that don't have a _PRT method.
 1.3 30-Jan-2012  rmind branches: 1.3.2; 1.3.6;
acpi_md_ncpus: use kcpuset_attached instead.
 1.2 01-Jul-2011  dyoung branches: 1.2.2; 1.2.4; 1.2.8;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.1 12-Jun-2011  jruoho branches: 1.1.2;
Follow IA-64 with the x86-specific ACPI MD functions and move these where
they belong to. Remove an unused function. Minor KNF. No functional change.
 1.1.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.1.2.1 12-Jun-2011  cherry file acpi_machdep.c was added on branch cherry-xenmp on 2011-06-23 14:19:47 +0000
 1.2.8.1 18-Feb-2012  mrg merge to -current.
 1.2.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.4.2 30-Oct-2012  yamt sync with head
 1.2.4.1 17-Apr-2012  yamt sync with head
 1.2.2.2 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.2.2.1 01-Jul-2011  jym file acpi_machdep.c was added on branch jym-xensuspend on 2011-08-27 15:59:49 +0000
 1.3.6.4 03-Dec-2017  jdolecek update from HEAD
 1.3.6.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.6.2 23-Jun-2013  tls resync from head
 1.3.6.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.3.2.2 31-Mar-2013  riz Pull up following revision(s) (requested by chs in ticket #855):
sys/arch/x86/acpi/acpi_machdep.c: revision 1.5
sys/arch/x86/acpi/acpi_machdep.c: revision 1.6
sys/arch/x86/x86/mpacpi.c: revision 1.97
redo the ACPI interrupt handler setup again, this time handling
MADT overrides that change the pin as well as the polarity.
fixes PR 47648.
yet more fixes for PR 47648 / PR 47016:
when using a temporary mp_intr_map, initialize the "flags" field
as well as "redir" since apic_set_redir() uses both. fix how
the flags field is changed when applying an override, the trigger
and polarity sub-fields aren't just one bit like they are in redir.
 1.3.2.1 22-Nov-2012  riz Pull up following revision(s) (requested by chs in ticket #683):
sys/arch/ia64/include/acpi_machdep.h: revision 1.6
sys/arch/x86/include/acpi_machdep.h: revision 1.11
sys/dev/acpi/acpi.c: revision 1.255
sys/arch/x86/acpi/acpi_machdep.c: revision 1.4
sys/arch/x86/x86/mpacpi.c: revision 1.95
sys/arch/x86/x86/mpacpi.c: revision 1.96
sys/arch/ia64/acpi/acpi_machdep.c: revision 1.6
locate PCI buses and determine their bus numbers using the info
previously extracted from ACPICA rather than trying to figure it out again.
allow PCI buses that don't have a _PRT method.
as a workaround for PR 47016, call ioapic_reenable() at the end of
ACPI interrupt routing to fix the settings for the SCI interrupt.
the problem is that after my recent changes, the SCI handler is
installed before the MADT info is parsed, so we don't know what
polarity it should have. the real fix for this will be to rearrange
the ACPI initialization so that everything is done in a more sensible
order, but that will take some more time.
 1.6.4.1 18-May-2014  rmind sync with head
 1.7.2.1 10-Aug-2014  tls Rebase.
 1.8.4.6 28-Aug-2017  skrll Sync with HEAD
 1.8.4.5 05-Feb-2017  skrll Sync with HEAD
 1.8.4.4 05-Dec-2016  skrll Sync with HEAD
 1.8.4.3 05-Oct-2016  skrll Sync with HEAD
 1.8.4.2 19-Mar-2016  skrll Sync with HEAD
 1.8.4.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.12.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.12.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.14.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.18.12.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.18.12.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.18.6.2 23-Sep-2019  martin Apply patch, requested by manu in ticket #1380: add EFI specific guids
here locally for XEN (solved differently in HEAD by including more efi
support code in XEN kernels for PVHVM).
 1.18.6.1 18-Sep-2019  martin Pull up following revision(s) (requested by manu in ticket #1380):

sys/arch/x86/acpi/acpi_machdep.c: revision 1.27,1.28 (patch)

Attempt to obtain ACPI RSDP from the hypervisor for Xen PV
There are three possible ways of obtaining the ACPI RSDP

- From Extended BIOS Data Area (EBDA) when kernel or Xen was booted from
BIOS bootstrap
- From EFI SystemTable when kernel is booted from EFI bootstrap
- When Xen is booted from EFI bootstrap, EBDA is not mapped, and EFI
SystemTable is not passed to the kernel. The only way to go is to
obtain ACPI RSDP through a hypercall.

Note: EFI bootstrap support for booting Xen has not yet been committed.

Cast physical addresses via uintptr_t to ACPI_PHYSICAL_ADDRESS to deal
with all size variants of the types used here in different builds.
 1.19.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.19.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.19.2.1 10-Jun-2019  christos Sync with HEAD
 1.26.2.1 17-Sep-2019  martin Pull up following revision(s) (requested by manu in ticket #204):

sys/arch/x86/acpi/acpi_machdep.c: revision 1.27
sys/arch/x86/acpi/acpi_machdep.c: revision 1.28

Attempt to obtain ACPI RSDP from the hypervisor for Xen PV

There are three possible ways of obtaining the ACPI RSDP

- From Extended BIOS Data Area (EBDA) when kernel or Xen was booted from
BIOS bootstrap
- From EFI SystemTable when kernel is booted from EFI bootstrap
- When Xen is booted from EFI bootstrap, EBDA is not mapped, and EFI
SystemTable is not passed to the kernel. The only way to go is to
obtain ACPI RSDP through a hypercall.

Note: EFI bootstrap support for booting Xen has not yet been committed.

Cast physical addresses via uintptr_t to ACPI_PHYSICAL_ADDRESS to deal
with all size variants of the types used here in different builds.
 1.30.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.31.6.1 31-May-2021  cjep sync with head
 1.31.4.1 13-May-2021  thorpej Sync with HEAD.
 1.34.2.2 29-Mar-2025  martin Pull up following revision(s) (requested by imil in ticket #1074):

sys/arch/x86/x86/x86_machdep.c: revision 1.155
sys/arch/x86/include/cpu.h: revision 1.137
sys/arch/x86/x86/x86_machdep.c: revision 1.156
sys/arch/x86/include/cpu.h: revision 1.138
sys/arch/x86/x86/consinit.c: revision 1.40
sys/arch/x86/acpi/acpi_machdep.c: revision 1.37
sys/arch/x86/acpi/acpi_machdep.c: revision 1.38
sys/arch/amd64/amd64/machdep.c: revision 1.370
sys/arch/xen/xen/hypervisor.c: revision 1.97
sys/arch/xen/xen/hypervisor.c: revision 1.98
sys/arch/amd64/amd64/genassym.cf: revision 1.98
sys/arch/x86/x86/x86_autoconf.c: revision 1.88
sys/arch/x86/x86/x86_autoconf.c: revision 1.89
sys/arch/amd64/amd64/locore.S: revision 1.226
sys/arch/amd64/amd64/locore.S: revision 1.227
sys/arch/x86/x86/identcpu.c: revision 1.131

Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.

Get one more change from PR kern/57813, needed for non-Xen PVH.

Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
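
A predicate of the kind described above could be sketched like this. It is a stand-alone approximation, not the definition from the tree: the enumeration is an abbreviated, hypothetical subset containing only the two PVH values named in the log plus a placeholder for "not a guest".

```c
#include <stdbool.h>

/* Abbreviated, hypothetical subset of the guest-type enumeration. */
enum vm_guest_type {
	VM_GUEST_NO,
	VM_GUEST_XENPVH,
	VM_GUEST_GENPVH
};

static enum vm_guest_type vm_guest = VM_GUEST_NO;

/* True when running as either kind of PVH guest named in the log. */
static bool
vm_guest_is_pvh(void)
{
	return vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH;
}
```

Keeping the comparison in one helper means a further PVH variant would need a change in one place only, instead of at every open-coded test.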
 1.34.2.1 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #425):

sys/arch/x86/pci/pci_machdep.c: revision 1.96
sys/arch/x86/acpi/acpi_machdep.c: revision 1.36
sys/arch/x86/x86/hyperv.c: revision 1.16
sys/arch/x86/x86/genfb_machdep.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.56
sys/arch/x86/include/genfb_machdep.h: revision 1.6

Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.36.6.1 02-Aug-2025  perseant Sync with HEAD
 1.39.2.1 20-Oct-2025  martin Pull up following revision(s) (requested by riastradh in ticket #66):

sys/arch/x86/include/mpacpi.h: revision 1.12
sys/arch/x86/x86/mpacpi.c: revision 1.112
sys/arch/amd64/conf/ALL: revision 1.194
sys/arch/i386/conf/ALL: revision 1.524
sys/arch/x86/acpi/acpi_machdep.c: revision 1.40
sys/arch/i386/conf/GENERIC: revision 1.1261
sys/dev/acpi/acpi_mcfg.h: revision 1.6
sys/arch/amd64/conf/GENERIC: revision 1.618

x86: Wire up PCI resource manager if enabled.

Enable in your kernel config with `options PCI_RESOURCE'.

Adapted from a patch by mlelstv@.
PR port-amd64/59118: Thinkpad T495s - iwm PCI BAR is zero
 1.2 20-Jun-2011  jruoho branches: 1.2.2; 1.2.4;
Use acpi_match_cpu_handle() from acpi_util.c and only evaluate
the _PDC control method for CPUs that are enabled in the MADT.
 1.1 12-Jun-2011  jruoho Move the evaluation of the _PDC control method out from the acpicpu(4)
driver to the main acpi(4) stack. Follow Linux and evaluate it early.
Should fix PR port-amd64/42895, possibly also PR kern/42583, and many
other comparable bugs.

A common sense explanation is that Intel supplies additional CPU tables to
OEMs. BIOS writers do not bother to modify their DSDTs, but instead load
these extra tables dynamically as secondary SSDT tables. The actual Load()
happens when the _PDC method is invoked, and thus namespace errors occur
when the CPU-specific ACPI methods are not yet present but referenced in the
AML by various drivers, including, but not limited to, acpitz(4).
 1.2.4.2 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.2.4.1 20-Jun-2011  jym file acpi_pdc.c was added on branch jym-xensuspend on 2011-08-27 15:59:49 +0000
 1.2.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.2.2.1 20-Jun-2011  cherry file acpi_pdc.c was added on branch cherry-xenmp on 2011-06-23 14:19:47 +0000
 1.57 19-Oct-2023  bouyer Move definition of acpi_md_vesa_modenum to acpi_wakeup.c; allows building
kernels without framebuffer devices.
Problem reported by John D. Baker on current-users@
 1.56 16-Oct-2023  bouyer Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.55 25-Aug-2023  riastradh xen: Provide definitions or ifdefs to make drm build in XEN3_DOM0.

No idea if it works, but it builds now.

PR port-xen/49330
 1.54 01-Jun-2021  riastradh branches: 1.54.12;
x86: Reset cached tsc in every lwp to 0 on suspend/resume.

This avoids spuriously warning about tsc going backwards, which is to
be expected after a suspend/resume cycle.
 1.53 21-May-2020  ad branches: 1.53.6;
- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.
 1.52 22-Feb-2020  chs remove some unnecessary includes of internal UVM headers.
 1.51 12-Oct-2019  maxv branches: 1.51.2;
Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.50 17-Jun-2019  jmcneill The second parameter to AcpiSetFirmwareWakingVector sets the
X_Firmware_Waking_Vector field (where supported), which will cause firmware
to resume in protected mode. Since our wake code assumes real mode, always
set X_Firmware_Waking_Vector to 0.
 1.49 23-Sep-2017  maxv branches: 1.49.4;
Initialize the errata MSRs when waking up, otherwise they are clear and
we're re-enabling certain CPU bugs.
 1.48 23-Sep-2017  maxv Reinitialize the PAT MSR when waking up, otherwise the write-combined
pages become write-through.
 1.47 19-Sep-2017  maya Remove unused macro
 1.46 10-Aug-2017  maxv Save and restore xcr0 when doing ACPI sleeps. Should fix PR/49174.
 1.45 20-Oct-2016  maxv branches: 1.45.8;
There is a huge fpu synchronization issue here.

When the remote CPUs receive the ACPI sleep IPI, they do not save the fpu
state of the lwp they are executing. The problem is, when waking up they
reinitialize the registers of their local fpu and go back to their lwp
directly. Therefore, if an lwp is interrupted while storing data in an fpu
register, that data gets overwritten, which basically means the lwp is
likely to go crazy when resuming execution.

Fix this by simply saving the fpu state correctly. This way when going to
sleep the state is stored in the lwp's pcb and CR0_TS is set, so the next
time the lwp wants to use the fpu we'll get a dna, and the state will be
restored as expected.

While here, don't forget to reenable interrupts (and the spl) if an error
occurs.
 1.44 20-Oct-2016  maxv Reload the MSRs on the original cpu on i386 - looks like I forgot this part
in my rev1.41. Technically it does not change anything, since the only MSR
is NOX and it is already reloaded in the trampoline.
 1.43 07-Oct-2016  skrll Don't include sys/cdefs.h and __KERNEL_RCSID twice... once is enough.
 1.42 20-Sep-2016  maya use a value of hw.acpi.sleep.vbios that might actually
work for any real hardware suspend.

stop dragging feet through the ground in PR kern/50781
 1.41 27-Jul-2016  maxv Call cpu_init_msrs on i386 when waking up. Currently it does not change
anything, since MSR_EFER is already enabled earlier. But if we add new
MSRs in the future, we will want them when waking up as well.
 1.40 24-Jul-2016  maxv The MSR EFER state is not saved and restored when sleeping on i386. On PAE,
the CPU crashes right after waking up, since it needs to access NOX-ed
pages, which are to be enabled in an MSR.

Fix this by properly saving and restoring the EFER MSR. It's a little
tricky since the wakeup code uses %edx, but rdmsr overwrites it. We just
save it in %esi.

Now, the CPU sleeps properly on PAE kernels.
 1.39 18-Aug-2015  christos branches: 1.39.2;
dup the argument of the wakeup vector. XXX: is that correct?
 1.38 25-Feb-2014  pooka branches: 1.38.6;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.37 19-Feb-2014  dsl Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.
 1.36 11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.35 26-Jan-2014  dsl Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
 1.34 01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.33 23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.32 26-Aug-2012  jakllsch branches: 1.32.2; 1.32.4;
It turns out we're actually waiting for other processors to be unbusy, not busy.
Unbreaks ACPI suspend on uniprocessor. Probably fixes unnoticed bugs on MP.
Needs pullup to netbsd-6.
 1.31 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.30 10-Apr-2012  jruoho Now that 6.0 is branched, remove the ACPI-related sysctl nodes in machdep.
 1.29 01-Jul-2011  dyoung branches: 1.29.2; 1.29.6; 1.29.8;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.28 16-Feb-2011  jruoho Explicitly re-enable the SCI interrupt when the wakeup starts (and before
interrupts are enabled). A workaround for a BIOS bug. Fixes the interrupt
storm reported by Taylor R. Campbell in PR # 44581.
 1.27 13-Jan-2011  jruoho branches: 1.27.2; 1.27.4;
Add a comment.
 1.26 31-Dec-2010  jruoho Move the ACPI sleep-specific sysctl variables to hw.acpi.sleep. The old
machdep-variables are provided for backwards compatibility (eventually these
should be removed). All ACPI sysctl variables are now under hw.acpi.
 1.25 29-Jul-2010  jruoho Remove the custom enter_s4_with_bios(). Use ACPICA's native
AcpiEnterSleepStateS4bios() instead. Minimum functional change.

ok jmcneill@
 1.24 28-Jul-2010  jruoho Use acpi_eval_set_integer(), KNF. No functional change.
 1.23 14-Apr-2010  jruoho UINT32 -> uint32_t; UINT8 -> uint8_t.
 1.22 11-Apr-2010  jruoho Use CTLTYPE_BOOL.
 1.21 28-Feb-2010  jruoho branches: 1.21.2;
Use native functions instead of polluting the namespace with ACPICA-macros.
 1.20 07-Nov-2009  cegger branches: 1.20.2;
Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.19 26-Oct-2009  cegger kill extra whitespaces
reviewed by tsutsui@
 1.18 02-Sep-2009  joerg Be a bit more noisy by telling the user VGA_POST is missing in the
kernel config when trying machdep.acpi_vbios_reset=2.
 1.17 02-Sep-2009  joerg Don't allow machdep.acpi_vbios_reset=2 if option VGA_POST is missing.
 1.16 24-Aug-2009  jmcneill Pass the VBE mode number from the bootloader to the kernel, and then
make the ACPI wakecode aware of it. Restore the desired VBE mode on resume
when acpi_vbios_reset=1, so suspend/resume with genfb console will work.
 1.15 18-Aug-2009  jmcneill Switch to ACPICA 20090730, and update for API changes.
 1.14 27-Mar-2009  drochner Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)
 1.13 18-Mar-2009  cegger bcopy -> memcpy
 1.12 26-Feb-2009  drochner sync TSC on resume (because CPUs were switched off in the meantime),
otherwise we get diverging timecounters leading to e.g. the monotonic
clock jumping backwards
(pullup candidate)
 1.11 17-Nov-2008  joerg branches: 1.11.4;
On resume-from-RAM explicitly restore PCI link device state before
reenabling interrupts. At least one BIOS doesn't do this automatically
as reported by Christoph Egger.
 1.10 23-Sep-2008  joerg branches: 1.10.2; 1.10.4;
Explicitly disable all GPEs and clear fixed events before enabling
interrupts. This is the first part of PR 38683.
 1.9 19-Sep-2008  jmcneill Revert previous.
 1.8 10-Sep-2008  jmcneill PR# 38683 - T61 cannot suspend with recent kernels

Don't restore spl until after AcpiLeaveSleepState.
 1.7 31-Jul-2008  joerg machdep.acpi_vbios_reset = 2 --> vga_pci_resume will use x86emu to do a
POST when options VGA_POST is present.
 1.6 11-May-2008  ad branches: 1.6.4;
Share cpu.h between the x86 ports.
 1.5 28-Apr-2008  martin branches: 1.5.2;
Remove clause 3 and 4 from TNF licenses
 1.4 03-Apr-2008  jmcneill branches: 1.4.2; 1.4.4;
Disable machdep.acpi_beep_on_reset by default.
 1.3 30-Jan-2008  ad branches: 1.3.6;
splhigh == splipi
 1.2 15-Jan-2008  joerg branches: 1.2.2;
Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@
 1.1 18-Dec-2007  joerg branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
 1.1.8.3 23-Mar-2008  matt sync with HEAD
 1.1.8.2 09-Jan-2008  matt sync with HEAD
 1.1.8.1 18-Dec-2007  matt file acpi_wakeup.c was added on branch matt-armv6 on 2008-01-09 01:49:45 +0000
 1.1.6.3 19-Jan-2008  bouyer Sync with HEAD
 1.1.6.2 02-Jan-2008  bouyer Sync with HEAD
 1.1.6.1 18-Dec-2007  bouyer file acpi_wakeup.c was added on branch bouyer-xeni386 on 2008-01-02 21:51:17 +0000
 1.1.4.2 26-Dec-2007  ad Sync with head.
 1.1.4.1 18-Dec-2007  ad file acpi_wakeup.c was added on branch vmlocking2 on 2007-12-26 21:38:48 +0000
 1.1.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.2.1 18-Dec-2007  mjf file acpi_wakeup.c was added on branch mjf-devfs on 2008-02-18 21:05:16 +0000
 1.2.2.3 04-Feb-2008  yamt sync with head.
 1.2.2.2 21-Jan-2008  yamt sync with head
 1.2.2.1 15-Jan-2008  yamt file acpi_wakeup.c was added on branch yamt-lazymbuf on 2008-01-21 09:40:05 +0000
 1.3.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.3.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.3.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.4.4.6 11-Aug-2010  yamt sync with head.
 1.4.4.5 11-Mar-2010  yamt sync with head
 1.4.4.4 16-Sep-2009  yamt sync with head
 1.4.4.3 19-Aug-2009  yamt sync with head.
 1.4.4.2 04-May-2009  yamt sync with head.
 1.4.4.1 16-May-2008  yamt sync with head.
 1.4.2.1 18-May-2008  yamt sync with head.
 1.5.2.4 10-Oct-2008  skrll Sync with HEAD.
 1.5.2.3 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.5.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.5.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.6.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.6.4.1 19-Oct-2008  haad Sync with HEAD.
 1.10.4.2 24-Mar-2009  snj Pull up following revision(s) (requested by drochner in ticket #589):
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.12
sync TSC on resume (because CPUs were switched off in the meantime),
otherwise we get diverging timecounters leading to e.g. the monotonic
clock jumping backwards
(pullup candidate)
 1.10.4.1 25-Nov-2008  snj Pull up following revision(s) (requested by joerg in ticket #125):
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.11
sys/dev/acpi/acpi_pci_link.c: revision 1.14
sys/dev/acpi/acpivar.h: revision 1.34
On resume-from-RAM explicitly restore PCI link device state before
reenabling interrupts. At least one BIOS doesn't do this automatically
as reported by Christoph Egger.
 1.10.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.10.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.10.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.11.4.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.11.4.5 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.11.4.4 10-Jan-2011  jym Sync with HEAD
 1.11.4.3 24-Oct-2010  jym Sync with HEAD
 1.11.4.2 01-Nov-2009  jym Sync with HEAD.
 1.11.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.20.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.20.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.21.2.2 05-Mar-2011  rmind sync with head
 1.21.2.1 30-May-2010  rmind sync with head
 1.27.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.27.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.29.8.2 03-Sep-2012  riz Pull up following revision(s) (requested by jakllsch in ticket #529):
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.32
It turns out we're actually waiting for other processors to be unbusy, not busy.
Unbreaks ACPI suspend on uniprocessor. Probably fixes unnoticed bugs on MP.
Needs pullup to netbsd-6.
 1.29.8.1 09-May-2012  riz branches: 1.29.8.1.2;
Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.29.8.1.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.29.6.1 29-Apr-2012  mrg sync to latest -current.
 1.29.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29.2.3 30-Oct-2012  yamt sync with head
 1.29.2.2 23-May-2012  yamt sync with head.
 1.29.2.1 17-Apr-2012  yamt sync with head
 1.32.4.1 18-May-2014  rmind sync with head
 1.32.2.2 03-Dec-2017  jdolecek update from HEAD
 1.32.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.38.6.4 28-Aug-2017  skrll Sync with HEAD
 1.38.6.3 05-Dec-2016  skrll Sync with HEAD
 1.38.6.2 05-Oct-2016  skrll Sync with HEAD
 1.38.6.1 22-Sep-2015  skrll Sync with HEAD
 1.39.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.39.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.39.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.45.8.1 04-May-2018  martin Pull up following revision(s) (requested by maya in ticket #784):
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.46
Save and restore xcr0 when doing ACPI sleeps. Should fix PR/49174.
 1.49.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.49.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.51.2.1 29-Feb-2020  ad Sync with head.
 1.53.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.54.12.2 20-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #432):

sys/arch/x86/x86/genfb_machdep.c: revision 1.23 (patch)
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.57 (patch)

Move definition of acpi_md_vesa_modenum to acpi_wakeup.c; allows building
kernels without framebuffer devices.

Problem reported by John D. Baker on current-users@
 1.54.12.1 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #425):

sys/arch/x86/pci/pci_machdep.c: revision 1.96
sys/arch/x86/acpi/acpi_machdep.c: revision 1.36
sys/arch/x86/x86/hyperv.c: revision 1.16
sys/arch/x86/x86/genfb_machdep.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.56
sys/arch/x86/include/genfb_machdep.h: revision 1.6

Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.3 18-Jan-2009  hans Use sed, awk and hexdump from tools to make this work on Solaris. Ok by apb.
 1.2 09-Dec-2007  jmcneill branches: 1.2.6; 1.2.8; 1.2.18; 1.2.26; 1.2.28;
How did these get lost?
 1.1 07-Sep-2007  jmcneill branches: 1.1.2; 1.1.8; 1.1.10; 1.1.12;
file genwakecode.sh was initially added on branch jmcneill-pm.
 1.1.12.1 11-Dec-2007  yamt sync with head.
 1.1.10.1 26-Dec-2007  ad Sync with head.
 1.1.8.1 27-Dec-2007  mjf Sync with HEAD.
 1.1.2.2 07-Sep-2007  joerg Try to format the output a bit nicer. Drop FreeBSD CVS ID -- this
doesn't have much in common with the FreeBSD version.
 1.1.2.1 07-Sep-2007  jmcneill Share ACPI wakecode generation between i386 and amd64, and convert amd64
to use joerg's new build scripts for generating wakecode.
 1.2.28.1 27-Mar-2009  msaitoh Pull up following revision(s) (requested by sketch in ticket #536):
etc/Makefile: revision 1.364
Makefile: revision 1.267
usr.sbin/postinstall/postinstall: revision 1.90
usr.bin/hexdump/parse.c: revision 1.25
sys/arch/x86/acpi/genwakecode.sh: revision 1.3
usr.sbin/postinstall/postinstall: revision 1.87
usr.sbin/postinstall/postinstall: revision 1.88
usr.sbin/postinstall/postinstall: revision 1.89
sys/arch/x86/acpi/Makefile.wakecode.inc: revision 1.4
sys/conf/Makefile.kern.inc: revision 1.120
Use ll instead of non-standard q as length modifier in format strings. Makes
this work on Solaris. OK by apb.
Not every grep knows -q. Ok by apb.
Use sed, awk and hexdump from tools to make this work on Solaris. Ok by apb.
Use awk and grep host tools where required. 'build.sh release' now
works on Solaris (but only with HOST_CC=/usr/sfw/bin/gcc for now).
"grep -q" is not portable; use "grep >/dev/null" instead. Also add a
comment saying that postinstall is invoked during a cross build.
In file_exists_exact(), fix an incorrect test of "1" instead of "$1",
and improve the comment explaining what this function does.
As long as we don't yet have a working TOOL_GREP, fgrep is more portable than grep -F.
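The length-modifier change noted in this pull-up can be illustrated with a small sketch. The `q` modifier is a BSD extension, whereas `ll` is standard C99, so a format string written with `ll` also works with a libc that lacks `q`. The helper name `format_offset` and the zero-padded hex layout are assumptions made for illustration; they are not taken from hexdump's parse.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from hexdump): format a 64-bit offset as
 * zero-padded hex using the standard C99 "ll" length modifier.
 * The BSD-only spelling would be "%08qx", which some libcs reject.
 * Returns the number of characters snprintf would have written.
 */
int
format_offset(char *buf, size_t len, unsigned long long off)
{
	return snprintf(buf, len, "%08llx", off);
}
```

Calling `format_offset(buf, sizeof buf, 4096ULL)` fills `buf` with the eight-character hex form of the offset on any C99 libc.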
 1.2.26.1 19-Jan-2009  skrll Sync with HEAD.
 1.2.18.1 04-May-2009  yamt sync with head.
 1.2.8.2 21-Jan-2008  yamt sync with head
 1.2.8.1 09-Dec-2007  yamt file genwakecode.sh was added on branch yamt-lazymbuf on 2008-01-21 09:40:05 +0000
 1.2.6.2 09-Jan-2008  matt sync with HEAD
 1.2.6.1 09-Dec-2007  matt file genwakecode.sh was added on branch matt-armv6 on 2008-01-09 01:49:45 +0000
 1.4 21-Aug-2025  imil PR 59568: Disable viogpu in MICROVM kernel configuration as it is not
supported for now.
 1.3 08-May-2025  imil branches: 1.3.2; 1.3.4;
Rename BOOTCYCLETIME kernel option and subsequent files to BOOT_DURATION
 1.2 06-May-2025  imil Add BOOTCYCLETIME option to print kernel boot time

Introduce a new kernel option, BOOTCYCLETIME, which will print
the time taken for the kernel to boot on (for now) amd64 and i386
architectures.
 1.1 28-Mar-2025  imil x86: consolidate MICROVM kernel configurations

Move common configuration options from amd64/conf/MICROVM and
i386/conf/MICROVM into a shared x86/conf/MICROVM.common file.
 1.3.4.2 02-Aug-2025  perseant Sync with HEAD
 1.3.4.1 08-May-2025  perseant file MICROVM.common was added on branch perseant-exfatfs on 2025-08-02 05:56:16 +0000
 1.3.2.1 25-Aug-2025  martin Pull up following revision(s) (requested by imil in ticket #13):

sys/arch/x86/conf/MICROVM.common: revision 1.4

PR 59568: Disable viogpu in MICROVM kernel configuration as it is not
supported for now.
 1.126 14-Jun-2023  rin Make PCI_ADDR_FIXUP depend on PCI_BUS_FIXUP.
It is no-op if PCI_BUS_FIXUP is missing.
 1.125 28-Oct-2022  skrll MI PMAP EFI_RUNTIME support
 1.124 24-Sep-2022  riastradh x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.
 1.123 30-Aug-2022  riastradh x86: Rename x86/efi.c -> x86/efi_machdep.c.

Avoid collision with dev/efi.c.
 1.122 21-Jul-2021  jmcneill x86's platform.c no longer has any x86 specific code in it, so move it to
dev/smbios_platform.c to let other ports use it
 1.121 21-Jul-2021  jmcneill Separate MI smbios interface from MD specific code.
 1.120 27-Oct-2020  ryo branches: 1.120.6;
move vmt(4) from MD to MI, and add vmt support on aarch64. tested on ESXi-Arm Fling

- move from sys/arch/x86/x86/{vmt.c,vmtreg.h,vmtvar.h} to sys/dev/vmt/{vmt_subr.c,vmtreg.h,vmtvar.h},
and split the attach part of the cpufeaturebus and fdt
- add aarch64 vmware backdoor op
- add include guard to vmt{reg,var}.h
- There is still some little-endian dependency; it needs to be fixed in order to work properly on aarch64eb
 1.119 01-Aug-2020  jdolecek defflag NO_PCI_MSI_MSIX
 1.118 25-Jul-2020  riastradh Implement ChaCha with SSE2 on x86 machines.

Slightly disappointed that it only doubles, rather than quadruples,
throughput on my Ivy Bridge laptop. Worth investigating.
 1.117 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.116 29-Jun-2020  riastradh New permutation-based AES implementation using SSSE3.

This covers a lot of CPUs -- particularly lower-end CPUs over the
past decade which lack AES-NI.

Derived from Mike Hamburg's public domain vpaes software; see
<https://crypto.stanford.edu/vpaes/> for details.
 1.115 29-Jun-2020  riastradh New SSE2-based bitsliced AES implementation.

This should work on essentially all x86 CPUs of the last two decades,
and may improve throughput over the portable C aes_ct implementation
from BearSSL by

(a) reducing the number of vector operations in sequence, and
(b) batching four rather than two blocks in parallel.

Derived from BearSSL's aes_ct64 implementation adjusted so that where
aes_ct64 uses 64-bit q[0],...,q[7], aes_sse2 uses (q[0], q[4]), ...,
(q[3], q[7]), each tuple representing a pair of 64-bit quantities
stacked in a single 128-bit register. This translation was done very
naively, and mostly reduces the cost of ShiftRows and data movement
without doing anything to address the S-box or (Inv)MixColumns, which
spread all 64-bit quantities across separate registers and ignore the
upper halves.

Unfortunately, SSE2 -- which is all that is guaranteed on all amd64
CPUs -- doesn't have PSHUFB, which would help out a lot more. For
example, vpaes relies on that. Perhaps there are enough CPUs out
there with PSHUFB but not AES-NI to make it worthwhile to import or
adapt vpaes too.

Note: This includes local definitions of various Intel compiler
intrinsics for gcc and clang in terms of their __builtin_* &c.,
because the necessary header files are not available during the
kernel build. This is a kludge -- we should fix it properly; the
present approach is expedient but not ideal.
 1.114 29-Jun-2020  riastradh Add AES implementation with VIA ACE.
 1.113 29-Jun-2020  riastradh padlock(4): Remove legacy rijndael API use.

This doesn't actually need to compute AES -- it just needs the
standard AES key schedule, so use the BearSSL constant-time key
schedule implementation.

XXX Compile-tested only.
XXX The byte-order business here seems highly questionable.
 1.112 29-Jun-2020  riastradh Add x86 AES-NI support.

Limited to amd64 for now. In principle, AES-NI should work in 32-bit
mode, and there may even be some 32-bit-only CPUs that support
AES-NI, but that requires work to adapt the assembly.
 1.111 06-May-2020  bouyer x86/x86/ipi.c should not be built for XENPV, even if dom0ops is defined.
 1.110 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.109 22-Apr-2020  rin Make crypto/rijndael optional again as cprng_strong no longer
depends on it. Dependency is explicitly declared in files.foo if a
component requires it.
 1.108 21-Apr-2020  msaitoh Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency; the Intel SDM gives the values for some processors.
- It was also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in the future.
- Add comments to clarify.
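The CPUID 0x15 arithmetic described in this entry can be sketched as follows. CPUID leaf 0x15 reports the denominator of the TSC/core crystal clock ratio in EAX, the numerator in EBX, and the crystal frequency in Hz in ECX (zero when not enumerated). The function name and the caller-supplied `fallback_crystal_hz` parameter are assumptions for illustration, standing in for the per-model values the commit says are taken from the Intel SDM; this is not the code in identcpu_subr.c.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the TSC frequency in Hz from the three
 * CPUID leaf 0x15 registers. If ECX does not enumerate the crystal
 * frequency, a caller-supplied fallback (e.g. a per-model table value)
 * is used instead. Returns 0 when the frequency cannot be derived.
 */
uint64_t
tsc_hz_from_cpuid15(uint32_t eax_den, uint32_t ebx_num,
    uint32_t ecx_crystal_hz, uint64_t fallback_crystal_hz)
{
	uint64_t crystal_hz;

	crystal_hz = (ecx_crystal_hz != 0) ? ecx_crystal_hz : fallback_crystal_hz;
	if (eax_den == 0 || ebx_num == 0 || crystal_hz == 0)
		return 0;

	/* TSC Hz = crystal Hz * numerator / denominator */
	return crystal_hz * ebx_num / eax_den;
}
```

With a 24 MHz crystal and a ratio of 176/2, the sketch yields 2,112,000,000 Hz, whether the crystal frequency comes from ECX or from the fallback.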
 1.107 15-Feb-2019  nonaka branches: 1.107.4; 1.107.10;
Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.
 1.106 25-Dec-2018  mlelstv Make ipmi driver available to other platforms.
Add ACPI attachment.
 1.105 20-Dec-2018  cherry Enable 'options NO_PCI_MSI_MSIX' to DTRT in x86 builds.

Document 'options NO_PCI_MSI_MSIX' in options(4).
 1.104 07-Dec-2018  maxv Add an option to have a static kernel memory layout. This option is
disabled by default - that is to say, KASLR remains enabled by default.
 1.103 16-Jul-2018  maxv Move
arch/x86/x86/tprof_pmi.c
arch/x86/x86/tprof_amdpmi.c
into
dev/tprof/tprof_x86_intel.c
dev/tprof/tprof_x86_amd.c
 1.102 13-Jul-2018  maxv Remove the X86PMC code I had written, replaced by tprof. Many defines
become unused in specialreg.h, so remove them. We don't want to add
defines all the time, there are countless PMCs on many generations, and
it's better to just inline the event/unit values.
 1.101 22-May-2018  maxv branches: 1.101.2;
Mmh, don't compile spectre.c on Xen.
 1.100 01-May-2018  pgoyette Make MPVERBOSE a defparam rather than defflag. It has multiple
non-zero usages within mpacpi.c
 1.99 28-Mar-2018  maxv Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.
 1.98 18-Mar-2018  christos Separate the compat code in its own file to facilitate module building.
 1.97 01-Mar-2018  mrg branches: 1.97.2;
move the imc code into x86/pci/files.pci so that pci is defined in time.
 1.96 01-Mar-2018  pgoyette Move the imc(4) and imcsmb(4) sources into architecture-specific
directory (for previous CVS history see the sys/dev/pci/imcsmb/
Attic)
 1.95 01-Mar-2018  pgoyette Replace spaces with tabs (xterm copy-&-pasto)
 1.94 01-Mar-2018  pgoyette Move the imc and imcsmb stuff out of general files.pci and into the
architecture-specific files.x86

Should unbreak the sgimips build.
 1.93 11-Feb-2018  maxv Move SVS into x86/svs.c
 1.92 22-Jan-2018  jdolecek rename sys/arch/x86/x86/pmap_tlb.c to sys/arch/x86/x86/x86_tlb.c, so that
x86 can eventually use uvm/pmap/pmap_tlb.c; step to future PCID support
 1.91 08-Jan-2018  christos Make things compile again.
 1.90 15-Aug-2017  maxv Merge into x86/.
 1.89 15-Aug-2017  maxv Merge into x86/.
 1.88 10-Mar-2017  maxv branches: 1.88.6;
Move pmc.c into x86/, it can be shared with amd64.
 1.87 27-Feb-2016  tls branches: 1.87.2; 1.87.4;
Add cpu_rng, a framework for simple on-CPU random number generators.
 1.86 28-Jan-2016  christos Add support for grub to find the ACPI root table pointer via a bootinfo entry
from grub.
From: https://mail-index.netbsd.org/tech-kern/2014/05/22/msg017119.html
 1.85 11-Nov-2015  skrll Split out the pmap_pv_track stuff for use by others.

Discussed with riastradh@
 1.84 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.83 10-Oct-2014  uebayasi branches: 1.83.2;
Normalize: acpicpu depends on acpi.
 1.82 10-Oct-2014  uebayasi Define "machdep" attribute and mark files (in amd64 and x86).
 1.81 18-Mar-2014  riastradh Merge riastradh-drm2 to HEAD.
 1.80 17-Jul-2013  soren A few "isa" files are so tightly integrated into the x86 platform code
as to not really be part of the optional isa bus autoconf machinery.

Allows configuring a kernel like so:

include "arch/amd64/conf/GENERIC"
no isa
 1.79 29-Aug-2012  drochner branches: 1.79.2; 1.79.4; 1.79.10;
Extend the CPU microcode update framework to support Intel x86 CPUs.
Contrary to the AMD implementation, it doesn't use xcalls to distribute
the update to all CPUs but relies on cpuctl(8) to bind itself to the
right CPU -- to keep it simple and avoid possible problems with
hyperthreading.
Also, it doesn't parse the vendor supplied file to pick the right
part for the present CPU model but relies on userland to prepare
files with specific filenames. I'll commit a pkg for this in a minute
(pkgsrc/sysutils/intel-microcode).
The ioctl interface changed; compatibility is provided (should be
limited to COMPAT_NETBSD6 as soon as this is available).
 1.78 07-May-2012  jym Merge i386 and amd64 version of db_memrw.c.

Use this opportunity to skip calculating the VA of the page. Let the CPU
deal with the invalidation itself through invlpg + destination address to
avoid converting between canonical/non canonical forms.
 1.77 13-Jan-2012  martin Make option CPU_UCODE global
 1.76 13-Jan-2012  cegger Support CPU microcode loading via cpuctl(8).
Implemented and enabled via CPU_UCODE kernel config option
for x86 and Xen Dom0.
Tested on different AMD machines with different
CPU families.

ok wiz@ for the manpages
ok releng@
ok core@ via releng@
 1.75 19-Oct-2011  dyoung branches: 1.75.2; 1.75.6;
Don't link pci_ranges.c with x86 kernels for now, it's using a
pcibus_attach_args member that I haven't added, yet.
 1.74 17-Oct-2011  jmcneill vmt needs sysmon_taskq
 1.73 17-Oct-2011  jmcneill add a port of the VMware Tools driver vmt(4) from OpenBSD
 1.72 29-Aug-2011  dyoung Add pci_ranges.c to the set of files compiled when 'pci' is configured.
 1.71 12-Jun-2011  jruoho Follow IA-64 with the x86-specific ACPI MD functions and move these where
they belong. Remove an unused function. Minor KNF. No functional change.
 1.70 12-Jun-2011  jruoho Move the evaluation of the _PDC control method out from the acpicpu(4)
driver to the main acpi(4) stack. Follow Linux and evaluate it early.
Should fix PR port-amd64/42895, possibly also PR kern/42583, and many
other comparable bugs.

A common sense explanation is that Intel supplies additional CPU tables to
OEMs. BIOS writers do not bother to modify their DSDTs, but instead load
these extra tables dynamically as secondary SSDT tables. The actual Load()
happens when the _PDC method is invoked, and thus namespace errors occur
when the CPU-specific ACPI methods are not yet present but referenced in the
AML by various drivers, including, but not limited to, acpitz(4).
 1.69 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.68 31-May-2011  dyoung branches: 1.68.2;
Don't use the C preprocessor to configure USERCONF. Instead, either do
or do not link in subr_userconf.c and x86_userconf.c.

Provide no-op stubs for userconf_bootinfo(), userconf_init(), and
userconf_prompt().

Delete all occurrences of #include "opt_userconf.h" as well as USERCONF
and __HAVE_USERCONF_BOOTINFO #ifdef'age.
 1.67 10-Apr-2011  christos Merge db_trace for x86. From: Vladimir Kirillov proger at wilab dot org dot ua
 1.66 04-Apr-2011  dyoung As pointed out by Manuel Bouyer and Taylor R Campbell, I forgot to
commit the change to files.x86 that adds x86_stub.c, so do that.
 1.65 16-Mar-2011  jakllsch sys/arch/x86/x86/iclockmod.c has been removed.
 1.64 04-Mar-2011  jruoho Move INTEL_ONDEMAND_CLOCKMOD -- or odcm(4) -- to the cpufeaturebus.
 1.63 27-Feb-2011  jruoho Move acpicpu(4) from "acpinodebus" to "cpufeaturebus".
 1.62 24-Feb-2011  jruoho Move VIA_C7TEMP to the cpufeaturebus.
 1.61 24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.60 23-Feb-2011  jruoho Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.
 1.59 20-Feb-2011  jruoho Modularize coretemp(4). Ok jmcneill@.
 1.58 19-Feb-2011  jmcneill modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module
 1.57 05-Feb-2011  yamt decouple tprof and its backends.
 1.56 18-Jul-2010  jruoho branches: 1.56.2; 1.56.4;
Merge a driver for ACPI CPUs with basic support for processor power states,
also known as C-states. The code is modular and provides an easy way to add
the remaining functionality later (namely throttling and P-states).

Remarks:

1. Commented out in the GENERICs; more testing exposure is needed.

2. The C3-state is disabled for the time being because it turns off
timers, among them the local APIC timer. This may not be universally
true on all x86 processors; define ACPICPU_ENABLE_C3 to test.

3. The algorithm used to choose a power state may need tuning. When
evaluating the appropriate state, the implementation uses the
previous sleep time as an indicator. Additional hints would include
for example the system load.

Also bus master activity is evaluated when choosing a state. The
usb(4) stack is notorious for such activity even when unused.
Typically it must be disabled in order to reach the C3-state,
but it may also prevent the use of C2.

4. While no extensive empirical measurements have been carried out, the
power savings are somewhere between 1-2 W with C1 and C2, depending
on the processor, firmware, and load. With C3 even up to 4 W can be
saved. The less something ticks, the more power is saved.

ok jmcneill@, joerg@, and discussed with various people.
 1.55 08-Jul-2010  rmind Unify i386 and amd64 procfs MD code into x86.
 1.54 05-Oct-2009  rmind branches: 1.54.2; 1.54.4;
Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.53 02-Oct-2009  jmcneill Add support for VIA C7 temperature sensors (options VIA_C7TEMP)
 1.52 30-Apr-2009  rmind Move x86 CPU topology detection code into the separate file (as it was originally).
OK by <yamt>.
 1.51 17-Apr-2009  dyoung Introduce sys/arch/x86/x86/mp.c for common x86 MP configuration code.
mpacpi_scan_pci() and mpbios_scan_pci() are identical code, so replace
them with mp_pci_scan().

Introduce mp_pci_childdetached(), which helps us to detach root PCI
buses that were enumerated either by MP BIOS or by ACPI.

Let us detach and re-attach PCI buses from mainbus0 on i386. This is
necessarily a work-in-progress, because testing detach and re-attach
is very difficult: to detach and re-attach the entire PCI tree on most
x86 computers that I own is not possible because some essential device
attaches under the PCI subtree: the console, com0, NIC, or storage
controller always attaches in the PCI tree.
 1.50 16-Apr-2009  rmind - Add macros to handle (some) trapframe registers for common x86 code.
- Merge i386 and amd64 syscall.c into x86. No functional changes intended.

Proposed on (port-i386 & port-amd64). Unfortunately, I cannot merge these
lists into the single port-x86. :(
 1.49 07-Apr-2009  dyoung Add opt_intrdebug.h for the INTRDEBUG option, and #include it here and
there. Fixes GENERIC/i386 compilation with 'options INTRDEBUG'.
 1.48 30-Mar-2009  rmind Merge i386 and amd64 vm_machdep.c into x86. No functional changes intended.
Note: some #ifdefs will be removed with macros.
 1.47 30-Mar-2009  rmind Merge/move core_machdep.c into x86, no difference between i386 and amd64.
 1.46 24-Feb-2009  yamt - rewrite x86 nmi dispatcher so that establish and disestablish are safe
on a running system.
- adapt existing users of the api. (elan)
- adapt tprof_pmi driver to use the api.
 1.45 17-Feb-2009  jmcneill PR# port-i386/37026: userconf(4) doesn't work with vesafb(4)

Add early console support for x86 genfb.
 1.44 03-Aug-2008  joerg branches: 1.44.2; 1.44.4; 1.44.8; 1.44.12;
Move some MD declarations from x86/pci/files.pci to x86/conf/files.x86,
so that Xen can use the former.

Drop Xen's pcib.c in favor of the x86 code and thereby unbreak ichlpcib.
 1.43 11-May-2008  ad branches: 1.43.4;
Simplify x86 identcpu code, and share between i386/amd64.
 1.42 09-May-2008  joerg Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
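The selection rule in this entry (Xen routine under Xen, MWAIT when supported and the CPU is not AMD, HLT otherwise) can be sketched with a function pointer behind a macro. All names here (`struct cpu_traits`, `idle_hook`, `select_idle`, and the stub routines) are hypothetical and chosen for illustration; the stubs do nothing, where real routines would issue a hypercall, MWAIT or HLT.

```c
#include <stdbool.h>

/* Hypothetical description of the boot CPU; field names are illustrative. */
struct cpu_traits {
	bool is_xen;	/* running as a Xen guest */
	bool has_mwait;	/* CPU advertises MONITOR/MWAIT */
	bool is_amd;	/* vendor is AMD */
};

typedef void (*idle_fn_t)(void);

/* Stub idle routines standing in for the real ones. */
static void idle_xen(void) { }
static void idle_mwait(void) { }
static void idle_halt(void) { }

/* The idle loop calls through this pointer; HLT is the safe default. */
static idle_fn_t idle_hook = idle_halt;

#define cpu_idle()	((*idle_hook)())

/* Pick the idle routine following the rule stated in the log entry. */
static idle_fn_t
select_idle(const struct cpu_traits *t)
{
	if (t->is_xen)
		return idle_xen;
	if (t->has_mwait && !t->is_amd)
		return idle_mwait;
	return idle_halt;
}
```

Early startup code would assign `idle_hook = select_idle(&traits)` once; after that, `cpu_idle()` costs one indirect call and the idle loop needs no per-call feature checks.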
 1.41 25-Apr-2008  ad branches: 1.41.2; 1.41.4;
Include null IPI functions if !MULTIPROCESSOR.
 1.40 01-Jan-2008  yamt branches: 1.40.6; 1.40.8;
try to detect processor resource sharing topologies, i.e. package/core/smt IDs.
 1.39 26-Dec-2007  yamt - share idt entry allocation code among x86.
- introduce a function to reserve an idt entry and use it instead of
manipulating idt_allocmap directly.
- rename idt to xen_idt for amd64 xen. add missing #ifdef XEN.
 1.38 25-Dec-2007  joerg Add initial version of calling VGA POST from vga_resume. This is the
equivalent to "vbetool post" using x86emu in the kernel.
 1.37 18-Dec-2007  joerg Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
 1.36 09-Dec-2007  jmcneill branches: 1.36.2;
Merge jmcneill-pm branch.
 1.35 03-Dec-2007  ad branches: 1.35.2; 1.35.4;
Interrupt handling changes, in discussion since February:

- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
 1.34 07-Nov-2007  ad __cpu_simple_locks really should be simple, otherwise they can cause
problems for e.g. profiling.
 1.33 29-Oct-2007  xtraeme branches: 1.33.2;
Add coretemp(4). A new driver for Intel Core's on-die thermal sensor,
available on Intel Core or newer CPUs.

Ported from FreeBSD. Tested by rmind on i386 and joerg on amd64.

Enabled with "options INTEL_CORETEMP".
 1.32 26-Oct-2007  xtraeme - Share pchb(4) between i386 and amd64; one copy is enough for both.
- Move some of the x86 PCI devices into x86/pci/files.pci.
- Add more x86 stuff into x86/conf/files.x86.

ok joerg.
 1.31 17-Oct-2007  garbled Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.30 03-Jun-2007  xtraeme branches: 1.30.8; 1.30.10; 1.30.14;
Make the Enhanced Speedstep driver available for i386 and amd64.
It can be used on EM64T CPUs supporting the EST CPUID feature. Note that
some CPUs still don't work with this driver, like Xeon or Pentium 4.

Move the p[34]_get_bus_clock functions into their own file,
intel_busclock.c and remove this code from i386/identcpu.c.

Tested on i386 by myself and amd64 by Tonerre.
 1.29 16-Apr-2007  ad branches: 1.29.2;
Share the sysarch stuff between the x86 ports. PR kern/36046.
 1.28 20-Mar-2007  xtraeme Driver for Intel Thermal Monitor (feature TM) On-Demand Clock
Modulation.

This works by changing the duty cycle of the clock modulation,
and saves power and helps to not increase the temperature by
software.

Adapted from OpenBSD/FreeBSD's p4tcc.

To enable it one must use "options INTEL_ONDEMAND_CLOCKMOD".

Tested by me in UP and SMP, ok'ed by Matthew R. Green.
 1.27 20-Mar-2007  xtraeme MSR read and write IPI handlers for x86. A MSR will be read or written
in all CPUs available in the system. This adds another member
to struct cpu_info, ci_msr_rvalue; it will contain the value of the MSR
in a previous operation.

Tested with clockmod in UP and SMP by me, tested with est in SMP
by Daniel Carosone and Michael Van Elst.

Ok'ed by Andrew Doran and Matthew R. Green.
 1.26 15-Mar-2007  xtraeme Ok... there were people really angry with this, backing it out.
 1.25 15-Mar-2007  xtraeme Add a driver for the Pentium 4 and later models with feature TM
(Thermal Monitor).

This driver will throttle the CPU clock modulation, saving some
power, also known as ODMC (On Demand Modulation Clock).

The processor can change from 12.5% to 100% (there are two errata,
so two levels might be skipped in the worst case).

If supported, you'll see the following sysctl sub-tree:

machdep.p4tcc.throttling.target: CPU Clock throttling state (0 = lowest, 7 = highest)
machdep.p4tcc.throttling.current: current CPU throttling state
machdep.p4tcc.throttling.available: list of CPU Clock throttling states

machdep.p4tcc.throttling.target = 2
machdep.p4tcc.throttling.current = 2
machdep.p4tcc.throttling.available = 7 6 5 4 3 2

Adapted from OpenBSD/FreeBSD.
 1.24 05-Mar-2007  drochner branches: 1.24.2; 1.24.4; 1.24.6;
clean up how cpus and ioapics are attached at the mainbus:
Separate "cpubus" and "ioapicbus" -- while they share a common "address
space" (the apic id), the kernel doesn't use this fact. Different
data are passed to cpus and apics, which caused some ugly polymorphism. This
also saves the special "submatch" functions needed to distinguish cpus
and ioapics for autoconf. (And it makes that "apid" locators wired
in the kernel configuration are honored now; this allows one to dumb down
an mp box to singleprocessor by userconfig.)
Print "apid" locators in the buses "print" function "as everyone does",
so the per-port cpu drivers don't need to do it.
Being here, constify "struct cpu_functions" and g/c the unused MP_PICMODE
flag.
 1.23 17-Feb-2007  daniel branches: 1.23.2;
Add an opencrypto provider for the AES xcrypt instructions found on VIA
C5P and later cores (also known as 'ACE', which is part of the VIA PadLock
security engine). Ported from OpenBSD.

Reviewed on tech-crypto and port-i386, no objections to committing this.
 1.22 09-Feb-2007  ad Merge newlock2 to head.
 1.21 01-Jan-2007  ad Report on and where possible, try to work around some of the known errata
for Athlon 64 and Opteron processors. Tested briefly by cube@ and elad@.
 1.20 01-Oct-2006  bouyer branches: 1.20.2;
Add ipmi(4) driver, from OpenBSD. This requires SMBios support, so add
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible through envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
 1.19 07-Aug-2006  xtraeme branches: 1.19.4; 1.19.6;
* Do not change struct powernow_pst_s (I added another member in my
previous patch) and this MUST be of that size, otherwise the tables
won't be found.

* powernow_k8.c moved into x86/x86; it should work on both i386 and amd64.

* Added more DPRINTFs needed to find the first problem.

* Create "machdep.powernow.frequency" again, I can't remember why I
removed frequency... it should work with estd now.

* Do not try to call k[78]_powernow_init() if cpu is not AMD (thanks
to christos).

And more things I can't remember, but this time it will work in
Athlon 64 cpus and it won't crash in EM64T cpus.
 1.18 06-Aug-2006  xtraeme AMD PowerNow!/Cool`n'Quiet driver for NetBSD/amd64,
adapted from OpenBSD.

Tested on a few machines:

http://bigbird.dohd.org:3021/NetBSD/dmesg
http://www.bsd.org.il/netbsd/acpi/dmesg

Thanks to cube, elad and others for testing and fixes.

Enabled by default on GENERIC.
 1.17 04-Jul-2006  christos Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.16 03-Feb-2006  bouyer branches: 1.16.4; 1.16.12;
Move interrupt-related PCI functions from pci_machdep.c to
pci_intr_machdep.c. In Xen-3, register access is done the normal way but
interrupts need custom setup. Proposed on port-amd64, port-i386 and
port-xen a week ago.
 1.15 30-Dec-2005  jmmv branches: 1.15.2; 1.15.4;
Add a 'struct bootinfo' to represent the bootinfo structure used in the
kernel by x86 platforms (instead of a simple char *). This way, the code
in, e.g., lookup_bootinfo, is a bit easier to understand.

While here, move the lookup_bootinfo function used in x86 platforms (amd64,
i386 and xen) to a common file (x86/x86_machdep.c), as it was exactly the
same in all of them.
 1.14 11-Dec-2005  christos merge ktrace-lwp.
 1.13 03-Jul-2005  cube branches: 1.13.2;
Move definitions for PCI_*_FIXUP to files.x86 so that ACPI compiles for
amd64...
 1.12 20-Oct-2004  thorpej branches: 1.12.10;
Move boot device detection code from i386 and amd64 ports to x86_autoconf.c.
Rename i386_alldisks and x86_64_alldisks to x86_alldisks, adjust other
references to compensate.
 1.11 30-Aug-2004  drochner Phase out the use of a string as first "attach args" member to control
which bustype should be attached with a specific call to config_found()
(from a "mainbus" or a bus bridge).
Do it for isa/eisa/mca and pci/agp for now. These buses all attach to
an mi interface attribute "isabus", "eisabus" etc., and the autoconf
framework now allows specifying an interface attribute on config_found()
and config_search(), which limits the search for matching config data
to entries which attach to that specific attribute.
So we basically have to call config_found_ia(..., "foobus", ...) where
such a bus is attached.
As a consequence, where a "mainbus" or the like also attaches other
devices (e.g. CPUs) which do not attach to a specific attribute yet,
we need to at least pass an attribute name (different from "foobus") so
that the foo bus is not found at these places. This made some minor
changes necessary which are not obviously related to the mentioned buses.
 1.10 08-Oct-2003  bouyer pciide_machdep.c depends on pciide_common, not pciide.
Pointed out and fix tested by Marc Recht.
 1.9 06-Sep-2003  fvdl Move the bulk of pci_intr_string into a separate intr_string function. Use
that new function to print the pciide compat interrupt in pciide_machdep.c.
Share pciide_machdep.c between amd64 and i386.
 1.8 29-May-2003  fvdl branches: 1.8.2;
Add the options MPBIOS_SCANPCI and MPACPI_SCANPCI to configure PCI roots
with the MPBIOS/ACPI bus information, by walking through the buses, and
descending down every bus that hasn't been marked configured yet.
 1.7 11-May-2003  fvdl Update for acpi file move to sys/arch/x86/x86.
 1.6 25-Apr-2003  fvdl Share some common cache info cpuid code between i386 and x86_64.
 1.5 12-Mar-2003  thorpej Split bus_space and bus_dma into separate files.
 1.4 01-Mar-2003  fvdl lock_machdep.c moved here from arch/i386/i386.
 1.3 27-Feb-2003  fvdl Add consinit.c
 1.2 27-Feb-2003  fvdl Catch up with isa_machdep.c and pci_machdep.c move.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.8.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.8.2.5 02-Nov-2004  skrll Sync with HEAD.
 1.8.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.8.2.2 03-Sep-2004  skrll Sync with HEAD
 1.8.2.1 03-Aug-2004  skrll Sync with HEAD
 1.12.10.1 08-Jan-2007  ghen Pull up following revision(s) (requested by bouyer in ticket #1621):
sys/arch/i386/conf/GENERIC: revision 1.787 via patch
share/man/man4/Makefile: revision 1.407 via patch
distrib/sets/lists/man/mi: revision 1.936 via patch
share/man/man4/ipmi.4: revision 1.1 via patch
sys/arch/i386/i386/bios32.c: revision 1.11 via patch
sys/dev/DEVNAMES: revision 1.221 via patch
sys/arch/x86/x86/ipmi.c: revision 1.1 via patch
sys/arch/i386/i386/mainbus.c: revision 1.65 via patch
sys/arch/x86/include/smbiosvar.h: revision 1.1 via patch
sys/arch/x86/include/ipmivar.h: revision 1.1 via patch
sys/arch/x86/conf/files.x86: revision 1.20 via patch
sys/arch/i386/conf/files.i386: revision 1.293 via patch
Add ipmi(4) driver, from OpenBSD. This requires SMBios support, so add
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible through envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
Add manpage for new ipmi driver.
Claim ipmi.
 1.13.2.8 21-Jan-2008  yamt sync with head
 1.13.2.7 07-Dec-2007  yamt sync with head
 1.13.2.6 15-Nov-2007  yamt sync with head.
 1.13.2.5 27-Oct-2007  yamt sync with head.
 1.13.2.4 03-Sep-2007  yamt sync with head.
 1.13.2.3 26-Feb-2007  yamt sync with head.
 1.13.2.2 30-Dec-2006  yamt sync with head.
 1.13.2.1 21-Jun-2006  yamt sync with head.
 1.15.4.1 09-Sep-2006  rpaulo sync with head
 1.15.2.1 18-Feb-2006  yamt sync with head.
 1.16.12.1 13-Jul-2006  gdamore Merge from HEAD.
 1.16.4.1 11-Aug-2006  yamt sync with head
 1.19.6.1 22-Oct-2006  yamt sync with head
 1.19.4.3 27-Jan-2007  ad If running on a PPro or later, at boot patch in versions of spllower() and
similar that use cmpxchg8b instead of cli/sti. Cuts the clock cycles for
splx() by a factor of ~6 on the P4, and ~3 on the PIII when bracketed by
serializing instructions (and hopefully more when not).
 1.19.4.2 12-Jan-2007  ad Sync with head.
 1.19.4.1 18-Nov-2006  ad Sync with head.
 1.20.2.2 12-Sep-2007  msaitoh Pull up following patches (requested by xtraeme in ticket #809)

share/man/man4/options.4 patch
sys/arch/i386/conf/files.i386 patch
sys/arch/i386/i386/est.c delete
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/include/cpuvar.h patch
sys/arch/x86/x86/est.c new file
sys/arch/x86/x86/intel_busclock.c new file
sys/arch/amd64/amd64/identcpu.c patch
sys/arch/amd64/conf/GENERIC patch

Add support for the VIA C7-M and Eden processors in the Enhanced
Speedstep driver.
amd64: The Enhanced Speedstep driver is now able to work on EM64T
CPUs running in 64bit mode.
 1.20.2.1 20-Apr-2007  bouyer branches: 1.20.2.1.2;
Pull up following revision(s) (requested by mlelstv in ticket #575):
sys/arch/i386/i386/est.c sync with 1.37
sys/arch/i386/i386/ipifuncs.c sync with 1.16
sys/arch/x86/include/cpu_msr.h sync with 1.4
sys/arch/x86/include/intrdefs.h sync with 1.8
sys/arch/x86/include/powernow.h sync with 1.9
sys/arch/x86/x86/powernow_k8.c sync with 1.20
sys/arch/x86/x86/msr_ipifuncs.c sync with 1.8
sys/arch/amd64/amd64/ipifuncs.c sync with 1.9
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/i386/machdep.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/x86/x86_machdep.c patch
sys/arch/amd64/amd64/machdep.c patch
Add MSR write IPI handler for x86. Use it and the RUN_ONCE framework
to make est and powernow drivers work properly with SMP.
 1.20.2.1.2.1 23-Sep-2007  wrstuden Sync with somewhat-recent netbsd-4.
 1.23.2.4 07-May-2007  yamt sync with head.
 1.23.2.3 24-Mar-2007  yamt sync with head.
 1.23.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.23.2.1 17-Feb-2007  rmind file files.x86 was added on branch yamt-idlelwp on 2007-03-12 05:51:45 +0000
 1.24.6.1 29-Mar-2007  reinoud Pullup to -current
 1.24.4.1 11-Jul-2007  mjf Sync with head.
 1.24.2.6 03-Dec-2007  ad Sync with HEAD.
 1.24.2.5 03-Dec-2007  ad Sync with HEAD.
 1.24.2.4 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.24.2.3 09-Jun-2007  ad Sync with head.
 1.24.2.2 27-May-2007  ad Sync with head.
 1.24.2.1 10-Apr-2007  ad Sync with head.
 1.29.2.1 26-Jun-2007  garbled Sync with HEAD.
 1.30.14.1 13-Nov-2007  bouyer Sync with HEAD
 1.30.10.3 09-Jan-2008  matt sync with HEAD
 1.30.10.2 08-Nov-2007  matt sync with -HEAD
 1.30.10.1 06-Nov-2007  matt sync with HEAD
 1.30.8.6 09-Dec-2007  jmcneill Sync with HEAD.
 1.30.8.5 11-Nov-2007  joerg Sync with HEAD.
 1.30.8.4 29-Oct-2007  joerg Sync with HEAD.
 1.30.8.3 28-Oct-2007  joerg Sync with HEAD.
 1.30.8.2 04-Sep-2007  joerg Move common PCI devices on i386 and amd64 into arch/x86/pci/files.pci.
 1.30.8.1 03-Aug-2007  jmcneill Pull in power management changes from private branch.
 1.33.2.4 18-Feb-2008  mjf Sync with HEAD.
 1.33.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.33.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.33.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.35.4.1 11-Dec-2007  yamt sync with head.
 1.35.2.1 26-Dec-2007  ad Sync with head.
 1.36.2.1 02-Jan-2008  bouyer Sync with HEAD
 1.40.8.1 18-May-2008  yamt sync with head.
 1.40.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.40.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.41.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.41.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.41.2.4 11-Aug-2010  yamt sync with head.
 1.41.2.3 11-Mar-2010  yamt sync with head
 1.41.2.2 04-May-2009  yamt sync with head.
 1.41.2.1 16-May-2008  yamt sync with head.
 1.43.4.1 19-Oct-2008  haad Sync with HEAD.
 1.44.12.1 21-Apr-2010  matt sync to netbsd-5
 1.44.8.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.44.8.5 02-May-2011  jym Sync with head.
 1.44.8.4 29-Mar-2011  jym More sync fixes. And add the mbr_gpt files.
 1.44.8.3 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.44.8.2 01-Nov-2009  jym Sync with HEAD.
 1.44.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.44.4.3 21-Nov-2010  riz Pull up following revision(s) (requested by hubertf in ticket #1403):
sys/arch/x86/conf/files.x86: revision 1.49
sys/arch/i386/i386/autoconf.c: revision 1.94
sys/arch/x86/x86/intr.c: revision 1.60
Add opt_intrdebug.h for the INTRDEBUG option, and #include it here and
there. Fixes GENERIC/i386 compilation with 'options INTRDEBUG'.
 1.44.4.2 05-Oct-2009  sborrill Pull up the following revisions(s) (requested by jmcneill in ticket #1061):
sys/arch/x86/conf/files.x86: revision 1.53
sys/arch/x86/include/cpuvar.h: revision 1.31
sys/arch/x86/x86/identcpu.c: revision 1.17
sys/arch/x86/x86/viac7temp.c: revision 1.1
sys/arch/i386/conf/ALL: revision 1.218
sys/arch/i386/conf/GENERIC: revision 1.949
Add support for VIA C7 temperature sensors (options VIA_C7TEMP) and enable
in i386 GENERIC kernel.
 1.44.4.1 16-Jun-2009  snj Pull up following revision(s) (requested by rmind in ticket #782):
sys/arch/x86/conf/files.x86: revision 1.52 via patch
sys/arch/x86/include/cpu.h: revision 1.17
sys/arch/x86/x86/cpu_topology.c: revision 1.1
sys/arch/x86/x86/identcpu.c: revision 1.16 via patch
Move x86 CPU topology detection code into the separate file (as it was
originally).
OK by <yamt>.
 1.44.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.44.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.54.4.4 12-Jun-2011  rmind sync with head
 1.54.4.3 21-Apr-2011  rmind sync with head
 1.54.4.2 05-Mar-2011  rmind sync with head
 1.54.4.1 26-May-2010  rmind Split x86 TLB shootdown code into a separate file.
Code part is under TNF license, as per pmap.c 1.105.2.4 revision.
 1.54.2.3 30-Oct-2010  uebayasi xmd_machdep.c is gone.
 1.54.2.2 20-Aug-2010  uebayasi xmd(4) glue for i386. XIP mount panics now.
 1.54.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.56.4.2 05-Mar-2011  bouyer Sync with HEAD
 1.56.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.56.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.68.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.75.6.2 02-Jun-2012  mrg sync to latest -current.
 1.75.6.1 18-Feb-2012  mrg merge to -current.
 1.75.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.75.2.3 30-Oct-2012  yamt sync with head
 1.75.2.2 23-May-2012  yamt sync with head.
 1.75.2.1 17-Apr-2012  yamt sync with head
 1.79.10.1 23-Jul-2013  riastradh sync with HEAD
 1.79.4.1 28-Aug-2013  rmind sync with head
 1.79.2.2 03-Dec-2017  jdolecek update from HEAD
 1.79.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.83.2.4 28-Aug-2017  skrll Sync with HEAD
 1.83.2.3 19-Mar-2016  skrll Sync with HEAD
 1.83.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.83.2.1 06-Jun-2015  skrll Sync with HEAD
 1.87.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.87.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.88.6.3 05-Aug-2020  martin Pull up the following revisions, requested by msaitoh in ticket #1593:

sys/arch/x86/conf/files.x86 1.108
sys/arch/x86/include/apicvar.h 1.7 via patch
sys/arch/x86/include/cpu.h 1.121
sys/arch/x86/x86/cpu.c 1.185 via patch
sys/arch/x86/x86/hyperv.c 1.7
sys/arch/x86/x86/tsc.c 1.41
sys/arch/xen/conf/files.xen 1.181

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
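
A minimal sketch of the arithmetic behind the CPUID-based calibration named in
the 1.88.6.3 entry above, assuming the caller has already executed CPUID and
passes in the raw register values. The function names and the fallback order
are illustrative and are not taken from tsc.c. Leaf 0x15 reports the
TSC/crystal ratio (EBX/EAX) and the crystal frequency in Hz (ECX); leaf 0x16
reports the processor base frequency in MHz, which is used here only as an
approximation of the TSC rate.

```c
#include <stdint.h>

/*
 * CPUID leaf 0x15: EAX = denominator and EBX = numerator of the
 * TSC/crystal clock ratio, ECX = crystal clock frequency in Hz.
 * Returns 0 when any field is zero (information not enumerated).
 */
static uint64_t
tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	if (eax == 0 || ebx == 0 || ecx == 0)
		return 0;
	/* Multiply in 64 bits first so the product cannot overflow. */
	return (uint64_t)ecx * ebx / eax;
}

/* CPUID leaf 0x16: EAX[15:0] = processor base frequency in MHz. */
static uint64_t
base_hz_from_leaf16(uint32_t eax)
{
	return (uint64_t)(eax & 0xffff) * 1000000;
}

/*
 * ASSUMED policy for this sketch: prefer leaf 0x15 and fall back to
 * the leaf 0x16 base frequency when leaf 0x15 is incomplete.
 */
static uint64_t
tsc_hz_estimate(uint32_t l15_eax, uint32_t l15_ebx, uint32_t l15_ecx,
    uint32_t l16_eax)
{
	uint64_t hz = tsc_hz_from_leaf15(l15_eax, l15_ebx, l15_ecx);

	return hz != 0 ? hz : base_hz_from_leaf16(l16_eax);
}
```

For example, a 24 MHz crystal with a ratio of 216/2 gives 2592000000 Hz. If
the crystal frequency is reported as zero, the sketch falls back to the
leaf 0x16 value.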
 1.88.6.2 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support, ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.88.6.1 22-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #652:

sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch)
sys/arch/amd64/amd64/db_machdep.c 1.6 (patch)
sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch)
sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch)
sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch)
sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch)
sys/arch/amd64/amd64/amd64/vector.S upto 1.61 (partial, patch)
sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch)
sys/arch/amd64/conf/kern.ldscript 1.26 (patch)
sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch)
sys/arch/amd64/include/param.h 1.25 (patch)
sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch)
sys/arch/x86/conf/files.x86 1.91,1.93 (patch)
sys/arch/x86/include/cpu.h 1.88,1.89 (patch)
sys/arch/x86/include/pmap.h 1.75 (patch)
sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch)
sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch)
sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch)
sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch)
sys/arch/x86/x86/svs.c 1.1-1.14
sys/arch/xen/conf/files.compat 1.30 (patch)

Backport SVS. Not enabled yet.
 1.97.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.97.2.6 28-Jul-2018  pgoyette Sync with HEAD
 1.97.2.5 25-Jun-2018  pgoyette Sync with HEAD
 1.97.2.4 02-May-2018  pgoyette Synch with HEAD
 1.97.2.3 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.97.2.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.97.2.1 18-Mar-2018  pgoyette Import more christos@ changes from -current
 1.101.2.2 21-Apr-2020  martin Sync with HEAD
 1.101.2.1 10-Jun-2019  christos Sync with HEAD
 1.107.10.3 25-Apr-2020  bouyer sync with bouyer-xenpvh-base2 (HEAD)
 1.107.10.2 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.107.10.1 11-Apr-2020  bouyer Move softint and preemption-related functions out of x86/x86/intr.c to
their own file, x86/x86/x86_softintr.c
Add x86/x86/x86_softintr.c for native and XenPV
Make sure XenPV also checks ci_ioending, which is used for softints.
Switch XenPV to fast softints and allow kernel preemption.
kpreempt_disable() before calling pmap_changeprot_local()
run xen_wallclock_time() and xen_global_systime_ns() at splsched() to
avoid being interrupted.

XXX amd64 lock stubs are racy for XPENDING
 1.107.4.1 15-Jul-2020  martin Pull up the following, requested by msaitoh in ticket #1015

sys/arch/x86/conf/files.x86 1.108 (via patch)
sys/arch/x86/include/apicvar.h 1.7 (via patch)
sys/arch/x86/include/cpu.h 1.121 (via patch)
sys/arch/x86/x86/cpu.c 1.185 (via patch)
sys/arch/x86/x86/hyperv.c 1.7 (via patch)
sys/arch/x86/x86/tsc.c 1.41 (via patch)
sys/arch/xen/conf/files.xen 1.181 (via patch)

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.120.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.26 30-Nov-2024  christos Create a new header lwp_private.h to contain _lwp_getprivate_fast,
_lwp_gettcb_fast, _lwp_settcb and remove them from mcontext.h, so that:
1. we don't need special hacks to hide them
2. we can include <lwp.h> where needed to get the necessary prototypes
without redefining them locally.
 1.25 30-Apr-2021  christos branches: 1.25.20;
Merge the x86 gdt function and constant definitions
 1.24 11-May-2019  christos branches: 1.24.14;
Undo previous, fixed in userland.
 1.23 11-May-2019  christos expose the {rd,wr}msr functions to userland and install the header for
the benefit of cpuctl (fix the build).
 1.22 17-Feb-2018  kamil branches: 1.22.4;
Stop installing dbregs.h

This is now a kernel-only header. The behavior is well specified by the CPU
documents and we don't introduce changes to it.

Noted by <wiz>
 1.21 15-Dec-2016  kamil branches: 1.21.8;
Add support for hardware assisted watchpoints/breakpoints API in ptrace(2)

Add new ptrace(2) calls:
- PT_COUNT_WATCHPOINTS - count the number of available hardware watchpoints
- PT_READ_WATCHPOINT - read struct ptrace_watchpoint from the kernel state
- PT_WRITE_WATCHPOINT - write new struct ptrace_watchpoint state, this
includes enabling and disabling watchpoints

The ptrace_watchpoint structure contains MI and MD parts:

typedef struct ptrace_watchpoint {
        int             pw_index;       /* HW Watchpoint ID (count from 0) */
        lwpid_t         pw_lwpid;       /* LWP described */
        struct mdpw     pw_md;          /* MD fields */
} ptrace_watchpoint_t;

For example amd64 defines MD as follows:
struct mdpw {
        void    *md_address;
        int      md_condition;
        int      md_length;
};

These calls are protected with the __HAVE_PTRACE_WATCHPOINTS guard.

Tested on amd64, initial support added for i386 and XEN.

Sponsored by <The NetBSD Foundation>
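
To make the layout in the 1.21 entry above concrete, here is a small sketch
that fills in a watchpoint description. The two structures are copied from
the commit message so the example builds on its own. The lwpid_t typedef, the
EXAMPLE_COND_WRITE value and the helper watchpoint_describe are assumptions
made for illustration; the real MD condition constants and the ptrace(2)
calling convention are not given in this log and are not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: lwpid_t is a 32-bit integer; defined here for portability. */
typedef int32_t lwpid_t;

/* MD part, as quoted for amd64 in the commit message. */
struct mdpw {
	void	*md_address;
	int	 md_condition;
	int	 md_length;
};

/* MI part, as quoted in the commit message. */
typedef struct ptrace_watchpoint {
	int		pw_index;	/* HW Watchpoint ID (count from 0) */
	lwpid_t		pw_lwpid;	/* LWP described */
	struct mdpw	pw_md;		/* MD fields */
} ptrace_watchpoint_t;

/* Hypothetical condition value; the real constants are not in the log. */
#define EXAMPLE_COND_WRITE	1

/* Hypothetical helper: describe watchpoint `index` on `lwp` at `addr`. */
static void
watchpoint_describe(ptrace_watchpoint_t *pw, int index, lwpid_t lwp,
    void *addr, int condition, int length)
{
	assert(pw != NULL);
	memset(pw, 0, sizeof(*pw));
	pw->pw_index = index;
	pw->pw_lwpid = lwp;
	pw->pw_md.md_address = addr;
	pw->pw_md.md_condition = condition;
	pw->pw_md.md_length = length;
}
```

A debugger would presumably hand a structure filled this way to the
PT_WRITE_WATCHPOINT request; the MI fields select the slot and the LWP, while
the MD fields carry the address, condition and length.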
 1.20 27-Feb-2016  tls branches: 1.20.2;
Add cpu_rng, a framework for simple on-CPU random number generators.
 1.19 11-Feb-2014  dsl branches: 1.19.6;
Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.18 07-Feb-2014  dsl Userspace (especially libkvm) builds better if cpu_extended_state.h
is exported.
 1.17 29-Aug-2012  drochner branches: 1.17.2; 1.17.4;
Extend the CPU microcode update framework to support Intel x86 CPUs.
Contrary to the AMD implementation, it doesn't use xcalls to distribute
the update to all CPUs but relies on cpuctl(8) to bind itself to the
right CPU -- to keep it simple and avoid possible problems with
hyperthreading.
Also, it doesn't parse the vendor supplied file to pick the right
part for the present CPU model but relies on userland to prepare
files with specific filenames. I'll commit a pkg for this in a minute
(pkgsrc/sysutils/intel-microcode).
The ioctl interface changed; compatibility is provided (should be
limited to COMPAT_NETBSD6 as soon as this is available).
 1.16 17-Jul-2011  dyoung branches: 1.16.2;
Good-bye bus.h. Don't install <machine/bus.h>.
 1.15 20-Dec-2010  christos To use x86/cpu.h struct cpu_info from userland, we need via_padlock.h installed.
 1.14 07-Jul-2010  njoly Install x86/pte.h
 1.13 11-May-2008  ad branches: 1.13.12; 1.13.18; 1.13.20;
Share cpu.h between the x86 ports.
 1.12 20-Jan-2008  yamt branches: 1.12.6; 1.12.8; 1.12.10; 1.12.12;
- rewrite P->V tracking.
- use a hash rather than SPLAY trees.
SPLAY tree is a wrong algorithm to use here.
will be revisited if it slows down anything other than
micro-benchmarks.
- optimize the single mapping case (it's a common case) by
embedding an entry into mdpage.
- don't keep a pmap pointer as it can be obtained from ptp.
(discussed on port-i386 some years ago.)
ideally, a single paddr_t should be enough to describe a pte.
but it needs some more thoughts as it can increase computational
costs.
- pmap_enter: simplify and fix races with pmap_sync_pv.
- don't bother to lock pm_obj[i] where i > 0, unless DIAGNOSTIC.
- kill mp_link to save space.
- add many KASSERTs.
 1.11 18-Oct-2007  yamt branches: 1.11.2; 1.11.8;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.
 1.10 16-Apr-2007  ad branches: 1.10.10; 1.10.12; 1.10.14; 1.10.16;
+ x86/sysarch.h
 1.9 09-Feb-2007  ad branches: 1.9.2; 1.9.6; 1.9.8;
Merge newlock2 to head.
 1.8 01-Jan-2007  ad Report on and where possible, try to work around some of the known errata
for Athlon 64 and Opteron processors. Tested briefly by cube@ and elad@.
 1.7 04-Feb-2006  jmmv branches: 1.7.14;
Revert yesterday's change that attempted to fix the detection of the
boot device when using a Multiboot boot loader. It couldn't work because
these boot loaders do not pass a checksum of the disk so matchbiosdisk()
cannot really find any matches. I should have gone to sleep before
committing...

Found by xtraeme@.
 1.6 03-Feb-2006  jmmv branches: 1.6.2;
When booting an i386 kernel with Multiboot, properly detect the boot device
by looking it up in the x86_alldisks table (instead of trying to match it
to 'wd*' manually).

In order to do this, move the cpu_rootconf function from x86 common code
to amd64 and i386 specific one. This way, i386 can do an extra step (call
the appropriate Multiboot code) in the appropriate place (after
x86_matchbiosdisks and before findroot()).
 1.5 22-Oct-2003  kleink branches: 1.5.16; 1.5.30;
Use a common <machine/math.h> for amd64 and i386.
 1.4 26-Apr-2003  fvdl branches: 1.4.2;
Install cacheinfo.h
 1.3 03-Mar-2003  fvdl Install cpuvar.h
 1.2 27-Feb-2003  fvdl Move a few more files to x86/include. Trim the list of files to install
in /usr/include a bit.
 1.1 26-Feb-2003  fvdl Install header files.
 1.4.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.4.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.30.1 09-Sep-2006  rpaulo sync with head
 1.5.16.4 21-Jan-2008  yamt sync with head
 1.5.16.3 27-Oct-2007  yamt sync with head.
 1.5.16.2 03-Sep-2007  yamt sync with head.
 1.5.16.1 26-Feb-2007  yamt sync with head.
 1.6.2.1 22-Apr-2006  simonb Sync with head.
 1.7.14.2 12-Jan-2007  ad Sync with head.
 1.7.14.1 24-Oct-2006  ad Compile fixes
 1.9.8.1 11-Jul-2007  mjf Sync with head.
 1.9.6.2 23-Oct-2007  ad Sync with head.
 1.9.6.1 27-May-2007  ad Sync with head.
 1.9.2.1 07-May-2007  yamt sync with head.
 1.10.16.1 25-Oct-2007  bouyer Sync with HEAD.
 1.10.14.1 08-Oct-2007  yamt merge some parts of x86 pmap.h.
 1.10.12.2 23-Mar-2008  matt sync with HEAD
 1.10.12.1 06-Nov-2007  matt sync with HEAD
 1.10.10.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.11.8.1 20-Jan-2008  bouyer Sync with HEAD
 1.11.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.12.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.12.10.2 11-Aug-2010  yamt sync with head.
 1.12.10.1 16-May-2008  yamt sync with head.
 1.12.8.1 18-May-2008  yamt sync with head.
 1.12.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.20.1 05-Mar-2011  rmind sync with head
 1.13.18.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.13.12.3 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.13.12.2 10-Jan-2011  jym Sync with HEAD
 1.13.12.1 24-Oct-2010  jym Sync with HEAD
 1.16.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.16.2.1 30-Oct-2012  yamt sync with head
 1.17.4.1 18-May-2014  rmind sync with head
 1.17.2.2 03-Dec-2017  jdolecek update from HEAD
 1.17.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.19.6.2 05-Feb-2017  skrll Sync with HEAD
 1.19.6.1 19-Mar-2016  skrll Sync with HEAD
 1.20.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.21.8.1 01-Mar-2018  martin Pull up following revision(s) (requested by kamil in ticket #599):
sys/arch/x86/include/Makefile: revision 1.22
Stop installing dbregs.h
This is now kernel-only header. The behavior is well specified by the CPU
documents and we don't introduce changes to it.
Noted by <wiz>
 1.22.4.1 10-Jun-2019  christos Sync with HEAD
 1.24.14.1 13-May-2021  thorpej Sync with HEAD.
 1.25.20.1 02-Aug-2025  perseant Sync with HEAD
 1.14 22-Dec-2019  thorpej Add acpi_intr_mask() and acpi_intr_unmask() which, following the pre-existing
ACPI software layering model, are wrappers around acpi_md_intr_mask() and
acpi_md_intr_unmask(), which in turn are wrappers around intr_mask() and
intr_unmask().

XXX ARM and IA64 implementations of acpi_md_intr_mask() and
acpi_md_intr_unmask() are just stubs for now.
 1.13 16-Nov-2018  jmcneill Add MD functions for establishing and disestablishing interrupt handlers.
 1.12 20-Mar-2018  bouyer branches: 1.12.2;
Allow registering ACPI interrupt handlers with a xname.
AcpiOsInstallInterruptHandler(), part of ACPICA API, doesn't allow passing
the xname. I extend the API with AcpiOsInstallInterruptHandler_xname()
for this purpose, and change acpi_md_OsInstallInterruptHandler() to
accept and use the xname (ia64 doesn't use it).
The xname was hardcoded to "acpi SCI" in the
x86 acpi_md_OsInstallInterruptHandler(), so I make
AcpiOsInstallInterruptHandler() call
AcpiOsInstallInterruptHandler_xname with xname = "acpi SCI".

Now 'vmstat -i' shows the device's name instead of "acpi SCI" for i2c HID
interrupts.

Proposed on tech-kern@ on Dec 29.
 1.11 23-Sep-2012  chs branches: 1.11.36;
locate PCI buses and determine their bus numbers using the info
previously extracted from ACPICA rather than trying to figure it out again.
allow PCI buses that don't have a _PRT method.
 1.10 12-Jun-2011  jruoho branches: 1.10.2; 1.10.8; 1.10.12;
Follow IA-64 with the x86-specific ACPI MD functions and move these to where
they belong. Remove an unused function. Minor KNF. No functional change.
 1.9 12-Jun-2011  jruoho Move the evaluation of the _PDC control method out from the acpicpu(4)
driver to the main acpi(4) stack. Follow Linux and evaluate it early.
Should fix PR port-amd64/42895, possibly also PR kern/42583, and many
other comparable bugs.

A common sense explanation is that Intel supplies additional CPU tables to
OEMs. BIOS writers do not bother to modify their DSDTs, but instead load
these extra tables dynamically as secondary SSDT tables. The actual Load()
happens when the _PDC method is invoked, and thus namespace errors occur
when the CPU-specific ACPI methods are not yet present but referenced in the
AML by various drivers, including, but not limited to, acpitz(4).
 1.8 13-Jan-2011  jruoho branches: 1.8.6;
Move the function that counts the CPUs from acpicpu(4) to the MD layer.
 1.7 24-Jul-2010  jruoho Revert the previous partially for the time being.
 1.6 24-Jul-2010  jruoho Move ACPI_FLUSH_CPU_CACHE() (a.k.a. WBINVD on x86) to MD headers where it
belongs. Let IA-64 define its own function/instruction instead of
requiring a dummy wbinvd() to satisfy the definition in a MI header.
 1.5 14-Mar-2009  jmcneill branches: 1.5.2; 1.5.4;
Add acpi_md_OsEnableInterrupt, to go with acpi_md_OsDisableInterrupt
 1.4 15-Dec-2007  joerg branches: 1.4.10; 1.4.18; 1.4.24;
Move mapping of the real mode location for the ACPI wakeup code into a
separate function called from acpi_md_callback.
 1.3 09-Dec-2007  jmcneill branches: 1.3.2;
Merge jmcneill-pm branch.
 1.2 02-May-2005  kochi branches: 1.2.2; 1.2.56; 1.2.58; 1.2.68; 1.2.70;
Merge changes for ACPI-CA 20050408.
 1.1 11-May-2003  fvdl branches: 1.1.2;
Moved here from sys/arch/i386/include.
 1.1.2.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.70.1 11-Dec-2007  yamt sync with head.
 1.2.68.1 26-Dec-2007  ad Sync with head.
 1.2.58.1 09-Jan-2008  matt sync with HEAD
 1.2.56.4 02-Oct-2007  jmcneill Update to ACPI-CA 20070320
 1.2.56.3 08-Sep-2007  joerg Now that the real mode pages are statically allocated, make
acpi_wakeup_paddr static and local to acpi_wakeup.c.
 1.2.56.2 08-Sep-2007  joerg Start to revamp the ACPI wake code (i386 only, amd64 gets minimal fixes
to keep being compilable):

- In init386 and the amd64 equivalent, just reserve the low-level code.
Do not map and don't copy the wakecode yet. This avoids the conflicts
with the MP tramp code as well. The wakecode is expected to be less
than one page long, which is way too much space.
acpi_md_get_npages_of_wakecode and acpi_md_install_wakecode are
dropped, acpi_wakeup_paddr is set instead of the reserved address.
- Split the wakecode into the essential low-level part to setup
protected mode with paging and valid CS and DS (which stays as
wakecode) and the rest. Inline beepon and beepoff as they are used
exactly once.
- Split the acpi_restorecpu and acpi_savecpu assembly from acpi_wakeup.c
and merge acpi_restorecpu with the second half dropped from wakecode.
Most registers are not exported, just those needed to be patched into
wakecode. Don't bother to save or restore %eax, it is overridden
anyway.
- Don't bother to save and restore eflags in acpi_md_sleep, they are
handled correctly by the assembly. Don't play games with cr3 either,
we modify the pmap of the running processes. Copy the wakecode
directly before patching it, after the identity mapping has been
setup.
- Drop clear_reg and acpi_printcpu.
- Add a commented-out broadcast IPI to halt the other CPUs explicitly.
 1.2.56.1 23-Aug-2007  joerg From FreeBSD: explicitly load regions first to allow acpi_md_callback
to actually query the routing tables.

Drop the argument to acpi_md_callback, passing around singletons is not
that helpful.
 1.2.2.1 21-Jan-2008  yamt sync with head
 1.3.2.1 02-Jan-2008  bouyer Sync with HEAD
 1.4.24.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.4.24.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.4.24.3 24-Oct-2010  jym Sync with HEAD
 1.4.24.2 01-Nov-2009  jym Sync with HEAD.
 1.4.24.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.4.18.1 28-Apr-2009  skrll Sync with HEAD.
 1.4.10.2 11-Aug-2010  yamt sync with head.
 1.4.10.1 04-May-2009  yamt sync with head.
 1.5.4.1 05-Mar-2011  rmind sync with head
 1.5.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.8.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.10.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.10.8.1 22-Nov-2012  riz Pull up following revision(s) (requested by chs in ticket #683):
sys/arch/ia64/include/acpi_machdep.h: revision 1.6
sys/arch/x86/include/acpi_machdep.h: revision 1.11
sys/dev/acpi/acpi.c: revision 1.255
sys/arch/x86/acpi/acpi_machdep.c: revision 1.4
sys/arch/x86/x86/mpacpi.c: revision 1.95
sys/arch/x86/x86/mpacpi.c: revision 1.96
sys/arch/ia64/acpi/acpi_machdep.c: revision 1.6
locate PCI buses and determine their bus numbers using the info
previously extracted from ACPICA rather than trying to figure it out again.
allow PCI buses that don't have a _PRT method.
as a workaround for PR 47016, call ioapic_reenable() at the end of
ACPI interrupt routing to fix the settings for the SCI interrupt.
the problem is that after my recent changes, the SCI handler is
installed before the MADT info is parsed, so we don't know what
polarity it should have. the real fix for this will be to rearrange
the ACPI initialization so that everything is done in a more sensible
order, but that will take some more time.
 1.10.2.1 30-Oct-2012  yamt sync with head
 1.11.36.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.11.36.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.12.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.12.2.1 10-Jun-2019  christos Sync with HEAD
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.11 02-May-2025  imil Add support for CPUID leaf 0x40000010 to detect TSC and LAPIC frequency on
hypervisors implementing the VMware-defined interface

This change enables virtual machines to obtain TSC and LAPIC frequency
information directly from the hypervisor via CPUID leaf 0x40000010, avoiding
the need for runtime calibration, thus reducing boot time in supported
environments.

Tested on GENERIC and MICROVM kernels, QEMU/KVM and QEMU/NVMM (current and
10.1), Intel and AMD CPUs, NetBSD/amd64 and i386.
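As a rough sketch of what consuming this leaf involves: under the VMware-defined convention, EAX of leaf 0x40000010 is understood to carry the TSC frequency in kHz and EBX the LAPIC (bus) frequency in kHz. The struct and function names below are hypothetical, and the register values are passed in as parameters rather than read with the cpuid instruction, so this is not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HV_TIMING_LEAF 0x40000010u

/* Hypothetical result holder, both frequencies in Hz. */
struct hv_timing {
	uint64_t tsc_hz;
	uint64_t lapic_hz;
};

/*
 * max_hv_leaf: highest hypervisor leaf, as reported in EAX of leaf 0x40000000.
 * eax, ebx:    register values returned by leaf 0x40000010 (assumed kHz).
 * Returns 0 on success, -1 if the leaf is unavailable or reports no TSC value.
 */
int
hv_timing_from_cpuid(uint32_t max_hv_leaf, uint32_t eax, uint32_t ebx,
    struct hv_timing *out)
{
	assert(out != NULL);
	if (max_hv_leaf < HV_TIMING_LEAF || eax == 0)
		return -1;	/* fall back to runtime calibration */
	out->tsc_hz = (uint64_t)eax * 1000;
	out->lapic_hz = (uint64_t)ebx * 1000;
	return 0;
}
```

When the function returns -1, a caller would keep the existing calibration path, which matches the commit's description of the leaf as an optional shortcut.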
 1.10 06-Mar-2025  imil Revert VMware-compatible TSC and LAPIC frequency detection.
 1.9 06-Mar-2025  imil Add support for CPUID leaf 0x40000010, which enables VMware-compatible TSC
and LAPIC frequency detection for virtual machines.
 1.8 25-Apr-2020  bouyer branches: 1.8.26;
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.7 21-Apr-2020  msaitoh Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency; the Intel SDM gives us the values for some of those processors.
- lapic_per_second also had to be changed to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
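The arithmetic behind the CPUID 0x15 path can be sketched as below. Leaf 0x15 reports the TSC/crystal ratio as EBX/EAX and, when enumerated, the crystal frequency in Hz in ECX. The function name and the fallback_crystal_hz parameter are assumptions made for illustration; the fallback stands for a per-model value taken from the Intel SDM when ECX is zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * eax: denominator of the TSC/core crystal clock ratio
 * ebx: numerator of the TSC/core crystal clock ratio
 * ecx: core crystal clock frequency in Hz, or 0 if not enumerated
 * fallback_crystal_hz: hypothetical table value used when ecx is 0
 * Returns the TSC frequency in Hz, or 0 if it cannot be derived.
 */
uint64_t
tsc_hz_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx,
    uint64_t fallback_crystal_hz)
{
	uint64_t crystal_hz = (ecx != 0) ? ecx : fallback_crystal_hz;

	if (eax == 0 || ebx == 0 || crystal_hz == 0)
		return 0;
	return crystal_hz * ebx / eax;
}
```

For example, a 24 MHz crystal with a ratio of 176/2 yields 24000000 * 176 / 2 = 2112000000 Hz.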
 1.6 14-Jun-2019  msaitoh branches: 1.6.2; 1.6.8;
- Dump LAPIC and I/O APIC correctly.
- Don't print redirect target on LAPIC.
- Fix DEST_MASK:
- DEST_MASK is not 1 bit but 2 bits.
- Add missing "\0"s to print decoded name correctly.
- Support both LAPIC and I/O APIC correctly in apic_format_redir().
- Improve output of some bits using snprintb()'s "F\B\1" and ":\V".
 1.5 28-Apr-2008  martin branches: 1.5.80; 1.5.88;
Remove clause 3 and 4 from TNF licenses
 1.4 05-Mar-2007  drochner branches: 1.4.40; 1.4.42; 1.4.44;
clean up how cpus and ioapics are attached at the mainbus:
Separate "cpubus" and "ioapicbus" -- while they share a common "address
space" (the apic id), the kernel doesn't use this fact. Different data
are passed to cpus and apics, which caused some ugly polymorphism. This
also saves the special "submatch" functions needed to distinguish cpus
and ioapics for autoconf. (It also means that "apid" locators wired
in the kernel configuration are now honored; this allows one to dumb down
an mp box to singleprocessor by userconfig.)
Print "apid" locators in the buses "print" function "as everyone does",
so the per-port cpu drivers don't need to do it.
Being here, constify "struct cpu_functions" and g/c the unused MP_PICMODE
flag.
 1.3 29-May-2005  christos branches: 1.3.2; 1.3.34;
Sprinkle const.
 1.2 27-Oct-2003  junyoung Nuke __P().
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.34.1 12-Mar-2007  rmind Sync with HEAD.
 1.3.2.1 03-Sep-2007  yamt sync with head.
 1.4.44.1 16-May-2008  yamt sync with head.
 1.4.42.1 18-May-2008  yamt sync with head.
 1.4.40.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.88.2 21-Apr-2020  martin Sync with HEAD
 1.5.88.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.5.80.1 05-Aug-2020  martin Pull up the following revisions, requested by msaitoh in ticket #1593:

sys/arch/x86/conf/files.x86 1.108
sys/arch/x86/include/apicvar.h 1.7 via patch
sys/arch/x86/include/cpu.h 1.121
sys/arch/x86/x86/cpu.c 1.185 via patch
sys/arch/x86/x86/hyperv.c 1.7
sys/arch/x86/x86/tsc.c 1.41
sys/arch/xen/conf/files.xen 1.181

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.6.8.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.6.2.1 15-Jul-2020  martin Pull up the following, requested by msaitoh in ticket #1015

sys/arch/x86/conf/files.x86 1.108 (via patch)
sys/arch/x86/include/apicvar.h 1.7 (via patch)
sys/arch/x86/include/cpu.h 1.121 (via patch)
sys/arch/x86/x86/cpu.c 1.185 (via patch)
sys/arch/x86/x86/hyperv.c 1.7 (via patch)
sys/arch/x86/x86/tsc.c 1.41 (via patch)
sys/arch/xen/conf/files.xen 1.181 (via patch)

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.8.26.1 02-Aug-2025  perseant Sync with HEAD
 1.6 24-May-2019  nonaka Added drivers for Hyper-V Synthetic Keyboard and Video device.
 1.5 22-Dec-2018  cherry This change modifies the mainbus(4) entry point for all x86 sub-archs
in the following way:

i) It provides a unified entry point in
x86/x86/mainbus.c:mainbus_attach()
ii) It carves out the preliminary bus attachment sequence that is
common to all sub-archs into
x86/x86/mainbus.c: x86_cpubus_attach()
iii) It consolidates the remaining pathways as internal callee
functions so that these may be called piecemeal if required. A
special usecase of this is XEN PVHVM which may need to call the
native configure path, the xen configure path, or both.
iv) It moves the driver private data structures from
i386/i386_mainbus.c to an x86/ level one. This allows for other
sub-archs to do similar, if needed. (They do not at the moment).
v) For dom0 kernels, it enables 'acpi0 at mainbus?' and
'acpi0 at hypervisorbus'. This serves two purposes:
a) To demonstrate the possibility of dynamic configuration tree
traversal ordering changes.
b) To allow for the common acpi_check(self, "acpibus") call in
x86/mainbus.c to not barf when it is called from the dom0 attach
path. We allow for the acpi0 device to be a child of mainbus with
the changes to amd64/conf/XEN3_DOM0 and i386/conf/XEN3PAE_DOM0
without actually probing further in the code. This path will later
be pursued in a PVHVM boot codepath.

There should be no operative changes with this change. If there are,
please complain loudly.
 1.4 21-Sep-2016  jmcneill branches: 1.4.8; 1.4.14; 1.4.16;
Set hw.acpi.sleep.vbios when a non-HW accelerated VGA driver attaches.
If the VGA_POST option is present in the kernel the default value is 2,
otherwise 1. PR kern/50781

Reviewed by: agc, mrg
 1.3 18-Oct-2011  dyoung branches: 1.3.12; 1.3.30; 1.3.34;
Define some optional routines that will help device_register() to
register ISA & PCI devices. Add stub implementations of the routines.
 1.2 04-Feb-2006  jmmv Revert yesterday's change that attempted to fix the detection of the
boot device when using a Multiboot boot loader. It couldn't work because
these boot loaders do not pass a checksum of the disk so matchbiosdisk()
cannot really find any matches. I should have gone to sleep before
committing...

Found by xtraeme@.
 1.1 03-Feb-2006  jmmv branches: 1.1.2;
When booting an i386 kernel with Multiboot, properly detect the boot device
by looking it up in the x86_alldisks table (instead of trying to match it
to 'wd*' manually).

In order to do this, move the cpu_rootconf function from x86 common code
to amd64 and i386 specific one. This way, i386 can do an extra step (call
the appropriate Multiboot code) in the appropriate place (after
x86_matchbiosdisks and before findroot()).
 1.1.2.1 22-Apr-2006  simonb Sync with head.
 1.3.34.1 04-Nov-2016  pgoyette Sync with HEAD
 1.3.30.1 05-Oct-2016  skrll Sync with HEAD
 1.3.12.1 03-Dec-2017  jdolecek update from HEAD
 1.4.16.1 10-Jun-2019  christos Sync with HEAD
 1.4.14.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.4.8.1 12-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1280):

sys/arch/x86/x86/consinit.c: revision 1.29
sys/dev/hyperv/vmbusvar.h: revision 1.2
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/arch/x86/x86/x86_autoconf.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.91
sys/arch/x86/x86/hyperv.c: revision 1.2
sys/arch/x86/x86/hyperv.c: revision 1.3
sys/arch/x86/x86/hyperv.c: revision 1.4
sys/arch/i386/conf/GENERIC: revision 1.1207
sys/dev/wscons/wsconsio.h: revision 1.123
sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.528
sys/dev/hyperv/files.hyperv: revision 1.2
sys/arch/x86/include/autoconf.h: revision 1.6
sys/dev/hyperv/hyperv_common.c: revision 1.2
sys/arch/xen/x86/autoconf.c: revision 1.23
sys/arch/x86/pci/pci_machdep.c: revision 1.86
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hypervvar.h: revision 1.2
sys/dev/acpi/vmbus_acpi.c: revision 1.2
sys/dev/hyperv/vmbus.c: revision 1.3
sys/dev/hyperv/hvkbdvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.5 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.4 25-Dec-2007  perry branches: 1.4.6; 1.4.8; 1.4.10;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.3 04-Mar-2007  christos branches: 1.3.20; 1.3.26; 1.3.28; 1.3.32;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.2 27-Oct-2003  junyoung branches: 1.2.16; 1.2.54;
Nuke __P().
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.2.54.1 12-Mar-2007  rmind Sync with HEAD.
 1.2.16.2 21-Jan-2008  yamt sync with head
 1.2.16.1 03-Sep-2007  yamt sync with head.
 1.3.32.1 02-Jan-2008  bouyer Sync with HEAD
 1.3.28.1 26-Dec-2007  ad Sync with head.
 1.3.26.1 18-Feb-2008  mjf Sync with HEAD.
 1.3.20.1 09-Jan-2008  matt sync with HEAD
 1.4.10.1 16-May-2008  yamt sync with head.
 1.4.8.1 18-May-2008  yamt sync with head.
 1.4.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.32 30-Apr-2025  imil Introduce pvh_boot boolean to identify the real hypervisor when booting in PVH
mode.

As of now, sys/arch/x86/x86/identcpu.c / identify_hypervisor() returns in the
case of vm_guest being VM_GUEST_GENPVH, yet this VM type is not an actual
hypervisor but information recorded in locore.S to drive the boot method.
We need to investigate what type of hypervisor is really running the VM in
order to apply specifics, so instead of relying on vm_guest_is_pvh() which only
checks for VM_GUEST_XENPVH || VM_GUEST_GENPVH, pvh_boot informs on the boot
method while allowing to identify the real hypervisor.

Idea ok'd by bouyer@, tested on Xen domU, Xen dom0 with GENERIC PVH and
qemu GENERIC PVH boot.
 1.31 20-Aug-2022  riastradh branches: 1.31.10;
x86/bootinfo.h: Add include guard.
 1.30 21-Jun-2019  nonaka PR/54147: Increase BOOTINFO_MAXSIZE to 16KiB.

Some systems require a larger bootinfo size for memory descriptors.
 1.29 13-Apr-2018  nonaka branches: 1.29.2;
x86: Increase BOOTINFO_MAXSIZE to 8KiB.

Proposed on port-i386 and port-amd64 with no objections:
http://mail-index.netbsd.org/port-i386/2018/04/11/msg003692.html
http://mail-index.netbsd.org/port-amd64/2018/04/11/msg002697.html
 1.28 09-Nov-2017  christos branches: 1.28.2;
add "prekern" to the string list.
 1.27 07-Oct-2017  maxv Add a new option in libsa, to load dynamic binaries. A separate function
is used, and it does not break in any way the generic static loader. Then,
add a new "pkboot" command in the x86 bootloader, which boots a
GENERIC_KASLR kernel via the prekern. (See thread on tech-kern@.)
 1.26 14-Feb-2017  nonaka branches: 1.26.6;
x86: add e820 memory type.
 1.25 24-Jan-2017  nonaka Initial commit of native amd64 EFI boot loader.
 1.24 28-Jan-2016  christos branches: 1.24.2; 1.24.4;
Add support for grub to find the ACPI root table pointer via a bootinfo entry
from grub.
From: https://mail-index.netbsd.org/tech-kern/2014/05/22/msg017119.html
 1.23 30-Aug-2013  jmcneill branches: 1.23.6;
Add support for using a raw file-system image as memory disk root with
the x86 bootloader.
 1.22 16-May-2013  christos branches: 1.22.2;
Complete the dosparts -> mbrparts conversion. Only x68k now uses dosparts
because it also uses struct dos_partition.
 1.21 16-May-2013  christos Complete the dosparts -> mbrparts conversion. Only x68k now uses dosparts
because it also uses struct dos_partition.
 1.20 16-May-2013  christos Complete the dosparts -> mbrparts conversion. Only x68k now uses dosparts
because it also uses struct dos_partition.
 1.19 28-Nov-2011  tls branches: 1.19.8;

Add support for passing saved entropy (random seed file) to the kernel
from the bootloader. This can fix the problem of poor quality keys
for other kernel modules which call arc4random() early in kernel startup
(NFS startup, in particular, causes this).

We continue to rely on the etc/rc.d/random_seed script to save entropy
to the seed file at shutdown and erase the seed file at startup.

Boot loader support implemented only for i386 and amd64 ports for now but
it should be easy for other ports to do the same or similar.
 1.18 26-May-2011  uebayasi branches: 1.18.4;
Support userconf(4) command in boot(8)/boot.cfg(5) on i386/amd64.

From jmmv@, no objections seen in the proposed thread:

http://mail-index.netbsd.org/tech-kern/2009/01/22/msg004081.html
 1.17 06-Feb-2011  jmcneill add BI_MODULE_IMAGE boot module type
 1.16 24-Aug-2009  jmcneill branches: 1.16.4; 1.16.6; 1.16.8;
Pass the VBE mode number from the bootloader to the kernel, and then
make the ACPI wakecode aware of it. Restore the desired VBE mode on resume
when acpi_vbios_reset=1, so suspend/resume with genfb console will work.
 1.15 16-Feb-2009  jmcneill Kernel-side modifications for framebuffer console support on i386 and amd64.

* New BTINFO_FRAMEBUFFER kernel parameter to pass screen configuration
* Early attach support for framebuffer console
* Pass BTINFO_FRAMEBUFFER parameters to genfb in device_register
* Provide hooks to genfb to set VGA DAC palette in 8bpp mode
 1.14 09-Sep-2008  tron branches: 1.14.2; 1.14.8;
Remove duplicate definition of "bootinfo" structure.
Patch provided by Juan RP in PR kern/39495.
 1.13 02-May-2008  ad branches: 1.13.2; 1.13.6;
- Give x86 BIOS boot the ability to load new style modules and pass them
into the kernel. Based on a patch by jmcneill@, with many fixes and
improvements by me.

- Put MEMORY_DISK_DYNAMIC and MODULAR into the GENERIC kernels, so that
you can load miniroot.kmod from the boot blocks and boot into the
installer!
 1.12 25-Dec-2007  perry branches: 1.12.6; 1.12.8; 1.12.10;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.11 03-Feb-2006  jmmv branches: 1.11.46; 1.11.52; 1.11.56; 1.11.60;
Implement support for 'The Multiboot Specification' so that i386 kernels
can be booted directly from Multiboot-compliant boot loaders (e.g. GRUB).
See the added multiboot(8) manual page for more information.

No objections in tech-kern@; only positive comments.
 1.10 30-Dec-2005  jmmv branches: 1.10.2; 1.10.4;
Add a 'struct bootinfo' to represent the bootinfo structure used in the
kernel by x86 platforms (instead of a simple char *). This way, the code
in, e.g., lookup_bootinfo, is a bit easier to understand.

While here, move the lookup_bootinfo function used in x86 platforms (amd64,
i386 and xen) to a common file (x86/x86_machdep.c), as it was exactly the
same in all of them.
 1.9 06-Jul-2005  junyoung branches: 1.9.2;
BIOSDISK_EXT13INFO_V3 -> BIOSDISK_EXTINFO_V3
u_intNN_t -> uintNN_t
u_int -> unsigned int
Remove trailing spaces
 1.8 12-Jun-2005  dyoung Make disklabel(8) and fdisk(8) into "host tools", last step: build
and install ${TOOLDIR}/bin/${MACHINE_GNU_PLATFORM}-disklabel,
${TOOLDIR}/bin/${MACHINE_GNU_PLATFORM}-fdisk by "reaching over" to
the sources in ${NETBSDSRCDIR}/sbin/{disklabel fdisk}/.

To avoid clashes with a build-host's header files, especially on
*BSD, the host-tools versions of fdisk and disklabel search for
#includes such as disklabel.h, disklabel_acorn.h, disklabel_gpt.h,
and bootinfo.h in a new #includes namespace, nbinclude/. That is,
they #include <nbinclude/sys/disklabel.h>, <nbinclude/machine/disklabel.h>,
<nbinclude/sparc64/disklabel.h>, instead of <sys/disklabel.h> and
such. I have also updated the system headers to #include from
nbinclude/-space when HAVE_NBTOOL_CONFIG_H is #defined.
 1.7 04-Feb-2005  fvdl The bootinfo_wedge structure must be packed, or the 32bit alignments
used by the bootloader don't match the amd64 kernel.
 1.6 23-Oct-2004  thorpej branches: 1.6.4; 1.6.6;
Add support for passing booted wedge information to the kernel.
 1.5 24-Mar-2004  drochner remove license clauses 3 and 4 from my copyright notices
 1.4 27-Oct-2003  junyoung Nuke __P().
 1.3 08-Oct-2003  lukem Overhaul MBR handling (part 1):

<sys/bootblock.h>:
* Added definitions for the Master Boot Record (MBR) used by
a variety of systems (primarily i386), including the format
of the BIOS Parameter Block (BPB).
This information was cribbed from a variety of sources
including <sys/disklabel_mbr.h> which this is a superset of.

As part of this, some data structure elements and #defines
were renamed to be more "namespace friendly" and consistent
with other bootblocks and MBR documentation.
Update all uses of the old names to the new names.

<sys/disklabel_mbr.h>:
* Deprecated in favor of <sys/bootblock.h> (the latter is more
"host tool" friendly).

amd64 & i386:
* Renamed /usr/mdec/bootxx_dosfs to /usr/mdec/bootxx_msdos, to
be consistent with the naming convention of the msdosfs tools.

* Removed /usr/mdec/bootxx_ufs, as it's equivalent to bootxx_ffsv1
and it's confusing to have two functionally equivalent bootblocks,
especially given that "ufs" has multiple meanings (it could be
a synonym for "ffs", or the group of ffs/lfs/ext2fs file systems).

* Rework pbr.S (the first sector of bootxx_*):
+ Ensure that BPB (bytes 11..89) and the partition table
(bytes 446..509) do not contain code.
+ Add support for booting from FAT partitions if BOOT_FROM_FAT
is defined. (Only set for bootxx_msdos).
+ Remove "dummy" partition 3; if people want to installboot(8)
these to the start of the disk they can use fdisk(8) to
create a real MBR partition table...
+ Compile with TERSE_ERROR so it fits because of the above.
Whilst this is less user friendly, I feel it's important
to have a valid partition table and BPB in the MBR/PBR.

* Renamed /usr/mdec/biosboot to /usr/mdec/boot, to be consistent
with other platforms.

* Enable SUPPORT_DOSFS in /usr/mdec/boot (stage2), so that
we can boot off FAT partitions.

* Crank version of /usr/mdec/boot to 3.1, and fix some of the other
entries in the version file.

installboot(8) (i386):
* Read the existing MBR of the filesystem and retain the BIOS
Parameter Block (BPB) in bytes 11..89 and the MBR partition
table in bytes 446..509. (Previously installboot(8) would
trash those two sections of the MBR.)

mbrlabel(8):
* Use sys/lib/libkern/xlat_mbr_fstype.c instead of homegrown code
to map the MBR partition type to the NetBSD disklabel type.


Test built "make release" for i386, and new bootblocks verified to work
(even off FAT!).
 1.2 16-Apr-2003  dsl branches: 1.2.2;
Add definitions (#defined out) to pass the result of the v3.x bios
extended disk information request to the kernel.
Binary compatible with the existing code, disabled because I don't
have a system with a bios that supports the request.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.2.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.2.5 06-Feb-2005  skrll Sync with HEAD.
 1.2.2.4 02-Nov-2004  skrll Sync with HEAD.
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.6.6.1 12-Feb-2005  yamt sync with head.
 1.6.4.1 29-Apr-2005  kent sync with -current
 1.9.2.2 21-Jan-2008  yamt sync with head
 1.9.2.1 21-Jun-2006  yamt sync with head.
 1.10.4.1 09-Sep-2006  rpaulo sync with head
 1.10.2.1 18-Feb-2006  yamt sync with head.
 1.11.60.1 02-Jan-2008  bouyer Sync with HEAD
 1.11.56.1 26-Dec-2007  ad Sync with head.
 1.11.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.11.46.1 09-Jan-2008  matt sync with HEAD
 1.12.10.3 16-Sep-2009  yamt sync with head
 1.12.10.2 04-May-2009  yamt sync with head.
 1.12.10.1 16-May-2008  yamt sync with head.
 1.12.8.1 18-May-2008  yamt sync with head.
 1.12.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.12.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.6.1 19-Oct-2008  haad Sync with HEAD.
 1.13.2.1 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.14.8.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.14.8.3 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.14.8.2 01-Nov-2009  jym Sync with HEAD.
 1.14.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.14.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.16.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.16.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.16.4.2 31-May-2011  rmind sync with head
 1.16.4.1 05-Mar-2011  rmind sync with head
 1.18.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.18.4.1 17-Apr-2012  yamt sync with head
 1.19.8.3 03-Dec-2017  jdolecek update from HEAD
 1.19.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.19.8.1 23-Jun-2013  tls resync from head
 1.22.2.1 18-May-2014  rmind sync with head
 1.23.6.3 28-Aug-2017  skrll Sync with HEAD
 1.23.6.2 05-Feb-2017  skrll Sync with HEAD
 1.23.6.1 19-Mar-2016  skrll Sync with HEAD
 1.24.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.24.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.26.6.2 27-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1282):

sys/arch/x86/include/bootinfo.h: revision 1.30

PR/54147: Increase BOOTINFO_MAXSIZE to 16KiB.

Some systems require a larger bootinfo size for memory descriptors.
 1.26.6.1 14-Apr-2018  martin Pull up following revision(s) (requested by nonaka in ticket #753):

sys/arch/x86/include/bootinfo.h: revision 1.29

x86: Increase BOOTINFO_MAXSIZE to 8KiB.

Proposed on port-i386 and port-amd64 with no objections:
http://mail-index.netbsd.org/port-i386/2018/04/11/msg003692.html
http://mail-index.netbsd.org/port-amd64/2018/04/11/msg002697.html
 1.28.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.29.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.31.10.1 02-Aug-2025  perseant Sync with HEAD
 1.1 20-Aug-2022  riastradh x86: Split bootspace out of x86/pmap.h into new x86/bootspace.h.
 1.21 17-Jul-2011  dyoung Good-bye bus.h. Don't install <machine/bus.h>.
 1.20 28-Apr-2010  dyoung On x86, change the bus_space_tag_t to a pointer to a struct
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).

Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.

Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
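The tag layout described in revision 1.20 might look roughly like the sketch below. The member name bst_type and the names x86_bus_space_io and x86_bus_space_mem come from the commit message; the numeric values of the two type constants, the const-qualified typedef, the *_tag helper objects, and the NULL handling are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Space types; the numeric values here are assumed. */
enum {
	X86_BUS_SPACE_IO = 0,
	X86_BUS_SPACE_MEM = 1
};

/* For now the tag has a single member: the type of space. */
struct bus_space_tag {
	int bst_type;
};

typedef const struct bus_space_tag *bus_space_tag_t;

/* Constant tags, and the pointers that replace the bare constants. */
const struct bus_space_tag x86_bus_space_io_tag = { X86_BUS_SPACE_IO };
const struct bus_space_tag x86_bus_space_mem_tag = { X86_BUS_SPACE_MEM };
bus_space_tag_t x86_bus_space_io = &x86_bus_space_io_tag;
bus_space_tag_t x86_bus_space_mem = &x86_bus_space_mem_tag;

/* Two tags are considered equal when they name the same type of space. */
bool
bus_space_is_equal(bus_space_tag_t t1, bus_space_tag_t t2)
{
	if (t1 == NULL || t2 == NULL)
		return false;
	return t1->bst_type == t2->bst_type;
}
```

Comparing bst_type rather than the pointers themselves means that a separately created tag for the same kind of space still compares equal, which is presumably why a dedicated comparison function is needed once tags become pointers.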
 1.19 13-Feb-2009  bouyer branches: 1.19.2; 1.19.4;
Change bus_size_t from paddr_t to size_t. It doesn't make sense to have
a 64bit bus_size_t on i386 as the address space is 32bits anyway.
With a 64bit bus_size_t we need a different bus_space.S for PAE and non-PAE.
 1.18 08-Feb-2009  bouyer branches: 1.18.2;
Apply patch proposed on port-amd64/port-i386, allowing to use a 64bit
bus_addr_t on i386PAE kernels:
change bus_addr_t to be a paddr_t (so its size follows paddr_t depending
on options PAE)
replace bus_addr_t with vaddr_t where the value is used as a virtual address.

Difference with the proposed patch: cast to uintmax_t and use %jx in
printf() as suggested by Joerg.
 1.17 06-Nov-2008  dyoung Use NULL instead of (bus_dma_tag_t)0.
 1.16 28-Apr-2008  martin branches: 1.16.6; 1.16.8; 1.16.10; 1.16.14;
Remove clause 3 and 4 from TNF licenses
 1.15 17-Oct-2007  garbled branches: 1.15.16; 1.15.18; 1.15.20;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.14 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.13 04-Mar-2007  christos branches: 1.13.2; 1.13.10; 1.13.18; 1.13.20; 1.13.22;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.12 21-Feb-2007  mrg add a pair of new bus_dma(9) functions:
int _bus_dmatag_subregion(bus_dma_tag_t tag,
bus_addr_t min_addr,
bus_addr_t max_addr,
bus_dma_tag_t *newtag,
int flags)
void _bus_dmatag_destroy(bus_dma_tag_t tag)

that allow a (normally broken/limited) device to restrict the bus address
range it can talk to. this is used by bce(4) to limit DMA addresses to
1GB range, the maximum the chip can address.

all this is from Yorick Hardy <yhardy@uj.ac.za> with input from several
people on tech-kern.

XXX: bus_dma(9) needs an update still.
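
The idea of restricting a tag to an address window can be modelled as a
range intersection. The sketch below is a toy, not the kernel
implementation: the struct, the function name, and the EINVAL return are
assumptions, the flags argument is omitted, the result is written into a
caller-supplied struct rather than a newly allocated tag, and both bounds
are treated as inclusive.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy DMA tag: the inclusive range of bus addresses a device may use. */
struct sketch_dma_tag {
	uint64_t lo;
	uint64_t hi;	/* inclusive */
};

/*
 * Derive a tag restricted to [min_addr, max_addr] (both inclusive) from
 * a parent tag. Returns 0 on success, or EINVAL if the requested window
 * does not overlap the parent's range.
 */
int
sketch_dmatag_subregion(const struct sketch_dma_tag *tag, uint64_t min_addr,
    uint64_t max_addr, struct sketch_dma_tag *newtag)
{
	assert(tag != NULL && newtag != NULL);

	uint64_t lo = tag->lo > min_addr ? tag->lo : min_addr;
	uint64_t hi = tag->hi < max_addr ? tag->hi : max_addr;

	if (lo > hi)
		return EINVAL;
	newtag->lo = lo;
	newtag->hi = hi;
	return 0;
}
```

A device limited to the first 1GB would, in this model, ask for the window
0 to 0x3fffffff.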
 1.11 16-Feb-2006  perry branches: 1.11.20;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.10 24-Dec-2005  perry branches: 1.10.2; 1.10.4; 1.10.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.9 16-Apr-2005  yamt branches: 1.9.2;
tweak x86 bus_dma code so that it can be used by xen port.

- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.8 09-Mar-2005  matt branches: 1.8.2;
Add a dm_maxsegsz public member to bus_dmamap_t. This allows a user of the API
to select the maximum segment size for each bus_dmamap_load (up to the maxsegsz
supplied to bus_dmamap_create). dm_maxsegsz is reset to the value supplied to
bus_dmamap_create when the dmamap is unloaded.
 1.7 20-Jun-2004  thorpej branches: 1.7.4; 1.7.6;
Remove the "ID" component of the x86 bus_dma flags, since these are no
longer "ISA DMA" specific flags.
 1.6 05-Jun-2004  yamt unexport following x86 bus_dma internal functions.
_bus_dma_alloc_bouncebuf
_bus_dma_free_bouncebuf
_bus_dmamap_load_buffer
 1.5 14-Jan-2004  yamt issue memory read barrier for BUS_DMASYNC_POSTREAD operation.
PR/21665 from Stephan Uphoff.
 1.4 27-Oct-2003  junyoung Nuke __P().
 1.3 15-Jun-2003  fvdl branches: 1.3.2;
Handle 64bit DMA addresses on PCI for platforms that can (currently only
enabled on amd64). Add a dmat64 field to various PCI attach structures,
and pass it down where needed. Implement a simple new function called
pci_dma64_available(pa) to test if 64bit DMA addresses may be used.
This returns 1 iff _PCI_HAVE_DMA64 is defined in <machine/pci_machdep.h>,
and there is more than 4G of memory.
 1.2 07-May-2003  fvdl Generalize bounce buffers, and use them for 32 bit PCI if needed.
Make ALLOCNOW the default iff bouncing might be needed (this has
no effect on i386 because ISA DMA devices already had to use
ALLOCNOW, and PCI isn't bounced (yet), since we don't do > 4G
at this point for i386).
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.3.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.4 01-Apr-2005  skrll Sync with HEAD.
 1.3.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.1 03-Aug-2004  skrll Sync with HEAD
 1.7.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.7.4.1 29-Apr-2005  kent sync with -current
 1.8.2.1 21-Apr-2005  tron Pull up revision 1.9 (requested by yamt in ticket #175):
tweak x86 bus_dma code so that it can be used by xen port.
- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.9.2.4 27-Oct-2007  yamt sync with head.
 1.9.2.3 03-Sep-2007  yamt sync with head.
 1.9.2.2 26-Feb-2007  yamt sync with head.
 1.9.2.1 21-Jun-2006  yamt sync with head.
 1.10.6.1 22-Apr-2006  simonb Sync with head.
 1.10.4.1 09-Sep-2006  rpaulo sync with head
 1.10.2.1 18-Feb-2006  yamt sync with head.
 1.11.20.2 12-Mar-2007  rmind Sync with HEAD.
 1.11.20.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.13.22.1 06-Oct-2007  yamt sync with head.
 1.13.20.1 06-Nov-2007  matt sync with HEAD
 1.13.18.1 02-Oct-2007  joerg Sync with HEAD.
 1.13.10.1 03-Oct-2007  garbled Sync with HEAD
 1.13.2.1 09-Oct-2007  ad Sync with head.
 1.15.20.3 11-Aug-2010  yamt sync with head.
 1.15.20.2 04-May-2009  yamt sync with head.
 1.15.20.1 16-May-2008  yamt sync with head.
 1.15.18.1 18-May-2008  yamt sync with head.
 1.15.16.2 17-Jan-2009  mjf Sync with HEAD.
 1.15.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.16.14.1 21-Apr-2010  matt sync to netbsd-5
 1.16.10.2 29-Sep-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1040):
sys/arch/x86/include/bus.h: revision 1.19
Change bus_size_t from paddr_t to size_t. It doesn't make sense to have
a 64bit bus_size_t on i386 as the address space is 32bits anyway.
With a 64bit bus_size_t we need a different bus_space.S for PAE and non-PAE.
 1.16.10.1 29-Sep-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1040):
sys/arch/x86/include/bus.h: revision 1.18
sys/arch/x86/include/isa_machdep.h: revision 1.7
sys/arch/x86/x86/bus_space.c: revision 1.21
Apply patch proposed on port-amd64/port-i386, allowing to use a 64bit
bus_addr_t on i386PAE kernels:
change bus_addr_t to be a paddr_t (so its size follows paddr_t depending
on options PAE)
replace bus_addr_t with vaddr_t where the value is used as a virtual address.
Difference with the proposed patch: cast to uintmax_t and use %jx in
printf() as suggested by Joerg.
 1.16.8.2 03-Mar-2009  skrll Sync with HEAD.
 1.16.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.16.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.18.2.4 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.18.2.3 24-Oct-2010  jym Sync with HEAD
 1.18.2.2 01-Nov-2009  jym Sync with HEAD.
 1.18.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.19.4.1 30-May-2010  rmind sync with head
 1.19.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.6 21-Jan-2021  kre PRIxXXXX (etc) definitions should not include the %

Will fix anything this ends up breaking later.
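
The convention referred to here is the one used by <inttypes.h>: the macro
expands to the conversion specifier only, and the caller writes the '%'. A
minimal sketch, with SKETCH_PRIxBUSADDR and sketch_bus_addr_t as
hypothetical stand-ins for the real definitions:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

typedef uint64_t sketch_bus_addr_t;

/* Specifier only, no leading '%', as with PRIx64 itself. */
#define SKETCH_PRIxBUSADDR	PRIx64

int
sketch_print_addr(char *buf, size_t len, sketch_bus_addr_t addr)
{
	assert(buf != NULL && len > 0);
	/* Because the caller supplies '%', flags and a width can be added. */
	return snprintf(buf, len, "%#018" SKETCH_PRIxBUSADDR, addr);
}
```

If the macro carried its own '%', the flags and field width in the format
string above could not be inserted between the '%' and the specifier.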
 1.5 14-Nov-2019  maxv branches: 1.5.8;
Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is considerable, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
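
The two shadows can be illustrated with a toy model. Everything below is an
assumption made for illustration: the 3-bit type field, the type codes, and
all sketch_* names are hypothetical, and the real encoding in the kernel
may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: top 3 bits hold the memory type, the rest the pointer. */
#define SKETCH_ORIG_TYPE_SHIFT	29
#define SKETCH_ORIG_PTR_MASK	((UINT32_C(1) << SKETCH_ORIG_TYPE_SHIFT) - 1)

enum sketch_orig_type { SKETCH_TYPE_STACK = 0, SKETCH_TYPE_POOL = 1 };

/* Pack a type code and a compressed pointer into one 32-bit orig cell. */
uint32_t
sketch_orig_encode(enum sketch_orig_type type, uint32_t cptr)
{
	assert((cptr & ~SKETCH_ORIG_PTR_MASK) == 0);
	return ((uint32_t)type << SKETCH_ORIG_TYPE_SHIFT) | cptr;
}

enum sketch_orig_type
sketch_orig_type(uint32_t orig)
{
	return (enum sketch_orig_type)(orig >> SKETCH_ORIG_TYPE_SHIFT);
}

uint32_t
sketch_orig_ptr(uint32_t orig)
{
	return orig & SKETCH_ORIG_PTR_MASK;
}

/* shad: one shadow bit per real bit; any set bit marks uninitialized data. */
bool
sketch_shad_is_init(const uint8_t *shad, size_t len)
{
	assert(shad != NULL);
	for (size_t i = 0; i < len; i++) {
		if (shad[i] != 0)
			return false;
	}
	return true;
}
```

Packing the type and pointer into the cell itself is what lets the orig
shadow carry origin information without a separate side table.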
 1.4 04-Oct-2019  maxv Add DMA instrumentation in KASAN. We note the original buffer and length in
the map, and check the buffer on each bus_dmamap_sync. This allows us to
find DMA buffer overflows and UAFs, which couldn't be found before because
the device accesses to memory are outside of KASAN's control.
 1.3 23-Sep-2019  skrll Provide PRIxBUSADDR, PRIxBUSSIZE, PRIuBUSSIZE, and PRIxBSH for all arches
to follow arm and (generic) mips.

Reviewed by christos.
 1.2 25-Aug-2011  dyoung branches: 1.2.2; 1.2.56;
Add to x86 bus_space_tag_t a member, bst_exists, that tells whether a
routine is overridden by this tag or by any ancestral tag.
 1.1 01-Jul-2011  dyoung Per discussion at
<http://mail-index.netbsd.org/tech-kern/2010/04/02/msg007941.html>,
divide each machine's bus.h into bus_defs.h (constants & data types)
and bus_funcs.h (macro implementations of bus_space(9) routines and MD
prototypes).

Note that some bus_space(9) routines' implementation will move to .c
files from inline subroutines or macros in .h files.

I've only made the split for machine architectures where there is PCI.
All of the non-PCI-having architectures will require a similar split.

These #include files are not referenced by any (committed) Makefiles or
header files, yet. Changes to Makefiles, to <sys/bus.h>, and to some
more machine-dependent files will dribble in before I throw the switch.
 1.2.56.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.2.2.2 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.2.2.1 25-Aug-2011  jym file bus_defs.h was added on branch jym-xensuspend on 2011-08-27 15:59:49 +0000
 1.5.8.1 03-Apr-2021  thorpej Sync with HEAD.
 1.1 01-Jul-2011  dyoung branches: 1.1.2;
Per discussion at
<http://mail-index.netbsd.org/tech-kern/2010/04/02/msg007941.html>,
divide each machine's bus.h into bus_defs.h (constants & data types)
and bus_funcs.h (macro implementations of bus_space(9) routines and MD
prototypes).

Note that some bus_space(9) routines' implementation will move to .c
files from inline subroutines or macros in .h files.

I've only made the split for machine architectures where there is PCI.
All of the non-PCI-having architectures will require a similar split.

These #include files are not referenced by any (committed) Makefiles or
header files, yet. Changes to Makefiles, to <sys/bus.h>, and to some
more machine-dependent files will dribble in before I throw the switch.
 1.1.2.2 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.1.2.1 01-Jul-2011  jym file bus_funcs.h was added on branch jym-xensuspend on 2011-08-27 15:59:49 +0000
 1.16 22-Jan-2022  skrll Ensure bus_dmatag_subregion is called with an inclusive max_addr
everywhere.
 1.15 22-Feb-2020  chs remove some unnecessary includes of internal UVM headers.
 1.14 01-Sep-2011  christos branches: 1.14.54; 1.14.60;
Add bus_dma overrides. From dyoung
 1.13 31-Aug-2011  dyoung Add override members to x86_bus_dma_tag.
 1.12 12-Nov-2010  uebayasi Pull in uvm/uvm.h for VM_PAGE_TO_PHYS().
 1.11 28-Apr-2008  martin branches: 1.11.14; 1.11.22;
Remove clause 3 and 4 from TNF licenses
 1.10 17-Oct-2007  garbled branches: 1.10.16; 1.10.18; 1.10.20;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.9 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.8 04-Mar-2007  christos branches: 1.8.2; 1.8.10; 1.8.18; 1.8.20; 1.8.22;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.7 21-Feb-2007  mrg add a pair of new bus_dma(9) functions:
int _bus_dmatag_subregion(bus_dma_tag_t tag,
bus_addr_t min_addr,
bus_addr_t max_addr,
bus_dma_tag_t *newtag,
int flags)
void _bus_dmatag_destroy(bus_dma_tag_t tag)

that allow a (normally broken/limited) device to restrict the bus address
range it can talk to. this is used by bce(4) to limit DMA addresses to
1GB range, the maximum the chip can address.

all this is from Yorick Hardy <yhardy@uj.ac.za> with input from several
people on tech-kern.

XXX: bus_dma(9) needs an update still.
 1.6 28-Aug-2006  bouyer branches: 1.6.8;
Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where a machine address would in some cases be freed as a physical address, or
mapped as a physical address.
 1.5 16-Feb-2006  perry branches: 1.5.2; 1.5.12;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.4 24-Dec-2005  perry branches: 1.4.2; 1.4.4; 1.4.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.3 22-Aug-2005  bouyer branches: 1.3.6;
Rename _PRIVATE_BUS_DMAMEM_ALLOC_RANGE to _BUS_DMAMEM_ALLOC_RANGE for
consistency with other macros defined in bus_private.h. Pointed out by
YAMAMOTO Takashi.
 1.2 20-Aug-2005  bouyer More adjustments to deal with Xen's physical <=> machine addresses mappings:
- Allow _bus_dmamem_alloc_range to be provided from external source:
Use a _PRIVATE_BUS_DMAMEM_ALLOC_RANGE macro, defined to
_bus_dmamem_alloc_range by default.
- avail_end is the end of the physical address range. Define a macro
_BUS_AVAIL_END (defined by default to avail_end) and use it instead.
 1.1 16-Apr-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6;
add files which i forgot to add with arch/x86/x86/bus_dma.c rev.1.21.
 1.1.6.5 27-Oct-2007  yamt sync with head.
 1.1.6.4 03-Sep-2007  yamt sync with head.
 1.1.6.3 26-Feb-2007  yamt sync with head.
 1.1.6.2 30-Dec-2006  yamt sync with head.
 1.1.6.1 21-Jun-2006  yamt sync with head.
 1.1.4.2 29-Apr-2005  kent sync with -current
 1.1.4.1 16-Apr-2005  kent file bus_private.h was added on branch kent-audio2 on 2005-04-29 11:28:29 +0000
 1.1.2.5 16-Sep-2006  ghen Pull up following revision(s) (requested by bouyer in ticket #1510):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.7
sys/arch/xen/x86/xen_bus_dma.c: revision 1.8
sys/arch/x86/include/bus_private.h: revision 1.6
sys/arch/x86/x86/bus_dma.c: revision 1.30
sys/arch/xen/include/bus_private.h: revision 1.7
Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where a machine address would in some cases be freed as a physical address, or
mapped as a physical address.
Wrap some printfs in #ifdef DEBUG, as we should not leak memory any more when
bus_dma memory allocation fails.
 1.1.2.4 25-Aug-2005  tron Pull up following revision(s) (requested by bouyer in ticket #697):
sys/arch/x86/x86/bus_dma.c: revision 1.23
sys/arch/x86/include/bus_private.h: revision 1.3
sys/arch/xen/include/bus_private.h: revision 1.3
Rename _PRIVATE_BUS_DMAMEM_ALLOC_RANGE to _BUS_DMAMEM_ALLOC_RANGE for
consistency with other macros defined in bus_private.h. Pointed out by
YAMAMOTO Takashi.
 1.1.2.3 25-Aug-2005  tron Pull up following revision(s) (requested by bouyer in ticket #695):
sys/arch/x86/x86/bus_dma.c: revision 1.22
sys/arch/x86/include/bus_private.h: revision 1.2
More adjustments to deal with Xen's physical <=> machine addresses mappings:
- Allow _bus_dmamem_alloc_range to be provided from external source:
Use a _PRIVATE_BUS_DMAMEM_ALLOC_RANGE macro, defined to
_bus_dmamem_alloc_range by default.
- avail_end is the end of the physical address range. Define a macro
_BUS_AVAIL_END (defined by default to avail_end) and use it instead.
 1.1.2.2 21-Apr-2005  tron Pull up revision 1.1 (requested by yamt in ticket #175):
add files which i forgot to add with arch/x86/x86/bus_dma.c rev.1.21.
 1.1.2.1 16-Apr-2005  tron file bus_private.h was added on branch netbsd-3 on 2005-04-21 18:43:01 +0000
 1.3.6.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.6.1 22-Aug-2005  skrll file bus_private.h was added on branch ktrace-lwp on 2005-11-10 14:00:20 +0000
 1.4.6.1 22-Apr-2006  simonb Sync with head.
 1.4.4.1 09-Sep-2006  rpaulo sync with head
 1.4.2.1 18-Feb-2006  yamt sync with head.
 1.5.12.1 14-Sep-2006  riz Pull up following revision(s) (requested by bouyer in ticket #150):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.7
sys/arch/xen/x86/xen_bus_dma.c: revision 1.8
sys/arch/x86/include/bus_private.h: revision 1.6
sys/arch/x86/x86/bus_dma.c: revision 1.30
sys/arch/xen/include/bus_private.h: revision 1.7
Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where a machine address would in some cases be freed as a physical address, or
mapped as a physical address.
Wrap some printfs in #ifdef DEBUG, as we should not leak memory any more when
bus_dma memory allocation fails.
 1.5.2.1 03-Sep-2006  yamt sync with head.
 1.6.8.2 12-Mar-2007  rmind Sync with HEAD.
 1.6.8.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.8.22.1 06-Oct-2007  yamt sync with head.
 1.8.20.1 06-Nov-2007  matt sync with HEAD
 1.8.18.1 02-Oct-2007  joerg Sync with HEAD.
 1.8.10.1 03-Oct-2007  garbled Sync with HEAD
 1.8.2.1 09-Oct-2007  ad Sync with head.
 1.10.20.1 16-May-2008  yamt sync with head.
 1.10.18.1 18-May-2008  yamt sync with head.
 1.10.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.11.22.1 05-Mar-2011  rmind sync with head
 1.11.14.1 10-Jan-2011  jym Sync with HEAD
 1.14.60.1 29-Feb-2020  ad Sync with head.
 1.14.54.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.1 26-Sep-2007  ad branches: 1.1.2; 1.1.4; 1.1.6; 1.1.10; 1.1.14; 1.1.28; 1.1.30; 1.1.32;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.1.32.1 16-May-2008  yamt sync with head.
 1.1.30.1 18-May-2008  yamt sync with head.
 1.1.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.1.14.2 06-Nov-2007  matt sync with HEAD
 1.1.14.1 26-Sep-2007  matt file busdefs.h was added on branch matt-armv6 on 2007-11-06 23:23:33 +0000
 1.1.10.2 27-Oct-2007  yamt sync with head.
 1.1.10.1 26-Sep-2007  yamt file busdefs.h was added on branch yamt-lazymbuf on 2007-10-27 11:28:54 +0000
 1.1.6.2 09-Oct-2007  ad Sync with head.
 1.1.6.1 26-Sep-2007  ad file busdefs.h was added on branch vmlocking on 2007-10-09 13:38:41 +0000
 1.1.4.2 06-Oct-2007  yamt sync with head.
 1.1.4.1 26-Sep-2007  yamt file busdefs.h was added on branch yamt-x86pmap on 2007-10-06 15:33:31 +0000
 1.1.2.2 02-Oct-2007  joerg Sync with HEAD.
 1.1.2.1 26-Sep-2007  joerg file busdefs.h was added on branch jmcneill-pm on 2007-10-02 18:27:49 +0000
 1.31 09-Dec-2021  msaitoh Print TLB message consistently to improve readability.

Example:
cpu0: L2 cache: 256KB 64B/line 4-way
cpu0: L3 cache: 4MB 64B/line 16-way
cpu0: 64B prefetching
-cpu0: ITLB: 64 4KB entries 8-way, 2M/4M: 8 entries
+cpu0: ITLB: 64 4KB entries 8-way, 8 2M/4M entries
cpu0: DTLB: 64 4KB entries 4-way, 4 1GB entries 4-way
cpu0: L2 STLB: 1536 4KB entries 6-way
cpu0: Initial APIC ID 0
 1.30 07-Oct-2021  msaitoh Move some common functions into x86/identcpu_subr.c. No functional change.
 1.29 27-Sep-2021  msaitoh Add Load Only TLB and Store Only TLB.
 1.28 26-Jul-2019  msaitoh branches: 1.28.2;
- AMD CPUID Fn8000_0001d Cache Topology Information leaf is almost the same as
Intel Deterministic Cache Parameter Leaf(0x04), so make new
cpu_dcp_cacheinfo() and share it.
- AMD's L2 and L3's cache descriptor's definition is the same, so use one
common definition.
- KNF.

XXX Split some common functions to new identcpu_subr.c or use #ifdef _KERNEL
... #endif in identcpu.c to share from both kernel and cpuctl?
 1.27 24-Jul-2019  msaitoh It seems that AMD zen2's CPUID 0x80000006 leaf's spec has changed.
The EDX register's associativity field has 9. In the latest available document,
it's a reserved value. I have no access to zen2's document, but many websites
say that the associativity is 16. Add it.
 1.26 12-Mar-2018  msaitoh branches: 1.26.2;
AMD L3 cache association bitfield is not 8 bits but 4 bits wide, like the other
association bitfields.
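
A sketch of extracting such a 4-bit field. The bit positions are an
assumption taken from my reading of AMD's CPUID documentation for leaf
0x80000006 (L3 associativity in EDX[15:12], line size in EDX[7:0]); the
sketch_* names are hypothetical, and the returned associativity is the raw
encoded value, not a way count.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [hi:lo] of a 32-bit register value. */
static inline uint32_t
sketch_bits(uint32_t v, unsigned hi, unsigned lo)
{
	assert(hi < 32 && lo <= hi);
	uint32_t width = hi - lo + 1;
	uint32_t mask = (width == 32) ? UINT32_MAX :
	    ((UINT32_C(1) << width) - 1);
	return (v >> lo) & mask;
}

/* Raw L3 associativity encoding: a 4-bit field, assumed at EDX[15:12]. */
uint32_t
sketch_amd_l3_assoc(uint32_t edx)
{
	return sketch_bits(edx, 15, 12);
}

/* L3 line size in bytes, assumed at EDX[7:0]. */
uint32_t
sketch_amd_l3_linesize(uint32_t edx)
{
	return sketch_bits(edx, 7, 0);
}
```

Treating the field as 8 bits wide would pull the neighbouring lines-per-tag
bits into the associativity value.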
 1.25 12-Mar-2018  msaitoh Add 3way and 6way of L2 cache or TLB on AMD CPU.
 1.24 09-Mar-2018  msaitoh Add yet another Shared L2 TLB (2M/4M pages).

XXX need redesign.
 1.23 05-Mar-2018  msaitoh branches: 1.23.2;
Add Intel Deterministic Address Translation Parameter Leaf(0x18) definitions.
 1.22 27-Apr-2016  msaitoh branches: 1.22.10;
Add new desc 0x64 and 0xc4.
 1.21 08-Jan-2016  msaitoh Index 0x6c is not 126 entries but 128 entries. The old value was from
previous SDM.
 1.20 19-Oct-2015  msaitoh Add some TLB entries from the latest Intel SDM. This change might be incorrect
because the document itself is very strange.
 1.19 09-Sep-2014  msaitoh branches: 1.19.2;
Add new cache descriptor (0xc3) from the latest Intel SDM.
 1.18 03-Jul-2014  msaitoh branches: 1.18.2;
Fix some entries:
- Desc 0x55 and 0xb1 are Instruction TLB but not fixed to 4K.
- Desc 0x5a and 0xc0 are Data TLB but not fixed to 4K.
- Desc 0x57 and 0x59 are 4K fixed DTLB.
- Fix string of desc 0xc2 and it's not fixed to 4K.
- Desc 0xca is 4K fixed L2 shared TLB.
- Add desc 0xa0.

BUG: A lot of CPUs have multiple CAI_DTLB and/or CAI_DTLB2 entries. Currently
TLB info is indexed in ci_cinfo[CAI_COUNT], so some info is overwritten.

Nowadays CPUs have very complex TLBs. It's hard to manage with CAI_* index.
We should consider separating the TLB info structure from ci_cinfo[CAI_COUNT]
in struct cpu_info.
 1.17 28-Oct-2013  msaitoh branches: 1.17.2;
Support prefetch size.
 1.16 14-Sep-2013  msaitoh Add Shared L2 TLB and some cache and tlb entries from the latest document.
 1.15 17-Jul-2013  msaitoh Add some new TLB and cache entries from document (Table 3-22 Encoding of CPUID
Leaf 2 Descriptors, Intel 64 and IA-32 Architectures Software Developer's
Manual Vol. 2A.)
 1.14 17-Jul-2013  msaitoh Fix 0x0d's DCACHE entry and 0xeb's L3CACHE entry from the document
(Table 3-22 Encoding of CPUID Leaf 2 Descriptors, Intel 64 and IA-32
Architectures Software Developer's Manual Vol. 2A.)
 1.13 04-Dec-2011  chs branches: 1.13.2; 1.13.6; 1.13.10;
add info on L2 TLBs and 1GB pages.
 1.12 13-May-2009  pgoyette branches: 1.12.12; 1.12.16;
Fix toyp in previous. Pointed out by snj@
 1.11 13-May-2009  pgoyette 1. Extend CPU probe of Intel processors to handle extended-models. This
allows us to properly identify new Intel 45nm processors, Core i7,
Atom, and the 45nm Xeon MP.

2. Properly decode several new Intel cache descriptors, as listed in the
most recent (March 2009) edition of Intel's Application Note 485.

3. Convert decode of the various features masks to use the newly added
snprintb_m(3) routine.

Addresses my PR bin/41289
Addresses my PR bin/41290
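
For item 1, the usual way to combine the base and extended fields of the
CPUID leaf 1 signature is sketched below. The function name is hypothetical
and this is not the code from the commit; the field positions and the
combining rule follow Intel's documented convention as I understand it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the display family and model from a CPUID leaf 1 EAX value.
 * The extended family is added only when the base family is 0xf; the
 * extended model is prepended when the base family is 0x6 or 0xf.
 */
void
sketch_decode_signature(uint32_t eax, unsigned *family, unsigned *model)
{
	assert(family != NULL && model != NULL);

	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model = (eax >> 4) & 0xf;
	unsigned ext_family = (eax >> 20) & 0xff;
	unsigned ext_model = (eax >> 16) & 0xf;

	*family = base_family;
	if (base_family == 0xf)
		*family += ext_family;

	*model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		*model |= ext_model << 4;
}
```

Without the extended-model bits, a signature such as 0x000106a5 would
decode as family 6 model 0xa instead of model 0x1a.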
 1.10 15-Apr-2009  lukem Constify a userland-only member.
 1.9 30-May-2008  christos branches: 1.9.6; 1.9.8; 1.9.12; 1.9.16;
don't undef __CI_TBL before we use it :-)
 1.8 30-May-2008  christos - fix an amd cache entry.
- merge tables
- support phenom
from Paul Goyette
 1.7 30-May-2008  christos PR/38722: Paul Goyette: Share cacheinfo information
 1.6 11-May-2008  cegger print L3 and TLB cache information for AMD Barcelona/Phenom
 1.5 11-May-2008  ad Simplify x86 identcpu code, and share between i386/amd64.
 1.4 16-Apr-2005  yamt branches: 1.4.82; 1.4.84; 1.4.86; 1.4.88;
make multi inclusion protection macros consistent.
 1.3 17-Aug-2004  briggs branches: 1.3.4; 1.3.10;
Get correct cache information for earlier VIA C3 models.
Mostly from PR kern/26689 submitted by Michael van Elst.
 1.2 08-Aug-2004  briggs VIA C3 cache info.
 1.1 25-Apr-2003  fvdl branches: 1.1.2; 1.1.4;
Share some common cache info cpuid code between i386 and x86_64.
 1.1.4.2 22-Aug-2004  tron Pull up revision 1.3 (requested by briggs in ticket #770):
Get correct cache information for earlier VIA C3 models.
Mostly from PR kern/26689 submitted by Michael van Elst.
 1.1.4.1 12-Aug-2004  jmc Pullup rev 1.2 (requested by briggs in ticket #742)

Enable VIA C3 CPU support
 1.1.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.1.2.1 12-Aug-2004  skrll Sync with HEAD.
 1.3.10.1 21-Apr-2005  tron Pull up revision 1.4 (requested by yamt in ticket #174):
make multi inclusion protection macros consistent.
 1.3.4.1 29-Apr-2005  kent sync with -current
 1.4.88.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.4.86.3 16-May-2009  yamt sync with head
 1.4.86.2 04-May-2009  yamt sync with head.
 1.4.86.1 16-May-2008  yamt sync with head.
 1.4.84.2 04-Jun-2008  yamt sync with head
 1.4.84.1 18-May-2008  yamt sync with head.
 1.4.82.1 02-Jun-2008  mjf Sync with HEAD.
 1.9.16.1 21-Apr-2010  matt sync to netbsd-5
 1.9.12.3 01-Nov-2009  jym Sync with HEAD.
 1.9.12.2 31-May-2009  jym Sync with HEAD.
 1.9.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.9.8.1 18-May-2009  bouyer Pull up following revision(s) (requested by pgoyette in ticket #761):
sys/arch/x86/include/cacheinfo.h: revisions 1.11, 1.12
usr.sbin/cpuctl/arch/i386.c: revisions 1.18, 1.19 via patch
1. Extend CPU probe of Intel processors to handle extended-models. This
allows us to properly identify new Intel 45nm processors, Core i7,
Atom, and the 45nm Xeon MP.
2. Properly decode several new Intel cache descriptors, as listed in the
most recent (March 2009) edition of Intel's Application Note 485.
Addresses my PR bin/41289
Addresses my PR bin/41290
 1.9.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.12.16.1 18-Feb-2012  mrg merge to -current.
 1.12.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.12.1 17-Apr-2012  yamt sync with head
 1.13.10.2 18-May-2014  rmind sync with head
 1.13.10.1 28-Aug-2013  rmind sync with head
 1.13.6.2 03-Dec-2017  jdolecek update from HEAD
 1.13.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.13.2.1 29-Dec-2014  martin Pullup the following revisions, requested by msaitoh in #1219:

sys/arch/x86/include/cacheinfo.h 1.14-1.19

Update Intel's cache and TLB descriptor table. This changes the number
of page colors on some CPUs.
- Add Shared L2 TLB.
- Support prefetch size.
- Add some new TLB and cache entries from the document.
- Fix some entries:
- Fix 0x0d's DCACHE entry and 0xeb's L3CACHE entry.
- Desc 0x55 and 0xb1 are Instruction TLB but not fixed to 4K.
- Desc 0x5a and 0xc0 are Data TLB but not fixed to 4K.
- Desc 0x57 and 0x59 are 4K fixed DTLB.
- Fix string of desc 0xc2 and it's not fixed to 4K.
- Desc 0xca is 4K fixed L2 shared TLB.
 1.17.2.1 10-Aug-2014  tls Rebase.
 1.18.2.4 09-Oct-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #1636):
sys/arch/x86/include/cacheinfo.h: 1.23-1.26
sys/arch/x86/include/cpu.h: 1.70
sys/arch/x86/include/specialreg.h: 1.91-1.93,1.98,1.100,1.102-1.124,1.126,1.130 via patch
sys/arch/x86/x86/cpu_topology.c: 1.10
sys/arch/x86/x86/identcpu.c: 1.56-1.57,1.70 via patch
usr.sbin/cpuctl/arch/i386.c: 1.71,1.75-1.79,1.81-1.85 via patch
Add some register definitions for x86:
- Add CLWB bit.
- Fix a few (unused) MSR values, and add some bit definitions of
MSR_EFER from Murray Armfield in PR#42861.
- CPUID_CFLUSH bit is not for CFLUSH insn but CLFLUSH insn, so modify
comments and snprintb() string.
- Define CPUID Fn00000001 %ebx bits and use them.
No functional change.
- Add Structured Extended Flags Enumeration Leaf's bit definitions:
AVX512_{IFMA,VBMI2,VNNI,BITALG,VPOPCNTDQ,4VNNIW,4FMAPS},GFNI&VAES.
- Add Turbo Boost Max Technology 3.0 bit.
- Add AMD SVM features definitions.
- Add Intel cpuid 7 %edx IBRS and STIBP bit definitions.
- Fix swapped comments for EFER LME and LMA
- Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add MSR_IA32_ARCH_CAPABILITIES definition.
- Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.
- Add Intel Deterministic Address Translation Parameter Leaf(0x18)
definitions.
- s/CLFUSH/CLFLUSH/
- Add AMD's Disable Indirect Branch Predictor bit definition.
- Add the MSR bits definitions for IBRS, STIBP and IBPB.
- Add Intel Fn0000_0006 %eax new bit 14-20 (HWP stuff).
- Intel Fn0000_0007 %ecx bit 22 is for both RDPID and IA32_TSC_AUX.
- Add AMD's CPUID Fn80000001 %edx MMX and FXSR bit definitions.
- Add RDCL_NO and IBRS_ALL.
- Add SSBD and RSBA bit definitions.
- Add AMD's SSB bit definitions for F15H, F16H and F17H.
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
- Add yet another Shared L2 TLB (2M/4M pages).
- Add 3way and 6way of L2 cache or TLB on AMD CPU.
- AMD L3 cache association bitfield is not 8bit but 4bit like others
association bitfields.
- Sort entries. No functional change.
- Modify comment, fix typo in comment and add comment.
cpuctl(8):
- Add detection for Quark X1000, Xeon E5 v4, E7 v4,
Core i7-69xx Extreme Edition, Xeon Scalable (Skylake),
Xeon Phi [357]200 (Knights Landing), Atom (Goldmont),
Atom (Denverton), Future Core (Cannon Lake), Atom (Goldmont Plus),
Xeon Phi 7215, 7285 and 7295 (Knights Mill) and
7th or 8th gen Core (Kaby Lake, Coffee Lake).
- Print Structured Extended Feature leaf Fn0000_0007 %ebx on AMD,too.
- Print Fn0000_0007 %ecx on Intel.
- Print Intel cpuid 7 %edx.
- Parse the TLB info from `cpuid leaf 18H' on Intel processor.
- Use aprint_error_dev() for error output.
 1.18.2.3 08-Dec-2016  snj Pull up following revision(s) (requested by msaitoh in ticket #1285):
sys/arch/x86/include/cacheinfo.h: revision 1.22
sys/arch/x86/include/specialreg.h: revisions 1.87 and 1.90
usr.sbin/cpuctl/arch/i386.c: revisions 1.72-1.74
Changes for x86's cpuctl(8):
- Add Quark X1000, Xeon E[57] v4, Core i7-69xx Extreme, 7th gen Core,
Denverton, Xeon Phi [357]200, Future Xeon and Future Xeon Phi.
- Add SGX, UMIP, RDPID, SGXLC, AVX512DQ, AVX512BW and AVX512VL bit.
- Fix the bit location of CLFLUSHOPT.
- Add new TLB descriptor 0x64 and 0xc4.
 1.18.2.2 06-Mar-2016  martin branches: 1.18.2.2.2;
Pull up the following changes, requested by msaitoh in #1117:

sys/arch/x86/include/cacheinfo.h 1.20-1.21
sys/arch/x86/include/specialreg.h 1.83-1.86
usr.sbin/cpuctl/arch/i386.c 1.67-1.70

Changes for x86's cpuctl(8):
- Add some TLB information (index 0x6a-0x6d).
- Add Hardware-Controlled Performance States (HWP) bits, FPU Data
Pointer Updated Only bit and CLFLUSHOPT bit.
- Add some AMD's bit definitions from "BIOS and Kernel Developer(BKDG)
for AMD Family 15h Models 60h-6Fh Processors".
- Add Xeon E5-4600 v3,
- Add Xeon E3-1200 v4 and v5.
- Add 6th gen Core, Xeon E3-1500 v5 and Xeon D-1500.
- Change CPU family 0x1c from "Atom Family" to "45nm Atom Family"
 1.18.2.1 12-Dec-2014  martin Pull up following revision(s) (requested by msaitoh in ticket #310):
sys/arch/x86/include/specialreg.h: revision 1.79-1.80
usr.sbin/cpuctl/arch/i386.c: revision 1.59
sys/arch/x86/include/cacheinfo.h: revision 1.19

Update some cpuid related values:
- Add XSAVECC, XGETBV, XSAVES, SMAP and PQE
- Change XINUSE to XGETBV
- Add new cache descriptor value (0xc3)
- Update signatures for the following CPUs:
- Core M-5xxx
- Core i7 Extreme
- Future Core (0x4e)
- Future Xeon (0x56)
 1.18.2.2.2.1 18-Jan-2017  skrll Sync with netbsd-5
 1.19.2.3 29-May-2016  skrll Sync with HEAD
 1.19.2.2 19-Mar-2016  skrll Sync with HEAD
 1.19.2.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.22.10.5 24-Dec-2021  martin Pull up the following (all via patch), requested by msaitoh in ticket #1721:

usr.sbin/cpuctl/arch/i386.c 1.118-1.119, 1.121-1.122
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.6
sys/arch/x86/x86/identcpu_subr.c 1.8-1.9
sys/arch/x86/x86/identcpu.c 1.123
sys/arch/x86/include/cacheinfo.h 1.30
sys/arch/x86/include/cpu.h 1.132

- Fix a bug that some TLB related lines were not printed.
- Fix a bug that STLB is printed as DTLB.
- If a TLB is variable sized, print the max size instead of error message.
- Cosmetic changes to improve readability.
 1.22.10.4 16-Aug-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1338):

usr.sbin/cpuctl/arch/i386.c: revision 1.104
sys/arch/x86/x86/identcpu.c: revision 1.93
sys/arch/x86/include/cacheinfo.h: revision 1.28
sys/arch/x86/include/specialreg.h: revision 1.150

- AMD CPUID Fn8000_0001d Cache Topology Information leaf is almost the same as
Intel Deterministic Cache Parameter Leaf(0x04), so make new
cpu_dcp_cacheinfo() and share it.
- AMD's L2 and L3's cache descriptor's definition is the same, so use one
common definition.
- KNF.

XXX Split some common functions to new identcpu_subr.c or use #ifdef _KERNEL
... #endif in identcpu.c to share from both kernel and cpuctl?
 1.22.10.3 16-Aug-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1338):

sys/arch/x86/include/cacheinfo.h: revision 1.27
sys/arch/x86/x86/identcpu.c: revision 1.74

Handle more Vortex CPU's from Andrius V.
While here refactor the code to make it smaller.

-

It seems that AMD zen2's CPUID 0x80000006 leaf's spec has changed.
The EDX register's associativity field is 9. In the latest available document,
it's a reserved value. I have no access to zen2's document, but many websites
say that the associativity is 16. Add it.

-

- AMD CPUID Fn8000_0001d Cache Topology Information leaf is almost the same as
Intel Deterministic Cache Parameter Leaf(0x04), so make new
cpu_dcp_cacheinfo() and share it.
- AMD's L2 and L3's cache descriptor's definition is the same, so use one
common definition.
- KNF.

XXX Split some common functions to new identcpu_subr.c or use #ifdef _KERNEL
... #endif in identcpu.c to share from both kernel and cpuctl?
 1.22.10.2 09-Apr-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #715):

sys/arch/x86/include/cacheinfo.h: revision 1.24-1.26
usr.sbin/cpuctl/arch/i386.c: revision 1.81-1.84

- Parse the TLB info from `cpuid leaf 18H' on Intel processor. Currently,
this change doesn't decode perfectly. Tested with Gemini Lake. It has
two L2 Shared TLB. One is 4MB and another is 2MB/4MB but former isn't
printed yet:
cpu0: ITLB 1 4KB entries 48-way
cpu0: DTLB 1 4KB entries 32-way
cpu0: L2 STLB 8 4MB entries 4-way
Need some rework for struct x86_cache_info.
- Use aprint_error_dev() for error output.
Calculate way and number of entries correctly from CPUID leaf 18H.
Add yet another Shared L2 TLB (2M/4M pages).
XXX need redesign.

Add 3way and 6way of L2 cache or TLB on AMD CPU.
AMD L3 cache association bitfield is not 8bit but 4bit like others association
bitfields.

From the latest Intel SDM:
- Add Xeon Phi 7215, 7285 and 7295
- Add Coffee Lake
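A minimal sketch of the way/entry calculation mentioned above, assuming the field layout the Intel SDM gives for CPUID leaf 18H (ways in EBX[31:16], number of sets in ECX). The struct and function names are hypothetical and are not the cpuctl(8) code:

```c
#include <stdint.h>

/* Hypothetical container for one decoded leaf 0x18 subleaf. */
struct tlb_params_sketch {
	uint32_t ways;		/* EBX[31:16]: ways of associativity */
	uint32_t sets;		/* ECX[31:0]: number of sets */
	uint32_t entries;	/* derived: ways * sets */
};

/* Decode the raw EBX/ECX values of one subleaf (assumed layout). */
static struct tlb_params_sketch
tlb_params_from_leaf18(uint32_t ebx, uint32_t ecx)
{
	struct tlb_params_sketch p;

	p.ways = (ebx >> 16) & 0xffff;
	p.sets = ecx;
	p.entries = p.ways * p.sets;
	return p;
}
```

With 4 ways and 2 sets this yields 8 entries, which is consistent with the "L2 STLB 8 4MB entries 4-way" line in the sample output.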
 1.22.10.1 16-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #633):
sys/arch/x86/include/specialreg.h: revision 1.107
sys/arch/x86/include/specialreg.h: revision 1.108
sys/arch/x86/include/specialreg.h: revision 1.109
sys/arch/x86/include/cacheinfo.h: revision 1.23
sys/arch/x86/include/specialreg.h: revision 1.110
sys/arch/x86/include/specialreg.h: revision 1.111
sys/arch/x86/include/specialreg.h: revision 1.112
sys/arch/x86/include/specialreg.h: revision 1.113
sys/arch/x86/include/specialreg.h: revision 1.114
usr.sbin/cpuctl/arch/i386.c: revision 1.79
sys/arch/x86/x86/identcpu.c: revision 1.70
sys/arch/x86/include/specialreg.h: revision 1.106

Add comment.

Add Intel cpuid 7 %edx IBRS(IBPB Speculation Control) and
STIBP(STIBP Speculation Control) from OpenBSD.

Print Intel cpuid 7 %edx.

Example output of cpuctl -v identify 0:
+cpu0: 00000007: 00000000 000027ab 00000000 0c000000
(snip)
+cpu0: SEF edx 0xc000000<IBRS,STIBP>

fix swapped comments for EFER LME and LMA

- Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add comment.
Add MSR_IA32_ARCH_CAPABILITIES definition.

Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.

Add Intel Deterministic Address Translation Parameter Leaf(0x18) definitions.

Sort entries. No functional change.

s/CLFUSH/CLFLUSH/
No functional change.
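As an illustration of the cpuid 7 %edx bits listed above, the sketch below tests the value from the example output (0x0c000000). The macro and function names are hypothetical, not the specialreg.h definitions; the bit positions 26 and 27 are inferred from that example and bit 29 from the text:

```c
#include <stdint.h>

/* Hypothetical names; positions inferred from the commit message. */
#define SKETCH_SEF_EDX_IBRS	(UINT32_C(1) << 26)	/* IBRS / IBPB */
#define SKETCH_SEF_EDX_STIBP	(UINT32_C(1) << 27)	/* STIBP */
#define SKETCH_SEF_EDX_ARCH_CAP	(UINT32_C(1) << 29)	/* IA32_ARCH_CAPABILITIES */

/* Return nonzero if the given feature bit is set in %edx. */
static int
sef_edx_has(uint32_t edx, uint32_t bit)
{
	return (edx & bit) != 0;
}
```

For the sample value 0x0c000000, IBRS and STIBP test as present and the IA32_ARCH_CAPABILITIES bit as absent.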
 1.23.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.26.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.28.2.1 24-Dec-2021  martin Pull up the following (all via patch), requested by msaitoh in ticket #1396:

usr.sbin/cpuctl/arch/i386.c 1.118-1.119, 1.121-1.122
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.6
sys/arch/x86/x86/identcpu_subr.c 1.8-1.9
sys/arch/x86/x86/identcpu.c 1.123
sys/arch/x86/include/cacheinfo.h 1.30
sys/arch/x86/include/cpu.h 1.132

- Fix a bug that some TLB related lines were not printed.
- Fix a bug that STLB is printed as DTLB.
- If a TLB is variable sized, print the max size instead of error message.
- Cosmetic changes to improve readability.
 1.140 24-Apr-2025  riastradh amd64: Allocate FPU save state outside pcb if it's too large.

We have seen x86_fpu_save_size values (CPUID[EAX=0x0d, ECX=0].ECX) as
large as 11008 bytes, notably with Intel AMX TILEDATA's 8192-byte
state.

We only do this for user threads, and only on machines where it's
necessary, to avoid incurring much overhead. There is still a tiny
bit of overhead when saving and restoring the FPU state by using a
pointer indirection instead of arithmetic indirection for access to
struct pcb::pcb_savefpu, but this is probably a drop in the bucket
compared to the memory traffic incurred by the FPU state save/restore
anyway.

For now, these paths are mostly disabled on i386. We could enable
them but it will require either rewriting cpu_uarea_alloc/free for
i386, or adopting a guard page like amd64 does, which might be costly
and so should be undertaken only with some thought and care. And
since Intel AMX instructions only work in 64-bit mode, it's not
likely to be useful on i386.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu

These changes, as a side effect, may fix:

PR kern/57258: kthread_fpu_enter/exit problem

by making sure to allocate an FPU save space that is large enough to
guarantee fpu_kern_enter/leave work safely, instead of just using a
union savefpu object on the stack (which, at 576 bytes, may be too
small on some machines, particularly with AVX512 requiring ~2.5K).
(But we'll have to do some extra work with kthread_fpu_enter/exit_md
-- if we try doing them again on x86 -- to actually allocate the
separate pcb on these machines!)
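The size check behind this change can be sketched roughly as follows. This is only an illustration: the function name and the inline_capacity parameter are hypothetical, and the real kernel logic is more involved:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether an FPU save area of save_size
 * bytes (as reported by CPUID) fits in a fixed inline buffer of
 * inline_capacity bytes, or has to be allocated separately.
 */
static bool
fpu_save_needs_separate_alloc(size_t save_size, size_t inline_capacity)
{
	return save_size > inline_capacity;
}
```

With the numbers quoted above, an 11008-byte save area does not fit in a 576-byte object, so it would need a separate allocation.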
 1.139 22-Apr-2025  imil NVMM hypervisor identification, KVM and GenPVH identification fixes

arch/x86/include/cpu.h, arch/x86/x86/identcpu.c: Enable NVMM hypervisor
discovery
arch/x86/x86/identcpu.c: Fix vm_guest_t for KVM in vm_system_products
arch/x86/x86/x86_machdep.c: Add NVMM and GenPVH in vm_guest_name
 1.138 06-Dec-2024  bouyer Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
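A sketch of what such a predicate might look like. The enum below is a stand-in with only the values needed for the example, and the guest type is passed as an argument here, which is an assumption made to keep the example self-contained:

```c
#include <stdbool.h>

/* Stand-in for the kernel's guest type; values are illustrative only. */
typedef enum {
	SK_VM_GUEST_NO,
	SK_VM_GUEST_XENPVH,
	SK_VM_GUEST_GENPVH
} vm_guest_sketch_t;

/* True for either flavour of PVH guest (Xen or generic). */
static inline bool
vm_guest_is_pvh_sketch(vm_guest_sketch_t g)
{
	return g == SK_VM_GUEST_XENPVH || g == SK_VM_GUEST_GENPVH;
}
```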
 1.137 02-Dec-2024  bouyer Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.
 1.136 01-Aug-2023  riastradh branches: 1.136.6;
xen: Report when hardclock jump exceeds timecounter(9) limit.
 1.135 13-Jul-2023  riastradh xen: Record event when local view of timecounter is behind global.
 1.134 13-Jul-2023  riastradh Break cycle by using `struct kmutex *' instead of `kmutex_t *'.

sys/sched.h included sys/mutex.h
which includes sys/intr.h
which includes machine/intr.h
which on cats includes arm/footbridge/footbridge_intr.h
which includes arm/cpu.h
which includes sys/cpu_data.h
which includes sys/sched.h

But there was never any real need for sys/mutex.h in sys/sched.h,
because it only uses pointers to the opaque struct kmutex. Cycle
broken by using `struct kmutex *' instead of pulling in sys/mutex.h
for the definition of kmutex_t.

Side effect: This revealed that sys/cpu_data.h needed sys/intr.h
(which was pulled in accidentally by sys/mutex.h via sys/sched.h) for
SOFTINT_COUNT. Also revealed some other machine/cpu.h header files
were missing includes of sys/mutex.h for kmutex_t.
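The technique can be illustrated with a small sketch. A header that only stores a pointer needs just a forward declaration of the struct, not its full definition. The struct sched_sketch name and its member are hypothetical:

```c
/* Forward declaration only: struct kmutex stays opaque in this file. */
struct kmutex;

/*
 * Hypothetical scheduler state. A pointer to an incomplete type is
 * legal C, so no mutex header has to be included here.
 */
struct sched_sketch {
	struct kmutex *ss_mutex;
};
```

Only code that dereferences the pointer has to include the header that defines the struct.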
 1.133 07-Sep-2022  knakahara branches: 1.133.4;
NetBSD/x86: Raise the number of interrupt sources per CPU from 32 to 56.

There has been no objection for three years.
https://mail-index.netbsd.org/port-amd64/2019/09/22/msg003012.html
Implemented by nonaka@n.o, updated by me.
 1.132 07-Oct-2021  msaitoh Move some common functions into x86/identcpu_subr.c. No functional change.
 1.131 14-Aug-2021  ryo Improved the performance of kernel profiling on MULTIPROCESSOR, and made it possible to get profiling data for each CPU.

In the current implementation, locks are acquired at the entrance of the mcount
internal function, so the higher the number of cores, the more lock conflict
occurs, making profiling performance in a MULTIPROCESSOR environment unusable
and slow. Profiling buffers have been changed to be reserved for each CPU,
improving profiling performance in MP by several to several dozen times.

- Eliminated cpu_simple_lock in mcount internal function, using per-CPU buffers.
- Add ci_gmon member to struct cpu_info of each MP arch.
- Add kern.profiling.percpu node in sysctl tree.
- Add new -c <cpuid> option to kgmon(8) to specify the cpuid, like OpenBSD.
For compatibility, if the -c option is not specified, the entire system can be
operated as before, and the -p option will get the total profiling data for
all CPUs.
 1.130 19-Feb-2021  christos Identify VirtualBox as a separate guest type.
 1.129 08-Aug-2020  christos branches: 1.129.2;
PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.
 1.128 19-Jul-2020  maxv don't include opt_user_ldt.h when it is not needed
 1.127 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.126 19-Jun-2020  maxv localify
 1.125 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.124 30-Apr-2020  bouyer Don't #include xen/intrdefs.h if !XEN.
Should fix third-party module builds (e.g. virtualbox)
 1.123 27-Apr-2020  bouyer Move ci_vcpu under the #ifdef XEN section at the end of the struct cpu_info.
Hopefully will fix the nvmm module.
 1.122 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.121 21-Apr-2020  msaitoh Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. Some processors
can take TSC/core crystal clock ratio but core crystal clock frequency
can't be taken. Intel SDM give us the values for some processors.
- It was also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
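A minimal sketch of the arithmetic, assuming the CPUID leaf 0x15 layout (EAX is the ratio denominator, EBX the numerator). The function name is hypothetical. The crystal frequency is passed in by the caller because, as noted above, some processors do not report it and a table value has to be used instead:

```c
#include <stdint.h>

/*
 * Hypothetical helper: TSC frequency = crystal_hz * EBX / EAX.
 * Returns 0 when the ratio or the crystal frequency is unavailable.
 */
static uint64_t
tsc_hz_from_cpuid15(uint32_t eax_denom, uint32_t ebx_numer,
    uint64_t crystal_hz)
{
	if (eax_denom == 0 || ebx_numer == 0 || crystal_hz == 0)
		return 0;
	return crystal_hz * ebx_numer / eax_denom;
}
```

For example, a 24 MHz crystal with a ratio of 176/2 gives 2112 MHz.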
 1.120 13-Apr-2020  bouyer By default, events are bound to CPU 0 (except for IPIs and VTIMERs which
are bound to a different CPU at creation time).
Recent MI changes caused the scheduler to choose a different CPU when
probing and attaching xennet devices (I guess it's the xenbus thread which
runs on a different CPU). This causes the callback to be called on a different
CPU than the one expected by the kernel, and the event is ignored.
It is handled when the clock causes the callback to be called on the right
CPU, which is why xennet still runs, but slowly.

Change event_set_handler() to do a EVTCHNOP_bind_vcpu if requested to,
and make sure we don't do it for IPIs and VIRQs (for these, the op fails).
 1.119 10-Apr-2020  bouyer Revert, wrong branch
 1.118 10-Apr-2020  bouyer Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.
 1.117 15-Jan-2020  ad branches: 1.117.4;
Push the INVLPG limit for shootdowns up to 16 (for UBC).
 1.116 30-Dec-2019  thorpej branches: 1.116.2;
Fix a problem with intr_unmask() that can cause a forever-loop:
- When handling the source-is-masked case in the interrupt vector, set the
interrupt bit in a new ci_imasked field and ensure the bit is cleared
from ci_ipending.
- In intr_unmask(), transfer the bit from ci_imasked to ci_ipending for
non-level-sensitive interrupts (the PIC does the work for us in the
level-sensitive case), and only force pending interrupts to be processed
in this case. (In all cases, make sure the now-unmasked bit is cleared
from ci_imasked.)

Before, the bit was left in ci_ipending so as not to use edge-triggered
interrupts while the source is masked, but Xspllower() relies on the
pending bits getting cleared.

Tested by forcing all wm(4) interrupts on my test system through an
intr_mask() / softint / intr_unmask() cycle and exercising the network
heavily.
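The bit transfer described above can be sketched as follows. This is not the kernel's intr_unmask(): the struct, the 64-bit field width and the boolean return value are assumptions made for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state with only the two fields discussed. */
struct cpu_sketch {
	uint64_t ci_ipending;	/* interrupts waiting to be processed */
	uint64_t ci_imasked;	/* interrupts that fired while masked */
};

/*
 * Unmask one source. Returns true when the caller should force
 * pending interrupts to be processed (edge-triggered case only).
 */
static bool
intr_unmask_sketch(struct cpu_sketch *ci, unsigned bit, bool level_sensitive)
{
	const uint64_t mask = UINT64_C(1) << bit;
	bool force = false;

	if (!level_sensitive && (ci->ci_imasked & mask) != 0) {
		/* Move the remembered interrupt back to the pending set. */
		ci->ci_ipending |= mask;
		force = true;
	}
	/* In all cases the now-unmasked bit is cleared from ci_imasked. */
	ci->ci_imasked &= ~mask;
	return force;
}
```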
 1.115 01-Dec-2019  ad Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
 1.114 27-Nov-2019  maxv Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
 1.113 23-Nov-2019  ad cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
 1.112 21-Nov-2019  ad x86 TLB shootdown IPI changes:

- Shave some time off processing.
- Reduce cacheline/bus traffic on systems with many CPUs.
- Reduce time spent at IPL_VM.
 1.111 21-Nov-2019  ad mi_userret(): take care of calling preempt(), set spc_curpriority directly,
and remove MD code that does the same.
 1.110 12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.109 03-Oct-2019  maxv Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
 1.108 07-Aug-2019  maxv Add support for USER_LDT in SVS. This allows us to have both enabled at
the same time.

We allocate an LDT for each CPU in the GDT and map an area for it, in
addition to the default LDT already present. In context switches between
different processes, we choose between the default or the per-cpu LDT
selector: if the user set specific LDT entries, we memcpy them to the
per-cpu LDT and load the per-cpu selector.

Tested by Naveen Narayanan (with Wine on amd64).
 1.107 26-Jun-2019  mgorny branches: 1.107.2;
Fetch XSAVE area component offsets and sizes when initializing x86 CPU

Introduce two new arrays, x86_xsave_offsets and x86_xsave_sizes,
and initialize them with XSAVE area component offsets and sizes queried
via CPUID. This will be needed to implement getters and setters for
additional register types.

While at it, add XSAVE_* constants corresponding to specific XSAVE
components.
 1.106 27-May-2019  maxv Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.
 1.105 15-Feb-2019  nonaka Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.
 1.104 14-Feb-2019  cherry Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.103 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.102 02-Feb-2019  cherry Switch NetBSD/xen to use XEN api tag RELEASE-4.11.1

The headers for this api are in sys/external/mit/xen-include-public/dist/
 1.101 25-Dec-2018  cherry Excise XEN specific code out of x86/x86/intr.c into xen/x86/xen_intr.c

While at it, separate the source function tracking so that the interrupt
paths are truly independent.

Use weak symbol exporting to provision for future PVHVM co-existence
of both files, but with independent paths. Introduce assembler code
such that in a unified scenario, native interrupts get first priority
in spllower(), followed by XEN event callbacks. IPL management and
semantics are unchanged - native handlers and xen callbacks are
expected to maintain their ipl related semantics.

In summary, after this commit, native and XEN now have completely
unrelated interrupt handling mechanisms, including
intr_establish_xname() and assembler stubs and intr handler
management.

Happy Christmas!
 1.100 18-Nov-2018  cherry On Xen, copy just the bits we need from the trapframe for hardclock(9)
and statclock(9).

Currently, the macros that use the trapframe are:
CLKF_USERMODE()
CLKF_PC()
CLKF_INTR()

Of these, CLKF_INTR() already ignores the frame and uses the ci_idepth
variable to do its job.

Convert the two remaining ones to do this, but only for XEN.
 1.99 18-Nov-2018  cherry Save the interrupt trap/clockframe to a per-cpu copy.

We can use this copy to pass on the trapframe to hardclock(9) from
within the xen timer handler. This delinks the current dependency
between MD code and the handler, which is specially prototyped to take
the clockframe unlike any other handler.

This change has performance implications, as each interrupt entry will
copy the entire trapframe over to the per-cpu cached copy. This can be
mitigated by selectively copying just the parts of the clockframe that
are used by hardclock() et al.

Tested on amd64 XEN domU
 1.98 05-Oct-2018  maxv export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
 1.97 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.96 16-Jul-2018  pgoyette More rearrangement of struct cpu_info to keep all the un-conditional
members at fixed locations.

Should address my PR kern/52919

OK maxv@

XXX kernel version bump coming momentarily.
 1.95 15-Jul-2018  maxv Hum. Move the __HAVE_DIRECT_MAP block a little below, otherwise dynamically
loaded kernel modules use a wrong offset for some ci_* fields. Found when
modloading tprof_amd on an AMD 10h, the read of ci_signature was at a
wrong address, and the cpu family was not detected correctly.
 1.94 30-Jun-2018  riastradh Just use struct cpu_info members for the Xen clock state.

Silly to use percpu(9) for some things and struct cpu_info for
others.
 1.93 29-Jun-2018  riastradh Rewrite Xen timecounter and hardclock timer.

With this change, the Xen timecounter should now be globally
monotonic, as every timecounter is supposed to be. Should also fix a
litany of races in the timecounter logic.

Proposed last year; see mailing list for further details:
https://mail-index.netbsd.org/port-xen/2017/10/31/msg009112.html

ok cherry
 1.92 14-Jun-2018  maxv branches: 1.92.2;
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.
 1.91 04-Apr-2018  maxv Enable the SpectreV2 mitigation by default at boot time.
 1.90 30-Mar-2018  maxv Retrieve cpuid.7:%edx.
 1.89 18-Jan-2018  maxv branches: 1.89.2;
Unmap the kernel heap from the user page tables (SVS).

This implementation is optimized and organized in such a way that we
don't need to copy the kernel stack to a safe place during user<->kernel
transitions. We create two VAs that point to the same physical page; one
will be mapped in userland and is offset in order to contain only the
trapframe, the other is mapped in the kernel and maps the entire stack.

Sent on tech-kern@ a week ago.
 1.88 07-Jan-2018  maxv Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.
 1.87 05-Jan-2018  maxv Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
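The relation between L4 slot 384 and the address 0xffffc00000000000 can be checked with a short calculation. The sketch assumes 4-level paging with 48-bit virtual addresses, where each L4 slot covers 2^39 bytes; the function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: base virtual address of an L4 (PML4) slot. */
static uint64_t
l4_slot_to_va(unsigned slot)
{
	uint64_t va = (uint64_t)slot << 39;	/* each slot spans 2^39 bytes */

	/* Canonical form: sign-extend bit 47 into bits 48..63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

Slot 384 is 0x180, and 0x180 << 39 is 0xc00000000000; sign extension gives 0xffffc00000000000.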
 1.86 04-Jan-2018  maxv Allocate the TSS area dynamically. This way cpu_info and cpu_tss can be
put in separate pages.
 1.85 04-Jan-2018  maxv Group the different TSSes into a cpu_tss structure. And pack this
structure to make sure there is no padding between 'tss' and 'iomap'.
 1.84 28-Dec-2017  maxv typos
 1.83 02-Dec-2017  christos Add padding to make the 32/64 bit structs the same.
 1.82 27-Nov-2017  maxv Remove unused fields, there is no alignment we need to enforce.
 1.81 23-Nov-2017  kamil Restore removed sysctl(2) x86 entry: fpu_present

Hardcode it to 1 for now on i386 and amd64.

This unbreaks software that used it (e.g. LLDB).

Removal noted by <christos>

PR lib/52756 by myself
 1.80 09-Oct-2017  maya GC i386_fpu_present. x86 without an FPU is not supported.

Also delete newly unused send_sigill
 1.79 16-Sep-2017  maxv Move xpq_idx into cpu_info, to prevent false sharing between CPUs. Saves
10s when doing a './build.sh -j 3 kernel=GENERIC' on xen-amd64-domU.
 1.78 27-Aug-2017  maxv style, and move some i386-specific code into i386/
 1.77 27-Aug-2017  maxv Localify. By the way, we should use a different stack for NMIs.
 1.76 12-Aug-2017  maxv Remove vm86.

Pass 3.
 1.75 22-Jul-2017  maxv Call _proc0_tss_ldt_init only once, and rename them.
 1.74 16-Jul-2017  cherry branches: 1.74.2;
Unify the xen and native x86/ interrupt setup functions and
spl traversal data structures.

This is towards PVHVM.
 1.73 16-Jun-2017  jdolecek dumpconf(void) no longer exists, remove the prototype

PR kern/39714 by Henning Petersen
 1.72 09-Jun-2017  chs if __HIDE_DELAY is defined, do not define delay() or DELAY().
needed by dtrace and ZFS.
 1.71 23-May-2017  nonaka branches: 1.71.2;
x86: hypervisor detection from FreeBSD for x2APIC support.
 1.70 15-May-2017  msaitoh CPUID_CFLUSH bit is not for CFLUSH insn but CLFLUSH insn, so modify comments
and snprintb() string.
 1.69 14-Apr-2017  kamil branches: 1.69.2;
x86: Export fpu_save, fpu_save_size, xsave_features to dedicated sysctl nodes

Add new defines:
- CPU_FPU_SAVE (15)
int: FPU Instructions layout
* to use this, CPU_OSFXSR must be true
* 0: FSAVE
* 1: FXSAVE
* 2: XSAVE
* 3: XSAVEOPT
- CPU_FPU_SAVE_SIZE (16)
int: FPU Instruction layout size
- CPU_XSAVE_FEATURES (17)
quad: FPU XSAVE features

Bump CPU_MAXID from 15 to 18.

These values were prepared originally to be exported without ASCIIZ name to
be used as handler. These values are useful to get FPU accessors in a
debugger easier to implement on x86 (PT_SETFPREG, PT_GETFPREG).

This interface handles all supported x86 targets. In the older (i386) and
less featured CPUs check first osfxsr (OS uses FXSAVE/FXRSTOR).

According to sys/arch/x86/include/cpu.h r.1.65 this was prepared to be
exported beyond simple CTL_CREATE node.

Sponsored by <The NetBSD Foundation>
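For illustration, the CPU_FPU_SAVE values listed above could be mapped to names as in the sketch below. The function is hypothetical and only mirrors the 0..3 encoding given in the commit message:

```c
/*
 * Hypothetical helper: name of the FPU save method for a
 * CPU_FPU_SAVE value (0..3 as listed in the commit message).
 */
static const char *
fpu_save_name(int v)
{
	switch (v) {
	case 0:
		return "FSAVE";
	case 1:
		return "FXSAVE";
	case 2:
		return "XSAVE";
	case 3:
		return "XSAVEOPT";
	default:
		return "unknown";
	}
}
```

A debugger could use such a mapping to choose the register layout before issuing PT_GETFPREG or PT_SETFPREG.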
 1.68 11-Feb-2017  maxv Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
 1.67 13-Dec-2015  maxv branches: 1.67.2; 1.67.4;
Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.66 23-Feb-2014  dsl branches: 1.66.4; 1.66.6; 1.66.8;
Rename (the recently added) 'x86_xsave_size' to 'x86_fpu_save_size'
and default to 512 (the size of the fxsave structure).
 1.65 23-Feb-2014  dsl Determine whether the cpu supports xsave (and hence AVX).
The result is only written to sysctl nodes at the moment.
I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
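
The sample values above can be cross-checked by hand. A minimal sketch follows, assuming the standard (non-compacted) XSAVE layout with a 512-byte legacy region, a 64-byte header and 256 bytes of AVX state; the macro and function names are hypothetical, and a real kernel would take the size from CPUID rather than compute it this way.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits of the XSAVE feature mask. */
#define XF_X87	(UINT64_C(1) << 0)	/* x87 state */
#define XF_SSE	(UINT64_C(1) << 1)	/* SSE state */
#define XF_AVX	(UINT64_C(1) << 2)	/* AVX state */

#define XSAVE_LEGACY_SIZE	512	/* fxsave-compatible region */
#define XSAVE_HEADER_SIZE	64
#define XSAVE_AVX_SIZE		256	/* upper halves of 16 ymm registers */

/* True if the feature mask includes AVX state. */
static int
xsave_has_avx(uint64_t features)
{
	return (features & XF_AVX) != 0;
}

/* Save area size for a mask made of the three components above. */
static unsigned
xsave_area_size_sketch(uint64_t features)
{
	unsigned size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

	/* Only the three low components are modelled in this sketch. */
	assert((features & ~(XF_X87 | XF_SSE | XF_AVX)) == 0);
	if (features & XF_AVX)
		size += XSAVE_AVX_SIZE;
	return size;
}
```

For machdep.xsave_features = 7 this yields 512 + 64 + 256 = 832, which matches the machdep.xsave_size value quoted above.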
 1.64 22-Feb-2014  dsl Re-use the unused ci_cpu_serial[3] to save the highest cpuid values
for the normal and extended leafs.
(The 'normal' one might be lurking in the global cpulevel.)
Read the 'extended feature' from cpuid.80000001.%ecx/edx into
ci_feat_val[3/2] just after saving cpuid.1.%ecx/edx in ci_feat_val[1/0]
instead of doing it separately for amd k678 and via c3 processors
in their probe functions and repeating it for all cpus a few instructions
later when x86_cpu_topology() is called.
x86_cpu_topology() is only called from cpu_probe() and really doesn't
deserve its own source file. Chasing the setup code is bad enough anyway.
 1.63 20-Feb-2014  dsl This needs stdint.h in userspace (for uint64_t)
 1.62 15-Feb-2014  dsl Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).
 1.61 12-Feb-2014  dsl Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.60 04-Feb-2014  dsl There is no need to check for recursive calls into fpudna().
Rename the associated ci_fpsaving field to 'unused'.
I'm not sure they could ever happen, you could get unwanted calls into
the fpu trap code while saving state when using INT13 - but these are
different.
The return value from the i386 fpudna() was always 1 - possibly a historic
relic of the kernel fp emulation. Remove and don't check in trap.S.
The amd64 and i386 fpudna() code is now almost identical.
 1.59 26-Jan-2014  dsl Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
 1.58 01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.57 10-Nov-2013  christos use __unused instead of __USE and void cast to mark iterator variable unused
where needed (from phone)
 1.56 05-Nov-2013  christos initialize cii before using it.
 1.55 23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.54 17-Oct-2013  christos __USE() unused variables
 1.53 27-Oct-2012  chs branches: 1.53.2;
split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.
 1.52 15-Jul-2012  dsl branches: 1.52.2;
Rename MDP_IRET to MDL_IRET since it is an lwp flag, not a proc one.
Add an MDL_COMPAT32 flag to the lwp's md_flags, set it for 32bit lwps
and use it to force 'return to user' with iret (as is done when
MDL_IRET is set).
Split the iret/sysret code paths much later.
Remove all the replicated code for 32bit system calls - which was only
needed so that iret was always used.
frameasm.h for XEN contains '#define swapgs'; while XEN probably never
needs swapgs, this is likely to be confusing.
Add a SWAPGS which is a nop on XEN and swapgs otherwise.
(I've not yet checked all the swapgs in files that include frameasm.h)
Simple x86 programs still work.
Hijack 6.99.9 kernel bump (needed for compat32 modules)
 1.51 16-Jun-2012  chs rename the global variable "cpu" to "cputype" to avoid conflicting with
dtrace, which wants to use "cpu" as a local variable.
 1.50 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
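
The move away from hardcoded bitmasks can be illustrated in userland C. This is a sketch of the general technique only; it does not use or reproduce the kcpuset(9) interface, and every name in it is hypothetical. A single 32-bit mask cannot name CPU 200, while a set sized at run time can.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A CPU set whose size is chosen at run time (hypothetical type). */
struct cpuset_sketch {
	unsigned	 ncpu;	/* number of CPUs the set can hold */
	uint32_t	*bits;	/* one bit per CPU, 32 per word */
};

static struct cpuset_sketch *
cpuset_sketch_create(unsigned ncpu)
{
	struct cpuset_sketch *cs = malloc(sizeof(*cs));

	if (cs == NULL)
		return NULL;
	cs->ncpu = ncpu;
	cs->bits = calloc((ncpu + 31) / 32, sizeof(cs->bits[0]));
	if (cs->bits == NULL) {
		free(cs);
		return NULL;
	}
	return cs;
}

static void
cpuset_sketch_set(struct cpuset_sketch *cs, unsigned cpu)
{
	assert(cpu < cs->ncpu);
	cs->bits[cpu / 32] |= UINT32_C(1) << (cpu % 32);
}

static int
cpuset_sketch_isset(const struct cpuset_sketch *cs, unsigned cpu)
{
	assert(cpu < cs->ncpu);
	return (cs->bits[cpu / 32] >> (cpu % 32)) & 1;
}

static void
cpuset_sketch_destroy(struct cpuset_sketch *cs)
{
	free(cs->bits);
	free(cs);
}
```

Creating the set with 256 entries mirrors the new amd64 default mentioned above; the limit becomes a parameter rather than the width of an integer type.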
 1.49 02-Mar-2012  bouyer Follow locore.S and move FPU handling from x86_64_switch_context() to
x86_64_tls_switch(); raise IPL to IPL_HIGH in x86_64_switch_context()
and test ci_fpcurlwp to decide to disable FPU or not.
Change the Xen i386 context switch code to be like the amd64 one.
 1.48 17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.47 12-Feb-2012  jym branches: 1.47.2;
Xen clock management routines keep track of CPU (following MP merge).
Reflect this change in the suspend/resume routines so they can cope with
domU CPU suspend, instead of setting their cpu_info pointer to NULL.

Avoid copy/pasting by using the resume routines during attachment.

ok releng@.

No regression observed, and allows domU to suspend successfully again.
Restore is a different beast as PD/PT flags are marked "invalid" by Xen-4
hypervisor, and blocks resuming. Looking into it.
 1.46 28-Jan-2012  cherry stop using alternate pde mapping in xen pmap
 1.45 30-Dec-2011  cherry Move the per-cpu l3 page allocation code to a separate MD function. Avoids code duplication for xen PAE
 1.44 07-Dec-2011  cegger switch from xen3-public to xen-public.
 1.43 19-Nov-2011  cherry branches: 1.43.4;
[merging from cherry-xenmp] bring in bouyer@'s changes via:
http://mail-index.netbsd.org/source-changes/2011/10/22/msg028271.html
From the Log:
Log Message:
Various interrupt fixes, mainly:
keep a per-cpu mask of enabled events, and use it to get pending events.
A cpu-specific event (all of them at this time) should never be masked
by another CPU, because it may prevent the target CPU from seeing it
(the clock events all fire at once for example).
 1.42 10-Nov-2011  jym Turn the 'i386_use_pae' variable into simply 'use_pae'. Technically
speaking we are also running with PAE enabled in long mode under amd64,
so this variable will be used in various places across x86 machdep to
branch at runtime to functions that require extra handling for PAE mode.
 1.41 06-Nov-2011  cherry [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.
 1.40 01-Nov-2011  joerg branches: 1.40.2;
Reduce exposure of kernel internals for __KMEMUSER
 1.39 17-Oct-2011  jmcneill add a "vm" device class for cpufeaturebus
 1.38 20-Sep-2011  jym Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.37 11-Aug-2011  cherry Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs
 1.36 10-Aug-2011  cherry Add Xen specific members to struct cpu_info, Add proper per-cpu curcpu() functionality
 1.35 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.34 31-May-2011  dyoung branches: 1.34.2;
Don't use the C preprocessor to configure USERCONF. Instead, either do
or do not link in subr_userconf.c and x86_userconf.c.

Provide no-op stubs for userconf_bootinfo(), userconf_init(), and
userconf_prompt().

Delete all occurrences of #include "opt_userconf.h" as well as USERCONF
and __HAVE_USERCONF_BOOTINFO #ifdef'age.
 1.33 26-May-2011  uebayasi Support userconf(4) command in boot(8)/boot.cfg(5) on i386/amd64.

From jmmv@, no objections seen in the proposed thread:

http://mail-index.netbsd.org/tech-kern/2009/01/22/msg004081.html
 1.32 13-Apr-2011  mrg move the include sys/types.h xor stdbool.h to the top of the file,
so that "bool" will be present when used later in the file.
 1.31 24-Feb-2011  jruoho Fix autoconf(9) of cpufeaturebus.
 1.30 23-Feb-2011  jruoho Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.
 1.29 20-Feb-2011  jruoho Modularize coretemp(4). Ok jmcneill@.
 1.28 20-Feb-2011  jmcneill cpu.h no longer needs via_padlock.h
 1.27 19-Feb-2011  jmcneill modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module
 1.26 22-Dec-2010  christos branches: 1.26.2; 1.26.4;
Make __HAVE_CPU_DATA_FIRST true
 1.25 16-Aug-2010  jym Add machdep.pae sysctl(7) for i386. Thanks to Paul and Joerg for their
reviews.

In kernel, it matches the 'i386_use_pae' variable (0: kernel does not use
PAE, 1: kernel uses PAE). Will be used by i386 kvm(3) to know the functions
that should get called for VA => PA translations.
 1.24 04-Aug-2010  jruoho Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.
 1.23 24-Jul-2010  jym Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits wide (64 GiB). The virtual address space remains at 32 bits (4 GiB). To
cope with the increased size of physical addresses, the kernel and MMU
manipulate them as 64-bit variables.
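
The sizes quoted above follow from simple powers of two. A minimal sketch of the arithmetic, with hypothetical helper names, also shows why a 32-bit type no longer suffices for physical addresses:

```c
#include <assert.h>
#include <stdint.h>

#define GIB	(UINT64_C(1) << 30)

/* Size in bytes of an address space that is `bits` wide. */
static uint64_t
addr_space_bytes(unsigned bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}

/* True if a physical address survives truncation to 32 bits. */
static int
pa_fits_32(uint64_t pa)
{
	return (uint64_t)(uint32_t)pa == pa;
}
```

Here addr_space_bytes(36) / GIB is 64 and addr_space_bytes(32) / GIB is 4, and the last page below the 64 GiB boundary fails pa_fits_32(), which is the case that forces the wider physical address type.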

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
 1.22 09-May-2010  rmind Drop x86 MD package/core/smt IDs and use MI.
 1.21 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.20 18-Jan-2010  rmind branches: 1.20.2; 1.20.4;
x86_cpu_topology, not toplogy.
 1.19 09-Jan-2010  cegger add x2apic support.
patch presented on current-users@, port-i386@ and port-amd64@ on 2009-12-22

No comments.
 1.18 21-Nov-2009  rmind Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.17 30-Apr-2009  rmind Move x86 CPU topology detection code into the separate file (as it was originally).
OK by <yamt>.
 1.16 19-Apr-2009  ad cpuctl:

- Add interrupt shielding (direct hardware interrupts away from the
specified CPUs). Not documented just yet but will be soon.

- Redo /dev/cpu time_t compat so no kernel changes are needed.

x86:

- Make intr_establish, intr_disestablish safe to use when !cold.

- Distribute hardware interrupts among the CPUs, instead of directing
everything to the boot CPU.

- Add MD code for interrupt shielding. This works in most cases but there is
a bug where delivery is not accepted by an LAPIC after redistribution. It
also needs re-balancing to make things fair after interrupts are turned
back on for a CPU.
 1.15 16-Apr-2009  rmind - Add macros to handle (some) trapframe registers for common x86 code.
- Merge i386 and amd64 syscall.c into x86. No functional changes intended.

Proposed on (port-i386 & port-amd64). Unfortunately, I cannot merge these
lists into the single port-x86. :(
 1.14 30-Mar-2009  tsutsui #include <sys/types.h>, not <stdbool.h> for userland
in defined(_STANDALONE) case too.
 1.13 28-Mar-2009  rmind kvtop: change return type to paddr_t.
 1.12 27-Mar-2009  dyoung If defined(_KERNEL), #include <sys/types.h>, otherwise #include
<stdbool.h>, for the bool definition that we need. cpu.h only got the
definition by chance, before.
 1.11 07-Mar-2009  ad Expose more stuff if _KMEMUSER is defined.
 1.10 29-Dec-2008  pooka branches: 1.10.2;
_LKM -> _MODULE
 1.9 25-Oct-2008  mrg branches: 1.9.2; 1.9.4; 1.9.8; 1.9.10;
this uses an evcnt so, include <sys/evcnt.h>
 1.8 13-Oct-2008  cegger print features4: cpuid fn80000001 %ecx on AMD CPUs.
 1.7 30-May-2008  ad branches: 1.7.2; 1.7.6; 1.7.8;
fillw is dead.
 1.6 28-May-2008  ad Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.
 1.5 22-May-2008  ad Mark x86_curlwp() with __attribute__ ((const)), so gcc can CSE it and know
that it does not clobber global data.
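
The effect of the attribute can be shown on a stand-in function. This sketch does not reproduce x86_curlwp(); it uses a hypothetical arithmetic helper and assumes a compiler that accepts GCC-style attributes.

```c
#include <assert.h>

/*
 * A const function depends only on its arguments and reads no global
 * memory, so the compiler is free to merge repeated calls.
 */
static unsigned square_sketch(unsigned x) __attribute__((const));

static unsigned
square_sketch(unsigned x)
{
	return x * x;
}

static unsigned
twice_square_sketch(unsigned x)
{
	assert(x < 65536);	/* keep x * x within 32 bits */
	/* Two textual calls; the compiler may evaluate only one. */
	return square_sketch(x) + square_sketch(x);
}
```

The result is the same with or without the attribute; only the number of calls the optimiser has to emit changes.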
 1.4 12-May-2008  ad branches: 1.4.2; 1.4.4;
- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().
 1.3 11-May-2008  ad Don't reload LDTR unless a new value, which only happens for USER_LDT.
 1.2 11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.1 11-May-2008  ad Share cpu.h between the x86 ports.
 1.4.4.3 04-Jun-2008  yamt sync with head
 1.4.4.2 18-May-2008  yamt sync with head.
 1.4.4.1 12-May-2008  yamt file cpu.h was added on branch yamt-pf42 on 2008-05-18 12:33:01 +0000
 1.4.2.6 09-Oct-2010  yamt sync with head
 1.4.2.5 11-Aug-2010  yamt sync with head.
 1.4.2.4 11-Mar-2010  yamt sync with head
 1.4.2.3 04-May-2009  yamt sync with head.
 1.4.2.2 16-May-2008  yamt sync with head.
 1.4.2.1 12-May-2008  yamt file cpu.h was added on branch yamt-nfs-mp on 2008-05-16 02:23:27 +0000
 1.7.8.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.7.8.1 19-Oct-2008  haad Sync with HEAD.
 1.7.6.2 23-Jun-2008  wrstuden Add files to branch that were added on -current.

After this, all that's left of update is to merge some changes
that had conflicts.
 1.7.6.1 30-May-2008  wrstuden file cpu.h was added on branch wrstuden-revivesa on 2008-06-23 05:02:12 +0000
 1.7.2.3 17-Jan-2009  mjf Sync with HEAD.
 1.7.2.2 02-Jun-2008  mjf Sync with HEAD.
 1.7.2.1 30-May-2008  mjf file cpu.h was added on branch mjf-devfs2 on 2008-06-02 13:22:50 +0000
 1.9.10.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.9.10.1 21-Apr-2010  matt sync to netbsd-5
 1.9.8.1 23-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.9.4.2 22-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.9.4.1 16-Jun-2009  snj Pull up following revision(s) (requested by rmind in ticket #782):
sys/arch/x86/conf/files.x86: revision 1.52 via patch
sys/arch/x86/include/cpu.h: revision 1.17
sys/arch/x86/x86/cpu_topology.c: revision 1.1
sys/arch/x86/x86/identcpu.c: revision 1.16 via patch
Move x86 CPU topology detection code into the separate file (as it was
originally).
OK by <yamt>.
 1.9.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.9.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.10.2.9 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.10.2.8 02-May-2011  jym Sync with head.
 1.10.2.7 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.10.2.6 10-Jan-2011  jym Sync with HEAD
 1.10.2.5 24-Oct-2010  jym Sync with HEAD
 1.10.2.4 01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.10.2.3 01-Nov-2009  jym Sync with HEAD.
 1.10.2.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.10.2.1 09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends into two halves: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no other parts
of the kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontends drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.20.4.6 12-Jun-2011  rmind sync with head
 1.20.4.5 31-May-2011  rmind sync with head
 1.20.4.4 21-Apr-2011  rmind sync with head
 1.20.4.3 05-Mar-2011  rmind sync with head
 1.20.4.2 30-May-2010  rmind sync with head
 1.20.4.1 26-Apr-2010  rmind Apply renovated patch to significantly reduce TLB shootdowns in x86 pmap,
also provide TLBSTATS option to measure and track TLB shootdowns. Details:

http://mail-index.netbsd.org/port-i386/2009/01/11/msg001018.html

Patch from Andrew Doran, proposed on tech-x86 [sic], in January 2009.

XXX: amd64 and xen are not yet; work in progress.
 1.20.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.20.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.20.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.26.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.26.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.34.2.7 22-Oct-2011  bouyer Various interrupt fixes, mainly:
keep a per-cpu mask of enabled events, and use it to get pending events.
A cpu-specific event (all of them at this time) should never be masked
by another CPU, because it may prevent the target CPU from seeing it
(the clock events all fire at once for example).
 1.34.2.6 01-Sep-2011  cherry fix %cr3 init. from mhitch@, tested by riz@ & mhitch@
 1.34.2.5 20-Aug-2011  cherry PAE MP support (preliminary), amd64 per-cpu L4 model redesigned, i386 pmap_pa_start/end fixup
 1.34.2.4 17-Aug-2011  cherry Pullup relevant changes from -current
 1.34.2.3 16-Jul-2011  cherry Introduce a per-cpu "shadow" for pmap_kernel()'s L4 page
 1.34.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.34.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.40.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.40.2.4 30-Oct-2012  yamt sync with head
 1.40.2.3 23-May-2012  yamt sync with head.
 1.40.2.2 17-Apr-2012  yamt sync with head
 1.40.2.1 10-Nov-2011  yamt sync with head
 1.43.4.5 29-Apr-2012  mrg sync to latest -current.
 1.43.4.4 06-Mar-2012  mrg sync to -current
 1.43.4.3 06-Mar-2012  mrg sync to -current
 1.43.4.2 04-Mar-2012  mrg sync to latest -current.
 1.43.4.1 18-Feb-2012  mrg merge to -current.
 1.47.2.3 09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.47.2.2 05-Mar-2012  sborrill Pull up the following revision(s) (requested by bouyer in ticket #80):
sys/arch/xen/x86/x86_xpmap.c: revision 1.42
sys/arch/x86/include/specialreg.h: revision 1.56
sys/arch/amd64/amd64/machdep.c: revision 1.179
sys/arch/i386/i386/locore.S: revision 1.97
sys/arch/i386/i386/machdep.c: revision 1.723 via patch
sys/arch/x86/include/cpu.h: revision 1.49

Fix possible FPU registers corruption on context switches.
Fix type of pointers passed to some hypercalls.
 1.47.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep, and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batches of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
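The batching described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the uvm_km.c code: unmap_range() and free_page() are hypothetical stand-ins for pmap_kremove() and uvm_pagefree(), the page size is assumed, and the counters exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 16		/* "at most 16 entries", per the log message */
#define PAGE_SZ   4096UL	/* assumed page size, for illustration only */

/* Hypothetical counters standing in for the real side effects. */
static size_t n_unmap_calls, n_pages_freed;

/* Stand-in for pmap_kremove(): drop the mappings for [va, va + len). */
static void unmap_range(unsigned long va, unsigned long len)
{
	(void)va;
	assert(len > 0 && len <= BATCH_MAX * PAGE_SZ);
	n_unmap_calls++;
}

/* Stand-in for uvm_pagefree(): return one page to the free pool. */
static void free_page(unsigned long va)
{
	(void)va;
	n_pages_freed++;
}

/*
 * Remove npages pages starting at va. Each batch is unmapped first and
 * only then freed, so no page reaches the free pool while it still has
 * a mapping.
 */
static void pgremove_batched(unsigned long va, size_t npages)
{
	while (npages > 0) {
		size_t n = npages < BATCH_MAX ? npages : BATCH_MAX;

		unmap_range(va, n * PAGE_SZ);
		for (size_t i = 0; i < n; i++)
			free_page(va + i * PAGE_SZ);
		va += n * PAGE_SZ;
		npages -= n;
	}
}
```

With 40 pages, this sketch issues three unmap calls (16, 16 and 8 pages) and frees each page only after its batch has been unmapped.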
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.52.2.3 03-Dec-2017  jdolecek update from HEAD
 1.52.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.52.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.53.2.1 18-May-2014  rmind sync with head
 1.66.8.1 19-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1118):
sys/arch/x86/include/cpuvar.h: revision 1.47
sys/arch/x86/x86/cpu.c: revision 1.117
sys/arch/x86/x86/identcpu.c: revision 1.49
sys/arch/x86/include/cpu.h: revision 1.67

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.66.6.2 28-Aug-2017  skrll Sync with HEAD
 1.66.6.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.66.4.2 09-Oct-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #1636):
sys/arch/x86/include/cacheinfo.h: 1.23-1.26
sys/arch/x86/include/cpu.h: 1.70
sys/arch/x86/include/specialreg.h: 1.91-1.93,1.98,1.100,1.102-1.124,1.126,1.130 via patch
sys/arch/x86/x86/cpu_topology.c: 1.10
sys/arch/x86/x86/identcpu.c: 1.56-1.57,1.70 via patch
usr.sbin/cpuctl/arch/i386.c: 1.71,1.75-1.79,1.81-1.85 via patch
Add some register definitions for x86:
- Add CLWB bit.
- Fix a few (unused) MSR values, and add some bit definitions of
MSR_EFER from Murray Armfield in PR#42861.
- CPUID_CFLUSH bit is not for CFLUSH insn but CLFLUSH insn, so modify
comments and snprintb() string.
- Define CPUID Fn00000001 %ebx bits and use them.
No functional change.
- Add Structured Extended Flags Enumeration Leaf's bit definitions:
AVX512_{IFMA,VBMI2,VNNI,BITALG,VPOPCNTDQ,4VNNIW,4FMAPS},GFNI&VAES.
- Add Turbo Boost Max Technology 3.0 bit.
- Add AMD SVM features definitions.
- Add Intel cpuid 7 %edx IBRS and STIBP bit definitions.
- Fix swapped comments for EFER LME and LMA
- Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add MSR_IA32_ARCH_CAPABILITIES definition.
- Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.
- Add Intel Deterministic Address Translation Parameter Leaf(0x18)
definitions.
- s/CLFUSH/CLFLUSH/
- Add AMD's Disable Indirect Branch Predictor bit definition.
- Add the MSR bits definitions for IBRS, STIBP and IBPB.
- Add Intel Fn0000_0006 %eax new bit 14-20 (HWP stuff).
- Intel Fn0000_0007 %ecx bit 22 is for both RDPID and IA32_TSC_AUX.
- Add AMD's CPUID Fn80000001 %edx MMX and FXSR bit definitions.
- Add RDCL_NO and IBRS_ALL.
- Add SSBD and RSBA bit definitions.
- Add AMD's SSB bit definitions for F15H, F16H and F17H.
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
- Add yet another Shared L2 TLB (2M/4M pages).
- Add 3way and 6way of L2 cache or TLB on AMD CPU.
- AMD L3 cache association bitfield is not 8bit but 4bit like others
association bitfields.
- Sort entries. No functional change.
- Modify comment, fix typo in comment and add comment.
cpuctl(8):
- Add detection for Quark X1000, Xeon E5 v4, E7 v4,
Core i7-69xx Extreme Edition, Xeon Scalable (Skylake),
Xeon Phi [357]200 (Knights Landing), Atom (Goldmont),
Atom (Denverton), Future Core (Cannon Lake), Atom (Goldmont Plus),
Xeon Phi 7215, 7285 and 7295 (Knights Mill) and
7th or 8th gen Core (Kaby Lake, Coffee Lake).
- Print Structured Extended Feature leaf Fn0000_0007 %ebx on AMD,too.
- Print Fn0000_0007 %ecx on Intel.
- Print Intel cpuid 7 %edx.
- Parse the TLB info from `cpuid leaf 18H' on Intel processor.
- Use aprint_error_dev() for error output.
 1.66.4.1 06-Mar-2016  martin Pull up following revision(s) (requested by msaitoh in ticket #1118):
sys/arch/x86/include/cpuvar.h: revision 1.47
sys/arch/x86/x86/cpu.c: revision 1.117
sys/arch/x86/x86/identcpu.c: revision 1.49
sys/arch/x86/include/cpu.h: revision 1.67
Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.67.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.67.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.67.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.69.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.71.2.10 24-Dec-2021  martin Pull up the following (all via patch), requested by msaitoh in ticket #1721:

usr.sbin/cpuctl/arch/i386.c 1.118-1.119, 1.121-1.122
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.6
sys/arch/x86/x86/identcpu_subr.c 1.8-1.9
sys/arch/x86/x86/identcpu.c 1.123
sys/arch/x86/include/cacheinfo.h 1.30
sys/arch/x86/include/cpu.h 1.132

- Fix a bug that some TLB related lines were not printed.
- Fix a bug that STLB is printed as DTLB.
- If a TLB is variable sized, print the max size instead of error message.
- Cosmetic changes to improve readability.
 1.71.2.9 05-Aug-2020  martin Pull up the following revisions, requested by msaitoh in ticket #1593:

sys/arch/x86/conf/files.x86 1.108
sys/arch/x86/include/apicvar.h 1.7 via patch
sys/arch/x86/include/cpu.h 1.121
sys/arch/x86/x86/cpu.c 1.185 via patch
sys/arch/x86/x86/hyperv.c 1.7
sys/arch/x86/x86/tsc.c 1.41
sys/arch/xen/conf/files.xen 1.181

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
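As a rough illustration of the arithmetic involved, the sketch below derives a TSC frequency from register values that have already been read from CPUID. It is not the tsc.c code. Leaf 0x15 reports the TSC/crystal ratio (EBX over EAX) and the crystal frequency in Hz (ECX); leaf 0x16 reports the base frequency in MHz in EAX[15:0]. The function name and the policy of falling back to leaf 0x16 are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute a TSC frequency in Hz from already-fetched CPUID values.
 * Returns 0 when neither leaf provides enough information.
 * (Hypothetical helper; the fallback policy is an assumption.)
 */
static uint64_t tsc_freq_from_cpuid(uint32_t l15_eax, uint32_t l15_ebx,
    uint32_t l15_ecx, uint32_t l16_eax)
{
	/* Leaf 0x15: crystal Hz (ECX) * numerator (EBX) / denominator (EAX). */
	if (l15_eax != 0 && l15_ebx != 0 && l15_ecx != 0)
		return (uint64_t)l15_ecx * l15_ebx / l15_eax;

	/* Leaf 0x16: EAX[15:0] is the processor base frequency in MHz. */
	if ((l16_eax & 0xffff) != 0)
		return (uint64_t)(l16_eax & 0xffff) * 1000000;

	return 0;
}
```

For instance, a 24 MHz crystal with a ratio of 200/2 gives 2.4 GHz from leaf 0x15 alone.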
 1.71.2.8 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support, ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.71.2.7 23-Jun-2018  martin Pull up the following, via patch, requested by maxv in ticket #897:

sys/arch/amd64/amd64/locore.S 1.166 (patch)
sys/arch/i386/i386/locore.S 1.157 (patch)
sys/arch/x86/include/cpu.h 1.92 (patch)
sys/arch/x86/include/fpu.h 1.9 (patch)
sys/arch/x86/x86/fpu.c 1.33-1.39 (patch)
sys/arch/x86/x86/identcpu.c 1.72 (patch)
sys/arch/x86/x86/vm_machdep.c 1.34 (patch)
sys/arch/x86/x86/x86_machdep.c 1.116,1.117 (patch)

Support eager fpu switch, to work around INTEL-SA-00145.
Provide a sysctl machdep.fpu_eager, which gets automatically
initialized to 1 on affected CPUs.
 1.71.2.6 09-Jun-2018  martin Pullup the following revisions, requested by maxv in ticket #865:

sys/arch/amd64/amd64/machdep.c 1.303 (patch)
sys/arch/amd64/conf/GENERIC 1.492 (patch)
sys/arch/amd64/conf/files.amd64 1.103 (patch)
sys/arch/i386/i386/machdep.c 1.806 (patch)
sys/arch/i386/conf/GENERIC 1.1179 (patch)
sys/arch/i386/conf/files.i386 1.393 (patch)
sys/arch/x86/include/cpu.h 1.91 (patch)
sys/arch/x86/include/specialreg.h upto 1.126 (patch)
sys/arch/x86/x86/x86_machdep.c upto 1.115 (patch, adapted)
sys/arch/x86/x86/spectre.c upto 1.19 (patch, adapted,
no IBRS,
SpectreV2 mitigations not
enabled by default)

Backport the hardware SpectreV2 and SpectreV4 mitigations.
 1.71.2.5 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #681):
sys/arch/x86/include/cpu.h: revision 1.90
sys/arch/x86/x86/identcpu.c: revision 1.71
Retrieve cpuid.7:%edx.
 1.71.2.4 22-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #652:

sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch)
sys/arch/amd64/amd64/db_machdep.c 1.6 (patch)
sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch)
sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch)
sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch)
sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch)
sys/arch/amd64/amd64/amd64/vector.S upto 1.61 (partial, patch)
sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch)
sys/arch/amd64/conf/kern.ldscript 1.26 (patch)
sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch)
sys/arch/amd64/include/param.h 1.25 (patch)
sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch)
sys/arch/x86/conf/files.x86 1.91,1.93 (patch)
sys/arch/x86/include/cpu.h 1.88,1.89 (patch)
sys/arch/x86/include/pmap.h 1.75 (patch)
sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch)
sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch)
sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch)
sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch)
sys/arch/x86/x86/svs.c 1.1-1.14
sys/arch/xen/conf/files.compat 1.30 (patch)

Backport SVS. Not enabled yet.
 1.71.2.3 16-Mar-2018  martin Pull up the following revisions (via patch), requested by maxv in #635:

sys/arch/amd64/amd64/gdt.c 1.39-1.45 (patch)
sys/arch/amd64/amd64/amd64/machdep.c 1.284,1.287,1.288 (patch)
sys/arch/amd64/amd64/include/param.h 1.23 (patch)
sys/arch/amd64/include/types.h 1.53 (patch)
sys/arch/x86/include/cpu.h 1.87 (patch)
sys/arch/x86/include/pmap.h 1.73,1.74 (patch)
sys/arch/x86/x86/cpu.c 1.142 (patch)
sys/arch/x86/x86/intr.c 1.117 (partial),1.120 (patch)
sys/arch/x86/x86/pmap.c 1.276 (patch)

Initialize ist0 in cpu_init_tss.
Backport __HAVE_PCPU_AREA.
 1.71.2.2 13-Mar-2018  martin Pullup the following revisions via patch, requested by maxv in ticket #629:

sys/arch/amd64/amd64/genassym.cf 1.63,1.64
sys/arch/amd64/amd64/locore.S 1.144
sys/arch/amd64/amd64/machdep.c 1.281-1.283
sys/arch/i386/i386/genassym.cf 1.105-1.106
sys/arch/i386/i386/locore.S 1.155
sys/arch/i386/i386/machdep.c 1.802 (adapted),1.803
sys/arch/x86/include/cpu.h 1.85
sys/arch/x86/x86/intr.c 1.115-1.116
sys/arch/x86/x86/pmap.c 1.275
sys/arch/x86/x86/sys_machdep.c 1.45
sys/arch/xen/x86/cpu.c 1.117

Stop sharing the double-fault stack.
Merge the TSS structures into one single cpu_tss structure, and
allocate it dynamically.
 1.71.2.1 08-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #611):
sys/arch/x86/x86/cpu.c: revision 1.134 (patch)
sys/arch/x86/include/cpu.h: revision 1.78 (patch)
sys/arch/i386/i386/machdep.c: revision 1.792 (patch)

style, and move some i386-specific code into i386/
 1.74.2.2 16-Jul-2017  cherry 2302677
 1.74.2.1 16-Jul-2017  cherry file cpu.h was added on branch perseant-stdc-iso10646 on 2017-07-16 14:02:49 +0000
 1.89.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.89.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.89.2.5 20-Oct-2018  pgoyette Sync with head
 1.89.2.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.89.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.89.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.89.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.92.2.3 21-Apr-2020  martin Sync with HEAD
 1.92.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.92.2.1 10-Jun-2019  christos Sync with HEAD
 1.107.2.2 24-Dec-2021  martin Pull up the following (all via patch), requested by msaitoh in ticket #1396:

usr.sbin/cpuctl/arch/i386.c 1.118-1.119, 1.121-1.122
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.6
sys/arch/x86/x86/identcpu_subr.c 1.8-1.9
sys/arch/x86/x86/identcpu.c 1.123
sys/arch/x86/include/cacheinfo.h 1.30
sys/arch/x86/include/cpu.h 1.132

- Fix a bug that some TLB related lines were not printed.
- Fix a bug that STLB is printed as DTLB.
- If a TLB is variable sized, print the max size instead of error message.
- Cosmetic changes to improve readability.
 1.107.2.1 15-Jul-2020  martin Pull up the following, requested by msaitoh in ticket #1015

sys/arch/x86/conf/files.x86 1.108 (via patch)
sys/arch/x86/include/apicvar.h 1.7 (via patch)
sys/arch/x86/include/cpu.h 1.121 (via patch)
sys/arch/x86/x86/cpu.c 1.185 (via patch)
sys/arch/x86/x86/hyperv.c 1.7 (via patch)
sys/arch/x86/x86/tsc.c 1.41 (via patch)
sys/arch/xen/conf/files.xen 1.181 (via patch)

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.116.2.1 17-Jan-2020  ad Sync with head.
 1.117.4.7 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.117.4.6 18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before CPUs attach.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beginning of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
For HVM, ci_cpuid doesn't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in its context, which initializes ci_vcpuid and
ci_vcpu, and sets up the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
 1.117.4.5 16-Apr-2020  bouyer Avoid overflow of ci_ipi_events[] in the PVHVM case (its size is
XEN_NIPIS but we use x86 IPIs): size XEN_NIPIS only for PV, and
CTASSERT that XEN_NIPIS <= X86_NIPI if we ever use Xen IPIs for
PVHVM.
 1.117.4.4 12-Apr-2020  bouyer Get rid of xen-specific ci_x* interrupt handling:
- use the general SIR mechanism, reserving 3 more slots for IPL_VM, IPL_SCHED
and IPL_HIGH
- remove specific handling from C sources, or change to ipending
- convert IPL number to SIR number in various places
- Remove XUNMASK/XPENDING in assembly or change to IUNMASK/IPENDING
- remove Xen-specific ci_xsources, ci_xmask, ci_xunmask, ci_xpending from
struct cpu_info
- for now remove a KASSERT that there are no pending interrupts in
idle_block(). We can get there with some software interrupts pending
in autoconf XXX needs to be looked at.
 1.117.4.3 11-Apr-2020  bouyer Include ci_isources[] for XenPV too.
Adjust spllower() to XenPV needs, and switch XenPV to the native spllower().
Remove xen_spllower().
 1.117.4.2 10-Apr-2020  bouyer Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.
 1.117.4.1 08-Apr-2020  bouyer Remove VM_GUEST_XEN and define only Xen subtypes:
VM_GUEST_XENPV
VM_GUEST_XENPVH
VM_GUEST_XENHVM
VM_GUEST_XENPVHVM

Set vm_guest in the start routine, if it is hypervisor-specific (e.g Xen PV).
If vm_guest was not set early and we detect Xen in identify_hypervisor(),
assume it is VM_GUEST_XENHVM. Refine to VM_GUEST_XENPVHVM in
hypervisor_match().
 1.129.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.133.4.2 29-Mar-2025  martin Pull up following revision(s) (requested by imil in ticket #1074):

sys/arch/x86/x86/x86_machdep.c: revision 1.155
sys/arch/x86/include/cpu.h: revision 1.137
sys/arch/x86/x86/x86_machdep.c: revision 1.156
sys/arch/x86/include/cpu.h: revision 1.138
sys/arch/x86/x86/consinit.c: revision 1.40
sys/arch/x86/acpi/acpi_machdep.c: revision 1.37
sys/arch/x86/acpi/acpi_machdep.c: revision 1.38
sys/arch/amd64/amd64/machdep.c: revision 1.370
sys/arch/xen/xen/hypervisor.c: revision 1.97
sys/arch/xen/xen/hypervisor.c: revision 1.98
sys/arch/amd64/amd64/genassym.cf: revision 1.98
sys/arch/x86/x86/x86_autoconf.c: revision 1.88
sys/arch/x86/x86/x86_autoconf.c: revision 1.89
sys/arch/amd64/amd64/locore.S: revision 1.226
sys/arch/amd64/amd64/locore.S: revision 1.227
sys/arch/x86/x86/identcpu.c: revision 1.131

Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.

Get one more change from PR kern/57813, needed for non-Xen PVH.

Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
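A helper of this kind might look like the sketch below. Only the two compared constants and the function name come from the log message; the enum name, its other members, their values and the exact signature are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of guest types; the real list is longer. */
enum vm_guest_kind {
	VM_GUEST_NO,
	VM_GUEST_XENPV,
	VM_GUEST_XENPVH,
	VM_GUEST_GENPVH
};

static enum vm_guest_kind vm_guest = VM_GUEST_NO;

/* True for any PVH-style guest, whether under Xen or another hypervisor. */
static bool vm_guest_is_pvh(void)
{
	return vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH;
}
```

Centralizing the test means a further PVH variant would only need to be added in one place.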
 1.133.4.1 09-Aug-2023  martin Pull up following revision(s) (requested by maya in ticket #316):

sys/arch/m68k/include/mutex.h: revision 1.13
sys/arch/arm/include/cpu.h: revision 1.125
sys/arch/sun68k/include/intr.h: revision 1.21
sys/arch/arm/include/mutex.h: revision 1.28
sys/sys/rwlock.h: revision 1.18
sys/arch/powerpc/include/mutex.h: revision 1.7
sys/arch/arm/include/mutex.h: revision 1.29
sys/arch/powerpc/include/mutex.h: revision 1.8
sys/uvm/uvm_param.h: revision 1.42
sys/sys/ksem.h: revision 1.16
sys/arch/x86/include/mutex.h: revision 1.10
sys/sys/proc.h: revision 1.372
sys/sys/ksem.h: revision 1.17
sys/arch/ia64/include/mutex.h: revision 1.8
sys/arch/evbarm/include/intr.h: revision 1.29
sys/sys/lua.h: revision 1.9
sys/arch/next68k/include/intr.h: revision 1.23
sys/arch/ia64/include/mutex.h: revision 1.9
sys/arch/hp300/include/intr.h: revision 1.35
sys/arch/hp300/include/intr.h: revision 1.36
sys/arch/sparc/include/cpu.h: revision 1.111
sys/arch/hppa/include/mutex.h: revision 1.16
sys/arch/vax/include/intr.h: revision 1.31
sys/arch/hppa/include/mutex.h: revision 1.17
sys/arch/news68k/include/intr.h: revision 1.28
sys/arch/hppa/include/mutex.h: revision 1.18
sys/arch/hppa/include/intr.h: revision 1.3
sys/arch/hppa/include/mutex.h: revision 1.19
sys/arch/hppa/include/intr.h: revision 1.4
sys/sys/sched.h: revision 1.92
sys/opencrypto/cryptodev.h: revision 1.51
sys/arch/vax/include/mutex.h: revision 1.20
sys/arch/sparc64/include/mutex.h: revision 1.10
sys/arch/ia64/include/sapicvar.h: revision 1.2
sys/arch/riscv/include/mutex.h: revision 1.5
sys/arch/amiga/dev/grfabs_cc.c: revision 1.39
sys/external/bsd/drm2/include/linux/idr.h: revision 1.11
sys/arch/riscv/include/mutex.h: revision 1.6
sys/ddb/files.ddb: revision 1.16
sys/arch/mac68k/include/intr.h: revision 1.32
share/man/man4/ddb.4: revision 1.203
sys/ddb/db_command.c: revision 1.183
sys/arch/mips/include/mutex.h: revision 1.10
sys/ddb/db_command.c: revision 1.184
sys/arch/x68k/include/intr.h: revision 1.22
sys/arch/sparc/include/psl.h: revision 1.51
sys/arch/or1k/include/mutex.h: revision 1.4
sys/arch/mips/include/mutex.h: revision 1.11
sys/arch/arm/xscale/pxa2x0_intr.h: revision 1.16
sys/arch/sparc64/include/cpu.h: revision 1.134
sys/arch/sparc/include/psl.h: revision 1.52
sys/arch/or1k/include/mutex.h: revision 1.5
sys/arch/mvme68k/include/intr.h: revision 1.22
sys/arch/luna68k/include/intr.h: revision 1.16
external/cddl/osnet/sys/sys/kcondvar.h: revision 1.6
sys/arch/sparc/include/mutex.h: revision 1.12
sys/arch/sparc/include/mutex.h: revision 1.13
sys/arch/usermode/include/mutex.h: revision 1.5
sys/arch/usermode/include/mutex.h: revision 1.6
sys/kern/kern_core.c: revision 1.38
usr.sbin/crash/Makefile: revision 1.49
sys/arch/amiga/include/intr.h: revision 1.23
sys/arch/alpha/include/mutex.h: revision 1.12
sys/arch/alpha/include/mutex.h: revision 1.13
sys/arch/evbarm/lubbock/sacc_obio.c: revision 1.16
sys/ddb/ddb.h: revision 1.6
sys/arch/sparc64/include/mutex.h: revision 1.8
sys/arch/sh3/include/mutex.h: revision 1.12
sys/arch/evbarm/lubbock/sacc_obio.c: revision 1.17
sys/ddb/db_syncobj.c: revision 1.1
sys/arch/vax/include/mutex.h: revision 1.18
sys/arch/sparc64/include/psl.h: revision 1.63
sys/arch/sparc64/include/mutex.h: revision 1.9
sys/arch/sh3/include/mutex.h: revision 1.13
sys/arch/evbarm/lubbock/obio.c: revision 1.13
sys/arch/atari/include/intr.h: revision 1.23
sys/ddb/db_syncobj.c: revision 1.2
sys/arch/vax/include/mutex.h: revision 1.19
sys/arch/evbarm/g42xxeb/obio.c: revision 1.14
sys/arch/evbarm/g42xxeb/obio.c: revision 1.15
sys/arch/cesfic/include/intr.h: revision 1.14
sys/ddb/db_syncobj.h: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.134
sys/arch/evbarm/g42xxeb/obio.c: revision 1.16
sys/arch/cesfic/include/intr.h: revision 1.15
sys/arch/arm/xscale/pxa2x0_intr.c: revision 1.26
sys/sys/cpu_data.h: revision 1.54
sys/arch/m68k/include/mutex.h: revision 1.12
sys/arch/ia64/acpi/madt.c: revision 1.6

sys/rwlock.h: Make this more self-contained for bool.

machine/mutex.h: Sprinkle includes so this can be used by crash(8).

ddb: New `show all tstiles' command.
Shows who's waiting for which locks and what the owner is up to.

Include psl.h for ipl_cookie_t if __MUTEX_PRIVATE

sys: Rip <sys/resourcevar.h> out of <uvm/uvm_param.h>.

And thus out of <sys/param.h>, which is exceedingly overused and
fragile and delenda est.

Should fix (some) issues with the recent inclusion of machine/lock.h
in various machine/mutex.h files.

arm/mutex.h: Need machine/intr.h, machine/lock.h.

For ipl_cookie_t and __cpu_simple_lock_t.
evbarm/intr.h: Define ipl_cookie_t before including ARM_INTR_IMPL.

Otherwise arm/mutex.h doesn't work, due to a cyclic dependency which
should really be fixed.
opencrypto/cryptodev.h: Fix includes.
- Move sys/condvar.h under #ifdef _KERNEL.
- Add some other necessary includes and forward declarations.
- Sort.

hp300/intr.h: Fix missing includes.
linux/idr.h: Need <sys/mutex.h> for kmutex_t.
amiga/intr.h: Don't define spl*() functions if !_KERNEL.

This is used by crash(8) now, and what's important is ipl_cookie_t.
cesfic/intr.h: Expose ipl_cookie_t to userland for crash(8).
cesfic/intr.h: Expose ipl_cookie_t to userland only with _KMEMUSER.

Probably not necessary but let's be a little more cautious about
this.

atari/intr.h: Expose ipl_cookie_t with _KMEMUSER for crash(8).

arm/cpu.h: Need sys/param.h for COHERENCY_UNIT.

Nix machine/param.h -- not meant to be used directly, pulled in by
sys/param.h.

Move the definition of ipl_cookie_t out of the kernel-only sections,
some _KMEMUSER applications need it.

ddb: Cast pointer to uintptr_t first before db_expr_t.

hppa/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

luna68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

mvme68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

news68k/intr.h: Fix includes. Put some definitions under _KERNEL.

next68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

sys/ksem.h: Hack around fstat(8) abuse of _KERNEL.

sun68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

vax/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

x68k/intr.h: Put functions under _KERNEL so crash(8) can use this.

Make ipl_cookie_t visible for _KMEMUSER userland applications.

fix editor mishap in previous

Explicitly include <sys/mutex.h> for kmutex_t.

Replace kmutex_t * (which may be undefined here) with struct kmutex *,
suggested by Taylor.

hp300/intr.h: Put most of this under #ifdef _KERNEL.
Only ipl_cookie_t really needs to be exposed now, for crash(8).

mac68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).
Make inclusion of sys/intr.h explicit for spl*.

fix hppa and vax builds.

machine/lock.h isn't necessary for __cpu_simple_lock_t, it's in
sys/types.h. avoids cpu_data.h vs sched.h include order issues.

move the hppa ipl_t typedef with the moved usage of it.
machine/mutex.h: Sprinkle sys/types.h, omit machine/lock.h.

Turns out machine/lock.h is not needed for __cpu_simple_lock_t, which
always comes from sys/types.h. And, really, sys/types.h (or at least
sys/stdint.h) is needed for uintN_t and uintptr_t.

ddb: Cast pointer to uintptr_t, then to db_expr_t.
Avoids warnings about conversion between pointer and integer of
different size on some architectures.
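The two-step cast can be illustrated with a small standalone sketch. Here expr_t is a hypothetical stand-in for db_expr_t, assumed for the example to be a 64-bit signed integer that may be wider than a pointer; the helper names are likewise invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: an expression type that can be wider than a pointer. */
typedef int64_t expr_t;

static expr_t ptr_to_expr(const void *p)
{
	/*
	 * Pointer -> uintptr_t is a same-width conversion; the second
	 * cast is then a plain integer-to-integer conversion, so the
	 * compiler has no pointer/integer size mismatch to warn about.
	 */
	return (expr_t)(uintptr_t)p;
}

static const void *expr_to_ptr(expr_t e)
{
	/* Reverse direction, again going through uintptr_t. */
	return (const void *)(uintptr_t)e;
}
```

A direct cast from the pointer to expr_t is what triggers the size-mismatch warning on targets where the two widths differ.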

re-fix hppa builds.

this file uses __cpu_simple_lock(), not just the underlying type,
so it does need machine/lock.h.

Break cycle by using `struct kmutex *' instead of `kmutex_t *'.
sys/sched.h included sys/mutex.h
which includes sys/intr.h
which includes machine/intr.h
which on cats includes arm/footbridge/footbridge_intr.h
which includes arm/cpu.h
which includes sys/cpu_data.h
which includes sys/sched.h

But there was never any real need for sys/mutex.h in sys/sched.h,
because it only uses pointers to the opaque struct kmutex. Cycle
broken by using `struct kmutex *' instead of pulling in sys/mutex.h
for the definition of kmutex_t.
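The technique relies on the fact that C allows pointers to an incomplete struct type. A minimal sketch, in which struct sched_queue and sched_queue_init() are hypothetical names rather than anything from sys/sched.h:

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration only: the layout of struct kmutex is not needed. */
struct kmutex;

/* Hypothetical scheduler-side structure that merely stores a lock pointer. */
struct sched_queue {
	struct kmutex *sq_lock;	/* opaque; never dereferenced here */
	int sq_count;
};

static void sched_queue_init(struct sched_queue *sq, struct kmutex *lock)
{
	assert(sq != NULL);
	sq->sq_lock = lock;
	sq->sq_count = 0;
}
```

Because nothing here needs the size or members of struct kmutex, the header that defines it does not have to be included, which is what removes one edge from the include cycle.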

Side effect: This revealed that sys/cpu_data.h needed sys/intr.h
(which was pulled in accidentally by sys/mutex.h via sys/sched.h) for
SOFTINT_COUNT. Also revealed some other machine/cpu.h header files
were missing includes of sys/mutex.h for kmutex_t.

ia64: Need sys/types.h for u_int, vaddr_t; sys/mutex.h for kmutex_t.

explicitly include no longer implicitly included sys/mutex.h.

arm/xscale: Use sys/bitops.h fls32 - 1 instead of 31 - __builtin_clz.
Sidesteps namespace collision with `#define bits ...' in net/zlib.c.

complete the previous - there were two calls to find_first_bit() to fix.

arm/xscale: Missed a spot with previous find_first_bit commit.

evbarm/g42xxeb: Fix off-by-one in previous.

The original find_first_bit(x) was 31 - __builtin_clz((uint32_t)x),
which is equivalent to fls32(x) - 1, not to fls32(x).

Note that fls32 is 1-based and returns 0 for x=0.
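The off-by-one can be checked with a small standalone sketch. my_fls32() below is a local reimplementation written for illustration, following the semantics stated above (1-based, 0 for x = 0); it is not the sys/bitops.h code, and highest_bit_index() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based position of the highest set bit; 0 when x == 0. */
static int my_fls32(uint32_t x)
{
	int n = 0;

	while (x != 0) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * 0-based index of the highest set bit, i.e. what
 * 31 - __builtin_clz(x) computes for nonzero x.
 */
static int highest_bit_index(uint32_t x)
{
	assert(x != 0);
	return my_fls32(x) - 1;
}
```

For x = 0x10 the highest set bit is bit 4, so my_fls32() gives 5 and the 0-based index is 4; using the fls32 result without subtracting 1 would select the wrong bit.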
 1.136.6.1 02-Aug-2025  perseant Sync with HEAD
 1.7 15-Jun-2020  msaitoh Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and gave a big improvement, but
there is still room for more. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If the CPU has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than that of rdtsc, so I suspect the difference
in the results comes from it.
 1.6 08-May-2020  ad Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use that to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
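The "measure 1000 times and keep the lowest value" idea from the list above can be sketched as follows. This is an illustration only and not the tsc.c code: the sampler callback, fake_sample() and the way the 1000-cycle threshold is applied are assumptions, and samples are taken to be non-negative magnitudes in clock cycles.

```c
#include <assert.h>
#include <stdint.h>

#define NSAMPLES       1000	/* "make each measurement 1000 times" */
#define SKEW_THRESHOLD 1000	/* error threshold, in clock cycles */

/* Hypothetical sampler: returns one observed delta in cycles. */
typedef int64_t (*sample_fn)(void *arg);

/* Take NSAMPLES readings and keep the smallest one. */
static int64_t min_observed(sample_fn fn, void *arg)
{
	int64_t best = INT64_MAX;

	for (int i = 0; i < NSAMPLES; i++) {
		int64_t d = fn(arg);

		if (d < best)
			best = d;
	}
	return best;
}

/* Assumed policy: a skew is tolerated if it is within the threshold. */
static int skew_acceptable(int64_t skew)
{
	return skew <= SKEW_THRESHOLD;
}

/* Deterministic fake sampler: mostly noisy, occasionally undisturbed. */
static int64_t fake_sample(void *arg)
{
	unsigned *calls = arg;
	unsigned i = (*calls)++;

	return (i % 100 == 37) ? 12 : 5000 + (int64_t)(i % 7);
}
```

Taking the minimum rather than the mean discards readings inflated by interrupts or cache misses, which can only make a measurement look worse, never better.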
 1.5 02-Feb-2011  bouyer Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
 1.4 10-May-2008  ad branches: 1.4.12; 1.4.20; 1.4.26; 1.4.28;
Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.
 1.3 10-May-2008  ad Merge cpu_counter.h.
 1.2 28-Apr-2008  martin branches: 1.2.2;
Remove clause 3 and 4 from TNF licenses
 1.1 07-Jul-2007  tsutsui branches: 1.1.2; 1.1.4; 1.1.16; 1.1.36; 1.1.38; 1.1.40;
Move x86 common cpu_counter functions into <x86/cpu_counter.h>.
 1.1.40.1 16-May-2008  yamt sync with head.
 1.1.38.1 18-May-2008  yamt sync with head.
 1.1.36.1 02-Jun-2008  mjf Sync with HEAD.
 1.1.16.2 03-Sep-2007  yamt sync with head.
 1.1.16.1 07-Jul-2007  yamt file cpu_counter.h was added on branch yamt-lazymbuf on 2007-09-03 14:31:19 +0000
 1.1.4.2 15-Jul-2007  ad Sync with head.
 1.1.4.1 07-Jul-2007  ad file cpu_counter.h was added on branch vmlocking on 2007-07-15 13:21:05 +0000
 1.1.2.2 11-Jul-2007  mjf Sync with head.
 1.1.2.1 07-Jul-2007  mjf file cpu_counter.h was added on branch mjf-ufs-trans on 2007-07-11 20:03:13 +0000
 1.2.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.4.28.1 08-Feb-2011  bouyer Sync with HEAD
 1.4.26.1 06-Jun-2011  jruoho Sync with HEAD.
 1.4.20.1 05-Mar-2011  rmind sync with head
 1.4.12.1 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.19 24-Apr-2025  riastradh amd64: Allocate FPU save state outside pcb if it's too large.

We have seen x86_fpu_save_size values (CPUID[EAX=0x0d, ECX=0].ECX) as
large as 11008 bytes, notably with Intel AMX TILEDATA's 8192-byte
state.

We only do this for user threads, and only on machines where it's
necessary, to avoid incurring much overhead. There is still a tiny
bit of overhead when saving and restoring the FPU state by using a
pointer indirection instead of arithmetic indirection for access to
struct pcb::pcb_savefpu, but this is probably a drop in the bucket
compared to the memory traffic incurred by the FPU state save/restore
anyway.
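
As a rough userland illustration of the trade-off described above (embedded save area versus a separately allocated one reached through a pointer), consider the sketch below. The structure, field and function names are hypothetical and are not the kernel's; the 576-byte figure is the union savefpu size quoted in this log message, and alignment requirements of a real XSAVE area are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Size of the embedded area; 576 is the figure quoted in the log message. */
#define DEMO_INLINE_FPU_SIZE 576

/* Hypothetical stand-in for a pcb with an embedded FPU save area. */
struct demo_pcb {
	unsigned char inline_area[DEMO_INLINE_FPU_SIZE];
	unsigned char *savefpu;	/* points at inline_area or at the heap */
};

/* Use the embedded area when it is large enough, else allocate. */
static int
demo_pcb_init(struct demo_pcb *pcb, size_t fpu_save_size)
{
	assert(pcb != NULL);
	if (fpu_save_size <= sizeof(pcb->inline_area)) {
		pcb->savefpu = pcb->inline_area;
		return 0;
	}
	/* A real XSAVE area needs 64-byte alignment; omitted in this sketch. */
	pcb->savefpu = calloc(1, fpu_save_size);
	return pcb->savefpu != NULL ? 0 : -1;
}

/* Release the separate allocation, if there was one. */
static void
demo_pcb_fini(struct demo_pcb *pcb)
{
	assert(pcb != NULL);
	if (pcb->savefpu != pcb->inline_area)
		free(pcb->savefpu);
	pcb->savefpu = NULL;
}
```

Every access now goes through pcb->savefpu instead of a fixed offset, which is the "pointer indirection" cost the message mentions.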

For now, these paths are mostly disabled on i386. We could enable
them but it will require either rewriting cpu_uarea_alloc/free for
i386, or adopting a guard page like amd64 does, which might be costly
and so should be undertaken only with some thought and care. And
since Intel AMX instructions only work in 64-bit mode, it's not
likely to be useful on i386.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu

These changes, as a side effect, may fix:

PR kern/57258: kthread_fpu_enter/exit problem

by making sure to allocate an FPU save space that is large enough to
guarantee fpu_kern_enter/leave work safely, instead of just using a
union savefpu object on the stack (which, at 576 bytes, may be too
small on some machines, particularly with AVX512 requiring ~2.5K).
(But we'll have to do some extra work with kthread_fpu_enter/exit_md
-- if we try doing them again on x86 -- to actually allocate the
separate pcb on these machines!)
 1.18 25-Feb-2023  riastradh branches: 1.18.6;
x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use.

In fpu_kern_enter, make sure all the MXCSR exception status bits are
set when we start using the FPU, so that instructions which exhibit
MCDT are unaffected by it.
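
For orientation, the MXCSR exception status flags occupy bits 0 through 5 (IE, DE, ZE, OE, UE, PE), and the architectural power-on value is 0x1f80 with all exceptions masked. A minimal sketch of "set all status bits" is shown below; the macro and function names are hypothetical, and whether fpu_kern_enter starts from exactly this base value is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MXCSR_STATUS	0x003fu	/* IE, DE, ZE, OE, UE, PE (bits 0..5) */
#define DEMO_MXCSR_DEFAULT	0x1f80u	/* power-on value: exceptions masked */

/* Return the MXCSR value with every exception status flag already set. */
static uint32_t
demo_mxcsr_all_status(uint32_t mxcsr)
{
	/* Only the low 16 bits of MXCSR are defined. */
	assert((mxcsr & 0xffff0000u) == 0);
	return mxcsr | DEMO_MXCSR_STATUS;
}
```

With the default as input this yields 0x1fbf, so no later instruction can change the status flags from clear to set.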

While here, zero all the other FPU registers in fpu_kern_enter.

In principle we could skip this step on future CPUs that fix the MCDT
bug, but there's probably not much benefit -- workloads that do a lot
of crypto in the kernel are probably better off using
kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles
in the first place.

For details, see:
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
 1.17 26-Jun-2019  mgorny branches: 1.17.28;
Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).
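
The compatibility argument above can be illustrated with a small sketch that does not call ptrace() at all. The structure layout and the function name are hypothetical stand-ins; only the idea (copy no more than the caller's declared buffer length) comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical stand-in for a versioned, extensible state structure. */
struct demo_xstate {
	uint64_t bv;
	uint64_t rfbm;
	unsigned char regs[64];
};

/*
 * Copy out at most iov_len bytes. A caller built against an older, smaller
 * structure passes a smaller iov_len and receives a partial copy.
 */
static size_t
demo_getxstate(const struct demo_xstate *src, const struct iovec *iov)
{
	size_t n;

	assert(src != NULL && iov != NULL && iov->iov_base != NULL);
	n = iov->iov_len < sizeof(*src) ? iov->iov_len : sizeof(*src);
	memcpy(iov->iov_base, src, n);
	return n;
}
```

Because the length travels with the pointer, appending fields to the structure later does not overrun buffers supplied by existing binaries.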
 1.16 23-May-2018  maxv branches: 1.16.2;
Clean up the FPU headers.
 1.15 08-Nov-2017  maxv branches: 1.15.2;
remove vestige
 1.14 31-Oct-2017  maxv Remove outdated comment.
 1.13 31-Oct-2017  maxv Don't embed our own values in the reserved fields of the XSAVE area, it
really is a bad idea. Move them into the PCB.
 1.12 31-Oct-2017  maxv Add xsh_xcomp_bv and fx_zero, and use uint8_t instead.
 1.11 10-Aug-2017  maxv Remove the svr4/ibcs2 fpu flags.
 1.10 18-Aug-2016  maxv KNF and simplify.
 1.9 25-Feb-2014  dsl branches: 1.9.4; 1.9.6; 1.9.10; 1.9.12;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
 1.8 18-Feb-2014  dsl It seems that firefox includes machine/fpu.h on amd64.
Add the file back so that the firefox source doesn't have to depend
on the version of netbsd it is being compiled for.
(The i386 version doesn't play the same games in its SIGFPE handler.)
 1.7 15-Feb-2014  dsl Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).
 1.6 13-Feb-2014  dsl Check the argument types for the fpu asm functions.
 1.5 12-Feb-2014  dsl Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.4 09-Feb-2014  dsl Add compatibility for some userspace code (eg firefox) that seems to look
inside the ucontext structure passed to signal handlers to modify the
xmm registers.
This should make the code compile - I'm not at all sure it works as expected,
the interactions between FP and signal handlers aren't at all clear.
AFAICT the FP state is saved on the user stack when the handler is called,
however the FP trap code can already have done odd things to the FPU....
 1.3 08-Feb-2014  dsl Add bit defs for more of the x87 status register.
 1.2 07-Feb-2014  dsl Convert the amd64 build to use x86/cpu_extended_state.h so that the fpu
definitions match those of i386.
Mostly just structure and field renames, in addition:
1) process_xmm_to_s87() and process_s87_to_xmm() moved into
x86/convert_xmm_s87.c so they can be used by amd64's netbsd32 code.
2) The linux signal code simplified to use a structure copy for the fxsave
data - it matches the hardware definition and won't change.
 1.1 07-Feb-2014  dsl Move all the hardware register layout for the x86 cpus into a header
that can also be used by amd64.
Add in skeleton definitions for XSAVE and AVX.
Update some comments to match reality.
 1.9.12.2 28-Aug-2017  skrll Sync with HEAD
 1.9.12.1 05-Oct-2016  skrll Sync with HEAD
 1.9.10.3 03-Dec-2017  jdolecek update from HEAD
 1.9.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.9.10.1 25-Feb-2014  tls file cpu_extended_state.h was added on branch tls-maxphys on 2014-08-20 00:03:29 +0000
 1.9.6.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.9.6.1 25-Feb-2014  yamt file cpu_extended_state.h was added on branch yamt-pagecache on 2014-05-22 11:40:13 +0000
 1.9.4.2 18-May-2014  rmind sync with head
 1.9.4.1 25-Feb-2014  rmind file cpu_extended_state.h was added on branch rmind-smpnet on 2014-05-18 17:45:30 +0000
 1.15.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.16.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.17.28.1 25-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #244):

sys/arch/x86/x86/fpu.c: revision 1.80
sys/arch/x86/include/cpu_extended_state.h: revision 1.18

x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use.

In fpu_kern_enter, make sure all the MXCSR exception status bits are
set when we start using the FPU, so that instructions which exhibit
MCDT are unaffected by it.

While here, zero all the other FPU registers in fpu_kern_enter.
In principle we could skip this step on future CPUs that fix the MCDT
bug, but there's probably not much benefit -- workloads that do a lot
of crypto in the kernel are probably better off using
kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles
in the first place.

For details, see:
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
 1.18.6.1 02-Aug-2025  perseant Sync with HEAD
 1.7 05-Oct-2009  rmind Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.6 17-Oct-2007  garbled branches: 1.6.20; 1.6.34;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.5 06-Oct-2007  xtraeme Use a two clause license for all the code I contributed.

The envsys code will be changed later.
 1.4 25-Mar-2007  xtraeme branches: 1.4.2; 1.4.6; 1.4.8; 1.4.10; 1.4.18; 1.4.20; 1.4.22; 1.4.24;
Add another member to struct cpu_msr_broadcast, msr_read that will
enable the rdmsr call in msr_write_ipi(), so that when it's not
defined we don't read it before writing; disabled in powernow_k8
and enabled in the others.
 1.3 21-Mar-2007  xtraeme branches: 1.3.2;
Remove unneeded headers.
 1.2 21-Mar-2007  xtraeme Remove the MSR read IPI handler, there won't be any driver that will
use it, and we can see if the values are ok in the CPUs in the write
operation.

Suggested by YAMAMOTO Takashi.
 1.1 20-Mar-2007  xtraeme MSR read and write IPI handlers for x86. A MSR will be read or written
in all CPUs available in the system. This adds another member
to struct cpu_info, ci_msr_rvalue; it will contain the value of the MSR
in a previous operation.

Tested with clockmod in UP and SMP by me, tested with est in SMP
by Daniel Carosone and Michael Van Elst.

Ok'ed by Andrew Doran and Matthew R. Green.
 1.3.2.3 15-Apr-2007  yamt sync with head.
 1.3.2.2 24-Mar-2007  yamt sync with head.
 1.3.2.1 21-Mar-2007  yamt file cpu_msr.h was added on branch yamt-idlelwp on 2007-03-24 14:55:05 +0000
 1.4.24.1 14-Oct-2007  yamt sync with head.
 1.4.22.3 27-Oct-2007  yamt sync with head.
 1.4.22.2 03-Sep-2007  yamt sync with head.
 1.4.22.1 25-Mar-2007  yamt file cpu_msr.h was added on branch yamt-lazymbuf on 2007-09-03 14:31:19 +0000
 1.4.20.1 06-Nov-2007  matt sync with HEAD
 1.4.18.1 07-Oct-2007  joerg Sync with HEAD.
 1.4.10.2 11-Jul-2007  mjf Sync with head.
 1.4.10.1 25-Mar-2007  mjf file cpu_msr.h was added on branch mjf-ufs-trans on 2007-07-11 20:03:13 +0000
 1.4.8.1 16-Oct-2007  garbled Sync with HEAD
 1.4.6.2 20-Apr-2007  bouyer Pull up following revision(s) (requested by mlelstv in ticket #575):
sys/arch/i386/i386/est.c sync with 1.37
sys/arch/i386/i386/ipifuncs.c sync with 1.16
sys/arch/x86/include/cpu_msr.h sync with 1.4
sys/arch/x86/include/intrdefs.h sync with 1.8
sys/arch/x86/include/powernow.h sync with 1.9
sys/arch/x86/x86/powernow_k8.c sync with 1.20
sys/arch/x86/x86/msr_ipifuncs.c sync with 1.8
sys/arch/amd64/amd64/ipifuncs.c sync with 1.9
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/i386/machdep.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/x86/x86_machdep.c patch
sys/arch/amd64/amd64/machdep.c patch
Add MSR write IPI handler for x86. Use it and the RUN_ONCE framework
to make est and powernow drivers work properly with SMP.
 1.4.6.1 25-Mar-2007  bouyer file cpu_msr.h was added on branch netbsd-4 on 2007-04-20 20:31:27 +0000
 1.4.2.3 09-Oct-2007  ad Sync with head.
 1.4.2.2 10-Apr-2007  ad Sync with head.
 1.4.2.1 25-Mar-2007  ad file cpu_msr.h was added on branch vmlocking on 2007-04-10 13:22:45 +0000
 1.6.34.1 01-Nov-2009  jym Sync with HEAD.
 1.6.20.1 11-Mar-2010  yamt sync with head
 1.4 10-May-2020  maxv Reintroduce cpu_rng_early_sample(), but this time with embedded detection
for RDRAND/RDSEED, because TSC is not very strong.
 1.3 30-Apr-2020  riastradh Simplify Intel RDRAND/RDSEED and VIA C3 RNG API.

Push it all into MD x86 code to keep it simpler, until we have other
examples on other CPUs. Simplify RDSEED-to-RDRAND fallback.
Eliminate cpu_earlyrng in favour of just using entropy_extract, which
is available early now.
 1.2 21-Jul-2018  maxv More ASLR. Randomize the location of the direct map at boot time on amd64.
This doesn't need "options KASLR" and works on GENERIC. Will soon be
enabled by default.

The location of the areas is abstracted in a slotspace structure. Ideally
we should always use this structure when touching the L4 slots, instead of
the current cocktail of global variables and constants.

machdep initializes the structure with the default values, and we then
randomize its dmap entry. Ideally machdep should randomize everything at
once, but in the case of the direct map its size is determined a little
later in the boot procedure, so we're forced to randomize its location
later too.
 1.1 27-Feb-2016  tls branches: 1.1.2; 1.1.18; 1.1.20; 1.1.22;
Add cpu_rng, a framework for simple on-CPU random number generators.
 1.1.22.1 10-Jun-2019  christos Sync with HEAD
 1.1.20.1 28-Jul-2018  pgoyette Sync with HEAD
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 27-Feb-2016  jdolecek file cpu_rng.h was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.1.2.2 19-Mar-2016  skrll Sync with HEAD
 1.1.2.1 27-Feb-2016  skrll file cpu_rng.h was added on branch nick-nhusb on 2016-03-19 11:30:07 +0000
 1.5 15-Sep-2022  msaitoh Verify checksum of the extended signature table.
 1.4 17-Mar-2018  christos branches: 1.4.6;
tuck in all the compat microcode code in one place.
 1.3 17-Oct-2012  drochner branches: 1.3.30; 1.3.36;
put binary compatibility support for the old AMD-only CPU microcode
update API inside COMPAT_60
 1.2 29-Aug-2012  drochner branches: 1.2.2;
Extend the CPU microcode update framework to support Intel x86 CPUs.
Contrary to the AMD implementation, it doesn't use xcalls to distribute
the update to all CPUs but relies on cpuctl(8) to bind itself to the
right CPU -- to keep it simple and avoid possible problems with
hyperthreading.
Also, it doesn't parse the vendor supplied file to pick the right
part for the present CPU model but relies on userland to prepare
files with specific filenames. I'll commit a pkg for this in a minute
(pkgsrc/sysutils/intel-microcode).
The ioctl interface changed; compatibility is provided (should be
limited to COMPAT_NETBSD6 as soon as this is available).
 1.1 13-Jan-2012  cegger branches: 1.1.4; 1.1.6;
Support CPU microcode loading via cpuctl(8).
Implemented and enabled via CPU_UCODE kernel config option
for x86 and Xen Dom0.
Tested on different AMD machines with different
CPU families.

ok wiz@ for the manpages
ok releng@
ok core@ via releng@
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 13-Jan-2012  yamt file cpu_ucode.h was added on branch yamt-pagecache on 2012-04-17 00:07:05 +0000
 1.1.4.2 18-Feb-2012  mrg merge to -current.
 1.1.4.1 13-Jan-2012  mrg file cpu_ucode.h was added on branch jmcneill-usbmp on 2012-02-18 07:33:34 +0000
 1.2.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.3.36.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.3.36.1 17-Mar-2018  pgoyette Import christos's changes for the compat_60 cpu_ucode stuff
 1.3.30.1 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1772):

sys/arch/x86/include/cpu_ucode.h: revision 1.5
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.19
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.20

Add missing newline in a message. KNF.
Verify checksum of the extended signature table.
 1.4.6.1 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1538):
sys/arch/x86/include/cpu_ucode.h: revision 1.5
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.19
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.20
Add missing newline in a message. KNF.
Verify checksum of the extended signature table.
 1.42 24-Oct-2020  mgorny Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs

When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64
use the 64-suffixed variant in order to include the complete FIP/FDP
registers in the x87 area.

The difference between the two variants is that the FXSAVE64 (new)
variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64),
while the legacy FXSAVE variant uses split fields: 32-bit offset,
16-bit segment and 16-bit reserved field (union fp_addr.fa_32).
The latter implies that the actual addresses are truncated to 32 bits
which is insufficient in modern programs.
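
The two layouts described above can be pictured with the following sketch. The union name and the member names inside fa_32 are hypothetical (the message only names fa_64 and fa_32), and the helper functions merely model what each save variant would record; they do not execute FXSAVE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the FIP/FDP slot in the x87 save area. */
union demo_fp_addr {
	uint64_t fa_64;			/* FXSAVE64: full 64-bit address */
	struct {
		uint32_t fa_off;	/* legacy: 32-bit offset */
		uint16_t fa_seg;	/* legacy: 16-bit segment selector */
		uint16_t fa_rsvd;	/* legacy: 16-bit reserved field */
	} fa_32;
};

/* Model of the 64-bit variant: the whole address is kept. */
static union demo_fp_addr
demo_save_64(uint64_t addr)
{
	union demo_fp_addr u;

	assert(sizeof(u) == 8);
	u.fa_64 = addr;
	return u;
}

/* Model of the legacy variant: only the low 32 bits of the address fit. */
static union demo_fp_addr
demo_save_legacy(uint64_t addr, uint16_t seg)
{
	union demo_fp_addr u;

	u.fa_64 = 0;
	u.fa_32.fa_off = (uint32_t)addr;
	u.fa_32.fa_seg = seg;
	return u;
}
```

For an address such as 0x00007f12345678ab, the legacy form keeps only 0x345678ab, which is why the old data could not be used to locate the faulting instruction in a 64-bit process.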

The change is applied only to 64-bit programs on amd64. Plain i386
and compat32 continue using plain FXSAVE. Similarly, NVMM is not
changed as I am not familiar with that code.

This is a potentially breaking change. However, I don't think it likely
to actually break anything because the data provided by the old variant
were not meaningful (because of the truncated pointer).
 1.41 15-Jun-2020  msaitoh Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and gave a big improvement,
but there is still room. I measured the effect of lfence, mfence, cpuid and
rdtscp. The impact on TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), it seems the improvement is very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than that of rdtsc, so I suspect the
difference in the results comes from it.
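
The selection policy stated in this message (mfence on AMD, lfence on Intel, cpuid when SSE2 is absent) can be written down as a small decision function. The enum and function names are hypothetical, and the choice of lfence for other vendors is an assumption that the message does not cover.

```c
#include <assert.h>

enum demo_vendor {
	DEMO_VENDOR_INTEL,
	DEMO_VENDOR_AMD,
	DEMO_VENDOR_OTHER
};

enum demo_serializer {
	DEMO_SER_LFENCE,
	DEMO_SER_MFENCE,
	DEMO_SER_CPUID
};

/* Pick the instruction to issue before rdtsc, per the measurements above. */
static enum demo_serializer
demo_pick_serializer(enum demo_vendor vendor, int has_sse2)
{
	assert(vendor == DEMO_VENDOR_INTEL || vendor == DEMO_VENDOR_AMD ||
	    vendor == DEMO_VENDOR_OTHER);

	/* lfence and mfence were introduced with SSE2; fall back to cpuid. */
	if (!has_sse2)
		return DEMO_SER_CPUID;
	if (vendor == DEMO_VENDOR_AMD)
		return DEMO_SER_MFENCE;
	/* Intel per the message; other vendors by assumption. */
	return DEMO_SER_LFENCE;
}
```

In a kernel, a decision like this would typically be made once at boot rather than on every counter read.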
 1.40 14-Jun-2020  riastradh Use static constant rather than stack memset buffer for zero fpregs.
 1.39 02-May-2020  maxv Modify the hotpatch mechanism, in order to make it much less ROP-friendly.

Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.

- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling in a separate x86_hotpatch_cleanup() function,
that gets called after we closed the window.

The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).
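
The "name plus selector" lookup described above might look roughly like the sketch below. Everything here is hypothetical: the structure names, the descriptor table and the function are illustrations only, and unlike the real mechanism the destination is passed in by the caller instead of being resolved from a read-only list of patchable locations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One read-only replacement template. */
struct demo_hp_source {
	const unsigned char *bytes;
	size_t len;
};

/* One read-only descriptor: a name plus at most two valid templates. */
struct demo_hp_desc {
	const char *name;
	size_t nsrc;
	const struct demo_hp_source *srcs[2];
};

static const unsigned char demo_nop3[] = { 0x0f, 0x1f, 0x00 };   /* 3-byte NOP */
static const unsigned char demo_nop1x3[] = { 0x90, 0x90, 0x90 }; /* 3 x NOP */
static const struct demo_hp_source demo_src0 = { demo_nop3, sizeof(demo_nop3) };
static const struct demo_hp_source demo_src1 = { demo_nop1x3, sizeof(demo_nop1x3) };

/* Stand-in for the read-only link-set of descriptors. */
static const struct demo_hp_desc demo_descs[] = {
	{ "demo_nop", 2, { &demo_src0, &demo_src1 } },
};

/* Resolve descriptor by name and template by selector, then copy. */
static int
demo_hotpatch(const char *name, size_t sel, unsigned char *dst, size_t dstlen)
{
	size_t i;

	assert(name != NULL && dst != NULL);
	for (i = 0; i < sizeof(demo_descs) / sizeof(demo_descs[0]); i++) {
		const struct demo_hp_desc *d = &demo_descs[i];

		if (strcmp(d->name, name) != 0)
			continue;
		/* Reject unknown selectors and size mismatches. */
		if (sel >= d->nsrc || d->srcs[sel]->len != dstlen)
			return -1;
		memcpy(dst, d->srcs[sel]->bytes, dstlen);
		return 0;
	}
	return -1;	/* no descriptor with that name */
}
```

The caller supplies only a name and a small index, so it cannot choose arbitrary bytes to write; the set of possible writes is fixed in read-only data.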

With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.

The only ~problem there is, is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However dtrace is only in a module, that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
 1.38 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.37 30-Oct-2019  maxv branches: 1.37.6;
More inlined ASM.
 1.36 07-Sep-2019  maxv Convert rdmsr_locked and wrmsr_locked to inlines.
 1.35 07-Sep-2019  maxv Add a memory barrier on wrmsr, because some MSRs control memory access
rights (we don't use them though). Also add barriers on fninit and clts
for safety.
 1.34 05-Jul-2019  maxv branches: 1.34.2;
More inlines, prerequisites for future changes. Also, remove fngetsw(),
which was a duplicate of fnstsw().
 1.33 03-Jul-2019  maxv Inline x86_cpuid2(), prerequisite for future changes. Also, add "memory"
on certain other inlines, to make sure GCC does not reorder.
 1.32 30-May-2019  christos use __asm
 1.31 29-May-2019  maxv Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.
 1.30 11-May-2019  christos Undo previous, fixed in userland.
 1.29 11-May-2019  christos expose the {rd,wr}msr functions to userland and install the header for
the benefit of cpuctl (fix the build).
 1.28 09-May-2019  bouyer sti/cli are not allowed on Xen, we have to clear/set a bit in the
shared page. Revert x86_disable_intr/x86_enable_intr to plain function
calls on XENPV.
While there, clean up unused functions and macros, and change cli()/sti()
macros to x86_disable_intr/x86_enable_intr.
Makes Xen domU boot again
(http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/)
 1.27 04-May-2019  maxv More inlined ASM. While here switch to proper types.
 1.26 01-May-2019  maxv Start converting the x86 CPU functions to inlined ASM. Matters for NVMM,
where some are invoked millions of times.
 1.25 01-May-2019  maxv Remove unused functions and reorder a little.
 1.24 22-Feb-2018  maxv branches: 1.24.4;
Improve the SVS initialization.

Declare x86_patch_window_open() and x86_patch_window_close(), and globalify
x86_hotpatch().

Introduce svs_enable() in x86/svs.c, that does the SVS hotpatching.

Change svs_init() to take a bool. This function gets called twice; early
when the system just booted (and nothing is initialized), and later when at
least pmap_kernel has been initialized.
 1.23 15-Oct-2017  maxv Add setds and setes, will be useful in the future.
 1.22 13-Dec-2016  kamil branches: 1.22.8;
Tear down KSTACK_CHECK_DR0, an i386-only feature to detect stack overflow

This feature was intended to detect stack overflow with CPU Debug Registers
(x86). It was never ported to other ports, not even amd64, and should be
adapted for SMP...

Currently there might be better ways to detect stack overflows, like page
mapping protection. Since the number of Debug Registers is restricted
(4 on x86), tear it down completely.

This interface introduced helper functions for Debug Registers, they will
be replaced with the new <x86/dbregs.h> interface.

KSTACK_CHECK_DR0 was disabled by default and won't affect ordinary users.

Sponsored by <The NetBSD Foundation>
 1.21 13-Dec-2016  kamil Switch x86 CPU Debug Register types from vaddr_t to register_t

This is a more opaque and appropriate type, as vaddr_t is meant to be used
for a virtual address value. Not all DRs on x86 are used to represent a
virtual address (DR6 and DR7 are definitely not).

No functional change intended.

Change suggested by <christos>

Sponsored by <The NetBSD Foundation>
 1.20 27-Nov-2016  kamil Add accessors for available x86 Debug Registers

There are 8 Debug Registers on i386 (available at least since 80386) and 16
on AMD64. Currently DR4 and DR5 are reserved on both cpu-families and
DR9-DR15 are still reserved on AMD64. Therefore add accessors for DR0-DR3,
DR6-DR7 for all ports.

Debug Registers x86:
* DR0-DR3 Debug Address Registers
* DR4-DR5 Reserved
* DR6 Debug Status Register
* DR7 Debug Control Register
* DR8-DR15 Reserved

Access to the registers is available only from the kernel (ring 0), as
privileged access is needed. For this reason, special XEN functions must be
used to get and set the registers in the XEN3 kernels.

XEN specific functions as defined in NetBSD:
- HYPERVISOR_get_debugreg()
- HYPERVISOR_set_debugreg()

This code extends the existing rdr6() and ldr6() accessors with additional ones:
- rdr0() & ldr0()
- rdr1() & ldr1()
- rdr2() & ldr2()
- rdr3() & ldr3()
- rdr7() & ldr7()

Traditionally, the accessors for DR6 passed a vaddr_t argument. While that is
an appropriate type for DR0-DR3, DR6-DR7 should be using u_long; however, it's
not a big deal. The resulting functionality should be equivalent, so stick
to this convention and use the vaddr_t type for all DR accessors.

There was already a function defined for rdr6() in XEN, but it had a nit on
AMD64 as it was casting HYPERVISOR_get_debugreg() to u_int (32-bit on
AMD64), truncating result. It still works for DR6, but for the sake of
simplicity always return full 64-bit value.

New accessors duplicate functionality of the dr0() function available on
i386 within the KSTACK_CHECK_DR0 option. dr0() is a specialized layer with
logic to set appropriate types of interrupts, now accessors are designed to
pass verbatim values from user-land (with simple sanity checks in the
kernel). At the moment there are no plans to make it possible for
KSTACK_CHECK_DR0 to coexist with debug registers for user applications
(debuggers).

options KSTACK_CHECK_DR0
Detect kernel stack overflow using DR0 register. This option uses DR0
register exclusively so you can't use DR0 register for other purpose
(e.g., hardware breakpoint) if you turn this on.

The KSTACK_CHECK_DR0 functionality was designed for i386 and never ported
to amd64.

Code tested on i386 and amd64 with kernels: GENERIC, XEN3_DOMU, XEN3_DOM0.

Sponsored by <The NetBSD Foundation>
 1.19 05-Jan-2016  hannken branches: 1.19.2;
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.

As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.18 25-Feb-2014  dsl branches: 1.18.4; 1.18.6; 1.18.8;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
 1.17 13-Feb-2014  dsl Check the argument types for the fpu asm functions.
 1.16 12-Feb-2014  dsl Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.15 09-Feb-2014  dsl Add x86_stmxcsr for amd64.
 1.14 08-Dec-2013  dsl Add some definitions for cpu 'extended state'.
These are needed for support of the AVX SIMD instructions.
Nothing yet uses them.
 1.13 24-Sep-2011  jym branches: 1.13.2; 1.13.8; 1.13.12; 1.13.14; 1.13.16; 1.13.22;
Import rdmsr_safe(msr, *value) for x86 world. It allows reading MSRs
in a safe way by handling the fault that might trigger for certain
register <> CPU/arch combos.

Requested by Jukka. Patch adapted from one found in DragonflyBSD.
 1.12 07-Jul-2010  chs add the guts of TLS support on amd64. based on joerg's patch,
reworked by me to support 32-bit processes as well.
we now keep %fs and %gs loaded with the user values
while in the kernel, which means we don't need to
reload them when returning to user mode.
 1.11 27-Jan-2009  christos branches: 1.11.2; 1.11.4; 1.11.6;
factor out common reset code.
 1.10 19-Dec-2008  cegger x86_patch() is not available on Xen.
Make Xen kernels link again.
 1.9 19-Dec-2008  ad PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.8 30-Apr-2008  cegger branches: 1.8.8; 1.8.10;
AMD's APM Volume 2 says 'All control registers are 64bit in long mode'.
Fix the CR0 prototype to match this (the asm implementation is correct though).
OK ad
 1.7 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.6 27-Apr-2008  ad branches: 1.6.2;
+lcr2
 1.5 16-Apr-2008  cegger branches: 1.5.2;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.4 01-Jan-2008  yamt branches: 1.4.6;
add x86_cpuid2, which can specify ecx register.
 1.3 15-Nov-2007  ad branches: 1.3.6;
Remove support for 80386 level CPUs. PR port-i386/36163.
 1.2 26-Sep-2007  ad branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8; 1.2.10; 1.2.12; 1.2.14;
Update copyright.
 1.1 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.2.14.3 09-Jan-2008  matt sync with HEAD
 1.2.14.2 06-Nov-2007  matt sync with HEAD
 1.2.14.1 26-Sep-2007  matt file cpufunc.h was added on branch matt-armv6 on 2007-11-06 23:23:34 +0000
 1.2.12.2 18-Feb-2008  mjf Sync with HEAD.
 1.2.12.1 19-Nov-2007  mjf Sync with HEAD.
 1.2.10.4 21-Jan-2008  yamt sync with head
 1.2.10.3 07-Dec-2007  yamt sync with head
 1.2.10.2 27-Oct-2007  yamt sync with head.
 1.2.10.1 26-Sep-2007  yamt file cpufunc.h was added on branch yamt-lazymbuf on 2007-10-27 11:28:54 +0000
 1.2.8.1 18-Nov-2007  bouyer Sync with HEAD
 1.2.6.3 03-Dec-2007  ad Sync with HEAD.
 1.2.6.2 09-Oct-2007  ad Sync with head.
 1.2.6.1 26-Sep-2007  ad file cpufunc.h was added on branch vmlocking on 2007-10-09 13:38:41 +0000
 1.2.4.2 06-Oct-2007  yamt sync with head.
 1.2.4.1 26-Sep-2007  yamt file cpufunc.h was added on branch yamt-x86pmap on 2007-10-06 15:33:31 +0000
 1.2.2.3 21-Nov-2007  joerg Sync with HEAD.
 1.2.2.2 02-Oct-2007  joerg Sync with HEAD.
 1.2.2.1 26-Sep-2007  joerg file cpufunc.h was added on branch jmcneill-pm on 2007-10-02 18:27:49 +0000
 1.3.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.4.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.4.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.2.1 18-May-2008  yamt sync with head.
 1.6.2.3 11-Aug-2010  yamt sync with head.
 1.6.2.2 04-May-2009  yamt sync with head.
 1.6.2.1 16-May-2008  yamt sync with head.
 1.8.10.4 01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.8.10.3 02-Feb-2009  snj branches: 1.8.10.3.6; 1.8.10.3.10;
Pull up following revision(s) (requested by ad in ticket #396):
sys/arch/amd64/amd64/machdep.c: revision 1.122
sys/arch/i386/i386/machdep.c: revision 1.657
sys/arch/x86/include/cpufunc.h: revision 1.11
sys/arch/x86/x86/x86_machdep.c: revision 1.28
factor out common reset code.
 1.8.10.2 02-Feb-2009  snj Pull up following revision(s) (requested by bouyer in ticket #343):
sys/arch/x86/x86/identcpu.c: revision 1.13
sys/arch/x86/include/cpufunc.h: revision 1.10
x86_patch() is not available on Xen.
Make Xen kernels link again.
 1.8.10.1 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #343):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.14
sys/arch/x86/include/cpufunc.h: revision 1.9
sys/arch/x86/x86/identcpu.c: revision 1.12
sys/arch/x86/x86/cpu.c: revision 1.60
sys/arch/x86/x86/patch.c: revision 1.15
PR kern/40213 my i386 machine can't boot because of tsc
- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.
- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.8.10.3.10.1 01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.8.10.3.6.1 01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1969):
sys/arch/x86/include/cpufunc.h: revision 1.13
sys/arch/amd64/amd64/cpufunc.S: revision 1.20-1.21 via patch
sys/arch/i386/i386/cpufunc.S: revision 1.16-1.17, 1.21 via patch

Backport rdmsr_safe() to access MSR safely.
 1.8.8.2 03-Mar-2009  skrll Sync with HEAD.
 1.8.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.11.6.1 05-Mar-2011  rmind sync with head
 1.11.4.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.11.2.1 24-Oct-2010  jym Sync with HEAD
 1.13.22.1 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1361):
sys/arch/x86/include/cpufunc.h: revision 1.19
sys/arch/x86/x86/errata.c: revision 1.23
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.13.16.1 18-May-2014  rmind sync with head
 1.13.14.1 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1361):
sys/arch/x86/include/cpufunc.h: revision 1.19
sys/arch/x86/x86/errata.c: revision 1.23
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.13.12.2 03-Dec-2017  jdolecek update from HEAD
 1.13.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.13.8.1 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1361):
sys/arch/x86/include/cpufunc.h: revision 1.19
sys/arch/x86/x86/errata.c: revision 1.23
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.13.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.18.8.1 06-Feb-2016  snj Pull up following revision(s) (requested by hannken in ticket #1073):
sys/arch/x86/x86/errata.c: revision 1.23
sys/arch/x86/include/cpufunc.h: revision 1.19
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.18.6.3 05-Feb-2017  skrll Sync with HEAD
 1.18.6.2 05-Dec-2016  skrll Sync with HEAD
 1.18.6.1 19-Mar-2016  skrll Sync with HEAD
 1.18.4.1 26-Jan-2016  snj Pull up following revision(s) (requested by hannken in ticket #1073):
sys/arch/x86/x86/errata.c: revision 1.23
sys/arch/x86/include/cpufunc.h: revision 1.19
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.19.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.22.8.1 06-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #603:

amd64/conf/kern.ldscript 1.25 (patch)
amd64/conf/kern.ldscript.Xen 1.14 (patch)
i386/conf/kern.ldscript 1.21 (patch)
i386/conf/kern.ldscript.Xen 1.15 (patch)
x86/include/cpufunc.h 1.24 (patch)
x86/x86/patch.c 1.25 (partial) 1.26 (partial)

Backport x86_hotpatch.
 1.24.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.24.4.1 10-Jun-2019  christos Sync with HEAD
 1.34.2.1 16-Oct-2019  martin Pull up following revision(s) (requested by maxv in ticket #338):

sys/arch/x86/include/cpufunc.h: revision 1.35

Add a memory barrier on wrmsr, because some MSRs control memory access
rights (we don't use them though). Also add barriers on fninit and clts
for safety.
 1.37.6.1 15-Apr-2020  bouyer On amd64, always use the cmpxchg8b version of spllower. All x86_64 hosts should
have it and we already rely on it in lock stubs.
On i386, always use i686_mutex_spin_exit and cx8_spllower for Xen;
Xen doesn't run on CPUs lacking the required instructions anyway.
Skip x86_patch only for XENPV, and adjust for changes in assembly functions.
Tested on Xen PV and PVHVM, and on bare metal core i5.
 1.4 08-Dec-2013  dsl Remove the now-unused CPU_MAXMODEL and CPU_DEFMODEL
 1.3 27-Jan-2011  bouyer branches: 1.3.4; 1.3.14; 1.3.18;
Properly identify vortex86 CPUs.
 1.2 11-May-2008  ad branches: 1.2.12; 1.2.20; 1.2.26; 1.2.28;
Re-base the cpu types at 0 so they can be used as an array index.
 1.1 01-Jan-2007  ad branches: 1.1.2; 1.1.6; 1.1.20; 1.1.32; 1.1.52; 1.1.54; 1.1.56; 1.1.58;
Report on and, where possible, try to work around some of the known errata
for Athlon 64 and Opteron processors. Tested briefly by cube@ and elad@.
 1.1.58.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.1.56.1 16-May-2008  yamt sync with head.
 1.1.54.1 18-May-2008  yamt sync with head.
 1.1.52.1 02-Jun-2008  mjf Sync with HEAD.
 1.1.32.2 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.1.32.1 01-Jan-2007  wrstuden file cputypes.h was added on branch wrstuden-fixsa on 2007-09-03 07:04:12 +0000
 1.1.20.2 05-Jun-2007  bouyer Pull up following revision(s) (requested by xtraeme in ticket 702):
sys/arch/amd64/amd64/identcpu.c patch
sys/arch/amd64/include/cpu.h patch
sys/arch/x86/include/cputypes.h 1.1
Print all extended features for Intel EM64T CPUs on amd64.
 1.1.20.1 01-Jan-2007  bouyer file cputypes.h was added on branch netbsd-4 on 2007-06-05 20:28:11 +0000
 1.1.6.2 26-Feb-2007  yamt sync with head.
 1.1.6.1 01-Jan-2007  yamt file cputypes.h was added on branch yamt-lazymbuf on 2007-02-26 09:08:47 +0000
 1.1.2.2 12-Jan-2007  ad Sync with head.
 1.1.2.1 01-Jan-2007  ad file cputypes.h was added on branch newlock2 on 2007-01-12 01:49:08 +0000
 1.2.28.1 08-Feb-2011  bouyer Sync with HEAD
 1.2.26.1 06-Jun-2011  jruoho Sync with HEAD.
 1.2.20.1 05-Mar-2011  rmind sync with head
 1.2.12.1 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.3.18.1 18-May-2014  rmind sync with head
 1.3.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.55 01-May-2025  imil Introduce cpu_max_hypervisor_cpuid to cache hypervisor CPUID leaf

This variable stores the maximum supported hypervisor CPUID leaf so that
future checks can avoid repeated calls to x86_cpuid().
 1.54 11-Apr-2025  imil nvmm(4): implement CPUID leaf 0x40000010, VMware compatible TSC and LAPIC
frequency detection. Partially fixes PR kern/59170
 1.53 14-Jul-2020  yamaguchi branches: 1.53.26;
Introduce per-cpu IDTs

This is realized by the following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.52 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.51 11-Feb-2019  cherry branches: 1.51.10;
We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.50 23-May-2017  nonaka branches: 1.50.10;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.49 19-Apr-2017  nonaka remove prototypes of nonexistent function.
 1.48 13-Jan-2017  christos branches: 1.48.2;
Add missing forward decl.
 1.47 13-Dec-2015  maxv branches: 1.47.2;
Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.46 20-Apr-2012  rmind branches: 1.46.2; 1.46.14; 1.46.16; 1.46.18;
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.45 13-Aug-2011  cherry branches: 1.45.2; 1.45.6; 1.45.8;
MP probing and startup code
 1.44 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.43 04-Mar-2011  jruoho branches: 1.43.2;
Move INTEL_ONDEMAND_CLOCKMOD -- or odcm(4) -- to the cpufeaturebus.
 1.42 24-Feb-2011  jruoho Fix autoconf(9) of cpufeaturebus.
 1.41 23-Feb-2011  jruoho Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.
 1.40 20-Feb-2011  jruoho Modularize coretemp(4). Ok jmcneill@.
 1.39 19-Feb-2011  jmcneill modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module
 1.38 20-Aug-2010  jruoho branches: 1.38.2; 1.38.4;
Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
 1.37 09-Aug-2010  jruoho Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.
 1.36 09-Aug-2010  jruoho Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.
 1.35 08-Aug-2010  jruoho Merge P-state support for acpicpu(4).

Remarks:

1. All processors (x86 or not) for which the vendor has implemented
ACPI I/O access routines are supported. Native instructions are
currently supported only for Intel's "Enhanced Speedstep". Code for
"PowerNow!" (AMD) will be merged later. Native support for VIA's
"PowerSaver" will be investigated.

2. Backwards compatibility with existing userland code is maintained.
Comparable to the case with cpu_idle(9), the ACPI CPU driver
installs alternative functions for the existing sysctl(8) controls.
The "native" behavior (if any) is restored upon detachment.

3. The dynamic nature of ACPI-provided P-states needs more investigation.
The maximum frequency induced (but not forced) by the firmware may
change dynamically. Currently, the sysctl(8) controls error out with
a value larger than the dynamic maximum. The code itself does not
however yet react to the notifications from the firmware by changing
the frequencies in-place. Presumably the system administrator should
be able to choose whether to use dynamic or static frequencies.
 1.34 04-Aug-2010  jruoho Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.
 1.33 06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.32 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.31 02-Oct-2009  jmcneill branches: 1.31.2; 1.31.4;
Add support for VIA C7 temperature sensors (options VIA_C7TEMP)
 1.30 02-Oct-2009  jmcneill Use the TSC and current multiplier to calculate bus clock on VIA C7 Esther.
Probably needed for all C7 and Nano processors, but to be safe only use
this alternate method on Esther for now.

Now est on my C7-M 1.6GHz properly reports frequencies from 1600 to 400,
instead of 2133 to 533.
 1.29 05-Aug-2009  jym Add Intel SpeedStep and AMD PowerNow! support in Xen dom0. MSR operations
are now compiled in by default.

Note that MSR support in Xen depends on its version. rdmsr() should always
succeed, but wrmsr() to certain registers can end in a NOOP. In that case,
the error will be logged (see xm dmesg).

Setting CPU frequency (SpeedStep) requires Xen 3.3 with the option
cpufreq="dom0-kernel" passed down to hypervisor during boot.

Compiled and tested for SpeedStep under i386 for XEN3_DOM0 and XEN3PAE_DOM0
by jym@. amd64 was tested by Joel Carnat.

See also http://mail-index.netbsd.org/port-xen/2009/08/02/msg005213.html .

Commit requested by bouyer@.
 1.28 11-Mar-2009  yamt wrap opt_* includes with _KERNEL_OPT.
(i forgot to commit this with the tprof modules yesterday.)
 1.27 13-May-2008  ad branches: 1.27.6; 1.27.8; 1.27.12; 1.27.14; 1.27.16;
Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.
 1.26 11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.25 09-May-2008  joerg Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
 1.24 28-Apr-2008  martin branches: 1.24.2;
Remove clause 3 and 4 from TNF licenses
 1.23 16-Apr-2008  cegger branches: 1.23.2; 1.23.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.22 04-Jan-2008  yamt branches: 1.22.6;
i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.
 1.21 01-Jan-2008  yamt try to detect processor resource sharing topologies. ie. package/core/smt IDs.
 1.20 18-Dec-2007  joerg Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
 1.19 15-Nov-2007  ad branches: 1.19.2; 1.19.6;
Disable TLB shootdown IPIs while in the debugger. Crashdumps may try to
use them, and all but one CPU is paused. Reported and tested by martin@.
 1.18 13-Nov-2007  ad In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.
 1.17 12-Nov-2007  ad - cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.
 1.16 29-Oct-2007  xtraeme branches: 1.16.2;
Add coretemp(4). A new driver for Intel Core's on-die thermal sensor,
available on Intel Core or newer CPUs.

Ported from FreeBSD. Tested by rmind on i386 and joerg on amd64.

Enabled with "options INTEL_CORETEMP".
 1.15 17-Oct-2007  garbled Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.14 01-Jul-2007  xtraeme branches: 1.14.8; 1.14.10; 1.14.14;
Add support for the VIA C7-M and Eden processors in the
Enhanced Speedstep driver.

Tested by Heron Gallegos <gallegos at csxxi dot net dot mx>
 1.13 03-Jun-2007  xtraeme Make the Enhanced Speedstep driver available for i386 and amd64.
It can be used on EM64T CPUs supporting the EST CPUID feature. Note that
some CPUs still don't work with this driver, like Xeon or Pentium 4.

Move the p[34]_get_bus_clock functions into their own file,
intel_busclock.c and remove this code from i386/identcpu.c.

Tested on i386 by myself and amd64 by Tonerre.
 1.12 21-Mar-2007  xtraeme branches: 1.12.4;
Don't build msr_ipifuncs on Xen, fixes the build with XEN2_DOM0.
 1.11 20-Mar-2007  xtraeme Driver for Intel Thermal Monitor (feature TM) On-Demand Clock
Modulation.

This works by changing the duty cycle of the clock modulation,
and saves power and helps to not increase the temperature by
software.

Adapted from OpenBSD/FreeBSD's p4tcc.

To enable it one must use "options INTEL_ONDEMAND_CLOCKMOD".

Tested by me in UP and SMP, ok'ed by Matthew R. Green.
 1.10 15-Mar-2007  xtraeme Ok... there were people really angry with this, backing it out.
 1.9 15-Mar-2007  xtraeme Add a driver for the Pentium 4 and later models with feature TM
(Thermal Monitor).

This driver will throttle the CPU clock modulation, saving some
power, also known as ODMC (On Demand Modulation Clock).

The processor can change from 12.5% to 100% (there are two errata,
so two levels might be skipped in the worst case).

If supported, you'll see the following sysctl sub-tree:

machdep.p4tcc.throttling.target: CPU Clock throttling state (0 = lowest, 7 highest)
machdep.p4tcc.throttling.current: current CPU throttling state
machdep.p4tcc.throttling.available: list of CPU Clock throttling states

machdep.p4tcc.throttling.target = 2
machdep.p4tcc.throttling.current = 2
machdep.p4tcc.throttling.available = 7 6 5 4 3 2

Adapted from OpenBSD/FreeBSD.
 1.8 06-Mar-2007  yamt branches: 1.8.2; 1.8.4; 1.8.6;
multiple inclusion protection.
 1.7 05-Mar-2007  drochner clean up how cpus and ioapics are attached at the mainbus:
Separate "cpubus" and "ioapicbus" -- while they share a common "address
space" (the apic id), the kernel doesn't use this fact. There are different
data passed to cpus and apics, which caused some ugly polymorphism. This
also saves the special "submatch" functions needed to distinguish cpus
and ioapics for autoconf. (It also means that "apid" locators wired
in the kernel configuration are honored now; this allows one to dumb down
an mp box to singleprocessor by userconfig.)
Print "apid" locators in the buses "print" function "as everyone does",
so the per-port cpu drivers don't need to do it.
Being here, constify "struct cpu_functions" and g/c the unused MP_PICMODE
flag.
 1.6 01-Jan-2007  ad branches: 1.6.2;
Report on and, where possible, try to work around some of the known errata
for Athlon 64 and Opteron processors. Tested briefly by cube@ and elad@.
 1.5 08-Aug-2006  cube branches: 1.5.2; 1.5.6; 1.5.8;
files.x86 isn't included by Xen kernels, so opt_powernow_k8.h never gets
created by config(1), and thus it's not safe to use it in cpuvar.h.

Simply declare the prototype for k8_powernow_init in powernow.h. No need
to #ifdef protect a prototype, after all, only its users.

Un-breaks build of Xen kernels.
 1.4 07-Aug-2006  xtraeme branches: 1.4.2;
* Do not change struct powernow_pst_s (I added another member in my
previous patch) and this MUST be of that size, otherwise the tables
won't be found.

* powernow_k8.c moved into x86/x86, it should work on both i386 and amd64.

* Added more DPRINTFs needed to find the first problem.

* Create "machdep.powernow.frequency" again, I can't remember why I
removed frequency... it should work with estd now.

* Do not try to call k[78]_powernow_init() if cpu is not AMD (thanks
to christos).

And more things I can't remember, but this time it will work in
Athlon 64 cpus and it won't crash in EM64T cpus.
 1.3 27-Oct-2003  junyoung branches: 1.3.16; 1.3.30;
Nuke __P().
 1.2 23-Jun-2003  martin branches: 1.2.2;
Make sure to include opt_foo.h if a defflag option FOO is used.
 1.1 01-Mar-2003  fvdl Moved here from i386/include
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.30.1 09-Sep-2006  rpaulo sync with head
 1.3.16.5 21-Jan-2008  yamt sync with head
 1.3.16.4 07-Dec-2007  yamt sync with head
 1.3.16.3 15-Nov-2007  yamt sync with head.
 1.3.16.2 03-Sep-2007  yamt sync with head.
 1.3.16.1 26-Feb-2007  yamt sync with head.
 1.4.2.1 08-Aug-2006  tron Pull up following revision(s) (requested by cube in ticket #7):
sys/arch/x86/include/cpuvar.h: revision 1.5
sys/arch/x86/include/powernow.h: revision 1.3
files.x86 isn't included by Xen kernels, so opt_powernow_k8.h never gets
created by config(1), and thus it's not safe to use it in cpuvar.h.
Simply declare the prototype for k8_powernow_init in powernow.h. No need
to #ifdef protect a prototype, after all, only its users.
Un-breaks build of Xen kernels.
 1.5.8.1 23-Sep-2007  wrstuden Sync with somewhat-recent netbsd-4.
 1.5.6.1 12-Sep-2007  msaitoh Pull up following patches (requested by xtraeme in ticket #809)

share/man/man4/options.4 patch
sys/arch/i386/conf/files.i386 patch
sys/arch/i386/i386/est.c delete
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/include/cpuvar.h patch
sys/arch/x86/x86/est.c new file
sys/arch/x86/x86/intel_busclock.c new file
sys/arch/amd64/amd64/identcpu.c patch
sys/arch/amd64/conf/GENERIC patch

Add support for the VIA C7-M and Eden processors in the Enhanced
Speedstep driver.
amd64: The Enhanced Speedstep driver is now able to work on EM64T
CPUs running in 64bit mode.
 1.5.2.1 12-Jan-2007  ad Sync with head.
 1.6.2.2 24-Mar-2007  yamt sync with head.
 1.6.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.8.6.1 29-Mar-2007  reinoud Pullup to -current
 1.8.4.1 11-Jul-2007  mjf Sync with head.
 1.8.2.6 03-Dec-2007  ad Sync with HEAD.
 1.8.2.5 03-Dec-2007  ad Sync with HEAD.
 1.8.2.4 01-Nov-2007  ad - Fix interactivity problems under high load. Because soft interrupts
are being stacked on top of regular LWPs, more often than not aston()
was being called on a soft interrupt thread instead of a user thread,
meaning that preemption was not happening on EOI.

- Don't use bool in a couple of data structures. Sub-word writes are not
always atomic and may clobber other fields in the containing word.

- For SCHED_4BSD, make p_estcpu per thread (l_estcpu). Rework how the
dynamic priority level is calculated - it's much better behaved now.

- Kill the l_usrpri/l_priority split now that priorities are no longer
directly assigned by tsleep(). There are three fields describing LWP
priority:

l_priority: Dynamic priority calculated by the scheduler.
This does not change for kernel/realtime threads,
and always stays within the correct band. Eg for
timeshared LWPs it never moves out of the user
priority range. This is basically what l_usrpri
was before.

l_inheritedprio: Lent to the LWP due to priority inheritance
(turnstiles).

l_kpriority: A boolean value set true the first time an LWP
sleeps within the kernel. This indicates that the LWP
should get a priority boost as compensation for blocking.
lwp_eprio() now does the equivalent of sched_kpri() if
the flag is set. The flag is cleared in userret().

- Keep track of scheduling class (OTHER, FIFO, RR) in struct lwp, and use
this to make decisions in a few places where we previously tested for a
kernel thread.

- Partially fix itimers and usr/sys/intr time accounting in the presence
of software interrupts.

- Use kthread_create() to create idle LWPs. Move priority definitions
from the various modules into sys/param.h.

- newlwp -> lwp_create
 1.8.2.3 15-Jul-2007  ad Sync with head.
 1.8.2.2 09-Jun-2007  ad Sync with head.
 1.8.2.1 10-Apr-2007  ad Sync with head.
 1.12.4.2 03-Oct-2007  garbled Sync with HEAD
 1.12.4.1 26-Jun-2007  garbled Sync with HEAD.
 1.14.14.2 18-Nov-2007  bouyer Sync with HEAD
 1.14.14.1 13-Nov-2007  bouyer Sync with HEAD
 1.14.10.2 09-Jan-2008  matt sync with HEAD
 1.14.10.1 06-Nov-2007  matt sync with HEAD
 1.14.8.3 21-Nov-2007  joerg Sync with HEAD.
 1.14.8.2 14-Nov-2007  joerg Sync with HEAD.
 1.14.8.1 29-Oct-2007  joerg Sync with HEAD.
 1.16.2.3 18-Feb-2008  mjf Sync with HEAD.
 1.16.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.16.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.19.6.2 08-Jan-2008  bouyer Sync with HEAD
 1.19.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.19.2.1 26-Dec-2007  ad Sync with head.
 1.22.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.23.4.6 09-Oct-2010  yamt sync with head
 1.23.4.5 11-Aug-2010  yamt sync with head.
 1.23.4.4 11-Mar-2010  yamt sync with head
 1.23.4.3 19-Aug-2009  yamt sync with head.
 1.23.4.2 04-May-2009  yamt sync with head.
 1.23.4.1 16-May-2008  yamt sync with head.
 1.23.2.1 18-May-2008  yamt sync with head.
 1.24.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.27.16.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.27.16.1 21-Apr-2010  matt sync to netbsd-5
 1.27.14.1 23-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.27.12.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.27.12.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.27.12.3 24-Oct-2010  jym Sync with HEAD
 1.27.12.2 01-Nov-2009  jym Sync with HEAD.
 1.27.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.27.8.3 22-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.27.8.2 05-Oct-2009  sborrill Pull up the following revision(s) (requested by jmcneill in ticket #1061):
sys/arch/x86/conf/files.x86: revision 1.53
sys/arch/x86/include/cpuvar.h: revision 1.31
sys/arch/x86/x86/identcpu.c: revision 1.17
sys/arch/x86/x86/viac7temp.c: revision 1.1
sys/arch/i386/conf/ALL: revision 1.218
sys/arch/i386/conf/GENERIC: revision 1.949
Add support for VIA C7 temperature sensors (options VIA_C7TEMP) and enable
in i386 GENERIC kernel.
 1.27.8.1 05-Oct-2009  sborrill Pull up following revision(s) (requested by jmcneill in ticket #1059):
sys/arch/x86/include/cpuvar.h: 1.30
sys/arch/x86/x86/est.c: 1.12
sys/arch/x86/x86/intel_busclock.c: 1.8

Use the TSC and current multiplier to calculate bus clock on VIA C7 Esther.
Probably needed for all C7 and Nano processors, but to be safe only use this
alternate method on Esther for now.
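
The bus clock calculation described above amounts to dividing the TSC frequency by the current multiplier. A minimal sketch, assuming an integer multiplier and a TSC frequency already measured in Hz; the function name and the example numbers are hypothetical and not taken from est.c or intel_busclock.c:

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the bus clock from the TSC frequency and
 * the current core multiplier. Returns 0 if the multiplier is unknown.
 */
uint64_t
sketch_bus_clock_hz(uint64_t tsc_hz, unsigned multiplier)
{
	if (multiplier == 0)
		return 0;
	return tsc_hz / multiplier;
}
```

For example, a 1.6 GHz TSC with a multiplier of 16 would give a 100 MHz bus clock.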
 1.27.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.31.4.3 05-Mar-2011  rmind sync with head
 1.31.4.2 31-May-2010  rmind - Split off Xen versions of pmap_map_ptes/pmap_unmap_ptes into Xen pmap,
also move pmap_apte_flush() with pmap_unmap_apdp() there.
- Make Xen buildable.
 1.31.4.1 30-May-2010  rmind sync with head
 1.31.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.31.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.31.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.38.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.38.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.43.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.43.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.45.8.1 09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.45.6.1 29-Apr-2012  mrg sync to latest -current.
 1.45.2.1 23-May-2012  yamt sync with head.
 1.46.18.1 19-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1118):
sys/arch/x86/include/cpuvar.h: revision 1.47
sys/arch/x86/x86/cpu.c: revision 1.117
sys/arch/x86/x86/identcpu.c: revision 1.49
sys/arch/x86/include/cpu.h: revision 1.67

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.46.16.3 28-Aug-2017  skrll Sync with HEAD
 1.46.16.2 05-Feb-2017  skrll Sync with HEAD
 1.46.16.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.46.14.1 06-Mar-2016  martin Pull up following revision(s) (requested by msaitoh in ticket #1118):
sys/arch/x86/include/cpuvar.h: revision 1.47
sys/arch/x86/x86/cpu.c: revision 1.117
sys/arch/x86/x86/identcpu.c: revision 1.49
sys/arch/x86/include/cpu.h: revision 1.67
Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.46.2.1 03-Dec-2017  jdolecek update from HEAD
 1.47.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.47.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.48.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.50.10.1 10-Jun-2019  christos Sync with HEAD
 1.51.10.1 18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before the CPUs attach.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beginning of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
For HVM, ci_cpuid doesn't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in its context, which initializes ci_vcpuid and
ci_vcpu, and sets up the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.
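
The hook arrangement described in the paragraph above can be sketched as a function pointer that defaults to a no-op and is replaced during timer calibration. All names below carry a sketch_ prefix and are hypothetical stand-ins, not the kernel's actual symbols or signatures:

```c
/* Counts how many CPUs have started their clock (for illustration). */
int sketch_clocks_started;

void
sketch_noop_initclock(void)
{
	/* Default hook: nothing to do, e.g. when Xen timers are used. */
}

void
sketch_lapic_initclocks(void)
{
	sketch_clocks_started++;
}

/* Per-CPU clock hook; starts out as a no-op. */
void (*sketch_cpu_initclock_func)(void) = sketch_noop_initclock;

/* Calibration installs the real per-CPU hook. */
void
sketch_calibrate_timer(void)
{
	sketch_cpu_initclock_func = sketch_lapic_initclocks;
}

/* Each CPU calls whatever hook is installed when it comes up. */
void
sketch_cpu_hatch(void)
{
	(*sketch_cpu_initclock_func)();
}
```

With this shape, skipping calibration (as for VM_GUEST_XENPVHVM) leaves the hook as a no-op without any special case in the hatch path.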

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
 1.53.26.1 02-Aug-2025  perseant Sync with HEAD
 1.4 11-Jan-2014  christos Add softint case (Richard Hansen)
 1.3 30-Apr-2011  christos branches: 1.3.2; 1.3.6; 1.3.8; 1.3.14; 1.3.18; 1.3.22;
add a define for pcb_sp
 1.2 10-Apr-2011  christos branches: 1.2.2;
something ate my /
 1.1 10-Apr-2011  christos Merge db_trace for x86. From: Vladimir Kirillov proger at wilab dot org dot ua
 1.2.2.3 31-May-2011  rmind sync with head
 1.2.2.2 21-Apr-2011  rmind sync with head
 1.2.2.1 10-Apr-2011  rmind file db_machdep.h was added on branch rmind-uvmplock on 2011-04-21 01:41:32 +0000
 1.3.22.1 18-May-2014  rmind sync with head
 1.3.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.14.1 07-Feb-2014  sborrill Pull up the following revision(s) (requested by christos in ticket #1017):
sys/arch/x86/include/db_machdep.h: revision 1.4
sys/arch/i386/i386/db_machdep.c: revision 1.5

Fix ddb backtrace for softintr (i386).
 1.3.8.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.6.1 30-Apr-2011  jruoho file db_machdep.h was added on branch jruoho-x86intr on 2011-06-06 09:07:06 +0000
 1.3.2.2 02-May-2011  jym Sync with head.
 1.3.2.1 30-Apr-2011  jym file db_machdep.h was added on branch jym-xensuspend on 2011-05-02 22:49:57 +0000
 1.8 13-Jan-2019  maxv Error out if the higher 32 bits of DR6 and DR7 are set. MOV DR would
fault otherwise.
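
A minimal sketch of the check described in revision 1.8 above, assuming 64-bit register values supplied by a tracer. The function name is hypothetical and the choice of EINVAL as the error is an assumption, not taken from the commit:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical validation: reject DR6/DR7 values with any of the upper
 * 32 bits set, since loading them into the registers would fault.
 */
int
sketch_validate_dbregs(uint64_t dr6, uint64_t dr7)
{
	const uint64_t upper = UINT64_C(0xffffffff00000000);

	if ((dr6 & upper) != 0 || (dr7 & upper) != 0)
		return EINVAL;
	return 0;
}
```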
 1.7 27-Sep-2018  maxv Export x86_dbregs_{save/restore}, will be used outside. Reproduce some
internal dbregs logic in them.
 1.6 26-Jul-2018  maxv Rework dbregs, to switch the registers during context switches, and not on
each user->kernel transition via userret. Reloads of DR6/DR7 are expensive
on both native and xen.
 1.5 22-Jul-2018  maxv Clean up dbregs; remove useless comments, remove arguments from prototypes,
style, add KASSERT and move x86_dbregspl into dbregs.c. No real functional
change.
 1.4 23-Feb-2017  kamil branches: 1.4.12; 1.4.14; 1.4.16;
Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface and its usage are modeled after the FreeBSD API.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current and is removed entirely, without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible for detaching this trap with an appropriate
PT_SETDBREGS call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel
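
Since the design above leaves it to the debugger to interpret DR6 and to program DR7, a tracer ends up with helpers along the following lines. This is a sketch based on the architectural bit layout (local enable bits at 0, 2, 4, 6; a 4-bit R/W and LEN control field per breakpoint starting at bit 16; B0-B3 status bits in DR6). The names are hypothetical and are not the FreeBSD or NetBSD defines:

```c
#include <assert.h>
#include <stdint.h>

/* R/W field encodings in DR7 (hypothetical names). */
enum sketch_dr7_rw { SK_RW_EXEC = 0, SK_RW_WRITE = 1, SK_RW_RDWR = 3 };
/* LEN field encodings in DR7 (hypothetical names). */
enum sketch_dr7_len { SK_LEN_1 = 0, SK_LEN_2 = 1, SK_LEN_8 = 2, SK_LEN_4 = 3 };

/* Return dr7 with breakpoint n locally enabled for the given condition. */
uint64_t
sketch_dr7_enable_local(uint64_t dr7, unsigned n, enum sketch_dr7_rw rw,
    enum sketch_dr7_len len)
{
	unsigned ctl;

	assert(n < 4);
	ctl = 16 + 4 * n;			/* control nibble for DRn */
	dr7 &= ~(UINT64_C(0xf) << ctl);		/* clear old R/W and LEN */
	dr7 |= (uint64_t)((unsigned)rw | ((unsigned)len << 2)) << ctl;
	dr7 |= UINT64_C(1) << (2 * n);		/* local enable bit Ln */
	return dr7;
}

/* Return the lowest breakpoint index flagged in DR6, or -1 if none. */
int
sketch_dr6_fired(uint64_t dr6)
{
	int n;

	for (n = 0; n < 4; n++)
		if (dr6 & (UINT64_C(1) << n))
			return n;
	return -1;
}
```

A tracer would write the resulting DR7 value back with PT_SETDBREGS and inspect DR6 after a SIGTRAP with si_code TRAP_DBREG.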

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>
 1.3 18-Jan-2017  kamil Embed hardware trap and its type that fired (x86), information for tracers

Now x86 throws SIGTRAP on hardware exception with:
- si_code TRAP_HWWPT - dedicated for hw assisted watchpoint interface
- si_trap - unchanged (T_TRCTRAP)
- si_trap2 - watchpoint number that fired
- si_trap3 - watchpoint specific event description

x86 returns in si_trap3 one of the fields from <x86/dbregs.h>:
- X86_HW_WATCHPOINT_EVENT_FIRED - watchpoint fired
- X86_HW_WATCHPOINT_EVENT_FIRED_AND_SSTEP - watchpoint fired under PT_STEP

Other changes:
- restrict more code from <x86/dbregs.h> to _KERNEL

Sponsored by <The NetBSD Foundation>
 1.2 15-Dec-2016  kamil branches: 1.2.2; 1.2.4;
Add support for hardware assisted watchpoints/breakpoints API in ptrace(2)

Add new ptrace(2) calls:
- PT_COUNT_WATCHPOINTS - count the number of available hardware watchpoints
- PT_READ_WATCHPOINT - read struct ptrace_watchpoint from the kernel state
- PT_WRITE_WATCHPOINT - write new struct ptrace_watchpoint state, this
includes enabling and disabling watchpoints

The ptrace_watchpoint structure contains MI and MD parts:

typedef struct ptrace_watchpoint {
int pw_index; /* HW Watchpoint ID (count from 0) */
lwpid_t pw_lwpid; /* LWP described */
struct mdpw pw_md; /* MD fields */
} ptrace_watchpoint_t;

For example amd64 defines MD as follows:
struct mdpw {
void *md_address;
int md_condition;
int md_length;
};

These calls are protected with the __HAVE_PTRACE_WATCHPOINTS guard.

Tested on amd64, initial support added for i386 and XEN.

Sponsored by <The NetBSD Foundation>
 1.1 27-Nov-2016  kamil branches: 1.1.2;
Add accessors for available x86 Debug Registers

There are 8 Debug Registers on i386 (available at least since 80386) and 16
on AMD64. Currently DR4 and DR5 are reserved on both cpu-families and
DR9-DR15 are still reserved on AMD64. Therefore add accessors for DR0-DR3,
DR6-DR7 for all ports.

Debug Registers x86:
* DR0-DR3 Debug Address Registers
* DR4-DR5 Reserved
* DR6 Debug Status Register
* DR7 Debug Control Register
* DR8-DR15 Reserved

Access to the registers is available only from the kernel (ring 0), as it
requires the most privileged access level. For this reason special
XEN functions are needed to get and set the registers in the XEN3 kernels.

XEN specific functions as defined in NetBSD:
- HYPERVISOR_get_debugreg()
- HYPERVISOR_set_debugreg()

This code extends the existing rdr6() and ldr6() accessor for additional:
- rdr0() & ldr0()
- rdr1() & ldr1()
- rdr2() & ldr2()
- rdr3() & ldr3()
- rdr7() & ldr7()

Traditionally the accessors for DR6 took a vaddr_t argument. That is the
appropriate type for DR0-DR3, while DR6-DR7 should be using u_long; however,
it's not a big deal. The resulting functionality should be equivalent, so
stick to this convention and use the vaddr_t type for all DR accessors.

There was already a function defined for rdr6() in XEN, but it had a nit on
AMD64 as it was casting HYPERVISOR_get_debugreg() to u_int (32-bit on
AMD64), truncating result. It still works for DR6, but for the sake of
simplicity always return full 64-bit value.
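
The truncation nit described above is the usual effect of narrowing a 64-bit value through a 32-bit type. A minimal sketch, with hypothetical function names standing in for the old and new accessor behaviour and uint32_t standing in for u_int on amd64:

```c
#include <stdint.h>

/* Old behaviour: the hypervisor's value is narrowed to 32 bits. */
uint64_t
sketch_rdr_truncated(uint64_t hv_value)
{
	return (uint32_t)hv_value;
}

/* New behaviour: the full 64-bit value is passed through. */
uint64_t
sketch_rdr_full(uint64_t hv_value)
{
	return hv_value;
}
```

For DR6 the difference is harmless because only the low bits carry status, but an address held in DR0-DR3 would lose its upper half.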

New accessors duplicate functionality of the dr0() function available on
i386 within the KSTACK_CHECK_DR0 option. dr0() is a specialized layer with
logic to set appropriate types of interrupts, now accessors are designed to
pass verbatim values from user-land (with simple sanity checks in the
kernel). At the moment there are no plans to make it possible for
KSTACK_CHECK_DR0 to coexist with debug registers for user applications
(debuggers).

options KSTACK_CHECK_DR0
Detect kernel stack overflow using DR0 register. This option uses DR0
register exclusively so you can't use DR0 register for other purpose
(e.g., hardware breakpoint) if you turn this on.

The KSTACK_CHECK_DR0 functionality was designed for i386 and never ported
to amd64.

Code tested on i386 and amd64 with kernels: GENERIC, XEN3_DOMU, XEN3_DOM0.

Sponsored by <The NetBSD Foundation>
 1.1.2.4 28-Aug-2017  skrll Sync with HEAD
 1.1.2.3 05-Feb-2017  skrll Sync with HEAD
 1.1.2.2 05-Dec-2016  skrll Sync with HEAD
 1.1.2.1 27-Nov-2016  skrll file dbregs.h was added on branch nick-nhusb on 2016-12-05 10:54:59 +0000
 1.2.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.2.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.2.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.2.2.1 15-Dec-2016  pgoyette file dbregs.h was added on branch pgoyette-localcount on 2017-01-07 08:56:28 +0000
 1.4.16.1 10-Jun-2019  christos Sync with HEAD
 1.4.14.3 18-Jan-2019  pgoyette Synch with HEAD
 1.4.14.2 30-Sep-2018  pgoyette Sync with HEAD
 1.4.14.1 28-Jul-2018  pgoyette Sync with HEAD
 1.4.12.2 03-Dec-2017  jdolecek update from HEAD
 1.4.12.1 23-Feb-2017  jdolecek file dbregs.h was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.15 20-Aug-2022  riastradh machine/efi.h: Migrate common definitions to dev/efi/efi.h.
 1.14 20-Aug-2022  riastradh x86/efi.h: Assert size of struct efi_systbl.
 1.13 20-Aug-2022  riastradh arm/efi.h, x86/efi.h: Fix whitespace around RCS id.

No functional change intended.
 1.12 20-Aug-2022  riastradh x86/efi.h: Fix whitespace. No functional change intended.
 1.11 20-Aug-2022  riastradh machine/efi.h: Add more memory descriptor attributes.
 1.10 01-Apr-2022  skrll Trailing whitespace
 1.9 18-Oct-2019  manu Add UEFI boot services and I/O method prototypes
 1.8 22-Oct-2017  maya branches: 1.8.2; 1.8.6;
Move initialization code out of efi_probe into efi_init

and call it from cpu_configure
 1.7 11-Mar-2017  nonaka search SMBIOS from UEFI configuration table when booted with UEFI.
 1.6 23-Feb-2017  nonaka Avoid panic when amd64 kernel is booted from 32bit UEFI.
 1.5 14-Feb-2017  nonaka Handle persistent memory. Currently only debug output.
 1.4 14-Feb-2017  nonaka x86: make btinfo_memmap from btinfo_efimemmap to reduce mem_cluster_cnt.

should fix PR/51953.
 1.3 09-Feb-2017  nonaka efi_md::md_virt always uses uint64_t.
 1.2 24-Jan-2017  nonaka Initial commit of native amd64 EFI boot loader.
 1.1 28-Jan-2016  christos branches: 1.1.2; 1.1.4; 1.1.6;
Add support for grub to find the ACPI root table pointer via a bootinfo entry
from grub.
From: https://mail-index.netbsd.org/tech-kern/2014/05/22/msg017119.html
 1.1.6.1 21-Apr-2017  bouyer Sync with HEAD
 1.1.4.1 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.4 28-Aug-2017  skrll Sync with HEAD
 1.1.2.3 05-Feb-2017  skrll Sync with HEAD
 1.1.2.2 19-Mar-2016  skrll Sync with HEAD
 1.1.2.1 28-Jan-2016  skrll file efi.h was added on branch nick-nhusb on 2016-03-19 11:30:07 +0000
 1.8.6.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.8.2.2 03-Dec-2017  jdolecek update from HEAD
 1.8.2.1 22-Oct-2017  jdolecek file efi.h was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.1 23-Feb-2011  jruoho branches: 1.1.2; 1.1.4; 1.1.6; 1.1.10;
Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.
 1.1.10.2 06-Jun-2011  jruoho Sync with HEAD.
 1.1.10.1 23-Feb-2011  jruoho file est.h was added on branch jruoho-x86intr on 2011-06-06 09:07:06 +0000
 1.1.6.2 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.1.6.1 23-Feb-2011  jym file est.h was added on branch jym-xensuspend on 2011-03-28 23:04:49 +0000
 1.1.4.2 05-Mar-2011  rmind sync with head
 1.1.4.1 23-Feb-2011  rmind file est.h was added on branch rmind-uvmplock on 2011-03-05 20:52:28 +0000
 1.1.2.2 05-Mar-2011  bouyer Sync with HEAD
 1.1.2.1 23-Feb-2011  bouyer file est.h was added on branch bouyer-quota2 on 2011-03-05 15:10:10 +0000
 1.9 04-Oct-2025  riastradh x86, m68k <machine/float.h>: `Significand', not `mantissa'.

The IEEE 754 standard uses `significand' and has since its first
edition in 1985; Kahan and Knuth both use `significand' and
explicitly reject `mantissa'; `significand' doesn't have a
conflicting definition in logarithms; and in actual usage in the
floating-point and numerical analysis literature, `significand'
dominates.

No functional change intended -- comment-only.
 1.8 15-Jun-2024  rillig {m68k,x86}/float.h: fix cross references
 1.7 31-Dec-2023  dholland {x86,m68k}/float.h: document LDBL_MIN behavior

It seems that even though both these platforms have 12-byte floats
that are pretty much the same representation and both allegedly
IEEE-compliant, they manifest the top bit of the mantissa and then
differ slightly in the behavior of the extra encodings this permits.

Thanks to riastradh@ for helping sort this out.
 1.6 27-Apr-2013  joerg Systematically include sys/featuretest.h when _NETBSD_SOURCE is used.
Some are redundant, but make verification with grep much easier.
 1.5 23-Oct-2003  kleink branches: 1.5.140; 1.5.150;
* Move the definitions for types other than single-precision and double-
precision back to machine-dependent headers. C99 has no strict
requirement which, if any, extended-precision type `long double' must
match, and even between 80-bit formats there are differences in
implementation (m68k vs. x86).
* On arm, consider __VFP_FP__.
 1.4 12-May-2003  kleink branches: 1.4.2;
Rename <sys/float_ieee.h> to <sys/float_ieee754.h>, following libc's
convention for these.
 1.3 21-Apr-2003  christos Override LDBL_MIN
 1.2 19-Apr-2003  christos PR/3012: Greg A. Woods: Write all float.h files [except the vax of course]
in terms of float_ieee.h
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.4.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.4.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.150.1 23-Jun-2013  tls resync from head
 1.5.140.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.23 24-Oct-2020  mgorny Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs

When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64
use the 64-suffixed variant in order to include the complete FIP/FDP
registers in the x87 area.

The difference between the two variants is that the FXSAVE64 (new)
variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64),
while the legacy FXSAVE variant uses split fields: 32-bit offset,
16-bit segment and 16-bit reserved field (union fp_addr.fa_32).
The latter implies that the actual addresses are truncated to 32 bits
which is insufficient in modern programs.
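
The two layouts described above can be pictured with a union of the same 8 bytes. This is a sketch following the field description in the commit message; the type and member names are hypothetical and are not copied from the kernel headers:

```c
#include <stdint.h>

/* The same 8 bytes of the save area, viewed in two ways. */
union sketch_fp_addr {
	uint64_t fa_64;			/* FXSAVE64: full 64-bit address */
	struct {
		uint32_t fa_off;	/* FXSAVE: 32-bit offset */
		uint16_t fa_seg;	/* FXSAVE: 16-bit segment selector */
		uint16_t fa_rsvd;	/* FXSAVE: 16-bit reserved field */
	} fa_32;
};

_Static_assert(sizeof(union sketch_fp_addr) == 8,
    "both views must cover the same 8 bytes");

/* What the legacy form can preserve of a 64-bit instruction pointer. */
uint32_t
sketch_legacy_fip(uint64_t fip)
{
	return (uint32_t)fip;	/* upper 32 bits are dropped */
}
```

A typical 64-bit user address such as 0x00007f7fdeadbeef survives only as 0xdeadbeef in the legacy form, which is why the old data was not meaningful.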

The change is applied only to 64-bit programs on amd64. Plain i386
and compat32 continue using plain FXSAVE. Similarly, NVMM is not
changed as I am not familiar with that code.

This is a potentially breaking change. However, I don't think it likely
to actually break anything because the data provided by the old variant
were not meaningful (because of the truncated pointer).
 1.22 15-Oct-2020  mgorny Revert "Merge convert_xmm_s87.c into fpu.c"

I am going to add ATF tests for these two functions, and having them
in a separate file will make it more convenient to build and run them
in userspace.
 1.21 14-Jun-2020  riastradh Use static constant rather than stack memset buffer for zero fpregs.
 1.20 27-Nov-2019  maxv Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
 1.19 12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.18 04-Oct-2019  maxv Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to
simplify.
 1.17 26-Jun-2019  mgorny Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).
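
The forward-compatibility argument above rests on copying no more than the caller's buffer can hold. A minimal sketch of that idea, assuming a hypothetical helper operating on plain memory rather than the kernel's real copyout path:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: copy at most iov_len bytes of the state into the
 * caller's buffer and report how many bytes were written. A caller built
 * against an older, shorter structure gets a valid partial read.
 */
size_t
sketch_copy_state(const void *state, size_t state_size,
    const struct iovec *iov)
{
	size_t n;

	assert(iov != NULL);
	n = iov->iov_len < state_size ? iov->iov_len : state_size;
	memcpy(iov->iov_base, state, n);
	return n;
}
```

A caller would set iov_base to its own structure and iov_len to sizeof that structure, so growing the kernel-side structure later does not overrun older callers.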
 1.16 19-May-2019  maxv Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.
 1.15 19-May-2019  maxv Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.
 1.14 20-Jan-2019  maxv Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.
 1.13 05-Oct-2018  maxv export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
 1.12 22-Jun-2018  maxv branches: 1.12.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
 1.11 20-Jun-2018  jdolecek as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv
 1.10 19-Jun-2018  jdolecek fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8
 1.9 14-Jun-2018  maxv Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.
 1.8 23-May-2018  maxv Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.
 1.7 03-Nov-2017  maxv branches: 1.7.2;
Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).
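
The effect described in revision 1.7 can be sketched as follows. The architecture documents that a zero MXCSR_MASK field in the FXSAVE image means the default mask 0x0000FFBF applies, which excludes the DAZ bit (bit 6). The names below are hypothetical, not the kernel's:

```c
#include <stdint.h>

#define SKETCH_MXCSR_DEFAULT_MASK	0x0000ffbfu	/* used if CPU reports 0 */
#define SKETCH_MXCSR_DAZ		0x00000040u	/* denormals-are-zeros */

/* Pick the mask reported by the CPU, falling back to the default. */
uint32_t
sketch_mxcsr_mask(uint32_t reported_mask)
{
	return reported_mask != 0 ? reported_mask : SKETCH_MXCSR_DEFAULT_MASK;
}

/* Sanitize a user-supplied MXCSR value with the detected mask. */
uint32_t
sketch_mxcsr_sanitize(uint32_t mxcsr, uint32_t reported_mask)
{
	return mxcsr & sketch_mxcsr_mask(reported_mask);
}
```

Applying a hardcoded default mask on a CPU that reports 0xFFFF would silently clear DAZ from a value such as 0x1FC0, which is the loss of features the commit refers to.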
 1.6 25-Feb-2014  dsl branches: 1.6.4; 1.6.6; 1.6.10; 1.6.28;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
 1.5 23-Feb-2014  dsl Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.
 1.4 15-Feb-2014  dsl Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't part of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.
 1.3 15-Feb-2014  dsl Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).
 1.2 12-Feb-2014  dsl Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.1 11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.6.28.1 23-Jun-2018  martin Pull up the following, via patch, requested by maxv in ticket #897:

sys/arch/amd64/amd64/locore.S 1.166 (patch)
sys/arch/i386/i386/locore.S 1.157 (patch)
sys/arch/x86/include/cpu.h 1.92 (patch)
sys/arch/x86/include/fpu.h 1.9 (patch)
sys/arch/x86/x86/fpu.c 1.33-1.39 (patch)
sys/arch/x86/x86/identcpu.c 1.72 (patch)
sys/arch/x86/x86/vm_machdep.c 1.34 (patch)
sys/arch/x86/x86/x86_machdep.c 1.116,1.117 (patch)

Support eager fpu switch, to work around INTEL-SA-00145.
Provide a sysctl machdep.fpu_eager, which gets automatically
initialized to 1 on affected CPUs.
 1.6.10.3 03-Dec-2017  jdolecek update from HEAD
 1.6.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.10.1 25-Feb-2014  tls file fpu.h was added on branch tls-maxphys on 2014-08-20 00:03:29 +0000
 1.6.6.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.6.1 25-Feb-2014  yamt file fpu.h was added on branch yamt-pagecache on 2014-05-22 11:40:13 +0000
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 25-Feb-2014  rmind file fpu.h was added on branch rmind-smpnet on 2014-05-18 17:45:30 +0000
 1.7.2.3 26-Jan-2019  pgoyette Sync with HEAD
 1.7.2.2 20-Oct-2018  pgoyette Sync with head
 1.7.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.12.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.12.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.12.2.1 10-Jun-2019  christos Sync with HEAD
 1.1 30-Apr-2021  christos branches: 1.1.4;
merge the i386 and amd64 gdt.h files.
 1.1.4.2 13-May-2021  thorpej Sync with HEAD.
 1.1.4.1 30-Apr-2021  thorpej file gdt.h was added on branch thorpej-i2c-spi-conf on 2021-05-13 00:47:29 +0000
 1.7 17-Oct-2023  bouyer Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.
Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen.
When running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.
x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console
xen/x86/consinit.c: support genfb as possible console
xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.
xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.
 1.6 16-Oct-2023  bouyer Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.5 28-Jan-2021  jmcneill branches: 1.5.18;
Remove x86_genfb_mtrr_init. PATs have been available since the Pentium III
and this code has been #if notyet'd since shortly after being introduced.
 1.4 30-Nov-2019  nonaka branches: 1.4.8;
Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.3 09-Feb-2011  jmcneill branches: 1.3.48; 1.3.56; 1.3.60;
if genfb is attached, hook into db_trap_callback to switch in and out of
polling mode as necessary
 1.2 08-Feb-2011  ahoka Add missing prototype for x86_genfb_mtrr_init to fix build.

Hi Jared!
 1.1 17-Feb-2009  jmcneill branches: 1.1.2; 1.1.4; 1.1.6; 1.1.10; 1.1.12; 1.1.14;
PR# port-i386/37026: userconf(4) doesn't work with vesafb(4)

Add early console support for x86 genfb.
 1.1.14.2 17-Feb-2011  bouyer Sync with HEAD
 1.1.14.1 09-Feb-2011  bouyer Sync with HEAD
 1.1.12.1 06-Jun-2011  jruoho Sync with HEAD.
 1.1.10.1 05-Mar-2011  rmind sync with head
 1.1.6.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.1.6.3 01-Nov-2009  jym Sync with HEAD.
 1.1.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.6.1 17-Feb-2009  jym file genfb_machdep.h was added on branch jym-xensuspend on 2009-05-13 17:18:44 +0000
 1.1.4.2 04-May-2009  yamt sync with head.
 1.1.4.1 17-Feb-2009  yamt file genfb_machdep.h was added on branch yamt-nfs-mp on 2009-05-04 08:12:09 +0000
 1.1.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.1.2.1 17-Feb-2009  skrll file genfb_machdep.h was added on branch nick-hppapmap on 2009-03-03 18:29:36 +0000
 1.3.60.1 08-Dec-2019  martin Pull up following revision(s) (requested by nonaka in ticket #502):
sys/arch/x86/x86/hyperv.c: revision 1.5
sys/arch/x86/include/genfb_machdep.h: revision 1.4
sys/arch/x86/x86/genfb_machdep.c: revision 1.15
Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.3.56.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.48.1 05-Dec-2019  bouyer Pull up following revision(s) (requested by nonaka in ticket #1466):
sys/arch/x86/x86/hyperv.c: revision 1.5
sys/arch/x86/include/genfb_machdep.h: revision 1.4
sys/arch/x86/x86/genfb_machdep.c: revision 1.15
Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.4.8.1 03-Apr-2021  thorpej Sync with HEAD.
 1.5.18.2 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #428):

sys/arch/xen/xen/xen_machdep.c: revision 1.28
sys/arch/x86/pci/pci_machdep.c: revision 1.97
sys/arch/xen/xen/genfb_xen.c: revision 1.1
sys/arch/xen/xen/genfb_xen.c: revision 1.2
sys/arch/xen/include/hypervisor.h: revision 1.59
sys/arch/i386/conf/XEN3PAE_DOM0: revision 1.41 (patch)
sys/arch/x86/x86/genfb_machdep.c: revision 1.22
sys/arch/xen/x86/consinit.c: revision 1.18
sys/arch/xen/x86/autoconf.c: revision 1.26
sys/external/mit/xen-include-public/dist/xen/include/public/platform.h: revision 1.2
sys/arch/xen/conf/files.xen: revision 1.188
sys/arch/x86/x86/consinit.c: revision 1.37
sys/arch/xen/conf/files.xen: revision 1.189
sys/arch/x86/x86/consinit.c: revision 1.38
sys/external/mit/xen-include-public/dist/xen/include/public/xen.h: revision 1.2
sys/arch/x86/include/genfb_machdep.h: revision 1.7
sys/arch/xen/x86/pvh_consinit.c: revision 1.5
sys/arch/xen/x86/pvh_consinit.c: revision 1.6
sys/arch/amd64/conf/XEN3_DOM0: revision 1.201

Move the pvh_xencons to xen_machdep.c as early_xencons, so it can be
used in the future as early output for plain PV guests too.

Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.

Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen.

When running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.

x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console

xen/x86/consinit.c: support genfb as possible console

xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.

xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.

Make sure to always fall back to xen_early_console, even for dom0

Enable genfb in DOM0 kernels

Add ext_lfb_base to dom0_vga_console_info, from recent Xen. We know if it's
present or not by checking dom0.info_size

Add XENPF_get_dom0_console, which gets a dom0_vga_console_info structure
from the hypervisor. To be used by PVH dom0 kernels.

XENPVH option is not used. Fix consinit.c to use XENPVHVM as intended
and XENPVH from defflag
for a dom0 PVH, the dom0_vga_console_info structure has to be retrieved
using a platform hypercall; do so in the XENPVHVM case.

Now genfb works in a PVH dom0 running on Xen 4.18 (Xen 4.15 doesn't support
this platform op, so no way to make it work here).
 1.5.18.1 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #425):

sys/arch/x86/pci/pci_machdep.c: revision 1.96
sys/arch/x86/acpi/acpi_machdep.c: revision 1.36
sys/arch/x86/x86/hyperv.c: revision 1.16
sys/arch/x86/x86/genfb_machdep.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.56
sys/arch/x86/include/genfb_machdep.h: revision 1.6

Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.7 06-Oct-2022  msaitoh IOAPIC_ID_MASK is 8 bits these days. Fixes PR kern/54276.
 1.6 19-Jun-2019  msaitoh branches: 1.6.2;
Fix ioapic_dump_raw() to dump whole ioapic area.
 1.5 22-Apr-2017  nonaka branches: 1.5.4; 1.5.12;
Added I/O APIC EOI register definition.
 1.4 26-Jan-2013  dyoung branches: 1.4.14; 1.4.18;
Several registers and bitfields named IOAPIC_* actually belong to the
LAPIC, so rename them LAPIC_* and move to a more appropriate header
file.
 1.3 17-Aug-2011  dyoung branches: 1.3.2; 1.3.12;
Add definitions from [1] for the I/O APIC's MSI Message Address & Data
registers.

[1] Intel Corporation, Intel 64 and IA-32 Architectures Software
Developer's Manual, Volume 3A: System Programming Guide, Part 1,
http://www.intel.com/Assets/PDF/manual/253668.pdf, Chapter 10,
January, 2011.
 1.2 28-Apr-2008  martin branches: 1.2.14;
Remove clause 3 and 4 from TNF licenses
 1.1 26-Feb-2003  fvdl branches: 1.1.104; 1.1.106; 1.1.108;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.108.1 16-May-2008  yamt sync with head.
 1.1.106.1 18-May-2008  yamt sync with head.
 1.1.104.1 02-Jun-2008  mjf Sync with HEAD.
 1.2.14.1 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.3.12.2 03-Dec-2017  jdolecek update from HEAD
 1.3.12.1 25-Feb-2013  tls resync with head
 1.3.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.18.1 26-Apr-2017  pgoyette Sync with HEAD
 1.4.14.1 28-Aug-2017  skrll Sync with HEAD
 1.5.12.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.5.4.1 10-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1769):

sys/arch/x86/x86/ioapic.c: revision 1.66
sys/arch/x86/include/i82093reg.h: revision 1.7

Print detail about misconfigured APIC ID.

IOAPIC_ID_MASK is 8 bits these days. Fixes PR kern/54276.
 1.6.2.1 10-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1536):

sys/arch/x86/x86/ioapic.c: revision 1.66
sys/arch/x86/include/i82093reg.h: revision 1.7

Print detail about misconfigured APIC ID.

IOAPIC_ID_MASK is 8 bits these days. Fixes PR kern/54276.
 1.16 23-May-2017  nonaka whitespace
 1.15 23-May-2017  nonaka x86: No ioapic_softc.sc_apicid is used anymore. Use ioapic_softc.sc_pic.pic_apicid.
 1.14 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.13 27-Apr-2015  knakahara add intr_handle_t and let pci_intr_handle_t use it.
 1.12 15-Jun-2012  yamt branches: 1.12.2; 1.12.16;
comment
 1.11 25-Mar-2009  dyoung branches: 1.11.12;
It is only by accident that these get the definitions they need from
<sys/device.h>, so explicitly #include <sys/device.h>.
 1.10 03-Jul-2008  drochner branches: 1.10.4; 1.10.10;
split device/softc for ioapic
 1.9 03-Jul-2008  drochner Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.8 07-May-2008  joerg branches: 1.8.2; 1.8.4;
Remove some prototypes that are not implemented. Make some functions
static that are only used in intr.c.
 1.7 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.6 18-Apr-2008  cegger branches: 1.6.2; 1.6.4;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.
 1.5 16-Apr-2008  cegger - use aprint_*_dev and device_xname
- use POSIX integer types
 1.4 09-Dec-2007  jmcneill branches: 1.4.10;
Merge jmcneill-pm branch.
 1.3 29-May-2005  christos branches: 1.3.2; 1.3.56; 1.3.58; 1.3.68; 1.3.70;
Sprinkle const.
 1.2 27-Oct-2003  junyoung Nuke __P().
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.70.1 11-Dec-2007  yamt sync with head.
 1.3.68.1 26-Dec-2007  ad Sync with head.
 1.3.58.1 09-Jan-2008  matt sync with HEAD
 1.3.56.1 30-Sep-2007  joerg Add a second function ioapic_reenable that restores all vectors.
 1.3.2.1 21-Jan-2008  yamt sync with head
 1.4.10.2 28-Sep-2008  mjf Sync with HEAD.
 1.4.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.6.4.2 04-May-2009  yamt sync with head.
 1.6.4.1 16-May-2008  yamt sync with head.
 1.6.2.1 18-May-2008  yamt sync with head.
 1.8.4.1 03-Jul-2008  simonb Sync with head.
 1.8.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.10.10.2 01-Nov-2009  jym Sync with HEAD.
 1.10.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.10.4.1 28-Apr-2009  skrll Sync with HEAD.
 1.11.12.1 30-Oct-2012  yamt sync with head
 1.12.16.2 28-Aug-2017  skrll Sync with HEAD
 1.12.16.1 06-Jun-2015  skrll Sync with HEAD
 1.12.2.1 03-Dec-2017  jdolecek update from HEAD
 1.19 14-Jun-2019  msaitoh No functional change:
- Rename macros:
- ICR, LVT and MSIDATA can share the bit definitions. Remove redundant
definitions and use the common macros.
- Consistently use LAPIC_LVT_ for all local vector table's macro names.
- Use __BITS().
- Add definition for TSC-deadline (LAPIC_LVT_TMM_TSCDLT).
 1.18 13-Jun-2019  msaitoh Indent consistently. No functional change.
 1.17 13-Jun-2019  msaitoh Modify LAPIC_LVT_CMCI's comment to be consistent with other LVT's.
No functional change.
 1.16 28-Apr-2017  nonaka branches: 1.16.10;
Added AMD extended APIC register space present definition.
 1.15 22-Apr-2017  nonaka branches: 1.15.2;
move LAPIC_MSR* to specialreg.h.
 1.14 22-Apr-2017  nonaka Add x2APIC register definitions.
 1.13 17-Jul-2015  msaitoh branches: 1.13.2;
Indent. No functional change.
 1.12 26-Jan-2013  dyoung branches: 1.12.14;
Several registers and bitfields named IOAPIC_* actually belong to the
LAPIC, so rename them LAPIC_* and move to a more appropriate header
file.
 1.11 20-Jan-2012  hannken branches: 1.11.6;
Revert revision 1.4 and change LAPIC_LEVEL_ASSERT / _MASK back to 0x4000.

According to "Intel 64 and IA-32 Architectures Software Developer's Manual"
Vol. 3, May 2011, Order Number: 325384-039US, Section 10.6.1:

LEVEL_ASSERT is bit #14, bit #13 is reserved.

With this change NetBSD now boots multiple processors under CentOS 6.2/kvm.
 1.10 15-Nov-2010  cegger branches: 1.10.8; 1.10.12;
add interrupt EAPIC register definitions
 1.9 09-Jan-2010  cegger branches: 1.9.4;
add LAPIC_MSR_ENABLE_x2 MSR. from murray@river-styx via port-amd64@
'...as documented in the Intel 64 and IA32 Architectures Software
Developers Manual 3A, chapter 10.5.1.'
 1.8 12-May-2008  ad branches: 1.8.8; 1.8.12;
Some defs to describe the IA32_APIC_BASE MSR.
 1.7 09-May-2008  cegger Buildfix: Remove duplicate #defines.
 1.6 09-May-2008  ad LAPIC_ID_MASK is 8 bits these days.
 1.5 28-Apr-2008  martin branches: 1.5.2;
Remove clause 3 and 4 from TNF licenses
 1.4 22-Jan-2008  joerg branches: 1.4.6; 1.4.8; 1.4.10;
Fix LAPIC_LEVEL_MASK and related defines.
 1.3 14-Nov-2007  joerg branches: 1.3.6;
Merge from jmcneill-pm:
Add some more defines from the spec. Remove some old ones not
existing in the current Intel Architecture Guide. Use some more
understandable names.

ANSIfy and use uintXX_t to hurt my eyes less.

Further improve readability by exploiting __HAVE_TIMECOUNTER as
invariance on x86 platforms.
 1.2 14-Nov-2007  ad +LAPIC_DLMODE_EXTINT
 1.1 26-Feb-2003  fvdl branches: 1.1.18; 1.1.60; 1.1.78; 1.1.80; 1.1.84; 1.1.86;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.86.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.86.1 19-Nov-2007  mjf Sync with HEAD.
 1.1.84.1 18-Nov-2007  bouyer Sync with HEAD
 1.1.80.2 23-Mar-2008  matt sync with HEAD
 1.1.80.1 09-Jan-2008  matt sync with HEAD
 1.1.78.2 14-Nov-2007  joerg Sync with HEAD.
 1.1.78.1 06-Sep-2007  joerg Add some more defines from the spec. Remove some old ones not
existing in the current Intel Architecture Guide. Use some more
understandable names.
 1.1.60.1 03-Dec-2007  ad Sync with HEAD.
 1.1.18.2 04-Feb-2008  yamt sync with head.
 1.1.18.1 15-Nov-2007  yamt sync with head.
 1.3.6.1 23-Jan-2008  bouyer Sync with HEAD.
 1.4.10.2 11-Mar-2010  yamt sync with head
 1.4.10.1 16-May-2008  yamt sync with head.
 1.4.8.1 18-May-2008  yamt sync with head.
 1.4.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.8.12.2 10-Jan-2011  jym Sync with HEAD
 1.8.12.1 24-Oct-2010  jym Sync with HEAD
 1.8.8.1 25-Jan-2012  riz Pull up following revision(s) (requested by hannken in ticket #1715):
- Be robust against an invalid timer period value.
sys/dev/ic/hpetreg.h Rev. 1.4
sys/dev/ic/hpet.c Rev. 1.8

- Fix wrong definition of LAPIC_LEVEL_ASSERT / _MASK
sys/arch/x86/include/i82489reg.h Rev. 1.11

- Add virtio driver - speed up disk and network access in virtual environments
sys/arch/i386/conf/GENERIC Rev. 1.1055
sys/arch/i386/conf/ALL Rev. 1.325
sys/arch/amd64/conf/GENERIC Rev. 1.338
sys/dev/pci/files.pci Rev. 1.350
sys/dev/pci/if_vioif.c Rev. 0-1.2
sys/dev/pci/ld_virtio.c Rev. 0-1.4
sys/dev/pci/viomb.c Rev. 0-1.1
sys/dev/pci/virtio.c Rev. 0-1.3
sys/dev/pci/virtioreg.h Rev. 0-1.1
sys/dev/pci/virtiovar.h Rev. 0-1.1
distrib/sets/lists/man/mi Rev. 1.1352 and 1.1358
share/man/man4/Makefile Rev. 1.573 and 1.575
share/man/man4/ld.4 Rev. 1.19
share/man/man4/virtio.4 Rev. 0-1.4
share/man/man4/vioif.4 Rev. 0-1.2
share/man/man4/viomb.4 Rev. 0-1.2

Allow NetBSD to run unmodified under Linux/kvm.
 1.9.4.1 05-Mar-2011  rmind sync with head
 1.10.12.1 18-Feb-2012  mrg merge to -current.
 1.10.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.10.8.1 17-Apr-2012  yamt sync with head
 1.11.6.2 03-Dec-2017  jdolecek update from HEAD
 1.11.6.1 25-Feb-2013  tls resync with head
 1.12.14.2 28-Aug-2017  skrll Sync with HEAD
 1.12.14.1 22-Sep-2015  skrll Sync with HEAD
 1.13.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.15.2.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.16.10.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.21 21-May-2020  ad - Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.
 1.20 01-Dec-2019  maxv localify
 1.19 23-May-2017  nonaka branches: 1.19.10;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.18 22-Apr-2017  nonaka use CR8 instead of LAPIC Task Priority register on x86-64.
 1.17 19-Apr-2017  nonaka remove prototypes of nonexistent function.
 1.16 25-Nov-2016  maxv branches: 1.16.2;
Move the virtual address of the LAPIC page out of the data segment on amd64
and i386. The old design was error-prone, and it didn't allow us to map the
data segment with large pages.

Now, the VA is allocated dynamically in the pmap bootstrap code, and entered
manually later. We go from using &local_apic to using *local_apic_va, and we
therefore need one more level of indirection in the asm code.

Discussed on tech-kern.
 1.15 16-Oct-2016  maxv Remove lapic_tpr on amd64 and i386, unused. Now, we have only one pointer
to the LAPIC page, and each register access is done with relative offsets.
 1.14 12-Jun-2011  rmind branches: 1.14.12; 1.14.30; 1.14.34;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.13 18-May-2011  drochner branches: 1.13.2;
remove stale declarations / empty function
 1.12 28-Apr-2008  martin branches: 1.12.14; 1.12.22; 1.12.28;
Remove clause 3 and 4 from TNF licenses
 1.11 14-Apr-2008  cegger branches: 1.11.2; 1.11.4;
- u_int32_t -> uint32_t
- ansify
 1.10 09-Dec-2007  jmcneill branches: 1.10.10;
Merge jmcneill-pm branch.
 1.9 03-Dec-2007  joerg branches: 1.9.2;
Revert last commit which added externs that never get defined anywhere.
At least lapic_get_timecount conflicts with the newly added lapic TC.
 1.8 03-Dec-2007  ad branches: 1.8.2;
Interrupt handling changes, in discussion since February:

- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
 1.7 17-Oct-2007  garbled branches: 1.7.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to an evbppc target.
 1.6 29-Aug-2007  ad Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.5 16-Feb-2006  perry branches: 1.5.24; 1.5.32; 1.5.38; 1.5.42; 1.5.44;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.4 24-Dec-2005  perry branches: 1.4.2; 1.4.4; 1.4.6;
__asm__ -> __asm
__const__ -> const
__inline__ -> inline
__volatile__ -> volatile
 1.3 27-Oct-2003  junyoung branches: 1.3.16;
Nuke __P().
 1.2 19-Jul-2003  lukem change multiple include protection #define to match filename
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.16.3 21-Jan-2008  yamt sync with head
 1.3.16.2 03-Sep-2007  yamt sync with head.
 1.3.16.1 21-Jun-2006  yamt sync with head.
 1.4.6.1 22-Apr-2006  simonb Sync with head.
 1.4.4.1 09-Sep-2006  rpaulo sync with head
 1.4.2.1 18-Feb-2006  yamt sync with head.
 1.5.44.2 09-Jan-2008  matt sync with HEAD
 1.5.44.1 06-Nov-2007  matt sync with HEAD
 1.5.42.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.5.42.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.5.42.1 03-Aug-2007  jmcneill Pull in power management changes from private branch.
 1.5.38.1 03-Sep-2007  skrll Sync with HEAD.
 1.5.32.1 03-Oct-2007  garbled Sync with HEAD
 1.5.24.2 03-Dec-2007  ad Sync with HEAD.
 1.5.24.1 29-Jul-2007  ad - When zeroing/copying pages, use SSE2 movnti to avoid polluting the cache.
- By default, align assembly routines on 32-byte starting boundaries.
- There are now 8 interrupt priority levels, half of which are softints.
Update intrdefs.h to match.
- Always clear/set spinlock words - removes lots of ifdefs.
- Remove the horrible ci_self150 hack that I introduced.
- Overhaul how TLB shootdown is performed. Inspired by a similar change in
OpenBSD but implemented quite differently. This should be a lot faster
but I have not benchmarked it yet.
 1.7.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.7.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.8.2.2 26-Dec-2007  ad Sync with head.
 1.8.2.1 08-Dec-2007  ad Sync with head.
 1.9.2.1 11-Dec-2007  yamt sync with head.
 1.10.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.11.4.1 16-May-2008  yamt sync with head.
 1.11.2.1 18-May-2008  yamt sync with head.
 1.12.28.1 06-Jun-2011  jruoho Sync with HEAD.
 1.12.22.2 31-May-2011  rmind sync with head
 1.12.22.1 26-Apr-2010  rmind Apply renovated patch to significantly reduce TLB shootdowns in x86 pmap,
also provide TLBSTATS option to measure and track TLB shootdowns. Details:

http://mail-index.netbsd.org/port-i386/2009/01/11/msg001018.html

Patch from Andrew Doran, proposed on tech-x86 [sic], in January 2009.

XXX: amd64 and xen are not yet; work in progress.
 1.12.14.1 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.13.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.14.34.3 26-Apr-2017  pgoyette Sync with HEAD
 1.14.34.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.14.34.1 04-Nov-2016  pgoyette Sync with HEAD
 1.14.30.2 28-Aug-2017  skrll Sync with HEAD
 1.14.30.1 05-Dec-2016  skrll Sync with HEAD
 1.14.12.1 03-Dec-2017  jdolecek update from HEAD
 1.16.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.19.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.4 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.3 04-May-2003  fvdl branches: 1.3.2;
Block level-triggered interrupts at the ioapic if they are deferred.
Avoids interrupt storms seen on some systems. Many thanks to
Stoned Elipot for testing.
 1.2 03-Mar-2003  fvdl use CVAROFF.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.3.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.1 03-Aug-2004  skrll Sync with HEAD
 1.13 02-Jan-2024  christos use sized types
 1.12 16-Sep-2023  christos protect against multiple inclusion
 1.11 15-Sep-2010  christos Commit SoC long double support from Stathis Kamperis
 1.10 02-Feb-2007  christos branches: 1.10.48; 1.10.62; 1.10.68; 1.10.70;
Merge the int bit with the high fraction bit. Add constants/macros
needed by gdtoa.
 1.9 15-Apr-2005  kleink branches: 1.9.2; 1.9.28; 1.9.32;
Push back the descriptions of NaN formats, and descriptions of the
distinction between signalling NaNs and quiet NaNs back into the
machine-dependent headers; treat the implementation of __nanf in the
same spirit.

IEEE 754 leaves the distinction between signalling NaNs and quiet NaNs
to the implementation, and unlike our headers used to suggest they're
not identical in the interpretation of the fraction's MSb; in due
course, make those of hppa, mips, sh3, and sh5 reflect reality.
 1.8 27-Oct-2003  kleink branches: 1.8.8; 1.8.14;
Err, rename some members added in previous to make them reflect their
semantics better.
 1.7 26-Oct-2003  kleink For convenient use in libc, add unions of the C floating types and their
corresponding structure definitions.
 1.6 26-Oct-2003  kleink Correct the position of the QUIETNAN bit.
 1.5 26-Oct-2003  kleink Use <sys/ieee754.h> where applicable.
 1.4 25-Oct-2003  kleink Reflect the explicit integer bit here as well.
 1.3 23-Oct-2003  kleink Make ieee_ext match reality, and add a note about its ABI-specific
tail padding.
 1.2 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.8.14.1 19-Apr-2005  tron Pull up revision 1.9 (requested by kleink in ticket #163):
Push back the descriptions of NaN formats, and descriptions of the
distinction between signalling NaNs and quiet NaNs back into the
machine-dependent headers; treat the implementation of __nanf in the
same spirit.
IEEE 754 leaves the distinction between signalling NaNs and quiet NaNs
to the implementation, and unlike our headers used to suggest they're
not identical in the interpretation of the fraction's MSb; in due
course, make those of hppa, mips, sh3, and sh5 reflect reality.
 1.8.8.1 29-Apr-2005  kent sync with -current
 1.9.32.1 07-May-2007  pavel Pull up following revision(s) (requested by manu in ticket #607):
lib/libc/arch/i386/gen/isnanl.c: revision 1.6
lib/libc/gdtoa/gdtoa.c: revision 1.2-1.3
lib/libc/arch/x86_64/gen/isnanl.c: revision 1.6
lib/libc/gdtoa/gdtoaimp.h: revision 1.6
sys/arch/m68k/include/ieee.h: revision 1.13
usr.bin/xlint/lint1/scan.l: revision 1.36-1.37
lib/libc/stdio/snprintf_ss.c: revision 1.4
lib/libc/arch/i386/gen/isfinitel.c: revision 1.2
lib/libc/stdio/vfscanf.c: revision 1.38
sys/arch/sparc/include/ieee.h: revision 1.11-1.12
lib/libc/gdtoa/dtoa.c: revision 1.4
lib/libc/stdio/Makefile.inc: revision 1.35
lib/libc/stdio/fvwrite.c: revision 1.17
lib/libc/arch/m68k/gen/fpclassifyl.c: revision 1.2
lib/libc/arch/i386/gen/isinfl.c: revision 1.6
lib/libc/arch/x86_64/gen/isinfl.c: revision 1.6
lib/libc/arch/x86_64/gen/isfinitel.c: revision 1.2
lib/libc/stdio/vfprintf.c: revision 1.55-1.57
lib/libc/stdio/vsnprintf_ss.c: revision 1.3
lib/libc/stdio/vfwprintf.c: revision 1.10
sys/arch/x86/include/ieee.h: revision 1.10
lib/libc/gdtoa/dmisc.c: revision 1.3
lib/libc/gdtoa/Makefile.inc: revision 1.5
sys/arch/hppa/include/ieee.h: revision 1.10
lib/libc/arch/x86_64/gen/fpclassifyl.c: revision 1.3
lib/libc/arch/i386/gen/fpclassifyl.c: revision 1.2
sys/sys/ieee754.h: revision 1.7
lib/libc/gdtoa/gdtoa.h: revision 1.7
include/stdio.h: revision 1.67-1.68
lib/libc/gdtoa/hdtoa.c: revision 1.1-1.4
lib/libc/gdtoa/ldtoa.c: revision 1.1-1.4
defined(_NETBSD_SOURCE) is equivalent to (!defined(_ANSI_SOURCE) &&
!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)), so there's no
need to check both of them.
Fix for issue reported in PR lib/35401 as well as related overflow bugs.
deal with hex doubles.
Instead of abusing stdio to get a signal-safe version of sprintf, provide one.
remove __SAFE
add long double and hex double support from freebsd.
make this compile.
add new prototypes.
add the new files to the build. Note I am not bumping libc now, because
these are not used yet.
Merge the int bit with the high fraction bit. Add constants/macros
needed by gdtoa.
add constants used by gdtoa
since the int bit is merged, do the explicit math.
ext_int bit is no more.
ext_int bit is no more.
- merge change from freebsd
- add support for building as vfprintf.c
- XXX: we strdup to simplify the freeing logic. This should be fixed for
efficiency in the vfprintf case.
use vfwprintf.c
enable wide doubles.
some int -> size_t
deal with sparc64 that has 112 bits of mantissa.
make extended precision gdtoa friendly.
int/size_t changes
make this gdtoa friendly.
remove dup definition
use dtoa() instead of returning empty when we don't have extended precision
information.
Fix previous, add forgotten pointer dereference in the call to dtoa().
Add a cheesy workaround marked XXX for the situation where the
strtod() implementation available in the environment does not
handle hex floats.
Discussed with and suggested by christos
From Christos: gdtoa fixes for m68k. M68k ports should build now, but
printing extended precision is a little off.
vax does not have <machine/ieee.h> or long double
It would be nice if the compiler provided something like __IEEE_MATH__
bring in FreeBSD's vfscanf() to gain multi-byte/collation support.
Unfortunately it is too difficult to make vfwscanf and this share
the same code like I did with printf, because for string parsing
the code is too different.
 1.9.28.1 09-Feb-2007  ad Sync with HEAD.
 1.9.2.1 26-Feb-2007  yamt sync with head.
 1.10.70.1 05-Mar-2011  rmind sync with head
 1.10.68.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.10.62.1 24-Oct-2010  jym Sync with HEAD
 1.10.48.1 09-Oct-2010  yamt sync with head
 1.4 26-Mar-2011  christos add fp{g,s}etprec
 1.3 31-Jul-2010  joerg branches: 1.3.2;
Add support for fenv.h interface for i386 and amd64.

Submitted by Stathis Kamperis as part of GSoC 2010 and ported from
FreeBSD.
 1.2 05-Aug-2008  matt branches: 1.2.8; 1.2.14; 1.2.16;
Update <machine/ieeefp.h> to use the C99 FE_* definitions instead of the
NetBSD defined ones. Redefine the NetBSD ones in terms of the C99 ones.
Step 1 to having <fenv.h>
 1.1 26-Feb-2003  fvdl branches: 1.1.104; 1.1.108; 1.1.110; 1.1.114;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.114.1 19-Oct-2008  haad Sync with HEAD.
 1.1.110.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.1.108.2 11-Aug-2010  yamt sync with head.
 1.1.108.1 04-May-2009  yamt sync with head.
 1.1.104.1 28-Sep-2008  mjf Sync with HEAD.
 1.2.16.2 21-Apr-2011  rmind sync with head
 1.2.16.1 05-Mar-2011  rmind sync with head
 1.2.14.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.2.8.2 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.2.8.1 24-Oct-2010  jym Sync with HEAD
 1.3.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.66 07-Sep-2022  knakahara NetBSD/x86: Raise the number of interrupt sources per CPU from 32 to 56.

There has been no objection for three years.
https://mail-index.netbsd.org/port-amd64/2019/09/22/msg003012.html
Implemented by nonaka@n.o, updated by me.
 1.65 24-May-2022  bouyer Some devices (e.g. ixg in MSI-X mode) don't want their handlers called
when no interrupts are pending. So add an extra ih_pending field
to struct intrhand, which is incremented when the handler is not called because
of IPL level and reset to 0 when called. Check this in Xen's resume
assembly to call only handlers that are really pending.
 1.64 04-Apr-2022  andvar fix various typos, mainly in comments.
 1.63 12-Mar-2022  riastradh x86: Check for biglock leakage in interrupt handlers.
 1.62 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.61 22-Dec-2019  thorpej branches: 1.61.6;
Add intr_mask() and corresponding intr_unmask() calls that allow specific
interrupt lines / sources to be masked as needed (rather than making a
set of sources by IPL as with spl*()).
 1.60 14-Feb-2019  cherry Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.59 13-Feb-2019  cherry Missed the crucial header file in previous commit.

struct intrstub; is now uniform across native and XEN

This should fix the XEN builds.
 1.58 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.57 13-Dec-2018  cherry Allow x86 builds to have the opportunity to not have pre-emption
enabled by default. This can be effected by having a:

"options NO_PREEMPTION"

line in the kernel configuration file.

While it was tempting to tie __HAVE_PREEMPTION to MULTIPROCESSOR,
as is currently assumed in sys/kern/kern_stub.c ,

having MULTIPROCESSOR without __HAVE_PREEMPTION
and not having either are valid configuration options which users
could have choice of. We thus err on the side of configurability.
 1.56 24-Jun-2018  jdolecek branches: 1.56.2;
add support for kern.intr.list aka intrctl(8) 'list' for xen

event_set_handler() and pirq_establish() now have extra intrname
parameter; shared intr_create_intrid() is used to provide the value

xen drivers were changed to pass the specific driver instance
name as the xname, e.g. 'vcpu0 clock' instead just 'clock', or
'xencons0' instead of 'xencons'

associated evcnt is now changed to use intrname - this matches native x86
 1.55 04-Apr-2018  christos Rename Xpreempt{recurse,resume} -> X{recurse,resume}_preempt so that
they fit the pattern. Also the debugger trap sniffer matches them
without adding special entries...
XXX: pullup-8.
 1.54 17-Feb-2018  maxv branches: 1.54.2;
Rename i8259_stubs -> legacy_stubs. We will want the entries to have the
same name, eg:

legacy_stubs
-> Xintr_legacy0, Xrecurse_legacy0, Xresume_legacy0
-> Xintr_legacy1, Xrecurse_legacy1, Xresume_legacy1
...
 1.53 04-Jan-2018  knakahara fix "intrctl list" panic when ACPI is disabled.

reviewed by cherry@n.o and tested by msaitoh@n.o, thanks.
 1.52 04-Nov-2017  cherry Retire xen/x86/intr.c and use the new xen specific glue in x86/x86/intr.c

The purpose of this change is to expose the x86/include/intr.h API
to drivers. Specifically the following functions:

void *intr_establish_xname(...);
void *intr_establish(...);
void intr_disestablish(...);

while maintaining the old API from xen/include/evtchn.h, specifically
the following functions:

int event_set_handler(...);
int event_remove_handler(...);

This is so that if things break, we can keep using the old API until
everything stabilises. This is a stepping stone towards getting the
actual XEN event callback path rework code in place - which can be
done opaquely behind the intr.h API - NetBSD/XEN specific drivers that
have been ported to the intr.h API should then work without
significant further modifications.
 1.51 16-Jul-2017  cherry branches: 1.51.2;
Unify the xen and native x86/ interrupt setup functions and
spl traversal data structures.

This is towards PVHVM.
 1.50 23-May-2017  nonaka branches: 1.50.2;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.49 07-Jul-2016  msaitoh KNF. Remove extra spaces. No functional change.
 1.48 17-Aug-2015  knakahara Add kernel code to support intrctl(8).
 1.47 27-Apr-2015  knakahara add intr_handle_t and let pci_intr_handle_t use it.
 1.46 27-Apr-2015  knakahara add pci_intr_distribute(9) for x86.
 1.45 20-Jul-2014  uebayasi branches: 1.45.4;
ipifunc[]: Comment IPI constant names for grep'ability. Constify.
 1.44 29-Mar-2014  christos branches: 1.44.2;
make pci_intr_string and eisa_intr_string take a buffer and a length
instead of relying on local static storage.
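The buffer-and-length convention described in revision 1.44 can be sketched as follows. This is only an illustration: example_intr_string() and example_two_results() are hypothetical names, and the "irq %d" format is an assumption, not the output of the real pci_intr_string or eisa_intr_string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the NetBSD function: formats an interrupt
 * description into caller-provided storage and returns that storage.
 * Nothing is kept in a static buffer, so calls do not clobber each other.
 */
static const char *
example_intr_string(int irq, char *buf, size_t len)
{
	snprintf(buf, len, "irq %d", irq);	/* truncates safely if len is small */
	return buf;
}

/* Two results can be held at once because each caller owns its buffer. */
static void
example_two_results(void)
{
	char a[16], b[16];
	const char *sa = example_intr_string(5, a, sizeof(a));
	const char *sb = example_intr_string(11, b, sizeof(b));

	assert(strcmp(sa, "irq 5") == 0);
	assert(strcmp(sb, "irq 11") == 0);
}
```

With a single static buffer, the second call would have overwritten the string returned by the first; passing the storage in avoids that and makes the function safe to call from more than one context.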
 1.43 01-Aug-2011  drochner branches: 1.43.2; 1.43.12; 1.43.16;
if checking whether an interrupt is shared, don't compare pin numbers
if it is "-1" -- this is a hack to allow MSIs which don't have a concept
of pin numbers, and are generally not shared
(This doesn't give us sensible event names for statistics display. The
whole abstraction has more exceptions than regular cases, it should
be redesigned imho.)
 1.42 03-Apr-2011  dyoung Clean up excessive #ifdef'age of NMI trap handling for amd64/i386/xen.
Handle NMI in all Xen kernels.
 1.41 02-May-2010  plunky branches: 1.41.2;
The spl inline functions refer to external symbols that are only
defined in the kernel.

Wrap kernel-specific declarations in #ifdef _KERNEL to avoid unresolved
references when including from userland.
 1.40 25-Apr-2010  ad Nothing uses x86_multicast_ipi() right now and it complicates many
CPU support, so remove it.
 1.39 19-Apr-2009  ad branches: 1.39.2; 1.39.4;
cpuctl:

- Add interrupt shielding (direct hardware interrupts away from the
specified CPUs). Not documented just yet but will be soon.

- Redo /dev/cpu time_t compat so no kernel changes are needed.

x86:

- Make intr_establish, intr_disestablish safe to use when !cold.

- Distribute hardware interrupts among the CPUs, instead of directing
everything to the boot CPU.

- Add MD code for interrupt shielding. This works in most cases but there is
a bug where delivery is not accepted by an LAPIC after redistribution. It
also needs re-balancing to make things fair after interrupts are turned
back on for a CPU.
 1.38 27-Mar-2009  dyoung If defined(_KERNEL), #include <sys/types.h>, otherwise #include
<stdbool.h>, for the bool definition that we need. intr.h only got the
definition by chance, before.
 1.37 25-Mar-2009  dyoung It is only by accident that this gets the definitions it needs from
<sys/evcnt.h>, so explicitly #include <sys/evcnt.h>.
 1.36 24-Feb-2009  yamt - rewrite x86 nmi dispatcher so that establish and disestablish are safe
on a running system.
- adapt existing users of the api. (elan)
- adapt tprof_pmi driver to use the api.
 1.35 30-May-2008  ad branches: 1.35.6; 1.35.12;
Add a 'known_mpsafe' argument to intr_establish().
 1.34 07-May-2008  joerg branches: 1.34.2;
Remove some prototypes that are not implemented. Make some functions
static that are only used in intr.c.
 1.33 28-Apr-2008  ad Add support for kernel preemption to the i386 and amd64 ports. Notes:

- I have seen one isolated panic in the x86 pmap, but otherwise i386
seems stable with preemption enabled.

- amd64 is missing the FPU handling changes and it's not yet safe to
enable it there.

- The usual level for kern.sched.kpreempt_pri will be 128 once enabled
by default. For testing, setting it to 0 helps to shake out bugs.
 1.32 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.31 21-Jan-2008  dyoung branches: 1.31.6; 1.31.8; 1.31.10;
Add primitive routines to establish NMI handlers on i386.

TBD: synchronize (dis)establishment of handlers.
 1.30 26-Dec-2007  yamt - share idt entry allocation code among x86.
- introduce a function to reserve an idt entry and use it instead of
manipulating idt_allocmap directly.
- rename idt to xen_idt for amd64 xen. add missing #ifdef XEN.
 1.29 03-Dec-2007  ad branches: 1.29.2; 1.29.6; 1.29.8;
Interrupt handling changes, in discussion since February:

- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
 1.28 17-Oct-2007  garbled branches: 1.28.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.27 09-Jul-2007  ad branches: 1.27.8; 1.27.10;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.26 17-May-2007  yamt merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.25 16-Feb-2007  ad branches: 1.25.2; 1.25.6; 1.25.8; 1.25.14;
Remove spllowersoftclock() and CLKF_BASEPRI(), and always dispatch callouts
via a soft interrupt. In the near future, softclock will be run from process
context.
 1.24 09-Feb-2007  ad Merge newlock2 to head.
 1.23 26-Dec-2006  ad Define ipl_t as uint8_t so that it can be packed into a word with a lock
byte. Ok yamt@.
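The packing that revision 1.23 refers to can be illustrated roughly as below. The struct layout and all names are hypothetical (the log does not show the actual lock structure); the only point taken from the log is that a one-byte IPL type fits beside a lock byte in a single word.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t example_ipl_t;	/* mirrors the uint8_t choice named in the log */

/*
 * Hypothetical layout, not the real NetBSD structure: a lock byte and
 * the saved IPL share one 32-bit word, with two bytes left over.
 */
struct example_lock {
	volatile uint8_t lock_byte;	/* the lock itself */
	example_ipl_t    saved_ipl;	/* IPL associated with the lock */
	uint16_t         unused;	/* padding up to a full word */
};

/* Compile-time check that the whole structure occupies exactly one word. */
_Static_assert(sizeof(struct example_lock) == sizeof(uint32_t),
    "lock byte and IPL fit in one 32-bit word");
```

Had the IPL type been a full int, the same pair would need at least two words on common ABIs.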
 1.22 21-Dec-2006  yamt merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.21 04-Jul-2006  christos branches: 1.21.4; 1.21.6;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.20 16-Feb-2006  perry branches: 1.20.2; 1.20.10;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.19 24-Dec-2005  perry branches: 1.19.2; 1.19.4; 1.19.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.18 03-Nov-2005  yamt - use sys/spl.h.
- add some IPL_ definitions.
 1.17 29-Oct-2005  yamt add splraiseipl().
 1.16 28-Oct-2005  yamt remove duplicated spllpt().
 1.15 31-Oct-2004  yamt branches: 1.15.12; 1.15.14;
use __insn_barrier rather than homegrown equivalents.
 1.14 23-Oct-2004  yamt to determine if an interrupt needs to grab the kernel lock or not,
check interrupt's own ipl rather than cpu's current ipl.
 1.13 28-Jun-2004  fvdl Updating ci_ilevel and testing ci_ipending must be done with all interrupts
off, or priority inversion can occur, which can lead to IPI deadlocks.
Leaves interrupts off for a bit longer, sadly, but with no noticeable
effects on the systems I tested on.

From YAMAMOTO Takashi.
 1.12 04-Mar-2004  dbj fix comment about spllowersoftclock
 1.11 14-Jan-2004  yamt spllower: lower spl before checking pending interrupts.
otherwise, interrupts that happen immediately after the check might be left
pending for a while. (until the next tick in the worst case.)
 1.10 30-Oct-2003  fvdl * keep track of PCI buses that aren't known by firmware, but are found
by NetBSD
* use this info in intr_find_mpmapping
* get rid of the last argument to intr_find_mpmapping, it was redundant
 1.9 27-Oct-2003  junyoung Nuke __P().
 1.8 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
 1.7 06-Sep-2003  fvdl Move the bulk of pci_intr_string into a separate intr_string function. Use
that new function to print the pciide compat interrupt in pciide_machdep.c.
Share pciide_machdep.c between amd64 and i386.
 1.6 20-Aug-2003  fvdl Pass pointers to frames from assembly, do not use the 'frame on stack
as argument passed by value' trick, as gcc 3.3.x makes (valid) assumptions
about the stack that will not be true. Costs 2 instructions per trap/syscall
on i386, 4 per interrupt for MP. One instruction per trap/syscall on amd64,
2 per interrupt for MP. I expect gcc 3.3.1 to make up for this by better
optimization (it'd better..)

While here, make amd64 compile again by using subr_mbr_disk.c
 1.5 23-Jun-2003  martin branches: 1.5.2;
#ifdef _KERNEL_OPT police
 1.4 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.3 16-Jun-2003  thorpej Rename IPL_IMP -> IPL_VM.
 1.2 04-May-2003  fvdl Block level-triggered interrupts at the ioapic if they are deferred.
Avoids interrupt storms seen on some systems. Many thanks to
Stoned Elipot for testing.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.5.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.5.2.4 02-Nov-2004  skrll Sync with HEAD.
 1.5.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.1 03-Aug-2004  skrll Sync with HEAD
 1.15.14.1 02-Nov-2005  yamt sync with head.
 1.15.12.6 21-Jan-2008  yamt sync with head
 1.15.12.5 07-Dec-2007  yamt sync with head
 1.15.12.4 03-Sep-2007  yamt sync with head.
 1.15.12.3 26-Feb-2007  yamt sync with head.
 1.15.12.2 30-Dec-2006  yamt sync with head.
 1.15.12.1 21-Jun-2006  yamt sync with head.
 1.19.6.1 22-Apr-2006  simonb Sync with head.
 1.19.4.1 09-Sep-2006  rpaulo sync with head
 1.19.2.1 18-Feb-2006  yamt sync with head.
 1.20.10.1 13-Jul-2006  gdamore Merge from HEAD.
 1.20.2.1 11-Aug-2006  yamt sync with head
 1.21.6.3 21-Sep-2006  yamt rename splraiseipl argument to match with the rest of ports.
 1.21.6.2 18-Sep-2006  yamt correct a header.
 1.21.6.1 18-Sep-2006  yamt implement new api for i386 and amd64.
 1.21.4.2 27-Jan-2007  ad If running on a PPro or later, at boot patch in versions of spllower() and
similar that use cmpxchg8b instead of cli/sti. Cuts the clock cycles for
splx() by a factor of ~6 on the P4, and ~3 on the PIII when bracketed by
serializing instructions (and hopefully more when not).
 1.21.4.1 12-Jan-2007  ad Sync with head.
 1.25.14.2 03-Oct-2007  garbled Sync with HEAD
 1.25.14.1 22-May-2007  matt Update to HEAD.
 1.25.8.1 11-Jul-2007  mjf Sync with head.
 1.25.6.4 03-Dec-2007  ad Sync with HEAD.
 1.25.6.3 03-Dec-2007  ad Sync with HEAD.
 1.25.6.2 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.25.6.1 27-May-2007  ad Sync with head.
 1.25.2.1 23-Mar-2007  ad - Decouple intr.h from cpu.h.
- Define splraise in spl.S. As a side effect it becomes "preemption safe".
- Make softintr_schedule a function in softintr.c.
- Make softintr a function in spl.S, and remove the unneeded lock prefix.
 1.27.10.3 23-Mar-2008  matt sync with HEAD
 1.27.10.2 09-Jan-2008  matt sync with HEAD
 1.27.10.1 06-Nov-2007  matt sync with HEAD
 1.27.8.1 09-Dec-2007  jmcneill Sync with HEAD.
 1.28.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.28.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.29.8.1 16-Dec-2007  cube Split off device-specific stuff out of subr_autconf.c, and split off
autoconf-specific stuff out of device.h.

The only functional change is the removal of the unused evcnt.h include in
device.h which (*sigh*) has side-effects in x86's intr.h, and probably some
other in the rest of the tree but I'm only compiling i386's QEMU for the
time being.
 1.29.6.2 23-Jan-2008  bouyer Sync with HEAD.
 1.29.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.29.2.1 26-Dec-2007  ad Sync with head.
 1.31.10.3 11-Aug-2010  yamt sync with head.
 1.31.10.2 04-May-2009  yamt sync with head.
 1.31.10.1 16-May-2008  yamt sync with head.
 1.31.8.2 04-Jun-2008  yamt sync with head
 1.31.8.1 18-May-2008  yamt sync with head.
 1.31.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.34.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.35.12.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.35.12.4 02-May-2011  jym Sync with head.
 1.35.12.3 24-Oct-2010  jym Sync with HEAD
 1.35.12.2 01-Nov-2009  jym Sync with HEAD.
 1.35.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.35.6.2 28-Apr-2009  skrll Sync with HEAD.
 1.35.6.1 03-Mar-2009  skrll Sync with HEAD.
 1.39.4.2 21-Apr-2011  rmind sync with head
 1.39.4.1 30-May-2010  rmind sync with head
 1.39.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.39.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.41.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.43.16.1 18-May-2014  rmind sync with head
 1.43.12.2 03-Dec-2017  jdolecek update from HEAD
 1.43.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.43.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.44.2.1 10-Aug-2014  tls Rebase.
 1.45.4.3 28-Aug-2017  skrll Sync with HEAD
 1.45.4.2 22-Sep-2015  skrll Sync with HEAD
 1.45.4.1 06-Jun-2015  skrll Sync with HEAD
 1.50.2.2 05-Apr-2018  martin Pull up following revision(s) (requested by christos in ticket #696):

sys/arch/amd64/amd64/vector.S: revision 1.62 (patch)
sys/arch/x86/include/intr.h: revision 1.55
sys/arch/i386/i386/vector.S: revision 1.77
sys/arch/i386/i386/db_interface.c: revision 1.82 (patch)
sys/arch/amd64/amd64/spl.S: revision 1.34 (patch)
sys/arch/amd64/amd64/db_interface.c: revision 1.33 (patch)
sys/arch/x86/x86/intr.c: revision 1.125
sys/arch/i386/i386/spl.S: revision 1.43 (patch)
sys/arch/i386/i386/machdep.c: revision 1.805 (patch)
sys/arch/x86/x86/lapic.c: revision 1.66 (patch)

Rename the DDB IPI IDT vectors for consistency. ok maxv@

Rename Xpreempt{recurse,resume} -> X{recurse,resume}_preempt so that
they fit the pattern. Also the debugger trap sniffer matches them
without adding special entries...

XXX: pullup-8.
 1.50.2.1 13-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #493):
sys/arch/x86/include/intr.h: revision 1.53
sys/arch/x86/pci/pci_intr_machdep.c: revision 1.42
sys/arch/x86/x86/intr.c: revision 1.114 via patch
fix "intrctl list" panic when ACPI is disabled.
reviewed by cherry@n.o and tested by msaitoh@n.o, thanks.
 1.51.2.2 16-Jul-2017  cherry 2302677
 1.51.2.1 16-Jul-2017  cherry file intr.h was added on branch perseant-stdc-iso10646 on 2017-07-16 14:02:49 +0000
 1.54.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.54.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.54.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.56.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.56.2.1 10-Jun-2019  christos Sync with HEAD
 1.61.6.5 19-Apr-2020  bouyer Add per-PIC callbacks for interrupt_get_devname(), interrupt_get_assigned()
and interrupt_get_count(). Implement Xen-specific callbacks for
PIC_XEN and use the x86 one for others.
In event_set_handler(), call intr_allocate_io_intrsource() so that
events appear in the interrupt list (intrctl list).
 1.61.6.4 19-Apr-2020  bouyer Add a struct pic * member to struct intrhand.
This will be used for interrupt_get_count()
For Xen replace pic_type with a pointer to the pic, and add a pointer
to intrhand, in struct pintrhand
Make event_set_handler return the pointer to struct intrhand.
Don't allocate a fake intrhand in xen_intr_establish_xname(), use the
one returned by event_set_handler().
 1.61.6.3 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.61.6.2 11-Apr-2020  bouyer Move softint and preemption-related functions out of x86/x86/intr.c to
its own file, x86/x86/x86_softintr.c
Add x86/x86/x86_softintr.c for native and XenPV
Make sure XenPV also checks ci_ipending, which is used for softints.
Switch XenPV to fast softints and allow kernel preemption.
kpreempt_disable() before calling pmap_changeprot_local()
run xen_wallclock_time() and xen_global_systime_ns() at splsched() to
avoid being interrupted.

XXX amd64 lock stubs are racy for XPENDING
 1.61.6.1 10-Apr-2020  bouyer spllower(): Also check Xen pending events
hypervisor_pvhvm_callback(): exit via Xdoreti, so that pending interrupts
are checked.
disable __HAVE_FAST_SOFTINTS only for XENPV, it now works for PVHVM.
We still have to disable PREEMPTION, until we support MULTIPROCESSOR
 1.2 17-Aug-2015  knakahara branches: 1.2.16;
Add kernel code to support intrctl(8).
 1.1 27-Apr-2015  knakahara branches: 1.1.2;
add pci_intr_distribute(9) for x86.
 1.1.2.3 22-Sep-2015  skrll Sync with HEAD
 1.1.2.2 06-Jun-2015  skrll Sync with HEAD
 1.1.2.1 27-Apr-2015  skrll file intr_distribute.h was added on branch nick-nhusb on 2015-06-06 14:40:04 +0000
 1.2.16.2 03-Dec-2017  jdolecek update from HEAD
 1.2.16.1 17-Aug-2015  jdolecek file intr_distribute.h was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.1 25-Jan-2023  riastradh branches: 1.1.2;
x86/intr: Work around sleazy clockintr with a secret frame argument.

PR kern/57197
 1.1.2.2 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #136):

sys/arch/x86/x86/intr.c: revision 1.164
sys/arch/x86/isa/clock.c: revision 1.41
sys/arch/x86/include/intr_private.h: revision 1.1

x86/intr: Work around sleazy clockintr with a secret frame argument.
PR kern/57197
 1.1.2.1 25-Jan-2023  martin file intr_private.h was added on branch netbsd-10 on 2023-04-01 15:11:00 +0000
 1.26 07-Sep-2022  knakahara NetBSD/x86: Raise the number of interrupt sources per CPU from 32 to 56.

There has been no objection for three years.
https://mail-index.netbsd.org/port-amd64/2019/09/22/msg003012.html
Implemented by nonaka@n.o, updated by me.
 1.25 18-Mar-2021  nonaka LIR_HV priority should be lower than softint.
 1.24 25-Apr-2020  bouyer branches: 1.24.2;
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.23 23-Nov-2019  ad branches: 1.23.6;
cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
 1.22 15-Feb-2019  nonaka Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.
 1.21 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.20 19-May-2014  rmind branches: 1.20.20; 1.20.28;
Implement MI IPI interface with cross-call support.
 1.19 01-Dec-2013  christos branches: 1.19.2;
revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.18 23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.17 06-Nov-2011  cherry branches: 1.17.10;
[merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
 1.16 22-Jun-2010  rmind branches: 1.16.6; 1.16.8;
Implement high priority (XC_HIGHPRI) xcall(9) mechanism - a facility
to execute functions from software interrupt context, at SOFTINT_CLOCK.
Functions must be lightweight. Will be used for passive serialization.

OK ad@.
 1.15 05-Oct-2009  rmind branches: 1.15.2; 1.15.4;
Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.14 11-Nov-2008  ad branches: 1.14.4;
PR port-amd64/38293 panic: fp_save ipi didn't

Kill the FP flush IPI and always save. The synchronization here isn't strong
and we could easily pull the chain on an innocent LWP's FP state.

Another fix to follow.
 1.13 28-Apr-2008  ad branches: 1.13.6; 1.13.8; 1.13.10;
Add support for kernel preemption to the i386 and amd64 ports. Notes:

- I have seen one isolated panic in the x86 pmap, but otherwise i386
seems stable with preemption enabled.

- amd64 is missing the FPU handling changes and it's not yet safe to
enable it there.

- The usual level for kern.sched.kpreempt_pri will be 128 once enabled
by default. For testing, setting it to 0 helps to shake out bugs.
 1.12 18-Dec-2007  joerg branches: 1.12.6; 1.12.8; 1.12.10;
Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
 1.11 03-Dec-2007  ad branches: 1.11.2; 1.11.6;
Interrupt handling changes, in discussion since February:

- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
 1.10 17-Oct-2007  garbled branches: 1.10.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.9 29-Aug-2007  ad Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.8 21-Mar-2007  xtraeme branches: 1.8.4; 1.8.8; 1.8.12; 1.8.14;
Remove the MSR read IPI handler from X86_IPI_NAMES and use the
correct number in X86_NIPI.
 1.7 21-Mar-2007  xtraeme Remove the MSR read IPI handler, there won't be any driver that will
use it, and we can see if the values are ok in the CPUs in the write
operation.

Suggested by YAMAMOTO Takashi.
 1.6 20-Mar-2007  xtraeme MSR read and write IPI handlers for x86. A MSR will be read or written
in all CPUs available in the system. This adds another member
to struct cpu_info, ci_msr_rvalue; it will contain the value of the MSR
in a previous operation.

Tested with clockmod in UP and SMP by me, tested with est in SMP
by Daniel Carosone and Michael Van Elst.

Ok'ed by Andrew Doran and Matthew R. Green.
 1.5 03-Nov-2005  yamt branches: 1.5.26; 1.5.28; 1.5.32; 1.5.34; 1.5.36;
- use sys/spl.h.
- add some IPL_ definitions.
 1.4 16-Apr-2005  yamt branches: 1.4.2;
make multi inclusion protection macros consistent.
 1.3 16-Jun-2003  thorpej branches: 1.3.2; 1.3.10; 1.3.16;
Rename IPL_IMP -> IPL_VM.
 1.2 04-May-2003  fvdl Block level-triggered interrupts at the ioapic if they are deferred.
Avoids interrupt storms seen on some systems. Many thanks to
Stoned Elipot for testing.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.3.16.1 21-Apr-2005  tron Pull up revision 1.4 (requested by yamt in ticket #174):
make multi inclusion protection macros consistent.
 1.3.10.1 29-Apr-2005  kent sync with -current
 1.3.2.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.4.2.4 21-Jan-2008  yamt sync with head
 1.4.2.3 07-Dec-2007  yamt sync with head
 1.4.2.2 03-Sep-2007  yamt sync with head.
 1.4.2.1 21-Jun-2006  yamt sync with head.
 1.5.36.1 29-Mar-2007  reinoud Pullup to -current
 1.5.34.1 11-Jul-2007  mjf Sync with head.
 1.5.32.6 23-Oct-2007  ad - Remove most of the hardware interrupt priority levels as proposed on
tech-kern, but leave the names of the remaining levels as none, vm, sched,
high: http://mail-index.netbsd.org/tech-kern/2007/05/05/0005.html

- Add aliases for the old levels to sys/intr.h.
 1.5.32.5 19-Oct-2007  ad Adjust previous.
 1.5.32.4 19-Oct-2007  ad Tidy up IPL defs, and remove a bogus comment block.
 1.5.32.3 29-Jul-2007  ad - When zeroing/copying pages, use SSE2 movnti to avoid polluting the cache.
- By default, align assembly routines on 32-byte starting boundaries.
- There are now 8 interrupt priority levels, half of which are softints.
Update intrdefs.h to match.
- Always clear/set spinlock words - removes lots of ifdefs.
- Remove the horrible ci_self150 hack that I introduced.
- Overhaul how TLB shootdown is performed. Inspired by a similar change in
OpenBSD but implemented quite differently. This should be a lot faster
but I have not benchmarked it yet.
 1.5.32.2 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.5.32.1 10-Apr-2007  ad Sync with head.
 1.5.28.1 24-Mar-2007  yamt sync with head.
 1.5.26.1 20-Apr-2007  bouyer Pull up following revision(s) (requested by mlelstv in ticket #575):
sys/arch/i386/i386/est.c sync with 1.37
sys/arch/i386/i386/ipifuncs.c sync with 1.16
sys/arch/x86/include/cpu_msr.h sync with 1.4
sys/arch/x86/include/intrdefs.h sync with 1.8
sys/arch/x86/include/powernow.h sync with 1.9
sys/arch/x86/x86/powernow_k8.c sync with 1.20
sys/arch/x86/x86/msr_ipifuncs.c sync with 1.8
sys/arch/amd64/amd64/ipifuncs.c sync with 1.9
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/i386/machdep.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/x86/x86_machdep.c patch
sys/arch/amd64/amd64/machdep.c patch
Add MSR write IPI handler for x86. Use it and the RUN_ONCE framework
to make est and powernow drivers work properly with SMP.
 1.8.14.2 09-Jan-2008  matt sync with HEAD
 1.8.14.1 06-Nov-2007  matt sync with HEAD
 1.8.12.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.8.12.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.8.8.1 03-Sep-2007  skrll Sync with HEAD.
 1.8.4.1 03-Oct-2007  garbled Sync with HEAD
 1.10.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.10.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.11.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.11.2.1 26-Dec-2007  ad Sync with head.
 1.12.10.4 11-Aug-2010  yamt sync with head.
 1.12.10.3 11-Mar-2010  yamt sync with head
 1.12.10.2 04-May-2009  yamt sync with head.
 1.12.10.1 16-May-2008  yamt sync with head.
 1.12.8.1 18-May-2008  yamt sync with head.
 1.12.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.12.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.10.1 17-Nov-2008  snj Pull up following revision(s) (requested by ad in ticket #73):
sys/arch/amd64/amd64/fpu.c: revision 1.27
sys/arch/amd64/amd64/ipifuncs.c: revision 1.20
sys/arch/i386/i386/ipifuncs.c: revision 1.28
sys/arch/i386/isa/npx.c: revision 1.130
sys/arch/x86/include/intrdefs.h: revision 1.14
PR port-amd64/38293 panic: fp_save ipi didn't
Kill the FP flush IPI and always save. The synchronization here isn't
strong and we could easily pull the chain on an innocent LWP's FP state.
Another fix to follow.
 1.13.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.13.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.14.4.2 24-Oct-2010  jym Sync with HEAD
 1.14.4.1 01-Nov-2009  jym Sync with HEAD.
 1.15.4.1 03-Jul-2010  rmind sync with head
 1.15.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.16.8.1 10-Nov-2011  yamt sync with head
 1.16.6.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.17.10.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.19.2.1 10-Aug-2014  tls Rebase.
 1.20.28.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.20.28.1 10-Jun-2019  christos Sync with HEAD
 1.20.20.1 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support, ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.23.6.1 12-Apr-2020  bouyer Get rid of xen-specific ci_x* interrupt handling:
- use the general SIR mechanism, reserving 3 more slots for IPL_VM, IPL_SCHED
and IPL_HIGH
- remove specific handling from C sources, or change to ipending
- convert IPL number to SIR number in various places
- Remove XUNMASK/XPENDING in assembly or change to IUNMASK/IPENDING
- remove Xen-specific ci_xsources, ci_xmask, ci_xunmask, ci_xpending from
struct cpu_info
- for now remove a KASSERT that there are no pending interrupts in
idle_block(). We can get there with some software interrupts pending
in autoconf XXX needs to be looked at.
 1.24.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.12 25-Dec-2018  mlelstv Make ipmi driver available to other platforms.
Add ACPI attachment.
 1.11 01-Aug-2010  mlelstv branches: 1.11.58; 1.11.60;
sc_cmd_mtx protects a command sequence, no longer abuse it for delays.

Initialize mutexes and condition variables in attach and not in the
asynchronously started kernel thread.

Increase BMC spin timeout from 5ms to 15ms, this is necessary to detect
the BMC in a HP ML110G4 reliably.

Implement non-linear sensors as defined in IPMIv2.0 with some crude
32.32 fixed point arithmetic. This adds some small errors as logarithm
and power functions are only approximated.

Fix sensor index mapping so that sensor limits are computed correctly.
 1.10 20-Jul-2009  dyoung branches: 1.10.2; 1.10.4;
Overhaul synchronization in ipmi(4): synchronize all access to
device registers with a mutex. Convert tsleep/wakeup calls to
cv_wait/cv_signal.

Do not repeatedly malloc/free tiny buffers for sending/receiving
commands, but reserve a command buffer in the softc.

Tickle the watchdog in the sensors-refreshing thread.

I am fairly certain that after the device is attached, every register
access happens in the sensors-refreshing thread. Moreover, no
software interrupt touches any register, now. So I may get rid of
the mutex that protects register accesses, sc_cmd_mtx.
 1.9 03-Nov-2008  cegger branches: 1.9.4;
The functions called from ipmi_match use the DEVNAME macro. But the softc is allocated on the stack and the accessed sc_dev member is not initialized.

Initialize the sc_dev.dv_xname in ipmi_match, which is enough to make DEVNAME work. Finally this also allows the device_t/softc split.
 1.8 23-Sep-2008  ad branches: 1.8.2; 1.8.4;
Speed up ipmi attach a bit, although boot times on my workstation still suck:

before 18s
after 14s
without ipmi 8s
 1.7 16-Apr-2008  cegger branches: 1.7.4; 1.7.6; 1.7.10;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.6 16-Nov-2007  xtraeme branches: 1.6.14;
Extend the envsys2 API (one more time, sorry) as defined in:

http://mail-index.netbsd.org/tech-kern/2007/11/09/0001.html

sysmon_envsys_create() and sysmon_envsys_destroy() were added to
create/destroy sysmon_envsys objects (and its TAILQ/LIST for sensors/events).

sysmon_envsys_sensor_attach() and sysmon_envsys_sensor_detach() were
added to attach/detach sensors to a specified sysmon_envsys device.

The events framework is now per device and configurable via the
ENVSYS_SETDICTIONARY ioctl or /etc/envsys.conf and envstat(8).

Update all users and documentation to reflect these changes.
 1.5 17-Oct-2007  garbled branches: 1.5.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to an evbppc target.
 1.4 09-Jul-2007  ad branches: 1.4.8; 1.4.10; 1.4.14;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.3 01-Jul-2007  xtraeme Imported envsys 2, a brief description of the new features:
(Part 2: drivers)

* Support for detachable sensors.
* Cleaned up the API for simplicity and efficiency.
* Ability to send capacity/critical/warning events to powerd(8).
* Adapted all the code to the new locking order.
* Compatibility with the old envsys API: the ENVSYS_GTREINFO
and ENVSYS_GTREDATA ioctl(2)s are supported.
* Added support for a 'dictionary based communication channel' between
sysmon_power(9) and powerd(8), that means there is no 32 bytes event
size restriction anymore.
* Binary compatibility with old envstat(8) and powerd(8) via COMPAT_40.
* All drivers with the n^2 gtredata bug were fixed, PR kern/36226.

Tested by:

blymn: smsc(4).
bouyer: ipmi(4), mfi(4).
kefren: ug(4).
njoly: viaenv(4), adt7463.c.
riz: owtemp(4).
xtraeme: acpiacad(4), acpibat(4), acpitz(4), aiboost(4), it(4), lm(4).
 1.2 15-Feb-2007  ad branches: 1.2.6; 1.2.8; 1.2.14;
Replace some uses of lockmgr() / simplelocks.
 1.1 01-Oct-2006  bouyer branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10;
Add ipmi(4) driver, from OpenBSD. This requires SMBios support, so add
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible through envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
 1.1.10.2 08-Jan-2007  ghen Pull up following revision(s) (requested by bouyer in ticket #1621):
sys/arch/i386/conf/GENERIC: revision 1.787 via patch
share/man/man4/Makefile: revision 1.407 via patch
distrib/sets/lists/man/mi: revision 1.936 via patch
share/man/man4/ipmi.4: revision 1.1 via patch
sys/arch/i386/i386/bios32.c: revision 1.11 via patch
sys/dev/DEVNAMES: revision 1.221 via patch
sys/arch/x86/x86/ipmi.c: revision 1.1 via patch
sys/arch/i386/i386/mainbus.c: revision 1.65 via patch
sys/arch/x86/include/smbiosvar.h: revision 1.1 via patch
sys/arch/x86/include/ipmivar.h: revision 1.1 via patch
sys/arch/x86/conf/files.x86: revision 1.20 via patch
sys/arch/i386/conf/files.i386: revision 1.293 via patch
Add ipmi(4) driver, from OpenBSD. This requires SMBios support, so add
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible through envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
Add manpage for new ipmi driver.
Claim ipmi.
 1.1.10.1 01-Oct-2006  ghen file ipmivar.h was added on branch netbsd-3 on 2007-01-08 16:36:20 +0000
 1.1.8.5 07-Dec-2007  yamt sync with head
 1.1.8.4 03-Sep-2007  yamt sync with head.
 1.1.8.3 26-Feb-2007  yamt sync with head.
 1.1.8.2 30-Dec-2006  yamt sync with head.
 1.1.8.1 01-Oct-2006  yamt file ipmivar.h was added on branch yamt-lazymbuf on 2006-12-30 20:47:22 +0000
 1.1.4.2 18-Nov-2006  ad Sync with head.
 1.1.4.1 01-Oct-2006  ad file ipmivar.h was added on branch newlock2 on 2006-11-18 21:29:38 +0000
 1.1.2.2 22-Oct-2006  yamt sync with head
 1.1.2.1 01-Oct-2006  yamt file ipmivar.h was added on branch yamt-splraiseipl on 2006-10-22 06:05:16 +0000
 1.2.14.1 03-Oct-2007  garbled Sync with HEAD
 1.2.8.1 11-Jul-2007  mjf Sync with head.
 1.2.6.4 03-Dec-2007  ad Sync with HEAD.
 1.2.6.3 15-Jul-2007  ad Sync with head.
 1.2.6.2 10-Apr-2007  ad Nuke the deferred kthread creation stuff, as it's no longer needed.
Pointed out by thorpej@.
 1.2.6.1 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.4.14.1 18-Nov-2007  bouyer Sync with HEAD
 1.4.10.2 09-Jan-2008  matt sync with HEAD
 1.4.10.1 06-Nov-2007  matt sync with HEAD
 1.4.8.1 21-Nov-2007  joerg Sync with HEAD.
 1.5.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.6.14.3 17-Jan-2009  mjf Sync with HEAD.
 1.6.14.2 28-Sep-2008  mjf Sync with HEAD.
 1.6.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.7.10.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.7.10.1 19-Oct-2008  haad Sync with HEAD.
 1.7.6.1 10-Oct-2008  skrll Sync with HEAD.
 1.7.4.3 11-Aug-2010  yamt sync with head.
 1.7.4.2 19-Aug-2009  yamt sync with head.
 1.7.4.1 04-May-2009  yamt sync with head.
 1.8.4.1 06-Nov-2008  snj Pull up following revision(s) (requested by cegger in ticket #10):
sys/arch/x86/x86/ipmi.c: revision 1.23
sys/arch/x86/include/ipmivar.h: revision 1.9
The functions called from ipmi_match use the DEVNAME macro. But the
softc is allocated on the stack and the accessed sc_dev member is not
initialized.
Initialize the sc_dev.dv_xname in ipmi_match, which is enough to make
DEVNAME work. Finally this also allows the device_t/softc split.
 1.8.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.9.4.3 24-Oct-2010  jym Sync with HEAD
 1.9.4.2 01-Nov-2009  jym Sync with HEAD.
 1.9.4.1 23-Jul-2009  jym Sync with HEAD.
 1.10.4.1 05-Mar-2011  rmind sync with head
 1.10.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.11.60.1 10-Jun-2019  christos Sync with HEAD
 1.11.58.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.13 12-Dec-2021  andvar s/Miscellanous/Miscellaneous/ in copypasta comments.
 1.12 15-Oct-2016  jdolecek provide intr xname
 1.11 01-Jul-2011  dyoung branches: 1.11.12; 1.11.30; 1.11.34;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.10 19-Aug-2009  dyoung isa_detach_hook() needs two arguments, the first an isa_chipset_tag_t.
 1.9 18-Aug-2009  dyoung These are stragglers from my last commit ("Let us safely detach
the ISA bus and devices attaching to the ISA bus"). Define
isa_detach_hook() in MD ISA implementations. Define isa_dmadestroy().
 1.8 25-Mar-2009  dyoung It is only by accident that these get the definitions they need from
<sys/device.h>, so explicitly #include <sys/device.h>.
 1.7 08-Feb-2009  bouyer branches: 1.7.2;
Apply patch proposed on port-amd64/port-i386, allowing to use a 64bit
bus_addr_t on i386PAE kernels:
change bus_addr_t to be a paddr_t (so its size follows paddr_t depending
on options PAE)
replace bus_addr_t with vaddr_t where the value is used as a virtual address.

Difference with the proposed patch: cast to uintmax_t and use %jx in
printf() as suggested by Joerg.
 1.6 27-Jun-2008  cegger branches: 1.6.4; 1.6.6; 1.6.12;
struct device * -> device_t
 1.5 28-Apr-2008  martin branches: 1.5.2; 1.5.4;
Remove clause 3 and 4 from TNF licenses
 1.4 16-Apr-2005  yamt branches: 1.4.82; 1.4.84; 1.4.86;
make multi inclusion protection macros consistent.
 1.3 07-Aug-2003  agc branches: 1.3.8; 1.3.14;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.2 09-May-2003  fvdl branches: 1.2.2;
A few ISA sound drivers like to share dma channels, and hence deferred
isa_dmamap_create() calls to their open/close entrypoints. This worked
with some luck, but broke on i386 when _bus_dmamap_create started
to allocate bounce buffers upfront, since memory below 16M may well
not be available when the sound device is opened for the Nth time.

To fix this, create a new simple interface, isa_drq_alloc/isa_drq_free,
wrappers around already existing bitmask macros. These are expected
to be used before an isa_dmamap_create call, and after an
isa_dmamap_destroy call, respectively. For the sb and ad1848 drivers,
they're deferred until open/close.

All isa_dmamap_create calls can now use BUS_DMA_ALLOCNOW and be done
at attach time.
 1.1 27-Feb-2003  fvdl Move a few more files to x86/include. Trim the list of files to install
in /usr/include a bit.
 1.2.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.14.1 21-Apr-2005  tron Pull up revision 1.4 (requested by yamt in ticket #174):
make multi inclusion protection macros consistent.
 1.3.8.1 29-Apr-2005  kent sync with -current
 1.4.86.3 19-Aug-2009  yamt sync with head.
 1.4.86.2 04-May-2009  yamt sync with head.
 1.4.86.1 16-May-2008  yamt sync with head.
 1.4.84.1 18-May-2008  yamt sync with head.
 1.4.82.2 29-Jun-2008  mjf Sync with HEAD.
 1.4.82.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.4.1 27-Jun-2008  simonb Sync with head.
 1.5.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.6.12.1 21-Apr-2010  matt sync to netbsd-5
 1.6.6.1 29-Sep-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1040):
sys/arch/x86/include/bus.h: revision 1.18
sys/arch/x86/include/isa_machdep.h: revision 1.7
sys/arch/x86/x86/bus_space.c: revision 1.21
Apply patch proposed on port-amd64/port-i386, allowing to use a 64bit
bus_addr_t on i386PAE kernels:
change bus_addr_t to be a paddr_t (so its size follows paddr_t depending
on options PAE)
replace bus_addr_t with vaddr_t where the value is used as a virtual address.
Difference with the proposed patch: cast to uintmax_t and use %jx in
printf() as suggested by Joerg.
 1.6.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.6.4.1 03-Mar-2009  skrll Sync with HEAD.
 1.7.2.3 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.7.2.2 01-Nov-2009  jym Sync with HEAD.
 1.7.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.34.1 04-Nov-2016  pgoyette Sync with HEAD
 1.11.30.1 05-Dec-2016  skrll Sync with HEAD
 1.11.12.1 03-Dec-2017  jdolecek update from HEAD
 1.6 06-Aug-2014  joerg Consistently define WARN in a way that passes format string checks, i.e.
always uses the same number of arguments as given in the format string.
 1.5 06-Apr-2014  joerg x86_progress takes a format string.
 1.4 21-Mar-2009  ad branches: 1.4.12; 1.4.22; 1.4.26; 1.4.36;
Fix 'boot -z' bogons.
 1.3 25-Sep-2008  christos branches: 1.3.2; 1.3.8;
define a TEST mode.
 1.2 28-Apr-2008  martin branches: 1.2.2; 1.2.6;
Remove clause 3 and 4 from TNF licenses
 1.1 01-Oct-2007  ad branches: 1.1.2; 1.1.4; 1.1.6; 1.1.10; 1.1.14; 1.1.28; 1.1.30; 1.1.32;
Now that the bootblocks are the same, share loadfile_machdep.h between
amd64 and i386.
 1.1.32.2 04-May-2009  yamt sync with head.
 1.1.32.1 16-May-2008  yamt sync with head.
 1.1.30.1 18-May-2008  yamt sync with head.
 1.1.28.2 28-Sep-2008  mjf Sync with HEAD.
 1.1.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.1.14.2 06-Nov-2007  matt sync with HEAD
 1.1.14.1 01-Oct-2007  matt file loadfile_machdep.h was added on branch matt-armv6 on 2007-11-06 23:23:36 +0000
 1.1.10.2 27-Oct-2007  yamt sync with head.
 1.1.10.1 01-Oct-2007  yamt file loadfile_machdep.h was added on branch yamt-lazymbuf on 2007-10-27 11:28:55 +0000
 1.1.6.2 09-Oct-2007  ad Sync with head.
 1.1.6.1 01-Oct-2007  ad file loadfile_machdep.h was added on branch vmlocking on 2007-10-09 13:38:42 +0000
 1.1.4.2 06-Oct-2007  yamt sync with head.
 1.1.4.1 01-Oct-2007  yamt file loadfile_machdep.h was added on branch yamt-x86pmap on 2007-10-06 15:33:32 +0000
 1.1.2.2 02-Oct-2007  joerg Sync with HEAD.
 1.1.2.1 01-Oct-2007  joerg file loadfile_machdep.h was added on branch jmcneill-pm on 2007-10-02 18:27:49 +0000
 1.2.6.1 19-Oct-2008  haad Sync with HEAD.
 1.2.2.1 10-Oct-2008  skrll Sync with HEAD.
 1.3.8.2 01-Nov-2009  jym Sync with HEAD.
 1.3.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.3.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.4.36.1 10-Aug-2014  tls Rebase.
 1.4.26.1 18-May-2014  rmind sync with head
 1.4.22.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29 12-Feb-2022  riastradh __cpu_simple_lock(9): Omit needless barriers in init.

It is, and always has been, the caller's responsibility to ensure the
lock is initialized before it can be used -- otherwise the memory
could hold garbage; it is nonsensical to even attempt locking
operations on it before initialization.

So there's no need to issue explicit barriers here. The barrier
seems to have been introduced in sys/arch/alpha/alpha/lock_machdep.c
rev. 1.1 (since moved to inline asm in alpha/include/lock.h) and then
copied & pasted into several other architectures.
 1.28 16-Sep-2017  christos more const
 1.27 22-Jan-2013  christos Allow for non inlined definitions for RUMP
 1.26 11-Oct-2012  apb Change "=r" to "=qQ" in a register constraint in an asm statement
for a register that is used with the "xchgb" instruction in the
definition of __cpu_simple_lock_try(). This fixes PR 45673, or at
least works around the gcc bug that might be behind PR 45673.

The output from "objdump -d" before and after this change is
identical, for the amd64 GENERIC kernel, the i386 GENERIC kernel,
and the i386 MONOLITHIC kernel.
 1.25 15-Jan-2009  pooka branches: 1.25.14; 1.25.20; 1.25.24; 1.25.26;
The last _KERNEL -> _HARDKERNEL in locking operations.
 1.24 28-Apr-2008  martin branches: 1.24.8;
Remove clause 3 and 4 from TNF licenses
 1.23 09-Jan-2008  yamt branches: 1.23.6; 1.23.8; 1.23.10;
fix SPINLOCK_BACKOFF_HOOK.
 1.22 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.21 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.20 20-Dec-2007  ad - Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.19 07-Nov-2007  ad branches: 1.19.2; 1.19.6;
__cpu_simple_locks really should be simple, otherwise they can cause
problems for e.g. profiling.
 1.18 17-Oct-2007  garbled branches: 1.18.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to an evbppc target.
 1.17 26-Sep-2007  ad branches: 1.17.2;
Make it build in userspace again.
 1.16 11-Sep-2007  skrll branches: 1.16.2;
Always provide __cpu_simple_lock_{set,clear}.

Fixes LOCKDEBUG kernel builds.
 1.15 10-Sep-2007  skrll Merge nick-csl-alignment.
 1.14 10-Feb-2007  ad branches: 1.14.6; 1.14.12; 1.14.14; 1.14.18; 1.14.22; 1.14.24;
NSPR builds seem to choke on 'inline'. Replace it with __inline.
 1.13 09-Feb-2007  ad Merge newlock2 to head.
 1.12 18-Dec-2006  ad __cpu_simple_unlock(): add a note about memory ordering and why this is
correct, contrary to Intel's documentation.
 1.11 28-Dec-2005  perry branches: 1.11.20; 1.11.22;
inline -> __inline
 1.10 24-Dec-2005  perry Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.9 16-Apr-2005  yamt branches: 1.9.2;
make multi inclusion protection macros consistent.
 1.8 25-Nov-2004  yamt branches: 1.8.4; 1.8.10;
remove __lockbarrier, which i forgot to remove in the previous.
 1.7 31-Oct-2004  yamt use __insn_barrier rather than homegrown equivalents.
 1.6 23-Oct-2004  yamt __cpu_simple_lock: loop without locking cache or asserting LOCK#.
 1.5 27-Oct-2003  junyoung Nuke __P().
 1.4 26-Oct-2003  yamt define SPINLOCK_SPIN_HOOK to let LK_SPIN lockmgr locks call x86_pause.
 1.3 26-Sep-2003  nathanw Move __cpu_simple_lock_t and __SIMPLELOCK_{UN,}LOCKED to machine/types.h
so that they can be used in a namespace-friendly way.
 1.2 08-May-2003  fvdl branches: 1.2.2;
Add x86_pause() inline function, containing the "pause" instruction
for i386, and nothing for amd64. Sprinkle it in various spinloops,
as recommended by Intel.
 1.1 27-Feb-2003  fvdl Move a few more files to x86/include. Trim the list of files to install
in /usr/include a bit.
 1.2.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.2.5 29-Nov-2004  skrll Sync with HEAD.
 1.2.2.4 02-Nov-2004  skrll Sync with HEAD.
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.8.10.1 21-Apr-2005  tron Pull up revision 1.9 (requested by yamt in ticket #174):
make multi inclusion protection macros consistent.
 1.8.4.1 29-Apr-2005  kent sync with -current
 1.9.2.5 21-Jan-2008  yamt sync with head
 1.9.2.4 15-Nov-2007  yamt sync with head.
 1.9.2.3 27-Oct-2007  yamt sync with head.
 1.9.2.2 26-Feb-2007  yamt sync with head.
 1.9.2.1 30-Dec-2006  yamt sync with head.
 1.11.22.1 18-Dec-2006  yamt sync with head.
 1.11.20.4 02-Feb-2007  ad - Define memory barrier ops in lock_stubs.S.
- If lfence/mfence are available, patch them in at boot.
- Patch to a no-op if !MULTIPROCESSOR. XXX Should be determined at runtime.
 1.11.20.3 12-Jan-2007  ad Sync with head.
 1.11.20.2 29-Dec-2006  ad Checkpoint work in progress.
 1.11.20.1 20-Oct-2006  ad Define memory barriers: mb_read(), mb_write(), mb_memory()
 1.14.24.4 23-Mar-2008  matt sync with HEAD
 1.14.24.3 09-Jan-2008  matt sync with HEAD
 1.14.24.2 08-Nov-2007  matt sync with -HEAD
 1.14.24.1 06-Nov-2007  matt sync with HEAD
 1.14.22.2 11-Nov-2007  joerg Sync with HEAD.
 1.14.22.1 02-Oct-2007  joerg Sync with HEAD.
 1.14.18.2 15-Aug-2007  skrll Provide __SIMPLELOCK_{UN,}LOCKED_P and __cpu_simple_lock_{set,clear}
for all architectures.
 1.14.18.1 18-Jul-2007  skrll Initial work on providing a correctly aligned __cpu_simple_lock_t for hppa
and first attempt at adapting i386 to the changes.

More to come.
 1.14.14.1 03-Oct-2007  garbled Sync with HEAD
 1.14.12.1 18-Apr-2007  thorpej Convert i386 and amd64 to the new atomic ops API.
 1.14.6.2 03-Dec-2007  ad Sync with HEAD.
 1.14.6.1 09-Oct-2007  ad Sync with head.
 1.16.2.1 06-Oct-2007  yamt sync with head.
 1.17.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.18.2.3 18-Feb-2008  mjf Sync with HEAD.
 1.18.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.18.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.19.6.3 10-Jan-2008  bouyer Sync with HEAD
 1.19.6.2 08-Jan-2008  bouyer Sync with HEAD
 1.19.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.19.2.1 26-Dec-2007  ad Sync with head.
 1.23.10.2 04-May-2009  yamt sync with head.
 1.23.10.1 16-May-2008  yamt sync with head.
 1.23.8.1 18-May-2008  yamt sync with head.
 1.23.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.23.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.24.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.25.26.1 19-Oct-2012  riz Pull up following revision(s) (requested by apb in ticket #606):
sys/arch/x86/include/lock.h: revision 1.26
Change "=r" to "=qQ" in a register constraint in an asm statement
for a register that is used with the "xchgb" instruction in the
definition of __cpu_simple_lock_try(). This fixes PR 45673, or at
least works around the gcc bug that might be behind PR 45673.
The output from "objdump -d" before and after this change is
identical, for the amd64 GENERIC kernel, the i386 GENERIC kernel,
and the i386 MONOLITHIC kernel.
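A minimal sketch of the constraint change described above, assuming GCC- or Clang-style inline assembly. The names `simple_lock_t`, `SIMPLE_LOCKED`, `SIMPLE_UNLOCKED` and `simple_lock_try` are hypothetical stand-ins, not the contents of lock.h. The byte operand of `xchgb` needs a register with a byte-sized name, which the `q`/`Q` constraints request; on non-x86 targets the sketch falls back to a compiler builtin so that it still builds.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's one-byte lock type and values. */
typedef volatile unsigned char simple_lock_t;
#define SIMPLE_UNLOCKED 0
#define SIMPLE_LOCKED   1

/* Try to take the lock once; returns 1 on success, 0 if already held. */
static int
simple_lock_try(simple_lock_t *lockp)
{
    unsigned char val = SIMPLE_LOCKED;

    assert(lockp != 0);
#if defined(__i386__) || defined(__x86_64__)
    /*
     * "=qQ" asks for a register that has a byte-sized name, as xchgb
     * requires. On i386 the general "r" class also contains registers
     * such as %esi and %edi, which have no byte form.
     */
    __asm__ __volatile__("xchgb %0, %1"
        : "=qQ" (val), "+m" (*lockp)
        : "0" (val)
        : "memory");
#else
    /* Portable fallback for non-x86 builds of this sketch. */
    val = __atomic_exchange_n(lockp, SIMPLE_LOCKED, __ATOMIC_ACQUIRE);
#endif
    /* The old value tells us whether the lock was free. */
    return val == SIMPLE_UNLOCKED;
}
```

The first call on an unlocked byte succeeds and leaves the byte set; a second call on the same byte fails until something stores `SIMPLE_UNLOCKED` back.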
 1.25.24.3 03-Dec-2017  jdolecek update from HEAD
 1.25.24.2 25-Feb-2013  tls resync with head
 1.25.24.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.25.20.1 17-Oct-2012  riz Pull up following revision(s) (requested by apb in ticket #606):
sys/arch/x86/include/lock.h: revision 1.26
Change "=r" to "=qQ" in a register constraint in an asm statement
for a register that is used with the "xchgb" instruction in the
definition of __cpu_simple_lock_try(). This fixes PR 45673, or at
least works around the gcc bug that might be behind PR 45673.
The output from "objdump -d" before and after this change is
identical, for the amd64 GENERIC kernel, the i386 GENERIC kernel,
and the i386 MONOLITHIC kernel.
 1.25.14.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.25.14.1 30-Oct-2012  yamt sync with head
 1.1 30-Nov-2024  christos branches: 1.1.4;
Create a new header lwp_private.h to contain _lwp_getprivate_fast,
_lwp_gettcb_fast, _lwp_settcb and remove them from mcontext.h, so that:
1. we don't need special hacks to hide them
2. we can include <lwp.h> where needed to get the necessary prototypes
without redefining them locally.
 1.1.4.2 02-Aug-2025  perseant Sync with HEAD
 1.1.4.1 30-Nov-2024  perseant file lwp_private.h was added on branch perseant-exfatfs on 2025-08-02 05:56:17 +0000
 1.12 28-Oct-2021  riastradh x86: Process bootloader rndseed much sooner.
 1.11 15-Nov-2020  bouyer remove unused x86_cpu_initclock_func()
 1.10 25-Apr-2020  bouyer branches: 1.10.2;
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.9 26-Dec-2016  cherry branches: 1.9.26;
the i386 and amd64 boot time msgbuf init code is nearly identical.

Unify them into x86/x86_machdep.c:init_x86_msgbuf()

Boot tested on GENERIC (i386, amd64), XEN3_DOM0 (amd64)
 1.8 16-Jul-2016  maxv Simplify the way physical pages are internalized into the VM system on x86.
Only two functions are called now: init_x86_clusters, which initializes the
memory clusters from the bootinfo, and init_x86_vm, which inserts the pages
from the clusters into VM.
 1.7 12-Jun-2014  riastradh branches: 1.7.4; 1.7.8;
Tweak x86 page freelists and add x86_select_freelist.

- Add 4G freelist to i386 -- there may be higher addresses if PAE.
- Add 64G and 1T freelists to amd64.
- Simplify freelist setup code and condense it into a table.
- Add x86_select_freelist to get a freelist guaranteed to yield
addresses no greater than a prescribed maximum address.

x86_select_freelist takes a uint64_t, not a paddr_t or bus_addr_t, so
that you can pass in, e.g., a 36-bit maximum address without needing
to write conditionals for i386/PAE.

No objections on port-x86:

https://mail-index.netbsd.org/port-i386/2014/05/21/msg003277.html
https://mail-index.netbsd.org/port-amd64/2014/05/21/msg002062.html
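The table-driven selection described in revision 1.7 could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from x86_machdep.c: the `FL_*` identifiers, `fl_table`, `select_freelist` and the `top` parameter (highest physical address present) are all hypothetical, and the i386 and amd64 limits are merged into one table for brevity. What it keeps from the log is the `uint64_t` maximum address and the guarantee that the returned list yields no address above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical freelist identifiers; the real names are not in the log. */
enum freelist {
    FL_DEFAULT,     /* no address restriction */
    FL_FIRST1T,     /* pages below 1 TiB */
    FL_FIRST64G,    /* pages below 64 GiB */
    FL_FIRST4G,     /* pages below 4 GiB */
    FL_NONE         /* no freelist satisfies the request */
};

/* Freelist limits condensed into a table, largest first. */
static const struct {
    uint64_t limit;         /* first address NOT covered by the list */
    enum freelist list;
} fl_table[] = {
    { UINT64_C(1) << 40, FL_FIRST1T },
    { UINT64_C(1) << 36, FL_FIRST64G },
    { UINT64_C(1) << 32, FL_FIRST4G },
};

/*
 * Return a freelist whose pages all lie at or below maxaddr.
 * top is the highest physical address present in the machine.
 */
static enum freelist
select_freelist(uint64_t maxaddr, uint64_t top)
{
    size_t i;

    assert(top != 0);
    if (maxaddr >= top)
        return FL_DEFAULT;      /* every page already qualifies */
    for (i = 0; i < sizeof(fl_table) / sizeof(fl_table[0]); i++) {
        if (fl_table[i].limit - 1 <= maxaddr)
            return fl_table[i].list;
    }
    return FL_NONE;
}
```

Because the argument is a plain 64-bit integer, a caller with a 36-bit limit passes `(UINT64_C(1) << 36) - 1` on any configuration and gets the 64 GiB list back, without needing conditionals on the width of `paddr_t`.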
 1.6 12-Apr-2013  christos branches: 1.6.8;
de-duplication police arrests sysctl.
 1.5 21-Oct-2010  yamt branches: 1.5.8; 1.5.18;
don't forget to call nmi_init.
 1.4 23-Aug-2010  jruoho Other entry points beyond x86_cpu_idle_halt() may use HLT as the
idle-mechanism. Send an IPI also for these in cpu_need_resched().
 1.3 18-Jul-2010  jruoho Merge a driver for ACPI CPUs with basic support for processor power states,
also known as C-states. The code is modular and provides an easy way to add
the remaining functionality later (namely throttling and P-states).

Remarks:

1. Commented out in the GENERICs; more testing exposure is needed.

2. The C3-state is disabled for the time being because it turns off
timers, among them the local APIC timer. This may not be universally
true on all x86 processors; define ACPICPU_ENABLE_C3 to test.

3. The algorithm used to choose a power state may need tuning. When
evaluating the appropriate state, the implementation uses the
previous sleep time as an indicator. Additional hints would include
for example the system load.

Also bus master activity is evaluated when choosing a state. The
usb(4) stack is notorious for such activity even when unused.
Typically it must be disabled in order to reach the C3-state,
but it may also prevent the use of C2.

4. While no extensive empirical measurements have been carried out, the
power savings are somewhere between 1-2 W with C1 and C2, depending
on the processor, firmware, and load. With C3 even up to 4 W can be
saved. The less something ticks, the more power is saved.

ok jmcneill@, joerg@, and discussed with various people.
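Remark 3 above describes the selection heuristic only in words. The sketch below is one possible reading of it, not the driver's algorithm: the latency table, its values, and the name `choose_cstate` are assumptions made up for illustration. It picks the deepest state whose assumed wake-up latency is shorter than the previous sleep time and refuses C3 while bus master activity is seen. It does not model the case where such activity also rules out C2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cstate { CSTATE_C1 = 1, CSTATE_C2 = 2, CSTATE_C3 = 3 };

/* Hypothetical wake-up latencies in microseconds; index 0 is unused. */
static const uint32_t cstate_latency_us[] = { 0, 1, 50, 500 };

/* Choose a state from the previous sleep time and bus master activity. */
static enum cstate
choose_cstate(uint32_t prev_sleep_us, bool bus_master_active)
{
    enum cstate best = CSTATE_C1;
    int s;

    assert(sizeof(cstate_latency_us) / sizeof(cstate_latency_us[0]) ==
        CSTATE_C3 + 1);
    for (s = CSTATE_C2; s <= CSTATE_C3; s++) {
        /* Assumption: bus master activity only rules out C3. */
        if (s == CSTATE_C3 && bus_master_active)
            break;
        /* Only go deeper if the last sleep outlasted the latency. */
        if (cstate_latency_us[s] < prev_sleep_us)
            best = (enum cstate)s;
    }
    return best;
}
```

Additional hints such as system load, which the remark mentions, would enter as further parameters to the same function.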
 1.2 15-Dec-2008  cegger branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8; 1.2.10; 1.2.12;
cleanup BIOS memmap code:
- get rid of some nested externs
- reduce dependency on global variables
- some preparations for upcoming pmem(9)
 1.1 14-Nov-2008  cegger branches: 1.1.4;
merge BIOS memmap code from i386/i386/machdep.c:init386() and amd64/amd64/machdep.c:init_x86_64 into x86/x86/x86_machdep.c
 1.1.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.1.4.1 14-Nov-2008  haad file machdep.h was added on branch haad-dm on 2008-12-13 01:13:38 +0000
 1.2.12.1 05-Mar-2011  rmind sync with head
 1.2.10.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.10.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.2.8.4 09-Oct-2010  yamt sync with head
 1.2.8.3 11-Aug-2010  yamt sync with head.
 1.2.8.2 04-May-2009  yamt sync with head.
 1.2.8.1 15-Dec-2008  yamt file machdep.h was added on branch yamt-nfs-mp on 2009-05-04 08:12:09 +0000
 1.2.6.1 24-Oct-2010  jym Sync with HEAD
 1.2.4.2 19-Jan-2009  skrll Sync with HEAD.
 1.2.4.1 15-Dec-2008  skrll file machdep.h was added on branch nick-hppapmap on 2009-01-19 13:17:09 +0000
 1.2.2.2 17-Jan-2009  mjf Sync with HEAD.
 1.2.2.1 15-Dec-2008  mjf file machdep.h was added on branch mjf-devfs2 on 2009-01-17 13:28:38 +0000
 1.5.18.3 03-Dec-2017  jdolecek update from HEAD
 1.5.18.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.18.1 23-Jun-2013  tls resync from head
 1.5.8.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.8.1 10-Aug-2014  tls Rebase.
 1.7.8.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.7.8.1 26-Jul-2016  pgoyette Sync with HEAD
 1.7.4.2 05-Feb-2017  skrll Sync with HEAD
 1.7.4.1 05-Oct-2016  skrll Sync with HEAD
 1.9.26.1 18-Apr-2020  bouyer Centralize initialisations of delay_func and initclock_func
in x86_machdep.c and export from <x86/machdep.h>
Introduce a x86_dummy_initclock() and a x86_cpu_initclock_func pointer,
to be used later for Xen HVM native clock support.
rename rtclock_tval to x86_rtclock_tval and export from <x86/machdep.h>,
for the benefit of lapic.c
 1.10.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.2 28-Oct-2003  kleink branches: 1.2.4;
#define __HAVE_LONG_DOUBLE on platforms which implement a distinct
`long double' type.
 1.1 22-Oct-2003  kleink Use a common <machine/math.h> for amd64 and i386.
 1.2.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.4.2 03-Aug-2004  skrll Sync with HEAD
 1.2.4.1 28-Oct-2003  skrll file math.h was added on branch ktrace-lwp on 2004-08-03 10:43:04 +0000
 1.12 06-Oct-2025  riastradh x86: Wire up PCI resource manager if enabled.

Enable in your kernel config with `options PCI_RESOURCE'.

Adapted from a patch by mlelstv@.

PR port-amd64/59118: Thinkpad T495s - iwm PCI BAR is zero
 1.11 23-May-2017  nonaka branches: 1.11.48;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.10 31-Mar-2013  chs branches: 1.10.12;
remove unused variable declaration.
 1.9 17-Apr-2009  dyoung branches: 1.9.12; 1.9.22;
Introduce sys/arch/x86/x86/mp.c for common x86 MP configuration code.
mpacpi_scan_pci() and mpbios_scan_pci() are identical code, so replace
them with mp_pci_scan().

Introduce mp_pci_childdetached(), which helps us to detach root PCI
buses that were enumerated either by MP BIOS or by ACPI.

Let us detach and re-attach PCI buses from mainbus0 on i386. This is
necessarily a work-in-progress, because testing detach and re-attach
is very difficult: to detach and re-attach the entire PCI tree on most
x86 computers that I own is not possible because some essential device
attaches under the PCI subtree: the console, com0, NIC, or storage
controller always attaches in the PCI tree.
 1.8 09-Nov-2008  cegger branches: 1.8.4;
struct device * -> device_t
 1.7 09-Nov-2008  cegger Nuke last parameter from mpacpi_scan_apics() and mpbios_scan().
It is unused.
 1.6 17-Oct-2007  garbled branches: 1.6.16; 1.6.20; 1.6.26; 1.6.28;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.5 06-Oct-2007  joerg Merge from mpacpi.h 1.4.32.1, acpi_machdep.c 1.13.22.5 and
mpacpi.c 1.48.12.2 from jmcneill-pm:

Don't process the MADT and modify the interrupt config at one moment and
later try to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.

Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
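The per-entry check described in revision 1.5 can be sketched as follows. The structures and the name `sci_setup` are hypothetical simplifications, not the definitions from mpacpi.h. The sketch assumes, as the log for mp.h revision 1.4 also states, that the PCI-style setting for the SCI is active low and level-triggered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one ISA interrupt source override. */
struct isa_override {
    uint8_t  source;        /* ISA IRQ being overridden */
    uint32_t gsi;           /* global system interrupt it maps to */
    bool     active_low;
    bool     level;
};

/* Resulting interrupt setup for the ACPI SCI. */
struct sci_config {
    uint32_t gsi;
    bool     active_low;
    bool     level;
};

/* Work out how to set up the SCI from the override entries. */
static struct sci_config
sci_setup(uint8_t sci_irq, const struct isa_override *ovr, size_t n)
{
    /* No override: keep the IRQ, but use PCI-style signalling. */
    struct sci_config cfg = { sci_irq, true, true };
    size_t i;

    assert(ovr != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (ovr[i].source == sci_irq) {
            /* Remember the matching override and use it as is. */
            cfg.gsi = ovr[i].gsi;
            cfg.active_low = ovr[i].active_low;
            cfg.level = ovr[i].level;
            break;
        }
    }
    return cfg;
}
```

The decision is made once, while walking the entries, so nothing has to be reverse-engineered later from an interrupt configuration that has already been modified.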
 1.4 04-Jul-2006  christos branches: 1.4.8; 1.4.14; 1.4.22; 1.4.24; 1.4.32; 1.4.34; 1.4.36;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.3 16-Apr-2005  yamt branches: 1.3.2; 1.3.12; 1.3.16; 1.3.24;
make multi inclusion protection macros consistent.
 1.2 29-May-2003  fvdl branches: 1.2.2; 1.2.10; 1.2.16;
Add the options MPBIOS_SCANPCI and MPACPI_SCANPCI to configure PCI roots
with the MPBIOS/ACPI bus information, by walking through the buses, and
descending down every bus that hasn't been marked configured yet.
 1.1 11-May-2003  fvdl Make this include file shareable (moved here from sys/arch/i386/include)
 1.2.16.1 21-Apr-2005  tron Pull up revision 1.3 (requested by yamt in ticket #174):
make multi inclusion protection macros consistent.
 1.2.10.1 29-Apr-2005  kent sync with -current
 1.2.2.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.24.1 13-Jul-2006  gdamore Merge from HEAD.
 1.3.16.1 11-Aug-2006  yamt sync with head
 1.3.12.1 09-Sep-2006  rpaulo sync with head
 1.3.2.2 27-Oct-2007  yamt sync with head.
 1.3.2.1 30-Dec-2006  yamt sync with head.
 1.4.36.1 14-Oct-2007  yamt sync with head.
 1.4.34.1 06-Nov-2007  matt sync with HEAD
 1.4.32.2 07-Oct-2007  joerg Sync with HEAD.
 1.4.32.1 02-Oct-2007  joerg Don't process the MADT and modify the interrupt config at one moment and
later try to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.

Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
 1.4.24.1 29-Oct-2007  wrstuden Catch up with 4.0 RC3
 1.4.22.1 16-Oct-2007  garbled Sync with HEAD
 1.4.14.1 09-Oct-2007  ad Sync with head.
 1.4.8.1 14-Oct-2007  xtraeme Pull up following revision(s) (requested by joerg in ticket #925):
sys/arch/x86/x86/mpacpi.c: revision 1.50
sys/arch/x86/include/mpacpi.h: revision 1.5
sys/arch/x86/x86/acpi_machdep.c: revision 1.16

Merge from mpacpi.h 1.4.32.1, acpi_machdep.c 1.13.22.5 and
mpacpi.c 1.48.12.2 from jmcneill-pm:

Don't process the MADT and modify the interrupt config at one moment and
later try to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.
Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
 1.6.28.2 28-Apr-2009  skrll Sync with HEAD.
 1.6.28.1 19-Jan-2009  skrll Sync with HEAD.
 1.6.26.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.6.20.1 04-May-2009  yamt sync with head.
 1.6.16.1 17-Jan-2009  mjf Sync with HEAD.
 1.8.4.2 01-Nov-2009  jym Sync with HEAD.
 1.8.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.9.22.2 03-Dec-2017  jdolecek update from HEAD
 1.9.22.1 23-Jun-2013  tls resync from head
 1.9.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.10.12.1 28-Aug-2017  skrll Sync with HEAD
 1.11.48.1 20-Oct-2025  martin Pull up following revision(s) (requested by riastradh in ticket #66):

sys/arch/x86/include/mpacpi.h: revision 1.12
sys/arch/x86/x86/mpacpi.c: revision 1.112
sys/arch/amd64/conf/ALL: revision 1.194
sys/arch/i386/conf/ALL: revision 1.524
sys/arch/x86/acpi/acpi_machdep.c: revision 1.40
sys/arch/i386/conf/GENERIC: revision 1.1261
sys/dev/acpi/acpi_mcfg.h: revision 1.6
sys/arch/amd64/conf/GENERIC: revision 1.618

x86: Wire up PCI resource manager if enabled.

Enable in your kernel config with `options PCI_RESOURCE'.

Adapted from a patch by mlelstv@.
PR port-amd64/59118: Thinkpad T495s - iwm PCI BAR is zero
 1.6 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
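The first bullet of revision 1.6 mentions a warning for each CPU whose features differ from the Boot Processor. A rough sketch of such a comparison over feature arrays follows. `NFEAT_WORDS`, `cpu_feature_diff` and the message text are hypothetical, and `fprintf` stands in for whatever the kernel uses to print.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NFEAT_WORDS 4   /* hypothetical number of feature words per CPU */

/*
 * Compare one CPU's feature words with the boot processor's.
 * Returns the number of words that differ and warns if any do.
 */
static int
cpu_feature_diff(const uint32_t *bp_feat, const uint32_t *cpu_feat,
    const char *cpu_name)
{
    size_t i;
    int differ = 0;

    assert(bp_feat != NULL && cpu_feat != NULL && cpu_name != NULL);
    for (i = 0; i < NFEAT_WORDS; i++) {
        if (bp_feat[i] != cpu_feat[i])
            differ++;
    }
    if (differ != 0)
        fprintf(stderr,
            "%s: %d feature word(s) differ from the boot processor\n",
            cpu_name, differ);
    return differ;
}
```

Keeping the features in arrays rather than separate scalar variables is what makes a single loop like this possible.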
 1.5 28-Apr-2008  martin branches: 1.5.14; 1.5.20; 1.5.22;
Remove clause 3 and 4 from TNF licenses
 1.4 16-Apr-2008  cegger branches: 1.4.2; 1.4.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.3 04-Mar-2003  fvdl branches: 1.3.104;
Make the apic address unsigned, as it should be.
 1.2 04-Mar-2003  fvdl Fix some fields that did not have explicit types yet.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.3.104.1 02-Jun-2008  mjf Sync with HEAD.
 1.4.4.2 11-Aug-2010  yamt sync with head.
 1.4.4.1 16-May-2008  yamt sync with head.
 1.4.2.1 18-May-2008  yamt sync with head.
 1.5.22.1 30-May-2010  rmind sync with head
 1.5.20.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.5.14.1 24-Oct-2010  jym Sync with HEAD
 1.8 17-Apr-2009  dyoung Introduce sys/arch/x86/x86/mp.c for common x86 MP configuration code.
mpacpi_scan_pci() and mpbios_scan_pci() are identical code, so replace
them with mp_pci_scan().

Introduce mp_pci_childdetached(), which helps us to detach root PCI
buses that were enumerated either by MP BIOS or by ACPI.

Let us detach and re-attach PCI buses from mainbus0 on i386. This is
necessarily a work-in-progress, because testing detach and re-attach
is very difficult: to detach and re-attach the entire PCI tree on most
x86 computers that I own is not possible because some essential device
attaches under the PCI subtree: the console, com0, NIC, or storage
controller always attaches in the PCI tree.
 1.7 09-Nov-2008  cegger branches: 1.7.4;
struct device * -> device_t
 1.6 09-Nov-2008  cegger Nuke last parameter from mpacpi_scan_apics() and mpbios_scan().
It is unused.
 1.5 28-Apr-2008  martin branches: 1.5.6; 1.5.8;
Remove clause 3 and 4 from TNF licenses
 1.4 04-Jul-2006  christos branches: 1.4.58; 1.4.60; 1.4.62;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.3 29-May-2003  fvdl branches: 1.3.18; 1.3.32; 1.3.36; 1.3.44;
Add the options MPBIOS_SCANPCI and MPACPI_SCANPCI to configure PCI roots
with the MPBIOS/ACPI bus information, by walking through the buses, and
descending down every bus that hasn't been marked configured yet.
 1.2 02-Apr-2003  thorpej Use PAGE_SIZE rather than NBPG.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.3.44.1 13-Jul-2006  gdamore Merge from HEAD.
 1.3.36.1 11-Aug-2006  yamt sync with head
 1.3.32.1 09-Sep-2006  rpaulo sync with head
 1.3.18.1 30-Dec-2006  yamt sync with head.
 1.4.62.2 04-May-2009  yamt sync with head.
 1.4.62.1 16-May-2008  yamt sync with head.
 1.4.60.1 18-May-2008  yamt sync with head.
 1.4.58.2 17-Jan-2009  mjf Sync with HEAD.
 1.4.58.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.5.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.5.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.7.4.2 01-Nov-2009  jym Sync with HEAD.
 1.7.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.15 27-Apr-2015  knakahara add intr_handle_t and let pci_intr_handle_t use it.
 1.14 15-Jun-2012  yamt branches: 1.14.2; 1.14.16;
comments
 1.13 01-Jul-2011  dyoung branches: 1.13.2;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.12 09-Jan-2010  cegger add x2apic support.
patch presented on current-users@, port-i386@ and port-amd64@ on 2009-12-22

No comments.
 1.11 17-Apr-2009  dyoung Introduce sys/arch/x86/x86/mp.c for common x86 MP configuration code.
mpacpi_scan_pci() and mpbios_scan_pci() are identical code, so replace
them with mp_pci_scan().

Introduce mp_pci_childdetached(), which helps us to detach root PCI
buses that were enumerated either by MP BIOS or by ACPI.

Let us detach and re-attach PCI buses from mainbus0 on i386. This is
necessarily a work-in-progress, because testing detach and re-attach
is very difficult: to detach and re-attach the entire PCI tree on most
x86 computers that I own is not possible because some essential device
attaches under the PCI subtree: the console, com0, NIC, or storage
controller always attaches in the PCI tree.
 1.10 16-Apr-2008  cegger branches: 1.10.4; 1.10.12; 1.10.18;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.9 04-Jul-2006  christos branches: 1.9.58;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.8 29-May-2005  christos branches: 1.8.2; 1.8.12; 1.8.16; 1.8.24;
Sprinkle const.
 1.7 16-Apr-2005  yamt make multi inclusion protection macros consistent.
 1.6 27-Oct-2003  junyoung branches: 1.6.8; 1.6.14;
Nuke __P().
 1.5 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
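The fallback in revision 1.5 (try the parent bus with the swizzled pin and the bridge's device number) can be sketched as below. The `struct bus` layout and the function names are hypothetical. The swizzle formula is the conventional PCI-to-PCI bridge rotation, with pins numbered 1 to 4 for INTA to INTD, and is an assumption about what the log means by "swizzled".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bus node: parent link plus the bridge's device number. */
struct bus {
    const struct bus *parent;   /* NULL for a root bus */
    int bridge_dev;             /* device number of the bridge on the parent */
    int has_mappings;           /* nonzero if the MP table maps this bus */
};

/* Where to look up the interrupt once a mapped bus has been found. */
struct intr_source {
    const struct bus *bus;
    int dev;
    int pin;
};

/* Conventional bridge swizzle: rotate the pin by the device number. */
static int
swizzle_pin(int pin, int dev)
{
    return ((pin - 1 + dev) % 4) + 1;
}

/* Walk towards the root until a bus with mappings is reached. */
static struct intr_source
resolve_source(const struct bus *bus, int dev, int pin)
{
    struct intr_source src;

    assert(bus != NULL && pin >= 1 && pin <= 4);
    while (!bus->has_mappings && bus->parent != NULL) {
        pin = swizzle_pin(pin, dev);    /* pin as seen on the parent */
        dev = bus->bridge_dev;          /* the bridge is the device there */
        bus = bus->parent;
    }
    src.bus = bus;
    src.dev = dev;
    src.pin = pin;
    return src;
}
```

For a device 5 using INTA behind a bridge that sits at device 30 on a mapped root bus, the lookup would be made on the root bus for device 30, pin INTB.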
 1.4 06-Sep-2003  fvdl When establishing the ACPI SCI, make sure it's always active low (as well
as level-triggered). Do this by changing the MP config entry that was
set up for the interrupt. Do not change anything if there was an ACPI
interrupt source override, assume that this contains the correct
information already.
 1.3 29-May-2003  fvdl branches: 1.3.2;
Add the options MPBIOS_SCANPCI and MPACPI_SCANPCI to configure PCI roots
with the MPBIOS/ACPI bus information, by walking through the buses, and
descending down every bus that hasn't been marked configured yet.
 1.2 11-May-2003  fvdl Add a global_int field to the mp_intr_map structure, for use with ACPI.
XXX should probably just use an array directly indexed by global interrupt
numbers in that case.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.3.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.1 03-Aug-2004  skrll Sync with HEAD
 1.6.14.1 21-Apr-2005  tron Pull up revision 1.7 (requested by yamt in ticket #174):
make multi inclusion protection macros consistent.
 1.6.8.1 29-Apr-2005  kent sync with -current
 1.8.24.1 13-Jul-2006  gdamore Merge from HEAD.
 1.8.16.1 11-Aug-2006  yamt sync with head
 1.8.12.1 09-Sep-2006  rpaulo sync with head
 1.8.2.1 30-Dec-2006  yamt sync with head.
 1.9.58.1 02-Jun-2008  mjf Sync with HEAD.
 1.10.18.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.10.18.3 24-Oct-2010  jym Sync with HEAD
 1.10.18.2 01-Nov-2009  jym Sync with HEAD.
 1.10.18.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.10.12.1 28-Apr-2009  skrll Sync with HEAD.
 1.10.4.2 11-Mar-2010  yamt sync with head
 1.10.4.1 04-May-2009  yamt sync with head.
 1.13.2.1 30-Oct-2012  yamt sync with head
 1.14.16.1 06-Jun-2015  skrll Sync with HEAD
 1.14.2.1 03-Dec-2017  jdolecek update from HEAD
 1.6 31-Jan-2020  maxv constify
 1.5 15-Dec-2011  abs branches: 1.5.48; 1.5.54;
Increase MTRR_I686_NVAR_MAX from 8 to 16. Avoids
"FIXME: more than 8 MTRRs (10)" message on booting Thinkpad W520 and
similar. While here replace a magic number with MTRR_I686_NVAR_MAX * 2
 1.4 01-Jul-2008  mrg branches: 1.4.6; 1.4.30; 1.4.34;
hack around PR#38480:

- rename MTRR_I686_NVAR to MTRR_I686_NVAR_MAX, still set to 8
- store mtrr VCNT value into i686_mtrr_vcnt. if it is less than 8,
zero out the relevant parts of mtrr_raw[].msraddr
- replace all usage of MTRR_I686_NVAR with either i686_mtrr_vcnt or
with MTRR_I686_NVAR_MAX as appropriate
- in i686_mtrr_reload() and mtrr_init_first() don't use mtrr_raw[]
addresses of 0

still needs a bunch of reworking to handle VCNT > 8 case.
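The VCNT handling listed in revision 1.4 might be sketched like this. All names are hypothetical stand-ins for the identifiers in the log, and the clamp for counts above the maximum is an assumption added so the sketch stays in bounds (the log says that case still needed work). Two details are taken from the architecture rather than the log: VCNT is the low 8 bits of the MTRR capability register, and the variable-range MSRs start at 0x200 with a base and a mask register per range.

```c
#include <assert.h>
#include <stdint.h>

#define NVAR_MAX 8      /* hypothetical stand-in for MTRR_I686_NVAR_MAX */

/* Hypothetical raw table entry; only the MSR address matters here. */
struct raw_msr {
    uint32_t msraddr;
};

/* Two MSRs (base and mask) per variable range. */
static struct raw_msr var_msrs[NVAR_MAX * 2];
static unsigned int mtrr_vcnt;

/* Fill in the MSR addresses for the maximum number of ranges. */
static void
mtrr_table_init(void)
{
    unsigned int i;

    for (i = 0; i < NVAR_MAX * 2; i++)
        var_msrs[i].msraddr = 0x200 + i;
}

/* Record VCNT and zero the addresses of ranges the CPU lacks. */
static void
mtrr_limit_var(uint64_t mtrrcap)
{
    unsigned int i;

    mtrr_vcnt = (unsigned int)(mtrrcap & 0xff);     /* VCNT: bits 7:0 */
    if (mtrr_vcnt > NVAR_MAX)
        mtrr_vcnt = NVAR_MAX;   /* assumption: clamp rather than fail */
    assert(mtrr_vcnt <= NVAR_MAX);
    for (i = mtrr_vcnt * 2; i < NVAR_MAX * 2; i++)
        var_msrs[i].msraddr = 0;
}

/* Count entries a reload would touch, skipping zeroed addresses. */
static unsigned int
mtrr_count_usable(void)
{
    unsigned int i, n = 0;

    for (i = 0; i < NVAR_MAX * 2; i++) {
        if (var_msrs[i].msraddr != 0)
            n++;
    }
    return n;
}
```

With a capability value whose low byte is 6, the first twelve table entries stay usable and the remaining four are zeroed, so code that skips zero addresses never touches MSRs the processor does not implement.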
 1.3 28-Apr-2008  martin branches: 1.3.2; 1.3.4;
Remove clause 3 and 4 from TNF licenses
 1.2 28-Jul-2003  mrg branches: 1.2.52; 1.2.68; 1.2.102; 1.2.104; 1.2.106;
give >32 bit constants an "LL" suffix to appease gcc3.3
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.2.106.2 04-May-2009  yamt sync with head.
 1.2.106.1 16-May-2008  yamt sync with head.
 1.2.104.1 18-May-2008  yamt sync with head.
 1.2.102.2 02-Jul-2008  mjf Sync with HEAD.
 1.2.102.1 02-Jun-2008  mjf Sync with HEAD.
 1.2.68.1 04-Sep-2008  skrll Sync with netbsd-4.
 1.2.52.3 18-Nov-2008  bouyer Pull up following revision(s) (requested by sborrill in ticket #1173):
sys/arch/x86/include/mtrr.h: revision 1.4
sys/arch/amd64/amd64/netbsd32_machdep.c: revision 1.54
sys/arch/x86/x86/mtrr_i686.c: revision 1.18
hack around PR#38480:
- rename MTRR_I686_NVAR to MTRR_I686_NVAR_MAX, still set to 8
- store mtrr VCNT value into i686_mtrr_vcnt. if it is less than 8,
zero out the relevant parts of mtrr_raw[].msraddr
- replace all usage of MTRR_I686_NVAR with either i686_mtrr_vcnt or
with MTRR_I686_NVAR_MAX as appropriate
- in i686_mtrr_reload() and mtrr_init_first() don't use mtrr_raw[]
addresses of 0
still needs a bunch of reworking to handle VCNT > 8 case.
Ensure optional MTRR sections are built if MTRR is enabled (missing
Fix build due to changes in revision 1.4 of sys/arch/x86/include/mtrr.h
 1.2.52.2 23-Aug-2008  bouyer Back out ticket #1173, it breaks the build of amd64 kernels.
 1.2.52.1 20-Aug-2008  bouyer Pull up following revision(s) (requested by sborrill in ticket #1173):
sys/arch/x86/include/mtrr.h: revision 1.4
sys/arch/x86/x86/mtrr_i686.c: revision 1.18
hack around PR#38480:
- rename MTRR_I686_NVAR to MTRR_I686_NVAR_MAX, still set to 8
- store mtrr VCNT value into i686_mtrr_vcnt. if it is less than 8,
zero out the relevant parts of mtrr_raw[].msraddr
- replace all usage of MTRR_I686_NVAR with either i686_mtrr_vcnt or
with MTRR_I686_NVAR_MAX as appropriate
- in i686_mtrr_reload() and mtrr_init_first() don't use mtrr_raw[]
addresses of 0
still needs a bunch of reworking to handle VCNT > 8 case.
 1.3.4.1 03-Jul-2008  simonb Sync with head.
 1.3.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.4.34.1 18-Feb-2012  mrg merge to -current.
 1.4.30.1 17-Apr-2012  yamt sync with head
 1.4.6.1 19-Jun-2013  bouyer Pull up following revision(s) (requested by msaitoh in ticket #1847):
sys/arch/x86/include/mtrr.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.25
sys/arch/x86/include/specialreg.h: revision 1.55
Increase MTRR_I686_NVAR_MAX from 8 to 16. Avoids
"FIXME: more than 8 MTRRs (10)" message on booting Thinkpad W520 and
similar. While here replace a magic number with MTRR_I686_NVAR_MAX * 2
 1.5.54.1 29-Feb-2020  ad Sync with head.
 1.5.48.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.10 12-Jul-2023  riastradh machine/mutex.h: Sprinkle sys/types.h, omit machine/lock.h.

Turns out machine/lock.h is not needed for __cpu_simple_lock_t, which
always comes from sys/types.h. And, really, sys/types.h (or at least
sys/stdint.h) is needed for uintN_t and uintptr_t.
 1.9 05-Mar-2020  riastradh branches: 1.9.22;
Fix userland build by surrounding stuff with #ifdef _KERNEL.

(...Why does this header file get exposed to userland at all?)
 1.8 05-Mar-2020  riastradh Remove __MUTEX_PRIVATE conditional in definition of struct kmutex.

This doesn't buy us anything but the need to hack around it in
ctfmerge to avoid massive duplication of kernel types -- which only
worked for the x86 definition.

This changes only x86 and arm for now, pending compile-testing the
remaining architectures.
 1.7 29-Nov-2019  riastradh Nix now-unused definitions of MUTEX_GIVE/MUTEX_RECEIVE.
 1.6 24-Apr-2009  ad branches: 1.6.64; 1.6.68;
A workaround for a bug with some Opteron revisions where locked operations
sometimes do not serve as memory barriers, allowing memory references to
bleed outside of critical sections. It's possible that this is the
reason for pkgbuild's longstanding crashiness.
 1.5 28-Apr-2008  martin branches: 1.5.8; 1.5.10; 1.5.14; 1.5.16;
Remove clause 3 and 4 from TNF licenses
 1.4 09-Dec-2007  ad branches: 1.4.10; 1.4.12; 1.4.14;
Use atomic_cas_ulong().
 1.3 21-Nov-2007  yamt branches: 1.3.2; 1.3.4;
make kmutex_t and krwlock_t smaller by killing lock id.
ok'ed by Andrew Doran.
 1.2 09-Feb-2007  ad branches: 1.2.4; 1.2.8; 1.2.14; 1.2.24; 1.2.26; 1.2.30; 1.2.32;
Merge newlock2 to head.
 1.1 10-Sep-2006  ad branches: 1.1.2;
file mutex.h was initially added on branch newlock2.
 1.1.2.8 01-Feb-2007  ad Header file cleanup.
 1.1.2.7 30-Jan-2007  ad Don't expose the guts of struct kmutex unless _KERNEL.
 1.1.2.6 27-Jan-2007  ad Rename some functions to better describe what they do.
 1.1.2.5 12-Jan-2007  ad Sync with head.
 1.1.2.4 29-Dec-2006  ad Checkpoint work in progress.
 1.1.2.3 17-Nov-2006  ad Checkpoint work in progress.
 1.1.2.2 20-Oct-2006  ad - Don't need locked bus cycles on release from C code.
- Save an integer ID in the lock structures for LOCKDEBUG code.
 1.1.2.1 10-Sep-2006  ad Add updated locking primitives.
 1.2.32.2 27-Dec-2007  mjf Sync with HEAD.
 1.2.32.1 08-Dec-2007  mjf Sync with HEAD.
 1.2.30.1 21-Nov-2007  bouyer Sync with HEAD
 1.2.26.1 09-Jan-2008  matt sync with HEAD
 1.2.24.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.2.24.1 21-Nov-2007  joerg Sync with HEAD.
 1.2.14.1 17-Apr-2007  thorpej G/C _lock_cas() -- the atomic ops API provides what the locking
primitives need.
 1.2.8.1 03-Dec-2007  ad Sync with HEAD.
 1.2.4.4 21-Jan-2008  yamt sync with head
 1.2.4.3 07-Dec-2007  yamt sync with head
 1.2.4.2 26-Feb-2007  yamt sync with head.
 1.2.4.1 09-Feb-2007  yamt file mutex.h was added on branch yamt-lazymbuf on 2007-02-26 09:08:49 +0000
 1.3.4.1 11-Dec-2007  yamt sync with head.
 1.3.2.1 26-Dec-2007  ad Sync with head.
 1.4.14.2 04-May-2009  yamt sync with head.
 1.4.14.1 16-May-2008  yamt sync with head.
 1.4.12.1 18-May-2008  yamt sync with head.
 1.4.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.16.1 13-May-2009  snj branches: 1.5.16.1.2;
Pull up following revision(s) (requested by ad in ticket #725):
sys/arch/x86/include/mutex.h: revision 1.6
A workaround for a bug with some Opteron revisions where locked operations
sometimes do not serve as memory barriers, allowing memory references to
bleed outside of critical sections. It's possible that this is the
reason for pkgbuild's longstanding crashiness.
 1.5.16.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.5.14.2 01-Nov-2009  jym Sync with HEAD.
 1.5.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.5.10.1 13-May-2009  snj Pull up following revision(s) (requested by ad in ticket #725):
sys/arch/x86/include/mutex.h: revision 1.6
A workaround for a bug with some Opteron revisions where locked operations
sometimes do not serve as memory barriers, allowing memory references to
bleed outside of critical sections. It is possible that this is the
reason for pkgbuild's longstanding crashiness.
 1.5.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.6.68.1 13-May-2020  martin Pull up following revision(s) (requested by chs in ticket #904):

sys/arch/x86/include/mutex.h: revision 1.8
sys/arch/x86/include/mutex.h: revision 1.9
sys/arch/arm/include/mutex.h: revision 1.22
sys/arch/arm/include/mutex.h: revision 1.23

Remove __MUTEX_PRIVATE conditional in definition of struct kmutex.

This doesn't buy us anything but the need to hack around it in
ctfmerge to avoid massive duplication of kernel types -- which only
worked for the x86 definition.

This changes only x86 and arm for now, pending compile-testing the
remaining architectures.

Fix userland build by surrounding stuff with #ifdef _KERNEL.
(...Why does this header file get exposed to userland at all?)
 1.6.64.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.9.22.1 09-Aug-2023  martin Pull up following revision(s) (requested by maya in ticket #316):

sys/arch/m68k/include/mutex.h: revision 1.13
sys/arch/arm/include/cpu.h: revision 1.125
sys/arch/sun68k/include/intr.h: revision 1.21
sys/arch/arm/include/mutex.h: revision 1.28
sys/sys/rwlock.h: revision 1.18
sys/arch/powerpc/include/mutex.h: revision 1.7
sys/arch/arm/include/mutex.h: revision 1.29
sys/arch/powerpc/include/mutex.h: revision 1.8
sys/uvm/uvm_param.h: revision 1.42
sys/sys/ksem.h: revision 1.16
sys/arch/x86/include/mutex.h: revision 1.10
sys/sys/proc.h: revision 1.372
sys/sys/ksem.h: revision 1.17
sys/arch/ia64/include/mutex.h: revision 1.8
sys/arch/evbarm/include/intr.h: revision 1.29
sys/sys/lua.h: revision 1.9
sys/arch/next68k/include/intr.h: revision 1.23
sys/arch/ia64/include/mutex.h: revision 1.9
sys/arch/hp300/include/intr.h: revision 1.35
sys/arch/hp300/include/intr.h: revision 1.36
sys/arch/sparc/include/cpu.h: revision 1.111
sys/arch/hppa/include/mutex.h: revision 1.16
sys/arch/vax/include/intr.h: revision 1.31
sys/arch/hppa/include/mutex.h: revision 1.17
sys/arch/news68k/include/intr.h: revision 1.28
sys/arch/hppa/include/mutex.h: revision 1.18
sys/arch/hppa/include/intr.h: revision 1.3
sys/arch/hppa/include/mutex.h: revision 1.19
sys/arch/hppa/include/intr.h: revision 1.4
sys/sys/sched.h: revision 1.92
sys/opencrypto/cryptodev.h: revision 1.51
sys/arch/vax/include/mutex.h: revision 1.20
sys/arch/sparc64/include/mutex.h: revision 1.10
sys/arch/ia64/include/sapicvar.h: revision 1.2
sys/arch/riscv/include/mutex.h: revision 1.5
sys/arch/amiga/dev/grfabs_cc.c: revision 1.39
sys/external/bsd/drm2/include/linux/idr.h: revision 1.11
sys/arch/riscv/include/mutex.h: revision 1.6
sys/ddb/files.ddb: revision 1.16
sys/arch/mac68k/include/intr.h: revision 1.32
share/man/man4/ddb.4: revision 1.203
sys/ddb/db_command.c: revision 1.183
sys/arch/mips/include/mutex.h: revision 1.10
sys/ddb/db_command.c: revision 1.184
sys/arch/x68k/include/intr.h: revision 1.22
sys/arch/sparc/include/psl.h: revision 1.51
sys/arch/or1k/include/mutex.h: revision 1.4
sys/arch/mips/include/mutex.h: revision 1.11
sys/arch/arm/xscale/pxa2x0_intr.h: revision 1.16
sys/arch/sparc64/include/cpu.h: revision 1.134
sys/arch/sparc/include/psl.h: revision 1.52
sys/arch/or1k/include/mutex.h: revision 1.5
sys/arch/mvme68k/include/intr.h: revision 1.22
sys/arch/luna68k/include/intr.h: revision 1.16
external/cddl/osnet/sys/sys/kcondvar.h: revision 1.6
sys/arch/sparc/include/mutex.h: revision 1.12
sys/arch/sparc/include/mutex.h: revision 1.13
sys/arch/usermode/include/mutex.h: revision 1.5
sys/arch/usermode/include/mutex.h: revision 1.6
sys/kern/kern_core.c: revision 1.38
usr.sbin/crash/Makefile: revision 1.49
sys/arch/amiga/include/intr.h: revision 1.23
sys/arch/alpha/include/mutex.h: revision 1.12
sys/arch/alpha/include/mutex.h: revision 1.13
sys/arch/evbarm/lubbock/sacc_obio.c: revision 1.16
sys/ddb/ddb.h: revision 1.6
sys/arch/sparc64/include/mutex.h: revision 1.8
sys/arch/sh3/include/mutex.h: revision 1.12
sys/arch/evbarm/lubbock/sacc_obio.c: revision 1.17
sys/ddb/db_syncobj.c: revision 1.1
sys/arch/vax/include/mutex.h: revision 1.18
sys/arch/sparc64/include/psl.h: revision 1.63
sys/arch/sparc64/include/mutex.h: revision 1.9
sys/arch/sh3/include/mutex.h: revision 1.13
sys/arch/evbarm/lubbock/obio.c: revision 1.13
sys/arch/atari/include/intr.h: revision 1.23
sys/ddb/db_syncobj.c: revision 1.2
sys/arch/vax/include/mutex.h: revision 1.19
sys/arch/evbarm/g42xxeb/obio.c: revision 1.14
sys/arch/evbarm/g42xxeb/obio.c: revision 1.15
sys/arch/cesfic/include/intr.h: revision 1.14
sys/ddb/db_syncobj.h: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.134
sys/arch/evbarm/g42xxeb/obio.c: revision 1.16
sys/arch/cesfic/include/intr.h: revision 1.15
sys/arch/arm/xscale/pxa2x0_intr.c: revision 1.26
sys/sys/cpu_data.h: revision 1.54
sys/arch/m68k/include/mutex.h: revision 1.12
sys/arch/ia64/acpi/madt.c: revision 1.6

sys/rwlock.h: Make this more self-contained for bool.

machine/mutex.h: Sprinkle includes so this can be used by crash(8).

ddb: New `show all tstiles' command.
Shows who's waiting for which locks and what the owner is up to.

Include psl.h for ipl_cookie_t if __MUTEX_PRIVATE

sys: Rip <sys/resourcevar.h> out of <uvm/uvm_param.h>.

And thus out of <sys/param.h>, which is exceedingly overused and
fragile and delenda est.

Should fix (some) issues with the recent inclusion of machine/lock.h
in various machine/mutex.h files.

arm/mutex.h: Need machine/intr.h, machine/lock.h.

For ipl_cookie_t and __cpu_simple_lock_t.
evbarm/intr.h: Define ipl_cookie_t before including ARM_INTR_IMPL.

Otherwise arm/mutex.h doesn't work, due to a cyclic dependency which
should really be fixed.
opencrypto/cryptodev.h: Fix includes.
- Move sys/condvar.h under #ifdef _KERNEL.
- Add some other necessary includes and forward declarations.
- Sort.

hp300/intr.h: Fix missing includes.
linux/idr.h: Need <sys/mutex.h> for kmutex_t.
amiga/intr.h: Don't define spl*() functions if !_KERNEL.

This is used by crash(8) now, and what's important is ipl_cookie_t.
cesfic/intr.h: Expose ipl_cookie_t to userland for crash(8).
cesfic/intr.h: Expose ipl_cookie_t to userland only with _KMEMUSER.

Probably not necessary but let's be a little more cautious about
this.

atari/intr.h: Expose ipl_cookie_t with _KMEMUSER for crash(8).

arm/cpu.h: Need sys/param.h for COHERENCY_UNIT.

Nix machine/param.h -- not meant to be used directly, pulled in by
sys/param.h.

Move the definition of ipl_cookie_t out of the kernel-only sections,
some _KMEMUSER applications need it.

ddb: Cast pointer to uintptr_t first before db_expr_t.

hppa/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

luna68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

mvme68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

news68k/intr.h: Fix includes. Put some definitions under _KERNEL.

next68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

sys/ksem.h: Hack around fstat(8) abuse of _KERNEL.

sun68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

vax/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

x68k/intr.h: Put functions under _KERNEL so crash(8) can use this.

Make ipl_cookie_t visible for _KMEMUSER userland applications.

fix editor mishap in previous

Explicitly include <sys/mutex.h> for kmutex_t.

Replace kmutex_t * (which may be undefined here) with struct kmutex *,
suggested by Taylor.

hp300/intr.h: Put most of this under #ifdef _KERNEL.
Only ipl_cookie_t really needs to be exposed now, for crash(8).

mac68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).
Make inclusion of sys/intr.h explicit for spl*.

fix hppa and vax builds.

machine/lock.h isn't necessary for __cpu_simple_lock_t, it's in
sys/types.h. avoids cpu_data.h vs sched.h include order issues.

move the hppa ipl_t typedef with the moved usage of it.
machine/mutex.h: Sprinkle sys/types.h, omit machine/lock.h.

Turns out machine/lock.h is not needed for __cpu_simple_lock_t, which
always comes from sys/types.h. And, really, sys/types.h (or at least
sys/stdint.h) is needed for uintN_t and uintptr_t.

ddb: Cast pointer to uintptr_t, then to db_expr_t.
Avoids warnings about conversion between pointer and integer of
different size on some architectures.
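
A minimal sketch of the two-step cast described above. The type `expr_sketch_t` is a hypothetical stand-in for ddb's `db_expr_t` (whose real width is per-port and not shown in this log); the point is only that the pointer goes through `uintptr_t` before being widened or narrowed.

```c
#include <stdint.h>

/* Hypothetical stand-in for db_expr_t; assumed wider than a pointer here. */
typedef long long expr_sketch_t;

/*
 * Convert a pointer to an expression value in two steps:
 * pointer -> uintptr_t (same size as a pointer, so no size warning),
 * then uintptr_t -> expr_sketch_t (an ordinary integer conversion).
 */
static expr_sketch_t
ptr_to_expr(const void *p)
{
	return (expr_sketch_t)(uintptr_t)p;
}
```

Casting the pointer straight to `expr_sketch_t` is what can trigger the "different size" warning on ports where the two widths differ.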

re-fix hppa builds.

this file uses __cpu_simple_lock(), not just the underlying type,
so it does need machine/lock.h.

Break cycle by using `struct kmutex *' instead of `kmutex_t *'.
sys/sched.h included sys/mutex.h
which includes sys/intr.h
which includes machine/intr.h
which on cats includes arm/footbridge/footbridge_intr.h
which includes arm/cpu.h
which includes sys/cpu_data.h
which includes sys/sched.h

But there was never any real need for sys/mutex.h in sys/sched.h,
because it only uses pointers to the opaque struct kmutex. Cycle
broken by using `struct kmutex *' instead of pulling in sys/mutex.h
for the definition of kmutex_t.
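
The technique can be sketched in a single file as follows. Only `struct kmutex` comes from the log above; `struct sched_sketch`, its field, and the helper function are hypothetical names used for illustration.

```c
#include <stddef.h>

/*
 * "Header" side: a forward declaration is enough to declare pointers
 * to the type, so no include of the defining header is needed.
 */
struct kmutex;

struct sched_sketch {
	struct kmutex *lock;	/* pointer to an incomplete type is legal */
};

/* Code that only stores or compares the pointer never needs the layout. */
static int
sched_sketch_has_lock(const struct sched_sketch *s)
{
	return s->lock != NULL;
}

/*
 * "Implementation" side: the full definition appears only where the
 * members are actually accessed (hypothetical layout).
 */
struct kmutex {
	int owner;
};

static struct kmutex example_mutex = { 0 };
static struct sched_sketch example_sched = { &example_mutex };
static struct sched_sketch empty_sched = { NULL };
```

A `kmutex_t *` would have required the typedef to be visible, which is what pulled sys/mutex.h into the cycle.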

Side effect: This revealed that sys/cpu_data.h needed sys/intr.h
(which was pulled in accidentally by sys/mutex.h via sys/sched.h) for
SOFTINT_COUNT. Also revealed some other machine/cpu.h header files
were missing includes of sys/mutex.h for kmutex_t.

ia64: Need sys/types.h for u_int, vaddr_t; sys/mutex.h for kmutex_t.

explicitly include no longer implicitly included sys/mutex.h.

arm/xscale: Use sys/bitops.h fls32 - 1 instead of 31 - __builtin_clz.
Sidesteps namespace collision with `#define bits ...' in net/zlib.c.

complete the previous - there were two calls to find_first_bit() to fix.

arm/xscale: Missed a spot with previous find_first_bit commit.

evbarm/g42xxeb: Fix off-by-one in previous.

The original find_first_bit(x) was 31 - __builtin_clz((uint32_t)x),
which is equivalent to fls32(x) - 1, not to fls32(x).

Note that fls32 is 1-based and returns 0 for x=0.
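
The off-by-one can be checked with a small sketch. `fls32_sketch()` below is a portable stand-in written for illustration, not the sys/bitops.h implementation; it only follows the behaviour stated above (1-based, returns 0 for x=0).

```c
#include <stdint.h>

/* Stand-in for fls32(): 1-based position of the highest set bit, 0 if none. */
static int
fls32_sketch(uint32_t x)
{
	int n = 0;

	while (x != 0) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * 0-based index of the highest set bit, i.e. what the original
 * 31 - __builtin_clz(x) computed.  x must be non-zero.
 */
static int
highest_bit_index(uint32_t x)
{
	return fls32_sketch(x) - 1;
}
```

For x = 8 the highest set bit is bit 3: `fls32_sketch(8)` is 4, so dropping the `- 1` would report bit 4.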
 1.1 24-Feb-2009  yamt branches: 1.1.2; 1.1.4; 1.1.6;
- rewrite x86 nmi dispatcher so that establish and disestablish are safe
on a running system.
- adapt existing users of the api. (elan)
- adapt tprof_pmi driver to use the api.
 1.1.6.3 01-Nov-2009  jym Sync with HEAD.
 1.1.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.6.1 24-Feb-2009  jym file nmi.h was added on branch jym-xensuspend on 2009-05-13 17:18:44 +0000
 1.1.4.2 04-May-2009  yamt sync with head.
 1.1.4.1 24-Feb-2009  yamt file nmi.h was added on branch yamt-nfs-mp on 2009-05-04 08:12:09 +0000
 1.1.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.1.2.1 24-Feb-2009  skrll file nmi.h was added on branch nick-hppapmap on 2009-03-03 18:29:37 +0000
 1.1 20-Aug-2022  riastradh x86: Move page attribute table bits to x86/pat.h.
 1.18 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.17 27-Apr-2015  knakahara add intr_handle_t and let pci_intr_handle_t use it.
 1.16 27-Apr-2015  knakahara add pci_intr_distribute(9) for x86.
 1.15 04-Mar-2015  knakahara add a comment for pci_intr_handle_t.
 1.14 14-Mar-2010  dyoung branches: 1.14.20; 1.14.38;
Add a new member, pc_super, to x86's pci_chipset_tag: pc.pc_super points
to the tag that pc inherits its behavior from. Add code to deal with
pc.pc_super.

Pull identical declarations out of xen/include/pci_machdep.h and
x86/include/pci_machdep.h into x86/include/pci_machdep_common.h.
 1.13 25-Feb-2010  dyoung Change the pci_attach_args definition to allow machine-dependent
code to override the default pci(9) behavior by creating a non-NULL
pci_attach_args_t (on x86, pci_attach_args_t is always NULL) containing
one or more non-NULL function pointers.
 1.12 24-Feb-2010  dyoung KNF: change spaces to tabs.
 1.11 24-Feb-2010  dyoung Don't bother to #define PCI_PREFER_IOSPACE, nothing uses it.
 1.10 24-Feb-2010  dyoung Change 'typedef void *pci_chipset_tag_t' to 'typedef struct
pci_chipset_tag *pci_chipset_tag_t' for an improvement in type safety.
(Back when I did the same for cardbus_chipset_tag_t, it helped to turn
up some bugs!)
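
A rough illustration of the type-safety gain. All names with a `_sketch` suffix are hypothetical, and the struct layout is invented for the example.

```c
/* Hypothetical chipset tag with a single member. */
struct pci_chipset_tag_sketch {
	int mode;
};

/* New style: a pointer to a named struct instead of a bare void *. */
typedef struct pci_chipset_tag_sketch *pci_chipset_tag_sketch_t;

static struct pci_chipset_tag_sketch example_tag = { 1 };

static int
tag_mode(pci_chipset_tag_sketch_t pc)
{
	return pc->mode;
}

/*
 * With the old `typedef void *...` form, a call such as
 * tag_mode(&some_unrelated_struct) compiled silently, because any
 * object pointer converts to void *.  With the struct pointer type the
 * compiler issues an incompatible-pointer diagnostic instead.
 */
```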
 1.9 16-Feb-2010  dyoung Get rid of all PCI_CONF_MODE #ifdef'age except for the little bit
that initializes pci_mode, which I have moved to the top.

Make pci_mode private to pci_machdep.c.

Provide pci_mode_set() for pcibios.c to configure the PCI Configuration
Mechanism. KASSERT() in pci_mode_set() that the mechanism is not
changing from anything but the "don't know" value, -1.
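
One reading of that assertion, as a userland sketch. `assert()` stands in for KASSERT(), and the `_sketch` names and the return value are assumptions made for illustration; the real pci_mode_set() is not shown in this log.

```c
#include <assert.h>

/* -1 is the "don't know" value; file-local, like the private pci_mode. */
static int pci_mode_sketch = -1;

/*
 * Set the configuration mechanism.  The mode may move away from
 * "don't know", or be set again to the value it already has, but it
 * may not change from one known mechanism to another.
 */
static int
pci_mode_set_sketch(int mode)
{
	assert(pci_mode_sketch == -1 || pci_mode_sketch == mode);
	pci_mode_sketch = mode;
	return pci_mode_sketch;
}
```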
 1.8 30-May-2008  joerg branches: 1.8.12; 1.8.18;
Add a function to extract the primary bus number of PCI host bridges,
as far as specific code for this already existed.
 1.7 16-Apr-2008  cegger branches: 1.7.2; 1.7.4; 1.7.6;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.6 20-Jun-2005  sekiya branches: 1.6.82;
pci_device_foreach(), pci_device_foreach_min(), pci_bridge_foreach(), and
pci_bridge_hook don't actually have any dependencies on PCIBIOS-specific code,
and they can be used to fixup PCI bus numbering in the absence of the BIOS.

To that end, decouple them from PCIBIOS.
 1.5 16-Apr-2005  yamt make multi inclusion protection macros consistent.
 1.4 29-Jul-2004  drochner branches: 1.4.4; 1.4.10;
remove now unnecessary "pci_enumerate_bus" definitions
 1.3 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
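
The log does not spell out the swizzle. The sketch below assumes the conventional PCI-to-PCI bridge rotation, with pins numbered 1 to 4 for INTA to INTD; the function name is hypothetical.

```c
/*
 * Conventional bridge swizzle (an assumption, not taken from the
 * commit): a device's interrupt pin is rotated by its device number
 * to give the pin seen on the parent bus.
 *
 *   pin: 1..4 (INTA..INTD) as seen on the child bus
 *   dev: device number of the child device behind the bridge
 */
static int
swizzle_pin(int pin, int dev)
{
	return ((pin - 1 + dev) % 4) + 1;
}
```

The lookup on the parent bus would then use the swizzled pin together with the bridge's own device number, repeating one level up if that bus has no mapping either.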
 1.2 15-Jun-2003  fvdl branches: 1.2.2;
Handle 64bit DMA addresses on PCI for platforms that can (currently only
enabled on amd64). Add a dmat64 field to various PCI attach structures,
and pass it down where needed. Implement a simple new function called
pci_dma64_available(pa) to test if 64bit DMA addresses may be used.
This returns 1 iff _PCI_HAVE_DMA64 is defined in <machine/pci_machdep.h>,
and there is more than 4G of memory.
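
The stated condition can be sketched like this. The real function takes a pci_attach_args pointer and how it measures memory is not described here, so the sketch takes the memory size as a plain argument; `HAVE_DMA64_SKETCH` is a hypothetical stand-in for `_PCI_HAVE_DMA64`.

```c
#include <stdint.h>

/* Stand-in for _PCI_HAVE_DMA64; remove the define to model other ports. */
#define HAVE_DMA64_SKETCH 1

/*
 * Return 1 iff the platform opts in to 64-bit DMA and physical memory
 * exceeds 4 GiB, so that some addresses do not fit in 32 bits.
 */
static int
dma64_available_sketch(uint64_t physmem_bytes)
{
#ifdef HAVE_DMA64_SKETCH
	return physmem_bytes > (UINT64_C(4) << 30);
#else
	(void)physmem_bytes;
	return 0;
#endif
}
```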
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.2.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.2 06-Aug-2004  skrll Fix merge mistakes.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.4.10.1 21-Apr-2005  tron Pull up revision 1.5 (requested by yamt in ticket #174):
make multi inclusion protection macros consistent.
 1.4.4.1 29-Apr-2005  kent sync with -current
 1.6.82.1 02-Jun-2008  mjf Sync with HEAD.
 1.7.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.7.4.3 11-Aug-2010  yamt sync with head.
 1.7.4.2 11-Mar-2010  yamt sync with head
 1.7.4.1 04-May-2009  yamt sync with head.
 1.7.2.1 04-Jun-2008  yamt sync with head
 1.8.18.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.8.12.1 24-Oct-2010  jym Sync with HEAD
 1.14.38.2 06-Jun-2015  skrll Sync with HEAD
 1.14.38.1 06-Apr-2015  skrll Sync with HEAD
 1.14.20.1 03-Dec-2017  jdolecek update from HEAD
 1.25 01-Aug-2020  jdolecek move __HAVE_PCI_MSI_MSIX to <x86/pci_machdep_common.h>
 1.24 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.23 11-Jul-2016  knakahara branches: 1.23.28;
pci_intr_type() requires a pci_chipset_tag_t argument on ports other than x86.

pointed out by nonaka@n.o.
 1.22 22-Oct-2015  knakahara add pci_intr_alloc related stubs to reduce ifdef from device drivers.
 1.21 17-Aug-2015  knakahara Add kernel code to support intrctl(8).
 1.20 13-Aug-2015  msaitoh - Don't take pci_attach_args as an argument in pci_msi[x]_count().
- Move prototypes of pci_msi[x]_count() from x86/x86/pci_machdep_common to
sys/dev/pci/pcivar.h.
- Move pci_msi[x]_count() from x86/pci/pci_msi_machdep.c to sys/dev/pci/pci.c
 1.19 21-Jul-2015  knakahara add pci_intr_alloc() API
 1.18 15-May-2015  knakahara pci_msi_string() must be used by MD code only.
 1.17 15-May-2015  knakahara unify INTx, MSI and MSI-X APIs without alloc. (alloc API is under discussion)
 1.16 08-May-2015  knakahara add a const qualifier to struct pci_attach_args *pa argument
 1.15 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.14 27-Apr-2015  knakahara add pci_intr_distribute(9) for x86.
 1.13 29-Mar-2014  christos branches: 1.13.6;
make pci_intr_string and eisa_intr_string take a buffer and a length
instead of relying on local static storage.
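
The shape of that change, sketched with a hypothetical function; the real pci_intr_string() signature and output format are not reproduced here.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format an interrupt description into caller-supplied storage.
 * Because nothing static is involved, two results can be held at the
 * same time and concurrent callers do not overwrite each other.
 */
static const char *
intr_string_sketch(int irq, char *buf, size_t len)
{
	snprintf(buf, len, "irq %d", irq);
	return buf;
}
```

A caller declares its own buffer, for example `char buf[64];`, and passes `buf` and `sizeof(buf)`.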
 1.12 31-Jul-2013  soren Blocking memory space accesses on the SIS 85C496 chipset turned out to be
a bit too heavy-handed and similar cases are unlikely to crop up again,
so simplify by eliminating pci_bus_flags().

Closes PR port-i386/20410.
 1.11 09-Dec-2012  jakllsch branches: 1.11.2;
Reflect that this file is now for the x86 ports and not just i386 in comments.
 1.10 09-Dec-2012  jakllsch Remove trailing whitespace on blank lines.
 1.9 15-Jun-2012  yamt branches: 1.9.2;
comment
 1.8 28-Aug-2011  dyoung branches: 1.8.2;
Add some code for grovelling in the PCI configuration space for all
of the memory & I/O space reserved by the PCI BIOS for PCI devices
(including bridges) and recording that information for later use.

The code takes between 13k and 50k (depends on the architecture and,
bizarrely, the kernel configuration) so I am going to move it from
pci_machdep.c into its own module on Monday.
 1.7 01-Aug-2011  drochner add an experimental implementation of PCI MSIs (Message Signaled
Interrupts). Successfully tested with hdaudio and "wpi" wireless
ethernet.
notes:
-There seem to be buggy chips around which announce MSI support
but don't correctly implement it. Thus the final word whether MSIs
can be used should be by the driver.
-Only a single vector is supported. For multiple vectors, the IDT
allocation code would have to be changed. (And we would possibly
run into problems due to the limited number of vectors supported
by the current code.)
-The code is "#if NIOAPIC > 0" because it uses the ioapic_edge
interrupt stubs. These actually don't touch any ioapic, so this
is somewhat a misnomer.
-MSIs can't be identified by a "pin" but only by a cpu/vector
pair. Common intr code doesn't deal well with this yet.
-Drivers need to take care of saving/restoring MSI data in the device's
config space on suspend/resume.
 1.6 04-Apr-2011  dyoung Neither pci_dma64_available(), pci_probe_device(), pci_mapreg_map(9),
pci_find_rom(), pci_intr_map(9), pci_enumerate_bus(), nor the match
predicate passed to pciide_compat_intr_establish() should ever modify
their pci_attach_args argument, so make their pci_attach_args arguments
const and deal with the fallout throughout the kernel.

For the most part, these changes add a 'const' where there was no
'const' before, however, some drivers and MD code used to modify
pci_attach_args. Now those drivers either copy their pci_attach_args
and modify the copy, or refrain from modifying pci_attach_args:

Xen: according to Manuel Bouyer, writing to pci_attach_args in
pci_intr_map() was a leftover from Xen 2. Probably a bug. I
stopped writing it. I have not tested this change.

siside(4): sis_hostbr_match() needlessly wrote to pci_attach_args.
Probably a bug. I use a temporary variable. I have not tested this
change.

slide(4): sl82c105_chip_map() overwrote the caller's pci_attach_args.
Probably a bug. Use a local pci_attach_args. I have not tested
this change.

viaide(4): via_sata_chip_map() and via_sata_chip_map_new() overwrote the
caller's pci_attach_args. Probably a bug. Make a local copy of the
caller's pci_attach_args and modify the copy. I have not tested
this change.
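
The "copy, then modify the copy" pattern used for these drivers might look like the following. The struct and its members are hypothetical and much smaller than the real pci_attach_args.

```c
/* Hypothetical, reduced stand-in for pci_attach_args. */
struct attach_args_sketch {
	int bus;
	int device;
	int function;
};

static const struct attach_args_sketch example_pa = { 0, 2, 0 };

/*
 * The argument is const, so the caller's structure cannot be written.
 * Code that needs a variant takes a private copy and edits that.
 * The return value is just a made-up bus/device/function encoding.
 */
static int
probe_function_sketch(const struct attach_args_sketch *pa, int function)
{
	struct attach_args_sketch local = *pa;	/* private copy */

	local.function = function;
	return local.bus * 256 + local.device * 8 + local.function;
}
```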

While I'm here, make pci_mapreg_submap() static.

With these changes in place, I have tested the compilation of these
kernels:

alpha GENERIC
amd64 GENERIC XEN3_DOM0
arc GENERIC
atari HADES MILAN-PCIIDE
bebox GENERIC
cats GENERIC
cobalt GENERIC
evbarm-eb NSLU2
evbarm-el ADI_BRH ARMADILLO9 CP3100 GEMINI GEMINI_MASTER GEMINI_SLAVE GUMSTIX
HDL_G IMX31LITE INTEGRATOR IQ31244 IQ80310 IQ80321 IXDP425 IXM1200
KUROBOX_PRO LUBBOCK MARVELL_NAS NAPPI SHEEVAPLUG SMDK2800 TEAMASA_NPWR
TEAMASA_NPWR_FC TS7200 TWINTAIL ZAO425
evbmips-el AP30 DBAU1500 DBAU1550 MALTA MERAKI MTX-1 OMSAL400 RB153 WGT624V3
evbmips64-el XLSATX
evbppc EV64260 MPC8536DS MPC8548CDS OPENBLOCKS200 OPENBLOCKS266
OPENBLOCKS266_OPT P2020RDB PMPPC RB800 WALNUT
hp700 GENERIC
i386 ALL XEN3_DOM0 XEN3_DOMU
ibmnws GENERIC
macppc GENERIC
mvmeppc GENERIC
netwinder GENERIC
ofppc GENERIC
prep GENERIC
sandpoint GENERIC
sgimips GENERIC32_IP2x
sparc GENERIC_SUN4U KRUPS
sparc64 GENERIC

As of Sun Apr 3 15:26:26 CDT 2011, I could not compile these kernels
with or without my patches in place:

### evbmips-el GDIUM

nbmake: nbmake: don't know how to make /home/dyoung/pristine-nbsd/src/sys/arch/mips/mips/softintr.c. Stop

### evbarm-el MPCSA_GENERIC
src/sys/arch/evbarm/conf/MPCSA_GENERIC:318: ds1672rtc*: unknown device `ds1672rtc'

### ia64 GENERIC

/tmp/genassym.28085/assym.c: In function 'f111':
/tmp/genassym.28085/assym.c:67: error: invalid application of 'sizeof' to incomplete type 'struct pcb'
/tmp/genassym.28085/assym.c:76: error: dereferencing pointer to incomplete type

### sgimips GENERIC32_IP3x

crmfb.o: In function `crmfb_attach':
crmfb.c:(.text+0x2304): undefined reference to `ddc_read_edid'
crmfb.c:(.text+0x2304): relocation truncated to fit: R_MIPS_26 against `ddc_read_edid'
crmfb.c:(.text+0x234c): undefined reference to `edid_parse'
crmfb.c:(.text+0x234c): relocation truncated to fit: R_MIPS_26 against `edid_parse'
crmfb.c:(.text+0x2354): undefined reference to `edid_print'
crmfb.c:(.text+0x2354): relocation truncated to fit: R_MIPS_26 against `edid_print'
 1.5 06-Nov-2010  jakllsch branches: 1.5.2;
Unbreak Xen build, while not actually fixing the real problem.
NetBSD/xen doesn't implement disestablishing interrupts yet.
 1.4 06-Nov-2010  jakllsch Implement pciide_machdep_compat_intr_disestablish() to help enable
detachment of compatibility-mapped pciide(4)-family controllers.
 1.3 28-Apr-2010  dyoung branches: 1.3.2; 1.3.4; 1.3.6;
Provide an x86 implementation of pci_chipset_tag_create(9) and
pci_chipset_tag_destroy(9).
 1.2 20-Mar-2010  dyoung Add a prototype for pci_mmio_range_infer() that will infer the
range of memory forwarded by the host chipset to PCI.
 1.1 14-Mar-2010  dyoung branches: 1.1.2;
Add a new member, pc_super, to x86's pci_chipset_tag: pc.pc_super points
to the tag that pc inherits its behavior from. Add code to deal with
pc.pc_super.

Pull identical declarations out of xen/include/pci_machdep.h and
x86/include/pci_machdep.h into x86/include/pci_machdep_common.h.
 1.1.2.3 21-Apr-2011  rmind sync with head
 1.1.2.2 05-Mar-2011  rmind sync with head
 1.1.2.1 30-May-2010  rmind sync with head
 1.3.6.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.3.6.4 02-May-2011  jym Sync with head.
 1.3.6.3 10-Jan-2011  jym Sync with HEAD
 1.3.6.2 24-Oct-2010  jym Sync with HEAD
 1.3.6.1 28-Apr-2010  jym file pci_machdep_common.h was added on branch jym-xensuspend on 2010-10-24 22:48:16 +0000
 1.3.4.2 11-Aug-2010  yamt sync with head.
 1.3.4.1 28-Apr-2010  yamt file pci_machdep_common.h was added on branch yamt-nfs-mp on 2010-08-11 22:52:55 +0000
 1.3.2.3 09-Nov-2010  uebayasi Sync with HEAD.
 1.3.2.2 30-Apr-2010  uebayasi Sync with HEAD.
 1.3.2.1 28-Apr-2010  uebayasi file pci_machdep_common.h was added on branch uebayasi-xip on 2010-04-30 14:39:57 +0000
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.8.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.8.2.1 30-Oct-2012  yamt sync with head
 1.9.2.3 03-Dec-2017  jdolecek update from HEAD
 1.9.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.9.2.1 25-Feb-2013  tls resync with head
 1.11.2.2 18-May-2014  rmind sync with head
 1.11.2.1 28-Aug-2013  rmind sync with head
 1.13.6.4 05-Oct-2016  skrll Sync with HEAD
 1.13.6.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.13.6.2 22-Sep-2015  skrll Sync with HEAD
 1.13.6.1 06-Jun-2015  skrll Sync with HEAD
 1.23.28.1 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.10 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.9 04-Nov-2017  cherry branches: 1.9.14;
Add a PIC_XEN abstraction to evtchn.c

This allows us to get XEN interrupt code closer to unification to x86/intr.c
 1.8 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.7 19-Apr-2009  ad branches: 1.7.22; 1.7.40;
cpuctl:

- Add interrupt shielding (direct hardware interrupts away from the
specified CPUs). Not documented just yet but will be soon.

- Redo /dev/cpu time_t compat so no kernel changes are needed.

x86:

- Make intr_establish, intr_disestablish safe to use when !cold.

- Distribute hardware interrupts among the CPUs, instead of directing
everything to the boot CPU.

- Add MD code for interrupt shielding. This works in most cases but there is
a bug where delivery is not accepted by an LAPIC after redistribution. It
also needs re-balancing to make things fair after interrupts are turned
back on for a CPU.
 1.6 02-Apr-2009  dyoung During shutdown, detach devices in an orderly fashion.

Call the detach routine for every device in the device tree, starting
with the leaves and moving toward the root, expecting that each
(pseudo-)device driver will use the opportunity to gracefully commit
outstandings transactions to the underlying (pseudo-)device and to
relinquish control of the hardware to the system BIOS.

Detaching devices is not suitable for every shutdown: in an emergency,
or if the system state is inconsistent, we should resort to a fast,
simple shutdown that uses only the pmf(9) shutdown hooks and the
(deprecated) shutdownhooks. For now, if the flag RB_NOSYNC is set in
boothowto, opt for the fast, simple shutdown.

Add a device flag, DVF_DETACH_SHUTDOWN, that indicates by its presence
that it is safe to detach a device during shutdown. Introduce macros
CFATTACH_DECL3() and CFATTACH_DECL3_NEW() for creating autoconf
attachments with default device flags. Add DVF_DETACH_SHUTDOWN
to configuration attachments for atabus(4), atw(4) at cardbus(4),
cardbus(4), cardslot(4), com(4) at isa(4), elanpar(4), elanpex(4),
elansc(4), gpio(4), npx(4) at isa(4), nsphyter(4), pci(4), pcib(4),
pcmcia(4), ppb(4), sip(4), wd(4), and wdc(4) at isa(4).

Add a device-detachment "reason" flag, DETACH_SHUTDOWN, that tells the
autoconf code and a device driver that the reason for detachment is
system shutdown.

Add a sysctl, kern.detachall, that tells the system to try to detach
every device at shutdown, regardless of any device's DVF_DETACH_SHUTDOWN
flag. The default for kern.detachall is 0. SET IT TO 1, PLEASE, TO
HELP TEST AND DEBUG DEVICE DETACHMENT AT SHUTDOWN.
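
The decision described so far can be summarised in a short sketch. The function, its parameters, and the flag value are hypothetical; only the three inputs (RB_NOSYNC, DVF_DETACH_SHUTDOWN, kern.detachall) come from the text above.

```c
#include <stdbool.h>

/* Hypothetical bit value standing in for DVF_DETACH_SHUTDOWN. */
#define DVF_DETACH_SHUTDOWN_SKETCH 0x1

/*
 * Should this device be detached during shutdown?
 *   nosync    - RB_NOSYNC was set: take the fast, simple shutdown
 *   detachall - kern.detachall is non-zero: try every device
 *   dev_flags - the device's own flags
 */
static bool
should_detach_at_shutdown(int dev_flags, bool detachall, bool nosync)
{
	if (nosync)
		return false;
	return detachall || (dev_flags & DVF_DETACH_SHUTDOWN_SKETCH) != 0;
}
```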

This is a work in progress. In future work, I aim to treat
pseudo-devices more thoroughly, and to gracefully tear down a stack of
(pseudo-)disk drivers and filesystems, including cgd(4), vnd(4), and
raid(4) instances at shutdown.

Also commit some changes that are not easily untangled from the rest:

(1) begin to simplify device_t locking: rename struct pmf_private to
device_lock, and incorporate device_lock into struct device.

(2) #include <sys/device.h> in sys/pmf.h in order to get some
definitions that it needs. Stop unnecessarily #including <sys/device.h>
in sys/arch/x86/include/pic.h to keep the amd64, xen, and i386 releases
building.
 1.5 03-Jul-2008  drochner branches: 1.5.4; 1.5.10;
Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.4 04-Jan-2008  ad branches: 1.4.6; 1.4.10; 1.4.12; 1.4.14;
Don't pull in sys/simplelock.h, it's not needed.
 1.3 12-Mar-2007  ad branches: 1.3.18; 1.3.24; 1.3.30;
Include sys/simplelock.h, not lock.h.
 1.2 04-Jul-2006  christos branches: 1.2.10; 1.2.14;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.1 26-Feb-2003  fvdl branches: 1.1.18; 1.1.32; 1.1.36; 1.1.44;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.44.1 13-Jul-2006  gdamore Merge from HEAD.
 1.1.36.1 11-Aug-2006  yamt sync with head
 1.1.32.1 09-Sep-2006  rpaulo sync with head
 1.1.18.3 21-Jan-2008  yamt sync with head
 1.1.18.2 03-Sep-2007  yamt sync with head.
 1.1.18.1 30-Dec-2006  yamt sync with head.
 1.2.14.1 13-Mar-2007  ad Sync with head.
 1.2.10.1 24-Mar-2007  yamt sync with head.
 1.3.30.1 08-Jan-2008  bouyer Sync with HEAD
 1.3.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.3.18.1 09-Jan-2008  matt sync with HEAD
 1.4.14.1 03-Jul-2008  simonb Sync with head.
 1.4.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.4.10.1 04-May-2009  yamt sync with head.
 1.4.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.5.10.2 01-Nov-2009  jym Sync with HEAD.
 1.5.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.5.4.1 28-Apr-2009  skrll Sync with HEAD.
 1.7.40.1 06-Jun-2015  skrll Sync with HEAD
 1.7.22.1 03-Dec-2017  jdolecek update from HEAD
 1.9.14.1 19-Apr-2020  bouyer Add per-PIC callbacks for interrupt_get_devname(), interrupt_get_assigned()
and interrupt_get_count(). Implement Xen-specific callbacks for
PIC_XEN and use the x86 one for others.
In event_set_handler(), call intr_allocate_io_intrsource() so that
events appear in the interrupt list (intrctl list).
 1.10 15-Nov-2019  maxv Remove the ins* and outs* functions. Not sanitizer-friendly, and unused
anyway.
 1.9 22-May-2011  christos branches: 1.9.56;
remove _
 1.8 28-Apr-2008  martin branches: 1.8.14; 1.8.22; 1.8.28;
Remove clause 3 and 4 from TNF licenses
 1.7 17-Oct-2007  garbled branches: 1.7.16; 1.7.18; 1.7.20;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.6 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.5 16-Feb-2006  perry branches: 1.5.24; 1.5.32; 1.5.42; 1.5.44; 1.5.46;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.4 03-Feb-2006  bouyer branches: 1.4.2;
Change repne to rep for {ins,outs}{b,s,l} as proposed
to port-amd64, port-i386 and port-xen 2 weeks ago. Under Xen-3, a repne won't
loop (only the first value is read/written) while rep works as expected.
Linux and FreeBSD use rep, and documentation suggests that repne should
not be used with ins and outs instructions.
See http://mail-index.netbsd.org/port-xen/2006/01/22/0000.html for
details.
 1.3 24-Dec-2005  perry branches: 1.3.2; 1.3.4;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.2 27-Feb-2003  fvdl branches: 1.2.18;
Reinstate some const qualifiers I accidentally removed when moving this
file.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.2.18.2 27-Oct-2007  yamt sync with head.
 1.2.18.1 21-Jun-2006  yamt sync with head.
 1.3.4.1 09-Sep-2006  rpaulo sync with head
 1.3.2.1 18-Feb-2006  yamt sync with head.
 1.4.2.1 22-Apr-2006  simonb Sync with head.
 1.5.46.1 06-Oct-2007  yamt sync with head.
 1.5.44.1 06-Nov-2007  matt sync with HEAD
 1.5.42.1 02-Oct-2007  joerg Sync with HEAD.
 1.5.32.1 03-Oct-2007  garbled Sync with HEAD
 1.5.24.1 09-Oct-2007  ad Sync with head.
 1.7.20.1 16-May-2008  yamt sync with head.
 1.7.18.1 18-May-2008  yamt sync with head.
 1.7.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.8.28.1 06-Jun-2011  jruoho Sync with HEAD.
 1.8.22.1 31-May-2011  rmind sync with head
 1.8.14.1 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.9.56.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.134 20-Aug-2022  riastradh x86: Move definition of struct pmap to pmap_private.h.

This makes pmap_resident_count and pmap_wired_count out-of-line
functions instead of inline. No functional change intended
otherwise.
 1.133 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.132 20-Aug-2022  riastradh x86: Move pl*_i, pl_i_roundup, and ptp_va2o out of x86/pmap.h.

- pl[1-4]_i -> x86/pte.h
- pl_i, pl_i_roundup, ptp_va2o -> x86/pmap.c
 1.131 20-Aug-2022  riastradh x86: Move struct vm_page_md to common x86/pmap.h.
 1.130 20-Aug-2022  riastradh x86: Split bootspace out of x86/pmap.h into new x86/bootspace.h.
 1.129 20-Aug-2022  riastradh x86: Move page attribute table bits to x86/pat.h.
 1.128 18-Jun-2022  andvar fix typos in word "functions" in comments, mainly s/fuctions/functions/.
 1.127 30-Apr-2021  christos Merge the x86 gdt function and constant definitions
 1.126 30-Apr-2021  christos Bump MAX_USERLDT_SIZE to the max size (wastes some memory). wine needs more
than PAGE_SIZE and fails spuriously.
XXX: Note the duplicate definition hacks. Should really create <x86/gdt.h>,
put just the constants there and unify them.
This would also avoid the hack in: src/tests/lib/libi386/t_user_ldt.c#46
 1.125 19-Jul-2020  maxv branches: 1.125.6;
Revert most of ad's movs/stos change. Instead do something a lot simpler: declare
svs_quad_copy() used by SVS only, with no need for instrumentation, because
SVS is disabled when sanitizers are on.
 1.124 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by the following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.123 24-Jun-2020  maxv remove unused x86_stos
 1.122 27-May-2020  ad - Add a couple of wrapper functions around STOS and MOVS and use them to zero
and copy PTEs in preference to memset()/memcpy().

- Remove related SSE / pageidlezero stuff.
 1.121 26-May-2020  bouyer Adjust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which causes xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
Make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt(), which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grant entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().
 1.120 08-May-2020  riastradh Factor randomization out of slotspace_rand.

slotspace_rand becomes deterministic; the randomization moves into
the callers instead. Why?

There are two callers of slotspace_rand:

- x86/pmap.c pmap_bootstrap
- amd64/amd64.c init_slotspace

When the randomization was introduced, it used an x86-only
`cpu_earlyrng' abstraction that would hash rdseed/rdrand and rdtsc
output together. Except init_slotspace ran before cpu_probe, so
cpu_feature was not yet filled out, so during init_slotspace, the
only randomization was rdtsc.

In the course of the recent entropy overhaul, I replaced cpu_earlyrng
by entropy_extract, and moved cpu_init_rng much earlier -- but still
after cpu_probe -- in order to reduce the number of abstractions
lying around and the number of copies of rdrand/rdseed logic. In so
doing I added some annoying complication (see curcpu_available) to
kern_entropy.c to make it work early enough for init_slotspace, and
dropped the rdtsc.

For pmap_bootstrap that didn't substantively change anything. But
for init_slotspace, it removed the only randomization. To mitigate
this, this commit pulls the randomization out of slotspace_rand into
pmap_bootstrap and init_slotspace, so that

(a) init_slotspace can use rdtsc and a little private entropy pool in
order to restore the prior (weak) randomization it had, and

(b) pmap_bootstrap, which runs a little bit later, can continue to
use entropy_extract normally and get rdrand/rdseed too.

A subsequent commit will move cpu_init_rng just a wee bit later,
after cpu_init_msrs, so the kern_entropy.c complications can go away.
Perhaps someone else more wizardly with x86 can find a way to make
init_slotspace run a little later too, after cpu_probe and after
cpu_init_msrs and after cpu_rng_init, but I am not that wizardly.
 1.119 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.118 24-Apr-2020  maxv Give the ldt a fixed size of one page (512 slots), and drop the variable-
sized mechanism that was too complex.

This fixes a race between USER_LDT and SVS: during context switches, the
way SVS installs the new ldt relies on the ldt pointer AND the ldt size,
but both cannot be accessed atomically at the same time.
 1.117 05-Apr-2020  ad branches: 1.117.2;
Allocate PV entries in PAGE_SIZE chunks, and cache partially allocated PV
pages with the pmap. Worth about 2-3% sys time on build.sh for me.
 1.116 22-Mar-2020  ad x86 pmap:

- Give pmap_remove_all() its own version of pmap_remove_ptes() that on native
x86 does the bare minimum needed to clear out PTPs. Cuts ~4% sys time on
'build.sh release' for me.

- pmap_sync_pv(): there's no need to issue a redundant TLB shootdown. The
caller waits for the competing operation to finish.

- Bring 'options TLBSTATS' up to date.
 1.115 17-Mar-2020  ad Hallelujah, the bug has been found. Resurrect prior changes, to be fixed
with following commit.
 1.114 17-Mar-2020  ad Back out the recent pmap changes until I can figure out what is going on
with pmap_page_remove() (to pmap.c rev 1.365).
 1.113 14-Mar-2020  ad PR kern/55071 (Panic shortly after running X11 due to kernel diagnostic assertion "mutex_owned(&pp->pp_lock)")

- Fix a locking bug in pmap_pp_clear_attrs() and in pmap_pp_remove() do the
TLB shootdown while still holding the target pmap's lock.

Also:

- Finish PV list locking for x86 & update comments around same.

- Keep track of the min/max index of PTEs inserted into each PTP, and use
that to clip ranges of VAs passed to pmap_remove_ptes().

- Based on the above, implement a pmap_remove_all() for x86 that clears out
the pmap in a single pass. Makes exit() / fork() much cheaper.
 1.112 14-Mar-2020  ad pmap_remove_all(): Return a boolean value to indicate the behaviour. If
true, all mappings have been removed, the pmap is totally cleared out, and
UVM can then avoid doing the work to call pmap_remove() for each map entry.
If false, either nothing has been done, or some helpful arch-specific voodoo
has taken place.
 1.111 10-Mar-2020  ad - pmap_check_inuse() is expensive so make it DEBUG not DIAGNOSTIC.

- Put PV locking back in place with only a minor performance impact.
pmap_enter() still needs more work - it's not easy to satisfy all the
competing requirements so I'll do that with another change.

- Use pmap_find_ptp() (lookup only) in preference to pmap_get_ptp() (alloc).
Make pm_ptphint indexed by VA not PA. Replace the per-pmap radixtree for
dynamic PV entries with a per-PTP rbtree. Cuts system time during kernel
build by ~10% for me.
 1.110 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.109 12-Jan-2020  ad x86 pmap:

- It turns out that every page the pmap frees is necessarily zeroed. Tell
the VM system about this and use the pmap as a source of pre-zeroed pages.

- Redo deferred freeing of PTPs more elegantly, including the integration with
pmap_remove_all(). This fixes problems with nvmm, and possibly also a crash
discovered during fuzzing.

Reported-by: syzbot+a97186518c84f1d85c0c@syzkaller.appspotmail.com
 1.108 04-Jan-2020  ad branches: 1.108.2;
x86 pmap improvements, reducing system time during a build by about 15% on
my test machine:

- Replace the global pv_hash with a per-pmap record of dynamically allocated
pv entries. The data structure used for this can be changed easily, and
has no special concurrency requirements. For now go with radixtree.

- Change pmap_pdp_cache back into a pool; cache the page directory with the
pmap, and avoid contention on pmaps_lock by adjusting the global list in
the pool_cache ctor & dtor. Align struct pmap and its lock, and update
some comments.

- Simplify pv_entry lists slightly. Allow both PP_EMBEDDED and dynamically
allocated entries to co-exist on a single page. This adds a pointer to
struct vm_page on x86, but shrinks pv_entry to 32 bytes (which also gets
it nicely aligned).

- More elegantly solve the chicken-and-egg problem introduced into the pmap
with radixtree lookup for pages, where we need PTEs mapped and page
allocations to happen under a single hold of the pmap's lock. While here
undo some cut-n-paste.

- Don't adjust pmap_kernel's stats with atomics, because its mutex is now
held in the places the stats are changed.
 1.107 15-Dec-2019  ad uvm_pagerealloc() can now block because of radixtree manipulation, so defer
freeing PTPs until pmap_unmap_ptes(), where we still have the pmap locked
but can finally tolerate context switches again.

To be revisited soon: pmap_map_ptes() seems broken WRT other pmap load.

Reported-by: syzbot+689fb7dab41abff8e75a@syzkaller.appspotmail.com
Reported-by: syzbot+3e7bbf37d37d451b25d7@syzkaller.appspotmail.com
 1.106 08-Dec-2019  ad Merge x86 pmap changes from yamt-pagecache:

- Deal better with the multi-level pmap object locking kludge.
- Handle uvm_pagealloc() being able to block.
 1.105 14-Nov-2019  maxv Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is substantial, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
 1.104 13-Nov-2019  maxv Rename:
PP_ATTRS_M -> PP_ATTRS_D
PP_ATTRS_U -> PP_ATTRS_A
For consistency.
 1.103 05-Oct-2019  maxv Switch to the new PTE naming. No binary diff (tested with MKREPRO).
 1.102 07-Aug-2019  maxv Add support for USER_LDT in SVS. This allows us to have both enabled at
the same time.

We allocate an LDT for each CPU in the GDT and map an area for it, in
addition to the default LDT already present. In context switches between
different processes, we choose between the default or the per-cpu LDT
selector: if the user set specific LDT entries, we memcpy them to the
per-cpu LDT and load the per-cpu selector.

Tested by Naveen Narayanan (with Wine on amd64).
 1.101 29-May-2019  maxv branches: 1.101.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.
 1.100 10-Mar-2019  maxv Two changes:

* Allow large pages to be passed in pmap_pdes_valid, this happens under
DDB when it reads RIP (.text), called via pmap_extract.

* Invert a branch in pmap_extract, so that 'l_cpu' is not touched if we're
dealing with the kernel pmap.

This fixes 'boot -d'.
 1.99 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.98 23-Feb-2019  maxv Move PATENTRY into pmap.h, will be used outside.
 1.97 13-Feb-2019  maxv Add the EPT pmap code, used by Intel-VMX.

The idea is that under NVMM, we don't want to implement the hypervisor page
tables manually in NVMM directly, because we want pageable guests; that is,
we want to allow UVM to unmap guest pages when the host comes under
pressure.

Contrary to AMD-SVM, Intel-VMX uses a different set of PTE bits from
native, and this has three important consequences:

- We can't use the native PTE bits, so each time we want to modify the
page tables, we need to know whether we're dealing with a native pmap
or an EPT pmap. This is accomplished with callbacks, that handle
everything PTE-related.

- There is no recursive slot possible, so we can't use pmap_map_ptes().
Rather, we walk down the EPT trees via the direct map, and that's
actually a lot simpler (and probably faster too...).

- The kernel is never mapped in an EPT pmap. An EPT pmap cannot be loaded
on the host. This has two sub-consequences: at creation time we must
zero out all of the top-level PTEs, and at destruction time we force
the page out of the pool cache and into the pool, to ensure that a next
allocation will invoke pmap_pdp_ctor() to create a native pmap and not
recycle some stale EPT entries.

To create an EPT pmap, the caller must invoke pmap_ept_transform() on a
newly-allocated native pmap. And that's about it, from then on the EPT
callbacks will be invoked, and the pmap can be destroyed via the usual
pmap_destroy(). The TLB shootdown callback is not initialized however,
it is the responsibility of the hypervisor (NVMM) to set it.

There are some twisted cases that we need to handle. For example if
pmap_is_referenced() is called on a physical page that is entered both by
a native pmap and by an EPT pmap, we take the Accessed bits from the
two pmaps using different PTE sets in each case, and combine them into a
generic PP_ATTRS_U flag (that does not depend on the pmap type).

Given that the EPT layout is a 4-Level tree with the same address space as
native x86_64, we allow ourselves to use a few native macros in EPT, such
as pmap_pa2pte(), rather than re-defining them with "ept" in the name.

Even though this EPT code is rather complex, it is not too intrusive: just
a few callbacks in a few pmap functions, predicted-false to give priority
to native. So this comes with no messy #ifdef or performance cost.
 1.96 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.95 01-Feb-2019  maxv Add the remaining pmap callbacks, will be used by NVMM-VMX.
 1.94 01-Feb-2019  maxv Change the format of the pp_attrs field: instead of using PTE bits
directly, use abstracted bits that are converted from/to PTE bits when
needed (in pmap_sync_pv).

This allows us to use the same pp_attrs for pmaps that have PTE bits at
different locations.
 1.93 17-Dec-2018  maxv Add two pmap fields, will be used by NVMM-VMX. Also apply a few cosmetic
changes.
 1.92 06-Dec-2018  maxv Fix inconsistency, these are indexes and not types, no real functional
change.
 1.91 19-Nov-2018  maxv Introduce pl_pi, will be used soon.
 1.90 19-Nov-2018  maxv Rename 'mask' -> 'frame', we will use the real 'mask' soon.
 1.89 07-Nov-2018  maxv Add two pmap fields, will be used by NVMM.
 1.88 29-Aug-2018  maxv clean up a little
 1.87 29-Aug-2018  maxv Remove the constants of the DMAP, they are unused, and move NL4_SLOT_DIRECT
into amd64/.
 1.86 29-Aug-2018  maxv Simplify the ASLR stuff, we don't care about resizable areas now, and it
makes the code more complicated for no good reason.
 1.85 20-Aug-2018  maxv Add support for kASan on amd64. Written by me, with some parts inspired
by Siddharth Muralee's initial work. This feature can detect several
kinds of memory bugs, and it's an excellent feature.

It can be enabled by uncommenting these three lines in GENERIC:

#makeoptions KASAN=1 # Kernel Address Sanitizer
#options KASAN
#no options SVS

The kernel is compiled without SVS, without DMAP and without PCPU area.
A shadow area is created at boot time, and it can cover the upper 128TB
of the address space. This area is populated gradually as we allocate
memory. With this design the memory consumption is kept at its lowest
level.

The compiler calls the __asan_* functions each time a memory access is
done. We verify whether this access is legal by looking at the shadow
area.

We declare our own special memcpy/memset/etc functions, because the
compiler's builtins don't add the __asan_* instrumentation.

Initially all the mappings are marked as valid. During dynamic
allocations, we add a redzone, which we mark as invalid. Any access on
it will trigger a kASan error message. Additionally, the compiler adds
a redzone on global variables, and we mark these redzones as invalid too.
The illegal-access detection works with a 1-byte granularity.

For now, we cover three areas:

- global variables
- kmem_alloc-ated areas
- malloc-ated areas

More will come, but that's a good start.
 1.84 12-Aug-2018  maxv Move the PCPU area from slot 384 to slot 510, to avoid creating too much
fragmentation in the slot space (384 is in the middle of the kernel half
of the VA).
 1.83 12-Aug-2018  maxv Randomize the main memory on Xen, same as native. Tested on amd64-dom0.
 1.82 12-Aug-2018  maxv Add a new area, SLAREA_HYPV, which indicates the slots used by the
hypervisor, in our case Xen.
 1.81 21-Jul-2018  maxv More ASLR. Randomize the location of the direct map at boot time on amd64.
This doesn't need "options KASLR" and works on GENERIC. Will soon be
enabled by default.

The location of the areas is abstracted in a slotspace structure. Ideally
we should always use this structure when touching the L4 slots, instead of
the current cocktail of global variables and constants.

machdep initializes the structure with the default values, and we then
randomize its dmap entry. Ideally machdep should randomize everything at
once, but in the case of the direct map its size is determined a little
later in the boot procedure, so we're forced to randomize its location
later too.
 1.80 20-Jun-2018  maxv branches: 1.80.2;
Add and use bootspace.smodule. Initialize it in locore/prekern to better
hide the specifics from the "upper" layers. This allows for greater
flexibility.
 1.79 19-May-2018  jakllsch remove some remaining uvm_emap(9)-related function prototypes
 1.78 19-May-2018  jdolecek Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.77 08-May-2018  maxv Mitigation for the SS bug, CVE-2018-8897. We disabled dbregs a month ago
in -current and -8 so we are not particularly affected anymore.

The #DB handler runs on ist3, if we decide to process the exception we
copy the iret frame on the correct non-ist stack and continue as usual.
 1.76 04-Mar-2018  jdolecek branches: 1.76.2;
drop pmap_update_2pg(), just call pmap_update_pg() separately for each
 1.75 18-Jan-2018  maxv Unmap the kernel heap from the user page tables (SVS).

This implementation is optimized and organized in such a way that we
don't need to copy the kernel stack to a safe place during user<->kernel
transitions. We create two VAs that point to the same physical page; one
will be mapped in userland and is offset in order to contain only the
trapframe, the other is mapped in the kernel and maps the entire stack.

Sent on tech-kern@ a week ago.
 1.74 11-Jan-2018  maxv Add ist0 to pcpu_entry.
 1.73 05-Jan-2018  maxv Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
 1.72 28-Dec-2017  maxv Use variables in PMAP_DIRECT_*, so that the location of the direct map can
change.
 1.71 11-Nov-2017  maxv Modify the layout of the bootspace structure, in such a way that it can
contain several kernel segments of the same type (eg several .text
segments). Some parts are still a bit messy but will be cleaned up soon.

I cannot compile-test this change on i386, but it seems fine enough.

NOTE: you need to rebuild and reinstall a new prekern after this change.
 1.70 29-Oct-2017  maxv Add a fifth region, called "head". On kaslr kernels it contains the ELF
Header and the ELF Section Headers. On normal kernels it is empty (the
headers are in the "boot" region).

Note: if you're using GENERIC_KASLR, you also need to rebuild the prekern.
 1.69 30-Sep-2017  maxv Add a bootspace structure. It describes the physical and virtual space
layout created by the early kernel bootstrap code. Start using it, and
eliminate several references to KERNBASE and other global symbols. While
here clean up xen-i386, it's really tiring.
 1.68 29-Sep-2017  ozaki-r Fix build

sys/arch/x86/x86/cpu.c:920:20: error: 'pmap_largepages' undeclared (first use in this function)
smp_data.large = (pmap_largepages != 0);
^
 1.67 17-Jun-2017  maxv Actually, use slot 456 instead, so that it fits a cache line.
 1.66 14-Jun-2017  maxv Give the direct map 32 slots (16TB of va). This matches MAXPHYSMEM, in
such a way that the direct map is no longer the limiting factor for high
memory systems.
 1.65 14-Jun-2017  maxv Move the direct map from slot 509 to slot 460. We will increase its size
dynamically.
 1.64 23-Mar-2017  maxv branches: 1.64.6;
Remove PG_k completely.
 1.63 05-Mar-2017  maxv Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.

On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.

However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.

Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.

With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.62 11-Feb-2017  maxv Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
 1.61 08-Nov-2016  christos branches: 1.61.2;
PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.60 19-Sep-2016  maya move function prototype to x86, so it is available to amd64 too
 1.59 25-Jul-2016  maxv The L1 entry of the first page of the data segment is overwritten for the
LAPIC page, and set as RWX+PG_N. The LAPIC pa is fixed, and its va resides
in the data segment. Because of this error-prone design, the kernel image
map is not linear, and I first thought it was a bug (as I vaguely said in
PR/51148). Using large pages for the data segment is therefore wrong, since
the first page does not actually belong to the data segment (even if its va
is in the range). This bug is not triggered currently, since local_apic is
not large-page-aligned.

We will certainly have to allocate a va dynamically instead of using the
first page of data; but for now, disable large pages on the data segment,
and map the LAPIC as RW.

This is the last x86-specific RWX page.
 1.58 01-Jul-2016  maxv branches: 1.58.2;
Define pmap_pg_nx globally. Will be used soon.
 1.57 11-Nov-2015  skrll Split out the pmap_pv_track stuff for use by others.

Discussed with riastradh@
 1.56 03-Apr-2015  riastradh Implement pmap_pv(9) for x86 for P->V tracking of unmanaged pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
 1.55 17-Oct-2013  christos branches: 1.55.4; 1.55.6;
__USE() unused variables
 1.54 23-Jun-2013  uebayasi branches: 1.54.2;
Remove obsolete comment. OK'ed by rmind@.
 1.53 13-Nov-2012  chs add a pmap_kremove_local() that doesn't do TLB invalidations
on other CPUs. this is only intended for use while writing
kernel crash dumps. remove unused pmap_map().
 1.52 20-Apr-2012  rmind branches: 1.52.2;
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
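
A minimal sketch of the idea behind 1.52, assuming a dynamically sized bitset in place of a fixed-width CPU bitmask. The cpuset_sketch_* names are hypothetical and are not the kcpuset(9) API; the point is only that a set sized at run time carries no built-in limit on the number of CPUs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical CPU set whose size is chosen when it is created. */
struct cpuset_sketch {
	unsigned ncpu;
	unsigned long bits[];  /* flexible array: one bit per CPU */
};

static struct cpuset_sketch *
cpuset_sketch_create(unsigned ncpu)
{
	size_t nwords = (ncpu + BITS_PER_WORD - 1) / BITS_PER_WORD;
	struct cpuset_sketch *cs;

	cs = calloc(1, sizeof(*cs) + nwords * sizeof(unsigned long));
	if (cs != NULL)
		cs->ncpu = ncpu;
	return cs;
}

static void
cpuset_sketch_set(struct cpuset_sketch *cs, unsigned cpu)
{
	assert(cpu < cs->ncpu);
	cs->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static bool
cpuset_sketch_isset(const struct cpuset_sketch *cs, unsigned cpu)
{
	assert(cpu < cs->ncpu);
	return (cs->bits[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL;
}
```

A hardcoded 32-bit mask cannot name CPU 200; the set above can, because its word count follows from the CPU count given at creation.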
 1.51 11-Mar-2012  jym Alternate PTEs got killed a few weeks ago. Clean up unused prototypes.
 1.50 17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.49 04-Dec-2011  chs branches: 1.49.2;
map all of physical memory using large pages.
ported from openbsd years ago by Murray Armfield,
updated for changes since then by me.
 1.48 23-Nov-2011  jym branches: 1.48.2;
No more users of xpmap_update(). Use pmap_pte_*() functions now.
 1.47 23-Nov-2011  jym Move Xen-specific functions to Xen pmap. Requested by cherry@.

Un'ifdef XEN in xen_pmap.c, it is always defined there.
 1.46 20-Nov-2011  jym Expose pmap_pdp_cache publicly to x86/xen pmap. Provide suspend/resume
callbacks for Xen pmap.

Turn static internal callbacks of pmap_pdp_cache.

XXX the implementation of pool_cache_invalidate(9) is still wrong, and
IMHO this needs fixing before -6. See
http://mail-index.netbsd.org/tech-kern/2011/11/18/msg011924.html
 1.45 08-Nov-2011  cherry Expose the PG_k #define pt/pd bit to both xen and "baremetal" x86. This is required, since kernel pages are mapped with user permissions in XEN/amd64 since the VM kernel runs in ring3. Since XEN/i386 (including PAE) runs in ring1, supervisor mode is appropriate for these ports. We need to share this since the pmap implementation is still shared. Once the xen implementation is sufficiently independent of the x86 one, this can be made private to xen/include/xenpmap.h
 1.44 06-Nov-2011  cherry [merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
 1.43 18-Oct-2011  jym branches: 1.43.2;
Make "pmaps" (list of non-kernel pmaps) and "pmaps_lock" externally
visible. Required by pmap MD code that could reside in other
files, notably Xen's pmap.
 1.42 20-Sep-2011  jym Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.41 13-Aug-2011  cherry Add locking around ops to the hypervisor MMU "queue".
 1.40 13-Jun-2011  tls Fix Xen kernel builds (pmap_is_curpmap can't be static)
 1.39 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.38 07-May-2011  jym branches: 1.38.2;
Do as the comment says, use ilog2(). This gets optimized directly at
compile time, no call to fls() is needed.
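
A minimal sketch of why a constant-argument ilog2() needs no run-time call, as 1.38 notes. The ILOG2_* macros and ilog2_runtime() below are hypothetical illustrations and are not the NetBSD definitions; the macro chain is an integer constant expression, so the compiler folds it completely.

```c
#include <assert.h>
#include <stdint.h>

/* Binary search over the bit positions; every step is a constant expression. */
#define ILOG2_2(n)  (((n) & 0x2u) ? 1 : 0)
#define ILOG2_4(n)  (((n) & 0xCu) ? 2 + ILOG2_2((n) >> 2) : ILOG2_2(n))
#define ILOG2_8(n)  (((n) & 0xF0u) ? 4 + ILOG2_4((n) >> 4) : ILOG2_4(n))
#define ILOG2_16(n) (((n) & 0xFF00u) ? 8 + ILOG2_8((n) >> 8) : ILOG2_8(n))
#define ILOG2_32(n) (((n) & 0xFFFF0000u) ? 16 + ILOG2_16((n) >> 16) : ILOG2_16(n))

/* Run-time equivalent, comparable to what an fls()-style loop computes. */
static unsigned
ilog2_runtime(uint32_t n)
{
	unsigned r = 0;

	while ((n >>= 1) != 0)
		r++;
	return r;
}

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the example */

/* Checked by the compiler: no code is generated for the logarithm. */
_Static_assert(ILOG2_32(SKETCH_PAGE_SIZE) == 12,
    "page shift folds at compile time");
```

Both forms return floor(log2(n)) for n > 0; only the macro form can appear in a static assertion or an array bound.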
 1.37 25-Apr-2011  yamt comment
 1.36 25-Apr-2011  yamt remove unused ptei
 1.35 11-Feb-2011  jmcneill add bus_space_mmap support for BUS_SPACE_MAP_PREFETCHABLE, ok matt@
 1.34 01-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
remove no-longer-valid wustl email address for me.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.33 24-Jul-2010  jym branches: 1.33.2; 1.33.4;
Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
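
The address-space arithmetic in the 1.33 message can be checked with a short worked example. This is a sketch under the figures stated in the message (36-bit physical addresses, 32-bit virtual addresses, a 4-page L2) plus the standard PAE layout of 4 KiB pages with 8-byte entries; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE    4096u  /* 4 KiB pages */
#define SK_PAE_PTE_SIZE 8u     /* entries are 64 bits wide under PAE */
#define SK_PAE_PA_BITS  36u    /* physical address width from the message */
#define SK_PAE_L2_PAGES 4u     /* "the L2 is 4 pages long" */

/* Highest addressable physical byte + 1: 2^36 = 64 GiB. */
static uint64_t
pae_phys_limit(void)
{
	return (uint64_t)1 << SK_PAE_PA_BITS;
}

/* 4096 / 8 = 512 entries fit in one page-table or page-directory page. */
static uint64_t
pae_entries_per_page(void)
{
	return SK_PAGE_SIZE / SK_PAE_PTE_SIZE;
}

/* Virtual range mapped by the whole 4-page L2. */
static uint64_t
pae_va_covered_by_l2(void)
{
	/* One L2 entry maps one L1 page: 512 PTEs x 4 KiB = 2 MiB. */
	uint64_t per_l2_entry = pae_entries_per_page() * SK_PAGE_SIZE;

	/* 4 pages x 512 entries x 2 MiB = 4 GiB. */
	return SK_PAE_L2_PAGES * pae_entries_per_page() * per_l2_entry;
}
```

The four L2 pages together cover exactly the 32-bit virtual address space, which is why the rest of the pmap can keep treating the MMU as a 2-level one.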
 1.32 15-Jul-2010  jym Make the comment about PDPpaddr more thorough.
 1.31 06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.30 10-May-2010  dyoung Provide pmap_enter_ma(), pmap_extract_ma(), pmap_kenter_ma() in all x86
kernels, and use them in the bus_space(9) implementation instead of ugly
Xen #ifdef-age. In a non-Xen kernel, the _ma() functions either call or
alias the equivalent _pa() functions.

Reviewed on port-xen@netbsd.org and port-i386@netbsd.org. Passes
rmind@'s and bouyer@'s inspection. Tested on i386 and on Xen DOMU /
DOM0.
 1.29 09-Feb-2010  jym branches: 1.29.2;
Fix typos in comments.
 1.28 11-Nov-2009  cegger branches: 1.28.2;
update comment: we use PMAP_NOCACHE for both pmap_enter and pmap_kenter_pa
 1.27 07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.26 19-Jul-2009  rmind pmap_emap_sync: add an argument, and do not perform pmap_load() during
context switch (pmap_destroy() path seems to be unsafe), instead just
perform tlbflush(). Slightly inefficient, but good enough for now.
 1.25 28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
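
A simplified sketch of the generation-number idea from 1.25. Everything here is an assumption made for illustration: the names, the fixed CPU count, and the exact rule (a CPU needs an explicit flush only if it has not flushed since the mapping's generation was issued). The real emap code may track this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU_SKETCH 4  /* hypothetical fixed CPU count */

/* Bumped every time an ephemeral mapping is produced. */
static uint64_t emap_gen_global = 1;

/* Global generation observed at each CPU's most recent TLB flush. */
static uint64_t emap_gen_cpu[NCPU_SKETCH];

/* Tag a new ephemeral mapping with a fresh generation number. */
static uint64_t
emap_sketch_produce(void)
{
	return ++emap_gen_global;
}

/* Called whenever a CPU flushes its TLB for any reason. */
static void
emap_sketch_note_flush(unsigned cpu)
{
	emap_gen_cpu[cpu] = emap_gen_global;
}

/* An explicit flush is needed only if no flush has happened since 'gen'. */
static bool
emap_sketch_needs_flush(unsigned cpu, uint64_t gen)
{
	return emap_gen_cpu[cpu] < gen;
}
```

If unrelated activity has already flushed the TLB on a CPU, the consumer of the mapping can skip its own flush there, which is the side effect the commit message relies on.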
 1.24 22-Apr-2009  cegger change pmap flags argument from int to u_int.
forgot to commit this.
 1.23 18-Apr-2009  cegger Introduce PMAP_NOCACHE as first PMAP MD bit in x86. Make use of it in pmap_enter().
This saves one extra TLB flush when mapping dma-safe memory.
Presented on tech-kern@, port-i386@ and port-amd64@
ok ad@
 1.22 21-Mar-2009  ad PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash

Fix numerous problems:

1. LDT updates are not atomic.

2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. A system with high maxprocs can be panicked.

3. LDTR can be leaked over context switch.

4. GDT slot allocations can race, giving the same LDT slot to two procs.

5. Incomplete interrupt/trap frames can be stacked.

6. In some rare cases segment faults are not handled correctly.
 1.21 09-Dec-2008  pooka branches: 1.21.2;
Make pmap_kernel() a MI macro for struct pmap *kernel_pmap_ptr,
which is now the "API" provided by the pmap module. pmap_kernel()
remains as the syntactic sugar.

Bonus cosmetics round: move all the pmap_t pointer typedefs into
uvm_pmap.h.

Thanks to Greg Oster for providing cpu muscle for doing test builds.
 1.20 16-Sep-2008  bouyer branches: 1.20.2; 1.20.4;
Implement the arch-dependent p2m frame lists list. This adds support for
'xm dump-core' for NetBSD domUs.
From Jean-Yves Migeon (jean-yves dot migeon at espci dot fr)
 1.19 24-Jun-2008  jmcneill branches: 1.19.2;
Define PMAP_FORK -- this was lost in the vmlocking merge, and is required
by options USER_LDT.
 1.18 05-Jun-2008  ad branches: 1.18.2;
pmap_remove_all() for x86. Also, always defer freeing ptps to pmap_update().
There may be a better way to do this, but for now this is simple and avoids
potential bugs.

Proposed on tech-kern and discussed with chs@.
 1.17 04-Jun-2008  ad Revert unintentional change.
 1.16 04-Jun-2008  ad vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.15 02-Jun-2008  ad - Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.
 1.14 03-May-2008  ad branches: 1.14.2;
Back out previous which was not thought through properly.
 1.13 03-May-2008  ad Implement pmap_remove_all().
 1.12 23-Jan-2008  bouyer branches: 1.12.6; 1.12.8; 1.12.10;
Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handles the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost support native
PAE, what's missing is bootstrap code in locore.S).
 1.11 20-Jan-2008  yamt - rewrite P->V tracking.
- use a hash rather than SPLAY trees.
SPLAY tree is a wrong algorithm to use here.
will be revisited if it slows down anything other than
micro-benchmarks.
- optimize the single mapping case (it's a common case) by
embedding an entry into mdpage.
- don't keep a pmap pointer as it can be obtained from ptp.
(discussed on port-i386 some years ago.)
ideally, a single paddr_t should be enough to describe a pte.
but it needs some more thoughts as it can increase computational
costs.
- pmap_enter: simplify and fix races with pmap_sync_pv.
- don't bother to lock pm_obj[i] where i > 0, unless DIAGNOSTIC.
- kill mp_link to save space.
- add many KASSERTs.
 1.10 11-Jan-2008  bouyer Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support development).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.9 08-Jan-2008  yamt kill unused PMF_USER_RELOAD.
 1.8 02-Jan-2008  yamt g/c pv_page stuffs.
 1.7 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.6 09-Dec-2007  jmcneill branches: 1.6.2;
Merge jmcneill-pm branch.
 1.5 22-Nov-2007  bouyer branches: 1.5.2; 1.5.4;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.4 15-Nov-2007  ad Remove support for 80386 level CPUs. PR port-i386/36163.
 1.3 07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.2 18-Oct-2007  yamt branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8; 1.2.10;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.
 1.1 08-Oct-2007  yamt branches: 1.1.2; 1.1.4;
file pmap.h was initially added on branch yamt-x86pmap.
 1.1.4.4 18-Nov-2007  bouyer Sync with HEAD
 1.1.4.3 13-Nov-2007  bouyer Sync with HEAD
 1.1.4.2 25-Oct-2007  bouyer Finish sync with HEAD. Especially use the new x86 pmap for xenamd64.
For this:
- rename pmap_pte_set() to pmap_pte_testset()
- make pmap_pte_set() a function or macro for non-atomic PTE write
- define and use pmap_pa2pte()/pmap_pte2pa() to read/write PTE entries
- define pmap_pte_flush() which is a nop in x86 case, and flush the
MMUops queue in the Xen case
 1.1.4.1 25-Oct-2007  bouyer Sync with HEAD.
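
A minimal sketch of the PTE accessor split listed in 1.1.4.2, written for the native (non-Xen) case. All names carry a sketch_ prefix because they are hypothetical stand-ins, not the kernel's definitions; the frame mask is an assumed value, and the atomic exchange uses the GCC/Clang __atomic_exchange_n builtin.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_pt_entry_t;
typedef uint64_t sketch_paddr_t;

#define SKETCH_PG_FRAME 0x000ffffffffff000ULL  /* assumed frame-address mask */

/* Native case: the PTE frame field is simply the physical address. */
static sketch_pt_entry_t
sketch_pa2pte(sketch_paddr_t pa)
{
	return pa & SKETCH_PG_FRAME;
}

static sketch_paddr_t
sketch_pte2pa(sketch_pt_entry_t pte)
{
	return pte & SKETCH_PG_FRAME;
}

/* Non-atomic write: the old value is not needed. */
static void
sketch_pte_set(volatile sketch_pt_entry_t *ptep, sketch_pt_entry_t npte)
{
	*ptep = npte;
}

/* Atomic write that also returns the entry it replaced. */
static sketch_pt_entry_t
sketch_pte_testset(volatile sketch_pt_entry_t *ptep, sketch_pt_entry_t npte)
{
	return __atomic_exchange_n(ptep, npte, __ATOMIC_SEQ_CST);
}

/* Nothing is queued natively; under Xen this would drain the MMU op queue. */
static void
sketch_pte_flush(void)
{
}
```

Keeping the conversion and flush behind small helpers lets the shared pmap code stay the same while a Xen build substitutes machine addresses and queued updates.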
 1.1.2.3 18-Oct-2007  yamt #ifdef out an unused member for x86_64.
 1.1.2.2 14-Oct-2007  yamt move pl_i_roundup to a header.
 1.1.2.1 08-Oct-2007  yamt merge some parts of x86 pmap.h.
 1.2.10.5 23-Mar-2008  matt sync with HEAD
 1.2.10.4 09-Jan-2008  matt sync with HEAD
 1.2.10.3 08-Nov-2007  matt sync with -HEAD
 1.2.10.2 06-Nov-2007  matt sync with HEAD
 1.2.10.1 18-Oct-2007  matt file pmap.h was added on branch matt-armv6 on 2007-11-06 23:23:38 +0000
 1.2.8.4 18-Feb-2008  mjf Sync with HEAD.
 1.2.8.3 27-Dec-2007  mjf Sync with HEAD.
 1.2.8.2 08-Dec-2007  mjf Sync with HEAD.
 1.2.8.1 19-Nov-2007  mjf Sync with HEAD.
 1.2.6.6 04-Feb-2008  yamt sync with head.
 1.2.6.5 21-Jan-2008  yamt sync with head
 1.2.6.4 07-Dec-2007  yamt sync with head
 1.2.6.3 15-Nov-2007  yamt sync with head.
 1.2.6.2 27-Oct-2007  yamt sync with head.
 1.2.6.1 18-Oct-2007  yamt file pmap.h was added on branch yamt-lazymbuf on 2007-10-27 11:28:56 +0000
 1.2.4.6 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.4.5 21-Nov-2007  joerg Sync with HEAD.
 1.2.4.4 11-Nov-2007  joerg Sync with HEAD.
 1.2.4.3 28-Oct-2007  joerg Cosmetic: reduce diff to HEAD.
 1.2.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.2.4.1 18-Oct-2007  joerg file pmap.h was added on branch jmcneill-pm on 2007-10-26 15:43:44 +0000
 1.2.2.4 03-Dec-2007  ad Sync with HEAD.
 1.2.2.3 24-Oct-2007  ad Use a pool_cache to allocate pv entries. PR port-i386/37193.
 1.2.2.2 23-Oct-2007  ad Sync with head.
 1.2.2.1 18-Oct-2007  ad file pmap.h was added on branch vmlocking on 2007-10-23 20:36:40 +0000
 1.5.4.1 11-Dec-2007  yamt sync with head.
 1.5.2.1 26-Dec-2007  ad Sync with head.
 1.6.2.7 20-Jan-2008  bouyer Sync with HEAD
 1.6.2.6 17-Jan-2008  bouyer - Fix L2_SLOT_APTE value (not sure how I got this value but it was definitely
wrong)
- Use global variable for the PAE L3 page addresses, so that pmap.c can get it
from the bootstrap code
- Extend the size of our virtual PDP from 3 to 4 pages, so that pmap->pm_pdir[]
is contiguous for the whole VA range. The last page is a shadow of
the kernel's real PDP (L3[3]).
- make pm_pdirpa an array of 4 paddr_t if using PAE. introduce a
pmap_pdirpa macro to get the physical address of a given PD entry.
- fix pmap_map_pte

The kernel now boots single-user. fsck will cause a kernel fault in
pmap_pdes_invalid() on exit.
 1.6.2.5 13-Jan-2008  bouyer Work in progress on xeni386 PAE support:
Make xeni386 build with a 64bit paddr_t. For this vaddr_t vs paddr_t vs
pointers usages had to be clarified.
If 'options PAE' is present in a Xen3 kernel, switch paddr_t, pd_entry_t
and pt_entry_t to 64bits, and add the PAE entry in the __xen_guest ELF section.
 1.6.2.4 10-Jan-2008  bouyer Sync with HEAD
 1.6.2.3 02-Jan-2008  bouyer Sync with HEAD
 1.6.2.2 13-Dec-2007  bouyer - make amd64 XEN3 kernels build again
- pin the pdp pages in the PDP cache constructor, and unpin them in the
destructor. garbage-collect PMF_USER_XPIN.
 1.6.2.1 11-Dec-2007  bouyer Switch i386 to x86/x86/pmap.c
 1.12.10.5 11-Aug-2010  yamt sync with head.
 1.12.10.4 11-Mar-2010  yamt sync with head
 1.12.10.3 19-Aug-2009  yamt sync with head.
 1.12.10.2 18-Jul-2009  yamt sync with head.
 1.12.10.1 04-May-2009  yamt sync with head.
 1.12.8.2 17-Jun-2008  yamt sync with head.
 1.12.8.1 04-Jun-2008  yamt sync with head
 1.12.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.12.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.12.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.12.6.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.14.2.3 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.14.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.14.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.18.2.1 27-Jun-2008  simonb Sync with head.
 1.19.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.19.2.1 19-Oct-2008  haad Sync with HEAD.
 1.20.4.1 04-Apr-2009  snj Pull up following revision(s) (requested by ad in ticket #656):
sys/arch/amd64/amd64/gdt.c: revision 1.21 via patch
sys/arch/amd64/amd64/machdep.c: revision 1.129 via patch
sys/arch/i386/i386/gdt.c: revision 1.47 via patch
sys/arch/i386/i386/kvm86.c: revision 1.17 via patch
sys/arch/i386/i386/locore.S: revision 1.85 via patch
sys/arch/i386/i386/machdep.c: revision 1.666 via patch
sys/arch/i386/i386/vector.S: revision 1.45 via patch
sys/arch/i386/include/pcb.h: revision 1.47 via patch
sys/arch/x86/include/pmap.h: revision 1.22 via patch
sys/arch/x86/include/sysarch.h: revision 1.8 via patch
sys/arch/x86/x86/pmap.c: revision 1.80 via patch
sys/arch/x86/x86/sys_machdep.c: revision 1.17 via patch
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.143 via patch
sys/kern/init_main.c: revision 1.384 via patch
PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash
Fix numerous problems:
1. LDT updates are not atomic.
2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. A system with high maxprocs can be panicked.
3. LDTR can be leaked over context switch.
4. GDT slot allocations can race, giving the same LDT slot to two procs.
5. Incomplete interrupt/trap frames can be stacked.
6. In some rare cases segment faults are not handled correctly.
 1.20.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.20.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.21.2.11 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.21.2.10 26-May-2011  jym Pull-up some modifications from -current to my branch.
 1.21.2.9 02-May-2011  jym Sync with head.
 1.21.2.8 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.21.2.7 24-Oct-2010  jym Sync with HEAD
 1.21.2.6 01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.21.2.5 01-Nov-2009  jym Sync with HEAD.
 1.21.2.4 24-Jul-2009  jym - rework the page pinning API, so that now a function is provided for
each level of indirection encountered during virtual memory translations. Update
pmap accordingly. Pinning looks cleaner that way, and it offers the possibility
to pin lower level pages if necessary (NetBSD does not do it currently).

- some fixes and comments to explain how page validation/invalidation take
place during save/restore/migrate under Xen. L2 shadow entries from PAE are now
handled, so basically, suspend/resume works with PAE.

- fixes an issue reported by Christoph (cegger@) for xencons suspend/resume
in dom0.

TODO:

- PAE save/restore is currently limited to single-user only, multi-user
support requires modifications in PAE pmap that should be discussed first. See
the comments about the L2 shadow pages cached in pmap_pdp_cache in this commit.

- grant table bug is still there; do not use the kernels of this branch
to test suspend/resume, unless you want to experience bad crashes in dom0,
and push the big red button.

Now there is light at the end of the tunnel :)

Note: XEN2 kernels will neither build nor work with this branch.
 1.21.2.3 23-Jul-2009  jym Sync with HEAD.
 1.21.2.2 31-May-2009  jym Modifications for the Xen suspend/migrate/resume branch:

- introduce xenbus_device_{suspend,resume}() functions. These are routines
used to suspend/resume MI parts of the Xenbus device interfaces, like updating
frontend/backend devices' paths found in XenStore.

- introduce HYPERVISOR_sysctl(), a hypercall used only by Xentools to obtain
information from hypervisor (listing VMs, printing console, etc.). I use it
to query xenconsole from ddb(), as a last resort in case of a panic() in
dom0 (xm being not available). Currently unused in the branch; could be, if
requested.

- disable the rwlock(9) used to protect code that could use transient MFNs.
It could trigger nasty context switches in places where it should not.

- fix some bugs in the xennet/xbd suspend/resume pmf(9) handlers.

- following XenSource's design, talk_to_otherend() is now called
watch_otherend(), and free_otherend_details() is used by Xenbus device
suspend/resume routines.

- some slight modifications in pmap regarding APDP. Introduce an inline
function (pmap_unmap_apdp_pde()) that clears APDP entry for the current pmap.

- similarly, implement pmap_unmap_all_apdp_pdes() that iterates through all
pmaps and tears down APDP, as Xen does not handle them properly.

TODO/XXX:

- pmap_unmap_apdp_pde() does not handle APDP shadow entry of PAE. It will,
once I figure out how PAE uses it.

- revisit the pmap locking issue regarding transient MFNs. As NetBSD does not
use kernel preemption and MP for Xen, this could be skipped momentarily. See
http://mail-index.netbsd.org/port-xen/2009/04/27/msg004903.html for details.

- fix a bug regarding grant tables which could technically DoS a dom0 if
ridiculously high consumer/producer indexes are passed down in the ring during
a resume.

All in all, once the grant table index issue and APDP PAE are fixed, next step
is to torture test this branch.

Tested under i386 PAE and non-PAE, Xen3 dom0 and domU. amd64 is only compile
tested.
 1.21.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.28.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.28.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.29.2.11 31-May-2011  rmind sync with head
 1.29.2.10 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.29.2.9 17-Mar-2011  rmind - Fix tlbflushg() to behave like tlbflush(), if page global extension (PGE)
is not (yet) enabled. This fixes the issue of stale TLB entry, experienced
early on boot, when PGE is not yet set on primary CPU.
- Rewrite i386/amd64 TLB interrupt handlers in C (only stubs are in assembly),
which simplifies and unifies (under x86) code, plus fixes few bugs.
- cpu_attach: remove assignment to cpus_running, as primary CPU might not be
attached first, which causes reset (and thus missed secondary CPUs).
 1.29.2.8 08-Mar-2011  rmind struct pmap_tlb_mailbox: make tm_pending and tm_gen volatile.
 1.29.2.7 05-Mar-2011  rmind sync with head
 1.29.2.6 31-May-2010  rmind - Split off Xen versions of pmap_map_ptes/pmap_unmap_ptes into Xen pmap,
also move pmap_apte_flush() with pmap_unmap_apdp() there.
- Make Xen buildable.
 1.29.2.5 30-May-2010  rmind sync with head
 1.29.2.4 26-May-2010  rmind Split x86 TLB shootdown code into a separate file.
Code part is under TNF license, as per pmap.c 1.105.2.4 revision.
 1.29.2.3 26-Apr-2010  rmind Partly rewrite amd64 TLB shootdown handler for the changes in x86 pmap.
At this point, branch seems to pass preliminary stress tests on amd64.
 1.29.2.2 26-Apr-2010  rmind Apply renovated patch to significantly reduce TLB shootdowns in x86 pmap,
also provide TLBSTATS option to measure and track TLB shootdowns. Details:

http://mail-index.netbsd.org/port-i386/2009/01/11/msg001018.html

Patch from Andrew Doran, proposed on tech-x86 [sic], in January 2009.

XXX: amd64 and xen are not yet; work in progress.
 1.29.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.33.4.2 17-Feb-2011  bouyer Sync with HEAD
 1.33.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.33.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.38.2.3 20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.38.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.38.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.43.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.43.2.5 16-Jan-2013  yamt sync with (a bit old) head
 1.43.2.4 23-May-2012  yamt sync with head.
 1.43.2.3 17-Apr-2012  yamt sync with head
 1.43.2.2 18-Nov-2011  yamt share a lock among pmap uobjs
 1.43.2.1 10-Nov-2011  yamt sync with head
 1.48.2.3 29-Apr-2012  mrg sync to latest -current.
 1.48.2.2 05-Apr-2012  mrg sync to latest -current.
 1.48.2.1 18-Feb-2012  mrg merge to -current.
 1.49.2.3 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1441):
sys/arch/x86/x86/pmap.c: revision 1.241 via patch
sys/arch/x86/include/pmap.h: revision 1.63 via patch
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.49.2.2 09-May-2012  riz branches: 1.49.2.2.4; 1.49.2.2.6;
Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.49.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batches of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.49.2.2.6.1 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1441):
sys/arch/x86/x86/pmap.c: revision 1.241 via patch
sys/arch/x86/include/pmap.h: revision 1.63 via patch
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.49.2.2.4.1 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1441):
sys/arch/x86/x86/pmap.c: revision 1.241 via patch
sys/arch/x86/include/pmap.h: revision 1.63 via patch
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.52.2.3 03-Dec-2017  jdolecek update from HEAD
 1.52.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.52.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.54.2.1 18-May-2014  rmind sync with head
 1.55.6.6 28-Aug-2017  skrll Sync with HEAD
 1.55.6.5 05-Dec-2016  skrll Sync with HEAD
 1.55.6.4 05-Oct-2016  skrll Sync with HEAD
 1.55.6.3 09-Jul-2016  skrll Sync with HEAD
 1.55.6.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.55.6.1 06-Apr-2015  skrll Sync with HEAD
 1.55.4.3 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1388):
sys/arch/x86/x86/pmap.c: revision 1.241
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.55.4.2 18-Dec-2016  snj Pull up following revision(s) (requested by riastradh in ticket #1316):
sys/arch/x86/x86/pmap.c: revision 1.223
sys/arch/x86/x86/vm_machdep.c: revision 1.26
sys/arch/x86/include/pmap.h: revision 1.61
PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.55.4.1 23-Apr-2015  snj branches: 1.55.4.1.2; 1.55.4.1.4;
Pull up following revision(s) (requested by mrg in ticket #718):
sys/arch/x86/include/pmap.h: revision 1.56
sys/arch/x86/x86/pmap.c: revision 1.188
sys/dev/pci/agp_amd64.c: revision 1.8
sys/dev/pci/agp_i810.c: revision 1.118
sys/external/bsd/drm2/dist/drm/i915/i915_dma.c: revision 1.16
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: revision 1.29
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_agp.c: revision 1.3
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_ttm.c: revision 1.4
sys/external/bsd/drm2/dist/drm/radeon/atombios_crtc.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_agp.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_display.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_legacy_crtc.c: revision 1.2
sys/external/bsd/drm2/dist/drm/radeon/radeon_object.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_ttm.c: revision 1.7
sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c: revisions 1.7-1.10
sys/external/bsd/drm2/dist/drm/ttm/ttm_bo_util.c: revision 1.5
sys/external/bsd/drm2/i915drm/intelfb.c: revision 1.13
sys/external/bsd/drm2/include/drm/drm_wait_netbsd.h: revisions 1.12, 1.13
sys/external/bsd/drm2/include/linux/mm.h: revision 1.5
sys/external/bsd/drm2/include/linux/pci.h: revisions 1.16, 1.17
sys/external/bsd/drm2/nouveau/nouveaufb.c: revision 1.2
sys/external/bsd/drm2/radeon/radeon_pci.c: revisions 1.8, 1.9
sys/uvm/uvm_init.c: revision 1.46
Hack against the blank console problem:
Leave the CLUT alone on ancient cards. At least this leaves us with a
semi working console (red and blue are flipped). Leave an example of what
seems to be happening but disable it because colors are better than 444 bit
greyscale.
--
Initialize P->V tracking for unmanaged device pages in uvm_init.

Conditional on __HAVE_PMAP_PV_TRACK until we add it to all pmaps.

MI part of pmap_pv(9) change proposed on tech-kern:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
--
Implement pmap_pv(9) for x86 for P->V tracking of unmanaged pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
--
Use pmap_pv(9) to remove mappings of Intel graphics aperture pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html

Further background at:

https://mail-index.netbsd.org/tech-kern/2014/07/23/msg017392.html
--
Use pmap_pv(9) to remove mappings of device pages in TTM.

Adapt nouveau and radeon to do pmap_pv_track for their device pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html

Further background at:

https://mail-index.netbsd.org/tech-kern/2014/07/23/msg017392.html
--
Fix error branches in agp_amd64.c.

- agp_generic_detach always.
- Free asc if it was allocated. (Found by Brainy, noted by maxv@.)
- Free the GATT if it was allocated.
--
pmf_device_register returns false on failure, not true
--
In DRM_SPIN_WAIT_ON, don't stop after waiting only one tick.

Continue the loop to recheck the condition and count the whole
duration.
--
Don't use the video BIOS memory as an i915 flush page!
--
Don't let anyone else allocate the video BIOS either.
--
Missed a zero: it's 0x100000, not 0x10000.
--
Don't reserve if atomic -- caller must have pre-pinned the buffer.
--
Don't reserve if atomic -- caller must have pre-pinned the buffer.
--
almost add radeondrmkms suspend/resume support. it unfortunately doesn't work.
--
Need the page's uvm object lock to do pmap_page_protect.
--
Use KASSERTMSG to show bad base/offset.
--
KASSERT about page-alignment on initialization too.
--
Don't break when hardclock_ticks wraps around.

Since we now only count time spent in wait, rather than determining
the end time and checking whether we've passed it, timeouts might be
marginally longer in effect. Unlikely to be an issue.
--
Remove broken drm2 vm_mmap stub. Can't possibly have ever worked.
--
apply some of the additional changes from Arto Huusko in PR#49645:
- call pmf_device_deregister on detach.

i've kept the "resume = true" for radeon_resume_kms() call as it
seems to work for me (indeed, code inspection shows it is unused
on netbsd :-)

my old nforce4 box that can resume old drm (or could, last i tried
several years ago) while X and GL apps were running, can at least
survive a resume if X hasn't started. my one attempt so far with
X exited, but having run, did not work.
--
First attempt to make ttm_buffer_object_transfer less bogus.
--
Make sure mem.bus.is_iomem is initialized. PR 49833
 1.55.4.1.4.2 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.55.4.1.4.1 18-Jan-2017  skrll Sync with netbsd-5
 1.55.4.1.2.2 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1388):
sys/arch/x86/include/pmap.h: revision 1.63 via patch
sys/arch/x86/x86/pmap.c: revision 1.241 via patch
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.55.4.1.2.1 18-Dec-2016  snj Pull up following revision(s) (requested by riastradh in ticket #1316):
sys/arch/x86/x86/pmap.c: revision 1.223
sys/arch/x86/x86/vm_machdep.c: revision 1.26
sys/arch/x86/include/pmap.h: revision 1.61
PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.58.2.5 26-Apr-2017  pgoyette Sync with HEAD
 1.58.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.58.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.58.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.58.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.61.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.64.6.2 22-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #652:

sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch)
sys/arch/amd64/amd64/db_machdep.c 1.6 (patch)
sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch)
sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch)
sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch)
sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch)
sys/arch/amd64/amd64/vector.S upto 1.61 (partial, patch)
sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch)
sys/arch/amd64/conf/kern.ldscript 1.26 (patch)
sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch)
sys/arch/amd64/include/param.h 1.25 (patch)
sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch)
sys/arch/x86/conf/files.x86 1.91,1.93 (patch)
sys/arch/x86/include/cpu.h 1.88,1.89 (patch)
sys/arch/x86/include/pmap.h 1.75 (patch)
sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch)
sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch)
sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch)
sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch)
sys/arch/x86/x86/svs.c 1.1-1.14
sys/arch/xen/conf/files.compat 1.30 (patch)

Backport SVS. Not enabled yet.
 1.64.6.1 16-Mar-2018  martin Pull up the following revisions (via patch), requested by maxv in #635:

sys/arch/amd64/amd64/gdt.c 1.39-1.45 (patch)
sys/arch/amd64/amd64/machdep.c 1.284,1.287,1.288 (patch)
sys/arch/amd64/include/param.h 1.23 (patch)
sys/arch/amd64/include/types.h 1.53 (patch)
sys/arch/x86/include/cpu.h 1.87 (patch)
sys/arch/x86/include/pmap.h 1.73,1.74 (patch)
sys/arch/x86/x86/cpu.c 1.142 (patch)
sys/arch/x86/x86/intr.c 1.117 (partial),1.120 (patch)
sys/arch/x86/x86/pmap.c 1.276 (patch)

Initialize ist0 in cpu_init_tss.
Backport __HAVE_PCPU_AREA.
 1.76.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.76.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.76.2.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.76.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.76.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.76.2.1 21-May-2018  pgoyette Sync with HEAD
 1.80.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.80.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.80.2.1 10-Jun-2019  christos Sync with HEAD
 1.101.2.1 31-May-2020  martin Pull up following revision(s) (requested by bouyer in ticket #935):

sys/arch/xen/x86/x86_xpmap.c: revision 1.89
sys/arch/x86/include/pmap.h: revision 1.121
sys/arch/xen/xen/privcmd.c: revision 1.58
sys/external/mit/xen-include-public/dist/xen/include/public/memory.h: revision 1.2
sys/arch/xen/include/xenpmap.h: revision 1.44
sys/arch/xen/include/xenio.h: revision 1.12
sys/arch/x86/x86/pmap.c: revision 1.394
(all via patch)

Adjust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()

Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which causes xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
Make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt(), which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grant entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().

Implement new ioctls, needed by Xen 4.13:
IOCTL_PRIVCMD_MMAPBATCH_V2
IOCTL_PRIVCMD_MMAP_RESOURCE
IOCTL_GNTDEV_MMAP_GRANT_REF
IOCTL_GNTDEV_ALLOC_GRANT_REF

Always enable declarations needed by privcmd.c
 1.108.2.2 29-Feb-2020  ad Sync with head.
 1.108.2.1 17-Jan-2020  ad Sync with head.
 1.117.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.125.6.1 13-May-2021  thorpej Sync with HEAD.
 1.5 04-Oct-2023  ad Eliminate l->l_ncsw and l->l_nivcsw. From memory I think they were added
before we had per-LWP struct rusage; the same is now tracked there.
 1.4 24-Sep-2022  riastradh x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.
 1.3 13-Sep-2022  riastradh x86/pmap.h: Need machine/cpufunc.h for invlpg.
 1.2 20-Aug-2022  riastradh x86: Move definition of struct pmap to pmap_private.h.

This makes pmap_resident_count and pmap_wired_count out-of-line
functions instead of inline. No functional change intended
otherwise.
 1.1 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.17 17-Mar-2020  ad Hallelujah, the bug has been found. Resurrect prior changes, to be fixed
with following commit.
 1.16 17-Mar-2020  ad Back out the recent pmap changes until I can figure out what is going on
with pmap_page_remove() (to pmap.c rev 1.365).
 1.15 15-Mar-2020  ad - pmap_enter(): Remove cosmetic differences between the EPT & native cases.
Remove old code to free PVEs that should not be there that caused panics
(merge error moving between source trees on my part).

- pmap_destroy(): pmap_remove_all() doesn't work for EPT yet, so need to catch
up on deferred PTP frees manually in the EPT case.

- pp_embedded: Remove it. It's one more variable to go wrong and another
store to be made. Just check for non-zero PTP pointer & non-zero VA
instead.
 1.14 14-Mar-2020  ad PR kern/55071 (Panic shortly after running X11 due to kernel diagnostic assertion "mutex_owned(&pp->pp_lock)")

- Fix a locking bug in pmap_pp_clear_attrs() and in pmap_pp_remove() do the
TLB shootdown while still holding the target pmap's lock.

Also:

- Finish PV list locking for x86 & update comments around same.

- Keep track of the min/max index of PTEs inserted into each PTP, and use
that to clip ranges of VAs passed to pmap_remove_ptes().

- Based on the above, implement a pmap_remove_all() for x86 that clears out
the pmap in a single pass. Makes exit() / fork() much cheaper.
 1.13 10-Mar-2020  ad - pmap_check_inuse() is expensive so make it DEBUG not DIAGNOSTIC.

- Put PV locking back in place with only a minor performance impact.
pmap_enter() still needs more work - it's not easy to satisfy all the
competing requirements so I'll do that with another change.

- Use pmap_find_ptp() (lookup only) in preference to pmap_get_ptp() (alloc).
Make pm_ptphint indexed by VA not PA. Replace the per-pmap radixtree for
dynamic PV entries with a per-PTP rbtree. Cuts system time during kernel
build by ~10% for me.
 1.12 23-Feb-2020  ad The PV locking changes are expensive and not needed yet, so back them
out for the moment. I want to find a cheaper approach.
 1.11 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.10 12-Jan-2020  ad x86 pmap:

- It turns out that every page the pmap frees is necessarily zeroed. Tell
the VM system about this and use the pmap as a source of pre-zeroed pages.

- Redo deferred freeing of PTPs more elegantly, including the integration with
pmap_remove_all(). This fixes problems with nvmm, and possibly also a crash
discovered during fuzzing.

Reported-by: syzbot+a97186518c84f1d85c0c@syzkaller.appspotmail.com
 1.9 04-Jan-2020  ad branches: 1.9.2;
x86 pmap improvements, reducing system time during a build by about 15% on
my test machine:

- Replace the global pv_hash with a per-pmap record of dynamically allocated
pv entries. The data structure used for this can be changed easily, and
has no special concurrency requirements. For now go with radixtree.

- Change pmap_pdp_cache back into a pool; cache the page directory with the
pmap, and avoid contention on pmaps_lock by adjusting the global list in
the pool_cache ctor & dtor. Align struct pmap and its lock, and update
some comments.

- Simplify pv_entry lists slightly. Allow both PP_EMBEDDED and dynamically
allocated entries to co-exist on a single page. This adds a pointer to
struct vm_page on x86, but shrinks pv_entry to 32 bytes (which also gets
it nicely aligned).

- More elegantly solve the chicken-and-egg problem introduced into the pmap
with radixtree lookup for pages, where we need PTEs mapped and page
allocations to happen under a single hold of the pmap's lock. While here
undo some cut-n-paste.

- Don't adjust pmap_kernel's stats with atomics, because its mutex is now
held in the places the stats are changed.
 1.8 02-Jan-2020  ad Back the pv_hash stuff out. Now seeing errors from ATOMIC_*.
For another day.
 1.7 02-Jan-2020  ad Replace the pv_hash_locks with atomic ops.

Leave the hash table at the same size for now: with the hash table size
doubled, system time for a build drops 10-15%, but user time starts to rise
suspiciously, presumably because the cache is wrecked. Need to try another
data structure.
 1.6 13-Nov-2019  maxv Rename:
PP_ATTRS_M -> PP_ATTRS_D
PP_ATTRS_U -> PP_ATTRS_A
For consistency.
 1.5 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.4 01-Feb-2019  maxv Change the format of the pp_attrs field: instead of using PTE bits
directly, use abstracted bits that are converted from/to PTE bits when
needed (in pmap_sync_pv).

This allows us to use the same pp_attrs for pmaps that have PTE bits at
different locations.
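
A rough sketch of the conversion idea in the 1.4 entry, assuming the native x86 page-table layout. The ATTR_* names and values and the helpers pte_to_attrs() and attrs_to_pte() are illustrative only, not the definitions in pmap_pv.h.

```c
#include <stdint.h>

/* Native x86 page-table entry bits (architectural values). */
#define PTE_W	UINT64_C(0x02)	/* writable */
#define PTE_A	UINT64_C(0x20)	/* accessed */
#define PTE_D	UINT64_C(0x40)	/* dirty */

/* Abstract per-page attribute bits; values are assumptions for this sketch. */
#define ATTR_D	0x01u
#define ATTR_A	0x02u
#define ATTR_W	0x04u

/* Translate the relevant bits of a native PTE into abstract attributes. */
static uint8_t
pte_to_attrs(uint64_t pte)
{
	uint8_t attrs = 0;

	if (pte & PTE_D)
		attrs |= ATTR_D;
	if (pte & PTE_A)
		attrs |= ATTR_A;
	if (pte & PTE_W)
		attrs |= ATTR_W;
	return attrs;
}

/* Translate abstract attributes back into native PTE bits. */
static uint64_t
attrs_to_pte(uint8_t attrs)
{
	uint64_t pte = 0;

	if (attrs & ATTR_D)
		pte |= PTE_D;
	if (attrs & ATTR_A)
		pte |= PTE_A;
	if (attrs & ATTR_W)
		pte |= PTE_W;
	return pte;
}
```

A pmap whose page tables keep these bits at other positions would supply its own pair of converters while sharing the same abstract attribute field.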
 1.3 12-Jun-2011  rmind branches: 1.3.54;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.2 28-Jan-2008  yamt branches: 1.2.2; 1.2.10; 1.2.28; 1.2.36; 1.2.46;
save a word in pv_entry by making pv_hash SLIST.

although this can slow down pmap_sync_pv if hash lists get long,
we should keep them short anyway.
 1.1 20-Jan-2008  yamt branches: 1.1.2; 1.1.4;
- rewrite P->V tracking.
- use a hash rather than SPLAY trees.
SPLAY tree is a wrong algorithm to use here.
will be revisited if it slows down anything other than
micro-benchmarks.
- optimize the single mapping case (it's a common case) by
embedding an entry into mdpage.
- don't keep a pmap pointer as it can be obtained from ptp.
(discussed on port-i386 some years ago.)
ideally, a single paddr_t should be enough to describe a pte.
but it needs some more thoughts as it can increase computational
costs.
- pmap_enter: simplify and fix races with pmap_sync_pv.
- don't bother to lock pm_obj[i] where i > 0, unless DIAGNOSTIC.
- kill mp_link to save space.
- add many KASSERTs.
 1.1.4.3 04-Feb-2008  yamt sync with head.
 1.1.4.2 21-Jan-2008  yamt sync with head
 1.1.4.1 20-Jan-2008  yamt file pmap_pv.h was added on branch yamt-lazymbuf on 2008-01-21 09:40:09 +0000
 1.1.2.2 20-Jan-2008  bouyer Sync with HEAD
 1.1.2.1 20-Jan-2008  bouyer file pmap_pv.h was added on branch bouyer-xeni386 on 2008-01-20 17:51:26 +0000
 1.2.46.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.2.36.1 25-Apr-2010  rmind Drop per-"MD page" (i.e. struct pmap_page) locking i.e. pp_lock/pp_unlock
and rely on locking provided by upper layer, UVM. Sprinkle asserts.
 1.2.28.1 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.2.10.2 23-Mar-2008  matt sync with HEAD
 1.2.10.1 28-Jan-2008  matt file pmap_pv.h was added on branch matt-armv6 on 2008-03-23 02:04:28 +0000
 1.2.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.2.2.1 28-Jan-2008  mjf file pmap_pv.h was added on branch mjf-devfs on 2008-02-18 21:05:17 +0000
 1.3.54.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.3.54.1 10-Jun-2019  christos Sync with HEAD
 1.9.2.2 29-Feb-2020  ad Sync with head.
 1.9.2.1 17-Jan-2020  ad Sync with head.
 1.13 24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.12 26-Oct-2010  jruoho branches: 1.12.2; 1.12.4;
Remove some unused (ACPI) constants.
 1.11 20-Aug-2010  jruoho Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
 1.10 19-Aug-2010  jruoho Add sysctl-glue for interaction with the acpicpu(4).
 1.9 24-Mar-2007  xtraeme branches: 1.9.52; 1.9.58; 1.9.60;
* Remove the WRITE_FIDVID macro from powernow.h and use it in the
powernow_k8 driver (much better than undefining it and writing it again).
* Fix the WRITE_FIDVID macro, I changed it to use the third argument
for the bitmask, but it's not correct.

Last change should fix the problem reported by FUKUMOTO Atsushi.
 1.8 18-Mar-2007  xtraeme There's no need to run est_init or k8_powernow_init on each CPU.
Just run it once (in the first cpu probed) with the RUN_ONCE(9)
framework.

Change the argument of est_init and k8_powernow_init to void, we don't
need cpu_info * anymore.

Suggested by tls@ and mrg@.
 1.7 18-Mar-2007  xtraeme Change k8_powernow_init to accept a struct cpu_info * as argument,
so that in the informative messages it prints the correct cpu
and not curcpu().

This fixes the first part of PR kern/35676.
 1.6 04-Oct-2006  cube branches: 1.6.2; 1.6.4; 1.6.6; 1.6.10; 1.6.12; 1.6.14;
Rework the way PowerNow! and Cool'n'Quiet features are detected and
displayed, to make the code much simpler and easier to follow. Also, use
bitmask_printf() to make output consistent with other stuff. Use
CPUID2FAMILY() where appropriate.
 1.5 27-Aug-2006  xtraeme branches: 1.5.2; 1.5.4; 1.5.6;
Update powernow module with POWERNOW_K7 and POWERNOW_K8 support.
Works fine on amd64 cpus running in 32-bit mode.

Tested by Joel Carnat.
 1.4 23-Aug-2006  xtraeme - Move k7_powernow_* prototypes from i386/include/cpu.h to
x86/include/powernow.h
- Protect k[78]_powernow_init() functions with #ifdef POWERNOW_K[78] to
make it build without these options.

This fixes the problem reported by hubertf.
 1.3 08-Aug-2006  cube branches: 1.3.2;
files.x86 isn't included by Xen kernels, so opt_powernow_k8.h never gets
created by config(1), and thus it's not safe to use it in cpuvar.h.

Simply declare the prototype for k8_powernow_init in powernow.h. No need
to #ifdef protect a prototype, after all, only its users.

Un-breaks build of Xen kernels.
 1.2 07-Aug-2006  xtraeme branches: 1.2.2;
* Do not change struct powernow_pst_s (I added another member in my
previous patch) and this MUST be of that size, otherwise the tables
won't be found.

* powernow_k8.c moved into x86/x86, it should work on both i386 and amd64.

* Added more DPRINTFs needed to find the first problem.

* Create "machdep.powernow.frequency" again, I can't remember why I
removed frequency... it should work with estd now.

* Do not try to call k[78]_powernow_init() if cpu is not AMD (thanks
to christos).

And more things I can't remember, but this time it will work in
Athlon 64 cpus and it won't crash in EM64T cpus.
 1.1 06-Aug-2006  xtraeme AMD PowerNow!/Cool`n'Quiet driver for NetBSD/amd64,
adapted from OpenBSD.

Tested on a few machines:

http://bigbird.dohd.org:3021/NetBSD/dmesg
http://www.bsd.org.il/netbsd/acpi/dmesg

Thanks to cube, elad and others for testing and fixes.

Enabled by default on GENERIC.
 1.2.2.3 30-Aug-2006  tron Pull up following revision(s) (requested by xtraeme in ticket #74):
sys/lkm/arch/i386/powernow/Makefile: revision 1.3
sys/arch/x86/x86/powernow_k8.c: revision 1.5
sys/arch/x86/include/powernow.h: revision 1.5
sys/lkm/arch/i386/powernow/lkminit_powernow.c: revision 1.6
Update powernow module with POWERNOW_K7 and POWERNOW_K8 support.
Works fine on amd64 cpus running in 32-bit mode.
Tested by Joel Carnat.
 1.2.2.2 27-Aug-2006  tron Pull up following revision(s) (requested by xtraeme in ticket #57):
sys/arch/i386/i386/identcpu.c: revision 1.37
sys/arch/x86/include/powernow.h: revision 1.4
sys/arch/i386/include/cpu.h: revision 1.127
- Move k7_powernow_* prototypes from i386/include/cpu.h to
x86/include/powernow.h
- Protect k[78]_powernow_init() functions with #ifdef POWERNOW_K[78] to
make it build without these options.
This fixes the problem reported by hubertf.
 1.2.2.1 08-Aug-2006  tron Pull up following revision(s) (requested by cube in ticket #7):
sys/arch/x86/include/cpuvar.h: revision 1.5
sys/arch/x86/include/powernow.h: revision 1.3
files.x86 isn't included by Xen kernels, so opt_powernow_k8.h never gets
created by config(1), and thus it's not safe to use it in cpuvar.h.
Simply declare the prototype for k8_powernow_init in powernow.h. No need
to #ifdef protect a prototype, after all, only its users.
Un-breaks build of Xen kernels.
 1.3.2.3 03-Sep-2006  yamt sync with head.
 1.3.2.2 11-Aug-2006  yamt sync with head
 1.3.2.1 08-Aug-2006  yamt file powernow.h was added on branch yamt-pdpolicy on 2006-08-11 15:43:16 +0000
 1.5.6.1 22-Oct-2006  yamt sync with head
 1.5.4.2 09-Sep-2006  rpaulo sync with head
 1.5.4.1 27-Aug-2006  rpaulo file powernow.h was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:36 +0000
 1.5.2.1 18-Nov-2006  ad Sync with head.
 1.6.14.1 29-Mar-2007  reinoud Pullup to -current
 1.6.12.1 11-Jul-2007  mjf Sync with head.
 1.6.10.1 10-Apr-2007  ad Sync with head.
 1.6.6.2 15-Apr-2007  yamt sync with head.
 1.6.6.1 24-Mar-2007  yamt sync with head.
 1.6.4.3 03-Sep-2007  yamt sync with head.
 1.6.4.2 30-Dec-2006  yamt sync with head.
 1.6.4.1 04-Oct-2006  yamt file powernow.h was added on branch yamt-lazymbuf on 2006-12-30 20:47:22 +0000
 1.6.2.1 20-Apr-2007  bouyer Pull up following revision(s) (requested by mlelstv in ticket #575):
sys/arch/i386/i386/est.c sync with 1.37
sys/arch/i386/i386/ipifuncs.c sync with 1.16
sys/arch/x86/include/cpu_msr.h sync with 1.4
sys/arch/x86/include/intrdefs.h sync with 1.8
sys/arch/x86/include/powernow.h sync with 1.9
sys/arch/x86/x86/powernow_k8.c sync with 1.20
sys/arch/x86/x86/msr_ipifuncs.c sync with 1.8
sys/arch/amd64/amd64/ipifuncs.c sync with 1.9
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/i386/machdep.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/x86/x86_machdep.c patch
sys/arch/amd64/amd64/machdep.c patch
Add MSR write IPI handler for x86. Use it and the RUN_ONCE framework
to make est and powernow drivers work properly with SMP.
 1.9.60.1 05-Mar-2011  rmind sync with head
 1.9.58.1 06-Nov-2010  uebayasi Sync with HEAD.
 1.9.52.3 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.9.52.2 10-Jan-2011  jym Sync with HEAD
 1.9.52.1 24-Oct-2010  jym Sync with HEAD
 1.12.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.12.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.10 12-Aug-2017  maxv Don't include opt_vm86.h.
 1.9 12-Aug-2017  maxv Remove vm86.

Pass 3.
 1.8 04-Oct-2012  dsl branches: 1.8.14;
Remove references to VM86 from the amd64 kernel configs.
VM86 mode isn't supported while in long mode.
 1.7 20-Apr-2012  jym branches: 1.7.2;
PSL_AC is user-settable.
 1.6 18-Sep-2008  dsl branches: 1.6.28; 1.6.32; 1.6.34;
Remove PSL_MBO (the bits that Must Be One) from PSL_USER - which are the
bits that the 'user' can change.
Who knows what the effect of a user signal handler (which I think might have
access to the bits) changing these bits might be!
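
One way to picture a "user-changeable bits" mask such as PSL_USER is the sketch below. The FL_* bit positions are the architectural EFLAGS values, but the composition of FL_USER and the helper merge_user_flags() are assumptions made for illustration; the real definitions live in psl.h and the kernel may validate rather than merge.

```c
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
#define FL_C	0x00000001u	/* carry */
#define FL_PF	0x00000004u	/* parity */
#define FL_AF	0x00000010u	/* auxiliary carry */
#define FL_Z	0x00000040u	/* zero */
#define FL_N	0x00000080u	/* sign */
#define FL_T	0x00000100u	/* trap (single step) */
#define FL_I	0x00000200u	/* interrupt enable */
#define FL_D	0x00000400u	/* direction */
#define FL_V	0x00000800u	/* overflow */
#define FL_AC	0x00040000u	/* alignment check */

/* Bits a user context is allowed to change (assumed set for this sketch). */
#define FL_USER	(FL_C | FL_PF | FL_AF | FL_Z | FL_N | FL_T | FL_D | FL_V | FL_AC)

/*
 * Take the user-changeable bits from the requested value and keep
 * every other bit from the current value.
 */
static uint32_t
merge_user_flags(uint32_t cur, uint32_t req)
{
	return (cur & ~FL_USER) | (req & FL_USER);
}
```

Because the must-be-one bit (0x2) and the interrupt-enable bit are outside the mask, a restored context cannot alter them, whatever value it supplies.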
 1.5 18-Sep-2008  christos Define a PSL_CLEARSIG macro for the psl flags to be cleared on signal delivery
and use it everywhere.
 1.4 17-Sep-2008  christos Include PSL_D in the flags that can be set by the user. Since setmcontext
is used to restore context from a signal handler, this will allow restoring
PSL_D to what it was before the user code entered the signal handler allowing
programs to work.
 1.3 30-Nov-2004  nathanw branches: 1.3.96; 1.3.100; 1.3.102; 1.3.106;
Add PSL_T to PSL_USER; it's fine for a program to want to trap itself.
 1.2 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.106.1 19-Oct-2008  haad Sync with HEAD.
 1.3.102.1 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.3.100.1 04-May-2009  yamt sync with head.
 1.3.96.1 28-Sep-2008  mjf Sync with HEAD.
 1.6.34.1 20-Apr-2012  riz Pull up following revision(s) (requested by jym in ticket #189):
sys/arch/x86/include/psl.h: revision 1.7
sys/arch/i386/i386/locore.S: revision 1.98
sys/arch/amd64/acpi/acpi_wakecode.S: revision 1.11
sys/arch/amd64/amd64/mptramp.S: revision 1.13
sys/arch/i386/acpi/acpi_wakecode.S: revision 1.15
sys/arch/i386/i386/mptramp.S: revision 1.23
sys/arch/amd64/amd64/locore.S: revision 1.68
Set the CR0_AM bit so processes can enable alignment check errors under
x86 through PSL_AC bit.
ATF test incoming shortly.
PSL_AC is user-settable.
 1.6.32.1 29-Apr-2012  mrg sync to latest -current.
 1.6.28.2 30-Oct-2012  yamt sync with head
 1.6.28.1 23-May-2012  yamt sync with head.
 1.7.2.2 03-Dec-2017  jdolecek update from HEAD
 1.7.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.8.14.1 28-Aug-2017  skrll Sync with HEAD
 1.7 20-Aug-2022  riastradh x86: Forbid using x86/pte.h directly; use machine/pte.h.

machine/pte.h already used outside sys/arch, so let's make it the
primary thing and make sure to use x86/pte.h only as a subroutine.
 1.6 20-Aug-2022  riastradh x86: Move pl*_i, pl_i_roundup, and ptp_va2o out of x86/pmap.h.

- pl[1-4]_i -> x86/pte.h
- pl_i, pl_i_roundup, ptp_va2o -> x86/pmap.c
 1.5 05-Sep-2020  maxv x86: rename PGEX_X -> PGEX_I

To match the x86 specification and the other OSes.
 1.4 14-Mar-2020  maxv style
 1.3 09-Oct-2019  maxv Add new bits.
 1.2 05-Oct-2019  maxv Switch to the new PTE naming. No binary diff (tested with MKREPRO).
 1.1 06-Jul-2010  cegger branches: 1.1.2; 1.1.4; 1.1.6; 1.1.12; 1.1.68;
Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.1.68.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.1.12.2 05-Mar-2011  rmind sync with head
 1.1.12.1 06-Jul-2010  rmind file pte.h was added on branch rmind-uvmplock on 2011-03-05 20:52:28 +0000
 1.1.6.2 24-Oct-2010  jym Sync with HEAD
 1.1.6.1 06-Jul-2010  jym file pte.h was added on branch jym-xensuspend on 2010-10-24 22:48:16 +0000
 1.1.4.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.1.4.1 06-Jul-2010  uebayasi file pte.h was added on branch uebayasi-xip on 2010-08-17 06:45:31 +0000
 1.1.2.2 11-Aug-2010  yamt sync with head.
 1.1.2.1 06-Jul-2010  yamt file pte.h was added on branch yamt-nfs-mp on 2010-08-11 22:52:55 +0000
 1.1 16-Jun-2009  bouyer branches: 1.1.2; 1.1.4; 1.1.6; 1.1.12;
Split mc146818-related functions from clock.c into rtc.c.
Call rtc_set_ymdhms() from xen/xen/clock.c:xen_rtc_set() for xen3 dom0
kernels as the Xen3 hypervisor doesn't write the new date/time to the CMOS
by itself.
Now a XEN3_DOM0 kernel properly updates the CMOS time.
 1.1.12.2 21-Apr-2010  matt sync to netbsd-5
 1.1.12.1 16-Jun-2009  matt file rtc.h was added on branch matt-nb5-mips64 on 2010-04-21 00:33:45 +0000
 1.1.6.2 01-Nov-2009  jym Sync with HEAD.
 1.1.6.1 16-Jun-2009  jym file rtc.h was added on branch jym-xensuspend on 2009-11-01 13:58:16 +0000
 1.1.4.2 20-Jun-2009  yamt sync with head
 1.1.4.1 16-Jun-2009  yamt file rtc.h was added on branch yamt-nfs-mp on 2009-06-20 07:20:12 +0000
 1.1.2.2 19-Jun-2009  snj Pull up following revision(s) (requested by bouyer in ticket #816):
sys/arch/amd64/conf/files.amd64: revision 1.68
sys/arch/i386/conf/files.i386: revision 1.350
sys/arch/x86/include/rtc.h: revision 1.1
sys/arch/x86/isa/clock.c: revision 1.33
sys/arch/x86/isa/rtc.c: revision 1.1
sys/arch/xen/conf/files.xen: revision 1.100
sys/arch/xen/xen/clock.c: revision 1.50 via patch
Split mc146818-related functions from clock.c into rtc.c.
Call rtc_set_ymdhms() from xen/xen/clock.c:xen_rtc_set() for xen3 dom0
kernels as the Xen3 hypervisor doesn't write the new date/time to the CMOS
by itself.
Now a XEN3_DOM0 kernel properly updates the CMOS time.
 1.1.2.1 16-Jun-2009  snj file rtc.h was added on branch netbsd-5 on 2009-06-19 21:22:10 +0000
 1.6 29-Nov-2019  riastradh branches: 1.6.2;
Largely eliminate the MD rwlock.h header file.

This was full of definitions that have been obsolete for over a
decade. The file still remains for __HAVE_RW_STUBS but that's all.
Used only internally in kern_rwlock.c now, not by <sys/rwlock.h>.
 1.5 28-Apr-2008  martin branches: 1.5.88;
Remove clause 3 and 4 from TNF licenses
 1.4 09-Dec-2007  ad branches: 1.4.10; 1.4.12; 1.4.14;
Use atomic_cas_ulong().
 1.3 21-Nov-2007  yamt branches: 1.3.2; 1.3.4;
make kmutex_t and krwlock_t smaller by killing lock id.
ok'ed by Andrew Doran.
 1.2 09-Feb-2007  ad branches: 1.2.4; 1.2.8; 1.2.14; 1.2.24; 1.2.26; 1.2.30; 1.2.32;
Merge newlock2 to head.
 1.1 10-Sep-2006  ad branches: 1.1.2;
file rwlock.h was initially added on branch newlock2.
 1.1.2.3 29-Dec-2006  ad Checkpoint work in progress.
 1.1.2.2 20-Oct-2006  ad - Don't need locked bus cycles on release from C code.
- Save an integer ID in the lock structures for LOCKDEBUG code.
 1.1.2.1 10-Sep-2006  ad Add updated locking primitives.
 1.2.32.2 27-Dec-2007  mjf Sync with HEAD.
 1.2.32.1 08-Dec-2007  mjf Sync with HEAD.
 1.2.30.1 21-Nov-2007  bouyer Sync with HEAD
 1.2.26.1 09-Jan-2008  matt sync with HEAD
 1.2.24.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.2.24.1 21-Nov-2007  joerg Sync with HEAD.
 1.2.14.1 17-Apr-2007  thorpej G/C _lock_cas() -- the atomic ops API provides what the locking
primitives need.
 1.2.8.1 03-Dec-2007  ad Sync with HEAD.
 1.2.4.4 21-Jan-2008  yamt sync with head
 1.2.4.3 07-Dec-2007  yamt sync with head
 1.2.4.2 26-Feb-2007  yamt sync with head.
 1.2.4.1 09-Feb-2007  yamt file rwlock.h was added on branch yamt-lazymbuf on 2007-02-26 09:08:49 +0000
 1.3.4.1 11-Dec-2007  yamt sync with head.
 1.3.2.1 26-Dec-2007  ad Sync with head.
 1.4.14.1 16-May-2008  yamt sync with head.
 1.4.12.1 18-May-2008  yamt sync with head.
 1.4.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.88.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.6.2.2 22-Jan-2020  ad Back out previous.
 1.6.2.1 19-Jan-2020  ad empty these; remove later.
 1.1 21-Jul-2021  jmcneill branches: 1.1.4;
Separate MI smbios interface from MD specific code.
 1.1.4.2 01-Aug-2021  thorpej Sync with HEAD.
 1.1.4.1 21-Jul-2021  thorpej file smbios_machdep.h was added on branch thorpej-i2c-spi-conf on 2021-08-01 22:42:19 +0000
 1.7 21-Jul-2021  jmcneill Separate MI smbios interface from MD specific code.
 1.6 21-Aug-2019  msaitoh branches: 1.6.12;
Fix typo (s/controler/controller/).
 1.5 25-Dec-2018  mlelstv Expose more DMI variables via sysctl.
 1.4 11-Mar-2017  nonaka branches: 1.4.12; 1.4.14;
Search for SMBIOS in the UEFI configuration table when booted with UEFI.
 1.3 16-Apr-2008  cegger branches: 1.3.48; 1.3.68; 1.3.72; 1.3.76;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.2 30-Mar-2008  ad If SMBIOS is present and there seems to be good expansion slot info,
note the number of ISA compatible slots.
 1.1 01-Oct-2006  bouyer branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10; 1.1.60;
Add ipmi(4) driver, from OpenBSD. This requires SMBios support, so add
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible through envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
 1.1.60.2 02-Jun-2008  mjf Sync with HEAD.
 1.1.60.1 03-Apr-2008  mjf Sync with HEAD.
 1.1.10.2 08-Jan-2007  ghen Pull up following revision(s) (requested by bouyer in ticket #1621):
sys/arch/i386/conf/GENERIC: revision 1.787 via patch
share/man/man4/Makefile: revision 1.407 via patch
distrib/sets/lists/man/mi: revision 1.936 via patch
share/man/man4/ipmi.4: revision 1.1 via patch
sys/arch/i386/i386/bios32.c: revision 1.11 via patch
sys/dev/DEVNAMES: revision 1.221 via patch
sys/arch/x86/x86/ipmi.c: revision 1.1 via patch
sys/arch/i386/i386/mainbus.c: revision 1.65 via patch
sys/arch/x86/include/smbiosvar.h: revision 1.1 via patch
sys/arch/x86/include/ipmivar.h: revision 1.1 via patch
sys/arch/x86/conf/files.x86: revision 1.20 via patch
sys/arch/i386/conf/files.i386: revision 1.293 via patch
Add ipmi(4) driver, from OpenBSD. This requires SMBios support, so add
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible through envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
Add manpage for new ipmi driver.
Claim ipmi.
 1.1.10.1 01-Oct-2006  ghen file smbiosvar.h was added on branch netbsd-3 on 2007-01-08 16:36:20 +0000
 1.1.8.2 30-Dec-2006  yamt sync with head.
 1.1.8.1 01-Oct-2006  yamt file smbiosvar.h was added on branch yamt-lazymbuf on 2006-12-30 20:47:22 +0000
 1.1.4.2 18-Nov-2006  ad Sync with head.
 1.1.4.1 01-Oct-2006  ad file smbiosvar.h was added on branch newlock2 on 2006-11-18 21:29:38 +0000
 1.1.2.2 22-Oct-2006  yamt sync with head
 1.1.2.1 01-Oct-2006  yamt file smbiosvar.h was added on branch yamt-splraiseipl on 2006-10-22 06:05:16 +0000
 1.3.76.1 21-Apr-2017  bouyer Sync with HEAD
 1.3.72.1 20-Mar-2017  pgoyette Sync with HEAD
 1.3.68.1 28-Aug-2017  skrll Sync with HEAD
 1.3.48.1 03-Dec-2017  jdolecek update from HEAD
 1.4.14.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.4.14.1 10-Jun-2019  christos Sync with HEAD
 1.4.12.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.6.12.1 01-Aug-2021  thorpej Sync with HEAD.
 1.220 24-Aug-2025  rillig x86/specialreg.h: remove redundant '\0' from snprintb format
 1.219 28-Apr-2025  riastradh xen: Stop-gap FPU PCB fix; disable Intel AMX for now.

Since the custom cpu_uarea_alloc/free are disabled under XENPV,
nothing would initialize struct pcb::pcb_savefpu to point either to
struct pcb::pcb_savefpusmall, or to a separately allocated large area
on machines with Intel AMX TILECFG/TILEDATA requiring it. So the
memset in fpu_lwp_fork would crash on null pointer dereference:

[ 1.0000030] uvm_fault(0xffffffff8094a300, 0x0, 2) -> e
[ 1.0000030] fatal page fault in supervisor mode
[ 1.0000030] trap type 6 code 0x2 rip 0xffffffff8062795c cs 0xe030 rflags 0x10202 cr2 0 ilevel 0 rsp 0xffffffff80adad38
[ 1.0000030] curlwp 0xffffffff8078f880 pid 0.0 lowest kstack 0xffffffff80ad62c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:memset+0x2c: repe stosq %es:(%rdi)
memset() at netbsd:memset+0x2c
lwp_create() at netbsd:lwp_create+0x2f1
fork1() at netbsd:fork1+0x42c
main() at netbsd:main+0x44f

In order to support Intel AMX TILECFG/TILEDATA, or any other CPU
extensions that increase the XSAVE area beyond what fits in a single
page after struct pcb, we would need to enable the custom
cpu_uarea_alloc/free. Currently that would imply allocating stack
guard pages (`redzone') under XENPV; if there's some reason the stack
guard pages don't work, we could also push #ifdef XENPV conditionals
into cpu_uarea_alloc/free to cover the guard pages -- to be
considered.

PR kern/59371: Xen domU uvm_fault since FPU state allocation patch

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
 1.218 24-Apr-2025  riastradh amd64: Enable TILECFG and TILEDATA registers.

This allows processes to use the registers, and NetBSD will save and
restore them in context switches. But it does not expose them to
ptrace(2) or debuggers like all the other extended CPU state
(xmm/ymm/zmm) -- that will require more work.

PR kern/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu
PR port-amd64/59299: Support Intel AMX CPU state (TILECFG/TILEDATA)
 1.217 24-Apr-2025  riastradh x86: Add some more XCR0 bits and references.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
 1.216 19-Oct-2024  msaitoh x86/specialreg.h: Update AMD CPUID definitions.

- Add AMD Hetero Workload Classification.
- Extend the number of UMC PMCs field from 6bit to 8bit.
- Add Guest Intercept Control for SEV-ES.
- Add Segmented RMP
 1.215 17-Oct-2024  msaitoh x86/specialreg.h: Update AMD CPUID definitions.

Update definitions from the following PPR:
- PPR for AMD Family 19h Model 11h, Revision B2 Processors
(Doc ID 55901 rev. 0.47)
- PPR for AMD Family 1Ah Model 02h, Revision C1 Processors
(Doc ID 57238 rev.0.24)
- PPR for AMD Family 1Ah Model 24h, Revision B0 Processors
(Doc ID 57274 rev. 3.00)

- Rename CPUID Fn8000001b EDX bit 11 from IbsL3MissFiltering to
Zen4IbsExtension.
- Add some CPUID bits.
 1.214 06-Oct-2024  msaitoh Add some unknown CPUID bits for AMD.
 1.213 06-Oct-2024  msaitoh Add some CPUID bits for AMD.
 1.212 01-Jul-2024  andvar Disable the VIA Alternate Instructions according to the VIA documentation:
* C7 and above do not support ALTINST, so do not check or attempt to disable them.
* For VIA C3 Nehemiah, check the extended feature flags for support and status;
do not attempt to disable when AIS is not supported or not enabled.
* For pre-Nehemiah models, explicitly disable if they are in the range
of documented models; no flags are present to check the status on these models.
Note: for pre-Nehemiah models there may be other functional side effects depending
on the version and stepping.

Explicit disabling of ALTINST was introduced with rev. 1.84 following
the discovery of some VIA CPUs having these instructions enabled by default
leading to the potential backdoor (aka rosenbridge).

Unfortunately, the implementation used the wrong check (ACE supported flag),
which can be true for the later models, still supporting padlock features.
Setting ALTINST bit on those may have unexpected side effects like VIA C7 CPUID
instruction for temperature sensor not reporting correct value or
`cpuctl identify' not reporting certain CPU features. Similar side effects
can be observed even for Nehemiah models not supporting AIS instructions. This
change should limit possibility of such issues to only the pre-Nehemiah models,
not covered at all in the previous implementation.

Feature Control Register (FCR) macros were unified under one group with
consistent naming while implementing the change. A few comments were updated as well.

patch reviewed by Riastradh@ (thank you)

need pullups to netbsd-9, 10.

PR kern/58370
 1.211 12-May-2024  msaitoh branches: 1.211.2;
s/RPMQUERY/RMPQUERY/
 1.210 08-Mar-2024  rillig cpuctl: fix i386 bit descriptions for CPUID_SEF_FLAGS1

warning: non-printing character '\31' in description
'BUS_LOCK_DETECT""b\31' [363]
 1.209 27-Oct-2023  mrg add MSR stuff for AMD errata 1474.
 1.208 27-Jul-2023  msaitoh Add AMD IBPB_RET and BusLockThreshold.
 1.207 25-Jul-2023  mrg x86: turn off zenbleed chicken bit on Zen2 cpus.

this is based upon Taylor's original work. i just made the list
of CPUs to run on as correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)

(might be nice to have a better way to express revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)

tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
https://lock.cmpxchg8b.com/zenbleed.html
 1.206 11-Apr-2023  msaitoh Fix compile error.
 1.205 11-Apr-2023  msaitoh Add CPUID 0x07 %ecx bit 24 BUS_LOCK_DETECT.
 1.204 25-Mar-2023  andvar s/Predective/Predictive/ and s/dedected/detected/ in comments.
 1.203 17-Feb-2023  msaitoh Add AMD CPUID Fn0000_0008 %ebx bit 3 INVLPGB.
 1.202 14-Feb-2023  msaitoh Add some CPUID bits from PPR for AMD Family 19h Model 61h Revision B1.
 1.201 30-Dec-2022  msaitoh Fix comment.
 1.200 30-Dec-2022  msaitoh Update definitions from the latest Intel SDM.

- Rename HW_FEEDBACK to HWI (Hardware Feedback Interface).
- Add CPUID Fn0000_0006 %eax bit 24 IA32_THERM_INTERRUPT MSR bit 25 Hardware
Feedback Notification support.
- Add CPUID Fn0000_0007 %ecx bit 29 ENQCMD.
- Add CPUID Fn0000_0007 %edx bit 1 SGX-KEYS.
- Add CPUID Fn0000_0007 %edx bit 5 UINTR(User INTeRrupts).
- Add CPUID Fn0000_0007 %edx bit 11 RTM_ALWAYS_ABORT.
- Rename TSX_FORCE_ABORT to RTM_FORCE_ABORT.
- Add CPUID Fn0000_0007 %edx bit 22 AMX_BF16.
- Add CPUID Fn0000_0007 %edx bit 23 AVX512_FP16.
- Add CPUID Fn0000_0007 %edx bit 24 AMX_TILE.
- Add CPUID Fn0000_0007 %edx bit 25 AMX_INT8.
- Add CPUID Fn0000_0007 sub-leaf 1 %edx bit 18 CET_SSS.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 0 PSFD.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 1 IPRED_CTRL.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 2 RRSBA_CTRL.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 3 DDPD_U.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 4 BHI_CTRL.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 5 MCDT_NO.
- Modify comment. Both Intel and AMD support CPUID Fn0000000b.
- Add CPUID Fn0000_000d sub-leaf 1 %eax bit 4 XFD.
- Modify comment. Hybrid Information -> Native Model ID Information.
- Add CPUID Fn0000_001d Tile Information.
- Add CPUID Fn0000_001e TMUL Information.
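
As a small illustration of how such bit definitions are typically consumed, the sketch below tests feature bits in a saved %edx value from CPUID leaf 7. The bit positions come from the list above; BIT() is a simplified stand-in for __BIT(), and the SEF_EDX_* names and cpuid_has_all() are hypothetical, not the identifiers used in specialreg.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's __BIT() macro. */
#define BIT(n)	((uint32_t)1 << (n))

/* CPUID Fn0000_0007 %edx bits, positions as listed in the entry above. */
#define SEF_EDX_AMX_BF16	BIT(22)
#define SEF_EDX_AVX512_FP16	BIT(23)
#define SEF_EDX_AMX_TILE	BIT(24)
#define SEF_EDX_AMX_INT8	BIT(25)

/* Return true only if every bit in mask is set in the register value. */
static bool
cpuid_has_all(uint32_t reg, uint32_t mask)
{
	return (reg & mask) == mask;
}
```

A caller would pass the %edx value it obtained from CPUID and a mask such as SEF_EDX_AMX_TILE | SEF_EDX_AMX_INT8; whether the feature is actually usable also depends on operating system state such as XCR0, which this sketch does not model.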
 1.199 27-Dec-2022  msaitoh Use __BIT(). Add comment. Whitespace. No functional change.
 1.198 21-Nov-2022  msaitoh branches: 1.198.2;
Update AMD CPUID Fn8000_001b

- Add IbsFetchCtlExtd and IbsOpData4.
- Fix typo (lbs -> Ibs).
 1.197 16-Nov-2022  msaitoh Add CPUID Fn8000_0022 AMD Extended Performance Monitoring and Debug.
 1.196 16-Nov-2022  msaitoh Add CPUID Fn8000_0021 AMD Extended Features Identification 2.
 1.195 16-Nov-2022  msaitoh Add some definitions from AMD APM:

- Add CPUID Fn8000_0007 %eax RAS capabilities.
- Add CPUID Fn8000_001b Instruction-Based Sampling capabilities.
- Add BTC_NO, ROGPT, RPMQUERY, VmplSSS, TscAuxVirt, VmgexitParam,
VirtualTomMsr, IbsVirtGuest, SmtProtection, SvsmCommPageMSR and
NestedVirtSnpMsr.
 1.194 19-Oct-2022  msaitoh Add AMD cpuid Fn8000_000a x2AVIC, VNMI and IBSVIRT from APM Vol. 3 Rev. 3.34.
 1.193 12-Oct-2022  msaitoh Add CPUID Fn8000_001e Processor Topology Information.
 1.192 06-Oct-2022  msaitoh Update some AMD CPUID bits:

- Rename FSREP_MOV to FSRM.
- Add Memory Bandwidth Enforcement (MBE)
- Add AMD's PPIN. Rename CPUID_SEF_PPIN to CPUID_SEF_INTEL_PPIN.
- Add Collaborative Processor Performance Control (CPPC).
- Add HOST_MCE_OVERRIDE.
- Add some unknown bits as Bxx.
- Add comments.
- Use __BIT().
 1.191 15-Jun-2022  msaitoh Modify CPUID Fn0000000a %ebx's string. Add new string for %ecx.
 1.190 13-Jun-2022  msaitoh Add top-down slots event bit of architectural performance monitoring leaf.
 1.189 01-Feb-2022  msaitoh s/shareing/sharing/. No functional change.
 1.188 29-Jan-2022  msaitoh Add Intel Hybrid Information Enumeration (CPUID Fn0000_001a).
 1.187 17-Jan-2022  andvar fix typos in comments, mainly s/foward/forward/.
 1.186 15-Jan-2022  msaitoh Add some definitions from AMD APM:

- CPUID Fn80000001 %ecx bit 30 AddrMaskExt.
- CPUID Fn80000008 %ebx bit 13 INT_WBINVD.
- CPUID Fn80000008 %ebx bit 19 IbrsSameMode.
- CPUID Fn80000008 %ebx bit 20 EferLmsleUnsupported.
- CPUID Fn80000008 %ebx bit 28 PSFD.
- CPUID Fn80000008 %edx bit 30 as "B30". Not documented.
- CPUID Fn8000001f %eax bit 8 SecureTSC.
- CPUID Fn8000001f %eax bit 24 VmsaRegProt.
- Tested by nonaka@.
 1.185 15-Jan-2022  msaitoh Whitespace. No functional change.
 1.184 15-Jan-2022  msaitoh Move CPUID_CAPEX_FLAGS next to %eax because it's for %eax.
 1.183 15-Jan-2022  msaitoh No functional change.

- Modify comment. Add comment. Fix typo. Mainly taken from dragonfly.
- Use __BIT().
 1.182 14-Jan-2022  msaitoh Add Architectural LBR and Linear Address Masking.
 1.181 14-Jan-2022  msaitoh Both Intel and AMD say the name of CPUID 0x01 %edx bit 19 is "CLFSH".
 1.180 13-Jan-2022  msaitoh Add some CPUID bits from the latest Intel SDM.

- Last Branch Record.
- Thread Director.
- AVX version of VNNI.
- Fast short REP MOV.
- HRESET.
- PPIN.
 1.179 13-Jan-2022  msaitoh Use __BIT(). KNF. No functional change.
 1.178 30-Sep-2021  msaitoh Print CPUID_PBE (Pending Break Enable) with "PBE".
 1.177 10-Jul-2021  msaitoh Add some definitions from Intel SDM:

- CPUID leaf 7:0 %ecx bit 13 TME_EN (Total Memory Encryption)
- CPUID leaf 7:0 %edx bit 18 PCONFIG (Platform CONFIGuration)
 1.176 24-Nov-2020  msaitoh branches: 1.176.4;
Add some definitions from the latest Intel SDM:

- Add CPUID leaf 7 %edx bit 23 "KL" (Key Locker).
- Add CPUID leaf 7 subleaf 1 %eax bit 5 "AVX512_BF16".
 1.175 07-Sep-2020  jakllsch branches: 1.175.2;
Fix printb string for LA57
 1.174 07-Sep-2020  msaitoh Add CPUID(EAX=07H, ECX=0) ECX bit 16 LA57 from maxv.
 1.173 05-Sep-2020  maxv x86: fix several CPUID flags

- Rename: CPUID_PN -> CPUID_PSN
CPUID_CFLUSH -> CPUID_CLFSH
CPUID_SBF -> CPUID_PBE
CPUID_LZCNT -> CPUID_ABM
CPUID_P1GB -> CPUID_PAGE1GB
CPUID2_PCLMUL -> CPUID2_PCLMULQDQ
CPUID2_CID -> CPUID2_CNXTID
CPUID2_xTPR -> CPUID2_XTPR
CPUID2_AES -> CPUID2_AESNI
To match the x86 specification and the other OSes.

- Remove: CPUID_B10, CPUID_B20, CPUID_IA64. They do not exist.
 1.172 04-Sep-2020  maxv Add a few more CPUID flags.
 1.171 05-Aug-2020  maxv Add new fields here and there.
 1.170 20-Jul-2020  maxv Revert previous, to unbreak the build (NVMM declares the macro too).

There are hundreds of MSRs, we're not going to list them all, especially
when the majority are unused.
 1.169 19-Jul-2020  jdolecek add definition for MSR_IA32_FEATURE_CONTROL, just for information
 1.168 18-Jun-2020  maxv style and fix typo
 1.167 10-Jun-2020  msaitoh Add SRBDS_CTRL bit.
 1.166 01-Jun-2020  msaitoh Add some definitions from the latest Intel SDM plus small fix:

- Add CPUID leaf 6 %eax bit 19 for HW_FEEDBACK* and IA32_PACKAGE_TERM* MSRs.
- Add CPUID leaf 7 %ecx bit 31 for Protection Keys.
- Add definition of Load only TLB and Store only TLB.
- Add IF_PSCHANGE_MC_NO bit of IA32_ARCH_CAPABILITIES
- Fix HWP_IGNIDL.
 1.165 28-May-2020  msaitoh Add AMD MSR_DE_CFG's bit 1 as DE_CFG_LFENCE_SERIALIZE.
This bit makes lfence instruction serializing.
 1.164 01-May-2020  msaitoh - Add AMD INVLPGB/TLBSYNC hypervisor enable in VMCB and TLBSYNC intercept bit.
- Modify comment.
 1.163 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.162 24-Apr-2020  msaitoh - AMD CPUID Fn8000_000a %edx bit 20 is "SPEC_CTRL".
- Add some bit definitions of AMD's CPUID Fn8000_001f Encrypted Memory
features.
 1.161 06-Apr-2020  msaitoh branches: 1.161.2;
Rename CPUID_APM_TSC to CPUID_APM_ITSC. No functional change.
 1.160 06-Apr-2020  msaitoh CPUID Fn00000001 %edx bit 8 is printed as "TSC", so rename CPUID Fn8000_0007
%edx bit 8 from "TSC" to "ITSC" (Invariant TSC) to avoid confusion.
 1.159 01-Apr-2020  msaitoh Add AVX512_VP2INTERSECT, SERIALIZE and TSXLDTRK(TSX suspend load addr tracking)
 1.158 17-Nov-2019  msaitoh Add the following bit definitions from the latest Intel SDM:
- CET shadow stack
- Fast Short REP MOV
- Hybrid part
- CET Indirect Branch Tracking
 1.157 12-Nov-2019  maxv Mitigation for CVE-2019-11135: TSX Asynchronous Abort (TAA).

Two sysctls are added:

machdep.taa.mitigated = {0/1} user-settable
machdep.taa.method = {string} constructed by the kernel

There are two cases:

(1) If the CPU is affected by MDS, then the MDS mitigation will also
mitigate TAA, and we have nothing else to do. We make the 'mitigated' leaf
read-only, and force:
machdep.taa.mitigated = machdep.mds.mitigated
machdep.taa.method = [MDS]
The kernel already enables the MDS mitigation by default.

(2) If the CPU is not affected by MDS but is affected by TAA, then we use
the new TSX_CTRL MSR to disable RTM. This MSR is provided via a microcode
update, now available on the Intel website. The kernel will automatically
enable the TAA mitigation if the updated microcode is present. If the new
microcode is not present, the user can load it via cpuctl, and set
machdep.taa.mitigated=1.
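The two cases above can be sketched as a small decision function. This is an illustrative sketch only, not the kernel's code: the struct and function names are hypothetical, and apart from "[MDS]" the method strings are placeholders, since the log does not say what the kernel constructs in the other cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; the real kernel derives these from CPUID and MSRs. */
struct taa_inputs {
	bool mds_affected;	/* CPU is affected by MDS */
	bool mds_mitigated;	/* current value of machdep.mds.mitigated */
	bool taa_affected;	/* CPU is affected by TAA */
	bool tsx_ctrl;		/* TSX_CTRL MSR present (updated microcode) */
};

/* Hypothetical mirror of the two sysctl leaves. */
struct taa_state {
	bool mitigated;		/* machdep.taa.mitigated */
	bool read_only;		/* whether the 'mitigated' leaf is read-only */
	const char *method;	/* machdep.taa.method */
};

void
taa_decide(const struct taa_inputs *in, struct taa_state *out)
{
	assert(in != NULL && out != NULL);

	if (in->mds_affected) {
		/* Case 1: the MDS mitigation also covers TAA. */
		out->mitigated = in->mds_mitigated;
		out->read_only = true;
		out->method = "[MDS]";
	} else if (in->taa_affected && in->tsx_ctrl) {
		/* Case 2: disable RTM through the TSX_CTRL MSR. */
		out->mitigated = true;
		out->read_only = false;
		out->method = "[RTM disabled]";	/* placeholder string */
	} else {
		/* Not affected, or the microcode is missing: nothing applied. */
		out->mitigated = false;
		out->read_only = false;
		out->method = "(none)";		/* placeholder string */
	}
}
```

In the last branch the leaf stays writable, which matches the note that the user can load the microcode via cpuctl and then set machdep.taa.mitigated=1.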
 1.156 30-Oct-2019  msaitoh - GMET is not bit 11 but 17.
- Add unknown CPUID Fn8000_000a %edx bit 20.
 1.155 08-Oct-2019  msaitoh Fix AMD Fn8000_001f %eax bit 0's name.
 1.154 03-Oct-2019  msaitoh - Add definitions of AMD's CPUID Fn8000_001f Encrypted Memory features.
- Add definition of AMD's CPUID Fn8000_000a %edx bit 11 "GMET".
- Define CPUID_AMD_SVM_PFThreshold correctly.
- Modify comment a bit for consistency.
 1.153 26-Sep-2019  msaitoh Define CPUID_CAPEX_FLAGS's bit 10 correctly.
 1.152 09-Sep-2019  msaitoh Add MCOMMIT instruction.
 1.151 30-Aug-2019  msaitoh Add definitions of AMD's CPUID Fn8000_0008 %ebx.
 1.150 26-Jul-2019  msaitoh branches: 1.150.2;
- AMD CPUID Fn8000_001d Cache Topology Information leaf is almost the same as
Intel Deterministic Cache Parameter Leaf(0x04), so make new
cpu_dcp_cacheinfo() and share it.
- AMD's L2 and L3's cache descriptor's definition is the same, so use one
common definition.
- KNF.

XXX Split some common functions to new identcpu_subr.c or use #ifdef _KERNEL
... #endif in identcpu.c to share from both kernel and cpuctl?
 1.149 13-Jul-2019  msaitoh Define some new bits of CPUID Fn8000_0007 %edx AMD Advanced Power Management
leaf.
 1.148 26-Jun-2019  mgorny Fetch XSAVE area component offsets and sizes when initializing x86 CPU

Introduce two new arrays, x86_xsave_offsets and x86_xsave_sizes,
and initialize them with XSAVE area component offsets and sizes queried
via CPUID. This will be needed to implement getters and setters for
additional register types.

While at it, add XSAVE_* constants corresponding to specific XSAVE
components.
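A minimal sketch of how such arrays could be filled, assuming the documented behaviour of CPUID leaf 0x0d: for a state component i >= 2, sub-leaf i returns the component size in %eax and its offset within the standard (non-compacted) XSAVE area in %ebx. The callback type, the function names and the array bound are hypothetical, and fake_cpuid is a stub that stands in for the real instruction so the sketch does not depend on the host CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSAVE_NCOMP 64	/* hypothetical bound: one slot per XCR0 bit */

/* Hypothetical callback type standing in for the CPUID instruction. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t subleaf, uint32_t regs[4]);

/*
 * For each component i >= 2 enabled in `mask`, query leaf 0x0d sub-leaf i:
 * regs[0] (%eax) is the size, regs[1] (%ebx) is the offset.
 */
void
xsave_layout(cpuid_fn cpuid, uint64_t mask,
    uint32_t offsets[XSAVE_NCOMP], uint32_t sizes[XSAVE_NCOMP])
{
	assert(cpuid != NULL && offsets != NULL && sizes != NULL);

	for (uint32_t i = 0; i < XSAVE_NCOMP; i++) {
		offsets[i] = sizes[i] = 0;
		if (i < 2 || (mask & (UINT64_C(1) << i)) == 0)
			continue;	/* x87/SSE live in the legacy region */
		uint32_t regs[4] = { 0, 0, 0, 0 };
		cpuid(0x0d, i, regs);
		sizes[i] = regs[0];
		offsets[i] = regs[1];
	}
}

/* Stub for illustration only: reports the usual AVX layout (component 2). */
void
fake_cpuid(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
	regs[0] = regs[1] = regs[2] = regs[3] = 0;
	if (leaf == 0x0d && subleaf == 2) {
		regs[0] = 256;	/* 16 x 16-byte upper YMM halves */
		regs[1] = 576;	/* 512-byte legacy area + 64-byte header */
	}
}
```

Calling xsave_layout(fake_cpuid, 0x7, offsets, sizes) leaves offsets[2] at 576 and sizes[2] at 256, with all other slots zero.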
 1.147 29-May-2019  maxv Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.
 1.146 18-May-2019  maxv Clean up a little, add new XCR0 bits, remove a few unused MSRs, and fix
typos.
 1.145 14-May-2019  msaitoh Add snprintb's string for cpuid7 edx bit 10 "MD_CLEAR".
 1.144 14-May-2019  maxv Mitigation for INTEL-SA-00233: Microarchitectural Data Sampling (MDS).

It requires a microcode update, now available on the Intel website. The
microcode modifies the behavior of the VERW instruction, and makes it flush
internal CPU buffers. We hotpatch the return-to-userland path to add VERW.

Two sysctls are added:

machdep.mds.mitigated = {0/1} user-settable
machdep.mds.method = {string} constructed by the kernel

The kernel will automatically enable the mitigation if the updated
microcode is present. If the new microcode is not present, the user can
load it via cpuctl, and set machdep.mds.mitigated=1.
 1.143 13-Mar-2019  msaitoh Add TSX_FORCE_ABORT related definitions.
 1.142 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.141 16-Feb-2019  maxv Handle MSR_MISC_ENABLE on NVMM-Intel (Intel-specific).
 1.140 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.139 08-Feb-2019  msaitoh Fix bitstring format of Intel CPUID Architectural Performance Monitoring
Fn0000000a %ebx.
 1.138 05-Feb-2019  msaitoh Add new CPUID flags WAITPKG, CLDEMOTE, MOVDIRI, MOVDIR64B and
IA32_CORE_CAPABILITIES from the latest Intel SDM.
 1.137 13-Jan-2019  maxv Forgot to commit file along with identcpu.c::rev1.86.
 1.136 26-Nov-2018  msaitoh Add Intel CPUID Architectural Performance Monitoring leaf Fn0000000a.
 1.135 22-Nov-2018  msaitoh Add Intel/AMD MONITOR/MWAIT leaf.
 1.134 21-Nov-2018  msaitoh Add Intel CPUID Extended Topology Enumeration Fn0000000b definitions.
 1.133 21-Nov-2018  msaitoh Modify comment. No functional change:
- AMD also has CPUID 0x06 and 0x0d.
- PCOMMIT was obsoleted.
 1.132 15-Nov-2018  msaitoh Add MAWAU (for BND{LD,ST}X instruction) from the latest Intel SDM.
 1.131 10-Nov-2018  maxv Declare the MSR_VIA_ACE values as macros, and use a consistent naming,
similar to the rest of the file.

I'm wondering if I'm not fixing a huge bug here. The ECX8 value we were
using was wrong: ECX8 is bit 1, not bit 0. Bit 0 is ALTINST, an alternate
ISA, which is now known to be backdoored.

So it looks like we were explicitly enabling the backdoor.

Not tested, because I don't have a VIA cpu.
 1.130 20-Aug-2018  msaitoh OK'd by maxv:
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
 1.129 07-Aug-2018  maxv Add five errata for AMD Family 17h (Ryzen etc), tested by Patrick Welche,
thanks. Also add two errata for Family 16h, not yet tested, so not yet
enabled.
 1.128 13-Jul-2018  maxv Remove the X86PMC code I had written, replaced by tprof. Many defines
become unused in specialreg.h, so remove them. We don't want to add
defines all the time, there are countless PMCs on many generations, and
it's better to just inline the event/unit values.
 1.127 04-Jul-2018  maya Disable MWAIT/MONITOR on Apollo Lake CPUs to work around the APL30 erratum.

We use MWAIT/MONITOR to hatch secondary CPUs. The erratum means that
the wakeup may not happen, so SMP boot fails.
Use wrmsr to disable it in hardware too, for extra paranoia.

PR port-amd64/53420,
also reported on netbsd-users by joern clausen and ssartor.
 1.126 31-May-2018  msaitoh branches: 1.126.2;
Fix the bit location of SSBD in the macro for snprintb.
 1.125 23-May-2018  maxv Clean up the FPU headers.
 1.124 22-May-2018  maxv Extend the AMD NONARCH method to family 17h. The AMD spec states that for
17h care must be taken when handling sibling threads.

The concern is that if we have a protected two-thread process running on
two siblings, and context switch one thread to another unprotected thread,
disabling the SSB protection on one logical core will disable SSB on its
sibling too (which is still running the protected thread).

All of that doesn't matter to us, because the SSB value we set is
system-wide, not per-process.
 1.123 22-May-2018  maxv Implement a mitigation for SpectreV4 on AMD families 15h and 16h. We use
a non-architectural MSR. This MSR is also available on 17h, but there SMT
is involved, and it needs more investigation.

Not tested (I have only 10h).
 1.122 22-May-2018  maxv Add RSBA. When set, it indicates that the CPU is vulnerable to SpectreV2
via the RSB.
 1.121 22-May-2018  maxv Mitigation for SpectreV4, based on SSBD. The following sysctl branches
are added:

machdep.spectre_v4.mitigated = {0/1} user-settable
machdep.spectre_v4.affected = {0/1} set by the kernel

The mitigation is not enabled by default yet. It is not tested either,
because no microcode update has been published yet.

On current CPUs a microcode/bios update must be applied for SSBD to be
available. The user can then set mitigated=1. Even with an update applied
the kernel will set affected=1.

On future CPUs, where the problem will presumably be fixed by default,
the CPU will report SSB_NO, and the kernel will set affected=0. In this
case we also have mitigated=0, but the mitigation is not needed.

For now the feature is system-wide. Perhaps we will want a more
fine-grained, per-process approach in the future.
 1.120 30-Mar-2018  maxv Add RDCL_NO and IBRS_ALL.
 1.119 30-Mar-2018  msaitoh Add some bit definitions of AMD Fn80000001 %edx:
- MMX
- FXSR
 1.118 30-Mar-2018  msaitoh From the latest Intel SDM:
- Add Intel Fn0000_0006 %eax new bit 14-20 (HWP stuff).
- Intel Fn0000_0007 %ecx bit 22 is for both RDPID and IA32_TSC_AUX.
 1.117 14-Mar-2018  maxv ... and also add IBPB ...
 1.116 14-Mar-2018  maxv Add the IBRS and STIBP MSRs.
 1.115 14-Mar-2018  maxv Add IC_CFG.DIS_IND: "Disable Indirect Branch Predictor". Available (at
least) on AMD Families 10h, 12h and 16h.
 1.114 12-Mar-2018  msaitoh s/CLFUSH/CLFLUSH/
No functional change.
 1.113 08-Mar-2018  msaitoh Sort entries. No functional change.
 1.112 05-Mar-2018  msaitoh branches: 1.112.2;
Add Intel Deterministic Address Translation Parameter Leaf(0x18) definitions.
 1.111 15-Jan-2018  msaitoh Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.
 1.110 15-Jan-2018  msaitoh Add MSR_IA32_ARCH_CAPABILITIES definition.
 1.109 15-Jan-2018  msaitoh - Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add comment.
 1.108 13-Jan-2018  jdolecek fix swapped comments for EFER LME and LMA
 1.107 10-Jan-2018  msaitoh Add Intel cpuid 7 %edx IBRS(IBPB Speculation Control) and
STIBP(STIBP Speculation Control) from OpenBSD.
 1.106 10-Jan-2018  msaitoh Add comment.
 1.105 19-Oct-2017  msaitoh Add the following bits in AMD Fn8000000a %edx features (SVM features):
PFThreshold (PAUSE filter threshold)
AVIC (AMD virtual interrupt controller)
V_VMSAVE_VMLOAD (virtualized VMSAVE and VMLOAD)
vGIF (virtualized GIF)
 1.104 18-Oct-2017  msaitoh Add Turbo Boost Max Technology 3.0 bit.
 1.103 13-Oct-2017  msaitoh Add the following instruction bits in Structured Extended Flags Enumeration
Leaf from "Intel Architecture Instruction Set Extensions and Future Features
Programming Reference" (319433-030):
AVX512_IFMA
AVX512_VBMI
AVX512_VBMI2
GFNI
VAES
VPCLMULQDQ
AVX512_VNNI
AVX512_BITALG
AVX512_VPOPCNTDQ
AVX512_4VNNIW
AVX512_4FMAPS
 1.102 07-Sep-2017  msaitoh Define CPUID Fn00000001 %ebx bits and use them. No functional change.
 1.101 11-Aug-2017  maxv Add a comment about APICBASE_PHYSADDR. Has to do with PR/42597.
 1.100 11-Jul-2017  gson Fix typo in comment
 1.99 14-Jun-2017  maxv Add EFER_TCE. This would be an interesting feature to have, since it
reduces the indirect cost of invlpg; but I'm not convinced the way we
flush upper-levels is correct for this yet.
 1.98 15-May-2017  msaitoh branches: 1.98.2;
CPUID_CFLUSH bit is not for CFLUSH insn but CLFLUSH insn, so modify comments
and snprintb() string.
 1.97 22-Apr-2017  nonaka branches: 1.97.2;
move LAPIC_MSR* to specialreg.h.
 1.96 22-Apr-2017  nonaka Add x2APIC register definitions.
 1.95 11-Mar-2017  maxv Add the AMD 10h family, with additional events that I believe are useful,
the DTLB misses on large pages for example.

While here, remove a few K7 flags that do not actually exist on K7 (there
must have been a confusion between K7 and K8); and make the 'pmc list'
command a little more user-friendly.
 1.94 18-Feb-2017  maxv Add the AMD 10h family PMC values. Some values depend on the CPU revision,
they are commented out. Several other values are common with K7, we could
merge them later.

This family of CPUs has a 12bit event selector, contrary to K7 (8bit). The
thing is, i386's PMC interface takes as argument a uint8_t from userland,
so these counters are not accessible (yet).
 1.93 11-Feb-2017  maxv Fix a few (unused) MSR values, and add some others that I believe are
relevant.

From Murray Armfield (PR/42861).
 1.92 02-Feb-2017  msaitoh Modify comment. Use long form.
 1.91 08-Dec-2016  msaitoh branches: 1.91.2;
Add CLWB bit.
 1.90 05-Dec-2016  msaitoh Fix CPUID_SEF_FLAGS. Octal value has no 8.
 1.89 19-Aug-2016  maxv KNF so NXR likes it, and some typos
 1.88 16-Jul-2016  maxv Add the cr4 flags for PKE and UMIP.
 1.87 27-Apr-2016  msaitoh branches: 1.87.2;
Add some bit definitions mainly taken from the latest Intel SDM:
- Add SGX, UMIP, RDPID and SGXLC.
- Add avx512dq, avx512bw and avx512vl.
Fix the bit location of CLFLUSHOPT.
 1.86 13-Jan-2016  msaitoh Add some AMD bit definitions from "BIOS and Kernel Developer's Guide (BKDG) for AMD
Family 15h Models 60h-6Fh Processors".
 1.85 08-Jan-2016  msaitoh Add CLFLUSHOPT bit.
 1.84 08-Jan-2016  msaitoh Add x86 FPU Data Pointer Updated Only bit from Intel SDM.
 1.83 14-Aug-2015  msaitoh - Add Hardware-Controlled Performance States (HWP) bits.
- Use __BIT()
 1.82 08-May-2015  msaitoh From Intel SDM:
- Add the Silicon Debug bit in CPUID Fn00000001 %ecx
- Add CPUID Fn0000_0007 %ecx bits
- Add comments.
 1.81 12-Dec-2014  msaitoh Use specialreg.h's definitions.
 1.80 11-Sep-2014  msaitoh branches: 1.80.2;
- Add two more bit definitions
- XINUSE -> XGETBV
 1.79 09-Sep-2014  msaitoh Update CPUID(EAX=0x0d, ECX=1) from Intel SDM:
- XSAVEC(bit1)
- XGETBV(bit2)
- XSAVES(bit3)
 1.78 25-Feb-2014  dsl branches: 1.78.4;
Add the XCR bits for snazzy upcoming features.
Define a mask for the fpu related ones - only these will be enabled.
The memory bound ones will need saving on every context switch.
 1.77 04-Jan-2014  msaitoh Add Energy Performance Bias bit.
 1.76 04-Jan-2014  msaitoh Remove duplicated entry. Modify comments a bit.
 1.75 25-Dec-2013  msaitoh move XCR0 definitions to next to CR0's.
 1.74 08-Dec-2013  dsl Add some definitions for cpu 'extended state'.
These are needed for support of the AVX SIMD instructions.
Nothing yet uses them.
 1.73 20-Nov-2013  msaitoh - Add some AMD Fn80000001 extended features %ecx bits definitions from
the document (AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and
System Instructions. Document revision 3.20)

- "s/MXX/MMXX/" because this bit is "MMX eXtension".
 1.72 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
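The distinction between the base, extended and display values can be sketched as follows. This is an illustrative sketch, not the macros from specialreg.h: the struct and function names are hypothetical, and the combination rule (extended family added when the base family is 0xf, extended model prepended when the base family is 6 or 0xf) follows the Intel SDM's description of CPUID leaf 1 %eax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded signature. */
struct cpu_sig {
	uint32_t family;	/* display family */
	uint32_t model;		/* display model */
	uint32_t stepping;
};

/* Decode CPUID leaf 1 %eax into display family, display model, stepping. */
void
decode_signature(uint32_t eax, struct cpu_sig *out)
{
	assert(out != NULL);

	uint32_t base_family = (eax >> 8) & 0xf;
	uint32_t base_model = (eax >> 4) & 0xf;
	uint32_t ext_family = (eax >> 20) & 0xff;
	uint32_t ext_model = (eax >> 16) & 0xf;

	out->stepping = eax & 0xf;

	/* The extended family only counts when the base field is saturated. */
	out->family = base_family;
	if (base_family == 0xf)
		out->family += ext_family;

	/* The extended model supplies the high nibble for families 6 and 0xf. */
	out->model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		out->model |= ext_model << 4;
}
```

For example, the signature 0x000306a9 decodes to family 6, model 0x3a, stepping 9, and 0x00800f11 decodes to family 0x17, model 1, stepping 1.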
 1.71 21-Oct-2013  msaitoh - Add Intel Deterministic Cache Parameter Leaf (CPUID leaf 4).
These definitions are required to know the cache information of
newer Intel CPUs.
- Fix comment.
 1.70 04-Oct-2013  msaitoh Sort definitions. No functional change.
- CPUID_FEAT_BLACKLIST is for Fn00000001 %edx, so move it.
- Sort CPUID definitions with initial EAX value.
 1.69 04-Oct-2013  msaitoh Add comment about CPUID Processor extended state Enumeration Fn0000000d %eax.
 1.68 14-Sep-2013  msaitoh Add some definitions of Intel's cpuid feature from the latest document.
 1.67 12-Aug-2013  drochner add feature flag definitions for the last round of Intel instruction
set extensions (AVX512 et al.)
 1.66 26-Jul-2013  msaitoh Style change.
 1.65 25-Jul-2013  msaitoh Add some new bit definitions of Structured Extended Feature Flags Enumeration
Leaf from the document (Intel 64 and IA-32 Architectures Software Developer's
Manual).
 1.64 25-Jul-2013  msaitoh Fix the bit positions in CPUID_SEF_FLAGS macro. On snprintb(), position 1
means LSB(bit0). The bit position from HLE to SMAP was 1 bit right shifted.
The bit position of BMI1 was completely wrong.
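The off-by-one is easy to make because old-style snprintb() format strings are 1-based: the first byte is the output base, and each following entry is one position byte (1 meaning bit 0) followed by the bit's name. The decoder below is a simplified sketch of that convention, not libutil's implementation; the function name is hypothetical, the base byte is ignored, and only the names of set bits are emitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write the comma-separated names of the bits set in `val` into `buf`.
 * fmt[0] is the numeric base (ignored here); each entry after it is a
 * position byte in the range 1..32 followed by a name made of bytes > 32.
 */
void
decode_bits(char *buf, size_t len, const char *fmt, uint32_t val)
{
	assert(buf != NULL && len > 0 && fmt != NULL && fmt[0] != '\0');

	const unsigned char *p = (const unsigned char *)fmt + 1;

	buf[0] = '\0';
	while (*p != '\0') {
		unsigned pos = *p++;		/* 1-based: 1 means bit 0 */
		const unsigned char *name = p;

		while (*p > 32)			/* name ends at next position byte */
			p++;
		size_t nlen = (size_t)(p - name);

		if (pos < 1 || pos > 32 ||
		    (val & (UINT32_C(1) << (pos - 1))) == 0)
			continue;

		size_t used = strlen(buf);
		size_t need = nlen + (used != 0 ? 1 : 0);
		if (used + need >= len)
			continue;		/* silently drop names that do not fit */
		if (used != 0)
			buf[used++] = ',';
		memcpy(buf + used, name, nlen);
		buf[used + nlen] = '\0';
	}
}
```

With the format "\20\1FPU\2VME\4PSE" and the value 0x9 (bits 0 and 3 set), the buffer ends up holding "FPU,PSE". Position bytes are written as octal escapes, so a digit 8 or 9 cannot be part of one, and a position above \40 (32) is not valid in this style.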
 1.63 06-Mar-2013  yamt branches: 1.63.6;
some more definitions
 1.62 06-Jan-2013  dsl Correct the comment about the extended family and model bits.
Add some definitions related to the process extended state enumeration.
 1.61 03-Jan-2013  dsl Add some missing bit definitions to CPUID2 and those for XCR0.
Taken from the August 2012 Intel SDM (intel_x86_325462.pdf).
Split all the snprintb() format strings to make them (almost) readable.
Fix CPUID_AMD_FLAGS4 to not try to print bits \41 and \42.
 1.60 17-Oct-2012  drochner recognize the P1GB and RDTSCP which were AMD-only on Intel HW too
 1.59 05-May-2012  jym branches: 1.59.2;
Add latest CR4 bits:
- CR4_VMXE: VMX operations, used for hardware virtualization.
- CR4_SMXE: SMX operations, used for Safer Mode Extensions (ground for
Intel's TXT - Trusted Execution Technology - platform).
- CR4_FSGSBASE: enable *FSBASE and *GSBASE instructions, for R/W access
to FS/GS segment base addresses.
- CR4_PCIDE: enable Process Context IDentifiers (other architectures may call
these "address space identifiers").
- CR4_OSXSAVE: enable xsave and xrestore instructions
- CR4_SMEP: Supervisor Mode Execution Prevention. Allows enforcing --x rights
from cpl 0.

From Intel® 64 and IA-32 Architectures Software Developer’s Manual,
March 2012.

Align declarations.

CPUID_* bits for these features follow.
 1.58 30-Apr-2012  christos Add VIA Eden FCR MSR.
 1.57 06-Apr-2012  chs bring in this change from openbsd:
Implement the AMD suggested workaround for family 10h & 12h errata 721
"Processor May Incorrectly Update Stack Pointer" by setting a bit
marked 'reserved' in an MSR that is only "documented" to exist on 12h.
 1.56 02-Mar-2012  bouyer Don't mask out CPUID_FXSR. If not set, the kernel won't handle SSE and SSE2
registers on context switches; leading to data corruption when running
binaries using these instructions (like e.g. binaries built with a
-mcpu newer than pentium 4, which enables these instructions in gcc).
 1.55 15-Dec-2011  abs branches: 1.55.2;
Increase MTRR_I686_NVAR_MAX from 8 to 16. Avoids
"FIXME: more than 8 MTRRs (10)" message on booting Thinkpad W520 and
similar. While here replace a magic number with MTRR_I686_NVAR_MAX * 2
 1.54 09-Dec-2011  cegger add AMD ucode MSRs
 1.53 03-Oct-2011  njoly branches: 1.53.2; 1.53.6;
Do not redefine CPUID_LAHF.
 1.52 26-Jul-2011  yamt - add PCID
- comment
 1.51 20-Feb-2011  jruoho Add MSR_TEMPERATURE_TARGET.
 1.50 15-Feb-2011  cegger update cpuid bits
 1.49 12-Oct-2010  jakllsch branches: 1.49.2; 1.49.4;
Correct another off-by-one-bit error. This time for Erratum 97.
 1.48 18-Sep-2010  jakllsch AMD publication 25759 rev 3.69 says that DisIOReqLock in NB_CFG is "bit 3".
They probably mean "bit 3" and not "the third bit" (or bit 2).
This change should prevent superfluous warnings of errata 89.
 1.47 25-Aug-2010  jruoho Add definitions for Intel Digital Thermal Sensor and Power Management, at
CPUID Fn0000_0006, %eax, %ecx. Use these instead of magic numbers.
 1.46 21-Aug-2010  jruoho Add IA32_MPERF (E7h) and IA32_APERF (E8h) as MSR_MPERF and MSR_APERF.
 1.45 21-Aug-2010  jruoho Add CPUID_APM_CPB at Fn8000_0007 %edx, for core performance boost.
 1.44 29-Jul-2010  cegger add RDTSCP_AUX MSR
 1.43 24-Jul-2010  cegger add AMD OSVW MSRs
 1.42 06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.41 04-May-2010  jym Enable the NX bit feature for Xen i386pae and amd64 kernels.

Tested with Xen 3.1 and Xen 3.3, dom0 and domU, by bouyer@ and jym@.

Ok bouyer@.
 1.40 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.39 03-Apr-2010  jym Fix the comments about cpuid flags, according to the cpuid documentation by
Intel and AMD.
 1.38 13-Jan-2010  cegger branches: 1.38.2; 1.38.4;
recognize SVM PauseFilter
 1.37 13-Aug-2009  cegger recognize virtual cpu feature indicating guest state.
 1.36 26-May-2009  rmind Add CPU topology detection support for AMD processors.
Tested on the following AMD CPUs:
- Family 15, model 65
- Family 15, model 67
- Family 15, model 75
- Family 16, model 2
- Family 17, model 3

Reviewed (slightly older version of patch) by <yamt>.
 1.35 16-May-2009  pgoyette Correctly identify flag bit for SSSE3 (one of the 'S' was missing). Also
rename AMD bit from SCALL/RET to SYSCALL/SYSRET to match Intel bit name.
 1.34 13-May-2009  pgoyette 1. Extend CPU probe of Intel processors to handle extended-models. This
allows us to properly identify new Intel 45nm processors, Core i7,
Atom, and the 45nm Xeon MP.

2. Properly decode several new Intel cache descriptors, as listed in the
most recent (March 2009) edition of Intel's Application Note 485.

3. Convert decode of the various features masks to use the newly added
snprintb_m(3) routine.

Addresses my PR bin/41289
Addresses my PR bin/41290
 1.33 12-Mar-2009  yamt add definitions for SVM features.
 1.32 12-Mar-2009  yamt comments
 1.31 14-Oct-2008  cegger branches: 1.31.2; 1.31.4; 1.31.8; 1.31.12;
do correct octal counting and use CPUID_APM_FLAGS in cpuctl
 1.30 14-Oct-2008  cegger add cpuid fn 80000007 %edx: AMD Power Management feature flags
 1.29 14-Oct-2008  cegger fix output of 3DNOWPREFETCH feature flag
 1.28 13-Oct-2008  cegger Add cpuid 0x80000001 %ecx features flags. Rename CPUID_MASK4 to CPUID_INTEL_MASK4 for consistency with new CPUID_AMD_MASK4
 1.27 26-Aug-2008  pgoyette Clean up previous: add bit definitions for some new fields, and use "old"
style bitmask_printf(9) format string for consistency with the rest of the
file. No functional change.

OK cegger@
 1.26 24-Aug-2008  pgoyette Shorten SYSCALL/SYSRET to SCALL/RET bit definition so it fits on one line.
 1.25 24-Aug-2008  pgoyette 1. For non-Intel vendors, don't overload cpuflags with the extended
flags from CPUID 80000001_EDX. Instead, keep the extended flags
separate, in ci_feature3_flags (Intel processors already kept a
separate ci_feature3_flag value).

2. Decode/display ci_feature3_flag in a vendor-specific manner, since
the definitions are vendor-specific.

OK cegger@
 1.24 25-May-2008  chris branches: 1.24.4;
Add detection of errata for AMD Family 10h steppings A and 2. Covering
errata:
254: Internal Resource Livelock Involving Cached TLB Reload
261: Processor May Stall Entering Stop-Grant Due to Pending Data
Cache Scrub
298: L2 Eviction May Occur During Processor Operation To Set
Accessed or Dirty Bit
309: Processor Core May Execute Incorrect Instructions on
Concurrent L2 and Northbridge Response
 1.23 03-Feb-2008  xtraeme branches: 1.23.6; 1.23.8; 1.23.10; 1.23.12;
Add DTES64 and SSE4 related bits to CPUID2_FLAGS, from FreeBSD.
 1.22 21-Dec-2007  drochner define the SSSE3 feature flag bit and print out all known bits
 1.21 29-Oct-2007  xtraeme branches: 1.21.2; 1.21.4; 1.21.8;
Add coretemp(4). A new driver for Intel Core's on-die thermal sensor,
available on Intel Core or newer CPUs.

Ported from FreeBSD. Tested by rmind on i386 and joerg on amd64.

Enabled with "options INTEL_CORETEMP".
 1.20 17-Oct-2007  garbled Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.19 26-Sep-2007  ad branches: 1.19.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.18 11-Jul-2007  njoly branches: 1.18.8; 1.18.10; 1.18.12;
Display RDTSCP bit on AMD processors (Read Serialized TSC Pair).

ok by xtraeme
 1.17 03-Jul-2007  christos Support for VIA Esther (From FreeBSD)
 1.16 04-Jun-2007  xtraeme Add four missing bits for CPUID2_FLAGS, from FreeBSD.
 1.15 17-Feb-2007  daniel branches: 1.15.2; 1.15.6; 1.15.8; 1.15.14;
Add an opencrypto provider for the AES xcrypt instructions found on VIA
C5P and later cores (also known as 'ACE', which is part of the VIA PadLock
security engine). Ported from OpenBSD.

Reviewed on tech-crypto and port-i386, no objections to committing this.
 1.14 16-Jan-2007  christos PR/35430: Izumi Tsutsui: Identify amd64 CPU on NetBSD/i386
 1.13 11-Jan-2007  ad x86_errata: correct the definition of MSR_HWCR and re-enable. Problem
noted and debugged by Murray Armfield (murray at river-styx.org).
 1.12 01-Jan-2007  ad Report on and where possible, try to work around some of the known errata
for Athlon 64 and Opteron processors. Tested briefly by cube@ and elad@.
 1.11 03-Sep-2006  xtraeme branches: 1.11.2; 1.11.6;
Update the enhanced speedstep driver and sync the code with OpenBSD:

est.c:

* Use a quintuplet (vendor, MHz_hi, mV_hi, MHz_lo, mV_lo) to match
CPUs more correctly than parsing the brand string.
* Add support for a bunch of models.
* Create a fake table on the fly if the CPU is unknown (there's no
table for it) with the current/highest/lowest frequency.

specialreg.h:

* Add some MSRs needed to get the bus clock value.

identcpu.c:

* Add functions specific to Pentium III, Pentium M and Pentium 4 to
get the bus clock value.

Note that the new fake table code from Simon Burge is not included on
this commit.

Ok'ed by simonb and dogcow.
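
The matching described in the est.c notes above could look roughly like the following. This is a minimal sketch, not the driver's code: the struct layout, the names est_key, est_table, est_lookup and est_fake_table, and the ordering of the fallback entries are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: the quintuplet named in the log entry. */
struct est_key {
    uint32_t vendor;          /* illustrative vendor id */
    uint16_t mhz_hi, mv_hi;   /* highest operating point */
    uint16_t mhz_lo, mv_lo;   /* lowest operating point */
};

/* Hypothetical per-model table of supported frequencies. */
struct est_table {
    struct est_key key;
    const uint16_t *mhz;      /* frequencies in MHz */
    size_t n;                 /* number of entries in mhz[] */
};

/* Return the first table whose quintuplet matches exactly, or NULL. */
static const struct est_table *
est_lookup(const struct est_table *tabs, size_t ntabs,
    const struct est_key *k)
{
    for (size_t i = 0; i < ntabs; i++) {
        const struct est_key *t = &tabs[i].key;

        if (t->vendor == k->vendor &&
            t->mhz_hi == k->mhz_hi && t->mv_hi == k->mv_hi &&
            t->mhz_lo == k->mhz_lo && t->mv_lo == k->mv_lo)
            return &tabs[i];
    }
    return NULL;
}

/*
 * Fallback for an unknown CPU: build a small table from the highest,
 * current and lowest frequency, dropping duplicates. Returns the
 * number of entries written to out[]. The ordering is an assumption.
 */
static size_t
est_fake_table(uint16_t cur, uint16_t hi, uint16_t lo, uint16_t out[3])
{
    size_t n = 0;

    out[n++] = hi;
    if (cur != hi && cur != lo)
        out[n++] = cur;
    if (lo != hi)
        out[n++] = lo;
    return n;
}
```

Matching on all five values, rather than on the brand string, means two parts with the same marketing name but different voltage ranges do not share a table.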
 1.10 24-Aug-2006  cube Display XD for Intel processors (Execution Disable bit support).
 1.9 02-Dec-2005  christos branches: 1.9.4; 1.9.8; 1.9.18;
PR/32216: Nicolas Joly: Missing HTT feature display for Opterons dual-core CPUs
 1.8 21-Feb-2005  he branches: 1.8.4;
Probe and print the Intel Extended Feature Bits, as documented
in the CPUID instruction description in the "Intel Extended Memory 64
Technology Software Developer's Guide, Volume 1 of 2" available at
ftp://download.intel.com/technology/64bitextensions/30083402.pdf

This presently consists of the SYSCALL/SYSRET and the EM64T features.
CPUs with the EM64T feature available should be able to run amd64 code.

Reviewed by fvdl
 1.7 10-Feb-2005  drochner Recognize an obscure cpu feature flag bit "xTPR"
which indicates that Task Priority Messages might
be disabled. Not relevant for the kernel for now
(related to interrupt distribution on the APIC bus
afaict), but present on one of my boxes.
Being here, also recognise the future "Vanderpool"
extension.
 1.6 17-May-2004  joda branches: 1.6.4; 1.6.6;
the EST and TM2 flags in the second cpuid register were swapped
(according to AP-485); while here add a few more flags
 1.5 19-Feb-2004  drochner define AMD64's CPUID_NOX bit (I'm curious where Intel puts this bit in the
ia32 extension just announced)
XXX there should be a better separation between generic and vendor
specific feature flags
 1.4 02-Feb-2004  soren Add Pentium M MSR definitions from Michael Eriksson.
 1.3 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.2 25-Apr-2003  fvdl branches: 1.2.2;
Share some common cache info cpuid code between i386 and x86_64.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.2.2.6 11-Dec-2005  christos Sync with head.
 1.2.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.2.2.4 15-Feb-2005  skrll Sync with HEAD.
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.6.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.6.6.1 12-Feb-2005  yamt sync with head.
 1.6.4.1 29-Apr-2005  kent sync with -current
 1.8.4.8 04-Feb-2008  yamt sync with head.
 1.8.4.7 21-Jan-2008  yamt sync with head
 1.8.4.6 15-Nov-2007  yamt sync with head.
 1.8.4.5 27-Oct-2007  yamt sync with head.
 1.8.4.4 03-Sep-2007  yamt sync with head.
 1.8.4.3 26-Feb-2007  yamt sync with head.
 1.8.4.2 30-Dec-2006  yamt sync with head.
 1.8.4.1 21-Jun-2006  yamt sync with head.
 1.9.18.1 06-Sep-2006  riz Pull up following revision(s) (requested by xtraeme in ticket #111):
sys/arch/x86/include/specialreg.h: revision 1.11
sys/arch/i386/i386/identcpu.c: revision 1.39
sys/arch/i386/include/cpu.h: revision 1.128
sys/arch/i386/i386/est.c: revision 1.26
Update the enhanced speedstep driver and sync the code with OpenBSD:
est.c:
* Use a quintuplet (vendor, MHz_hi, mV_hi, MHz_lo, mV_lo) to match
CPUs more correctly than parsing the brand string.
* Add support for a bunch of models.
* Create a fake table on the fly if the CPU is unknown (there's no
table for it) with the current/highest/lowest frequency.
specialreg.h:
* Add some MSRs needed to get the bus clock value.
identcpu.c:
* Add functions specific to Pentium III, Pentium M and Pentium 4 to
get the bus clock value.
Note that the new fake table code from Simon Burge is not included on
this commit.
Ok'ed by simonb and dogcow.
 1.9.8.1 03-Sep-2006  yamt sync with head.
 1.9.4.1 09-Sep-2006  rpaulo sync with head
 1.11.6.1 10-Feb-2007  tron Pull up following revision(s) (requested by chs in ticket #411):
sys/arch/x86/include/specialreg.h: revision 1.14
sys/arch/i386/i386/identcpu.c: revision 1.53
PR/35430: Izumi Tsutsui: Identify amd64 CPU on NetBSD/i386
 1.11.2.2 01-Feb-2007  ad Sync with head.
 1.11.2.1 12-Jan-2007  ad Sync with head.
 1.15.14.2 03-Oct-2007  garbled Sync with HEAD
 1.15.14.1 26-Jun-2007  garbled Sync with HEAD.
 1.15.8.1 11-Jul-2007  mjf Sync with head.
 1.15.6.4 03-Dec-2007  ad Sync with HEAD.
 1.15.6.3 09-Oct-2007  ad Sync with head.
 1.15.6.2 15-Jul-2007  ad Sync with head.
 1.15.6.1 09-Jun-2007  ad Sync with head.
 1.15.2.2 17-Feb-2007  daniel Add an opencrypto provider for the AES xcrypt instructions found on VIA
C5P and later cores (also known as 'ACE', which is part of the VIA PadLock
security engine). Ported from OpenBSD.

Reviewed on tech-crypto and port-i386, no objections to committing this.
 1.15.2.1 17-Feb-2007  daniel file specialreg.h was added on branch yamt-idlelwp on 2007-02-17 00:28:26 +0000
 1.18.12.1 06-Oct-2007  yamt sync with head.
 1.18.10.3 23-Mar-2008  matt sync with HEAD
 1.18.10.2 09-Jan-2008  matt sync with HEAD
 1.18.10.1 06-Nov-2007  matt sync with HEAD
 1.18.8.2 29-Oct-2007  joerg Sync with HEAD.
 1.18.8.1 02-Oct-2007  joerg Sync with HEAD.
 1.19.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.21.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.21.4.1 26-Dec-2007  ad Sync with head.
 1.21.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.23.12.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.23.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.23.10.7 09-Oct-2010  yamt sync with head
 1.23.10.6 11-Aug-2010  yamt sync with head.
 1.23.10.5 11-Mar-2010  yamt sync with head
 1.23.10.4 19-Aug-2009  yamt sync with head.
 1.23.10.3 20-Jun-2009  yamt sync with head
 1.23.10.2 16-May-2009  yamt sync with head
 1.23.10.1 04-May-2009  yamt sync with head.
 1.23.8.1 04-Jun-2008  yamt sync with head
 1.23.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.23.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.23.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.24.4.1 19-Oct-2008  haad Sync with HEAD.
 1.31.12.1 21-Apr-2010  matt sync to netbsd-5
 1.31.8.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.31.8.5 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.31.8.4 24-Oct-2010  jym Sync with HEAD
 1.31.8.3 01-Nov-2009  jym Sync with HEAD.
 1.31.8.2 31-May-2009  jym Sync with HEAD.
 1.31.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.31.4.4 01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1968):
sys/arch/x86/include/specialreg.h: revision 1.72 via patch

Backport CPUID_TO_*() macros. Old macros are kept for compatibility.
 1.31.4.3 19-Jun-2013  bouyer Pull up following revision(s) (requested by msaitoh in ticket #1847):
sys/arch/x86/include/mtrr.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.25
sys/arch/x86/include/specialreg.h: revision 1.55
Increase MTRR_I686_NVAR_MAX from 8 to 16. Avoids
"FIXME: more than 8 MTRRs (10)" message on booting Thinkpad W520 and
similar. While here replace a magic number with MTRR_I686_NVAR_MAX * 2
 1.31.4.2 28-Nov-2012  riz branches: 1.31.4.2.2;
Pull up following revision(s) (requested by christos in ticket #1819):
sys/arch/x86/include/specialreg.h: revision 1.58
Add VIA Eden FCR MSR.
 1.31.4.1 16-Jun-2009  snj branches: 1.31.4.1.2;
Pull up following revision(s) (requested by rmind in ticket #789):
sys/arch/x86/include/specialreg.h: revision 1.36
sys/arch/x86/x86/cpu_topology.c: revision 1.2
Add CPU topology detection support for AMD processors.
Tested on the following AMD CPUs:
- Family 15, model 65
- Family 15, model 67
- Family 15, model 75
- Family 16, model 2
- Family 17, model 3
Reviewed (slightly older version of patch) by <yamt>.
 1.31.4.2.2.1 01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1968):
sys/arch/x86/include/specialreg.h: revision 1.72 via patch

Backport CPUID_TO_*() macros. Old macros are kept for compatibility.
 1.31.4.1.2.1 01-Jun-2015  sborrill Pull up the following revision(s) (requested by msaitoh in ticket #1968):
sys/arch/x86/include/specialreg.h: revision 1.72 via patch

Backport CPUID_TO_*() macros. Old macros are kept for compatibility.
 1.31.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.38.4.2 05-Mar-2011  rmind sync with head
 1.38.4.1 30-May-2010  rmind sync with head
 1.38.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.38.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.38.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.49.4.2 05-Mar-2011  bouyer Sync with HEAD
 1.49.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.49.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.53.6.6 02-Jun-2012  mrg sync to latest -current.
 1.53.6.5 29-Apr-2012  mrg sync to latest -current.
 1.53.6.4 06-Mar-2012  mrg sync to -current
 1.53.6.3 06-Mar-2012  mrg sync to -current
 1.53.6.2 04-Mar-2012  mrg sync to latest -current.
 1.53.6.1 18-Feb-2012  mrg merge to -current.
 1.53.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.53.2.4 23-Jan-2013  yamt sync with head
 1.53.2.3 30-Oct-2012  yamt sync with head
 1.53.2.2 23-May-2012  yamt sync with head.
 1.53.2.1 17-Apr-2012  yamt sync with head
 1.55.2.5 26-Jan-2015  martin Pull up the following, requested by msaitoh in ticket #1240:

sys/arch/x86/include/specialreg.h 1.72 via patch

Add CPUID_TO_*() macros to avoid bug. Old macros are kept for compatibility.
See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
 1.55.2.4 29-Dec-2014  martin Pull up the following revisions, requested by msaitoh in #1220:

sys/arch/x86/include/specialreg.h 1.59-1.71, 1.73-1.81 (patch)

Update x86 special register definitions:
- Add latest CR4 bits.
- Recognize the P1GB and RDTSCP which were AMD-only on Intel HW too.
- Add some missing bit definitions for CPUID2 and those for XCR0.
- Fix CPUID_AMD_FLAGS4 to not try to print bits \41 and \42.
- Correct the comment about the extended family and model bits.
- Add some definitions related to the process extended state
enumeration.
- Add Intel Structured Extended Feature leaf (Fn0000_0007).
- Sort CPUID definitions in initial EAX value.
- Add Intel Deterministic Cache Parameter Leaf (CPUID leaf 4).
- Add some AMD Fn80000001 extended features %ecx bits definitions.
- "s/MXX/MMXX/" because this bit is "MMX eXtension".
- Add some definitions for cpu 'extended state' enumeration
(Fn0000000d).
- Add Energy Performance Bias bit of Fn0000_0006 %ecx.
- Add MSR_IA32_PLATFORM_ID (0x017)
- Modify comment.
- Style fix.
 1.55.2.3 07-May-2012  riz Pull up following revision(s) (requested by christos in ticket #220):
sys/arch/x86/x86/identcpu.c: revision 1.31
sys/arch/x86/include/specialreg.h: revision 1.58
PR/41267: Andrius V: 5.0 RC4 does not detect second CPU in VIA. VIA Eden cpuid
lies about its ability to do cmpxchg8b. Turn the feature on using the FCR MSR.
Needs pullup to both 5 and 6.
Add VIA Eden FCR MSR.
 1.55.2.2 09-Apr-2012  riz Pull up following revision(s) (requested by chs in ticket #168):
sys/arch/x86/include/specialreg.h: revision 1.57
sys/arch/x86/x86/errata.c: revision 1.20
bring in this change from openbsd:
Implement the AMD suggested workaround for family 10h & 12h errata 721
"Processor May Incorrectly Update Stack Pointer" by setting a bit
marked 'reserved' in an MSR that is only "documented" to exist on 12h.
 1.55.2.1 05-Mar-2012  sborrill Pull up the following revision(s) (requested by bouyer in ticket #80):
sys/arch/xen/x86/x86_xpmap.c: revision 1.42
sys/arch/x86/include/specialreg.h: revision 1.56
sys/arch/amd64/amd64/machdep.c: revision 1.179
sys/arch/i386/i386/locore.S: revision 1.97
sys/arch/i386/i386/machdep.c: revision 1.723 via patch
sys/arch/x86/include/cpu.h: revision 1.49

Fix possible FPU registers corruption on context switches.
Fix type of pointers passed to some hypercalls.
 1.59.2.5 03-Dec-2017  jdolecek update from HEAD
 1.59.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.59.2.3 23-Jun-2013  tls resync from head
 1.59.2.2 25-Feb-2013  tls resync with head
 1.59.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.63.6.2 18-May-2014  rmind sync with head
 1.63.6.1 28-Aug-2013  rmind sync with head
 1.78.4.6 09-Oct-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #1636):
sys/arch/x86/include/cacheinfo.h: 1.23-1.26
sys/arch/x86/include/cpu.h: 1.70
sys/arch/x86/include/specialreg.h: 1.91-1.93,1.98,1.100,1.102-1.124,1.126,1.130 via patch
sys/arch/x86/x86/cpu_topology.c: 1.10
sys/arch/x86/x86/identcpu.c: 1.56-1.57,1.70 via patch
usr.sbin/cpuctl/arch/i386.c: 1.71,1.75-1.79,1.81-1.85 via patch
Add some register definitions for x86:
- Add CLWB bit.
- Fix a few (unused) MSR values, and add some bit definitions of
MSR_EFER from Murray Armfield in PR#42861.
- CPUID_CFLUSH bit is not for CFLUSH insn but CLFLUSH insn, so modify
comments and snprintb() string.
- Define CPUID Fn00000001 %ebx bits and use them.
No functional change.
- Add Structured Extended Flags Enumeration Leaf's bit definitions:
AVX512_{IFMA,VBMI2,VNNI,BITALG,VPOPCNTDQ,4VNNIW,4FMAPS},GFNI&VAES.
- Add Turbo Boost Max Technology 3.0 bit.
- Add AMD SVM features definitions.
- Add Intel cpuid 7 %edx IBRS and STIBP bit definitions.
- Fix swapped comments for EFER LME and LMA
- Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add MSR_IA32_ARCH_CAPABILITIES definition.
- Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.
- Add Intel Deterministic Address Translation Parameter Leaf(0x18)
definitions.
- s/CLFUSH/CLFLUSH/
- Add AMD's Disable Indirect Branch Predictor bit definition.
- Add the MSR bits definitions for IBRS, STIBP and IBPB.
- Add Intel Fn0000_0006 %eax new bit 14-20 (HWP stuff).
- Intel Fn0000_0007 %ecx bit 22 is for both RDPID and IA32_TSC_AUX.
- Add AMD's CPUID Fn80000001 %edx MMX and FXSR bit definitions.
- Add RDCL_NO and IBRS_ALL.
- Add SSBD and RSBA bit definitions.
- Add AMD's SSB bit definitions for F15H, F16H and F17H.
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
- Add yet another Shared L2 TLB (2M/4M pages).
- Add 3way and 6way of L2 cache or TLB on AMD CPU.
- AMD L3 cache association bitfield is not 8bit but 4bit like others
association bitfields.
- Sort entries. No functional change.
- Modify comment, fix typo in comment and add comment.
cpuctl(8):
- Add detection for Quark X1000, Xeon E5 v4, E7 v4,
Core i7-69xx Extreme Edition, Xeon Scalable (Skylake),
Xeon Phi [357]200 (Knights Landing), Atom (Goldmont),
Atom (Denverton), Future Core (Cannon Lake), Atom (Goldmont Plus),
Xeon Phi 7215, 7285 and 7295 (Knights Mill) and
7th or 8th gen Core (Kaby Lake, Coffee Lake).
- Print Structured Extended Feature leaf Fn0000_0007 %ebx on AMD, too.
- Print Fn0000_0007 %ecx on Intel.
- Print Intel cpuid 7 %edx.
- Parse the TLB info from `cpuid leaf 18H' on Intel processor.
- Use aprint_error_dev() for error output.
 1.78.4.5 08-Dec-2016  snj Pull up following revision(s) (requested by msaitoh in ticket #1285):
sys/arch/x86/include/cacheinfo.h: revision 1.22
sys/arch/x86/include/specialreg.h: revisions 1.87 and 1.90
usr.sbin/cpuctl/arch/i386.c: revisions 1.72-1.74
Changes for x86's cpuctl(8):
- Add Quark X1000, Xeon E[57] v4, Core i7-69xx Extreme, 7th gen Core,
Denverton, Xeon Phi [357]200, Future Xeon and Future Xeon Phi.
- Add SGX, UMIP, RDPID, SGXLC, AVX512DQ, AVX512BW and AVX512VL bit.
- Fix the bit location of CLFLUSHOPT.
- Add new TLB descriptor 0x64 and 0xc4.
 1.78.4.4 06-Mar-2016  martin branches: 1.78.4.4.2;
Pull up the following changes, requested by msaitoh in #1117:

sys/arch/x86/include/cacheinfo.h 1.20-1.21
sys/arch/x86/include/specialreg.h 1.83-1.86
usr.sbin/cpuctl/arch/i386.c 1.67-1.70

Changes for x86's cpuctl(8):
- Add some TLB information (index 0x6a-0x6d).
- Add Hardware-Controlled Performance States (HWP) bits, FPU Data
Pointer Updated Only bit and CLFLUSHOPT bit.
- Add some AMD bit definitions from "BIOS and Kernel Developer's Guide (BKDG)
for AMD Family 15h Models 60h-6Fh Processors".
- Add Xeon E5-4600 v3.
- Add Xeon E3-1200 v4 and v5.
- Add 6th gen Core, Xeon E3-1500 v5 and Xeon D-1500.
- Change CPU family 0x1c from "Atom Family" to "45nm Atom Family"
 1.78.4.3 09-May-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #739):
sys/arch/x86/include/specialreg.h: revision 1.82
usr.sbin/cpuctl/arch/i386.c: revision 1.66
From Intel SDM:
- Add the Silicon Debug bit in CPUID Fn00000001 %ecx
- Add CPUID Fn0000_0007 %ecx bits
- Add comments.
--
Update some Intel CPU models (Sky Lake, Broadwell and Atom X[357]).
 1.78.4.2 09-Jan-2015  martin Pull up following revision(s) (requested by msaitoh in ticket #396):
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.6
sys/arch/x86/include/specialreg.h: revision 1.81
Use specialreg.h's definitions.
 1.78.4.1 12-Dec-2014  martin Pull up following revision(s) (requested by msaitoh in ticket #310):
sys/arch/x86/include/specialreg.h: revision 1.79-1.80
usr.sbin/cpuctl/arch/i386.c: revision 1.59
sys/arch/x86/include/cacheinfo.h: revision 1.19

Update some cpuid related values:
- Add XSAVECC, XGETBV, XSAVES, SMAP and PQE
- Change XINUSE to XGETBV
- Add new cache descriptor value (0xc3)
- Update signatures for the following CPUs:
- Core M-5xxx
- Core i7 Extreme
- Future Core (0x4e)
- Future Xeon (0x56)
 1.78.4.4.2.1 18-Jan-2017  skrll Sync with netbsd-5
 1.80.2.8 28-Aug-2017  skrll Sync with HEAD
 1.80.2.7 05-Feb-2017  skrll Sync with HEAD
 1.80.2.6 05-Oct-2016  skrll Sync with HEAD
 1.80.2.5 29-May-2016  skrll Sync with HEAD
 1.80.2.4 19-Mar-2016  skrll Sync with HEAD
 1.80.2.3 22-Sep-2015  skrll Sync with HEAD
 1.80.2.2 06-Jun-2015  skrll Sync with HEAD
 1.80.2.1 06-Apr-2015  skrll Sync with HEAD
 1.87.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.87.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.87.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.87.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.91.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.97.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.98.2.28 29-Jul-2023  martin Pull up the following revisions, all via patch, requested by msaitoh
in ticket #1853:

sys/arch/x86/include/specialreg.h 1.204-1.206, 1.208

- Add Intel CPUID 0x07 %ecx bit 24 BUS_LOCK_DETECT.
- Add AMD CPUID 0x80000008 %ebx bit 30 IBPB_RET and CPUID 0x8000000a
%edx bit 29 BusLockThreshold.
- Fix typo in comment.
 1.98.2.27 25-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #1851):

sys/arch/x86/include/specialreg.h: revision 1.207
sys/arch/x86/x86/errata.c: revision 1.31

x86: turn off zenbleed chicken bit on Zen2 cpus.

this is based upon Taylor's original work. i just made the list
of CPUs to run on as correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)

(might be nice to have a better way to express revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)

tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
https://lock.cmpxchg8b.com/zenbleed.html
 1.98.2.26 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1827):

sys/arch/x86/include/specialreg.h: revision 1.202
sys/arch/x86/include/specialreg.h: revision 1.203
usr.sbin/cpuctl/arch/i386.c: revision 1.136

Add some CPUID bits from PPR for AMD Family 19h Model 61h Revision B1.

Add AMD CPUID Fn0000_0008 %ebx bit 3 INVLPGB.
 1.98.2.25 23-Jan-2023  martin Pull up the following revisions, requested by msaitoh in ticket #1791:

sys/arch/x86/include/specialreg.h 1.193-1.198

- Add CPUID Fn0000_0006 %eax bit 24 IA32_THERM_INTERRUPT MSR bit 25
Hardware Feedback Notification support.
- Add CPUID Fn0000_0007 %ecx bit 29 ENQCMD.
- Add CPUID Fn0000_0007 %edx bit 1 SGX-KEYS.
- Add CPUID Fn0000_0007 %edx bit 5 UINTR(User INTeRrupts).
- Add CPUID Fn0000_0007 %edx bit 11 RTM_ALWAYS_ABORT.
- Add CPUID Fn0000_0007 %edx bit 22 AMX_BF16.
- Add CPUID Fn0000_0007 %edx bit 23 AVX512_FP16.
- Add CPUID Fn0000_0007 %edx bit 24 AMX_TILE.
- Add CPUID Fn0000_0007 %edx bit 25 AMX_INT8.
- Add CPUID Fn0000_0007 sub-leaf 1 %edx bit 18 CET_SSS.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx definitions.
- Add CPUID Fn0000_000d sub-leaf 1 %eax bit 4 XFD.
- Add CPUID Fn0000_001d Tile Information.
- Add CPUID Fn0000_001e TMUL Information.
- Add CPUID Fn8000_0007 %eax RAS capabilities.
- Add CPUID Fn8000_0008 %ebx BTC_NO.
- Add cpuid Fn8000_000a x2AVIC, VNMI, IBSVIRT and ROGPT.
- Add CPUID Fn8000_001b Instruction-Based Sampling.
- Add CPUID Fn8000_001e Processor Topology Information.
- Add CPUID Fn8000_001f %eax RPMQUERY, VmplSSS, TscAuxVirt,
VmgexitParam, VirtualTomMsr, IbsVirtGuest, SmtProtection,
vsmCommPageMSR and NestedVirtSnpMsr.
- Add CPUID Fn8000_0021 AMD Extended Features Identification 2.
- Add CPUID Fn8000_0022 AMD Extended Performance Monitoring and Debug.
- Rename HW_FEEDBACK to HWI (Hardware Feedback Interface).
- Rename TSX_FORCE_ABORT to RTM_FORCE_ABORT.
- Modify comment. Both Intel and AMD support CPUID Fn0000000b.
- Modify comment. Hybrid Information -> Native Model ID Information.
- Use __BIT(). Add comment. Whitespace fix.
 1.98.2.24 15-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1775):

sys/arch/x86/include/specialreg.h: revision 1.189
usr.sbin/cpuctl/arch/i386.c: revision 1.128
sys/arch/x86/include/specialreg.h: revision 1.190
sys/arch/x86/include/specialreg.h: revision 1.191
sys/arch/x86/include/specialreg.h: revision 1.192

s/shareing/sharing/. No functional change.

Add top-down slots event bit of architectural performance monitoring leaf.

Modify CPUID Fn0000000a %ebx's string. Add new string for %ecx.

Modify output of CPUID Fn0000000a.
old:
cpu0: Perfmon-eax 0x8300805<VERSION=0x5,GPCounter=0x8,GPBitwidth=0x30>
cpu0: Perfmon-eax 0x8300805<Vectorlen=0x8>
cpu0: Perfmon-edx 0x8604<FixedFunc=0x4,FFBitwidth=0x30,ANYTHREADDEPR>
new:
cpu0: Perfmon: Ver. 5
cpu0: Perfmon: General: bitwidth 48, 8 counters
cpu0: Perfmon: General: avail 0xff<CORECYCL,INST,REFCYCL,LLCREF,LLCMISS,BRINST>
cpu0: Perfmon: General: avail 0xff<BRMISPR,TOPDOWNSLOT>
cpu0: Perfmon: Fixed: bitwidth 48, 4 counters
cpu0: Perfmon: Fixed: avail 0xf<INST,CLK_CORETHREAD,CLK_REF_TSC,TOPDOWNSLOT>
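
The new "Ver." and "General" lines come from splitting the %eax value of CPUID leaf 0x0a into four byte-wide fields. A minimal sketch of that split follows; the field layout is the one in the Intel SDM, while the struct and function names are hypothetical and are not taken from cpuctl.

```c
#include <stdint.h>

/* Fields of CPUID leaf 0x0a %eax (names are illustrative only). */
struct perfmon_eax {
    unsigned version;     /* bits 7:0   architectural perfmon version */
    unsigned gp_count;    /* bits 15:8  general-purpose counters */
    unsigned gp_width;    /* bits 23:16 counter bit width */
    unsigned vector_len;  /* bits 31:24 length of the %ebx bit vector */
};

/* Split the raw register value into its four byte-wide fields. */
static struct perfmon_eax
perfmon_decode_eax(uint32_t eax)
{
    struct perfmon_eax p;

    p.version = eax & 0xff;
    p.gp_count = (eax >> 8) & 0xff;
    p.gp_width = (eax >> 16) & 0xff;
    p.vector_len = (eax >> 24) & 0xff;
    return p;
}
```

Applied to the 0x8300805 value in the sample output, this yields version 5, 8 counters, a bit width of 0x30 (48) and a vector length of 8, which is what the new format prints in decimal.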

Update some AMD CPUID bits:
- Rename FSREP_MOV to FSRM.
- Add Memory Bandwidth Enforcement (MBE)
- Add AMD's PPIN. Rename CPUID_SEF_PPIN to CPUID_SEF_INTEL_PPIN.
- Add Collaborative Processor Performance Control (CPPC).
- Add HOST_MCE_OVERRIDE.
- Add some unknown bits as Bxx.
- Add comments.
- Use __BIT().
 1.98.2.23 31-Jan-2022  martin Pull up the following revisions (all via patch), requested by
msaitoh in ticket #1731:

sys/arch/x86/include/specialreg.h 1.179-1.188

- Add CPUID definitions of Last Branch Record, Thread Director,
AVX version of VNNI, Fast short REP MOV, HRESET, PPIN, Architectural
LBR, Linear Address Masking and Hybrid Information from the latest
Intel SDM.
- Add CPUID definitions of AddrMaskExt, INT_WBINVD, IbrsSameMode,
EferLmsleUnsupported, PSFD and SecureTSC from AMD APM.
- Print CLFSH instead of CLFLUSH because both Intel and AMD documents
say so.
- Modify comment. Add comment. Fix typo. Use __BIT(). KNF. Sort lines.
No functional change.
 1.98.2.22 08-Dec-2021  martin Pull up the following, requested by msaitoh in ticket #1720:

sys/arch/x86/include/specialreg.h 1.146, 1.171,
1.173-1.178 via patch
sys/arch/x86/x86/identcpu.c 1.106, 1.117,
1.122 via patch
sys/arch/x86/x86/pmap.c patch
sys/external/bsd/drm2/drm/drm_cache.c 1.14
usr.sbin/cpuctl/arch/i386.c 1.114-1.117


- Add PT, PKRU, HDC, LA57, PKE, PKS, CET, CET_U, CET_S, HWP, KL,
AVX512_BF16, TME_EN and PCONFIG.
- Rename some macros to match the x86 specification and the other OSes.
- Print CPUID 0x80000008 %ebx on Intel, too.
- Print CPUID leaf 7 subleaf 1.
- Identify Tiger Lake, 3rd gen Xeon Scalable (Ice Lake), Elkhart Lake
and Jasper Lake.
- Remove a few unused MSRs.
- Add comment.
- KNF. Whitespace fix.
 1.98.2.21 05-Aug-2020  martin Accidentally not committed for ticket #1595:

sys/arch/x86/include/specialreg.h 1.129 via patch

Add six errata for AMD Family 17h (Ryzen etc).
 1.98.2.20 05-Aug-2020  martin Pull up the following revisions, requested by msaitoh in ticket #1588:

sys/arch/x86/include/specialreg.h 1.162-1.168 via patch

- AMD CPUID Fn8000_000a %edx bit 20 is "SPEC_CTRL".
- Add some bit definitions of AMD's CPUID Fn8000_001f Encrypted Memory
features.
- Add AMD INVLPGB/TLBSYNC hypervisor enable in VMCB and TLBSYNC
intercept bit.
- Add AMD MSR_DE_CFG's bit 1 as DE_CFG_LFENCE_SERIALIZE.
- Add some definitions for Intel:
- Add CPUID leaf 6 %eax bit 19 for HW_FEEDBACK* and
IA32_PACKAGE_TERM* MSRs.
- Add CPUID leaf 7 %ecx bit 31 for Protection Keys.
- Add definition of Load only TLB and Store only TLB.
- Add IF_PSCHANGE_MC_NO bit of IA32_ARCH_CAPABILITIES
- Fix HWP_IGNIDL.
- Add CPUID 7 %edx bit 9 "SRBDS_CTRL"
- Modify comment. Style and fix typo.
 1.98.2.19 15-Apr-2020  martin Pull up the following, requested by msaitoh in ticket #1530:

sys/arch/x86/x86/procfs_machdep.c 1.33-1.36
sys/arch/x86/x86/tsc.c 1.40
sys/arch/x86/x86/specialreg.h 1.159-1.161
usr.sbin/cpuctl/arch/i386.c 1.109-1.110 via patch

- Print avx512ifma, cqm_mbm_total, cqm_mbm_local, waitpkg, rdpru,
Fast Short Rep Mov(fsrm), AVX512_VP2INTERSECT, SERIALIZE and
TSXLDTRK.
- Rename CPUID Fn8000_0007 %edx bit 8 from "TSC" to "ITSC"
(Invariant TSC) to avoid confusion.
- Print CPUID 0x80000007 %edx on both Intel and AMD.
- Remove ci_max_ext_cpuid from usr.sbin/cpuctl/arch/i386.c because it's
the same as ci_cpuid_extlevel.
- Use unsigned to avoid undefined behavior in procfs_getonefeatreg().
 1.98.2.18 31-Jan-2020  martin Pull up the following, requested by msaitoh in ticket #1494:

sys/arch/x86/include/specialreg.h 1.146, 1.151-1.154, 1.156 via patch
usr.sbin/cpuctl/arch/i386.c 1.105-1.107 via patch

- Add definitions of AMD's CPUID Fn8000_0008 %ebx.
- Add definitions of AMD's CPUID Fn8000_001f Encrypted Memory features.
- Add definition of AMD's CPUID Fn8000_000a %edx bit 11 "GMET".
- Define CPUID_AMD_SVM_PFThreshold correctly.
- Modify comment a bit for consistency.
- Call cpu_dcp_cacheinfo() only when the cpuid Topology Extension flag
is set on AMD processor.
- Fix typos.
 1.98.2.17 19-Nov-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1450):

usr.sbin/cpuctl/arch/i386.c: revision 1.108
sys/arch/x86/include/specialreg.h: revision 1.158

Add the following bit definitions from the latest Intel SDM:
- CET shadow stack
- Fast Short REP MOV
- Hybrid part
- CET Indirect Branch Tracking

0x7d and 0x7e are for 10th generation Core (Ice Lake).
 1.98.2.16 12-Nov-2019  martin Pull up following revision(s) (requested by maxv in ticket #1433):

sys/arch/x86/include/specialreg.h: revision 1.157
sys/arch/x86/x86/spectre.c: revision 1.31

Mitigation for CVE-2019-11135: TSX Asynchronous Abort (TAA).

Two sysctls are added:
machdep.taa.mitigated = {0/1} user-settable
machdep.taa.method = {string} constructed by the kernel

There are two cases:

(1) If the CPU is affected by MDS, then the MDS mitigation will also
mitigate TAA, and we have nothing else to do. We make the 'mitigated' leaf
read-only, and force:

machdep.taa.mitigated = machdep.mds.mitigated
machdep.taa.method = [MDS]

The kernel already enables the MDS mitigation by default.

(2) If the CPU is not affected by MDS but is affected by TAA, then we use
the new TSX_CTRL MSR to disable RTM. This MSR is provided via a microcode
update, now available on the Intel website. The kernel will automatically
enable the TAA mitigation if the updated microcode is present. If the new
microcode is not present, the user can load it via cpuctl, and set
machdep.taa.mitigated=1.
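
The two cases above amount to a small decision function. The following is a rough sketch of that logic only, under the assumption that the inputs are already known as booleans; the type and function names are hypothetical and do not reflect spectre.c.

```c
#include <stdbool.h>

/* How TAA ends up handled (hypothetical enum, for illustration). */
enum taa_method {
    TAA_NOT_AFFECTED,   /* CPU has neither issue */
    TAA_VIA_MDS,        /* case (1): reported as "[MDS]" per the log */
    TAA_RTM_DISABLED,   /* case (2): TSX_CTRL MSR used to disable RTM */
    TAA_NO_MICROCODE    /* case (2), but TSX_CTRL is not available yet */
};

struct taa_state {
    enum taa_method method;
    bool mitigated;       /* value of machdep.taa.mitigated */
    bool user_settable;   /* false when the leaf is read-only */
};

static struct taa_state
taa_decide(bool mds_affected, bool taa_affected, bool mds_mitigated,
    bool have_tsx_ctrl)
{
    struct taa_state s = { TAA_NOT_AFFECTED, false, false };

    if (mds_affected) {
        /* Case (1): follow the MDS mitigation, leaf is read-only. */
        s.method = TAA_VIA_MDS;
        s.mitigated = mds_mitigated;
    } else if (taa_affected) {
        /* Case (2): needs the microcode that provides TSX_CTRL. */
        s.user_settable = true;
        s.method = have_tsx_ctrl ? TAA_RTM_DISABLED : TAA_NO_MICROCODE;
        s.mitigated = have_tsx_ctrl;
    }
    return s;
}
```

In the TAA_NO_MICROCODE outcome the user would load the microcode via cpuctl and then set machdep.taa.mitigated=1, as described above.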
 1.98.2.15 16-Aug-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1338):

usr.sbin/cpuctl/arch/i386.c: revision 1.104
sys/arch/x86/x86/identcpu.c: revision 1.93
sys/arch/x86/include/cacheinfo.h: revision 1.28
sys/arch/x86/include/specialreg.h: revision 1.150

- AMD CPUID Fn8000_001d Cache Topology Information leaf is almost the same as
Intel Deterministic Cache Parameter Leaf(0x04), so make new
cpu_dcp_cacheinfo() and share it.
- AMD's L2 and L3's cache descriptor's definition is the same, so use one
common definition.
- KNF.

XXX Split some common functions to new identcpu_subr.c or use #ifdef _KERNEL
... #endif in identcpu.c to share from both kernel and cpuctl?
 1.98.2.14 17-Jul-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1293):

sys/arch/x86/include/specialreg.h: revision 1.149

Define some new bits of CPUID Fn8000_0007 %edx AMD Advanced Power Management
leaf.
 1.98.2.13 29-May-2019  martin Pullup the following, requested by msaitoh in ticket #1270:

sys/arch/x86/include/specialreg.h 1.143, 1.145 via patch
sys/arch/x86/x86/procfs_machdep.c 1.30

Add TSX_FORCE_ABORT related definitions.
Add cpuid7 edx bit 10 "MD_CLEAR".
 1.98.2.12 14-May-2019  martin Pull up following revision(s) (requested by maxv in ticket #1269):

sys/arch/amd64/amd64/locore.S: revision 1.181 (adapted)
sys/arch/amd64/amd64/amd64_trap.S: revision 1.47 (adapted)
sys/arch/x86/include/specialreg.h: revision 1.144 (adapted)
sys/arch/amd64/include/frameasm.h: revision 1.43 (adapted)
sys/arch/x86/x86/spectre.c: revision 1.27 (adapted)

Mitigation for INTEL-SA-00233: Microarchitectural Data Sampling (MDS).
It requires a microcode update, now available on the Intel website. The
microcode modifies the behavior of the VERW instruction, and makes it flush
internal CPU buffers. We hotpatch the return-to-userland path to add VERW.

Two sysctls are added:

machdep.mds.mitigated = {0/1} user-settable
machdep.mds.method = {string} constructed by the kernel

The kernel will automatically enable the mitigation if the updated
microcode is present. If the new microcode is not present, the user can
load it via cpuctl, and set machdep.mds.mitigated=1.
 1.98.2.11 12-Feb-2019  martin Actually pull up rev 1.139 (as claimed, but not done in previous),
requested by msaitoh in ticket #1187:

Fix bitstring format of Intel CPUID Architectural Performance Monitoring
Fn0000000a %ebx.
 1.98.2.10 11-Feb-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1187):

usr.sbin/cpuctl/arch/i386.c: revision 1.92
sys/arch/x86/include/specialreg.h: revision 1.138

Add new CPUID flags WAITPKG, CLDEMOTE, MOVDIRI, MOVDIR64B and
IA32_CORE_CAPABILITIES from the latest Intel SDM.

Add Ice Lake and Tremont from the latest Intel SDM.

Fix bitstring format of Intel CPUID Architectural Performance Monitoring
Fn0000000a %ebx.
 1.98.2.9 27-Dec-2018  martin Pull up following revision(s) (requested by maxv in ticket #1148):

sys/arch/x86/x86/identcpu.c: revision 1.81
sys/arch/x86/x86/identcpu.c: revision 1.82
sys/arch/x86/x86/identcpu.c: revision 1.84
sys/arch/x86/include/specialreg.h: revision 1.131

Declare the MSR_VIA_ACE values as macros, and use a consistent naming,
similar to the rest of the file.

I'm wondering if I'm not fixing a huge bug here. The ECX8 value we were
using was wrong: ECX8 is bit 1, not bit 0. Bit 0 is ALTINST, an alternate
ISA, which is now known to be backdoored.

So it looks like we were explicitly enabling the backdoor.

Not tested, because I don't have a VIA cpu.
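
The bit mix-up described above can be sketched as follows. The macro and function names are hypothetical and are not the ones used in specialreg.h; only the bit positions (ALTINST at bit 0, ECX8 at bit 1) are taken from the message.

```c
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the commit message. */
#define FCR_ALTINST	(UINT64_C(1) << 0)	/* alternate ISA (the "backdoor") */
#define FCR_ECX8	(UINT64_C(1) << 1)	/* the bit the code meant to use */

/* Old behaviour: the value used as "ECX8" was really bit 0, i.e. ALTINST. */
static uint64_t
fcr_old(uint64_t fcr)
{
	return fcr | (UINT64_C(1) << 0);
}

/* Illustrative fix: set the real ECX8 bit and keep ALTINST cleared. */
static uint64_t
fcr_fixed(uint64_t fcr)
{
	return (fcr | FCR_ECX8) & ~FCR_ALTINST;
}
```

With a starting value of 0, fcr_old() yields a value with ALTINST set, while fcr_fixed() yields only ECX8.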

-

Merge the VIA detection code into cpu_probe_c3.

-

Explicitly disable ALTINST on VIA, in case it isn't disabled by default
already (the 'VIA cpu backdoor').
 1.98.2.8 04-Dec-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1120):

usr.sbin/cpuctl/arch/i386.c: revision 1.85
usr.sbin/cpuctl/arch/i386.c: revision 1.86
usr.sbin/cpuctl/arch/i386.c: revision 1.87
usr.sbin/cpuctl/arch/i386.c: revision 1.88
usr.sbin/cpuctl/arch/i386.c: revision 1.89
usr.sbin/cpuctl/arch/i386.c: revision 1.90
sys/arch/x86/include/specialreg.h: revision 1.132
sys/arch/x86/include/specialreg.h: revision 1.133
sys/arch/x86/include/specialreg.h: revision 1.134
sys/arch/x86/include/specialreg.h: revision 1.135
sys/arch/x86/include/specialreg.h: revision 1.136
sys/arch/x86/x86/cpu_topology.c: revision 1.14

Add MAWAU (for BND{LD,ST}X instruction) from the latest Intel SDM.

Whitespace fix. No functional change.

Modify comment. No functional change:
- AMD also has CPUID 0x06 and 0x0d.
- PCOMMIT was obsoleted.
- Use ci_feat_val[7] as CPUID 7 %edx to match x86/cpu.h
- AMD also has CPUID 6.
- Remove unused code for coretemp.
- Consistently use descs[] instead of data[].
- AMD also reports CPUID 7's highest subleaf. Print it.
- Use macro.
Add Intel CPUID Extended Topology Enumeration Fn0000000b definitions.
Decode package, core and SMT id if CPUID 0x0b is available on Intel processor.

If the value is different from the kernel value, we should fix the kernel code.

TODO: Use 0x1f if it's available.

Add Intel/AMD MONITOR/MWAIT leaf.
Decode Intel/AMD MONITOR/MWAIT leaf.

Add Intel CPUID Architectural Performance Monitoring leaf Fn0000000a.

Print Intel CPUID Architectural Performance Monitoring leaf Fn0000000a.
 1.98.2.7 23-Sep-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1026):

sys/arch/x86/x86/procfs_machdep.c: revision 1.24
sys/arch/x86/include/specialreg.h: revision 1.130

OK'd by maxv:
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
 1.98.2.6 13-Jul-2018  martin Pull up following revision(s) (requested by maya in ticket #912):

sys/arch/x86/x86/identcpu.c: revision 1.79
sys/arch/x86/include/specialreg.h: revision 1.127

Disable MWAIT/MONITOR on Apollo Lake CPUs to work around the APL30 erratum.

We use MWAIT/MONITOR to hatch secondary CPUs. The erratum means that
the wakeup may not happen, so SMP boot fails.
Use wrmsr to disable it in hardware too, for extra paranoia.

PR port-amd64/53420,
also reported on netbsd-users by joern clausen and ssartor.
 1.98.2.5 09-Jun-2018  martin Pullup the following revisions, requested by maxv in ticket #865:

sys/arch/amd64/amd64/machdep.c 1.303 (patch)
sys/arch/amd64/conf/GENERIC 1.492 (patch)
sys/arch/amd64/conf/files.amd64 1.103 (patch)
sys/arch/i386/i386/machdep.c 1.806 (patch)
sys/arch/i386/conf/GENERIC 1.1179 (patch)
sys/arch/i386/conf/files.i386 1.393 (patch)
sys/arch/x86/include/cpu.h 1.91 (patch)
sys/arch/x86/include/specialreg.h upto 1.126 (patch)
sys/arch/x86/x86/x86_machdep.c upto 1.115 (patch, adapted)
sys/arch/x86/x86/spectre.c upto 1.19 (patch, adapted,
no IBRS,
SpectreV2 mitigations not
enabled by default)

Backport the hardware SpectreV2 and SpectreV4 mitigations.
 1.98.2.4 18-Apr-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #778):

sys/arch/x86/include/specialreg.h: revision 1.118,1.119

From the latest Intel SDM:
- Add Intel Fn0000_0006 %eax new bit 14-20 (HWP stuff).
- Intel Fn0000_0007 %ecx bit 22 is for both RDPID and IA32_TSC_AUX.

Add some bit definitions of AMD Fn80000001 %edx:
- MMX
- FXSR
 1.98.2.3 31-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #678):

sys/arch/x86/include/specialreg.h: revision 1.115-1.117,1.120

Add IC_CFG.DIS_IND: "Disable Indirect Branch Predictor". Available (at
least) on AMD Families 10h, 12h and 16h.

Add the IBRS and STIBP MSRs.

... and also add IBPB ...

Add RDCL_NO and IBRS_ALL.
 1.98.2.2 16-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #633):
sys/arch/x86/include/specialreg.h: revision 1.107
sys/arch/x86/include/specialreg.h: revision 1.108
sys/arch/x86/include/specialreg.h: revision 1.109
sys/arch/x86/include/cacheinfo.h: revision 1.23
sys/arch/x86/include/specialreg.h: revision 1.110
sys/arch/x86/include/specialreg.h: revision 1.111
sys/arch/x86/include/specialreg.h: revision 1.112
sys/arch/x86/include/specialreg.h: revision 1.113
sys/arch/x86/include/specialreg.h: revision 1.114
usr.sbin/cpuctl/arch/i386.c: revision 1.79
sys/arch/x86/x86/identcpu.c: revision 1.70
sys/arch/x86/include/specialreg.h: revision 1.106

Add comment.

Add Intel cpuid 7 %edx IBRS(IBPB Speculation Control) and
STIBP(STIBP Speculation Control) from OpenBSD.

Print Intel cpuid 7 %edx.

Example output of cpuctl -v identify 0:
+cpu0: 00000007: 00000000 000027ab 00000000 0c000000
(snip)
+cpu0: SEF edx 0xc000000<IBRS,STIBP>
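
As a rough illustration of how the example value decodes, the sketch below maps 0x0c000000 to the two names shown in the output. The table and the function name decode_sef_edx are assumptions made for this example; cpuctl itself uses its own bit-description strings.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct bitname {
	unsigned bit;
	const char *name;
};

/* Bits 26 and 27 of CPUID leaf 7 %edx, named as in the example output. */
static const struct bitname sef_edx_bits[] = {
	{ 26, "IBRS" },
	{ 27, "STIBP" },
};

/* Write a comma-separated list of the known bits set in val; returns buf. */
static char *
decode_sef_edx(uint32_t val, char *buf, size_t len)
{
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(sef_edx_bits) / sizeof(sef_edx_bits[0]); i++) {
		if ((val & (UINT32_C(1) << sef_edx_bits[i].bit)) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, sef_edx_bits[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

Since 0x0c000000 is 0x04000000 | 0x08000000, both bit 26 and bit 27 are set, which matches the <IBRS,STIBP> annotation.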

fix swapped comments for EFER LME and LMA

- Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add comment.
Add MSR_IA32_ARCH_CAPABILITIES definition.

Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.

Add Intel Deterministic Address Translation Parameter Leaf(0x18) definitions.

Sort entries. No functional change.

s/CLFUSH/CLFLUSH/
No functional change.
 1.98.2.1 21-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #365):
sys/arch/x86/include/specialreg.h: revision 1.99
usr.sbin/cpuctl/arch/i386.c: revision 1.75
usr.sbin/cpuctl/arch/i386.c: revision 1.76
usr.sbin/cpuctl/arch/i386.c: revision 1.77
usr.sbin/cpuctl/arch/i386.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.56
sys/arch/x86/x86/identcpu.c: revision 1.57
sys/arch/x86/x86/cpu_topology.c: revision 1.10
sys/arch/x86/include/specialreg.h: revision 1.100
sys/arch/x86/include/specialreg.h: revision 1.101
sys/arch/x86/include/specialreg.h: revision 1.102
sys/arch/x86/include/specialreg.h: revision 1.103
sys/arch/x86/include/specialreg.h: revision 1.104
sys/arch/x86/include/specialreg.h: revision 1.105
Add EFER_TCE. This would be an interesting feature to have, since it
reduces the indirect cost of invlpg; but I'm not convinced the way we
flush upper-levels is correct for this yet.
Fix typo in comment
Add a comment about APICBASE_PHYSADDR. Has to do with PR/42597.
Define CPUID Fn00000001 %ebx bits and use them. No functional change.
Set ci->ci_cflush_lsize correctly. This bug was added in the last commit(1.56).
Add the following instruction bits in Structured Extended Flags Enumeration
Leaf from "Intel Architecture Instruction Set Extensions and Future Features
Programming Reference" (319433-030):
AVX512_IFMA
AVX512_VBMI
AVX512_VBMI2
GFNI
VAES
VPCLMULQDQ
AVX512_VNNI
AVX512_BITALG
AVX512_VPOPCNTDQ
AVX512_4VNNIW
AVX512_4FMAPS
- Print ci_feat_val[5] (Structured Extended Feature leaf Fn0000_0007 %ebx) on
AMD, too.
- Print ci_feat_val[6] (Fn0000_0007 %ecx) on Intel.
Update from the latest Intel SDM:
0x5c: Atom (Goldmont)
0x5f: Atom (Goldmont, Denverton)
0x7a: Atom (Goldmont Plus)
Add Turbo Boost Max Technology 3.0 bit.
Update from Intel SDM:
0x55: Xeon Scalable (Skylake)
0x57: Xeon Phi [357]200 (Knights Landing)
0x66: Future Core (Cannon Lake)
0x85: Future Xeon Phi (Knights Mill)
Add the following bits in AMD Fn8000000a %edx features (SVM features):
PFThreshold (PAUSE filter threshold)
AVIC (AMD virtual interrupt controller)
V_VMSAVE_VMLOAD (virtualized VMSAVE and VMLOAD)
vGIF (virtualized GIF)
 1.112.2.8 18-Jan-2019  pgoyette Synch with HEAD
 1.112.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.112.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.112.2.5 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.112.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.112.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.112.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.112.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.126.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.126.2.1 10-Jun-2019  christos Sync with HEAD
 1.150.2.16 20-Jul-2024  martin Pull up following revision(s) (requested by andvar in ticket #1855):

sys/arch/x86/x86/identcpu.c: revision 1.129
sys/arch/x86/include/specialreg.h: revision 1.212
sys/arch/x86/x86/identcpu.c: revision 1.130

Disable the VIA Alternate Instructions according to the VIA documentation:
* C7 and above do not support ALTINST, do not check or attempt to disable them.
* For VIA C3 Nehemiah check extended feature flags for support and status,
do not attempt to disable when AIS is not supported or enabled.
* For pre-Nehemiah models explicitly disable, if they are in the range
of documented models, flags aren't present to check the status on
these models.

Note: for pre-Nehemiah models there may be other functional side effects
depending on the version and stepping.

Explicit disabling of ALTINST was introduced with rev. 1.84 following
the discovery of some VIA CPUs having these instructions enabled by default
leading to the potential backdoor (aka rosenbridge).

Unfortunately, the implementation used the wrong check (ACE supported flag),
which can be true for the later models, still supporting padlock features.

Setting ALTINST bit on those may have unexpected side effects like VIA C7 CPUID
instruction for temperature sensor not reporting correct value or
`cpuctl identify' not reporting certain CPU features. Similar side effects
can be observed even for Nehemiah models not supporting AIS instructions. This
change should limit possibility of such issues to only the pre-Nehemiah models,
not covered at all in the previous implementation.

Feature Control Register (FCR) macros were unified under one group with
consistent naming while implementing the change. A few comments were updated as well.
patch reviewed by Riastradh@ (thank you)

PR kern/58370

Move determination of the largest VIA CPU extended function value
to the intended place where the checks are performed.
Currently the value can be overridden while checking for the padlock features,
causing the check for the max function value to fail as a result.
 1.150.2.15 29-Jul-2023  martin Pull up the following revisions, all via patch, requested by msaitoh
in ticket #1669:

sys/arch/x86/include/specialreg.h 1.204-1.206, 1.208

- Add Intel CPUID 0x07 %ecx bit 24 BUS_LOCK_DETECT.
- Add AMD CPUID 0x80000008 %ebx bit 30 IBPB_RET and CPUID 0x8000000a
%edx bit 29 BusLockThreshold.
- Fix typo in comment.
 1.150.2.14 25-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #1664):

sys/arch/x86/include/specialreg.h: revision 1.207
sys/arch/x86/x86/errata.c: revision 1.31

x86: turn off zenbleed chicken bit on Zen2 cpus.

this is based upon Taylor's original work. i just made the list
of CPUs to run on as correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)

(might be nice to have a better way to express revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)

tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
https://lock.cmpxchg8b.com/zenbleed.html
 1.150.2.13 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1646):

sys/arch/x86/include/specialreg.h: revision 1.202
sys/arch/x86/include/specialreg.h: revision 1.203
usr.sbin/cpuctl/arch/i386.c: revision 1.136

Add some CPUID bits from PPR for AMD Family 19h Model 61h Revision B1.

Add AMD CPUID Fn8000_0008 %ebx bit 3 INVLPGB.
 1.150.2.12 23-Jan-2023  martin Pull up the following revisions, requested by msaitoh in ticket #1574:

sys/arch/x86/include/specialreg.h 1.193-1.198

- Add CPUID Fn0000_0006 %eax bit 24 IA32_THERM_INTERRUPT MSR bit 25
Hardware Feedback Notification support.
- Add CPUID Fn0000_0007 %ecx bit 29 ENQCMD.
- Add CPUID Fn0000_0007 %edx bit 1 SGX-KEYS.
- Add CPUID Fn0000_0007 %edx bit 5 UINTR(User INTeRrupts).
- Add CPUID Fn0000_0007 %edx bit 11 RTM_ALWAYS_ABORT.
- Add CPUID Fn0000_0007 %edx bit 22 AMX_BF16.
- Add CPUID Fn0000_0007 %edx bit 23 AVX512_FP16.
- Add CPUID Fn0000_0007 %edx bit 24 AMX_TILE.
- Add CPUID Fn0000_0007 %edx bit 25 AMX_INT8.
- Add CPUID Fn0000_0007 sub-leaf 1 %edx bit 18 CET_SSS.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx definitions.
- Add CPUID Fn0000_000d sub-leaf 1 %eax bit 4 XFD.
- Add CPUID Fn0000_001d Tile Information.
- Add CPUID Fn0000_001e TMUL Information.
- Add CPUID Fn8000_0007 %eax RAS capabilities.
- Add CPUID Fn8000_0008 %ebx BTC_NO.
- Add CPUID Fn8000_000a x2AVIC, VNMI, IBSVIRT and ROGPT.
- Add CPUID Fn8000_001b Instruction-Based Sampling.
- Add CPUID Fn8000_001e Processor Topology Information.
- Add CPUID Fn8000_001f %eax RPMQUERY, VmplSSS, TscAuxVirt,
VmgexitParam, VirtualTomMsr, IbsVirtGuest, SmtProtection,
vsmCommPageMSR and NestedVirtSnpMsr.
- Add CPUID Fn8000_0021 AMD Extended Features Identification 2.
- Add CPUID Fn8000_0022 AMD Extended Performance Monitoring and Debug.
- Rename HW_FEEDBACK to HWI (Hardware Feedback Interface).
- Rename TSX_FORCE_ABORT to RTM_FORCE_ABORT.
- Modify comment. Both Intel and AMD support CPUID Fn0000000b.
- Modify comment. Hybrid Information -> Native Model ID Information.
- Use __BIT(). Add comment. Whitespace fix.
 1.150.2.11 15-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1542):

sys/arch/x86/include/specialreg.h: revision 1.189
sys/dev/nvmm/x86/nvmm_x86.c: revision 1.23
usr.sbin/cpuctl/arch/i386.c: revision 1.128
sys/arch/x86/include/specialreg.h: revision 1.190
sys/arch/x86/include/specialreg.h: revision 1.191
sys/arch/x86/include/specialreg.h: revision 1.192

s/shareing/sharing/. No functional change.

Add top-down slots event bit of architectural performance monitoring leaf.

Modify CPUID Fn0000000a %ebx's string. Add new string for %ecx.

Modify output of CPUID Fn0000000a.
old:
cpu0: Perfmon-eax 0x8300805<VERSION=0x5,GPCounter=0x8,GPBitwidth=0x30>
cpu0: Perfmon-eax 0x8300805<Vectorlen=0x8>
cpu0: Perfmon-edx 0x8604<FixedFunc=0x4,FFBitwidth=0x30,ANYTHREADDEPR>
new:
cpu0: Perfmon: Ver. 5
cpu0: Perfmon: General: bitwidth 48, 8 counters
cpu0: Perfmon: General: avail 0xff<CORECYCL,INST,REFCYCL,LLCREF,LLCMISS,BRINST>
cpu0: Perfmon: General: avail 0xff<BRMISPR,TOPDOWNSLOT>
cpu0: Perfmon: Fixed: bitwidth 48, 4 counters
cpu0: Perfmon: Fixed: avail 0xf<INST,CLK_CORETHREAD,CLK_REF_TSC,TOPDOWNSLOT>
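
The relationship between the old raw %eax value and the new human-readable lines can be sketched as below. The struct and function names are hypothetical; the field positions follow the old output (VERSION, GPCounter, GPBitwidth and Vectorlen as four consecutive bytes of %eax).

```c
#include <stdint.h>

/* Fields of CPUID Fn0000000a %eax, one byte each. */
struct perfmon_eax {
	unsigned version;	/* bits 7:0 */
	unsigned gp_counters;	/* bits 15:8 */
	unsigned gp_bitwidth;	/* bits 23:16 */
	unsigned vector_len;	/* bits 31:24 */
};

/* Split the raw register value into its four byte-wide fields. */
static struct perfmon_eax
perfmon_decode_eax(uint32_t eax)
{
	struct perfmon_eax p;

	p.version = eax & 0xff;
	p.gp_counters = (eax >> 8) & 0xff;
	p.gp_bitwidth = (eax >> 16) & 0xff;
	p.vector_len = (eax >> 24) & 0xff;
	return p;
}
```

For 0x8300805 this gives version 5, 8 counters and a bit width of 0x30 (48), which is what the new "Ver. 5" and "bitwidth 48, 8 counters" lines report.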

Update some AMD CPUID bits:
- Rename FSREP_MOV to FSRM.
- Add Memory Bandwidth Enforcement (MBE)
- Add AMD's PPIN. Rename CPUID_SEF_PPIN to CPUID_SEF_INTEL_PPIN.
- Add Collaborative Processor Performance Control (CPPC).
- Add HOST_MCE_OVERRIDE.
- Add some unknown bits as Bxx.
- Add comments.
- Use __BIT().
 1.150.2.10 31-Jan-2022  martin Pull up the following revisions (all via patch), requested by msaitoh
in ticket #1417:

sys/arch/x86/include/specialreg.h 1.179-1.188

- Add CPUID definitions of Last Branch Record, Thread Director,
AVX version of VNNI, Fast short REP MOV, HRESET, PPIN, Architectural
LBR, Linear Address Masking and Hybrid Information from the latest
Intel SDM.
- Add CPUID definitions of AddrMaskExt, INT_WBINVD, IbrsSameMode,
EferLmsleUnsupported, PSFD and SecureTSC from AMD APM.
- Print CLFSH instead of CLFLUSH because both Intel and AMD documents
say so.
- Modify comment. Add comment. Fix typo. Use __BIT(). KNF. Sort lines.
No functional change.
 1.150.2.9 08-Dec-2021  martin Pull up the following revisions, requested by msaitoh in ticket #1391:

sys/arch/x86/include/specialreg.h 1.171, 1.173-1.178
sys/arch/x86/x86/identcpu.c 1.106, 1.117,
1.122 via patch
sys/dev/nvmm/x86/nvmm_x86.c 1.18
sys/external/bsd/drm2/drm/drm_cache.c 1.14
sys/external/bsd/drm2/include/asm/cpufeature.h 1.5
usr.sbin/cpuctl/arch/i386.c 1.114-1.117


- Add LA57, PKE, PKS, CET, CET_U, CET_S, HWP, KL, AVX512_BF16, TME_EN
and PCONFIG.
- Rename some macros to match the x86 specification and the other OSes.
- Print CPUID 0x80000008 %ebx on Intel, too.
- Print CPUID leaf 7 subleaf 1.
- Identify Tiger Lake, 3rd gen Xeon Scalable (Ice Lake), Elkhart Lake
and Jasper Lake.
- Add comment.
- KNF. Whitespace fix.
 1.150.2.8 04-Sep-2020  martin Pull up following revision(s) (requested by maxv in ticket #1076):

sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.75
sys/arch/x86/include/specialreg.h: revision 1.172
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.72

nvmm-x86-vmx: fix detection of the BIOS lock

If it's locked, ensure it's locked with VMX enabled. If it's not locked,
then lock it ourselves with VMX enabled.

Should fix NetBSD PR/55596.

-

Add a few more CPUID flags.

-

nvmm-x86-svm: check the SVM revision
Only revision 1 exists, but check it, for future-proofness.
 1.150.2.7 13-Jul-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #998):

sys/arch/x86/include/specialreg.h: revision 1.162
sys/arch/x86/include/specialreg.h: revision 1.164
sys/arch/x86/include/specialreg.h: revision 1.165
sys/arch/x86/include/specialreg.h: revision 1.166
sys/arch/x86/include/specialreg.h: revision 1.167
sys/arch/x86/include/specialreg.h: revision 1.168

- AMD CPUID Fn8000_000a %edx bit 20 is "SPEC_CTRL".
- Add some bit definitions of AMD's CPUID Fn8000_001f Encrypted Memory
features.
- Add AMD INVLPGB/TLBSYNC hypervisor enable in VMCB and TLBSYNC intercept bit.
- Modify comment.
Add AMD MSR_DE_CFG's bit 1 as DE_CFG_LFENCE_SERIALIZE.
This bit makes lfence instruction serializing.
Add some definitions from the latest Intel SDM plus small fix:
- Add CPUID leaf 6 %eax bit 19 for HW_FEEDBACK* and IA32_PACKAGE_THERM* MSRs.
- Add CPUID leaf 7 %ecx bit 31 for Protection Keys.
- Add definition of Load only TLB and Store only TLB.
- Add IF_PSCHANGE_MC_NO bit of IA32_ARCH_CAPABILITIES
- Fix HWP_IGNIDL.
Add SRBDS_CTRL bit.
style and fix typo
 1.150.2.6 14-Apr-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #833):

usr.sbin/cpuctl/arch/i386.c: revision 1.109
sys/arch/x86/include/specialreg.h: revision 1.159
usr.sbin/cpuctl/arch/i386.c: revision 1.110
sys/arch/x86/include/specialreg.h: revision 1.160
sys/arch/x86/include/specialreg.h: revision 1.161
sys/arch/x86/x86/tsc.c: revision 1.40
sys/arch/x86/x86/procfs_machdep.c: revision 1.35
sys/arch/x86/x86/procfs_machdep.c: revision 1.36

Add Fast Short Rep Mov(fsrm).

Add AVX512_VP2INTERSECT, SERIALIZE and TSXLDTRK(TSX suspend load addr tracking)

CPUID Fn00000001 %edx bit 8 is printed as "TSC", so rename CPUID Fn8000_0007
%edx bit 8 from "TSC" to "ITSC" (Invariant TSC) to avoid confusion.

Rename CPUID_APM_TSC to CPUID_APM_ITSC. No functional change.

Remove ci_max_ext_cpuid because it's the same as ci_cpuid_extlevel.

Print CPUID 0x80000007 %edx on both Intel and AMD.
 1.150.2.5 19-Nov-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #452):

usr.sbin/cpuctl/arch/i386.c: revision 1.108
sys/arch/x86/include/specialreg.h: revision 1.158

Add the following bit definitions from the latest Intel SDM:
- CET shadow stack
- Fast Short REP MOV
- Hybrid part
- CET Indirect Branch Tracking
0x7d and 0x7e are for 10th generation Core (Ice Lake).
 1.150.2.4 12-Nov-2019  martin Pull up following revision(s) (requested by maxv in ticket #419):

sys/arch/x86/include/specialreg.h: revision 1.157
sys/arch/x86/x86/spectre.c: revision 1.31

Mitigation for CVE-2019-11135: TSX Asynchronous Abort (TAA).

Two sysctls are added:
machdep.taa.mitigated = {0/1} user-settable
machdep.taa.method = {string} constructed by the kernel

There are two cases:

(1) If the CPU is affected by MDS, then the MDS mitigation will also
mitigate TAA, and we have nothing else to do. We make the 'mitigated' leaf
read-only, and force:

machdep.taa.mitigated = machdep.mds.mitigated
machdep.taa.method = [MDS]

The kernel already enables the MDS mitigation by default.

(2) If the CPU is not affected by MDS but is affected by TAA, then we use
the new TSX_CTRL MSR to disable RTM. This MSR is provided via a microcode
update, now available on the Intel website. The kernel will automatically
enable the TAA mitigation if the updated microcode is present. If the new
microcode is not present, the user can load it via cpuctl, and set
machdep.taa.mitigated=1.
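
The two cases above amount to a small decision procedure, sketched here under stated assumptions. The enum, the function name and the boolean inputs are all hypothetical and do not mirror the actual code in spectre.c; the sketch only restates the logic of the message.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch. */
enum taa_method {
	TAA_NOT_AFFECTED,	/* nothing to do */
	TAA_VIA_MDS,		/* case (1): covered by the MDS mitigation */
	TAA_RTM_DISABLE,	/* case (2): disable RTM through TSX_CTRL */
	TAA_NEEDS_MICROCODE	/* case (2), but TSX_CTRL is not available yet */
};

static enum taa_method
taa_pick_method(bool mds_affected, bool taa_affected, bool have_tsx_ctrl)
{
	/* Case (1): the MDS mitigation also mitigates TAA. */
	if (mds_affected)
		return TAA_VIA_MDS;
	if (!taa_affected)
		return TAA_NOT_AFFECTED;
	/* Case (2): TSX_CTRL only exists with the updated microcode. */
	return have_tsx_ctrl ? TAA_RTM_DISABLE : TAA_NEEDS_MICROCODE;
}
```

In the last outcome the message's advice applies: load the microcode via cpuctl, then set machdep.taa.mitigated=1.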
 1.150.2.3 10-Nov-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #407):

sys/arch/x86/include/specialreg.h: revision 1.156

- GMET is not bit 11 but 17.
- Add unknown CPUID Fn8000_000a %edx bit 20.
 1.150.2.2 17-Oct-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #344):

sys/arch/x86/include/specialreg.h: revision 1.154
sys/arch/x86/include/specialreg.h: revision 1.155
usr.sbin/cpuctl/arch/i386.c: revision 1.107
sys/arch/x86/x86/procfs_machdep.c: revision 1.34

- Add definitions of AMD's CPUID Fn8000_001f Encrypted Memory features.
- Add definition of AMD's CPUID Fn8000_000a %edx bit 11 "GMET".
- Define CPUID_AMD_SVM_PFThreshold correctly.
- Modify comment a bit for consistency.

Fix AMD Fn8000_001f %eax bit 0's name.

Add rdpru.
 1.150.2.1 26-Sep-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #241):

sys/arch/x86/include/specialreg.h: revision 1.152
sys/arch/x86/include/specialreg.h: revision 1.153
usr.sbin/cpuctl/arch/i386.c: revision 1.105
sys/arch/x86/x86/spectre.c: revision 1.30
sys/arch/x86/include/specialreg.h: revision 1.151

Add definitions of AMD's CPUID Fn8000_0008 %ebx.
Decode AMD's CPUID Fn8000_0008 %ebx.
Use macro.
Add MCOMMIT instruction.
Define CPUID_CAPEX_FLAGS's bit 10 correctly.
 1.161.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.175.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.176.4.1 01-Aug-2021  thorpej Sync with HEAD.
 1.198.2.6 03-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #919):

sys/arch/x86/x86/errata.c: revision 1.28
sys/arch/x86/x86/errata.c: revision 1.29
sys/arch/x86/include/specialreg.h: revision 1.209
usr.sbin/cpuctl/arch/i386.c: revision 1.144
sys/arch/x86/x86/errata.c: revision 1.30
sys/arch/x86/x86/errata.c: revision 1.33
sys/arch/x86/x86/errata.c: revision 1.34
sys/arch/x86/x86/errata.c: revision 1.35
sys/arch/x86/include/specialreg.h: revision 1.210
sys/arch/x86/include/specialreg.h: revision 1.211

x86/errata.c: Link to original AMD errata guide.

This one is no longer updated; need to link to newer ones for
individual families too. That's where all the cryptic nomenclature
comes from here.

x86/errata.c: Say what revision we're searching for.

x86/errata.c: Only say the errata revision search for cpu0.

x86: make the CPUID list for errata be far less confusing
the 0x80000001 CPUID result needs some parsing to match against
actual family/model/stepping values. 4-bit 'family' values of
15 or 6 change how to parse the 4-bit extended model and 8-bit
extended family value - for family 6 or 15, the extended model
bits (4) are concatenated with the base 4-bits to create an
8-bit value, and for family 15, the family value is the sum
of the base family value and the 8-bit extended-family value, giving
a range of 0 to 15 + 0xff aka 270.

use a CPUREV(family, model, stepping) macro that builds the
relevant bit-representation of a CPUID, making it far easier
to understand what each entry means, and to add new ones too.
i have confirmed that the emitted cpurevs[] array has the same
values before/after this change, ie, NFCI or observed.
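
The parsing rules described above can be sketched as a decode/encode pair. This is only an illustration: cpuid_decode and cpuid_encode are hypothetical names, and the real CPUREV() macro in errata.c may be written differently. The bit layout assumed is the standard one (stepping 3:0, base model 7:4, base family 11:8, extended model 19:16, extended family 27:20).

```c
#include <stdint.h>

struct cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/* Turn a raw CPUID signature into effective family/model/stepping. */
static struct cpu_sig
cpuid_decode(uint32_t eax)
{
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned ext_model = (eax >> 16) & 0xf;
	unsigned ext_family = (eax >> 20) & 0xff;
	struct cpu_sig s;

	s.stepping = eax & 0xf;
	s.model = (eax >> 4) & 0xf;
	s.family = base_family;
	if (base_family == 15)
		s.family += ext_family;		/* 15 + 8-bit extended family */
	if (base_family == 6 || base_family == 15)
		s.model |= ext_model << 4;	/* extended model is the high nibble */
	return s;
}

/* Inverse direction, in the spirit of CPUREV(family, model, stepping). */
static uint32_t
cpuid_encode(unsigned family, unsigned model, unsigned stepping)
{
	unsigned base_family = family > 15 ? 15 : family;
	unsigned ext_family = family > 15 ? family - 15 : 0;
	unsigned ext_model =
	    (base_family == 6 || base_family == 15) ? (model >> 4) & 0xf : 0;

	return ((uint32_t)ext_family << 20) | ((uint32_t)ext_model << 16) |
	    (base_family << 8) | ((model & 0xf) << 4) | (stepping & 0xf);
}
```

For example, family 0x17, model 0x71, stepping 0 encodes to 0x00870f10, and decoding that value gives the same triple back.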

x86: add names for errata that don't have actual numbers
zenbleed is reported as "erratum 65535" currently, this adds a name
for it, and enables the name for any others as well.
pull logging into a function with a tag message.

x86: handle AMD errata 1474: A CPU core may hang after about 1044 days
from the new comment:
* This requires disabling CC6 power level, which can be a performance
* issue since it stops full turbo in some implementations (eg, half the
* cores must be in CC6 to achieve the highest boost level.) Set a timer
* to fire in 1000 days -- except NetBSD timers end up having a signed
* 32-bit hz-based value, which rolls over in under 25 days with HZ=1000,
* and doing xcall(9) or kthread(9) from a callout is not allowed anyway,
* so just have a kthread wait 1 day for 1000 times.
documented in:
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/56323-PUB_1_01.pdf
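
The rollover figure quoted in the comment can be checked with a little arithmetic. The helper names below are made up for this sketch and nothing here is taken from errata.c; it only assumes a signed 32-bit tick count and 86400 seconds per day.

```c
#include <stdint.h>

#define SECS_PER_DAY	INT64_C(86400)

/* Whole days until a signed 32-bit tick counter overflows at the given HZ. */
static int64_t
ticks_rollover_days(int64_t hz)
{
	return (int64_t)INT32_MAX / hz / SECS_PER_DAY;
}

/* Nonzero if a delay of `days` days fits in a signed 32-bit tick count. */
static int
delay_fits(int64_t days, int64_t hz)
{
	return days * SECS_PER_DAY * hz <= (int64_t)INT32_MAX;
}
```

At HZ=1000 the counter lasts 24 whole days, so a single 1000-day timeout cannot be represented, while a 1-day wait (86,400,000 ticks) fits comfortably. That is consistent with waiting one day, 1000 times.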

add MSR stuff for AMD errata 1474.

cpuctl: fix i386 bit descriptions for CPUID_SEF_FLAGS1
warning: non-printing character '\31' in description
'BUS_LOCK_DETECT""b\31' [363]
s/RPMQUERY/RMPQUERY/
 1.198.2.5 20-Jul-2024  martin Pull up following revision(s) (requested by andvar in ticket #738):

sys/arch/x86/x86/identcpu.c: revision 1.129
sys/arch/x86/include/specialreg.h: revision 1.212
sys/arch/x86/x86/identcpu.c: revision 1.130

Disable the VIA Alternate Instructions according to the VIA documentation:
* C7 and above do not support ALTINST, do not check or attempt to disable them.
* For VIA C3 Nehemiah check extended feature flags for support and status,
do not attempt to disable when AIS is not supported or enabled.
* For pre-Nehemiah models explicitly disable, if they are in the range
of documented models, flags aren't present to check the status on
these models.

Note: for pre-Nehemiah models there may be other functional side effects
depending on the version and stepping.

Explicit disabling of ALTINST was introduced with rev. 1.84 following
the discovery of some VIA CPUs having these instructions enabled by default
leading to the potential backdoor (aka rosenbridge).

Unfortunately, the implementation used the wrong check (ACE supported flag),
which can be true for the later models, still supporting padlock features.

Setting ALTINST bit on those may have unexpected side effects like VIA C7 CPUID
instruction for temperature sensor not reporting correct value or
`cpuctl identify' not reporting certain CPU features. Similar side effects
can be observed even for Nehemiah models not supporting AIS instructions. This
change should limit possibility of such issues to only the pre-Nehemiah models,
not covered at all in the previous implementation.

Feature Control Register (FCR) macros were unified under one group with
consistent naming while implementing the change. A few comments were updated as well.
patch reviewed by Riastradh@ (thank you)

PR kern/58370

Move determination of the largest VIA CPU extended function value
to the intended place where the checks are performed.
Currently the value can be overridden while checking for the padlock features,
causing the check for the max function value to fail as a result.
 1.198.2.4 29-Jul-2023  martin Pull up the following revisions, all via patch, requested by msaitoh
in ticket #250:

sys/arch/x86/include/specialreg.h 1.204-1.206, 1.208

- Add Intel CPUID 0x07 %ecx bit 24 BUS_LOCK_DETECT.
- Add AMD CPUID 0x80000008 %ebx bit 30 IBPB_RET and CPUID 0x8000000a
%edx bit 29 BusLockThreshold.
- Fix typo in comment.
 1.198.2.3 25-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #243):

sys/arch/x86/include/specialreg.h: revision 1.207
sys/arch/x86/x86/errata.c: revision 1.31

x86: turn off zenbleed chicken bit on Zen2 cpus.

this is based upon Taylor's original work. i just made the list
of CPUs to run on as correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)

(might be nice to have a better way to express revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)

tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
https://lock.cmpxchg8b.com/zenbleed.html
 1.198.2.2 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #200):

sys/arch/x86/include/specialreg.h: revision 1.202
sys/arch/x86/include/specialreg.h: revision 1.203
usr.sbin/cpuctl/arch/i386.c: revision 1.136

Add some CPUID bits from PPR for AMD Family 19h Model 61h Revision B1.

Add AMD CPUID Fn8000_0008 %ebx bit 3 INVLPGB.
 1.198.2.1 23-Jan-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #56):

sys/arch/x86/include/specialreg.h: revision 1.200
sys/arch/x86/include/specialreg.h: revision 1.201
sys/arch/x86/include/specialreg.h: revision 1.199

Use __BIT(). Add comment. Whitespace. No functional change.

Update definitions from the latest Intel SDM.
- Rename HW_FEEDBACK to HWI (Hardware Feedback Interface).
- Add CPUID Fn0000_0006 %eax bit 24 IA32_THERM_INTERRUPT MSR bit 25 Hardware
Feedback Notification support.
- Add CPUID Fn0000_0007 %ecx bit 29 ENQCMD.
- Add CPUID Fn0000_0007 %edx bit 1 SGX-KEYS.
- Add CPUID Fn0000_0007 %edx bit 5 UINTR(User INTeRrupts).
- Add CPUID Fn0000_0007 %edx bit 11 RTM_ALWAYS_ABORT.
- Rename TSX_FORCE_ABORT to RTM_FORCE_ABORT.
- Add CPUID Fn0000_0007 %edx bit 22 AMX_BF16.
- Add CPUID Fn0000_0007 %edx bit 23 AVX512_FP16.
- Add CPUID Fn0000_0007 %edx bit 24 AMX_TILE.
- Add CPUID Fn0000_0007 %edx bit 25 AMX_INT8.
- Add CPUID Fn0000_0007 sub-leaf 1 %edx bit 18 CET_SSS.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 0 PSFD.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 1 IPRED_CTRL.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 2 RRSBA_CTRL.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 3 DDPD_U.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 4 BHI_CTRL.
- Add CPUID Fn0000_0007 sub-leaf 2 %edx bit 5 MCDT_NO.
- Modify comment. Both Intel and AMD support CPUID Fn0000000b.
- Add CPUID Fn0000_000d sub-leaf 1 %eax bit 4 XFD.
- Modify comment. Hybrid Information -> Native Model ID Information.
- Add CPUID Fn0000_001d Tile Information.
- Add CPUID Fn0000_001e TMUL Information.

Fix comment.
 1.211.2.1 02-Aug-2025  perseant Sync with HEAD
 1.15 19-Jun-2020  maxv localify
 1.14 13-Jul-2018  maxv Remove the X86PMC code I had written, replaced by tprof. Many defines
become unused in specialreg.h, so remove them. We don't want to add
defines all the time, there are countless PMCs on many generations, and
it's better to just inline the event/unit values.
 1.13 12-Jul-2018  maxv Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.
 1.12 12-Jul-2017  maxv branches: 1.12.4; 1.12.6;
Properly handle overflows, and take them into account in userland.
 1.11 10-Mar-2017  maxv branches: 1.11.6;
Switch to per-CPU PMC results, and completely rewrite the pmc(1) tool. Now
the PMCs are system-wide, fine-grained and more tunable by the user.

We don't do application tracking, since it would require storing the PMC
values in mdproc and starting/stopping the counters on each context switch.
While this doesn't seem to be particularly difficult to achieve, I don't
think it is really interesting; and if someone really wants to measure
the performance of an application, they can simply schedctl it to a cpu
and look at the PMC results for this cpu.

Note that several options are implemented but not yet used.
 1.10 08-Mar-2017  maxv Add a version argument, set to 1, and check it in usr.bin/pmc. Use uint32_t
instead of uint8_t since we now need 12-bit selectors (10h family). And while
here KNF.
 1.9 07-Jul-2010  chs branches: 1.9.18; 1.9.36; 1.9.40; 1.9.44;
add the guts of TLS support on amd64. based on joerg's patch,
reworked by me to support 32-bit processes as well.
we now keep %fs and %gs loaded with the user values
while in the kernel, which means we don't need to
reload them when returning to user mode.
 1.8 21-Mar-2009  ad branches: 1.8.2; 1.8.4;
PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash

Fix numerous problems:

1. LDT updates are not atomic.

2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. A system with high maxprocs can be panicked.

3. LDTR can be leaked over context switch.

4. GDT slot allocations can race, giving the same LDT slot to two procs.

5. Incomplete interrupt/trap frames can be stacked.

6. In some rare cases segment faults are not handled correctly.
 1.7 28-Apr-2008  martin branches: 1.7.8; 1.7.10; 1.7.14;
Remove clause 3 and 4 from TNF licenses
 1.6 10-Nov-2007  ad branches: 1.6.14; 1.6.16; 1.6.18;
- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
 1.5 17-Oct-2007  garbled branches: 1.5.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.4 16-Sep-2007  ragge branches: 1.4.4;
i386 -> __i386__
 1.3 23-Jun-2007  dsl branches: 1.3.2; 1.3.10; 1.3.12; 1.3.14;
Split x86_set/get_ldt() so they are callable with kernel buffers.
For linux emulation code.
 1.2 16-Apr-2007  ad branches: 1.2.2; 1.2.4; 1.2.6;
Fix error in previous.
 1.1 16-Apr-2007  ad Share the sysarch stuff between the x86 ports. PR kern/36046.
 1.2.6.5 03-Dec-2007  ad Sync with HEAD.
 1.2.6.4 09-Oct-2007  ad Sync with head.
 1.2.6.3 15-Jul-2007  ad Sync with head.
 1.2.6.2 09-Jun-2007  ad Sync with head.
 1.2.6.1 16-Apr-2007  ad file sysarch.h was added on branch vmlocking on 2007-06-09 21:37:04 +0000
 1.2.4.2 07-May-2007  yamt sync with head.
 1.2.4.1 16-Apr-2007  yamt file sysarch.h was added on branch yamt-idlelwp on 2007-05-07 10:55:05 +0000
 1.2.2.2 03-Oct-2007  garbled Sync with HEAD
 1.2.2.1 26-Jun-2007  garbled Sync with HEAD.
 1.3.14.4 15-Nov-2007  yamt sync with head.
 1.3.14.3 27-Oct-2007  yamt sync with head.
 1.3.14.2 03-Sep-2007  yamt sync with head.
 1.3.14.1 23-Jun-2007  yamt file sysarch.h was added on branch yamt-lazymbuf on 2007-09-03 14:31:21 +0000
 1.3.12.2 09-Jan-2008  matt sync with HEAD
 1.3.12.1 06-Nov-2007  matt sync with HEAD
 1.3.10.2 11-Nov-2007  joerg Sync with HEAD.
 1.3.10.1 02-Oct-2007  joerg Sync with HEAD.
 1.3.2.2 11-Jul-2007  mjf Sync with head.
 1.3.2.1 23-Jun-2007  mjf file sysarch.h was added on branch mjf-ufs-trans on 2007-07-11 20:03:16 +0000
 1.4.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.5.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.6.18.3 11-Aug-2010  yamt sync with head.
 1.6.18.2 04-May-2009  yamt sync with head.
 1.6.18.1 16-May-2008  yamt sync with head.
 1.6.16.1 18-May-2008  yamt sync with head.
 1.6.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.7.14.3 24-Oct-2010  jym Sync with HEAD
 1.7.14.2 01-Nov-2009  jym Sync with HEAD.
 1.7.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.7.10.1 04-Apr-2009  snj Pull up following revision(s) (requested by ad in ticket #656):
sys/arch/amd64/amd64/gdt.c: revision 1.21 via patch
sys/arch/amd64/amd64/machdep.c: revision 1.129 via patch
sys/arch/i386/i386/gdt.c: revision 1.47 via patch
sys/arch/i386/i386/kvm86.c: revision 1.17 via patch
sys/arch/i386/i386/locore.S: revision 1.85 via patch
sys/arch/i386/i386/machdep.c: revision 1.666 via patch
sys/arch/i386/i386/vector.S: revision 1.45 via patch
sys/arch/i386/include/pcb.h: revision 1.47 via patch
sys/arch/x86/include/pmap.h: revision 1.22 via patch
sys/arch/x86/include/sysarch.h: revision 1.8 via patch
sys/arch/x86/x86/pmap.c: revision 1.80 via patch
sys/arch/x86/x86/sys_machdep.c: revision 1.17 via patch
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.143 via patch
sys/kern/init_main.c: revision 1.384 via patch
PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash
Fix numerous problems:
1. LDT updates are not atomic.
2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. A system with high maxprocs can be panicked.
3. LDTR can be leaked over context switch.
4. GDT slot allocations can race, giving the same LDT slot to two procs.
5. Incomplete interrupt/trap frames can be stacked.
6. In some rare cases segment faults are not handled correctly.
 1.7.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.8.4.1 05-Mar-2011  rmind sync with head
 1.8.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.9.44.1 21-Apr-2017  bouyer Sync with HEAD
 1.9.40.1 20-Mar-2017  pgoyette Sync with HEAD
 1.9.36.1 28-Aug-2017  skrll Sync with HEAD
 1.9.18.1 03-Dec-2017  jdolecek update from HEAD
 1.11.6.1 01-Aug-2017  snj Pull up following revision(s) (requested by maxv in ticket #164):
distrib/sets/lists/base/md.amd64: revision 1.269
distrib/sets/lists/debug/md.amd64: revision 1.97
sys/arch/amd64/conf/GENERIC: revision 1.460
sys/arch/amd64/conf/files.amd64: revision 1.89
sys/arch/i386/conf/GENERIC: revision 1.1157
sys/arch/i386/conf/files.i386: revision 1.379
sys/arch/i386/i386/i386_trap.S: revision 1.7-1.8
sys/arch/i386/include/frameasm.h: revision 1.16
sys/arch/x86/include/sysarch.h: revision 1.12
sys/arch/x86/x86/pmc.c: revision 1.8-1.10
sys/arch/x86/x86/sys_machdep.c: revision 1.36
sys/arch/xen/conf/files.compat: revision 1.26
sys/secmodel/suser/secmodel_suser.c: revision 1.43
sys/sys/kauth.h: revision 1.74
usr.bin/pmc/Makefile: revision 1.5
usr.bin/pmc/pmc.1: revision 1.12-1.13
usr.bin/pmc/pmc.c: revision 1.24-1.25
style
--
style
--
Disable interrupts for T_NMI (inline calltrap). Note that there's still a
way to evade the NMI mode here, if a segment register faults in
INTRFASTEXIT; but we don't care. I didn't test this change, but it seems
fine enough.
--
Make the PMC syscalls privileged.
--
Check argc, and add a message.
--
include opt_pmc.h
--
Build the pmc tool on amd64.
--
Properly handle overflows, and take them into account in userland.
--
Update.
--
Enable PMCs by default.
--
Sort sections. Fix macro usage.
 1.12.6.1 10-Jun-2019  christos Sync with HEAD
 1.12.4.1 28-Jul-2018  pgoyette Sync with HEAD
 1.3 15-Jul-2018  maxv Remove unused x86/include/tprof.h, there should be no need for this kind
of includes.
 1.2 24-Feb-2009  yamt branches: 1.2.62; 1.2.64;
- rewrite x86 nmi dispatcher so that establish and disestablish are safe
on a running system.
- adapt existing users of the api. (elan)
- adapt tprof_pmi driver to use the api.
 1.1 01-Jan-2008  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.18; 1.1.26; 1.1.32;
a simple performance monitor based profiler, inspired by Linux oprofile.
 1.1.32.2 01-Nov-2009  jym Sync with HEAD.
 1.1.32.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.26.1 03-Mar-2009  skrll Sync with HEAD.
 1.1.18.1 04-May-2009  yamt sync with head.
 1.1.8.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.8.1 01-Jan-2008  mjf file tprof.h was added on branch mjf-devfs on 2008-02-18 21:05:17 +0000
 1.1.6.2 21-Jan-2008  yamt sync with head
 1.1.6.1 01-Jan-2008  yamt file tprof.h was added on branch yamt-lazymbuf on 2008-01-21 09:40:09 +0000
 1.1.4.2 09-Jan-2008  matt sync with HEAD
 1.1.4.1 01-Jan-2008  matt file tprof.h was added on branch matt-armv6 on 2008-01-09 01:49:49 +0000
 1.1.2.2 02-Jan-2008  bouyer Sync with HEAD
 1.1.2.1 01-Jan-2008  bouyer file tprof.h was added on branch bouyer-xeni386 on 2008-01-02 21:51:21 +0000
 1.2.64.1 10-Jun-2019  christos Sync with HEAD
 1.2.62.1 28-Jul-2018  pgoyette Sync with HEAD
 1.3 14-Mar-2020  maxv style
 1.2 07-Aug-2003  agc branches: 1.2.192;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.2.192.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3 24-Aug-2009  jmcneill Add vga_post_set_vbe for setting video mode.
 1.2 29-Mar-2008  jmcneill branches: 1.2.4; 1.2.18;
Add RCSID to top of file.
 1.1 25-Dec-2007  joerg branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.16;
Add initial version of calling VGA POST from vga_resume. This is the
equivalent to "vbetool post" using x86emu in the kernel.
 1.1.16.1 03-Apr-2008  mjf Sync with HEAD.
 1.1.10.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.10.1 25-Dec-2007  mjf file vga_post.h was added on branch mjf-devfs on 2008-02-18 21:05:17 +0000
 1.1.8.2 21-Jan-2008  yamt sync with head
 1.1.8.1 25-Dec-2007  yamt file vga_post.h was added on branch yamt-lazymbuf on 2008-01-21 09:40:10 +0000
 1.1.6.2 09-Jan-2008  matt sync with HEAD
 1.1.6.1 25-Dec-2007  matt file vga_post.h was added on branch matt-armv6 on 2008-01-09 01:49:49 +0000
 1.1.4.2 02-Jan-2008  bouyer Sync with HEAD
 1.1.4.1 25-Dec-2007  bouyer file vga_post.h was added on branch bouyer-xeni386 on 2008-01-02 21:51:21 +0000
 1.1.2.2 26-Dec-2007  ad Sync with head.
 1.1.2.1 25-Dec-2007  ad file vga_post.h was added on branch vmlocking2 on 2007-12-26 19:17:17 +0000
 1.2.18.1 01-Nov-2009  jym Sync with HEAD.
 1.2.4.1 16-Sep-2009  yamt sync with head
 1.10 29-Jun-2020  riastradh padlock(4): Remove legacy rijndael API use.

This doesn't actually need to compute AES -- it just needs the
standard AES key schedule, so use the BearSSL constant-time key
schedule implementation.

XXX Compile-tested only.
XXX The byte-order business here seems highly questionable.
 1.9 27-Feb-2016  tls Remove callout-based RNG support in VIA crypto driver; add VIA RNG backend for cpu_rng.
 1.8 13-Apr-2015  riastradh Convert arch/x86 to use <sys/rnd*.h>. Omit needless includes.
 1.7 19-Nov-2011  tls branches: 1.7.8; 1.7.26;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- data structures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
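
The commit message above says that hardware RNG input is run through the FIPS 140-2 statistical tests to detect bad or stuck generators. As an illustration of how one such test catches a stuck source, here is a minimal sketch of the monobit test over a 20000-bit sample. This is not the code from libkern/rngtest.c. The names are hypothetical, and the acceptance bounds are the ones given in FIPS 140-2 as I understand them, so they should be checked against the standard before being relied on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_FIPS_SAMPLE_BYTES 2500 /* 20000 bits per test sample */

/*
 * Monobit test sketch: count the one bits in the sample and accept
 * only if the count lies strictly between the two bounds.
 */
static bool
ex_monobit_ok(const uint8_t sample[EX_FIPS_SAMPLE_BYTES])
{
	unsigned int ones = 0;

	for (size_t i = 0; i < EX_FIPS_SAMPLE_BYTES; i++) {
		uint8_t b = sample[i];

		/* Count the set bits of this byte one at a time. */
		while (b != 0) {
			ones += b & 1u;
			b >>= 1;
		}
	}
	return ones > 9725 && ones < 10275;
}
```

A source stuck at all zeros or all ones fails immediately, which is the kind of fault the attach-time check is described as detecting. The monobit test is only one of the statistical tests and says nothing about patterned output with a balanced bit count.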
 1.6 19-Feb-2011  jmcneill branches: 1.6.4;
modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module
 1.5 01-Apr-2009  drochner branches: 1.5.4; 1.5.6; 1.5.8;
sort out what is needed for crash(8) and what not, should fix
recent build errors
 1.4 01-Apr-2009  tls Fix probe for VIA C3 and successors -- these are CPU family 6, not 5.
The broken probe was causing the VIA padlock driver to never attach!
Now we can see that its AES appears to be broken -- it makes FAST_IPSEC
ESP not work, on systems where it works fine with cryptosoft.

Rework code to detect and (if necessary) enable VIA crypto and RNG.
Add RNG support to VIA padlock driver. In the process, have a quick
go at debugging the AES support but no luck thus far.
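
The probe fix above hinges on the CPU family number. A minimal sketch of how the family is conventionally derived from the CPUID leaf 1 %eax signature follows, assuming the usual layout (base family in bits 11:8, extended family in bits 27:20, added only when the base family is 0xf). The function name is hypothetical, and this is not the code from the VIA probe itself.

```c
#include <stdint.h>

/*
 * Derive the display family from a CPUID leaf 1 %eax signature.
 * The extended family field only contributes when the base family
 * field is saturated at 0xf.
 */
static unsigned int
ex_cpuid_family(uint32_t signature)
{
	unsigned int family = (signature >> 8) & 0xf;

	if (family == 0xf)
		family += (signature >> 20) & 0xff;
	return family;
}
```

With this helper, a check written as `ex_cpuid_family(sig) == 5` would never match a family 6 part, which is the failure mode the commit describes.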
 1.3 07-Mar-2009  ad Expose more stuff if _KMEMUSER is defined.
 1.2 16-Apr-2008  cegger branches: 1.2.4; 1.2.12; 1.2.18;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.1 17-Feb-2007  daniel branches: 1.1.2; 1.1.4; 1.1.46;
Add an opencrypto provider for the AES xcrypt instructions found on VIA
C5P and later cores (also known as 'ACE', which is part of the VIA PadLock
security engine). Ported from OpenBSD.

Reviewed on tech-crypto and port-i386, no objections to committing this.
 1.1.46.1 02-Jun-2008  mjf Sync with HEAD.
 1.1.4.2 26-Feb-2007  yamt sync with head.
 1.1.4.1 17-Feb-2007  yamt file via_padlock.h was added on branch yamt-lazymbuf on 2007-02-26 09:08:49 +0000
 1.1.2.2 17-Feb-2007  daniel Add an opencrypto provider for the AES xcrypt instructions found on VIA
C5P and later cores (also known as 'ACE', which is part of the VIA PadLock
security engine). Ported from OpenBSD.

Reviewed on tech-crypto and port-i386, no objections to committing this.
 1.1.2.1 17-Feb-2007  daniel file via_padlock.h was added on branch yamt-idlelwp on 2007-02-17 00:28:26 +0000
 1.2.18.3 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.2.18.2 01-Nov-2009  jym Sync with HEAD.
 1.2.18.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.2.12.1 28-Apr-2009  skrll Sync with HEAD.
 1.2.4.1 04-May-2009  yamt sync with head.
 1.5.8.1 05-Mar-2011  bouyer Sync with HEAD
 1.5.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.4.1 05-Mar-2011  rmind sync with head
 1.6.4.1 17-Apr-2012  yamt sync with head
 1.7.26.2 19-Mar-2016  skrll Sync with HEAD
 1.7.26.1 06-Jun-2015  skrll Sync with HEAD
 1.7.8.1 03-Dec-2017  jdolecek update from HEAD
 1.42 24-Feb-2025  imil Check for RTC presence to avoid hang with QEMU microvm and rtc=off
parameter.

Test bits 0-6 of MC146818's Register D, which must be 0 according to
the specification. This prevents a later hang in rtcget() when no RTC
is present.
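
The check described above reduces to a mask test on the value read from Register D. A minimal sketch, assuming the caller has already read the register. The names are hypothetical, and this is not the code committed to clock.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_RTC_REGD_MUST_BE_ZERO 0x7f /* bits 0-6 of Register D */

/*
 * Return true if the Register D value is consistent with a real
 * MC146818: bits 0-6 must read as zero, and bit 7 may be either value.
 */
static bool
ex_rtc_regd_plausible(uint8_t regd)
{
	return (regd & EX_RTC_REGD_MUST_BE_ZERO) == 0;
}
```

A read from an unpopulated port commonly returns 0xff, which this test rejects. That is an assumption about typical bus behaviour, not something the commit message states.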
 1.41 25-Jan-2023  riastradh branches: 1.41.6;
x86/intr: Work around sleazy clockintr with a secret frame argument.

PR kern/57197
 1.40 24-Jan-2023  riastradh x86/isa/clock.c: Nix trailing whitespace.

No functional change intended.
 1.39 29-May-2020  rin branches: 1.39.20;
For struct timecounter, use C99 initializers.
Compile tested. No functional changes intended.
 1.38 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.37 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.36 16-Oct-2019  christos branches: 1.36.6;
Add and use __FPTRCAST, requested by uwe@
 1.35 16-Oct-2019  christos add void * casts for the clock interrupt handlers.
 1.34 15-Feb-2019  nonaka Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.
 1.33 16-Jun-2009  bouyer branches: 1.33.56; 1.33.64;
Split mc146818-related functions from clock.c into rtc.c.
Call rtc_set_ymdhms() from xen/xen/clock.c:xen_rtc_set() for xen3 dom0
kernels as the Xen3 hypervisor doesn't write the new date/time to the CMOS
by itself.
Now a XEN3_DOM0 kernel properly updates the CMOS time.
 1.32 07-Apr-2009  dyoung Detach sysbeep0 at shutdown.
 1.31 16-Dec-2008  christos branches: 1.31.2;
replace bitmask_snprintf(9) with snprintb(3)
 1.30 11-May-2008  ad branches: 1.30.6; 1.30.8; 1.30.14;
Fix the qemu (?) problem.
 1.29 10-May-2008  ad Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.
 1.28 06-Apr-2008  cherry branches: 1.28.2; 1.28.4; 1.28.6;
Correct comment about struct timecounter field
 1.27 05-Mar-2008  cube Cosmetics: use device_t and cfdata_t.
 1.26 04-Mar-2008  cube sysbeep has no softc, use CFATTACH_DECL_NEW.
 1.25 19-Jan-2008  kardel branches: 1.25.2; 1.25.6;
unbreak i8254_get_timecount() in environments where the
clock interrupt is derived from other sources (e.g. lapic)
and the i8254 timer is running the full cycle without
being used as clock interrupt source.
 1.24 17-Jan-2008  lukem Remove unnecessary references to config_time.h.
 1.23 16-Jan-2008  chuck fix clock accounting problem in i8254_get_timecount that caused
the auich auich_calibrate() function to get the wrong ac97 freq
(may cause audio to play at wrong speed on some systems). this
error was inadvertently introduced in rev 1.98 of the old
src/sys/arch/i386/isa/clock.c (2006/09/03) and manifests itself
on systems that do not use an alternate timecounter (e.g. ACPI-Fast).

the basic problem is that the code that handled when the i8254
counter wrapped was firing in cases when it shouldn't have,
causing the counter to run fast. a more detailed discussion
can be found here:
http://mail-index.netbsd.org/tech-kern/2008/01/15/0001.html
http://mail-index.netbsd.org/tech-kern/2008/01/16/0000.html
 1.22 04-Jan-2008  dyoung Remove superfluous #if (NPCPPI > 0).
 1.21 04-Jan-2008  dyoung Move #endif to the place where it belongs. Thanks, Chavdar Ivanov,
for noticing this.
 1.20 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.19 04-Jan-2008  christos add missing includes
 1.18 03-Jan-2008  he Declare sysbeepdetach(), and protect a small additional section
with #if (NPCPPI > 0).
 1.17 03-Jan-2008  dyoung Support detachment of pchb(4) and sysbeep(4).
 1.16 28-Dec-2007  joerg Remove delaytab and just compute the remainder directly. This requires
two muls and a shift, which needs at most 2ms on a 25MHz i386 and should
end up as fast as delay(1) was before due to using a remainder of 2.
Discussed with ad@.
 1.15 09-Dec-2007  jmcneill branches: 1.15.2;
Merge jmcneill-pm branch.
 1.14 04-Dec-2007  ad branches: 1.14.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.
 1.13 14-Nov-2007  ad branches: 1.13.2;
- Remove I486_CPU, I586_CPU, I686_CPU options. They buy us nothing and
clutter the code significantly.
- Remove pccons.
 1.12 26-Oct-2007  joerg branches: 1.12.2;
Match delay/DELAY on x86 with delay(9). It takes an unsigned int as
argument. Use this and replace the inline assembly (mul + div using the
64bit intermediate result) with normal 32bit multiplication and
division. The compiler can turn the division into a multiplication and
shift, making it even cheaper than the original assembly. For extreme
long delays, just use 64bit arithmetic.
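
The approach described above can be sketched as a microsecond-to-tick conversion that stays in 32-bit arithmetic when the product cannot overflow and falls back to 64-bit arithmetic otherwise. This is an illustration only: the names are hypothetical, 1193182 Hz is the nominal i8254 input clock, and the code assumes a 32-bit unsigned int.

```c
#include <limits.h>
#include <stdint.h>

#define EX_TIMER_FREQ 1193182U /* nominal i8254 input clock, in Hz */

/*
 * Convert a delay in microseconds to i8254 ticks.
 * Short delays use plain unsigned multiplication and division, which
 * the compiler can turn into a multiply and shift. Long delays would
 * overflow the 32-bit product, so they use a 64-bit intermediate.
 */
static uint64_t
ex_usec_to_ticks(unsigned int usec)
{
	if (usec <= UINT_MAX / EX_TIMER_FREQ)
		return usec * EX_TIMER_FREQ / 1000000U;
	return (uint64_t)usec * EX_TIMER_FREQ / 1000000U;
}
```

With these constants the 32-bit path covers delays up to 3599 microseconds, so only unusually long delays pay for the wider arithmetic.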
 1.11 17-Oct-2007  garbled Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.10 26-Sep-2007  ad branches: 1.10.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.9 09-Jul-2007  ad branches: 1.9.8; 1.9.10; 1.9.12;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.8 08-Dec-2006  yamt branches: 1.8.2; 1.8.8; 1.8.10; 1.8.16;
- pass intrframe by-pointer, not by-value.
- make i386 and xen use per-cpu interrupt stack.

xen part is reviewed by Manuel Bouyer.
 1.7 16-Nov-2006  christos branches: 1.7.2; 1.7.4;
__unused removal on arguments; approved by core.
 1.6 13-Oct-2006  hannken More __unused (NPCPPI == 0 case).
 1.5 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.4 07-Sep-2006  gdamore branches: 1.4.2; 1.4.4; 1.4.6; 1.4.8;
Use common todr_settime_ymdhms/gettime_ymdhms.
While here, fix an incorrect test for timeset (that's in kern_todr already),
and an incorrect reference to time_second (instead of using the date passed in).
 1.3 04-Sep-2006  gdamore Remove unused todr_setcal/todr_getcal and all the assorted stub
implementations.
 1.2 04-Sep-2006  perry Undo static declaration on gettick -- lapic.c uses it.
Pointed out by Geoff Wing (mason at primenet.com.au)
 1.1 04-Sep-2006  perry switch to a common clock.c
 1.4.8.2 10-Dec-2006  yamt sync with head.
 1.4.8.1 22-Oct-2006  yamt sync with head
 1.4.6.2 14-Sep-2006  yamt sync with head.
 1.4.6.1 07-Sep-2006  yamt file clock.c was added on branch yamt-pdpolicy on 2006-09-14 12:31:22 +0000
 1.4.4.2 09-Sep-2006  rpaulo sync with head
 1.4.4.1 07-Sep-2006  rpaulo file clock.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.4.2.2 12-Jan-2007  ad Sync with head.
 1.4.2.1 18-Nov-2006  ad Sync with head.
 1.7.4.1 03-Jun-2008  skrll Sync with netbsd-4.
 1.7.2.2 17-May-2008  bouyer Pull up following revision(s) (requested by kardel in ticket #1058):
sys/arch/x86/isa/clock.c: revision 1.25 via patch
unbreak i8254_get_timecount() in environments where the
clock interrupt is derived from other sources (e.g. lapic)
and the i8254 timer is running the full cycle without
being used as clock interrupt source.
 1.7.2.1 21-Jan-2008  bouyer Pull up following revision(s) (requested by chuck in ticket #1049):
src/sys/arch/x86/isa/clock.c 1.23 via patch
fixes clock accounting problem in i8254_get_timecount that caused
the auich auich_calibrate() function to get the wrong ac97 freq
(may cause audio to play at wrong speed on some systems).
 1.8.16.1 03-Oct-2007  garbled Sync with HEAD
 1.8.10.1 11-Jul-2007  mjf Sync with head.
 1.8.8.3 03-Dec-2007  ad Sync with HEAD.
 1.8.8.2 09-Oct-2007  ad Sync with head.
 1.8.8.1 10-Apr-2007  ad Replace some more locks.
 1.8.2.8 17-Mar-2008  yamt sync with head.
 1.8.2.7 21-Jan-2008  yamt sync with head
 1.8.2.6 07-Dec-2007  yamt sync with head
 1.8.2.5 15-Nov-2007  yamt sync with head.
 1.8.2.4 27-Oct-2007  yamt sync with head.
 1.8.2.3 03-Sep-2007  yamt sync with head.
 1.8.2.2 30-Dec-2006  yamt sync with head.
 1.8.2.1 08-Dec-2006  yamt file clock.c was added on branch yamt-lazymbuf on 2006-12-30 20:47:22 +0000
 1.9.12.1 06-Oct-2007  yamt sync with head.
 1.9.10.3 23-Mar-2008  matt sync with HEAD
 1.9.10.2 09-Jan-2008  matt sync with HEAD
 1.9.10.1 06-Nov-2007  matt sync with HEAD
 1.9.8.6 09-Dec-2007  jmcneill Sync with HEAD.
 1.9.8.5 08-Dec-2007  jmcneill Rename pnp(9) -> pmf(9), as requested by many.
 1.9.8.4 28-Nov-2007  jmcneill Register with power management framework.
 1.9.8.3 21-Nov-2007  joerg Sync with HEAD.
 1.9.8.2 28-Oct-2007  joerg Sync with HEAD.
 1.9.8.1 02-Oct-2007  joerg Sync with HEAD.
 1.10.2.2 18-Nov-2007  bouyer Sync with HEAD
 1.10.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.12.2.4 18-Feb-2008  mjf Sync with HEAD.
 1.12.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.12.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.12.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.13.2.2 26-Dec-2007  ad Sync with head.
 1.13.2.1 08-Dec-2007  ad Sync with head.
 1.14.2.1 11-Dec-2007  yamt sync with head.
 1.15.2.4 20-Jan-2008  bouyer Sync with HEAD
 1.15.2.3 19-Jan-2008  bouyer Sync with HEAD
 1.15.2.2 08-Jan-2008  bouyer Sync with HEAD
 1.15.2.1 02-Jan-2008  bouyer Sync with HEAD
 1.25.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.25.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.25.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.25.2.1 24-Mar-2008  keiichi sync with head.
 1.28.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.28.4.3 20-Jun-2009  yamt sync with head
 1.28.4.2 04-May-2009  yamt sync with head.
 1.28.4.1 16-May-2008  yamt sync with head.
 1.28.2.1 18-May-2008  yamt sync with head.
 1.30.14.1 21-Apr-2010  matt sync to netbsd-5
 1.30.8.1 19-Jun-2009  snj Pull up following revision(s) (requested by bouyer in ticket #816):
sys/arch/amd64/conf/files.amd64: revision 1.68
sys/arch/i386/conf/files.i386: revision 1.350
sys/arch/x86/include/rtc.h: revision 1.1
sys/arch/x86/isa/clock.c: revision 1.33
sys/arch/x86/isa/rtc.c: revision 1.1
sys/arch/xen/conf/files.xen: revision 1.100
sys/arch/xen/xen/clock.c: revision 1.50 via patch
Split mc146818-related functions from clock.c into rtc.c.
Call rtc_set_ymdhms() from xen/xen/clock.c:xen_rtc_set() for xen3 dom0
kernels as the Xen3 hypervisor doesn't write the new date/time to the CMOS
by itself.
Now a XEN3_DOM0 kernel properly updates the CMOS time.
 1.30.6.2 28-Apr-2009  skrll Sync with HEAD.
 1.30.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.31.2.3 01-Nov-2009  jym Sync with HEAD.
 1.31.2.2 23-Jul-2009  jym Sync with HEAD.
 1.31.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.33.64.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.33.64.1 10-Jun-2019  christos Sync with HEAD
 1.33.56.1 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support, ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.36.6.1 18-Apr-2020  bouyer Centralize initialisations of delay_func and initclock_func
in x86_machdep.c and export from <x86/machdep.h>
Introduce a x86_dummy_initclock() and a x86_cpu_initclock_func pointer,
to be used later for Xen HVM native clock support.
rename rtclock_tval to x86_rtclock_tval and export from <x86/machdep.h>,
for the benefit of lapic.c
 1.39.20.1 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #136):

sys/arch/x86/x86/intr.c: revision 1.164
sys/arch/x86/isa/clock.c: revision 1.41
sys/arch/x86/include/intr_private.h: revision 1.1

x86/intr: Work around sleazy clockintr with a secret frame argument.
PR kern/57197
 1.41.6.1 02-Aug-2025  perseant Sync with HEAD
 1.53 03-Oct-2025  thorpej Use device_{get,set}prop_bool() for "no-legacy-devices".
 1.52 15-Apr-2022  jmcneill Disable FADT LEGACY_DEVICES flag test.

This test had the unintended side-effect of blocking the lm(4) driver
from attaching on more than one system. Go back to (slow) probing of
ISA devices for now to restore existing functionality.
 1.51 17-Dec-2021  skrll Correct copypaste comment grammar.
 1.50 17-Dec-2021  skrll Trailing whitespace
 1.49 16-Oct-2021  jmcneill Skip legacy device detection for VMware guests with ACPI enabled.
 1.48 15-Oct-2021  jmcneill Add missing acpi include
 1.47 15-Oct-2021  jmcneill If ACPI indicates that there are no user visible devices on the LPC or ISA
bus, set the "no-legacy-devices" property on isa to bypass indirect
configuration of ISA devices.
 1.46 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.45 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.44 11-Feb-2019  cherry branches: 1.44.10;
We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.43 25-Dec-2018  cherry Excise XEN specific code out of x86/x86/intr.c into xen/x86/xen_intr.c

While at it, separate the source function tracking so that the interrupt
paths are truly independent.

Use weak symbol exporting to provision for future PVHVM co-existence
of both files, but with independent paths. Introduce assembler code
such that in a unified scenario, native interrupts get first priority
in spllower(), followed by XEN event callbacks. IPL management and
semantics are unchanged - native handlers and xen callbacks are
expected to maintain their ipl related semantics.

In summary, after this commit, native and XEN now have completely
unrelated interrupt handling mechanisms, including
intr_establish_xname() and assembler stubs and intr handler
management.

Happy Christmas!
 1.42 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.41 03-Dec-2018  cherry Allow isa_machdep.c to compile in the case of no ioapic support.
 1.40 10-Sep-2018  cherry Remove the last usage of xen_pirq_alloc() and pirq_establish()
outside of the x86 interrupt and xen events framework.

This allows us to finally unify the interrupt path for both Xen
and x86 as changes 'internal' to the subsystem.

This change has been kindly tested on real hardware by gson@

The change is not cosmetic and may thus affect users on various
hardware configurations - especially involving legacy hardware.

I look forward to bug reports.
 1.39 24-Jun-2018  jdolecek branches: 1.39.2;
add support for kern.intr.list aka intrctl(8) 'list' for xen

event_set_handler() and pirq_establish() now have extra intrname
parameter; shared intr_create_intrid() is used to provide the value

xen drivers were changed to pass the specific driver instance
name as the xname, e.g. 'vcpu0 clock' instead of just 'clock', or
'xencons0' instead of 'xencons'

associated evcnt is now changed to use intrname - this matches native x86
 1.38 13-Dec-2017  bouyer branches: 1.38.2;
Fixes for physical interrupts on Xen:
- do not cast int * to intr_handle_t *, they're not the same size
- legacy_irq is not always -1 for ioapic interrupts, test pic_type instead
- change irq2port[] to hold (port + 1) so that 0 is an invalid value
- add KASSERTs to make sure vect, port or irq values extracted from arrays are
valid (or that they are invalid before write)
- for the !ioapic case, we still need to do PHYSDEVOP_ASSIGN_VECTOR and
bind_pirq_to_evtch().

now XEN3_DOM0 boots again
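
The irq2port[] change above stores (port + 1) so that a zero-initialised slot reads as "no mapping". A minimal sketch of that encoding follows. The table size, the PORT_INVALID value and the helper names irq2port_set() and irq2port_get() are hypothetical and not taken from the NetBSD sources, and plain assert() stands in for KASSERT.

```c
#include <assert.h>

#define NIRQ		256	/* hypothetical table size */
#define PORT_INVALID	(-1)	/* hypothetical "no port bound" result */

/* Static storage is zero-initialised, so every slot starts out invalid. */
static unsigned int irq2port[NIRQ];

/* Record that irq is bound to event-channel port `port`. */
static void
irq2port_set(int irq, unsigned int port)
{
	assert(irq >= 0 && irq < NIRQ);
	assert(irq2port[irq] == 0);	/* slot must be invalid before write */
	irq2port[irq] = port + 1;	/* store port + 1; 0 stays "invalid" */
}

/* Return the port bound to irq, or PORT_INVALID if none is recorded. */
static int
irq2port_get(int irq)
{
	assert(irq >= 0 && irq < NIRQ);
	if (irq2port[irq] == 0)
		return PORT_INVALID;
	return (int)(irq2port[irq] - 1);
}
```

With this encoding, port 0 is a legitimate value and no explicit initialisation pass over the array is needed.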
 1.37 04-Nov-2017  cherry Retire xen/x86/intr.c and use the new xen specific glue in x86/x86/intr.c

The purpose of this change is to expose the x86/include/intr.h API
to drivers. Specifically the following functions:

void *intr_establish_xname(...);
void *intr_establish(...);
void intr_disestablish(...);

while maintaining the old API from xen/include/evtchn.h, specifically
the following functions:

int event_set_handler(...);
int event_remove_handler(...);

This is so that if things break, we can keep using the old API until
everything stabilises. This is a stepping stone towards getting the
actual XEN event callback path rework code in place - which can be
done opaquely behind the intr.h API - NetBSD/XEN specific drivers that
have been ported to the intr.h API should then work without
significant further modifications.
 1.36 21-Jul-2017  cherry Fix uninitialised use of variable mpih

Pointed out by joerg@
 1.35 16-Jul-2017  cherry branches: 1.35.2;
Remove the xen specific interrupt type for the x86 intr_handle_t
For this to work, we use the evtchn.c:get_pirq_to_evtchn() glue
function to make things easier.
 1.34 15-Oct-2016  jdolecek provide intr xname
 1.33 27-Apr-2015  knakahara branches: 1.33.2;
add intr_handle_t and let pci_intr_handle_t use it.
 1.32 28-Feb-2012  mbalmer branches: 1.32.2; 1.32.16;
cosmetic, spelling, and grammar adjustments
 1.31 18-Oct-2011  dyoung branches: 1.31.2; 1.31.6;
Factor device_isa_register() and device_pci_register() out of
device_register() and stick the new routines into isa_machdep.c and
pci_machdep.c, respectively.
 1.30 01-Sep-2011  christos Add bus_dma overrides. From dyoung
 1.29 27-Aug-2011  christos use c99 struct initializers
 1.28 19-Aug-2009  dyoung isa_detach_hook() needs two arguments, the first an isa_chipset_tag_t.
 1.27 18-Aug-2009  dyoung These are stragglers from my last commit ("Let us safely detach
the ISA bus and devices attaching to the ISA bus"). Define
isa_detach_hook() in MD ISA implementations. Define isa_dmadestroy().
 1.26 19-Apr-2009  ad cpuctl:

- Add interrupt shielding (direct hardware interrupts away from the
specified CPUs). Not documented just yet but will be soon.

- Redo /dev/cpu time_t compat so no kernel changes are needed.

x86:

- Make intr_establish, intr_disestablish safe to use when !cold.

- Distribute hardware interrupts among the CPUs, instead of directing
everything to the boot CPU.

- Add MD code for interrupt shielding. This works in most cases but there is
a bug where delivery is not accepted by an LAPIC after redistribution. It
also needs re-balancing to make things fair after interrupts are turned
back on for a CPU.
 1.25 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.24 18-Dec-2008  cegger branches: 1.24.2;
remove unused malloc.h
 1.23 03-Jul-2008  drochner branches: 1.23.4;
Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.22 27-Jun-2008  cegger struct device * -> device_t
 1.21 27-Jun-2008  cegger ansify
 1.20 30-May-2008  ad branches: 1.20.2;
Add a 'known_mpsafe' argument to intr_establish().
 1.19 28-Apr-2008  martin branches: 1.19.2;
Remove clause 3 and 4 from TNF licenses
 1.18 17-Oct-2007  garbled branches: 1.18.16; 1.18.18; 1.18.20;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.17 09-Jul-2007  ad branches: 1.17.10;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.16 21-Feb-2007  mrg branches: 1.16.4; 1.16.6; 1.16.12;
add a pair of new bus_dma(9) functions:
int _bus_dmatag_subregion(bus_dma_tag_t tag,
bus_addr_t min_addr,
bus_addr_t max_addr,
bus_dma_tag_t *newtag,
int flags)
void _bus_dmatag_destroy(bus_dma_tag_t tag)

that allow a (normally broken/limited) device to restrict the bus address
range it can talk to. this is used by bce(4) to limit DMA addresses to
1GB range, the maximum the chip can address.

all this is from Yorick Hardy <yhardy@uj.ac.za> with input from several
people on tech-kern.

XXX: bus_dma(9) needs an update still.
 1.15 16-Nov-2006  christos branches: 1.15.4;
__unused removal on arguments; approved by core.
 1.14 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.13 11-Dec-2005  christos branches: 1.13.20; 1.13.22;
merge ktrace-lwp.
 1.12 16-Apr-2005  yamt branches: 1.12.2;
tweak x86 bus_dma code so that it can be used by xen port.

- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.11 20-Jun-2004  thorpej branches: 1.11.4; 1.11.10;
Remove the "ID" component of the x86 bus_dma flags, since these are no
longer "ISA DMA" specific flags.
 1.10 30-Oct-2003  fvdl * keep track of PCI buses that aren't known by firmware, but are found
by NetBSD
* use this info in intr_find_mpmapping
* get rid of the last argument to intr_find_mpmapping, it was redundant
 1.9 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
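
The parent-bus fallback described above can be sketched as follows, assuming the conventional PCI-to-PCI bridge swizzle in which pins are numbered 1..4 (INTA..INTD) and the pin seen on the parent bus is ((pin - 1 + device) % 4) + 1. The struct bus layout and the names swizzle_pin() and find_mapped_bus() are hypothetical illustrations, not the code from this revision.

```c
#include <assert.h>
#include <stddef.h>

/* Interrupt pins are numbered 1..4 for INTA..INTD. */
enum { PIN_A = 1, PIN_B, PIN_C, PIN_D };

/* Hypothetical bus description used only for this sketch. */
struct bus {
	const struct bus *parent;	/* NULL for the root bus */
	int bridge_dev;			/* device number of the bridge on the parent */
	int has_map;			/* nonzero if the MP table has mappings */
};

/* Pin as seen on the parent side of a bridge, for a device behind it. */
static int
swizzle_pin(int pin, int device)
{
	assert(pin >= PIN_A && pin <= PIN_D);
	assert(device >= 0 && device < 32);
	return ((pin - 1 + device) % 4) + 1;
}

/*
 * Walk towards the root until a bus with mappings is found. On return,
 * *dev and *pin hold the device number and pin to look up on that bus.
 * Returns NULL if no ancestor has any mappings.
 */
static const struct bus *
find_mapped_bus(const struct bus *b, int *dev, int *pin)
{
	while (b != NULL && !b->has_map) {
		*pin = swizzle_pin(*pin, *dev);
		*dev = b->bridge_dev;
		b = b->parent;
	}
	return b;
}
```

Each step up replaces the device number with that of the bridge, so a lookup through several bridges applies the swizzle once per level.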
 1.8 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.7 07-May-2003  fvdl branches: 1.7.2;
Generalize bounce buffers, and use them for 32 bit PCI if needed.
Make ALLOCNOW the default iff bouncing might be needed (this has
no effect on i386 because ISA DMA devices already had to use
ALLOCNOW, and PCI isn't bounced (yet), since we don't do > 4G
at this point for i386).
 1.6 05-May-2003  fvdl Move definition of ISA_DMA_BOUNCE_THRESHOLD to dev/isa/isareg.h.
 1.5 03-May-2003  wiz DMA, not dma nor Dma.
 1.4 04-Mar-2003  fvdl s/i386_isa_chipset/x86_isa_chipset/
 1.3 02-Mar-2003  fvdl Clean up some unneeded "mca.h" and "eisa.h" includes, make one that is
needed dependent on !__x86_64__. To be revisited later.
 1.2 02-Mar-2003  fvdl x86_64 has no mca.h and eisa.h (should perhaps just generate empty ones)
 1.1 27-Feb-2003  fvdl Moved here from i386/isa
 1.7.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.7.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.7.2.1 03-Aug-2004  skrll Sync with HEAD
 1.11.10.1 21-Apr-2005  tron Pull up revision 1.12 (requested by yamt in ticket #175):
tweak x86 bus_dma code so that it can be used by xen port.
- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.11.4.1 29-Apr-2005  kent sync with -current
 1.12.2.3 03-Sep-2007  yamt sync with head.
 1.12.2.2 26-Feb-2007  yamt sync with head.
 1.12.2.1 30-Dec-2006  yamt sync with head.
 1.13.22.2 10-Dec-2006  yamt sync with head.
 1.13.22.1 22-Oct-2006  yamt sync with head
 1.13.20.1 18-Nov-2006  ad Sync with head.
 1.15.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.16.12.1 03-Oct-2007  garbled Sync with HEAD
 1.16.6.1 11-Jul-2007  mjf Sync with head.
 1.16.4.1 29-Apr-2007  ad Replace another simplelock.
 1.17.10.1 06-Nov-2007  matt sync with HEAD
 1.18.20.3 19-Aug-2009  yamt sync with head.
 1.18.20.2 04-May-2009  yamt sync with head.
 1.18.20.1 16-May-2008  yamt sync with head.
 1.18.18.2 04-Jun-2008  yamt sync with head
 1.18.18.1 18-May-2008  yamt sync with head.
 1.18.16.4 17-Jan-2009  mjf Sync with HEAD.
 1.18.16.3 28-Sep-2008  mjf Sync with HEAD.
 1.18.16.2 29-Jun-2008  mjf Sync with HEAD.
 1.18.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.19.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.19.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.20.2.2 03-Jul-2008  simonb Sync with head.
 1.20.2.1 27-Jun-2008  simonb Sync with head.
 1.23.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.23.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.24.2.3 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.24.2.2 01-Nov-2009  jym Sync with HEAD.
 1.24.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.31.6.3 06-Mar-2012  mrg sync to -current
 1.31.6.2 06-Mar-2012  mrg sync to -current
 1.31.6.1 04-Mar-2012  mrg sync to latest -current.
 1.31.2.1 17-Apr-2012  yamt sync with head
 1.32.16.3 28-Aug-2017  skrll Sync with HEAD
 1.32.16.2 05-Dec-2016  skrll Sync with HEAD
 1.32.16.1 06-Jun-2015  skrll Sync with HEAD
 1.32.2.1 03-Dec-2017  jdolecek update from HEAD
 1.33.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.35.2.2 16-Jul-2017  cherry 2739767
 1.35.2.1 16-Jul-2017  cherry file isa_machdep.c was added on branch perseant-stdc-iso10646 on 2017-07-16 06:14:24 +0000
 1.38.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.38.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.38.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.39.2.1 10-Jun-2019  christos Sync with HEAD
 1.44.10.1 12-Apr-2020  bouyer Get rid of xen-specific ci_x* interrupt handling:
- use the general SIR mechanism, reserving 3 more slots for IPL_VM, IPL_SCHED
and IPL_HIGH
- remove specific handling from C sources, or change to ipending
- convert IPL number to SIR number in various places
- Remove XUNMASK/XPENDING in assembly or change to IUNMASK/IPENDING
- remove Xen-specific ci_xsources, ci_xmask, ci_xunmask, ci_xpending from
struct cpu_info
- for now remove a KASSERT that there are no pending interrupts in
idle_block(). We can get there with some software interrupts pending
in autoconf XXX needs to be looked at.
 1.4 07-Sep-2025  thorpej Remove unnecessary NULL-initialization of TODR handle fields.
 1.3 07-Jan-2025  jakllsch Only use FADT Century byte if it targets a valid 146818 NVRAM location.

Should fix PR 57821.
 1.2 30-Dec-2022  jakllsch branches: 1.2.6;
Honor ACPI FADT Century byte; should fix many "unknown CMOS layout" messages.
 1.1 16-Jun-2009  bouyer branches: 1.1.2; 1.1.4; 1.1.6; 1.1.12; 1.1.102;
Split mc146818-related functions from clock.c into rtc.c.
Call rtc_set_ymdhms() from xen/xen/clock.c:xen_rtc_set() for xen3 dom0
kernels as the Xen3 hypervisor doesn't write the new date/time to the CMOS
by itself.
Now a XEN3_DOM0 kernel properly updates the CMOS time.
 1.1.102.2 09-May-2025  martin Pull up following revision(s) (requested by sborrill in ticket #1105):

sys/arch/x86/isa/rtc.c: revision 1.3

Only use FADT Century byte if it targets a valid 146818 NVRAM location.

Should fix PR 57821.
 1.1.102.1 13-Jan-2023  martin Pull up following revision(s) (requested by jakllsch in ticket #46):

sys/arch/x86/isa/rtc.c: revision 1.2

Honor ACPI FADT Century byte; should fix many "unknown CMOS layout" messages.
 1.1.12.2 21-Apr-2010  matt sync to netbsd-5
 1.1.12.1 16-Jun-2009  matt file rtc.c was added on branch matt-nb5-mips64 on 2010-04-21 00:33:46 +0000
 1.1.6.2 01-Nov-2009  jym Sync with HEAD.
 1.1.6.1 16-Jun-2009  jym file rtc.c was added on branch jym-xensuspend on 2009-11-01 13:58:16 +0000
 1.1.4.2 20-Jun-2009  yamt sync with head
 1.1.4.1 16-Jun-2009  yamt file rtc.c was added on branch yamt-nfs-mp on 2009-06-20 07:20:12 +0000
 1.1.2.2 19-Jun-2009  snj Pull up following revision(s) (requested by bouyer in ticket #816):
sys/arch/amd64/conf/files.amd64: revision 1.68
sys/arch/i386/conf/files.i386: revision 1.350
sys/arch/x86/include/rtc.h: revision 1.1
sys/arch/x86/isa/clock.c: revision 1.33
sys/arch/x86/isa/rtc.c: revision 1.1
sys/arch/xen/conf/files.xen: revision 1.100
sys/arch/xen/xen/clock.c: revision 1.50 via patch
Split mc146818-related functions from clock.c into rtc.c.
Call rtc_set_ymdhms() from xen/xen/clock.c:xen_rtc_set() for xen3 dom0
kernels as the Xen3 hypervisor doesn't write the new date/time to the CMOS
by itself.
Now a XEN3_DOM0 kernel properly updates the CMOS time.
 1.1.2.1 16-Jun-2009  snj file rtc.c was added on branch netbsd-5 on 2009-06-19 21:22:11 +0000
 1.2.6.1 02-Aug-2025  perseant Sync with HEAD
 1.8 25-Nov-2009  njoly aprintify.
 1.7 09-Jul-2008  joerg branches: 1.7.8;
Fix syntax. *sigh*
 1.6 09-Jul-2008  joerg Finish device/softc split.
 1.5 09-Jul-2008  joerg - device/softc split
 1.4 11-Dec-2005  christos branches: 1.4.74; 1.4.78; 1.4.80; 1.4.82; 1.4.84;
merge ktrace-lwp.
 1.3 13-Jan-2005  fvdl If there are no ioapics, don't bother to put things in ioapic mode.
However, always do this otherwise, regardless of the revision.
Remove incorrect comment.
 1.2 23-Apr-2004  itojun branches: 1.2.2;
pass string length (= boundary info) to pci_devinfo so that we do not run over
the end of memory region
 1.1 18-Apr-2004  fvdl Moved here from arch/amd64/pci
 1.2.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.2.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.2.2.1 23-Apr-2004  skrll file aapic.c was added on branch ktrace-lwp on 2004-08-03 10:43:04 +0000
 1.4.84.1 19-Oct-2008  haad Sync with HEAD.
 1.4.82.1 18-Jul-2008  simonb Sync with head.
 1.4.80.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.4.78.2 11-Mar-2010  yamt sync with head
 1.4.78.1 04-May-2009  yamt sync with head.
 1.4.74.1 28-Sep-2008  mjf Sync with HEAD.
 1.7.8.1 24-Oct-2010  jym Sync with HEAD
 1.2 01-Jul-2011  dyoung #include <sys/bus.h> instead of <machine/bus.h>.
 1.1 18-Dec-2006  christos branches: 1.1.2; 1.1.4; 1.1.6; 1.1.68;
Moved from i386/pci/agp_machdep.c; from Blair Sadewitz
 1.1.68.1 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.1.6.2 12-Jan-2007  ad Sync with head.
 1.1.6.1 18-Dec-2006  ad file agp_machdep.c was added on branch newlock2 on 2007-01-12 01:01:01 +0000
 1.1.4.2 30-Dec-2006  yamt sync with head.
 1.1.4.1 18-Dec-2006  yamt file agp_machdep.c was added on branch yamt-lazymbuf on 2006-12-30 20:47:22 +0000
 1.1.2.2 21-Dec-2006  yamt sync with head.
 1.1.2.1 18-Dec-2006  yamt file agp_machdep.c was added on branch yamt-splraiseipl on 2006-12-21 15:07:58 +0000
 1.2 11-Dec-2005  christos merge ktrace-lwp.
 1.1 18-Apr-2004  fvdl branches: 1.1.2;
Moved here from arch/amd64/pci
 1.1.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.2 03-Aug-2004  skrll Sync with HEAD
 1.1.2.1 18-Apr-2004  skrll file amd8131reg.h was added on branch ktrace-lwp on 2004-08-03 10:43:04 +0000
 1.5 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.4 24-Apr-2021  thorpej branches: 1.4.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
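
The "variadic list of tag-value arguments" pattern described above can be illustrated with a small standalone parser. This is only a sketch of the general technique: the DEMO_* tags, struct demo_args and demo_parse() are hypothetical stand-ins and do not reproduce the CFARG_* definitions or the config_*() signatures in the NetBSD tree.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags, loosely modelled on the list in the log message. */
enum demo_tag {
	DEMO_EOL = 0,		/* sentinel: ends the argument list */
	DEMO_IATTR,		/* followed by a const char * */
	DEMO_LOCATORS		/* followed by a const int * */
};

/* Collected arguments; fields stay NULL when their tag is not passed. */
struct demo_args {
	const char *iattr;
	const int *locators;
};

/*
 * Consume (tag, value) pairs until DEMO_EOL.
 * Returns 0 on success, -1 if an unknown tag is seen.
 */
static int
demo_parse(struct demo_args *out, ...)
{
	va_list ap;
	int error = 0;

	assert(out != NULL);
	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, out);
	for (;;) {
		/* Enumeration constants have type int, so read an int. */
		int tag = va_arg(ap, int);

		if (tag == DEMO_EOL)
			break;
		if (tag == DEMO_IATTR) {
			out->iattr = va_arg(ap, const char *);
		} else if (tag == DEMO_LOCATORS) {
			out->locators = va_arg(ap, const int *);
		} else {
			error = -1;	/* value type unknown: stop parsing */
			break;
		}
	}
	va_end(ap);
	return error;
}
```

A caller passes only the tags it needs, for example demo_parse(&a, DEMO_IATTR, "pci", DEMO_EOL). Adding a tag later does not change the signature that existing callers use, which is the extensibility the log message refers to.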
 1.3 12-Dec-2018  is branches: 1.3.14;
Added support for AMD family 16h cpu sensors - (just like 10h-14h).
(Tested on netbsd-8.0 release.)
 1.2 16-Apr-2012  cegger branches: 1.2.2; 1.2.4; 1.2.36; 1.2.42; 1.2.44;
Add rescan support. Re-fixes PR 45268.
 1.1 13-Apr-2012  cegger Replace amdtempbus with amdnb_miscbus.
This allows us to have independent drivers on the same device (northbridge f3)
each coming with a certain functionality/feature.
This way we do not need to mess with amdtemp(4) to utilize other features.
 1.2.44.1 10-Jun-2019  christos Sync with HEAD
 1.2.42.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.2.36.1 15-Dec-2018  martin Pull up following revision(s) (requested by is in ticket #1137):

sys/arch/x86/pci/amdnb_misc.c: revision 1.3
sys/arch/x86/pci/amdtemp.c: revision 1.22

Added support for AMD family 16h cpu sensors - (just like 10h-14h).
(Tested on netbsd-8.0 release.)
 1.2.4.2 29-Apr-2012  mrg sync to latest -current.
 1.2.4.1 16-Apr-2012  mrg file amdnb_misc.c was added on branch jmcneill-usbmp on 2012-04-29 23:04:43 +0000
 1.2.2.3 18-Apr-2012  yamt pull following revisions from trunk so that the kernel at least boots
on my system.
cvs rdiff -u -r1.33 -r1.34 src/sys/arch/x86/pci/pchb.c
cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/pci/pchbvar.h
cvs rdiff -u -r1.1 -r1.2 src/sys/arch/x86/pci/amdnb_misc.c
 1.2.2.2 17-Apr-2012  yamt sync with head
 1.2.2.1 16-Apr-2012  yamt file amdnb_misc.c was added on branch yamt-pagecache on 2012-04-17 00:07:05 +0000
 1.3.14.7 05-Apr-2021  thorpej config_match() -> config_probe() for the straight-forward indirect config
cases. There are still a few odd balls using config_match() which should
be sorted out later.
 1.3.14.6 04-Apr-2021  thorpej CFARG_SUBMATCH -> CFARG_SEARCH for the indirect configuration uses.
 1.3.14.5 03-Apr-2021  thorpej Give config_attach() the tagged variadic argument treatment and
mechanically convert all call sites.
 1.3.14.4 28-Mar-2021  thorpej These devices have only one interface attribute and no locators,
so simplify:

- config_attach_loc() -> config_attach().
- Don't pass CFARG_IATTR, or CFARG_LOCATORS to config_search().
 1.3.14.3 21-Mar-2021  thorpej In "rescan" routines, always pass locators and the interface attribute
straight through to config_search(). Also, for devices that carry only
one interface attribute, no need to do an ifattr_match(), because
rescan_with_cfdata() will have already validated that the parent is
eligible, which includes an interface attribute check.
 1.3.14.2 21-Mar-2021  thorpej CFARG_IATTR usage audit:

If a device carries only one interface attribute, there is no need
to specify it when calling config_search(); that specification is
meant only to disambiguate which interface attribute (which is a
proxy for "what kind of attach args are being used") is having
children attached. cfparent_match() will take care of ensuring that
any potential children can attach to one of the parent's interface
attributes, and if the parent only carries one, no disambiguation is
necessary.
 1.3.14.1 20-Mar-2021  thorpej The proliferation of config_search_*() and config_found_*() combinations
is a little absurd, so begin to tidy this up:

- Introduce a new cfarg_t enumerated type, that defines the types of
tag-value variadic arguments that can be passed to the various
config_*() functions (CFARG_SUBMATCH, CFARG_IATTR, and CFARG_LOCATORS,
for now, plus a CFARG_EOL sentinel).
- Collapse config_search_*() into config_search() that takes these
variadic arguments.
- Convert all call sites of config_search_*() to the new signature.
Noticed several incorrect usages along the way, which will be
audited in a future commit.
 1.4.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.5 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.4 24-Apr-2021  thorpej branches: 1.4.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
 1.3 20-Jul-2008  martin branches: 1.3.96;
Make struct pcib_softc explicit in our softc.
 1.2 21-Mar-2008  xtraeme branches: 1.2.4; 1.2.6; 1.2.8; 1.2.10;
Split device_t/softc for amdpcib and the hpet attachment, plus other
related cosmetic changes.
 1.1 26-Oct-2007  xtraeme branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10; 1.1.12; 1.1.26;
Share pcib(4) and amdpcib(4) between i386 and amd64; one copy is enough.
 1.1.26.2 28-Sep-2008  mjf Sync with HEAD.
 1.1.26.1 03-Apr-2008  mjf Sync with HEAD.
 1.1.12.2 03-Dec-2007  ad Sync with HEAD.
 1.1.12.1 26-Oct-2007  ad file amdpcib.c was added on branch vmlocking on 2007-12-03 19:04:24 +0000
 1.1.10.2 13-Nov-2007  bouyer Sync with HEAD
 1.1.10.1 26-Oct-2007  bouyer file amdpcib.c was added on branch bouyer-xenamd64 on 2007-11-13 16:00:18 +0000
 1.1.8.3 23-Mar-2008  matt sync with HEAD
 1.1.8.2 06-Nov-2007  matt sync with HEAD
 1.1.8.1 26-Oct-2007  matt file amdpcib.c was added on branch matt-armv6 on 2007-11-06 23:23:40 +0000
 1.1.4.2 28-Oct-2007  joerg Sync with HEAD.
 1.1.4.1 26-Oct-2007  joerg file amdpcib.c was added on branch jmcneill-pm on 2007-10-28 20:10:59 +0000
 1.1.2.3 24-Mar-2008  yamt sync with head.
 1.1.2.2 27-Oct-2007  yamt sync with head.
 1.1.2.1 26-Oct-2007  yamt file amdpcib.c was added on branch yamt-lazymbuf on 2007-10-27 11:28:57 +0000
 1.2.10.1 19-Oct-2008  haad Sync with HEAD.
 1.2.8.1 28-Jul-2008  simonb Sync with head.
 1.2.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.2.4.1 04-May-2009  yamt sync with head.
 1.3.96.4 05-Apr-2021  thorpej config_match() -> config_probe() for the straight-forward indirect config
cases. There are still a few odd balls using config_match() which should
be sorted out later.
 1.3.96.3 04-Apr-2021  thorpej CFARG_SUBMATCH -> CFARG_SEARCH for the indirect configuration uses.
 1.3.96.2 03-Apr-2021  thorpej config_attach_loc() -> config_attach() with CFARG_LOCATORS argument.
 1.3.96.1 20-Mar-2021  thorpej The proliferation of config_search_*() and config_found_*() combinations
is a little absurd, so begin to tidy this up:

- Introduce a new cfarg_t enumerated type, that defines the types of
tag-value variadic arguments that can be passed to the various
config_*() functions (CFARG_SUBMATCH, CFARG_IATTR, and CFARG_LOCATORS,
for now, plus a CFARG_EOL sentinel).
- Collapse config_search_*() into config_search() that takes these
variadic arguments.
- Convert all call sites of config_search_*() to the new signature.
Noticed several incorrect usages along the way, which will be
audited in a future commit.
 1.4.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.8 01-Jul-2011  dyoung #include <sys/bus.h> instead of <machine/bus.h>.
 1.7 15-Jun-2011  jruoho Modularize hpet(4). Works nicely with the multiple bus locations.
 1.6 15-Jun-2011  jruoho Use defined constants.
 1.5 15-Jun-2011  jruoho Add detach function for hpet(4) at amdpcib(4).
 1.4 21-Mar-2008  xtraeme branches: 1.4.18; 1.4.36;
Split device_t/softc for ichlpcib(4) and all hpet consumers, plus
other related cosmetic changes.
 1.3 21-Mar-2008  xtraeme Split device_t/softc for amdpcib and the hpet attachment, plus other
related cosmetic changes.
 1.2 07-Jan-2008  joerg branches: 1.2.6;
x86 always has timecounter support.
 1.1 26-Oct-2007  xtraeme branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.12; 1.1.18;
Share pcib(4) and amdpcib(4) between i386 and amd64; one copy is enough.
 1.1.18.1 08-Jan-2008  bouyer Sync with HEAD
 1.1.12.2 03-Dec-2007  ad Sync with HEAD.
 1.1.12.1 26-Oct-2007  ad file amdpcib_hpet.c was added on branch vmlocking on 2007-12-03 19:04:25 +0000
 1.1.10.2 13-Nov-2007  bouyer Sync with HEAD
 1.1.10.1 26-Oct-2007  bouyer file amdpcib_hpet.c was added on branch bouyer-xenamd64 on 2007-11-13 16:00:19 +0000
 1.1.8.4 23-Mar-2008  matt sync with HEAD
 1.1.8.3 09-Jan-2008  matt sync with HEAD
 1.1.8.2 06-Nov-2007  matt sync with HEAD
 1.1.8.1 26-Oct-2007  matt file amdpcib_hpet.c was added on branch matt-armv6 on 2007-11-06 23:23:41 +0000
 1.1.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.2 28-Oct-2007  joerg Sync with HEAD.
 1.1.4.1 26-Oct-2007  joerg file amdpcib_hpet.c was added on branch jmcneill-pm on 2007-10-28 20:10:59 +0000
 1.1.2.4 24-Mar-2008  yamt sync with head.
 1.1.2.3 21-Jan-2008  yamt sync with head
 1.1.2.2 27-Oct-2007  yamt sync with head.
 1.1.2.1 26-Oct-2007  yamt file amdpcib_hpet.c was added on branch yamt-lazymbuf on 2007-10-27 11:28:57 +0000
 1.2.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.4.36.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.4.18.1 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.18 17-Oct-2024  msaitoh amdsmn(4): Add support for AMD family F1Ah model 0xh "Turin".
 1.17 28-Jul-2023  msaitoh branches: 1.17.6;
Add Zen4 Phoenix support.
 1.16 28-Jan-2023  msaitoh amdsmn(4),amdzentemp(4): Add Zen3+ Rembrandt(19h/4xh) & Zen4 Genoa(19h/1xh).
 1.15 18-Dec-2022  reinoud Add amdsmn(4) and amdccp(4) power management stubs.
 1.14 01-Oct-2022  msaitoh branches: 1.14.4;
amdsmn(4),amdzentemp(4): Add support for 17h/6xh and 19h/6xh.
 1.13 27-Apr-2022  msaitoh Rename for AMD F15/6X device. No functional change.
 1.12 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.11 24-Apr-2021  thorpej branches: 1.11.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
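The tag-value argument list described in revision 1.11 above can be illustrated with a minimal sketch. Every name here (demo_tag, demo_args, demo_parse) is hypothetical and only shows the general pattern of a variadic list terminated by a sentinel; it is not the NetBSD cfarg_t definition or the config_*() implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags for illustration; NOT the NetBSD cfarg_t values. */
enum demo_tag {
	DEMO_EOL = 0,		/* sentinel: ends the argument list */
	DEMO_IATTR,		/* next argument is a const char * */
	DEMO_LOCATORS		/* next argument is a const int * */
};

struct demo_args {
	const char *iattr;
	const int *locators;
};

/*
 * Walk (tag, value) pairs until the sentinel is seen.
 * Returns 0 on success, -1 if an unknown tag is encountered.
 */
static int
demo_parse(struct demo_args *out, int tag, ...)
{
	va_list ap;
	int rv = 0;

	assert(out != NULL);
	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, tag);
	while (tag != DEMO_EOL) {
		switch (tag) {
		case DEMO_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case DEMO_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			/* Unknown tag: its value type is unknown, so stop. */
			rv = -1;
			tag = DEMO_EOL;
			continue;
		}
		/* Enumeration constants are passed as int. */
		tag = va_arg(ap, int);
	}
	va_end(ap);
	return rv;
}
```

A caller would write something like demo_parse(&args, DEMO_IATTR, "pci", DEMO_EOL); optional arguments are simply left out, which is the simplification the commit message describes.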
 1.10 25-Apr-2020  bouyer branches: 1.10.4;
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.9 23-Apr-2020  simonb Apply previous change ("Don't mix signed and unsigned operands. Just use
size_t for the loop.") to another loop variable.
 1.8 20-Apr-2020  joerg Don't mix signed and unsigned operands. Just use size_t for the loop.
 1.7 20-Apr-2020  simonb Update to support Family 15h Model 60 temperature sensors.

Changes based on FreeBSD amdtemp driver changes by Conrad Meyer.

XXX: Some code duplication between this driver and amdtemp as
parts of the 15h refresh code share more in common with
older CPUs while accessing the device more like 17h.
 1.6 06-Aug-2019  msaitoh branches: 1.6.6;
Whitespace fix.
 1.5 18-Jul-2019  msaitoh branches: 1.5.2;
Use unsigned to fix compile error on i386.
 1.4 18-Jul-2019  msaitoh Add support for Ryzen 2xxx and 3xxx.
 1.3 27-Jan-2018  kardel branches: 1.3.2; 1.3.6;
rescan amdsmnbus instead of amdsmn (fixes panic)
 1.2 25-Jan-2018  pgoyette Modularize the amdsmn(4) driver, and update dependency for amdzentemp(4)
 1.1 25-Jan-2018  christos Add amdzentemp from FreeBSD via Ian Clark
 1.3.6.2 21-Apr-2020  martin Sync with HEAD
 1.3.6.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.3.2.7 22-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1888):

sys/arch/x86/pci/amdzentemp.c: revision 1.20
sys/arch/x86/pci/amdsmn.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.19

Add Zen4 Ryzen "Phoenix" support.
Add Zen2 Mendocino APU support.
Add Zen4 Phoenix support.
 1.3.2.6 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1825):

sys/arch/x86/pci/amdsmn.c: revision 1.16
sys/arch/x86/pci/amdzentemp.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.18

Reduce diff against DragonFly. No functional change.
amdsmn(4),amdzentemp(4): Add Zen3+ Rembrandt(19h/4xh) & Zen4 Genoa(19h/1xh).
 1.3.2.5 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1773):

share/man/man4/man4.x86/amdsmn.4 1.4,1.5
share/man/man4/man4.x86/amdzentemp.4 1.7
sys/arch/x86/pci/amdsmn.c 1.7-1.9,1.13,1.14
sys/arch/x86/pci/amdzentemp.c 1.8-1.10,1.12-1.15

adjust for possible 49K offset

presence of this offset is indicated by a set 19th bit which is shifted away
this brings the temperature to "normal" levels on my ryzen 2700
(I assumed the same 49K offset as the k10temp project)

correct for known temperature bias values.

Update to support Family 15h Model 60 temperature sensors.

Changes based on FreeBSD amdtemp driver changes by Conrad Meyer.

XXX: Some code duplication between this driver and amdtemp as
parts of the 15h refresh code share more in common with
older CPUs while accessing the device more like 17h.

Don't mix signed and unsigned operands. Just use size_t for the loop.

Apply previous change ("Don't mix signed and unsigned operands. Just use
size_t for the loop.") to another loop variable.

amdzentemp(4): Add Zen 3 support.

amdzentemp(4): Add support for per CCD temperature sensor from FreeBSD.

Fix build failure on i386.

Rename for AMD F15/6X device. No functional change.
amdsmn(4),amdzentemp(4): Add support for 17h/6xh and 19h/6xh.

Note that these drivers are present on some newer AMD Family 15h
processors.

amdsmn.4: Now support AMD Family 19h processors.
 1.3.2.4 06-Aug-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1328):

sys/arch/x86/pci/amdsmn.c: revision 1.4
sys/arch/x86/pci/amdsmn.c: revision 1.5
sys/arch/x86/pci/amdsmn.c: revision 1.6

Add support for Ryzen 2xxx and 3xxx.

Use unsigned to fix compile error on i386.

Whitespace fix.
 1.3.2.3 06-Feb-2018  martin Additionally pull up rev 1.3 of sys/arch/x86/pci/amdsmn.c, requested by
pgoyette in ticket #524:

rescan amdsmnbus instead of amdsmn (fixes panic)
 1.3.2.2 05-Feb-2018  martin Pull up following revision(s) (requested by pgoyette in ticket #524):

distrib/sets/lists/man/mi 1.1574
distrib/sets/lists/modules/md.amd64 1.73
distrib/sets/lists/modules/md.i386 1.76
share/man/man4/amdtemp.4 1.11
share/man/man4/man4.x86/Makefile 1.17
share/man/man4/man4.x86/amdsmn.4 1.1-1.3
share/man/man4/man4.x86/amdzentemp.4 1.1-1.6
sys/arch/amd64/conf/ALL 1.79,1.80
sys/arch/amd64/conf/GENERIC 1.482,1.484
sys/arch/amd64/conf/XEN3_DOM0 1.146,1.147
sys/arch/x86/pci/amdsmn.c 1.1-1.2
sys/arch/x86/pci/amdsmn.h 1.1
sys/arch/x86/pci/amdzentemp.c 1.1-1.7
sys/arch/x86/pci/files.pci 1.22,1.23
sys/modules/amdzentemp/amdzentemp.ioconf 1.2


Add amdzentemp from FreeBSD via Ian Clark.

man pages for amdsmn and amdzentemp.

Some clean-up on the HISTORY and AUTHORS sections, and addition of a BUGS
section to document the fact that we don't yet handle the required temp
offset, nor do we expose the available thermal-trip value.

Add missing article 'a'

KNF: Put back the blank line following the empty variable declarations
Put back the variable declaration, too, and mark it __diagused
Otherwise a DIAGNOSTIC kernel will complain about the variable being
undeclared.

Correct placement of __diagused attribute.

Modularize the amdsmn(4) driver, and update dependency for amdzentemp(4),
Create amdsmn(4) and amdzentemp(4) modules for X86.
 1.3.2.1 27-Jan-2018  martin file amdsmn.c was added on branch netbsd-8 on 2018-02-05 13:06:55 +0000
 1.5.2.5 22-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1720):

sys/arch/x86/pci/amdzentemp.c: revision 1.20
sys/arch/x86/pci/amdsmn.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.19

Add Zen4 Ryzen "Phoenix" support.
Add Zen2 Mendocino APU support.
Add Zen4 Phoenix support.
 1.5.2.4 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1644):

sys/arch/x86/pci/amdsmn.c: revision 1.16
sys/arch/x86/pci/amdzentemp.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.18

Reduce diff against DragonFly. No functional change.
amdsmn(4),amdzentemp(4): Add Zen3+ Rembrandt(19h/4xh) & Zen4 Genoa(19h/1xh).
 1.5.2.3 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1539):

share/man/man4/man4.x86/amdsmn.4: revision 1.5
sys/arch/x86/pci/amdsmn.c: revision 1.14
sys/arch/x86/pci/amdzentemp.c: revision 1.12-1.15

amdzentemp(4): Add Zen 3 support.

amdzentemp(4): Add support for per CCD temperature sensor from FreeBSD.

Fix build failure on i386.

amdsmn(4),amdzentemp(4): Add support for 17h/6xh and 19h/6xh.

amdsmn.4: Now support AMD Family 19h processors.
 1.5.2.2 27-Jul-2022  martin Pull up the following revisions, requested by msaitoh in ticket #1482:

sys/dev/pci/pcidevs 1.1422,1.1445-1.1460
via patch
sys/arch/x86/pci/amdsmn.c 1.13

Update pcidevs:
- Add Intel Alder Lake devices and Intel 600 Series PCH devices.
- Add some Intel Xeon Scalable / Skylake-E devices.
- Fix AMD F16_HB from 0x1568 to 0x1538.
- Add some devices for AMD and improve some descriptions to clarify.
- Add VMware AHCI and NVMe.
- Update Intel 700 series Ethernet devices.
- Add some Broadcom devices.
- Add some Broadcom / LSI RAID cards.
- Fix typos and whitespace.
 1.5.2.1 24-Apr-2020  martin Pull up following revision(s) (requested by simonb in ticket #851):

share/man/man4/man4.x86/amdzentemp.4: revision 1.7
share/man/man4/man4.x86/amdsmn.4: revision 1.4
sys/arch/x86/pci/amdsmn.c: revision 1.7
sys/arch/x86/pci/amdsmn.c: revision 1.8
sys/arch/x86/pci/amdsmn.c: revision 1.9
sys/arch/x86/pci/amdzentemp.c: revision 1.10

Update to support Family 15h Model 60 temperature sensors.

Changes based on FreeBSD amdtemp driver changes by Conrad Meyer.
XXX: Some code duplication between this driver and amdtemp as
parts of the 15h refresh code share more in common with
older CPUs while accessing the device more like 17h.
--
Note that these drivers are present on some newer AMD Family 15h
processors.
--
Don't mix signed and unsigned operands. Just use size_t for the loop.
--
Apply previous change ("Don't mix signed and unsigned operands. Just use
size_t for the loop.") to another loop variable.
--
 1.6.6.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.10.4.6 05-Apr-2021  thorpej config_match() -> config_probe() for the straight-forward indirect config
cases. There are still a few odd balls using config_match() which should
be sorted out later.
 1.10.4.5 04-Apr-2021  thorpej CFARG_SUBMATCH -> CFARG_SEARCH for the indirect configuration uses.
 1.10.4.4 03-Apr-2021  thorpej config_attach_loc() -> config_attach() with CFARG_LOCATORS argument.
 1.10.4.3 28-Mar-2021  thorpej These devices have only one interface attribute and no locators,
so simplify:

- config_attach_loc() -> config_attach().
- Don't pass CFARG_IATTR, or CFARG_LOCATORS to config_search().
 1.10.4.2 21-Mar-2021  thorpej In "rescan" routines, always pass locators and the interface attribute
straight through to config_search(). Also, for devices that carry only
one interface attribute, no need to do an ifattr_match(), because
rescan_with_cfdata() will have already validated that the parent is
eligible, which includes an interface attribute check.
 1.10.4.1 20-Mar-2021  thorpej The proliferation of config_search_*() and config_found_*() combinations
is a little absurd, so begin to tidy this up:

- Introduce a new cfarg_t enumerated type, that defines the types of
tag-value variadic arguments that can be passed to the various
config_*() functions (CFARG_SUBMATCH, CFARG_IATTR, and CFARG_LOCATORS,
for now, plus a CFARG_EOL sentinel).
- Collapse config_search_*() into config_search() that takes these
variadic arguments.
- Convert all call sites of config_search_*() to the new signature.
Noticed several incorrect usages along the way, which will be
audited in a future commit.
 1.11.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.14.4.3 22-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #335):

sys/arch/x86/pci/amdzentemp.c: revision 1.20
sys/arch/x86/pci/amdsmn.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.19

Add Zen4 Ryzen "Phoenix" support.
Add Zen2 Mendocino APU support.
Add Zen4 Phoenix support.
 1.14.4.2 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #198):

sys/arch/x86/pci/amdsmn.c: revision 1.16
sys/arch/x86/pci/amdzentemp.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.18

Reduce diff against DragonFly. No functional change.
amdsmn(4),amdzentemp(4): Add Zen3+ Rembrandt(19h/4xh) & Zen4 Genoa(19h/1xh).
 1.14.4.1 19-Dec-2022  martin Pull up following revision(s) (requested by reinoud in ticket #3):

sys/dev/pci/amdccp_pci.c: revision 1.4
sys/arch/x86/pci/amdsmn.c: revision 1.15
sys/dev/acpi/amdccp_acpi.c: revision 1.6
sys/dev/fdt/amdccp_fdt.c: revision 1.7

Add amdsmn(4) and amdccp(4) power management stubs.
 1.17.6.1 02-Aug-2025  perseant Sync with HEAD
 1.1 25-Jan-2018  christos branches: 1.1.2;
Add amdzentemp from FreeBSD via Ian Clark
 1.1.2.2 05-Feb-2018  martin Pull up following revision(s) (requested by pgoyette in ticket #524):

distrib/sets/lists/man/mi 1.1574
distrib/sets/lists/modules/md.amd64 1.73
distrib/sets/lists/modules/md.i386 1.76
share/man/man4/amdtemp.4 1.11
share/man/man4/man4.x86/Makefile 1.17
share/man/man4/man4.x86/amdsmn.4 1.1-1.3
share/man/man4/man4.x86/amdzentemp.4 1.1-1.6
sys/arch/amd64/conf/ALL 1.79,1.80
sys/arch/amd64/conf/GENERIC 1.482,1.484
sys/arch/amd64/conf/XEN3_DOM0 1.146,1.147
sys/arch/x86/pci/amdsmn.c 1.1-1.2
sys/arch/x86/pci/amdsmn.h 1.1
sys/arch/x86/pci/amdzentemp.c 1.1-1.7
sys/arch/x86/pci/files.pci 1.22,1.23
sys/modules/amdzentemp/amdzentemp.ioconf 1.2


Add amdzentemp from FreeBSD via Ian Clark.

man pages for amdsmn and amdzentemp.

Some clean-up on the HISTORY and AUTHORS sections, and addition of a BUGS
section to document the fact that we don't yet handle the required temp
offset, nor do we expose the available thermal-trip value.

Add missing article 'a'

KNF: Put back the blank line following the empty variable declarations
Put back the variable declaration, too, and mark it __diagused
Otherwise a DIAGNOSTIC kernel will complain about the variable being
undeclared.

Correct placement of __diagused attribute.

Modularize the amdsmn(4) driver, and update dependency for amdzentemp(4),
Create amdsmn(4) and amdzentemp(4) modules for X86.
 1.1.2.1 25-Jan-2018  martin file amdsmn.h was added on branch netbsd-8 on 2018-02-05 13:06:55 +0000
 1.23 30-Dec-2018  is Document bobcat/puma family nicknames.
 1.22 12-Dec-2018  is Added support for AMD family 16h cpu sensors - (just like 10h-14h).
(Tested on netbsd-8.0 release.)
 1.21 27-Sep-2018  maxv Improve a bit, no real functional change.
 1.20 01-Jun-2017  chs branches: 1.20.2; 1.20.8; 1.20.10;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.19 23-Apr-2015  pgoyette Update module dependencies for all the existing modules that depend on sysmon components.
 1.18 15-Nov-2013  msaitoh branches: 1.18.6;
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
 1.17 12-Nov-2013  msaitoh Calculate the processor family correctly. The extended family bits
should be added only when the base family is 0xf.
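The rule stated in revision 1.17 above can be sketched as follows. The function names are hypothetical and are not the CPUID_TO_*() macros from the tree; the field positions assumed here are the standard CPUID leaf 1 EAX layout (base family in bits 11:8, extended family in bits 27:20).

```c
#include <stdint.h>

/* Hypothetical helpers; not the NetBSD CPUID_TO_*() macros. */

/* Base family: bits 11:8 of the CPUID leaf 1 EAX signature. */
static uint32_t
demo_base_family(uint32_t sig)
{
	return (sig >> 8) & 0xf;
}

/* Extended family: bits 27:20 of the signature. */
static uint32_t
demo_ext_family(uint32_t sig)
{
	return (sig >> 20) & 0xff;
}

/*
 * Display family: the extended family is added only when the
 * base family field is 0xf; otherwise the base family is used alone.
 */
static uint32_t
demo_display_family(uint32_t sig)
{
	uint32_t family = demo_base_family(sig);

	if (family == 0xf)
		family += demo_ext_family(sig);
	return family;
}
```

For a signature of 0x00800F11 the base family is 0xf and the extended family is 0x8, so the display family is 0x17. For 0x000306A9 the base family is 6 and the extended field is ignored.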
 1.16 16-Jul-2012  pgoyette branches: 1.16.2; 1.16.4;
Enable entropy gathering
 1.15 13-Apr-2012  cegger Replace amdtempbus with amdnb_miscbus.
This allows us to have independent drivers on the same device (northbridge f3)
each coming with a certain functionality/feature.
This way we do not need to mess with amdtemp(4) to utilize other features.
 1.14 13-Apr-2012  cegger - support AMD Family15h
- deregister pmf on detach
 1.13 02-Mar-2012  nonaka Added Family 12h support.
 1.12 31-Jul-2011  jmcneill branches: 1.12.2; 1.12.6; 1.12.8;
add Family14h (AMD Fusion) support
 1.11 15-Jun-2011  jruoho Small cleanup; use KM_SLEEP, wrap long lines, etc. No functional change.
 1.10 15-Jun-2011  jruoho Modularize amdtemp(4).
 1.9 16-Oct-2009  cegger branches: 1.9.10;
Family 10h Errata #319: Attach on Family10h cpu series which have it fixed.
 1.8 16-Jun-2009  cegger - use <sys/bus.h> and <sys/cpu.h>
- add reference to family11h documentation
- add reference to AMD K8 Errata #141
 1.7 12-Mar-2009  cegger - beautify dmesg
- print family id if not supported
spotted by jmcneill@
 1.6 04-Dec-2008  cegger branches: 1.6.4;
Fix the fix: Only AMD K8 Rev-G on AM2 sockets are impacted.
 1.5 04-Dec-2008  cegger On AMD K8 CPUs with Socket AM2, sensor normalization is off by 21 degrees C.
Adjust temperature calculation. This should fix strange temperatures on AMD K8
CPUs reported by many people.
 1.4 20-May-2008  cegger branches: 1.4.2; 1.4.6; 1.4.8; 1.4.10;
correct comment copied from aiboost(4): envsys(4) wants uK
 1.3 20-May-2008  cegger envsys(4) expects values in mK and not the top of the range of possible temperature values.
Needed some time to figure this out after I saw negative temperature values on Griffin.
 1.2 29-Apr-2008  martin branches: 1.2.2; 1.2.4;
Convert to new 2 clause license
 1.1 22-Apr-2008  cegger branches: 1.1.2;
amdtemp(4): Driver for AMD CPU Temperature Sensors. Adopted from OpenBSD's kate(4).
Changes beyond OpenBSD's driver:
- Improved support for AMD K8
- Added support for AMD Barcelona, AMD Phenom and AMD Griffin
Tested on various single and multi-socket machines.
Review and OK xtraeme
 1.1.2.4 11-Mar-2010  yamt sync with head
 1.1.2.3 20-Jun-2009  yamt sync with head
 1.1.2.2 04-May-2009  yamt sync with head.
 1.1.2.1 16-May-2008  yamt sync with head.
 1.2.4.3 04-Jun-2008  yamt sync with head
 1.2.4.2 18-May-2008  yamt sync with head.
 1.2.4.1 29-Apr-2008  yamt file amdtemp.c was added on branch yamt-pf42 on 2008-05-18 12:33:04 +0000
 1.2.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.4.10.2 10-Dec-2008  snj Pull up following revision(s) (requested by cegger in ticket #173):
sys/arch/x86/pci/amdtemp.c: revision 1.6
Fix the fix: Only AMD K8 Rev-G on AM2 sockets are impacted.
 1.4.10.1 10-Dec-2008  snj Pull up following revision(s) (requested by cegger in ticket #173):
sys/arch/x86/pci/amdtemp.c: revision 1.5
On AMD K8 CPUs with Socket AM2, sensor normalization is off by 21 degrees C.
Adjust temperature calculation. This should fix strange temperatures on
AMD K8
CPUs reported by many people.
 1.4.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.4.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.4.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.4.2.3 17-Jan-2009  mjf Sync with HEAD.
 1.4.2.2 02-Jun-2008  mjf Sync with HEAD.
 1.4.2.1 20-May-2008  mjf file amdtemp.c was added on branch mjf-devfs2 on 2008-06-02 13:22:50 +0000
 1.6.4.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.6.4.3 01-Nov-2009  jym Sync with HEAD.
 1.6.4.2 23-Jul-2009  jym Sync with HEAD.
 1.6.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.9.10.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.12.8.2 16-Apr-2012  riz Pull up following revision(s) (requested by cegger in ticket #180):
sys/arch/x86/pci/amdtemp.c: revision 1.14
sys/dev/pci/pcidevs: revision 1.1115
Add AMD Family15h ids
- support AMD Family15h
- deregister pmf on detach
 1.12.8.1 08-Mar-2012  riz Pull up following revision(s) (requested by nonaka):
share/man/man4/amdtemp.4: revision 1.6
share/man/man4/amdtemp.4: revision 1.7
sys/arch/x86/pci/amdtemp.c: revision 1.13
Added Family 12h support.
Mention AMD Fusion.
Bump date for previous.
 1.12.6.4 29-Apr-2012  mrg sync to latest -current.
 1.12.6.3 06-Mar-2012  mrg sync to -current
 1.12.6.2 06-Mar-2012  mrg sync to -current
 1.12.6.1 04-Mar-2012  mrg sync to latest -current.
 1.12.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.2.2 30-Oct-2012  yamt sync with head
 1.12.2.1 17-Apr-2012  yamt sync with head
 1.16.4.1 18-May-2014  rmind sync with head
 1.16.2.2 03-Dec-2017  jdolecek update from HEAD
 1.16.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.18.6.2 28-Aug-2017  skrll Sync with HEAD
 1.18.6.1 06-Jun-2015  skrll Sync with HEAD
 1.20.10.1 10-Jun-2019  christos Sync with HEAD
 1.20.8.3 18-Jan-2019  pgoyette Synch with HEAD
 1.20.8.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.20.8.1 30-Sep-2018  pgoyette Sync with HEAD
 1.20.2.1 15-Dec-2018  martin Pull up following revision(s) (requested by is in ticket #1137):

sys/arch/x86/pci/amdnb_misc.c: revision 1.3
sys/arch/x86/pci/amdtemp.c: revision 1.22

Added support for AMD family 16h cpu sensors - (just like 10h-14h).
(Tested on netbsd-8.0 release.)
 1.23 06-Apr-2025  pgoyette insert a space in attach message
 1.22 17-Oct-2024  msaitoh amdzentemp(4): Add some CPU support.

- Zen4 "Siena" (family 1fh model 0xa0...0xaf)
- Zen5 "Turin Classic" (family 1ah model 0x00...0x0f)
- Zen5 "Turin Dense" (family 1ah model 0x10...0x1f)
- Zen5 "Strix Point" (family 1ah model 0x20...0x2f)
 1.21 04-Oct-2024  msaitoh amdzentemp(4): Add support for CPU family 0x1a model 0x40...0x4f (Zen 5)
 1.20 28-Jul-2023  msaitoh branches: 1.20.6;
Add Zen2 Mendocino APU support.
 1.19 28-Jul-2023  msaitoh Add Zen4 Ryzen "Phoenix" support.
 1.18 28-Jan-2023  msaitoh amdsmn(4),amdzentemp(4): Add Zen3+ Rembrandt(19h/4xh) & Zen4 Genoa(19h/1xh).
 1.17 28-Jan-2023  msaitoh Reduce diff against DragonFly. No functional change.
 1.16 24-Nov-2022  mrg branches: 1.16.2;
match zen3 "cezanne" (ryzen 5000-series APU.)
 1.15 01-Oct-2022  msaitoh amdsmn(4),amdzentemp(4): Add support for 17h/6xh and 19h/6xh.
 1.14 06-Jun-2021  nonaka Fix build failure on i386.
 1.13 06-Jun-2021  nonaka amdzentemp(4): Add support for per CCD temperature sensor from FreeBSD.
 1.12 05-Jun-2021  nonaka amdzentemp(4): Add Zen 3 support.
 1.11 25-Apr-2020  bouyer branches: 1.11.6; 1.11.10;
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.10 20-Apr-2020  simonb Update to support Family 15h Model 60 temperature sensors.

Changes based on FreeBSD amdtemp driver changes by Conrad Meyer.

XXX: Some code duplication between this driver and amdtemp as
parts of the 15h refresh code share more in common with
older CPUs while accessing the device more like 17h.
 1.9 16-Jun-2019  mlelstv branches: 1.9.2; 1.9.8;
correct for known temperature bias values.
 1.8 25-Jul-2018  para adjust for possible 49K offset

presence of this offset is indicated by a set 19th bit which is shifted away
this brings the temperature to "normal" levels on my ryzen 2700
(I assumed the same 49K offset as the k10temp project)
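The offset handling described in revision 1.8 above might look roughly like the sketch below. Only the 49-degree offset and the flag in bit 19 come from the commit message. The rest is an assumption modeled on the Linux k10temp driver that the message refers to: a temperature field starting at bit 21, counted in 0.125-degree steps. All identifiers are hypothetical and are not taken from amdzentemp.c.

```c
#include <stdint.h>

/* Assumed layout, modeled on Linux k10temp; not taken from this log. */
#define DEMO_TEMP_SHIFT		21		/* temperature field starts at bit 21 */
#define DEMO_RANGE_SEL		(1u << 19)	/* set when the 49-degree offset applies */
#define DEMO_STEP_MILLIDEG	125		/* one count = 0.125 degree */
#define DEMO_OFFSET_MILLIDEG	49000		/* the 49K offset from the commit */

/*
 * Convert a raw register value to millidegrees Celsius.
 * The flag must be tested on the raw value, because the right
 * shift that extracts the temperature discards bit 19.
 */
static int32_t
demo_raw_to_millideg_c(uint32_t raw)
{
	int32_t t = (int32_t)(raw >> DEMO_TEMP_SHIFT) * DEMO_STEP_MILLIDEG;

	if (raw & DEMO_RANGE_SEL)
		t -= DEMO_OFFSET_MILLIDEG;
	return t;
}
```

Under these assumptions a raw count of 400 corresponds to 50 degrees C without the flag and 1 degree C with it. A real driver would still have to convert the result to the unit envsys(4) expects.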
 1.7 26-Jan-2018  pgoyette branches: 1.7.2; 1.7.4; 1.7.6;
sc->sc_sensor cannot be NULL since it was just allocated with KM_SLEEP
(which cannot fail). So remove the NULL-check. CID/1428644
 1.6 25-Jan-2018  pgoyette Modularize the amdsmn(4) driver, and update dependency for amdzentemp(4)
 1.5 25-Jan-2018  pgoyette Correct placement of __diagused attribute
 1.4 25-Jan-2018  pgoyette Put back the variable declaration, too, and mark it __diagused

Otherwise a DIAGNOSTIC kernel will complain about the variable being
undeclared.
 1.3 25-Jan-2018  pgoyette KNF: Put back the blank line following the empty variable declarations
 1.2 25-Jan-2018  prlw1 Unused variable build fix. (now void *aux is unused)
 1.1 25-Jan-2018  christos Add amdzentemp from FreeBSD via Ian Clark
 1.7.6.3 21-Apr-2020  martin Sync with HEAD
 1.7.6.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.7.6.1 10-Jun-2019  christos Sync with HEAD
 1.7.4.1 28-Jul-2018  pgoyette Sync with HEAD
 1.7.2.6 22-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1888):

sys/arch/x86/pci/amdzentemp.c: revision 1.20
sys/arch/x86/pci/amdsmn.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.19

Add Zen4 Ryzen "Phoenix" support.
Add Zen2 Mendocino APU support.
Add Zen4 Phoenix support.
 1.7.2.5 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1825):

sys/arch/x86/pci/amdsmn.c: revision 1.16
sys/arch/x86/pci/amdzentemp.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.18

Reduce diff against DragonFly. No functional change.
amdsmn(4),amdzentemp(4): Add Zen3+ Rembrandt(19h/4xh) & Zen4 Genoa(19h/1xh).
 1.7.2.4 23-Jan-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1789):

sys/arch/x86/pci/amdzentemp.c: revision 1.16

match zen3 "cezanne" (ryzen 5000-series APU.)
 1.7.2.3 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1773):

share/man/man4/man4.x86/amdsmn.4 1.4,1.5
share/man/man4/man4.x86/amdzentemp.4 1.7
sys/arch/x86/pci/amdsmn.c 1.7-1.9,1.13,1.14
sys/arch/x86/pci/amdzentemp.c 1.8-1.10,1.12-1.15

adjust for possible 49K offset

presence of this offset is indicated by a set 19th bit which is shifted away
this brings the temperature to "normal" levels on my ryzen 2700
(I assumed the same 49K offset as the k10temp project)

correct for known temperature bias values.

Update to support Family 15h Model 60 temperature sensors.

Changes based on FreeBSD amdtemp driver changes by Conrad Meyer.

XXX: Some code duplication between this driver and amdtemp as
parts of the 15h refresh code share more in common with
older CPUs while accessing the device more like 17h.

Don't mix signed and unsigned operands. Just use size_t for the loop.

Apply previous change ("Don't mix signed and unsigned operands. Just use
size_t for the loop.") to another loop variable.

amdzentemp(4): Add Zen 3 support.

amdzentemp(4): Add support for per CCD temperature sensor from FreeBSD.

Fix build failure on i386.

Rename for AMD F15/6X device. No functional change.
amdsmn(4),amdzentemp(4): Add support for 17h/6xh and 19h/6xh.

Note that these drivers are present on some newer AMD Family 15h
processors.

amdsmn.4: Now support AMD Family 19h processors.
 1.7.2.2 05-Feb-2018  martin Pull up following revision(s) (requested by pgoyette in ticket #524):

distrib/sets/lists/man/mi 1.1574
distrib/sets/lists/modules/md.amd64 1.73
distrib/sets/lists/modules/md.i386 1.76
share/man/man4/amdtemp.4 1.11
share/man/man4/man4.x86/Makefile 1.17
share/man/man4/man4.x86/amdsmn.4 1.1-1.3
share/man/man4/man4.x86/amdzentemp.4 1.1-1.6
sys/arch/amd64/conf/ALL 1.79,1.80
sys/arch/amd64/conf/GENERIC 1.482,1.484
sys/arch/amd64/conf/XEN3_DOM0 1.146,1.147
sys/arch/x86/pci/amdsmn.c 1.1-1.2
sys/arch/x86/pci/amdsmn.h 1.1
sys/arch/x86/pci/amdzentemp.c 1.1-1.7
sys/arch/x86/pci/files.pci 1.22,1.23
sys/modules/amdzentemp/amdzentemp.ioconf 1.2


Add amdzentemp from FreeBSD via Ian Clark.

man pages for amdsmn and amdzentemp.

Some clean-up on the HISTORY and AUTHORS sections, and addition of a BUGS
section to document the fact that we don't yet handle the required temp
offset, nor do we expose the available thermal-trip value.

Add missing article 'a'

KNF: Put back the blank line following the empty variable declarations
Put back the variable declaration, too, and mark it __diagused
Otherwise a DIAGNOSTIC kernel will complain about the variable being
undeclared.

Correct placement of __diagused attribute.

Modularize the amdsmn(4) driver, and update dependency for amdzentemp(4),
Create amdsmn(4) and amdzentemp(4) modules for X86.
 1.7.2.1 26-Jan-2018  martin file amdzentemp.c was added on branch netbsd-8 on 2018-02-05 13:06:55 +0000
 1.9.8.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.9.2.5 22-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1720):

sys/arch/x86/pci/amdzentemp.c: revision 1.20
sys/arch/x86/pci/amdsmn.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.19

Add Zen4 Ryzen "Phoenix" support.
Add Zen2 Mendocino APU support.
Add Zen4 Phoenix support.
 1.9.2.4 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1644):

sys/arch/x86/pci/amdsmn.c: revision 1.16
sys/arch/x86/pci/amdzentemp.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.18

Reduce diff against DragonFly. No functional change.
amdsmn(4),amdzentemp(4): Add Zen3+ Rembrandt(19h/4xh) & Zen4 Genoa(19h/1xh).
 1.9.2.3 23-Jan-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1572):

sys/arch/x86/pci/amdzentemp.c: revision 1.16

match zen3 "cezanne" (ryzen 5000-series APU.)
 1.9.2.2 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1539):

share/man/man4/man4.x86/amdsmn.4: revision 1.5
sys/arch/x86/pci/amdsmn.c: revision 1.14
sys/arch/x86/pci/amdzentemp.c: revision 1.12-1.15

amdzentemp(4): Add Zen 3 support.

amdzentemp(4): Add support for per CCD temperature sensor from FreeBSD.

Fix build failure on i386.

amdsmn(4),amdzentemp(4): Add support for 17h/6xh and 19h/6xh.

amdsmn.4: Now support AMD Family 19h processors.
 1.9.2.1 24-Apr-2020  martin Pull up following revision(s) (requested by simonb in ticket #851):

share/man/man4/man4.x86/amdzentemp.4: revision 1.7
share/man/man4/man4.x86/amdsmn.4: revision 1.4
sys/arch/x86/pci/amdsmn.c: revision 1.7
sys/arch/x86/pci/amdsmn.c: revision 1.8
sys/arch/x86/pci/amdsmn.c: revision 1.9
sys/arch/x86/pci/amdzentemp.c: revision 1.10

Update to support Family 15h Model 60 temperature sensors.

Changes based on FreeBSD amdtemp driver changes by Conrad Meyer.
XXX: Some code duplication between this driver and amdtemp as
parts of the 15h refresh code share more in common with
older CPUs while accessing the device more like 17h.
--
Note that these drivers are present on some newer AMD Family 15h
processors.
--
Don't mix sign and unsigned operands. Just use size_t for the loop.
--
Apply previous change ("Don't mix sign and unsigned operands. Just use
size_t for the loop.") to another loop variable.
--
 1.11.10.1 06-Jun-2021  cjep sync with head
 1.11.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.16.2.3 09-Oct-2024  martin Pull up following revision(s) (requested by msaitoh in ticket #940):

sys/arch/x86/pci/amdzentemp.c: revision 1.21

amdzentemp(4): Add support for CPU family 0x1a model 0x40...0x4f (Zen 5)
 1.16.2.2 22-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #335):

sys/arch/x86/pci/amdzentemp.c: revision 1.20
sys/arch/x86/pci/amdsmn.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.19

Add Zen4 Ryzen "Phoenix" support.
Add Zen2 Mendocino APU support.
Add Zen4 Phoenix support.
 1.16.2.1 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #198):

sys/arch/x86/pci/amdsmn.c: revision 1.16
sys/arch/x86/pci/amdzentemp.c: revision 1.17
sys/arch/x86/pci/amdzentemp.c: revision 1.18

Reduce diff against DragonFly. No functional change.
amdsmn(4),amdzentemp(4): Add Zen3+ Rembrandt(19h/4xh) & Zen4 Genoa(19h/1xh).
 1.20.6.1 02-Aug-2025  perseant Sync with HEAD
 1.13 15-Sep-2025  thorpej No longer need to include acpi_i2c.h here.
 1.12 15-Sep-2025  thorpej Do the ACPI-specific get-child-devices dance in iic_attach(). This
obviously isn't ideal, but it funnels the issue into a central location
and provides for easier improvement later.
 1.11 11-Nov-2024  martin Add missing include of "acpica.h", pointed out by Jared.
 1.10 29-Apr-2024  andvar branches: 1.10.2;
Make dwiic_pci compile without ACPI option.
 1.9 19-Oct-2022  riastradh dwiic(4): Don't try to attach children if dwiic_attach failed.

PR kern/57063
 1.8 27-Oct-2021  msaitoh Add more Jasper Lake and Elkhart Lake devices.
 1.7 27-Oct-2021  msaitoh Add many Intel I2C devices.
 1.6 07-Aug-2021  thorpej branches: 1.6.2;
Merge thorpej-cfargs2.
 1.5 29-May-2021  riastradh branches: 1.5.4;
dwiic(4): Attribute output correctly and relegate to debug-level.

Tidy up a little while here.
 1.4 24-Apr-2021  thorpej branches: 1.4.2; 1.4.4;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
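
The tag-value calling convention described in the thorpej-cfargs merge message can be illustrated with a small user-space sketch. This is not the NetBSD autoconfiguration code: the DEMO_CFARG_* tags, struct demo_cfargs, and demo_parse_cfargs() are hypothetical names invented for illustration. Only the idea of a single variadic call terminated by a sentinel tag is taken from the message.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags, loosely modelled on the CFARG_* names in the log. */
enum demo_cfarg {
	DEMO_CFARG_EOL = 0,	/* sentinel: terminates the argument list */
	DEMO_CFARG_IATTR,	/* next argument: const char * */
	DEMO_CFARG_LOCATORS	/* next argument: const int * */
};

/* Where the parsed values end up (hypothetical layout). */
struct demo_cfargs {
	const char *iattr;
	const int *locators;
};

/*
 * Walk the tag-value pairs until the sentinel is reached.
 * Returns 0 on success, or -1 if an unknown tag is encountered.
 */
static int
demo_parse_cfargs(struct demo_cfargs *args, int first_tag, ...)
{
	va_list ap;
	int tag;
	int error = 0;

	args->iattr = NULL;
	args->locators = NULL;

	va_start(ap, first_tag);
	for (tag = first_tag; tag != DEMO_CFARG_EOL; tag = va_arg(ap, int)) {
		switch (tag) {
		case DEMO_CFARG_IATTR:
			args->iattr = va_arg(ap, const char *);
			break;
		case DEMO_CFARG_LOCATORS:
			args->locators = va_arg(ap, const int *);
			break;
		default:
			/* Unknown tag: its value type is unknown, so stop. */
			error = -1;
			goto done;
		}
	}
done:
	va_end(ap);
	return error;
}
```

A caller passes only the tags it needs and ends the list with the sentinel, for example demo_parse_cfargs(&args, DEMO_CFARG_IATTR, "pci", DEMO_CFARG_EOL). New tags can then be added without introducing another function variant, which is the extensibility the message describes.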
 1.3 26-Jan-2021  jmcneill branches: 1.3.2;
Add a device_t parameter to acpi_enter_i2c_devs. If non-NULL, all child
acpi_devnodes will be claimed by that device so we don't later try to
attach a duplicate device to that node at acpinodebus.
 1.2 26-Sep-2018  jakllsch branches: 1.2.4; 1.2.12;
Add dwiic_fdt attachment for "snps,designware-i2c".
 1.1 10-Dec-2017  bouyer branches: 1.1.2; 1.1.4;
Add support for I2C designware controllers (as found in Intel PCH devices),
with a pci front-end.
The pci front-end is tied to ACPI and Intel-specific, so it's in arch/x86/pci
and not dev/pci.
Core driver from OpenBSD, PCI front-end by me.
 1.1.4.1 10-Jun-2019  christos Sync with HEAD
 1.1.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.2.12.1 03-Apr-2021  thorpej Sync with HEAD.
 1.2.4.1 21-Jun-2021  martin Pull up following revision(s) (requested by riastradh in ticket #1303):

sys/arch/x86/pci/dwiic_pci.c: revision 1.5 (patch)

dwiic(4): Attribute output correctly and relegate to debug-level.

Tidy up a little while here.
 1.3.2.1 23-Mar-2021  thorpej Convert config_found_ia() call sites where the device only carries
a single interface attribute to bare config_found() calls.
 1.4.4.1 31-May-2021  cjep sync with head
 1.4.2.3 17-Jun-2021  thorpej Sync w/ HEAD.
 1.4.2.2 25-Apr-2021  thorpej - Don't use acpi_enter_i2c_devs() -- it no longer exists.
- Pass along our devhandle to the i2c bus instance.
 1.4.2.1 25-Apr-2021  thorpej acpi_i2c.h is no more.
 1.5.4.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.6.2.1 09-Aug-2021  thorpej Port over the changes from thorpej-i2c-spi-conf to thorpej-i2c-spi-conf2,
which is based on a newer HEAD revision.
 1.10.2.1 02-Aug-2025  perseant Sync with HEAD
 1.27 12-Apr-2023  riastradh ichsmb(4), tco(4): Add support for TCO on newer Intel chipsets.

TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.

Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).

XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
 1.26 24-Apr-2021  thorpej branches: 1.26.16;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
 1.25 14-Oct-2020  ryo branches: 1.25.4;
vmx(4) should be MI. moved to sys/dev/pci from sys/arch/x86/pci
 1.24 01-Mar-2018  mrg move the imc code into x86/pci/files.pci so that pci is defined in time.
 1.23 27-Jan-2018  christos provide an intermediate "bus" for the module, so that it has the same
structure as amdtemp
 1.22 25-Jan-2018  christos Add amdzentemp from FreeBSD via Ian Clark
 1.21 10-Dec-2017  bouyer Add support for I2C designware controllers (as found in Intel PCH devices),
with a pci front-end.
The pci front-end is tied to ACPI and Intel-specific, so it's in arch/x86/pci
and not dev/pci.
Core driver from OpenBSD, PCI front-end by me.
 1.20 03-May-2015  pgoyette branches: 1.20.10;
Separate the watchdog code from the pcib code, and make the watchdog
a loadable module.
 1.19 11-Nov-2014  christos branches: 1.19.2;
add an agp dependency so that the agp drivers get loaded.
 1.18 18-Oct-2014  uebayasi Install agp_* drivers where pchb(4) is installed except INSTALL_FLOPPY.

XXX
Config around agp(4) is done in quite wrong direction.
"pchb <- (agpbus) <- agp <- agp_*"
should be:
"pchb <- (pcibus) <- agp_* <- (agpbus) <- agp"
 1.17 17-Oct-2014  uebayasi Fix another indirect circular dependency (agp_* -> (agpbus) -> pchb -> agp_*).
Fixes "no agp*" build. Reported & build-tested by Kurt Schreiner.
 1.16 10-Jun-2014  hikaru Add VMware VMXNET3 ethernet driver from OpenBSD, vmx(4).
 1.15 05-Dec-2012  christos branches: 1.15.10;
Intel Atom E600 PCI-LPC bridge, adds a watchdog + HPET support. Tested
on a Soekris net6501. (jmcneill)
 1.14 13-Apr-2012  cegger branches: 1.14.2;
Replace amdtempbus with amdnb_miscbus.
This allows us to have independent drivers on the same device (northbridge f3)
each coming with a certain functionality/feature.
This way we do not need to mess with amdtemp(4) to utilize other features.
 1.13 18-Aug-2011  jakllsch branches: 1.13.2; 1.13.6;
Attach amdtemp(4) at pchb(4) instead of in place of pchb(4).

Should fix PR#45268.
 1.12 15-Jun-2011  jruoho Factor out hpet(4) from ichlpcib(4).
 1.11 04-Apr-2011  bouyer branches: 1.11.2;
Add a driver for RDC's vortex86/PMX-1000 SoC PCI/ISA bridge, with support
for the integrated watchdog timer.
 1.10 23-Jul-2010  jakllsch branches: 1.10.2;
Almost entirely rework Intel Firmware Hub random number generator support.

This introduces fwhrng(4) which attaches via ichlpcib(4), replacing
the rnd(4) support in pchb(4).
 1.9 14-May-2010  phx gcscpcib depends on functions from x86/pci/pcib.c
 1.8 27-Sep-2009  jakllsch branches: 1.8.2; 1.8.4;
gpio(4) support for Intel ICH southbridges.

Tested on Intel SS4200-E (ICH7), and Acorp 6A815EPD (ICH2) motherboards,
on amd64 and i386 ports respectively.

It should be noted that the majority of boards with ICH chips do not
expose the GPIO pins for off-board use. For instance, aside from the
three exposed-on-a-header pins on the 6A815EPD, another pin is also
used to control write protect on the FWH. The SS4200 exposes the GPIO
on a header that connects to the 10 LEDs on the front panel, as well
as a tact switch on the back panel.
 1.7 03-Aug-2008  joerg branches: 1.7.8;
Move some MD declarations from x86/pci/files.pci to x86/conf/files.x86,
so that Xen can use the former.

Drop Xen's pcib.c in favor of the x86 code and thereby unbreak ichlpcib.
 1.6 18-May-2008  jmcneill branches: 1.6.4;
Add support for PCI_BUS_FIXUP and PCI_ADDR_FIXUP on amd64.
 1.5 22-Apr-2008  cegger branches: 1.5.2; 1.5.4;
amdtemp(4): Driver for AMD CPU Temperature Sensors. Adopted from OpenBSD's kate(4).
Changes beyond OpenBSD's driver:
- Improved support for AMD K8
- Added support for AMD Barcelona, AMD Phenom and AMD Griffin
Tested on various single and multi-socket machines.
Review and OK xtraeme
 1.4 09-Dec-2007  jmcneill branches: 1.4.10; 1.4.12;
Merge jmcneill-pm branch.
 1.3 26-Oct-2007  xtraeme branches: 1.3.2; 1.3.6; 1.3.8; 1.3.10; 1.3.12;
Share pcib(4) and amdpcib(4) between i386 and amd64; one copy is enough.
 1.2 26-Oct-2007  xtraeme - Share pchb(4) between i386 and amd64; one copy is enough for both.
- Move some of the x86 PCI devices into x86/pci/files.pci.
- Add more x86 stuff into x86/conf/files.x86.

ok joerg.
 1.1 04-Sep-2007  joerg branches: 1.1.2; 1.1.6;
file files.pci was initially added on branch jmcneill-pm.
 1.1.6.1 13-Nov-2007  bouyer Sync with HEAD
 1.1.2.4 28-Oct-2007  joerg Sync with HEAD.
 1.1.2.3 05-Sep-2007  joerg Correctly attach HPET on ichlpcib. Patch and hints how to do this
from cube@
 1.1.2.2 04-Sep-2007  joerg Don't use a global variable to decide whether this is an ICH6+,
use a variable in the softc to determine whether the RCBA is supported.
Add generic HPET support for ICH5 and ICH6+.

This is not (yet) enabled by default, until someone adds the code to
not use the direct attachment if hpet was configured via ACPI.
 1.1.2.1 04-Sep-2007  joerg Move common PCI devices on i386 and amd64 into arch/x86/pci/files.pci.
 1.3.12.1 11-Dec-2007  yamt sync with head.
 1.3.10.1 26-Dec-2007  ad Sync with head.
 1.3.8.2 03-Dec-2007  ad Sync with HEAD.
 1.3.8.1 26-Oct-2007  ad file files.pci was added on branch vmlocking on 2007-12-03 19:04:26 +0000
 1.3.6.3 09-Jan-2008  matt sync with HEAD
 1.3.6.2 06-Nov-2007  matt sync with HEAD
 1.3.6.1 26-Oct-2007  matt file files.pci was added on branch matt-armv6 on 2007-11-06 23:23:41 +0000
 1.3.2.3 21-Jan-2008  yamt sync with head
 1.3.2.2 27-Oct-2007  yamt sync with head.
 1.3.2.1 26-Oct-2007  yamt file files.pci was added on branch yamt-lazymbuf on 2007-10-27 11:28:58 +0000
 1.4.12.1 18-May-2008  yamt sync with head.
 1.4.10.2 28-Sep-2008  mjf Sync with HEAD.
 1.4.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.5.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.5.2.3 11-Aug-2010  yamt sync with head.
 1.5.2.2 11-Mar-2010  yamt sync with head
 1.5.2.1 04-May-2009  yamt sync with head.
 1.6.4.1 19-Oct-2008  haad Sync with HEAD.
 1.7.8.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.7.8.3 02-May-2011  jym Sync with head.
 1.7.8.2 24-Oct-2010  jym Sync with HEAD
 1.7.8.1 01-Nov-2009  jym Sync with HEAD.
 1.8.4.3 21-Apr-2011  rmind sync with head
 1.8.4.2 05-Mar-2011  rmind sync with head
 1.8.4.1 30-May-2010  rmind sync with head
 1.8.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.10.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.11.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.13.6.1 29-Apr-2012  mrg sync to latest -current.
 1.13.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.13.2.1 17-Apr-2012  yamt sync with head
 1.14.2.3 03-Dec-2017  jdolecek update from HEAD
 1.14.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.14.2.1 25-Feb-2013  tls resync with head
 1.15.10.1 10-Aug-2014  tls Rebase.
 1.19.2.1 06-Jun-2015  skrll Sync with HEAD
 1.20.10.1 05-Feb-2018  martin Pull up following revision(s) (requested by pgoyette in ticket #524):

distrib/sets/lists/man/mi 1.1574
distrib/sets/lists/modules/md.amd64 1.73
distrib/sets/lists/modules/md.i386 1.76
share/man/man4/amdtemp.4 1.11
share/man/man4/man4.x86/Makefile 1.17
share/man/man4/man4.x86/amdsmn.4 1.1-1.3
share/man/man4/man4.x86/amdzentemp.4 1.1-1.6
sys/arch/amd64/conf/ALL 1.79,1.80
sys/arch/amd64/conf/GENERIC 1.482,1.484
sys/arch/amd64/conf/XEN3_DOM0 1.146,1.147
sys/arch/x86/pci/amdsmn.c 1.1-1.2
sys/arch/x86/pci/amdsmn.h 1.1
sys/arch/x86/pci/amdzentemp.c 1.1-1.7
sys/arch/x86/pci/files.pci 1.22,1.23
sys/modules/amdzentemp/amdzentemp.ioconf 1.2


Add amdzentemp from FreeBSD via Ian Clark.

man pages for amdsmn and amdzentemp.

Some clean-up on the HISTORY and AUTHORS sections, and addition of a BUGS
section to document the fact that we don't yet handle the required temp
offset, nor do we expose the available thermal-trip value.

Add missing article 'a'

KNF: Put back the blank line following the empty variable declarations
Put back the variable declaration, too, and mark it __diagused
Otherwise a DIAGNOSTIC kernel will complain about the variable being
undeclared.

Correct placement of __diagused attribute.

Modularize the amdsmn(4) driver, and update dependency for amdzentemp(4),
Create amdsmn(4) and amdzentemp(4) modules for X86.
 1.25.4.2 28-Mar-2021  thorpej Minor rearrangement of the deck chairs to group things together.
 1.25.4.1 23-Mar-2021  thorpej Remove unnecessary "imcsmb" attribute from "imc".
 1.26.16.1 01-Aug-2023  martin Pull up following revision(s) (requested by riastradh in ticket #282):

sys/dev/pci/ichsmb.c: revision 1.82
sys/arch/amd64/conf/GENERIC: revision 1.602
sys/arch/x86/pci/tco.c: revision 1.10
sys/arch/x86/pci/tco.h: revision 1.5
sys/arch/x86/pci/ichlpcib.c: revision 1.59
sys/dev/ic/i82801lpcreg.h: revision 1.17
sys/arch/x86/pci/files.pci: revision 1.27
sys/dev/pci/files.pci: revision 1.446

ichsmb(4), tco(4): Add support for TCO on newer Intel chipsets.

TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.
Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).

XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
 1.9 13-Apr-2015  riastradh Convert arch/x86 to use <sys/rnd*.h>. Omit needless includes.
 1.8 16-Nov-2014  ozaki-r branches: 1.8.2;
Replace callout_stop with callout_halt

In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.

Discussed with martin@ and riastradh@.
 1.7 10-Aug-2014  tls branches: 1.7.2;
Merge tls-earlyentropy branch into HEAD.
 1.6 17-Oct-2013  christos branches: 1.6.2;
remove set but unused variables
 1.5 02-Feb-2012  tls branches: 1.5.2; 1.5.6; 1.5.10;
Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.4 19-Nov-2011  tls branches: 1.4.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- data structures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.3 01-Jul-2011  dyoung branches: 1.3.2;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.2 23-Aug-2010  jakllsch branches: 1.2.2; 1.2.8;
Move FWH chip detection area entirely within the mapping for
the smaller i82802AB. This is needed as not all BIOSes set a
larger-than-necessary decode range.
 1.1 23-Jul-2010  jakllsch branches: 1.1.2; 1.1.4;
Almost entirely rework Intel Firmware Hub random number generator support.

This introduces fwhrng(4) which attaches via ichlpcib(4), replacing
the rnd(4) support in pchb(4).
 1.1.4.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.1.4.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.1.4.1 23-Jul-2010  uebayasi file fwhrng.c was added on branch uebayasi-xip on 2010-08-17 06:45:32 +0000
 1.1.2.3 09-Oct-2010  yamt sync with head
 1.1.2.2 11-Aug-2010  yamt sync with head.
 1.1.2.1 23-Jul-2010  yamt file fwhrng.c was added on branch yamt-nfs-mp on 2010-08-11 22:52:56 +0000
 1.2.8.2 05-Mar-2011  rmind sync with head
 1.2.8.1 23-Aug-2010  rmind file fwhrng.c was added on branch rmind-uvmplock on 2011-03-05 20:52:28 +0000
 1.2.2.3 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.2.2.2 24-Oct-2010  jym Sync with HEAD
 1.2.2.1 23-Aug-2010  jym file fwhrng.c was added on branch jym-xensuspend on 2010-10-24 22:48:17 +0000
 1.3.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.2.1 17-Apr-2012  yamt sync with head
 1.4.2.1 18-Feb-2012  mrg merge to -current.
 1.5.10.1 18-May-2014  rmind sync with head
 1.5.6.2 03-Dec-2017  jdolecek update from HEAD
 1.5.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.1 07-Dec-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #1201):
sys/kern/kern_ktrace.c: revision 1.166
sys/dev/isa/aps.c: revision 1.16
sys/dev/sysmon/sysmonvar.h: revision 1.45
sys/dev/ir/irframe_tty.c: revision 1.60
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.111-1.112 (patch)
sys/dev/pci/pccbb.c: revision 1.207
sys/dev/wscons/wskbd.c: revision 1.135
sys/dev/usb/ohci.c: revision 1.254
sys/net/if_ecosubr.c: revision 1.41
sys/dev/pcmcia/btbc.c: revision 1.17
sys/arch/x86/x86/via_padlock.c: revision 1.23
sys/dev/sdmmc/sdmmc.c: revision 1.23 (patch)
sys/dev/bluetooth/btkbd.c: revision 1.17
sys/dev/bluetooth/bcsp.c: revision 1.25
sys/arch/x86/pci/fwhrng.c: revision 1.8
sys/dev/ic/nslm7x.c: revision 1.61
share/man/man9/callout.9: revision 1.28 (patch)

Replace callout_stop with callout_halt and ensure the callout
is not running before destroying it.
 1.6.2.1 07-Apr-2014  tls Be a little more clear and consistent about harvesting entropy from devices:

1) deprecate RND_FLAG_NO_ESTIMATE

2) define RND_FLAG_COLLECT_TIME, RND_FLAG_COLLECT_VALUE

3) define RND_FLAG_ESTIMATE_TIME, RND_FLAG_ESTIMATE_VALUE

4) define RND_FLAG_DEFAULT: RND_FLAG_COLLECT_TIME|
RND_FLAG_COLLECT_VALUE|RND_FLAG_ESTIMATE_TIME

5) Make entropy harvesting from environmental sensors a little more generic
and remove it from individual sensor drivers.

6) Remove individual open-coded delta-estimators for values from a few
places in the tree (uvm, environmental drivers).

7) 0 -> RND_FLAG_DEFAULT, actually gather entropy from various drivers
that had stubbed out code, other minor cleanups.
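
The flag composition in item 4 of the 1.6.2.1 message above can be sketched as plain bit masks. The DEMO_ prefix marks everything here as hypothetical: the bit values are placeholders chosen for illustration and are not taken from the kernel headers, and the two helper functions are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit assignments; the real header values may differ. */
#define DEMO_RND_FLAG_COLLECT_TIME	0x01u	/* use the sample's timestamp */
#define DEMO_RND_FLAG_COLLECT_VALUE	0x02u	/* use the sample's value */
#define DEMO_RND_FLAG_ESTIMATE_TIME	0x04u	/* credit entropy for timing */
#define DEMO_RND_FLAG_ESTIMATE_VALUE	0x08u	/* credit entropy for the value */

/* Default as spelled out in the log: collect both, estimate time only. */
#define DEMO_RND_FLAG_DEFAULT \
	(DEMO_RND_FLAG_COLLECT_TIME | DEMO_RND_FLAG_COLLECT_VALUE | \
	 DEMO_RND_FLAG_ESTIMATE_TIME)

/* True if samples from a source with these flags feed their value in. */
static bool
demo_collects_value(uint32_t flags)
{
	return (flags & DEMO_RND_FLAG_COLLECT_VALUE) != 0;
}

/* True if the value itself is credited with entropy. */
static bool
demo_estimates_value(uint32_t flags)
{
	return (flags & DEMO_RND_FLAG_ESTIMATE_VALUE) != 0;
}
```

With these definitions, a source registered with the default flags has its values collected but not credited with entropy, while a source registered with 0 has neither bit set. That difference is what item 7 (0 -> RND_FLAG_DEFAULT) changes for drivers that previously passed 0.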
 1.7.2.1 01-Dec-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #279):
sys/kern/kern_ktrace.c: revision 1.166
sys/dev/isa/aps.c: revision 1.16
sys/dev/sysmon/sysmonvar.h: revision 1.45
sys/dev/ir/irframe_tty.c: revision 1.60
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.111
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.112
sys/dev/pci/pccbb.c: revision 1.207
sys/dev/wscons/wskbd.c: revision 1.135
sys/dev/usb/ohci.c: revision 1.254
sys/net/if_ecosubr.c: revision 1.41
sys/dev/pcmcia/btbc.c: revision 1.17
sys/arch/x86/x86/via_padlock.c: revision 1.23
sys/dev/sdmmc/sdmmc.c: revision 1.23
sys/dev/bluetooth/btkbd.c: revision 1.17
sys/dev/bluetooth/bcsp.c: revision 1.25
sys/arch/x86/pci/fwhrng.c: revision 1.8
sys/dev/ic/nslm7x.c: revision 1.61
share/man/man9/callout.9: revision 1.28

Replace callout_stop with callout_halt

In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.

Discussed with martin@ and riastradh@.

Make it clear that we should use not callout_stop but callout_halt
before callout_destroy

Replace callout_stop with callout_halt

In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.
In this case, we need to pass an interlock to callout_halt to wait for
the callout to complete.

Reviewed by riastradh@.

Kill sme_callout_mtx and use sme_mtx instead

We can use sme_mtx for the callout as well. Actually we should do so
because sme_events_list and some other data that are touched in the
callout should be protected by sme_mtx, not sme_callout_mtx.

Discussed with riastradh@ in
http://mail-index.netbsd.org/tech-kern/2014/11/11/msg017956.html

Replace callout_stop with callout_halt

In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.
In this case, we need to pass an interlock to callout_halt to wait for
the callout to complete. And also we make sure that SME_CALLOUT_INITIALIZED
is unset before calling callout_halt to prevent the callout from calling
callout_schedule. This is the same as what we did in sys/netinet6/mld6.c@1.61.

Reviewed by riastradh@.
 1.8.2.1 06-Jun-2015  skrll Sync with HEAD
 1.4 23-Aug-2010  jakllsch Move FWH chip detection area entirely within the mapping for
the smaller i82802AB. This is needed as not all BIOSes set a
larger-than-necessary decode range.
 1.3 23-Jul-2010  jakllsch Almost entirely rework Intel Firmware Hub random number generator support.

This introduces fwhrng(4) which attaches via ichlpcib(4), replacing
the rnd(4) support in pchb(4).
 1.2 03-Nov-2009  snj branches: 1.2.2; 1.2.4;
Drop 3rd and 4th clauses, as the copyright holder (Michael Shalayeff) did
in OpenBSD revision 1.2.
 1.1 12-Feb-2006  tron branches: 1.1.2; 1.1.10; 1.1.16; 1.1.22; 1.1.80; 1.1.94;
Share Intel hardware random number generator support between amd64 and
i386 port. This will benefit EM64T systems using Intel i9xx chipsets.
 1.1.94.1 24-Oct-2010  jym Sync with HEAD
 1.1.80.3 09-Oct-2010  yamt sync with head
 1.1.80.2 11-Aug-2010  yamt sync with head.
 1.1.80.1 11-Mar-2010  yamt sync with head
 1.1.22.2 09-Sep-2006  rpaulo sync with head
 1.1.22.1 12-Feb-2006  rpaulo file i82802reg.h was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.1.16.2 21-Jun-2006  yamt sync with head.
 1.1.16.1 12-Feb-2006  yamt file i82802reg.h was added on branch yamt-lazymbuf on 2006-06-21 14:57:56 +0000
 1.1.10.2 22-Apr-2006  simonb Sync with head.
 1.1.10.1 12-Feb-2006  simonb file i82802reg.h was added on branch simonb-timecounters on 2006-04-22 11:38:09 +0000
 1.1.2.2 18-Feb-2006  yamt sync with head.
 1.1.2.1 12-Feb-2006  yamt file i82802reg.h was added on branch yamt-uio_vmspace on 2006-02-18 15:38:54 +0000
 1.2.4.1 05-Mar-2011  rmind sync with head
 1.2.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.62 18-Dec-2024  hans Add support for the Braswell PCU LPC to ichlpcib.
 1.61 09-May-2023  riastradh branches: 1.61.6;
ichlpcib(4): Use config_detach_children.

Delete a lot of unnecessary code with broken error branches involving
config_detach which have probably seldom if ever been exercised.

No substantive functional change intended. Low risk because
ichlpcib(4) is not a removable device, so you have to go out of your
way to exercise detach.
 1.60 09-May-2023  riastradh ichlpcib(4): KNF. No functional change intended.
 1.59 12-Apr-2023  riastradh ichsmb(4), tco(4): Add support for TCO on newer Intel chipsets.

TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.

Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).

XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
 1.58 22-Sep-2022  riastradh branches: 1.58.4;
ichlpcib(4), tco(4): Rename iot -> pmt, ioh -> pmh.

Makes it clearer that this is specifically about the power management
controller (PMC) registers relative to PMBASE.
 1.57 22-Sep-2022  riastradh ichlpcib(4), tco(4): Take `lpcib_' off various names.

For PMC-specific ones, change `lpcib_' to `pmc_'. These are in a
separate PCI device in newer chipsets.

For TCO-specific ones, which may live in different places, whether at
their own base address or as an offset from PMBASE, just leave it as
`tco_' or `tcotimer'.

No functional change intended.
 1.56 22-Sep-2022  riastradh tco(4): Rename lpcib_tco_attach_args -> tco_attach_args.

No longer hangs off LPC bus, newer devices hang it off SMBus.
 1.55 22-Sep-2022  riastradh tco(4): Change has_rcba bit into version number.

Will be useful for newer Intel platform controller hubs.

No functional change intended. Module ABI is unchanged, although older
modules will do something nonsensical when confronted with versions
above 1 -- that will require a revbump (but with any luck, it will make
life easier for versions above 2 once we do that).
 1.54 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.53 24-Apr-2021  thorpej branches: 1.53.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instances. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
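
The tag-value calling convention described in revision 1.53 can be illustrated with a small, self-contained sketch. This is not the kernel's implementation: the DEMO_* tags, struct demo_cfargs and demo_parse_cfargs() are hypothetical names invented here, loosely modeled on the CFARG_* names listed above, and the sketch only shows how a variadic list terminated by a sentinel can be walked.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags, loosely modeled on the CFARG_* names in the log. */
enum demo_cfarg {
	DEMO_CFARG_EOL = 0,	/* sentinel: end of the argument list */
	DEMO_CFARG_IATTR,	/* value: const char * (interface attribute) */
	DEMO_CFARG_LOCATORS	/* value: const int * (locators array) */
};

/* Collected arguments; unset fields stay NULL. */
struct demo_cfargs {
	const char *iattr;
	const int *locators;
};

/* Walk the tag-value pairs until the sentinel and collect the values. */
static struct demo_cfargs
demo_parse_cfargs(int first_tag, ...)
{
	struct demo_cfargs args = { NULL, NULL };
	va_list ap;
	int tag;

	va_start(ap, first_tag);
	for (tag = first_tag; tag != DEMO_CFARG_EOL; tag = va_arg(ap, int)) {
		switch (tag) {
		case DEMO_CFARG_IATTR:
			args.iattr = va_arg(ap, const char *);
			break;
		case DEMO_CFARG_LOCATORS:
			args.locators = va_arg(ap, const int *);
			break;
		default:
			/* An unknown tag means the caller's list is malformed. */
			assert(0 && "unknown tag");
		}
	}
	va_end(ap);
	return args;
}
```

A caller passes only the tags it needs and ends the list with the sentinel, for example demo_parse_cfargs(DEMO_CFARG_IATTR, "pcibus", DEMO_CFARG_EOL).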
 1.52 03-Jun-2018  maxv branches: 1.52.16;
Constify lpcib_devices[] so that it lands in .rodata (1584 bytes).
 1.51 06-Aug-2016  jakllsch branches: 1.51.14;
Disable gpio(4) attachment to ichlpcib(4) by default.

The GPIO lines on an ICH are usually connected to opaque platform-
defined functionality, and may be manipulated by the ACPI DSDT or other
mechanisms behind our backs. In one instance, it was found that this,
in combination with gpio_resume(), sabotaged repeated suspend/resume cycles.

GPIO functionality can be enabled by setting ichlpcib_gpio_disable to 0,
for instance with `gdb -write`.
 1.50 17-May-2015  msaitoh Add Core 5G (mobile) LPC support.
 1.49 03-May-2015  pgoyette Separate the watchdog code from the pcib code, and make the watchdog
a loadable module.
 1.48 20-Mar-2015  msaitoh Add Intel C61x and X99 devices.
 1.47 18-Mar-2015  msaitoh Add 9 Series support.
 1.46 13-Jan-2015  msaitoh As I wrote in the last commit, the PMBASE and GPIOBASE registers are not
compatible with the PCI spec and the map sizes are fixed at 128 bytes. The
pci_mapreg_submap() function has code to check the range of the BAR. The
PCI_MAPREG_IO_SIZE() macro returns less than 128 bytes on some machines,
which makes it impossible to use pci_mapreg_submap(). Use pci_conf_read() and
bus_space_map() directly. Observed and tested with my Thinkpad X61.
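
As a rough illustration of the approach in revision 1.46, the sketch below derives an I/O window from a raw register value and a fixed size instead of asking for the BAR's size. It is not the driver's code: the demo_* names are hypothetical, the mask is the usual PCI I/O BAR layout (low two bits are not address bits), and the 128-byte size is the figure given in the log message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit 0 flags I/O space, bits 31:2 hold the address. */
#define DEMO_IO_SPACE_BIT	0x00000001U
#define DEMO_IO_ADDR_MASK	0xfffffffcU
/* Fixed window size stated in the log message. */
#define DEMO_PM_SIZE		128U

struct demo_io_window {
	uint32_t base;	/* start of the I/O window */
	uint32_t size;	/* length in bytes */
};

/*
 * Build the window from the raw register value.  The size is never
 * probed; it is always the fixed constant.
 */
static struct demo_io_window
demo_pmbase_window(uint32_t reg)
{
	struct demo_io_window w;

	/* The register is expected to describe I/O space. */
	assert((reg & DEMO_IO_SPACE_BIT) != 0);
	w.base = reg & DEMO_IO_ADDR_MASK;
	w.size = DEMO_PM_SIZE;
	return w;
}
```

In the real driver the register value would come from pci_conf_read() and the window would be handed to bus_space_map(), as the log message says; those calls are left out here so the sketch compiles on its own.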
 1.45 26-Dec-2014  msaitoh Fix a bug where ichlpcib(4) maps the I/O area incorrectly and then fails to
attach gpio. It might also fix the ACPI-related problem described in PR#48960:
- The LPCIB_PCI_PMBASE and LPCIB_PCI_GPIO registers resemble PCI BARs but are
not completely compatible with them. That is OK because the registers'
addresses are outside BAR0-BAR5 (0x10-0x24) and are located in the
device-dependent header. The PMBASE and GPIO registers define the base
address and the type but do not describe the size, which is fixed at
128 bytes. So use pci_mapreg_submap().
- Make pci_mapreg_submap() extern again.
- Fix the calculation of the map size in pci_mapreg_submap().
 1.44 15-Dec-2014  msaitoh Add DH89xxC[CL] LPC devices.
 1.43 04-Jan-2014  msaitoh branches: 1.43.4; 1.43.6;
Add Z68 LPC.
 1.42 04-Jan-2014  msaitoh Temporarily disable the C2000 PCU because the watchdog behaves a little strangely.
 1.41 03-Jan-2014  msaitoh Add C2000 Platform Controller Unit(PCU).
 1.40 17-Sep-2013  jakllsch Use '\n' at the end of all aprint_error_dev() format strings.
 1.39 04-Jun-2013  msaitoh branches: 1.39.2;
Add Intel 8 Series / C220 Series LPC devices.
 1.38 12-Jan-2013  riastradh Match the C600's ichlpcib.
 1.37 19-Dec-2012  msaitoh Add Intel 7 series' LPC devices.
 1.36 06-Dec-2012  msaitoh Add support for 3400 series, 5 series, C216, 82801GH, 82801E and 6300ESB.
 1.35 06-Dec-2012  msaitoh No functional change:
- Remove trailing white space.
- Sort entries.
- Remove duplicated entries.
 1.34 17-Nov-2011  riz branches: 1.34.6; 1.34.10;
Also match ICH8, ICH9 and ICH10 devices. Tested on ICH10.
 1.33 14-Aug-2011  msaitoh branches: 1.33.2;
Add some LPC entries for Intel 6 series and C20x.
 1.32 01-Jul-2011  dyoung #include <sys/bus.h> instead of <machine/bus.h>.
 1.31 15-Jun-2011  jruoho Factor out hpet(4) from ichlpcib(4).
 1.30 06-Jun-2011  msaitoh Rename to use PCI_PRODUCT_INTEL_82801DBM_LPC
 1.29 04-Apr-2011  dyoung branches: 1.29.2;
Neither pci_dma64_available(), pci_probe_device(), pci_mapreg_map(9),
pci_find_rom(), pci_intr_map(9), pci_enumerate_bus(), nor the match
predicate passed to pciide_compat_intr_establish() should ever modify
their pci_attach_args argument, so make their pci_attach_args arguments
const and deal with the fallout throughout the kernel.

For the most part, these changes add a 'const' where there was no
'const' before, however, some drivers and MD code used to modify
pci_attach_args. Now those drivers either copy their pci_attach_args
and modify the copy, or refrain from modifying pci_attach_args:

Xen: according to Manuel Bouyer, writing to pci_attach_args in
pci_intr_map() was a leftover from Xen 2. Probably a bug. I
stopped writing it. I have not tested this change.

siside(4): sis_hostbr_match() needlessly wrote to pci_attach_args.
Probably a bug. I use a temporary variable. I have not tested this
change.

slide(4): sl82c105_chip_map() overwrote the caller's pci_attach_args.
Probably a bug. Use a local pci_attach_args. I have not tested
this change.

viaide(4): via_sata_chip_map() and via_sata_chip_map_new() overwrote the
caller's pci_attach_args. Probably a bug. Make a local copy of the
caller's pci_attach_args and modify the copy. I have not tested
this change.

While I'm here, make pci_mapreg_submap() static.

With these changes in place, I have tested the compilation of these
kernels:

alpha GENERIC
amd64 GENERIC XEN3_DOM0
arc GENERIC
atari HADES MILAN-PCIIDE
bebox GENERIC
cats GENERIC
cobalt GENERIC
evbarm-eb NSLU2
evbarm-el ADI_BRH ARMADILLO9 CP3100 GEMINI GEMINI_MASTER GEMINI_SLAVE GUMSTIX
HDL_G IMX31LITE INTEGRATOR IQ31244 IQ80310 IQ80321 IXDP425 IXM1200
KUROBOX_PRO LUBBOCK MARVELL_NAS NAPPI SHEEVAPLUG SMDK2800 TEAMASA_NPWR
TEAMASA_NPWR_FC TS7200 TWINTAIL ZAO425
evbmips-el AP30 DBAU1500 DBAU1550 MALTA MERAKI MTX-1 OMSAL400 RB153 WGT624V3
evbmips64-el XLSATX
evbppc EV64260 MPC8536DS MPC8548CDS OPENBLOCKS200 OPENBLOCKS266
OPENBLOCKS266_OPT P2020RDB PMPPC RB800 WALNUT
hp700 GENERIC
i386 ALL XEN3_DOM0 XEN3_DOMU
ibmnws GENERIC
macppc GENERIC
mvmeppc GENERIC
netwinder GENERIC
ofppc GENERIC
prep GENERIC
sandpoint GENERIC
sgimips GENERIC32_IP2x
sparc GENERIC_SUN4U KRUPS
sparc64 GENERIC

As of Sun Apr 3 15:26:26 CDT 2011, I could not compile these kernels
with or without my patches in place:

### evbmips-el GDIUM

nbmake: nbmake: don't know how to make /home/dyoung/pristine-nbsd/src/sys/arch/mips/mips/softintr.c. Stop

### evbarm-el MPCSA_GENERIC
src/sys/arch/evbarm/conf/MPCSA_GENERIC:318: ds1672rtc*: unknown device `ds1672rtc'

### ia64 GENERIC

/tmp/genassym.28085/assym.c: In function 'f111':
/tmp/genassym.28085/assym.c:67: error: invalid application of 'sizeof' to incomplete type 'struct pcb'
/tmp/genassym.28085/assym.c:76: error: dereferencing pointer to incomplete type

### sgimips GENERIC32_IP3x

crmfb.o: In function `crmfb_attach':
crmfb.c:(.text+0x2304): undefined reference to `ddc_read_edid'
crmfb.c:(.text+0x2304): relocation truncated to fit: R_MIPS_26 against `ddc_read_edid'
crmfb.c:(.text+0x234c): undefined reference to `edid_parse'
crmfb.c:(.text+0x234c): relocation truncated to fit: R_MIPS_26 against `edid_parse'
crmfb.c:(.text+0x2354): undefined reference to `edid_print'
crmfb.c:(.text+0x2354): relocation truncated to fit: R_MIPS_26 against `edid_print'
 1.28 06-Sep-2010  christos branches: 1.28.2;
make it compile.
 1.27 17-Aug-2010  jakllsch Match ichlpcib(4) on ICH0 (82801AB_LPC).
 1.26 23-Jul-2010  jakllsch Finish cleaning up pchb from recent change.
Use fewer magic numbers in ichlpcib.
Slightly improve style conformance.
Update paths in cpp re-inclusion guards.
 1.25 23-Jul-2010  jakllsch Almost entirely rework Intel Firmware Hub random number generator support.

This introduces fwhrng(4) which attaches via ichlpcib(4), replacing
the rnd(4) support in pchb(4).
 1.24 24-Feb-2010  dyoung branches: 1.24.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
 1.23 08-Jan-2010  dyoung branches: 1.23.2;
Expand PMF_FN_* macros.
 1.22 08-Jan-2010  dyoung Move all copies of ifattr_match() to sys/kern/subr_autoconf.c.
 1.21 27-Sep-2009  jakllsch Make this build without gpio(4).
 1.20 27-Sep-2009  jakllsch gpio(4) support for Intel ICH southbridges.

Tested on Intel SS4200-E (ICH7), and Acorp 6A815EPD (ICH2) motherboards,
on amd64 and i386 ports respectively.

It should be noted that the majority of boards with ICH chips do not
expose the GPIO pins for off-board use. For instance, aside from the
three exposed-on-a-header pins on the 6A815EPD, another pin is also
used to control write protect on the FWH. The SS4200 exposes the GPIO
on a header that connects to the 10 LEDs on the front panel, as well
as a tact switch on the back panel.
 1.19 18-Aug-2009  dyoung Let us detach ichlpcib(4) and its children.

XXX More testing is needed. I've tested this on a Dell Dimension 3000,
XXX but that system does not attach every possible device that I try to
XXX detach with this code:

ichlpcib0 at pci0 dev 31 function 0
ichlpcib0: vendor 0x8086 product 0x24d0 (rev. 0x02)
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
ichlpcib0: 24-bit timer
ichlpcib0: TCO (watchdog) timer configured.
isa0 at ichlpcib0
 1.18 11-Aug-2009  bouyer Fix watchdog code:
- the timer bound constants are in ticks, so convert the period to ticks before
checking it against the bounds
- for ICH5 or older, fix code that would have always written a 0 period
to the register.
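
The first point of revision 1.18 (convert to ticks, then compare against tick-based bounds) can be sketched as follows. All names and constants are hypothetical and chosen only for illustration; in particular the 0.6 second tick length and the bounds 4 and 0x3f are assumptions, not values taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed for illustration: one tick lasts 0.6 s, so ticks = s * 10 / 6. */
#define DEMO_TICK_NUM	10U
#define DEMO_TICK_DEN	6U
/* Assumed bounds, expressed in ticks (not seconds). */
#define DEMO_MIN_TICK	4U
#define DEMO_MAX_TICK	0x3fU

static unsigned int
demo_seconds_to_ticks(unsigned int seconds)
{
	return seconds * DEMO_TICK_NUM / DEMO_TICK_DEN;
}

/*
 * Convert first, then check: the bounds are in ticks, so comparing
 * the raw period in seconds against them would accept bad values.
 */
static bool
demo_period_ok(unsigned int seconds, unsigned int *ticks_out)
{
	unsigned int ticks = demo_seconds_to_ticks(seconds);

	assert(ticks_out != NULL);
	if (ticks < DEMO_MIN_TICK || ticks > DEMO_MAX_TICK)
		return false;
	*ticks_out = ticks;
	return true;
}
```

With these assumed numbers, a 60 second period is 100 ticks and is rejected, although the raw value 60 would have slipped under a bound of 0x3f if it had been compared before conversion.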
 1.17 29-Apr-2009  njoly Attach 82801IEM LPC Interface Bridge too.
 1.16 04-Apr-2009  joerg Restore SpeedStep settings on shutdown. Some BIOSes don't like it, if
SpeedStep is enabled and powerdown fails otherwise. Fixes PR kern/40487.
 1.15 03-Mar-2009  mrg don't enable speedstep on systems with intel 82855GM host bridges.
 1.14 13-Oct-2008  joerg branches: 1.14.2; 1.14.4; 1.14.8;
Intel Tempest can use ichlpcib as well.
 1.13 14-Aug-2008  yamt revert some parts of the following commit.
(given that it reverted other developers' changes saying
"misc/cosmetic changes", i assume that it was unintentional.)
this makes a watchdog on my box (8086:24d0) work again.
----------------------------
revision 1.1
date: 2007/08/26 16:49:47; author: xtraeme; state: Exp;
branches: 1.1.2;
Some changes for the ichlpcib driver:

- Moved to x86/pci, so that EM64T systems running NetBSD/amd64 can use it.
- Added support for the TCO on ICH6 or newer chipsets, adapted from
FreeBSD.
- Added timecounter support for the power management timer, adapted from
OpenBSD.
- Plus some misc/cosmetic changes.

Thanks to yukonbob on irc@freenode for testing the TCO part on ICH4-M.
Tested by me with ICH7 too.
 1.12 20-Jul-2008  martin Make struct pcib_softc explicit in our softc.
 1.11 28-Apr-2008  martin branches: 1.11.2; 1.11.4; 1.11.6;
Remove clause 3 and 4 from TNF licenses
 1.10 16-Apr-2008  cegger branches: 1.10.2; 1.10.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.9 21-Mar-2008  xtraeme Split device_t/softc for ichlpcib(4) and all hpet consumers, plus
other related cosmetic changes.
 1.8 29-Feb-2008  dyoung Use PMF_FN_ARGS, PMF_FN_PROTO.
 1.7 15-Jan-2008  drochner branches: 1.7.2; 1.7.6;
fix some unaligned PCI config space accesses in suspend/resume functions
 1.6 09-Dec-2007  jmcneill branches: 1.6.2;
Merge jmcneill-pm branch.
 1.5 23-Nov-2007  xtraeme branches: 1.5.2; 1.5.4;
tcotimer_setmode: convert seconds to ticks after the value has been
checked against the limits. We can now use the max timeout value on ICH6
or newer (i.e. 613 seconds, not half of it as previously).
 1.4 03-Sep-2007  xtraeme branches: 1.4.2; 1.4.4; 1.4.6; 1.4.10; 1.4.14;
Improve some comments.
 1.3 01-Sep-2007  ober Attach to ICH8M LPC.
Tested watchdog and it works.
ok xtraeme@
 1.2 29-Aug-2007  xtraeme Attach to the ICH9 LPC Interface Bridges. The datasheet doesn't mention
any difference in the TCO part (compared to ICH[678]).
 1.1 26-Aug-2007  xtraeme branches: 1.1.2;
Some changes for the ichlpcib driver:

- Moved to x86/pci, so that EM64T systems running NetBSD/amd64 can use it.
- Added support for the TCO on ICH6 or newer chipsets, adapted from
FreeBSD.
- Added timecounter support for the power management timer, adapted from
OpenBSD.
- Plus some misc/cosmetic changes.

Thanks to yukonbob on irc@freenode for testing the TCO part on ICH4-M.
Tested by me with ICH7 too.
 1.1.2.3 23-Mar-2008  matt sync with HEAD
 1.1.2.2 09-Jan-2008  matt sync with HEAD
 1.1.2.1 06-Nov-2007  matt sync with HEAD
 1.4.14.3 18-Feb-2008  mjf Sync with HEAD.
 1.4.14.2 27-Dec-2007  mjf Sync with HEAD.
 1.4.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.4.10.3 03-Dec-2007  ad Sync with HEAD.
 1.4.10.2 09-Oct-2007  ad Sync with head.
 1.4.10.1 03-Sep-2007  ad file ichlpcib.c was added on branch vmlocking on 2007-10-09 13:38:43 +0000
 1.4.6.17 08-Dec-2007  jmcneill Rename pnp(9) -> pmf(9), as requested by many.
 1.4.6.16 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.4.6.15 12-Nov-2007  joerg GC unused softc field.
 1.4.6.14 06-Nov-2007  joerg Refactor PNP API:
- Make suspend/resume directly a device functionality. It consists of
three layers (class logic, device logic, bus logic), all of them being
optional. This replaces D0/D3 transitions.
- device_is_active returns true if the device was not disabled and was
not suspended (even partially), device_is_enabled returns true if the
device was enabled.
- Change pnp_global_transition into pnp_system_suspend and
pnp_system_resume. Before running any suspend/resume handlers, check
that all currently attached devices support power management and bail
out otherwise. The latter is not done for the shutdown/panic case.
- Make the former bus-specific generic network handlers a class handler.
- Make PNP messages like volume up/down/toggle PNP events. Each device
can register which events it is interested in and whether the handler
should be global or not.
- Introduce device_active API for devices to mark themselves in use from
either the system or the device. Use this to implement the idle handling
for audio and input devices. This is intended to replace most ad-hoc
watchdogs as well.
- Fix some situations in which audio resume would lose mixer settings.
- Make USB host controllers better deal with suspend in the light of
shared interrupts.
- Flush filesystem cache on suspend.
- Flush disk caches on suspend. Put ATA disks into standby on suspend as
well.
- Adapt drivers to use the new PNP API.
- Fix a critical bug in the generic cardbus layer that made D0->D3
break.
- Fix ral(4) to set if_stop.
- Convert cbb(4) to the new PNP API.
- Apply the PCI Express SCI fix on resume again.
 1.4.6.13 01-Oct-2007  joerg Extend device API by device_power_private and device_power_set_private.
The latter is a temporary means until the pnp_register API itself is
overhauled. These functions allow a generic power handler to store its
state independent of the driver.

Use this and revamp the PCI power handling. Pretty much all PCI devices
had power handlers that did the same thing, generalize this in
pci_generic_power_register/deregister and the handler. This interface
offers callbacks for the drivers to save and restore state on
transitions. After a long discussion with jmcneill@ it was considered
to be powerful enough until evidence is shown that devices can handle
D1/D2 with less code and higher speed than without the full
save/restore. The generic code is carefully written to handle devices
without PCI-PM support and ensure that the correct registers are written
to when D3 loses all state.

Reimplement the generic PCI network device handling on
top of PCI generic power handling.

Introduce pci_disable_retry as used and implemented locally at least by
ath(4) and iwi(4). Use it in these drivers to restore behaviour from
before the introduction of generic PCI network handling.

Convert all PCI drivers that were using pnp_register to the new
framework. The only exception is vga(4) as it is commonly used as
console device. Add a note therein that this should be fixed later.
 1.4.6.12 05-Sep-2007  cube Finish previous commit.
 1.4.6.11 05-Sep-2007  cube Avoid any future confusion by renaming the (unused) first argument of
lpci_hpet_match from "self" to "parent". Sprinkle a few device_t while
there.
 1.4.6.10 05-Sep-2007  joerg Try to map the HPET window in the match function to deal with the
possibly ACPI-attached HPET earlier and more cleanly.
 1.4.6.9 05-Sep-2007  joerg Move variables into the branches where they are used. Make status
32bit wide to match the actual register operations.
 1.4.6.8 05-Sep-2007  joerg Correctly attach HPET on ichlpcib. Patch and hints how to do this
from cube@
 1.4.6.7 05-Sep-2007  jmcneill XXX because of pcibattach, we need to keep our softcs in sync with pcib(4)
 1.4.6.6 05-Sep-2007  joerg Push the mapping of the HPET register window into the attach function.
This should prevent DIAGNOSTIC from complaining when hpet is attached
via ACPI.
 1.4.6.5 04-Sep-2007  joerg Don't use a global variable to decide whether this is an ICH6+,
use a variable in the softc to determine whether the RCBA is supported.
Add generic HPET support for ICH5 and ICH6+.

This is not (yet) enabled by default, until someone adds the code to
not use the direct attachment if hpet was configured via ACPI.
 1.4.6.4 04-Sep-2007  joerg Merge back power management changes from arch/i386/pci/ichlpcib.c.
 1.4.6.3 04-Sep-2007  joerg Explicitly remember pci_attach_args and drop the corresponding argument
processing.
 1.4.6.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.4.6.1 03-Sep-2007  jmcneill file ichlpcib.c was added on branch jmcneill-pm on 2007-09-03 16:47:46 +0000
 1.4.4.6 24-Mar-2008  yamt sync with head.
 1.4.4.5 17-Mar-2008  yamt sync with head.
 1.4.4.4 21-Jan-2008  yamt sync with head
 1.4.4.3 07-Dec-2007  yamt sync with head
 1.4.4.2 03-Sep-2007  yamt sync with head.
 1.4.4.1 03-Sep-2007  yamt file ichlpcib.c was added on branch yamt-lazymbuf on 2007-09-03 14:31:22 +0000
 1.4.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.4.2.1 03-Sep-2007  skrll file ichlpcib.c was added on branch nick-csl-alignment on 2007-09-03 10:19:51 +0000
 1.5.4.1 11-Dec-2007  yamt sync with head.
 1.5.2.1 26-Dec-2007  ad Sync with head.
 1.6.2.1 19-Jan-2008  bouyer Sync with HEAD
 1.7.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.7.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.7.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.7.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.7.2.1 24-Mar-2008  keiichi sync with head.
 1.10.4.6 09-Oct-2010  yamt sync with head
 1.10.4.5 11-Aug-2010  yamt sync with head.
 1.10.4.4 11-Mar-2010  yamt sync with head
 1.10.4.3 19-Aug-2009  yamt sync with head.
 1.10.4.2 04-May-2009  yamt sync with head.
 1.10.4.1 16-May-2008  yamt sync with head.
 1.10.2.1 18-May-2008  yamt sync with head.
 1.11.6.1 19-Oct-2008  haad Sync with HEAD.
 1.11.4.1 28-Jul-2008  simonb Sync with head.
 1.11.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.14.8.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.14.8.4 02-May-2011  jym Sync with head.
 1.14.8.3 24-Oct-2010  jym Sync with HEAD
 1.14.8.2 01-Nov-2009  jym Sync with HEAD.
 1.14.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.14.4.4 26-Jan-2015  martin Pull up following revision(s) (requested by msaitoh in ticket #1944):
sys/arch/x86/pci/ichlpcib.c: revision 1.46

The PMBASE and GPIOBASE registers are not compatible with the PCI spec
and the map sizes are fixed to 128bytes. The pci_mapreg_submap()
function has a code to check the range of the BAR. The
PCI_MAPREG_IO_SIZE() macro returns lower than 128bytes on some
machines. This makes it impossible to use pci_mapreg_submap(). Use
pci_conf_read() and bus_space_map() directly.
 1.14.4.3 23-Jan-2015  martin Pull up the following changes, requested by msaitoh in ticket #1942:

sys/arch/x86/pci/ichlpcib.c 1.40, 1.45 via patch
sys/dev/ic/i82801lpcreg.h 1.12
sys/dev/pci/pci_map.c 1.32 via patch

- Fix a bug where ichlpcib(4) maps the I/O area incorrectly. It might also
fix the ACPI-related problem described in PR#48960:
- The LPCIB_PCI_PMBASE and LPCIB_PCI_GPIO register are alike PCI BAR
but not completely compatible with it. It's ok because the
registers' addresses are out of BAR0-BAR5(0x10-0x24) and are
located in the device-dependent header. The PMBASE and GPIO
registers define the base address and the type but not describe
the size. The size is fixed to 128bytes. So use
pci_mapreg_submap().
- Fix the calculation of the map size in pci_mapreg_submap().
- Use '\n' at the end of aprint_error_dev() format strings.
 1.14.4.2 16-Aug-2009  snj Pull up following revision(s) (requested by bouyer in ticket #912):
sys/arch/x86/pci/ichlpcib.c: revision 1.18
Fix watchdog code:
- the timer bound constants are in ticks, so convert the period to ticks before
checking it against the bounds
- for ICH5 or older, fix code that would have always written a 0 period
to the register.
 1.14.4.1 07-Apr-2009  snj branches: 1.14.4.1.2; 1.14.4.1.4;
Pull up following revision(s) (requested by joerg in ticket #669):
sys/arch/x86/pci/ichlpcib.c: revision 1.16
Restore SpeedStep settings on shutdown. Some BIOSes don't like it, if
SpeedStep is enabled and powerdown fails otherwise. Fixes PR kern/40487.
 1.14.4.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.14.4.1.2.1 16-Aug-2009  snj Pull up following revision(s) (requested by bouyer in ticket #912):
sys/arch/x86/pci/ichlpcib.c: revision 1.18
Fix watchdog code:
- the timer bound constants are in ticks, so convert the period to ticks before
checking it against the bounds
- for ICH5 or older, fix code that would have always written a 0 period
to the register.
 1.14.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.14.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.23.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.23.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.23.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.24.2.3 12-Jun-2011  rmind sync with head
 1.24.2.2 21-Apr-2011  rmind sync with head
 1.24.2.1 05-Mar-2011  rmind sync with head
 1.28.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.29.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.33.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.33.2.3 23-Jan-2013  yamt sync with head
 1.33.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.33.2.1 17-Apr-2012  yamt sync with head
 1.34.10.4 03-Dec-2017  jdolecek update from HEAD
 1.34.10.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.34.10.2 23-Jun-2013  tls resync from head
 1.34.10.1 25-Feb-2013  tls resync with head
 1.34.6.2 26-Jan-2015  martin Pull up following revision(s) (requested by msaitoh in ticket #1239):
sys/arch/x86/pci/ichlpcib.c: revision 1.46
The PMBASE and GPIOBASE registers are not
compatible with the PCI spec and the map sizes are fixed to 128bytes. The
pci_mapreg_submap() function has a code to check the range of the BAR. The
PCI_MAPREG_IO_SIZE() macro returns lower than 128bytes on some machines.
This makes it impossible to use pci_mapreg_submap(). Use pci_conf_read() and
bus_space_map() directly.
 1.34.6.1 16-Jan-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #1229):
sys/arch/x86/pci/ichlpcib.c: revision 1.40, 1.45
sys/dev/pci/pcivar.h: revision 1.101
sys/dev/pci/pci_map.c: revision 1.32
sys/dev/ic/i82801lpcreg.h: revision 1.12
Use '\n' at the end of all aprint_error_dev() format strings.
--
Fix a bug that ichlpcib(4) maps I/O area incorrectly and then fails to attach
gpio. It might also fix the ACPI-related problem described in PR#48960:
- The LPCIB_PCI_PMBASE and LPCIB_PCI_GPIO register are alike PCI BAR but not
completely compatible with it. It's ok because the registers' addresses are
out of BAR0-BAR5(0x10-0x24) and are located in the device-dependent header.
The PMBASE and GPIO registers define the base address and the type but not
describe the size. The size is fixed to 128bytes. So use
pci_mapreg_submap().
- Make pci_mapreg_submap() extern again.
- Fix the calculation of the map size in pci_mapreg_submap().
 1.39.2.1 18-May-2014  rmind sync with head
 1.43.6.3 05-Oct-2016  skrll Sync with HEAD
 1.43.6.2 06-Jun-2015  skrll Sync with HEAD
 1.43.6.1 06-Apr-2015  skrll Sync with HEAD
 1.43.4.6 09-Dec-2016  snj Pull up following revision(s) (requested by msaitoh in ticket #1295):
sys/arch/x86/pci/ichlpcib.c: revision 1.50
Add Core 5G (mobile) LPC support.
 1.43.4.5 28-Aug-2016  snj branches: 1.43.4.5.2;
Pull up following revision(s) (requested by maya in ticket #1213):
sys/arch/x86/pci/ichlpcib.c: revision 1.51
Disable gpio(4) attachment to ichlpcib(4) by default.
The GPIO lines on an ICH are usually connected to opaque platform-
defined functionality, and may be manipulated by the ACPI DSDT or other
mechanisms behind our backs. In one instance, it was found that this,
in combination with gpio_resume(), sabotaged repeated suspend/resume cycles.
GPIO functionality can be enabled by setting ichlpcib_gpio_disable to 0,
for instance with `gdb -write`.
 1.43.4.4 30-Apr-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #725):
sys/arch/x86/pci/ichlpcib.c: revisions 1.47, 1.48
sys/dev/pci/ichsmb.c: revisions 1.39, 1.40, 1.41
sys/dev/pci/pucdata.c: revision 1.94
Add 9 Series support.
--
Add Intel C61x and X99 devices.
--
attach Mobile 5th Gen. Core SMBus
 1.43.4.3 17-Feb-2015  martin Pull up following revision(s) (requested by msaitoh in ticket #528):
sys/dev/pci/ichsmb.c: revision 1.38
sys/dev/pci/pcidevs: revision 1.1206
sys/dev/pci/pcidevs: revision 1.1207
sys/arch/x86/pci/ichlpcib.c: revision 1.44
Change Intel 0x0434 entry:
- Rename DH89XX_QA to DH89XXCC_IQIA
- Modify the description to DH89xxCC PCIe Endpoint and QuickAssist
(include typo fix)
- Rename DH89xxCC's names from DH89XX_ to DH89XXCC_.
- Add some DH89xxCC's devices.
- Add DH89XXCL's devices.
- Rename PCI_PRODUCT_INTEL_DH89XX_SMB to PCI_PRODUCT_INTEL_DH89XXCC_SMB
- Add PCI_PRODUCT_INTEL_DH89XXCL_SMB
Add DH89xxC[CL] LPC devices.
 1.43.4.2 26-Jan-2015  martin Pull up following revision(s) (requested by msaitoh in ticket #455):
sys/arch/x86/pci/ichlpcib.c: revision 1.46

The PMBASE and GPIOBASE registers are not compatible with the PCI spec
and the map sizes are fixed to 128bytes. The pci_mapreg_submap()
function has a code to check the range of the BAR. The
PCI_MAPREG_IO_SIZE() macro returns lower than 128bytes on some
machines. This makes it impossible to use pci_mapreg_submap(). Use
pci_conf_read() and bus_space_map() directly.
 1.43.4.1 08-Jan-2015  martin Pull up following revision(s) (requested by msaitoh in ticket #394):
sys/dev/pci/pcivar.h: revision 1.101
sys/dev/pci/pci_map.c: revision 1.32
sys/dev/ic/i82801lpcreg.h: revision 1.12
sys/arch/x86/pci/ichlpcib.c: revision 1.45
Fix a bug that ichlpcib(4) maps I/O area incorrectly and then fails to attach
gpio. It might also fix ACPI related problem described in PR#48960:
- The LPCIB_PCI_PMBASE and LPCIB_PCI_GPIO register are alike PCI BAR but not
completely compatible with it. It's ok because the registers' addresses are
out of BAR0-BAR5(0x10-0x24) and are located in the device-dependent header.
The PMBASE and GPIO registers define the base address and the type but not
describe the size. The size is fixed to 128bytes. So use
pci_mapreg_submap().
- Make pci_mapreg_submap() extern again.
- Fix the calculation of the map size in pci_mapreg_submap().
 1.43.4.5.2.1 18-Jan-2017  skrll Sync with netbsd-5
 1.51.14.1 25-Jun-2018  pgoyette Sync with HEAD
 1.52.16.1 02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.53.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.58.4.1 01-Aug-2023  martin Pull up following revision(s) (requested by riastradh in ticket #282):

sys/dev/pci/ichsmb.c: revision 1.82
sys/arch/amd64/conf/GENERIC: revision 1.602
sys/arch/x86/pci/tco.c: revision 1.10
sys/arch/x86/pci/tco.h: revision 1.5
sys/arch/x86/pci/ichlpcib.c: revision 1.59
sys/dev/ic/i82801lpcreg.h: revision 1.17
sys/arch/x86/pci/files.pci: revision 1.27
sys/dev/pci/files.pci: revision 1.446

ichsmb(4), tco(4): Add support for TCO on newer Intel chipsets.

TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.
Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).

XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
 1.61.6.1 02-Aug-2025  perseant Sync with HEAD
 1.3 01-Jul-2011  dyoung branches: 1.3.2;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.2 15-Jun-2011  jruoho branches: 1.2.2;
Modularize hpet(4). Works nicely with the multiple bus locations.
 1.1 15-Jun-2011  jruoho Factor out hpet(4) from ichlpcib(4).
 1.2.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.2.2.1 15-Jun-2011  cherry file ichlpcib_hpet.c was added on branch cherry-xenmp on 2011-06-23 14:19:48 +0000
 1.3.2.2 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.3.2.1 01-Jul-2011  jym file ichlpcib_hpet.c was added on branch jym-xensuspend on 2011-08-27 15:59:49 +0000
 1.61 14-Oct-2020  ryo vmx(4) should be MI. moved to sys/dev/pci from sys/arch/x86/pci
 1.60 27-Apr-2020  yamaguchi Fix the incorrect logic for making the number of vmx(4) TX/RX queues
a power of two

reviewed by nonaka@n.o.
 1.59 24-Mar-2020  knakahara Fix vmx(4) failing to link up at boot time. Reviewed by msaitoh@n.o, thanks.

vmx(4) could call if_link_state_change(ifp, LINK_STATE_UP) from vmxnet3_init()
before ifp->if_link_cansched was set, because dp->dom_if_up() (in6_if_up() for
INET6) could call ifp->if_init(). And then, workqueue_enqueue() was not called
at that time.
As a result, the last LQ_ITEM was stuck at LINK_STATE_UP, so
if_link_state_change_work_schedule() was never called until
if_link_state_change(ifp, LINK_STATE_DOWN) was called.

To fix this issue, vmx(4) avoids calling if_link_state_change() before
ifp->if_link_cansched is set.
 1.58 15-Mar-2020  thorpej Define and implement a locking protocol for the ifmedia / mii layers:
- MP-safe drivers provide a mutex to ifmedia that is used to serialize
access to media-related structures / hardware registers. Converted
drivers use the new ifmedia_init_with_lock() function for this. The
new name is provided to ease the transition.
- Un-converted drivers continue to call ifmedia_init(), which will supply
a compatibility lock to be used instead. Several media-related entry
points must be aware of this compatibility lock, and are able to acquire
it recursively a limited number of times, if needed. This is a SPIN
mutex with priority IPL_NET.
- This same lock is used to serialize access to PHY registers and other
MII-related data structures.

The PHY drivers are modified to acquire and release the lock, as needed,
and assert the lock is held as a diagnostic aid.

The "usbnet" framework has had an overhaul of its internal locking
protocols to fit in with the media / mii changes, and the drivers adapted.

USB wifi drivers have been changed to provide their own adaptive mutex
to the ifmedia layer via a new ieee80211_media_init_with_lock() function.
This is required because the USB drivers need an adaptive mutex.

Besides "usbnet", a few other drivers are converted: vmx, wm, ixgbe / ixv.

mcx also now calls ifmedia_init_with_lock() because it needs to also use
an adaptive mutex. The mcx driver still needs to be fully converted to
NET_MPSAFE.
 1.57 02-Feb-2020  thorpej - Adopt <net/if_stats.h>.
- Use ifmedia_fini().
 1.56 29-Jan-2020  knakahara Fix missing callout_destroy(). Pointed out by yamaguchi@n.o.
 1.55 29-Jan-2020  knakahara Fix typo in evcnt description. Pointed out by yamaguchi@n.o.
 1.54 06-Jan-2020  msaitoh branches: 1.54.2;
Protect ec_multicnt.
 1.53 24-Dec-2019  knakahara Fix missing splnet() for ether_ioctl() caused by if_vmx.c:r1.32.

pointed out by nonaka@n.o, thanks.
 1.52 27-Nov-2019  maxv localify
 1.51 10-Oct-2019  knakahara Fix kassert failure in vmxnet3_transmit(). Pointed out by ryo@n.o, thanks.
 1.50 30-Sep-2019  knakahara Fix typo in vmxnet3_legacy_intr().

That caused sysctl hw.vmx*.{rx,tx} to take effect inversely when vmx(4) uses
INTx or MSI.
 1.49 30-Aug-2019  knakahara vmxnet3_softc.vmx_stats should not count globally. pointed out by hikaru@n.o

divide vmxnet3_softc.vmx_stats to each vmxnet3_txqueue and vmxnet3_rxqueue,
furthermore make them evcnt.
 1.48 19-Aug-2019  knakahara add vmx(4) basic statistics counters.

Sorry, I have forgotten this TODO in r1.40 commit message.
 1.47 19-Aug-2019  knakahara fix panic when vmx(4) is detached.
 1.46 01-Aug-2019  knakahara vmx(4) uses interrupt distribution for each queue like ixg(4).
 1.45 30-Jul-2019  knakahara branches: 1.45.2;
vmx(4) can select workqueue for packet processing like ixg(4).
 1.44 29-Jul-2019  knakahara make vmx(4)'s *_process_limit tunable.
 1.43 29-Jul-2019  knakahara vmx(4) uses deferred interrupt handling like ixg(4).
 1.42 29-Jul-2019  knakahara Fix missing NULL check after softint_establish().
 1.41 29-Jul-2019  knakahara Join Tx interrupt handler and Rx interrupt handler of vmx(4).

That can reduce interrupt resources.
 1.40 24-Jul-2019  knakahara vmx(4) support if_transmit and Tx multiqueue (2/2).

Fix Tx interrupt handler. I tested on ESXi 5.5.

TODO: add statistics counters
 1.39 24-Jul-2019  knakahara refactor: unify vmxnet3_start_locked and vmxnet_transmit_locked
 1.38 24-Jul-2019  knakahara vmx(4) support if_transmit and Tx multiqueue (1/2)

Implemented Tx processing only. Fix Tx interrupt handler later.
 1.37 23-Jul-2019  knakahara vmx(4) can be detached now.
 1.36 22-Jul-2019  knakahara remove unnecessary NULL check after kmem_zalloc(KM_SLEEP)
 1.35 19-Jul-2019  knakahara vmx(4) can be set IFEF_MPSAFE now.

I tested bidirectional forwarding with some ioctls.
 1.34 19-Jul-2019  knakahara Store IFF_ALLMULTI in ec->ec_flags instead of ifp->if_flags.

See such as if_wm.c:1.636.
 1.33 19-Jul-2019  knakahara vmx(4) enable jumbo frame.

I tested 1600 mtu to/from Linux vmxnet3.
 1.32 16-Jul-2019  knakahara Fix vmx(4) MTU setting.

Advised by hikaru@n.o and msaitoh@n.o, thanks.
 1.31 16-Jul-2019  knakahara Eliminate IFF_RUNNING checking code from vmxnet3_init_locked().

Advised by hikaru@n.o, thanks.
 1.30 09-Jul-2019  msaitoh Don't automatically set ec_capenable's ETHERCAP_VLAN_HWTAGGING bit in
vlan_config() to make it user-controllable. Instead, set the bit in
xxx_attach().
 1.29 29-May-2019  msaitoh Even if we don't use MII(4), use the common path of SIOC[GS]IFMEDIA in
sys/net/if_ethersubr.c if we can.
- Add ec_ifmedia into struct ethercom.
- ec_mii in struct ethercom is kept and used as it is. It might be used in
future. Note that some Ethernet drivers which _DOESN'T_ use mii(4) use
ec_mii for keeping the if_media. Those should be changed in future.
 1.28 23-May-2019  msaitoh -No functional change:
- KNF
- u_int*_t -> uint*_t.
 1.27 20-Mar-2019  nonaka PR/54058: vmx(4): Fix device enable command failure when the number of vCPUs
is not a power of two.

Make the size of the vmx(4) TX/RX queue a power of two not exceeding
the number of vCPUs.
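
As a rough illustration of the rounding described above (not the driver's
actual code), the largest power of two that does not exceed a CPU count can
be computed as in this C sketch. The helper name largest_pow2_le is
hypothetical and does not appear in if_vmx.c.

```c
#include <limits.h>

/*
 * Hypothetical helper: return the largest power of two that is <= n.
 * Returns 0 when n is 0, since no power of two fits.
 */
static unsigned int
largest_pow2_le(unsigned int n)
{
	unsigned int p = 1;

	if (n == 0)
		return 0;
	/* Double p while the next power still fits; n / 2 avoids overflow. */
	while (p <= n / 2)
		p *= 2;
	return p;
}
```

With 6 vCPUs this yields 4 queues, and with 8 vCPUs it yields 8.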
 1.26 26-Jun-2018  msaitoh branches: 1.26.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.25 01-Jun-2018  maxv Rename

M_CSUM_DATA_IPv6_HL -> M_CSUM_DATA_IPv6_IPHL
M_CSUM_DATA_IPv6_HL_SET -> M_CSUM_DATA_IPv6_SET

Reduces the diff against IPv4. Also, clarify the definitions.
 1.24 16-Apr-2018  nonaka vmx(4): compute if_ibytes using rxq->vxrxq_stats.vmrxs_ibytes.
 1.23 16-Apr-2018  nonaka vmx(4): handled SIOCZIFDATA.
 1.22 16-Apr-2018  nonaka vmx(4): Fix calculation of interface statistics counter.
 1.21 12-Feb-2018  maxv branches: 1.21.2;
m_free -> m_freem, otherwise leak
 1.20 26-Sep-2017  knakahara VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.

I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html

XXX need pullup to -8 branch
 1.19 20-Feb-2017  knakahara branches: 1.19.6;
Apply deferred if_start to vmx(4).
 1.18 11-Jan-2017  maya branches: 1.18.2;
we cannot guarantee that m_pulldown doesn't fail, as it may fail even
if temporarily out of memory, and it will free the mbuf in this scenario.

test for failure and return error if it happens.

CID 1396651

ok riastradh
 1.17 11-Jan-2017  maya on error, free the mbuf in vmxnet3_txq_offload_ctx, not in callers.

ok riastradh
 1.16 11-Jan-2017  maya GC unused macros.

Even if they were used (and actually asserted), asserting on !mutex_owned
is generally a bad idea, as it may be true in unexpected contexts.

suggested by riastradh, thanks.
 1.15 28-Dec-2016  ozaki-r Protect ec_multi* with mutex

The data can be accessed from sysctl, ioctl, interface watchdog
(if_slowtimo) and interrupt handlers. We need to protect the data against
parallel accesses from them.

Currently the mutex is applied to some drivers, we need to apply it to all
drivers in the future.

Note that the mutex is an adaptive one for ease of implementation, but some
drivers access the data in interrupt context, so we cannot apply the mutex
to every driver as is. We have two options: one is to replace the mutex
with a spin one, which requires some additional work (see
ether_multicast_sysctl), and the other is to modify the drivers to access
the data not in interrupt context somehow.
 1.14 27-Dec-2016  hikaru Use the correct number of multicast addrs
 1.13 15-Dec-2016  ozaki-r Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input

The benefits of the change are:
- We can reduce code
- We can provide the same behavior between drivers
- Where/When if_ipackets is counted up
- Note that some drivers still update packet statistics in their own
way (periodical update)
- Moved bpf_mtap run in softint
- This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
 1.12 08-Dec-2016  ozaki-r Apply deferred if_start framework

if_schedule_deferred_start checks if the if_snd queue contains packets,
so drivers don't need to check it by themselves.
 1.11 29-Nov-2016  dholland PR 51672 David Binderman: M_CSUM_TCPv6, not 2x M_CSUM_TCPv4.
(from context it's quite clear that's what's supposed to be here)
 1.10 28-Nov-2016  martin Mark a variable __diagused as it is only ever used in a KASSERT
 1.9 25-Nov-2016  hikaru Add missing bpf_mtap.
 1.8 25-Nov-2016  hikaru Sync code with FreeBSD to support RSS

- Use MSI/MSI-X if it is available.
- Support TSO.

co-authored by k-nakahara
 1.7 10-Jun-2016  ozaki-r branches: 1.7.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.6 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.5 14-Aug-2014  hikaru branches: 1.5.2; 1.5.4;
Set ifflags callback so that the device can enter promiscuous mode.
 1.4 19-Jul-2014  hikaru branches: 1.4.2; 1.4.4;
Correct return value handling.
m_defrag(9) is different from OpenBSD one,
it returns new mbuf pointer on success, not zero.
 1.3 19-Jun-2014  hikaru Use 64-bit DMA, if it is available.
This fixes null packet handling on guests that have more than 3GB of memory.
 1.2 19-Jun-2014  hikaru Make it possible to bring the I/F down. This fixes a panic when removing the IFF_UP flag.
 1.1 10-Jun-2014  hikaru Add VMware VMXNET3 ethernet driver from OpenBSD, vmx(4).
 1.4.4.1 14-Aug-2014  martin Pull up following revision(s) (requested by hikaru in ticket #15):
sys/arch/x86/pci/if_vmx.c: revision 1.5
Set ifflags callback so that the device can enter promiscuous mode.
 1.4.2.2 10-Aug-2014  tls Rebase.
 1.4.2.1 19-Jul-2014  tls file if_vmx.c was added on branch tls-earlyentropy on 2014-08-10 06:54:11 +0000
 1.5.4.5 28-Aug-2017  skrll Sync with HEAD
 1.5.4.4 05-Feb-2017  skrll Sync with HEAD
 1.5.4.3 05-Dec-2016  skrll Sync with HEAD
 1.5.4.2 09-Jul-2016  skrll Sync with HEAD
 1.5.4.1 19-Mar-2016  skrll Sync with HEAD
 1.5.2.3 03-Dec-2017  jdolecek update from HEAD
 1.5.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.1 14-Aug-2014  tls file if_vmx.c was added on branch tls-maxphys on 2014-08-20 00:03:29 +0000
 1.7.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.7.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.18.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.19.6.8 13-May-2020  martin Pull up following revision(s) (requested by yamaguchi in ticket #1547):

sys/arch/x86/pci/if_vmx.c: revision 1.60

Fix the incorrect logic for making the number of vmx(4) TX/RX queues
a power of two

reviewed by nonaka@n.o.
 1.19.6.7 26-Dec-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1477):

sys/arch/x86/pci/if_vmx.c: revision 1.53

Fix missing splnet() for ether_ioctl() caused by if_vmx.c:r1.32.
pointed out by nonaka@n.o, thanks.
 1.19.6.6 22-Jul-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1300):

sys/arch/x86/pci/if_vmx.c: revision 1.31
sys/arch/x86/pci/if_vmx.c: revision 1.32 (via patch)

Eliminate IFF_RUNNING checking code from vmxnet3_init_locked().

Advised by hikaru@n.o, thanks.

-

Fix vmx(4) MTU setting.

Advised by hikaru@n.o and msaitoh@n.o, thanks.
 1.19.6.5 21-Mar-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1219):

sys/arch/x86/pci/if_vmx.c: revision 1.27
PR/54058: vmx(4): Fix device enable command failure when the number of vCPUs
is not a power of two.

Make the size of the vmx(4) TX/RX queue a power of two not exceeding
the number of vCPUs.
 1.19.6.4 16-Apr-2018  martin Pull up following revision(s) (requested by nonaka in ticket #767):

sys/arch/x86/pci/if_vmx.c: revision 1.23,1.24

vmx(4): handled SIOCZIFDATA.

vmx(4): compute if_ibytes using rxq->vxrxq_stats.vmrxs_ibytes.
 1.19.6.3 16-Apr-2018  martin Pull up following revision(s) (requested by nonaka in ticket #762):

sys/arch/x86/pci/if_vmx.c: revision 1.22

vmx(4): Fix calculation of interface statistics counter.
 1.19.6.2 26-Feb-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #577):
sys/arch/x86/pci/if_vmx.c: 1.21
m_free -> m_freem, otherwise leak
 1.19.6.1 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #302):
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.30-1.31
sys/arch/x86/pci/if_vmx.c: 1.20
sys/dev/ic/i82557.c: 1.148
sys/dev/ic/rtl8169.c: 1.152
sys/dev/pci/cxgb/cxgb_sge.c: 1.5
sys/dev/pci/if_age.c: 1.51
sys/dev/pci/if_alc.c: 1.25
sys/dev/pci/if_ale.c: 1.23
sys/dev/pci/if_bge.c: 1.311
sys/dev/pci/if_bge.c: 1.312
sys/dev/pci/if_bnx.c: 1.62
sys/dev/pci/if_jme.c: 1.32
sys/dev/pci/if_nfe.c: 1.64
sys/dev/pci/if_sip.c: 1.167
sys/dev/pci/if_stge.c: 1.63-1.64
sys/dev/pci/if_ti.c: 1.102
sys/dev/pci/if_txp.c: 1.48
sys/dev/pci/if_vge.c: 1.61
sys/dev/pci/if_wm.c: 1.538
sys/dev/pci/ixgbe/ix_txrx.c: 1.29 via patch
sys/net/agr/if_agrether_hash.c: 1.4
sys/net/if_ether.h: 1.67-1.68
sys/net/if_ethersubr.c: 1.244
sys/net/if_vlan.c: 1.100
sys/net80211/ieee80211_input.c: 1.89
sys/net80211/ieee80211_output.c: 1.59
sys/sys/mbuf.h: 1.171
VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.
I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html
--
only get vtag when we have vtag like the other drivers.
--
- only get the vtag if we have it like the other drivers
- mask the hardware vlan tag
--
- add a constant for the vlan mask.
- enforce that we have a tag before we get it.
only get vtag when we have vtag like the other drivers.
like if_bge.c:1.312 and if_stge.c:1.64.
fixed by s-yamaguchi@IIJ, thanks.
 1.21.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.21.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.21.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.26.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.26.2.1 10-Jun-2019  christos Sync with HEAD
 1.45.2.7 13-May-2020  martin Pull up following revision(s) (requested by yamaguchi in ticket #902):

sys/arch/x86/pci/if_vmx.c: revision 1.60

Fix the incorrect logic for making the number of vmx(4) TX/RX queues
a power of two

reviewed by nonaka@n.o.
 1.45.2.6 28-Jan-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #662):

sys/dev/pcmcia/if_xi.c: revision 1.93
sys/arch/x86/pci/if_vmx.c: revision 1.54
sys/dev/pci/if_de.c: revision 1.165
sys/arch/arm/ti/if_cpsw.c: revision 1.10
sys/arch/arm/omap/if_cpsw.c: revision 1.26
sys/dev/isa/if_iy.c: revision 1.112
sys/dev/pcmcia/if_ray.c: revision 1.96

Add ETHER_LOCK() and ETHER_UNLOCK() to protect ec_multiaddrs.

XXX These drivers don't check whether enm_addrlo and enm_addrhi are the same
or not, so it won't work correctly if a multicast address entry has a range.

Protect ec_multicnt.
 1.45.2.5 26-Dec-2019  martin Pull up following revision(s) (requested by knakahara in ticket #583):

sys/arch/x86/pci/if_vmx.c: revision 1.53

Fix missing splnet() for ether_ioctl() caused by if_vmx.c:r1.32.
pointed out by nonaka@n.o, thanks.
 1.45.2.4 10-Oct-2019  martin Pull up following revision(s) (requested by knakahara in ticket #298):

sys/arch/x86/pci/if_vmx.c: revision 1.51

Fix kassert failure in vmxnet3_transmit(). Pointed out by ryo@n.o, thanks.
 1.45.2.3 30-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #268):

sys/arch/x86/pci/if_vmx.c: revision 1.50

Fix typo in vmxnet3_legacy_intr().

That caused sysctl hw.vmx*.{rx,tx} to take effect inversely when vmx(4) uses
INTx or MSI.
 1.45.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #143):

sys/arch/x86/pci/if_vmx.c: revision 1.49

vmxnet3_softc.vmx_stats should not count globally. pointed out by hikaru@n.o

divide vmxnet3_softc.vmx_stats to each vmxnet3_txqueue and vmxnet3_rxqueue,
furthermore make them evcnt.
 1.45.2.1 20-Aug-2019  martin Pull up following revision(s) (requested by knakahara in ticket #99):

sys/arch/x86/pci/if_vmx.c: revision 1.46
sys/arch/x86/pci/if_vmx.c: revision 1.47
sys/arch/x86/pci/if_vmx.c: revision 1.48

vmx(4) uses interrupt distribution for each queue like ixg(4).

fix panic when vmx(4) is detached.

add vmx(4) basic statistics counters.
Sorry, I have forgotten this TODO in r1.40 commit message.
 1.54.2.1 29-Feb-2020  ad Sync with head.
 1.4 14-Oct-2020  ryo vmx(4) should be MI. moved to sys/dev/pci from sys/arch/x86/pci
 1.3 05-Mar-2019  msaitoh Centralize ETHER_ALIGN into net/if_ether.h. Note that this commit also changes
if_upgt.c's ETHER_ALIGN from 0 to 2.
 1.2 25-Nov-2016  hikaru branches: 1.2.16;
Sync code with FreeBSD to support RSS

- Use MSI/MSI-X if it is available.
- Support TSO.

co-authored by k-nakahara
 1.1 10-Jun-2014  hikaru branches: 1.1.2; 1.1.6; 1.1.8; 1.1.12;
Add VMware VMXNET3 ethernet driver from OpenBSD, vmx(4).
 1.1.12.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.8.1 05-Dec-2016  skrll Sync with HEAD
 1.1.6.3 03-Dec-2017  jdolecek update from HEAD
 1.1.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.6.1 10-Jun-2014  tls file if_vmxreg.h was added on branch tls-maxphys on 2014-08-20 00:03:29 +0000
 1.1.2.2 10-Aug-2014  tls Rebase.
 1.1.2.1 10-Jun-2014  tls file if_vmxreg.h was added on branch tls-earlyentropy on 2014-08-10 06:54:11 +0000
 1.2.16.1 10-Jun-2019  christos Sync with HEAD
 1.1 10-Dec-2017  bouyer Add support for I2C designware controllers (as found in Intel PCH devices),
with a pci front-end.
The pci front-end is tied to ACPI and Intel-specific, so it's in arch/x86/pci
and not dev/pci.
Core driver from OpenBSD, PCI front-end by me.
 1.27 24-May-2022  bouyer - msipic_construct_msix_pic(): set mp_table_base to memaddr (without
table_offset), this is what Xen wants
while there use pci_conf_write16() in msi_set_msictl_enablebit() too,
for consistency (it seems that Xen accepts the 32bit write at this point,
but this may change).

- xen_map_msix_pirq(): don't forget to set map_irq.table_base in the
MSI-X case, otherwise Xen maps it as MSI
- call pic_hwunmask() after pirq_establish() in msi/msix case, to make sure
the msi-x vector is unmasked.

Now MSI-X works with Xen so stop disabling it in pci_attach_hook().
 1.26 23-May-2022  bouyer Work in progress on MSI/MSI-X on Xen (MSI works on my hardware, more work
needed for MSI-X):
- Xen silently rejects 32-bit writes to MSI configuration registers
(especially when setting PCI_MSI_CTL_MSI_ENABLE/PCI_MSIX_CTL_ENABLE);
it expects 16-bit writes. So introduce a pci_conf_write16(),
only available on XENPV (and working only for mode 1 without
PCI_OVERRIDE_CONF_WRITE) and use it to enable MSI or MSI-X on XENPV.
- for multi-MSI vectors, Xen allocates all of them in a single hypercall,
so it's not convenient to do it at intr_establish() time.
So do it at alloc() time and register the pirqs in the msipic structure.
xen_pic_to_gsi() now just returns the values cached in the msipic.
As a bonus, if the PHYSDEVOP_map_pirq hypercall fails we can fail
the alloc() and we don't need the xen_pci_msi*_probe() hacks.

options NO_PCI_MSI_MSIX still on by default for XEN3_DOM0.
 1.25 11-Dec-2020  knakahara Fix build failure when XENPV is defined.
 1.24 11-Dec-2020  knakahara Not pic->pic_addroute but pic->pic_hwunmask should enable interrupts for MSI-X.

pic->pic_addroute should not enable interrupt, because callers expect
interrupts have been disabled until they call pic->pic_hwunmask.

By the way, the old implementation wrote zero to Vector Control for MSI-X Table
Entries; however, it must be read and updated, because it contains not only the
Mask Bit but also ST lower and ST upper.
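
A minimal sketch of the read-and-update approach described above, assuming
only that bit 0 of the MSI-X Vector Control word is the Mask Bit. The macro
and function names are hypothetical and are not taken from msipic.c; the
register access itself is omitted.

```c
#include <stdint.h>

/* Hypothetical name: bit 0 of the Vector Control word is the Mask Bit. */
#define VECTCTL_MASK_BIT	0x00000001u

/*
 * Return the Vector Control value to write back, changing only the
 * Mask Bit and preserving every other field that was read.
 */
static uint32_t
vectctl_update_mask(uint32_t vectctl, int masked)
{
	if (masked)
		return vectctl | VECTCTL_MASK_BIT;
	return vectctl & ~VECTCTL_MASK_BIT;
}
```

Writing a plain zero would unmask the vector too, but it would also clear
whatever the other fields of the word held.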
 1.23 04-May-2020  jdolecek branches: 1.23.2;
add support for using MSI for XenPV Dom0

use PHYSDEVOP_map_pirq to get the pirq/gsi for MSI/MSI-X, switch also INTx
to use it instead of PHYSDEVOP_alloc_irq_vector

MSI confirmed working with single-vector MSI for wm(4), ahcisata(4), bge(4)

XXX added some provision for MSI-X, but it doesn't actually work (no interrupts
delivered), needs some further investigation; disable MSI-X for XENPV
via flag in x86/pci/pci_machdep.c
 1.22 04-May-2020  jdolecek constify the pic templates
 1.21 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.20 02-Dec-2019  msaitoh branches: 1.20.6;
Use PCI_MSIX_"TBL"BIR_MASK instead of PCI_MSIX_"PBA"BIR_MASK for MSI-X table.
This is not a real bug because both macros have the same value.
 1.19 13-Nov-2019  hikaru Disable MSI-X before writing the MSI-X table.

That fixes MSI-X interrupt lost on VMware ESXi 6.7 PCI passthrough devices.

ok knakahara@
 1.18 03-Oct-2019  tnn change bus_space_map to _x86_memio_map

Resolves bus space reservation conflict between MI and MD code.
Discussion:
http://mail-index.netbsd.org/port-amd64/2019/09/28/msg003014.html
 1.17 26-Jun-2019  knakahara branches: 1.17.2;
Fix updating "Multiple Message Enable" field for MSI multiple vectors. Pointed out by jmcneill@n.o, thanks.

I tested ahcisata for MSI single vector regression.
 1.16 18-Jun-2019  msaitoh Add note about the case of PCI_MSI_MDATA[64] is 16bit.
 1.15 17-Jun-2019  msaitoh KNF. No functional change.
 1.14 17-Jun-2019  msaitoh Fix comma with semicolon. No functional change.
 1.13 14-Jun-2019  msaitoh No functional change:
- Rename macros:
- ICR, LVT and MSIDATA can share the bit definitions. Remove redundant
definitions and use the common macros.
- Consistently use LAPIC_LVT_ for all local vector table's macro names.
- Use __BITS().
- Add definition for TSC-deadline (LAPIC_LVT_TMM_TSCDLT).
 1.12 01-Apr-2019  msaitoh Fix typo in comment (s/numer/number/).
 1.11 28-Jul-2017  maxv branches: 1.11.2; 1.11.6;
Don't include malloc.h.
 1.10 01-Jun-2017  chs branches: 1.10.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.9 23-May-2017  nonaka x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.8 17-Nov-2015  msaitoh No functional change:
- Add comments.
- Remove obsolete comment.
- Move definitions to better location.
- Rename bit definition.
- KNF.
- Indent.
 1.7 13-Aug-2015  msaitoh Add workaround for PCI prefetchable bit in msipic_construct_msix_pic().
Some chips (e.g. Intel 82599) report SERR and MSI-X interrupt doesn't work.
This problem might not be the driver's bug but our PCI common part or VMs'
bug. See fxp(4), bge(4) and ixgbe(4). All of them have the same workaround
related to prefetchable bit. For the MSI-X table area, it should not have side
effect by prefetching. Until we find a real reason, we ignore the prefetchable
bit.
 1.6 13-Aug-2015  msaitoh - Don't take pci_attach_args as an argument in pci_msi[x]_count().
- Move prototypes of pci_msi[x]_count() from x86/x86/pci_machdep_common to
sys/dev/pci/pcivar.h.
- Move pci_msi[x]_count() from x86/pci/pci_msi_machdep.c to sys/dev/pci/pci.c
 1.5 11-Aug-2015  msaitoh Add missing opt_intrdebug.h.
 1.4 08-May-2015  knakahara branches: 1.4.2;
add a const qualifier to struct pci_attach_args *pa argument
 1.3 28-Apr-2015  martin Make this compilable in non-DIAGNOSTIC kernels.
 1.2 28-Apr-2015  knakahara fix debug message.
 1.1 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.4.2.5 28-Aug-2017  skrll Sync with HEAD
 1.4.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.4.2.3 22-Sep-2015  skrll Sync with HEAD
 1.4.2.2 06-Jun-2015  skrll Sync with HEAD
 1.4.2.1 08-May-2015  skrll file msipic.c was added on branch nick-nhusb on 2015-06-06 14:40:04 +0000
 1.10.2.1 20-Nov-2019  martin Pull up following revision(s) (requested by hikaru in ticket #1453):

sys/arch/x86/pci/msipic.c: revision 1.19

Disable MSI-X before writing the MSI-X table.

That fixes MSI-X interrupt lost on VMware ESXi 6.7 PCI passthrough devices.

ok knakahara@
 1.11.6.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.11.6.1 10-Jun-2019  christos Sync with HEAD
 1.11.2.2 03-Dec-2017  jdolecek update from HEAD
 1.11.2.1 28-Jul-2017  jdolecek file msipic.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.17.2.2 16-Nov-2019  martin Pull up following revision(s) (requested by hikaru in ticket #429):

sys/arch/x86/pci/msipic.c: revision 1.19

Disable MSI-X before writing the MSI-X table.

That fixes MSI-X interrupt lost on VMware ESXi 6.7 PCI passthrough devices.

ok knakahara@
 1.17.2.1 15-Oct-2019  martin Pull up following revision(s) (requested by tnn in ticket #305):

sys/arch/x86/pci/msipic.c: revision 1.18

change bus_space_map to _x86_memio_map

Resolves bus space reservation conflict between MI and MD code.

Discussion:
http://mail-index.netbsd.org/port-amd64/2019/09/28/msg003014.html
 1.20.6.1 19-Apr-2020  bouyer Add per-PIC callbacks for interrupt_get_devname(), interrupt_get_assigned()
and interrupt_get_count(). Implement Xen-specific callbacks for
PIC_XEN and use the x86 one for others.
In event_set_handler(), call intr_allocate_io_intrsource() so that
events appear in the interrupt list (intrctl list).
 1.23.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.4 23-May-2022  bouyer Work in progress on MSI/MSI-X on Xen (MSI works on my hardware, more work
needed for MSI-X):
- Xen silently rejects 32-bit writes to MSI configuration registers
(especially when setting PCI_MSI_CTL_MSI_ENABLE/PCI_MSIX_CTL_ENABLE);
it expects 16-bit writes. So introduce a pci_conf_write16(),
only available on XENPV (and working only for mode 1 without
PCI_OVERRIDE_CONF_WRITE) and use it to enable MSI or MSI-X on XENPV.
- for multi-MSI vectors, Xen allocates all of them in a single hypercall,
so it's not convenient to do it at intr_establish() time.
So do it at alloc() time and register the pirqs in the msipic structure.
xen_pic_to_gsi() now just returns the values cached in the msipic.
As a bonus, if the PHYSDEVOP_map_pirq hypercall fails we can fail
the alloc() and we don't need the xen_pci_msi*_probe() hacks.

options NO_PCI_MSI_MSIX still on by default for XEN3_DOM0.
 1.3 04-May-2020  jdolecek add support for using MSI for XenPV Dom0

use PHYSDEVOP_map_pirq to get the pirq/gsi for MSI/MSI-X, switch also INTx
to use it instead of PHYSDEVOP_alloc_irq_vector

MSI confirmed working with single-vector MSI for wm(4), ahcisata(4), bge(4)

XXX added some provision for MSI-X, but it doesn't actually work (no interrupts
delivered), needs some further investigation; disable MSI-X for XENPV
via flag in x86/pci/pci_machdep.c
 1.2 08-May-2015  knakahara branches: 1.2.2; 1.2.18;
add a const qualifier to struct pci_attach_args *pa argument
 1.1 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.2.18.2 03-Dec-2017  jdolecek update from HEAD
 1.2.18.1 08-May-2015  jdolecek file msipic.h was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.2.2.2 06-Jun-2015  skrll Sync with HEAD
 1.2.2.1 08-May-2015  skrll file msipic.h was added on branch nick-nhusb on 2015-06-06 14:40:04 +0000
 1.37 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.36 24-Apr-2021  thorpej branches: 1.36.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
 1.35 01-Oct-2016  mrg branches: 1.35.30;
use 4-byte style accesses, should hopefully fix PR#37787.
 1.34 16-Apr-2012  pgoyette branches: 1.34.2; 1.34.16; 1.34.20;
Now that we have amdnb_misc for attaching amdtemp, revert pchb.c revisions
1.27 and 1.32. This will unbreak the build.

XXX The amdtemp device currently does not seem to provide correct sensor
values.

XXX The amdnb_misc device does not currently have a rescan capability, so
the amdtemp module will not instantiate any devices (PR kern/45268
reappears).

XXX The agp attachment at the same pci device and function (which was
the motivation for attaching amdtemp at pchb) probably ought to also
be moved to attach at amdnb_miscbus.
 1.33 30-Jan-2012  drochner Use pci_aprint_devinfo(9) instead of pci_devinfo+aprint_{normal,naive}
where it looks straightforward, and pci_aprint_devinfo_fancy in a few
others where drivers want to supply their own device names instead
of the pcidevs generated one. More complicated cases, where names
are composed at runtime, are left alone for now. It certainly makes
sense to simplify the drivers here rather than inventing a catch-all API.
This should serve as an example for new drivers, and also ensure
consistent output in the AB_QUIET ("boot -q") case. Also, it avoids
excessive stack usage where drivers attach child devices because the
buffer for the device name is not kept on the local stack anymore.
 1.32 20-Aug-2011  jakllsch branches: 1.32.2; 1.32.6;
Add rescan support for 'amdtempbus' to x86 pchb(4).
Maybe finally fixes PR#45268.
 1.31 20-Aug-2011  jakllsch pchb_get_bus_number() is actually public
 1.30 20-Aug-2011  jakllsch staticification
 1.29 20-Aug-2011  jakllsch We no longer need to #include "rnd.h".
 1.28 20-Aug-2011  jakllsch We need to initialize the PCI chipset and device tags in the softc for
the suspend and resume handlers.
 1.27 18-Aug-2011  jakllsch Attach amdtemp(4) at pchb(4) instead of in place of pchb(4).

Should fix PR#45268.
 1.26 01-Jul-2011  dyoung #include <sys/bus.h> instead of <machine/bus.h>.
 1.25 17-May-2011  dyoung PCI_FLAGS_IO_ENABLED and PCI_FLAGS_MEM_ENABLED changed their functional
role in NetBSD (drivers are no longer supposed to write these to
pa_flags) without changing name. Correct that.

Rename PCI_FLAGS_IO_ENABLED to PCI_FLAGS_IO_OKAY and
PCI_FLAGS_MEM_ENABLED to PCI_FLAGS_MEM_OKAY, thus making their names
consistent with the other PCI flags and poisoning 3rd-party driver
sources that use the flags in the old bad way.

This patch produces no binary changes in this set of PCI kernels when
they are compiled w/o 'options DIAGNOSTIC' and w/ -V MKREPRO=yes:

algor P4032 P5064 P6032
alpha GENERIC
amd64 GENERIC XEN3_DOM0
arc GENERIC
atari HADES MILAN-PCIIDE
bebox GENERIC
cats GENERIC
cobalt GENERIC
evbarm-el ADI_BRH ARMADILLO9 CP3100 GEMINI GEMINI_MASTER GEMINI_SLAVE
evbarm-el GUMSTIX HDL_G IMX31LITE INTEGRATOR IQ31244 IQ80310 IQ80321
evbarm-el IXDP425 IXM1200 KUROBOX_PRO
evbarm-el LUBBOCK MARVELL_NAS NAPPI NSLU2 SHEEVAPLUG SMDK2800 TEAMASA_NPWR
evbarm-el TEAMASA_NPWR_FC TS7200 TWINTAIL ZAO425
evbmips-el AP30 DBAU1500 DBAU1550 MALTA MERAKI MTX-1 OMSAL400 RB153 WGT624V3
evbmips64-el XLSATX
evbppc EV64260 MPC8536DS MPC8548CDS OPENBLOCKS200 OPENBLOCKS266
evbppc OPENBLOCKS266_OPT P2020RDB PMPPC RB800 WALNUT
hp700 GENERIC
i386 ALL XEN3_DOM0 XEN3_DOMU
ibmnws GENERIC
iyonix GENERIC
landisk GENERIC
macppc GENERIC
mvmeppc GENERIC
netwinder GENERIC
ofppc GENERIC
prep GENERIC
sandpoint GENERIC
sbmips-el GENERIC
sgimips GENERIC32_IP2x GENERIC32_IP3x
sparc GENERIC_SUN4U KRUPS
sparc64 GENERIC
 1.24 24-Feb-2011  matt Add Intel Pineview support
 1.23 23-Jul-2010  jakllsch branches: 1.23.2; 1.23.4;
Almost entirely rework Intel Firmware Hub random number generator support.

This introduces fwhrng(4) which attaches via ichlpcib(4), replacing
the rnd(4) support in pchb(4).
 1.22 16-Jun-2010  riz Add AGP support for a number of Intel onboard devices, including
82G41, 82B43, E7221, 82965GME, and "Iron Lake". Device
types (i915, i965, G33, and G4X variants) from the Linux Intel AGP
driver, and (for 82G41) from Henry Bent in PR#42906.

There are a few more varieties that should be relatively low-hanging
fruit ("Pineview" and "Sandy Bridge"), but will require a little bit
of rejiggering of the "chiptype".

OK mrg@
 1.21 24-Feb-2010  dyoung branches: 1.21.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
 1.20 08-Jan-2010  dyoung branches: 1.20.2;
Expand PMF_FN_* macros.
 1.19 23-Aug-2009  jmcneill Save a line of dmesg by printing the vendor/product info on the same line
as the locators.
 1.18 07-Apr-2009  dyoung Detach pchb(4) instances at shutdown.
 1.17 27-Jan-2009  markd branches: 1.17.2;
Add some more Intel G4X class chipsets
 1.16 29-Nov-2008  christos Add support for the Intel G45 AGP. From Arnaud Lacombe
 1.15 08-Nov-2008  christos Support for Intel G35 as found on Asus P5E-VM HDMI motherboard from
Milos Negovanovic
 1.14 22-Aug-2008  tnn branches: 1.14.2; 1.14.4; 1.14.8;
AGP support for Intel 945GME chipset, found on Acer Aspire One.
 1.13 19-Aug-2008  matthias Add agp support for Intel 946GZ.
 1.12 30-May-2008  joerg branches: 1.12.4;
Add a function to extract the primary bus number of PCI host bridges,
as far as specific code for this already existed.
 1.11 28-Apr-2008  martin branches: 1.11.2;
Remove clause 3 and 4 from TNF licenses
 1.10 16-Apr-2008  cegger branches: 1.10.2; 1.10.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.9 04-Mar-2008  cube Split device_t/softc and other related cosmetic changes.
 1.8 29-Feb-2008  dyoung Use PMF_FN_ARGS, PMF_FN_PROTO.
 1.7 28-Feb-2008  drochner fix an unaligned PCI config space access for the P2 "BX" chipset
 1.6 03-Jan-2008  dyoung branches: 1.6.2; 1.6.6;
Support detachment of pchb(4) and sysbeep(4).
 1.5 09-Dec-2007  jmcneill branches: 1.5.2;
Merge jmcneill-pm branch.
 1.4 24-Nov-2007  markd branches: 1.4.2; 1.4.4; 1.4.6;
Add Intel Q35/G33/Q33 bridges.
 1.3 12-Nov-2007  joerg branches: 1.3.2;
Add Intel 82965PM bridge from jmcneill-pm.
 1.2 30-Oct-2007  jnemeth branches: 1.2.2; 1.2.4;
PR/37201 - Yasushi Oshima -- Intel 82965G chipset support
 1.1 26-Oct-2007  xtraeme branches: 1.1.2; 1.1.4;
- Share pchb(4) between i386 and amd64; one copy is enough for both.
- Move some of the x86 PCI devices into x86/pci/files.pci.
- Add more x86 stuff into x86/conf/files.x86.

ok joerg.
 1.1.4.7 08-Dec-2007  jmcneill Rename pnp(9) -> pmf(9), as requested by many.
 1.1.4.6 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.1.4.5 06-Nov-2007  joerg Refactor PNP API:
- Make suspend/resume directly a device functionality. It consists of
three layers (class logic, device logic, bus logic), all of them being
optional. This replaces D0/D3 transitions.
- device_is_active returns true if the device was not disabled and was
not suspended (even partially), device_is_enabled returns true if the
device was enabled.
- Change pnp_global_transition into pnp_system_suspend and
pnp_system_resume. Before running any suspend/resume handlers, check
that all currently attached devices support power management and bail
out otherwise. The latter is not done for the shutdown/panic case.
- Make the former bus-specific generic network handlers a class handler.
- Make PNP messages like volume up/down/toggle PNP events. Each device
can register what events they are interested in and whether the handler
should be global or not.
- Introduce device_active API for devices to mark themselves in use from
either the system or the device. Use this to implement the idle handling
for audio and input devices. This is intended to replace most ad-hoc
watchdogs as well.
- Fix some situations in which audio resume would lose mixer settings.
- Make USB host controllers better deal with suspend in the light of
shared interrupts.
- Flush filesystem cache on suspend.
- Flush disk caches on suspend. Put ATA disks into standby on suspend as
well.
- Adapt drivers to use the new PNP API.
- Fix a critical bug in the generic cardbus layer that made D0->D3
break.
- Fix ral(4) to set if_stop.
- Convert cbb(4) to the new PNP API.
- Apply the PCI Express SCI fix on resume again.
 1.1.4.4 04-Nov-2007  jmcneill Re-add 965 support lost in an earlier merge.
 1.1.4.3 31-Oct-2007  joerg Sync with HEAD.
 1.1.4.2 28-Oct-2007  joerg Sync with HEAD.
 1.1.4.1 26-Oct-2007  joerg file pchb.c was added on branch jmcneill-pm on 2007-10-28 20:10:59 +0000
 1.1.2.6 17-Mar-2008  yamt sync with head.
 1.1.2.5 21-Jan-2008  yamt sync with head
 1.1.2.4 07-Dec-2007  yamt sync with head
 1.1.2.3 15-Nov-2007  yamt sync with head.
 1.1.2.2 27-Oct-2007  yamt sync with head.
 1.1.2.1 26-Oct-2007  yamt file pchb.c was added on branch yamt-lazymbuf on 2007-10-27 11:28:58 +0000
 1.2.4.4 23-Mar-2008  matt sync with HEAD
 1.2.4.3 09-Jan-2008  matt sync with HEAD
 1.2.4.2 06-Nov-2007  matt sync with HEAD
 1.2.4.1 30-Oct-2007  matt file pchb.c was added on branch matt-armv6 on 2007-11-06 23:23:42 +0000
 1.2.2.4 18-Feb-2008  mjf Sync with HEAD.
 1.2.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.2.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.2.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.3.2.2 13-Nov-2007  bouyer Sync with HEAD
 1.3.2.1 12-Nov-2007  bouyer file pchb.c was added on branch bouyer-xenamd64 on 2007-11-13 16:00:19 +0000
 1.4.6.1 11-Dec-2007  yamt sync with head.
 1.4.4.1 26-Dec-2007  ad Sync with head.
 1.4.2.2 03-Dec-2007  ad Sync with HEAD.
 1.4.2.1 24-Nov-2007  ad file pchb.c was added on branch vmlocking on 2007-12-03 19:04:28 +0000
 1.5.2.1 08-Jan-2008  bouyer Sync with HEAD
 1.6.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.6.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.6.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.6.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.6.2.1 24-Mar-2008  keiichi sync with head.
 1.10.4.5 11-Aug-2010  yamt sync with head.
 1.10.4.4 11-Mar-2010  yamt sync with head
 1.10.4.3 16-Sep-2009  yamt sync with head
 1.10.4.2 04-May-2009  yamt sync with head.
 1.10.4.1 16-May-2008  yamt sync with head.
 1.10.2.2 04-Jun-2008  yamt sync with head
 1.10.2.1 18-May-2008  yamt sync with head.
 1.11.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.11.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.12.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.12.4.1 19-Oct-2008  haad Sync with HEAD.
 1.14.8.1 21-Apr-2010  matt sync to netbsd-5
 1.14.4.1 05-May-2009  bouyer Pull up following revision(s) (requested by snj in ticket #737):
sys/arch/x86/pci/pchb.c: revisions 1.15 - 1.17
sys/dev/pci/agp.c: revisions 1.63 - 1.65
sys/dev/pci/agp_i810.c: revisions 1.57 - 1.64
sys/dev/pci/pcidevs: revisions 1.965, 1.967 via patch
sys/dev/pci/agpreg.h: revision 1.20
Add AGP support for Intel G35, G45, and Q45.
 1.14.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.14.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.14.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.17.2.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.17.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.17.2.3 24-Oct-2010  jym Sync with HEAD
 1.17.2.2 01-Nov-2009  jym Sync with HEAD.
 1.17.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.20.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.20.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.21.2.3 31-May-2011  rmind sync with head
 1.21.2.2 05-Mar-2011  rmind sync with head
 1.21.2.1 03-Jul-2010  rmind sync with head
 1.23.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.23.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.32.6.2 29-Apr-2012  mrg sync to latest -current.
 1.32.6.1 18-Feb-2012  mrg merge to -current.
 1.32.2.2 18-Apr-2012  yamt pull following revisions from trunk so that the kernel at least boots
on my system.
cvs rdiff -u -r1.33 -r1.34 src/sys/arch/x86/pci/pchb.c
cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/pci/pchbvar.h
cvs rdiff -u -r1.1 -r1.2 src/sys/arch/x86/pci/amdnb_misc.c
 1.32.2.1 17-Apr-2012  yamt sync with head
 1.34.20.1 04-Nov-2016  pgoyette Sync with HEAD
 1.34.16.1 05-Oct-2016  skrll Sync with HEAD
 1.34.2.1 03-Dec-2017  jdolecek update from HEAD
 1.35.30.1 02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.36.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.10 23-Jul-2010  jakllsch Almost entirely rework Intel Firmware Hub random number generator support.

This introduces fwhrng(4) which attaches via ichlpcib(4), replacing
the rnd(4) support in pchb(4).
 1.9 03-Nov-2009  snj branches: 1.9.2; 1.9.4;
Drop 3rd and 4th clauses, as the copyright holder (Michael Shalayeff) did
in OpenBSD revision 1.39.
 1.8 04-Mar-2008  cube branches: 1.8.4; 1.8.18;
Split device_t/softc and other related cosmetic changes.
 1.7 03-Jan-2008  dyoung branches: 1.7.2; 1.7.6;
Support detachment of pchb(4) and sysbeep(4).
 1.6 17-Oct-2007  garbled branches: 1.6.2; 1.6.8;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.5 16-Oct-2007  joerg Make the check for a working RNG a bit more aggressive. Try to read
10 samples and bail out if any of them cannot be read within 10ms or
if they are all 0xff. Both are good indicators of a
missing RNG.
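
The check described above can be sketched roughly as follows. This is a
minimal illustration, not the driver's code: demo_read_fn, demo_rng_works()
and the two example readers are hypothetical, and the 10ms timeout is assumed
to be handled inside the reader callback.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NSAMPLES	10

/*
 * Hypothetical sample reader: stores one byte in *val and returns true,
 * or returns false if no sample became available within the timeout.
 */
typedef bool (*demo_read_fn)(void *cookie, uint8_t *val);

/* Return true only if every sample arrives in time and not all are 0xff. */
static bool
demo_rng_works(demo_read_fn read_sample, void *cookie)
{
	bool all_ff = true;
	uint8_t val;
	int i;

	for (i = 0; i < DEMO_NSAMPLES; i++) {
		if (!read_sample(cookie, &val))
			return false;	/* timeout: treat the RNG as missing */
		if (val != 0xff)
			all_ff = false;
	}
	return !all_ff;
}

/* Example reader: always yields the byte that cookie points to. */
static bool
demo_read_const(void *cookie, uint8_t *val)
{
	*val = *(const uint8_t *)cookie;
	return true;
}

/* Example reader: always times out. */
static bool
demo_read_timeout(void *cookie, uint8_t *val)
{
	(void)cookie;
	(void)val;
	return false;
}
```

With these readers, a source that only ever returns 0xff and a source that
always times out are both rejected, while a source returning any other byte
is accepted.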
 1.4 16-Oct-2007  joerg Exploit that only Intel devices are matched and all devices do the same.
This saves two levels of indentation for the main body, making it more
readable. Don't hide the error for disabling the RNG under DIAGNOSTIC,
verbose is enough. Use aprint_*_dev.
 1.3 09-Jul-2007  ad branches: 1.3.8; 1.3.10; 1.3.12;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.2 19-Feb-2006  tron branches: 1.2.8; 1.2.14; 1.2.20; 1.2.30; 1.2.32; 1.2.38;
Improve code probing for the Intel hardware RNG to avoid false detections.
See http://home.comcast.net/~andrex/hardware-RNG/doihave.html for details.
Problem pointed out by Thor Lancelot Simon on port-amd64 mailing list.
 1.1 12-Feb-2006  tron branches: 1.1.2;
Share Intel hardware random number generator support between amd64 and
i386 port. This will benefit EM64T systems using Intel i9xx chipsets.
 1.1.2.3 01-Mar-2006  yamt sync with head.
 1.1.2.2 18-Feb-2006  yamt sync with head.
 1.1.2.1 12-Feb-2006  yamt file pchb_rnd.c was added on branch yamt-uio_vmspace on 2006-02-18 15:38:54 +0000
 1.2.38.2 17-Oct-2007  garbled Sync with HEAD
 1.2.38.1 03-Oct-2007  garbled Sync with HEAD
 1.2.32.1 11-Jul-2007  mjf Sync with head.
 1.2.30.2 23-Oct-2007  ad Sync with head.
 1.2.30.1 01-Jul-2007  ad Adapt to callout API change.
 1.2.20.2 09-Sep-2006  rpaulo sync with head
 1.2.20.1 19-Feb-2006  rpaulo file pchb_rnd.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.2.14.6 17-Mar-2008  yamt sync with head.
 1.2.14.5 21-Jan-2008  yamt sync with head
 1.2.14.4 27-Oct-2007  yamt sync with head.
 1.2.14.3 03-Sep-2007  yamt sync with head.
 1.2.14.2 21-Jun-2006  yamt sync with head.
 1.2.14.1 19-Feb-2006  yamt file pchb_rnd.c was added on branch yamt-lazymbuf on 2006-06-21 14:57:56 +0000
 1.2.8.2 22-Apr-2006  simonb Sync with head.
 1.2.8.1 19-Feb-2006  simonb file pchb_rnd.c was added on branch simonb-timecounters on 2006-04-22 11:38:09 +0000
 1.3.12.1 18-Oct-2007  yamt sync with head.
 1.3.10.3 23-Mar-2008  matt sync with HEAD
 1.3.10.2 09-Jan-2008  matt sync with HEAD
 1.3.10.1 06-Nov-2007  matt sync with HEAD
 1.3.8.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.6.8.1 08-Jan-2008  bouyer Sync with HEAD
 1.6.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.7.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.7.2.1 24-Mar-2008  keiichi sync with head.
 1.8.18.1 24-Oct-2010  jym Sync with HEAD
 1.8.4.2 11-Aug-2010  yamt sync with head.
 1.8.4.1 11-Mar-2010  yamt sync with head
 1.9.4.1 05-Mar-2011  rmind sync with head
 1.9.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.9.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.9 16-Apr-2012  pgoyette Now that we have amdnb_misc for attaching amdtemp, revert pchb.c revisions
1.27 and 1.32. This will unbreak the build.

XXX The amdtemp device currently does not seem to provide correct sensor
values.

XXX The amdnb_misc device does not currently have a rescan capability, so
the amdtemp module will not instantiate any devices (PR kern/45268
reappears).

XXX The agp attachment at the same pci device and function (which was
the motivation for attaching amdtemp at pchb) probably ought to also
be moved to attach at amdnb_miscbus.
 1.8 20-Aug-2011  jakllsch branches: 1.8.2; 1.8.6;
Add rescan support for 'amdtempbus' to x86 pchb(4).
Maybe finally fixes PR#45268.
 1.7 23-Jul-2010  jakllsch Finish cleaning up pchb from recent change.
Use fewer magic numbers in ichlpcib.
Slightly improve style conformance.
Update paths in cpp re-inclusion guards.
 1.6 23-Jul-2010  jakllsch Almost entirely rework Intel Firmware Hub random number generator support.

This introduces fwhrng(4) which attaches via ichlpcib(4), replacing
the rnd(4) support in pchb(4).
 1.5 28-Apr-2008  martin branches: 1.5.14; 1.5.20; 1.5.22;
Remove clause 3 and 4 from TNF licenses
 1.4 04-Mar-2008  cube branches: 1.4.2; 1.4.4;
Split device_t/softc and other related cosmetic changes.
 1.3 03-Jan-2008  dyoung branches: 1.3.2; 1.3.6;
Support detachment of pchb(4) and sysbeep(4).
 1.2 09-Dec-2007  jmcneill branches: 1.2.2;
Merge jmcneill-pm branch.
 1.1 12-Feb-2006  tron branches: 1.1.2; 1.1.10; 1.1.16; 1.1.22; 1.1.50; 1.1.52; 1.1.58; 1.1.62; 1.1.64;
Share Intel hardware random number generator support between amd64 and
i386 port. This will benefit EM64T systems using Intel i9xx chipsets.
 1.1.64.1 11-Dec-2007  yamt sync with head.
 1.1.62.1 26-Dec-2007  ad Sync with head.
 1.1.58.1 18-Feb-2008  mjf Sync with HEAD.
 1.1.52.2 23-Mar-2008  matt sync with HEAD
 1.1.52.1 09-Jan-2008  matt sync with HEAD
 1.1.50.2 12-Nov-2007  joerg CG unused softc fields.
 1.1.50.1 03-Aug-2007  jmcneill Pull in power management changes from private branch.
 1.1.22.2 09-Sep-2006  rpaulo sync with head
 1.1.22.1 12-Feb-2006  rpaulo file pchbvar.h was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.1.16.4 17-Mar-2008  yamt sync with head.
 1.1.16.3 21-Jan-2008  yamt sync with head
 1.1.16.2 21-Jun-2006  yamt sync with head.
 1.1.16.1 12-Feb-2006  yamt file pchbvar.h was added on branch yamt-lazymbuf on 2006-06-21 14:57:56 +0000
 1.1.10.2 22-Apr-2006  simonb Sync with head.
 1.1.10.1 12-Feb-2006  simonb file pchbvar.h was added on branch simonb-timecounters on 2006-04-22 11:38:09 +0000
 1.1.2.2 18-Feb-2006  yamt sync with head.
 1.1.2.1 12-Feb-2006  yamt file pchbvar.h was added on branch yamt-uio_vmspace on 2006-02-18 15:38:54 +0000
 1.2.2.1 08-Jan-2008  bouyer Sync with HEAD
 1.3.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.3.2.1 24-Mar-2008  keiichi sync with head.
 1.4.4.2 11-Aug-2010  yamt sync with head.
 1.4.4.1 16-May-2008  yamt sync with head.
 1.4.2.1 18-May-2008  yamt sync with head.
 1.5.22.1 05-Mar-2011  rmind sync with head
 1.5.20.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.5.14.2 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.5.14.1 24-Oct-2010  jym Sync with HEAD
 1.8.6.1 29-Apr-2012  mrg sync to latest -current.
 1.8.2.1 18-Apr-2012  yamt pull following revisions from trunk so that the kernel at least boots
on my system.
cvs rdiff -u -r1.33 -r1.34 src/sys/arch/x86/pci/pchb.c
cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/pci/pchbvar.h
cvs rdiff -u -r1.1 -r1.2 src/sys/arch/x86/pci/amdnb_misc.c
 1.10 28-Jul-2017  maxv Don't include malloc.h.
 1.9 27-Jan-2012  para branches: 1.9.6; 1.9.24;
converting extent(9) from malloc(9) to kmem(9)
preceding kmem-vmem-pool-uvm patch

releng@ acknowledged
 1.8 28-Aug-2011  dyoung branches: 1.8.2; 1.8.6;
Normalize whitespace.
 1.7 28-Aug-2011  dyoung Replace some anonymous constants with PCI_ constants.

Print debugging information using aprint_debug(9) not aprint_verbose(9)
and be consistent about that. Get rid of the pciaddrverbose switch for
debugging printfs.

Make 'static' several functions that are private to this module.

Don't test truth of arbitrary integers but compare with 0. Change
'return (x)' to 'return x'.
 1.6 01-Jul-2011  dyoung #include <sys/bus.h> instead of <machine/bus.h>.
 1.5 26-Jul-2010  jym Add PAE to ALL kernel, so that most paddr_t format string errors get caught
during compilation.

While here, fix the compilation for ALL.
 1.4 17-Feb-2009  jmcneill branches: 1.4.2; 1.4.4; 1.4.6;
Use aprint_*
 1.3 19-Dec-2008  cegger branches: 1.3.2;
backout previous. makes i386 ALL kernel build again
 1.2 18-Dec-2008  cegger remove unused malloc.h
 1.1 18-May-2008  jmcneill branches: 1.1.2; 1.1.4; 1.1.8; 1.1.12;
Add support for PCI_BUS_FIXUP and PCI_ADDR_FIXUP on amd64.
 1.1.12.2 03-Mar-2009  skrll Sync with HEAD.
 1.1.12.1 19-Jan-2009  skrll Sync with HEAD.
 1.1.8.2 23-Jun-2008  wrstuden Add files to branch that were added on -current.

After this, all that's left of update is to merge some changes
that had conflicts.
 1.1.8.1 18-May-2008  wrstuden file pci_addr_fixup.c was added on branch wrstuden-revivesa on 2008-06-23 05:02:12 +0000
 1.1.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.1.4.1 18-May-2008  mjf file pci_addr_fixup.c was added on branch mjf-devfs2 on 2008-06-02 13:22:50 +0000
 1.1.2.2 18-May-2008  yamt sync with head.
 1.1.2.1 18-May-2008  yamt file pci_addr_fixup.c was added on branch yamt-pf42 on 2008-05-18 12:33:04 +0000
 1.3.2.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.3.2.3 24-Oct-2010  jym Sync with HEAD
 1.3.2.2 01-Nov-2009  jym Sync with HEAD.
 1.3.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.4.6.1 05-Mar-2011  rmind sync with head
 1.4.4.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.4.2.3 11-Aug-2010  yamt sync with head.
 1.4.2.2 04-May-2009  yamt sync with head.
 1.4.2.1 17-Feb-2009  yamt file pci_addr_fixup.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:10 +0000
 1.8.6.1 18-Feb-2012  mrg merge to -current.
 1.8.2.1 17-Apr-2012  yamt sync with head
 1.9.24.1 28-Aug-2017  skrll Sync with HEAD
 1.9.6.1 03-Dec-2017  jdolecek update from HEAD
 1.1 18-May-2008  jmcneill branches: 1.1.2; 1.1.4; 1.1.8; 1.1.22;
Add support for PCI_BUS_FIXUP and PCI_ADDR_FIXUP on amd64.
 1.1.22.2 04-May-2009  yamt sync with head.
 1.1.22.1 18-May-2008  yamt file pci_addr_fixup.h was added on branch yamt-nfs-mp on 2009-05-04 08:12:10 +0000
 1.1.8.2 23-Jun-2008  wrstuden Add files to branch that were added on -current.

After this, all that's left of update is to merge some changes
that had conflicts.
 1.1.8.1 18-May-2008  wrstuden file pci_addr_fixup.h was added on branch wrstuden-revivesa on 2008-06-23 05:02:12 +0000
 1.1.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.1.4.1 18-May-2008  mjf file pci_addr_fixup.h was added on branch mjf-devfs2 on 2008-06-02 13:22:50 +0000
 1.1.2.2 18-May-2008  yamt sync with head.
 1.1.2.1 18-May-2008  yamt file pci_addr_fixup.h was added on branch yamt-pf42 on 2008-05-18 12:33:04 +0000
 1.3 01-Mar-2019  msaitoh - Almost all ppbreg.h's definitions are also in pcireg.h. Remove duplicated
definitions from ppbreg.h and move some definitions from ppbreg.h to
pcireg.h.
- Change fast back-to-back "capable" to "enable" in pci_subr.c.
- Print Primary Discard Timer, Secondary Discard Timer, Discard Timer Status
and Discard Timer SERR# Enable bit in pci_subr.c.
- PCI_BRIDGE_PREFETCHBASE32_REG and PCI_BRIDGE_PREFETCHLIMIT32_REG are
"upper" 32bit registers, rename to *UP32_REG to avoid confusion.
- Use macro.
 1.2 01-Jul-2011  dyoung branches: 1.2.54;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.1 18-May-2008  jmcneill branches: 1.1.2; 1.1.4; 1.1.8; 1.1.18; 1.1.22;
Add support for PCI_BUS_FIXUP and PCI_ADDR_FIXUP on amd64.
 1.1.22.2 04-May-2009  yamt sync with head.
 1.1.22.1 18-May-2008  yamt file pci_bus_fixup.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:10 +0000
 1.1.18.1 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.1.8.2 23-Jun-2008  wrstuden Add files to branch that were added on -current.

After this, all that's left of update is to merge some changes
that had conflicts.
 1.1.8.1 18-May-2008  wrstuden file pci_bus_fixup.c was added on branch wrstuden-revivesa on 2008-06-23 05:02:12 +0000
 1.1.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.1.4.1 18-May-2008  mjf file pci_bus_fixup.c was added on branch mjf-devfs2 on 2008-06-02 13:22:50 +0000
 1.1.2.2 18-May-2008  yamt sync with head.
 1.1.2.1 18-May-2008  yamt file pci_bus_fixup.c was added on branch yamt-pf42 on 2008-05-18 12:33:04 +0000
 1.2.54.1 10-Jun-2019  christos Sync with HEAD
 1.1 18-May-2008  jmcneill branches: 1.1.2; 1.1.4; 1.1.8; 1.1.22;
Add support for PCI_BUS_FIXUP and PCI_ADDR_FIXUP on amd64.
 1.1.22.2 04-May-2009  yamt sync with head.
 1.1.22.1 18-May-2008  yamt file pci_bus_fixup.h was added on branch yamt-nfs-mp on 2009-05-04 08:12:10 +0000
 1.1.8.2 23-Jun-2008  wrstuden Add files to branch that were added on -current.

After this, all that's left of update is to merge some changes
that had conflicts.
 1.1.8.1 18-May-2008  wrstuden file pci_bus_fixup.h was added on branch wrstuden-revivesa on 2008-06-23 05:02:12 +0000
 1.1.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.1.4.1 18-May-2008  mjf file pci_bus_fixup.h was added on branch mjf-devfs2 on 2008-06-02 13:22:50 +0000
 1.1.2.2 18-May-2008  yamt sync with head.
 1.1.2.1 18-May-2008  yamt file pci_bus_fixup.h was added on branch yamt-pf42 on 2008-05-18 12:33:04 +0000
 1.51 01-Aug-2020  jdolecek reorder includes to pull __HAVE_PCI_MSI_MSIX properly via
<x86/pci_machdep_common.h>
 1.50 17-Jun-2019  msaitoh KNF. No functional change.
 1.49 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.48 02-Dec-2018  cherry make

options NO_PCI_MSI_MSIX

work again for arch/x86/
 1.47 27-Nov-2018  jdolecek actually allow pci_intr_alloc() with NULL count with MSI-X
 1.46 27-Nov-2018  jdolecek make pci_intr_alloc() try also MSI-X by default (with NULL count);
there are boards/emulators which only have MSI-X and no MSI, and
so far there is no evidence there are devices which support both
and don't work in MSI-X mode

this change is supposed to reduce amount of needed cut&paste code in drivers

discussed briefly with jmcneill@
 1.45 23-Sep-2018  cherry Revert:
http://mail-index.netbsd.org/source-changes/2018/09/10/msg098995.html

It is incorrect to infer semantics from usage.

the problem for which the original commit was intended should be fixed
within the callee intr_establish_xname() and not the caller:
pci_intr_find_intx_irq()

This was accomplished via:
http://mail-index.netbsd.org/source-changes/2018/09/20/msg099286.html
 1.44 10-Sep-2018  cherry In the NIOAPIC case, we do not need to support "legacy" irqs,
i.e., we don't need to simultaneously pass back the irq in the
range 0 < irq < 16 (which are sometimes described as "legacy"
in src).

This was non-obvious, until the semantics of "legacy" were
used in inconsistent ways in Xen (to also mean interrupts in
the 0 < irq < 256 range) which causes problems with attempting
to unify the sys/arch/x86/isa/isa_machdep.c:isa_intr_establish_xname()
function between XEN and !XEN

This commit should not affect current functionality on
either native or Xen. It is needed for future code reorg, and
published now as a preview.
 1.43 24-Jun-2018  jdolecek branches: 1.43.2;
provide pci_intr_establish_xname() on x86 independently from MSI,
so it's available on XEN too; change also the stub to use weak
symbol instead of #ifdef
 1.42 04-Jan-2018  knakahara branches: 1.42.2;
fix "intrctl list" panic when ACPI is disabled.

reviewed by cherry@n.o and tested by msaitoh@n.o, thanks.
 1.41 28-Jul-2017  maxv Don't include malloc.h.
 1.40 01-Jun-2017  chs branches: 1.40.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.39 28-Nov-2016  knakahara fix build of amd64/i386 with NO_PCI_MSI_MSIX option.
 1.38 11-Jul-2016  knakahara branches: 1.38.2;
pci_intr_type() requires a pci_chipset_tag_t argument on ports other than x86.

pointed out by nonaka@n.o.
 1.37 17-Aug-2015  knakahara Add kernel code to support intrctl(8).
 1.36 13-Aug-2015  msaitoh - Don't take pci_attach_args as an argument in pci_msi[x]_count().
- Move prototypes of pci_msi[x]_count() from x86/x86/pci_machdep_common to
sys/dev/pci/pcivar.h.
- Move pci_msi[x]_count() from x86/pci/pci_msi_machdep.c to sys/dev/pci/pci.c
 1.35 24-Jul-2015  knakahara fix pci_intr_alloc(..., NULL, 0). reported by nonaka@n.o
 1.34 21-Jul-2015  knakahara add pci_intr_alloc() API
 1.33 15-May-2015  knakahara pci_msi_string() must be used by MD code only.
 1.32 15-May-2015  knakahara refactor: change function names and move them.
 1.31 15-May-2015  knakahara unify INTx, MSI and MSI-X APIs without alloc. (alloc API is under discussion)
 1.30 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.29 27-Apr-2015  knakahara add intr_handle_t and let pci_intr_handle_t use it.
 1.28 27-Apr-2015  knakahara add pci_intr_distribute(9) for x86.
 1.27 29-Mar-2014  christos branches: 1.27.6;
make pci_intr_string and eisa_intr_string take a buffer and a length
instead of relying on local static storage.
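
The difference between the two calling styles can be sketched as follows.
demo_intr_string() is a hypothetical stand-in, not the real pci_intr_string()
signature; it only shows why caller-supplied storage avoids the shared static
buffer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical formatter: writes into storage supplied by the caller and
 * returns that same buffer, so two results can be live at the same time.
 */
static const char *
demo_intr_string(int irq, char *buf, size_t len)
{
	/* snprintf() truncates and NUL-terminates if len is too small. */
	snprintf(buf, len, "irq %d", irq);
	return buf;
}
```

Because each caller owns its buffer, formatting a second interrupt name
cannot overwrite the first one, which a function returning a pointer to
local static storage cannot guarantee.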
 1.26 26-Jan-2013  dyoung branches: 1.26.2;
Several registers and bitfields named IOAPIC_* actually belong to the
LAPIC, so rename them LAPIC_* and move to a more appropriate header
file.
 1.25 15-Jun-2012  yamt branches: 1.25.2;
comment
 1.24 15-Jun-2012  yamt assertions. use a macro. no functional changes.
 1.23 29-Aug-2011  dyoung branches: 1.23.2;
Use a loop instead of tail-recursion for the pci_intr(9) overrides.
This is the same change that I just made to the pci(9) overrides. While
I am here, fix a bug: use PCI_OVERRIDE_INTR_DISESTABLISH instead of
PCI_OVERRIDE_INTR_ESTABLISH for the pci_intr_disestablish(9) override.
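
A loop over a chain of overrides, in place of tail recursion, might look like
the sketch below. The struct layout and all demo_* names are hypothetical and
are not taken from the x86 PCI code; only the iteration pattern is the point.

```c
#include <stddef.h>

/* Hypothetical chain of chipset tags, each optionally overriding a hook. */
struct demo_chipset {
	const struct demo_chipset *super;	/* next tag in the chain, or NULL */
	int (*intr_establish)(int line);	/* NULL if not overridden */
};

/* Default behaviour when no tag in the chain overrides the hook. */
static int
demo_default_establish(int line)
{
	return line;
}

/* Example override used for experiments. */
static int
demo_override_establish(int line)
{
	return line + 100;
}

/* Iterate up the chain; the first override found handles the call. */
static int
demo_intr_establish(const struct demo_chipset *pc, int line)
{
	for (; pc != NULL; pc = pc->super) {
		if (pc->intr_establish != NULL)
			return pc->intr_establish(line);
	}
	return demo_default_establish(line);
}
```

Keeping a separate hook per operation also makes the class of bug mentioned
above easier to spot, since each lookup names the hook it tests.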
 1.22 17-Aug-2011  dyoung In pci_msi_establish(), replace several anonymous constants with IOAPIC_
symbols. No change in the generated assembly.
 1.21 17-Aug-2011  dyoung Redefine PCI_MSI_* and PCI_PCIE_* constants in terms of bits(3).

Use named constants and more conventional variable names in
pci_msi_establish() and pci_msi_disestablish(). Fix a couple of bugs:
pci_msi_establish() returned a pointer to the struct intrhand instead of
to the struct msi_hdl as it was intended to, and pci_msi_disestablish()
did not free(9) the msi_hdl.
 1.20 01-Aug-2011  drochner add an experimental implementation of PCI MSIs (Message Signaled
Interrupts). Successfully tested with hdaudio and "wpi" wireless
ethernet.
notes:
-There seem to be buggy chips around which announce MSI support
but don't correctly implement it. Thus the final word whether MSIs
can be used should be by the driver.
-Only a single vector is supported. For multiple vectors, the IDT
allocation code would have to be changed. (And we would possibly
run into problems due to the limited number of vectors supported
by the current code.)
-The code is "#if NIOAPIC > 0" because it uses the ioapic_edge
interrupt stubs. These actually don't touch any ioapic, so this
is somewhat a misnomer.
-MSIs can't be identified by a "pin" but only by a cpu/vector
pair. Common intr code doesn't deal well with this yet.
-Drivers need to take care of saving/restoring MSI data in the device's
config space on suspend/resume.
 1.19 04-Apr-2011  dyoung Neither pci_dma64_available(), pci_probe_device(), pci_mapreg_map(9),
pci_find_rom(), pci_intr_map(9), pci_enumerate_bus(), nor the match
predicate passed to pciide_compat_intr_establish() should ever modify
their pci_attach_args argument, so make their pci_attach_args arguments
const and deal with the fallout throughout the kernel.

For the most part, these changes add a 'const' where there was no
'const' before, however, some drivers and MD code used to modify
pci_attach_args. Now those drivers either copy their pci_attach_args
and modify the copy, or refrain from modifying pci_attach_args:

Xen: according to Manuel Bouyer, writing to pci_attach_args in
pci_intr_map() was a leftover from Xen 2. Probably a bug. I
stopped writing it. I have not tested this change.

siside(4): sis_hostbr_match() needlessly wrote to pci_attach_args.
Probably a bug. I use a temporary variable. I have not tested this
change.

slide(4): sl82c105_chip_map() overwrote the caller's pci_attach_args.
Probably a bug. Use a local pci_attach_args. I have not tested
this change.

viaide(4): via_sata_chip_map() and via_sata_chip_map_new() overwrote the
caller's pci_attach_args. Probably a bug. Make a local copy of the
caller's pci_attach_args and modify the copy. I have not tested
this change.

While I'm here, make pci_mapreg_submap() static.

With these changes in place, I have tested the compilation of these
kernels:

alpha GENERIC
amd64 GENERIC XEN3_DOM0
arc GENERIC
atari HADES MILAN-PCIIDE
bebox GENERIC
cats GENERIC
cobalt GENERIC
evbarm-eb NSLU2
evbarm-el ADI_BRH ARMADILLO9 CP3100 GEMINI GEMINI_MASTER GEMINI_SLAVE GUMSTIX
HDL_G IMX31LITE INTEGRATOR IQ31244 IQ80310 IQ80321 IXDP425 IXM1200
KUROBOX_PRO LUBBOCK MARVELL_NAS NAPPI SHEEVAPLUG SMDK2800 TEAMASA_NPWR
TEAMASA_NPWR_FC TS7200 TWINTAIL ZAO425
evbmips-el AP30 DBAU1500 DBAU1550 MALTA MERAKI MTX-1 OMSAL400 RB153 WGT624V3
evbmips64-el XLSATX
evbppc EV64260 MPC8536DS MPC8548CDS OPENBLOCKS200 OPENBLOCKS266
OPENBLOCKS266_OPT P2020RDB PMPPC RB800 WALNUT
hp700 GENERIC
i386 ALL XEN3_DOM0 XEN3_DOMU
ibmnws GENERIC
macppc GENERIC
mvmeppc GENERIC
netwinder GENERIC
ofppc GENERIC
prep GENERIC
sandpoint GENERIC
sgimips GENERIC32_IP2x
sparc GENERIC_SUN4U KRUPS
sparc64 GENERIC

As of Sun Apr 3 15:26:26 CDT 2011, I could not compile these kernels
with or without my patches in place:

### evbmips-el GDIUM

nbmake: nbmake: don't know how to make /home/dyoung/pristine-nbsd/src/sys/arch/mips/mips/softintr.c. Stop

### evbarm-el MPCSA_GENERIC
src/sys/arch/evbarm/conf/MPCSA_GENERIC:318: ds1672rtc*: unknown device `ds1672rtc'

### ia64 GENERIC

/tmp/genassym.28085/assym.c: In function 'f111':
/tmp/genassym.28085/assym.c:67: error: invalid application of 'sizeof' to incomplete type 'struct pcb'
/tmp/genassym.28085/assym.c:76: error: dereferencing pointer to incomplete type

### sgimips GENERIC32_IP3x

crmfb.o: In function `crmfb_attach':
crmfb.c:(.text+0x2304): undefined reference to `ddc_read_edid'
crmfb.c:(.text+0x2304): relocation truncated to fit: R_MIPS_26 against `ddc_read_edid'
crmfb.c:(.text+0x234c): undefined reference to `edid_parse'
crmfb.c:(.text+0x234c): relocation truncated to fit: R_MIPS_26 against `edid_parse'
crmfb.c:(.text+0x2354): undefined reference to `edid_print'
crmfb.c:(.text+0x2354): relocation truncated to fit: R_MIPS_26 against `edid_print'
 1.18 20-Dec-2010  matt branches: 1.18.2;
Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.17 28-Apr-2010  dyoung Provide an x86 implementation of pci_chipset_tag_create(9) and
pci_chipset_tag_destroy(9).
 1.16 14-Mar-2010  dyoung branches: 1.16.2;
Add a new member, pc_super, to x86's pci_chipset_tag: pc.pc_super points
to the tag that pc inherits its behavior from. Add code to deal with
pc.pc_super.

Pull identical declarations out of xen/include/pci_machdep.h and
x86/include/pci_machdep.h into x86/include/pci_machdep_common.h.
 1.15 25-Feb-2010  dyoung In the x86 pci(9) implementation, test for and call a
pci_chipset_tag_t's override functions.
 1.14 18-Aug-2009  jmcneill branches: 1.14.2;
Switch to ACPICA 20090730, and update for API changes.
 1.13 21-Mar-2009  ad Fix 'boot -z' bogons.
 1.12 03-Jul-2008  drochner branches: 1.12.4; 1.12.10;
Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.11 30-May-2008  ad branches: 1.11.2;
pci_intr_setattr(), allows PCI interrupts to be marked MPSAFE on x86, and
other platforms if the code is added.

pci_intr_map(...)
pci_intr_setattr(pc, ih, PCI_INTR_MPSAFE, 1);
pci_intr_establish(...)
 1.10 30-May-2008  ad Add a 'known_mpsafe' argument to intr_establish().
 1.9 03-May-2008  cegger branches: 1.9.2;
ansify
 1.8 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.7 04-Jan-2008  ad branches: 1.7.6; 1.7.8; 1.7.10;
sys/lock.h isn't needed here.
 1.6 16-Nov-2006  christos branches: 1.6.28; 1.6.34; 1.6.42;
__unused removal on arguments; approved by core.
 1.5 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.4 04-Jul-2006  christos branches: 1.4.4; 1.4.6; 1.4.8;
spell NACPI correctly.
 1.3 04-Jul-2006  christos PR/33912: tron: Building GENERIC kernel fails
Fallout from ACPI changes.
 1.2 04-Jul-2006  christos Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.1 03-Feb-2006  bouyer branches: 1.1.4; 1.1.6; 1.1.14; 1.1.16;
Move interrupt-related PCI functions from pci_machdep.c to
pci_intr_machdep.c. In Xen-3, register access is done the normal way but
interrupts need custom setup. Proposed on port-amd64, port-i386 and
port-xen a week ago.
 1.1.16.4 21-Jan-2008  yamt sync with head
 1.1.16.3 30-Dec-2006  yamt sync with head.
 1.1.16.2 21-Jun-2006  yamt sync with head.
 1.1.16.1 03-Feb-2006  yamt file pci_intr_machdep.c was added on branch yamt-lazymbuf on 2006-06-21 14:57:56 +0000
 1.1.14.1 13-Jul-2006  gdamore Merge from HEAD.
 1.1.6.1 11-Aug-2006  yamt sync with head
 1.1.4.2 18-Feb-2006  yamt sync with head.
 1.1.4.1 03-Feb-2006  yamt file pci_intr_machdep.c was added on branch yamt-uio_vmspace on 2006-02-18 15:38:54 +0000
 1.4.8.2 10-Dec-2006  yamt sync with head.
 1.4.8.1 22-Oct-2006  yamt sync with head
 1.4.6.2 09-Sep-2006  rpaulo sync with head
 1.4.6.1 04-Jul-2006  rpaulo file pci_intr_machdep.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.4.4.1 18-Nov-2006  ad Sync with head.
 1.6.42.1 08-Jan-2008  bouyer Sync with HEAD
 1.6.34.1 18-Feb-2008  mjf Sync with HEAD.
 1.6.28.1 09-Jan-2008  matt sync with HEAD
 1.7.10.5 11-Aug-2010  yamt sync with head.
 1.7.10.4 11-Mar-2010  yamt sync with head
 1.7.10.3 19-Aug-2009  yamt sync with head.
 1.7.10.2 04-May-2009  yamt sync with head.
 1.7.10.1 16-May-2008  yamt sync with head.
 1.7.8.2 04-Jun-2008  yamt sync with head
 1.7.8.1 18-May-2008  yamt sync with head.
 1.7.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.7.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.9.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.9.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.11.2.1 03-Jul-2008  simonb Sync with head.
 1.12.10.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.12.10.5 02-May-2011  jym Sync with head.
 1.12.10.4 10-Jan-2011  jym Sync with HEAD
 1.12.10.3 24-Oct-2010  jym Sync with HEAD
 1.12.10.2 01-Nov-2009  jym Sync with HEAD.
 1.12.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.12.4.1 28-Apr-2009  skrll Sync with HEAD.
 1.14.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.16.2.3 21-Apr-2011  rmind sync with head
 1.16.2.2 05-Mar-2011  rmind sync with head
 1.16.2.1 30-May-2010  rmind sync with head
 1.18.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.23.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.23.2.1 30-Oct-2012  yamt sync with head
 1.25.2.3 03-Dec-2017  jdolecek update from HEAD
 1.25.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.25.2.1 25-Feb-2013  tls resync with head
 1.26.2.1 18-May-2014  rmind sync with head
 1.27.6.5 28-Aug-2017  skrll Sync with HEAD
 1.27.6.4 05-Dec-2016  skrll Sync with HEAD
 1.27.6.3 05-Oct-2016  skrll Sync with HEAD
 1.27.6.2 22-Sep-2015  skrll Sync with HEAD
 1.27.6.1 06-Jun-2015  skrll Sync with HEAD
 1.38.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.40.2.1 13-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #493):
sys/arch/x86/include/intr.h: revision 1.53
sys/arch/x86/pci/pci_intr_machdep.c: revision 1.42
sys/arch/x86/x86/intr.c: revision 1.114 via patch
fix "intrctl list" panic when ACPI is disabled.
reviewed by cherry@n.o and tested by msaitoh@n.o, thanks.
 1.42.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.42.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.42.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.43.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.43.2.1 10-Jun-2019  christos Sync with HEAD
 1.101 26-Sep-2025  tnn x86: correct bootinfo detection for vioif(4) adapters

When we pxeboot a virtio-net adapter in QEMU, firmware reports the
parent virtio(4) pci BDF triplet as the device we can match on, but
device_pci_register must return the child vioif(4) as actual boot device.

PR kern/57023 from Martin Kjellstrand.
 1.100 08-May-2025  riastradh x86/pci_machdep.c: Nix trailing whitespace.

No functional change intended.
 1.99 06-May-2025  manu Search host bridge on all devices from PCI bus 0

We look for host bridge MSI capability to enable MSI on PCI devices,
which requires locating the host bridge itself. Previously we assumed
it was on bus 0, device 0, but that assumption misses some setups.
For instance, Dell Poweredge r760xd2 has its host bridge on bus 0,
device 20, function 4.

This change iterates on all devices on bus 0 to find the host bridge.
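
The scan described in revision 1.99 can be sketched roughly as follows. This
is a minimal illustration, not the NetBSD code: sketch_find_host_bridge() and
the fake sketch_conf_read() are hypothetical names, and the fake bus places a
host bridge at 0:20.4 to mirror the example in the commit message. The sketch
assumes the standard PCI layout (vendor ID in the low 16 bits of register
0x00, class and subclass in the top 16 bits of register 0x08, host bridge =
class 0x06, subclass 0x00) and probes every function of every device.

```c
#include <stdint.h>

#define SKETCH_NO_DEVICE	0xffffffffu	/* all ones: nothing responds */
#define SKETCH_ID_REG		0x00u		/* vendor/device ID register */
#define SKETCH_CLASS_REG	0x08u		/* class/subclass/revision register */

/* Fake bus 0: a single host bridge at device 20, function 4 (assumed IDs). */
static uint32_t
sketch_conf_read(unsigned dev, unsigned func, unsigned reg)
{
	if (dev != 20 || func != 4)
		return SKETCH_NO_DEVICE;
	if (reg == SKETCH_ID_REG)
		return 0x12348086u;	/* made-up device ID, vendor 0x8086 */
	if (reg == SKETCH_CLASS_REG)
		return 0x06000000u;	/* class 0x06, subclass 0x00 */
	return 0;
}

/* Return 1 and fill *devp, *funcp if a host bridge is found on bus 0. */
static int
sketch_find_host_bridge(unsigned *devp, unsigned *funcp)
{
	for (unsigned dev = 0; dev < 32; dev++) {
		for (unsigned func = 0; func < 8; func++) {
			uint32_t id = sketch_conf_read(dev, func, SKETCH_ID_REG);
			uint32_t cls;

			if ((id & 0xffffu) == 0xffffu)
				continue;	/* invalid vendor ID: empty slot */
			cls = sketch_conf_read(dev, func, SKETCH_CLASS_REG);
			if ((cls >> 16) == 0x0600u) {
				*devp = dev;
				*funcp = func;
				return 1;
			}
		}
	}
	return 0;
}
```

A caller would then look for the MSI capability on the device and function
reported back, instead of assuming 0:0.0.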
 1.98 21-Nov-2023  gutteridge branches: 1.98.2;
pci_machdep.c & pci_msi_machdep.c: comment fixes

Correct spelling and grammar in some comments.
 1.97 17-Oct-2023  bouyer Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.
Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen
when running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.
x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console
xen/x86/consinit.c: support genfb as possible console
xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.
xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.
 1.96 16-Oct-2023  bouyer Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.95 25-Aug-2023  riastradh xen: Provide definitions or ifdefs to make drm build in XEN3_DOM0.

No idea if it works, but it builds now.

PR port-xen/49330
 1.94 07-Aug-2023  msaitoh Fix detection of availability of MSI/MSI-X on some systems.

Search all functions on bus 0, device 0 to find a PCI host bridge.
Some CPUs' host bridge is at 0:0.4. Tested on Intel Snow Ridge.
 1.93 06-Sep-2022  msaitoh branches: 1.93.4;
Fix compile error. Compile test only.
 1.92 05-Sep-2022  riastradh x86: Fix interaction between consinit, device_pci_register, and drm.

Leave an essay on what's going on here in both places with
cross-references.

PR kern/56996
 1.91 24-May-2022  bouyer - msipic_construct_msix_pic(): set mp_table_base to memaddr (without
table_offset), this is what Xen wants
while there use pci_conf_write16() in msi_set_msictl_enablebit() too,
for consistency (it seems that Xen accepts the 32bit write at this point,
but this may change).

- xen_map_msix_pirq(): don't forget to set map_irq.table_base in the
MSI-X case, otherwise Xen maps it as MSI
- call pic_hwunmask() after pirq_establish() in msi/msix case, to make sure
the msi-x vector is unmasked.

Now MSI-X works with Xen so stop disabling it in pci_attach_hook().
 1.90 23-May-2022  bouyer Work in progress on MSI/MSI-X on Xen (MSI works on my hardware, more work
needed for MSI-X):
- Xen silently rejects 32 bits writes to MSI configuration registers
(especially when setting PCI_MSI_CTL_MSI_ENABLE/PCI_MSIX_CTL_ENABLE),
it expects 16 bits writes. So introduce a pci_conf_write16(),
only available on XENPV (and working only for mode 1 without
PCI_OVERRIDE_CONF_WRITE) and use it to enable MSI or MSI-X on XENPV.
- for multi-MSI vectors, Xen allocates all of them in a single hypercall,
so it's not convenient to do it at intr_establish() time.
So do it at alloc() time and register the pirqs in the msipic structure.
xen_pic_to_gsi() now just returns the values cached in the msipic.
As a bonus, if the PHYSDEVOP_map_pirq hypercall fails we can fail
the alloc() and we don't need the xen_pci_msi*_probe() hacks.

options NO_PCI_MSI_MSIX still on by default for XEN3_DOM0.
 1.89 15-Oct-2021  jmcneill Disable MSI and MSI-X support if IAPC_BOOT_ARCH reports that MSI is not
supported.
 1.88 28-Jan-2021  jmcneill Remove x86_genfb_mtrr_init. PATs have been available since the Pentium III
and this code has been #if notyet'd shortly after being introduced.
 1.87 04-May-2020  jdolecek branches: 1.87.2;
add support for using MSI for XenPV Dom0

use PHYSDEVOP_map_pirq to get the pirq/gsi for MSI/MSI-X, switch also INTx
to use it instead of PHYSDEVOP_alloc_irq_vector

MSI confirmed working with single-vector MSI for wm(4), ahcisata(4), bge(4)

XXX added some provision for MSI-X, but it doesn't actually work (no interrupts
delivered), needs some further investigation; disable MSI-X for XENPV
via flag in x86/pci/pci_machdep.c
 1.86 24-May-2019  nonaka branches: 1.86.2;
Added drivers for Hyper-V Synthetic Keyboard and Video device.
 1.85 17-May-2019  christos Factor out the fbinfo setting code, to make it more readable, and use
memcpy to properly align the structure (although it does not matter on x86).
 1.84 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.83 10-Jul-2018  maxv Fix bug, SPINOUT() is not supposed to take the value given to BACKOFF().
Here the exponential backoff is wrecked.
 1.82 23-Jun-2018  jakllsch branches: 1.82.2;
Disable all contemporary mode 1 quirks.
 1.81 23-Jun-2018  jakllsch If mode 1 enable check fails, give mode 1 a second chance by trying to
use it to locate a PCI Host Bridge or device from vendor that produced
a chipset lacking a Host Bridge class device.

Should allow us to remove most all the mode 1 quirks added in the last
two decades.
 1.80 11-Apr-2018  nonaka efiboot reports parent ppb bus/device/function of booted network interface.
 1.79 01-Jun-2017  chs branches: 1.79.2; 1.79.8;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.78 25-Feb-2017  nonaka Make EFI console drawing faster by using shadowfb.
 1.77 09-Feb-2017  msaitoh Suppress verbose message "This pci host supports neither MSI nor MSI-X."
on VMware and KVM. OK'd by k-nakahara.
 1.76 25-Aug-2016  nonaka branches: 1.76.2;
more fix line break position in verbose message.
 1.75 25-Aug-2016  knakahara fix line break position in verbose message.

pointed out by nonaka@n.o, thanks.
 1.74 10-Jun-2016  jakllsch branches: 1.74.2;
Avoid trying to create a tag for locating AMD HyperTransport bridge that will
panic a machine that uses Configuration Mechanism 2.
 1.73 26-Nov-2015  jakllsch Move acpimcfg_map_bus() before no-MSI bailout in pci_attach_hook().
 1.72 02-Nov-2015  knakahara Add verbose messages when the kernel disables MSI/MSI-X.
 1.71 02-Oct-2015  msaitoh PCI Extended Configuration stuff written by nonaka@:
- Add PCI Extended Configuration Space support into x86.
- Check register offset of pci_conf_read() in MD part. It returns (pcireg_t)-1
if it isn't accessible.
- Decode Extended Capability in PCI Extended Configuration Space.
Currently the following extended capabilities are decoded:
- Advanced Error Reporting
- Virtual Channel
- Device Serial Number
- Power Budgeting
- Root Complex Link Declaration
- Root Complex Event Collector Association
- Access Control Services
- Alternative Routing-ID Interpretation
- Address Translation Services
- Single Root IO Virtualization
- Page Request
- TPH Requester
- Latency Tolerance Reporting
- Secondary PCI Express
- Process Address Space ID
- LN Requester
- L1 PM Substates
The following extended capabilities are not decoded yet:
- Root Complex Internal Link Control
- Multi-Function Virtual Channel
- RCRB Header
- Vendor Unique
- Configuration Access Correction
- Multiple Root IO Virtualization
- Multicast
- Resizable BAR
- Dynamic Power Allocation
- Protocol Multiplexing
- Downstream Port Containment
- Precision Time Management
- M-PCIe
- Function Reading Status Queueing
- Readiness Time Reporting
- Designated Vendor-Specific
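
The offset check mentioned in revision 1.71 (pci_conf_read() returning
(pcireg_t)-1 for a register that is not accessible) might look roughly like
the sketch below. All names are hypothetical. The sketch assumes the usual
sizes of 256 bytes for conventional configuration space and 4096 bytes for
PCI Express extended configuration space, and it reads from an in-memory
array rather than real hardware.

```c
#include <stdint.h>

typedef uint32_t sketch_pcireg_t;

#define SKETCH_CONF_SIZE	256u	/* conventional configuration space */
#define SKETCH_EXTCONF_SIZE	4096u	/* extended configuration space */

/*
 * Read one 32-bit register from a simulated configuration space.
 * has_extconf says whether the extended space is reachable.
 */
static sketch_pcireg_t
sketch_checked_conf_read(const sketch_pcireg_t *space, int has_extconf,
    unsigned reg)
{
	unsigned limit = has_extconf ? SKETCH_EXTCONF_SIZE : SKETCH_CONF_SIZE;

	if (reg >= limit)
		return (sketch_pcireg_t)-1;	/* offset not accessible */
	return space[reg / 4];
}
```

With this shape, a caller that walks the extended capability list at offset
0x100 sees all ones on a host that exposes only the conventional space and
can stop decoding there.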
 1.70 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.69 07-Nov-2014  christos branches: 1.69.2;
print the bad values in panic messages
 1.68 05-Nov-2014  christos we don't need to keep track of curmode if not vga_post.
 1.67 06-May-2014  christos branches: 1.67.2;
tidy up.
 1.66 06-May-2014  sborrill Force pci_mode 1 when running as Xen HVM domU to allow cd* to be detected
correctly. Fixes kern/48770. Thanks to cube@
 1.65 27-Jan-2014  jakllsch branches: 1.65.2;
Stopgap to prevent genfb from stealing console. Revisit later.
 1.64 26-Jan-2014  msaitoh PUCCN improvements:
- Fix a bug that the puc cn mechanism doesn't use the UART's frequency
in pucdata.c's table.

- Add a new option PUC_CNAUTO. If this option is set, consinit() in
x86/x86/consinit.c checks puc com device to use it as console.
Without this option, the behavior is the same as before.

- Add a new config parameter PUC_CNBUS. The old code scans bus #0 only.
If PUC_CNBUS is set, the specified number's bus will be scanned.

- Rename comcnprobe() to puc_cnprobe() to make it clear.

- Rename comcninit() to puc_cninit() to make it clear.

- Add code for devices whose com registers are MMIO (#if 0'ed).
 1.63 25-Dec-2013  jakllsch Give cpu_comcnprobe a chance of working on Mode 2 PCI config space.
 1.62 17-Oct-2013  christos remove set but unused variables
 1.61 05-Oct-2013  gson Force PCI mode 1 when running under QEMU, to work around QEMU bug 897771.
This should also make it possible to boot NetBSD under versions of KVM
that have inherited said QEMU bug. Fixes PR kern/45671.
 1.60 31-Jul-2013  macallan hand genfb the virtual address of the actual framebuffer, not the upper
left corner of the text area
now centering works and we don't scribble past the mapped VRAM when trying
to clear the screen in 32bit colour
 1.59 31-Jul-2013  soren Blocking memory space accesses on the SIS 85C496 chipset turned out to be
a bit too heavy-handed and similar cases are unlikely to crop up again,
so simplify by eliminating pci_bus_flags().

Closes PR port-i386/20410.
 1.58 22-Jul-2013  soren Allow console on com_puc without a compile-time option so that PC servers
can become headless after the first reboot (sadly, e.g. Intel AMT presents
as a com_puc, but doesn't appear in the BIOS serial port table, so you need
a keyboard and monitor to install and set the installboot parameters first).

Fix com_puc console on devices with offset BAR's.
 1.57 03-May-2013  jakllsch branches: 1.57.4; 1.57.6;
use IO_VGA as symbolic constant for 0x3c0 in x86_genfb_set_mapreg()
 1.56 01-Mar-2012  jakllsch branches: 1.56.2;
slightly rework pcim1_quirk_tbl[]-related bits:
- put patchable entry first so finding it with gdb/ddb is more trivial
- use pcitag_t instead of uint32_t for tag
- make this table const
- drop old #undef
- use NULL instead of 0 where appropriate.
 1.55 28-Feb-2012  jakllsch cosmetic, spelling, and grammar adjustments
 1.54 15-Feb-2012  tsutsui branches: 1.54.2; 1.54.6; 1.54.8;
Add VIA VX900 host bridge to a buggy PCI mode 1 quirk table. PR/46018
Ok releng@
 1.53 18-Nov-2011  jmcneill branches: 1.53.4;
remove Xbox support
 1.52 18-Oct-2011  dyoung branches: 1.52.2;
Factor device_isa_register() and device_pci_register() out of
device_register() and stick the new routines into isa_machdep.c and
pci_machdep.c, respectively.
 1.51 13-Sep-2011  dyoung Bracket a debugging printf() with #ifdef DEBUG.
 1.50 01-Sep-2011  christos Add bus_dma overrides. From dyoung
 1.49 29-Aug-2011  dyoung Move the code for grovelling in PCI configuration space for assigned
memory & I/O regions into its own module, pci_ranges.c, so that we can
leave it out on systems that won't need it.
 1.48 28-Aug-2011  dyoung Add some code for grovelling in the PCI configuration space for all
of the memory & I/O space reserved by the PCI BIOS for PCI devices
(including bridges) and recording that information for later use.

The code takes between 13k and 50k (depends on the architecture and,
bizarrely, the kernel configuration) so I am going to move it from
pci_machdep.c into its own module on Monday.
 1.47 28-Aug-2011  dyoung Make the override implementation more concise. Saves about three lines
of code per routine, makes it more explicit what's going on, and avoids
recursion, though the compiler probably optimized the tail recursion in
the old code.
 1.46 27-Aug-2011  christos use c99 struct initializers
 1.45 17-May-2011  dyoung PCI_FLAGS_IO_ENABLED and PCI_FLAGS_MEM_ENABLED changed their functional
role in NetBSD (drivers are no longer supposed to write these to
pa_flags) without changing name. Correct that.

Rename PCI_FLAGS_IO_ENABLED to PCI_FLAGS_IO_OKAY and
PCI_FLAGS_MEM_ENABLED to PCI_FLAGS_MEM_OKAY, thus making their names
consistent with the other PCI flags and poisoning 3rd-party driver
sources that use the flags in the old bad way.

This patch produces no binary changes in this set of PCI kernels when
they are compiled w/o 'options DIAGNOSTIC' and w/ -V MKREPRO=yes:

algor P4032 P5064 P6032
alpha GENERIC
amd64 GENERIC XEN3_DOM0
arc GENERIC
atari HADES MILAN-PCIIDE
bebox GENERIC
cats GENERIC
cobalt GENERIC
evbarm-el ADI_BRH ARMADILLO9 CP3100 GEMINI GEMINI_MASTER GEMINI_SLAVE
evbarm-el GUMSTIX HDL_G IMX31LITE INTEGRATOR IQ31244 IQ80310 IQ80321
evbarm-el IXDP425 IXM1200 KUROBOX_PRO
evbarm-el LUBBOCK MARVELL_NAS NAPPI NSLU2 SHEEVAPLUG SMDK2800 TEAMASA_NPWR
evbarm-el TEAMASA_NPWR_FC TS7200 TWINTAIL ZAO425
evbmips-el AP30 DBAU1500 DBAU1550 MALTA MERAKI MTX-1 OMSAL400 RB153 WGT624V3
evbmips64-el XLSATX
evbppc EV64260 MPC8536DS MPC8548CDS OPENBLOCKS200 OPENBLOCKS266
evbppc OPENBLOCKS266_OPT P2020RDB PMPPC RB800 WALNUT
hp700 GENERIC
i386 ALL XEN3_DOM0 XEN3_DOMU
ibmnws GENERIC
iyonix GENERIC
landisk GENERIC
macppc GENERIC
mvmeppc GENERIC
netwinder GENERIC
ofppc GENERIC
prep GENERIC
sandpoint GENERIC
sbmips-el GENERIC
sgimips GENERIC32_IP2x GENERIC32_IP3x
sparc GENERIC_SUN4U KRUPS
sparc64 GENERIC
 1.44 30-Apr-2010  dyoung branches: 1.44.2;
Add an exponential back-off to the loop that tries to lock the
PCI configuration-access registers, to avoid excessive cacheline
ping-ponging. Thanks to rmind@ for the tip.
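
The back-off in revision 1.44 can be illustrated with a small spin lock. This
is a sketch under stated assumptions and not the kernel code: it uses C11
atomics instead of the kernel's primitives, and the names and the limits
SKETCH_BACKOFF_MIN and SKETCH_BACKOFF_MAX are made up for the example.

```c
#include <stdatomic.h>

#define SKETCH_BACKOFF_MIN	1u
#define SKETCH_BACKOFF_MAX	1024u

static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

/* Double the pause length, saturating at the maximum. */
static unsigned
sketch_backoff_next(unsigned count)
{
	return count >= SKETCH_BACKOFF_MAX / 2 ? SKETCH_BACKOFF_MAX : count * 2;
}

static void
sketch_lock_acquire(void)
{
	unsigned count = SKETCH_BACKOFF_MIN;

	while (atomic_flag_test_and_set_explicit(&sketch_lock,
	    memory_order_acquire)) {
		/* Pause without touching the contended cache line. */
		for (volatile unsigned i = 0; i < count; i++)
			continue;
		count = sketch_backoff_next(count);
	}
}

static void
sketch_lock_release(void)
{
	atomic_flag_clear_explicit(&sketch_lock, memory_order_release);
}
```

Each failed attempt doubles the pause, so waiting CPUs retry the atomic
operation less and less often while the lock stays held.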
 1.43 28-Apr-2010  dyoung Provide an x86 implementation of pci_chipset_tag_create(9) and
pci_chipset_tag_destroy(9).
 1.42 27-Apr-2010  dyoung Make pci_conf_read(9) and pci_conf_write(9) re-entrant so that the
kernel can use them in an NMI trap handler. Only one CPU can be
in _read() or _write() at once. However, on any single CPU, more
than one thread of execution (LWP, interrupt handler, trap handler)
may be in _read() or _write() at once, because each thread saves
and restores the PCI configuration-access state.
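
The save-and-restore idea behind revision 1.42 can be illustrated with a
simulated address register. This is a sketch under stated assumptions, not
the kernel implementation: sketch_conf_addr stands in for the hardware
configuration-address register, sketch_data_port() is a made-up stand-in for
the data port, and the cross-CPU locking is omitted.

```c
#include <stdint.h>

static uint32_t sketch_conf_addr;	/* simulated configuration-address register */

/* Made-up data port: returns a value derived from the selected address. */
static uint32_t
sketch_data_port(void)
{
	return ~sketch_conf_addr;
}

/*
 * Read through the simulated registers, then put back whatever selector
 * was active on entry, so an interrupted access can resume undisturbed.
 */
static uint32_t
sketch_conf_read_reentrant(uint32_t selector)
{
	uint32_t saved = sketch_conf_addr;
	uint32_t value;

	sketch_conf_addr = selector;
	value = sketch_data_port();
	sketch_conf_addr = saved;
	return value;
}
```

If a trap handler runs this routine between an outer caller's select and
access steps, the outer caller still finds its own selector in place when it
resumes.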
 1.41 14-Mar-2010  dyoung branches: 1.41.2;
Add a new member, pc_super, to x86's pci_chipset_tag: pc.pc_super points
to the tag that pc inherits its behavior from. Add code to deal with
pc.pc_super.

Pull identical declarations out of xen/include/pci_machdep.h and
x86/include/pci_machdep.h into x86/include/pci_machdep_common.h.
 1.40 25-Feb-2010  dyoung In the x86 pci(9) implementation, test for and call a
pci_chipset_tag_t's override functions.
 1.39 16-Feb-2010  dyoung PCI Configuration Mechanisms #1 and #2 are controlled by two to
three registers. Let us think of the kernel operating the registers
in two steps:

1) Select: enable configuration cycles and select a range of
configuration-space addresses.

2) Access: read or write a word in PCI configuration space.

To make the steps more explicit, extract some helper subroutines
from pci_conf_read(9) and pci_conf_write(9):

pci_conf_selector(tag, reg): from a pcitag_t and a register offset,
create a word that enables configuration cycles and selects a
configuration address range.

pci_conf_select(w): for `w' a word created by pci_conf_selector(),
enable configuration cycles and select the address range indicated
by `w'.

pci_conf_select(0): disable configuration cycles.

pci_conf_port(tag, reg): map a pcitag_t and a register offset to an I/O
port where the configuration access should occur.

While I'm in here, change the panic(9) calls to panic("%s: ...",
__func__) instead of hard-coding a subroutine name.
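
For Configuration Mechanism #1, the "select" word from revision 1.39 can be
sketched as below. This is an illustration, not the NetBSD helper: the real
pci_conf_selector() takes a pcitag_t, whereas the hypothetical
sketch_mode1_selector() takes bus, device and function numbers directly. It
assumes the standard mechanism #1 layout (enable bit 31, bus in bits 23:16,
device in bits 15:11, function in bits 10:8, register offset in bits 7:2).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MODE1_ENABLE	0x80000000u	/* enables configuration cycles */

/* Build the word written to the configuration-address port. */
static uint32_t
sketch_mode1_selector(unsigned bus, unsigned dev, unsigned func, unsigned reg)
{
	assert(bus < 256 && dev < 32 && func < 8);
	assert(reg < 256 && (reg & 3) == 0);	/* 32-bit aligned offset */
	return SKETCH_MODE1_ENABLE | (bus << 16) | (dev << 11) |
	    (func << 8) | reg;
}
```

Writing such a word corresponds to the select step, and writing 0 clears the
enable bit, which matches the "pci_conf_select(0): disable configuration
cycles" case above.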
 1.38 16-Feb-2010  dyoung Get rid of all PCI_CONF_MODE #ifdef'age except for the little bit
that initializes pci_mode, which I have moved to the top.

Make pci_mode private to pci_machdep.c.

Provide pci_mode_set() for pcibios.c to configure the PCI Configuration
Mechanism. KASSERT() in pci_mode_set() that the mechanism is not
changing from anything but the "don't know" value, -1.
 1.37 18-Aug-2009  jmcneill branches: 1.37.2;
Switch to ACPICA 20090730, and update for API changes.
 1.36 03-Jul-2009  drochner add SIS 740 to the list of chipsets known to implement PCI configuration
mode 1 incorrectly, from Jason White
(see thread "ACPI issue with old Shuttle system" on port-i386)
 1.35 15-Mar-2009  cegger ansify function definitions
 1.34 28-Apr-2008  martin branches: 1.34.8; 1.34.10; 1.34.14;
Remove clause 3 and 4 from TNF licenses
 1.33 16-Apr-2008  cegger branches: 1.33.2; 1.33.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.32 21-Mar-2008  dyoung Use device_t.
 1.31 14-Jan-2008  dyoung branches: 1.31.2; 1.31.6;
KASSERT() that reads/writes from/to PCI configuration space are
aligned on 32-bit boundaries.
 1.30 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.29 04-Jan-2008  ad sys/lock.h isn't needed here.
 1.28 17-Oct-2007  garbled branches: 1.28.2; 1.28.8;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.27 29-Aug-2007  dyoung Use __arraycount().
 1.26 22-Jul-2007  mjf branches: 1.26.4; 1.26.6;
Remove newline from format string of aprint_normal.

Thanks to pooka@ for pointing it out.
 1.25 19-Jul-2007  mjf s/aprintf_normal/aprint_normal
 1.24 19-Jul-2007  mjf Change printf to aprintf_normal and add a newline as requested by Christoph Egger on port-xen.
 1.23 09-Jul-2007  ad branches: 1.23.2; 1.23.4;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.22 22-Feb-2007  matt branches: 1.22.4; 1.22.6; 1.22.12;
Add missing initializer for _tag_needs_count
 1.21 21-Feb-2007  mrg add a pair of new bus_dma(9) functions:
int _bus_dmatag_subregion(bus_dma_tag_t tag,
bus_addr_t min_addr,
bus_addr_t max_addr,
bus_dma_tag_t *newtag,
int flags)
void _bus_dmatag_destroy(bus_dma_tag_t tag)

that allow a (normally broken/limited) device to restrict the bus address
range it can talk to. this is used by bce(4) to limit DMA addresses to
1GB range, the maximum the chip can address.

all this is from Yorick Hardy <yhardy@uj.ac.za> with input from several
people on tech-kern.

XXX: bus_dma(9) needs an update still.
 1.20 06-Feb-2007  jmcneill branches: 1.20.2;
On Xbox, disallow pci_conf_read/pci_conf_write calls for bus 0 device 0
functions 1 and 2.
 1.19 05-Jan-2007  jmcneill On the Xbox, prevent scanning past the first device on bus 1.
 1.18 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.17 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.16 04-Jul-2006  christos branches: 1.16.4; 1.16.6;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.15 25-Jun-2006  soren Add quirk for the not quite standard PCI bus in Parallels Desktop for Mac.
 1.14 07-Feb-2006  bouyer branches: 1.14.2; 1.14.10;
Add back proper MPBIOS/MPACPI handling.
 1.13 03-Feb-2006  bouyer branches: 1.13.2;
Move interrupt-related PCI functions from pci_machdep.c to
pci_intr_machdep.c. In Xen-3, register access is done the normal way but
interrupts need custom setup. Proposed on port-amd64, port-i386 and
port-xen a week ago.
 1.12 16-Nov-2005  christos branches: 1.12.2; 1.12.4;
PR/31885: Stephane Witzmann: Force pci mode 1 on SIS 741.
 1.11 20-Jun-2005  sekiya branches: 1.11.2; 1.11.8;
pci_device_foreach(), pci_device_foreach_min(), pci_bridge_foreach(), and
pci_bridge_hook don't actually have any dependencies on PCIBIOS-specific code,
and they can be used to fixup PCI bus numbering in the absence of the BIOS.

To that end, decouple them from PCIBIOS.
 1.10 16-Apr-2005  yamt tweak x86 bus_dma code so that it can be used by xen port.

- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.9 30-Oct-2003  fvdl branches: 1.9.8; 1.9.14;
* keep track of PCI buses that aren't known by firmware, but are found
by NetBSD
* use this info in intr_find_mpmapping
* get rid of the last argument to intr_find_mpmapping, it was redundant
 1.8 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
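The parent-bus fallback in revision 1.8 can be sketched as follows. This is a minimal illustration, not the code from the commit: `struct demo_bus`, `demo_swizzle_pin()` and `demo_find_mapped_bus()` are hypothetical names, and the swizzle shown is the conventional PCI-to-PCI bridge rotation with pins numbered 1 to 4 for INTA# to INTD#.

```c
#include <stddef.h>

/* Hypothetical bus node; the real kernel structures differ. */
struct demo_bus {
	const struct demo_bus *parent;	/* NULL for a root bus */
	int bridge_dev;		/* device number of the bridge on the parent bus */
	int has_mappings;	/* nonzero if the MP table lists this bus */
};

/* Conventional bridge swizzle: rotate the pin by the device number. */
int
demo_swizzle_pin(int pin, int device)
{
	return ((pin - 1 + device) % 4) + 1;
}

/*
 * Walk towards the root until a bus with mappings is found. On the way,
 * *dev and *pin are rewritten to what should be looked up on the parent:
 * the swizzled pin and the bridge's device number.
 */
const struct demo_bus *
demo_find_mapped_bus(const struct demo_bus *bus, int *dev, int *pin)
{
	while (bus != NULL && !bus->has_mappings) {
		*pin = demo_swizzle_pin(*pin, *dev);
		*dev = bus->bridge_dev;
		bus = bus->parent;
	}
	return bus;
}
```

In this model a device 5 asserting INTA# behind an unmapped bridge at device 30 is looked up on the parent bus as device 30, INTB#.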
 1.7 06-Sep-2003  fvdl Move the bulk of pci_intr_string into a separate intr_string function. Use
that new function to print the pciide compat interrupt in pciide_machdep.c.
Share pciide_machdep.c between amd64 and i386.
 1.6 06-Sep-2003  fvdl If possible, put the device name of the APIC used into the interrupt string,
not "apic N". This makes it easier to match vmstat output with dmesg output.
 1.5 15-Jun-2003  fvdl branches: 1.5.2;
Handle 64bit DMA addresses on PCI for platforms that can (currently only
enabled on amd64). Add a dmat64 field to various PCI attach structures,
and pass it down where needed. Implement a simple new function called
pci_dma64_available(pa) to test if 64bit DMA addresses may be used.
This returns 1 iff _PCI_HAVE_DMA64 is defined in <machine/pci_machdep.h>,
and there is more than 4G of memory.
 1.4 29-May-2003  fvdl Add the options MPBIOS_SCANPCI and MPACPI_SCANPCI to configure PCI roots
with the MPBIOS/ACPI bus information, by walking through the buses, and
descending down every bus that hasn't been marked configured yet.
 1.3 07-May-2003  fvdl Generalize bounce buffers, and use them for 32 bit PCI if needed.
Make ALLOCNOW the default iff bouncing might be needed (this has
no effect on i386 because ISA DMA devices already had to use
ALLOCNOW, and PCI isn't bounced (yet), since we don't do > 4G
at this point for i386).
 1.2 28-Apr-2003  fvdl Include "eisa.h" in order to get the NEISA value; without it EISA-only
MP intr routing tables wouldn't be searched.
 1.1 27-Feb-2003  fvdl Moved here from i386/pci.
 1.5.2.5 11-Dec-2005  christos Sync with head.
 1.5.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.5.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.1 03-Aug-2004  skrll Sync with HEAD
 1.9.14.2 21-Nov-2005  tron Pull up following revision(s) (requested by christos in ticket #966):
sys/arch/x86/pci/pci_machdep.c: revision 1.12
PR/31885: Stephane Witzmann: Force pci mode 1 on SIS 741.
 1.9.14.1 21-Apr-2005  tron Pull up revision 1.10 (requested by yamt in ticket #175):
tweak x86 bus_dma code so that it can be used by xen port.
- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.9.8.1 29-Apr-2005  kent sync with -current
 1.11.8.1 22-Nov-2005  yamt sync with head.
 1.11.2.6 24-Mar-2008  yamt sync with head.
 1.11.2.5 21-Jan-2008  yamt sync with head
 1.11.2.4 03-Sep-2007  yamt sync with head.
 1.11.2.3 26-Feb-2007  yamt sync with head.
 1.11.2.2 30-Dec-2006  yamt sync with head.
 1.11.2.1 21-Jun-2006  yamt sync with head.
 1.12.4.1 09-Sep-2006  rpaulo sync with head
 1.12.2.1 18-Feb-2006  yamt sync with head.
 1.13.2.1 22-Apr-2006  simonb Sync with head.
 1.14.10.1 13-Jul-2006  gdamore Merge from HEAD.
 1.14.2.2 11-Aug-2006  yamt sync with head
 1.14.2.1 26-Jun-2006  yamt sync with head.
 1.16.6.2 10-Dec-2006  yamt sync with head.
 1.16.6.1 22-Oct-2006  yamt sync with head
 1.16.4.3 09-Feb-2007  ad Sync with HEAD.
 1.16.4.2 12-Jan-2007  ad Sync with head.
 1.16.4.1 18-Nov-2006  ad Sync with head.
 1.20.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.22.12.1 03-Oct-2007  garbled Sync with HEAD
 1.22.6.1 11-Jul-2007  mjf Sync with head.
 1.22.4.3 09-Oct-2007  ad Sync with head.
 1.22.4.2 20-Aug-2007  ad Sync with HEAD.
 1.22.4.1 10-Apr-2007  ad Replace some more locks.
 1.23.4.2 03-Sep-2007  skrll Sync with HEAD.
 1.23.4.1 15-Aug-2007  skrll Sync with HEAD.
 1.23.2.1 07-Aug-2007  matt Sync with HEAD.
 1.26.6.3 23-Mar-2008  matt sync with HEAD
 1.26.6.2 09-Jan-2008  matt sync with HEAD
 1.26.6.1 06-Nov-2007  matt sync with HEAD
 1.26.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.28.8.2 19-Jan-2008  bouyer Sync with HEAD
 1.28.8.1 08-Jan-2008  bouyer Sync with HEAD
 1.28.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.31.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.31.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.31.2.1 24-Mar-2008  keiichi sync with head.
 1.33.4.6 11-Aug-2010  yamt sync with head.
 1.33.4.5 11-Mar-2010  yamt sync with head
 1.33.4.4 19-Aug-2009  yamt sync with head.
 1.33.4.3 18-Jul-2009  yamt sync with head.
 1.33.4.2 04-May-2009  yamt sync with head.
 1.33.4.1 16-May-2008  yamt sync with head.
 1.33.2.1 18-May-2008  yamt sync with head.
 1.34.14.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.34.14.4 24-Oct-2010  jym Sync with HEAD
 1.34.14.3 01-Nov-2009  jym Sync with HEAD.
 1.34.14.2 23-Jul-2009  jym Sync with HEAD.
 1.34.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.34.10.2 21-May-2014  bouyer Pull up following revision(s) (requested by sborrill in ticket #1903):
sys/arch/x86/pci/pci_machdep.c: revision 1.61 via patch
sys/arch/x86/pci/pci_machdep.c: revision 1.66 via patch
Force pci_mode 1 when running as Xen HVM domU to allow cd* to be
detected correctly. Fixes kern/48770. Thanks to cube@
Force PCI mode 1 when running under QEMU, to work around QEMU bug 897771.
This should also make it possible to boot NetBSD under versions of KVM
that have inherited said QEMU bug. Fixes PR kern/45671.
 1.34.10.1 19-May-2012  riz Pull up following revision(s) (requested by gendalia in ticket #1755):
sys/arch/x86/pci/pci_machdep.c: revision 1.36
add SIS 740 to the list of chipsets known to implement PCI configuration
mode 1 incorrectly, from Jason White
(see thread "ACPI issue with old Shuttle system" on port-i386)
 1.34.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.37.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.37.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.41.2.2 31-May-2011  rmind sync with head
 1.41.2.1 30-May-2010  rmind sync with head
 1.44.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.52.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.52.2.1 17-Apr-2012  yamt sync with head
 1.53.4.4 06-Mar-2012  mrg sync to -current
 1.53.4.3 06-Mar-2012  mrg sync to -current
 1.53.4.2 04-Mar-2012  mrg sync to latest -current.
 1.53.4.1 18-Feb-2012  mrg merge to -current.
 1.54.8.1 20-Oct-2013  bouyer Apply patch, requested by gson in ticket #963:
sys/arch/x86/pci/pci_machdep.c 1.61 via patch

Force PCI mode 1 when running under QEMU, to work around
QEMU bug 897771.
This should also make it possible to boot NetBSD under versions of KVM
that have inherited said QEMU bug. Fixes PR kern/45671.
 1.54.6.1 20-Oct-2013  bouyer Apply patch, requested by gson in ticket #963:
sys/arch/x86/pci/pci_machdep.c 1.61 via patch

Force PCI mode 1 when running under QEMU, to work around
QEMU bug 897771.
This should also make it possible to boot NetBSD under versions of KVM
that have inherited said QEMU bug. Fixes PR kern/45671.
 1.54.2.2 21-May-2014  bouyer Pull up following revision(s) (requested by sborrill in ticket #1060):
sys/arch/x86/pci/pci_machdep.c: revision 1.66
Force pci_mode 1 when running as Xen HVM domU to allow cd* to be
detected correctly. Fixes kern/48770. Thanks to cube@
 1.54.2.1 20-Oct-2013  bouyer Apply patch, requested by riastradh in ticket #962:
sys/arch/x86/pci/pci_machdep.c 1.61 via patch

Force PCI mode 1 when running under QEMU, to work around
QEMU bug 897771.
This should also make it possible to boot NetBSD under versions of KVM
that have inherited said QEMU bug. Fixes PR kern/45671.
 1.56.2.3 03-Dec-2017  jdolecek update from HEAD
 1.56.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.56.2.1 23-Jun-2013  tls resync from head
 1.57.6.1 23-Jul-2013  riastradh sync with HEAD
 1.57.4.2 18-May-2014  rmind sync with head
 1.57.4.1 28-Aug-2013  rmind sync with head
 1.65.2.1 10-Aug-2014  tls Rebase.
 1.67.2.1 25-Jan-2015  martin Pull up following revision(s) (requested by nonaka in ticket #451):
sys/arch/x86/pci/pci_machdep.c: revision 1.68
we don't need to keep track of curmode if not vga_post.
 1.69.2.5 28-Aug-2017  skrll Sync with HEAD
 1.69.2.4 05-Oct-2016  skrll Sync with HEAD
 1.69.2.3 09-Jul-2016  skrll Sync with HEAD
 1.69.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.69.2.1 06-Jun-2015  skrll Sync with HEAD
 1.74.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.76.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.79.8.3 28-Jul-2018  pgoyette Sync with HEAD
 1.79.8.2 25-Jun-2018  pgoyette Sync with HEAD
 1.79.8.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.79.2.3 23-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1890):

sys/arch/x86/pci/pci_machdep.c: revision 1.94

Fix detection of availability of MSI/MSI-X on some systems.

Try to find all functions on bus 0, device 0 to find a PCI host bridge.
Some CPUs' host bridge is at 0:0.4. Tested on Intel Snow Ridge.
 1.79.2.2 12-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1280):

sys/arch/x86/x86/consinit.c: revision 1.29
sys/dev/hyperv/vmbusvar.h: revision 1.2
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/arch/x86/x86/x86_autoconf.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.91
sys/arch/x86/x86/hyperv.c: revision 1.2
sys/arch/x86/x86/hyperv.c: revision 1.3
sys/arch/x86/x86/hyperv.c: revision 1.4
sys/arch/i386/conf/GENERIC: revision 1.1207
sys/dev/wscons/wsconsio.h: revision 1.123
sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.528
sys/dev/hyperv/files.hyperv: revision 1.2
sys/arch/x86/include/autoconf.h: revision 1.6
sys/dev/hyperv/hyperv_common.c: revision 1.2
sys/arch/xen/x86/autoconf.c: revision 1.23
sys/arch/x86/pci/pci_machdep.c: revision 1.86
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hypervvar.h: revision 1.2
sys/dev/acpi/vmbus_acpi.c: revision 1.2
sys/dev/hyperv/vmbus.c: revision 1.3
sys/dev/hyperv/hvkbdvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.79.2.1 11-Apr-2018  martin Pull up following revision(s) (requested by nonaka in ticket #740):

sys/arch/x86/pci/pci_machdep.c: revision 1.80

efiboot reports parent ppb bus/device/function of booted network interface.
 1.82.2.1 10-Jun-2019  christos Sync with HEAD
 1.86.2.2 16-May-2025  martin Pull up following revision(s) (requested by manu in ticket #1955):

sys/arch/x86/pci/pci_machdep.c: revision 1.99 (via patch)

Search host bridge on all devices from PCI bus 0

We look for host bridge MSI capability to enable MSI on PCI devices,
which requires locating the host bridge itself. Previously we assumed
it was on bus 0, device 0, but that assumption misses some setups.

For instance, Dell Poweredge r760xd2 has its host bridge on bus 0,
device 20, function 4.

This change iterates on all devices on bus 0 to find the host bridge.
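The search described in this pull-up can be sketched as a scan over every device and function on bus 0. This is an illustration only: `demo_class_read_t`, `demo_read_class_sample()` and `demo_find_host_bridge()` are hypothetical, the configuration-space reader is a stub rather than a real hardware access, and the actual change may decide which functions to probe differently (for example by checking the multi-function bit first). The class register layout (base class in bits 31:24, sub-class in bits 23:16 of offset 0x08) follows the PCI specification.

```c
#include <stdint.h>

#define DEMO_CLASS_BRIDGE	0x06	/* PCI base class: bridge */
#define DEMO_SUBCLASS_HOST	0x00	/* PCI sub-class: host bridge */

/*
 * Hypothetical reader: returns the 32-bit class register (offset 0x08),
 * or 0xffffffff if nothing responds at that address.
 */
typedef uint32_t (*demo_class_read_t)(int bus, int dev, int func);

/* Sample reader modelling the layout from the log: host bridge at 0:20.4. */
uint32_t
demo_read_class_sample(int bus, int dev, int func)
{
	if (bus == 0 && dev == 20 && func == 4)
		return 0x06000000U;
	return 0xffffffffU;
}

/* Returns (dev << 3 | func) of the first host bridge on bus 0, or -1. */
int
demo_find_host_bridge(demo_class_read_t read_class)
{
	for (int dev = 0; dev < 32; dev++) {
		for (int func = 0; func < 8; func++) {
			uint32_t reg = read_class(0, dev, func);

			if (reg == 0xffffffffU)
				continue;	/* nothing decodes here */
			if (((reg >> 24) & 0xff) == DEMO_CLASS_BRIDGE &&
			    ((reg >> 16) & 0xff) == DEMO_SUBCLASS_HOST)
				return (dev << 3) | func;
		}
	}
	return -1;
}
```

With the sample reader, a scan limited to device 0 would find nothing, while the full scan reaches device 20, function 4.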
 1.86.2.1 23-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1722):

sys/arch/x86/pci/pci_machdep.c: revision 1.94

Fix detection of availability of MSI/MSI-X on some systems.

Try to find all functions on bus 0, device 0 to find a PCI host bridge.
Some CPUs' host bridge is at 0:0.4. Tested on Intel Snow Ridge.
 1.87.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.93.4.5 16-May-2025  martin Pull up following revision(s) (requested by manu in ticket #1120):

sys/arch/x86/pci/pci_machdep.c: revision 1.99 (via patch)

Search host bridge on all devices from PCI bus 0

We look for host bridge MSI capability to enable MSI on PCI devices,
which requires locating the host bridge itself. Previously we assumed
it was on bus 0, device 0, but that assumption misses some setups.

For instance, Dell Poweredge r760xd2 has its host bridge on bus 0,
device 20, function 4.

This change iterates on all devices on bus 0 to find the host bridge.
 1.93.4.4 21-Oct-2023  martin Apply patch, requested by bouyer in ticket #433:

sys/arch/x86/pci/pci_machdep.c (apply patch)
sys/arch/x86/x86/genfb_machdep.c (apply patch)

Fix build of XEN kernels with genfb(4)
 1.93.4.3 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #428):

sys/arch/xen/xen/xen_machdep.c: revision 1.28
sys/arch/x86/pci/pci_machdep.c: revision 1.97
sys/arch/xen/xen/genfb_xen.c: revision 1.1
sys/arch/xen/xen/genfb_xen.c: revision 1.2
sys/arch/xen/include/hypervisor.h: revision 1.59
sys/arch/i386/conf/XEN3PAE_DOM0: revision 1.41 (patch)
sys/arch/x86/x86/genfb_machdep.c: revision 1.22
sys/arch/xen/x86/consinit.c: revision 1.18
sys/arch/xen/x86/autoconf.c: revision 1.26
sys/external/mit/xen-include-public/dist/xen/include/public/platform.h: revision 1.2
sys/arch/xen/conf/files.xen: revision 1.188
sys/arch/x86/x86/consinit.c: revision 1.37
sys/arch/xen/conf/files.xen: revision 1.189
sys/arch/x86/x86/consinit.c: revision 1.38
sys/external/mit/xen-include-public/dist/xen/include/public/xen.h: revision 1.2
sys/arch/x86/include/genfb_machdep.h: revision 1.7
sys/arch/xen/x86/pvh_consinit.c: revision 1.5
sys/arch/xen/x86/pvh_consinit.c: revision 1.6
sys/arch/amd64/conf/XEN3_DOM0: revision 1.201

Move the pvh_xencons to xen_machdep.c as early_xencons, so it can be
used in the future as early output for plain PV guests too.

Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.

Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen

when running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.

x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console

xen/x86/consinit.c: support genfb as possible console

xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.

xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.

Make sure to always fall back to xen_early_console, even for dom0

Enable genfb in DOM0 kernels

Add ext_lfb_base to dom0_vga_console_info, from recent Xen. We know if it's
present or not by checking dom0.info_size

Add XENPF_get_dom0_console, which gets a dom0_vga_console_info structure
from the hypervisor. To be used by PVH dom0 kernels.

XENPVH option is not used. Fix consinit.c to use XENPVHVM as intended
and XENPVH from defflag
for a dom0 PVH, the dom0_vga_console_info structure has to be retrieved
using a platform hypercall; do so in the XENPVHVM case.

Now genfb works in a PVH dom0 running on Xen 4.18 (Xen 4.15 doesn't support
this platform op, so no way to make it work here).
 1.93.4.2 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #425):

sys/arch/x86/pci/pci_machdep.c: revision 1.96
sys/arch/x86/acpi/acpi_machdep.c: revision 1.36
sys/arch/x86/x86/hyperv.c: revision 1.16
sys/arch/x86/x86/genfb_machdep.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.56
sys/arch/x86/include/genfb_machdep.h: revision 1.6

Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.93.4.1 23-Aug-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #337):

sys/arch/x86/pci/pci_machdep.c: revision 1.94

Fix detection of availability of MSI/MSI-X on some systems.

Try to find all functions on bus 0, device 0 to find a PCI host bridge.
Some CPUs' host bridge is at 0:0.4. Tested on Intel Snow Ridge.
 1.98.2.1 02-Aug-2025  perseant Sync with HEAD
 1.19 21-Nov-2023  gutteridge pci_machdep.c & pci_msi_machdep.c: comment fixes

Correct spelling and grammar in some comments.
 1.18 13-May-2023  andvar s/requied/required/ in comments (likely grammar should be also improved in the
future).
 1.17 23-May-2022  bouyer Work in progress on MSI/MSI-X on Xen (MSI works on my hardware, more work
needed for MSI-X):
- Xen silently rejects 32 bits writes to MSI configuration registers
(especially when setting PCI_MSI_CTL_MSI_ENABLE/PCI_MSIX_CTL_ENABLE),
it expects 16 bits writes. So introduce a pci_conf_write16(),
only available on XENPV (and working only for mode 1 without
PCI_OVERRIDE_CONF_WRITE) and use it to enable MSI or MSI-X on XENPV.
- for multi-MSI vectors, Xen allocates all of them in a single hypercall,
so it's not convenient to do it at intr_establish() time.
So do it at alloc() time and register the pirqs in the msipic structure.
xen_pic_to_gsi() now just returns the values cached in the msipic.
As a bonus, if the PHYSDEVOP_map_pirq hypercall fails we can fail
the alloc() and we don't need the xen_pci_msi*_probe() hacks.

options NO_PCI_MSI_MSIX still on by default for XEN3_DOM0.
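For readers unfamiliar with PCI configuration mode 1, the arithmetic behind a 16-bit configuration write can be sketched as below. This is a hedged illustration of the generic mechanism (address port 0xcf8, data port 0xcfc), not the body of pci_conf_write16(): the `demo_*` names are hypothetical and no port I/O is performed.

```c
#include <stdint.h>

#define DEMO_MODE1_ADDR_PORT	0x0cf8	/* CONFIG_ADDRESS */
#define DEMO_MODE1_DATA_PORT	0x0cfc	/* CONFIG_DATA */

/* Value written to the address port to select a configuration dword. */
uint32_t
demo_mode1_address(unsigned bus, unsigned dev, unsigned func, unsigned reg)
{
	return 0x80000000U | ((bus & 0xffU) << 16) | ((dev & 0x1fU) << 11) |
	    ((func & 0x7U) << 8) | (reg & 0xfcU);
}

/*
 * I/O port for a 16-bit access to register offset reg: the data port
 * plus the offset of the addressed half within the selected dword.
 */
uint16_t
demo_mode1_data_port16(unsigned reg)
{
	return (uint16_t)(DEMO_MODE1_DATA_PORT + (reg & 0x2U));
}
```

A 16-bit write to offset 0x52 of device 0:3.0 would, in this model, select 0x80001850 through the address port and then write two bytes at port 0xcfe.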
 1.16 05-Dec-2021  msaitoh s/futher/further/ in comment.
 1.15 14-Mar-2021  skrll Remove an extra space from a comment
 1.14 19-Jul-2020  jdolecek branches: 1.14.2;
for Xen MSI, fallback to INTx when PHYSDEVOP_map_pirq fails for the device

apparently Xen requires VT-d to be enabled in BIOS for PHYSDEVOP_map_pirq
to work, this change makes it work on systems with VT-d disabled or missing

addresses the panic part of PR port-xen/55285 by Patrick Welche
 1.13 28-Jul-2017  maxv branches: 1.13.2;
Don't include malloc.h.
 1.12 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.11 14-Apr-2017  knakahara disable msi/msix when the system doesn't detect ioapic. This would fix PR kern/52111.

Some systems do not detect ioapic when booted with "boot -1", with ACPI disabled, and so on.
In such cases, msi/msix doesn't work, so disable them.

This patch is implemented by nonaka@n.o, I just commit by proxy, thanks.
 1.10 28-Nov-2016  knakahara branches: 1.10.2;
fix build of amd64/i386 with NO_PCI_MSI_MSIX option.
 1.9 17-Aug-2015  knakahara branches: 1.9.2;
Add kernel code to support intrctl(8).
 1.8 13-Aug-2015  msaitoh - Don't take pci_attach_args as an argument in pci_msi[x]_count().
- Move prototypes of pci_msi[x]_count() from x86/x86/pci_machdep_common to
sys/dev/pci/pcivar.h.
- Move pci_msi[x]_count() from x86/pci/pci_msi_machdep.c to sys/dev/pci/pci.c
 1.7 11-Aug-2015  msaitoh Add missing opt_intrdebug.h.
 1.6 22-Jun-2015  msaitoh Don't check PCI_FLAGS_"MSI"_OKAY in pci_"msix"_alloc_common().
OK'd by knakahara.
 1.5 15-May-2015  knakahara branches: 1.5.2;
pci_msi_string() must be used by MD code only.
 1.4 15-May-2015  knakahara refactor: change function names and move them.
 1.3 15-May-2015  knakahara unify INTx, MSI and MSI-X APIs without alloc. (alloc API is under discussion)
 1.2 08-May-2015  knakahara add a const qualifier to struct pci_attach_args *pa argument
 1.1 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.5.2.5 28-Aug-2017  skrll Sync with HEAD
 1.5.2.4 05-Dec-2016  skrll Sync with HEAD
 1.5.2.3 22-Sep-2015  skrll Sync with HEAD
 1.5.2.2 06-Jun-2015  skrll Sync with HEAD
 1.5.2.1 15-May-2015  skrll file pci_msi_machdep.c was added on branch nick-nhusb on 2015-06-06 14:40:04 +0000
 1.9.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.9.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.10.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.13.2.2 03-Dec-2017  jdolecek update from HEAD
 1.13.2.1 28-Jul-2017  jdolecek file pci_msi_machdep.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.14.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.3 17-Aug-2015  knakahara branches: 1.3.16;
Add kernel code to support intrctl(8).
 1.2 15-May-2015  knakahara branches: 1.2.2;
pci_msi_string() must be used by MD code only.
 1.1 15-May-2015  knakahara unify INTx, MSI and MSI-X APIs without alloc. (alloc API is under discussion)
 1.2.2.3 22-Sep-2015  skrll Sync with HEAD
 1.2.2.2 06-Jun-2015  skrll Sync with HEAD
 1.2.2.1 15-May-2015  skrll file pci_msi_machdep.h was added on branch nick-nhusb on 2015-06-06 14:40:04 +0000
 1.3.16.2 03-Dec-2017  jdolecek update from HEAD
 1.3.16.1 17-Aug-2015  jdolecek file pci_msi_machdep.h was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.9 21-Jun-2021  christos prop_dictionary_set_cstring_nocopy -> prop_dictionary_set_string_nocopy
 1.8 01-Mar-2019  msaitoh branches: 1.8.16;
- Almost all ppbreg.h's definitions are also in pcireg.h. Remove duplicated
definitions from ppbreg.h and move some definitions from ppbreg.h to
pcireg.h.
- Change fast back-to-back "capable" to "enable" in pci_subr.c.
- Print Primary Discard Timer, Secondary Discard Timer, Discard Timer Status
and Discard Timer SERR# Enable bit in pci_subr.c.
- PCI_BRIDGE_PREFETCHBASE32_REG and PCI_BRIDGE_PREFETCHLIMIT32_REG are
"upper" 32bit registers, rename to *UP32_REG to avoid confusion.
- Use macro.
 1.7 01-Jun-2017  chs branches: 1.7.10;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.6 07-Jul-2016  msaitoh KNF. Remove extra spaces. No functional change.
 1.5 27-Jul-2015  msaitoh KNF.
 1.4 27-Oct-2012  chs branches: 1.4.14;
split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.
 1.3 18-Oct-2011  dyoung branches: 1.3.2; 1.3.12;
Add an implementation of device_pci_props_register().
 1.2 13-Sep-2011  dyoung Clean up a bit: delete #if 1 and its corresponding #endif.
 1.1 29-Aug-2011  dyoung Move the code for grovelling in PCI configuration space for assigned
memory & I/O regions into its own module, pci_ranges.c, so that we can
leave it out on systems that won't need it.
 1.3.12.2 03-Dec-2017  jdolecek update from HEAD
 1.3.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.3.2.1 30-Oct-2012  yamt sync with head
 1.4.14.3 28-Aug-2017  skrll Sync with HEAD
 1.4.14.2 09-Jul-2016  skrll Sync with HEAD
 1.4.14.1 22-Sep-2015  skrll Sync with HEAD
 1.7.10.1 10-Jun-2019  christos Sync with HEAD
 1.8.16.1 01-Aug-2021  thorpej Sync with HEAD.
 1.22 19-Oct-2025  thorpej Use {,e}isabus_attach().
 1.21 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.20 24-Apr-2021  thorpej branches: 1.20.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
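The "variadic list of tag-value arguments" idea from revision 1.20 can be sketched with a small stand-alone parser. This is a simplified model, not the autoconf implementation: the `DEMO_CFARG_*` tags, `struct demo_cfargs` and `demo_parse_cfargs()` are hypothetical and only two of the five tags are modelled.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; the sentinel plays the role of CFARG_EOL. */
enum demo_cfarg {
	DEMO_CFARG_EOL = 0,
	DEMO_CFARG_IATTR,
	DEMO_CFARG_LOCATORS
};

struct demo_cfargs {
	const char *iattr;	/* interface attribute, or NULL */
	const int *locators;	/* locators array, or NULL */
};

/*
 * Consume (tag, value) pairs until the sentinel. Returns 0 on an unknown
 * tag, because the type of the value that follows it cannot be known.
 */
int
demo_parse_cfargs(struct demo_cfargs *out, ...)
{
	va_list ap;
	int tag, ok = 1;

	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, out);
	while (ok && (tag = va_arg(ap, int)) != DEMO_CFARG_EOL) {
		switch (tag) {
		case DEMO_CFARG_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case DEMO_CFARG_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			ok = 0;
			break;
		}
	}
	va_end(ap);
	return ok;
}
```

A call site passes only the pairs it needs and ends the list with the sentinel, which is what lets one entry point replace several fixed-signature variants.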
 1.19 04-Oct-2019  mrg branches: 1.19.10;
add missing break.

surely it is not intended to treat viatech devices with
non VT82C686A's device id as maybe cyrix pci bridges.
 1.18 17-Jun-2019  msaitoh KNF. No functional change.
 1.17 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.16 04-Mar-2018  jdolecek branches: 1.16.4;
according to VT82C686A chip specs, the VIA Technologies device 0x3057
is more a Power Management controller, rename the pcidevs entry and device
macro

PR kern/31963 by Nicolas Joly
 1.15 06-Apr-2012  plunky device_pmf_is_registered() is not required
 1.14 30-Jan-2012  drochner Use pci_aprint_devinfo(9) instead of pci_devinfo+aprint_{normal,naive}
where it looks straightforward, and pci_aprint_devinfo_fancy in a few
others where drivers want to supply their own device names instead
of the pcidevs generated one. More complicated cases, where names
are composed at runtime, are left alone for now. It certainly makes
sense to simplify the drivers here rather than inventing a catch-all API.
This should serve as an example for new drivers, and also ensure
consistent output in the AB_QUIET ("boot -q") case. Also, it avoids
excessive stack usage where drivers attach child devices because the
buffer for the device name is not kept on the local stack anymore.
 1.13 01-Jul-2011  dyoung branches: 1.13.2; 1.13.6;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.12 28-Apr-2010  dyoung On x86, change the bus_space_tag_t to a pointer to a struct
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).

Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.

Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
 1.11 08-Jan-2010  dyoung branches: 1.11.2; 1.11.4;
Move all copies of ifattr_match() to sys/kern/subr_autoconf.c.
 1.10 23-Aug-2009  jmcneill Save a line of dmesg by printing the vendor/product info on the same line
as the locators.
 1.9 18-Aug-2009  dyoung Allow detachment and re-attachment of an ISA bus at a PCI-ISA bus
bridge, isa0 at pcib0.
 1.8 02-Apr-2009  dyoung During shutdown, detach devices in an orderly fashion.

Call the detach routine for every device in the device tree, starting
with the leaves and moving toward the root, expecting that each
(pseudo-)device driver will use the opportunity to gracefully commit
outstanding transactions to the underlying (pseudo-)device and to
relinquish control of the hardware to the system BIOS.

Detaching devices is not suitable for every shutdown: in an emergency,
or if the system state is inconsistent, we should resort to a fast,
simple shutdown that uses only the pmf(9) shutdown hooks and the
(deprecated) shutdownhooks. For now, if the flag RB_NOSYNC is set in
boothowto, opt for the fast, simple shutdown.

Add a device flag, DVF_DETACH_SHUTDOWN, that indicates by its presence
that it is safe to detach a device during shutdown. Introduce macros
CFATTACH_DECL3() and CFATTACH_DECL3_NEW() for creating autoconf
attachments with default device flags. Add DVF_DETACH_SHUTDOWN
to configuration attachments for atabus(4), atw(4) at cardbus(4),
cardbus(4), cardslot(4), com(4) at isa(4), elanpar(4), elanpex(4),
elansc(4), gpio(4), npx(4) at isa(4), nsphyter(4), pci(4), pcib(4),
pcmcia(4), ppb(4), sip(4), wd(4), and wdc(4) at isa(4).

Add a device-detachment "reason" flag, DETACH_SHUTDOWN, that tells the
autoconf code and a device driver that the reason for detachment is
system shutdown.

Add a sysctl, kern.detachall, that tells the system to try to detach
every device at shutdown, regardless of any device's DVF_DETACH_SHUTDOWN
flag. The default for kern.detachall is 0. SET IT TO 1, PLEASE, TO
HELP TEST AND DEBUG DEVICE DETACHMENT AT SHUTDOWN.

This is a work in progress. In future work, I aim to treat
pseudo-devices more thoroughly, and to gracefully tear down a stack of
(pseudo-)disk drivers and filesystems, including cgd(4), vnd(4), and
raid(4) instances at shutdown.

Also commit some changes that are not easily untangled from the rest:

(1) begin to simplify device_t locking: rename struct pmf_private to
device_lock, and incorporate device_lock into struct device.

(2) #include <sys/device.h> in sys/pmf.h in order to get some
definitions that it needs. Stop unnecessarily #including <sys/device.h>
in sys/arch/x86/include/pic.h to keep the amd64, xen, and i386 releases
building.
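The leaves-first detach order described in revision 1.8 amounts to a post-order walk of the device tree. The sketch below models that ordering and the effect of the DVF_DETACH_SHUTDOWN flag and the kern.detachall setting. It is an assumption-laden illustration: `struct demo_dev`, `DEMO_DETACH_SHUTDOWN_OK` and `demo_detach_tree()` are hypothetical, and the rule that a parent stays attached while any child remains is a choice of this model, not something the log states.

```c
#include <stddef.h>

#define DEMO_DETACH_SHUTDOWN_OK	0x1	/* models DVF_DETACH_SHUTDOWN */

/* Hypothetical device node. */
struct demo_dev {
	struct demo_dev *child;		/* first child */
	struct demo_dev *sibling;	/* next sibling */
	int flags;
	int detached;
	int order;			/* position in the detach sequence */
};

/* Post-order walk: children first; returns nonzero if dev was detached. */
int
demo_detach_tree(struct demo_dev *dev, int detachall, int *counter)
{
	int all_children_gone = 1;

	for (struct demo_dev *c = dev->child; c != NULL; c = c->sibling) {
		if (!demo_detach_tree(c, detachall, counter))
			all_children_gone = 0;
	}

	if (!all_children_gone)
		return 0;	/* model: a parent never goes before its children */
	if (!detachall && !(dev->flags & DEMO_DETACH_SHUTDOWN_OK))
		return 0;	/* driver has not opted in */

	dev->detached = 1;
	dev->order = (*counter)++;
	return 1;
}
```

With `detachall` set to 0 only opted-in subtrees are torn down; with it set to 1 every node is visited and detached, leaves first.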
 1.7 04-Aug-2008  cegger branches: 1.7.2; 1.7.8;
struct cfdata -> cfdata_t
 1.6 03-Aug-2008  joerg Move some MD declarations from x86/pci/files.pci to x86/conf/files.x86,
so that Xen can use the former.

Drop Xen's pcib.c in favor of the x86 code and thereby unbreak ichlpcib.
 1.5 20-Jul-2008  martin Make the softc externally visible, so other bridges reusing this code
don't have to "get it right" manually.
 1.4 28-Apr-2008  martin branches: 1.4.2; 1.4.4; 1.4.6;
Remove clause 3 and 4 from TNF licenses
 1.3 22-Feb-2008  dyoung branches: 1.3.2; 1.3.4;
Add methods to detach self and children.

Use device_t and accessors. Use aprint_*_dev().
 1.2 09-Dec-2007  jmcneill branches: 1.2.6; 1.2.10;
Merge jmcneill-pm branch.
 1.1 26-Oct-2007  xtraeme branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10; 1.1.12; 1.1.14; 1.1.16;
Share pcib(4) and amdpcib(4) between i386 and amd64; one copy is enough.
 1.1.16.1 11-Dec-2007  yamt sync with head.
 1.1.14.1 26-Dec-2007  ad Sync with head.
 1.1.12.2 03-Dec-2007  ad Sync with HEAD.
 1.1.12.1 26-Oct-2007  ad file pcib.c was added on branch vmlocking on 2007-12-03 19:04:29 +0000
 1.1.10.2 13-Nov-2007  bouyer Sync with HEAD
 1.1.10.1 26-Oct-2007  bouyer file pcib.c was added on branch bouyer-xenamd64 on 2007-11-13 16:00:20 +0000
 1.1.8.4 23-Mar-2008  matt sync with HEAD
 1.1.8.3 09-Jan-2008  matt sync with HEAD
 1.1.8.2 06-Nov-2007  matt sync with HEAD
 1.1.8.1 26-Oct-2007  matt file pcib.c was added on branch matt-armv6 on 2007-11-06 23:23:44 +0000
 1.1.4.5 08-Dec-2007  jmcneill Rename pnp(9) -> pmf(9), as requested by many.
 1.1.4.4 12-Nov-2007  joerg CG unused softc fields.
 1.1.4.3 06-Nov-2007  joerg Refactor PNP API:
- Make suspend/resume directly a device functionality. It consists of
three layers (class logic, device logic, bus logic), all of them being
optional. This replaces D0/D3 transitions.
- device_is_active returns true if the device was not disabled and was
not suspended (even partially), device_is_enabled returns true if the
device was enabled.
- Change pnp_global_transition into pnp_system_suspend and
pnp_system_resume. Before running any suspend/resume handlers, check
that all currently attached devices support power management and bail
out otherwise. The latter is not done for the shutdown/panic case.
- Make the former bus-specific generic network handlers a class handler.
- Make PNP messages like volume up/down/toggle PNP events. Each device
can register what events they are interested in and whether the handler
should be global or not.
- Introduce device_active API for devices to mark themselves in use from
either the system or the device. Use this to implement the idle handling
for audio and input devices. This is intended to replace most ad-hoc
watchdogs as well.
- Fix some situations in which audio resume would lose mixer settings.
- Make USB host controllers better deal with suspend in the light of
shared interrupts.
- Flush filesystem cache on suspend.
- Flush disk caches on suspend. Put ATA disks into standby on suspend as
well.
- Adapt drivers to use the new PNP API.
- Fix a critical bug in the generic cardbus layer that made D0->D3
break.
- Fix ral(4) to set if_stop.
- Convert cbb(4) to the new PNP API.
- Apply the PCI Express SCI fix on resume again.
 1.1.4.2 28-Oct-2007  joerg Sync with HEAD.
 1.1.4.1 26-Oct-2007  joerg file pcib.c was added on branch jmcneill-pm on 2007-10-28 20:11:00 +0000
 1.1.2.4 27-Feb-2008  yamt sync with head.
 1.1.2.3 21-Jan-2008  yamt sync with head
 1.1.2.2 27-Oct-2007  yamt sync with head.
 1.1.2.1 26-Oct-2007  yamt file pcib.c was added on branch yamt-lazymbuf on 2007-10-27 11:28:58 +0000
 1.2.10.3 28-Sep-2008  mjf Sync with HEAD.
 1.2.10.2 02-Jun-2008  mjf Sync with HEAD.
 1.2.10.1 03-Apr-2008  mjf Sync with HEAD.
 1.2.6.1 24-Mar-2008  keiichi sync with head.
 1.3.4.6 11-Aug-2010  yamt sync with head.
 1.3.4.5 11-Mar-2010  yamt sync with head
 1.3.4.4 16-Sep-2009  yamt sync with head
 1.3.4.3 19-Aug-2009  yamt sync with head.
 1.3.4.2 04-May-2009  yamt sync with head.
 1.3.4.1 16-May-2008  yamt sync with head.
 1.3.2.1 18-May-2008  yamt sync with head.
 1.4.6.1 19-Oct-2008  haad Sync with HEAD.
 1.4.4.1 28-Jul-2008  simonb Sync with head.
 1.4.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.7.8.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.7.8.3 24-Oct-2010  jym Sync with HEAD
 1.7.8.2 01-Nov-2009  jym Sync with HEAD.
 1.7.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.7.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.11.4.1 30-May-2010  rmind sync with head
 1.11.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.13.6.2 29-Apr-2012  mrg sync to latest -current.
 1.13.6.1 18-Feb-2012  mrg merge to -current.
 1.13.2.1 17-Apr-2012  yamt sync with head
 1.16.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.16.4.1 10-Jun-2019  christos Sync with HEAD
 1.19.10.1 24-Mar-2021  thorpej Don't filter interface attributes in rescan functions for devices that
carry only a single interface attribute. The autoconfiguration machinery
already considers interface attributes when searching for possible parents.
 1.20.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.2 18-Aug-2009  dyoung Allow detachment and re-attachment of an ISA bus at a PCI-ISA bus
bridge, isa0 at pcib0.
 1.1 20-Jul-2008  martin branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.16; 1.1.20;
Make the softc externally visible, so other bridges reusing this code
don't have to "get it right" manually.
 1.1.20.3 19-Aug-2009  yamt sync with head.
 1.1.20.2 04-May-2009  yamt sync with head.
 1.1.20.1 20-Jul-2008  yamt file pcibvar.h was added on branch yamt-nfs-mp on 2009-05-04 08:12:10 +0000
 1.1.16.1 01-Nov-2009  jym Sync with HEAD.
 1.1.8.2 19-Oct-2008  haad Sync with HEAD.
 1.1.8.1 20-Jul-2008  haad file pcibvar.h was added on branch haad-dm on 2008-10-19 22:16:07 +0000
 1.1.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.1.6.1 20-Jul-2008  mjf file pcibvar.h was added on branch mjf-devfs2 on 2008-09-28 10:40:12 +0000
 1.1.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.1.4.1 20-Jul-2008  wrstuden file pcibvar.h was added on branch wrstuden-revivesa on 2008-09-18 04:33:38 +0000
 1.1.2.2 28-Jul-2008  simonb Sync with head.
 1.1.2.1 20-Jul-2008  simonb file pcibvar.h was added on branch simonb-wapbl on 2008-07-28 14:37:26 +0000
 1.17 04-Nov-2017  cherry Remove bitrotted xen specific versions of pci, pciide machdep related code.

Use the common x86/ code instead.
 1.16 15-Oct-2016  jdolecek provide intr xname
 1.15 27-Jul-2015  msaitoh branches: 1.15.2;
KNF.
 1.14 27-Apr-2015  knakahara add intr_handle_t and let pci_intr_handle_t use it.
 1.13 12-May-2014  joerg branches: 1.13.4;
buf is only used in the ioapic case.
 1.12 29-Mar-2014  christos branches: 1.12.2;
make pci_intr_string and eisa_intr_string take a buffer and a length
instead of relying on local static storage.
 1.11 04-Apr-2011  dyoung branches: 1.11.4; 1.11.14; 1.11.18;
Neither pci_dma64_available(), pci_probe_device(), pci_mapreg_map(9),
pci_find_rom(), pci_intr_map(9), pci_enumerate_bus(), nor the match
predicate passed to pciide_compat_intr_establish() should ever modify
their pci_attach_args argument, so make their pci_attach_args arguments
const and deal with the fallout throughout the kernel.

For the most part, these changes add a 'const' where there was no
'const' before, however, some drivers and MD code used to modify
pci_attach_args. Now those drivers either copy their pci_attach_args
and modify the copy, or refrain from modifying pci_attach_args:

Xen: according to Manuel Bouyer, writing to pci_attach_args in
pci_intr_map() was a leftover from Xen 2. Probably a bug. I
stopped writing it. I have not tested this change.

siside(4): sis_hostbr_match() needlessly wrote to pci_attach_args.
Probably a bug. I use a temporary variable. I have not tested this
change.

slide(4): sl82c105_chip_map() overwrote the caller's pci_attach_args.
Probably a bug. Use a local pci_attach_args. I have not tested
this change.

viaide(4): via_sata_chip_map() and via_sata_chip_map_new() overwrote the
caller's pci_attach_args. Probably a bug. Make a local copy of the
caller's pci_attach_args and modify the copy. I have not tested
this change.

While I'm here, make pci_mapreg_submap() static.
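The copy-then-modify pattern described above might look roughly like the following sketch. The structure is a hypothetical stand-in (`demo_attach_args`, with made-up fields), not the real `struct pci_attach_args`, and `demo_chip_map()` is an illustrative name rather than any driver's actual function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_attach_args; fields are illustrative. */
struct demo_attach_args {
	uint32_t da_id;		/* vendor/product ID word */
	int	 da_function;	/* PCI function number */
};

/*
 * The callee takes a const pointer, so it cannot write to the caller's
 * arguments.  When it needs different values, it copies the structure
 * and modifies only the local copy.
 */
static int
demo_chip_map(const struct demo_attach_args *da)
{
	struct demo_attach_args local = *da;	/* structure copy */

	local.da_function = 0;			/* modify the copy only */
	return local.da_function;
}
```

With the `const` qualifier in place, an accidental write through `da` becomes a compile-time error instead of a silent change to the caller's state.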

With these changes in place, I have tested the compilation of these
kernels:

alpha GENERIC
amd64 GENERIC XEN3_DOM0
arc GENERIC
atari HADES MILAN-PCIIDE
bebox GENERIC
cats GENERIC
cobalt GENERIC
evbarm-eb NSLU2
evbarm-el ADI_BRH ARMADILLO9 CP3100 GEMINI GEMINI_MASTER GEMINI_SLAVE GUMSTIX
HDL_G IMX31LITE INTEGRATOR IQ31244 IQ80310 IQ80321 IXDP425 IXM1200
KUROBOX_PRO LUBBOCK MARVELL_NAS NAPPI SHEEVAPLUG SMDK2800 TEAMASA_NPWR
TEAMASA_NPWR_FC TS7200 TWINTAIL ZAO425
evbmips-el AP30 DBAU1500 DBAU1550 MALTA MERAKI MTX-1 OMSAL400 RB153 WGT624V3
evbmips64-el XLSATX
evbppc EV64260 MPC8536DS MPC8548CDS OPENBLOCKS200 OPENBLOCKS266
OPENBLOCKS266_OPT P2020RDB PMPPC RB800 WALNUT
hp700 GENERIC
i386 ALL XEN3_DOM0 XEN3_DOMU
ibmnws GENERIC
macppc GENERIC
mvmeppc GENERIC
netwinder GENERIC
ofppc GENERIC
prep GENERIC
sandpoint GENERIC
sgimips GENERIC32_IP2x
sparc GENERIC_SUN4U KRUPS
sparc64 GENERIC

As of Sun Apr 3 15:26:26 CDT 2011, I could not compile these kernels
with or without my patches in place:

### evbmips-el GDIUM

nbmake: nbmake: don't know how to make /home/dyoung/pristine-nbsd/src/sys/arch/mips/mips/softintr.c. Stop

### evbarm-el MPCSA_GENERIC
src/sys/arch/evbarm/conf/MPCSA_GENERIC:318: ds1672rtc*: unknown device `ds1672rtc'

### ia64 GENERIC

/tmp/genassym.28085/assym.c: In function 'f111':
/tmp/genassym.28085/assym.c:67: error: invalid application of 'sizeof' to incomplete type 'struct pcb'
/tmp/genassym.28085/assym.c:76: error: dereferencing pointer to incomplete type

### sgimips GENERIC32_IP3x

crmfb.o: In function `crmfb_attach':
crmfb.c:(.text+0x2304): undefined reference to `ddc_read_edid'
crmfb.c:(.text+0x2304): relocation truncated to fit: R_MIPS_26 against `ddc_read_edid'
crmfb.c:(.text+0x234c): undefined reference to `edid_parse'
crmfb.c:(.text+0x234c): relocation truncated to fit: R_MIPS_26 against `edid_parse'
crmfb.c:(.text+0x2354): undefined reference to `edid_print'
crmfb.c:(.text+0x2354): relocation truncated to fit: R_MIPS_26 against `edid_print'
 1.10 06-Nov-2010  jakllsch branches: 1.10.2;
Implement pciide_machdep_compat_intr_disestablish() to help enable
detachment of compatibility-mapped pciide(4)-family controllers.
 1.9 01-May-2009  cegger branches: 1.9.2; 1.9.4;
- struct device * -> device_t
- remove useless parenthesis
 1.8 16-Apr-2008  cegger branches: 1.8.4; 1.8.18;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.7 01-Dec-2007  jmcneill branches: 1.7.14;
aprintify
 1.6 16-Nov-2006  christos branches: 1.6.8; 1.6.26; 1.6.28; 1.6.34;
__unused removal on arguments; approved by core.
 1.5 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.4 11-Dec-2005  christos branches: 1.4.20; 1.4.22;
merge ktrace-lwp.
 1.3 30-Oct-2003  fvdl branches: 1.3.4; 1.3.18;
* keep track of PCI buses that aren't known by firmware, but are found
by NetBSD
* use this info in intr_find_mpmapping
* get rid of the last argument to intr_find_mpmapping, it was redundant
 1.2 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
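The parent-bus fallback can be sketched as below. This assumes the conventional PCI-to-PCI bridge swizzle, with pins INTA..INTD numbered 1..4 and rotated by the device number. `demo_swizzle()`, `demo_find_mapped_bus()` and `struct demo_bus` are hypothetical names for illustration, not the code in the tree.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Conventional PCI bridge swizzle (assumed here): a device at slot 'dev'
 * asserting pin 'pin' (1 = INTA ... 4 = INTD) behind a bridge appears on
 * the parent bus as the pin rotated by the device number.
 */
static int
demo_swizzle(int dev, int pin)
{
	assert(pin >= 1 && pin <= 4);
	return ((pin - 1 + dev) % 4) + 1;
}

/* Hypothetical bus record; not the kernel's data structure. */
struct demo_bus {
	const struct demo_bus *parent;	/* NULL for a root bus */
	int bridge_dev;		/* device number of the bridge on the parent */
	int has_mappings;	/* nonzero if the MP table lists this bus */
	int number;		/* bus number */
};

/*
 * Walk toward the root until a bus with mappings is found.  At each step
 * the pin is swizzled and the device number becomes the bridge's.
 * Returns the bus number, or -1 if no ancestor has mappings.
 */
static int
demo_find_mapped_bus(const struct demo_bus *bus, int *dev, int *pin)
{
	while (bus != NULL && !bus->has_mappings) {
		*pin = demo_swizzle(*dev, *pin);
		*dev = bus->bridge_dev;
		bus = bus->parent;
	}
	return bus != NULL ? bus->number : -1;
}
```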
 1.1 06-Sep-2003  fvdl Move the bulk of pci_intr_string into a separate intr_string function. Use
that new function to print the pciide compat interrupt in pciide_machdep.c.
Share pciide_machdep.c between amd64 and i386.
 1.3.18.2 07-Dec-2007  yamt sync with head
 1.3.18.1 30-Dec-2006  yamt sync with head.
 1.3.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.3.4.2 03-Aug-2004  skrll Sync with HEAD
 1.3.4.1 30-Oct-2003  skrll file pciide_machdep.c was added on branch ktrace-lwp on 2004-08-03 10:43:04 +0000
 1.4.22.2 10-Dec-2006  yamt sync with head.
 1.4.22.1 22-Oct-2006  yamt sync with head
 1.4.20.1 18-Nov-2006  ad Sync with head.
 1.6.34.1 08-Dec-2007  mjf Sync with HEAD.
 1.6.28.1 09-Jan-2008  matt sync with HEAD
 1.6.26.1 01-Dec-2007  jmcneill Sync with HEAD.
 1.6.8.1 03-Dec-2007  ad Sync with HEAD.
 1.7.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.8.18.4 02-May-2011  jym Sync with head.
 1.8.18.3 10-Jan-2011  jym Sync with HEAD
 1.8.18.2 01-Nov-2009  jym Sync with HEAD.
 1.8.18.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.8.4.1 04-May-2009  yamt sync with head.
 1.9.4.2 21-Apr-2011  rmind sync with head
 1.9.4.1 05-Mar-2011  rmind sync with head
 1.9.2.1 09-Nov-2010  uebayasi Sync with HEAD.
 1.10.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.11.18.1 18-May-2014  rmind sync with head
 1.11.14.2 03-Dec-2017  jdolecek update from HEAD
 1.11.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.11.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.12.2.1 10-Aug-2014  tls Rebase.
 1.13.4.3 05-Dec-2016  skrll Sync with HEAD
 1.13.4.2 22-Sep-2015  skrll Sync with HEAD
 1.13.4.1 06-Jun-2015  skrll Sync with HEAD
 1.15.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.3 07-Apr-2020  christos Recognize more rdc devices (Andrius V.)
 1.2 01-Jul-2011  dyoung branches: 1.2.46; 1.2.54; 1.2.58;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.1 04-Apr-2011  bouyer branches: 1.1.2; 1.1.4; 1.1.8;
Add a driver for RDC's vortex86/PMX-1000 SoC PCI/ISA bridge, with support
for the integrated watchdog timer.
 1.1.8.2 06-Jun-2011  jruoho Sync with HEAD.
 1.1.8.1 04-Apr-2011  jruoho file rdcpcib.c was added on branch jruoho-x86intr on 2011-06-06 09:07:07 +0000
 1.1.4.3 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.1.4.2 02-May-2011  jym Sync with head.
 1.1.4.1 04-Apr-2011  jym file rdcpcib.c was added on branch jym-xensuspend on 2011-05-02 22:49:57 +0000
 1.1.2.2 21-Apr-2011  rmind sync with head
 1.1.2.1 04-Apr-2011  rmind file rdcpcib.c was added on branch rmind-uvmplock on 2011-04-21 01:41:32 +0000
 1.2.58.1 12-Jul-2020  martin Additionally pull up the following revisions for ticket #998 to unbreak
the build:

sys/arch/x86/pci/rdcpcib.c 1.3

Recognize more RDC devices.
 1.2.54.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.2.46.1 20-Jul-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #1578):

sys/dev/pci/pcidevs: revision 1.1404
sys/dev/pci/pcidevs: revision 1.1405
sys/arch/x86/pci/rdcpcib.c: revision 1.3

Add more RDC products (Andrius V)

Recognize more rdc devices (Andrius V.)

Fix typo
 1.10 12-Apr-2023  riastradh ichsmb(4), tco(4): Add support for TCO on newer Intel chipsets.

TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.

Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).

XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
 1.9 22-Sep-2022  riastradh branches: 1.9.4;
tco(4): Nix PMC_TCO_BASE offset in TCO register definitions.

This just uses a subregion with PMC_TCO_BASE automatically applied.

No functional change intended.
 1.8 22-Sep-2022  riastradh tco(4): Use a subregion of the PMC registers for TCO registers.

This is an intermediate step that will let us decouple it from access
via PMBASE.
 1.7 22-Sep-2022  riastradh ichlpcib(4), tco(4): Rename iot -> pmt, ioh -> pmh.

Makes it clearer that this is specifically about the power management
controller (PMC) registers relative to PMBASE.
 1.6 22-Sep-2022  riastradh ichlpcib(4), tco(4): Take `lpcib_' off various names.

For PMC-specific ones, change `lpcib_' to `pmc_'. These are in a
separate PCI device in newer chipsets.

For TCO-specific ones, which may live in different places, whether at
their own base address or as an offset from PMBASE, just leave it as
`tco_' or `tcotimer'.

No functional change intended.
 1.5 22-Sep-2022  riastradh tco(4): Rename lpcib_tco_attach_args -> tco_attach_args.

No longer hangs off LPC bus, newer devices hang it off SMBus.
 1.4 22-Sep-2022  riastradh tco(4): Change has_rcba bit into version number.

Will be useful for newer Intel platform controller hubs.

No functional change intended. Module ABI is unchanged, although older
modules will do something nonsensical when confronted with versions
above 1 -- that will require a revbump (but with any luck, it will make
life easier for versions above 2 once we do that).
 1.3 21-Sep-2022  riastradh tco(4): Fix whitespace. No functional change intended.
 1.2 30-Aug-2015  christos branches: 1.2.16;
print the configuration information early so that it does not get intermingled
with possible error prints.
 1.1 03-May-2015  pgoyette branches: 1.1.2;
Separate the watchdog code from the pcib code, and make the watchdog
a loadable module.
 1.1.2.3 22-Sep-2015  skrll Sync with HEAD
 1.1.2.2 06-Jun-2015  skrll Sync with HEAD
 1.1.2.1 03-May-2015  skrll file tco.c was added on branch nick-nhusb on 2015-06-06 14:40:04 +0000
 1.2.16.2 03-Dec-2017  jdolecek update from HEAD
 1.2.16.1 30-Aug-2015  jdolecek file tco.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.9.4.1 01-Aug-2023  martin Pull up following revision(s) (requested by riastradh in ticket #282):

sys/dev/pci/ichsmb.c: revision 1.82
sys/arch/amd64/conf/GENERIC: revision 1.602
sys/arch/x86/pci/tco.c: revision 1.10
sys/arch/x86/pci/tco.h: revision 1.5
sys/arch/x86/pci/ichlpcib.c: revision 1.59
sys/dev/ic/i82801lpcreg.h: revision 1.17
sys/arch/x86/pci/files.pci: revision 1.27
sys/dev/pci/files.pci: revision 1.446

ichsmb(4), tco(4): Add support for TCO on newer Intel chipsets.

TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.
Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).

XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
 1.5 12-Apr-2023  riastradh ichsmb(4), tco(4): Add support for TCO on newer Intel chipsets.

TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.

Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).

XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
 1.4 22-Sep-2022  riastradh branches: 1.4.4;
ichlpcib(4), tco(4): Rename iot -> pmt, ioh -> pmh.

Makes it clearer that this is specifically about the power management
controller (PMC) registers relative to PMBASE.
 1.3 22-Sep-2022  riastradh tco(4): Rename lpcib_tco_attach_args -> tco_attach_args.

No longer hangs off LPC bus, newer devices hang it off SMBus.
 1.2 22-Sep-2022  riastradh tco(4): Change has_rcba bit into version number.

Will be useful for newer Intel platform controller hubs.

No functional change intended. Module ABI is unchanged, although older
modules will do something nonsensical when confronted with versions
above 1 -- that will require a revbump (but with any luck, it will make
life easier for versions above 2 once we do that).
 1.1 03-May-2015  pgoyette branches: 1.1.2; 1.1.18;
Separate the watchdog code from the pcib code, and make the watchdog
a loadable module.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 03-May-2015  jdolecek file tco.h was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.1.2.2 06-Jun-2015  skrll Sync with HEAD
 1.1.2.1 03-May-2015  skrll file tco.h was added on branch nick-nhusb on 2015-06-06 14:40:04 +0000
 1.4.4.1 01-Aug-2023  martin Pull up following revision(s) (requested by riastradh in ticket #282):

sys/dev/pci/ichsmb.c: revision 1.82
sys/arch/amd64/conf/GENERIC: revision 1.602
sys/arch/x86/pci/tco.c: revision 1.10
sys/arch/x86/pci/tco.h: revision 1.5
sys/arch/x86/pci/ichlpcib.c: revision 1.59
sys/dev/ic/i82801lpcreg.h: revision 1.17
sys/arch/x86/pci/files.pci: revision 1.27
sys/dev/pci/files.pci: revision 1.446

ichsmb(4), tco(4): Add support for TCO on newer Intel chipsets.

TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.
Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).

XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
 1.4 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.3 24-Apr-2021  thorpej branches: 1.3.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
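As a rough illustration of the tag-value idea, the sketch below parses a variadic list terminated by a sentinel. The `DEMO_CFARG_*` tags, `struct demo_cfargs` and `demo_parse_cfargs()` are hypothetical names modelled on the list above; this is not the actual config_found() signature or implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags modelled on the CFARG_* names; values are illustrative. */
enum demo_cfarg {
	DEMO_CFARG_EOL = 0,	/* sentinel: end of list */
	DEMO_CFARG_IATTR,	/* followed by a const char * */
	DEMO_CFARG_LOCATORS	/* followed by a const int * */
};

struct demo_cfargs {
	const char *iattr;	/* interface attribute, or NULL */
	const int  *locators;	/* locators array, or NULL */
};

/* Parse a tag-value list; returns 0 on success, -1 on an unknown tag. */
static int
demo_parse_cfargs(struct demo_cfargs *out, ...)
{
	va_list ap;
	int tag;
	int error = 0;

	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, out);
	while ((tag = va_arg(ap, int)) != DEMO_CFARG_EOL) {
		switch (tag) {
		case DEMO_CFARG_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case DEMO_CFARG_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			/* Unknown tag: the value type is unknown, so stop. */
			error = -1;
			goto done;
		}
	}
done:
	va_end(ap);
	return error;
}
```

A caller that needs no optional arguments passes only the sentinel, which is what lets a single entry point replace several fixed-argument variants.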
 1.2 11-Jul-2016  msaitoh branches: 1.2.32;
KNF. No functional change.
 1.1 05-Dec-2012  christos branches: 1.1.2; 1.1.6; 1.1.18;
Intel Atom E600 PCI-LPC bridge, adds a watchdog + HPET support. Tested
on a Soekris net6501. (jmcneill)
 1.1.18.1 05-Oct-2016  skrll Sync with HEAD
 1.1.6.3 03-Dec-2017  jdolecek update from HEAD
 1.1.6.2 25-Feb-2013  tls resync with head
 1.1.6.1 05-Dec-2012  tls file tcpcib.c was added on branch tls-maxphys on 2013-02-25 00:29:05 +0000
 1.1.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.1.2.1 05-Dec-2012  yamt file tcpcib.c was added on branch yamt-pagecache on 2013-01-16 05:33:10 +0000
 1.2.32.1 02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.3.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.6 10-May-2023  riastradh x86/imc(4): Use config_detach_children.
 1.5 28-Sep-2022  msaitoh Fix compile error.
 1.4 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.3 24-Apr-2021  thorpej branches: 1.3.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
 1.2 15-Mar-2018  maya branches: 1.2.6; 1.2.16;
Provide a default case also when building imc as builtin.

Fixes ALL kernel build. ok pgoyette.
 1.1 01-Mar-2018  pgoyette branches: 1.1.2;
Move the imc(4) and imcsmb(4) sources into architecture-specific
directory (for previous CVS history see the sys/dev/pci/imcsmb/
Attic)
 1.1.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.2.16.2 28-Mar-2021  thorpej - The third argument passed to the rescan function is a locs array, not
a pointer to flags.
- imc and imcsmb each carry only a single interface attribute, so no
need to be explicit.
 1.2.16.1 23-Mar-2021  thorpej Convert config_found_ia() call sites where the device only carries
a single interface attribute to bare config_found() calls.
 1.2.6.1 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1537):

sys/arch/x86/pci/imcsmb/imc.c: revision 1.5
sys/dev/pci/pcidevs: revision 1.1461-1.1468

add several samsung nvme entries

Add more Alder Lake devices.

Jasper Lake Intel Trace Hub on Compute Die is not 0x4da6 but 0x4e29.

Add Intel Core 8G (8core, H, Halo) Host Bridge, DRAM.

Sort by number. No functional change.

Add AMD 19h/6xh Root Complex.

Add AMD FCH SATA Controller D

add NVIDIA GeForce GTX 770

Fix compile error.
 1.3.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.7 15-Sep-2025  thorpej Encapsulate what's needed to attach an I2C bus into an iicbus_attach()
inline.
 1.6 10-May-2023  riastradh x86/imc(4): Use config_detach_children.
 1.5 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.4 24-Apr-2021  thorpej branches: 1.4.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
 1.3 22-Dec-2019  thorpej branches: 1.3.10;
Cleanup i2c bus acquire / release, centralizing all of the logic into
iic_acquire_bus() / iic_release_bus(). "acquire" and "release" hooks
no longer need to be provided by back-end controller drivers (only if
they need special handling, e.g. powering on the i2c controller).
This results in the removal of a bunch of redundant code from each
back-end controller driver.

Assert that we are not in hard interrupt context in iic_acquire_bus(),
iic_exec(), and iic_release_bus().
 1.2 03-Mar-2018  pgoyette branches: 1.2.4;
Fix the attach message - needs a ": "
 1.1 01-Mar-2018  pgoyette Move the imc(4) and imcsmb(4) sources into architecture-specific
directory (for previous CVS history see the sys/dev/pci/imcsmb/
Attic)
 1.2.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.10.2 28-Mar-2021  thorpej - The third argument passed to the rescan function is a locs array, not
a pointer to flags.
- imc and imcsmb each carry only a single interface attribute, so no
need to be explicit.
 1.3.10.1 23-Mar-2021  thorpej In imcsmb_rescan(), no need to explicitly check that we're being asked
to rescan on the "i2cbus" interface attribute. The autoconfiguration
machinery will not request a rescan of an interface attribute that a
device does not carry, and imcsmb only carries "i2cbus".
 1.4.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.1 01-Mar-2018  pgoyette Move the imc(4) and imcsmb(4) sources into architecture-specific
directory (for previous CVS history see the sys/dev/pci/imcsmb/
Attic)
 1.1 01-Mar-2018  pgoyette Move the imc(4) and imcsmb(4) sources into architecture-specific
directory (for previous CVS history see the sys/dev/pci/imcsmb/
Attic)
 1.3 08-Jul-2025  imil Rename pvbus component to pv to allow conditional NPV define

pvbus is always defined in files.pv, which prevents the use of
#if NPVBUS > 0. Renaming it to pv aligns with naming conventions used
for other bus components and allows NPV to be conditionally defined.
 1.2 15-Jan-2025  imil Add support for command-line MMIO devices. At least qemu and
Firecracker pass MMIO virtual device mappings through the kernel
command line.
This driver is based on Colin Percival's FreeBSD virtio_mmio_cmdline.c
https://github.com/freebsd/freebsd-src/blob/main/sys/dev/virtio/mmio/virtio_mmio_cmdline.c

The following kernel options are needed

options MPBIOS
options MPTABLE_LINUX_BUG_COMPAT

As are these drivers

pv* at pvbus?
virtio* at pv?

Example qemu usage on a Linux host to boot a NetBSD guest:

qemu-system-x86_64 \
-M microvm,x-option-roms=off,rtc=on,acpi=off,pic=off,accel=kvm \
-m 256 -cpu host -kernel ${KERNEL} \
-append "root=ld0a console=com rw -v" \
-device virtio-blk-device,drive=hd0 \
-drive file=${IMG},format=raw,id=hd0 \
-device virtio-net-device,netdev=net0 \
-netdev user,id=net0,ipv6=off,hostfwd=::2200-:22 \
-global virtio-mmio.force-legacy=false -display none -serial stdio

A lightweight kernel configuration named MICROVM is available for this
use case.
 1.1 02-Jan-2025  imil Trivial bus implementation inspired by OpenBSD's pv(4) to attach devices
that don't need nor rely on a PCI or ISA bus.
 1.2 14-Jan-2025  riastradh x86/pvbus(4): KNF. No functional change intended.
 1.1 02-Jan-2025  imil Trivial bus implementation inspired by OpenBSD's pv(4) to attach devices
that don't need nor rely on a PCI or ISA bus.
 1.1 02-Jan-2025  imil Trivial bus implementation inspired by OpenBSD's pv(4) to attach devices
that don't need nor rely on a PCI or ISA bus.
 1.30 12-Jun-2011  jruoho Follow IA-64 with the x86-specific ACPI MD functions and move these to where
they belong. Remove an unused function. Minor KNF. No functional change.
 1.29 14-Jan-2011  jruoho branches: 1.29.6;
acpi_md_ncpus(): use cpu_attached instead of cpus_running.
 1.28 13-Jan-2011  jruoho Move the function that counts the CPUs from acpicpu(4) to the MD layer.
 1.27 28-Apr-2010  dyoung On x86, change the bus_space_tag_t to a pointer to a struct
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).

Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.

Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
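
The tag layout described in revision 1.27 can be sketched in a few lines of standalone C. The identifier names come from the commit message; the numeric values of X86_BUS_SPACE_IO and X86_BUS_SPACE_MEM, the backing variable names, and the exact signature are assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumed values, for illustration only. */
#define X86_BUS_SPACE_IO	0
#define X86_BUS_SPACE_MEM	1

/* For now the tag has a single member: the type of space. */
struct bus_space_tag {
	int bst_type;
};
typedef struct bus_space_tag *bus_space_tag_t;

/* Backing objects (hypothetical names) for the two exported tag pointers. */
static struct bus_space_tag io_tag_storage = { .bst_type = X86_BUS_SPACE_IO };
static struct bus_space_tag mem_tag_storage = { .bst_type = X86_BUS_SPACE_MEM };

bus_space_tag_t x86_bus_space_io = &io_tag_storage;
bus_space_tag_t x86_bus_space_mem = &mem_tag_storage;

/* Two tags are equal when they describe the same type of space. */
bool
bus_space_is_equal(bus_space_tag_t t1, bus_space_tag_t t2)
{
	return t1->bst_type == t2->bst_type;
}
```

Comparing bst_type rather than the pointers means that a tag created later with override functions would still compare equal to the plain tag for the same kind of space.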
 1.26 14-Apr-2010  jruoho UINT32 -> uint32_t; UINT8 -> uint8_t.
 1.25 18-Aug-2009  jmcneill branches: 1.25.2; 1.25.4;
Switch to ACPICA 20090730, and update for API changes.
 1.24 14-Mar-2009  jmcneill Add acpi_md_OsEnableInterrupt, to go with acpi_md_OsDisableInterrupt
 1.23 18-Dec-2008  cegger branches: 1.23.2;
remove unused malloc.h
 1.22 03-Jul-2008  drochner branches: 1.22.4;
Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.21 30-May-2008  ad branches: 1.21.2;
Add a 'known_mpsafe' argument to intr_establish().
 1.20 17-Dec-2007  joerg branches: 1.20.6; 1.20.8; 1.20.10; 1.20.12;
Don't call acpi_md_sleep_init on Xen, it doesn't support ACPI sleep
anyway.
 1.19 15-Dec-2007  joerg Move mapping of the real mode location for the ACPI wakeup code into a
separate function called from acpi_md_callback.
 1.18 09-Dec-2007  jmcneill branches: 1.18.2;
Merge jmcneill-pm branch.
 1.17 17-Oct-2007  garbled branches: 1.17.4; 1.17.6;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.16 06-Oct-2007  joerg branches: 1.16.2;
Merge from mpacpi.h 1.4.32.1, acpi_machdep.c 1.13.22.5 and
mpacpi.c 1.48.12.2 from jmcneill-pm:

Don't process the MADT and modify the interrupt config at one moment and
later try to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.

Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
 1.15 30-Sep-2007  joerg The ACPI SCI override is relative to the Source of the MADT entry,
so use bus_pin for the comparison. Tested by reinoud@.
 1.14 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.13 15-Feb-2007  ad branches: 1.13.6; 1.13.14; 1.13.22; 1.13.24; 1.13.26;
Don't establish an interrupt handler at IPL_VM, use IPL_TTY instead.
 1.12 12-Aug-2006  christos branches: 1.12.6; 1.12.8;
Revert previous borken change.
 1.11 12-Aug-2006  fvdl Make sure to override the trigger variable with IST_LEVEL, as well
as the ioapic flags, in the case of an ACPI interrupt without
override.
 1.10 04-Jul-2006  christos Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.9 16-Feb-2006  kochi branches: 1.9.2; 1.9.10;
define acpi_intr_deferq as static
 1.8 11-Dec-2005  christos branches: 1.8.2; 1.8.4; 1.8.6;
merge ktrace-lwp.
 1.7 02-May-2005  kochi branches: 1.7.2;
Merge changes for ACPI-CA 20050408.
 1.6 10-Apr-2004  kochi whitespace nit
 1.5 10-Oct-2003  tron branches: 1.5.2;
Fix build problem when MPACPI is used without ioapic as proposed by
Quentin Garnier on current-users@NetBSD.org.
 1.4 06-Sep-2003  fvdl When establishing the ACPI SCI, make sure it's always active low (as well
as level-triggered). Do this by changing the MP config entry that was
set up for the interrupt. Do not change anything if there was an ACPI
interrupt source override, assume that this contains the correct
information already.
 1.3 02-Sep-2003  fvdl Compile in the !MPACPI && NIOAPIC case.
 1.2 27-Aug-2003  itojun variable 'sc' needed in MPACPI case (what should we do about NIOAPIC?)
 1.1 11-May-2003  fvdl branches: 1.1.2;
Moved here from sys/arch/i386/i386
 1.1.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.2.1 01-Jun-2004  jmc Pullup rev 1.6 (requested by kochi in ticket #427)

Lots of fixes to prevent panics on HT motherboards
 1.7.2.5 21-Jan-2008  yamt sync with head
 1.7.2.4 27-Oct-2007  yamt sync with head.
 1.7.2.3 26-Feb-2007  yamt sync with head.
 1.7.2.2 30-Dec-2006  yamt sync with head.
 1.7.2.1 21-Jun-2006  yamt sync with head.
 1.8.6.1 22-Apr-2006  simonb Sync with head.
 1.8.4.1 09-Sep-2006  rpaulo sync with head
 1.8.2.1 18-Feb-2006  yamt sync with head.
 1.9.10.1 13-Jul-2006  gdamore Merge from HEAD.
 1.9.2.2 03-Sep-2006  yamt sync with head.
 1.9.2.1 11-Aug-2006  yamt sync with head
 1.12.8.2 29-Oct-2007  wrstuden Catch up with 4.0 RC3
 1.12.8.1 01-Oct-2007  wrstuden Catch up with netbsd-4-9-RC2
 1.12.6.2 14-Oct-2007  xtraeme Pull up following revision(s) (requested by joerg in ticket #925):
sys/arch/x86/x86/mpacpi.c: revision 1.50
sys/arch/x86/include/mpacpi.h: revision 1.5
sys/arch/x86/x86/acpi_machdep.c: revision 1.16

Merge from mpacpi.h 1.4.32.1, acpi_machdep.c 1.13.22.5 and
mpacpi.c 1.48.12.2 from jmcneill-pm:

Don't process the MADT and modify the interrupt config at one moment and
later try to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.
Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
 1.12.6.1 30-Sep-2007  xtraeme Pull up following revision(s) (requested by joerg in ticket #912):
sys/arch/x86/x86/acpi_machdep.c: revision 1.15

The ACPI SCI override is relative to the Source of the MADT entry,
so use bus_pin for the comparison. Tested by reinoud@.
 1.13.26.2 14-Oct-2007  yamt sync with head.
 1.13.26.1 06-Oct-2007  yamt sync with head.
 1.13.24.2 09-Jan-2008  matt sync with HEAD
 1.13.24.1 06-Nov-2007  matt sync with HEAD
 1.13.22.7 07-Oct-2007  joerg Sync with HEAD.
 1.13.22.6 02-Oct-2007  jmcneill Update to ACPI-CA 20070320
 1.13.22.5 02-Oct-2007  joerg Don't process the MADT and modify the interrupt config at one moment and
later try to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.

Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
 1.13.22.4 02-Oct-2007  joerg Sync with HEAD.
 1.13.22.3 23-Aug-2007  joerg From FreeBSD: explicitly load regions first to allow acpi_md_callback
to actually query the routing tables.

Drop the argument to acpi_md_callback, passing around singletons is not
that helpful.
 1.13.22.2 14-Aug-2007  joerg Call the MD ACPI callback before enabling ACPI. This allows
the MD layer to scan the MADT table and in turn allows retiring
the deferred interrupt setup.
 1.13.22.1 14-Aug-2007  joerg Don't check for NACPI > 0, this file is only built for acpi.
 1.13.14.2 16-Oct-2007  garbled Sync with HEAD
 1.13.14.1 03-Oct-2007  garbled Sync with HEAD
 1.13.6.1 09-Oct-2007  ad Sync with head.
 1.16.2.1 17-Oct-2007  bouyer amd64 (aka x86-64) support for Xen. Based on the OpenBSD port done by
Mathieu Ropert in 2006.
DomU-only for now. An INSTALL_XEN3_DOMU kernel with a ramdisk will boot to
sysinst if you're lucky. Often it panics because a runnable LWP has
a NULL stack (really, it's all of l->l_addr which has been zeroed out
while the process was on the queue !)
TODO:
- bug fixes :)
- Most of the xpq_* functions should be shared with xen/i386
- The xen/i386 assembly bootstrap code should be replaced with the C
version in xenamd64/amd64/xpmap.c
- see if a config(5) trick could allow to merge xenamd64 back to xen or amd64.
 1.17.6.1 11-Dec-2007  yamt sync with head.
 1.17.4.1 26-Dec-2007  ad Sync with head.
 1.18.2.1 02-Jan-2008  bouyer Sync with HEAD
 1.20.12.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.20.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.20.10.3 11-Aug-2010  yamt sync with head.
 1.20.10.2 19-Aug-2009  yamt sync with head.
 1.20.10.1 04-May-2009  yamt sync with head.
 1.20.8.1 04-Jun-2008  yamt sync with head
 1.20.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.20.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.20.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.21.2.1 03-Jul-2008  simonb Sync with head.
 1.22.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.22.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.23.2.5 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.23.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.23.2.3 24-Oct-2010  jym Sync with HEAD
 1.23.2.2 01-Nov-2009  jym Sync with HEAD.
 1.23.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.25.4.2 05-Mar-2011  rmind sync with head
 1.25.4.1 30-May-2010  rmind sync with head
 1.25.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.29.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.10 14-Jun-2019  msaitoh - Dump LAPIC and I/O APIC correctly.
- Don't print redirect target on LAPIC.
- Fix DEST_MASK:
- DEST_MASK is not 1 bit but 2 bit.
- Add missing "\0"s to print decoded name correctly.
- Support both LAPIC and I/O APIC correctly in apic_format_redir().
- Improve output of some bits using with snprintb()'s "F\B\1" and ":\V".
 1.9 13-Jun-2019  msaitoh Whitespace fix. No functional change.
 1.8 16-Dec-2008  christos branches: 1.8.66;
replace bitmask_snprintf(9) with snprintb(3)
 1.7 28-Apr-2008  martin branches: 1.7.8;
Remove clause 3 and 4 from TNF licenses
 1.6 16-Apr-2008  cegger branches: 1.6.2; 1.6.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.5 04-Jan-2008  ad branches: 1.5.6;
sys/lock.h isn't needed here.
 1.4 11-Dec-2005  christos branches: 1.4.50; 1.4.56; 1.4.64;
merge ktrace-lwp.
 1.3 29-May-2005  christos branches: 1.3.2;
Sprinkle const.
 1.2 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.1 26-Feb-2003  fvdl branches: 1.1.2;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.2.1 21-Jan-2008  yamt sync with head
 1.4.64.1 08-Jan-2008  bouyer Sync with HEAD
 1.4.56.1 18-Feb-2008  mjf Sync with HEAD.
 1.4.50.1 09-Jan-2008  matt sync with HEAD
 1.5.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.5.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.6.4.2 04-May-2009  yamt sync with head.
 1.6.4.1 16-May-2008  yamt sync with head.
 1.6.2.1 18-May-2008  yamt sync with head.
 1.7.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.8.66.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.7 24-Jul-2021  jmcneill Build fix: vtophys takes vaddr_t, not a ptr
 1.6 24-Jul-2021  jmcneill smbios: Add character device for accessing SMBIOS tables

The /dev/smbios character device gives an aperture into physical memory
that allows read-only access to the SMBIOS header and tables.
 1.5 21-Jul-2021  jmcneill Separate MI smbios interface from MD specific code.
 1.4 27-Dec-2019  msaitoh branches: 1.4.12;
s/sucess/success/ in comment.
 1.3 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.2 15-Aug-2017  maxv branches: 1.2.2; 1.2.4; 1.2.8;
style
 1.1 15-Aug-2017  maxv Merge into x86/.
 1.2.8.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2.8.1 10-Jun-2019  christos Sync with HEAD
 1.2.4.2 03-Dec-2017  jdolecek update from HEAD
 1.2.4.1 15-Aug-2017  jdolecek file bios32.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.2.2.2 28-Aug-2017  skrll Sync with HEAD
 1.2.2.1 15-Aug-2017  skrll file bios32.c was added on branch nick-nhusb on 2017-08-28 17:51:56 +0000
 1.4.12.1 01-Aug-2021  thorpej Sync with HEAD.
 1.92 04-Aug-2025  skrll to to -> to in a comment
 1.91 04-Jun-2024  riastradh x86: Teach bus_dmamem_map about BUS_DMA_PREFETCHABLE.

PR port-amd64/58308
 1.90 28-Mar-2023  riastradh x86/bus_dma.c: Sprinkle KASSERTMSG.
 1.89 20-Aug-2022  riastradh branches: 1.89.4;
x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.88 13-Aug-2022  skrll Fix an inverted KASSERTMSG test from the #ifdef DIAGNOSTIC panic -> KASSERT
conversion.
 1.87 12-Aug-2022  riastradh x86/bus_dma: #ifdef DIAGNOSTIC panic -> KASSERT

While here, use some better types and avoid integer overflow in the
diagnostic tests.

No functional change intended except in the case of bugs anyway.
 1.86 12-Aug-2022  riastradh x86: Adjust fences issued in bus_dmamap_sync after bouncing.

And expand the comment on the lfence for POSTREAD before bouncing.

Net change:

op                    before bounce    after bounce
                                       old       new
PREREAD               nop              lfence    sfence
PREWRITE              nop              mfence    sfence
PREREAD|PREWRITE      nop              mfence    sfence
POSTREAD              lfence           lfence    nop[*]
POSTWRITE             nop              mfence    nop
POSTREAD|POSTWRITE    lfence           mfence    nop[*]

The case of PREREAD is as follows:

1. loads and stores before DMA buffer may be allocated for the purpose
2. bus_dmamap_sync(BUS_DMASYNC_PREREAD)
3. store to register or DMA descriptor to trigger DMA

The register or DMA descriptor may be in any type of memory (or I/O).

lfence at (2) is _not enough_ to ensure stores at (1) have completed
before the store in (3) in case the register or DMA descriptor lives
in wc/wc+ memory, or the store to it is non-temporal: in that case,
it may be executed early before all the stores in (1) have completed.

On the other hand, lfence at (2) is _not needed_ to ensure loads in
(1) have completed before the store in (3), because x86 never
reorders load;store to store;load. So we may need to enforce
store/store ordering, but not any other ordering, hence sfence.

The case of PREWRITE is as follows:

1. stores to DMA buffer (and loads from it, before allocated)
2. bus_dmamap_sync(BUS_DMASYNC_PREWRITE)
3. store to register or DMA descriptor to trigger DMA

Ensuring prior loads have completed is not necessary because x86
never reorders load;store to store;load (and in any case, the device
isn't changing the DMA buffer, so it's safe to read over and over
again). But we must ensure the stores in (1) have completed before
the store in (3). So we need sfence, in case either the DMA buffer
or the register or the DMA descriptor is in wc/wc+ memory or either
store is non-temporal. But we don't need mfence.

The case of POSTREAD is as follows:

1. load from register or DMA descriptor notifying DMA completion
2. bus_dmamap_sync(BUS_DMASYNC_POSTREAD)
(a) lfence [*]
(b) if bouncing, memcpy(userbuf, bouncebuf, ...)
(c) ???
3. loads from DMA buffer to use data, and stores to reuse buffer

This certainly needs an lfence to prevent the loads at (3) from being
executed early -- but bus_dmamap_sync already issues lfence in that
case at 2(a), before it conditionally loads from the bounce buffer
into the user's buffer. So we don't need any _additional_ fence
_after_ bouncing at 2(c).

The case of POSTWRITE is as follows:

1. load from register or DMA descriptor notifying DMA completion
2. bus_dmamap_sync(BUS_DMASYNC_POSTWRITE)
3. loads and stores to reuse buffer

Stores at (3) will never be executed early because x86 never reorders
load;store to store;load for any memory types. Loads at (3) are
harmless because the device isn't changing the buffer -- it's
supposed to be fixed from the time of PREWRITE to the time of
POSTWRITE as far as the CPU can witness.

Proposed on port-amd64 last month:

https://mail-index.netbsd.org/port-amd64/2022/07/16/msg003593.html

Reference:

AMD64 Architecture Programmer's Manual, Volume 2: System Programming,
24593--Rev. 3.38--November 2021, Sec. 7.4.2 Memory Barrier Interaction
with Memory Types, Table 7-3, p. 196.
https://www.amd.com/system/files/TechDocs/24593.pdf
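
The "new" column of the fence table in revision 1.86 can be restated as a small decision function. This is a sketch of the table only, not the kernel's _bus_dmamap_sync: the SYNC_* flag values and the enum are hypothetical stand-ins for the BUS_DMASYNC_* constants, and rejecting a mix of PRE and POST operations is an assumption, since the table lists no mixed case.

```c
#include <assert.h>

/* Hypothetical stand-ins for the BUS_DMASYNC_* flags. */
enum {
	SYNC_PREREAD	= 1 << 0,
	SYNC_POSTREAD	= 1 << 1,
	SYNC_PREWRITE	= 1 << 2,
	SYNC_POSTWRITE	= 1 << 3,
};

enum fence { FENCE_NONE, FENCE_LFENCE, FENCE_SFENCE };

/* Before bouncing: only POSTREAD (alone or with POSTWRITE) needs lfence. */
static enum fence
fence_before_bounce(int ops)
{
	return (ops & SYNC_POSTREAD) ? FENCE_LFENCE : FENCE_NONE;
}

/*
 * After bouncing, per the "new" column: any PRE operation needs sfence
 * to order the buffer stores before the store that triggers DMA; POST
 * operations need nothing more.
 */
static enum fence
fence_after_bounce(int ops)
{
	const int pre = SYNC_PREREAD | SYNC_PREWRITE;
	const int post = SYNC_POSTREAD | SYNC_POSTWRITE;

	/* Assumption: callers never mix PRE and POST in one call. */
	assert(!((ops & pre) && (ops & post)));
	return (ops & pre) ? FENCE_SFENCE : FENCE_NONE;
}
```

The mfence entries from the "old" column never appear in the sketch, which matches the commit's argument that only store/store ordering needs enforcing on the PRE side.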
 1.85 13-Jul-2022  riastradh x86: Move lfence into _bus_dmamap_sync and comment it.

No functional change intended. This just keeps the bus_dma_* and
_bus_dma_* functions organized more consistently.
 1.84 22-Jan-2022  skrll Ensure bus_dmatag_subregion is called with an inclusive max_addr
everywhere.
 1.83 07-Oct-2021  msaitoh KNF. No functional change.
 1.82 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.81 14-Nov-2019  maxv Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is substantial, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
 1.80 04-Oct-2019  maxv Add DMA instrumentation in KASAN. We note the original buffer and length in
the map, and check the buffer on each bus_dmamap_sync. This allows us to
find DMA buffer overflows and UAFs, which couldn't be found before because
the device accesses to memory are outside of KASAN's control.
 1.79 14-Jun-2019  mrg KASSERT() -> KASSERTMSG() message in _bus_dmamem_alloc_range().
 1.78 21-Apr-2019  maxv Rename the PTE bits.
 1.77 31-Jul-2017  jdolecek branches: 1.77.4;
modify code handling mismatch of nsegs in _bus_dmamem_alloc_range() to
a KASSERT() - plain return leaks memory, and this condition should
never trigger unless there is a bug in uvm_pglistalloc(), so it seems
to be a waste to check this

other ports usually simply do not check this, with exception of arm,
which does even full cleanup (checks it and calls uvm_pglistfree())
 1.76 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.75 05-Jan-2017  msaitoh Update the dmamp argument only when the allocation succeeded.
 1.74 27-Oct-2015  christos branches: 1.74.2;
fix operator precedence.
 1.73 27-Oct-2015  christos make sure we have a cookie before we try to clear it.
 1.72 27-Oct-2015  christos - If we succeeded allocating a buffer that did not need bouncing before, but
the buffer in the previous mapping did, clear the bounce bit. Fixes the
ld_virtio.c bug on machines with 8GB and dd if=/dev/zero of=crash bs=1g count=4.
- Allocate with M_ZERO instead of doing memset
- The panic string can take a format, use it.
- When checking for the bounce buffer boundary check addr + len < limit, not
addr < limit.
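
The last item of revision 1.72 is easy to get wrong, so here is a minimal sketch of the two checks side by side. The function names are hypothetical and the strict "<" follows the wording of the commit message; the kernel's actual comparison may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check: looks only at where the buffer starts. */
static bool
start_below_limit(uint64_t addr, uint64_t limit)
{
	return addr < limit;
}

/*
 * New check: addr + len < limit, written so that the addition cannot
 * wrap around for very large values.
 */
static bool
range_below_limit(uint64_t addr, uint64_t len, uint64_t limit)
{
	return addr < limit && len < limit - addr;
}
```

A buffer that starts one page below a 4GB limit and is two pages long passes the old check but fails the new one, which is the situation where bouncing is needed.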
 1.71 24-Dec-2013  christos branches: 1.71.4; 1.71.6; 1.71.8;
use __func__
 1.70 02-Jul-2013  christos make a diagnostic message more informative.
 1.69 08-Dec-2012  kiyohara branches: 1.69.2;
#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.
 1.68 14-Oct-2011  bouyer branches: 1.68.2; 1.68.8; 1.68.12; 1.68.14; 1.68.16;
Both bdt_ov->ov_dmamap_sync() and bus_dmamap_sync() return void,
so don't write return bdt_ov->ov_dmamap_sync(). Pointed out by njoly@
 1.67 28-Sep-2011  dyoung Cosmetic: join some if-statements, remove superfluous parentheses. No
change in the generated assembly.
 1.66 28-Sep-2011  dyoung After bouncing in bus_dmamap_load{,_mbuf,_uio}, call bus_dmamap_load(9)
instead of _bus_dmamap_load() so that a bus_dmamap_load(9) override has
a shot at loading the map.

XXX Perhaps bounce buffers should be rewritten in terms of bus_dma(9)
XXX overrides.
 1.65 28-Sep-2011  dyoung In bus_dma_tag_create(9), copy important properties (e.g., bounce
parameters) from the parent tag.

In bus_dma_tag_create(), increase the reference count on a parent
bus_dma_tag_t (if applicable), and decrease the reference count in
bus_dma_tag_destroy().

Don't let bus_dmatag_destroy(9) destroy an overridden bus_dma_tag_t.
 1.64 28-Sep-2011  dyoung Add an untested implementation of bus_dmamap_load_raw(9).
 1.63 27-Sep-2011  dyoung Instead of declaring _bus_dmamap_load_busaddr() static inline, make
it static and let the compiler decide about inlining. This reduces
the code size on both amd64 and i386, and the smaller code is probably
faster code.
 1.62 27-Sep-2011  dyoung In _bus_dmamap_load_busaddr(), change sgsize from an int to a bus_size_t.
 1.61 27-Sep-2011  dyoung Make the 'size' argument of _bus_dmamap_load_busaddr() a bus_size_t for
consistency's sake.
 1.60 13-Sep-2011  dyoung For consistency, call a bus_dma_tag_t bdt instead of bst. No functional
change intended.
 1.59 01-Sep-2011  christos Add bus_dma overrides. From dyoung
 1.58 25-Jul-2011  dyoung In _bus_dmamap_load_busaddr(), just return 0 instead of assigning an
intermediate variable (int error = 0;) and returning that (return
error;).
 1.57 01-Jul-2011  dyoung #include <sys/bus.h> instead of <machine/bus.h>.
 1.56 06-Nov-2010  uebayasi Machine dependent code is considered as part of UVM. Include
internal API header.
 1.55 24-Sep-2010  jakllsch fix copy/paste/modify error in a diagnostic panic message
 1.54 22-Mar-2010  bouyer bus_dmamem_alloc() may not get a boundary smaller than size, but
it's perfectly valid for bus_dmamap_create() to do so (a contiguous
transfer will then be split into multiple segments).
Fix _xen_bus_dmamem_alloc_range() and _bus_dmamem_alloc_range() to
allow a boundary limit smaller than size:
- compute appropriate boundary for uvm_pglistalloc(), which doesn't
accept boundary < size
- also take care of boundary when deciding to start a new segment.
While there, remove useless boundary argument to _xen_alloc_contig().
Fix the boundary-related issue of PR port-amd64/42980
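
The two boundary steps in revision 1.54 can be sketched as follows. This is one way to do it, assuming the boundary is zero (no constraint) or a power of two; the function names are hypothetical and the code is not copied from _bus_dmamem_alloc_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Widen a boundary until it is at least as large as the allocation, so
 * that it can be handed to an allocator that rejects boundary < size.
 */
static uint64_t
alloc_boundary(uint64_t boundary, uint64_t size)
{
	uint64_t b = boundary;

	if (b == 0)
		return 0;
	assert((b & (b - 1)) == 0);	/* assumed: power of two */
	while (b < size && b <= UINT64_MAX / 2)
		b <<= 1;
	return b;
}

/*
 * When building segments, two addresses may share a segment only if
 * they fall inside the same window of the caller's original boundary.
 */
static bool
same_boundary_window(uint64_t a, uint64_t b, uint64_t boundary)
{
	if (boundary == 0)
		return true;
	return ((a ^ b) & ~(boundary - 1)) == 0;
}
```

The widened value is used only for the allocation; segment splitting still uses the original, smaller boundary, which is why a contiguous allocation can end up described by several segments.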
 1.53 26-Feb-2010  jym branches: 1.53.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no longer needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html
 1.52 06-Nov-2009  dsl branches: 1.52.2;
Don't call _bus_dmamem_free() when _bus_dmamem_alloc() fails.
Fixes PR/42208
 1.51 21-Apr-2009  cegger change pmap flags argument from int to u_int.
discussed with christos@ on source-changes-d@
 1.50 18-Apr-2009  cegger Introduce PMAP_NOCACHE as first PMAP MD bit in x86. Make use of it in pmap_enter().
This saves one extra TLB flush when mapping dma-safe memory.
Presented on tech-kern@, port-i386@ and port-amd64@
ok ad@
 1.49 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.48 20-Feb-2009  cegger backout rev. 1.47.
per request from dyoung@ and cube@
 1.47 19-Feb-2009  cegger bus_dmamap_create(): on failure, reset dmamp or drivers
like nfe(4) try to call bus_dmamap_destroy() on an invalid dmamap in their error path.
 1.46 15-Nov-2008  skrll branches: 1.46.4;
Typo in comment.
 1.45 28-Jun-2008  bouyer branches: 1.45.2; 1.45.4; 1.45.6;
port-i386/38935: Add appropriate x86_lfence or x86_mfence calls to
bus_dmamap_sync(), depending on the BUS_DMASYNC flag. This makes sure
that the kernel reads data from main memory in the intended order.
 1.44 13-Jun-2008  bjs "functin" -> "function" (no "functional" change, har har)
 1.43 04-Jun-2008  ad branches: 1.43.2;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.42 28-Apr-2008  martin branches: 1.42.2;
Remove clause 3 and 4 from TNF licenses
 1.41 27-Apr-2008  ad branches: 1.41.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.
 1.40 28-Nov-2007  ad branches: 1.40.14; 1.40.16;
Remove remaining CPUCLASS_386 tests.
 1.39 17-Oct-2007  garbled branches: 1.39.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.38 12-Oct-2007  ad branches: 1.38.2;
crit_enter/crit_exit are now available.
 1.37 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.36 29-Aug-2007  ad branches: 1.36.2;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.35 04-Mar-2007  christos branches: 1.35.2; 1.35.10; 1.35.14; 1.35.18; 1.35.20;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.34 21-Feb-2007  mrg add a pair of new bus_dma(9) functions:
int _bus_dmatag_subregion(bus_dma_tag_t tag,
bus_addr_t min_addr,
bus_addr_t max_addr,
bus_dma_tag_t *newtag,
int flags)
void _bus_dmatag_destroy(bus_dma_tag_t tag)

that allow a (normally broken/limited) device to restrict the bus address
range it can talk to. this is used by bce(4) to limit DMA addresses to
1GB range, the maximum the chip can address.

all this is from Yorick Hardy <yhardy@uj.ac.za> with input from several
people on tech-kern.

XXX: bus_dma(9) needs an update still.
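
As an illustration of the idea behind the subregion call in revision 1.34, the following is a userland sketch and not the kernel code. The `model_tag` structure and `model_subregion` function are hypothetical stand-ins. The sketch assumes that a restricted tag simply keeps the tighter of the parent's bounds and the requested bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA tag: just an inclusive address window. */
struct model_tag {
	uint64_t min_addr;
	uint64_t max_addr;
};

/*
 * Model of a subregion request: narrow the parent's window to
 * [min_addr, max_addr]. Returns false if the two windows do not
 * overlap, in which case *newtag is left untouched.
 */
static bool
model_subregion(const struct model_tag *parent, uint64_t min_addr,
    uint64_t max_addr, struct model_tag *newtag)
{
	uint64_t lo, hi;

	assert(parent != NULL);
	assert(newtag != NULL);

	/* Keep the tighter bound on each side. */
	lo = parent->min_addr > min_addr ? parent->min_addr : min_addr;
	hi = parent->max_addr < max_addr ? parent->max_addr : max_addr;

	if (lo > hi)
		return false;
	newtag->min_addr = lo;
	newtag->max_addr = hi;
	return true;
}
```

For the 1GB case mentioned in the log, a parent window of 0 to 0xffffffff narrowed with a request of 0 to 0x3fffffff yields a window that ends at 0x3fffffff in this model.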
 1.33 09-Feb-2007  ad branches: 1.33.2;
Merge newlock2 to head.
 1.32 16-Nov-2006  christos branches: 1.32.2; 1.32.4;
__unused removal on arguments; approved by core.
 1.31 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.30 28-Aug-2006  bouyer branches: 1.30.2; 1.30.4;
Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where machine address would in some case be freed at physical address, or
mapped as physical address.
 1.29 01-Mar-2006  yamt branches: 1.29.2; 1.29.12;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.28 14-Jan-2006  christos branches: 1.28.2; 1.28.4;
Protect against uio_lwp being NULL from Pavel Cahyna
 1.27 24-Dec-2005  perry branches: 1.27.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.26 11-Dec-2005  christos merge ktrace-lwp.
 1.25 24-Nov-2005  yamt bus_dmamem_map: honour BUS_DMA_NOWAIT. noted by Manuel Bouyer.
bus_space_map: always do NOWAIT allocation as it used to be before yamt-km.

we have too many copies!
 1.24 20-Sep-2005  thorpej branches: 1.24.6;
Turn bounce buffer stats into evcnts and enable them by default.
 1.23 22-Aug-2005  bouyer Rename _PRIVATE_BUS_DMAMEM_ALLOC_RANGE to _BUS_DMAMEM_ALLOC_RANGE for
consistency with other macros defined in bus_private.h. Pointed out by
YAMAMOTO Takashi.
 1.22 20-Aug-2005  bouyer More adjustments to deal with Xen's physical <=> machine addresses mappings:
- Allow _bus_dmamem_alloc_range to be provided from external source:
Use a _PRIVATE_BUS_DMAMEM_ALLOC_RANGE macro, defined to
_bus_dmamem_alloc_range by default.
- avail_end is the end of the physical address range. Define a macro
_BUS_AVAIL_END (defined by default to avail_end) and use it instead.
 1.21 16-Apr-2005  yamt branches: 1.21.2;
tweak x86 bus_dma code so that it can be used by xen port.

- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.20 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.19 09-Mar-2005  matt branches: 1.19.2;
Add a dm_maxsegsz public member to bus_dmamap_t. This allows a user of the API
to select the maximum segment size for each bus_dmamap_load (up to the maxsegsz
supplied to bus_dmamap_create). dm_maxsegsz is reset to the value supplied to
bus_dmamap_create when the dmamap is unloaded.
 1.18 19-Feb-2005  jdolecek g/c obsolete comment for _bus_dmamap_load_buffer()
 1.17 20-Jun-2004  thorpej branches: 1.17.4; 1.17.6;
Remove the "ID" component of the x86 bus_dma flags, since these are no
longer "ISA DMA" specific flags.
 1.16 12-Jun-2004  yamt remove XXX comments which are no longer true.
 1.15 12-Jun-2004  yamt ANSIfy.
 1.14 12-Jun-2004  yamt simplify x86 bus_dma implementation.
(rather than passing &lastaddr and &seg around,
use and update bus_dmamap_t directly.)
 1.13 12-Jun-2004  yamt - introduce _bus_dmamap_load_paddr, which takes (paddr, size) and
add the range to the map, and use it for _bus_dmamap_load_{buffer,mbuf}.
- _bus_dmamap_load_mbuf: in the case of M_EXT_PAGES, deal with vm_pages
directly rather than doing pmap_extract on given kva.

as a side effect, do a segment coalescing and boundary checks for mbufs.

ok'ed by Frank van der Linden and Jason Thorpe on tech-kern@.
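
The segment coalescing described in revision 1.13 can be pictured with a small userland model. This is a sketch under assumptions, not the x86 implementation: `model_map`, `model_seg`, `MODEL_MAXSEGS` and `model_load_paddr` are hypothetical names. It also leaves out the boundary checks and the splitting of ranges larger than the maximum segment size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAXSEGS 8

/* Hypothetical stand-in for one DMA segment. */
struct model_seg {
	uint64_t addr;
	uint64_t len;
};

/* Hypothetical stand-in for a DMA map: a list of segments. */
struct model_map {
	int nsegs;                      /* number of segments in use */
	uint64_t maxsegsz;              /* largest allowed segment length */
	struct model_seg segs[MODEL_MAXSEGS];
};

/*
 * Add the physical range [paddr, paddr + size) to the map. If it starts
 * exactly where the last segment ends, and the merged segment stays
 * within maxsegsz, extend that segment instead of starting a new one.
 * Returns 0 on success, -1 if the map has no free segment left.
 */
static int
model_load_paddr(struct model_map *map, uint64_t paddr, uint64_t size)
{
	assert(map != NULL);

	if (map->nsegs > 0) {
		struct model_seg *last = &map->segs[map->nsegs - 1];

		if (last->addr + last->len == paddr &&
		    last->len + size <= map->maxsegsz) {
			last->len += size;      /* coalesce */
			return 0;
		}
	}
	if (map->nsegs == MODEL_MAXSEGS)
		return -1;
	map->segs[map->nsegs].addr = paddr;
	map->segs[map->nsegs].len = size;
	map->nsegs++;
	return 0;
}
```

Because the map itself records the last segment, the caller does not need to pass a "last address" or segment index around, which is the simplification the 1.14 entry describes.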
 1.12 12-Jun-2004  yamt simplify x86 bus_dma internal "load" functions.
(by eliminating a variable "first" and using seg == -1 instead.)
 1.11 05-Jun-2004  yamt unexport following x86 bus_dma internal functions.
_bus_dma_alloc_bouncebuf
_bus_dma_free_bouncebuf
_bus_dmamap_load_buffer
 1.10 11-May-2004  yamt _bus_dmamap_load_mbuf: check bounce_thresh in the case when we have paddr hint.
 1.9 28-Oct-2003  mycroft In _bus_dma_uiomove():
* Don't punt after the first iov in the UIO_SYSSPACE case. Not that this ever
happens in practice right now.
* If we get through the loop, error==0 by definition, so just return 0.
* Eliminate bogus initializer.
 1.8 25-Oct-2003  christos Fix uninitialized variable warning
 1.7 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.6 29-Jun-2003  fvdl branches: 1.6.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.5 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.4 11-Jun-2003  fvdl Avoid bad free() calls for failed allocations. From enami.
 1.3 07-May-2003  fvdl Generalize bounce buffers, and use them for 32 bit PCI if needed.
Make ALLOCNOW the default iff bouncing might be needed (this has
no effect on i386 because ISA DMA devices already had to use
ALLOCNOW, and PCI isn't bounced (yet), since we don't do > 4G
at this point for i386).
 1.2 09-Apr-2003  thorpej Use cached physical addresses for mbufs and clusters to save having
to extract the physical address from the virtual.

On the ARM, also use the "read-only at MMU" indication to avoid a
redundant cache clean operation.

Other platforms should use these two as examples of how to use these
new pool/mbuf features to improve network performance. Note this requires
a platform to provide a working POOL_VTOPHYS().

Part 3 in a series of simple patches contributed by Wasabi Systems
to improve network performance.
 1.1 12-Mar-2003  thorpej Split bus_space and bus_dma into separate files.
 1.6.2.8 11-Dec-2005  christos Sync with head.
 1.6.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.6.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.6.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.6.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.6.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.6.2.2 03-Aug-2004  skrll Sync with HEAD
 1.6.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.17.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.17.6.1 25-Jan-2005  yamt - convert i386 to new apis.
- remove a pmap bootstrap kludge, which is no longer needed.
 1.17.4.1 29-Apr-2005  kent sync with -current
 1.19.2.5 28-Sep-2008  jdc Pull up revisions:
sys/arch/amd64/include/cpufunc.h patch
sys/arch/i386/include/cpufunc.h patch
sys/arch/x86/x86/bus_dma.c 1.45 via patch
requested by bouyer in ticket 1945.
 1.19.2.4 16-Sep-2006  ghen Pull up following revision(s) (requested by bouyer in ticket #1510):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.7
sys/arch/xen/x86/xen_bus_dma.c: revision 1.8
sys/arch/x86/include/bus_private.h: revision 1.6
sys/arch/x86/x86/bus_dma.c: revision 1.30
sys/arch/xen/include/bus_private.h: revision 1.7
Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where machine address would in some case be freed at physical address, or
mapped as physical address.
Wrap some printfs in #ifdef DEBUG, as we should not leak memory any more when
bus_dma memory allocation fails.
 1.19.2.3 25-Aug-2005  tron Pull up following revision(s) (requested by bouyer in ticket #697):
sys/arch/x86/x86/bus_dma.c: revision 1.23
sys/arch/x86/include/bus_private.h: revision 1.3
sys/arch/xen/include/bus_private.h: revision 1.3
Rename _PRIVATE_BUS_DMAMEM_ALLOC_RANGE to _BUS_DMAMEM_ALLOC_RANGE for
consistency with other macros defined in bus_private.h. Pointed out by
YAMAMOTO Takashi.
 1.19.2.2 25-Aug-2005  tron Pull up following revision(s) (requested by bouyer in ticket #695):
sys/arch/x86/x86/bus_dma.c: revision 1.22
sys/arch/x86/include/bus_private.h: revision 1.2
More adjustments to deal with Xen's physical <=> machine addresses mappings:
- Allow _bus_dmamem_alloc_range to be provided from external source:
Use a _PRIVATE_BUS_DMAMEM_ALLOC_RANGE macro, defined to
_bus_dmamem_alloc_range by default.
- avail_end is the end of the physical address range. Define a macro
_BUS_AVAIL_END (defined by default to avail_end) and use it instead.
 1.19.2.1 21-Apr-2005  tron Pull up revision 1.21 (requested by yamt in ticket #175):
tweak x86 bus_dma code so that it can be used by xen port.
- distinguish paddr_t and bus_addr_t.
for xen, use bus_addr_t in the sense of machine address.
- move _X86_BUS_DMA_PRIVATE part of bus.h into bus_private.h.
- remove special handling of xen_shm. we can always grab
machine address from pte.
 1.21.2.6 07-Dec-2007  yamt sync with head
 1.21.2.5 27-Oct-2007  yamt sync with head.
 1.21.2.4 03-Sep-2007  yamt sync with head.
 1.21.2.3 26-Feb-2007  yamt sync with head.
 1.21.2.2 30-Dec-2006  yamt sync with head.
 1.21.2.1 21-Jun-2006  yamt sync with head.
 1.24.6.1 29-Nov-2005  yamt sync with head.
 1.27.2.2 07-Feb-2006  yamt adapt x86 bus_dma.
 1.27.2.1 15-Jan-2006  yamt sync with head.
 1.28.4.1 22-Apr-2006  simonb Sync with head.
 1.28.2.1 09-Sep-2006  rpaulo sync with head
 1.29.12.1 14-Sep-2006  riz Pull up following revision(s) (requested by bouyer in ticket #150):
sys/arch/xen/x86/xen_bus_dma.c: revision 1.7
sys/arch/xen/x86/xen_bus_dma.c: revision 1.8
sys/arch/x86/include/bus_private.h: revision 1.6
sys/arch/x86/x86/bus_dma.c: revision 1.30
sys/arch/xen/include/bus_private.h: revision 1.7
Some bus_dma(9) fixes for Xen:
- Attempt to gracefully recover from a failed decrease_reservation or
increase_reservation, by avoiding physical memory loss.
- always store a machine address in ds_addr; this avoids some mistakes
where machine address would in some case be freed at physical address, or
mapped as physical address.
Wrap some printfs in #ifdef DEBUG, as we should not leak memory any more when
bus_dma memory allocation fails.
 1.29.2.1 03-Sep-2006  yamt sync with head.
 1.30.4.2 10-Dec-2006  yamt sync with head.
 1.30.4.1 22-Oct-2006  yamt sync with head
 1.30.2.2 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.30.2.1 18-Nov-2006  ad Sync with head.
 1.32.4.1 04-Sep-2008  skrll Sync with netbsd-4.
 1.32.2.1 31-Aug-2008  jdc Pull up revision 1.45 (requested by bouyer in ticket #1165).

port-i386/38935: Add appropriate x86_lfence or x86_mfence calls to
bus_dmamap_sync(), depending on the BUS_DMASYNC flag. This makes sure
that the kernel reads data from main memory in the intended order.
 1.33.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.33.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.35.20.2 09-Jan-2008  matt sync with HEAD
 1.35.20.1 06-Nov-2007  matt sync with HEAD
 1.35.18.4 03-Dec-2007  joerg Sync with HEAD.
 1.35.18.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.35.18.2 02-Oct-2007  joerg Sync with HEAD.
 1.35.18.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.35.14.1 03-Sep-2007  skrll Sync with HEAD.
 1.35.10.2 16-Oct-2007  garbled Sync with HEAD
 1.35.10.1 03-Oct-2007  garbled Sync with HEAD
 1.35.2.7 03-Dec-2007  ad Sync with HEAD.
 1.35.2.6 09-Oct-2007  ad Sync with head.
 1.35.2.5 09-Oct-2007  ad Sync with head.
 1.35.2.4 23-Aug-2007  ad - Use pmap_pte_setbits, pmap_pte_clearbits.
- Shoot down a range instead of single pages in bus_space.c.
 1.35.2.3 21-Aug-2007  ad amd64 changes, as yet untested:

- Adapt to vmlocking branch.
- Apply TLB shootdown and pv allocation changes to the pmap.
- Make it build.
 1.35.2.2 21-Aug-2007  ad - Update PTEs atomically. Only shoot down if the mappings were in use.
- _bus_dmamem_unmap: don't shootdown, pmap_remove() will do it for us.
- Assume that pmap_update() will wait for the TLB shootdown to complete.
 1.35.2.1 29-Jul-2007  ad - When zeroing/copying pages, use SSE2 movnti to avoid polluting the cache.
- By default, align assembly routines on 32-byte starting boundaries.
- There are now 8 interrupt priority levels, half of which are softints.
Update intrdefs.h to match.
- Always clear/set spinlock words - removes lots of ifdefs.
- Remove the horrible ci_self150 hack that I introduced.
- Overhaul how TLB shootdown is performed. Inspired by a similar change in
OpenBSD but implemented quite differently. This should be a lot faster
but I have not benchmarked it yet.
 1.36.2.2 14-Oct-2007  yamt sync with head.
 1.36.2.1 06-Oct-2007  yamt sync with head.
 1.38.2.1 17-Oct-2007  bouyer amd64 (aka x86-64) support for Xen. Based on the OpenBSD port done by
Mathieu Ropert in 2006.
DomU-only for now. An INSTALL_XEN3_DOMU kernel with a ramdisk will boot to
sysinst if you're lucky. Often it panics because a runnable LWP has
a NULL stack (really, it's all of l->l_addr which has been zeroed out
while the process was on the queue!)
TODO:
- bug fixes :)
- Most of the xpq_* functions should be shared with xen/i386
- The xen/i386 assembly bootstrap code should be replaced with the C
version in xenamd64/amd64/xpmap.c
- see if a config(5) trick could allow to merge xenamd64 back to xen or amd64.
 1.39.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.40.16.2 17-Jun-2008  yamt sync with head.
 1.40.16.1 18-May-2008  yamt sync with head.
 1.40.14.4 17-Jan-2009  mjf Sync with HEAD.
 1.40.14.3 29-Jun-2008  mjf Sync with HEAD.
 1.40.14.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.40.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.41.2.5 09-Oct-2010  yamt sync with head
 1.41.2.4 11-Aug-2010  yamt sync with head.
 1.41.2.3 11-Mar-2010  yamt sync with head
 1.41.2.2 04-May-2009  yamt sync with head.
 1.41.2.1 16-May-2008  yamt sync with head.
 1.42.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.42.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.43.2.2 03-Jul-2008  simonb Sync with head.
 1.43.2.1 18-Jun-2008  simonb Sync with head.
 1.45.6.1 19-Nov-2010  riz Pull up following revision(s) (requested by bouyer in ticket #1348):
sys/arch/x86/x86/bus_dma.c: revision 1.54
sys/arch/xen/x86/xen_bus_dma.c: revision 1.21
bus_dmamem_alloc() may not get a boundary smaller than size, but
it's perfectly valid for bus_dmamap_create() to do so (a contiguous
transfer will then be split into multiple segments).
Fix _xen_bus_dmamem_alloc_range() and _bus_dmamem_alloc_range() to
allow a boundary limit smaller than size:
- compute appropriate boundary for uvm_pglistalloc(), which doesn't
accept boundary < size
- also take care of boundary when deciding to start a new segment.
While there, remove useless boundary argument to _xen_alloc_contig().
Fix the boundary-related issue of PR port-amd64/42980
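
The two steps listed in this pull-up can be sketched in userland C. This is a model under assumptions, not the committed code: `model_alloc_boundary` and `model_crosses_boundary` are hypothetical helpers, and the sketch assumes the boundary is a power of two and that widening it by doubling is an acceptable way to satisfy an allocator that rejects boundary < size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the boundary requested by the caller (0 means none) and an
 * allocation size, return a boundary that is not smaller than size:
 * the request doubled until it is >= size.
 */
static uint64_t
model_alloc_boundary(uint64_t boundary, uint64_t size)
{
	uint64_t uboundary = boundary;

	if (boundary == 0)
		return 0;
	assert((boundary & (boundary - 1)) == 0);   /* power of two */
	assert(size <= (UINT64_C(1) << 63));        /* keep the shift in range */
	while (uboundary < size)
		uboundary <<= 1;
	return uboundary;
}

/*
 * Return 1 if [addr, addr + len) crosses a multiple of boundary, in
 * which case a new segment would have to start at that multiple.
 */
static int
model_crosses_boundary(uint64_t addr, uint64_t len, uint64_t boundary)
{
	uint64_t mask;

	if (boundary == 0 || len == 0)
		return 0;
	mask = ~(boundary - 1);
	return (addr & mask) != ((addr + len - 1) & mask);
}
```

The widened value is only what the page allocator sees. The caller's original, smaller boundary is still the one to use when deciding where segments are split.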
 1.45.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.45.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.45.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.46.4.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.46.4.4 10-Jan-2011  jym Sync with HEAD
 1.46.4.3 24-Oct-2010  jym Sync with HEAD
 1.46.4.2 01-Nov-2009  jym Sync with HEAD.
 1.46.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.52.2.5 09-Nov-2010  uebayasi Sync with HEAD.
 1.52.2.4 09-Nov-2010  uebayasi Sync with HEAD.
 1.52.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.52.2.2 30-Apr-2010  uebayasi Sync with HEAD.
 1.52.2.1 28-Apr-2010  uebayasi Adjustment for uvm/uvm_page.h. More to follow later.
 1.53.2.2 05-Mar-2011  rmind sync with head
 1.53.2.1 30-May-2010  rmind sync with head
 1.68.16.1 15-Nov-2015  bouyer Pull up following revision(s) (requested by christos in ticket #1339):
sys/arch/x86/x86/bus_dma.c: revision 1.72
sys/arch/x86/x86/bus_dma.c: revision 1.73
sys/arch/x86/x86/bus_dma.c: revision 1.74
- If we succeeded allocating a buffer that did not need bouncing before, but
the buffer in the previous mapping did, clear the bounce bit. Fixes the
ld_virtio.c bug with machines 8GB and dd if=/dev/zero of=crash bs=1g count=4.
- Allocate with M_ZERO instead of doing memset
- The panic string can take a format, use it.
- When checking for the bounce buffer boundary check addr + len < limit, not
addr < limit.
make sure we have a cookie before we try to clear it.
fix operator precedence.
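
The difference between the two bounce-limit checks named above can be shown with a short userland sketch. The function names are hypothetical and the code is an illustration of the comparison only, not the driver or bus_dma code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start-only check: looks only at where the buffer begins. */
static bool
model_below_limit_start_only(uint64_t addr, uint64_t limit)
{
	return addr < limit;
}

/* Whole-range check, as in the log: compare addr + len with the limit. */
static bool
model_below_limit_whole_range(uint64_t addr, uint64_t len, uint64_t limit)
{
	assert(len <= UINT64_MAX - addr);   /* no wraparound in this model */
	return addr + len < limit;
}
```

A buffer that starts just below a 4GB limit but extends past it passes the start-only check and fails the whole-range check, so only the second form notices that it needs bouncing.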
 1.68.14.1 15-Nov-2015  bouyer Pull up following revision(s) (requested by christos in ticket #1339):
sys/arch/x86/x86/bus_dma.c: revision 1.72
sys/arch/x86/x86/bus_dma.c: revision 1.73
sys/arch/x86/x86/bus_dma.c: revision 1.74
- If we succeeded allocating a buffer that did not need bouncing before, but
the buffer in the previous mapping did, clear the bounce bit. Fixes the
ld_virtio.c bug with machines 8GB and dd if=/dev/zero of=crash bs=1g count=4.
- Allocate with M_ZERO instead of doing memset
- The panic string can take a format, use it.
- When checking for the bounce buffer boundary check addr + len < limit, not
addr < limit.
make sure we have a cookie before we try to clear it.
fix operator precedence.
 1.68.12.3 03-Dec-2017  jdolecek update from HEAD
 1.68.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.68.12.1 25-Feb-2013  tls resync with head
 1.68.8.1 15-Nov-2015  bouyer Pull up following revision(s) (requested by christos in ticket #1339):
sys/arch/x86/x86/bus_dma.c: revision 1.72
sys/arch/x86/x86/bus_dma.c: revision 1.73
sys/arch/x86/x86/bus_dma.c: revision 1.74
- If we succeeded allocating a buffer that did not need bouncing before, but
the buffer in the previous mapping did, clear the bounce bit. Fixes the
ld_virtio.c bug with machines 8GB and dd if=/dev/zero of=crash bs=1g count=4.
- Allocate with M_ZERO instead of doing memset
- The panic string can take a format, use it.
- When checking for the bounce buffer boundary check addr + len < limit, not
addr < limit.
make sure we have a cookie before we try to clear it.
fix operator precedence.
 1.68.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.68.2.1 16-Jan-2013  yamt sync with (a bit old) head
 1.69.2.2 18-May-2014  rmind sync with head
 1.69.2.1 28-Aug-2013  rmind sync with head
 1.71.8.1 08-Nov-2015  riz Pull up following revision(s) (requested by christos in ticket #1011):
sys/arch/x86/x86/bus_dma.c: revision 1.72
sys/arch/x86/x86/bus_dma.c: revision 1.73
sys/arch/x86/x86/bus_dma.c: revision 1.74
- If we succeeded allocating a buffer that did not need bouncing before, but
the buffer in the previous mapping did, clear the bounce bit. Fixes the
ld_virtio.c bug with machines 8GB and dd if=/dev/zero of=crash bs=1g count=4.
- Allocate with M_ZERO instead of doing memset
- The panic string can take a format, use it.
- When checking for the bounce buffer boundary check addr + len < limit, not
addr < limit.
make sure we have a cookie before we try to clear it.
fix operator precedence.
 1.71.6.3 28-Aug-2017  skrll Sync with HEAD
 1.71.6.2 05-Feb-2017  skrll Sync with HEAD
 1.71.6.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.71.4.1 08-Nov-2015  riz Pull up following revision(s) (requested by christos in ticket #1011):
sys/arch/x86/x86/bus_dma.c: revision 1.72
sys/arch/x86/x86/bus_dma.c: revision 1.73
sys/arch/x86/x86/bus_dma.c: revision 1.74
- If we succeeded allocating a buffer that did not need bouncing before, but
the buffer in the previous mapping did, clear the bounce bit. Fixes the
ld_virtio.c bug with machines 8GB and dd if=/dev/zero of=crash bs=1g count=4.
- Allocate with M_ZERO instead of doing memset
- The panic string can take a format, use it.
- When checking for the bounce buffer boundary check addr + len < limit, not
addr < limit.
make sure we have a cookie before we try to clear it.
fix operator precedence.
 1.74.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.77.4.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.77.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.77.4.1 10-Jun-2019  christos Sync with HEAD
 1.89.4.2 04-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #928):

sys/external/bsd/drm2/dist/drm/drm_gem.c: revision 1.25
sys/external/bsd/drm2/dist/drm/radeon/radeon_ci_dpm.c: revision 1.7
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/device/priv.h: revision 1.4
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/device/nouveau_nvkm_engine_device_acpi.c: revision 1.4
sys/external/bsd/drm2/dist/drm/i915/display/intel_opregion.h: revision 1.6
sys/external/bsd/drm2/dist/drm/i915/i915_drv.h: revision 1.49
sys/external/bsd/drm2/include/linux/mxm-wmi.h: revision 1.1
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/subdev/pci/nouveau_nvkm_subdev_pci_pcie.c: revision 1.4
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/device/nouveau_nvkm_engine_device_base.c: revision 1.13
sys/external/bsd/common/include/linux/bitops.h: revision 1.17
sys/external/bsd/drm2/nouveau/files.nouveau: revision 1.40
sys/external/bsd/drm2/linux/linux_pci.c: revision 1.30
sys/external/bsd/drm2/dist/drm/radeon/radeon_drv.h: revision 1.5
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/subdev/pci/nouveau_nvkm_subdev_pci_pcie.c: revision 1.5
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/subdev/mxm/nouveau_nvkm_subdev_mxm_base.c: revision 1.5
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_gart.c: revision 1.12
sys/external/bsd/drm2/dist/drm/radeon/radeon_rv770.c: revision 1.3
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_sorgm200.c: revision 1.3
sys/external/bsd/common/include/linux/printk.h: revision 1.14
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/subdev/instmem/nouveau_nvkm_subdev_instmem_gk20a.c: revision 1.10
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_vi.c: revision 1.4
sys/external/bsd/drm2/include/linux/acpi.h: revision 1.11
sys/external/bsd/drm2/drm/drm_cdevsw.c: revision 1.31
sys/external/bsd/drm2/dist/drm/radeon/radeon_si.c: revision 1.6
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_acpi.c: revision 1.5
sys/external/bsd/drm2/dist/drm/i915/display/intel_acpi.h: revision 1.5
sys/external/bsd/drm2/include/acpi/video.h: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_evergreen.c: revision 1.6
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_acpi.h: revision 1.4
sys/arch/sparc64/include/pci_machdep.h: revision 1.31
sys/arch/sparc64/dev/pci_machdep.c: revision 1.83
sys/external/bsd/drm2/include/linux/kref.h: revision 1.14
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/device/nouveau_nvkm_engine_device_pci.c: revision 1.12
sys/external/bsd/drm2/linux/linux_dma_buf.c: revision 1.17
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/subdev/bios/nouveau_nvkm_subdev_bios_shadowacpi.c: revision 1.4
sys/external/bsd/drm2/drm/drm_module.c: revision 1.32
sys/external/bsd/drm2/dist/drm/i915/i915_gem.h: revision 1.8
sys/external/bsd/drm2/dist/drm/i915/gem/i915_gem_dmabuf.c: revision 1.7
sys/external/bsd/drm2/include/linux/smp.h: revision 1.5
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_si.c: revision 1.5
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_device.c: revision 1.20
sys/arch/x86/x86/bus_dma.c: revision 1.91
sys/external/bsd/drm2/radeon/files.radeon: revision 1.40
sys/external/bsd/drm2/include/acpi/acpi_bus.h: revision 1.1
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_drv.h: revision 1.5
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_device.c: revision 1.21
sys/external/bsd/common/include/asm/barrier.h: revision 1.20
sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h: revision 1.2
sys/external/bsd/common/include/asm/barrier.h: revision 1.21
sys/modules/drmkms/drmkms_pci.h: revision 1.1
sys/external/bsd/drm2/dist/drm/drm_dp_helper.c: revision 1.17
sys/external/bsd/drm2/radeon/radeon_pci.c: revision 1.23
sys/external/bsd/drm2/linux/linux_xa.c: revision 1.4
sys/external/bsd/drm2/ttm/ttm_bo_vm.c: revision 1.23
sys/external/bsd/drm2/ttm/ttm_bo_vm.c: revision 1.24
sys/external/bsd/drm2/ttm/ttm_bo_vm.c: revision 1.25
sys/external/bsd/drm2/linux/linux_pci.c: revision 1.26
sys/dev/pci/pcivar.h: revision 1.120
sys/arch/xen/include/pci_machdep.h: revision 1.24
sys/external/bsd/drm2/ttm/ttm_bo_vm.c: revision 1.26
sys/external/bsd/drm2/linux/linux_pci.c: revision 1.27
sys/external/bsd/drm2/ttm/ttm_bo_vm.c: revision 1.27
sys/external/bsd/drm2/linux/linux_pci.c: revision 1.28
sys/external/bsd/drm2/dist/drm/radeon/radeon_cik.c: revision 1.8
sys/external/bsd/drm2/ttm/ttm_bo_vm.c: revision 1.28
sys/external/bsd/drm2/linux/linux_pci.c: revision 1.29
sys/external/bsd/drm2/include/linux/pci.h: revision 1.57
sys/external/bsd/drm2/include/linux/pci.h: revision 1.58
sys/external/bsd/drm2/dist/drm/radeon/radeon_acpi.c: revision 1.5
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_display.c: revision 1.6
sys/external/bsd/drm2/dist/drm/amd/powerplay/hwmgr/amdgpu_hwmgr.c: revision 1.3
sys/external/bsd/drm2/dist/drm/amd/display/dc/core/amdgpu_dc_stream.c: revision 1.3
share/man/man9/bus_dma.9: revision 1.69
sys/external/bsd/drm2/drm/drm_gem_cma_helper.c: revision 1.15
sys/external/bsd/drm2/dist/drm/radeon/radeon_acpi.c: revision 1.6
sys/external/bsd/drm2/dist/drm/radeon/radeon.h: revision 1.12
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_cik.c: revision 1.7
sys/dev/acpi/acpi_mcfg.c: revision 1.29
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c: revision 1.6
sys/external/bsd/drm2/dist/drm/radeon/radeon_r600.c: revision 1.7
sys/external/bsd/drm2/dist/drm/radeon/radeon_bios.c: revision 1.13
sys/modules/amdgpu/Makefile: revision 1.9
sys/external/bsd/drm2/dist/drm/radeon/radeon_bios.c: revision 1.14
sys/external/bsd/common/linux/linux_tasklet.c: revision 1.12
sys/external/bsd/drm2/dist/drm/nouveau/include/nvkm/core/device.h: revision 1.10
sys/external/bsd/drm2/dist/drm/i915/gem/i915_gem_mman.c: revision 1.23
sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu.h: revision 1.9
sys/external/bsd/drm2/include/linux/interval_tree.h: revision 1.14
sys/external/bsd/drm2/dist/drm/i915/gem/i915_gem_mman.c: revision 1.26
sys/external/bsd/drm2/dist/drm/amd/powerplay/hwmgr/amdgpu_smu7_hwmgr.c: revision 1.5
sys/dev/pci/pci.c: revision 1.168
sys/external/bsd/drm2/dist/drm/i915/gem/i915_gem_mman.c: revision 1.27
sys/external/bsd/drm2/dist/drm/radeon/radeon_si_dpm.c: revision 1.9
sys/external/bsd/drm2/pci/files.drmkms_pci: revision 1.18
sys/external/bsd/drm2/linux/linux_sync_file.c: revision 1.3
sys/external/bsd/drm2/amdgpu/files.amdgpu: revision 1.31
sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/device/nouveau_nvkm_engine_device_tegra.c: revision 1.4
sys/external/bsd/drm2/dist/drm/drm_gem.c: revision 1.24
sys/arch/xen/xen/xpci_xenbus.c: revision 1.29

drm: Eliminate __HAVE_ATOMIC_AS_MEMBAR conditionals.
Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html

linux asm/barrier.h: Fix !MULTIPROCESSOR build.

remove "nouveau" from a comment. noted by jmcneill.

drm: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
comment a function that has a clear overbounds read but it isn't used.
found by GCC 12.

nix the NetBSD specific GEM_BUG_ON().
avoids GCC 12 warnings, and matches upstream closer.
avoid uninitialised variable usage in drm_gem_cma_create_internal().
in the case nothing has returned 'error', 'nsegs' and the dma info
are (potentially) uninitialised, so consider this an error.
found by GCC 12.

avoid a GCC 12 warning.
there's a 1-element long array and a loop conditional that tries to see
if indexes for it are not identical. as these indexes will always both
be 0, the only valid index, the condition is always false. GCC 12
triggers a strange warning on this code that can never run (see below),
so simply assert the array size is 1 and comment the rest.
amdgpu_dc_stream.c:470:55: error: array subscript [0, 0] is outside array bounds of 'struct dc_writeback_info[1]' [-Werror=array-bounds]
470 | stream->writeback_info[j] = stream->writeback_info[i];

convert a KASSERT() into an if () panic() sequence to appease GCC 12.
OK riastradh@.

drm: Fix conditionals around drmkms_pci and agp.
Kernel should build now with all pci drm drivers stripped out but
DRM_LEGACY still enabled. (Might not be very useful, but it'll
build. Maybe we should also have DRM_LEGACY_PCI so those drivers can
be modloaded later.)

drmkms: Fix module build.
avoid an unlikely array bounds issue picked up by GCC 12.
nvkm_pcie_speed() can return -1, which is then used as an array index,
so make this default return PCIe 1.0 speeds.

drm: enable almost all PCIe functionality
linux_pci.c revisions 1.24 and 1.25 implemented most of the remaining
missing PCIe backends, but only enabled them for some amdgpu portions.
this enables all code marked with "XXX amdgpu pcie", "XXX radeon pcie",
and "XXX pcie speed". for most of it, simply removing #ifndefs __NetBSD__
to enable compilation was required, once the new "bus->max_bus_speed"
member was added to struct pci_bus. add an "always fails" backend for
pci_enable_atomic_ops_to_root() which seems to only be necessary
for virtual GPU functionality (and could be implemented if needed.)
tested on radeon 5450, 7750, R7 240 [radeon], and RX 550 [amdgpu], and
nvidia 750 and 1030 [nouveau].
this still does not quite work on nvidia cards. there are two problems
that remain:
- the call to set the link speed is skipped because the speed is set
to the default value of "-1". nvkm_pcie_set_link() will actually
determine the right value for this and for some cards, calling this
function if the current speed is -1 helps set the link speed. it
may be that on linux other paths we don't have enabled properly
would set this (there's one via debugfs, and a jetson specific one),
though perhaps setting either AC or DC speed values as boot options
(after hooking up these for netbsd) would currently work.
- worse, cards newer than kepler - geforce 900, 1000, and newer, are
all lacking the backing support to set pcie link speed. the GT 1030
card i have been testing with remains at pcie 1.0.

radeon: fix and enable ACPI methods for getting ROM BIOS
The hacky way of getting the BIOS mapped only works on x86. ACPI
should be preferred if available. Makes BIOS reading through VFCT
work on aarch64 with EDK2. (But only if EDK2 has POSTed the GPU.)
XXX amdgpu should get the same treatment.

drm: put_cpu() should enable preemption, not disable it again

drm(4): make pr_debug equivalent to aprint_debug
significantly reduces the default spam from amdgpu(4).

drm: Set CONFIG_ACPI in linux/acpi.h and make it build.

Leave a little ACPI-related functionality disabled for now, like
getting EDID out of ACPI -- needs a bit more work to make this work,
and I don't have hardware to work on that.
Should help with failures of the forms:
- unable to locate a BIOS ROM
- bios: unable to locate usable image
on various machines.

radeon_acpi.c: ifdef out unused function on NetBSD.
Should fix syzkaller build.

drm(4): Fix st_rdev in stat.
dminor->index already has the 64*type adjustment, as allocated in
drm_minor_alloc.
PR kern/58180

linux_sync_file: Fix missing init/fini steps.
Noted by rjs@.
PR kern/58210

ttm: Sync ttm_bo_uvm_fault_idle better with Linux.
PR xsrc/58133

ttm: Undo mistake in previous.
PR xsrc/58133

linux: Add a few more cases to pci_get_class.
Should fix crash on boot with amdgpu now that the ACPI business is
enabled.

i915: Fix dmabuf mmap object.

drm: Fix missing bounds checks in dma buf mmap.

drm_gem.c: Fix sense of assertion.
This is the opposite of WARN_ON.
Noted by rjs@.

drm_gem.c: Enable drm_gem_fence_array_add now that we emulate xa.

linux_xa: Delete and replace collision in xa_store as intended.
Don't free the colliding node that's still in the tree.
Noted by rjs@.

i915_gem_mman.c: Apply mmap types via pmap flags.
This way, userland gets buffers mapped write-combining or uncached as
needed.
PR xsrc/58307

x86: Teach bus_dmamem_map about BUS_DMA_PREFETCHABLE.
PR port-amd64/58308

bus_dma(9): Document BUS_DMA_PREFETCHABLE.
Like BUS_DMA_NOCACHE. Doesn't absolve you of the need for
bus_dmamap_sync, but if you later pass the vaddr to bus_dmamap_load,
the DMA map might notice the mapping is write-combining and use this
to make bus_dmamap_sync cheaper.
PR kern/58309

nouveau_nvkm_subdev_instmem_gk20a.c: Use BUS_DMA_PREFETCHABLE.
Matches Linux's pgprot_writecombine.
Unclear where the appropriate bus_dmamap_sync happens, or is supposed
to happen -- not using it would be wrong, but asking for a
prefetchable mapping may paper over symptoms, at least!

ttm: Sync more with Linux.
Add the original copyright and attribution since this is now,
intentionally, a modified copy of the original and not just roughly
the same algorithm.

ttm: Respect PGO_ALLPAGES.
Not sure this is useful but it reduces XXX's and makes this match
udv_fault better so it's easier to understand.

ttm: Sync cacheability flag logic with Linux.

ttm: Add XXX about readahead fault failures.

pci: Pass cookie through pci_find_device, pci_enumerate_bus, take 2.
New functions pci_find_device1 and pci_enumerate_bus1 have the cookie
argument. Existing symbols pci_find_device and pci_enumerate_bus are
now wrappers for the cookieless version.
This will allow pci_find_device callers to pass a cookie through to
the match function so they can keep state or pass in extra parameters
like b/d/f numbers, which will allow us to nix some horrible kludges
in the Linux PCI API emulation for drm (and, perhaps, Intel wifi).
This change drops the symbol pci_probe_device, in favour of a new
pci_probe_device1 with the cookie argument. But I don't think that
requires a revbump because it's only called by MD pci_enumerate_bus1
implementations, which don't live in modules anyway.
Take 2: Make sure to handle NULL match function.

linux_pci: Nix pci enumeration kludges.
Now that we can pass a cookie through, this stuff will be a little
less fragile.
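The cookie-passing idea in the pci_find_device1 change can be sketched as follows. This is an illustration only: `struct bdf`, `find_device1` and `match_bdf` are hypothetical names rather than the NetBSD PCI API, and treating a NULL match function as "accept every device" is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device's bus/device/function numbers. */
struct bdf {
	int bus, dev, func;
};

/* Match callback that receives an opaque cookie from the caller. */
typedef int (*match_fn)(const struct bdf *, void *);

/*
 * Walk a device table and return the first entry the callback accepts.
 * The cookie is handed to the callback untouched.  A NULL match
 * function accepts every device (assumption of this sketch).
 */
static const struct bdf *
find_device1(const struct bdf *tab, size_t n, match_fn match, void *cookie)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (match == NULL || match(&tab[i], cookie))
			return &tab[i];
	}
	return NULL;
}

/* Example callback: the cookie carries the b/d/f numbers to look for. */
static int
match_bdf(const struct bdf *d, void *cookie)
{
	const struct bdf *want = cookie;

	return d->bus == want->bus && d->dev == want->dev &&
	    d->func == want->func;
}
```

Because the wanted b/d/f travels in the cookie, the callback needs no global state, which is the kind of kludge the log message says the cookie makes unnecessary.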

i915: Omit needless i915_gem_object_pin/unpin_pages cycle in fault.
vm_fault_cpu and vm_fault_gtt, called by i915_gem_fault, already do
the pinning and unpinning internally, so there is no need for
i915_gem_fault to do it.
No functional change intended, except that the transient pin count
will be one lower than before during the fault routine (but it will
still be positive).

i915: Match Linux fault routine return code actions.
Omit needless EINTR interception -- this is now handled by
i915_error_to_vmf_fault.
Earlier revert was over a false alarm -- bisection shows the new
warnings arose from linux_pci.c 1.29 here:
https://mail-index.netbsd.org/source-changes/2024/06/23/msg151929.html

linux_pci: Fix shifto in pci_get_class.
It looks like Linux's pci_get_class also matches the interface part
of the PCI class register (but not the revision part), and I hadn't
noticed that in the previous shim structured differently.

With GCC12 kernel ALL/amd64 triggers "'sor' may be used uninitialized".
If "sublinks & 3" is zero GCC is right and sor[1] may be returned uninitialized.
Fix by initializing "sor" to zero to return -1 instead of uninitialized value.
Ok: Taylor R Campbell <riastradh@>

amdgpu: Map BAR 2, not BAR 5, on pre-bonaire chips.
PR kern/58384

amdgpu: Map consecutive pages, not the same one over and over again.
PR kern/58385

linux/bitops: Fix overestimate for BITS_TO_LONGS(9)
Fortunately, this seems harmless except for allocating
excessive buffer memory.
Pointed out by nonaka@, OK riastradh@.
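To illustrate the kind of overestimate meant here, the sketch below contrasts an exact round-up with one common way of overshooting. The log does not show the previous definition, so `BITS_TO_LONGS_OVER` is only an assumed example of such a bug; all macro names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in a long on the build machine. */
#define SK_BITS_PER_LONG	(sizeof(long) * CHAR_BIT)

/* Overestimate: always adds one, even when n is an exact multiple. */
#define BITS_TO_LONGS_OVER(n)	((n) / SK_BITS_PER_LONG + 1)

/* Exact: rounds up only when there is a remainder. */
#define BITS_TO_LONGS_EXACT(n) \
	(((n) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)
```

For a bitmap of exactly SK_BITS_PER_LONG bits the exact form yields one long and the overestimating form yields two, which wastes memory but never under-allocates.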
 1.89.4.1 20-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #881):

sys/arch/x86/x86/bus_dma.c: revision 1.90

x86/bus_dma.c: Sprinkle KASSERTMSG.
 1.6 29-Nov-2003  fvdl This file is dead. It has ceased to be. It has gone to meet its maker. It
is a late file.

Any rumours of it pining for the fjords are totally unsubstantiated.
 1.5 28-Nov-2003  jhawk Manually moved from Attic/ in the repository since cvs failed to do so.
 1.4 28-Jun-2003  darrenr branches: 1.4.2;
Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.3 12-Mar-2003  thorpej Split bus_space and bus_dma into separate files.
 1.2 03-Mar-2003  fvdl Use pmap_cpu_has_pg_n()
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.4.2.1 03-Aug-2004  skrll Sync with HEAD
 1.47 17-Jul-2022  riastradh x86: Cite reference for bus_space_barrier memory ordering rules.
 1.46 07-Oct-2021  msaitoh KNF. No functional change.
 1.45 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.44 21-Apr-2020  jdolecek two more files to convert to newer HYPERVISOR_physdev_op() interface
 1.43 03-Dec-2019  riastradh branches: 1.43.6;
Skip fences in bus_space_barrier on I/O space.

I/O operations are issued in program order. Not that I/O operations
are usually a performance bottleneck anyway, but maybe it is slightly
cheaper to avoid stalling on store buffers or pending loads, and
there's very little cost to the skipping criterion here.
 1.42 02-Dec-2019  riastradh Use LFENCE/SFENCE/MFENCE in x86 bus_space_barrier.

These are needed for BUS_SPACE_MAP_PREFETCHABLE mappings. On x86,
these are WC-type memory regions, which means -- unlike normal
WB-type memory regions -- loads can be reordered with loads,
requiring LFENCE, and stores can be reordered with stores, requiring
SFENCE.

Reference: AMD64 Architecture Programmer's Manual, Volume 2: System
Programming, Sec. 7.4.1 `Memory Barrier Interaction with Memory
Types', Table 7-3 `Memory Access Ordering Rules'.
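The fence selection described in revisions 1.42 and 1.43 can be sketched as a small decision function. This is a hedged illustration, not the kernel source: the `SK_` flag values and the names `enum fence` and `pick_fence` are hypothetical, and mapping the combined read/write case to MFENCE is inferred from the commit title rather than stated in the message.

```c
/* Which fence instruction a barrier would issue (names hypothetical). */
enum fence { FENCE_NONE, FENCE_LFENCE, FENCE_SFENCE, FENCE_MFENCE };

/* Stand-ins for the read/write barrier flags. */
#define SK_BARRIER_READ		0x01
#define SK_BARRIER_WRITE	0x02

/*
 * Pick a fence for a barrier request.  I/O space needs none because
 * I/O operations are issued in program order.  For memory space,
 * load/load ordering uses LFENCE, store/store ordering uses SFENCE,
 * and a request for both uses MFENCE.
 */
static enum fence
pick_fence(int is_io_space, int flags)
{
	if (is_io_space)
		return FENCE_NONE;

	switch (flags & (SK_BARRIER_READ | SK_BARRIER_WRITE)) {
	case SK_BARRIER_READ:
		return FENCE_LFENCE;
	case SK_BARRIER_WRITE:
		return FENCE_SFENCE;
	case SK_BARRIER_READ | SK_BARRIER_WRITE:
		return FENCE_MFENCE;
	default:
		return FENCE_NONE;
	}
}
```

Returning an enum instead of executing inline assembly keeps the sketch portable; a real implementation would issue the corresponding instruction at each return.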
 1.41 11-Feb-2019  cherry branches: 1.41.4;
We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.40 01-Jun-2017  chs branches: 1.40.10;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.39 20-Jan-2017  maya increase max io mem on amd64. some devices need it.
 1.38 27-Jan-2012  para branches: 1.38.6; 1.38.22; 1.38.24; 1.38.28; 1.38.32;
converting extent(9) from malloc(9) to kmem(9)
preceding kmem-vmem-pool-uvm patch

releng@ acknowledged
 1.37 25-Aug-2011  dyoung branches: 1.37.2; 1.37.6;
Initialize bst_exists in bus_space_tag_create(9). Use it to avoid
walking the chain of ancestor tags to see if a bus_space(9) routine was
overridden.
 1.36 25-Jul-2011  dyoung Constify bus_space_reserve_subregion() implementation.
 1.35 08-Jul-2011  dyoung Bring bus_space_tag_create(9) up-to-date with doco.

Fix overrides of bus_space_unmap(9) et cetera.
 1.34 06-Jul-2011  dyoung Implement bus_space_tag_create() and _destroy().

Factor bus_space_reserve(), bus_space_release(), et cetera out of
bus_space_alloc(), bus_space_map(), bus_space_free(), bus_space_unmap(),
et cetera.

For i386 and amd64, activate the use of <machine/bus_defs.h> and
<machine/bus_funcs.h> by #defining __HAVE_NEW_STYLE_BUS_H in
their respective types.h. While I'm here, remove unnecessary
__HAVE_DEVICE_REGISTER #defines.
 1.33 11-Feb-2011  jmcneill add bus_space_mmap support for BUS_SPACE_MAP_PREFETCHABLE, ok matt@
 1.32 10-Jan-2011  jruoho branches: 1.32.2; 1.32.4;
Bump iomem_ex_storage from 16 to 64. Based on analysis from joda@:

http://mail-index.netbsd.org/current-users/2010/10/01/msg014446.html

Discussed with mrg@ and jmcneill@.
 1.31 27-Jul-2010  jakllsch Allow BUS_SPACE_DEBUG to compile on amd64.
 1.30 06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.29 10-May-2010  dyoung Provide pmap_enter_ma(), pmap_extract_ma(), pmap_kenter_ma() in all x86
kernels, and use them in the bus_space(9) implementation instead of ugly
Xen #ifdef-age. In a non-Xen kernel, the _ma() functions either call or
alias the equivalent _pa() functions.

Reviewed on port-xen@netbsd.org and port-i386@netbsd.org. Passes
rmind@'s and bouyer@'s inspection. Tested on i386 and on Xen DOMU /
DOM0.
 1.28 28-Apr-2010  dyoung #include <sys/bus.h> instead of <machine/bus.h> here to get all of the
MI prototypes.
 1.27 28-Apr-2010  dyoung On x86, change the bus_space_tag_t to a pointer to a struct
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).

Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.

Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
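A rough sketch of the tag layout described in revision 1.27, under stated assumptions: the `sk_` prefixed names and the enum values are hypothetical, and only the single `bst_type` member and the compare-by-type rule are taken from the log message.

```c
/* Hypothetical space types, standing in for the X86_BUS_SPACE_* values. */
enum { SK_BUS_SPACE_IO = 0, SK_BUS_SPACE_MEM = 1 };

/* The tag is a struct; for now its only member is the space type. */
struct sk_bus_space_tag {
	int bst_type;
};

/* The tag type handed to drivers is a pointer to a constant struct. */
typedef const struct sk_bus_space_tag *sk_bus_space_tag_t;

/* Two constant tags, one per kind of space. */
static const struct sk_bus_space_tag sk_bus_space_io = { SK_BUS_SPACE_IO };
static const struct sk_bus_space_tag sk_bus_space_mem = { SK_BUS_SPACE_MEM };

/* Tags compare equal when their types match, not when pointers match. */
static int
sk_bus_space_is_equal(sk_bus_space_tag_t t1, sk_bus_space_tag_t t2)
{
	return t1->bst_type == t2->bst_type;
}
```

Comparing `bst_type` rather than the pointers means two distinct tag objects for the same kind of space still compare equal.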
 1.26 07-Nov-2009  cegger branches: 1.26.2; 1.26.4;
Implement pmap_kenter_pa(9) new flag argument in x86.
Make x86 bus_space(9) using it to eliminate an extra TLB flush.
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
Thanks to Martin Husemann for spotting copy&pasto errors in the original patch version.
 1.25 07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.24 03-Sep-2009  jmcneill Fix a bug where mapping the very end of iomem accidentally returns an
address in the ISA hole (because addr+size calculations wrap to 0). Fixes
ohci on VirtualPC 7 for Mac, which places OHCI at base address 0xfffff000
size 0x1000.
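The wraparound in revision 1.24 can be reproduced with 32-bit arithmetic. The sketch below is not the actual fix, which the log does not show; `range_ok` is a hypothetical helper that demonstrates one way to check a range without ever computing addr + size.

```c
#include <stdint.h>

/*
 * Return 1 if the size bytes starting at addr all lie within
 * [lo, hi] (hi inclusive), 0 otherwise.  The check compares
 * size - 1 against the room left below hi, so it never forms
 * addr + size, which would wrap to 0 at the top of the space.
 */
static int
range_ok(uint32_t addr, uint32_t size, uint32_t lo, uint32_t hi)
{
	if (size == 0 || addr < lo || addr > hi)
		return 0;
	return size - 1 <= hi - addr;
}
```

With the values from the log message, base 0xfffff000 and size 0x1000, the naive sum is 0 in 32 bits, while `range_ok(0xfffff000, 0x1000, lo, 0xffffffff)` still accepts the mapping.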
 1.23 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.22 10-Mar-2009  bouyer physical addresses may not fit in u_long, use paddr_t
 1.21 08-Feb-2009  bouyer branches: 1.21.2;
Apply patch proposed on port-amd64/port-i386, allowing to use a 64bit
bus_addr_t on i386PAE kernels:
change bus_addr_t to be a paddr_t (so its size follows paddr_t depending
on options PAE)
replace bus_addr_t with vaddr_t where the value is used as a virtual address.

Difference with the proposed patch: cast to uintmax_t and use %jx in
printf() as suggested by Joerg.
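The printf idiom mentioned here is standard C99. A minimal sketch, assuming a hypothetical `sk_paddr_t` typedef in place of the kernel's paddr_t:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical address type whose width may vary between builds. */
typedef uint64_t sk_paddr_t;

/*
 * Format an address portably: cast to uintmax_t and print with %jx,
 * so the format string stays correct whatever width sk_paddr_t has.
 */
static int
format_addr(char *buf, size_t len, sk_paddr_t pa)
{
	return snprintf(buf, len, "0x%jx", (uintmax_t)pa);
}
```

The cast costs nothing at run time on common platforms and avoids a separate format string per address width.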
 1.20 21-Oct-2008  cegger branches: 1.20.2; 1.20.4; 1.20.10;
introduce two macros: xendomain_is_dom0() and xendomain_is_privileged(). Use them.
 1.19 01-May-2008  ad branches: 1.19.6;
Kernel preemption needs to be off for tlb shootdowns.
 1.18 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.17 16-Apr-2008  cegger branches: 1.17.2; 1.17.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.16 12-Apr-2008  cegger ansify
 1.15 11-Jan-2008  bouyer branches: 1.15.6;
Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support development).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.14 20-Dec-2007  ad - Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.13 28-Nov-2007  ad branches: 1.13.2; 1.13.6;
Remove remaining CPUCLASS_386 tests.
 1.12 22-Nov-2007  bouyer Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.11 17-Oct-2007  garbled branches: 1.11.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.10 26-Sep-2007  ad branches: 1.10.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.9 29-Aug-2007  ad branches: 1.9.2;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.8 04-Mar-2007  christos branches: 1.8.2; 1.8.10; 1.8.14; 1.8.18; 1.8.20;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.7 16-Nov-2006  christos branches: 1.7.4;
__unused removal on arguments; approved by core.
 1.6 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.5 27-Sep-2006  cube This is again that time of the millennium where we have to crank up a few
static limits to meet modern bloat requirements.

VM_PHYSSEG_MAX needs it to run on Intel's D946GZIS motherboard, as reported
by rix on #NetBSD-code on freenode. This has a consequence on the initial
number of possible extent allocations for iomem_ex, so increase that value
too.

While there, clarify the action to be taken when VM_PHYSSEG_MAX is maxed
out.

Do that on both amd64 and i386 because the causes, the effects and the code
are mostly the same.
 1.4 24-Nov-2005  yamt branches: 1.4.20; 1.4.22;
bus_dmamem_map: honour BUS_DMA_NOWAIT. noted by Manuel Bouyer.
bus_space_map: always do NOWAIT allocation as it used to be before yamt-km.

we have too many copies!
 1.3 01-Apr-2005  yamt branches: 1.3.2; 1.3.8;
merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.2 14-Mar-2003  christos branches: 1.2.2; 1.2.10; 1.2.12;
pmap_kremove the pages before uvm_km_free'ing them. Thanks jason!
 1.1 12-Mar-2003  thorpej Split bus_space and bus_dma into separate files.
 1.2.12.1 25-Jan-2005  yamt - convert i386 to new apis.
- remove a pmap bootstrap kludge, which is no longer needed.
 1.2.10.1 29-Apr-2005  kent sync with -current
 1.2.2.2 11-Dec-2005  christos Sync with head.
 1.2.2.1 01-Apr-2005  skrll Sync with HEAD.
 1.3.8.1 29-Nov-2005  yamt sync with head.
 1.3.2.6 21-Jan-2008  yamt sync with head
 1.3.2.5 07-Dec-2007  yamt sync with head
 1.3.2.4 27-Oct-2007  yamt sync with head.
 1.3.2.3 03-Sep-2007  yamt sync with head.
 1.3.2.2 30-Dec-2006  yamt sync with head.
 1.3.2.1 21-Jun-2006  yamt sync with head.
 1.4.22.2 10-Dec-2006  yamt sync with head.
 1.4.22.1 22-Oct-2006  yamt sync with head
 1.4.20.1 18-Nov-2006  ad Sync with head.
 1.7.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.8.20.3 23-Mar-2008  matt sync with HEAD
 1.8.20.2 09-Jan-2008  matt sync with HEAD
 1.8.20.1 06-Nov-2007  matt sync with HEAD
 1.8.18.4 03-Dec-2007  joerg Sync with HEAD.
 1.8.18.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.8.18.2 02-Oct-2007  joerg Sync with HEAD.
 1.8.18.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.8.14.1 03-Sep-2007  skrll Sync with HEAD.
 1.8.10.1 03-Oct-2007  garbled Sync with HEAD
 1.8.2.7 03-Dec-2007  ad Sync with HEAD.
 1.8.2.6 09-Oct-2007  ad Sync with head.
 1.8.2.5 09-Oct-2007  ad Sync with head.
 1.8.2.4 23-Aug-2007  ad - Use pmap_pte_setbits, pmap_pte_clearbits.
- Shoot down a range instead of single pages in bus_space.c.
 1.8.2.3 21-Aug-2007  ad amd64 changes, as yet untested:

- Adapt to vmlocking branch.
- Apply TLB shootdown and pv allocation changes to the pmap.
- Make it build.
 1.8.2.2 21-Aug-2007  ad - Update PTEs atomically. Only shoot down if the mappings were in use.
- _bus_dmamem_unmap: don't shootdown, pmap_remove() will do it for us.
- Assume that pmap_update() will wait for the TLB shootdown to complete.
 1.8.2.1 29-Jul-2007  ad - When zeroing/copying pages, use SSE2 movnti to avoid polluting the cache.
- By default, align assembly routines on 32-byte starting boundaries.
- There are now 8 interrupt priority levels, half of which are softints.
Update intrdefs.h to match.
- Always clear/set spinlock words - removes lots of ifdefs.
- Remove the horrible ci_self150 hack that I introduced.
- Overhaul how TLB shootdown is performed. Inspired by a similar change in
OpenBSD but implemented quite differently. This should be a lot faster
but I have not benchmarked it yet.
 1.9.2.1 06-Oct-2007  yamt sync with head.
 1.10.2.2 21-Nov-2007  bouyer Use pmap_kenter_ma() for Xen.
 1.10.2.1 16-Nov-2007  bouyer Initial domain0 support for xenamd64. The kernel boots multiuser, but
xen tools have not been tried yet.
In this process, cleanup some more the page table bootstrap, and properly
handle event counters for soft interrupts.
 1.11.2.3 18-Feb-2008  mjf Sync with HEAD.
 1.11.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.11.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.13.6.2 02-Jan-2008  bouyer Sync with HEAD
 1.13.6.1 13-Dec-2007  bouyer remove interactions with the old xeni386 pmap.
 1.13.2.1 26-Dec-2007  ad Sync with head.
 1.15.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.15.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.17.4.6 11-Aug-2010  yamt sync with head.
 1.17.4.5 11-Mar-2010  yamt sync with head
 1.17.4.4 16-Sep-2009  yamt sync with head
 1.17.4.3 19-Aug-2009  yamt sync with head.
 1.17.4.2 04-May-2009  yamt sync with head.
 1.17.4.1 16-May-2008  yamt sync with head.
 1.17.2.1 18-May-2008  yamt sync with head.
 1.19.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.20.10.1 21-Apr-2010  matt sync to netbsd-5
 1.20.4.3 30-Sep-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1040):
sys/arch/x86/x86/bus_space.c: revision 1.22
physical addresses may not fit in u_long, use paddr_t
 1.20.4.2 29-Sep-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1040):
sys/arch/x86/include/bus.h: revision 1.18
sys/arch/x86/include/isa_machdep.h: revision 1.7
sys/arch/x86/x86/bus_space.c: revision 1.21
Apply patch proposed on port-amd64/port-i386, allowing to use a 64bit
bus_addr_t on i386PAE kernels:
change bus_addr_t to be a paddr_t (so its size follows paddr_t depending
on options PAE)
replace bus_addr_t with vaddr_t where the value is used as a virtual address.
Difference with the proposed patch: cast to uintmax_t and use %jx in
printf() as suggested by Joerg.
 1.20.4.1 16-Sep-2009  snj Pull up following revision(s) (requested by jmcneill in ticket #940):
sys/arch/x86/x86/bus_space.c: revision 1.24
Fix a bug where mapping the very end of iomem accidentally returns an
address in the ISA hole (because addr+size calculations wrap to 0). Fixes
ohci on VirtualPC 7 for Mac, which places OHCI at base address 0xfffff000
size 0x1000.
 1.20.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.20.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.21.2.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.21.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.21.2.3 24-Oct-2010  jym Sync with HEAD
 1.21.2.2 01-Nov-2009  jym Sync with HEAD.
 1.21.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.26.4.2 05-Mar-2011  rmind sync with head
 1.26.4.1 30-May-2010  rmind sync with head
 1.26.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.26.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.32.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.32.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.37.6.1 18-Feb-2012  mrg merge to -current.
 1.37.2.1 17-Apr-2012  yamt sync with head
 1.38.32.1 21-Apr-2017  bouyer Sync with HEAD
 1.38.28.1 20-Mar-2017  pgoyette Sync with HEAD
 1.38.24.2 28-Aug-2017  skrll Sync with HEAD
 1.38.24.1 05-Feb-2017  skrll Sync with HEAD
 1.38.22.1 26-Mar-2017  snj Pull up following revision(s) (requested by maya in ticket #1375):
sys/arch/amd64/include/param.h: revision 1.20
sys/arch/i386/include/param.h: revision 1.80
sys/arch/x86/x86/bus_space.c: revision 1.39
increase max io mem on amd64. some devices need it.
 1.38.6.1 03-Dec-2017  jdolecek update from HEAD
 1.40.10.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.40.10.1 10-Jun-2019  christos Sync with HEAD
 1.41.4.1 17-Dec-2019  martin Pull up following revision(s) (requested by riastradh in ticket #566):

sys/arch/x86/x86/bus_space.c: revision 1.42
sys/arch/x86/x86/bus_space.c: revision 1.43

Use LFENCE/SFENCE/MFENCE in x86 bus_space_barrier.

These are needed for BUS_SPACE_MAP_PREFETCHABLE mappings. On x86,
these are WC-type memory regions, which means -- unlike normal
WB-type memory regions -- loads can be reordered with loads,
requiring LFENCE, and stores can be reordered with stores, requiring
SFENCE.

Reference: AMD64 Architecture Programmer's Manual, Volume 2: System
Programming, Sec. 7.4.1 `Memory Barrier Interaction with Memory
Types', Table 7-3 `Memory Access Ordering Rules'.

Skip fences in bus_space_barrier on I/O space.

I/O operations are issued in program order. Not that I/O operations
are usually a performance bottleneck anyway, but maybe it is slightly
cheaper to avoid stalling on store buffers or pending loads, and
there's very little cost to the skipping criterion here.
 1.43.6.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.18 11-May-2008  ad Simplify x86 identcpu code, and share between i386/amd64.
 1.17 28-Apr-2008  martin branches: 1.17.2;
Remove clause 3 and 4 from TNF licenses
 1.16 16-Apr-2008  cegger branches: 1.16.2; 1.16.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.15 11-Mar-2008  joerg Use CPUID2EXTFAMILY and CPUID2EXTMODEL.
 1.14 04-Jan-2008  ad branches: 1.14.2; 1.14.6;
Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.13 04-Jan-2008  christos add missing includes
 1.12 17-Oct-2007  garbled branches: 1.12.2; 1.12.8;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.11 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.10 28-Aug-2006  christos branches: 1.10.12; 1.10.20; 1.10.30; 1.10.32; 1.10.34;
Add missing initializers
 1.9 21-Feb-2006  thorpej branches: 1.9.2;
Use aprint_*().
 1.8 11-Dec-2005  christos branches: 1.8.2; 1.8.4; 1.8.6;
merge ktrace-lwp.
 1.7 29-May-2005  christos branches: 1.7.2;
Sprinkle const.
 1.6 17-Aug-2004  briggs Get correct cache information for earlier VIA C3 models.
Mostly from PR kern/26689 submitted by Michael van Elst.
 1.5 08-Aug-2004  briggs VIA C3 cache info.
 1.4 04-Jul-2004  mycroft b -> B
 1.3 14-Jul-2003  lukem branches: 1.3.2;
add __KERNEL_RCSID()
 1.2 14-May-2003  fvdl branches: 1.2.2;
Add RCS Id and/or copyright notice.
 1.1 25-Apr-2003  fvdl Share some common cache info cpuid code between i386 and x86_64.
 1.2.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.2.2.2 12-Aug-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.2.3 22-Aug-2004  tron Pull up revision 1.6 (requested by briggs in ticket #770):
Get correct cache information for earlier VIA C3 models.
Mostly from PR kern/26689 submitted by Michael van Elst.
 1.3.2.2 12-Aug-2004  jmc Pullup rev 1.5 (requested by briggs in ticket #742)

Enable VIA C3 CPU support
 1.3.2.1 04-Jul-2004  he Pull up revision 1.4 (requested by mycroft in ticket #595):
Use ``B'' instead of ``b'' to indicate byte.
 1.7.2.5 17-Mar-2008  yamt sync with head.
 1.7.2.4 21-Jan-2008  yamt sync with head
 1.7.2.3 27-Oct-2007  yamt sync with head.
 1.7.2.2 30-Dec-2006  yamt sync with head.
 1.7.2.1 21-Jun-2006  yamt sync with head.
 1.8.6.1 22-Apr-2006  simonb Sync with head.
 1.8.4.1 09-Sep-2006  rpaulo sync with head
 1.8.2.1 01-Mar-2006  yamt sync with head.
 1.9.2.1 03-Sep-2006  yamt sync with head.
 1.10.34.1 06-Oct-2007  yamt sync with head.
 1.10.32.3 23-Mar-2008  matt sync with HEAD
 1.10.32.2 09-Jan-2008  matt sync with HEAD
 1.10.32.1 06-Nov-2007  matt sync with HEAD
 1.10.30.1 02-Oct-2007  joerg Sync with HEAD.
 1.10.20.1 03-Oct-2007  garbled Sync with HEAD
 1.10.12.1 09-Oct-2007  ad Sync with head.
 1.12.8.1 08-Jan-2008  bouyer Sync with HEAD
 1.12.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.14.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.14.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.14.2.1 24-Mar-2008  keiichi sync with head.
 1.16.4.1 16-May-2008  yamt sync with head.
 1.16.2.1 17-Jun-2008  yamt fix merge botches
 1.17.2.1 23-Jun-2008  wrstuden Remove files removed on branch. Updating using patch has its
drawbacks. :-)
 1.1 18-Mar-2018  christos branches: 1.1.2;
Separate the compat code in its own file to facilitate module building.
 1.1.2.2 18-Mar-2018  pgoyette Import more christos@ changes from -current
 1.1.2.1 18-Mar-2018  pgoyette file compat_60_cpu_ucode.c was added on branch pgoyette-compat on 2018-03-18 00:35:26 +0000
 1.43 10-Oct-2025  skrll More stuff to revert wrt ucom console.
 1.42 10-Oct-2025  manu Support console over USB-to-serial

This requires an up-to-date UEFI bootstrap. Example usage from boot.cfg:
consdev=com4,115200
menu=NetBSD ucom0:kconsdev ucom0,115200;boot netbsd
 1.41 30-Apr-2025  imil Introduce pvh_boot boolean to identify the real hypervisor when booting in PVH
mode.

As of now, sys/arch/x86/x86/identcpu.c / identify_hypervisor() returns in the
case of vm_guest being VM_GUEST_GENPVH, yet this VM type is not an actual
hypervisor but an information recorded in locore.S to drive boot method.
We need to investigate what type of hypervisor is really running the VM in
order to apply specifics, so instead of relying on vm_guest_is_pvh() which only
checks for VM_GUEST_XENPVH || VM_GUEST_GENPVH, pvh_boot informs on the boot
method while allowing to identify the real hypervisor.

Idea ok'd by bouyer@, tested on Xen domU, Xen dom0 with GENERIC PVH and
qemu GENERIC PVH boot.
 1.40 02-Dec-2024  bouyer Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.
 1.39 09-Feb-2024  andvar branches: 1.39.2;
fix spelling mistakes, mainly in comments and log messages.
 1.38 17-Oct-2023  bouyer XENPVH option is not used. Fix consinit.c to use XENPVHVM as intended
and XENPVH from defflag
 1.37 17-Oct-2023  bouyer Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.
Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen.
When running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.
x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console
xen/x86/consinit.c: support genfb as possible console
xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.
xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.
 1.36 24-Mar-2023  bouyer Allow a PVH dom0 to use VGA as console: make xen_pvh_consinit() return 1 if
it handles the console and 0 otherwise (especially when console=tty0 or
console=pc is present on the command line).
In consinit() fall back to native console selection if xen_pvh_consinit()
returns 0.
 1.35 05-Sep-2022  riastradh branches: 1.35.4;
x86: Fix interaction between consinit, device_pci_register, and drm.

Leave an essay on what's going on here in both places with
cross-references.

PR kern/56996
 1.34 07-Oct-2021  msaitoh KNF. No functional change.
 1.33 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.32 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.31 31-May-2019  nonaka branches: 1.31.8;
Back out r1.30 change.

> tuck in include inside ifdef, from Ryosuke Moro

It was caused by the reporter's local change.
 1.30 26-May-2019  christos tuck in include inside ifdef, from Ryosuke Moro
 1.29 24-May-2019  nonaka Added drivers for Hyper-V Synthetic Keyboard and Video device.
 1.28 11-Jan-2015  is branches: 1.28.10; 1.28.18;
Add support for the (cobalt) nullcons to amd64 and i386.
 1.27 12-Mar-2014  martin branches: 1.27.4; 1.27.6;
#ifdef a variable like its use
 1.26 26-Jan-2014  taca Fix build problem when there is no com(4) but ucom(4).
 1.25 26-Jan-2014  msaitoh PUCCN improvements:
- Fix a bug where the puc cn mechanism didn't use the UART's frequency
in pucdata.c's table.

- Add a new option PUC_CNAUTO. If this option is set, consinit() in
x86/x86/consinit.c checks puc com device to use it as console.
Without this option, the behavior is the same as before.

- Add a new config parameter PUC_CNBUS. The old code scans bus #0 only.
If PUC_CNBUS is set, the specified number's bus will be scanned.

- Rename comcnprobe() to puc_cnprobe() to make it clear.

- Rename comcninit() to puc_cninit() to make it clear.

- Add code for devices whose com registers are MMIO (#if 0'ed).
 1.24 13-Oct-2012  jdc branches: 1.24.2;
Adapt to the changed signature of pckbc_cnattach().
 1.23 18-Nov-2011  jmcneill branches: 1.23.10;
remove Xbox support
 1.22 01-Jul-2011  dyoung branches: 1.22.2;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.21 28-Apr-2010  dyoung On x86, change the bus_space_tag_t to a pointer to a struct
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).

Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.

Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
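
A minimal user-space sketch of the layout this change describes, for
illustration only. The struct has the single bst_type member named above; the
numeric values of the two type constants, the const-qualified typedef, and the
NULL handling are assumptions, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values; the kernel headers define the real ones. */
#define X86_BUS_SPACE_IO	0
#define X86_BUS_SPACE_MEM	1

/* The tag is a pointer to a structure instead of a plain integer. */
struct bus_space_tag {
	int bst_type;	/* X86_BUS_SPACE_IO or X86_BUS_SPACE_MEM */
};
typedef const struct bus_space_tag *bus_space_tag_t;

/* Constant tags for the two address spaces. */
static const struct bus_space_tag io_tag = { X86_BUS_SPACE_IO };
static const struct bus_space_tag mem_tag = { X86_BUS_SPACE_MEM };
bus_space_tag_t x86_bus_space_io = &io_tag;
bus_space_tag_t x86_bus_space_mem = &mem_tag;

/* Two tags are equal when they name the same kind of space. */
bool
bus_space_is_equal(bus_space_tag_t a, bus_space_tag_t b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return a->bst_type == b->bst_type;
}
```

Comparing bst_type rather than the pointers means that a tag created later
(for example one carrying override functions) would still compare equal to the
constant tag of the same space.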
 1.20 18-Mar-2009  cegger branches: 1.20.2; 1.20.4;
Ansify function definitions w/o arguments. Generated with sed.
 1.19 15-Mar-2009  cegger ansify function definitions
 1.18 19-Feb-2009  jmcneill vesafb is no more
 1.17 17-Feb-2009  jmcneill PR# port-i386/37026: userconf(4) doesn't work with vesafb(4)

Add early console support for x86 genfb.
 1.16 16-Feb-2009  jmcneill Kernel-side modifications for framebuffer console support on i386 and amd64.

* New BTINFO_FRAMEBUFFER kernel parameter to pass screen configuration
* Early attach support for framebuffer console
* Pass BTINFO_FRAMEBUFFER parameters to genfb in device_register
* Provide hooks to genfb to set VGA DAC palette in 8bpp mode
 1.15 14-Nov-2007  ad branches: 1.15.18; 1.15.26; 1.15.32;
- Remove I486_CPU, I586_CPU, I686_CPU options. They buy us nothing and
clutter the code significantly.
- Remove pccons.
 1.14 05-Jan-2007  jmcneill branches: 1.14.6; 1.14.22; 1.14.24; 1.14.28; 1.14.30;
Allow xboxfb to attach and initialize the display early in the boot process.
 1.13 05-Jan-2007  jmcneill xboxfb is a possible candidate for the console screen, from Andrew Gillham
 1.12 13-Aug-2006  jmcneill branches: 1.12.2;
No longer try to attach unichromefb as an initial console device.
 1.11 02-Aug-2006  jmcneill Allow unichromefb(4) to be the system console.
 1.10 24-Apr-2006  jmcneill Attach vesafb driver if available.
 1.9 03-Feb-2006  jmmv branches: 1.9.2; 1.9.4; 1.9.6; 1.9.8;
Implement support for 'The Multiboot Specification' so that i386 kernels
can be booted directly from Multiboot-compliant boot loaders (e.g. GRUB).
See the added multiboot(8) manual page for more information.

No objections in tech-kern@; only positive comments.
 1.8 11-Dec-2005  christos branches: 1.8.2; 1.8.4;
merge ktrace-lwp.
 1.7 06-May-2005  augustss branches: 1.7.2;
Print a warning if no console keyboard was found in consinit().
 1.6 06-May-2005  augustss Move declaration of error variable to avoid 'unused' warning.
 1.5 29-Apr-2005  augustss Attach a USB keyboard as console keyboard if pckbc_cnattach() failed or
if there is no pckbc configured. Previously only the latter caused the
USB keyboard to be used.
This should make it more likely to get the USB keyboard as the console
on legacy free machines using the GENERIC config file.
 1.4 13-Mar-2004  bjh21 Abstract the interface between pckbc(4), and the pckbd(4) and pms(4)
drivers that attach to it. This allows for other host interface chips
that use the same keyboards and mice, such as the ones in the ARM
IOMD20, ARM7500, and SA-1111. The PC-compatible driver is still
called pckbc(4), and the new abstraction layer is "pckbport", so the
child devices have moved from sys/dev/pckbc to sys/dev/pckbport, which
also contains some code shared between all host controllers. To avoid
incompatibility, pckbdreg.h is still installed in
/usr/include/dev/pckbc.

In theory, this shouldn't cause any behavioural changes in the drivers
concerned. They just use rather more function pointers than before. Tested
on i386 and (with a new host driver) acorn32. Compiled on several other
affected architectures.
 1.3 14-Jun-2003  thorpej branches: 1.3.2;
Also pass a type argument to comcnattach() and com_kgdb_attach().
comspeed() (and thus cominit()) may need this information.
 1.2 02-Mar-2003  fvdl Clean up some unneeded "mca.h" and "eisa.h" includes, make one that is
needed dependent on !__x86_64__. To be revisited later.
 1.1 27-Feb-2003  fvdl Moved here from arch/i386/i386.
 1.3.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.1 03-Aug-2004  skrll Sync with HEAD
 1.7.2.4 15-Nov-2007  yamt sync with head.
 1.7.2.3 26-Feb-2007  yamt sync with head.
 1.7.2.2 30-Dec-2006  yamt sync with head.
 1.7.2.1 21-Jun-2006  yamt sync with head.
 1.8.4.1 09-Sep-2006  rpaulo sync with head
 1.8.2.1 18-Feb-2006  yamt sync with head.
 1.9.8.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.9.6.1 11-May-2006  elad sync with head
 1.9.4.3 03-Sep-2006  yamt sync with head.
 1.9.4.2 11-Aug-2006  yamt sync with head
 1.9.4.1 24-May-2006  yamt sync with head.
 1.9.2.1 01-Jun-2006  kardel Sync with head.
 1.12.2.1 12-Jan-2007  ad Sync with head.
 1.14.30.1 19-Nov-2007  mjf Sync with HEAD.
 1.14.28.1 18-Nov-2007  bouyer Sync with HEAD
 1.14.24.1 09-Jan-2008  matt sync with HEAD
 1.14.22.1 21-Nov-2007  joerg Sync with HEAD.
 1.14.6.1 03-Dec-2007  ad Sync with HEAD.
 1.15.32.4 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.15.32.3 24-Oct-2010  jym Sync with HEAD
 1.15.32.2 01-Nov-2009  jym Sync with HEAD.
 1.15.32.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.15.26.2 28-Apr-2009  skrll Sync with HEAD.
 1.15.26.1 03-Mar-2009  skrll Sync with HEAD.
 1.15.18.2 11-Aug-2010  yamt sync with head.
 1.15.18.1 04-May-2009  yamt sync with head.
 1.20.4.1 30-May-2010  rmind sync with head
 1.20.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.22.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.22.2.2 30-Oct-2012  yamt sync with head
 1.22.2.1 17-Apr-2012  yamt sync with head
 1.23.10.3 03-Dec-2017  jdolecek update from HEAD
 1.23.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.23.10.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.24.2.1 18-May-2014  rmind sync with head
 1.27.6.1 06-Apr-2015  skrll Sync with HEAD
 1.27.4.1 28-Jan-2015  martin Pull up following revision(s) (requested by is in ticket #462):
sys/arch/x86/x86/consinit.c: revision 1.28
Add support for the (cobalt) nullcons to amd64 and i386.
 1.28.18.1 10-Jun-2019  christos Sync with HEAD
 1.28.10.1 12-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1280):

sys/arch/x86/x86/consinit.c: revision 1.29
sys/dev/hyperv/vmbusvar.h: revision 1.2
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/arch/x86/x86/x86_autoconf.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.91
sys/arch/x86/x86/hyperv.c: revision 1.2
sys/arch/x86/x86/hyperv.c: revision 1.3
sys/arch/x86/x86/hyperv.c: revision 1.4
sys/arch/i386/conf/GENERIC: revision 1.1207
sys/dev/wscons/wsconsio.h: revision 1.123
sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.528
sys/dev/hyperv/files.hyperv: revision 1.2
sys/arch/x86/include/autoconf.h: revision 1.6
sys/dev/hyperv/hyperv_common.c: revision 1.2
sys/arch/xen/x86/autoconf.c: revision 1.23
sys/arch/x86/pci/pci_machdep.c: revision 1.86
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hypervvar.h: revision 1.2
sys/dev/acpi/vmbus_acpi.c: revision 1.2
sys/dev/hyperv/vmbus.c: revision 1.3
sys/dev/hyperv/hvkbdvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.31.8.1 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.35.4.3 29-Mar-2025  martin Pull up following revision(s) (requested by imil in ticket #1074):

sys/arch/x86/x86/x86_machdep.c: revision 1.155
sys/arch/x86/include/cpu.h: revision 1.137
sys/arch/x86/x86/x86_machdep.c: revision 1.156
sys/arch/x86/include/cpu.h: revision 1.138
sys/arch/x86/x86/consinit.c: revision 1.40
sys/arch/x86/acpi/acpi_machdep.c: revision 1.37
sys/arch/x86/acpi/acpi_machdep.c: revision 1.38
sys/arch/amd64/amd64/machdep.c: revision 1.370
sys/arch/xen/xen/hypervisor.c: revision 1.97
sys/arch/xen/xen/hypervisor.c: revision 1.98
sys/arch/amd64/amd64/genassym.cf: revision 1.98
sys/arch/x86/x86/x86_autoconf.c: revision 1.88
sys/arch/x86/x86/x86_autoconf.c: revision 1.89
sys/arch/amd64/amd64/locore.S: revision 1.226
sys/arch/amd64/amd64/locore.S: revision 1.227
sys/arch/x86/x86/identcpu.c: revision 1.131

Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.

Get one more change from PR kern/57813, needed for non-Xen PVH.

Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
 1.35.4.2 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #428):

sys/arch/xen/xen/xen_machdep.c: revision 1.28
sys/arch/x86/pci/pci_machdep.c: revision 1.97
sys/arch/xen/xen/genfb_xen.c: revision 1.1
sys/arch/xen/xen/genfb_xen.c: revision 1.2
sys/arch/xen/include/hypervisor.h: revision 1.59
sys/arch/i386/conf/XEN3PAE_DOM0: revision 1.41 (patch)
sys/arch/x86/x86/genfb_machdep.c: revision 1.22
sys/arch/xen/x86/consinit.c: revision 1.18
sys/arch/xen/x86/autoconf.c: revision 1.26
sys/external/mit/xen-include-public/dist/xen/include/public/platform.h: revision 1.2
sys/arch/xen/conf/files.xen: revision 1.188
sys/arch/x86/x86/consinit.c: revision 1.37
sys/arch/xen/conf/files.xen: revision 1.189
sys/arch/x86/x86/consinit.c: revision 1.38
sys/external/mit/xen-include-public/dist/xen/include/public/xen.h: revision 1.2
sys/arch/x86/include/genfb_machdep.h: revision 1.7
sys/arch/xen/x86/pvh_consinit.c: revision 1.5
sys/arch/xen/x86/pvh_consinit.c: revision 1.6
sys/arch/amd64/conf/XEN3_DOM0: revision 1.201

Move the pvh_xencons to xen_machdep.c as early_xencons, so it can be
used in the future as early output for plain PV guests too.

Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.

Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen

when running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.

x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console

xen/x86/consinit.c: support genfb as possible console

xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.

xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.

Make sure to always fall back to xen_early_console, even for dom0

Enable genfb in DOM0 kernels

Add ext_lfb_base to dom0_vga_console_info, from recent Xen. We know if it's
present or not by checking dom0.info_size

Add XENPF_get_dom0_console, which gets a dom0_vga_console_info structure
from the hypervisor. To be used by PVH dom0 kernels.

XENPVH option is not used. Fix consinit.c to use XENPVHVM as intended
and XENPVH from defflag
for a dom0 PVH, the dom0_vga_console_info structure has to be retrieved
using a platform hypercall; do so in the XENPVHVM case.

Now genfb works in a PVH dom0 running on Xen 4.18 (Xen 4.15 doesn't support
this platform op, so no way to make it work here).
 1.35.4.1 30-Mar-2023  martin Pull up following revision(s) (requested by bouyer in ticket #131):

sys/arch/x86/x86/consinit.c: revision 1.36
sys/arch/xen/x86/pvh_consinit.c: revision 1.3
sys/arch/xen/include/xen.h: revision 1.48

Allow a PVH dom0 to use VGA as console: make xen_pvh_consinit() return 1 if
it handles the console and 0 otherwise (especially when console=tty0 or
console=pc is present on the command line).

In consinit() fall back to native console selection if xen_pvh_consinit()
returns 0.
 1.39.2.1 02-Aug-2025  perseant Sync with HEAD
 1.8 10-Feb-2024  andvar s/musn't/mustn't/ in comments.
 1.7 15-Oct-2020  mgorny Remove unnecessary <sys/systm.h> include
 1.6 15-Oct-2020  mgorny Fix s87_tw reconstruction to correctly indicate register states

Fix the code reconstructing s87_tw (full tag word) from fx_tw (abridged
tag word) to correctly represent all register states. The previous code
only distinguished between empty and non-empty registers, and assigned
'regular value' to all non-empty registers. The new code explicitly
distinguishes the two other tag word values: zero and special.
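
For illustration, the reconstruction can be sketched as follows. This is not
the code from convert_xmm_s87.c; struct x87_reg, classify_x87() and
full_tag_word() are hypothetical names. It assumes the architectural x87
encoding: the abridged tag has one bit per physical register (1 = non-empty),
the full tag has two bits per physical register (00 valid, 01 zero,
10 special, 11 empty), and the saved registers are stored in ST(i) order
relative to the top-of-stack field.

```c
#include <stdint.h>

/* Full x87 tag values, two bits per physical register. */
enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* One 80-bit register image, little-endian (hypothetical type). */
struct x87_reg {
	uint8_t bytes[10];
};

/* Classify a non-empty register by its exponent and mantissa. */
static unsigned
classify_x87(const struct x87_reg *r)
{
	uint64_t mant = 0;
	unsigned exp;
	int i;

	for (i = 7; i >= 0; i--)
		mant = (mant << 8) | r->bytes[i];
	exp = ((unsigned)(r->bytes[9] & 0x7f) << 8) | r->bytes[8];

	if (exp == 0x7fff)
		return TAG_SPECIAL;		/* infinity or NaN */
	if (exp == 0)				/* zero or denormal */
		return mant == 0 ? TAG_ZERO : TAG_SPECIAL;
	if ((mant >> 63) == 0)
		return TAG_SPECIAL;		/* integer bit clear */
	return TAG_VALID;
}

/* Rebuild the 16-bit tag word; st[] is indexed by ST(i), top is TOP. */
uint16_t
full_tag_word(uint8_t abridged, unsigned top, const struct x87_reg st[8])
{
	uint16_t tw = 0;
	unsigned phys, tag;

	for (phys = 0; phys < 8; phys++) {
		if ((abridged & (1u << phys)) == 0)
			tag = TAG_EMPTY;
		else	/* physical register phys holds ST((phys - top) mod 8) */
			tag = classify_x87(&st[(phys - top) & 7]);
		tw |= (uint16_t)(tag << (2 * phys));
	}
	return tw;
}
```

The only subtle step is the index translation: tags are kept per physical
register, while the save area is ordered by stack position.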
 1.5 15-Oct-2020  mgorny Revert "Merge convert_xmm_s87.c into fpu.c"

I am going to add ATF tests for these two functions, and having them
in a separate file will make it more convenient to build and run them
in userspace.
 1.4 23-May-2018  maxv Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.
 1.3 15-Feb-2014  dsl branches: 1.3.4; 1.3.6; 1.3.10; 1.3.34;
Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't part of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.
 1.2 12-Feb-2014  dsl Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.1 07-Feb-2014  dsl Convert the amd64 build to use x86/cpu_extended_state.h so that the fpu
definitions match those of i386.
Mostly just structure and field renames, in addition:
1) process_xmm_to_s87() and process_s87_to_xmm() moved into
x86/convert_xmm_s87.c so they can be used by amd64's netbsd32 code.
2) The linux signal code simplified to use a structure copy for the fxsave
data - it matches the hardware definition and won't change.
 1.3.34.1 25-Jun-2018  pgoyette Sync with HEAD
 1.3.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.10.1 15-Feb-2014  tls file convert_xmm_s87.c was added on branch tls-maxphys on 2014-08-20 00:03:29 +0000
 1.3.6.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.6.1 15-Feb-2014  yamt file convert_xmm_s87.c was added on branch yamt-pagecache on 2014-05-22 11:40:13 +0000
 1.3.4.2 18-May-2014  rmind sync with head
 1.3.4.1 15-Feb-2014  rmind file convert_xmm_s87.c was added on branch rmind-smpnet on 2014-05-18 17:45:30 +0000
 1.6 20-Nov-2019  pgoyette Move all non-emulation-specific coredump code into the coredump module,
and remove all #ifdef COREDUMP conditional compilation. Now, the
coredump module is completely separated from the emulation modules, and
they can all be independently loaded and unloaded.

Welcome to 9.99.18 !
 1.5 04-Jan-2014  dsl branches: 1.5.30;
Remove __HAVE_PROCESS_XFPREGS and add the extra parameter for the size
of the fp save area to all the process_read_fpregs() and
process_write_fpregs() functions.
None of the functions have been modified to use the new parameters.
The size is set for all the writes, but some of the arch-specific reads
just pass NULL.
The amd64 (and i386) need variable sized fp register save areas in order
to support AVX and other enhanced register areas.
These functions are rarely called - so the extra argument won't matter.
 1.4 01-Jan-2014  dsl Change the type of the 'cookie' that holds the state of the core dump file
from 'void *' to the actual type 'struct coredump_iostate *'.
In most of the code the contents of the structure are still unknown.
This just stops the wrong type of pointer being passed to the 'void *'
parameter.
I hope I've found everything, amd64 GENERIC and i386 GENERIC & ALL compile.
 1.3 21-Nov-2009  rmind branches: 1.3.12; 1.3.22; 1.3.26;
Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.2 15-Aug-2009  matt Include <sys/exec_aout.h> explicitly instead of relying on <sys/exec.h> to
do it for us.
 1.1 30-Mar-2009  rmind branches: 1.1.2; 1.1.4; 1.1.6;
Merge/move core_machdep.c into x86, no difference between i386 and amd64.
 1.1.6.4 24-Oct-2010  jym Sync with HEAD
 1.1.6.3 01-Nov-2009  jym Sync with HEAD.
 1.1.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.6.1 30-Mar-2009  jym file core_machdep.c was added on branch jym-xensuspend on 2009-05-13 17:18:45 +0000
 1.1.4.4 11-Mar-2010  yamt sync with head
 1.1.4.3 19-Aug-2009  yamt sync with head.
 1.1.4.2 04-May-2009  yamt sync with head.
 1.1.4.1 30-Mar-2009  yamt file core_machdep.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:10 +0000
 1.1.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.1.2.1 30-Mar-2009  skrll file core_machdep.c was added on branch nick-hppapmap on 2009-04-28 07:34:57 +0000
 1.3.26.1 18-May-2014  rmind sync with head
 1.3.22.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.30.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.42 15-Jul-2024  gutteridge coretemp.c: drop redundant condition (NFCI)

Checking for a processor model upper limit has no point inside a block
that is already limited further. Noted from code inspection by Sotiris
Lamprinidis in PR kern/58372.

While here, also update to a cached version of a URL for processor
references, as both original URLs have now been removed by Intel.
 1.41 12-Mar-2024  gutteridge branches: 1.41.2;
coretemp.c: don't accept impossibly low TjMax values

r. 1.39 introduced a regression where instead of applying a reasonable
default maximum (as was done prior to that change), incorrect values
were accepted and applied, as failures to retrieve an expected MSR
value weren't accounted for.

Apply different logic for unexpectedly low vs. high maximums, with
distinct warnings for each. Also add another warning about a retrieval
failure right at the outset (which also just uses the default, then).

This change fundamentally doesn't address the fact that
__SHIFTOUT(msr, MSR_TEMP_TARGET_READOUT)
doesn't necessarily return a valid value. It just restores prior
behaviour, which is more reasonable than applying a zero value, which
started happening on some older hardware. (I infer this is most likely
an issue with dated generations of Intel hardware with this feature.)
The challenge is that this evidently isn't all documented properly
anywhere. Various "magic values" in this driver need further
investigation.

While here, also fix output so warnings are cleanly formatted, rather
than the slightly scrambled way they were appearing.

Tested on older Intel hardware I had on hand:
E7500 (now falls back to default 100 rather than 0)
E5540 (successfully retrieves 97, as before)
i5-3340M (successfully retrieves 105, as before)
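
The behaviour described above might be sketched like this. It is a hedged
illustration, not the driver's code: tjmax_sanitize() and the macro names are
hypothetical, the limits 60 and 120 are taken from r. 1.39, the default of 100
from the test results above, and it assumes that unexpectedly high values are
still accepted with a warning as in r. 1.39.

```c
#include <stdio.h>

#define TJMAX_DEFAULT		100	/* fallback, degrees Celsius */
#define TJMAX_LIMIT_LOW		60	/* lower plausibility limit */
#define TJMAX_LIMIT_HIGH	120	/* upper plausibility limit */

/*
 * readout: value extracted from the temperature target MSR.
 * read_ok: nonzero if the MSR read itself succeeded.
 * Returns the Tj(max) value to use.
 */
int
tjmax_sanitize(int readout, int read_ok)
{
	if (!read_ok) {
		fprintf(stderr, "Tjmax: MSR read failed, using default %d\n",
		    TJMAX_DEFAULT);
		return TJMAX_DEFAULT;
	}
	if (readout < TJMAX_LIMIT_LOW) {
		/* Impossibly low (e.g. 0): do not trust it. */
		fprintf(stderr, "Tjmax: %d is too low, using default %d\n",
		    readout, TJMAX_DEFAULT);
		return TJMAX_DEFAULT;
	}
	if (readout > TJMAX_LIMIT_HIGH) {
		/* Unexpectedly high: warn, but keep the value. */
		fprintf(stderr, "Tjmax: %d is unexpectedly high\n", readout);
	}
	return readout;
}
```

Under these assumptions a readout of 0 yields 100, while 97 and 105 pass
through unchanged, matching the three machines listed above.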
 1.40 29-Feb-2024  gutteridge coretemp.c: fix grammar in a warning message

(I get several of these warnings on boot on a particular machine. Now,
it also seems that the code isn't retrieving the correct value, either;
TBD.)
 1.39 13-Jul-2023  msaitoh coretemp(4): Change limits of Tjmax.

- Change the lower limit from 70 to 60. At least, some BIOSes can change
the value down to 62.
- Change the upper limit from 110 to 120. At least, some BIOSes can change
the value up to 115.
- Print error message when rdmsr(TEMPERATURE_TARGET) failed.
- When Tjmax exceeded the limit, print warning message and use the value
as it is.
 1.38 07-Oct-2021  msaitoh branches: 1.38.4;
KNF. No functional change.
 1.37 27-Mar-2020  msaitoh Add special handling for model 0x0f stepping >=2 or model 0x0e to get Tjmax.
 1.36 11-Jul-2018  msaitoh branches: 1.36.4;
- Detect and set Atom's Tj(max) to 90 if it's not the 45nm D400/D500/N400
series (90 for Diamondville and 100 for Pineview). From FreeBSD r221509.
- Reduce diff a little against FreeBSD.
 1.35 07-Jul-2016  msaitoh branches: 1.35.10; 1.35.16; 1.35.18;
KNF. Remove extra spaces. No functional change.
 1.34 27-May-2015  msaitoh - Change the Upper limit of Tjmax from 100 to 110 because some new
CPUs have 105. This change is the same as FreeBSD.
- Print Tjmax with aprint_verbose().
- Reduce the diff against FreeBSD.
 1.33 23-Apr-2015  pgoyette Update module dependencies for all the existing modules that depend on sysmon components.
 1.32 17-Nov-2013  martin branches: 1.32.4; 1.32.6;
Remove unused variable
 1.31 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
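
As an illustration of the difference between the base fields and the display
values, here is a small sketch written as functions rather than macros. The
sig_* names are hypothetical; the bit positions and combination rules follow
the usual CPUID leaf 1 convention (extended family is added only when the base
family is 0xf, extended model is prepended only for base family 0x6 or 0xf).

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (EAX). */
static inline uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_base_model(uint32_t sig) { return (sig >> 4) & 0xf; }
static inline uint32_t sig_ext_family(uint32_t sig) { return (sig >> 20) & 0xff; }
static inline uint32_t sig_ext_model(uint32_t sig) { return (sig >> 16) & 0xf; }

/* Display family: the extended family only counts for base family 0xf. */
uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);

	return fam == 0xf ? fam + sig_ext_family(sig) : fam;
}

/* Display model: the extended model forms the high nibble, it is not added. */
uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

For the signature 0x00010676 the base model is 0x7 but the display model is
0x17, which is why a comparison against 0x17 needs the display value.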
 1.30 12-Nov-2013  msaitoh Fix calculation of the cpu model (display model) in coretemp_tjmax().
The CPUID2MODEL() macro returns only the low 4 bits, so the check against 0x17
doesn't work correctly. The correct way is to use the display model.
Remove incorrect extmodel check. Same as FreeBSD.
 1.29 14-Aug-2012  jruoho branches: 1.29.2; 1.29.4;
Collect rnd(9) entropy from coretemp(4), acpibat(4), aibs(4), hpacel(4),
thinkpad(4), and aps(4).
 1.28 06-Oct-2011  jruoho branches: 1.28.2;
Like the comment says, also MSR_IA32_EXT_CONFIG is unsafe; use rdmsr_safe().
Fixes the panic reported by njoly@.
 1.27 24-Sep-2011  jruoho Use rdmsr_safe() when reading IA32_TEMPERATURE_TARGET.
 1.26 20-Jun-2011  pgoyette Initialize sensor state before registering.
 1.25 19-Mar-2011  ahoka branches: 1.25.2;
Don't try to read MSR_TEMPERATURE_TARGET on Core Duo Yonah
 1.24 18-Mar-2011  jruoho Comment out IA32_TEMPERATURE_TARGET temporarily.
 1.23 04-Mar-2011  jruoho Only attach on the first SMT ID (as in revision 1.16).
 1.22 24-Feb-2011  jruoho Fix autoconf(9) of cpufeaturebus.
 1.21 21-Feb-2011  jruoho Call pmf_device_deregister(9) during detach.
 1.20 21-Feb-2011  jruoho Add couple of additional CPU model checks for the undocumented Tj(max).
 1.19 21-Feb-2011  jruoho Use constants and bits(3), and fix a typo.
 1.18 20-Feb-2011  jruoho Add proper definitions. Remove (too) verbose comments. Remove (wrong) debug
printf. Do not mark the sensor as invalid based on whether the critical
detector output signal has (ever) been asserted without reset. Support for
trip-points will be added later.
 1.17 20-Feb-2011  jruoho Modularize coretemp(4). Ok jmcneill@.
 1.16 25-Aug-2010  jruoho branches: 1.16.2; 1.16.4;
Add definitions for Intel Digital Thermal Sensor and Power Management, at
CPUID Fn0000_0006, %eax, %ecx. Use these instead of magic numbers.
 1.15 15-Aug-2010  mrg only attach on SMT ID 0 cpus.

on my i7, cpus 0/4, 1/5, 2/6 and 3/7 have identical information and the
processor manual says that there are only 4 actual sensors.

this still doesn't attach (yet) on that system, due to a core solo/duo
errata being wrongly applied, but i haven't figured out the right way
to do that.
 1.14 14-Mar-2010  pgoyette branches: 1.14.2;
Remove setting of the edata->monitor since that member no longer exists.
 1.13 03-Dec-2009  sborrill branches: 1.13.2;
CPU model and CPU extended model cannot simply be summed; the extended model
differentiates different CPUs within a given model type (i.e. model 0xe with
extended model 0x1 is NOT the same as a model 0xf).
Modern Xeons do not support MSR_IA32_EXT_CONFIG, so use model and extended
model correctly to avoid it
 1.12 25-Mar-2009  dyoung It is only by accident that these get definitions they need from
<sys/device.h>, so explicitly #include <sys/device.h>.
 1.11 23-Sep-2008  christos branches: 1.11.2; 1.11.4; 1.11.8; 1.11.12;
PR/39458: Juan RP: avoid attaching coretemp on systems that don't have it
by checking the read valid bit.
 1.10 11-May-2008  ad branches: 1.10.4;
Don't abuse ci_cpuid - in particular, ci_cpuid != ci_signature.
 1.9 28-Jan-2008  xtraeme branches: 1.9.6; 1.9.8; 1.9.10; 1.9.12;
coretemp_refresh: run xc_unicast() regardless if sc->sc_ci is curcpu()
or not, this fixes a deadlock seen by Greg Oster in a Dual Quad Core
machine with 8 coretemp instances.
 1.8 28-Jan-2008  xtraeme Pass the same size to both kmem_alloc(9) and kmem_free(9), this fixes
the kmem_poison_check in DEBUG kernels.
 1.7 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.6 04-Jan-2008  xtraeme machine/cpufunc.h is required now.
 1.5 21-Dec-2007  xtraeme After comments from joerg@, backout previous and use 'cpuN'.
 1.4 21-Dec-2007  xtraeme Change the description to 'coreN' rather than 'cpuN', which seems to
be more correct.
 1.3 22-Nov-2007  xtraeme branches: 1.3.2; 1.3.4; 1.3.8;
Use the returned value of xc_unicast() on xc_wait(), that will wait
for completion on the CPU running the xcall thread.

Tested on a 8-way Xeon by Greg Oster.
 1.2 16-Nov-2007  xtraeme Extend the envsys2 API (one more time, sorry) as defined in:

http://mail-index.netbsd.org/tech-kern/2007/11/09/0001.html

sysmon_envsys_create() and sysmon_envsys_destroy() were added to
create/destroy sysmon_envsys objects (and its TAILQ/LIST for sensors/events).

sysmon_envsys_sensor_attach() and sysmon_envsys_sensor_detach() were
added to attach/detach sensors to a specified sysmon_envsys device.

The events framework is now per device and configurable via the
ENVSYS_SETDICTIONARY ioctl or /etc/envsys.conf and envstat(8).

Update all users and documentation to reflect these changes.
 1.1 29-Oct-2007  xtraeme branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10;
Add coretemp(4). A new driver for Intel Core's on-die thermal sensor,
available on Intel Core or newer CPUs.

Ported from FreeBSD. Tested by rmind on i386 and joerg on amd64.

Enabled with "options INTEL_CORETEMP".
 1.1.10.5 04-Feb-2008  yamt sync with head.
 1.1.10.4 21-Jan-2008  yamt sync with head
 1.1.10.3 07-Dec-2007  yamt sync with head
 1.1.10.2 15-Nov-2007  yamt sync with head.
 1.1.10.1 29-Oct-2007  yamt file coretemp.c was added on branch yamt-lazymbuf on 2007-11-15 11:43:40 +0000
 1.1.8.3 18-Nov-2007  bouyer Sync with HEAD
 1.1.8.2 13-Nov-2007  bouyer Sync with HEAD
 1.1.8.1 29-Oct-2007  bouyer file coretemp.c was added on branch bouyer-xenamd64 on 2007-11-13 16:00:20 +0000
 1.1.6.4 23-Mar-2008  matt sync with HEAD
 1.1.6.3 09-Jan-2008  matt sync with HEAD
 1.1.6.2 06-Nov-2007  matt sync with HEAD
 1.1.6.1 29-Oct-2007  matt file coretemp.c was added on branch matt-armv6 on 2007-11-06 23:23:46 +0000
 1.1.4.4 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.3 27-Dec-2007  mjf Sync with HEAD.
 1.1.4.2 08-Dec-2007  mjf Sync with HEAD.
 1.1.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.1.2.4 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.1.2.3 21-Nov-2007  joerg Sync with HEAD.
 1.1.2.2 29-Oct-2007  joerg Sync with HEAD.
 1.1.2.1 29-Oct-2007  joerg file coretemp.c was added on branch jmcneill-pm on 2007-10-29 02:57:24 +0000
 1.3.8.1 08-Jan-2008  bouyer Sync with HEAD
 1.3.4.1 26-Dec-2007  ad Sync with head.
 1.3.2.2 03-Dec-2007  ad Sync with HEAD.
 1.3.2.1 22-Nov-2007  ad file coretemp.c was added on branch vmlocking on 2007-12-03 19:04:31 +0000
 1.9.12.2 10-Oct-2008  skrll Sync with HEAD.
 1.9.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.9.10.5 09-Oct-2010  yamt sync with head
 1.9.10.4 11-Aug-2010  yamt sync with head.
 1.9.10.3 11-Mar-2010  yamt sync with head
 1.9.10.2 04-May-2009  yamt sync with head.
 1.9.10.1 16-May-2008  yamt sync with head.
 1.9.8.1 18-May-2008  yamt sync with head.
 1.9.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.9.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.10.4.1 19-Oct-2008  haad Sync with HEAD.
 1.11.12.1 21-Apr-2010  matt sync to netbsd-5
 1.11.8.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.11.8.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.11.8.3 24-Oct-2010  jym Sync with HEAD
 1.11.8.2 01-Nov-2009  jym Sync with HEAD.
 1.11.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.4.1 18-Dec-2009  snj Pull up following revision(s) (requested by sborrill in ticket #1180):
sys/arch/x86/x86/coretemp.c: revision 1.13
CPU model and CPU extended model cannot simply be summed; the extended model
differentiates different CPUs within a given model type (i.e. model 0xe with
extended model 0x1 is NOT the same as a model 0xf).
Modern Xeons do not support MSR_IA32_EXT_CONFIG, so use model and extended
model correctly to avoid it
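
The point about model and extended model can be illustrated with a short
sketch. It assumes the CPUID leaf 1 EAX layout from the Intel SDM (model in
bits 7:4, family in bits 11:8, extended model in bits 19:16). The function
names are hypothetical and are not taken from coretemp.c.

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction from CPUID leaf 1 EAX (layout per the Intel SDM). */
static unsigned cpuid_family(uint32_t eax)    { return (eax >> 8) & 0xf; }
static unsigned cpuid_model(uint32_t eax)     { return (eax >> 4) & 0xf; }
static unsigned cpuid_ext_model(uint32_t eax) { return (eax >> 16) & 0xf; }

/*
 * Hypothetical helper: the extended model forms the high nibble of the
 * effective model; it is not added to the base model.
 */
static unsigned effective_model(uint32_t eax)
{
	unsigned family = cpuid_family(eax);
	unsigned model = cpuid_model(eax);

	if (family == 0x6 || family == 0xf)
		model |= cpuid_ext_model(eax) << 4;
	return model;
}

/* Model 0xe with extended model 0x1 is 0x1e, not 0xe + 0x1 = 0xf. */
static void effective_model_selfcheck(void)
{
	assert(effective_model(0x000106e0) == 0x1e);
	assert(effective_model(0x000006f0) == 0x0f);
}
```

Summing the two fields would make the first case collide with the second,
which is the confusion the commit message describes.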
 1.11.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.13.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.13.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.13.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.14.2.2 21-Apr-2011  rmind sync with head
 1.14.2.1 05-Mar-2011  rmind sync with head
 1.16.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.16.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.25.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.28.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.28.2.1 30-Oct-2012  yamt sync with head
 1.29.4.1 18-May-2014  rmind sync with head
 1.29.2.2 03-Dec-2017  jdolecek update from HEAD
 1.29.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.32.6.2 09-Jul-2016  skrll Sync with HEAD
 1.32.6.1 06-Jun-2015  skrll Sync with HEAD
 1.32.4.2 18-Nov-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1649):

sys/arch/x86/x86/coretemp.c: revision 1.36

- Detect and set Atom's Tj(max) to 90 if it's not the 45nm D400/D500/N400
series (90 for Diamondville and 100 for Pineview). From FreeBSD r221509.
- Reduce diff a little against FreeBSD.
 1.32.4.1 11-Aug-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #946):
sys/arch/x86/x86/coretemp.c: revision 1.34
- Change the Upper limit of Tjmax from 100 to 110 because some new
CPUs have 105. This change is the same as FreeBSD.
- Print Tjmax with aprint_verbose().
- Reduce the diff against FreeBSD.
 1.35.18.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.35.18.1 10-Jun-2019  christos Sync with HEAD
 1.35.16.1 28-Jul-2018  pgoyette Sync with HEAD
 1.35.10.3 29-Jul-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1857):

sys/arch/x86/x86/coretemp.c: revision 1.38-1.39 (patch)

coretemp(4): Change limits of Tjmax.
- Change the lower limit from 70 to 60. At least, some BIOSes can change
the value down to 62.
- Change the upper limit from 110 to 120. At least, some BIOSes can change
the value up to 115.
- Print an error message when rdmsr(TEMPERATURE_TARGET) fails.
- When Tjmax exceeds the limit, print a warning message and use the value
as it is.
- KNF.
 1.35.10.2 05-Aug-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #1589):

sys/arch/x86/x86/coretemp.c: revision 1.37

Add special handling for model 0x0f stepping >=2 or model 0x0e to get Tjmax.
 1.35.10.1 26-Jul-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #936):
sys/arch/x86/x86/coretemp.c: revision 1.36
- Detect and set Atom's Tj(max) to 90 if it's not the 45nm D400/D500/N400
series (90 for Diamondville and 100 for Pineview). From FreeBSD r221509.
- Reduce diff a little against FreeBSD.
 1.36.4.3 22-Jun-2024  martin Pull up following revision(s) (requested by gutteridge in ticket #1848):

sys/arch/x86/x86/coretemp.c: revision 1.40

sys/arch/x86/x86/coretemp.c: revision 1.41

coretemp.c: fix grammar in a warning message
(I get several of these warnings on boot on a particular machine. Now,
it also seems that the code isn't retrieving the correct value, either;
TBD.)

coretemp.c: don't accept impossibly low TjMax values
r. 1.39 introduced a regression where instead of applying a reasonable
default maximum (as was done prior to that change), incorrect values
were accepted and applied, as failures to retrieve an expected MSR
value weren't accounted for.

Apply different logic for unexpectedly low vs. high maximums, with
distinct warnings for each. Also add another warning about a retrieval
failure right at the outset (which also just uses the default, then).

This change fundamentally doesn't address the fact that
__SHIFTOUT(msr, MSR_TEMP_TARGET_READOUT)
doesn't necessarily return a valid value. It just restores prior
behaviour, which is more reasonable than applying a zero value, which
started happening on some older hardware. (I infer this is most likely
an issue with dated generations of Intel hardware with this feature.)

The challenge is that this evidently isn't all documented properly
anywhere. Various "magic values" in this driver need further
investigation.

While here, also fix output so warnings are cleanly formatted, rather
than the slightly scrambled way they were appearing.

Tested on older Intel hardware I had on hand:
E7500 (now falls back to default 100 rather than 0)
E5540 (successfully retrieves 97, as before)
i5-3340M (successfully retrieves 105, as before)
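
The fallback behaviour described in this entry can be sketched as follows.
This is only an illustration under stated assumptions, not the code in
coretemp.c: the helper name, the warning strings, and the choice to keep an
unexpectedly high value (with a warning) are hypothetical. The limits of 60
and 120 and the default of 100 are the figures quoted in the log messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TJMAX_DEFAULT		100	/* fallback reported for the E7500 */
#define TJMAX_LIMIT_LOW		60	/* lower limit named for revision 1.39 */
#define TJMAX_LIMIT_HIGH	120	/* upper limit named for revision 1.39 */

/*
 * Hypothetical helper. Returns the Tjmax to use and sets *warn to a
 * message, or to NULL when the readout is accepted silently.
 */
static int tjmax_select(bool msr_ok, int readout, const char **warn)
{
	assert(warn != NULL);
	*warn = NULL;

	if (!msr_ok) {
		*warn = "failed to read the MSR, using default";
		return TJMAX_DEFAULT;
	}
	if (readout < TJMAX_LIMIT_LOW) {
		/* Impossibly low values (such as 0) are not applied. */
		*warn = "Tjmax readout too low, using default";
		return TJMAX_DEFAULT;
	}
	if (readout > TJMAX_LIMIT_HIGH) {
		/* Assumption: keep a high value but warn about it. */
		*warn = "Tjmax readout unexpectedly high";
	}
	return readout;
}
```

With these assumptions, a readout of 0 yields 100 with a warning, while 97
and 105 are returned unchanged, matching the three test machines listed.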
 1.36.4.2 29-Jul-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #254):

sys/arch/x86/x86/coretemp.c: revision 1.38-1.39 (patch)

coretemp(4): Change limits of Tjmax.
- Change the lower limit from 70 to 60. At least, some BIOSes can change
the value down to 62.
- Change the upper limit from 110 to 120. At least, some BIOSes can change
the value up to 115.
- Print an error message when rdmsr(TEMPERATURE_TARGET) fails.
- When Tjmax exceeds the limit, print a warning message and use the value
as it is.
- KNF.
 1.36.4.1 15-Jul-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #1009):

sys/arch/x86/x86/coretemp.c: revision 1.37

Add special handling for model 0x0f stepping >=2 or model 0x0e to get Tjmax.
 1.38.4.2 22-Jun-2024  martin Pull up following revision(s) (requested by gutteridge in ticket #717):

sys/arch/x86/x86/coretemp.c: revision 1.40

sys/arch/x86/x86/coretemp.c: revision 1.41

coretemp.c: fix grammar in a warning message
(I get several of these warnings on boot on a particular machine. Now,
it also seems that the code isn't retrieving the correct value, either;
TBD.)

coretemp.c: don't accept impossibly low TjMax values
r. 1.39 introduced a regression where instead of applying a reasonable
default maximum (as was done prior to that change), incorrect values
were accepted and applied, as failures to retrieve an expected MSR
value weren't accounted for.

Apply different logic for unexpectedly low vs. high maximums, with
distinct warnings for each. Also add another warning about a retrieval
failure right at the outset (which also just uses the default, then).

This change fundamentally doesn't address the fact that
__SHIFTOUT(msr, MSR_TEMP_TARGET_READOUT)
doesn't necessarily return a valid value. It just restores prior
behaviour, which is more reasonable than applying a zero value, which
started happening on some older hardware. (I infer this is most likely
an issue with dated generations of Intel hardware with this feature.)

The challenge is that this evidently isn't all documented properly
anywhere. Various "magic values" in this driver need further
investigation.

While here, also fix output so warnings are cleanly formatted, rather
than the slightly scrambled way they were appearing.

Tested on older Intel hardware I had on hand:
E7500 (now falls back to default 100 rather than 0)
E5540 (successfully retrieves 97, as before)
i5-3340M (successfully retrieves 105, as before)
 1.38.4.1 29-Jul-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #254):

sys/arch/x86/x86/coretemp.c: revision 1.39

coretemp(4): Change limits of Tjmax.
- Change the lower limit from 70 to 60. At least, some BIOSes can change
the value down to 62.
- Change the upper limit from 110 to 120. At least, some BIOSes can change
the value up to 115.
- Print an error message when rdmsr(TEMPERATURE_TARGET) fails.
- When Tjmax exceeds the limit, print a warning message and use the value
as it is.
 1.41.2.1 02-Aug-2025  perseant Sync with HEAD
 1.214 02-May-2025  imil Add support for CPUID leaf 0x40000010 to detect TSC and LAPIC frequency on
hypervisors implementing the VMware-defined interface

This change enables virtual machines to obtain TSC and LAPIC frequency
information directly from the hypervisor via CPUID leaf 0x40000010, avoiding
the need for runtime calibration, thus reducing boot time in supported
environments.

Tested on GENERIC and MICROVM kernels, QEMU/KVM and QEMU/NVMM (current and
10.1), Intel and AMD CPUs, NetBSD/amd64 and i386.
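
As a rough sketch of how such a leaf might be consumed: in the
VMware-proposed layout, EAX of leaf 0x40000010 carries the TSC frequency in
kHz and EBX the bus (LAPIC timer) frequency in kHz. The structure and
function names below are hypothetical, and treating a zero register as "not
provided" is an assumption of this sketch. The CPUID instruction itself is
left out so that the decoding step stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded frequencies, in Hz. */
struct vm_timing {
	uint64_t tsc_hz;
	uint64_t lapic_hz;
};

/*
 * Decode the EAX/EBX values returned by CPUID leaf 0x40000010.
 * Both registers are expressed in kHz. Returns false if either is zero.
 */
static bool vm_timing_decode(uint32_t eax, uint32_t ebx, struct vm_timing *out)
{
	assert(out != NULL);
	if (eax == 0 || ebx == 0)
		return false;
	/* Widen before multiplying so values above ~4.29 GHz do not wrap. */
	out->tsc_hz = (uint64_t)eax * 1000;
	out->lapic_hz = (uint64_t)ebx * 1000;
	return true;
}
```

A caller would fall back to runtime calibration whenever the function
returns false.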
 1.213 11-Apr-2025  imil nvmm(4): implement CPUID leaf 0x40000010, VMware compatible TSC and LAPIC
frequency detection. Partially fixes PR kern/59170
 1.212 06-Mar-2025  imil Revert VMware-compatible TSC and LAPIC frequency detection.
 1.211 06-Mar-2025  imil Add support for CPUID leaf 0x40000010, which enables VMware-compatible TSC
and LAPIC frequency detection for virtual machines.
 1.210 22-Apr-2024  andvar branches: 1.210.2;
Surround full mp_cpu_start() method with NLAPIC > 0 guard.

Initialization is based on x86_ipi* functions, which are implemented only
when lapic flag is enabled.
 1.209 16-Jul-2023  riastradh x86: Sprinkle extensive commentary about %fs/%gs initialization.

Plus some other side quests like the three-stage GDT metamorphosis
lifecycle.

No functional change intended.
 1.208 03-Mar-2023  riastradh x86: Call fpuinit_mxcsr_mask only once.

No need to call it again and again on the secondary CPUs to compute
what should be the same mxcsr mask. (If it's not, we have deeper
problems!)
 1.207 25-Feb-2023  riastradh x86: Assert kpreempt_disabled() in cpu_load_pmap.

No functional change intended. Just makes it easier to audit
curcpu() usage.
 1.206 24-Sep-2022  riastradh branches: 1.206.4;
x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.
 1.205 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.204 14-Aug-2022  mlelstv Split TSC calibration into many small steps and disable interrupts
for each step. Also add debug messages.
 1.203 01-Apr-2022  riastradh x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.
 1.202 07-Oct-2021  msaitoh KNF. No functional change.
 1.201 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.200 24-Apr-2021  thorpej branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instances. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
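
The idea of a single call taking a variadic list of tag-value arguments
ended by a sentinel can be sketched in plain C. Everything below is a
hypothetical stand-in (the DEMO_* tags, struct demo_args and demo_parse());
it is not the NetBSD implementation and only mirrors the shape described in
the message above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; DEMO_EOL plays the role of the sentinel. */
enum demo_tag { DEMO_EOL = 0, DEMO_IATTR, DEMO_LOCATORS };

struct demo_args {
	const char *iattr;	/* interface attribute, or NULL */
	const int *locators;	/* locators array, or NULL */
};

/* Walk tag-value pairs until the sentinel tag is seen. */
static struct demo_args demo_parse(int first_tag, ...)
{
	struct demo_args a = { NULL, NULL };
	va_list ap;
	int tag = first_tag;

	va_start(ap, first_tag);
	while (tag != DEMO_EOL) {
		switch (tag) {
		case DEMO_IATTR:
			a.iattr = va_arg(ap, const char *);
			break;
		case DEMO_LOCATORS:
			a.locators = va_arg(ap, const int *);
			break;
		default:
			assert(!"unknown tag");
		}
		tag = va_arg(ap, int);
	}
	va_end(ap);
	return a;
}
```

A call such as demo_parse(DEMO_IATTR, "pci", DEMO_EOL) passes only what the
caller needs, and new tags can be added without introducing another
function variant.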
 1.199 09-Oct-2020  christos branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.
 1.198 09-Aug-2020  christos move lcall sniffer to x86_machdep since xen/pv has its own cpu.c
 1.197 08-Aug-2020  christos PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.
 1.196 28-Jul-2020  fcambus Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@
 1.195 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.194 15-Jun-2020  msaitoh Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and brought a big improvement,
but there is still room for more. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), it seems the improvement is very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than that of rdtsc, so I suspect the difference
in the results comes from it.
 1.193 13-Jun-2020  ad g/c vm_page_zero_enable
 1.192 21-May-2020  ad - Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.
 1.191 12-May-2020  msaitoh Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.
 1.190 08-May-2020  ad Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
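
The "measure many times and keep the lowest value" step can be sketched as
below. The helper names are hypothetical, and two details are assumptions
of this sketch: "lowest" is read as smallest magnitude, and the threshold
comparison is inclusive. The threshold of 1000 clock cycles is the figure
given in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKEW_THRESHOLD	1000	/* error threshold in clock cycles */

static int64_t abs64(int64_t v) { return v < 0 ? -v : v; }

/* Return the sample with the smallest magnitude; n must be > 0. */
static int64_t skew_best(const int64_t *samples, size_t n)
{
	assert(samples != NULL && n > 0);

	int64_t best = samples[0];
	for (size_t i = 1; i < n; i++) {
		if (abs64(samples[i]) < abs64(best))
			best = samples[i];
	}
	return best;
}

/* True if the best observed sample is within the threshold. */
static bool skew_acceptable(const int64_t *samples, size_t n)
{
	return abs64(skew_best(samples, n)) <= SKEW_THRESHOLD;
}
```

Taking the minimum filters out samples inflated by interrupts or cache
misses, since such noise can only add delay to a measurement.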
 1.189 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.188 29-Apr-2020  ad Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.
 1.187 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.186 23-Apr-2020  ad - Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.
 1.185 21-Apr-2020  msaitoh Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives the values for some processors.
- This also requires changing lapic_per_second so that the LAPIC timer works
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
 1.184 20-Apr-2020  msaitoh Whitespace fix. No functional change.
 1.183 10-Apr-2020  bouyer Revert, wrong branch
 1.182 10-Apr-2020  bouyer Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.
 1.181 14-Jan-2020  pgoyette branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9
 1.180 08-Jan-2020  ad Make "mach cpu" in ddb show the IPL for each cpu.
 1.179 20-Dec-2019  ad branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.
 1.178 07-Dec-2019  nonaka Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.
 1.177 27-Nov-2019  maxv Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
 1.176 23-Nov-2019  ad cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
 1.175 22-Nov-2019  ad - On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.
 1.174 05-Nov-2019  maxv Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.
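
The collision check between a recorded cell and a later access can be
sketched as a pure function. The types and names are hypothetical and the
sketch is single-threaded; it also assumes the usual data-race definition,
in which two overlapping reads do not count as a collision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory access (or of a cell). */
struct demo_access {
	uintptr_t addr;
	size_t size;
	bool write;
};

/* True if the byte ranges [addr, addr + size) intersect. */
static bool demo_overlap(const struct demo_access *a,
    const struct demo_access *b)
{
	return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/*
 * True if the current access collides with the recorded cell:
 * the ranges overlap and at least one of the two is a write.
 */
static bool demo_race(const struct demo_access *cell,
    const struct demo_access *cur)
{
	assert(cell != NULL && cur != NULL);
	return demo_overlap(cell, cur) && (cell->write || cur->write);
}
```

In the cpu0/cpu1 example from the message, the cell is cpu0's read and the
current access is cpu1's write to the same address, so the check fires.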
 1.173 12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.172 30-Aug-2019  mrg avoid misalignment in 32 bit kernels and "mach cpu".
 1.171 29-May-2019  maxv branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.
 1.170 27-May-2019  maxv Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.
 1.169 27-May-2019  maxv Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.
 1.168 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.167 15-Feb-2019  nonaka Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.
 1.166 14-Feb-2019  cherry Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.165 14-Feb-2019  cherry Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.
 1.164 04-Dec-2018  cherry Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.
 1.163 04-Dec-2018  cherry Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.
 1.162 12-Nov-2018  maxv Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.
 1.161 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
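
The truncation hazard can be shown with a small sketch. The demo_* names
are hypothetical stand-ins rather than the libkern definitions, and the
sketch assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in with the same shape as an unsigned-int-only min function. */
static unsigned int demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro in the style of MIN; evaluates arguments twice. */
#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * The explicit casts spell out the conversion that would otherwise
 * happen implicitly when 64-bit values are passed to demo_uimin().
 */
static uint64_t demo_truncating_min(uint64_t a, uint64_t b)
{
	assert(UINT_MAX == 0xffffffffu);	/* sketch assumes 32-bit int */
	return demo_uimin((unsigned int)a, (unsigned int)b);
}
```

For a = 2^32 + 5 and b = 100, demo_truncating_min() returns 5 because the
upper 32 bits of a are dropped before the comparison, while DEMO_MIN
returns 100.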
 1.160 26-Jul-2018  maxv Remove useless/outdated comments. No functional change.
 1.159 12-Jul-2018  maxv Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.
 1.158 22-Jun-2018  maxv branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
 1.157 20-Jun-2018  jdolecek as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv
 1.156 19-Jun-2018  jdolecek fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8
 1.155 05-Apr-2018  maxv Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.
 1.154 04-Apr-2018  maxv Enable the SpectreV2 mitigation by default at boot time.
 1.153 28-Mar-2018  maxv Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.
 1.152 15-Mar-2018  maxv Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.
 1.151 14-Mar-2018  maxv Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.
 1.150 11-Mar-2018  maxv Explain the TSC drift thing.
 1.149 22-Feb-2018  maxv branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.
 1.148 22-Feb-2018  maxv Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.
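
The hotpatching scheme described in revision 1.148 can be illustrated with a user-space sketch. This is only a simulation on a byte buffer, not the kernel's code: the helper names make_skip_template() and hotpatch_sim() are hypothetical, and a real kernel would also have to make the text writable and serialize instruction fetch. It shows the one property the log message relies on, namely that the placeholder and its replacement have exactly the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_REL8 0xEB	/* x86 short jump opcode */
#define OP_INT3     0xCC	/* x86 breakpoint opcode */

/*
 * Fill buf with "jmp over the rest" followed by int3 padding.
 * The displacement is counted from the end of the 2-byte jmp,
 * and must fit in a signed 8-bit field.
 */
static void
make_skip_template(uint8_t *buf, size_t len)
{
	assert(len >= 2 && len <= 129);
	buf[0] = OP_JMP_REL8;
	buf[1] = (uint8_t)(len - 2);
	memset(buf + 2, OP_INT3, len - 2);
}

/*
 * Overwrite the template with the real code.
 * Refuses to patch unless both chunks have the same size.
 */
static int
hotpatch_sim(uint8_t *dst, size_t dstlen, const uint8_t *src, size_t srclen)
{
	if (dstlen != srclen)
		return -1;
	memcpy(dst, src, srclen);
	return 0;
}
```

In the unpatched state, execution reaching the template takes the jump and never touches the int3 bytes; after patching, the same bytes hold the real instructions.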
 1.147 27-Jan-2018  maxv Add SMAP support for i386.
 1.146 11-Jan-2018  maxv Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.
 1.145 11-Jan-2018  msaitoh Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.
 1.144 07-Jan-2018  maxv Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.
 1.143 07-Jan-2018  maxv Use uvm_km_alloc instead of kmem_zalloc.
 1.142 05-Jan-2018  maxv Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
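
The address quoted in revision 1.142 for L4 slot 384 follows from the amd64 4-level paging layout: each L4 slot covers 2^39 bytes, and bits 63..48 of a canonical address replicate bit 47. A minimal sketch of that arithmetic, with a hypothetical helper name l4_slot_to_va() that is not taken from the kernel sources:

```c
#include <stdint.h>

#define L4_SHIFT 39	/* each L4 slot covers 2^39 bytes (512 GiB) */

/* Return the canonical base virtual address of an L4 slot (0..511). */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	/* Canonical form: sign-extend bit 47 into bits 63..48. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

For slot 384 the shift gives 0x0000c00000000000; bit 47 is set, so the sign extension yields 0xffffc00000000000, the value given in the log message.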
 1.141 11-Nov-2017  maxv Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.
 1.140 11-Nov-2017  bouyer Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
 1.139 08-Nov-2017  maxv Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.
 1.138 17-Oct-2017  maxv Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.
 1.137 17-Oct-2017  maxv Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.
 1.136 28-Sep-2017  maxv Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.
 1.135 17-Sep-2017  maxv Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.
 1.134 27-Aug-2017  maxv style, and move some i386-specific code into i386/
 1.133 27-Aug-2017  maxv Localify. By the way, we should use a different stack for NMIs.
 1.132 28-Jul-2017  riastradh cpu_trace is no more, remove vestige of it that broke ALL kernel.
 1.131 10-Jun-2017  pgoyette Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch
 1.130 31-May-2017  kre branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.
 1.129 31-May-2017  kre Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.
 1.128 31-May-2017  pgoyette Remove unused variable (I reverted too much in previous commit!)
 1.127 31-May-2017  pgoyette Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC
 1.126 31-May-2017  maya Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.
 1.125 23-May-2017  nonaka x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.124 22-Apr-2017  nonaka use CR8 instead of LAPIC Task Priority register on x86-64.
 1.123 11-Feb-2017  maxv Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
 1.122 02-Feb-2017  maxv Use __read_mostly on these variables, to reduce the probability of false
sharing.
 1.121 16-Oct-2016  maxv branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.
 1.120 07-Jul-2016  msaitoh branches: 1.120.2;
KNF. Remove extra spaces. No functional change.
 1.119 16-Dec-2015  maxv Extend SMEP support to i386 (does not require PAE).
 1.118 13-Dec-2015  maxv Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).
 1.117 13-Dec-2015  maxv Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.116 17-Sep-2015  nat Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@
 1.115 18-May-2015  msaitoh OOOOPS. Revert previous.
 1.114 18-May-2015  msaitoh Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.
 1.113 12-Jan-2015  christos PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7
 1.112 08-Dec-2014  msaitoh Modify the code around cpu_identify() so as not to break the dmesg of CPUs with AB_VERBOSE
or AB_DEBUG.
 1.111 12-May-2014  joerg branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.
 1.110 25-Feb-2014  dsl branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all CPUs; I'm not brave enough to enable XSAVEOPT.
 1.109 19-Feb-2014  dsl Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.
 1.108 26-Jan-2014  dsl Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the FPU non-optional (486SX should still work).
Only 386 CPUs support an external FPU, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the FPU registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
 1.107 01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.106 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
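
The distinction revision 1.106 draws between the base, extended and display values can be sketched as follows. This follows the CPUID leaf 1 EAX layout documented by Intel and AMD; the lowercase function names are hypothetical stand-ins and are not copied from the macros in the NetBSD headers.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static uint32_t cpuid_stepping(uint32_t sig)    { return sig & 0xf; }
static uint32_t cpuid_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static uint32_t cpuid_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static uint32_t cpuid_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static uint32_t cpuid_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base is 0xf. */
static uint32_t
cpuid_display_family(uint32_t sig)
{
	uint32_t fam = cpuid_base_family(sig);

	if (fam == 0xf)
		fam += cpuid_ext_family(sig);
	return fam;
}

/* Display model: the extended model is used for base family 6 or 0xf. */
static uint32_t
cpuid_display_model(uint32_t sig)
{
	uint32_t fam = cpuid_base_family(sig);
	uint32_t model = cpuid_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= cpuid_ext_model(sig) << 4;
	return model;
}
```

For example, the signature 0x000906ea has base family 6, base model 0xe and extended model 9, so its display model is 0x9e, while 0x00800f11 has base family 0xf and extended family 8, giving display family 0x17.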
 1.105 12-Nov-2013  msaitoh Revert previous. I accidentally committed debug code. Sorry.
 1.104 12-Nov-2013  msaitoh Fix a bug in last commit. Check correct variable.
 1.103 23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.102 12-Dec-2012  pgoyette branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.
 1.101 08-Dec-2012  kiyohara #ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.
 1.100 02-Jul-2012  chs branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.
 1.99 12-Jun-2012  yamt cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995
 1.98 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.97 17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.96 18-Oct-2011  jruoho branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.
 1.95 17-Oct-2011  jmcneill add a "vm" device class for cpufeaturebus
 1.94 06-Oct-2011  mrg remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.
 1.93 28-Sep-2011  jruoho Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.
 1.92 11-Aug-2011  cherry Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@
 1.91 11-Aug-2011  cherry Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs
 1.90 29-Jul-2011  dyoung Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.
 1.89 22-Jun-2011  jruoho Add small comment.
 1.88 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.87 26-Feb-2011  jruoho branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.
 1.86 24-Feb-2011  jruoho Fix autoconf(9) of cpufeaturebus.
 1.85 24-Feb-2011  jruoho Move VIA_C7TEMP to the cpufeaturebus.
 1.84 24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.83 23-Feb-2011  jruoho Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.
 1.82 20-Feb-2011  jruoho Modularize coretemp(4). Ok jmcneill@.
 1.81 19-Feb-2011  jmcneill modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module
 1.80 02-Feb-2011  bouyer Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
 1.79 11-Jan-2011  jruoho branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().
 1.78 06-Nov-2010  uebayasi Machine dependent code is considered as part of UVM. Include
internal API header.
 1.77 20-Aug-2010  jruoho Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
 1.76 09-Aug-2010  jruoho Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.
 1.75 09-Aug-2010  jruoho Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.
 1.74 04-Aug-2010  jruoho Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.
 1.73 24-Jul-2010  jym Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
 1.72 08-Jul-2010  rmind cpu_attach: use kmem_zalloc instead of memset.
 1.71 06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.70 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.69 24-Feb-2010  dyoung branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
 1.68 09-Feb-2010  jym Wrap a comment; add a space after a comma to another (align with next line)
 1.67 09-Feb-2010  jym Use roundup2() instead of hardcoding the operation.
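
For readers unfamiliar with the idiom behind roundup2() in revision 1.67: rounding up to a power-of-two boundary needs only an add and a mask. A minimal sketch with a hypothetical local helper, round_up_pow2(); it assumes m is a nonzero power of two and is not the kernel's own macro definition.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of m.
 * Only valid when m is a nonzero power of two, because the mask
 * ~(m - 1) then clears exactly the low log2(m) bits.
 */
static uint64_t
round_up_pow2(uint64_t x, uint64_t m)
{
	return (x + m - 1) & ~(m - 1);
}
```

With m = 4096, a value of 4097 rounds up to 8192, while 4096 is already aligned and stays unchanged.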
 1.66 08-Jan-2010  dyoung branches: 1.66.2;
Expand PMF_FN_* macros.
 1.65 21-Nov-2009  rmind Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.64 07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.63 27-Mar-2009  drochner Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)
 1.62 21-Jan-2009  bouyer branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.
 1.61 23-Dec-2008  cegger move from malloc to kmem
 1.60 19-Dec-2008  ad PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.59 06-Nov-2008  cegger Link cpus in the order they are attaching and not in inverse order.
 1.58 31-Oct-2008  rmind - Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting the state of a CPU to offline if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.
 1.57 15-Oct-2008  ad branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.
 1.56 03-Jun-2008  jmcneill branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.
 1.55 02-Jun-2008  ad - Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.
 1.54 28-May-2008  ad Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.
 1.53 21-May-2008  ad Do the errata patchup after identifying the CPU, to avoid badly formatted
output.
 1.52 21-May-2008  ad verbose -> debug for # page colours
 1.51 14-May-2008  ad - cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.
 1.50 13-May-2008  ad Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.
 1.49 12-May-2008  ad - Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().
 1.48 12-May-2008  ad Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.
 1.47 12-May-2008  ad - Complain if unable to reset the lapic ID.
- Minor clean up.
 1.46 12-May-2008  ad cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.
 1.45 11-May-2008  ad - Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.
 1.44 11-May-2008  ad MP + apics are needed now so kill the #ifdefs
 1.43 11-May-2008  ad Don't reload LDTR unless a new value, which only happens for USER_LDT.
 1.42 11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.41 11-May-2008  ad Share cpu.h between the x86 ports.
 1.40 11-May-2008  ad Simplify x86 identcpu code, and share between i386/amd64.
 1.39 10-May-2008  ad If the boot processor's lapic has the wrong ID, reset it.
 1.38 10-May-2008  ad Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
On a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.
 1.37 09-May-2008  joerg Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
 1.36 29-Apr-2008  ad branches: 1.36.2;
Minor correction to previous.
 1.35 29-Apr-2008  ad Recognise two new boot flags:

-1 disable MP
-2 disable ACPI
 1.34 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.33 24-Apr-2008  jmcneill branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.
 1.32 22-Apr-2008  tls Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
 1.31 18-Apr-2008  cegger branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.
 1.30 17-Apr-2008  cegger wrap long line. Requested and OK by simonb.
 1.29 17-Apr-2008  yamt cpu_debug_dump: s/curproc/curlwp/ in a message.
 1.28 17-Apr-2008  cegger use aprint_*_dev.
OK simonb
 1.27 16-Apr-2008  cegger - use aprint_*_dev and device_xname
- use POSIX integer types
 1.26 13-Apr-2008  cegger use device accessors and other misc cleanups
 1.25 02-Apr-2008  ad Add more error reporting to AP startup.
 1.24 01-Apr-2008  ad If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.
 1.23 04-Mar-2008  cube Split device_t/softc.
 1.22 29-Feb-2008  dyoung Use PMF_FN_ARGS, PMF_FN_PROTO.
 1.21 10-Feb-2008  ad branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.
 1.20 30-Jan-2008  jmcneill pmf: Naively track online/offline state of APs during suspend/resume.
 1.19 23-Jan-2008  joerg Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.
 1.18 15-Jan-2008  joerg Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@
 1.17 14-Jan-2008  joerg Ensure that non-primary CPUs save the FPU state on suspend.
 1.16 05-Jan-2008  yamt - make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.
 1.15 04-Jan-2008  yamt i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.
 1.14 18-Dec-2007  joerg Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.
 1.13 15-Dec-2007  joerg For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.
 1.12 09-Dec-2007  jmcneill branches: 1.12.2;
Merge jmcneill-pm branch.
 1.11 04-Dec-2007  ad branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.
 1.10 02-Dec-2007  ad branches: 1.10.2;
Back out part of patch that got merged accidentally.
 1.9 02-Dec-2007  ad Use atomics to adjust ci_flags.
 1.8 14-Nov-2007  ad cpu_hatch: change lapic initialization order.
 1.7 13-Nov-2007  ad In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.
 1.6 12-Nov-2007  ad - cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.
 1.5 10-Nov-2007  ad - When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
 1.4 18-Oct-2007  yamt branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.
 1.3 26-Sep-2007  ad branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.2 29-Aug-2007  ad branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.1 23-Aug-2007  ad branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.
 1.1.2.5 03-Dec-2007  ad Sync with HEAD.
 1.1.2.4 23-Oct-2007  ad Sync with head.
 1.1.2.3 09-Oct-2007  ad Sync with head.
 1.1.2.2 09-Oct-2007  ad Sync with head.
 1.1.2.1 23-Aug-2007  ad Merged x86 cpu.c.
 1.2.8.3 18-Oct-2007  yamt reduce #ifdef.
 1.2.8.2 06-Oct-2007  yamt sync with head.
 1.2.8.1 30-Sep-2007  yamt implement deferred pmap switching for amd64, and make amd64 use
x86 shared pmap code. it makes several i386 pmap improvements available
to amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
 1.2.6.21 09-Dec-2007  jmcneill Sync with HEAD.
 1.2.6.20 08-Dec-2007  jmcneill Rename pnp(9) -> pmf(9), as requested by many.
 1.2.6.19 03-Dec-2007  joerg Sync with HEAD.
 1.2.6.18 14-Nov-2007  joerg Sync with HEAD.
 1.2.6.17 11-Nov-2007  joerg Sync with HEAD.
 1.2.6.16 06-Nov-2007  joerg Refactor PNP API:
- Make suspend/resume directly a device functionality. It consists of
three layers (class logic, device logic, bus logic), all of them being
optional. This replaces D0/D3 transitions.
- device_is_active returns true if the device was not disabled and was
not suspended (even partially), device_is_enabled returns true if the
device was enabled.
- Change pnp_global_transition into pnp_system_suspend and
pnp_system_resume. Before running any suspend/resume handlers, check
that all currently attached devices support power management and bail
out otherwise. The latter is not done for the shutdown/panic case.
- Make the former bus-specific generic network handlers a class handler.
- Make PNP messages like volume up/down/toggle PNP events. Each device
can register what events they are interested in and whether the handler
should be global or not.
- Introduce device_active API for devices to mark themselves in use from
either the system or the device. Use this to implement the idle handling
for audio and input devices. This is intended to replace most ad-hoc
watchdogs as well.
- Fix some situations in which audio resume would lose mixer settings.
- Make USB host controllers better deal with suspend in the light of
shared interrupts.
- Flush filesystem cache on suspend.
- Flush disk caches on suspend. Put ATA disks into standby on suspend as
well.
- Adapt drivers to use the new PNP API.
- Fix a critical bug in the generic cardbus layer that made D0->D3
break.
- Fix ral(4) to set if_stop.
- Convert cbb(4) to the new PNP API.
- Apply the PCI Express SCI fix on resume again.
 1.2.6.15 28-Oct-2007  joerg Make the reset of FS/GS base in cpu_init_msrs optional. We don't want
that in the ACPI resume path.
 1.2.6.14 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.2.6.13 02-Oct-2007  joerg Sync with HEAD.
 1.2.6.12 10-Sep-2007  joerg Introduce pmap_init_tmp_pgtbl to build a temporary copy of the kernel
side page mapping and an identity mapping low page for use in real mode.
Switch MP bootstrap and i386 ACPI wakeup code to use it.
 1.2.6.11 08-Sep-2007  joerg Sync pmap after trampoline was unmapped again.
 1.2.6.10 08-Sep-2007  joerg Fix compilation of non-MP kernels by always defining
mp_trampoline_paddr. Also make x86_mp_online unconditional as suggested
by ad@.
 1.2.6.9 08-Sep-2007  joerg ANSIfy before further changes.
 1.2.6.8 08-Sep-2007  joerg Move code to spin-up the application processors into a function.
Add code to ensure the MP_TRAMPOLINE is identity mapped, similar
to ACPI wakecode.
 1.2.6.7 08-Sep-2007  joerg Introduce mp_trampoline_paddr and use that in place of most
MP_TRAMPOLINE variables. Use a temporary kernel mapping to copy the
trampoline in preparation for removing the identity mapping from the
normal pmap.
 1.2.6.6 04-Sep-2007  jmcneill Unset x86_mp_online before going to sleep to prevent IPIs being sent to
unconfigured cpus during resume.
 1.2.6.5 04-Sep-2007  jmcneill Slightly reorganize cpu_power, no functional change.
 1.2.6.4 03-Sep-2007  jmcneill Ignore CPUs in cpu_power where cpu_idlelwp == NULL.
 1.2.6.3 03-Sep-2007  jmcneill In cpu_power, hold cpu_lock while calling cpu_setonline. While we're here,
ignore cpus without the CPUF_PRESENT flag set.
 1.2.6.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.2.6.1 29-Aug-2007  jmcneill file cpu.c was added on branch jmcneill-pm on 2007-09-03 16:47:47 +0000
 1.2.4.9 17-Mar-2008  yamt sync with head.
 1.2.4.8 11-Feb-2008  yamt sync with head.
 1.2.4.7 04-Feb-2008  yamt sync with head.
 1.2.4.6 21-Jan-2008  yamt sync with head
 1.2.4.5 07-Dec-2007  yamt sync with head
 1.2.4.4 15-Nov-2007  yamt sync with head.
 1.2.4.3 27-Oct-2007  yamt sync with head.
 1.2.4.2 03-Sep-2007  yamt sync with head.
 1.2.4.1 29-Aug-2007  yamt file cpu.c was added on branch yamt-lazymbuf on 2007-09-03 14:31:23 +0000
 1.2.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.2.2.1 29-Aug-2007  skrll file cpu.c was added on branch nick-csl-alignment on 2007-09-03 10:19:52 +0000
 1.3.2.3 18-Nov-2007  bouyer Sync with HEAD
 1.3.2.2 13-Nov-2007  bouyer Sync with HEAD
 1.3.2.1 25-Oct-2007  bouyer Sync with HEAD.
 1.4.4.4 23-Mar-2008  matt sync with HEAD
 1.4.4.3 09-Jan-2008  matt sync with HEAD
 1.4.4.2 06-Nov-2007  matt sync with HEAD
 1.4.4.1 18-Oct-2007  matt file cpu.c was added on branch matt-armv6 on 2007-11-06 23:23:46 +0000
 1.4.2.4 18-Feb-2008  mjf Sync with HEAD.
 1.4.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.4.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.4.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.10.2.2 26-Dec-2007  ad Sync with head.
 1.10.2.1 08-Dec-2007  ad Sync with head.
 1.11.2.1 11-Dec-2007  yamt sync with head.
 1.12.2.3 19-Jan-2008  bouyer Sync with HEAD
 1.12.2.2 08-Jan-2008  bouyer Sync with HEAD
 1.12.2.1 02-Jan-2008  bouyer Sync with HEAD
 1.21.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.21.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.21.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.21.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.21.2.1 24-Mar-2008  keiichi sync with head.
 1.31.2.3 17-Jun-2008  yamt sync with head.
 1.31.2.2 04-Jun-2008  yamt sync with head
 1.31.2.1 18-May-2008  yamt sync with head.
 1.33.2.5 09-Oct-2010  yamt sync with head
 1.33.2.4 11-Aug-2010  yamt sync with head.
 1.33.2.3 11-Mar-2010  yamt sync with head
 1.33.2.2 04-May-2009  yamt sync with head.
 1.33.2.1 16-May-2008  yamt sync with head.
 1.36.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.56.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.56.4.1 19-Oct-2008  haad Sync with HEAD.
 1.57.4.4 22-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.57.4.3 02-Feb-2009  snj branches: 1.57.4.3.2; 1.57.4.3.4;
Pull up following revision(s) (requested by ad in ticket #371):
sys/arch/x86/x86/cpu.c: revision 1.59
Link cpus in the order they are attaching and not in inverse order.
 1.57.4.2 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #343):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.14
sys/arch/x86/include/cpufunc.h: revision 1.9
sys/arch/x86/x86/identcpu.c: revision 1.12
sys/arch/x86/x86/cpu.c: revision 1.60
sys/arch/x86/x86/patch.c: revision 1.15
PR kern/40213 my i386 machine can't boot because of tsc
- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.
- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.57.4.1 13-Nov-2008  snj Pull up following revision(s) (requested by rmind in ticket #48):
sys/kern/kern_cpu.c: revision 1.37
sys/arch/x86/x86/cpu.c: revision 1.58
sys/arch/xen/x86/cpu.c: revision 1.29
sys/sys/cpu.h: revision 1.24
sys/kern/sys_sched.c: revision 1.31
- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().
Should fix PR/39349.
 1.57.4.3.4.1 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.57.4.3.2.1 23-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.57.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.57.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.57.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.62.2.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.62.2.5 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.62.2.4 10-Jan-2011  jym Sync with HEAD
 1.62.2.3 24-Oct-2010  jym Sync with HEAD
 1.62.2.2 01-Nov-2009  jym Sync with HEAD.
 1.62.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.66.2.6 09-Nov-2010  uebayasi Sync with HEAD.
 1.66.2.5 09-Nov-2010  uebayasi Sync with HEAD.
 1.66.2.4 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.66.2.3 17-Aug-2010  uebayasi Sync with HEAD.
 1.66.2.2 30-Apr-2010  uebayasi Sync with HEAD.
 1.66.2.1 28-Apr-2010  uebayasi Adjustment for uvm/uvm_page.h. More to follow later.
 1.69.2.4 17-Mar-2011  rmind - Fix tlbflushg() to behave like tlbflush(), if page global extension (PGE)
is not (yet) enabled. This fixes the issue of stale TLB entry, experienced
early on boot, when PGE is not yet set on primary CPU.
- Rewrite i386/amd64 TLB interrupt handlers in C (only stubs are in assembly),
which simplifies and unifies (under x86) code, plus fixes a few bugs.
- cpu_attach: remove assignment to cpus_running, as primary CPU might not be
attached first, which causes reset (and thus missed secondary CPUs).
 1.69.2.3 05-Mar-2011  rmind sync with head
 1.69.2.2 30-May-2010  rmind sync with head
 1.69.2.1 26-Apr-2010  rmind Apply renovated patch to significantly reduce TLB shootdowns in x86 pmap,
also provide TLBSTATS option to measure and track TLB shootdowns. Details:

http://mail-index.netbsd.org/port-i386/2009/01/11/msg001018.html

Patch from Andrew Doran, proposed on tech-x86 [sic], in January 2009.

XXX: amd64 and xen are not yet; work in progress.
 1.79.4.2 05-Mar-2011  bouyer Sync with HEAD
 1.79.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.79.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.87.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.96.8.3 05-Jul-2012  riz Pull up following revision(s) (requested by chs in ticket #400):
sys/arch/x86/x86/cpu.c: revision 1.100
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.
 1.96.8.2 09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.96.8.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.96.6.2 29-Apr-2012  mrg sync to latest -current.
 1.96.6.1 18-Feb-2012  mrg merge to -current.
 1.96.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.96.2.4 16-Jan-2013  yamt sync with (a bit old) head
 1.96.2.3 30-Oct-2012  yamt sync with head
 1.96.2.2 23-May-2012  yamt sync with head.
 1.96.2.1 17-Apr-2012  yamt sync with head
 1.100.2.3 03-Dec-2017  jdolecek update from HEAD
 1.100.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.100.2.1 25-Feb-2013  tls resync with head
 1.102.2.1 18-May-2014  rmind sync with head
 1.110.2.1 10-Aug-2014  tls Rebase.
 1.111.4.7 28-Aug-2017  skrll Sync with HEAD
 1.111.4.6 05-Feb-2017  skrll Sync with HEAD
 1.111.4.5 05-Dec-2016  skrll Sync with HEAD
 1.111.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.111.4.3 22-Sep-2015  skrll Sync with HEAD
 1.111.4.2 06-Jun-2015  skrll Sync with HEAD
 1.111.4.1 06-Apr-2015  skrll Sync with HEAD
 1.111.2.3 06-Mar-2016  martin Pull up following revision(s) (requested by msaitoh in ticket #1118):
sys/arch/x86/include/cpuvar.h: revision 1.47
sys/arch/x86/x86/cpu.c: revision 1.117
sys/arch/x86/x86/identcpu.c: revision 1.49
sys/arch/x86/include/cpu.h: revision 1.67
Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.111.2.2 06-Nov-2015  riz Pull up following revision(s) (requested by nat in ticket #984):
sys/arch/x86/x86/cpu.c: revision 1.116
Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.
This commit was improved and approved by christos@
 1.111.2.1 12-Jan-2015  snj branches: 1.111.2.1.2;
Pull up following revision(s) (requested by christos in ticket #414):
sys/arch/x86/x86/cpu.c: revision 1.113
PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
 1.111.2.1.2.2 19-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1118):
sys/arch/x86/include/cpuvar.h: revision 1.47
sys/arch/x86/x86/cpu.c: revision 1.117
sys/arch/x86/x86/identcpu.c: revision 1.49
sys/arch/x86/include/cpu.h: revision 1.67

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.111.2.1.2.1 08-Nov-2015  riz Pull up following revision(s) (requested by nat in ticket #984):
sys/arch/x86/x86/cpu.c: revision 1.116
Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.
This commit was improved and approved by christos@
 1.120.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.120.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.120.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.121.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.130.2.10 05-Aug-2020  martin Pull up the following revisions, requested by msaitoh in ticket #1593:

sys/arch/x86/conf/files.x86 1.108
sys/arch/x86/include/apicvar.h 1.7 via patch
sys/arch/x86/include/cpu.h 1.121
sys/arch/x86/x86/cpu.c 1.185 via patch
sys/arch/x86/x86/hyperv.c 1.7
sys/arch/x86/x86/tsc.c 1.41
sys/arch/xen/conf/files.xen 1.181

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.130.2.9 21-Jan-2020  martin Pull up following revision(s) (requested by pgoyette in ticket #1483):

sys/arch/x86/x86/cpu.c: revision 1.181

If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9
 1.130.2.8 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.130.2.7 07-Aug-2018  martin Pull up following revision(s) (requested by maxv in ticket #960):

sys/arch/x86/x86/cpu.c: revision 1.159

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.
 1.130.2.6 14-Apr-2018  martin Pullup the following revisions via patch, requested by maxv in ticket #748:

sys/arch/amd64/amd64/copy.S 1.29 (adapted, via patch)
sys/arch/amd64/amd64/amd64_trap.S 1.16,1.19 (partial) (via patch)
sys/arch/amd64/amd64/trap.c 1.102,1.106 (partial),1.110 (via patch)
sys/arch/amd64/include/frameasm.h 1.22,1.24 (via patch)
sys/arch/x86/x86/cpu.c 1.137 (via patch)
sys/arch/x86/x86/patch.c 1.23,1.26 (partial) (via patch)

Backport of SMAP support.
 1.130.2.5 22-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #652:

sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch)
sys/arch/amd64/amd64/db_machdep.c 1.6 (patch)
sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch)
sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch)
sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch)
sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch)
sys/arch/amd64/amd64/amd64/vector.S upto 1.61 (partial, patch)
sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch)
sys/arch/amd64/conf/kern.ldscript 1.26 (patch)
sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch)
sys/arch/amd64/include/param.h 1.25 (patch)
sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch)
sys/arch/x86/conf/files.x86 1.91,1.93 (patch)
sys/arch/x86/include/cpu.h 1.88,1.89 (patch)
sys/arch/x86/include/pmap.h 1.75 (patch)
sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch)
sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch)
sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch)
sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch)
sys/arch/x86/x86/svs.c 1.1-1.14
sys/arch/xen/conf/files.compat 1.30 (patch)

Backport SVS. Not enabled yet.
 1.130.2.4 16-Mar-2018  martin Pull up the following revisions (via patch), requested by maxv in #635:

sys/arch/amd64/amd64/gdt.c 1.39-1.45 (patch)
sys/arch/amd64/amd64/amd64/machdep.c 1.284,1.287,1.288 (patch)
sys/arch/amd64/amd64/include/param.h 1.23 (patch)
sys/arch/amd64/include/types.h 1.53 (patch)
sys/arch/x86/include/cpu.h 1.87 (patch)
sys/arch/x86/include/pmap.h 1.73,1.74 (patch)
sys/arch/x86/x86/cpu.c 1.142 (patch)
sys/arch/x86/x86/intr.c 1.117 (partial),1.120 (patch)
sys/arch/x86/x86/pmap.c 1.276 (patch)

Initialize ist0 in cpu_init_tss.
Backport __HAVE_PCPU_AREA.
 1.130.2.3 08-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #611):
sys/arch/x86/x86/cpu.c: revision 1.134 (patch)
sys/arch/x86/include/cpu.h: revision 1.78 (patch)
sys/arch/i386/i386/machdep.c: revision 1.792 (patch)

style, and move some i386-specific code into i386/
 1.130.2.2 07-Mar-2018  martin Pull up the following revisions (via patch), requested by maxv in ticket #610:

sys/arch/amd64/amd64/amd64_trap.S 1.8,1.10,1.12 (partial),1.13-1.15,
1.19 (partial),1.20,1.21,1.22,1.24
(via patch)
sys/arch/amd64/amd64/locore.S 1.129 (partial),1.132 (via patch)
sys/arch/amd64/amd64/trap.c 1.97 (partial),1.111 (via patch)
sys/arch/amd64/amd64/vector.S 1.54,1.55 (via patch)
sys/arch/amd64/include/frameasm.h 1.21,1.23 (via patch)
sys/arch/x86/x86/cpu.c 1.138 (via patch)
sys/arch/xen/conf/Makefile.xen 1.45 (via patch)

Rename and reorder several things in amd64_trap.S.
Compile amd64_trap.S as a file.
Introduce nmitrap and doubletrap.
Have the CPU clear PSL_D automatically in the syscall entry point.
 1.130.2.1 14-Jun-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #28):
sys/arch/x86/x86/cpu.c: revision 1.131
Further reduce the loop counter so that hatching completes before the
boot processor times us out.
Add a nice big XXX comment for why the counter is so low.
 1.149.2.9 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.149.2.8 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.149.2.7 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.149.2.6 28-Jul-2018  pgoyette Sync with HEAD
 1.149.2.5 25-Jun-2018  pgoyette Sync with HEAD
 1.149.2.4 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.149.2.3 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.149.2.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.149.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.158.2.3 21-Apr-2020  martin Sync with HEAD
 1.158.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.158.2.1 10-Jun-2019  christos Sync with HEAD
 1.171.2.3 02-Aug-2020  martin Apply patch, requested by msaitoh in ticket #1031:

sys/arch/x86/x86/cpu.c patch

Fix a panic on a CPU which has no rdtsc instruction. This bug was
added in ticket #1015.
 1.171.2.2 15-Jul-2020  martin Pull up the following, requested by msaitoh in ticket #1015

sys/arch/x86/conf/files.x86 1.108 (via patch)
sys/arch/x86/include/apicvar.h 1.7 (via patch)
sys/arch/x86/include/cpu.h 1.121 (via patch)
sys/arch/x86/x86/cpu.c 1.185 (via patch)
sys/arch/x86/x86/hyperv.c 1.7 (via patch)
sys/arch/x86/x86/tsc.c 1.41 (via patch)
sys/arch/xen/conf/files.xen 1.181 (via patch)

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
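
As a rough illustration of what "TSC frequency from CPUID 0x15 and/or 0x16" can mean, the sketch below works on raw register values only. The helper names are hypothetical and are not taken from tsc.c or cpu.c; the register layout (leaf 0x15: %eax = ratio denominator, %ebx = ratio numerator, %ecx = crystal clock in Hz; leaf 0x16: %eax bits 15:0 = base frequency in MHz) follows the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the TSC frequency in Hz from the raw
 * register values of CPUID leaf 0x15.
 *   eax = denominator of the TSC/crystal ratio
 *   ebx = numerator of the TSC/crystal ratio
 *   ecx = nominal crystal clock frequency in Hz (may be reported as 0)
 * Returns 0 when the leaf does not carry enough information, in which
 * case a caller would have to fall back to another method.
 */
uint64_t
tsc_freq_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    if (eax == 0 || ebx == 0 || ecx == 0)
        return 0;
    return (uint64_t)ecx * ebx / eax;
}

/*
 * Hypothetical helper: CPUID leaf 0x16 reports the processor base
 * frequency in MHz in the low 16 bits of %eax; convert it to Hz.
 */
uint64_t
base_freq_from_cpuid16(uint32_t eax)
{
    return (uint64_t)(eax & 0xffff) * 1000000u;
}
```

With a 24 MHz crystal and a 176/2 ratio, tsc_freq_from_cpuid15(2, 176, 24000000) evaluates to 2112000000 Hz.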
 1.171.2.1 21-Jan-2020  martin Pull up following revision(s) (requested by pgoyette in ticket #623):

sys/arch/x86/x86/cpu.c: revision 1.181

If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9
 1.179.2.1 17-Jan-2020  ad Sync with head.
 1.181.4.5 25-Apr-2020  bouyer sync with bouyer-xenpvh-base2 (HEAD)
 1.181.4.4 20-Apr-2020  bouyer Sync with HEAD
 1.181.4.3 18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before the CPUs attach.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beginning of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
For HVM, ci_cpuid doesn't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in its context, which initializes ci_vcpuid and
ci_vcpu, and sets up the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
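
The per-CPU clock hook described above can be pictured with a small user-space model. This is only a sketch of the function-pointer pattern: every name below is a hypothetical stand-in, not the kernel's x86_cpu_initclock_func or lapic code.

```c
#include <assert.h>

int lapic_clock_starts;     /* counts hook invocations, for illustration */

/* Stand-in for a timer backend's per-CPU clock start routine. */
void
model_lapic_initclocks(void)
{
    lapic_clock_starts++;
}

/* Default hook: does nothing until a timer backend claims it. */
void
model_initclock_nop(void)
{
}

/* Per-CPU clock hook, called once on every CPU as it comes up. */
void (*model_cpu_initclock_func)(void) = model_initclock_nop;

/* Calibration installs the backend's routine instead of calling it. */
void
model_lapic_calibrate_timer(void)
{
    model_cpu_initclock_func = model_lapic_initclocks;
}

/* Secondary CPU start-up path: calls whatever hook is installed. */
void
model_cpu_hatch(void)
{
    (*model_cpu_initclock_func)();
}
```

If calibration is skipped (as the message says happens for VM_GUEST_XENPVHVM), the hook keeps whatever another backend installed, and the start-up path needs no special case.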
 1.181.4.2 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.181.4.1 10-Apr-2020  bouyer Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.
 1.199.4.1 02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.200.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.206.4.1 04-Aug-2025  martin Pull up following revision(s) (requested by rin in ticket #1145):

sys/arch/x86/x86/cpu.c: revision 1.208

x86: Call fpuinit_mxcsr_mask only once.

No need to call it again and again on the secondary CPUs to compute
what should be the same mxcsr mask. (If it's not, we have deeper
problems!)
 1.210.2.1 02-Aug-2025  perseant Sync with HEAD
 1.23 01-Aug-2024  riastradh x86/cpu_rng.c: Archive more links.

Why do major hardware manufacturers consistently seem to think links
should just stop working after a year or two?

No functional change intended, only comments.
 1.22 31-Jul-2024  riastradh x86/cpu_rng.c: Add reference for Intel's hardware design.

Not normative, unverifiable, possibly outdated -- but still a useful
description of a model of what Intel might have implemented under the
hood of RDRAND/RDSEED.

No functional change.
 1.21 09-Jun-2024  riastradh branches: 1.21.2;
x86/cpu_rng: Fix false alarm rate of CPU RNG health test.

Lower it from 1/2^32 (about one in four billion) to 1/2^256
(approximately not gonna happen squared).

PR port-amd64/58122
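
To see where figures like 1/2^32 and 1/2^256 can come from, consider a repetition check that flags a failure when two consecutive outputs are identical. For an ideal uniform source the chance of a spurious match is 2^-(8n) for n-byte outputs, so 4 bytes gives 2^-32 and 32 bytes gives 2^-256. The sketch below is an assumption about the general shape of such a test, with hypothetical names; it is not the code in cpu_rng.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if two consecutive n-byte outputs are identical. */
int
blocks_repeat(const unsigned char *prev, const unsigned char *cur, size_t n)
{
    return memcmp(prev, cur, n) == 0;
}

/*
 * log2 of the false alarm probability of the check above for an ideal
 * uniform source: two independent n-byte blocks match with
 * probability 2^-(8n).
 */
int
false_alarm_log2(size_t nbytes)
{
    assert(nbytes > 0 && nbytes <= 1024);
    return -(int)(8 * nbytes);
}
```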
 1.20 07-Oct-2021  msaitoh branches: 1.20.4;
KNF. No functional change.
 1.19 30-Jul-2020  riastradh Cite Cryptography Research evaluation of VIA RNG and give live URL.

(URL verified to be archived in the Internet Archive for posterity)
 1.18 25-Jul-2020  riastradh Tweak VIA CPU RNG.

- Cite source for documentation.
- Omit needless kpreempt_disable/enable.
- Explain what's going on.
- Use "D"(out) rather than "+D"(out) -- no REP so no register update.
- Fix interpretation of number of bytes returned.

The last one is likely to address

[ 4.0518619] aes: VIA ACE
....
[ 11.7018582] cpu_rng via: failed repetition test
[ 12.4718583] entropy: ready

reported by Andrius V.
 1.17 15-Jun-2020  riastradh Count down bits of entropy, not bits of data, in x86 cpu_rng.

Fixes logic in this loop for XSTORERNG on VIA CPUs, which are deemed
to have half the entropy per bit of data as RDSEED on Intel CPUs, so
that it gathers enough entropy on the first request, not on the
second request.
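
A minimal model of the difference between counting data bits and counting entropy bits is sketched below. The structure and function names are hypothetical, and the 64-bit sample size is an assumption chosen for illustration.

```c
#include <assert.h>

/*
 * Hypothetical model of one hardware sample: it delivers data_bits of
 * output, of which only entropy_bits are credited as entropy.
 */
struct rng_sample_model {
    unsigned data_bits;
    unsigned entropy_bits;
};

/*
 * Number of samples needed to gather want_entropy_bits when the loop
 * counts down entropy rather than data.
 */
unsigned
samples_needed(struct rng_sample_model m, unsigned want_entropy_bits)
{
    unsigned n = 0;
    unsigned remaining = want_entropy_bits;

    assert(m.entropy_bits > 0);
    while (remaining > 0) {
        n++;
        remaining -= (m.entropy_bits < remaining) ?
            m.entropy_bits : remaining;
    }
    return n;
}
```

A source credited with full entropy needs 4 samples of 64 bits for 256 bits of entropy; a source credited with half the entropy per bit needs 8. A loop that counted down data bits would stop after 4 samples in both cases and leave the second source short.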
 1.16 15-Jun-2020  riastradh Use x86_read_psl/x86_disable_intr/x86_write_psl to defer interrupts.

Using x86_disable_intr/x86_enable_intr causes a bit of a snag when we
try it early at boot before we're ready to handle interrupts, because
it has the effect of enabling interrupts!

Fixes instant reset at boot on VIA CPUs. The instant reset on boot
is new since the entropy rework, which initialized the x86 CPU RNG
earlier than before, but in principle this could also cause other
problems while not early at boot too.

XXX pullup
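
The snag described in revision 1.16 can be reproduced in a user-space model of the processor status word. The sim_* names are hypothetical stand-ins for the kernel primitives; only the value 0x200 (the interrupt-enable flag in EFLAGS/RFLAGS) is an architectural fact.

```c
#include <assert.h>

#define SIM_PSL_I 0x200UL   /* interrupt-enable flag bit */

unsigned long sim_psl;      /* simulated processor status word */

unsigned long sim_read_psl(void) { return sim_psl; }
void sim_write_psl(unsigned long psl) { sim_psl = psl; }
void sim_disable_intr(void) { sim_psl &= ~SIM_PSL_I; }
void sim_enable_intr(void) { sim_psl |= SIM_PSL_I; }

/* Fragile: leaves interrupts enabled even if they were off on entry. */
void
critical_section_naive(void)
{
    sim_disable_intr();
    /* ... work that must not be interrupted ... */
    sim_enable_intr();
}

/* Robust: puts the status word back exactly as it was on entry. */
void
critical_section_restoring(void)
{
    unsigned long psl = sim_read_psl();

    sim_disable_intr();
    /* ... work that must not be interrupted ... */
    sim_write_psl(psl);
}
```

Entering critical_section_naive() with interrupts off (as early at boot) returns with them on; critical_section_restoring() leaves the flag as it found it in both cases.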
 1.15 05-Jun-2020  kamil Change const unsigned to preprocessor define

Fixes GCC -O0 build with the stack protector.
 1.14 10-May-2020  maxv Reintroduce cpu_rng_early_sample(), but this time with embedded detection
for RDRAND/RDSEED, because TSC is not very strong.
 1.13 30-Apr-2020  riastradh rnd_attach_source calls the callback itself now.

No need for every driver to explicitly call it to prime the pool.

Eliminate now-unused <sys/rndpool.h>.
 1.12 30-Apr-2020  riastradh Omit needless #include <sys/rnd.h>.
 1.11 30-Apr-2020  riastradh Simplify Intel RDRAND/RDSEED and VIA C3 RNG API.

Push it all into MD x86 code to keep it simpler, until we have other
examples on other CPUs. Simplify RDSEED-to-RDRAND fallback.
Eliminate cpu_earlyrng in favour of just using entropy_extract, which
is available early now.
 1.10 01-Nov-2019  taca Check CPU support of RDRAND before calling cpu_rng_rdrand().

cpu_earlyrng() checks CPU support of RDSEED and RDRAND before calling
cpu_rng_rdseed() and cpu_rng_rdrand().

But cpu_rng_rdseed() did not check for CPU support of RDRAND, so the
system crashed in such an environment. This does not happen on real
CPUs, only in some VM environments.

Fix kern/54655 and confirmed by msaitoh@.

Needs pullup to netbsd-9.
 1.9 22-Aug-2018  maxv branches: 1.9.4;
Add support for monitoring the stack with kASan. This allows us to detect
illegal memory accesses occurring there.

The compiler inlines a piece of code in each function that adds redzones
around the local variables and poisons them. The illegal accesses are then
detected using the usual kASan machinery.

The stack size is doubled, from 4 pages to 8 pages.

Several boot functions are marked with the __noasan flag, to prevent the
compiler from adding redzones in them (because we haven't yet initialized
kASan). The kasan_early_init function is called early at boot time to
quickly create the shadow for the current stack; after this is done, we
don't need __noasan anymore in the boot path.

We pass -fasan-shadow-offset=0xDFFF900000000000, because the compiler
wants to do
shad = shadow-offset + (addr >> 3)
and we do, in kasan_addr_to_shad
shad = KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3)
hence
shad = KASAN_SHADOW_START + (addr >> 3) - (CANONICAL_BASE >> 3)
= [KASAN_SHADOW_START - (CANONICAL_BASE >> 3)] + (addr >> 3)
implies
shadow-offset = KASAN_SHADOW_START - (CANONICAL_BASE >> 3)
= 0xFFFF800000000000 - (0xFFFF800000000000 >> 3)
= 0xDFFF900000000000
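
The derivation above can be checked mechanically. The sketch below uses the two constants exactly as they appear in the message; the function names are hypothetical and stand in for the kernel's mapping and for the compiler's instrumentation respectively.

```c
#include <assert.h>
#include <stdint.h>

#define CANONICAL_BASE      UINT64_C(0xFFFF800000000000)
#define KASAN_SHADOW_START  UINT64_C(0xFFFF800000000000)

/* shadow-offset = KASAN_SHADOW_START - (CANONICAL_BASE >> 3) */
#define SHADOW_OFFSET       (KASAN_SHADOW_START - (CANONICAL_BASE >> 3))

/* Mapping as written for the kernel side (kasan_addr_to_shad). */
uint64_t
shad_kernel(uint64_t addr)
{
    return KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3);
}

/* Mapping as the compiler computes it from the shadow offset. */
uint64_t
shad_compiler(uint64_t addr)
{
    return SHADOW_OFFSET + (addr >> 3);
}
```

The two mappings agree for every address at or above CANONICAL_BASE because CANONICAL_BASE is a multiple of 8, so the subtraction and the shift commute.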

In UVM, we add a kasan_free (that is not preceded by a kasan_alloc). We
don't add poisoned redzones ourselves, but all the functions we execute
do, so we need to manually clear the poison before freeing the stack.

With the help of Kamil for the makefile stuff.
 1.8 21-Jul-2018  maxv Forgot to commit a change in i386/cpufunc.S; add rdtsc(), so that it can be
used in cpu_rng. Restore the cpu_rng code back to how it was in my initial
commit.
 1.7 21-Jul-2018  kre Unbreak build. Fake out (ie: remove) rdtsc() which does not
exist on XEN (or not yet anyway).

This change needs to be reverted when a proper solution is implemented.
 1.6 21-Jul-2018  maxv More ASLR. Randomize the location of the direct map at boot time on amd64.
This doesn't need "options KASLR" and works on GENERIC. Will soon be
enabled by default.

The location of the areas is abstracted in a slotspace structure. Ideally
we should always use this structure when touching the L4 slots, instead of
the current cocktail of global variables and constants.

machdep initializes the structure with the default values, and we then
randomize its dmap entry. Ideally machdep should randomize everything at
once, but in the case of the direct map its size is determined a little
later in the boot procedure, so we're forced to randomize its location
later too.
 1.5 29-Feb-2016  riastradh branches: 1.5.2; 1.5.12; 1.5.18; 1.5.20; 1.5.22;
Let the compiler decide whether to inline.

Works around ICE in PCC for now:

/home/riastradh/netbsd/current/src/sys/arch/x86/x86/cpu_rng.c, line 195: bad xasm node type 23
/home/riastradh/netbsd/current/src/sys/arch/x86/x86/cpu_rng.c, line 195: bad xasm node type 23
internal compiler error: /home/riastradh/netbsd/current/src/sys/arch/x86/x86/cpu_rng.c, line 195

This code is not performance-critical.
 1.4 28-Feb-2016  riastradh KNF. No functional change.
 1.3 27-Feb-2016  tls Remove callout-based RNG support in VIA crypto driver; add VIA RNG backend for cpu_rng.
 1.2 27-Feb-2016  tls Add RDSEED and RDRAND backends for cpu_rng on amd64 and i386.
 1.1 27-Feb-2016  tls Add cpu_rng, a framework for simple on-CPU random number generators.
 1.5.22.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.5.22.1 10-Jun-2019  christos Sync with HEAD
 1.5.20.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.5.20.1 28-Jul-2018  pgoyette Sync with HEAD
 1.5.18.2 03-Dec-2017  jdolecek update from HEAD
 1.5.18.1 29-Feb-2016  jdolecek file cpu_rng.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.5.12.1 20-Jun-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1560):

sys/arch/x86/x86/cpu_rng.c: revision 1.16

Use x86_read_psl/x86_disable_intr/x86_write_psl to defer interrupts.

Using x86_disable_intr/x86_enable_intr causes a bit of a snag when we
try it early at boot before we're ready to handle interrupts, because
it has the effect of enabling interrupts!

Fixes instant reset at boot on VIA CPUs. The instant reset on boot
is new since the entropy rework, which initialized the x86 CPU RNG
earlier than before, but in principle this could also cause other
problems while not early at boot too.

XXX pullup
 1.5.2.2 19-Mar-2016  skrll Sync with HEAD
 1.5.2.1 29-Feb-2016  skrll file cpu_rng.c was added on branch nick-nhusb on 2016-03-19 11:30:07 +0000
 1.9.4.2 20-Jun-2020  martin Pull up following revision(s) (requested by riastradh in ticket #960):

sys/arch/x86/x86/cpu_rng.c: revision 1.16

Use x86_read_psl/x86_disable_intr/x86_write_psl to defer interrupts.

Using x86_disable_intr/x86_enable_intr causes a bit of a snag when we
try it early at boot before we're ready to handle interrupts, because
it has the effect of enabling interrupts!

Fixes instant reset at boot on VIA CPUs. The instant reset on boot
is new since the entropy rework, which initialized the x86 CPU RNG
earlier than before, but in principle this could also cause other
problems while not early at boot too.

XXX pullup
 1.9.4.1 01-Nov-2019  martin Pull up following revision(s) (requested by taca in ticket #390):

sys/arch/x86/x86/cpu_rng.c: revision 1.10

Check CPU support of RDRAND before calling cpu_rng_rdrand().
cpu_earlyrng() checks CPU support of RDSEED and RDRAND before calling
cpu_rng_rdseed() and cpu_rng_rdrand().

But cpu_rng_rdseed() did not check for CPU support of RDRAND, so the
system crashed in such an environment. This does not happen on real
CPUs, only in some VM environments.

Fix kern/54655 and confirmed by msaitoh@.
Needs pullup to netbsd-9.
 1.20.4.1 23-Aug-2024  martin Pull up following revision(s) (requested by riastradh in ticket #799):

sys/arch/x86/x86/cpu_rng.c: revision 1.21

x86/cpu_rng: Fix false alarm rate of CPU RNG health test.

Lower it from 1/2^32 (about one in four billion) to 1/2^256
(approximately not gonna happen squared).

PR port-amd64/58122
 1.21.2.1 02-Aug-2025  perseant Sync with HEAD
 1.21 12-Oct-2022  msaitoh Use macros. No functional change.
 1.20 27-Oct-2021  mrg decode SMT parts for AMD family >= 0x17, not just 0x17.

now zen3 systems are properly identified by cpu topology for the
scheduler and cpuctl identify.
 1.19 15-Feb-2020  skrll Remove the 'slow' argument from cpu_topology_set and create a new
function cpu_topology_setspeed which sets the relative speed of the
cpu.

This allows cpu_topology_set to be used at cpu hatch time. The relative
speed is only known once all cpus have hatched/attached.

OK ad@
 1.18 20-Jan-2020  mlelstv assert smt_bits value only after it is computed.
 1.17 09-Jan-2020  ad - Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).
 1.16 20-Dec-2019  ad branches: 1.16.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.
 1.15 02-Dec-2019  ad Take the basic CPU topology information we already collect, and use it
to make circular lists of CPU siblings in the same core, and in the
same package. Nothing fancy, just enough to have a bit of fun in the
scheduler trying out different tactics.
 1.14 21-Nov-2018  msaitoh branches: 1.14.4;
- AMD also reports CPUID 7's highest subleaf. Print it.
- Use macro.
 1.13 28-Jan-2018  mlelstv branches: 1.13.2; 1.13.4;
Compute Core/SMT-IDs for AMD family 17h (Ryzen).
 1.12 28-Jan-2018  mlelstv CPUID tells the ApicIdCoreIdSize in bits.
 1.11 28-Jan-2018  mlelstv Check for undefined behaviour when doing right-shift.
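
Revisions 1.11 to 1.13 revolve around splitting an APIC ID into SMT, core and package parts by bit width, and around the fact that shifting a 32-bit value by 32 or more bits is undefined behaviour in C. A minimal sketch of a guarded split follows; the structure and function names are hypothetical, and the field layout (SMT bits lowest, then core bits, then package) is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct topo_ids {
    uint32_t package_id;
    uint32_t core_id;
    uint32_t smt_id;
};

/* Right shift that is well defined for any shift count. */
static uint32_t
shr32(uint32_t v, unsigned n)
{
    return n >= 32 ? 0 : v >> n;
}

/* Mask of the low `bits` bits, well defined for any width. */
static uint32_t
mask32(unsigned bits)
{
    return bits >= 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;
}

/* Split an APIC ID given the widths (in bits) of its core and SMT fields. */
struct topo_ids
split_apic_id(uint32_t apic_id, unsigned core_bits, unsigned smt_bits)
{
    struct topo_ids t;

    t.smt_id = apic_id & mask32(smt_bits);
    t.core_id = shr32(apic_id, smt_bits) & mask32(core_bits);
    t.package_id = shr32(apic_id, smt_bits + core_bits);
    return t;
}
```

For example, APIC ID 0x2d with 3 core bits and 1 SMT bit splits into SMT ID 1, core ID 6 and package ID 2.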
 1.10 07-Sep-2017  msaitoh Define CPUID Fn00000001 %ebx bits and use them. No functional change.
 1.9 22-Feb-2014  dsl branches: 1.9.4; 1.9.22;
Re-use the unused ci_cpu_serial[3] to save the highest cpuid values
for the normal and extended leafs.
(The 'normal' one might be lurking in the global cpulevel.)
Read the 'extended feature' from cpuid.80000001.%ecx/edx into
ci_feat_val[3/2] just after saving cpuid.1.%ecx/dx in ci_feat_val[1/0]
instead of doing it separately for amd k678 and via c3 processors
in their probe functions and repeating it for all cpus a few instructions
later when x86_cpu_topology() is called.
x86_cpu_topology() is only called from cpu_probe() and really doesn't
deserve its own source file. Chasing the setup code is bad enough anyway.
 1.8 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
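
The distinction between base, extended and display values can be sketched as follows. The sig_* functions are hypothetical helpers, not the specialreg.h macros; the bit positions and the rules for combining fields follow the CPUID leaf 1 %eax layout documented by Intel and AMD.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 %eax signature. */
uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base is 0xf. */
uint32_t
sig_family(uint32_t sig)
{
    uint32_t f = sig_basefamily(sig);

    return f == 0xf ? f + sig_extfamily(sig) : f;
}

/* Display model: the extended model applies to base family 6 and 0xf. */
uint32_t
sig_model(uint32_t sig)
{
    uint32_t f = sig_basefamily(sig);
    uint32_t m = sig_basemodel(sig);

    if (f == 0x6 || f == 0xf)
        m |= sig_extmodel(sig) << 4;
    return m;
}
```

The signature 0x00050654 decodes to family 6, model 0x55, stepping 4; 0x00800f11 decodes to family 0x17, model 1, stepping 1.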
 1.7 12-Nov-2013  msaitoh Fix calculation of the cpu family (display family) in x86_cpu_topology().
The function checks bits above bit 3 of the cpu_family variable, so the
variable is assumed to hold the display family (base family + extended
family) rather than the base family.
 1.6 29-May-2010  rmind branches: 1.6.8; 1.6.18; 1.6.22;
Rename ci_node_id to ci_package_id, as some claim that the former might
be confused with NUMA node.
 1.5 09-May-2010  rmind Drop x86 MD package/core/smt IDs and use MI.
 1.4 18-Apr-2010  jym branches: 1.4.2;
This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.3 18-Jan-2010  rmind branches: 1.3.2; 1.3.4;
x86_cpu_topology, not toplogy.
 1.2 26-May-2009  rmind branches: 1.2.2;
Add CPU topology detection support for AMD processors.
Tested on the following AMD CPUs:
- Family 15, model 65
- Family 15, model 67
- Family 15, model 75
- Family 16, model 2
- Family 17, model 3

Reviewed (slightly older version of patch) by <yamt>.
 1.1 30-Apr-2009  rmind branches: 1.1.2; 1.1.4;
Move x86 CPU topology detection code into the separate file (as it was originally).
OK by <yamt>.
 1.1.4.5 24-Oct-2010  jym Sync with HEAD
 1.1.4.4 01-Nov-2009  jym Sync with HEAD.
 1.1.4.3 31-May-2009  jym Sync with HEAD.
 1.1.4.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.4.1 30-Apr-2009  jym file cpu_topology.c was added on branch jym-xensuspend on 2009-05-13 17:18:45 +0000
 1.1.2.5 11-Aug-2010  yamt sync with head.
 1.1.2.4 11-Mar-2010  yamt sync with head
 1.1.2.3 20-Jun-2009  yamt sync with head
 1.1.2.2 04-May-2009  yamt sync with head.
 1.1.2.1 30-Apr-2009  yamt file cpu_topology.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:10 +0000
 1.2.2.4 22-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.2.2.3 16-Jun-2009  snj Pull up following revision(s) (requested by rmind in ticket #789):
sys/arch/x86/include/specialreg.h: revision 1.36
sys/arch/x86/x86/cpu_topology.c: revision 1.2
Add CPU topology detection support for AMD processors.
Tested on the following AMD CPUs:
- Family 15, model 65
- Family 15, model 67
- Family 15, model 75
- Family 16, model 2
- Family 17, model 3
Reviewed (slightly older version of patch) by <yamt>.
 1.2.2.2 16-Jun-2009  snj Pull up following revision(s) (requested by rmind in ticket #782):
sys/arch/x86/conf/files.x86: revision 1.52 via patch
sys/arch/x86/include/cpu.h: revision 1.17
sys/arch/x86/x86/cpu_topology.c: revision 1.1
sys/arch/x86/x86/identcpu.c: revision 1.16 via patch
Move x86 CPU topology detection code into the separate file (as it was
originally).
OK by <yamt>.
 1.2.2.1 26-May-2009  snj file cpu_topology.c was added on branch netbsd-5 on 2009-06-16 02:19:44 +0000
 1.3.4.1 30-May-2010  rmind sync with head
 1.3.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.3.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.4.2.2 21-Apr-2010  matt sync to netbsd-5
 1.4.2.1 18-Apr-2010  matt file cpu_topology.c was added on branch matt-nb5-mips64 on 2010-04-21 00:33:46 +0000
 1.6.22.1 18-May-2014  rmind sync with head
 1.6.18.2 03-Dec-2017  jdolecek update from HEAD
 1.6.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.8.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.9.22.3 04-Dec-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1120):

usr.sbin/cpuctl/arch/i386.c: revision 1.85
usr.sbin/cpuctl/arch/i386.c: revision 1.86
usr.sbin/cpuctl/arch/i386.c: revision 1.87
usr.sbin/cpuctl/arch/i386.c: revision 1.88
usr.sbin/cpuctl/arch/i386.c: revision 1.89
usr.sbin/cpuctl/arch/i386.c: revision 1.90
sys/arch/x86/include/specialreg.h: revision 1.132
sys/arch/x86/include/specialreg.h: revision 1.133
sys/arch/x86/include/specialreg.h: revision 1.134
sys/arch/x86/include/specialreg.h: revision 1.135
sys/arch/x86/include/specialreg.h: revision 1.136
sys/arch/x86/x86/cpu_topology.c: revision 1.14

Add MAWAU (for BND{LD,ST}X instruction) from the latest Intel SDM.

Whitespace fix. No functional change.

Modify comment. No functional change:
- AMD also has CPUID 0x06 and 0x0d.
- PCOMMIT was obsoleted.
- Use ci_feat_val[7] as CPUID 7 %edx to match x86/cpu.h
- AMD also has CPUID 6.
- Remove unused code for coretemp.
- Consistently use descs[] instead of data[].
- AMD also reports CPUID 7's highest subleaf. Print it.
- Use macro.
Add Intel CPUID Extended Topology Enumeration Fn0000000b definitions.
Decode package, core and SMT id if CPUID 0x0b is available on Intel processor.

If the value is different from the kernel value, we should fix the kernel code.

TODO: Use 0x1f if it's available.

Add Intel/AMD MONITOR/MWAIT leaf.
Decode Intel/AMD MONITOR/MWAIT leaf.

Add Intel CPUID Architectural Performance Monitoring leaf Fn0000000a.

Print Intel CPUID Architectural Performance Monitoring leaf Fn0000000a.
 1.9.22.2 09-Apr-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #717):

sys/arch/x86/x86/cpu_topology.c: revision 1.11-1.13

Check for undefined behaviour when doing right-shift.

CPUID tells the ApicIdCoreIdSize in bits.

Compute Core/SMT-IDs for AMD family 17h (Ryzen).
 1.9.22.1 21-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #365):
sys/arch/x86/include/specialreg.h: revision 1.99
usr.sbin/cpuctl/arch/i386.c: revision 1.75
usr.sbin/cpuctl/arch/i386.c: revision 1.76
usr.sbin/cpuctl/arch/i386.c: revision 1.77
usr.sbin/cpuctl/arch/i386.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.56
sys/arch/x86/x86/identcpu.c: revision 1.57
sys/arch/x86/x86/cpu_topology.c: revision 1.10
sys/arch/x86/include/specialreg.h: revision 1.100
sys/arch/x86/include/specialreg.h: revision 1.101
sys/arch/x86/include/specialreg.h: revision 1.102
sys/arch/x86/include/specialreg.h: revision 1.103
sys/arch/x86/include/specialreg.h: revision 1.104
sys/arch/x86/include/specialreg.h: revision 1.105
Add EFER_TCE. This would be an interesting feature to have, since it
reduces the indirect cost of invlpg; but I'm not convinced the way we
flush upper-levels is correct for this yet.
Fix typo in comment
Add a comment about APICBASE_PHYSADDR. Has to do with PR/42597.
Define CPUID Fn00000001 %ebx bits and use them. No functional change.
Set ci->ci_cflush_lsize correctly. This bug was added in the last commit(1.56).
Add the following instruction bits in Structured Extended Flags Enumeration
Leaf from "Intel Architecture Instruction Set Extensions and Future Features
Programming Reference" (319433-030):
AVX512_IFMA
AVX512_VBMI
AVX512_VBMI2
GFNI
VAES
VPCLMULQDQ
AVX512_VNNI
AVX512_BITALG
AVX512_VPOPCNTDQ
AVX512_4VNNIW
AVX512_4FMAPS
- Print ci_feat_val[5] (Structured Extended Feature leaf Fn0000_0007 %ebx) on
AMD, too.
- Print ci_feat_val[6] (Fn0000_0007 %ecx) on Intel.
Update from the latest Intel SDM:
0x5c: Atom (Goldmont)
0x5f: Atom (Goldmont, Denverton)
0x7a: Atom (Goldmont Plus)
Add Turbo Boost Max Technology 3.0 bit.
Update from Intel SDM:
0x55: Xeon Scalable (Skylake)
0x57: Xeon Phi [357]200 (Knights Landing)
0x66: Future Core (Cannon Lake)
0x85: Future Xeon Phi (Knights Mill)
Add the following bits in AMD Fn8000000a %edx features (SVM features):
PFThreshold (PAUSE filter threshold)
AVIC (AMD virtual interrupt controller)
V_VMSAVE_VMLOAD (virtualized VMSAVE and VMLOAD)
vGIF (virtualized GIF)
 1.9.4.1 09-Oct-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #1636):
sys/arch/x86/include/cacheinfo.h: 1.23-1.26
sys/arch/x86/include/cpu.h: 1.70
sys/arch/x86/include/specialreg.h: 1.91-1.93,1.98,1.100,1.102-1.124,1.126,1.130 via patch
sys/arch/x86/x86/cpu_topology.c: 1.10
sys/arch/x86/x86/identcpu.c: 1.56-1.57,1.70 via patch
usr.sbin/cpuctl/arch/i386.c: 1.71,1.75-1.79,1.81-1.85 via patch
Add some register definitions for x86:
- Add CLWB bit.
- Fix a few (unused) MSR values, and add some bit definitions of
MSR_EFER from Murray Armfield in PR#42861.
- CPUID_CFLUSH bit is not for CFLUSH insn but CLFLUSH insn, so modify
comments and snprintb() string.
- Define CPUID Fn00000001 %ebx bits and use them.
No functional change.
- Add Structured Extended Flags Enumeration Leaf's bit definitions:
AVX512_{IFMA,VBMI2,VNNI,BITALG,VPOPCNTDQ,4VNNIW,4FMAPS},GFNI&VAES.
- Add Turbo Boost Max Technology 3.0 bit.
- Add AMD SVM features definitions.
- Add Intel cpuid 7 %edx IBRS and STIBP bit definitions.
- Fix swapped comments for EFER LME and LMA
- Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add MSR_IA32_ARCH_CAPABILITIES definition.
- Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.
- Add Intel Deterministic Address Translation Parameter Leaf(0x18)
definitions.
- s/CLFUSH/CLFLUSH/
- Add AMD's Disable Indirect Branch Predictor bit definition.
- Add the MSR bits definitions for IBRS, STIBP and IBPB.
- Add Intel Fn0000_0006 %eax new bit 14-20 (HWP stuff).
- Intel Fn0000_0007 %ecx bit 22 is for both RDPID and IA32_TSC_AUX.
- Add AMD's CPUID Fn80000001 %edx MMX and FXSR bit definitions.
- Add RDCL_NO and IBRS_ALL.
- Add SSBD and RSBA bit definitions.
- Add AMD's SSB bit definitions for F15H, F16H and F17H.
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
- Add yet another Shared L2 TLB (2M/4M pages).
- Add 3way and 6way of L2 cache or TLB on AMD CPU.
- AMD L3 cache association bitfield is not 8bit but 4bit like others
association bitfields.
- Sort entries. No functional change.
- Modify comment, fix typo in comment and add comment.
cpuctl(8):
- Add detection for Quark X1000, Xeon E5 v4, E7 v4,
Core i7-69xx Extreme Edition, Xeon Scalable (Skylake),
Xeon Phi [357]200 (Knights Landing), Atom (Goldmont),
Atom (Denverton), Future Core (Cannon Lake), Atom (Goldmont Plus),
Xeon Phi 7215, 7285 and 7295 (Knights Mill) and
7th or 8th gen Core (Kaby Lake, Coffee Lake).
- Print Structured Extended Feature leaf Fn0000_0007 %ebx on AMD,too.
- Print Fn0000_0007 %ecx on Intel.
- Print Intel cpuid 7 %edx.
- Parse the TLB info from `cpuid leaf 18H' on Intel processor.
- Use aprint_error_dev() for error output.
 1.13.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.13.4.1 10-Jun-2019  christos Sync with HEAD
 1.13.2.1 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.14.4.2 22-Nov-2021  martin Pull up following revision(s) (requested by mrg in ticket #1375):

usr.sbin/cpuctl/arch/i386.c: revision 1.123
sys/arch/x86/x86/cpu_topology.c: revision 1.20

decode SMT parts for AMD family >= 0x17, not just 0x17.

now zen3 systems are properly identified by cpu topology for the
scheduler and cpuctl identify.
 1.14.4.1 25-May-2020  martin Pull up following revision(s) (requested by mlelstv in ticket #922):

sys/arch/x86/x86/cpu_topology.c: revision 1.18

assert smt_bits value only after it is computed.
 1.16.2.3 29-Feb-2020  ad Sync with head.
 1.16.2.2 25-Jan-2020  ad Sync with head.
 1.16.2.1 17-Jan-2020  ad Sync with head.
 1.13 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.12 11-Feb-2019  cherry branches: 1.12.10;
We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.11 02-Feb-2019  cherry Switch NetBSD/xen to use XEN api tag RELEASE-4.11.1

The headers for this api are in sys/external/mit/xen-include-public/dist/
 1.10 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.9 28-Mar-2018  maxv branches: 1.9.2;
Add 'break', otherwise we're not gonna go very far. While here use a less
error-prone syntax.
 1.8 18-Mar-2018  christos Separate the compat code in its own file to facilitate module building.
 1.7 17-Mar-2018  christos dedup and handle XEN here.
 1.6 17-Mar-2018  christos tuck in all the compat microcode code in one place.
 1.5 07-Jan-2015  ozaki-r branches: 1.5.16;
Pass a correct firmware size (instead of 0) to firmware_free

firmware_free now uses kmem_free(9) instead of free(9),
so we need to pass a correct size to it.
 1.4 06-Jul-2013  gdt branches: 1.4.8;
Add #endif comments (only).
 1.3 17-Oct-2012  drochner branches: 1.3.2;
put binary compatibility support for the old AMD-only CPU microcode
update API inside COMPAT_60
 1.2 29-Aug-2012  drochner branches: 1.2.2;
Extend the CPU microcode update framework to support Intel x86 CPUs.
Contrary to the AMD implementation, it doesn't use xcalls to distribute
the update to all CPUs but relies on cpuctl(8) to bind itself to the
right CPU -- to keep it simple and avoid possible problems with
hyperthreading.
Also, it doesn't parse the vendor supplied file to pick the right
part for the present CPU model but relies on userland to prepare
files with specific filenames. I'll commit a pkg for this in a minute
(pkgsrc/sysutils/intel-microcode).
The ioctl interface changed; compatibility is provided (should be
limited to COMPAT_NETBSD6 as soon as this is available).
 1.1 13-Jan-2012  cegger branches: 1.1.4; 1.1.6;
Support CPU microcode loading via cpuctl(8).
Implemented and enabled via CPU_UCODE kernel config option
for x86 and Xen Dom0.
Tested on different AMD machines with different
CPU families.

ok wiz@ for the manpages
ok releng@
ok core@ via releng@
 1.1.6.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.6.3 30-Oct-2012  yamt sync with head
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 13-Jan-2012  yamt file cpu_ucode.c was added on branch yamt-pagecache on 2012-04-17 00:07:06 +0000
 1.1.4.2 18-Feb-2012  mrg merge to -current.
 1.1.4.1 13-Jan-2012  mrg file cpu_ucode.c was added on branch jmcneill-usbmp on 2012-02-18 07:33:36 +0000
 1.2.2.3 03-Dec-2017  jdolecek update from HEAD
 1.2.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.3.2.1 28-Aug-2013  rmind sync with head
 1.4.8.1 06-Apr-2015  skrll Sync with HEAD
 1.5.16.7 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.5.16.6 28-Mar-2018  pgoyette Track changes from HEAD
 1.5.16.5 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.5.16.4 18-Mar-2018  pgoyette Import more christos@ changes from -current
 1.5.16.3 17-Mar-2018  pgoyette Import christos's changes for the compat_60 cpu_ucode stuff
 1.5.16.2 17-Mar-2018  pgoyette Back out changes on the branch related to kernel microcode compat.

Christos didn't like the way it was done, so waiting for a better
approach/implementation.
 1.5.16.1 17-Mar-2018  pgoyette Don't try to include opt_*.h files if we're not being built as part
of a kernel (these files only exist for kernel builds).

Don't compile non-compat code if we're not building a module. (This
file is built for both built-in kernel ucode support and for compat
support.)
 1.9.2.1 10-Jun-2019  christos Sync with HEAD
 1.12.10.1 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.11 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.10 15-Oct-2019  chs branches: 1.10.6;
convert more KM_NOSLEEP to KM_SLEEP and remove code to handle failures.
 1.9 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.8 17-Mar-2018  christos branches: 1.8.2;
tuck in all the compat microcode code in one place.
 1.7 15-Nov-2013  msaitoh branches: 1.7.28;
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
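The family and model decoding described in the entry above can be sketched as follows. This is a minimal illustration, not the NetBSD macros themselves: the `sig_*` function names are hypothetical, and the bit positions assume the usual layout of the CPUID leaf 1 %eax signature (stepping in bits 0-3, base model in 4-7, base family in 8-11, extended model in 16-19, extended family in 20-27).

```c
#include <stdint.h>

/* Hypothetical helpers; each extracts one raw field of the signature. */
static inline uint32_t sig_stepping(uint32_t sig)    { return sig & 0xf; }
static inline uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field is added only when the base family is 0xf. */
static inline uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);

	return fam == 0xf ? fam + sig_ext_family(sig) : fam;
}

/* Display model: the extended field is the high nibble for base family 0x6 or 0xf. */
static inline uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

For a signature of 0x000906ea this yields display family 0x6 and display model 0x9e; for 0x00800f11 it yields display family 0x17 and display model 0x01. Keeping the base and extended accessors separate from the display ones mirrors the three groups of macros listed in the commit message.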
 1.6 06-Jul-2013  gdt Add #endif comments (only).
 1.5 17-Oct-2012  drochner branches: 1.5.2;
put binary compatibility support for the old AMD-only CPU microcode
update API inside COMPAT_60
 1.4 29-Aug-2012  drochner branches: 1.4.2;
Extend the CPU microcode update framework to support Intel x86 CPUs.
Contrary to the AMD implementation, it doesn't use xcalls to distribute
the update to all CPUs but relies on cpuctl(8) to bind itself to the
right CPU -- to keep it simple and avoid possible problems with
hyperthreading.
Also, it doesn't parse the vendor supplied file to pick the right
part for the present CPU model but relies on userland to prepare
files with specific filenames. I'll commit a pkg for this in a minute
(pkgsrc/sysutils/intel-microcode).
The ioctl interface changed; compatibility is provided (should be
limited to COMPAT_NETBSD6 as soon as this is available).
 1.3 10-May-2012  cegger xc_wait() does not wait for all cpus to finish
their callback. That means the ucode buffer is released while still in use
and this causes a crash.
Quick fix: check if the ucode buffer has been freed and abort.
You may need to run 'cpuctl ucode' twice to apply it to all cpus.

Per discussion with rmind@ use low priority xcalls and splhigh.
 1.2 09-May-2012  cegger fix crash when booting with -x.
 1.1 13-Jan-2012  cegger branches: 1.1.2; 1.1.4; 1.1.6;
Support CPU microcode loading via cpuctl(8).
Implemented and enabled via CPU_UCODE kernel config option
for x86 and Xen Dom0.
Tested on different AMD machines with different
CPU families.

ok wiz@ for the manpages
ok releng@
ok core@ via releng@
 1.1.6.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.6.4 30-Oct-2012  yamt sync with head
 1.1.6.3 23-May-2012  yamt sync with head.
 1.1.6.2 17-Apr-2012  yamt sync with head
 1.1.6.1 13-Jan-2012  yamt file cpu_ucode_amd.c was added on branch yamt-pagecache on 2012-04-17 00:07:06 +0000
 1.1.4.3 02-Jun-2012  mrg sync to latest -current.
 1.1.4.2 18-Feb-2012  mrg merge to -current.
 1.1.4.1 13-Jan-2012  mrg file cpu_ucode_amd.c was added on branch jmcneill-usbmp on 2012-02-18 07:33:36 +0000
 1.1.2.1 17-May-2012  riz Pull up following revision(s) (requested by cegger in ticket #248):
sys/arch/x86/x86/cpu_ucode_amd.c: revision 1.2
fix crash when booting with -x.
 1.4.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.5.2.2 18-May-2014  rmind sync with head
 1.5.2.1 28-Aug-2013  rmind sync with head
 1.7.28.3 25-Mar-2018  pgoyette Don't try to #include opt_*.h files when building kernel modules
 1.7.28.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.7.28.1 17-Mar-2018  pgoyette Import christos's changes for the compat_60 cpu_ucode stuff
 1.8.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.8.2.1 10-Jun-2019  christos Sync with HEAD
 1.10.6.2 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.10.6.1 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.20 15-Sep-2022  msaitoh Verify checksum of the extended signature table.
 1.19 15-Sep-2022  msaitoh Add missing newline in a message. KNF.
 1.18 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.17 10-May-2019  maxv branches: 1.17.2; 1.17.8;
Clean up, and add sanity checks on the microcode lengths.
 1.16 09-May-2019  maxv Invalidate the cache before updating the microcode. Some platforms require
this. Seen in Illumos and FreeBSD.
 1.15 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.14 12-Apr-2018  msaitoh branches: 1.14.2;
Add cpu_ucode_intel_verify() to verify microcode image. Currently, we don't
verify the extended signatures' checksum. I don't have any image which has an
extended signature. If an extended signature is found, the function shows
"This image has extended signature table." and continues.
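The kind of checksum test that a verification routine such as cpu_ucode_intel_verify() performs can be sketched as below. This is an assumption-laden illustration rather than the committed code: it assumes the zero-sum rule from Intel's microcode update documentation (all 32-bit words of the covered region add up to zero modulo 2^32), and `ucode_checksum_ok` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when the 32-bit words in `buf` sum to
 * zero modulo 2^32, and 0 otherwise. Lengths that are not a multiple of
 * four bytes are rejected rather than partially summed.
 */
static int
ucode_checksum_ok(const uint32_t *buf, size_t nbytes)
{
	uint32_t sum = 0;
	size_t i;

	if (nbytes % 4 != 0)
		return 0;
	for (i = 0; i < nbytes / 4; i++)
		sum += buf[i];	/* unsigned overflow wraps, as intended */
	return sum == 0;
}
```

The same routine could be applied to a header-plus-data region or to an extended signature table, provided the caller has already validated the lengths it passes in.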
 1.13 17-Mar-2018  christos tuck in all the compat microcode code in one place.
 1.12 01-Jun-2017  chs branches: 1.12.2; 1.12.8;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.11 21-Nov-2016  ozaki-r Sweep unnecessary xcall.h inclusions
 1.10 04-Oct-2015  jym branches: 1.10.2;
Cache CPU index in the non-preemptible part otherwise it can be
unreliable (and report a CPU as patched while it was not).
 1.9 04-Oct-2015  mrg kmem_free() the address returned by kmem_alloc(). found by Brainy.
use the newly aligned location if we needed it. found by kre.
 1.8 12-May-2015  msaitoh Use roundup2() and uintptr_t. Advised by riastradh@.
 1.7 11-May-2015  msaitoh Re-allocate buffer if a buffer for microcode is not 16-byte aligned.
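Revisions 1.7 through 1.9 above concern a 16-byte aligned microcode buffer and freeing the address that the allocator originally returned. A userland sketch of that pattern follows; it uses malloc()/free() in place of the kernel allocator, `ROUNDUP2` is a local stand-in for the kernel's roundup2(), and `alloc_aligned16` is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for roundup2(); `m` must be a power of two. */
#define ROUNDUP2(x, m)	(((x) + ((m) - 1)) & ~((uintptr_t)(m) - 1))

#define UCODE_ALIGN	16	/* alignment named in the commit message */

/*
 * Hypothetical helper: return a 16-byte aligned pointer to `size` usable
 * bytes. The pointer obtained from malloc() is stored in *basep; that
 * pointer, not the aligned one, is what must later be passed to free().
 */
static void *
alloc_aligned16(size_t size, void **basep)
{
	void *base = malloc(size + UCODE_ALIGN - 1);

	if (base == NULL)
		return NULL;
	*basep = base;
	return (void *)ROUNDUP2((uintptr_t)base, UCODE_ALIGN);
}
```

Over-allocating by 15 bytes guarantees that an aligned address with `size` bytes of room exists inside the block, and keeping both pointers avoids the mistake of releasing the adjusted address.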
 1.6 12-Dec-2014  msaitoh Use specialreg.h's definitions.
 1.5 26-Mar-2014  christos branches: 1.5.4; 1.5.6;
kill sprintf
 1.4 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
 1.3 06-Jul-2013  gdt #endif comments
 1.2 17-Oct-2012  drochner branches: 1.2.2; 1.2.4;
put binary compatibility support for the old AMD-only CPU microcode
update API inside COMPAT_60
 1.1 29-Aug-2012  drochner branches: 1.1.2;
Extend the CPU microcode update framework to support Intel x86 CPUs.
Contrary to the AMD implementation, it doesn't use xcalls to distribute
the update to all CPUs but relies on cpuctl(8) to bind itself to the
right CPU -- to keep it simple and avoid possible problems with
hyperthreading.
Also, it doesn't parse the vendor supplied file to pick the right
part for the present CPU model but relies on userland to prepare
files with specific filenames. I'll commit a pkg for this in a minute
(pkgsrc/sysutils/intel-microcode).
The ioctl interface changed; compatibility is provided (should be
limited to COMPAT_NETBSD6 as soon as this is available).
 1.1.2.3 03-Dec-2017  jdolecek update from HEAD
 1.1.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.2.4.2 18-May-2014  rmind sync with head
 1.2.4.1 28-Aug-2013  rmind sync with head
 1.2.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.2.2 30-Oct-2012  yamt sync with head
 1.2.2.1 17-Oct-2012  yamt file cpu_ucode_intel.c was added on branch yamt-pagecache on 2012-10-30 17:20:33 +0000
 1.5.6.5 28-Aug-2017  skrll Sync with HEAD
 1.5.6.4 05-Dec-2016  skrll Sync with HEAD
 1.5.6.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.5.6.2 06-Jun-2015  skrll Sync with HEAD
 1.5.6.1 06-Apr-2015  skrll Sync with HEAD
 1.5.4.3 06-Nov-2015  riz Pull up following revision(s) (requested by jym in ticket #994):
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.10
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.9
kmem_free() the address returned by kmem_alloc(). found by Brainy.
use the newly aligned location if we needed it. found by kre.
Cache CPU index in the non-preemptible part otherwise it can be
unreliable (and report a CPU as patched while it was not).
 1.5.4.2 11-Aug-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #945):
sys/arch/x86/x86/cpu_ucode_intel.c: revisions 1.7, 1.8
Re-allocate buffer if a buffer for microcode is not 16-byte aligned.
--
Use roundup2() and uintptr_t. Advised by riastradh@.
 1.5.4.1 09-Jan-2015  martin Pull up following revision(s) (requested by msaitoh in ticket #396):
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.6
sys/arch/x86/include/specialreg.h: revision 1.81
Use specialreg.h's definitions.
 1.10.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.12.8.4 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.12.8.3 25-Mar-2018  pgoyette Don't try to #include opt_*.h files when building kernel modules
 1.12.8.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.12.8.1 17-Mar-2018  pgoyette Import christos's changes for the compat_60 cpu_ucode stuff
 1.12.2.3 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1772):

sys/arch/x86/include/cpu_ucode.h: revision 1.5
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.19
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.20

Add missing newline in a message. KNF.
Verify checksum of the extended signature table.
 1.12.2.2 12-May-2019  martin Pull up following revision(s) (requested by maxv in ticket #1261):

sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.16
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.17

Invalidate the cache before updating the microcode. Some platforms require
this. Seen in Illumos and FreeBSD.

Clean up, and add sanity checks on the microcode lengths.
 1.12.2.1 26-Jul-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #929):
sys/arch/x86/x86/cpu_ucode_intel.c: 1.14
sys/kern/kern_cpu.c: 1.74
Add cpu_ucode_intel_verify() to verify microcode image. Currently, we don't
verify the extended signatures' checksum. I don't have any image which has an
extended signature. If an extended signature is found, the function shows
"This image has extended signature table." and continues.
--
Don't allocate memory and return EFTYPE if sc->sc_blobsize==0 to prevent
panic in firmware_malloc().
 1.14.2.1 10-Jun-2019  christos Sync with HEAD
 1.17.8.1 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.17.2.1 11-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1538):
sys/arch/x86/include/cpu_ucode.h: revision 1.5
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.19
sys/arch/x86/x86/cpu_ucode_intel.c: revision 1.20
Add missing newline in a message. KNF.
Verify checksum of the extended signature table.
 1.16 27-Aug-2022  riastradh x86/db_memrw.c: Mark db_read_bytes, db_write_bytes __noubsan.

These intentionally do loads and stores that may be misaligned, which
are fine on this x86-specific code. Should avoid double-panic in
disassembler on panic with UBSan enabled.
 1.15 27-Aug-2022  riastradh x86/db_memrw.c: Use uint64_t, not long, for 8-byte r/w.

This is shared with amd64 and i386, and while long works on amd64,
not so much on i386.

While here, use uint32_t instead of int and uint16_t instead of short
for clarity.
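The point of revision 1.15 is that `long` is 8 bytes on amd64 but 4 bytes on i386, so an 8-byte access has to be spelled with a fixed-width type. A portable sketch of size-dispatched copying follows; `copy_sized` is a hypothetical name, and memcpy() through typed temporaries is used here as an assumption to keep misaligned accesses well defined in standard C, whereas the kernel code performs direct loads and stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The width that `long` cannot guarantee across i386 and amd64. */
_Static_assert(sizeof(uint64_t) == 8, "uint64_t must be 8 bytes");

/*
 * Hypothetical helper: copy `size` bytes from src to dst, going through
 * a single fixed-width value for sizes 2, 4 and 8, and falling back to
 * a plain byte copy for every other size.
 */
static void
copy_sized(void *dst, const void *src, size_t size)
{
	uint16_t v16;
	uint32_t v32;
	uint64_t v64;

	switch (size) {
	case 2:
		memcpy(&v16, src, sizeof(v16));
		memcpy(dst, &v16, sizeof(v16));
		break;
	case 4:
		memcpy(&v32, src, sizeof(v32));
		memcpy(dst, &v32, sizeof(v32));
		break;
	case 8:
		memcpy(&v64, src, sizeof(v64));
		memcpy(dst, &v64, sizeof(v64));
		break;
	default:
		memcpy(dst, src, size);
		break;
	}
}
```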
 1.14 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.13 20-Aug-2022  riastradh x86: Split bootspace out of x86/pmap.h into new x86/bootspace.h.
 1.12 07-Oct-2021  msaitoh KNF. No functional change.
 1.11 21-Apr-2019  maxv Rename the PTE bits.
 1.10 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.9 07-Mar-2019  maxv Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.
 1.8 19-Jan-2019  martin PR kern/53893: add missing #ifdef DDB around db_printf calls.
 1.7 18-Nov-2018  christos fix whitespace
 1.6 16-Mar-2018  ozaki-r branches: 1.6.2;
x86: avoid accessing invalid addresses in ddb like arm32

This avoids that a command stops in the middle of an execution if
a fault occurs due to an access to an invalid address.
 1.5 15-Mar-2018  ozaki-r Use db_printf instead of printf in ddb
 1.4 11-Nov-2017  maxv branches: 1.4.2;
Modify the layout of the bootspace structure, in such a way that it can
contain several kernel segments of the same type (eg several .text
segments). Some parts are still a bit messy but will be cleaned up soon.

I cannot compile-test this change on i386, but it seems fine enough.

NOTE: you need to rebuild and reinstall a new prekern after this change.
 1.3 30-Sep-2017  maxv use bootspace
 1.2 12-May-2016  maxv branches: 1.2.10;
Split the {text+rodata} chunk in two separate chunks on x86. The
rodata segment now loses the large page optimization, gets mapped inside
the data segment, and therefore becomes RWX. It may break the build on
Xen.
 1.1 07-May-2012  jym branches: 1.1.2; 1.1.4; 1.1.6; 1.1.20;
Merge i386 and amd64 version of db_memrw.c.

Use this opportunity to skip calculating the VA of the page. Let the CPU
deal with the invalidation itself through invlpg + destination address to
avoid converting between canonical/non canonical forms.
 1.1.20.1 29-May-2016  skrll Sync with HEAD
 1.1.6.1 03-Dec-2017  jdolecek update from HEAD
 1.1.4.2 02-Jun-2012  mrg sync to latest -current.
 1.1.4.1 07-May-2012  mrg file db_memrw.c was added on branch jmcneill-usbmp on 2012-06-02 11:09:11 +0000
 1.1.2.2 23-May-2012  yamt sync with head.
 1.1.2.1 07-May-2012  yamt file db_memrw.c was added on branch yamt-pagecache on 2012-05-23 10:07:51 +0000
 1.2.10.1 02-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #687):
sys/kern/kern_rwlock_obj.c: revision 1.4
sys/rump/librump/rumpkern/locks.c: revision 1.80
sys/kern/kern_rwlock.c: revision 1.50
sys/arch/x86/x86/db_memrw.c: revision 1.5,1.6
sys/ddb/db_command.c: revision 1.150-1.153
share/man/man4/ddb.4: revision 1.175 (via patch),1.176-1.178
sys/kern/kern_mutex_obj.c: revision 1.6
sys/kern/subr_lockdebug.c: revision 1.61-1.64
sys/sys/lockdebug.h: revision 1.17
sys/kern/kern_mutex.c: revision 1.71
sys/sys/lockdebug.h: revision 1.18,1.19
sys/kern/subr_xcall.c: revision 1.26

Obtain proper initialized addresses of locks allocated by mutex_obj_alloc or rw_obj_alloc

Initialized addresses of locks allocated by mutex_obj_alloc or rw_obj_alloc
were not useful because the addresses were mutex_obj_alloc or rw_obj_alloc
itself. What we want to know are callers of them.

Sprinkle ASSERT_SLEEPABLE to xcall functions

Use db_printf instead of printf in ddb

Add a new command, show lockstat, which shows statistics of locks
Currently the command shows the number of allocated locks.
The command is useful only if LOCKDEBUG is enabled.

Add a new command, show all locks, which shows information of active locks

The command shows information of all active (i.e., being held) locks that are
tracked through either of LWPs or CPUs by the LOCKDEBUG facility. The /t
modifier additionally shows a backtrace for each LWP. This
feature is useful for debugging especially to analyze deadlocks.
The command is useful only if LOCKDEBUG is enabled.

Don't pass an unset address to lockdebug_lock_print

x86: avoid accessing invalid addresses in ddb like arm32
This avoids that a command stops in the middle of an execution if
a fault occurs due to an access to an invalid address.

Get rid of a redundant output

Improve wording. Fix a Cm argument.

ddb: rename "show lockstat" to "show lockstats" to avoid conflicting with lockstat(8)
Requested by mrg@
 1.4.2.4 26-Jan-2019  pgoyette Sync with HEAD
 1.4.2.3 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.4.2.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.4.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.6.2.1 10-Jun-2019  christos Sync with HEAD
 1.6 24-Dec-2022  uwe db_trace.c: Use DB_SYM_NULL instead of respelling it
 1.5 24-Dec-2022  uwe db_trace.c: Make parens balanced across #ifdef

Same object code is generated on both i386 and amd64.
 1.4 11-Feb-2018  maxv Style, and reduce the diff between i386 and amd64. No functional change.
 1.3 21-Apr-2011  enami branches: 1.3.2; 1.3.4; 1.3.8;
lwpaddr is a boolean variable and thus doesn't hold an address of lwp.
Compare with correct value so that tr/t works again on current process.
 1.2 11-Apr-2011  mrg obsolete DB_AOUT_SYMBOLS. however, we need to leave most of the code
in db_sym.[ch] as it is used by the elf version of crash(8).

i will be cleaning up the db_sym.c code in a follow up commit to avoid
having dead code compiled.
 1.1 10-Apr-2011  christos Merge db_trace for x86. From: Vladimir Kirillov proger at wilab dot org dot ua
 1.3.8.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.8.1 21-Apr-2011  jruoho file db_trace.c was added on branch jruoho-x86intr on 2011-06-06 09:07:07 +0000
 1.3.4.2 02-May-2011  jym Sync with head.
 1.3.4.1 21-Apr-2011  jym file db_trace.c was added on branch jym-xensuspend on 2011-05-02 22:49:57 +0000
 1.3.2.3 31-May-2011  rmind sync with head
 1.3.2.2 21-Apr-2011  rmind sync with head
 1.3.2.1 21-Apr-2011  rmind file db_trace.c was added on branch rmind-uvmplock on 2011-04-21 01:41:32 +0000
 1.15 31-Jan-2020  maxv 'oldlwp' is never NULL now, so remove the NULL checks.
 1.14 14-Jan-2019  maxv branches: 1.14.6;
Add #ifndef i386, the dbregs are 32bit in this case anyway.
 1.13 13-Jan-2019  maxv Error out if the higher 32 bits of DR6 and DR7 are set. MOV DR would
fault otherwise.
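The check described in revision 1.13 can be sketched as a small validation routine. This is an illustration under assumptions, not the kernel function: `dbregs_validate` and `DR7_GENERAL_DETECT` are hypothetical names, and the second test assumes that the "bit 13" mentioned in revision 1.8 of this file is the general-detect bit of DR7.

```c
#include <stdint.h>

/* Assumed meaning of "bit 13": DR7's general-detect enable bit. */
#define DR7_GENERAL_DETECT	(UINT64_C(1) << 13)

/*
 * Hypothetical helper: returns 0 when user-supplied DR6/DR7 values are
 * acceptable and -1 when they must be rejected.
 */
static int
dbregs_validate(uint64_t dr6, uint64_t dr7)
{
	/* Upper 32 bits must be clear, otherwise MOV DR would fault. */
	if ((dr6 >> 32) != 0 || (dr7 >> 32) != 0)
		return -1;
	/* Userland must not be able to turn on general detect. */
	if ((dr7 & DR7_GENERAL_DETECT) != 0)
		return -1;
	return 0;
}
```

Rejecting the request up front, instead of masking the offending bits, keeps the caller informed that the requested state was not applied.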
 1.12 27-Sep-2018  maxv Export x86_dbregs_{save/restore}, will be used outside. Reproduce some
internal dbregs logic in them.
 1.11 26-Jul-2018  maxv Rework dbregs, to switch the registers during context switches, and not on
each user->kernel transition via userret. Reloads of DR6/DR7 are expensive
on both native and xen.
 1.10 22-Jul-2018  maxv Clean up dbregs; remove useless comments, remove arguments from prototypes,
style, add KASSERT and move x86_dbregspl into dbregs.c. No real functional
change.
 1.9 08-Apr-2018  kamil branches: 1.9.2;
Add paranoid code to X86 Debug Registers

Reset certain bits in DR6 and DR7 in x86_dbregs_setup_initdbstate().

Reset X86_BREAKPOINT_CONDITION_DETECTED in DR6.
Reset X86_DR7_GENERAL_DETECT_ENABLE in DR7.

Devices or software may use these registers for their own purposes
before the kernel boots. Handle this paranoid case by
explicitly setting the mentioned bits to zero.

Sponsored by <The NetBSD Foundation>
 1.8 05-Apr-2018  maxv Hum, don't let userland set bit 13, because this can crash the kernel.
 1.7 05-Apr-2018  maxv Fix the check, should be >=.
 1.6 23-Feb-2017  martin branches: 1.6.6; 1.6.12; 1.6.14;
Make it compilable in non-diagnostic kernels
 1.5 23-Feb-2017  kamil Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after the FreeBSD API and its usage.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, so its traces are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>
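Since the entry above leaves DR7 bit definitions to each tracer, the following sketch shows how a tracer might build a DR7 value for one local watchpoint before handing it to PT_SETDBREGS. All names here are hypothetical; the bit positions assume the architectural DR7 layout (local enable for slot n at bit 2n, its R/W field at bits 16+4n, its LEN field at bits 18+4n).

```c
#include <stdint.h>

/* Hypothetical names; the values follow the architectural encodings. */
enum dr7_cond { DR7_COND_EXEC = 0, DR7_COND_WRITE = 1, DR7_COND_IO = 2, DR7_COND_RW = 3 };
enum dr7_len  { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

/*
 * Hypothetical helper: return `dr7` with watchpoint slot `n` (0..3)
 * locally enabled for the given condition and length. An out-of-range
 * slot leaves the value unchanged.
 */
static uint64_t
dr7_enable_local(uint64_t dr7, unsigned n, enum dr7_cond cond, enum dr7_len len)
{
	if (n > 3)
		return dr7;
	dr7 &= ~(UINT64_C(0xf) << (16 + n * 4));	/* clear R/Wn and LENn */
	dr7 |= UINT64_C(1) << (n * 2);			/* Ln: local enable */
	dr7 |= (uint64_t)cond << (16 + n * 4);		/* R/Wn */
	dr7 |= (uint64_t)len << (18 + n * 4);		/* LENn */
	return dr7;
}
```

A 4-byte write watchpoint in slot 0 produces 0xd0001. The watched address itself goes into the matching DR0..DR3 register and is not handled here.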
 1.4 18-Jan-2017  kamil branches: 1.4.2;
Fix bug with swapped event type and register that fired in hw watchpoints

Swap bits for DR_EVENT_MASK and DR_REGISTER_MASK.

Sponsored by <The NetBSD Foundation>
 1.3 18-Jan-2017  kamil Remove assert that Debug Registers are not mixed with Debug Trap Flag

New code is designed to mix them.

Sponsored by <The NetBSD Foundation>
 1.2 18-Jan-2017  kamil Embed hardware trap and its type that fired (x86), information for tracers

Now x86 throws SIGTRAP on hardware exception with:
- si_code TRAP_HWWPT - dedicated for hw assisted watchpoint interface
- si_trap - unchanged (T_TRCTRAP)
- si_trap2 - watchpoint number that fired
- si_trap3 - watchpoint specific event description

x86 returns in si_trap3 one of the fields from <x86/dbregs.h>
- X86_HW_WATCHPOINT_EVENT_FIRED - watchpoint fired
- X86_HW_WATCHPOINT_EVENT_FIRED_AND_SSTEP - watchpoint fired under PT_STEP

Other changes:
- restrict more code from <x86/dbregs.h> to _KERNEL

Sponsored by <The NetBSD Foundation>
 1.1 15-Dec-2016  kamil branches: 1.1.2; 1.1.4;
Add support for hardware assisted watchpoints/breakpoints API in ptrace(2)

Add new ptrace(2) calls:
- PT_COUNT_WATCHPOINTS - count the number of available hardware watchpoints
- PT_READ_WATCHPOINT - read struct ptrace_watchpoint from the kernel state
- PT_WRITE_WATCHPOINT - write new struct ptrace_watchpoint state, this
includes enabling and disabling watchpoints

The ptrace_watchpoint structure contains MI and MD parts:

typedef struct ptrace_watchpoint {
int pw_index; /* HW Watchpoint ID (count from 0) */
lwpid_t pw_lwpid; /* LWP described */
struct mdpw pw_md; /* MD fields */
} ptrace_watchpoint_t;

For example amd64 defines MD as follows:
struct mdpw {
void *md_address;
int md_condition;
int md_length;
};

These calls are protected with the __HAVE_PTRACE_WATCHPOINTS guard.

Tested on amd64, initial support added for i386 and XEN.

Sponsored by <The NetBSD Foundation>
 1.1.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.1.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.2.1 15-Dec-2016  pgoyette file dbregs.c was added on branch pgoyette-localcount on 2017-01-07 08:56:28 +0000
 1.4.2.3 28-Aug-2017  skrll Sync with HEAD
 1.4.2.2 05-Feb-2017  skrll Sync with HEAD
 1.4.2.1 18-Jan-2017  skrll file dbregs.c was added on branch nick-nhusb on 2017-02-05 13:40:23 +0000
 1.6.14.5 18-Jan-2019  pgoyette Synch with HEAD
 1.6.14.4 30-Sep-2018  pgoyette Sync with HEAD
 1.6.14.3 28-Jul-2018  pgoyette Sync with HEAD
 1.6.14.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.6.14.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.6.12.2 03-Dec-2017  jdolecek update from HEAD
 1.6.12.1 23-Feb-2017  jdolecek file dbregs.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.6.6.1 12-Apr-2018  martin Pull up following revision(s) (requested by kamil in ticket #712):

sys/arch/x86/x86/dbregs.c: revision 1.7-1.9

Fix the check, should be >=.

Hum, don't let userland set bit 13, because this can crash the kernel.

Add paranoid code to X86 Debug Registers

Reset certain bits in DR6 and DR7 in x86_dbregs_setup_initdbstate().
Reset X86_BREAKPOINT_CONDITION_DETECTED in DR6.
Reset X86_DR7_GENERAL_DETECT_ENABLE in DR7.

Devices or software running before the kernel boots are allowed to
use these registers for their own purposes. Handle this paranoid case
by explicitly setting the mentioned bits to zero.

Sponsored by <The NetBSD Foundation>
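
The DR7 reset described above can be illustrated with a minimal sketch. It assumes only the architectural fact that the general-detect enable bit is bit 13 of DR7; the macro and function names are hypothetical and are not the identifiers used in dbregs.c, and the actual fix may reject the value instead of masking it.

```c
#include <stdint.h>

/* Bit 13 of DR7 is the general-detect (GD) enable bit on x86. */
#define SKETCH_DR7_GD_BIT	(UINT64_C(1) << 13)	/* hypothetical name */

/* Return the given DR7 value with the general-detect bit forced to zero. */
static uint64_t
sketch_dr7_clear_gd(uint64_t dr7)
{
	return dr7 & ~SKETCH_DR7_GD_BIT;
}
```

All other bits pass through unchanged, so only the general-detect setting is affected.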
 1.9.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.9.2.1 10-Jun-2019  christos Sync with HEAD
 1.14.6.1 29-Feb-2020  ad Sync with head.
 1.23 30-Aug-2022  riastradh x86: Rename x86/efi.c -> x86/efi_machdep.c.

Avoid collision with dev/efi.c.
 1.22 07-Oct-2021  msaitoh KNF. No functional change.
 1.21 10-Dec-2019  manu Add multiboot 2 support to amd64 kernel
 1.20 18-Oct-2019  manu Fix EFI system table mapping in virtual space

Previous version was annotated as untested, and indeed it did not work.
New version uses the same approach as for ACPI table mapping.
 1.19 03-Dec-2018  cherry branches: 1.19.4;
Do not assume that all uses of efi are pci aware.

Allow efi.c to compile in the case where pci is not enabled.
 1.18 15-Nov-2018  riastradh No need to write any initializer here, casted or otherwise.

(Sorry about the build breakage; thanks, kre!)
 1.17 15-Nov-2018  kre Update signature in prototype of efi_relva() to match
change in definition in previous, and explicitly cast
NULL to paddr_t to avoid gcc noise.
 1.16 15-Nov-2018  riastradh Make the direct-map API always available, but fail if KASAN or rump.

(Only for architectures that support it at all; on others,
__HAVE_MM_MD_DIRECT_MAPPED_PHYS/IO are still undefined and the
functions unimplemented.)

This gives modules like zfs an opportunity to use it.

While here, fix the one caller of mm_md_direct_mapped_phys that
ignored the return value (and make sure to call pmap_kremove/update
before uvm_km_free).
 1.15 19-May-2018  jakllsch branches: 1.15.2;
Refine previous change to enable PCI window decoding in Command
Register upon mapping; conditionalize on a global variable, that is set
to true on x86 machines booting under EFI.

For now, initialize the global variable at compile time to false. This
is intended to limit potential problems for other NetBSD ports, should
this changeset be pulled up to netbsd-8.

Related to PR #53286.
 1.14 22-Oct-2017  maya branches: 1.14.2; 1.14.4;
Add sysctl machdep.bootmethod

either "UEFI" or "BIOS" to mimic freebsd
 1.13 22-Oct-2017  maya Move initialization code out of efi_probe into efi_init

and call it from cpu_configure
 1.12 22-Oct-2017  maya more static
 1.11 11-Mar-2017  nonaka branches: 1.11.6;
Search for SMBIOS in the UEFI configuration table when booted with UEFI.
 1.10 23-Feb-2017  nonaka Avoid panic when amd64 kernel is booted from 32bit UEFI.
 1.9 16-Feb-2017  nonaka Quell maybe-uninitialized false positives from gcc -Os.

reported by John D. Baker at current-users@.
http://mail-index.netbsd.org/current-users/2017/02/15/msg031132.html
 1.8 14-Feb-2017  nonaka Handle persistent memory. Currently only debug output.
 1.7 14-Feb-2017  nonaka x86: make btinfo_memmap from btinfo_efimemmap to reduce mem_cluster_cnt.

should fix PR/51953.
 1.6 26-Jan-2017  nonaka Fix compile failure on i386 with PAE.
 1.5 24-Jan-2017  nonaka Initial commit of native amd64 EFI boot loader.
 1.4 24-Aug-2016  nonaka branches: 1.4.2;
fix incorrect check in efi_getcfgtblhead().
 1.3 10-Jun-2016  pgoyette branches: 1.3.2;
Add missing \n (I triggered the systbl message on a qemu virtual machine!)
 1.2 29-Jan-2016  christos branches: 1.2.2;
fix printf formats
 1.1 28-Jan-2016  christos Add support for grub to find the ACPI root table pointer via a bootinfo entry
from grub.
From: https://mail-index.netbsd.org/tech-kern/2014/05/22/msg017119.html
 1.2.2.6 28-Aug-2017  skrll Sync with HEAD
 1.2.2.5 05-Feb-2017  skrll Sync with HEAD
 1.2.2.4 05-Oct-2016  skrll Sync with HEAD
 1.2.2.3 09-Jul-2016  skrll Sync with HEAD
 1.2.2.2 19-Mar-2016  skrll Sync with HEAD
 1.2.2.1 29-Jan-2016  skrll file efi.c was added on branch nick-nhusb on 2016-03-19 11:30:07 +0000
 1.3.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.4.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.11.6.2 22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1799):

sys/arch/x86/x86/efi_machdep.c: revision 1.3
(applied to sys/arch/x86/x86/efi.c)

x86/efi: Print uuids in slightly more standard notation.

Anyone need a spare hyphen? We had a few extras, apparently.
 1.11.6.1 07-Jun-2018  martin Pull up following revision(s) (requested by jakllsch in ticket #832):

sys/dev/pci/pcivar.h: revision 1.112
sys/dev/pci/pci_map.c: revision 1.34,1.35
sys/arch/x86/x86/efi.c: revision 1.15

Enable the appropriate memory or I/O space decode in the PCI
Command/Status Register upon mapping a BAR.

This should fix PR #53286. It's also possible there are other similar
PRs that might be fixed by this.
-
Refine previous change to enable PCI window decoding in Command
Register upon mapping; conditionalize on a global variable, that is set
to true on x86 machines booting under EFI.

For now, initialize the global variable at compile time to false. This
is intended to limit potential problems for other NetBSD ports, should
this changeset be pulled up to netbsd-8.

Related to PR #53286.
 1.14.4.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.14.4.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.14.4.1 21-May-2018  pgoyette Sync with HEAD
 1.14.2.2 03-Dec-2017  jdolecek update from HEAD
 1.14.2.1 22-Oct-2017  jdolecek file efi.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.15.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.15.2.1 10-Jun-2019  christos Sync with HEAD
 1.19.4.1 22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1603):

sys/arch/x86/x86/efi_machdep.c: revision 1.3
(applied to sys/arch/x86/x86/efi.c)

x86/efi: Print uuids in slightly more standard notation.

Anyone need a spare hyphen? We had a few extras, apparently.
 1.6 22-May-2023  riastradh efi(4): Implement EFIIOC_GET_TABLE on x86.

PR kern/57076

XXX pullup-10
 1.5 22-May-2023  riastradh efi(4): Move error macros to efi.h.

PR kern/57076

XXX pullup-10
 1.4 24-Dec-2022  andvar s/reqest/request/, s/requst/request/ and s/reuqest/request/ in comments.
 1.3 24-Sep-2022  riastradh branches: 1.3.4;
x86/efi: Print uuids in slightly more standard notation.

Anyone need a spare hyphen? We had a few extras, apparently.

XXX pullup-8
XXX pullup-9
 1.2 24-Sep-2022  riastradh x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.
 1.1 30-Aug-2022  riastradh x86: Rename x86/efi.c -> x86/efi_machdep.c.

Avoid collision with dev/efi.c.
 1.3.4.1 01-Aug-2023  martin Pull up following revision(s) (requested by riastradh in ticket #292):

sys/arch/arm/arm/efi_runtime.c: revision 1.11
sys/dev/efi/efi.h: revision 1.3
sys/arch/x86/x86/efi_machdep.c: revision 1.5
sys/arch/x86/x86/efi_machdep.c: revision 1.6
sys/dev/efi.c: revision 1.5
sys/dev/efi.c: revision 1.6
sys/dev/efi.c: revision 1.7
sys/dev/efi.c: revision 1.8
sys/dev/efi.c: revision 1.9
sys/dev/efivar.h: revision 1.2
sys/sys/efiio.h: revision 1.3

efi(4): Parenthesize EFIERR argument out of paranoia.
PR kern/57076

efi(4): Move error macros to efi.h.
PR kern/57076

efi(4): Implement MI parts of EFIIOC_GET_TABLE.
Intended to be compatible with FreeBSD.
Not yet supported on any architectures.
PR kern/57076

efi(4): Implement EFIIOC_GET_TABLE on x86.
PR kern/57076

efi(4): Translate between size_t and unsigned long.
Fixes i386 build.
PR kern/57076

efi(4): Fix logic to handle buffer sizing.

Can't KASSERT(datasize <= databufsize) because the caller is allowed
to pass in a too-small size and get ERR_BUFFER_TOO_SMALL back, with
the actual size returned so it can resize its buffer. So just clamp
the size to the smaller of what the caller provided and what the
firmware provided, instead of asserting anything.

PR kern/57076
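
The buffer-sizing rule described above can be sketched as follows. This is only an illustration of the clamping step, not the code from efi.c; the function and parameter names are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many bytes may be copied back to the caller.
 * The caller may pass a buffer that is too small, so take the smaller of
 * the caller's buffer size and the size the firmware reported.
 */
static size_t
sketch_clamp_copy_size(size_t caller_bufsize, size_t firmware_datasize)
{
	return caller_bufsize < firmware_datasize ?
	    caller_bufsize : firmware_datasize;
}
```

The firmware's full size would still be reported back separately, so that a caller receiving the too-small error can resize its buffer and retry.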
 1.35 27-Oct-2023  mrg x86: handle AMD errata 1474: A CPU core may hang after about 1044 days

from the new comment:

* This requires disabling CC6 power level, which can be a performance
* issue since it stops full turbo in some implementations (eg, half the
* cores must be in CC6 to achieve the highest boost level.) Set a timer
* to fire in 1000 days -- except NetBSD timers end up having a signed
* 32-bit hz-based value, which rolls over in under 25 days with HZ=1000,
* and doing xcall(9) or kthread(9) from a callout is not allowed anyway,
* so just have a kthread wait 1 day for 1000 times.

documented in:

https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/56323-PUB_1_01.pdf
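
The "under 25 days" figure in the comment above can be checked with a short calculation. The sketch below assumes only what the comment states, a signed 32-bit tick count at HZ ticks per second; the function name is hypothetical.

```c
#include <stdint.h>

/* Days until a signed 32-bit tick counter overflows at hz ticks/second. */
static double
sketch_rollover_days(unsigned hz)
{
	const double seconds_per_day = 86400.0;

	return (double)INT32_MAX / (double)hz / seconds_per_day;
}
```

With hz = 1000 this gives roughly 24.86 days, which is why a single 1000-day timeout cannot be expressed and a kthread sleeps one day at a time instead.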
 1.34 27-Oct-2023  mrg x86: add names for errata that don't have actual numbers

zenbleed is reported as "erratum 65535" currently, this adds a name
for it, and enables the name for any others as well.

pull logging into a function with a tag message.
 1.33 28-Jul-2023  mrg x86: make the CPUID list for errata be far less confusing

the 0x80000001 CPUID result needs some parsing to match against
actual family/model/stepping values. 4-bit 'family' values of
15 or 6 change how to parse the 4-bit extended model and 8-bit
extended family value - for family 6 or 15, the extended model
bits (4) are concatenated with the base 4-bits to create an
8-bit value, and for family 15, the family value is the sum
of the base family value and the 8-bit extended-family value, giving
a range of 0 to 15 + 0xff aka 270.

use a CPUREV(family, model, stepping) macro that builds the
relevant bit-representation of a CPUID, making it far easier
to understand what each entry means, and to add new ones too.

i have confirmed that the emitted cpurevs[] array has the same
values before/after this change, ie, NFCI or observed.
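
The parsing rules described above can be sketched in C. This is a minimal illustration of the decoding direction only; it is not the CPUREV() macro from errata.c, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for a decoded CPUID signature. */
struct sketch_cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/* Decode family/model/stepping from the EAX value of the CPUID signature. */
static struct sketch_cpu_sig
sketch_decode_cpuid(uint32_t eax)
{
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model = (eax >> 4) & 0xf;
	unsigned ext_model = (eax >> 16) & 0xf;
	unsigned ext_family = (eax >> 20) & 0xff;
	struct sketch_cpu_sig sig;

	sig.stepping = eax & 0xf;
	sig.family = base_family;
	sig.model = base_model;

	/* Family 6 or 15: extended model forms the high nibble of the model. */
	if (base_family == 6 || base_family == 15)
		sig.model |= ext_model << 4;

	/* Family 15: extended family is added to the base family. */
	if (base_family == 15)
		sig.family += ext_family;

	return sig;
}
```

For example, the value 0x00870f10 decodes to family 0x17, model 0x71, stepping 0.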
 1.32 26-Jul-2023  mrg fix the cpuids for the zen2 client CPUs.

i'm not exactly sure how i came up with the values i had, though one
of them was still valid and matched my test systems.

XXX: pullup-*
 1.31 25-Jul-2023  mrg x86: turn off zenbleed chicken bit on Zen2 cpus.

this is based upon Taylor's original work. i just made the list
of CPUs to run on as correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)

(might be nice to have a better way to express revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)

tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
https://lock.cmpxchg8b.com/zenbleed.html
 1.30 24-Jul-2023  riastradh x86/errata.c: Only say the errata revision search for cpu0.
 1.29 24-Jul-2023  riastradh x86/errata.c: Say what revision we're searching for.
 1.28 24-Jul-2023  riastradh x86/errata.c: Link to original AMD errata guide.

This one is no longer updated; need to link to newer ones for
individual families too. That's where all the cryptic nomenclature
comes from here.
 1.27 07-Oct-2021  msaitoh branches: 1.27.4;
KNF. No functional change.
 1.26 18-May-2019  maxv branches: 1.26.2;
Disable errata #1091. We are the only OS to apply it, and it seems to be
causing trouble to VirtualBox (PR/54143).
 1.25 12-Aug-2018  maxv enable the two errata for AMD Family 16h, tested by mrg@, thanks
 1.24 07-Aug-2018  maxv Add five errata for AMD Family 17h (Ryzen etc), tested by Patrick Welche,
thanks. Also add two errata for Family 16h, not yet tested, so not yet
enabled.
 1.23 05-Jan-2016  hannken branches: 1.23.10; 1.23.16; 1.23.18;
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.

As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.22 27-Jul-2015  msaitoh KNF.
 1.21 21-Mar-2013  christos branches: 1.21.12; 1.21.14; 1.21.16;
PR/47677 Aktado: x86_errata() should be avoided if NetBSD runs as a KVM guest.
XXX: pullup to 6
 1.20 06-Apr-2012  chs branches: 1.20.2;
bring in this change from openbsd:
Implement the AMD suggested workaround for family 10h & 12h errata 721
"Processor May Incorrectly Update Stack Pointer" by setting a bit
marked 'reserved' in an MSR that is only "documented" to exist on 12h.
 1.19 23-Jul-2010  cegger branches: 1.19.8; 1.19.12; 1.19.14;
use __arraycount
 1.18 25-May-2008  chris branches: 1.18.12; 1.18.18; 1.18.20;
Check for erratum 261 on AMD Family 10h Stepping 3 processors.

Also output any detected errata at verbose, rather than debug, level so
they can be seen with dmesg, and at least have a clue if a BIOS update
would fix the errata.
 1.17 25-May-2008  chris Add detection of errata for AMD Family 10h steppings A and 2. Covering
errata:
254: Internal Resource Livelock Involving Cached TLB Reload
261: Processor May Stall Entering Stop-Grant Due to Pending Data
Cache Scrub
298: L2 Eviction May Occur During Processor Operation To Set
Accessed or Dirty Bit
309: Processor Core May Execute Incorrect Instructions on
Concurrent L2 and Northbridge Response
 1.16 21-May-2008  ad Be a bit less pointed with the errata warning.
 1.15 28-Apr-2008  martin branches: 1.15.2;
Remove clause 3 and 4 from TNF licenses
 1.14 16-Apr-2008  cegger branches: 1.14.2; 1.14.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.13 14-Nov-2007  ad branches: 1.13.14;
- Remove I486_CPU, I586_CPU, I686_CPU options. They buy us nothing and
clutter the code significantly.
- Remove pccons.
 1.12 12-Nov-2007  ad - cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.
 1.11 17-Oct-2007  garbled branches: 1.11.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.10 03-Oct-2007  veego branches: 1.10.2;
Add a debug printf (aprint_debug) when an erratum was patched.
 1.9 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.8 25-Mar-2007  tls branches: 1.8.4; 1.8.12; 1.8.14; 1.8.16;
Revert revision 1.6: with a -current GENERIC.MP kernel we cannot reproduce
the TLB shootdown IPI storms on any of the machines in question.
 1.7 21-Feb-2007  thorpej branches: 1.7.2; 1.7.6; 1.7.8; 1.7.10;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.6 05-Feb-2007  ad branches: 1.6.2;
The TLB flush filter workaround causes TLB shootdown storms on our build
machines. Disable it for now until that problem is solved.
 1.5 11-Jan-2007  ad branches: 1.5.2;
x86_errata: correct the definition of MSR_HWCR and re-enable. Problem
noted and debugged by Murray Armfield (murray at river-styx.org).
 1.4 02-Jan-2007  ad - Don't print any specifics unless booted with -d.
- Disable for now, at least one model of CPU throws a GPF.
 1.3 01-Jan-2007  ad Cut size of tables slightly.
 1.2 01-Jan-2007  ad Oops, issue a warning only once.
 1.1 01-Jan-2007  ad Report on and where possible, try to work around some of the known errata
for Athlon 64 and Opteron processors. Tested briefly by cube@ and elad@.
 1.5.2.3 09-Feb-2007  ad Sync with HEAD.
 1.5.2.2 12-Jan-2007  ad Sync with head.
 1.5.2.1 11-Jan-2007  ad file errata.c was added on branch newlock2 on 2007-01-12 01:01:01 +0000
 1.6.2.2 15-Apr-2007  yamt sync with head.
 1.6.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.7.10.1 29-Mar-2007  reinoud Pullup to -current
 1.7.8.1 11-Jul-2007  mjf Sync with head.
 1.7.6.3 03-Dec-2007  ad Sync with HEAD.
 1.7.6.2 09-Oct-2007  ad Sync with head.
 1.7.6.1 10-Apr-2007  ad Sync with head.
 1.7.2.5 15-Nov-2007  yamt sync with head.
 1.7.2.4 27-Oct-2007  yamt sync with head.
 1.7.2.3 03-Sep-2007  yamt sync with head.
 1.7.2.2 26-Feb-2007  yamt sync with head.
 1.7.2.1 21-Feb-2007  yamt file errata.c was added on branch yamt-lazymbuf on 2007-02-26 09:08:50 +0000
 1.8.16.1 06-Oct-2007  yamt sync with head.
 1.8.14.2 09-Jan-2008  matt sync with HEAD
 1.8.14.1 06-Nov-2007  matt sync with HEAD
 1.8.12.4 21-Nov-2007  joerg Sync with HEAD.
 1.8.12.3 14-Nov-2007  joerg Sync with HEAD.
 1.8.12.2 04-Oct-2007  joerg Sync with HEAD.
 1.8.12.1 02-Oct-2007  joerg Sync with HEAD.
 1.8.4.1 03-Oct-2007  garbled Sync with HEAD
 1.10.2.3 18-Nov-2007  bouyer Sync with HEAD
 1.10.2.2 13-Nov-2007  bouyer Sync with HEAD
 1.10.2.1 17-Oct-2007  bouyer amd64 (aka x86-64) support for Xen. Based on the OpenBSD port done by
Mathieu Ropert in 2006.
DomU-only for now. An INSTALL_XEN3_DOMU kernel with a ramdisk will boot to
sysinst if you're lucky. Often it panics because a runnable LWP has
a NULL stack (really, it's all of l->l_addr which has been zeroed out
while the process was on the queue !)
TODO:
- bug fixes :)
- Most of the xpq_* functions should be shared with xen/i386
- The xen/i386 assembly bootstrap code should be replaced with the C
version in xenamd64/amd64/xpmap.c
- see if a config(5) trick could allow to merge xenamd64 back to xen or amd64.
 1.11.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.13.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.14.4.3 11-Aug-2010  yamt sync with head.
 1.14.4.2 04-May-2009  yamt sync with head.
 1.14.4.1 16-May-2008  yamt sync with head.
 1.14.2.2 04-Jun-2008  yamt sync with head
 1.14.2.1 18-May-2008  yamt sync with head.
 1.15.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.18.20.1 05-Mar-2011  rmind sync with head
 1.18.18.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.18.12.1 24-Oct-2010  jym Sync with HEAD
 1.19.14.2 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1361):
sys/arch/x86/include/cpufunc.h: revision 1.19
sys/arch/x86/x86/errata.c: revision 1.23
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.19.14.1 09-Apr-2012  riz branches: 1.19.14.1.4; 1.19.14.1.6;
Pull up following revision(s) (requested by chs in ticket #168):
sys/arch/x86/include/specialreg.h: revision 1.57
sys/arch/x86/x86/errata.c: revision 1.20
bring in this change from openbsd:
Implement the AMD suggested workaround for family 10h & 12h errata 721
"Processor May Incorrectly Update Stack Pointer" by setting a bit
marked 'reserved' in an MSR that is only "documented" to exist on 12h.
 1.19.14.1.6.1 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1361):
sys/arch/x86/include/cpufunc.h: revision 1.19
sys/arch/x86/x86/errata.c: revision 1.23
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.19.14.1.4.1 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1361):
sys/arch/x86/include/cpufunc.h: revision 1.19
sys/arch/x86/x86/errata.c: revision 1.23
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.19.12.1 29-Apr-2012  mrg sync to latest -current.
 1.19.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.19.8.1 17-Apr-2012  yamt sync with head
 1.20.2.2 03-Dec-2017  jdolecek update from HEAD
 1.20.2.1 23-Jun-2013  tls resync from head
 1.21.16.1 06-Feb-2016  snj Pull up following revision(s) (requested by hannken in ticket #1073):
sys/arch/x86/x86/errata.c: revision 1.23
sys/arch/x86/include/cpufunc.h: revision 1.19
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.21.14.2 19-Mar-2016  skrll Sync with HEAD
 1.21.14.1 22-Sep-2015  skrll Sync with HEAD
 1.21.12.1 26-Jan-2016  snj Pull up following revision(s) (requested by hannken in ticket #1073):
sys/arch/x86/x86/errata.c: revision 1.23
sys/arch/x86/include/cpufunc.h: revision 1.19
Adapt prototypes and usage of rdmsr_locked() and wrmsr_locked() to
their implementation. Both functions don't take the passcode as
argument.
As wrmsr_locked() no longer writes the passcode to the msr the
erratum 721 on my Opteron 2356 really gets patched and cc1 no longer
crashes with SIGSEGV.
 1.23.18.1 10-Jun-2019  christos Sync with HEAD
 1.23.16.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.23.10.3 27-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #1852):

sys/arch/x86/x86/errata.c: revision 1.32

fix the cpuids for the zen2 client CPUs.

i'm not exactly sure how i came up with the values i had, though one
of them was still valid and matched my test systems.
 1.23.10.2 25-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #1851):

sys/arch/x86/include/specialreg.h: revision 1.207
sys/arch/x86/x86/errata.c: revision 1.31

x86: turn off zenbleed chicken bit on Zen2 cpus.

this is based upon Taylor's original work. i just made the list
of CPUs to run on as correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)

(might be nice to have a better way to express revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)

tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
https://lock.cmpxchg8b.com/zenbleed.html
 1.23.10.1 05-Aug-2020  martin Pull up the following, requested by msaitoh in ticket #1595:

sys/arch/x86/include/specialreg.h 1.129 via patch
sys/arch/x86/x86/errata.c 1.24-1.26

- Add six errata for AMD Family 17h (Ryzen etc), tested by
Patrick Welche and mrg@.
 1.26.2.2 27-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #1667):

sys/arch/x86/x86/errata.c: revision 1.32

fix the cpuids for the zen2 client CPUs.

i'm not exactly sure how i came up with the values i had, though one
of them was still valid and matched my test systems.
 1.26.2.1 25-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #1664):

sys/arch/x86/include/specialreg.h: revision 1.207
sys/arch/x86/x86/errata.c: revision 1.31

x86: turn off zenbleed chicken bit on Zen2 cpus.

this is based upon Taylor's original work. i just made the list
of CPUs to run on as correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)

(might be nice to have a better way to express revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)

tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
https://lock.cmpxchg8b.com/zenbleed.html
 1.27.4.3 03-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #919):

sys/arch/x86/x86/errata.c: revision 1.28
sys/arch/x86/x86/errata.c: revision 1.29
sys/arch/x86/include/specialreg.h: revision 1.209
usr.sbin/cpuctl/arch/i386.c: revision 1.144
sys/arch/x86/x86/errata.c: revision 1.30
sys/arch/x86/x86/errata.c: revision 1.33
sys/arch/x86/x86/errata.c: revision 1.34
sys/arch/x86/x86/errata.c: revision 1.35
sys/arch/x86/include/specialreg.h: revision 1.210
sys/arch/x86/include/specialreg.h: revision 1.211

x86/errata.c: Link to original AMD errata guide.

This one is no longer updated; need to link to newer ones for
individual families too. That's where all the cryptic nomenclature
comes from here.

x86/errata.c: Say what revision we're searching for.

x86/errata.c: Only say the errata revision search for cpu0.

x86: make the CPUID list for errata be far less confusing
the 0x80000001 CPUID result needs some parsing to match against
actual family/model/stepping values. 4-bit 'family' values of
15 or 6 change how to parse the 4-bit extended model and 8-bit
extended family value - for family 6 or 15, the extended model
bits (4) are concatenated with the base 4-bits to create an
8-bit value, and for family 15, the family value is the sum
of the base family value and the 8-bit extended-family value, giving
a range of 0 to 15 + 0xff aka 270.

use a CPUREV(family, model, stepping) macro that builds the
relevant bit-representation of a CPUID, making it far easier
to understand what each entry means, and to add new ones too.
i have confirmed that the emitted cpurevs[] array has the same
values before/after this change, ie, NFCI or observed.

x86: add names for errata that don't have actual numbers
zenbleed is reported as "erratum 65535" currently, this adds a name
for it, and enables the name for any others as well.
pull logging into a function with a tag message.

x86: handle AMD errata 1474: A CPU core may hang after about 1044 days
from the new comment:
* This requires disabling CC6 power level, which can be a performance
* issue since it stops full turbo in some implementations (eg, half the
* cores must be in CC6 to achieve the highest boost level.) Set a timer
* to fire in 1000 days -- except NetBSD timers end up having a signed
* 32-bit hz-based value, which rolls over in under 25 days with HZ=1000,
* and doing xcall(9) or kthread(9) from a callout is not allowed anyway,
* so just have a kthread wait 1 day for 1000 times.
documented in:
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/56323-PUB_1_01.pdf

add MSR stuff for AMD errata 1474.

cpuctl: fix i386 bit descriptions for CPUID_SEF_FLAGS1
warning: non-printing character '\31' in description
'BUS_LOCK_DETECT""b\31' [363]
s/RPMQUERY/RMPQUERY/
 1.27.4.2 27-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #247):

sys/arch/x86/x86/errata.c: revision 1.32

fix the cpuids for the zen2 client CPUs.

i'm not exactly sure how i came up with the values i had, though one
of them was still valid and matched my test systems.
 1.27.4.1 25-Jul-2023  martin Pull up following revision(s) (requested by mrg in ticket #243):

sys/arch/x86/include/specialreg.h: revision 1.207
sys/arch/x86/x86/errata.c: revision 1.31

x86: turn off zenbleed chicken bit on Zen2 cpus.

this is based upon Taylor's original work. i just made the list
of CPUs to run on as correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)

(might be nice to have a better way to express revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)

tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)

https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
https://lock.cmpxchg8b.com/zenbleed.html
 1.33 07-Oct-2021  msaitoh KNF. No functional change.
 1.32 25-Oct-2020  nia Normalize some machine dependent CPU frequency sysctl variables.

This moves machdep.*.frequency.* to machdep.cpu.frequency.*.

This was proposed on tech-kern some time ago. The intention is to allow
third-party tools such as estd and conky to more easily and reliably
fetch or modify the current CPU frequency without iterating through
various machine-dependent variables to check their presence.
 1.31 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.30 18-Apr-2014  christos branches: 1.30.4;
more conservative length check.
 1.29 27-Mar-2014  christos branches: 1.29.2;
correct/add protection against snprintf overflow.
 1.28 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
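
The split between base, extended, and display fields described above can be sketched as follows. This is a minimal illustration, not the macros from the commit: the `sig_*` helper names are hypothetical, and the field positions and combination rules are the commonly documented ones for the CPUID leaf 1 EAX signature (extended family added only when the base family is 0xf, extended model used as the high nibble for base families 6 and 0xf).

```c
#include <stdint.h>

/* Hypothetical helpers; each extracts one raw field of the signature. */
static unsigned sig_stepping(uint32_t sig)   { return sig & 0xf; }
static unsigned sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field counts only when the base is 0xf. */
static unsigned
sig_family(uint32_t sig)
{
	unsigned fam = sig_basefamily(sig);

	if (fam == 0xf)
		fam += sig_extfamily(sig);
	return fam;
}

/* Display model: the extended field is the high nibble for bases 6 and 0xf. */
static unsigned
sig_model(uint32_t sig)
{
	unsigned base = sig_basefamily(sig);
	unsigned model = sig_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For an example signature of 0x00870f10 the base family is 0xf and the base model is 1, while the display values come out as family 0x17 and model 0x71, which is why callers comparing against a "model" need to know which of the two they hold.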
 1.27 08-Nov-2013  msaitoh space -> tab
 1.26 08-Nov-2013  christos fix unused variables
 1.25 02-Jun-2012  dsl branches: 1.25.2; 1.25.4;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.24 04-Mar-2011  jruoho branches: 1.24.4;
Raise the return value of the match-function of est(4) and powernow(4).
The assigned priorities are now: 10 for acpicpu(4), 5 for est(4) and
powernow(4), and 1 for odcm(4). These are used to pick the preferred driver.
 1.23 24-Feb-2011  jruoho Do not return the bus clock directly in the match() function.
 1.22 24-Feb-2011  jruoho Fix autoconf(9) of cpufeaturebus.
 1.21 24-Feb-2011  jruoho Also check CPU vendor in the match-function.
 1.20 23-Feb-2011  jruoho Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.
 1.19 20-Aug-2010  jruoho branches: 1.19.2; 1.19.4;
Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
 1.18 09-Aug-2010  jruoho Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.
 1.17 09-Aug-2010  jruoho Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.
 1.16 08-Aug-2010  jruoho Merge P-state support for acpicpu(4).

Remarks:

1. All processors (x86 or not) for which the vendor has implemented
ACPI I/O access routines are supported. Native instructions are
currently supported only for Intel's "Enhanced Speedstep". Code for
"PowerNow!" (AMD) will be merged later. Native support for VIA's
"PowerSaver" will be investigated.

2. Backwards compatibility with existing userland code is maintained.
Comparable to the case with cpu_idle(9), the ACPI CPU driver
installs alternative functions for the existing sysctl(8) controls.
The "native" behavior (if any) is restored upon detachment.

3. The dynamic nature of ACPI-provided P-states needs more investigation.
The maximum frequency induced (but not forced) by the firmware may
change dynamically. Currently, the sysctl(8) controls error out with
a value larger than the dynamic maximum. The code itself does not
however yet react to the notifications from the firmware by changing
the frequencies in-place. Presumably the system administrator should
be able to choose whether to use dynamic or static frequencies.
 1.15 07-Aug-2010  jym Use aprint_debug_dev(). Caught thanks to coccinelle.
 1.14 23-May-2010  christos simplify the debugging code and make it more informative.
 1.13 05-Oct-2009  rmind branches: 1.13.2; 1.13.4;
Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.12 02-Oct-2009  jmcneill Use the TSC and current multiplier to calculate bus clock on VIA C7 Esther.
Probably needed for all C7 and Nano processors, but to be safe only use
this alternate method on Esther for now.

Now est on my C7-M 1.6GHz properly reports frequencies from 1600 to 400,
instead of 2133 to 533.
 1.11 25-Mar-2009  dyoung It is only by accident that these get definitions they need from
<sys/device.h>, so explicitly #include <sys/device.h>.
 1.10 17-Feb-2009  jmcneill Shorten est message, use aprint_debug
 1.9 28-Apr-2008  martin branches: 1.9.8; 1.9.10; 1.9.14; 1.9.18;
Remove clause 3 and 4 from TNF licenses
 1.8 16-Apr-2008  cegger branches: 1.8.2; 1.8.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.7 09-Dec-2007  jmcneill branches: 1.7.10;
Merge jmcneill-pm branch.
 1.6 07-Dec-2007  xtraeme branches: 1.6.2;
Print the informational messages after all checks have been done,
and while I'm here change some of those to be aprint_debug(). Also
use __func__ rather than __FUNCTION__.
 1.5 28-Oct-2007  joerg branches: 1.5.2; 1.5.4;
More debugging reveals that the U7600 really wants to use the same
voltage for different frequencies. The fake table can be computed as
driven by the frequency here as well, but don't recompute the voltage as
it would result in an underflow.

Fix argument order in a debug message to match the format string.
 1.4 24-Oct-2007  joerg Before faking up a state table, make sure that neither frequency nor
voltage difference is 0. This avoids a divide by zero.
 1.3 06-Aug-2007  simonb branches: 1.3.2; 1.3.4; 1.3.6; 1.3.10; 1.3.12;
If an EST frequency table isn't found, fake one up by interpolating
values from the high/low voltages and frequencies. Replaces the old
method of using the lower and upper frequencies only.
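
The interpolation idea can be sketched roughly as below. This is an assumption-laden illustration rather than the driver's code: `struct fake_state` and `fake_table()` are hypothetical names, the table is spaced evenly over a caller-chosen number of entries, and the guard simply rejects a zero frequency range so that nothing downstream can divide by it. Equal high and low voltages are allowed and yield a flat voltage column.

```c
#include <stdint.h>

/* Hypothetical table entry: frequency in MHz, voltage in mV. */
struct fake_state {
	unsigned mhz;
	unsigned mv;
};

/*
 * Fill tbl[0..n-1] with evenly spaced states from the high operating
 * point down to the low one.  Returns the number of entries written,
 * or 0 if the inputs cannot be interpolated.
 */
static unsigned
fake_table(struct fake_state *tbl, unsigned n,
    unsigned hi_mhz, unsigned hi_mv, unsigned lo_mhz, unsigned lo_mv)
{
	unsigned i;

	/* Need two end points, a non-zero frequency range, sane voltages. */
	if (n < 2 || hi_mhz <= lo_mhz || hi_mv < lo_mv)
		return 0;

	for (i = 0; i < n; i++) {
		tbl[i].mhz = hi_mhz - (hi_mhz - lo_mhz) * i / (n - 1);
		tbl[i].mv = hi_mv - (hi_mv - lo_mv) * i / (n - 1);
	}
	return n;
}
```

With end points of 1600 MHz at 1200 mV and 400 MHz at 900 mV and four entries, the intermediate states are 1200 MHz at 1100 mV and 800 MHz at 1000 mV.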
 1.2 01-Jul-2007  xtraeme branches: 1.2.2; 1.2.4; 1.2.6; 1.2.10;
Add support for the VIA C7-M and Eden processors in the
Enhanced Speedstep driver.

Tested by Heron Gallegos <gallegos at csxxi dot net dot mx>
 1.1 03-Jun-2007  xtraeme branches: 1.1.2;
Make the Enhanced Speedstep driver available for i386 and amd64.
It can be used on EM64T CPUs supporting the EST CPUID feature. Note that
some CPUs still don't work with this driver, like Xeon or Pentium 4.

Move the p[34]_get_bus_clock functions into their own file,
intel_busclock.c and remove this code from i386/identcpu.c.

Tested on i386 by myself and amd64 by Tonerre.
 1.1.2.6 03-Dec-2007  ad Sync with HEAD.
 1.1.2.5 20-Aug-2007  ad Sync with HEAD.
 1.1.2.4 15-Jul-2007  ad Sync with head.
 1.1.2.3 09-Jun-2007  ad Sync with head.
 1.1.2.2 09-Jun-2007  ad Sync with head.
 1.1.2.1 03-Jun-2007  ad file est.c was added on branch vmlocking on 2007-06-09 21:37:05 +0000
 1.2.10.6 09-Dec-2007  jmcneill Sync with HEAD.
 1.2.10.5 28-Oct-2007  joerg Sync with HEAD.
 1.2.10.4 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.2.10.3 29-Sep-2007  jmcneill Don't define EST_DEBUG by default.
 1.2.10.2 16-Aug-2007  jmcneill Sync with HEAD.
 1.2.10.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.2.6.1 15-Aug-2007  skrll Sync with HEAD.
 1.2.4.1 07-Aug-2007  matt Sync with HEAD.
 1.2.2.2 11-Jul-2007  mjf Sync with head.
 1.2.2.1 01-Jul-2007  mjf file est.c was added on branch mjf-ufs-trans on 2007-07-11 20:03:18 +0000
 1.3.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.3.10.2 23-Sep-2007  wrstuden Sync with somewhat-recent netbsd-4.
 1.3.10.1 06-Aug-2007  wrstuden file est.c was added on branch wrstuden-fixsa on 2007-09-23 21:36:27 +0000
 1.3.6.2 12-Sep-2007  msaitoh Pull up following patches (requested by xtraeme in ticket #809)

share/man/man4/options.4 patch
sys/arch/i386/conf/files.i386 patch
sys/arch/i386/i386/est.c delete
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/include/cpuvar.h patch
sys/arch/x86/x86/est.c new file
sys/arch/x86/x86/intel_busclock.c new file
sys/arch/amd64/amd64/identcpu.c patch
sys/arch/amd64/conf/GENERIC patch

Add support for the VIA C7-M and Eden processors in the Enhanced
Speedstep driver.
amd64: The Enhanced Speedstep driver is now able to work on EM64T
CPUs running in 64bit mode.
 1.3.6.1 06-Aug-2007  msaitoh file est.c was added on branch netbsd-4 on 2007-09-12 10:05:03 +0000
 1.3.4.5 21-Jan-2008  yamt sync with head
 1.3.4.4 15-Nov-2007  yamt sync with head.
 1.3.4.3 27-Oct-2007  yamt sync with head.
 1.3.4.2 03-Sep-2007  yamt sync with head.
 1.3.4.1 06-Aug-2007  yamt file est.c was added on branch yamt-lazymbuf on 2007-09-03 14:31:24 +0000
 1.3.2.2 09-Jan-2008  matt sync with HEAD
 1.3.2.1 06-Nov-2007  matt sync with HEAD
 1.5.4.2 26-Dec-2007  ad Sync with head.
 1.5.4.1 08-Dec-2007  ad Sync with head.
 1.5.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.5.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.6.2.1 11-Dec-2007  yamt sync with head.
 1.7.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.8.4.5 09-Oct-2010  yamt sync with head
 1.8.4.4 11-Aug-2010  yamt sync with head.
 1.8.4.3 11-Mar-2010  yamt sync with head
 1.8.4.2 04-May-2009  yamt sync with head.
 1.8.4.1 16-May-2008  yamt sync with head.
 1.8.2.1 18-May-2008  yamt sync with head.
 1.9.18.1 21-Apr-2010  matt sync to netbsd-5
 1.9.14.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.9.14.3 24-Oct-2010  jym Sync with HEAD
 1.9.14.2 01-Nov-2009  jym Sync with HEAD.
 1.9.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.9.10.1 05-Oct-2009  sborrill Pull up following revision(s) (requested by jmcneill in ticket #1059):
sys/arch/x86/include/cpuvar.h: 1.30
sys/arch/x86/x86/est.c: 1.12
sys/arch/x86/x86/intel_busclock.c: 1.8

Use the TSC and current multiplier to calculate bus clock on VIA C7 Esther.
Probably needed for all C7 and Nano processors, but to be safe only use this
alternate method on Esther for now.
 1.9.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.9.8.1 03-Mar-2009  skrll Sync with HEAD.
 1.13.4.2 05-Mar-2011  rmind sync with head
 1.13.4.1 30-May-2010  rmind sync with head
 1.13.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.13.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.19.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.19.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.24.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.24.4.1 30-Oct-2012  yamt sync with head
 1.25.4.1 18-May-2014  rmind sync with head
 1.25.2.2 03-Dec-2017  jdolecek update from HEAD
 1.25.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.29.2.1 10-Aug-2014  tls Rebase.
 1.30.4.1 28-Aug-2017  skrll Sync with HEAD
 1.93 14-May-2025  riastradh x86/fpu.c: Reduce needless stack usage of fpuinit_mxcsr_mask.

This is using the FXSAVE instruction explicitly, which is defined to
operate on a 512-byte area, unconditionally, so let's just use the
struct fxsave type we've defined for that instead of the larger union
savefpu.

XXX The fxsave/fxrstor functions should really be typed, but that
requires moving includes or definitions around between cpufunc.h and
cpu_extended_state.h which might have userland API implications that
I'd rather not think about right now.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
 1.92 14-May-2025  riastradh x86/fpu.c: Revert unnecessary part of rev. 1.90.

Most of the changes including x86/fpu.c rev. 1.90 served to support a
separately allocated XSAVE area in each thread's pcb. On machines
with large XSAVE areas, the fpu.c changes also:

1. allocated a separate area for fpuinit_mxcsr_mask, and

2. allocated a separate area for the safe/zero states of
fpu_kern_enter/leave.

But (1) is unnecessary because we unconditionally use FXSAVE for
that, for which union savefpu is enough -- actually, struct fxsave
(exactly 512 bytes) is enough. To be done in a separate change.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
 1.91 28-Apr-2025  riastradh xen: Stop-gap FPU PCB fix; disable Intel AMX for now.

Since the custom cpu_uarea_alloc/free are disabled under XENPV,
nothing would initialize struct pcb::pcb_savefpu to point either to
struct pcb::pcb_savefpusmall, or to a separately allocated large area
on machines with Intel AMX TILECFG/TILEDATA requiring it. So the
memset in fpu_lwp_fork would crash on null pointer dereference:

[ 1.0000030] uvm_fault(0xffffffff8094a300, 0x0, 2) -> e
[ 1.0000030] fatal page fault in supervisor mode
[ 1.0000030] trap type 6 code 0x2 rip 0xffffffff8062795c cs 0xe030 rflags 0x10202 cr2 0 ilevel 0 rsp 0xffffffff80adad38
[ 1.0000030] curlwp 0xffffffff8078f880 pid 0.0 lowest kstack 0xffffffff80ad62c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:memset+0x2c: repe stosq %es:(%rdi)
memset() at netbsd:memset+0x2c
lwp_create() at netbsd:lwp_create+0x2f1
fork1() at netbsd:fork1+0x42c
main() at netbsd:main+0x44f

In order to support Intel AMX TILECFG/TILEDATA, or any other CPU
extensions that increase the XSAVE area beyond what fits in a single
page after struct pcb, we would need to enable the custom
cpu_uarea_alloc/free. Currently that would imply allocating stack
guard pages (`redzone') under XENPV; if there's some reason the stack
guard pages don't work, we could also push #ifdef XENPV conditionals
into cpu_uarea_alloc/free to cover the guard pages -- to be
considered.

PR kern/59371: Xen domU uvm_fault since FPU state allocation patch

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
 1.90 24-Apr-2025  riastradh amd64: Allocate FPU save state outside pcb if it's too large.

We have seen x86_fpu_save_size values (CPUID[EAX=0x0d, ECX=0].ECX) as
large as 11008 bytes, notably with Intel AMX TILEDATA's 8192-byte
state.

We only do this for user threads, and only on machines where it's
necessary, to avoid incurring much overhead. There is still a tiny
bit of overhead when saving and restoring the FPU state by using a
pointer indirection instead of arithmetic indirection for access to
struct pcb::pcb_savefpu, but this is probably a drop in the bucket
compared to the memory traffic incurred by the FPU state save/restore
anyway.

For now, these paths are mostly disabled on i386. We could enable
them but it will require either rewriting cpu_uarea_alloc/free for
i386, or adopting a guard page like amd64 does, which might be costly
and so should be undertaken only with some thought and care. And
since Intel AMX instructions only work in 64-bit mode, it's not
likely to be useful on i386.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu

These changes, as a side effect, may fix:

PR kern/57258: kthread_fpu_enter/exit problem

by making sure to allocate an FPU save space that is large enough to
guarantee fpu_kern_enter/leave work safely, instead of just using a
union savefpu object on the stack (which, at 576 bytes, may be too
small on some machines, particularly with AVX512 requiring ~2.5K).
(But we'll have to do some extra work with kthread_fpu_enter/exit_md
-- if we try doing them again on x86 -- to actually allocate the
separate pcb on these machines!)
 1.89 21-Jun-2024  riastradh branches: 1.89.2;
x86/fpu.c: Nix trailing whitespace.

No functional change intended.
 1.88 17-May-2024  manu Workaround panic: fpudna from userland

i386 Xen PV domU get spurious fpudna traps from userland. Older eager FPU
context switching code took care of ignoring them. When transitioning
from eager switching to always switching, this special handling was
removed, causing "fpudna from userland" panics.

This change restores the previous behavior where fpudna traps from
userland are ignored on Xen PV domU.
 1.87 18-Jul-2023  riastradh x86/fpu: In kernel mode fpu traps, print the instruction pointer.
 1.86 03-Mar-2023  riastradh x86/fpu: Align savefpu to 64 bytes in fpuinit_mxcsr_mask.

16 bytes is not enough.

(Is this why it never worked on Xen some years back? Got lucky and
accidentally had 64-byte alignment on native x86, but not in the call
stack in Xen?)

XXX pullup-10
 1.85 03-Mar-2023  riastradh Revert "x86: Add kthread_fpu_enter/exit support, take two."

kthread_fpu_enter/exit changes broke some hardware, unclear why, to
investigate before fixing and reapplying these changes.
 1.84 03-Mar-2023  riastradh Revert "x86/fpu.c: Sprinkle KNF."

kthread_fpu_enter/exit changes broke some hardware, unclear why, to
investigate before fixing and reapplying these changes.
 1.83 25-Feb-2023  riastradh x86/fpu.c: Sprinkle KNF.

No functional change intended.
 1.82 25-Feb-2023  riastradh x86: Add kthread_fpu_enter/exit support, take two.

This time, make sure to restore the FPU state when switching to a
kthread in the middle of kthread_fpu_enter/exit.

This adds a single predicted-taken branch for the case of kthreads
that are not in kthread_fpu_enter/exit, so it incurs a penalty only
for threads that actually use it. Since it avoids FPU state
switching in kthreads that do use the FPU, namely cgd worker threads,
this should be a net performance win on systems using it and have
negligible impact otherwise.

XXX pullup-10
 1.81 25-Feb-2023  riastradh x86: Label boolean is_64bit argument to fpu_area_restore.

No functional change intended.
 1.80 25-Feb-2023  riastradh x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use.

In fpu_kern_enter, make sure all the MXCSR exception status bits are
set when we start using the FPU, so that instructions which exhibit
MCDT are unaffected by it.

While here, zero all the other FPU registers in fpu_kern_enter.

In principle we could skip this step on future CPUs that fix the MCDT
bug, but there's probably not much benefit -- workloads that do a lot
of crypto in the kernel are probably better off using
kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles
in the first place.

For details, see:
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
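
As a rough illustration of "all the MXCSR exception status bits are set": the six sticky exception flags occupy bits 0 through 5 of MXCSR, so the value loaded on entry is the usual control word with those bits ORed in. The macro and function names below are hypothetical, and the sketch only computes the value; it does not load the register.

```c
#include <stdint.h>

/* Bits 0-5 of MXCSR: IE, DE, ZE, OE, UE, PE (sticky exception flags). */
#define SKETCH_MXCSR_STATUS_FLAGS	0x3fu

/* Return the MXCSR value with every exception status flag already set. */
static uint32_t
sketch_mxcsr_all_flags(uint32_t mxcsr)
{
	return mxcsr | SKETCH_MXCSR_STATUS_FLAGS;
}
```

Starting from the architectural default of 0x1f80 (all exceptions masked, no flags raised), this gives 0x1fbf.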
 1.79 20-Aug-2022  riastradh branches: 1.79.4;
fpu_kern_enter/leave: Disable IPL assertions.

These don't work because mutex_enter/exit on a spin lock may raise an
IPL but not lower it, if another spin lock was already held. For
example,

mutex_enter(some_lock_at_IPL_VM);
printf("foo\n");
fpu_kern_enter();
...
fpu_kern_leave();
mutex_exit(some_lock_at_IPL_VM);

will trigger the panic, because printf takes a lock at IPL_HIGH where
the IPL will remain until the mutex_exit. (This was a nightmare to
track down before I remembered that detail of spin lock IPL
semantics...)
 1.78 24-May-2022  andvar fix various typos in comments, docs and log messages.
 1.77 01-Apr-2022  riastradh x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.
 1.76 24-Oct-2020  mgorny Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs

When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64
use the 64-suffixed variant in order to include the complete FIP/FDP
registers in the x87 area.

The difference between the two variants is that the FXSAVE64 (new)
variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64),
while the legacy FXSAVE variant uses split fields: 32-bit offset,
16-bit segment and 16-bit reserved field (union fp_addr.fa_32).
The latter implies that the actual addresses are truncated to 32 bits
which is insufficient in modern programs.

The change is applied only to 64-bit programs on amd64. Plain i386
and compat32 continue using plain FXSAVE. Similarly, NVMM is not
changed as I am not familiar with that code.

This is a potentially breaking change. However, I don't think it likely
to actually break anything because the data provided by the old variant
were not meaningful (because of the truncated pointer).
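
The truncation can be shown with a small stand-in for the two layouts. The union below is a hypothetical sketch modeled on the description above (a 64-bit field versus a 32-bit offset, 16-bit segment, and 16-bit reserved field sharing the same 8 bytes); it is not copied from the kernel headers, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two views of an 8-byte FIP/FDP slot. */
union fp_addr_sketch {
	uint64_t fa_64;			/* 64-suffixed variant: full address */
	struct {
		uint32_t fa_off;	/* legacy variant: 32-bit offset */
		uint16_t fa_seg;	/* 16-bit segment selector */
		uint16_t fa_rsvd;	/* 16-bit reserved field */
	} fa_32;
};

/* Record an address through the legacy view: only the low 32 bits fit. */
static uint64_t
legacy_roundtrip(uint64_t addr)
{
	union fp_addr_sketch a = { .fa_64 = 0 };

	a.fa_32.fa_off = (uint32_t)addr;
	return a.fa_32.fa_off;
}

/* Record an address through the 64-bit view: nothing is lost. */
static uint64_t
wide_roundtrip(uint64_t addr)
{
	union fp_addr_sketch a = { .fa_64 = addr };

	return a.fa_64;
}
```

An instruction pointer such as 0x00007f7ff7e01234 comes back as 0xf7e01234 through the legacy view, which is the sense in which the old data were not meaningful for 64-bit programs.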
 1.75 15-Oct-2020  mgorny Revert "Merge convert_xmm_s87.c into fpu.c"

I am going to add ATF tests for these two functions, and having them
in a separate file will make it more convenient to build and run them
in userspace.
 1.74 02-Aug-2020  riastradh Revert "Add kthread_fpu_enter/exit support to x86." for now.

Need to find all the paths out of interrupts back into _kernel_
context to add HANDLE_DEFERRED_FPU, I think, before this can be
enabled.
 1.73 01-Aug-2020  riastradh Add kthread_fpu_enter/exit support to x86.
 1.72 20-Jul-2020  riastradh Fix fpu_kern_enter in a softint that interrupted a softint.

We need to find the lwp that was originally interrupted to save its
fpu state.

With this, fpu-heavy programs (like firefox) are once again stable,
at least under modest stress testing, on systems configured to use
wifi with WPA2 and CCMP.
 1.71 20-Jul-2020  riastradh Save fpu state at IPL_VM to exclude fpu_kern_enter/leave.

This way fpu_kern_enter/leave cannot interrupt the transition, so the
transition from state-on-CPU to state-in-memory (with TS set) is
atomic whether in an interrupt or not.

(I am not 100% convinced that this is necessary, but it makes
reasoning about the transition simpler.)
 1.70 20-Jul-2020  riastradh Revert 1.66 "Fix race in fpu save with fpu_kern_enter in softint."

This only fixed part of the race, and we can do it more simply.
 1.69 20-Jul-2020  riastradh Revert 1.67 "Restore the lwp's fpu state, not zeros, and leave with fpu enabled."

This didn't actually avoid double-restore, and it doesn't solve the
problem anyway, and made it harder to detect in-kernel fpu abuse.
 1.68 13-Jul-2020  riastradh Limit x86 fpu_kern_enter/leave to IPL_VM or below.

There are no users of crypto at IPL_SCHED or IPL_HIGH as far as I
know, and although we generally limit the amount of time spent in any
one crypto operation -- e.g., cgd is usually limited to processing
512 or 4096 bytes at a time -- it's better not to block IPL_SCHED and
IPL_HIGH interrupts at all. This should make ddb a little more
accessible during crypto-heavy workloads.

This means the aes_* API cannot be used at IPL_SCHED or IPL_HIGH; the
same will go for any new crypto subsystems, like the ChaCha and
Poly1305 ones I'm drafting. It might be better to prohibit them
altogether in hard interrupt context, but right now cprng_fast and
cprng_strong are both technically allowed at IPL_VM and are sometimes
used there (e.g., for opencrypto CBC IV generation).

KASSERT the ilevel to detect violation of this constraint in case I'm
wrong.
 1.67 06-Jul-2020  riastradh Restore the lwp's fpu state, not zeros, and leave with fpu enabled.

We need to clear the fpu state anyway because it is likely to contain
secrets at this point. Previously we set it to zeros, and then issued
stts to disable the fpu in order to detect the mistake of further use
of the fpu in kernel. But there must be some path I haven't identified
yet that doesn't do fpu_handle_deferred, leading to fpudna panics.

In any case, there's no benefit to restoring the fpu state twice
(once with zeros and once with the real data). The downside is,
although this avoids spurious fpudna traps, using fpu_kern_enter in a
softint has the side effect that -- until the next userland context
switch triggering stts -- we no longer detect misuse of fpu in the
kernel in that lwp. This will serve for now, but we should find
another way to issue clts/stts judiciously to detect such misuse.

May improve the continued symptoms of
https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
although may not fix everything.
 1.66 06-Jul-2020  riastradh Fix race in fpu save with fpu_kern_enter in softint.

Likely source of:

https://mail-index.netbsd.org/current-users/2020/07/02/msg039051.html
 1.65 14-Jun-2020  riastradh Use static constant rather than stack memset buffer for zero fpregs.
 1.64 13-Jun-2020  riastradh Add comments over fpu_kern_enter/leave.
 1.63 13-Jun-2020  riastradh Zero the fpu registers on fpu_kern_leave.

Avoid Spectre-class attacks on any values left in them.
 1.62 04-Jun-2020  riastradh Call clts/stts in fpu_kern_enter/leave so they work.
 1.61 31-Jan-2020  maxv 'oldlwp' is never NULL now, so remove the NULL checks.
 1.60 27-Nov-2019  maxv branches: 1.60.2;
Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();
 1.59 30-Oct-2019  maxv Style.
 1.58 12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.57 04-Oct-2019  maxv Rename fpu_eagerswitch to fpu_switch, and add fpu_xstate_reload to
simplify.
 1.56 03-Oct-2019  maxv Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
 1.55 05-Jul-2019  maxv branches: 1.55.2;
More inlines, prerequisites for future changes. Also, remove fngetsw(),
which was a duplicate of fnstsw().
 1.54 26-Jun-2019  mgorny Implement PT_GETXSTATE and PT_SETXSTATE

Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE,
that provide access to the extended (and extensible) set of FPU
registers on amd64 and i386. At the moment, this covers AVX (YMM)
and AVX-512 (ZMM, opmask) registers. It can be easily extended
to cover further register types without breaking backwards
compatibility.

PT_GETXSTATE issues the XSAVE instruction with all kernel-supported
extended components enabled. The data is copied into 'struct xstate'
(which -- unlike the XSAVE area itself -- has stable format
and offsets).

PT_SETXSTATE issues the XRSTOR instruction to restore the register
values from user-provided 'struct xstate'. The function replaces only
the specific XSAVE components that are listed in 'xs_rfbm' field,
making it possible to issue partial updates.

Both syscalls take a 'struct iovec' pointer rather than a direct
argument. This requires the caller to explicitly specify the buffer
size. As a result, existing code will continue to work correctly
when the structure is extended (performing partial reads/updates).
 1.53 25-May-2019  maxv Fix bug. We must fetch the whole FPU state, otherwise XSTATE_BV could be
outdated, and we could be filling the AVX registers with garbage.
 1.52 19-May-2019  maxv Rename

fpu_save_area_clear -> fpu_clear
fpu_save_area_reset -> fpu_sigreset

Clearer, and reduces a future diff. No real functional change.
 1.51 19-May-2019  maxv Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.
 1.50 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.49 20-Jan-2019  maxv Improvements in NVMM

* Handle the FPU differently, limit the states via the given mask rather
than via XCR0. Align to 64 bytes. Provide an initial gXCR0, to be sure
that XCR0_X87 is set. Reset XSTATE_BV when the state is modified by
the virtualizer, to force a reload from memory.

* Hide RDTSCP.

* Zero-extend RBX/RCX/RDX when handling the NVMM CPUID signature.

* Take ECX and not RCX on MSR instructions.
 1.48 05-Oct-2018  maxv export x86_fpu_mxcsr_mask, fpu_area_save and fpu_area_restore
 1.47 17-Sep-2018  maxv Reduce the noise, reorder and rename some things for clarity.
 1.46 01-Jul-2018  maxv Use a variable-sized memcpy, instead of copying the PCB and then adding
the extra bytes. The PCB embeds the biggest static FPU state, but our
real FPU state may be smaller (FNSAVE), so we don't need to memcpy the
extra unused bytes.
 1.45 01-Jul-2018  maxv Use a switch, we can (and will) optimize each case separately. No
functional change.
 1.44 29-Jun-2018  maxv Add more KASSERTs.

Should help PR/53399.
 1.43 23-Jun-2018  maxv branches: 1.43.2;
Add XXX in fpuinit_mxcsr_mask.
 1.42 22-Jun-2018  maxv Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
 1.41 20-Jun-2018  jdolecek as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv
 1.40 19-Jun-2018  jdolecek fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8
 1.39 19-Jun-2018  maxv When using EagerFPU, create the fpu state in execve at IPL_HIGH.

A preemption could occur in the middle, and we don't want that to happen,
because the context switch would use the partially-constructed fpu state.

The procedure becomes:

splhigh
unbusy the current cpu's fpu
create a new fpu state in memory
install the state on the current cpu's fpu
splx

Disabling preemption also ensures that x86_fpu_eager doesn't change in
the middle.

In LazyFPU mode we drop IPL_HIGH right away.

Add more KASSERTs.
 1.38 18-Jun-2018  maxv Add more KASSERTs, see if they help PR/53383.
 1.37 17-Jun-2018  maxv No, I meant to put the panic in fpudna not fputrap. Also appease it: panic
only if the fpu already has a state. We're fine with getting a DNA, what
we're not fine with is if the DNA is received while the FPU is busy.

I believe (even though I couldn't trigger it) that the panic would
otherwise fire if PT_SETFPREGS is used. And also ACPI sleep/wakeup,
probably.
 1.36 16-Jun-2018  maxv Need IPIs when enabling eager fpu switch, to clear each fpu and get us
started. Otherwise it is possible that the first context switch on one of
the cpus will restore an invalid fpu state in the new lwp, if that lwp
had its fpu state stored on another cpu that didn't have time to do an
fpu save since eager-fpu was enabled.

Use barriers and all the related crap. The point is that we want to
ensure that no context switch occurs between [each fpu is cleared] and
[x86_fpu_eager is set to 'true'].

Also add KASSERTs.
 1.35 16-Jun-2018  maxv Actually, don't do anything if we switch to a kernel thread. When the cpu
switches back to a user thread the fpu is restored, so no point calling
fninit (which doesn't clear all the states anyway).
 1.34 14-Jun-2018  maxv Install the FPU state on the current CPU in setregs (execve).
 1.33 14-Jun-2018  maxv Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.
 1.32 23-May-2018  maxv Add a comment about recent AMD CPUs.
 1.31 23-May-2018  maxv Clarify and extend the fix for the AMD FPU leaks. We were clearing the x87
state only on FXRSTOR, but the same problem exists on XRSTOR, so clear the
state there too.
 1.30 23-May-2018  maxv Merge convert_xmm_s87.c into fpu.c. It contains only two functions, that
are used only in fpu.c.
 1.29 23-May-2018  maxv style
 1.28 09-Feb-2018  maxv branches: 1.28.2;
Force a reload of CW in fpu_set_default_cw(). This function is used only
in COMPAT_FREEBSD, it really needs to die.
 1.27 11-Nov-2017  maxv Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.
 1.26 11-Nov-2017  bouyer Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
 1.25 08-Nov-2017  maxv Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.
 1.24 04-Nov-2017  maxv Add support for xsaveopt. It is basically an instruction that optimizes
context switch performance by not saving to memory FPU registers that are
known to be in their initial state or known not to have changed since the
last time they were saved to memory.

Our code is now compatible with the internal state tracking engine:
- We don't modify the in-memory FPU state after doing an XSAVE/XSAVEOPT.
That is to say, we always call XRSTOR first.
- During a fork, the whole in-memory FPU state area is memcopied in the
new PCB, and CR0_TS is set. Next time the forked thread uses the FPU it
will fault, we migrate the area, call XRSTOR and clear CR0_TS. During
this XRSTOR XSTATE_BV still contains the initial values, and it forces
a reload of XINUSE.
- Whenever software wants to change the in-memory FPU state, it manually
sets XSTATE_BV[i]=1, which forces XINUSE[i]=1.
- The address of the state passed to xrstor is always the same for a
given LWP.

fpu_save_area_clear is changed not to force a reload of CW if fx_cw is
the standard FPU value. This way we have XINUSE[i]=0 for x87, and xsaveopt
will optimize this state.

Small benchmark:
switch lwp to cpu2
do float operation
switch lwp to cpu3
do float operation
Doing this 10^6 times in a loop, my cpu goes on average from 28,2 seconds
to 20,8 seconds.
 1.23 04-Nov-2017  maxv Always set XCR0_X87, to force a reload of CW. That's needed for compat
options where fx_cw is not the standard fpu value.
 1.22 04-Nov-2017  maxv Fix xen. Not tested, but seems fine enough.
 1.21 03-Nov-2017  maxv Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).
 1.20 31-Oct-2017  maxv Zero out the buffer entirely.
 1.19 31-Oct-2017  maxv Mask mxcsr, otherwise userland could set reserved bits to 1 and make
xrstor fault.
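
The masking described in rev 1.19 can be sketched as follows. This is a minimal illustration, not the code in fpu.c: the helper name sanitize_mxcsr and its parameters are hypothetical, and the fallback value assumes the Intel SDM rule that an MXCSR_MASK of 0 reported by the CPU means 0x0000FFBF.

```c
#include <stdint.h>

/* Default mask to use when the CPU reports an MXCSR_MASK of 0 (Intel SDM). */
#define DEFAULT_MXCSR_MASK 0x0000FFBFu

/*
 * Hypothetical helper: clear every MXCSR bit the CPU treats as reserved,
 * so that a later xrstor/fxrstor of this value cannot fault.
 */
static uint32_t
sanitize_mxcsr(uint32_t user_mxcsr, uint32_t cpu_mxcsr_mask)
{
	if (cpu_mxcsr_mask == 0)
		cpu_mxcsr_mask = DEFAULT_MXCSR_MASK;
	return user_mxcsr & cpu_mxcsr_mask;
}
```

A value supplied by userland would be passed through such a helper before being stored in the save area.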
 1.18 31-Oct-2017  maxv Initialize xstate_bv with the structures that were just filled in,
otherwise xrstor does not restore them. This can happen only if userland
calls setcontext without having used the FPU before.

Until rev1.15 xstate_bv was implicitly initialized because the xsave area
was not zeroed out properly.
 1.17 31-Oct-2017  maxv Don't embed our own values in the reserved fields of the XSAVE area, it
really is a bad idea. Move them into the PCB.
 1.16 31-Oct-2017  maxv Always use x86_fpu_save, clearer.
 1.15 31-Oct-2017  maxv Remove comments that are more misleading than anything else. While here
make sure we zero out the FPU area entirely, and not just its legacy
region.
 1.14 09-Oct-2017  maya GC i386_fpu_present. x86 without an FPU is not supported.

Also delete newly unused send_sigill
 1.13 17-Sep-2017  maxv Remove the second argument from USERMODE and KERNELMODE, it is unused
now that we don't have vm86 anymore.
 1.12 29-Sep-2016  maxv branches: 1.12.8;
Remove outdated comments, typos, rename and reorder a few things.
 1.11 18-Aug-2016  maxv Simplify.
 1.10 27-Nov-2014  uebayasi branches: 1.10.2; 1.10.4;
Consistently use kpreempt_*() outside scheduler path.
 1.9 25-Feb-2014  dsl branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12; 1.9.16;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
 1.8 23-Feb-2014  dsl Add fpu_set_default_cw() and use it in the emulations to set the default
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.
 1.7 23-Feb-2014  dsl Determine whether the cpu supports xsave (and hence AVX).
The result is only written to sysctl nodes at the moment.
I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
 1.6 15-Feb-2014  dsl Load and save the fpu registers (for copies to/from userspace) using
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't part of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.
 1.5 15-Feb-2014  dsl Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).
 1.4 13-Feb-2014  dsl Check the argument types for the fpu asm functions.
 1.3 12-Feb-2014  dsl Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.2 12-Feb-2014  dsl Change the argument to fpudna() to be the trapframe.
Move the checks for fpu traps in kernel into x86/fpu.c.
Remove the code from amd64/trap.c related to fpu traps (they've not gone
there for ages - expect to panic in kernel mode).
In fpudna():
- Don't actually enable hardware interrupts unless we need to
allow in IPIs.
- There is no point in enabling them when they are blocked in software
(by splhigh()).
- Keep the splhigh() to avoid a load of the KASSERTS() firing.
 1.1 11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.9.16.1 12-Dec-2017  snj Pull up following revision(s) (requested by maxv in ticket #1540):
sys/arch/x86/x86/fpu.c: 1.19 via patch
Mask mxcsr, otherwise userland could set reserved bits to 1 and make
xrstor fault.
 1.9.12.1 12-Dec-2017  snj Pull up following revision(s) (requested by maxv in ticket #1540):
sys/arch/x86/x86/fpu.c: 1.19 via patch
Mask mxcsr, otherwise userland could set reserved bits to 1 and make
xrstor fault.
 1.9.10.3 03-Dec-2017  jdolecek update from HEAD
 1.9.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.9.10.1 25-Feb-2014  tls file fpu.c was added on branch tls-maxphys on 2014-08-20 00:03:29 +0000
 1.9.8.1 12-Dec-2017  snj Pull up following revision(s) (requested by maxv in ticket #1540):
sys/arch/x86/x86/fpu.c: 1.19 via patch
Mask mxcsr, otherwise userland could set reserved bits to 1 and make
xrstor fault.
 1.9.6.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.9.6.1 25-Feb-2014  yamt file fpu.c was added on branch yamt-pagecache on 2014-05-22 11:40:14 +0000
 1.9.4.2 18-May-2014  rmind sync with head
 1.9.4.1 25-Feb-2014  rmind file fpu.c was added on branch rmind-smpnet on 2014-05-18 17:45:30 +0000
 1.10.4.1 04-Nov-2016  pgoyette Sync with HEAD
 1.10.2.1 05-Oct-2016  skrll Sync with HEAD
 1.12.8.3 10-Jul-2018  martin Pull up the following, requested by maxv in ticket #910:

sys/arch/amd64/amd64/locore.S r1.167 (patch)
sys/arch/i386/i386/locore.S r1.158 (patch)
sys/arch/x86/x86/fpu.c r1.44 (patch)

Don't switch the FPU when leaving a softint. This fixes
several problems when EagerFPU is enabled.
 1.12.8.2 23-Jun-2018  martin Pull up the following, via patch, requested by maxv in ticket #897:

sys/arch/amd64/amd64/locore.S 1.166 (patch)
sys/arch/i386/i386/locore.S 1.157 (patch)
sys/arch/x86/include/cpu.h 1.92 (patch)
sys/arch/x86/include/fpu.h 1.9 (patch)
sys/arch/x86/x86/fpu.c 1.33-1.39 (patch)
sys/arch/x86/x86/identcpu.c 1.72 (patch)
sys/arch/x86/x86/vm_machdep.c 1.34 (patch)
sys/arch/x86/x86/x86_machdep.c 1.116,1.117 (patch)

Support eager fpu switch, to work around INTEL-SA-00145.
Provide a sysctl machdep.fpu_eager, which gets automatically
initialized to 1 on affected CPUs.
 1.12.8.1 21-Dec-2017  snj Pull up following revision(s) (requested by maxv in ticket #442):
sys/arch/x86/x86/fpu.c: 1.19 via patch
Mask mxcsr, otherwise userland could set reserved bits to 1 and make
xrstor fault.
 1.28.2.5 26-Jan-2019  pgoyette Sync with HEAD
 1.28.2.4 20-Oct-2018  pgoyette Sync with head
 1.28.2.3 30-Sep-2018  pgoyette Sync with HEAD
 1.28.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.28.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.43.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.43.2.1 10-Jun-2019  christos Sync with HEAD
 1.55.2.2 25-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1665):

sys/arch/x86/x86/fpu.c: revision 1.86

x86/fpu: Align savefpu to 64 bytes in fpuinit_mxcsr_mask.
16 bytes is not enough.

(Is this why it never worked on Xen some years back? Got lucky and
accidentally had 64-byte alignment on native x86, but not in the call
stack in Xen?)
 1.55.2.1 18-Oct-2020  martin Pull up following revision(s) (requested by kamil in ticket #1117):

sys/arch/sh3/include/ptrace.h: revision 1.19
sys/arch/amd64/amd64/process_machdep.c: revision 1.48
sys/arch/sh3/sh3/process_machdep.c: revision 1.23
sys/arch/sh3/sh3/process_machdep.c: revision 1.24
sys/arch/i386/i386/process_machdep.c: revision 1.95
sys/arch/x86/x86/fpu.c (apply patch)
sys/kern/sys_ptrace_common.c: revision 1.84
sys/arch/powerpc/powerpc/process_machdep.c: revision 1.40
sys/sys/ptrace.h: revision 1.71
sys/arch/powerpc/powerpc/process_machdep.c: revision 1.41
(all via patch, adapted)

Fix s87_tw reconstruction to correctly indicate register states

Fix the code reconstructing s87_tw (full tag word) from fx_tw (abridged
tag word) to correctly represent all register states. The previous code
only distinguished between empty/non-empty registers, and assigned
'regular value' to all non-empty registers. The new code explicitly
distinguishes the two other tag word values: zero and special.
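
The classification behind the full tag word can be sketched like this. It is an illustration under stated assumptions, not the fpu.c implementation: the helper name x87_full_tag and its parameters are hypothetical, the two-bit encodings (0 valid, 1 zero, 2 special, 3 empty) are the x87 architectural values, and the mapping between stack positions and physical registers via TOP is deliberately left out.

```c
#include <stdint.h>

/* x87 full tag word values, two bits per register. */
enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/*
 * Hypothetical helper: compute the two-bit tag for one 80-bit register.
 *   in_use   - this register's bit from the abridged tag word
 *   sign_exp - the top 16 bits of the register (sign + 15-bit exponent)
 *   mantissa - the 64-bit significand, including the explicit integer bit
 */
static unsigned
x87_full_tag(int in_use, uint16_t sign_exp, uint64_t mantissa)
{
	uint16_t exponent = sign_exp & 0x7fff;	/* drop the sign bit */

	if (!in_use)
		return TAG_EMPTY;
	if (exponent == 0x7fff)
		return TAG_SPECIAL;		/* infinity or NaN */
	if (exponent == 0)			/* zero, or a denormal */
		return mantissa == 0 ? TAG_ZERO : TAG_SPECIAL;
	if ((mantissa >> 63) == 0)
		return TAG_SPECIAL;		/* unnormal: integer bit clear */
	return TAG_VALID;
}
```

The full tag word would then be assembled by shifting each register's two-bit result into place.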

Fix the machine-dependent ptrace register-related requests (e.g.
PT_GETXMMREGS, PT_GETXSTATE on x86) to correctly respect the LWP number
passed as the data argument. Before this change, these requests
did not operate on the requested LWP of a multithreaded program.
This change required moving ptrace_update_lwp() out of unit scope,
and changing ptrace_machdep_dorequest() function to take a pointer
to pointer as the second argument, consistently with ptrace_regs().

I am planning to extend the ATF ptrace() register tests in the future
to check for regressions in multithreaded programs, as time permits.

Reviewed by kamil.

Add missing 'error' declaration
 1.60.2.1 29-Feb-2020  ad Sync with head.
 1.79.4.5 15-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1119):

sys/arch/x86/x86/fpu.c: revision 1.81
sys/arch/x86/x86/fpu.c: revision 1.90 (partial, via patch)

x86: Fix fpu_kern_enter/leave for machines with big fpu state.

fpu_kern_enter/leave are used for FPU/SIMD access from the kernel,
for cgd(4) crypto and similar. Currently, we use a statically
allocated union savefpu object with particular content to put the
FPU into a safe state or an all-zero state.

This doesn't work (reliably, anyway) when the FPU state is larger
than 576 bytes, e.g. with AVX-512 register state (another ~2 KiB)
or Intel AMX TILECFG/TILEDATA (another ~10 KiB). For machines
with larger FPU state, this change dynamically allocates a larger
area for the safe/zero FPU states.

(This change doesn't add support for Intel AMX TILECFG/TILEDATA;
it just avoids the failure mode. The part of
https://mail-index.netbsd.org/source-changes/2025/04/24/msg156552.html
it applies is limited.)
This may improve the situation for the following PRs:

PR kern/57258: kthread_fpu_enter/exit problem
PR kern/58650: unable to install v10.0 - kernel does not boot, sysinst does not start
 1.79.4.4 20-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #890):

sys/arch/x86/x86/fpu.c: revision 1.87

x86/fpu: In kernel mode fpu traps, print the instruction pointer.
 1.79.4.3 20-Jun-2024  martin Pull up following revision(s) (requested by manu in ticket #701):

sys/arch/x86/x86/fpu.c: revision 1.88

Workaround panic: fpudna from userland

i386 Xen PV domU gets spurious fpudna traps from userland. Older eager FPU
context switching code took care of ignoring them. When transitioning
from eager switching to always switching, this special handling was
removed, causing "fpudna from userland" panics.

This change restores the previous behavior where fpudna traps from
userland are ignored on Xen PV domU.
 1.79.4.2 25-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #244):

sys/arch/x86/x86/fpu.c: revision 1.80
sys/arch/x86/include/cpu_extended_state.h: revision 1.18

x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use.

In fpu_kern_enter, make sure all the MXCSR exception status bits are
set when we start using the FPU, so that instructions which exhibit
MCDT are unaffected by it.

While here, zero all the other FPU registers in fpu_kern_enter.
In principle we could skip this step on future CPUs that fix the MCDT
bug, but there's probably not much benefit -- workloads that do a lot
of crypto in the kernel are probably better off using
kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles
in the first place.

For details, see:
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
 1.79.4.1 25-Jul-2023  martin Pull up following revision(s) (requested by riastradh in ticket #245):

sys/arch/x86/x86/fpu.c: revision 1.86

x86/fpu: Align savefpu to 64 bytes in fpuinit_mxcsr_mask.
16 bytes is not enough.

(Is this why it never worked on Xen some years back? Got lucky and
accidentally had 64-byte alignment on native x86, but not in the call
stack in Xen?)
 1.89.2.1 02-Aug-2025  perseant Sync with HEAD
 1.23 19-Oct-2023  bouyer Move definition of acpi_md_vesa_modenum to acpi_wakeup.c; allows building
kernels without framebuffer devices.
Problem reported by John D. Baker on current-users@
 1.22 17-Oct-2023  bouyer Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.
Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen.
When running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.
x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console
xen/x86/consinit.c: support genfb as possible console
xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.
xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.
 1.21 16-Oct-2023  bouyer Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.20 25-Aug-2023  riastradh xen: Provide definitions or ifdefs to make drm build in XEN3_DOM0.

No idea if it works, but it builds now.

PR port-xen/49330
 1.19 13-Sep-2022  riastradh branches: 1.19.4;
x86/genfb: Re-enable shadowfb by default for now.

Something makes radeondrmkmsfb, at least, extremely slow, and it's
not yet clear what, and shadowfb=true fixes it. I verified that the
framebuffer pages are correctly getting mapped write-combining, so
the page table entries aren't the problem -- not sure what is the
problem.
 1.18 14-Aug-2022  riastradh x86/genfb: Disable shadowfb by default.

The motivation for this was obviated by mapping the framebuffer
write-combining instead of uncacheable.

If this still appears to be needed, most likely the mapping is still
wrong and that should be fixed directly, or this should be enabled
only in the circumstances where the mapping can't be made right.
 1.17 16-Jul-2022  mlelstv Use pixel format information from bootloader.
 1.16 28-Jan-2021  jmcneill Remove x86_genfb_mtrr_init. PATs have been available since the Pentium III
and this code has been #if notyet'd shortly after being introduced.
 1.15 30-Nov-2019  nonaka branches: 1.15.8;
Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.14 01-Oct-2019  chs in many device attach paths, allocate memory with KM_SLEEP instead of KM_NOSLEEP
and remove code to handle failures that can no longer happen.
 1.13 19-May-2019  mlelstv branches: 1.13.2;
correct order of parameters, has no effect as anything set here is
overwritten by the following reconfig.
 1.12 25-Feb-2017  nonaka branches: 1.12.6; 1.12.14;
The EFI console draws faster with shadowfb.
 1.11 25-Jul-2013  macallan branches: 1.11.6; 1.11.10; 1.11.14;
fix width vs height typo
from imre at vdsz.com
 1.10 01-Jul-2011  dyoung branches: 1.10.2; 1.10.12; 1.10.16;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.9 15-Feb-2011  jruoho Fix wrong assertion logic.
 1.8 10-Feb-2011  jmcneill Unfortunately the current MTRR code can't grow an existing WC mapping, and
since we don't know the total framebuffer size setting up an MTRR here
would prevent X from creating a larger one later.

Instead map the framebuffer with BUS_SPACE_MAP_PREFETCHABLE and hope that
PAT is supported.
 1.7 09-Feb-2011  jmcneill if genfb is attached, hook into db_trap_callback to switch in and out of
polling mode as necessary
 1.6 09-Feb-2011  jmcneill PRIx64 instead of llx for uint64_t format
 1.5 08-Feb-2011  jmcneill add a 'setmode' callback to genfb and use it to setup write-combining
MTRRs on x86 whenever switching to WSDISPLAYIO_MODE_EMUL
 1.4 28-Apr-2010  dyoung branches: 1.4.2; 1.4.4;
On x86, change the bus_space_tag_t to a pointer to a struct
bus_space_tag. For now, bus_space_tag's only member is
bst_type, the type of space, which is either X86_BUS_SPACE_IO
or X86_BUS_SPACE_MEM. In the future, new bus_space_tag members
will refer to override-functions installed by a new function,
bus_space_tag_create(9).

Add pointers to constant struct bus_space_tag, x86_bus_space_io and
x86_bus_space_mem. Use them to replace most uses of X86_BUS_SPACE_IO
and X86_BUS_SPACE_MEM.

Add an x86-specific bus_space_is_equal(9) implementation that compares
the two tags' bst_type.
 1.3 24-Aug-2009  jmcneill branches: 1.3.2; 1.3.4;
Pass the VBE mode number from the bootloader to the kernel, and then
make the ACPI wakecode aware of it. Restore the desired VBE mode on resume
when acpi_vbios_reset=1, so suspend/resume with genfb console will work.
 1.2 17-Feb-2009  jmcneill branches: 1.2.2; 1.2.4; 1.2.6;
Set clear-screen and cursor-row so the transition from the early console
driver and genfb is seamless. While we're here, clear the screen when
we first attach in case the bootloader scribbled on it.
 1.1 17-Feb-2009  jmcneill PR# port-i386/37026: userconf(4) doesn't work with vesafb(4)

Add early console support for x86 genfb.
 1.2.6.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.2.6.5 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.2.6.4 24-Oct-2010  jym Sync with HEAD
 1.2.6.3 01-Nov-2009  jym Sync with HEAD.
 1.2.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.2.6.1 17-Feb-2009  jym file genfb_machdep.c was added on branch jym-xensuspend on 2009-05-13 17:18:45 +0000
 1.2.4.4 11-Aug-2010  yamt sync with head.
 1.2.4.3 16-Sep-2009  yamt sync with head
 1.2.4.2 04-May-2009  yamt sync with head.
 1.2.4.1 17-Feb-2009  yamt file genfb_machdep.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:10 +0000
 1.2.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.2.2.1 17-Feb-2009  skrll file genfb_machdep.c was added on branch nick-hppapmap on 2009-03-03 18:29:37 +0000
 1.3.4.2 05-Mar-2011  rmind sync with head
 1.3.4.1 30-May-2010  rmind sync with head
 1.3.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.4.4.3 17-Feb-2011  bouyer Sync with HEAD
 1.4.4.2 09-Feb-2011  bouyer Sync with HEAD
 1.4.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.4.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.10.16.1 28-Aug-2013  rmind sync with head
 1.10.12.2 03-Dec-2017  jdolecek update from HEAD
 1.10.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.11.14.1 21-Apr-2017  bouyer Sync with HEAD
 1.11.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.11.6.1 28-Aug-2017  skrll Sync with HEAD
 1.12.14.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.12.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.12.14.1 10-Jun-2019  christos Sync with HEAD
 1.12.6.1 05-Dec-2019  bouyer Pull up following revision(s) (requested by nonaka in ticket #1466):
sys/arch/x86/x86/hyperv.c: revision 1.5
sys/arch/x86/include/genfb_machdep.h: revision 1.4
sys/arch/x86/x86/genfb_machdep.c: revision 1.15
Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.13.2.1 08-Dec-2019  martin Pull up following revision(s) (requested by nonaka in ticket #502):
sys/arch/x86/x86/hyperv.c: revision 1.5
sys/arch/x86/include/genfb_machdep.h: revision 1.4
sys/arch/x86/x86/genfb_machdep.c: revision 1.15
Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.15.8.1 03-Apr-2021  thorpej Sync with HEAD.
 1.19.4.4 21-Oct-2023  martin Apply patch, requested by bouyer in ticket #433:

sys/arch/x86/pci/pci_machdep.c (apply patch)
sys/arch/x86/x86/genfb_machdep.c (apply patch)

Fix build of XEN kernels with genfb(4)
 1.19.4.3 20-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #432):

sys/arch/x86/x86/genfb_machdep.c: revision 1.23 (patch)
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.57 (patch)

Move definition of acpi_md_vesa_modenum to acpi_wakeup.c; allows building
kernels without framebuffer devices.

Problem reported by John D. Baker on current-users@
 1.19.4.2 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #428):

sys/arch/xen/xen/xen_machdep.c: revision 1.28
sys/arch/x86/pci/pci_machdep.c: revision 1.97
sys/arch/xen/xen/genfb_xen.c: revision 1.1
sys/arch/xen/xen/genfb_xen.c: revision 1.2
sys/arch/xen/include/hypervisor.h: revision 1.59
sys/arch/i386/conf/XEN3PAE_DOM0: revision 1.41 (patch)
sys/arch/x86/x86/genfb_machdep.c: revision 1.22
sys/arch/xen/x86/consinit.c: revision 1.18
sys/arch/xen/x86/autoconf.c: revision 1.26
sys/external/mit/xen-include-public/dist/xen/include/public/platform.h: revision 1.2
sys/arch/xen/conf/files.xen: revision 1.188
sys/arch/x86/x86/consinit.c: revision 1.37
sys/arch/xen/conf/files.xen: revision 1.189
sys/arch/x86/x86/consinit.c: revision 1.38
sys/external/mit/xen-include-public/dist/xen/include/public/xen.h: revision 1.2
sys/arch/x86/include/genfb_machdep.h: revision 1.7
sys/arch/xen/x86/pvh_consinit.c: revision 1.5
sys/arch/xen/x86/pvh_consinit.c: revision 1.6
sys/arch/amd64/conf/XEN3_DOM0: revision 1.201

Move the pvh_xencons to xen_machdep.c as early_xencons, so it can be
used in the future as early output for plain PV guests too.

Support non-VGA framebuffers for Xen dom0. This is mandatory for graphic
console on EFI-only hardware.

Add a xen_genfb_getbtinfo() function which will return a btinfo_framebuffer
structure, filled in with parameters provided by Xen.

When running as a Xen dom0, call xen_genfb_getbtinfo() instead of
lookup_bootinfo(BTINFO_FRAMEBUFFER) when adding properties to the
PCI graphic device (when genfb is attached) and in x86_genfb_init()
when genfb is used as console.

x86/x86/consinit.c: If running as a Xen dom0, use xen_genfb_getbtinfo()
to check if we have a genfb console

xen/x86/consinit.c: support genfb as possible console

xen/x86/consinit.c: use the hypervisor IO as console until a better one
is found. If the hypervisor is using a serial port for boot messages,
we'll get NetBSD's boot message on the serial port too until
the real console takes over.

xen/x86/autoconf.c: rework device_register() to be closer to the x86 version.
Especially make sure that device_pci_register() is called.

Make sure to always fall back to xen_early_console, even for dom0

Enable genfb in DOM0 kernels

Add ext_lfb_base to dom0_vga_console_info, from recent Xen. We know if it's
present or not by checking dom0.info_size

Add XENPF_get_dom0_console, which gets a dom0_vga_console_info structure
from the hypervisor. To be used by PVH dom0 kernels.

XENPVH option is not used. Fix consinit.c to use XENPVHVM as intended
and XENPVH from defflag
for a dom0 PVH, the dom0_vga_console_info structure has to be retrieved
using a platform hypercall; do so in the XENPVHVM case.

Now genfb works in a PVH dom0 running on Xen 4.18 (Xen 4.15 doesn't support
this platform op, so there is no way to make it work there).
 1.19.4.1 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #425):

sys/arch/x86/pci/pci_machdep.c: revision 1.96
sys/arch/x86/acpi/acpi_machdep.c: revision 1.36
sys/arch/x86/x86/hyperv.c: revision 1.16
sys/arch/x86/x86/genfb_machdep.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.56
sys/arch/x86/include/genfb_machdep.h: revision 1.6

Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.17 12-Apr-2025  nonaka PR/58298: Convert hyperv_hypercall_page to naked function.
 1.16 16-Oct-2023  bouyer branches: 1.16.6;
Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.15 20-May-2022  nonaka branches: 1.15.4;
Improve Hyper-V support.

vmbus(4):
- Added support for multichannel.

hvn(4):
- Added support for multichannel.
- Added support for changing the MTU.
- Added support for TX aggregation.
- Improve VLAN support.
- Improve checksum offload support.
 1.14 23-Dec-2021  yamaguchi hyper-v: move idt vector allocating to vmbus_init_interrupts_md()
for refactoring

And, the deallocating is also moved to
vmbus_deinit_interrupts_md().

reviewed by nonaka@n.o.
 1.13 28-Jan-2021  jmcneill Remove x86_genfb_mtrr_init. PATs have been available since the Pentium III
and this code has been #if notyet'd shortly after being introduced.
 1.12 12-Oct-2020  ryoon branches: 1.12.2;
Fix typo in comment
 1.11 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by the following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.10 15-Jun-2020  msaitoh Serialize rdtsc with lfence, mfence or cpuid to read the TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and gave a big improvement,
but there is still room for more. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than that of rdtsc, so I suspect the
difference in the results comes from that.
 1.9 17-May-2020  nonaka Fixed a problem that caused a page fault when attaching vmbus(4).

A page of memory for the Hyper-V hypercall was dynamically allocated with
uvm_km_alloc(kernel_map, ...). However, this method can no longer be used to
make an executable page.
So we avoid the fault by using statically allocated memory in the text segment.
 1.8 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.7 21-Apr-2020  msaitoh Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It was also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
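
The CPUID 0x15 arithmetic can be sketched as below. This is only an illustration of the calculation, not the code added in identcpu_subr.c: the helper name and the fallback_crystal_hz parameter are hypothetical. It assumes the register layout documented in the Intel SDM for leaf 0x15 (EAX = ratio denominator, EBX = ratio numerator, ECX = core crystal clock in Hz, with 0 meaning "not enumerated").

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the TSC frequency in Hz from CPUID leaf 0x15.
 * fallback_crystal_hz stands in for a per-model crystal frequency taken
 * from documentation when ECX is 0. Returns 0 if the frequency is unknown.
 */
static uint64_t
tsc_hz_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx,
    uint64_t fallback_crystal_hz)
{
	uint64_t crystal_hz = (ecx != 0) ? ecx : fallback_crystal_hz;

	/* A zero numerator or denominator means the ratio is not enumerated. */
	if (eax == 0 || ebx == 0 || crystal_hz == 0)
		return 0;

	/* TSC Hz = crystal Hz * (EBX / EAX); multiply first to keep precision. */
	return crystal_hz * ebx / eax;
}
```

With a 24 MHz crystal and a ratio of 168/2, for example, this gives 2016000000 Hz.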
 1.6 07-Dec-2019  nonaka branches: 1.6.6;
Get a Hyper-V virtual processor id in cpu_hatch().

Until now, it was obtained in the config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.
 1.5 30-Nov-2019  nonaka Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.4 03-Jun-2019  nonaka branches: 1.4.2; 1.4.4;
Use efi_probe().
 1.3 30-May-2019  nonaka Avoid undefined reference to `hyperv_guid_video' without vmbus(4).
 1.2 24-May-2019  nonaka Added drivers for Hyper-V Synthetic Keyboard and Video device.
 1.1 15-Feb-2019  nonaka branches: 1.1.2;
Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" in efiboot.
 1.1.2.5 05-Aug-2020  martin Pull up the following revisions, requested by msaitoh in ticket #1593:

sys/arch/x86/conf/files.x86 1.108
sys/arch/x86/include/apicvar.h 1.7 via patch
sys/arch/x86/include/cpu.h 1.121
sys/arch/x86/x86/cpu.c 1.185 via patch
sys/arch/x86/x86/hyperv.c 1.7
sys/arch/x86/x86/tsc.c 1.41
sys/arch/xen/conf/files.xen 1.181

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.1.2.4 05-Dec-2019  bouyer Pull up following revision(s) (requested by nonaka in ticket #1466):
sys/arch/x86/x86/hyperv.c: revision 1.5
sys/arch/x86/include/genfb_machdep.h: revision 1.4
sys/arch/x86/x86/genfb_machdep.c: revision 1.15
Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.1.2.3 12-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1280):

sys/arch/x86/x86/consinit.c: revision 1.29
sys/dev/hyperv/vmbusvar.h: revision 1.2
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/arch/x86/x86/x86_autoconf.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.91
sys/arch/x86/x86/hyperv.c: revision 1.2
sys/arch/x86/x86/hyperv.c: revision 1.3
sys/arch/x86/x86/hyperv.c: revision 1.4
sys/arch/i386/conf/GENERIC: revision 1.1207
sys/dev/wscons/wsconsio.h: revision 1.123
sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.528
sys/dev/hyperv/files.hyperv: revision 1.2
sys/arch/x86/include/autoconf.h: revision 1.6
sys/dev/hyperv/hyperv_common.c: revision 1.2
sys/arch/xen/x86/autoconf.c: revision 1.23
sys/arch/x86/pci/pci_machdep.c: revision 1.86
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hypervvar.h: revision 1.2
sys/dev/acpi/vmbus_acpi.c: revision 1.2
sys/dev/hyperv/vmbus.c: revision 1.3
sys/dev/hyperv/hvkbdvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.1.2.2 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" in efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.1.2.1 15-Feb-2019  martin file hyperv.c was added on branch netbsd-8 on 2019-03-09 17:10:19 +0000
 1.4.4.2 15-Jul-2020  martin Pull up the following, requested by msaitoh in ticket #1015

sys/arch/x86/conf/files.x86 1.108 (via patch)
sys/arch/x86/include/apicvar.h 1.7 (via patch)
sys/arch/x86/include/cpu.h 1.121 (via patch)
sys/arch/x86/x86/cpu.c 1.185 (via patch)
sys/arch/x86/x86/hyperv.c 1.7 (via patch)
sys/arch/x86/x86/tsc.c 1.41 (via patch)
sys/arch/xen/conf/files.xen 1.181 (via patch)

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.4.4.1 08-Dec-2019  martin Pull up following revision(s) (requested by nonaka in ticket #502):
sys/arch/x86/x86/hyperv.c: revision 1.5
sys/arch/x86/include/genfb_machdep.h: revision 1.4
sys/arch/x86/x86/genfb_machdep.c: revision 1.15
Prevent panic when attaching genfb if using a serial console with Hyper-V Gen.2.
 1.4.2.4 21-Apr-2020  martin Sync with HEAD
 1.4.2.3 08-Apr-2020  martin Merge changes from current as of 20200406
 1.4.2.2 10-Jun-2019  christos Sync with HEAD
 1.4.2.1 03-Jun-2019  christos file hyperv.c was added on branch phil-wifi on 2019-06-10 22:06:53 +0000
 1.6.6.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.12.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.15.4.1 18-Oct-2023  martin Pull up following revision(s) (requested by bouyer in ticket #425):

sys/arch/x86/pci/pci_machdep.c: revision 1.96
sys/arch/x86/acpi/acpi_machdep.c: revision 1.36
sys/arch/x86/x86/hyperv.c: revision 1.16
sys/arch/x86/x86/genfb_machdep.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.56
sys/arch/x86/include/genfb_machdep.h: revision 1.6

Declare
int acpi_md_vesa_modenum;
int acpi_md_vbios_reset;
struct vcons_screen x86_genfb_console_screen;

in genfb_machdep.h instead of locally as extern in various .c files.
 1.16.6.1 02-Aug-2025  perseant Sync with HEAD
 1.1 15-Feb-2019  nonaka branches: 1.1.2; 1.1.6;
Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" in efiboot.
 1.1.6.2 10-Jun-2019  christos Sync with HEAD
 1.1.6.1 15-Feb-2019  christos file hypervreg.h was added on branch phil-wifi on 2019-06-10 22:06:53 +0000
 1.1.2.2 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" in efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.1.2.1 15-Feb-2019  martin file hypervreg.h was added on branch netbsd-8 on 2019-03-09 17:10:19 +0000
 1.2 07-Dec-2019  nonaka Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.
 1.1 24-May-2019  nonaka branches: 1.1.2; 1.1.4;
Added drivers for Hyper-V Synthetic Keyboard and Video device.
 1.1.4.4 13-Jun-2019  martin Oops, fix misapplied patch from ticket #1281
 1.1.4.3 12-Jun-2019  martin Missing files from commit for ticket #1280:

sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hvkbdvar.h: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.1.4.2 12-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1280):

sys/arch/x86/x86/consinit.c: revision 1.29
sys/dev/hyperv/vmbusvar.h: revision 1.2
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/arch/x86/x86/x86_autoconf.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.91
sys/arch/x86/x86/hyperv.c: revision 1.2
sys/arch/x86/x86/hyperv.c: revision 1.3
sys/arch/x86/x86/hyperv.c: revision 1.4
sys/arch/i386/conf/GENERIC: revision 1.1207
sys/dev/wscons/wsconsio.h: revision 1.123
sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.528
sys/dev/hyperv/files.hyperv: revision 1.2
sys/arch/x86/include/autoconf.h: revision 1.6
sys/dev/hyperv/hyperv_common.c: revision 1.2
sys/arch/xen/x86/autoconf.c: revision 1.23
sys/arch/x86/pci/pci_machdep.c: revision 1.86
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hypervvar.h: revision 1.2
sys/dev/acpi/vmbus_acpi.c: revision 1.2
sys/dev/hyperv/vmbus.c: revision 1.3
sys/dev/hyperv/hvkbdvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.1.4.1 24-May-2019  martin file hypervvar.h was added on branch netbsd-8 on 2019-06-12 10:17:32 +0000
 1.1.2.3 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1.2.2 10-Jun-2019  christos Sync with HEAD
 1.1.2.1 24-May-2019  christos file hypervvar.h was added on branch phil-wifi on 2019-06-10 22:06:53 +0000
 1.25 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.24 21-Apr-2020  msaitoh Whitespace fix. No functional change.
 1.23 11-Feb-2019  cherry branches: 1.23.10;
We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.22 25-Dec-2018  cherry Excise XEN specific code out of x86/x86/intr.c into xen/x86/xen_intr.c

While at it, separate the source function tracking so that the interrupt
paths are truly independent.

Use weak symbol exporting to provision for future PVHVM co-existence
of both files, but with independent paths. Introduce assembler code
such that in a unified scenario, native interrupts get first priority
in spllower(), followed by XEN event callbacks. IPL management and
semantics are unchanged - native handlers and xen callbacks are
expected to maintain their ipl related semantics.

In summary, after this commit, native and XEN now have completely
unrelated interrupt handling mechanisms, including
intr_establish_xname() and assembler stubs and intr handler
management.

Happy Christmas!
 1.21 08-Oct-2018  cherry Clean up XEN specific stuff from the apic code, and move to intr.c

No functional change.
 1.20 07-Oct-2018  cherry In the case of a shared GSI, bind will fail, so we do not attempt this.
The sharing is accomplished by demultiplexing the port event of the first
bind. This is accomplished in intr.c:intr_establish_xname()

Note that pic_delroute() is buggy (commented suitably) for the shared
GSI case, since it will unbind and reset it unconditionally, leaving the
other shared callbacks stranded.

This problem will go away when we unify further with native code, as this
case is taken care of appropriately there.
 1.19 07-Oct-2018  cherry While we're here, fix pic->pic_delroute() to DTRT on XEN and
cleanup after itself.
 1.18 07-Oct-2018  cherry Switch over to a "GSI" concept for guest irqs.

On XEN there is a namespace called GSI which includes:

i) legacy_irq (0 - 16)
ii) "gsi" (16-nr_irqs_gsi)
iii) msi

We try to mirror this in guest space, but are mindful that legacy_irq
is 1:1 bound to actual hardware legacy_irq. Apart from this, XEN doesn't
really care what number scheme we use, as long as it doesn't encroach
on the MSI space, which is TBD for us.

Thus we trust the mpbios.c/mpacpi.c code to correctly map the pic,pin
tuples into the correct global gsi space, which we then register with
xen. As we now do, we allow for duplicate gsi registrations, in case
any hardware shares the same (pic,pin);

This enables us to now use the (pic,pin) tuple as the canonical reference
for device interrupt addresses, and leave any global mappings to specific
code. Thus xen_pic_to_gsi().

Note that this requires separate support for MSI, which I will get around to
once things stabilise - however the API change facilitates this nicely.

I note that the msi addroute() function does not use the "pin" parameter.
This can be made use of, to encode the gsi number, for XEN. This is however
TBD.

We further tweak the xen_vec_alloc() code to be uniform for the NIOAPICS
and other cases, and ensure that i8259.c DTRT wrt to route().

This will allow us to use pic->pic_addroute() without needing to worry about
pic specific issues.

The next step is to consolidate the pic_addroute() XEN related #ifdefs into
a -DXEN specific file, so that we don't clutter x86/ code with #ifdef XENs.

This change has functional implications, and there is likely breakage coming
especially on bespoke platforms that I haven't been able to test yet.

I am especially interested in bug reports from platforms with legacy (esp. i386)
and with multiple ioapics.
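The three-part GSI namespace listed in this entry can be illustrated with a small classifier. This is a sketch only: the names are hypothetical, and the exact boundaries (legacy IRQs as 0..15, "gsi" as 16..nr_irqs_gsi-1, everything above as MSI) are an assumption, since the ranges given in the message overlap at 16.

```c
/* Hypothetical names; boundaries are an assumed reading of the log. */
enum ex_gsi_class { EX_GSI_LEGACY, EX_GSI_IOAPIC, EX_GSI_MSI };

#define EX_NUM_LEGACY_IRQS	16

/*
 * Classify a global interrupt number into the namespace described in
 * the commit message: legacy IRQs first, then "gsi", then MSI space.
 */
static enum ex_gsi_class
ex_gsi_classify(unsigned int gsi, unsigned int nr_irqs_gsi)
{
	if (gsi < EX_NUM_LEGACY_IRQS)
		return EX_GSI_LEGACY;	/* 1:1 with hardware legacy IRQs */
	if (gsi < nr_irqs_gsi)
		return EX_GSI_IOAPIC;
	return EX_GSI_MSI;		/* must not be encroached upon */
}
```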
 1.17 17-Feb-2018  maxv branches: 1.17.2; 1.17.4;
Rename i8259_stubs -> legacy_stubs. We will want the entries to have the
same name, eg:

legacy_stubs
-> Xintr_legacy0, Xrecurse_legacy0, Xresume_legacy0
-> Xintr_legacy1, Xrecurse_legacy1, Xresume_legacy1
...
 1.16 06-Nov-2013  mrg gcc 4.8 issues:
- avoid running over the end of an array (this is a real bug, but
i didn't really look closely at what memory is clobbered. it
may not actually matter.)
- move variables inside their #if usage.
 1.15 18-Dec-2008  cegger branches: 1.15.14; 1.15.24; 1.15.28;
remove unused malloc.h
 1.14 03-Jul-2008  drochner branches: 1.14.4;
Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.13 16-Apr-2008  cegger branches: 1.13.4; 1.13.6; 1.13.8;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.12 17-Oct-2007  garbled branches: 1.12.16;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.11 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.10 16-Nov-2006  christos branches: 1.10.8; 1.10.16; 1.10.26; 1.10.28; 1.10.30;
__unused removal on arguments; approved by core.
 1.9 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.8 04-Jul-2006  christos branches: 1.8.4; 1.8.6;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.7 19-May-2006  tsutsui branches: 1.7.2; 1.7.4;
Use macro defined in <dev/ic/i8259reg.h>. Same binaries are generated.
 1.6 11-Dec-2005  christos branches: 1.6.4; 1.6.6; 1.6.8; 1.6.12;
merge ktrace-lwp.
 1.5 10-Apr-2004  kochi branches: 1.5.12;
use designated initializer for struct pic initializers.
just for readability.
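For readers unfamiliar with the C99 feature this entry refers to, here is a minimal sketch of a designated initializer. The structure below is a made-up stand-in, not the kernel's real struct pic; only the initializer syntax is the point.

```c
/* Hypothetical structure for illustration; not the real struct pic. */
struct ex_pic {
	const char *pic_name;
	unsigned int pic_masked;	/* bitmask of masked pins */
	void (*pic_hwmask)(struct ex_pic *, int);
	void (*pic_hwunmask)(struct ex_pic *, int);
};

static void
ex_hwmask(struct ex_pic *pic, int pin)
{
	pic->pic_masked |= 1u << pin;
}

static void
ex_hwunmask(struct ex_pic *pic, int pin)
{
	pic->pic_masked &= ~(1u << pin);
}

/*
 * Each member is named explicitly, so the initializer stays correct if
 * members are reordered; members not listed are zero-initialized.
 */
static struct ex_pic ex_pic_instance = {
	.pic_name = "example",
	.pic_hwmask = ex_hwmask,
	.pic_hwunmask = ex_hwunmask,
};
```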
 1.4 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.3 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.2 02-Mar-2003  fvdl branches: 1.2.2;
Clean up some unneeded "mca.h" and "eisa.h" includes, make one that is
needed dependent on !__x86_64__. To be revisited later.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.12.3 27-Oct-2007  yamt sync with head.
 1.5.12.2 30-Dec-2006  yamt sync with head.
 1.5.12.1 21-Jun-2006  yamt sync with head.
 1.6.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.6.8.2 11-Aug-2006  yamt sync with head
 1.6.8.1 24-May-2006  yamt sync with head.
 1.6.6.1 01-Jun-2006  kardel Sync with head.
 1.6.4.1 09-Sep-2006  rpaulo sync with head
 1.7.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.7.2.2 19-May-2006  tsutsui Use macro defined in <dev/ic/i8259reg.h>. Same binaries are generated.
 1.7.2.1 19-May-2006  tsutsui file i8259.c was added on branch chap-midi on 2006-05-19 13:59:51 +0000
 1.8.6.2 10-Dec-2006  yamt sync with head.
 1.8.6.1 22-Oct-2006  yamt sync with head
 1.8.4.1 18-Nov-2006  ad Sync with head.
 1.10.30.1 06-Oct-2007  yamt sync with head.
 1.10.28.1 06-Nov-2007  matt sync with HEAD
 1.10.26.1 02-Oct-2007  joerg Sync with HEAD.
 1.10.16.1 03-Oct-2007  garbled Sync with HEAD
 1.10.8.1 09-Oct-2007  ad Sync with head.
 1.12.16.3 17-Jan-2009  mjf Sync with HEAD.
 1.12.16.2 28-Sep-2008  mjf Sync with HEAD.
 1.12.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.8.1 03-Jul-2008  simonb Sync with head.
 1.13.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.13.4.1 04-May-2009  yamt sync with head.
 1.14.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.15.28.1 18-May-2014  rmind sync with head
 1.15.24.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.15.14.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.17.4.2 21-Apr-2020  martin Sync with HEAD
 1.17.4.1 10-Jun-2019  christos Sync with HEAD
 1.17.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.17.2.1 20-Oct-2018  pgoyette Sync with head
 1.23.10.3 25-Apr-2020  bouyer sync with bouyer-xenpvh-base2 (HEAD)
 1.23.10.2 19-Apr-2020  bouyer Add per-PIC callbacks for interrupt_get_devname(), interrupt_get_assigned()
and interrupt_get_count(). Implement Xen-specific callbacks for
PIC_XEN and use the x86 one for others.
In event_set_handler(), call intr_allocate_io_intrsource() so that
events appear in the interrupt list (intrctl list).
 1.23.10.1 12-Apr-2020  bouyer Get rid of xen-specific ci_x* interrupt handling:
- use the general SIR mechanism, reserving 3 more slots for IPL_VM, IPL_SCHED
and IPL_HIGH
- remove specific handling from C sources, or change to ipending
- convert IPL number to SIR number in various places
- Remove XUNMASK/XPENDING in assembly or change to IUNMASK/IPENDING
- remove Xen-specific ci_xsources, ci_xmask, ci_xunmask, ci_xpending from
struct cpu_info
- for now remove a KASSERT that there are no pending interrupts in
idle_block(). We can get there with some software interrupts pending
in autoconf XXX needs to be looked at.
 1.14 04-Mar-2011  jruoho Move INTEL_ONDEMAND_CLOCKMOD -- or odcm(4) -- to the cpufeaturebus.
 1.13 05-Oct-2009  rmind branches: 1.13.4; 1.13.6; 1.13.8;
Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.12 11-May-2008  ad branches: 1.12.12;
Simplify x86 identcpu code, and share between i386/amd64.
 1.11 16-Apr-2008  cegger branches: 1.11.2; 1.11.4; 1.11.6;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.10 02-Dec-2007  rumble branches: 1.10.14;
Fix two minor bugs:
1) If ODCM is disabled (ODCM_ENABLE not set), clockmod_getstate() should
return the maximum level (7), not the lowest (0), as the levels are
defined as duty cycles where the highest implies no ODCM. Now sysctl
machdep.clockmod.current doesn't lie upon init.

2) Make the sysctl handler ensure that no disabled levels are permitted.
Previously, a level disabled due to errata could be passed to
clockmod_setstate(), which would search through the state array,
skipping the unusable value. Consequently our index would be out of
range and badness could ensue.

Okay'd by xtraeme@.
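The two fixes in this entry can be sketched as follows. The table layout and all names here are hypothetical; only the behaviour (report the maximum level 7 when ODCM is disabled, and reject levels marked unusable) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_ODCM_MAXLEVEL	7	/* highest duty cycle: no modulation */

/* Hypothetical state table entry. */
struct ex_odcm_state {
	int level;
	bool errata;	/* true if the level is disabled due to errata */
};

/* Fix 1: with ODCM disabled, report the maximum level, not 0. */
static int
ex_odcm_getstate(bool odcm_enabled, int hw_level)
{
	return odcm_enabled ? hw_level : EX_ODCM_MAXLEVEL;
}

/* Fix 2: accept a requested level only if it exists and is usable. */
static bool
ex_odcm_level_valid(const struct ex_odcm_state *tbl, size_t n, int level)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].level == level)
			return !tbl[i].errata;
	}
	return false;
}
```

Validating before calling the set routine means the search through the state array can never walk past its end looking for a skipped entry.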
 1.9 17-Oct-2007  garbled branches: 1.9.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.8 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.7 04-Apr-2007  rmind branches: 1.7.2; 1.7.6; 1.7.8; 1.7.16; 1.7.18; 1.7.20; 1.7.22;
clockmod_sysctl_helper: For the sake of clarity - avoid magic number.
No functional change.
 1.6 25-Mar-2007  xtraeme Add another member to struct cpu_msr_broadcast, msr_read that will
enable the rdmsr call in msr_write_ipi(), so that when it's not
defined we don't read it before writing; disabled in powernow_k8
and enabled in the others.
 1.5 21-Mar-2007  xtraeme branches: 1.5.2;
Remove the MSR read IPI handler, there won't be any driver that will
use it, and we can see if the values are ok in the CPUs in the write
operation.

Suggested by YAMAMOTO Takashi.
 1.4 21-Mar-2007  xtraeme There's no need to use MSR_CPU_BROADCAST_READ in clockmod_getstate(),
because clockmod_setstate() will do it for us.
 1.3 21-Mar-2007  xtraeme Use the CPUID2STEPPING macro.
 1.2 21-Mar-2007  xtraeme Do not use cpu_id and cpu_feature, they are not available for i386
(or have different types), use CPUID.
 1.1 20-Mar-2007  xtraeme Driver for Intel Thermal Monitor (feature TM) On-Demand Clock
Modulation.

This works by changing the duty cycle of the clock modulation,
and saves power and helps to not increase the temperature by
software.

Adapted from OpenBSD/FreeBSD's p4tcc.

To enable it one must use "options INTEL_ONDEMAND_CLOCKMOD".

Tested by me in UP and SMP, ok'ed by Matthew R. Green.
 1.5.2.3 15-Apr-2007  yamt sync with head.
 1.5.2.2 24-Mar-2007  yamt sync with head.
 1.5.2.1 21-Mar-2007  yamt file iclockmod.c was added on branch yamt-idlelwp on 2007-03-24 14:55:06 +0000
 1.7.22.1 06-Oct-2007  yamt sync with head.
 1.7.20.4 07-Dec-2007  yamt sync with head
 1.7.20.3 27-Oct-2007  yamt sync with head.
 1.7.20.2 03-Sep-2007  yamt sync with head.
 1.7.20.1 04-Apr-2007  yamt file iclockmod.c was added on branch yamt-lazymbuf on 2007-09-03 14:31:25 +0000
 1.7.18.2 09-Jan-2008  matt sync with HEAD
 1.7.18.1 06-Nov-2007  matt sync with HEAD
 1.7.16.2 03-Dec-2007  joerg Sync with HEAD.
 1.7.16.1 02-Oct-2007  joerg Sync with HEAD.
 1.7.8.2 11-Jul-2007  mjf Sync with head.
 1.7.8.1 04-Apr-2007  mjf file iclockmod.c was added on branch mjf-ufs-trans on 2007-07-11 20:03:19 +0000
 1.7.6.1 03-Oct-2007  garbled Sync with HEAD
 1.7.2.4 03-Dec-2007  ad Sync with HEAD.
 1.7.2.3 09-Oct-2007  ad Sync with head.
 1.7.2.2 10-Apr-2007  ad Sync with head.
 1.7.2.1 04-Apr-2007  ad file iclockmod.c was added on branch vmlocking on 2007-04-10 13:22:45 +0000
 1.9.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.10.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.11.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.11.4.2 11-Mar-2010  yamt sync with head
 1.11.4.1 16-May-2008  yamt sync with head.
 1.11.2.1 18-May-2008  yamt sync with head.
 1.12.12.2 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation forcibly flushes them on suspend.
It's not really needed.
 1.12.12.1 01-Nov-2009  jym Sync with HEAD.
 1.13.8.1 05-Mar-2011  bouyer Sync with HEAD
 1.13.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.13.4.1 05-Mar-2011  rmind sync with head
 1.138 01-May-2025  imil Avoid redundant and incorrect hypervisor detection in second
identify_hypervisor() call

The second call to identify_hypervisor() during autoconf needlessly repeats
hypervisor detection logic and could incorrectly override vm_guest values
(e.g., replacing VM_GUEST_KVM with VM_GUEST_VM). This change ensures that
if vm_guest is already set to a known hypervisor, the second pass skips
detection, preserving correct identification from the early call.

Tested on NetBSD/amd64, NetBSD/i386 with QEMU/KVM and QEMU/NVMM (BIOS and PVH),
Xen domU and Xen PVH.
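The guard described in this entry might look roughly like the sketch below. The enum values and function are hypothetical stand-ins, and treating "known hypervisor" as anything other than "none" or the generic VM value is an assumption about the commit's intent.

```c
/* Hypothetical stand-ins for the kernel's vm_guest values. */
enum ex_vm_guest {
	EX_VM_GUEST_NO,		/* no hypervisor detected */
	EX_VM_GUEST_VM,		/* generic, unidentified hypervisor */
	EX_VM_GUEST_KVM		/* a specifically identified hypervisor */
};

/*
 * Second-pass detection: keep a specific result from the early pass
 * and only fall back to the later, coarser result otherwise.
 */
static enum ex_vm_guest
ex_identify_second_pass(enum ex_vm_guest early, enum ex_vm_guest late)
{
	if (early != EX_VM_GUEST_NO && early != EX_VM_GUEST_VM)
		return early;	/* already known: skip detection */
	return late;
}
```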
 1.137 01-May-2025  imil Introduce cpu_max_hypervisor_cpuid to cache hypervisor CPUID leaf

This variable stores the maximum supported hypervisor CPUID leaf so that
future checks can avoid repeated calls to x86_cpuid().
 1.136 24-Apr-2025  riastradh x86/identcpu.c: Sort includes.

No functional change intended.

Preparation for:

PR port-amd64/59299: Support Intel AMX CPU state (TILECFG/TILEDATA)
 1.135 24-Apr-2025  riastradh x86: Reserve space for only the extended CPU state we will use.

CPUID[EAX=0x0d, ECX=0].ECX, i.e., the value of descs[2] after
x86_cpuid2(0x0d, 0, descs), gives the size in bytes of the extended CPU
state for all features supported by the hardware in CPUID[EAX=0x0d,
ECX=0].EAX which can be enabled in XCR0. However, on i386, it is
senseless to leave TILECFG and TILEDATA enabled, because they are state
for Intel AMX instructions which work only in 64-bit mode.

So, instead of querying the hardware's supported features and maximum
_supported_ extended CPU state size:

1. Query the hardware's supported features.
2. Enable only those supported by software as well (XCR0_FPU).
3. Query the hardware's maximum _enabled_ extended CPU state size.

We will also disable TILECFG and TILEDATA on amd64 for now too
because:

(a) This is not a regression, at least for TILEDATA (and I'm not sure
any machines ship with TILECFG but not TILEDATA), because the
size overflowed the PCB page and therefore never worked on amd64
(PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu).

(b) We need a little extra work to properly support reading and
writing a process's TILECFG and TILEDATA in ptrace(2), and that
work hasn't been done yet.

While here, write out x86_cpuid2(0x0d, <ecx>, ...) explicitly, rather
than x86_cpuid(0x0d, ...), to make it clear that ECX must be set --
otherwise we may get garbage. (It is, perhaps, an accident that
x86_cpuid(<eax>, ...) always sets ECX=0, but other CPUID access
paths, like gcc's <cpuid.h> __cpuid(<eax>, ...), do not, so let's
make it clear for the reader.)

XXX When we enable TILECFG and TILEDATA in amd64, we should arrange
to disable them in compat32 processes -- no sense in allocating extra
space for state they can't use anyway, since the Intel AMX
instructions work only in 64-bit mode. However, selectively
disabling them in some contexts might require hardware support for
XFD, Extended Feature Disable, which is another kettle of fish to
deal with.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
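
The three-step sizing procedure above can be illustrated with a small model. It is a sketch under stated assumptions, not the kernel implementation: the component table is hard-coded here (on real hardware the offsets and sizes come from CPUID leaf 0x0d sub-leaves), the TILECFG/TILEDATA layout values are assumed for illustration, and the function names are hypothetical.

```python
# Feature bits as they appear in XCR0.
X87, SSE, AVX = 1 << 0, 1 << 1, 1 << 2
TILECFG, TILEDATA = 1 << 17, 1 << 18

# Legacy FXSAVE region (512 bytes) plus the XSAVE header (64 bytes).
LEGACY_AREA = 512 + 64

# bit number -> (offset, size) in bytes; the AMX entries are assumed values.
COMPONENTS = {
    2: (576, 256),      # AVX
    17: (2752, 64),     # TILECFG (assumed layout)
    18: (2816, 8192),   # TILEDATA (assumed layout)
}

PAGE_SIZE = 4096  # the save area has to fit in the PCB page


def enabled_features(hw_supported, sw_supported):
    """Step 2: enable only what both hardware and software support."""
    return hw_supported & sw_supported


def save_area_size(enabled):
    """Step 3: size of the area needed for the *enabled* components."""
    size = LEGACY_AREA
    for bit, (offset, length) in COMPONENTS.items():
        if enabled & (1 << bit):
            size = max(size, offset + length)
    return size


hw = X87 | SSE | AVX | TILECFG | TILEDATA   # step 1: what the CPU reports
sw = X87 | SSE | AVX                        # what the kernel chooses to use
print(save_area_size(hw), save_area_size(enabled_features(hw, sw)))
```

Sizing for everything the hardware supports gives an area larger than one page in this model, while sizing for the enabled subset gives 832 bytes, the same figure quoted for features = 7 in the 1.41 entry of this log.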
 1.134 22-Apr-2025  imil NVMM hypervisor identification, KVM and GenPVH identification fixes

arch/x86/include/cpu.h, arch/x86/x86/identcpu.c: Enable NVMM hypervisor
discovery
arch/x86/x86/identcpu.c: Fix vm_guest_t for KVM in vm_system_products
arch/x86/x86/x86_machdep.c: Add NVMM and GenPVH in vm_guest_name
 1.133 17-Jan-2025  riastradh x86/identcpu.c: Add archive link just in case.

Refill paragraph while here to avoid overlong lines.
 1.132 13-Jan-2025  andvar Remove stepping check for APL30 Errata. Issue also affects newer Apollo Lake
CPUs. Therefore, the stepping check is unnecessary.

Include a reference to the errata and provide a description to clarify
the nature of the issue.

Should fix PR port-amd64/58982 reported by Wolfgang Stukenbrock.
 1.131 02-Dec-2024  bouyer Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.
 1.130 01-Jul-2024  andvar Disable the VIA Alternate Instructions according to the VIA documentation:
* C7 and above do not support ALTINST, do not check or attempt to disable them.
* For VIA C3 Nehemiah check extended feature flags for support and status,
do not attempt to disable when AIS is not supported or enabled.
* For pre-Nehemiah models explicitly disable, if they are in the range
of documented models, flags aren't present to check the status on these models.
Note: on pre-Nehemiah models there may be other functional side effects,
depending on the version and stepping.

Explicit disabling of ALTINST was introduced with rev. 1.84 following
the discovery of some VIA CPUs having these instructions enabled by default
leading to the potential backdoor (aka rosenbridge).

Unfortunately, the implementation used the wrong check (ACE supported flag),
which can be true for later models that still support padlock features.
Setting ALTINST bit on those may have unexpected side effects like VIA C7 CPUID
instruction for temperature sensor not reporting correct value or
`cpuctl identify' not reporting certain CPU features. Similar side effects
can be observed even for Nehemiah models not supporting AIS instructions. This
change should limit possibility of such issues to only the pre-Nehemiah models,
not covered at all in the previous implementation.

Feature Control Register (FCR) macros were unified under one group and
consistent naming while implementing the change. Few comments updated as well.

patch reviewed by Riastradh@ (thank you)

need pullups to netbsd-9, 10.

PR kern/58370
 1.129 30-Jun-2024  andvar Move determination of the largest VIA CPU extended function value
to the intended place where the checks are performed.

Currently the value can be overridden while checking for the padlock features,
and failing the check for max function value as a result.
 1.128 17-Oct-2023  riastradh branches: 1.128.6;
Revert "x86: Panic early if fpu save size is too large, take 2."

Apparently this is too early to print anything useful, so it just
causes a reboot loop.

PR kern/57661
 1.127 17-Oct-2023  riastradh x86: Panic early if fpu save size is too large, take 2.

This shouldn't break any existing systems (for real this time), but
it should make the failure mode more obvious on systems that are
already broken.

PR kern/57661

XXX pullup-10
XXX pullup-9
XXX pullup-8
 1.126 17-Oct-2023  riastradh x86: Remove incomplete fpu save size check.

Will fix it later, but this makes pullups easier.
 1.125 15-Oct-2023  riastradh x86: Disable savefpu size check for now.

This is apparently so broken that the error check for what should
have been a safe size fails, which is breaking boot on x86 all the
way back to Sandy Bridge at this point. Grrr.

We need to expand savefpu so that it supports the maximum size
instead.
 1.124 15-Oct-2023  riastradh x86: Panic if cpuid's fpu save size is larger than we support.

Ideally this wouldn't panic, but the alternative right now is to
crash in a memset later -- or silently corrupt kernel memory -- so
this doesn't make the situation worse than it was before.

PR kern/57661

XXX pullup-10
XXX pullup-9
XXX pullup-8
 1.123 07-Oct-2021  msaitoh branches: 1.123.4;
Move some common functions into x86/identcpu_subr.c. No functional change.
 1.122 07-Oct-2021  msaitoh KNF. No functional change.
 1.121 12-Apr-2021  mrg make a numeric literal unsigned as it is bit-negated.
 1.120 06-Mar-2021  bouyer branches: 1.120.2;
return early from identify_hypervisor() if we already know we're running
Xen PV or PVH, as this was before 1.119.
Trying to read the BIOS faults (as expected, as there's no BIOS in this case).
Problem pointed out and fix tested by Brian Marcotte
 1.119 19-Feb-2021  christos Identify VirtualBox as a separate guest type.
 1.118 27-Oct-2020  ryo branches: 1.118.2;
move vmt(4) from MD to MI, and add support for vmt on aarch64. tested on ESXi-Arm Fling

- move from sys/arch/x86/x86/{vmt.c,vmtreg.h,vmtvar.h} to sys/dev/vmt/{vmt_subr.c,vmtreg.h,vmtvar.h},
and split the attach part of the cpufeaturebus and fdt
- add aarch64 vmware backdoor op
- add include guard to vmt{reg,var}.h
- Yet there is still some little-endian dependency. it needs to be fixed in order to work properly on aarch64eb
 1.117 05-Sep-2020  maxv x86: fix several CPUID flags

- Rename: CPUID_PN -> CPUID_PSN
CPUID_CFLUSH -> CPUID_CLFSH
CPUID_SBF -> CPUID_PBE
CPUID_LZCNT -> CPUID_ABM
CPUID_P1GB -> CPUID_PAGE1GB
CPUID2_PCLMUL -> CPUID2_PCLMULQDQ
CPUID2_CID -> CPUID2_CNXTID
CPUID2_xTPR -> CPUID2_XTPR
CPUID2_AES -> CPUID2_AESNI
To match the x86 specification and the other OSes.

- Remove: CPUID_B10, CPUID_B20, CPUID_IA64. They do not exist.
 1.116 25-Jul-2020  riastradh Implement ChaCha with SSE2 on x86 machines.

Slightly disappointed that it only doubles, rather than quadruples,
throughput on my Ivy Bridge laptop. Worth investigating.
 1.115 25-Jul-2020  riastradh Nix outdated comment.

The substance of the change that introduced it was reverted, but I
neglected to revert the comment when reverting the substance.
 1.114 25-Jul-2020  riastradh Split aes_impl declarations out into aes_impl.h.

This will make it less painful to add more operations to struct
aes_impl without having to recompile everything that just uses the
block cipher directly or similar.
 1.113 20-Jul-2020  riastradh Revert 1.112 "Disable x86 in-kernel AES temporarily."

The bug in fpu_kern_enter motivating this appears to have been fixed.
 1.112 20-Jul-2020  riastradh Disable x86 in-kernel AES temporarily.

There's a bug in the FPU state handling that it triggers -- likely
limited to the softint path since I've only ever seen it on a system
using wifi configured with WPA2 and CCMP, which uses AES heavily in
softint.

This is to be reverted once we diagnose the bug. (There is also a
performance regression on wifi with WPA2 and CCMP, which I plan to
fix too once we figure out the FPU state handling bug.)
 1.111 29-Jun-2020  riastradh New permutation-based AES implementation using SSSE3.

This covers a lot of CPUs -- particularly lower-end CPUs over the
past decade which lack AES-NI.

Derived from Mike Hamburg's public domain vpaes software; see
<https://crypto.stanford.edu/vpaes/> for details.
 1.110 29-Jun-2020  riastradh New SSE2-based bitsliced AES implementation.

This should work on essentially all x86 CPUs of the last two decades,
and may improve throughput over the portable C aes_ct implementation
from BearSSL by

(a) reducing the number of vector operations in sequence, and
(b) batching four rather than two blocks in parallel.

Derived from BearSSL's aes_ct64 implementation adjusted so that where
aes_ct64 uses 64-bit q[0],...,q[7], aes_sse2 uses (q[0], q[4]), ...,
(q[3], q[7]), each tuple representing a pair of 64-bit quantities
stacked in a single 128-bit register. This translation was done very
naively, and mostly reduces the cost of ShiftRows and data movement
without doing anything to address the S-box or (Inv)MixColumns, which
spread all 64-bit quantities across separate registers and ignore the
upper halves.

Unfortunately, SSE2 -- which is all that is guaranteed on all amd64
CPUs -- doesn't have PSHUFB, which would help out a lot more. For
example, vpaes relies on that. Perhaps there are enough CPUs out
there with PSHUFB but not AES-NI to make it worthwhile to import or
adapt vpaes too.

Note: This includes local definitions of various Intel compiler
intrinsics for gcc and clang in terms of their __builtin_* &c.,
because the necessary header files are not available during the
kernel build. This is a kludge -- we should fix it properly; the
present approach is expedient but not ideal.
 1.109 29-Jun-2020  riastradh Add AES implementation with VIA ACE.
 1.108 29-Jun-2020  riastradh Add x86 AES-NI support.

Limited to amd64 for now. In principle, AES-NI should work in 32-bit
mode, and there may even be some 32-bit-only CPUs that support
AES-NI, but that requires work to adapt the assembly.
 1.107 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.106 20-Apr-2020  msaitoh Whitespace fix. No functional change.
 1.105 09-Apr-2020  christos flip the comparison again
 1.104 09-Apr-2020  christos use __arraycount, and fix comparison
 1.103 09-Apr-2020  christos Add EX2 for Vortex86 SoCs (Andrius V)
 1.102 04-Apr-2020  ad branches: 1.102.2;
Enable MONITOR/MWAIT idle on AMD chips, except some buggy Ryzens.
 1.101 03-Apr-2020  ad CPU topology makes almost no sense for Xen, and populates it with B/S values
 1.100 21-Dec-2019  ad Fix build break (ci->ci_dev is not available on every port).
 1.99 20-Dec-2019  ad Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.
 1.98 29-Oct-2019  maxv Enable XSAVEOPT.
 1.97 21-Oct-2019  maxv Call cpu_probe_fpu() only once (from cpu0), and style.
 1.96 03-Oct-2019  maxv Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
 1.95 12-Sep-2019  maxv Fix a normally harmless race: initialize several global variables only on
cpu0, so we don't get eg cpu1 re-initializing them while cpu0 is using
them.
 1.94 09-Sep-2019  msaitoh Call cpu_dcp_cacheinfo() only when the cpuid Topology Extension flag is set
on AMD processors.
 1.93 26-Jul-2019  msaitoh branches: 1.93.2;
- AMD CPUID Fn8000_001d Cache Topology Information leaf is almost the same as
Intel Deterministic Cache Parameter Leaf(0x04), so make new
cpu_dcp_cacheinfo() and share it.
- AMD's L2 and L3's cache descriptor's definition is the same, so use one
common definition.
- KNF.

XXX Split some common functions to new identcpu_subr.c or use #ifdef _KERNEL
... #endif in identcpu.c to share from both kernel and cpuctl?
 1.92 26-Jun-2019  mgorny Fetch XSAVE area component offsets and sizes when initializing x86 CPU

Introduce two new arrays, x86_xsave_offsets and x86_xsave_sizes,
and initialize them with XSAVE area component offsets and sizes queried
via CPUID. This will be needed to implement getters and setters for
additional register types.

While at it, add XSAVE_* constants corresponding to specific XSAVE
components.
 1.91 24-May-2019  nonaka Added drivers for Hyper-V Synthetic Keyboard and Video device.
 1.90 18-May-2019  maxv Enable EagerFPU by default. Sent on port-amd64@.
 1.89 15-May-2019  maxv Enable EagerFPU on Xen PV. Should work as-is. Sent on port-amd64@.
 1.88 11-Feb-2019  cherry Detect and report running in a XEN hvm container.

This allows the lapic code to apply its x2apic probe logic while
running in a XEN hvm container.
 1.87 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.86 13-Jan-2019  maxv On certain AMD f10h CPUs (like mine), the BIOS does not enable WC+. It
means that the guest pages that are WC+ become CD, and this degrades
performance of the guests.

Explicitly enable WC+.

While here clarify the AMD identification code.
 1.85 06-Jan-2019  maxv Handle the NVMM signature.
 1.84 16-Dec-2018  maxv Explicitly disable ALTINST on VIA, in case it isn't disabled by default
already (the 'VIA cpu backdoor').
 1.83 19-Nov-2018  jdolecek enable XSAVE (and hence AVX) under XEN by default, fixes PR kern/50332

okayed by maxv@
 1.82 10-Nov-2018  maxv Merge the VIA detection code into cpu_probe_c3.
 1.81 10-Nov-2018  maxv Declare the MSR_VIA_ACE values as macros, and use a consistent naming,
similar to the rest of the file.

I'm wondering if I'm not fixing a huge bug here. The ECX8 value we were
using was wrong: ECX8 is bit 1, not bit 0. Bit 0 is ALTINST, an alternate
ISA, which is now known to be backdoored.

So it looks like we were explicitly enabling the backdoor.

Not tested, because I don't have a VIA cpu.
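
The off-by-one bit described above is easy to see in a tiny model. This is only a sketch: the constant names are hypothetical, and the bit positions are taken from the commit message (ALTINST at bit 0, ECX8 at bit 1).

```python
# Hypothetical names for the two low bits of the VIA feature control MSR,
# using the bit positions given in the commit message.
FCR_ALTINST = 1 << 0   # alternate instruction set
FCR_ECX8 = 1 << 1      # cmpxchg8b enable

WRONG_ECX8 = 1 << 0    # value the old constant effectively had


def set_bits(msr_value, bits):
    """Model of a read-modify-write that ORs bits into an MSR value."""
    return msr_value | bits


old = set_bits(0, WRONG_ECX8)   # intended to enable cmpxchg8b
new = set_bits(0, FCR_ECX8)     # corrected constant
print(bool(old & FCR_ALTINST), bool(new & FCR_ALTINST))
```

In this model the old constant turns on ALTINST and leaves ECX8 clear, which is the behaviour the message is worried about.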
 1.80 10-Nov-2018  maxv Remove unused cpu_msr.h includes.
 1.79 04-Jul-2018  maya Disable MWAIT/MONITOR on Apollo Lake CPUs to work around APL30 errata.

We use MWAIT/MONITOR to hatch secondary CPUs. The errata means that
the wakeup may not happen, so SMP boot fails.
Use wrmsr to disable it in hardware too, for extra paranoia.

PR port-amd64/53420,
also reported on netbsd-users by joern clausen and ssartor.
 1.78 01-Jul-2018  maxv Optimize FNSAVE. The size of its save area is 108 bytes, so don't set
x86_fpu_save_size = 512, because otherwise we uselessly memset extra
bytes at execve time.

While here use sizeof instead of hardcoded values.
 1.77 23-Jun-2018  jdolecek branches: 1.77.2;
re-do the XEN XSAVE support, this time to leave all probe code in
cpu_probe_fpu(), and have XEN cpu_init() just act

the xen probe is now guarded by XEN_USE_XSAVE option and XSAVE
support is thus still disabled by default (same as before), so it
wouldn't interfere with maxv's eager fpu rototil, while still
allowing testing for others

PR kern/50332
 1.76 23-Jun-2018  maxv Reorder the code a little. On Xen, return earlier, we don't need to do
the XSAVE-related initialization if we don't support XSAVE.
 1.75 23-Jun-2018  maxv Revert the rest of jdolecek's changes. This puts us back in a clean,
sensical state.
 1.74 22-Jun-2018  christos Handle more Vortex CPU's from Andrius V.
While here refactor the code to make it smaller.
 1.73 19-Jun-2018  jdolecek fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8
 1.72 17-Jun-2018  maxv Enable eager fpu automatically at boot time if the cpu is affected. Intel
hasn't published a list of its affected products, but it appears that Xen
was given this information since they have a specific detection code.

We could just unconditionally enable eager; but on x86_32 eager may have
a greater performance cost than lazy, and we don't want to lose
performance on unaffected (and ~old) CPUs running NetBSD/i386.

So use the same code as Xen: take Family 6, and whitelist certain models.
 1.71 30-Mar-2018  maxv Retrieve cpuid.7:%edx.
 1.70 12-Mar-2018  msaitoh s/CLFUSH/CLFLUSH/
No functional change.
 1.69 09-Feb-2018  maxv branches: 1.69.2;
Disable XSAVEOPT, until it is clear what's wrong with it (PR/52966).
 1.68 07-Feb-2018  maya stopgap fix: restrict XSAVEOPT to Intel CPUs

The current code causes floating point miscalculations on AMD Ryzen.
PR port-amd64/52966: amd64 FPU handling broken on AMD
 1.67 11-Nov-2017  maxv Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.
 1.66 11-Nov-2017  bouyer Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt
 1.65 08-Nov-2017  maxv Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.
 1.64 03-Nov-2017  kre Revert last and instead #include <x86/fpu.h> to get the needed prototype.
 1.63 03-Nov-2017  kre XEN apparently has no fpuinit_mxcsr_mask() and hasn't needed it until
now, so just omit this (as it was before) for XEN. If this is not
correct, someone who knows x86's & the XEN interface can fix it, but
at least it should build, and be essentially the same as before now.
 1.62 03-Nov-2017  maxv Fix MXCSR_MASK, it needs to be detected dynamically, otherwise when masking
MXCSR we are losing some features (eg DAZ).
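
A minimal sketch of why a fixed mask loses DAZ, assuming the static mask was the architectural default 0xFFBF. The MXCSR_MASK field sits at byte offset 28 of the FXSAVE image and a zero value there means the default applies; the function name and the synthetic image below are illustrative only.

```python
import struct

DEFAULT_MXCSR_MASK = 0x0000FFBF  # applies when the saved mask field is zero
MXCSR_DAZ = 1 << 6               # denormals-are-zero control bit


def mxcsr_mask_from_fxsave(image):
    """Read MXCSR_MASK (little-endian u32 at offset 28) from an FXSAVE image."""
    (mask,) = struct.unpack_from("<I", image, 28)
    return mask if mask != 0 else DEFAULT_MXCSR_MASK


# Synthetic 512-byte FXSAVE image from a CPU that supports DAZ.
image = bytearray(512)
struct.pack_into("<I", image, 28, 0x0000FFFF)

detected = mxcsr_mask_from_fxsave(image)
wanted = 0x1F80 | MXCSR_DAZ      # power-on MXCSR value plus DAZ

print(hex(wanted & DEFAULT_MXCSR_MASK), hex(wanted & detected))
```

Masking with the fixed default strips bit 6, while the mask read back from the save image preserves it.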
 1.61 31-Oct-2017  maxv Always use x86_fpu_save, clearer.
 1.60 09-Oct-2017  maya GC i386_fpu_present. no FPU x86 is not supported.

Also delete newly unused send_sigill
 1.59 09-Oct-2017  maya Don't probe for the no FPU case. it's not supported
 1.58 09-Oct-2017  maya Simplify macro logic of cpu_probe_old_fpu
 1.57 12-Sep-2017  msaitoh Set ci->ci_cflush_lsize correctly. This bug was added in the last commit(1.56).
 1.56 07-Sep-2017  msaitoh Define CPUID Fn00000001 %ebx bits and use them. No functional change.
 1.55 23-May-2017  nonaka branches: 1.55.2;
x86: hypervisor detection from FreeBSD for x2APIC support.
 1.54 10-May-2017  msaitoh Print package ID, core ID and SMT ID.
 1.53 16-Feb-2017  tls branches: 1.53.4;
On i386 (but not on amd64) we can enable SSE comparatively very late, when
probing/attaching the FPU. This is a problem for cpu_rng with the VIA
processors because, by design, cpu_rng attaches, and the entropy subsystem
starts up, very early.

If SSE is not enabled, calls to any "PadLock" instructions (ACE, RNG)
on the VIA processors will trap, per the manual:
linux.via.com.tw/support/beginDownload.action?eleid=181&fid=261

All VIA CPUs with PadLock, or which match the model/stepping test as
possibly having PadLock, have SSE. Just unconditionally enable it before
trying to turn the crypto block on.

Fixes crash at RNG attach time reported by Andrus V.; fix proposed by
jak@.
 1.52 02-Feb-2017  maxv Use __read_mostly on these variables, to reduce the probability of false
sharing.
 1.51 17-Dec-2016  maxv branches: 1.51.2;
Remove a wrong comment - the FPU save size should never be percpu -, and
be more explicit about Xen.
 1.50 01-Jan-2016  tls branches: 1.50.2;
Enable second noise source on newer VIA CPUs
 1.49 13-Dec-2015  maxv Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.48 08-Dec-2014  msaitoh Modify the code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.
 1.47 28-Oct-2014  riz branches: 1.47.2;
Work around the problem in PR port-amd64/49150 for all CPUs under Xen.

The problem (calling xrstor, which is privileged in Xen) has appeared
on some Intel CPUs as well, so implement the workaround (ensure that
x86_xsave_features is 0) for all CPUs, not just AMD CPUs.

XXX pullup to 7
 1.46 14-Oct-2014  jnemeth Force x86_xsave_features to 0 when running under XEN for AMD
processors. This prevents the use of xsave and xrstor thus fixing
the problem in PR/49150. The basic problem is that the way AMD
implements those instructions means that information can leak
between domains so XEN treats them as privileged.

XXX If anybody else comes up with a better / more "proper" fix, go
for it. However, this solves the problem I was having. And, given
that XEN being broken is pretty much a show-stopper for a release,
something needed to be done.
 1.45 08-Jul-2014  msaitoh branches: 1.45.2;
Add Vortex86EX.
 1.44 24-Mar-2014  christos branches: 1.44.2;
use cpu_{g,s}etmodel
 1.43 25-Feb-2014  dsl Fix a 'stupido' that stopped (amongst other things) the cpu brand string
being read.
The most obvious side effect was that the anita tests failed to detect they were
running under qemu - so reported failures under qemu for things
that qemu doesn't support.
 1.42 23-Feb-2014  dsl Rename (the recently added) 'x86_xsave_size' to 'x86_fpu_save_size'
and default to 512 (the size of the fxsave structure).
 1.41 23-Feb-2014  dsl Determine whether the cpu supports xsave (and hence AVX).
The result is only written to sysctl nodes at the moment.
I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
 1.40 22-Feb-2014  dsl Re-use the unused ci_cpu_serial[3] to save the highest cpuid values
for the normal and extended leafs.
(The 'normal' one might be lurking in the global cpulevel.)
Read the 'extended feature' from cpuid.80000001.%ecx/edx into
ci_feat_val[3/2] just after saving cpuid.1.%ecx/dx in ci_feat_val[1/0]
instead of doing it separately for amd k678 and via c3 processors
in their probe functions and repeating it for all cpus a few instructions
later when x86_cpu_topology() is called.
x86_cpu_topology() is only called from cpu_probe() and really doesn't
deserve its own source file. Chasing the setup code is bad enough anyway.
 1.39 23-Dec-2013  msaitoh CPUID leaf 2 and 4 are only for Intel processors.
Almost the same as usr.sbin/cpuctl/arch/i386.c rev. 1.52.
 1.38 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
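
The split between base, extended and display fields can be sketched as follows. The function names are lower-case stand-ins for the macros listed above and the bodies follow the usual CPUID signature layout; they are not copied from the NetBSD headers. The signature 0x6d0 is the VIA C7-M id shown in the 1.21 entry of this log, and the other two signatures are example values supplied for illustration.

```python
def cpuid_to_stepping(sig):
    return sig & 0xF


def cpuid_to_basemodel(sig):
    return (sig >> 4) & 0xF


def cpuid_to_basefamily(sig):
    return (sig >> 8) & 0xF


def cpuid_to_extmodel(sig):
    return (sig >> 16) & 0xF


def cpuid_to_extfamily(sig):
    return (sig >> 20) & 0xFF


def cpuid_to_family(sig):
    """Display family: the extended field is added only for base family 0xf."""
    base = cpuid_to_basefamily(sig)
    return base + cpuid_to_extfamily(sig) if base == 0xF else base


def cpuid_to_model(sig):
    """Display model: extended model is shifted left 4 bits and ORed in,
    but only when the base family is 0x6 or 0xf."""
    base = cpuid_to_basemodel(sig)
    if cpuid_to_basefamily(sig) in (0x6, 0xF):
        return (cpuid_to_extmodel(sig) << 4) | base
    return base


for sig in (0x6D0, 0x306A9, 0x800F11):
    print(hex(sig), hex(cpuid_to_family(sig)), hex(cpuid_to_model(sig)),
          cpuid_to_stepping(sig))
```

Keeping the base and extended accessors separate from the display ones makes the "shift and OR, do not add" rule from revision 1.36 visible in one place.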
 1.37 12-Nov-2013  msaitoh Fix a bug in last commit. Check correct variable.
 1.36 12-Nov-2013  msaitoh Fix calculation of the cpu model (display model) in cpu_probe_amd_cache().
The CPUID2MODEL() must be used only when the CPUID2FAMILY() macro returns
0xf or 0x6. Also fix a bug that CPUID2EXTMODEL() is _ADDED_. The correct way
is shifting the return value of CPUID2EXTMODEL() 4bit left and _OR_ it.
 1.35 21-Oct-2013  msaitoh Check cpuid leaf 4 for newer Intel CPU to know the cache information.
 1.34 27-Jun-2013  christos branches: 1.34.2;
back out previous, fix is in tsc.c
 1.33 26-Jun-2013  christos PR/47967: Jeff Rizzo: Add a probe for QEMU to disable it from claiming it
has MSR_TSC. Fixes DTRACE crashing because it returned a frequency of 0.
 1.32 16-Jun-2012  chs branches: 1.32.2;
rename the global variable "cpu" to "cputype" to avoid conflicting with
dtrace, which wants to use "cpu" as a local variable.
 1.31 30-Apr-2012  christos PR/41267: Andrius V: 5.0 RC4 does not detect second CPU in VIA. VIA Eden cpuid
lies about its ability to do cmpxchg8b. Turn the feature on using the FCR MSR.
Needs pullup to both 5 and 6.
 1.30 23-Feb-2012  chs move XEN CPU feature masking into cpu_probe() so that it's applied
to all CPUs, not just the boot CPU.
 1.29 03-Feb-2012  yamt branches: 1.29.2;
use a correct macro.
releng@ ok
 1.28 04-Mar-2011  jruoho branches: 1.28.4; 1.28.8;
Move INTEL_ONDEMAND_CLOCKMOD -- or odcm(4) -- to the cpufeaturebus.
 1.27 24-Feb-2011  jruoho Move VIA_C7TEMP to the cpufeaturebus.
 1.26 24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.25 23-Feb-2011  jruoho Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.
 1.24 20-Feb-2011  jruoho Modularize coretemp(4). Ok jmcneill@.
 1.23 19-Feb-2011  jmcneill modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module
 1.22 27-Jan-2011  bouyer Properly identify vortex86 CPUs.
 1.21 19-Jan-2011  jmcneill branches: 1.21.2;
Print the brand string if present instead of model (model is still
available via sysctl hw.model):

-cpu0 at mainbus0 apid 0: IDT/VIA 686-class, 1596MHz, id 0x6d0
+cpu0 at mainbus0 apid 0: VIA C7-M Processor 1600MHz, id 0x6d0
 1.20 18-Apr-2010  jym branches: 1.20.2;
This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.19 18-Apr-2010  jym Fix a test semantic in cpu_probe(): check that the CPU currently probed
is the first one booting by comparing its struct cpu_info address with
cpu_info_primary, rather than supposing that cpu_feature variables are
set to 0.
 1.18 18-Jan-2010  rmind branches: 1.18.2; 1.18.4;
x86_cpu_topology, not toplogy.
 1.17 02-Oct-2009  jmcneill Add support for VIA C7 temperature sensors (options VIA_C7TEMP)
 1.16 30-Apr-2009  rmind Move x86 CPU topology detection code into the separate file (as it was originally).
OK by <yamt>.
 1.15 01-Apr-2009  tls Fix probe for VIA C3 and successors -- these are CPU family 6, not 5.
The broken probe was causing the VIA padlock driver to never attach!
Now we can see that its AES appears to be broken -- it makes FAST_IPSEC
ESP not work, on systems where it works fine with cryptosoft.

Rework code to detect and (if necessary) enable VIA crypto and RNG.
Add RNG support to VIA padlock driver. In the process, have a quick
go at debugging the AES support but no luck thus far.
 1.14 25-Mar-2009  dyoung It is only by accident that these get definitions they need from
<sys/device.h>, so explicitly #include <sys/device.h>.
 1.13 19-Dec-2008  cegger branches: 1.13.2;
x86_patch() is not available on Xen.
Make Xen kernels link again.
 1.12 19-Dec-2008  ad PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.11 18-Dec-2008  cegger remove unused malloc.h
 1.10 18-Oct-2008  cegger branches: 1.10.2; 1.10.4;
for AMD CPUs: utilize new ci_feature4_flags field and check if SVM is present & disabled by the BIOS.
If so, then print this information in the dmesg, but only once for cpu0.
Don't do this check in a Xen DomU.
 1.9 02-Jun-2008  ad branches: 1.9.4; 1.9.6;
- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.
 1.8 30-May-2008  christos branches: 1.8.2;
- fix an amd cache entry.
- merge tables
- support phenom
from Paul Goyette
 1.7 13-May-2008  ad branches: 1.7.2; 1.7.4;
AMD and IDT/VIA strings were swapped.
 1.6 12-May-2008  tsutsui Remove one more dup line. I should have a cup of coffee before hasty commit..

XXX maybe it's better to sort by cai_desc to sync with the Intel docs.
 1.5 12-May-2008  simonb Only need to add some of the new cache descriptors once(!).
 1.4 11-May-2008  cegger print L3 and TLB cache information for AMD Barcelona/Phenom
 1.3 11-May-2008  tsutsui Update intel_cpuid_cache_info as per Intel's application note:
"AP-485 Intel(R) Processor Identification and the CPUID Instruction"
http://www.intel.com/design/processor/applnots/241618.htm

XXX1: should sort by cai_index or cai_desc?
XXX2: should also check L3CACHE for coloring?
 1.2 11-May-2008  tsutsui Fix an indent.
 1.1 11-May-2008  ad Simplify x86 identcpu code, and share between i386/amd64.
 1.7.4.3 04-Jun-2008  yamt sync with head
 1.7.4.2 18-May-2008  yamt sync with head.
 1.7.4.1 13-May-2008  yamt file identcpu.c was added on branch yamt-pf42 on 2008-05-18 12:33:04 +0000
 1.7.2.5 11-Aug-2010  yamt sync with head.
 1.7.2.4 11-Mar-2010  yamt sync with head
 1.7.2.3 04-May-2009  yamt sync with head.
 1.7.2.2 16-May-2008  yamt sync with head.
 1.7.2.1 13-May-2008  yamt file identcpu.c was added on branch yamt-nfs-mp on 2008-05-16 02:23:29 +0000
 1.8.2.4 17-Jan-2009  mjf Sync with HEAD.
 1.8.2.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.8.2.2 02-Jun-2008  mjf Sync with HEAD.
 1.8.2.1 30-May-2008  mjf file identcpu.c was added on branch mjf-devfs2 on 2008-06-02 13:22:51 +0000
 1.9.6.1 19-Oct-2008  haad Sync with HEAD.
 1.9.4.2 23-Jun-2008  wrstuden Add files to branch that were added on -current.

After this, all that's left of update is to merge some changes
that had conflicts.
 1.9.4.1 02-Jun-2008  wrstuden file identcpu.c was added on branch wrstuden-revivesa on 2008-06-23 05:02:13 +0000
 1.10.4.7 26-Nov-2012  riz Pull up following revision(s) (requested by christos in ticket #1819):
sys/arch/x86/x86/identcpu.c: revision 1.31
PR/41267: Andrius V: 5.0 RC4 does not detect second CPU in VIA. VIA Eden cpuid
lies about its ability to do cmpxchg8b. Turn the feature on using the FCR MSR.
Needs pullup to both 5 and 6.
 1.10.4.6 22-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.10.4.5 22-Apr-2010  snj Pull up following revision(s) (requested by jym in ticket #1377):
sys/arch/x86/x86/identcpu.c: revision 1.19
Fix a test semantic in cpu_probe(): check that the CPU currently probed
is the first one booting by comparing its struct cpu_info address with
cpu_info_primary, rather than supposing that cpu_feature variables are
set to 0.
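The check described for revision 1.19 can be sketched as follows. This is a minimal illustration, not the kernel code: the struct is a simplified stand-in for the real struct cpu_info, and the helper name cpu_is_primary() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in; the kernel's struct cpu_info has many more fields. */
struct cpu_info {
	unsigned int ci_index;
};

/* Statically allocated descriptor of the boot processor. */
static struct cpu_info cpu_info_primary;

/*
 * Hypothetical helper: identify the boot CPU by address instead of
 * inferring it from feature variables that happen to still be zero.
 */
static bool
cpu_is_primary(const struct cpu_info *ci)
{
	assert(ci != NULL);
	return ci == &cpu_info_primary;
}
```

Comparing addresses stays correct even if a global feature variable legitimately ends up as 0 after the first CPU has been probed.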
 1.10.4.4 05-Oct-2009  sborrill Pull up the following revision(s) (requested by jmcneill in ticket #1061):
sys/arch/x86/conf/files.x86: revision 1.53
sys/arch/x86/include/cpuvar.h: revision 1.31
sys/arch/x86/x86/identcpu.c: revision 1.17
sys/arch/x86/x86/viac7temp.c: revision 1.1
sys/arch/i386/conf/ALL: revision 1.218
sys/arch/i386/conf/GENERIC: revision 1.949
Add support for VIA C7 temperature sensors (options VIA_C7TEMP) and enable
in i386 GENERIC kernel.
 1.10.4.3 16-Jun-2009  snj Pull up following revision(s) (requested by rmind in ticket #782):
sys/arch/x86/conf/files.x86: revision 1.52 via patch
sys/arch/x86/include/cpu.h: revision 1.17
sys/arch/x86/x86/cpu_topology.c: revision 1.1
sys/arch/x86/x86/identcpu.c: revision 1.16 via patch
Move x86 CPU topology detection code into the separate file (as it was
originally).
OK by <yamt>.
 1.10.4.2 02-Feb-2009  snj branches: 1.10.4.2.2; 1.10.4.2.4;
Pull up following revision(s) (requested by bouyer in ticket #343):
sys/arch/x86/x86/identcpu.c: revision 1.13
sys/arch/x86/include/cpufunc.h: revision 1.10
x86_patch() is not available on Xen.
Make Xen kernels link again.
 1.10.4.1 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #343):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.14
sys/arch/x86/include/cpufunc.h: revision 1.9
sys/arch/x86/x86/identcpu.c: revision 1.12
sys/arch/x86/x86/cpu.c: revision 1.60
sys/arch/x86/x86/patch.c: revision 1.15
PR kern/40213 my i386 machine can't boot because of tsc
- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.
- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.10.4.2.4.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.10.4.2.4.1 21-Apr-2010  matt sync to netbsd-5
 1.10.4.2.2.2 23-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.10.4.2.2.1 22-Apr-2010  snj Pull up following revision(s) (requested by jym in ticket #1377):
sys/arch/x86/x86/identcpu.c: revision 1.19
Fix a test semantic in cpu_probe(): check that the CPU currently probed
is the first one booting by comparing its struct cpu_info address with
cpu_info_primary, rather than supposing that cpu_feature variables are
set to 0.
 1.10.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.10.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.13.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.13.2.3 24-Oct-2010  jym Sync with HEAD
 1.13.2.2 01-Nov-2009  jym Sync with HEAD.
 1.13.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.18.4.2 05-Mar-2011  rmind sync with head
 1.18.4.1 30-May-2010  rmind sync with head
 1.18.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.20.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.21.2.2 05-Mar-2011  bouyer Sync with HEAD
 1.21.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.28.8.3 02-Jun-2012  mrg sync to latest -current.
 1.28.8.2 24-Feb-2012  mrg sync to -current.
 1.28.8.1 18-Feb-2012  mrg merge to -current.
 1.28.4.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.28.4.3 30-Oct-2012  yamt sync with head
 1.28.4.2 23-May-2012  yamt sync with head.
 1.28.4.1 17-Apr-2012  yamt sync with head
 1.29.2.3 26-Jan-2015  martin Pull up the following, requested by msaitoh in ticket #1241:

sys/arch/x86/x86/identcpu.c 1.35-1.39

- Check cpuid leaf 4 for newer Intel CPU to know the cache information.
This code might improve performance because it changes the number of
page colors.
- Fix calculation of the cpu model (display model) in
cpu_probe_amd_cache().
- CPUID leaf 2 and 4 are only for Intel processors.
 1.29.2.2 07-May-2012  riz Pull up following revision(s) (requested by christos in ticket #220):
sys/arch/x86/x86/identcpu.c: revision 1.31
sys/arch/x86/include/specialreg.h: revision 1.58
PR/41267: Andrius V: 5.0 RC4 does not detect second CPU in VIA. VIA Eden cpuid
lies about its ability to do cmpxchg8b. Turn the feature on using the FCR MSR.
Needs pullup to both 5 and 6.
Add VIA Eden FCR MSR.
 1.29.2.1 23-Feb-2012  riz Pull up following revision(s) (requested by chs in ticket #38):
sys/arch/amd64/amd64/machdep.c: revision 1.178
sys/arch/x86/x86/identcpu.c: revision 1.30
sys/arch/i386/i386/machdep.c: revision 1.720
move XEN CPU feature masking into cpu_probe() so that it's applied
to all CPUs, not just the boot CPU.
 1.32.2.2 03-Dec-2017  jdolecek update from HEAD
 1.32.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.34.2.1 18-May-2014  rmind sync with head
 1.44.2.1 10-Aug-2014  tls Rebase.
 1.45.2.3 09-Oct-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #1636):
sys/arch/x86/include/cacheinfo.h: 1.23-1.26
sys/arch/x86/include/cpu.h: 1.70
sys/arch/x86/include/specialreg.h: 1.91-1.93,1.98,1.100,1.102-1.124,1.126,1.130 via patch
sys/arch/x86/x86/cpu_topology.c: 1.10
sys/arch/x86/x86/identcpu.c: 1.56-1.57,1.70 via patch
usr.sbin/cpuctl/arch/i386.c: 1.71,1.75-1.79,1.81-1.85 via patch
Add some register definitions for x86:
- Add CLWB bit.
- Fix a few (unused) MSR values, and add some bit definitions of
MSR_EFER from Murray Armfield in PR#42861.
- CPUID_CFLUSH bit is not for CFLUSH insn but CLFLUSH insn, so modify
comments and snprintb() string.
- Define CPUID Fn00000001 %ebx bits and use them.
No functional change.
- Add Structured Extended Flags Enumeration Leaf's bit definitions:
AVX512_{IFMA,VBMI2,VNNI,BITALG,VPOPCNTDQ,4VNNIW,4FMAPS},GFNI&VAES.
- Add Turbo Boost Max Technology 3.0 bit.
- Add AMD SVM features definitions.
- Add Intel cpuid 7 %edx IBRS and STIBP bit definitions.
- Fix swapped comments for EFER LME and LMA
- Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add MSR_IA32_ARCH_CAPABILITIES definition.
- Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.
- Add Intel Deterministic Address Translation Parameter Leaf(0x18)
definitions.
- s/CLFUSH/CLFLUSH/
- Add AMD's Disable Indirect Branch Predictor bit definition.
- Add the MSR bits definitions for IBRS, STIBP and IBPB.
- Add Intel Fn0000_0006 %eax new bit 14-20 (HWP stuff).
- Intel Fn0000_0007 %ecx bit 22 is for both RDPID and IA32_TSC_AUX.
- Add AMD's CPUID Fn80000001 %edx MMX and FXSR bit definitions.
- Add RDCL_NO and IBRS_ALL.
- Add SSBD and RSBA bit definitions.
- Add AMD's SSB bit definitions for F15H, F16H and F17H.
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
- Add yet another Shared L2 TLB (2M/4M pages).
- Add 3way and 6way of L2 cache or TLB on AMD CPU.
- AMD L3 cache association bitfield is not 8bit but 4bit like others
association bitfields.
- Sort entries. No functional change.
- Modify comment, fix typo in comment and add comment.
cpuctl(8):
- Add detection for Quark X1000, Xeon E5 v4, E7 v4,
Core i7-69xx Extreme Edition, Xeon Scalable (Skylake),
Xeon Phi [357]200 (Knights Landing), Atom (Goldmont),
Atom (Denverton), Future Core (Cannon Lake), Atom (Goldmont Plus),
Xeon Phi 7215, 7285 and 7295 (Knights Mill) and
7th or 8th gen Core (Kaby Lake, Coffee Lake).
- Print Structured Extended Feature leaf Fn0000_0007 %ebx on AMD,too.
- Print Fn0000_0007 %ecx on Intel.
- Print Intel cpuid 7 %edx.
- Parse the TLB info from `cpuid leaf 18H' on Intel processor.
- Use aprint_error_dev() for error output.
 1.45.2.2 06-Mar-2016  martin Pull up following revision(s) (requested by msaitoh in ticket #1118):
sys/arch/x86/include/cpuvar.h: revision 1.47
sys/arch/x86/x86/cpu.c: revision 1.117
sys/arch/x86/x86/identcpu.c: revision 1.49
sys/arch/x86/include/cpu.h: revision 1.67
Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.45.2.1 30-Oct-2014  martin branches: 1.45.2.1.2;
Pull up following revision(s) (requested by riz in ticket #171):
sys/arch/x86/x86/identcpu.c: revision 1.46
sys/arch/x86/x86/identcpu.c: revision 1.47
Force x86_xsave_features to 0 when running under XEN.
This prevents the use of xsave and xrstor thus fixing
the problem in PR/49150. The basic problem is that the way AMD
implements those instructions means that information can leak
between domains so XEN treats them as privileged.
 1.45.2.1.2.1 19-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1118):
sys/arch/x86/include/cpuvar.h: revision 1.47
sys/arch/x86/x86/cpu.c: revision 1.117
sys/arch/x86/x86/identcpu.c: revision 1.49
sys/arch/x86/include/cpu.h: revision 1.67

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.
 1.47.2.5 28-Aug-2017  skrll Sync with HEAD
 1.47.2.4 05-Feb-2017  skrll Sync with HEAD
 1.47.2.3 19-Mar-2016  skrll Sync with HEAD
 1.47.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.47.2.1 06-Apr-2015  skrll Sync with HEAD
 1.50.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.50.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.51.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.53.4.1 11-May-2017  pgoyette Sync with HEAD
 1.55.2.13 24-Dec-2021  martin Pull up the following (all via patch), requested by msaitoh in ticket #1721:

usr.sbin/cpuctl/arch/i386.c 1.118-1.119, 1.121-1.122
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.6
sys/arch/x86/x86/identcpu_subr.c 1.8-1.9
sys/arch/x86/x86/identcpu.c 1.123
sys/arch/x86/include/cacheinfo.h 1.30
sys/arch/x86/include/cpu.h 1.132

- Fix a bug that some TLB related lines were not printed.
- Fix a bug that STLB is printed as DTLB.
- If a TLB is variable sized, print the max size instead of error message.
- Cosmetic changes to improve readability.
 1.55.2.12 08-Dec-2021  martin Pull up the following, requested by msaitoh in ticket #1720:

sys/arch/x86/include/specialreg.h 1.146, 1.171,
1.173-1.178 via patch
sys/arch/x86/x86/identcpu.c 1.106, 1.117,
1.122 via patch
sys/arch/x86/x86/pmap.c patch
sys/external/bsd/drm2/drm/drm_cache.c 1.14
usr.sbin/cpuctl/arch/i386.c 1.114-1.117


- Add PT, PKRU, HDC, LA57, PKE, PKS, CET, CET_U, CET_S, HWP, KL,
AVX512_BF16, TME_EN and PCONFIG.
- Rename some macros to match the x86 specification and the other OSes.
- Print CPUID 0x80000008 %ebx on Intel, too.
- Print CPUID leaf 7 subleaf 1.
- Identify Tiger Lake, 3rd gen Xeon Scalable (Ice Lake), Elkhart Lake
and Jasper Lake.
- Remove a few unused MSRs.
- Add comment.
- KNF. Whitespace fix.
 1.55.2.11 07-Dec-2021  martin Pull up following revision(s) (requested by msaitoh in ticket #1719):

sys/arch/x86/x86/identcpu.c: revision 1.121

make a numeric literal unsigned as it is bit-negated.
 1.55.2.10 07-Dec-2021  martin Pull up following revision(s) (requested by msaitoh in ticket #1718):

sys/arch/x86/x86/identcpu.c: revision 1.103
sys/arch/x86/x86/identcpu.c: revision 1.104
sys/arch/x86/x86/identcpu.c: revision 1.105

Add EX2 for Vortex86 SoCs (Andrius V)

use __arraycount, and fix comparison

flip the comparison again
 1.55.2.9 16-Aug-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1338):

usr.sbin/cpuctl/arch/i386.c: revision 1.104
sys/arch/x86/x86/identcpu.c: revision 1.93
sys/arch/x86/include/cacheinfo.h: revision 1.28
sys/arch/x86/include/specialreg.h: revision 1.150

- AMD CPUID Fn8000_001d Cache Topology Information leaf is almost the same as
Intel Deterministic Cache Parameter Leaf(0x04), so make new
cpu_dcp_cacheinfo() and share it.
- AMD's L2 and L3's cache descriptor's definition is the same, so use one
common definition.
- KNF.

XXX Split some common functions to new identcpu_subr.c or use #ifdef _KERNEL
... #endif in identcpu.c to share from both kernel and cpuctl?
 1.55.2.8 16-Aug-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1338):

sys/arch/x86/include/cacheinfo.h: revision 1.27
sys/arch/x86/x86/identcpu.c: revision 1.74

Handle more Vortex CPUs from Andrius V.
While here refactor the code to make it smaller.

-

It seems that AMD zen2's CPUID 0x80000006 leaf's spec has changed.
The EDX register's associativity field has 9. In the latest available document,
it's a reserved value. I have no access to zen2's document, but many websites
say that the associativity is 16. Add it.

-

- AMD CPUID Fn8000_001d Cache Topology Information leaf is almost the same as
Intel Deterministic Cache Parameter Leaf(0x04), so make new
cpu_dcp_cacheinfo() and share it.
- AMD's L2 and L3's cache descriptor's definition is the same, so use one
common definition.
- KNF.

XXX Split some common functions to new identcpu_subr.c or use #ifdef _KERNEL
... #endif in identcpu.c to share from both kernel and cpuctl?
 1.55.2.7 12-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1280):

sys/arch/x86/x86/consinit.c: revision 1.29
sys/dev/hyperv/vmbusvar.h: revision 1.2
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/arch/x86/x86/x86_autoconf.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.91
sys/arch/x86/x86/hyperv.c: revision 1.2
sys/arch/x86/x86/hyperv.c: revision 1.3
sys/arch/x86/x86/hyperv.c: revision 1.4
sys/arch/i386/conf/GENERIC: revision 1.1207
sys/dev/wscons/wsconsio.h: revision 1.123
sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.528
sys/dev/hyperv/files.hyperv: revision 1.2
sys/arch/x86/include/autoconf.h: revision 1.6
sys/dev/hyperv/hyperv_common.c: revision 1.2
sys/arch/xen/x86/autoconf.c: revision 1.23
sys/arch/x86/pci/pci_machdep.c: revision 1.86
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hypervvar.h: revision 1.2
sys/dev/acpi/vmbus_acpi.c: revision 1.2
sys/dev/hyperv/vmbus.c: revision 1.3
sys/dev/hyperv/hvkbdvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.55.2.6 27-Dec-2018  martin Pull up following revision(s) (requested by maxv in ticket #1148):

sys/arch/x86/x86/identcpu.c: revision 1.81
sys/arch/x86/x86/identcpu.c: revision 1.82
sys/arch/x86/x86/identcpu.c: revision 1.84
sys/arch/x86/include/specialreg.h: revision 1.131

Declare the MSR_VIA_ACE values as macros, and use a consistent naming,
similar to the rest of the file.

I'm wondering if I'm not fixing a huge bug here. The ECX8 value we were
using was wrong: ECX8 is bit 1, not bit 0. Bit 0 is ALTINST, an alternate
ISA, which is now known to be backdoored.

So it looks like we were explicitly enabling the backdoor.

Not tested, because I don't have a VIA cpu.

-

Merge the VIA detection code into cpu_probe_c3.

-

Explicitly disable ALTINST on VIA, in case it isn't disabled by default
already (the 'VIA cpu backdoor').
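The bit mix-up described above can be illustrated with a short sketch. The bit positions are the ones given in the log message (ALTINST is bit 0, ECX8 is bit 1); the macro names and the helper via_fcr_fixup() are illustrative assumptions, and the actual MSR read and write are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as stated in the log message; macro names are illustrative. */
#define FCR_ALTINST	(UINT64_C(1) << 0)	/* alternate instruction set */
#define FCR_ECX8	(UINT64_C(1) << 1)	/* report cmpxchg8b in CPUID */

/*
 * Hypothetical helper: compute the value to write back to the feature
 * control register so that cmpxchg8b is advertised and ALTINST is off.
 */
static uint64_t
via_fcr_fixup(uint64_t fcr)
{
	fcr |= FCR_ECX8;
	fcr &= ~FCR_ALTINST;
	assert((fcr & FCR_ALTINST) == 0);
	return fcr;
}
```

Setting bit 0 instead of bit 1, as the old code did, would turn on ALTINST rather than ECX8. Later entries in this log (revisions 1.129 and 1.130) narrow down the models on which the register is touched at all.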
 1.55.2.5 13-Jul-2018  martin Pull up following revision(s) (requested by maya in ticket #912):

sys/arch/x86/x86/identcpu.c: revision 1.79
sys/arch/x86/include/specialreg.h: revision 1.127

Disable MWAIT/MONITOR on Apollo Lake CPUs to work around APL30 errata.

We use MWAIT/MONITOR to hatch secondary CPUs. The errata means that
the wakeup may not happen, so SMP boot fails.
Use wrmsr to disable it in hardware too, for extra paranoia.

PR port-amd64/53420,
also reported on netbsd-users by joern clausen and ssartor.
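The feature-flag side of this workaround might look roughly like the sketch below. It assumes that Apollo Lake is identified as family 6, model 0x5c (the Goldmont model number listed elsewhere in this log) and that MONITOR/MWAIT is CPUID.1:%ecx bit 3. The macro and function names are hypothetical, and the additional wrmsr step mentioned above is not modelled.

```c
#include <stdint.h>

#define CPUID1_ECX_MONITOR	(UINT32_C(1) << 3)	/* MONITOR/MWAIT */
#define INTEL_FAMILY_6		0x06
#define MODEL_GOLDMONT		0x5c			/* Apollo Lake */

/*
 * Hypothetical helper: return the CPUID.1:%ecx feature word with
 * MONITOR/MWAIT masked out on affected CPUs, so that later code does
 * not rely on MWAIT to wake secondary CPUs.
 */
static uint32_t
apl30_mask_features(unsigned int family, unsigned int model,
    uint32_t cpuid1_ecx)
{
	if (family == INTEL_FAMILY_6 && model == MODEL_GOLDMONT)
		cpuid1_ecx &= ~CPUID1_ECX_MONITOR;
	return cpuid1_ecx;
}
```

The sketch deliberately has no stepping check, matching the later revisions 1.132 and 1.133 recorded in this log.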
 1.55.2.4 23-Jun-2018  martin Pull up the following, via patch, requested by maxv in ticket #897:

sys/arch/amd64/amd64/locore.S 1.166 (patch)
sys/arch/i386/i386/locore.S 1.157 (patch)
sys/arch/x86/include/cpu.h 1.92 (patch)
sys/arch/x86/include/fpu.h 1.9 (patch)
sys/arch/x86/x86/fpu.c 1.33-1.39 (patch)
sys/arch/x86/x86/identcpu.c 1.72 (patch)
sys/arch/x86/x86/vm_machdep.c 1.34 (patch)
sys/arch/x86/x86/x86_machdep.c 1.116,1.117 (patch)

Support eager fpu switch, to work around INTEL-SA-00145.
Provide a sysctl machdep.fpu_eager, which gets automatically
initialized to 1 on affected CPUs.
 1.55.2.3 01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #681):
sys/arch/x86/include/cpu.h: revision 1.90
sys/arch/x86/x86/identcpu.c: revision 1.71
Retrieve cpuid.7:%edx.
 1.55.2.2 16-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #633):
sys/arch/x86/include/specialreg.h: revision 1.107
sys/arch/x86/include/specialreg.h: revision 1.108
sys/arch/x86/include/specialreg.h: revision 1.109
sys/arch/x86/include/cacheinfo.h: revision 1.23
sys/arch/x86/include/specialreg.h: revision 1.110
sys/arch/x86/include/specialreg.h: revision 1.111
sys/arch/x86/include/specialreg.h: revision 1.112
sys/arch/x86/include/specialreg.h: revision 1.113
sys/arch/x86/include/specialreg.h: revision 1.114
usr.sbin/cpuctl/arch/i386.c: revision 1.79
sys/arch/x86/x86/identcpu.c: revision 1.70
sys/arch/x86/include/specialreg.h: revision 1.106

Add comment.

Add Intel cpuid 7 %edx IBRS(IBPB Speculation Control) and
STIBP(STIBP Speculation Control) from OpenBSD.

Print Intel cpuid 7 %edx.

Example output of cpuctl -v identify 0:
+cpu0: 00000007: 00000000 000027ab 00000000 0c000000
(snip)
+cpu0: SEF edx 0xc000000<IBRS,STIBP>

fix swapped comments for EFER LME and LMA

- Add Intel cpuid 7 %edx bit 29 IA32_ARCH_CAPABILITIES supported bit.
- Add comment.
Add MSR_IA32_ARCH_CAPABILITIES definition.

Add IA32_SPEC_CTRL MSR and IA32_PRED_CMD MSR.

Add Intel Deterministic Address Translation Parameter Leaf(0x18) definitions.

Sort entries. No functional change.

s/CLFUSH/CLFLUSH/
No functional change.
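The `SEF edx 0xc000000<IBRS,STIBP>` line in the example output above can be reproduced with a small decoder. This is a sketch only: the bit positions (26, 27 and 29) follow the Intel definitions named in this entry, while the table, the flag spelling ARCH_CAP and the function decode_sef_edx() are illustrative and not the cpuctl(8) implementation, which uses snprintb().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* CPUID leaf 7, subleaf 0, %edx bits; the names here are illustrative. */
static const struct {
	uint32_t bit;
	const char *name;
} sef_edx_bits[] = {
	{ UINT32_C(1) << 26, "IBRS" },
	{ UINT32_C(1) << 27, "STIBP" },
	{ UINT32_C(1) << 29, "ARCH_CAP" },
};

/* Write a comma-separated list of the flags set in edx into buf. */
static void
decode_sef_edx(uint32_t edx, char *buf, size_t len)
{
	size_t i, used;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(sef_edx_bits) / sizeof(sef_edx_bits[0]); i++) {
		if ((edx & sef_edx_bits[i].bit) == 0)
			continue;
		used = strlen(buf);
		/* snprintf() truncates safely if buf is too small. */
		snprintf(buf + used, len - used, "%s%s",
		    used != 0 ? "," : "", sef_edx_bits[i].name);
	}
}
```

Calling decode_sef_edx(0x0c000000, buf, sizeof(buf)) yields the string "IBRS,STIBP", because 0x0c000000 has exactly bits 26 and 27 set.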
 1.55.2.1 21-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #365):
sys/arch/x86/include/specialreg.h: revision 1.99
usr.sbin/cpuctl/arch/i386.c: revision 1.75
usr.sbin/cpuctl/arch/i386.c: revision 1.76
usr.sbin/cpuctl/arch/i386.c: revision 1.77
usr.sbin/cpuctl/arch/i386.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.56
sys/arch/x86/x86/identcpu.c: revision 1.57
sys/arch/x86/x86/cpu_topology.c: revision 1.10
sys/arch/x86/include/specialreg.h: revision 1.100
sys/arch/x86/include/specialreg.h: revision 1.101
sys/arch/x86/include/specialreg.h: revision 1.102
sys/arch/x86/include/specialreg.h: revision 1.103
sys/arch/x86/include/specialreg.h: revision 1.104
sys/arch/x86/include/specialreg.h: revision 1.105
Add EFER_TCE. This would be an interesting feature to have, since it
reduces the indirect cost of invlpg; but I'm not convinced the way we
flush upper-levels is correct for this yet.
Fix typo in comment
Add a comment about APICBASE_PHYSADDR. Has to do with PR/42597.
Define CPUID Fn00000001 %ebx bits and use them. No functional change.
Set ci->ci_cflush_lsize correctly. This bug was added in the last commit(1.56).
Add the following instruction bits in Structured Extended Flags Enumeration
Leaf from "Intel Architecture Instruction Set Extensions and Future Features
Programming Reference" (319433-030):
AVX512_IFMA
AVX512_VBMI
AVX512_VBMI2
GFNI
VAES
VPCLMULQDQ
AVX512_VNNI
AVX512_BITALG
AVX512_VPOPCNTDQ
AVX512_4VNNIW
AVX512_4FMAPS
- Print ci_feat_val[5] (Structured Extended Feature leaf Fn0000_0007 %ebx) on
AMD, too.
- Print ci_feat_val[6] (Fn0000_0007 %ecx) on Intel.
Update from the latest Intel SDM:
0x5c: Atom (Goldmont)
0x5f: Atom (Goldmont, Denverton)
0x7a: Atom (Goldmont Plus)
Add Turbo Boost Max Technology 3.0 bit.
Update from Intel SDM:
0x55: Xeon Scalable (Skylake)
0x57: Xeon Phi [357]200 (Knights Landing)
0x66: Future Core (Cannon Lake)
0x85: Future Xeon Phi (Knights Mill)
Add the following bits in AMD Fn8000000a %edx features (SVM features):
PFThreshold (PAUSE filter threshold)
AVIC (AMD virtual interrupt controller)
V_VMSAVE_VMLOAD (virtualized VMSAVE and VMLOAD)
vGIF (virtualized GIF)
 1.69.2.7 18-Jan-2019  pgoyette Synch with HEAD
 1.69.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.69.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.69.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.69.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.69.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.69.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.77.2.4 21-Apr-2020  martin Sync with HEAD
 1.77.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.77.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.77.2.1 10-Jun-2019  christos Sync with HEAD
 1.93.2.8 15-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1954):

sys/arch/x86/x86/identcpu.c: revision 1.135 (via patch)

x86: Reserve space for only the extended CPU state we will use.
CPUID[EAX=0x0d, ECX=0].ECX, i.e., the value of descs[2] after
x86_cpuid2(0x0d, 0, descs), gives the size in bytes of the extended CPU
state for all features supported by the hardware in CPUID[EAX=0x0d,
ECX=0].EAX which can be enabled in XCR0. However, on i386, it is
senseless to leave TILECFG and TILEDATA enabled, because they are state
for Intel AMX instructions which work only in 64-bit mode.

So, instead of querying the hardware's supported features and maximum
_supported_ extended CPU state size:
1. Query the hardware's supported features.
2. Enable only those supported by software as well (XCR0_FPU).
3. Query the hardware's maximum _enabled_ extended CPU state size.

We will also disable TILECFG and TILEDATA on amd64 for now too
because:
(a) This is not a regression, at least for TILEDATA (and I'm not sure
any machines ship with TILECFG but not TILEDATA), because the
size overflowed the PCB page and therefore never worked on amd64
(PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu).
(b) We need a little extra work to properly support reading and
writing a process's TILECFG and TILEDATA in ptrace(2), and that
work hasn't been done yet.

While here, write out x86_cpuid2(0x0d, <ecx>, ...) explicitly, rather
than x86_cpuid(0x0d, ...), to make it clear that ECX must be set --
otherwise we may get garbage. (It is, perhaps, an accident that
x86_cpuid(<eax>, ...) always sets ECX=0, but other CPUID access
paths, like gcc's <cpuid.h> __cpuid(<eax>, ...), do not, so let's
make it clear for the reader.)

XXX When we enable TILECFG and TILEDATA in amd64, we should arrange
to disable them in compat32 processes -- no sense in allocating extra
space for state they can't use anyway, since the Intel AMX
instructions work only in 64-bit mode. However, selectively
disabling them in some contexts might require hardware support for
XFD, Extended Feature Disable, which is another kettle of fish to
deal with.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
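The three-step sizing procedure above can be modelled without touching hardware. The sketch below is an assumption-laden illustration: the component table stands in for the answers of CPUID[EAX=0x0d, ECX=n], its offsets and sizes are example values for an AMX-capable CPU rather than values read from a real machine, and the function names are hypothetical. It computes the size of the standard (non-compacted) XSAVE area, which is the largest offset plus size among the enabled components.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XCR0_X87	(UINT64_C(1) << 0)
#define XCR0_SSE	(UINT64_C(1) << 1)
#define XCR0_AVX	(UINT64_C(1) << 2)
#define XCR0_TILECFG	(UINT64_C(1) << 17)
#define XCR0_TILEDATA	(UINT64_C(1) << 18)

/* 512-byte legacy FXSAVE region plus the 64-byte XSAVE header. */
#define XSAVE_MIN_SIZE	576u

/* Model of one CPUID[EAX=0x0d, ECX=n] answer for component n >= 2. */
struct xsave_comp {
	uint64_t mask;
	uint32_t offset;
	uint32_t size;
};

/* Illustrative values only; not read from hardware. */
static const struct xsave_comp hw_comps[] = {
	{ XCR0_AVX,	 576,	256 },
	{ XCR0_TILECFG,	 2752,	64 },
	{ XCR0_TILEDATA, 2816,	8192 },
};
#define HW_SUPPORTED \
	(XCR0_X87 | XCR0_SSE | XCR0_AVX | XCR0_TILECFG | XCR0_TILEDATA)

/* Step 3: size of the save area for the features enabled in xcr0. */
static uint32_t
xsave_size_for(uint64_t xcr0)
{
	uint32_t size = XSAVE_MIN_SIZE;
	size_t i;

	assert((xcr0 & XCR0_X87) != 0);	/* x87 state is always enabled */
	for (i = 0; i < sizeof(hw_comps) / sizeof(hw_comps[0]); i++) {
		uint32_t end = hw_comps[i].offset + hw_comps[i].size;

		if ((xcr0 & hw_comps[i].mask) != 0 && end > size)
			size = end;
	}
	return size;
}

/* Steps 1 and 2: intersect hardware support with what software uses. */
static uint32_t
xsave_reserve(uint64_t sw_mask)
{
	return xsave_size_for(HW_SUPPORTED & sw_mask);
}
```

With these example values, sizing for everything the hardware supports gives 11008 bytes, while restricting the mask to x87, SSE and AVX gives 832 bytes. That difference is why reserving space by the maximum supported size could overflow a fixed-size PCB page.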
 1.93.2.7 02-Feb-2025  martin Pull up following revision(s) (requested by andvar in ticket #1935):

sys/arch/x86/x86/identcpu.c: revision 1.132
sys/arch/x86/x86/identcpu.c: revision 1.133

Remove stepping check for APL30 Errata. Issue also affects newer Apollo Lake
CPUs. Therefore, the stepping check is unnecessary.
Include a reference to the errata and provide a description to clarify
the nature of the issue.

Should fix PR port-amd64/58982 reported by Wolfgang Stukenbrock.

x86/identcpu.c: Add archive link just in case.
Refill paragraph while here to avoid overlong lines.
 1.93.2.6 20-Jul-2024  martin Pull up following revision(s) (requested by andvar in ticket #1855):

sys/arch/x86/x86/identcpu.c: revision 1.129
sys/arch/x86/include/specialreg.h: revision 1.212
sys/arch/x86/x86/identcpu.c: revision 1.130

Disable the VIA Alternate Instructions according to the VIA documentation:
* C7 and above do not support ALTINST, do not check or attempt to disable them.
* For VIA C3 Nehemiah check extended feature flags for support and status,
do not attempt to disable when AIS is not supported or not enabled.
* For pre-Nehemiah models explicitly disable, if they are in the range
of documented models, flags aren't present to check the status on
these models.

Note: for pre-Nehemiah models there may be other functional side effects
depending on the version and stepping.

Explicit disabling of ALTINST was introduced with rev. 1.84 following
the discovery of some VIA CPUs having these instructions enabled by default
leading to the potential backdoor (aka Rosenbridge).

Unfortunately, the implementation used the wrong check (ACE supported flag),
which can be true for the later models, still supporting padlock features.

Setting ALTINST bit on those may have unexpected side effects like VIA C7 CPUID
instruction for temperature sensor not reporting correct value or
`cpuctl identify' not reporting certain CPU features. Similar side effects
can be observed even for Nehemiah models not supporting AIS instructions. This
change should limit possibility of such issues to only the pre-Nehemiah models,
not covered at all in the previous implementation.

Feature Control Register (FCR) macros were unified under one group and
consistent naming while implementing the change. Few comments updated as well.
patch reviewed by Riastradh@ (thank you)

PR kern/58370

Move determination of the largest VIA CPU extended function value
to the intended place where the checks are performed.
Currently the value can be overridden while checking for the padlock features,
and failing the check for max function value as a result.
 1.93.2.5 24-Dec-2021  martin Pull up the following (all via patch), requested by msaitoh in ticket #1396:

usr.sbin/cpuctl/arch/i386.c 1.118-1.119, 1.121-1.122
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.6
sys/arch/x86/x86/identcpu_subr.c 1.8-1.9
sys/arch/x86/x86/identcpu.c 1.123
sys/arch/x86/include/cacheinfo.h 1.30
sys/arch/x86/include/cpu.h 1.132

- Fix a bug that some TLB related lines were not printed.
- Fix a bug that STLB is printed as DTLB.
- If a TLB is variable sized, print the max size instead of error message.
- Cosmetic changes to improve readability.
 1.93.2.4 08-Dec-2021  martin Pull up the following revisions, requested by msaitoh in ticket #1391:

sys/arch/x86/include/specialreg.h 1.171, 1.173-1.178
sys/arch/x86/x86/identcpu.c 1.106, 1.117,
1.122 via patch
sys/dev/nvmm/x86/nvmm_x86.c 1.18
sys/external/bsd/drm2/drm/drm_cache.c 1.14
sys/external/bsd/drm2/include/asm/cpufeature.h 1.5
usr.sbin/cpuctl/arch/i386.c 1.114-1.117


- Add LA57, PKE, PKS, CET, CET_U, CET_S, HWP, KL, AVX512_BF16, TME_EN
and PCONFIG.
- Rename some macros to match the x86 specification and the other OSes.
- Print CPUID 0x80000008 %ebx on Intel, too.
- Print CPUID leaf 7 subleaf 1.
- Identify Tiger Lake, 3rd gen Xeon Scalable (Ice Lake), Elkhart Lake
and Jasper Lake.
- Add comment.
- KNF. Whitespace fix.
 1.93.2.3 07-Dec-2021  martin Pull up following revision(s) (requested by msaitoh in ticket #1390):

sys/arch/x86/x86/identcpu.c: revision 1.121

make a numeric literal unsigned as it is bit-negated.
 1.93.2.2 07-Dec-2021  martin Pull up following revision(s) (requested by msaitoh in ticket #1389):

sys/arch/x86/x86/identcpu.c: revision 1.103
sys/arch/x86/x86/identcpu.c: revision 1.104
sys/arch/x86/x86/identcpu.c: revision 1.105

Add EX2 for Vortex86 SoCs (Andrius V)

use __arraycount, and fix comparison

flip the comparison again
 1.93.2.1 26-Sep-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #242):

usr.sbin/cpuctl/arch/i386.c: revision 1.106
sys/arch/x86/x86/identcpu.c: revision 1.94

Call cpu_dcp_cacheinfo() only when the cpuid Topology Extension flag is set
on AMD processor.
 1.102.2.3 20-Apr-2020  bouyer Sync with HEAD
 1.102.2.2 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.102.2.1 08-Apr-2020  bouyer Remove VM_GUEST_XEN and define only Xen subtypes:
VM_GUEST_XENPV
VM_GUEST_XENPVH
VM_GUEST_XENHVM
VM_GUEST_XENPVHVM

Set vm_guest in the start routine, if it is hypervisor-specific (e.g Xen PV).
If vm_guest was not set early and we detect Xen in identify_hypervisor(),
assume it is VM_GUEST_XENHVM. Refine to VM_GUEST_XENPVHVM in
hypervisor_match().
 1.118.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.120.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.123.4.4 15-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1118):

sys/arch/x86/x86/identcpu.c: revision 1.135

x86: Reserve space for only the extended CPU state we will use.
CPUID[EAX=0x0d, ECX=0].ECX, i.e., the value of descs[2] after
x86_cpuid2(0x0d, 0, descs), gives the size in bytes of the extended CPU
state for all features supported by the hardware in CPUID[EAX=0x0d,
ECX=0].EAX which can be enabled in XCR0. However, on i386, it is
senseless to leave TILECFG and TILEDATA enabled, because they are state
for Intel AMX instructions which work only in 64-bit mode.

So, instead of querying the hardware's supported features and maximum
_supported_ extended CPU state size:
1. Query the hardware's supported features.
2. Enable only those supported by software as well (XCR0_FPU).
3. Query the hardware's maximum _enabled_ extended CPU state size.

We will also disable TILECFG and TILEDATA on amd64 for now too
because:
(a) This is not a regression, at least for TILEDATA (and I'm not sure
any machines ship with TILECFG but not TILEDATA), because the
size overflowed the PCB page and therefore never worked on amd64
(PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu).
(b) We need a little extra work to properly support reading and
writing a process's TILECFG and TILEDATA in ptrace(2), and that
work hasn't been done yet.

While here, write out x86_cpuid2(0x0d, <ecx>, ...) explicitly, rather
than x86_cpuid(0x0d, ...), to make it clear that ECX must be set --
otherwise we may get garbage. (It is, perhaps, an accident that
x86_cpuid(<eax>, ...) always sets ECX=0, but other CPUID access
paths, like gcc's <cpuid.h> __cpuid(<eax>, ...), do not, so let's
make it clear for the reader.)

XXX When we enable TILECFG and TILEDATA in amd64, we should arrange
to disable them in compat32 processes -- no sense in allocating extra
space for state they can't use anyway, since the Intel AMX
instructions work only in 64-bit mode. However, selectively
disabling them in some contexts might require hardware support for
XFD, Extended Feature Disable, which is another kettle of fish to
deal with.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
 1.123.4.3 29-Mar-2025  martin Pull up following revision(s) (requested by imil in ticket #1074):

sys/arch/x86/x86/x86_machdep.c: revision 1.155
sys/arch/x86/include/cpu.h: revision 1.137
sys/arch/x86/x86/x86_machdep.c: revision 1.156
sys/arch/x86/include/cpu.h: revision 1.138
sys/arch/x86/x86/consinit.c: revision 1.40
sys/arch/x86/acpi/acpi_machdep.c: revision 1.37
sys/arch/x86/acpi/acpi_machdep.c: revision 1.38
sys/arch/amd64/amd64/machdep.c: revision 1.370
sys/arch/xen/xen/hypervisor.c: revision 1.97
sys/arch/xen/xen/hypervisor.c: revision 1.98
sys/arch/amd64/amd64/genassym.cf: revision 1.98
sys/arch/x86/x86/x86_autoconf.c: revision 1.88
sys/arch/x86/x86/x86_autoconf.c: revision 1.89
sys/arch/amd64/amd64/locore.S: revision 1.226
sys/arch/amd64/amd64/locore.S: revision 1.227
sys/arch/x86/x86/identcpu.c: revision 1.131

Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.

Get one more change from PR kern/57813, needed for non-Xen PVH.

Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
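
The predicate described above can be sketched as follows. This is only an illustration: the enum below is a hypothetical subset of guest types, and the helper takes the guest type as an argument (an assumption made so the sketch is self-contained), whereas the log only names vm_guest_is_pvh() and the two VM_GUEST_* constants.

```c
#include <stdbool.h>

/* Hypothetical subset of guest types; only the two PVH names come from the log. */
enum vm_guest_sketch {
	VM_GUEST_NO,
	VM_GUEST_XENPVH,
	VM_GUEST_GENPVH,
	VM_GUEST_OTHER
};

/* True for either PVH flavour, replacing the open-coded two-way comparison. */
static inline bool
vm_guest_is_pvh_sketch(enum vm_guest_sketch g)
{
	return g == VM_GUEST_XENPVH || g == VM_GUEST_GENPVH;
}
```

Callers then test one predicate instead of repeating both comparisons at every site.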
 1.123.4.2 02-Feb-2025  martin Pull up following revision(s) (requested by andvar in ticket #1043):

sys/arch/x86/x86/identcpu.c: revision 1.132
sys/arch/x86/x86/identcpu.c: revision 1.133

Remove stepping check for APL30 Errata. Issue also affects newer Apollo Lake
CPUs. Therefore, the stepping check is unnecessary.
Include a reference to the errata and provide a description to clarify
the nature of the issue.

Should fix PR port-amd64/58982 reported by Wolfgang Stukenbrock.

x86/identcpu.c: Add archive link just in case.
Refill paragraph while here to avoid overlong lines.
 1.123.4.1 20-Jul-2024  martin Pull up following revision(s) (requested by andvar in ticket #738):

sys/arch/x86/x86/identcpu.c: revision 1.129
sys/arch/x86/include/specialreg.h: revision 1.212
sys/arch/x86/x86/identcpu.c: revision 1.130

Disable the VIA Alternate Instructions according to the VIA documentation:
* C7 and above do not support ALTINST; do not check or attempt to disable them.
* For VIA C3 Nehemiah, check extended feature flags for support and status;
do not attempt to disable when AIS is not supported or not enabled.
* For pre-Nehemiah models, explicitly disable if they are in the range
of documented models; flags aren't present to check the status on
these models.

Note: for pre-Nehemiah models there may be other functional side effects,
depending on the version and stepping.

Explicit disabling of ALTINST was introduced with rev. 1.84 following
the discovery of some VIA CPUs having these instructions enabled by default,
leading to a potential backdoor (aka rosenbridge).

Unfortunately, the implementation used the wrong check (ACE supported flag),
which can be true for the later models that still support padlock features.

Setting the ALTINST bit on those may have unexpected side effects, like the VIA C7
CPUID instruction for the temperature sensor not reporting the correct value or
`cpuctl identify' not reporting certain CPU features. Similar side effects
can be observed even for Nehemiah models not supporting AIS instructions. This
change should limit the possibility of such issues to only the pre-Nehemiah models,
which were not covered at all in the previous implementation.

Feature Control Register (FCR) macros were unified under one group with
consistent naming while implementing the change. A few comments were updated as well.
Patch reviewed by Riastradh@ (thank you)

PR kern/58370

Move determination of the largest VIA CPU extended function value
to the intended place where the checks are performed.
Currently the value can be overridden while checking for the padlock features,
and failing the check for max function value as a result.
 1.128.6.2 02-Aug-2025  perseant Sync with HEAD
 1.128.6.1 01-Jul-2024  perseant Sync with HEAD.
 1.15 16-May-2025  imil PR port-amd64/59424: remove useless lapic_from_cpuid = true
 1.14 02-May-2025  imil Add support for CPUID leaf 0x40000010 to detect TSC and LAPIC frequency on
hypervisors implementing the VMware-defined interface

This change enables virtual machines to obtain TSC and LAPIC frequency
information directly from the hypervisor via CPUID leaf 0x40000010, avoiding
the need for runtime calibration and thus reducing boot time in supported
environments.

Tested on GENERIC and MICROVM kernels, QEMU/KVM and QEMU/NVMM (current and
10.1), Intel and AMD CPUs, NetBSD/amd64 and i386.
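
A minimal sketch of decoding that leaf, assuming the VMware-defined layout in which EAX holds the TSC frequency in kHz and EBX the bus (LAPIC timer) frequency in kHz, and assuming the maximum hypervisor leaf was read from EAX of CPUID leaf 0x40000000. The struct and function names are hypothetical and are not taken from the NetBSD sources; register values are passed in so the arithmetic can be checked without running CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

#define HV_LEAF_TIMING	0x40000010u	/* leaf named in the log entry */

/* Hypothetical result type: frequencies in Hz, 0 meaning "not reported". */
struct vm_freq_sketch {
	uint64_t tsc_hz;
	uint64_t lapic_hz;
};

/* The timing leaf is usable only if the hypervisor advertises it. */
static bool
vm_timing_leaf_available(uint32_t max_hv_leaf)
{
	return max_hv_leaf >= HV_LEAF_TIMING;
}

/* Convert the kHz values from EAX/EBX to Hz, widening first to avoid overflow. */
static struct vm_freq_sketch
vm_freq_from_leaf(uint32_t eax_khz, uint32_t ebx_khz)
{
	struct vm_freq_sketch f;

	f.tsc_hz = (uint64_t)eax_khz * 1000;
	f.lapic_hz = (uint64_t)ebx_khz * 1000;
	return f;
}
```

A caller would fall back to runtime calibration whenever the leaf is unavailable or a field is zero.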
 1.13 06-Mar-2025  imil Revert VMware-compatible TSC and LAPIC frequency detection.
 1.12 06-Mar-2025  imil Allow tsc_freq_vmware_cpuid() for TSC frequency even if there is no LAPIC support
 1.11 06-Mar-2025  imil Test for LAPIC support
 1.10 06-Mar-2025  imil Add support for CPUID leaf 0x40000010, which enables VMware-compatible TSC
and LAPIC frequency detection for virtual machines.
 1.9 07-Oct-2021  msaitoh branches: 1.9.10;
Move some common functions into x86/identcpu_subr.c. No functional change.
 1.8 16-Jan-2021  jmcneill trailing whitespace
 1.7 10-Jul-2020  msaitoh branches: 1.7.2; 1.7.4; 1.7.6;
Add missing NetBSD RCS Id.
 1.6 09-Jun-2020  msaitoh Add braces.
 1.5 09-Jun-2020  msaitoh Remove debug printf.
 1.4 12-May-2020  msaitoh Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.
 1.3 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.2 21-Apr-2020  msaitoh branches: 1.2.2; 1.2.4;
Print "Hz".
 1.1 21-Apr-2020  msaitoh Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
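
The leaf 0x15 arithmetic can be sketched as below, assuming the Intel SDM layout (EAX = ratio denominator, EBX = ratio numerator, ECX = core crystal clock in Hz, with ECX possibly zero). The function name and the caller-supplied fallback crystal frequency are assumptions for illustration; this is not the code in identcpu_subr.c.

```c
#include <stdint.h>

/*
 * TSC Hz = crystal Hz * numerator / denominator.
 * Returns 0 when the ratio is not enumerated or no crystal frequency is known.
 */
static uint64_t
tsc_hz_from_leaf15(uint32_t denom, uint32_t numer, uint32_t crystal_hz,
    uint32_t fallback_crystal_hz)
{
	if (denom == 0 || numer == 0)
		return 0;		/* ratio not enumerated */
	if (crystal_hz == 0)
		crystal_hz = fallback_crystal_hz; /* e.g. a per-model table value */
	if (crystal_hz == 0)
		return 0;		/* crystal frequency unknown */
	/* Widen before multiplying so the product cannot overflow 32 bits. */
	return (uint64_t)crystal_hz * numer / denom;
}
```

For instance, a 24 MHz crystal with a ratio of 176/2 gives 2112 MHz.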
 1.2.4.2 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.2.4.1 21-Apr-2020  bouyer file identcpu_subr.c was added on branch bouyer-xenpvh on 2020-04-25 11:23:57 +0000
 1.2.2.2 21-Apr-2020  martin Sync with HEAD
 1.2.2.1 21-Apr-2020  martin file identcpu_subr.c was added on branch phil-wifi on 2020-04-21 18:42:12 +0000
 1.7.6.1 03-Apr-2021  thorpej Sync with HEAD.
 1.7.4.3 24-Dec-2021  martin Pull up the following (all via patch), requested by msaitoh in ticket #1721:

usr.sbin/cpuctl/arch/i386.c 1.118-1.119, 1.121-1.122
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.6
sys/arch/x86/x86/identcpu_subr.c 1.8-1.9
sys/arch/x86/x86/identcpu.c 1.123
sys/arch/x86/include/cacheinfo.h 1.30
sys/arch/x86/include/cpu.h 1.132

- Fix a bug that some TLB related lines were not printed.
- Fix a bug that STLB is printed as DTLB.
- If a TLB is variable sized, print the max size instead of error message.
- Cosmetic changes to improve readability.
 1.7.4.2 05-Aug-2020  martin Pull up the following revisions, requested by msaitoh in ticket #1585:

usr.sbin/cpuctl/Makefile 1.9
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.5
usr.sbin/cpuctl/arch/i386.c 1.111-1.113 via patch
usr.sbin/cpuctl/cpuctl.c 1.31
usr.sbin/cpuctl/cpuctl.h 1.7
sys/arch/x86/x86/identcpu_subr.c 1.1-1.7

- Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel
processors.
- Add 0xa5 and 0xa6 for Comet Lake.
- Rename ci_cpuid_level to ci_max_cpuid and ci_cpuid_extlevel to
ci_max_ext_cpuid to match x86/include/cpu.h. No functional change.
- Sort some entries.
- Add comment.
 1.7.4.1 10-Jul-2020  martin file identcpu_subr.c was added on branch netbsd-8 on 2020-08-05 15:48:53 +0000
 1.7.2.3 24-Dec-2021  martin Pull up the following (all via patch), requested by msaitoh in ticket #1396:

usr.sbin/cpuctl/arch/i386.c 1.118-1.119, 1.121-1.122
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.6
sys/arch/x86/x86/identcpu_subr.c 1.8-1.9
sys/arch/x86/x86/identcpu.c 1.123
sys/arch/x86/include/cacheinfo.h 1.30
sys/arch/x86/include/cpu.h 1.132

- Fix a bug that some TLB related lines were not printed.
- Fix a bug that STLB is printed as DTLB.
- If a TLB is variable sized, print the max size instead of error message.
- Cosmetic changes to improve readability.
 1.7.2.2 10-Jul-2020  martin Pull up the following revisions (all via patch) requested by msaitoh in
ticket #995:

usr.sbin/cpuctl/Makefile 1.9
usr.sbin/cpuctl/arch/cpuctl_i386.h 1.5
usr.sbin/cpuctl/arch/i386.c 1.111-1.113
usr.sbin/cpuctl/cpuctl.c 1.31
usr.sbin/cpuctl/cpuctl.h 1.7
sys/arch/x86/x86/identcpu_subr.c 1.1-1.7

- Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel
processors.
- Add 0xa5 and 0xa6 for Comet Lake.
- Rename ci_cpuid_level to ci_max_cpuid and ci_cpuid_extlevel to
ci_max_ext_cpuid to match x86/include/cpu.h. No functional change.
- Sort some entries.
- Add comment.
 1.7.2.1 10-Jul-2020  martin file identcpu_subr.c was added on branch netbsd-9 on 2020-07-10 11:20:29 +0000
 1.9.10.1 02-Aug-2025  perseant Sync with HEAD
 1.7 09-May-2008  joerg Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
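
The selection rule in this entry could look roughly like the sketch below. The stub routines, the pointer name, and the boolean parameters are all hypothetical stand-ins; only the policy (Xen routine under Xen, MWAIT when supported and the CPU is not AMD, HLT otherwise) comes from the log.

```c
#include <stdbool.h>

enum idle_kind { IDLE_NONE, IDLE_XEN, IDLE_MWAIT, IDLE_HALT };

/* Records which stub ran last, so the dispatch can be observed. */
static enum idle_kind last_idle = IDLE_NONE;

/* Stubs standing in for the real idle routines. */
static void idle_xen_stub(void)   { last_idle = IDLE_XEN; }
static void idle_mwait_stub(void) { last_idle = IDLE_MWAIT; }
static void idle_halt_stub(void)  { last_idle = IDLE_HALT; }

/* Function pointer consulted on every idle call; defaults to halt. */
static void (*cpu_idle_ptr)(void) = idle_halt_stub;

/* cpu_idle as a macro that calls through the pointer. */
#define cpu_idle_sketch()	((*cpu_idle_ptr)())

/* Pick the idle routine once, following the policy in the log entry. */
static void
idle_select(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		cpu_idle_ptr = idle_xen_stub;
	else if (has_mwait && !is_amd)
		cpu_idle_ptr = idle_mwait_stub;
	else
		cpu_idle_ptr = idle_halt_stub;
}
```

Choosing once at startup keeps the per-call cost to a single indirect call.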
 1.6 28-Apr-2008  ad branches: 1.6.2;
cpu_idle: assert ilevel == IPL_NONE.
 1.5 14-Nov-2007  ad branches: 1.5.14; 1.5.16; 1.5.18;
- Remove I486_CPU, I586_CPU, I686_CPU options. They buy us nothing and
clutter the code significantly.
- Remove pccons.
 1.4 26-Sep-2007  ad branches: 1.4.2; 1.4.4;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.3 29-Aug-2007  ad branches: 1.3.2; 1.3.4;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.2 17-May-2007  yamt branches: 1.2.2; 1.2.6; 1.2.10; 1.2.12;
merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.1 03-Mar-2007  yamt branches: 1.1.2; 1.1.4;
file idle_machdep.c was initially added on branch yamt-idlelwp.
 1.1.4.4 03-Dec-2007  ad Sync with HEAD.
 1.1.4.3 09-Oct-2007  ad Sync with head.
 1.1.4.2 21-Aug-2007  ad Don't clear ci_want_resched here, it gets done in mi_switch().
 1.1.4.1 09-Jun-2007  ad Sync with head.
 1.1.2.1 03-Mar-2007  yamt move i386/i386/idle_machdep.c to x86/x86/idle_machdep.c.
 1.2.12.2 09-Jan-2008  matt sync with HEAD
 1.2.12.1 06-Nov-2007  matt sync with HEAD
 1.2.10.3 21-Nov-2007  joerg Sync with HEAD.
 1.2.10.2 02-Oct-2007  joerg Sync with HEAD.
 1.2.10.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.2.6.1 03-Sep-2007  skrll Sync with HEAD.
 1.2.2.2 11-Jul-2007  mjf Sync with head.
 1.2.2.1 17-May-2007  mjf file idle_machdep.c was added on branch mjf-ufs-trans on 2007-07-11 20:03:20 +0000
 1.3.4.1 06-Oct-2007  yamt sync with head.
 1.3.2.4 15-Nov-2007  yamt sync with head.
 1.3.2.3 27-Oct-2007  yamt sync with head.
 1.3.2.2 03-Sep-2007  yamt sync with head.
 1.3.2.1 29-Aug-2007  yamt file idle_machdep.c was added on branch yamt-lazymbuf on 2007-09-03 14:31:25 +0000
 1.4.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.4.2.1 18-Nov-2007  bouyer Sync with HEAD
 1.5.18.1 16-May-2008  yamt sync with head.
 1.5.16.1 17-Jun-2008  yamt fix merge botches
 1.5.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.6.2.1 23-Jun-2008  wrstuden Remove files removed on branch. Updating using patch has its
drawbacks. :-)
 1.17 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.16 13-Feb-2022  riastradh x86: Membar audit in idt.c.

- idt_vec_free/alloc are obviously supposed to synchronize with a
happens-before relation, so use release/acquire.

- There is no store-before-load ordering needed, so omit membar_sync.
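
The release/acquire pairing can be illustrated with C11 atomics. This is a sketch, not the kernel code: it uses <stdatomic.h> rather than the kernel's membar primitives, the names and table size are hypothetical, and it assumes allocation is serialized by a lock held by the caller while freeing is not.

```c
#include <stdatomic.h>

#define NVEC_SKETCH	8

static atomic_uchar vec_allocmap[NVEC_SKETCH];	/* 0 = free, 1 = in use */
static int vec_state[NVEC_SKETCH];		/* per-slot data owned by the holder */

/* Free: finish all use of the slot, then publish "free" with release. */
static void
vec_free_sketch(int vec)
{
	vec_state[vec] = 0;	/* last access by the old owner */
	atomic_store_explicit(&vec_allocmap[vec], 0, memory_order_release);
}

/* Alloc: observe "free" with acquire, so the old owner's accesses happen-before ours. */
static int
vec_alloc_sketch(void)
{
	for (int i = 0; i < NVEC_SKETCH; i++) {
		if (atomic_load_explicit(&vec_allocmap[i],
		    memory_order_acquire) == 0) {
			/* Caller's lock serializes allocators, so a plain claim suffices. */
			atomic_store_explicit(&vec_allocmap[i], 1,
			    memory_order_relaxed);
			vec_state[i] = 1;	/* first access by the new owner */
			return i;
		}
	}
	return -1;	/* no free slot */
}
```

Nothing here requires a store to be ordered before a later load, which is why a full barrier is unnecessary.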
 1.15 23-Dec-2021  yamaguchi x86: improve error handling related to idt_vec_alloc()
 1.14 14-Jul-2020  para mark diagused variable as such

fixing non DIAGNOSTIC builds
 1.13 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by the following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.12 04-Jul-2020  bouyer Fix unset_idtgate() for XENPV, pointed out by yamaguchi@
 1.11 17-Jun-2019  msaitoh KNF. No functional change.
 1.10 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.9 18-Oct-2018  cherry Make compile-time type differentiation more explicit.
 1.8 23-Sep-2018  cherry Fix for i386, functionality intended in:
http://mail-index.netbsd.org/source-changes/2018/09/23/msg099357.html

This should fix the build for both GENERIC and XEN3PAE_DOM0

This has not been boot tested on native or xen3pae

Notes: pmap_changeprot_local() seems to be x86_64 only.
I was a bit surprised by this initially, but I suspect that the table
protections are enforced via ring0/ring1 fencing rather than page protections

the gdt registration code in i386 is still messy. I will leave it as is
for now - to avoid a rabbit hole.
 1.7 23-Sep-2018  cherry Make XEN use the same api as native, for idt vector allocation
and registration.

lidt() placed in xenfunc() at maxv@'s suggestion.

There should be no functional change due to this commit.

Tested on amd64 native and XEN.
 1.6 04-Nov-2017  cherry branches: 1.6.2; 1.6.4;
In XEN PV, the idt vector table is not required to be altered at
runtime, since only entries for exceptions/traps are registered with
the hypervisor and interrupts are managed via a completely different
mechanism.

This change uses the idt_vec_reserve() mechanism nevertheless,
modifying it slightly to only do namespace management in XEN, while on
native it will continue to do idt entry init as before.

Rationale: Consistent API usage and potential future merging of
XEN/non-XEN code.

There are no functional changes in this commit.
 1.5 07-Aug-2017  maxv Remove incorrect KASSERT, only the allocation is protected by cpu_lock.
 1.4 27-Aug-2016  maxv branches: 1.4.8;
Remove idt_init.
 1.3 19-Apr-2009  ad branches: 1.3.22; 1.3.40;
cpuctl:

- Add interrupt shielding (direct hardware interrupts away from the
specified CPUs). Not documented just yet but will be soon.

- Redo /dev/cpu time_t compat so no kernel changes are needed.

x86:

- Make intr_establish, intr_disestablish safe to use when !cold.

- Distribute hardware interrupts among the CPUs, instead of directing
everything to the boot CPU.

- Add MD code for interrupt shielding. This works in most cases but there is
a bug where delivery is not accepted by an LAPIC after redistribution. It
also needs re-balancing to make things fair after interrupts are turned
back on for a CPU.
 1.2 28-Apr-2008  martin branches: 1.2.8; 1.2.14;
Remove clause 3 and 4 from TNF licenses
 1.1 26-Dec-2007  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.16; 1.1.18; 1.1.20;
- share idt entry allocation code among x86.
- introduce a function to reserve an idt entry and use it instead of
manipulating idt_allocmap directly.
- rename idt to xen_idt for amd64 xen. add missing #ifdef XEN.
 1.1.20.2 04-May-2009  yamt sync with head.
 1.1.20.1 16-May-2008  yamt sync with head.
 1.1.18.1 18-May-2008  yamt sync with head.
 1.1.16.1 02-Jun-2008  mjf Sync with HEAD.
 1.1.10.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.10.1 26-Dec-2007  mjf file idt.c was added on branch mjf-devfs on 2008-02-18 21:05:17 +0000
 1.1.8.2 21-Jan-2008  yamt sync with head
 1.1.8.1 26-Dec-2007  yamt file idt.c was added on branch yamt-lazymbuf on 2008-01-21 09:40:14 +0000
 1.1.6.2 09-Jan-2008  matt sync with HEAD
 1.1.6.1 26-Dec-2007  matt file idt.c was added on branch matt-armv6 on 2008-01-09 01:49:55 +0000
 1.1.4.2 02-Jan-2008  bouyer Sync with HEAD
 1.1.4.1 26-Dec-2007  bouyer file idt.c was added on branch bouyer-xeni386 on 2008-01-02 21:51:24 +0000
 1.1.2.2 26-Dec-2007  ad Sync with head.
 1.1.2.1 26-Dec-2007  ad file idt.c was added on branch vmlocking2 on 2007-12-26 19:17:18 +0000
 1.2.14.2 01-Nov-2009  jym Sync with HEAD.
 1.2.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.2.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.3.40.2 28-Aug-2017  skrll Sync with HEAD
 1.3.40.1 05-Oct-2016  skrll Sync with HEAD
 1.3.22.1 03-Dec-2017  jdolecek update from HEAD
 1.4.8.1 25-Aug-2017  snj Pull up following revision(s) (requested by jdolecek in ticket #224):
sys/arch/x86/x86/idt.c: revision 1.5
Remove incorrect KASSERT, only the allocation is protected by cpu_lock.
 1.6.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.6.4.1 10-Jun-2019  christos Sync with HEAD
 1.6.2.2 20-Oct-2018  pgoyette Sync with head
 1.6.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.26 20-Jan-2023  msaitoh s/attemping/attempting/ in comment.
 1.25 07-Oct-2021  msaitoh KNF. No functional change.
 1.24 02-Jul-2015  msaitoh Fix bus clock for Airmont from the latest Intel SDM.
 1.23 27-May-2015  msaitoh Add new bus clock for Airmont.
 1.22 26-May-2015  msaitoh Remove obsolete comment.
 1.21 27-Mar-2015  msaitoh Update from Intel SDM:
- Add busclock values for Airmont.
 1.20 17-Dec-2014  msaitoh - Round off some bus clock values.
- Add 333.33MHz for Pentium 4.
 1.19 25-Jul-2014  msaitoh branches: 1.19.2; 1.19.4;
Modify p3_get_bus_clock():
- Intel SDM says 06_17H is the same as 06_0fH. Same as OpenBSD.
- Add some Silvermont models.
- For Silvermont architecture, 0x011 is not 166.67MHz but 116.67MHz.
- Print model name not in decimal but in hexadecimal
- Cleanup code.
 1.18 17-Nov-2013  martin branches: 1.18.2;
Remove an unused variable
 1.17 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
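
The distinction between base, extended, and display values can be sketched as follows, using the standard CPUID leaf 1 EAX signature layout. The function names are hypothetical and this is not the text of the NetBSD macros; it only shows the combining rule the entry describes.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static unsigned sig_stepping(uint32_t sig)   { return sig & 0xf; }
static unsigned sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 0xf. */
static unsigned
sig_family(uint32_t sig)
{
	unsigned fam = sig_basefamily(sig);

	return fam == 0xf ? fam + sig_extfamily(sig) : fam;
}

/* Display model: the extended model supplies the high nibble for families 6 and 0xf. */
static unsigned
sig_model(uint32_t sig)
{
	unsigned fam = sig_basefamily(sig);
	unsigned model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

Because the base model is only 4 bits wide, comparing it against a constant above 0x0f can never match, so such checks need the display model.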
 1.16 12-Nov-2013  msaitoh Check the CPU display model instead of the base model. Re-enable the
checking for Atom and Silvermont.
 1.15 11-Nov-2013  christos CID 1128377: Comment out unreachable code; model is only 4 bits wide, so
none of these constants can ever match.
 1.14 07-Nov-2013  msaitoh Get bus clock for some Atom processors.
 1.13 24-Sep-2011  jym branches: 1.13.2; 1.13.12; 1.13.16;
Be conservative when reading MSR_FSB_FREQ by using rdmsr_safe(). We cannot
tell in advance when new CPU model/family combo will come and trying to
read that MSR early during boot may cause unhandled faults.
 1.12 23-Feb-2011  jruoho Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.
 1.11 08-Aug-2010  jym branches: 1.11.2; 1.11.4;
Some core i7 CPUs report model 0c. In this case, check for the extended
model value.

Required to avoid faulting on rdmsr(MSR_FSB_FREQ) early during boot.

Will ask for a pull-up. This affects GENERIC, and most likely, install iso
too.

XXX quick hack. Obtaining FSB through ACPI should be cleaner.
 1.10 23-May-2010  christos Add entry for:
Intel(R) Core(TM)2 Duo CPU P9500 @ 2.53GHz
 1.9 03-Dec-2009  sborrill branches: 1.9.2; 1.9.4;
Interim workaround for modern Xeons that don't have the simplistic view of
bus speed and therefore do not support MSR_FSB_FREQ (e.g. X3400). In the
long-term, ACPI should be used for this (c.f. FreeBSD).
 1.8 02-Oct-2009  jmcneill Use the TSC and current multiplier to calculate bus clock on VIA C7 Esther.
Probably needed for all C7 and Nano processors, but to be safe only use
this alternate method on Esther for now.

Now est on my C7-M 1.6GHz properly reports frequencies from 1600 to 400,
instead of 2133 to 533.
 1.7 25-Mar-2009  dyoung It is only by accident that these get definitions they need from
<sys/device.h>, so explicitly #include <sys/device.h>.
 1.6 12-Nov-2008  jmcneill branches: 1.6.4;
Add support for enhanced speedstep on Intel Atom CPUs
 1.5 28-Apr-2008  martin branches: 1.5.6; 1.5.8; 1.5.10;
Remove clause 3 and 4 from TNF licenses
 1.4 04-Jan-2008  ad branches: 1.4.6; 1.4.8; 1.4.10;
Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.3 04-Jan-2008  christos add missing includes
 1.2 01-Jul-2007  xtraeme branches: 1.2.2; 1.2.12; 1.2.14; 1.2.16; 1.2.20; 1.2.24; 1.2.30;
Add support for the VIA C7-M and Eden processors in the
Enhanced Speedstep driver.

Tested by Heron Gallegos <gallegos at csxxi dot net dot mx>
 1.1 03-Jun-2007  xtraeme branches: 1.1.2;
Make the Enhanced Speedstep driver available for i386 and amd64,
for use on EM64T CPUs supporting the EST CPUID feature. Note that
some CPUs still don't work with this driver, like Xeon or Pentium 4.

Move the p[34]_get_bus_clock functions into its own file,
intel_busclock.c and remove this code from i386/identcpu.c.

Tested on i386 by myself and amd64 by Tonerre.
 1.1.2.4 15-Jul-2007  ad Sync with head.
 1.1.2.3 09-Jun-2007  ad Sync with head.
 1.1.2.2 09-Jun-2007  ad Sync with head.
 1.1.2.1 03-Jun-2007  ad file intel_busclock.c was added on branch vmlocking on 2007-06-09 21:37:06 +0000
 1.2.30.1 08-Jan-2008  bouyer Sync with HEAD
 1.2.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.2.20.2 23-Sep-2007  wrstuden Sync with somewhat-recent netbsd-4.
 1.2.20.1 01-Jul-2007  wrstuden file intel_busclock.c was added on branch wrstuden-fixsa on 2007-09-23 21:36:28 +0000
 1.2.16.2 12-Sep-2007  msaitoh Pull up following patches (requested by xtraeme in ticket #809)

share/man/man4/options.4 patch
sys/arch/i386/conf/files.i386 patch
sys/arch/i386/i386/est.c delete
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/include/cpuvar.h patch
sys/arch/x86/x86/est.c new file
sys/arch/x86/x86/intel_busclock.c new file
sys/arch/amd64/amd64/identcpu.c patch
sys/arch/amd64/conf/GENERIC patch

Add support for the VIA C7-M and Eden processors in the Enhanced
Speedstep driver.
amd64: The Enhanced Speedstep driver is now able to work on EM64T
CPUs running in 64bit mode.
 1.2.16.1 01-Jul-2007  msaitoh file intel_busclock.c was added on branch netbsd-4 on 2007-09-12 10:05:04 +0000
 1.2.14.3 21-Jan-2008  yamt sync with head
 1.2.14.2 03-Sep-2007  yamt sync with head.
 1.2.14.1 01-Jul-2007  yamt file intel_busclock.c was added on branch yamt-lazymbuf on 2007-09-03 14:31:25 +0000
 1.2.12.1 09-Jan-2008  matt sync with HEAD
 1.2.2.2 11-Jul-2007  mjf Sync with head.
 1.2.2.1 01-Jul-2007  mjf file intel_busclock.c was added on branch mjf-ufs-trans on 2007-07-11 20:03:21 +0000
 1.4.10.4 11-Aug-2010  yamt sync with head.
 1.4.10.3 11-Mar-2010  yamt sync with head
 1.4.10.2 04-May-2009  yamt sync with head.
 1.4.10.1 16-May-2008  yamt sync with head.
 1.4.8.1 18-May-2008  yamt sync with head.
 1.4.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.4.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.10.6 04-Jun-2015  martin Pull up the following revisions, requested by sborrill in #1963:

sys/arch/x86/x86/intel_busclock.c 1.10, 1.13-23

Update p[34]_get_bus_clock() to avoid panic in est(4).
Return correct bus clock on some CPUs. Use rdmsr_safe()
to access MSRs safely.
 1.5.10.5 22-Aug-2012  bouyer branches: 1.5.10.5.2;
sys/arch/x86/x86/intel_busclock.c patch

Add support for Xeon E5
[sborrill, ticket #1787]
 1.5.10.4 31-Aug-2010  bouyer branches: 1.5.10.4.2;
Pull up following revision(s) (requested by jym in ticket #1439):
sys/arch/x86/x86/intel_busclock.c: revision 1.11
Some core i7 CPUs report model 0c. In this case, check for the extended
model value.
Required to avoid faulting on rdmsr(MSR_FSB_FREQ) early during boot.
Will ask for a pull-up. This affects GENERIC, and most likely, install iso
too.
XXX quick hack. Obtaining FSB through ACPI should be cleaner.
 1.5.10.3 18-Dec-2009  snj Pull up following revision(s) (requested by sborrill in ticket #1181):
sys/arch/x86/x86/intel_busclock.c: revision 1.9
Interim workaround for modern Xeons that don't have the simplistic view of
bus speed and therefore do not support MSR_FSB_FREQ (e.g. X3400). In the
long-term, ACPI should be used for this (c.f. FreeBSD).
 1.5.10.2 05-Oct-2009  sborrill Pull up following revision(s) (requested by jmcneill in ticket #1059):
sys/arch/x86/include/cpuvar.h: 1.30
sys/arch/x86/x86/est.c: 1.12
sys/arch/x86/x86/intel_busclock.c: 1.8

Use the TSC and current multiplier to calculate bus clock on VIA C7 Esther.
Probably needed for all C7 and Nano processors, but to be safe only use this
alternate method on Esther for now.
 1.5.10.1 14-Nov-2008  snj branches: 1.5.10.1.4;
Pull up following revision(s) (requested by jmcneill in ticket #52):
sys/arch/x86/x86/intel_busclock.c: revision 1.6
Add support for enhanced speedstep on Intel Atom CPUs
 1.5.10.5.2.1 04-Jun-2015  martin Pull up the following revisions, requested by sborrill in #1963:

sys/arch/x86/x86/intel_busclock.c 1.10, 1.13-23

Update p[34]_get_bus_clock() to avoid panic in est(4).
Return correct bus clock on some CPUs. Use rdmsr_safe()
to access MSRs safely.
 1.5.10.4.2.1 04-Jun-2015  martin Pull up the following revisions, requested by sborrill in #1963:

sys/arch/x86/x86/intel_busclock.c 1.10, 1.13-23

Update p[34]_get_bus_clock() to avoid panic in est(4).
Return correct bus clock on some CPUs. Use rdmsr_safe()
to access MSRs safely.
 1.5.10.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.5.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.5.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.5.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.6.4.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.6.4.3 24-Oct-2010  jym Sync with HEAD
 1.6.4.2 01-Nov-2009  jym Sync with HEAD.
 1.6.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.9.4.2 05-Mar-2011  rmind sync with head
 1.9.4.1 30-May-2010  rmind sync with head
 1.9.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.11.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.11.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.13.16.1 18-May-2014  rmind sync with head
 1.13.12.2 03-Dec-2017  jdolecek update from HEAD
 1.13.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.13.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.18.2.1 10-Aug-2014  tls Rebase.
 1.19.4.3 22-Sep-2015  skrll Sync with HEAD
 1.19.4.2 06-Jun-2015  skrll Sync with HEAD
 1.19.4.1 06-Apr-2015  skrll Sync with HEAD
 1.19.2.2 19-Apr-2015  riz Pull up following revision(s) (requested by msaitoh in ticket #702):
sys/arch/x86/x86/intel_busclock.c: revision 1.21
Update from Intel SDM:
- Add busclock values for Airmont.
 1.19.2.1 08-Jan-2015  martin Pull up following revision(s) (requested by msaitoh in ticket #393):
sys/arch/x86/x86/intel_busclock.c: revision 1.20
- Round off some bus clock values.
- Add 333.33MHz for Pentium 4.
 1.169 11-Sep-2024  mrg apply some more diagnostic checks for x86 interrupts

convert intr_biglock_wrapper() into a slightly less complete
intr_wrapper(), and move the kernel lock/unlock points into
the new intr_biglock_wrapper().

add curlwp->l_nopreempt checking for interrupt handlers,
including the dtrace wrapper.

XXX: has to copy the i8254_clockintr hack.

tested for a few months by myself, and recently by rin@ on both
current and netbsd-10. thanks!
 1.168 22-Apr-2024  andvar branches: 1.168.2;
Add opt_pci.h include to fix NO_PCI_MSI_MSIX build.
(Patch from Paolo Pisati on current-users@)

While here:
Simplify mp_cpu_start() ifdefs. MULTIPROCESSOR and HYPERV code falls under
NLAPIC > 0, thus just combine all blocks under this guard.
Rearrange opt_acpi.h include alphabetically.
 1.167 05-Mar-2024  andvar Remove duplicate "when" word in comments.
 1.166 29-Nov-2023  mlelstv Fix use-after-free (source->is_type) when detecting unsharable
interrupts. Doesn't solve the interrupt conflict itself, but
avoids a panic.
 1.165 11-Apr-2023  riastradh x86: Omit needless membar_sync in intr_disestablish_xcall.

Details in comments.
 1.164 25-Jan-2023  riastradh x86/intr: Work around sleazy clockintr with a secret frame argument.

PR kern/57197
 1.163 29-Oct-2022  riastradh branches: 1.163.2;
x86: Add dtrace probes for interrupt handler entry and return.

Arguments:

0: interrupt handler function
1: interrupt handler's private cookie argument
2: MD struct intrhand pointer
3: return value (true if relevant to this handler, false if not)

The MD struct intrhand pointer makes the first two arguments
redundant, but we might reuse the signature of the first two
arguments for an MI SDT probe to make it easy to write MI dtrace
scripts for monitoring interrupt handlers. The MD intrhand can be
used for getting more information about the interrupt like ih_level,
ih_pin, ih_slot, &c., which may not make sense as an MI API.
 1.162 26-Oct-2022  riastradh ddb/db_active.h: New home for extern db_active.

This can be included unconditionally, and db_active can then be
queried unconditionally; if DDB is not in the kernel, then db_active
is a constant zero. Reduces need for #include opt_ddb.h, #ifdef DDB.
 1.161 07-Sep-2022  knakahara NetBSD/x86: Raise the number of interrupt sources per CPU from 32 to 56.

There has been no objection for three years.
https://mail-index.netbsd.org/port-amd64/2019/09/22/msg003012.html
Implemented by nonaka@n.o, updated by me.
 1.160 12-Mar-2022  riastradh x86: Check for biglock leakage in interrupt handlers.
 1.159 23-Dec-2021  yamaguchi Move the variable into the section that uses it
 1.158 23-Dec-2021  yamaguchi delete the extra space
 1.157 23-Dec-2021  yamaguchi x86: improve error handling related to idt_vec_alloc()
 1.156 07-Oct-2021  msaitoh KNF. No functional change.
 1.155 09-Aug-2021  andvar s/alway /always/
 1.154 19-Feb-2021  knakahara Fix x86's pci_intr_disestablish clean up routine. Pointed out by t-kusaba@IIJ, thanks.

Fix panic on x86 by the following code.
 1.153 18-Nov-2020  bouyer Preserve Xen SIR slots for VM_GUEST_XENPVH.
 1.152 14-Jul-2020  yamaguchi branches: 1.152.2;
Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.151 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.150 30-Dec-2019  thorpej branches: 1.150.6;
Fix a problem with intr_unmask() that can cause a forever-loop:
- When handling the source-is-masked case in the interrupt vector, set the
interrupt bit in a new ci_imasked field and ensure the bit is cleared
from ci_ipending.
- In intr_unmask(), transfer the bit from ci_imasked to ci_ipending for
non-level-sensitive interrupts (the PIC does the work for us in the
level-sensitive case), and only force pending interrupts to be processed
in this case. (In all cases, make sure the now-unmasked bit is cleared
from ci_imasked.)

Before, the bit was left in ci_ipending so as not to use edge-triggered
interrupts while the source is masked, but Xspllower() relies on the
pending bits getting cleared.

Tested by forcing all wm(4) interrupts on my test system through an
intr_mask() / softint / intr_unmask() cycle and exercising the network
heavily.
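
The bit transfer described in revision 1.150 can be sketched roughly as follows. This is a minimal illustration, not the code in intr.c: struct cpu_state and both function names are hypothetical, and only the two fields named in the commit message (ci_ipending, ci_imasked) are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU state; only the two fields
 * named in the commit message are modelled. */
struct cpu_state {
	uint64_t ci_ipending;	/* interrupts waiting to be processed */
	uint64_t ci_imasked;	/* interrupts seen while their source was masked */
};

/* Vector side, source-is-masked case: park the bit in ci_imasked and
 * make sure it is clear in ci_ipending. */
static void
note_masked_interrupt(struct cpu_state *ci, unsigned slot)
{
	assert(slot < 64);
	ci->ci_imasked |= (uint64_t)1 << slot;
	ci->ci_ipending &= ~((uint64_t)1 << slot);
}

/* Unmask side: returns true when the caller should force pending
 * interrupts to be processed. */
static bool
unmask_source(struct cpu_state *ci, unsigned slot, bool level_sensitive)
{
	const uint64_t bit = (uint64_t)1 << slot;
	bool force = false;

	assert(slot < 64);
	if (!level_sensitive && (ci->ci_imasked & bit) != 0) {
		/* Edge-triggered: the hardware will not re-assert, so
		 * move the parked bit back to pending. */
		ci->ci_ipending |= bit;
		force = true;
	}
	/* In all cases the now-unmasked bit leaves ci_imasked. */
	ci->ci_imasked &= ~bit;
	return force;
}
```

In the level-sensitive case nothing is transferred, since the commit message notes that the PIC re-delivers the interrupt by itself.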
 1.149 22-Dec-2019  ad Fix compile on !DIAGNOSTIC.
 1.148 22-Dec-2019  thorpej Add intr_mask() and corresponding intr_unmask() calls that allow specific
interrupt lines / sources to be masked as needed (rather than making a
set of sources by IPL as with spl*()).
 1.147 08-Nov-2019  msaitoh Fix a bug that evcnt_detach() called twice when the idt vector is full.
OK'd by knakahara.
 1.146 17-Jun-2019  msaitoh branches: 1.146.2;
KNF. No functional change.
 1.145 05-Jun-2019  knakahara Add TODO comments to support MSI multiple vectors on x86 systems.
 1.144 15-Feb-2019  nonaka Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.
 1.143 14-Feb-2019  cherry Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.142 12-Feb-2019  cherry Fix typo: Parameters are separated by ','.
 1.141 25-Dec-2018  cherry Excise XEN specific code out of x86/x86/intr.c into xen/x86/xen_intr.c

While at it, separate the source function tracking so that the interrupt
paths are truly independent.

Use weak symbol exporting to provision for future PVHVM co-existence
of both files, but with independent paths. Introduce assembler code
such that in a unified scenario, native interrupts get first priority
in spllower(), followed by XEN event callbacks. IPL management and
semantics are unchanged - native handlers and xen callbacks are
expected to maintain their ipl related semantics.

In summary, after this commit, native and XEN now have completely
unrelated interrupt handling mechanisms, including
intr_establish_xname() and assembler stubs and intr handler
management.

Happy Christmas!
 1.140 24-Dec-2018  cherry Towards bifurcating XEN and native interrupt related functions,
this is a preliminary cleanup sweep.

Move functions related to MP bus probe and scanning to x86/mp.c

Move generic platform pic search function to x86/x86_machdep.c
 1.139 24-Dec-2018  cherry Bifurcate the interrupt establish functions between XEN and non-XEN

Thus intr_establish_xname() becomes xen_intr_establish_xname() etc.

One consequence of this is that dom0 devices expect the native
function calls to be available and we thus provide weak aliasing for
dom0 builds to succeed. XEN and non-XEN devices are distinguished by
the PIC they are established on. XEN interrupts are exclusively
established on xen_pic, while dom0 interrupts are established on
natively available PICs.

This allows us an orthogonal path to xen device management (eg:
xenstore events) in XENPVHVM, without having to worry about unifying
the vector entry paths, etc., which is quite challenging.
 1.138 23-Dec-2018  jdolecek whitespace, NFC
 1.137 04-Dec-2018  cherry Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.
 1.136 02-Dec-2018  cherry make

options NO_PCI_MSI_MSIX

work again for arch/x86/
 1.135 24-Oct-2018  cherry When returning a cached shared irq event value, DTRT
 1.134 08-Oct-2018  cherry Clean up XEN specific stuff from the apic code, and move to intr.c

No functional change.
 1.133 07-Oct-2018  cherry Switch over to a "GSI" concept for guest irqs.

On XEN there is a namespace called GSI which includes:

i) legacy_irq (0 - 16)
ii) "gsi" (16-nr_irqs_gsi)
iii) msi

We try to mirror this in guest space, but are mindful that legacy_irq
is 1:1 bound to actual hardware legacy_irq. Apart from this, XEN doesn't
really care what number scheme we use, as long as it doesn't encroach
on the MSI space, which is TBD for us.

Thus we trust the mpbios.c/mpacpi.c code to correctly map the pic,pin
tuples into the correct global gsi space, which we then register with
xen. As we now do, we allow for duplicate gsi registrations, in case
any hardware shares the same (pic,pin);

This enables us to now use the (pic,pin) tuple as the canonical reference
for device interrupt addresses, and leave any global mappings to specific
code. Thus xen_pic_to_gsi().

Note that this requires separate support for MSI, which I will get around to
once things stabilise - however the API change facilitates this nicely.

I note that the msi addroute() function does not use the "pin" parameter.
This can be made use of, to encode the gsi number, for XEN. This is however
TBD.

We further tweak the xen_vec_alloc() code to be uniform for the NIOAPICS
and other cases, and ensure that i8259.c DTRT wrt to route().

This will allow us to use pic->pic_addroute() without needing to worry about
pic specific issues.

The next step is to consolidate the pic_addroute() XEN related #ifdefs into
a -DXEN specific file, so that we don't clutter x86/ code with #ifdef XENs.

This change has functional implications, and there is likely breakage coming
especially on bespoke platforms that I haven't been able to test yet.

I am especially interested in bug reports from platforms with legacy (esp. i386)
and with multiple ioapics.
 1.132 06-Oct-2018  cherry Change the name of xen_pirq_alloc() to xen_vec_alloc() to reflect
its actual job.

The idea is that we will strip this down until it is as close to
idt_vec_alloc() as possible.
 1.131 06-Oct-2018  cherry Move the pic->pic_addroute() call from within pintr.c:xen_pirq_alloc() to
intr.c:intr_establish_xname()

xen_pirq_alloc() now returns a vector value, as is intended by
the semantics of the call to the hypervisor, PHYSDEVOP_ASSIGN_VECTOR.

This also brings our usage closer to native.
 1.130 20-Sep-2018  cherry When we removed the XEN special case from isa/isa_machdep.c
there was a corner case that was missed in
x86/intr.c:intr_establish_xname()

In isa_machdep.c:isa_intr_establish_xname() the legacy_irq parameter
is never set to -1. It is also incorrect to call
isa_intr_establish_xname() with a legacy_irq parameter of -1.

Thus we infer that whenever we see (legacy_irq == -1) in
intr_establish_xname() which is downstream, we were *NOT* called from
isa_machdep.c:isa_intr_establish_xname()

Given that there are no other users of intr_establish_xname() which
pass a valid legacy_irq != -1, we assume therefore that we *WERE*
called from isa_machdep.c:isa_intr_establish_xname() in this case.

This is an important distinction in the case where a valid
legacy_irq > NUM_LEGACY_IRQS was passed down from
isa_intr_establish_xname() but was ignored by xen_pirq_alloc() and
overwritten with the "pseudo" irq which is then passed back. We thus
pass the incorrect "legacy" irq value to pirq_establish().

Even though this is the correct behaviour in the non-ISA cases (i.e. PCI
and MSI(X)), we need to maintain (bug?) compatibility with the ISA
case.

Thus the one liner diff.

 1.129 14-Sep-2018  mrg fix a !MP build issue.
 1.128 10-Sep-2018  cherry Make the use of 'irqs' in the range 0 < irq < 255 by xen
as a handle for internal use explicit.

This allows us to pass up the handle as "legacy" irq while
establishing interrupt handlers for xen.

No functional change.
 1.127 03-Jul-2018  kamil Avoid unportable signed integer left shift in intr_calculatemasks()

Detected with Kernel Undefined Behavior Sanitizer.

There were at least two places reported; for consistency, fix all the
left bit shifts.

src/sys/arch/x86/x86/intr.c:339:22, left shift of 1 by 31 places cannot be represented in type 'int'
src/sys/arch/x86/x86/intr.c:347:15, left shift of 1 by 31 places cannot be represented in type 'int'

Reported by <Harry Pantazis>
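
The class of problem fixed in revision 1.127 can be illustrated with a small sketch. Shifting the signed constant 1 left by 31 places is undefined behaviour when int is 32 bits wide, while the same shift on an unsigned 32-bit operand is well defined. The helper name slot_mask is hypothetical and is not taken from intr.c.

```c
#include <assert.h>
#include <stdint.h>

/* Build a one-bit mask for an interrupt slot.  The operand is unsigned,
 * so slot 31 is representable; writing (1 << slot) instead would shift
 * a signed int into its sign bit. */
static uint32_t
slot_mask(unsigned slot)
{
	assert(slot < 32);
	return (uint32_t)1 << slot;
}
```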
 1.126 24-Jun-2018  jdolecek branches: 1.126.2;
add support for kern.intr.list aka intrctl(8) 'list' for xen

event_set_handler() and pirq_establish() now have extra intrname
parameter; shared intr_create_intrid() is used to provide the value

xen drivers were changed to pass the specific driver instance
name as the xname, e.g. 'vcpu0 clock' instead just 'clock', or
'xencons0' instead of 'xencons'

associated evcnt is now changed to use intrname - this matches native x86
 1.125 04-Apr-2018  christos Rename Xpreempt{recurse,resume} -> X{recurse,resume}_preempt so that
they fit the pattern. Also the debugger trap sniffer matches them
without adding special entries...
XXX: pullup-8.
 1.124 26-Mar-2018  knakahara Fix "intrctl list" causes panic while attaching MSI/MSI-X devices.

When a device has already been pci_intr_alloc'ed but its interrupt is not
yet established, "intrctl list" causes a panic. E.g.
# while true; do intrctl list > /dev/null ; done&
# drvctl -d ixg0 && drvctl -r pci0

And add some KASSERTMSG to similar but not the same code.

Pointed out by msaitoh@n.o.

XXX pullup-8
 1.123 17-Feb-2018  maxv branches: 1.123.2;
Rename i8259_stubs -> legacy_stubs. We will want the entries to have the
same name, eg:

legacy_stubs
-> Xintr_legacy0, Xrecurse_legacy0, Xresume_legacy0
-> Xintr_legacy1, Xrecurse_legacy1, Xresume_legacy1
...
 1.122 23-Jan-2018  roy Rework prior two patches to fix clang release builds.
Original patch by darcy@
 1.121 16-Jan-2018  kre Attempt to complete previous and allow XEN to compile (as well as link)
 1.120 16-Jan-2018  roy Fix XEN builds
 1.119 13-Jan-2018  bouyer Also set ih_realfun/ih_realarg in Xen's intr_establish_xname() as
intr_disestablish() uses them.
Should fix the panic at device detach time (esp. at shutdown time).
 1.118 12-Jan-2018  maxv Remove unused.
 1.117 11-Jan-2018  maxv Initialize ist0 in cpu_init_tss. On amd64 this is the DDB stack, and it has
nothing to do with ci_intrstack. While here, style, and don't forget to
pass UVM_KMF_ZERO in uvm_km_alloc.
 1.116 04-Jan-2018  maxv Allocate the TSS area dynamically. This way cpu_info and cpu_tss can be
put in separate pages.
 1.115 04-Jan-2018  maxv Group the different TSSes into a cpu_tss structure. And pack this
structure to make sure there is no padding between 'tss' and 'iomap'.
 1.114 04-Jan-2018  knakahara fix "intrctl list" panic when ACPI is disabled.

reviewed by cherry@n.o and tested by msaitoh@n.o, thanks.
 1.113 13-Dec-2017  bouyer Fixes for physical interrupts on Xen:
- do not cast int * to intr_handle_t *, they're not the same size
- legacy_irq is not always -1 for ioapic interrupts, test pic_type instead
- change irq2port[] to hold (port + 1) so that 0 is an invalid value
- add KASSERTs to make sure vect, port or irq values extracted from arrays are
valid (or that they are invalid before write)
- for the !ioapic case, we still need to do PHYSDEVOP_ASSIGN_VECTOR and
bind_pirq_to_evtch().

now XEN3_DOM0 boots again
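
The "(port + 1)" convention from revision 1.113 can be sketched as below. This is an illustration under assumptions: the table size NIRQ_SKETCH and the helper names are hypothetical, and only the encoding idea (0 means "no port") comes from the commit message.

```c
#include <assert.h>

#define NIRQ_SKETCH 256		/* hypothetical table size */

/* Zero-initialised, so every entry starts out invalid. */
static unsigned int irq2port[NIRQ_SKETCH];

/* Record a binding; the slot must be invalid before it is written. */
static void
bind_irq(int irq, unsigned int port)
{
	assert(irq >= 0 && irq < NIRQ_SKETCH);
	assert(irq2port[irq] == 0);
	irq2port[irq] = port + 1;	/* store port + 1 so that port 0 is usable */
}

/* Returns 0 and fills *port on success, -1 if the irq has no port. */
static int
lookup_port(int irq, unsigned int *port)
{
	assert(irq >= 0 && irq < NIRQ_SKETCH);
	if (irq2port[irq] == 0)
		return -1;
	*port = irq2port[irq] - 1;
	return 0;
}
```

Storing the value offset by one lets a statically zeroed array distinguish "unbound" from a legitimate port number 0 without a separate validity flag.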
 1.112 11-Nov-2017  riastradh Add KASSERT to confirm no change in xen intr MP-safety annotations.
 1.111 11-Nov-2017  riastradh KNF NFC
 1.110 11-Nov-2017  riastradh Free ih when done.
 1.109 11-Nov-2017  riastradh Pass xname through Xen intr_establish_xname to event_set_handler.
 1.108 11-Nov-2017  riastradh Pass IPL through from intr_establish to event_set_handler.

Don't unconditionally use IPL_CLOCK, which aside from being the wrong
IPL for non-IPL_CLOCK interrupt handlers has the side effect of running
all interrupt handlers without the giant lock, even those that are not
MP-safe.

This is a step toward fixing:

https://mail-index.netbsd.org/tech-kern/2017/11/09/msg022571.html

ok cherry
 1.107 11-Nov-2017  riastradh #if DIAGNOSTIC panic ---> KASSERTMSG
 1.106 04-Nov-2017  cherry Retire xen/x86/intr.c and use the new xen specific glue in x86/x86/intr.c

The purpose of this change is to expose the x86/include/intr.h API
to drivers. Specifically the following functions:

void *intr_establish_xname(...);
void *intr_establish(...);
void intr_disestablish(...);

while maintaining the old API from xen/include/evtchn.h, specifically
the following functions:

int event_set_handler(...);
int event_remove_handler(...);

This is so that if things break, we can keep using the old API until
everything stabilises. This is a stepping stone towards getting the
actual XEN event callback path rework code in place - which can be
done opaquely behind the intr.h API - NetBSD/XEN specific drivers that
have been ported to the intr.h API should then work without
significant further modifications.
 1.105 27-Oct-2017  joerg Revert printf return value change.
 1.104 27-Oct-2017  utkarsh009 [syzkaller] Cast all the printf's to (void *)
as a result of new printf(9) declaration.
 1.103 03-Sep-2017  cherry Remove redundant static function declaration
 1.102 31-Jul-2017  maxv Use idt_vec_set instead.
 1.101 01-Jun-2017  chs branches: 1.101.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.100 20-Apr-2017  knakahara always notice if the device's interrupt set affinity to other than CPU#0.

suggested by msaitoh@n.o.
 1.99 18-Apr-2017  knakahara change aprint_verbose() to know easily msi devices affinity to CPU#0 or not.

suggested by msaitoh@n.o.
 1.98 18-Apr-2017  knakahara use DPRINTF instead of #define INTRDEBUG and printf().
 1.97 19-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT
 1.96 06-Dec-2016  maxv branches: 1.96.2;
Memory leak, found by Mootja
 1.95 16-Nov-2016  knakahara avoid a failure of interrupt affinity when the interrupt is pending.

pointed out and reviewed by ozaki-r@n.o, thanks.
 1.94 11-Jul-2016  knakahara branches: 1.94.2;
should use strlcpy instead of strncpy.

pointed out by dholland@n.o.
 1.93 11-Jul-2016  knakahara strncpy should use destination buf length instead of source buf length.

pointed out by nonaka@n.o.
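
Revisions 1.93 and 1.94 both concern bounding a string copy by the destination buffer. A minimal sketch of that rule follows. strlcpy() is not available in every C library, so this example swaps in snprintf(), which gives the same guarantees (always NUL-terminated, never writes past the destination). The helper name copy_name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst, bounded by the size of the DESTINATION buffer.
 * Returns the length of src, so a result >= dstlen signals truncation
 * (the same convention strlcpy() uses). */
static size_t
copy_name(char *dst, size_t dstlen, const char *src)
{
	int n;

	assert(dstlen > 0);
	n = snprintf(dst, dstlen, "%s", src);
	return n < 0 ? 0 : (size_t)n;
}
```

Bounding the copy by the source length instead gives no protection when the source is longer than the destination, and strncpy() does not NUL-terminate on truncation.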
 1.92 20-Jun-2016  hannken Prevent use after free. Don't free an interrupt source still in use.

Ok: Kengo NAKAHARA
 1.91 17-Nov-2015  hannken Replace SIMPLEQ_FOREACH with SIMPLEQ_FOREACH_SAFE to prevent use-after-free.
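
The use-after-free addressed in revision 1.91 arises when a list element is freed inside a loop that then reads the freed element's link to advance. The sketch below shows the "safe" iteration idea with a hand-rolled singly linked list rather than the queue(3) macros. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct handler {
	int id;
	struct handler *next;
};

/* Prepend a new element; returns the new head. */
static struct handler *
push(struct handler *head, int id)
{
	struct handler *h = malloc(sizeof(*h));

	assert(h != NULL);
	h->id = id;
	h->next = head;
	return h;
}

/* Remove and free every element whose id matches; returns the new head.
 * The successor is saved BEFORE the element may be freed, which is what
 * the _SAFE iteration variants do. */
static struct handler *
remove_matching(struct handler *head, int id)
{
	struct handler **prevp = &head;
	struct handler *h, *next;

	for (h = head; h != NULL; h = next) {
		next = h->next;
		if (h->id == id) {
			*prevp = next;
			free(h);
		} else {
			prevp = &h->next;
		}
	}
	return head;
}

static int
count(const struct handler *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```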
 1.90 09-Oct-2015  knakahara remove quick hack code to avoid shared IRQ issue.
 1.89 09-Oct-2015  knakahara fix: "intrctl list" causes panic when the device using pci_intr_alloc() shares IRQ.
 1.88 06-Oct-2015  knakahara quick hack for shared IRQ issue.
 1.87 17-Aug-2015  knakahara Add kernel code to support intrctl(8).
 1.86 23-Jun-2015  msaitoh Fix a bug that an interrupt mask is "un"masked in intr_"dis"establish_xcall().
It's not intended.
- If there are no handlers left,
1) do delroute because the source is no longer used and
2) don't hwunmask, to prevent spurious interrupts.
- If there are still handlers,
1) don't delroute because the source is still in use and
2) do hwunmask to be able to get interrupts again.
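
The decision described in revision 1.86 can be sketched as follows. The structure and its fields are hypothetical stand-ins for the real interrupt source and PIC callbacks; only the two-way rule (delroute and stay masked, or keep the route and unmask) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt source. */
struct intr_source_sketch {
	int nhandlers;	/* handlers currently attached */
	bool routed;	/* true while the PIC routes this pin */
	bool hwmasked;	/* true while the pin is masked in hardware */
};

/* Detach one handler from the source. */
static void
remove_handler(struct intr_source_sketch *s)
{
	assert(s->nhandlers > 0);

	s->hwmasked = true;	/* masked while the handler list changes */
	s->nhandlers--;

	if (s->nhandlers == 0) {
		/* Last handler gone: drop the route and stay masked so
		 * no spurious interrupt is delivered. */
		s->routed = false;
	} else {
		/* Others remain: keep the route and unmask again. */
		s->hwmasked = false;
	}
}
```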
 1.85 15-May-2015  knakahara pci_msi_string() must be used by MD code only.
 1.84 09-May-2015  christos CID 1297228: Use strlcpy
 1.83 07-May-2015  martin Make it compilable without PCI
 1.82 02-May-2015  roy Fix compile on clang.
 1.81 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.80 27-Apr-2015  knakahara add intr_handle_t and let pci_intr_handle_t use it.
 1.79 27-Apr-2015  knakahara add pci_intr_distribute(9) for x86.
 1.78 08-Apr-2015  knakahara add prototype declarations
 1.77 20-May-2014  ozaki-r branches: 1.77.4;
Pad 0 to align outputs
 1.76 29-Mar-2014  christos branches: 1.76.2;
make pci_intr_string and eisa_intr_string take a buffer and a length
instead of relying on local static storage.
 1.75 25-Mar-2013  chs branches: 1.75.4;
only use db_printf() if we're actually called from DDB.
this prevents the boot-time one from pausing the boot process.
 1.74 15-Jun-2012  yamt branches: 1.74.2;
comments
 1.73 12-Jun-2012  yamt intr_find_mpmapping: comments and cosmetic. no functional changes.
 1.72 01-Aug-2011  drochner branches: 1.72.2; 1.72.8;
if checking whether an interrupt is shared, don't compare pin numbers
if it is "-1" -- this is a hack to allow MSIs which don't have a concept
of pin numbers, and are generally not shared
(This doesn't give us sensible event names for statistics display. The
whole abstraction has more exceptions than regular cases, it should
be redesigned imho.)
 1.71 03-Apr-2011  dyoung Clean up excessive #ifdef'age of NMI trap handling for amd64/i386/xen.
Handle NMI in all Xen kernels.
 1.70 22-Jan-2011  tsutsui Fix wrong function names in messages by using __func__. PR kern/44431
 1.69 24-Nov-2010  cegger branches: 1.69.2; 1.69.4;
when DDB is enabled then use 'db_printf'
for 'call intr_printconfig' to respect the 25 lines on vga output
 1.68 17-Jun-2010  mrg attach just one of the cpu timer interrupts as EVCNT_TYPE_INTR, so that at
least something shows up in systat and beyond. myself, and a few others,
have been confused at the lack of any timer interrupts appearing here.
 1.67 24-Feb-2010  dyoung branches: 1.67.2;
Rename to 'pc' all variables 'pci_chipset_tag'.
 1.66 25-Nov-2009  rmind branches: 1.66.2;
Remove IPL_LPT and IPL_IPI aliases, use the actual IPLs.
Fix some broken comments.
 1.65 18-Aug-2009  jmcneill Switch to ACPICA 20090730, and update for API changes.
 1.64 06-May-2009  mrg avoid a warning seen with -O3.
 1.63 27-Apr-2009  cegger sprintf -> snprintf
 1.62 22-Apr-2009  ad Route all interrupts back to the BP again, for the time being.
Distributing them is causing strange problems on some systems.
 1.61 19-Apr-2009  ad cpuctl:

- Add interrupt shielding (direct hardware interrupts away from the
specified CPUs). Not documented just yet but will be soon.

- Redo /dev/cpu time_t compat so no kernel changes are needed.

x86:

- Make intr_establish, intr_disestablish safe to use when !cold.

- Distribute hardware interrupts among the CPUs, instead of directing
everything to the boot CPU.

- Add MD code for interrupt shielding. This works in most cases but there is
a bug where delivery is not accepted by an LAPIC after redistribution. It
also needs re-balancing to make things fair after interrupts are turned
back on for a CPU.
 1.60 07-Apr-2009  dyoung Add opt_intrdebug.h for the INTRDEBUG option, and #include it here and
there. Fixes GENERIC/i386 compilation with 'options INTRDEBUG'.
 1.59 24-Feb-2009  yamt - rewrite x86 nmi dispatcher so that establish and disestablish are safe
on a running system.
- adapt existing users of the api. (elan)
- adapt tprof_pmi driver to use the api.
 1.58 17-Dec-2008  cegger branches: 1.58.2;
kill MALLOC and FREE macros.
 1.57 03-Jul-2008  drochner branches: 1.57.4; 1.57.6;
split device/softc for ioapic
 1.56 03-Jul-2008  drochner Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.55 30-May-2008  ad branches: 1.55.2;
Add a 'known_mpsafe' argument to intr_establish().
 1.54 13-May-2008  ad intr_string: don't bother printing the legacy irq number when using the
ioapic. It's confusing.
 1.53 13-May-2008  joerg Restore the behaviour intended by rev 1.51 with the patch I actually
sent out for testing. The wrong version ended up in the commit.
Original description:
Don't use the legacy interrupt when deciding how to route IOAPIC pins.
On some modern systems not all devices have the PCI interrupt line
set, typically the cardbus bridge is affected and it would result in
different interrupt vectors used for the same IOAPIC pin.
To allow this, simplify the code by checking for an existing match first
and only allocate a new entry if that doesn't exist. For the IOAPIC case
don't bother with the reservation on the primary CPU for ISA
interrupts, just use them.
 1.52 13-May-2008  ad Back out 1.50 until the assumptions about NUM_LEGACY_IRQS are removed.
Until then there are not enough free interrupt sources on UP systems.
(Sorry Joerg.)
 1.51 11-May-2008  ad Don't use ci_apicid to identify cpus in debug output.
 1.50 11-May-2008  joerg Don't use the legacy interrupt when deciding how to route IOAPIC pins.
On some modern systems not all devices have the PCI interrupt line
set, typically the cardbus bridge is affected and it would result in
different interrupt vectors used for the same IOAPIC pin.
To allow this, simplify the code by checking for an existing match first
and only allocate a new entry if that doesn't exist. For the IOAPIC case
don't bother with the reservation on the primary CPU for ISA
interrupts, just use them.
 1.49 07-May-2008  joerg branches: 1.49.2;
Remove some prototypes that are not implemented. Make some functions
static that are only used in intr.c.
 1.48 30-Apr-2008  joerg Exploit ci->ci_isources[slot] == source to simplify code.
 1.47 29-Apr-2008  joerg Remove IOAPIC_HWMASK, it was never defined.
 1.46 28-Apr-2008  ad Don't count many items as EVCNT_TYPE_INTR because they clutter up the
systat vmstat display.
 1.45 28-Apr-2008  ad Add support for kernel preeemption to the i386 and amd64 ports. Notes:

- I have seen one isolated panic in the x86 pmap, but otherwise i386
seems stable with preemption enabled.

- amd64 is missing the FPU handling changes and it's not yet safe to
enable it there.

- The usual level for kern.sched.kpreempt_pri will be 128 once enabled
by default. For testing, setting it to 0 helps to shake out bugs.
 1.44 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.43 16-Apr-2008  cegger branches: 1.43.2; 1.43.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.42 11-Apr-2008  dyoung Remove debug printf that snuck in with my commit.
 1.41 10-Apr-2008  dyoung Add the redzones above and below the interrupt stack back to the
DIAGNOSTIC kernel.
 1.40 21-Jan-2008  dyoung branches: 1.40.6;
Add primitive routines to establish NMI handlers on i386.

TBD: synchronize (dis)establishment of handlers.
 1.39 05-Jan-2008  yamt - make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.
 1.38 26-Dec-2007  yamt - share idt entry allocation code among x86.
- introduce a function to reserve an idt entry and use it instead of
manipulating idt_allocmap directly.
- rename idt to xen_idt for amd64 xen. add missing #ifdef XEN.
 1.37 06-Dec-2007  ad branches: 1.37.4;
Share cpu_intr_p() with xen. Why xen has its own intr.c is a mystery.
 1.36 03-Dec-2007  ad branches: 1.36.2;
Interrupt handling changes, in discussion since February:

- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
 1.35 28-Nov-2007  ad Use the new atomic ops.
 1.34 17-Oct-2007  garbled branches: 1.34.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.33 07-Oct-2007  joerg Merge intr.c (1.29.8.2) and ioapic.c (1.19.8.5) changes from jmcneill-pm:

Always write entries to all IOAPIC pins. The first 16 pins are
treated as ISA IRQs by default, the others like PCI IRQs. This avoids
inconsistencies based on incomplete BIOS setups. This resulted in early
ACPI SCI notifications to be lost, effectively breaking the Embedded
Controller on cold start on many notebooks.

Don't special case the IOAPIC setup between ioapic_attach and
ioapic_enable, always setup the correct redirections. Depend on
splhigh/disable_intr to stop interrupts and don't keep them masked in
the IOAPIC. This avoids unacknowledged edge interrupts and fixes the problem
of broken PS/2 keyboard when hitting keys during early boot.
 1.32 30-Aug-2007  ad branches: 1.32.2;
amd64 doesn't have opt_noredzone.h. Just test DIAGNOSTIC instead.
 1.31 29-Aug-2007  ad Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.30 29-Aug-2007  dyoung Add interrupt stack "red zones". Reserve and unmap the virtual
pages immediately above and below the x86 interrupt stack so that
both an overgrown interrupt stack and other faults produce a page
fault trap. Condition this on the historical option NOREDZONE,
for now.
 1.29 09-Jul-2007  ad branches: 1.29.4; 1.29.8; 1.29.10;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.28 21-Feb-2007  thorpej branches: 1.28.4; 1.28.6; 1.28.10; 1.28.12;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.27 09-Feb-2007  ad branches: 1.27.2;
Merge newlock2 to head.
 1.26 24-Jan-2007  hubertf Remove duplicate #includes, patch contributed in private mail
by Slava Semushin <slava.semushin@gmail.com>.

To verify that no nasty side effects of duplicate includes (or their
removal) have an effect here, I've compiled an i386/ALL kernel with
and without the patch, and the only difference in the resulting .o
files was in shifted line numbers in some assert() calls.
The comparison of the .o files was based on the output of "objdump -D".

Thanks to martin@ for the input on testing.
 1.25 08-Dec-2006  yamt - pass intrframe by-pointer, not by-value.
- make i386 and xen use per-cpu interrupt stack.

xen part is reviewed by Manuel Bouyer.
 1.24 08-Jul-2006  christos branches: 1.24.4; 1.24.6;
remove INTRDEBUG
 1.23 04-Jul-2006  christos Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.22 11-Dec-2005  christos branches: 1.22.4; 1.22.8; 1.22.16;
merge ktrace-lwp.
 1.21 29-May-2005  christos branches: 1.21.2;
Sprinkle const.
 1.20 23-Oct-2004  yamt don't reference kernel_lock directly.
 1.19 23-Oct-2004  yamt to determine if an interrupt needs to grab the kernel lock or not,
check interrupt's own ipl rather than cpu's current ipl.
 1.18 20-Aug-2004  wennmach o Split copyright into mycroft and UCB parts
o remove advertising clause from UCB part
 1.17 02-Jul-2004  mycroft Ahem. Parts of this are *clearly* derived from the old i386/isa/intr.c, so
put back the copyright from there.
 1.16 05-May-2004  kochi Fix parameters for PPB_INTERRUPT_SWIZZLE macro.

The macro expects pin = 1..4 while previously passing 0..3.
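
The off-by-one in revision 1.16 is easier to see with the conventional PCI-to-PCI bridge swizzle written out. The function below is a sketch of that conventional rule for 1-based pins (INTA# = 1 through INTD# = 4); it is an assumption for illustration and is not claimed to be the exact definition of PPB_INTERRUPT_SWIZZLE. The name swizzle_pin is hypothetical.

```c
#include <assert.h>

/* Map a device's interrupt pin to the pin seen on the parent side of a
 * PCI-to-PCI bridge.  Pins are 1-based (1..4), devices are 0-based. */
static int
swizzle_pin(int pin, int device)
{
	assert(pin >= 1 && pin <= 4);	/* a 0-based pin of 0 is rejected here */
	assert(device >= 0);
	return ((pin - 1 + device) % 4) + 1;
}
```

A caller that holds 0-based pins has to add one before calling, which is the kind of adjustment the commit message describes.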
 1.15 10-Apr-2004  kochi use designated initializer for struct pic initializers.
just for readability.
 1.14 20-Feb-2004  yamt branches: 1.14.2;
don't assume that bus on intr_extra_buses has non-null pci_bridge_tag.
pchb's second bus doesn't have it.

ok'ed by Frank van der Linden.
 1.13 17-Nov-2003  fvdl Set the bridge tag correctly when adding an extra PCI bus.
 1.12 06-Nov-2003  fvdl intr_find_pcibridge returns 0 or error, not < 0.
 1.11 30-Oct-2003  fvdl * keep track of PCI buses that aren't known by firmware, but are found
by NetBSD
* use this info in intr_find_mpmapping
* get rid of the last argument to intr_find_mpmapping, it was redundant
 1.10 22-Oct-2003  fvdl Only declare intr_scan_bus if NIOAPIC > 0.
 1.9 21-Oct-2003  fvdl Correctly walk up the PCI bus tree to find an interrupt match with
a swizzled pin.
 1.8 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
 1.7 18-Sep-2003  skd Fix for ioapic irq routing. This fixes kern/22728.
Approved by fvdl.
 1.6 06-Sep-2003  fvdl Move the bulk of pci_intr_string into a separate intr_string function. Use
that new function to print the pciide compat interrupt in pciide_machdep.c.
Share pciide_machdep.c between amd64 and i386.
 1.5 20-Aug-2003  fvdl Pass pointers to frames from assembly, do not use the 'frame on stack
as argument passed by value' trick, as gcc 3.3.x makes (valid) assumptions
about the stack that will not be true. Costs 2 instructions per trap/syscall
on i386, 4 per interrupt for MP. One instruction per trap/syscall on amd64,
2 per interrupt for MP. I expect gcc 3.3.1 to make up for this by better
optimization (it'd better..)

While here, make amd64 compile again by using subr_mbr_disk.c
 1.4 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.3 03-Mar-2003  fvdl branches: 1.3.2;
The IDT is an array of struct gate_descriptor.
 1.2 02-Mar-2003  fvdl Clean up some unneeded "mca.h" and "eisa.h" includes, make one that is
needed dependent on !__x86_64__. To be revisited later.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.3.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.5 02-Nov-2004  skrll Sync with HEAD.
 1.3.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.3.2.1 03-Aug-2004  skrll Sync with HEAD
 1.14.2.2 02-Jul-2004  he Pull up revision 1.17 (requested by mycroft in ticket #582):
This is clearly derived from the old i386/isa/intr.c, so
insert the copyright from there.
 1.14.2.1 09-May-2004  jdc Pull up revision 1.16 (requested by kochi in ticket #265)

Fix parameters for PPB_INTERRUPT_SWIZZLE macro.

The macro expects pin = 1..4, but 0..3 was previously being passed.
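
As a rough illustration of why the 1-based pin numbering in the entry above matters, the conventional PCI-PCI bridge swizzle can be sketched as below. swizzle_pin() is a hypothetical helper written for this note and is not the PPB_INTERRUPT_SWIZZLE definition itself; it assumes pins are numbered 1..4 (INTA..INTD) and that "device" is the device number behind the bridge.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the bridge swizzle (an assumption, not the
 * kernel macro): pins are 1..4, and the pin seen on the parent bus is
 * rotated by the device number of the child.
 */
static int
swizzle_pin(int pin, int device)
{
	/* A 0-based pin would silently yield the wrong parent pin. */
	assert(pin >= 1 && pin <= 4);
	return ((pin + device - 1) % 4) + 1;
}
```

With this numbering, INTA on device 1 maps to INTB on the parent bus, and INTD on device 1 wraps around to INTA.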
 1.21.2.6 21-Jan-2008  yamt sync with head
 1.21.2.5 07-Dec-2007  yamt sync with head
 1.21.2.4 27-Oct-2007  yamt sync with head.
 1.21.2.3 03-Sep-2007  yamt sync with head.
 1.21.2.2 26-Feb-2007  yamt sync with head.
 1.21.2.1 30-Dec-2006  yamt sync with head.
 1.22.16.1 13-Jul-2006  gdamore Merge from HEAD.
 1.22.8.1 11-Aug-2006  yamt sync with head
 1.22.4.1 09-Sep-2006  rpaulo sync with head
 1.24.6.1 10-Dec-2006  yamt sync with head.
 1.24.4.4 01-Feb-2007  ad Sync with head.
 1.24.4.3 12-Jan-2007  ad Sync with head.
 1.24.4.2 11-Jan-2007  ad Checkpoint work in progress.
 1.24.4.1 17-Nov-2006  ad Checkpoint work in progress.
 1.27.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.28.12.2 16-Oct-2007  garbled Sync with HEAD
 1.28.12.1 03-Oct-2007  garbled Sync with HEAD
 1.28.10.1 18-Apr-2007  thorpej Convert i386 and amd64 to the new atomic ops API.
 1.28.6.1 11-Jul-2007  mjf Sync with head.
 1.28.4.13 03-Dec-2007  ad Sync with HEAD.
 1.28.4.12 03-Dec-2007  ad Sync with HEAD.
 1.28.4.11 10-Oct-2007  ad Pull in sys/cpu.h, not machine/cpu.h.
 1.28.4.10 10-Oct-2007  ad Share cpu_intr_p() between amd64/i386.
 1.28.4.9 09-Oct-2007  ad Sync with head.
 1.28.4.8 09-Oct-2007  ad Sync with head.
 1.28.4.7 09-Oct-2007  ad Sync with head.
 1.28.4.6 23-Aug-2007  ad Fix some more bugs.
 1.28.4.5 29-Jul-2007  ad - When zeroing/copying pages, use SSE2 movnti to avoid polluting the cache.
- By default, align assembly routines on 32-byte starting boundaries.
- There are now 8 interrupt priority levels, half of which are softints.
Update intrdefs.h to match.
- Always clear/set spinlock words - removes lots of ifdefs.
- Remove the horrible ci_self150 hack that I introduced.
- Overhaul how TLB shootdown is performed. Inspired by a similar change in
OpenBSD but implemented quite differently. This should be a lot faster
but I have not benchmarked it yet.
 1.28.4.4 15-Jul-2007  ad Sync with head.
 1.28.4.3 07-Jul-2007  ad - Remove the interrupt priority range and use 'kernel RT' instead,
since only soft interrupts are threaded.
- Rename l->l_pinned to l->l_switchto. It might be useful for (re-)
implementing SA or doors.
- Simplify soft interrupt dispatch so MD code is doing as little as
possible that is new.
 1.28.4.2 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.28.4.1 29-Apr-2007  ad Replace another simplelock.
 1.29.10.3 23-Mar-2008  matt sync with HEAD
 1.29.10.2 09-Jan-2008  matt sync with HEAD
 1.29.10.1 06-Nov-2007  matt sync with HEAD
 1.29.8.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.29.8.3 03-Dec-2007  joerg Sync with HEAD.
 1.29.8.2 06-Oct-2007  joerg Getting interrupts early is better than interrupts stuck in the IOAPIC
in the pending case. Always mask/unmask interrupts before touching the
IOAPIC; the worst case of unmask is still better than before as it just
means we end up with a pending flag set due to the early splhigh.
 1.29.8.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.29.4.1 03-Sep-2007  skrll Sync with HEAD.
 1.32.2.1 14-Oct-2007  yamt sync with head.
 1.34.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.34.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.36.2.2 26-Dec-2007  ad Sync with head.
 1.36.2.1 08-Dec-2007  ad Sync with head.
 1.37.4.3 23-Jan-2008  bouyer Sync with HEAD.
 1.37.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.37.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.40.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.40.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.40.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.43.4.6 11-Aug-2010  yamt sync with head.
 1.43.4.5 11-Mar-2010  yamt sync with head
 1.43.4.4 19-Aug-2009  yamt sync with head.
 1.43.4.3 16-May-2009  yamt sync with head
 1.43.4.2 04-May-2009  yamt sync with head.
 1.43.4.1 16-May-2008  yamt sync with head.
 1.43.2.2 04-Jun-2008  yamt sync with head
 1.43.2.1 18-May-2008  yamt sync with head.
 1.49.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.49.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.55.2.1 03-Jul-2008  simonb Sync with head.
 1.57.6.1 21-Nov-2010  riz Pull up following revision(s) (requested by hubertf in ticket #1403):
sys/arch/x86/conf/files.x86: revision 1.49
sys/arch/i386/i386/autoconf.c: revision 1.94
sys/arch/x86/x86/intr.c: revision 1.60
Add opt_intrdebug.h for the INTRDEBUG option, and #include it here and
there. Fixes GENERIC/i386 compilation with 'options INTRDEBUG'.
 1.57.4.3 28-Apr-2009  skrll Sync with HEAD.
 1.57.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.57.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.58.2.7 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.58.2.6 02-May-2011  jym Sync with head.
 1.58.2.5 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.58.2.4 10-Jan-2011  jym Sync with HEAD
 1.58.2.3 24-Oct-2010  jym Sync with HEAD
 1.58.2.2 01-Nov-2009  jym Sync with HEAD.
 1.58.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.66.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.66.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.67.2.3 21-Apr-2011  rmind sync with head
 1.67.2.2 05-Mar-2011  rmind sync with head
 1.67.2.1 03-Jul-2010  rmind sync with head
 1.69.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.69.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.72.8.1 31-Mar-2013  riz Pull up following revision(s) (requested by chs in ticket #856):
sys/arch/x86/x86/intr.c: revision 1.75
only use db_printf() if we're actually called from DDB.
this prevents the boot-time one from pausing the boot process.
 1.72.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.72.2.1 30-Oct-2012  yamt sync with head
 1.74.2.3 03-Dec-2017  jdolecek update from HEAD
 1.74.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.74.2.1 23-Jun-2013  tls resync from head
 1.75.4.1 18-May-2014  rmind sync with head
 1.76.2.1 10-Aug-2014  tls Rebase.
 1.77.4.8 28-Aug-2017  skrll Sync with HEAD
 1.77.4.7 05-Feb-2017  skrll Sync with HEAD
 1.77.4.6 05-Dec-2016  skrll Sync with HEAD
 1.77.4.5 05-Oct-2016  skrll Sync with HEAD
 1.77.4.4 09-Jul-2016  skrll Sync with HEAD
 1.77.4.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.77.4.2 22-Sep-2015  skrll Sync with HEAD
 1.77.4.1 06-Jun-2015  skrll Sync with HEAD
 1.94.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.94.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.94.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.96.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.101.2.8 19-Feb-2021  martin Pull up following revision(s) (requested by knakahara in ticket #1657):

sys/arch/x86/x86/intr.c: revision 1.154 (via patch)

Fix x86's pci_intr_disestablish clean up routine. Pointed out by t-kusaba@IIJ, thanks.

Fix panic on x86 by the following code.
 1.101.2.7 14-Nov-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1437):

sys/arch/x86/x86/intr.c: revision 1.147

Fix a bug that evcnt_detach() called twice when the idt vector is full.

OK'd by knakahara.
 1.101.2.6 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support, ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" in efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.101.2.5 05-Apr-2018  martin Pull up following revision(s) (requested by christos in ticket #696):

sys/arch/amd64/amd64/vector.S: revision 1.62 (patch)
sys/arch/x86/include/intr.h: revision 1.55
sys/arch/i386/i386/vector.S: revision 1.77
sys/arch/i386/i386/db_interface.c: revision 1.82 (patch)
sys/arch/amd64/amd64/spl.S: revision 1.34 (patch)
sys/arch/amd64/amd64/db_interface.c: revision 1.33 (patch)
sys/arch/x86/x86/intr.c: revision 1.125
sys/arch/i386/i386/spl.S: revision 1.43 (patch)
sys/arch/i386/i386/machdep.c: revision 1.805 (patch)
sys/arch/x86/x86/lapic.c: revision 1.66 (patch)

Rename the DDB IPI IDT vectors for consistency. ok maxv@

Rename Xpreempt{recurse,resume} -> X{recurse,resume}_preempt so that
they fit the pattern. Also the debugger trap sniffer matches them
without adding special entries...

XXX: pullup-8.
 1.101.2.4 26-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #658):
sys/arch/x86/x86/intr.c: revision 1.124
Fix "intrctl list" causes panic while attaching MSI/MSI-X devices.
When there are devices which are already pci_intr_alloc'ed but not yet
established, "intrctl list" causes a panic. E.g.
# while true; do intrctl list > /dev/null ; done&
# drvctl -d ixg0 && drvctl -r pci0
And add some KASSERTMSG to similar but not the same code.
Pointed out by msaitoh@n.o.
XXX pullup-8
 1.101.2.3 16-Mar-2018  martin Pull up the following revisions (via patch), requested by maxv in #635:

sys/arch/amd64/amd64/gdt.c 1.39-1.45 (patch)
sys/arch/amd64/amd64/amd64/machdep.c 1.284,1.287,1.288 (patch)
sys/arch/amd64/amd64/include/param.h 1.23 (patch)
sys/arch/amd64/include/types.h 1.53 (patch)
sys/arch/x86/include/cpu.h 1.87 (patch)
sys/arch/x86/include/pmap.h 1.73,1.74 (patch)
sys/arch/x86/x86/cpu.c 1.142 (patch)
sys/arch/x86/x86/intr.c 1.117 (partial),1.120 (patch)
sys/arch/x86/x86/pmap.c 1.276 (patch)

Initialize ist0 in cpu_init_tss.
Backport __HAVE_PCPU_AREA.
 1.101.2.2 13-Mar-2018  martin Pullup the following revisions via patch, requested by maxv in ticket #629:

sys/arch/amd64/amd64/genassym.cf 1.63,1.64
sys/arch/amd64/amd64/locore.S 1.144
sys/arch/amd64/amd64/machdep.c 1.281-1.283
sys/arch/i386/i386/genassym.cf 1.105-1.106
sys/arch/i386/i386/locore.S 1.155
sys/arch/i386/i386/machdep.c 1.802 (adapted),1.803
sys/arch/x86/include/cpu.h 1.85
sys/arch/x86/x86/intr.c 1.115-1.116
sys/arch/x86/x86/pmap.c 1.275
sys/arch/x86/x86/sys_machdep.c 1.45
sys/arch/xen/x86/cpu.c 1.117

Stop sharing the double-fault stack.
Merge the TSS structures into one single cpu_tss structure, and
allocate it dynamically.
 1.101.2.1 13-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #493):
sys/arch/x86/include/intr.h: revision 1.53
sys/arch/x86/pci/pci_intr_machdep.c: revision 1.42
sys/arch/x86/x86/intr.c: revision 1.114 via patch
fix "intrctl list" panic when ACPI is disabled.
reviewed by cherry@n.o and tested by msaitoh@n.o, thanks.
 1.123.2.8 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.123.2.7 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.123.2.6 20-Oct-2018  pgoyette Sync with head
 1.123.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.123.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.123.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.123.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.123.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.126.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.126.2.1 10-Jun-2019  christos Sync with HEAD
 1.146.2.2 19-Feb-2021  martin Pull up following revision(s) (requested by knakahara in ticket #1209):

sys/arch/x86/x86/intr.c: revision 1.154 (via patch)

Fix x86's pci_intr_disestablish clean up routine. Pointed out by t-kusaba@IIJ, thanks.

Fix panic on x86 by the following code.
 1.146.2.1 11-Nov-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #416):

sys/arch/x86/x86/intr.c: revision 1.147

Fix a bug that evcnt_detach() called twice when the idt vector is full.

OK'd by knakahara.
 1.150.6.6 20-Apr-2020  bouyer channel %d -> chan %d, for the benefit of 'systat vm'
 1.150.6.5 19-Apr-2020  bouyer Add per-PIC callbacks for interrupt_get_devname(), interrupt_get_assigned()
and interrupt_get_count(). Implement Xen-specific callbacks for
PIC_XEN and use the x86 one for others.
In event_set_handler(), call intr_allocate_io_intrsource() so that
events appear in the interrupt list (intrctl list).
 1.150.6.4 19-Apr-2020  bouyer Add a struct pic * member to struct intrhand.
This will be used for interrupt_get_count()
For Xen replace pic_type with a pointer to the pic, and add a pointer
to intrhand, in struct pintrhand
Make event_set_handler return the pointer to struct intrhand.
Don't allocate a fake intrhand in xen_intr_establish_xname(), use the
one returned by event_set_handler().
 1.150.6.3 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.150.6.2 12-Apr-2020  bouyer Get rid of xen-specific ci_x* interrupt handling:
- use the general SIR mechanism, reserving 3 more slots for IPL_VM, IPL_SCHED
and IPL_HIGH
- remove specific handling from C sources, or change to ipending
- convert IPL number to SIR number in various places
- Remove XUNMASK/XPENDING in assembly or change to IUNMASK/IPENDING
- remove Xen-specific ci_xsources, ci_xmask, ci_xunmask, ci_xpending from
struct cpu_info
- for now remove a KASSERT that there are no pending interrupts in
idle_block(). We can get there with some software interrupts pending
in autoconf XXX needs to be looked at.
 1.150.6.1 11-Apr-2020  bouyer Move softint and preemption-related functions out of x86/x86/intr.c to
their own file, x86/x86/x86_softintr.c
Add x86/x86/x86_softintr.c for native and XenPV
Make sure XenPV also checks ci_ipending, which is used for softints.
Switch XenPV to fast softints and allow kernel preemption.
kpreempt_disable() before calling pmap_changeprot_local()
run xen_wallclock_time() and xen_global_systime_ns() at splsched() to
avoid being interrupted.

XXX amd64 lock stubs are racy for XPENDING
 1.152.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.152.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.163.2.3 11-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #832):

sys/arch/x86/x86/intr.c: revision 1.166

Fix use-after-free (source->is_type) when detecting unsharable
interrupts. Doesn't solve the interrupt conflict itself, but
avoids a panic.
 1.163.2.2 11-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #821):

sys/arch/x86/x86/intr.c: revision 1.169
sys/kern/kern_softint.c: revision 1.76
sys/kern/subr_workqueue.c: revision 1.48
sys/kern/kern_idle.c: revision 1.36
sys/kern/subr_xcall.c: revision 1.38

check that l_nopreempt (preemption count) doesn't change after callbacks

check that the idle loop, soft interrupt handlers, workqueue, and xcall
callbacks do not modify the preemption count, in most cases, knowing it
should be 0 currently.

this work was originally done by simonb. cleaned up slightly and some
minor enhancement made by myself, and with discussion with riastradh@.
other callback call sites could check this as well (such as MD interrupt
handlers, or really anything that includes a callback registration. x86
version to be committed separately.)

apply some more diagnostic checks for x86 interrupts
convert intr_biglock_wrapper() into a slightly less complete
intr_wrapper(), and move the kernel lock/unlock points into
the new intr_biglock_wrapper().
add curlwp->l_nopreempt checking for interrupt handlers,
including the dtrace wrapper.

XXX: has to copy the i8254_clockintr hack.

tested for a few months by myself, and recently by rin@ on both
current and netbsd-10. thanks!
 1.163.2.1 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #136):

sys/arch/x86/x86/intr.c: revision 1.164
sys/arch/x86/isa/clock.c: revision 1.41
sys/arch/x86/include/intr_private.h: revision 1.1

x86/intr: Work around sleazy clockintr with a secret frame argument.
PR kern/57197
 1.168.2.1 02-Aug-2025  perseant Sync with HEAD
 1.66 06-Oct-2022  msaitoh Print detail about misconfigured APIC ID.
 1.65 07-Oct-2021  msaitoh KNF. No functional change.
 1.64 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.63 19-Jun-2019  msaitoh branches: 1.63.2; 1.63.8;
Fix ioapic_dump_raw() to dump whole ioapic area.
 1.62 17-Jun-2019  msaitoh KNF. No functional change.
 1.61 14-Jun-2019  msaitoh - Dump LAPIC and I/O APIC correctly.
- Don't print redirect target on LAPIC.
- Fix DEST_MASK:
- DEST_MASK is not 1 bit but 2 bit.
- Add missing "\0"s to print decoded name correctly.
- Support both LAPIC and I/O APIC correctly in apic_format_redir().
- Improve output of some bits using snprintb()'s "F\B\1" and ":\V".
 1.60 13-Jun-2019  msaitoh Whitespace fix. No functional change.
 1.59 08-Oct-2018  cherry Clean up XEN specific stuff from the apic code, and move to intr.c

No functional change.
 1.58 07-Oct-2018  cherry In the case of a shared GSI, bind will fail, so we do not attempt this.
The sharing is accomplished by demultiplexing the port event of the first
bind. This is accomplished in intr.c:intr_establish_xname()

Note that the pic_delroute() is buggy (commented suitably) for the shared
gsi case, since it will unbind and reset it unconditionally, leaving the other
shared callbacks stranded.

This problem will go away when we unify further with native code, as this
case is taken care of appropriately in that case.
 1.57 07-Oct-2018  cherry While we're here, fix pic->pic_delroute() to DTRT on XEN and
cleanup after itself.
 1.56 13-Dec-2017  bouyer branches: 1.56.2; 1.56.4;
Fixes for physical interrupts on Xen:
- do not cast int * to intr_handle_t *, they're not the same size
- legacy_irq is not always -1 for ioapic interrupts, test pic_type instead
- change irq2port[] to hold (port + 1) so that 0 is an invalid value
- add KASSERTs to make sure vect, port or irq values extracted from arrays are
valid (or that they are invalid before write)
- for the !ioapic case, we still need to do PHYSDEVOP_ASSIGN_VECTOR and
bind_pirq_to_evtch().

now XEN3_DOM0 boots again
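
The "(port + 1)" convention mentioned in the entry above can be sketched roughly as follows. This is only an illustration under stated assumptions: irq2port_demo[], NIRQ_DEMO, irq_bind() and irq_lookup() are hypothetical names invented here, not the kernel's own table or functions. The point is that a zero-initialised table then treats 0 as "no port", while port 0 itself stays representable.

```c
#include <assert.h>

#define NIRQ_DEMO	16	/* hypothetical table size for this sketch */

/* Zero-initialised, so every slot starts out invalid. */
static unsigned int irq2port_demo[NIRQ_DEMO];

/* Record a binding; the slot must still be invalid before the write. */
static void
irq_bind(int irq, unsigned int port)
{
	assert(irq >= 0 && irq < NIRQ_DEMO);
	assert(irq2port_demo[irq] == 0);
	irq2port_demo[irq] = port + 1;	/* store port + 1, never 0 */
}

/* Return 1 and store the port if the irq is bound, 0 otherwise. */
static int
irq_lookup(int irq, unsigned int *port)
{
	assert(irq >= 0 && irq < NIRQ_DEMO);
	if (irq2port_demo[irq] == 0)
		return 0;
	*port = irq2port_demo[irq] - 1;
	return 1;
}
```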
 1.55 26-Nov-2017  maxv Remove unused variables.
 1.54 13-Nov-2017  nakayama Don't write a 1 to the read only RIRR bit in the IOAPIC redirection
register to fix "tlp0: filter setup and transmit timeout" observed
on Hyper-V VMs with the Legacy Network Adapter.

From OpenBSD via PR kern/49323:

https://marc.info/?l=openbsd-cvs&m=146718035432599&w=2

| Modified files:
| sys/arch/amd64/amd64: ioapic.c
| sys/arch/amd64/include: i82093reg.h
|
| Log message:
| Don't write a 1 to the RIRR bit in the IOAPIC redirection register. This bit
| is R/O, and although it should not matter what value is written there,
| Hyper-V's emulated IOAPIC interprets a write of 1 in some unexpected way and
| subsequently blocks interrupt delivery. This primarily manifests itself as
| de(4) timeouts when using Hyper-V VMs with the "Legacy Network Adapter"
| interface.

Tested both amd64 and i386 on Client Hyper-V on Windows 10.
 1.53 04-Nov-2017  cherry Retire xen/x86/intr.c and use the new xen specific glue in x86/x86/intr.c

The purpose of this change is to expose the x86/include/intr.h API
to drivers. Specifically the following functions:

void *intr_establish_xname(...);
void *intr_establish(...);
void intr_disestablish(...);

while maintaining the old API from xen/include/evtchn.h, specifically
the following functions:

int event_set_handler(...);
int event_remove_handler(...);

This is so that if things break, we can keep using the old API until
everything stabilises. This is a stepping stone towards getting the
actual XEN event callback path rework code in place - which can be
done opaquely behind the intr.h API - NetBSD/XEN specific drivers that
have been ported to the intr.h API should then work without
significant further modifications.
 1.52 27-Jul-2015  msaitoh branches: 1.52.10;
KNF.
 1.51 17-Jul-2015  msaitoh KNF. No functional change.
 1.50 27-Apr-2015  knakahara add x86 MD MSI/MSI-X support code.
 1.49 27-Apr-2015  knakahara add intr_handle_t and let pci_intr_handle_t use it.
 1.48 28-Jun-2013  jakllsch branches: 1.48.8; 1.48.10; 1.48.12; 1.48.16;
Print the ioapic version using unambiguous base.

From Felix Deichmann.
 1.47 30-Jan-2012  jakllsch branches: 1.47.6;
Need i8259.h for previous.
 1.46 30-Jan-2012  jakllsch Mask all i8259 interrupts in ioapic_enable().
Should fix PR kern/45160.
 1.45 05-Apr-2011  pgoyette branches: 1.45.4; 1.45.8;
If an ioapic doesn't really exist, don't add it to internal tables.
This is what other xxxBSDs seem to do in similar circumstances.

Addresses my PR kern/43568

OK jruoho@ in private Email
 1.44 18-Aug-2009  jmcneill branches: 1.44.4; 1.44.6;
Switch to ACPICA 20090730, and update for API changes.
 1.43 16-May-2009  ad Fix suspend/resume problem with some configurations. From drochner@.
 1.42 01-May-2009  cegger struct device * -> device_t
 1.41 22-Apr-2009  ad Always write REDHI before REDLO, since REDLO contains the mask bit.
 1.40 19-Apr-2009  ad cpuctl:

- Add interrupt shielding (direct hardware interrupts away from the
specified CPUs). Not documented just yet but will be soon.

- Redo /dev/cpu time_t compat so no kernel changes are needed.

x86:

- Make intr_establish, intr_disestablish safe to use when !cold.

- Distribute hardware interrupts among the CPUs, instead of directing
everything to the boot CPU.

- Add MD code for interrupt shielding. This works in most cases but there is
a bug where delivery is not accepted by an LAPIC after redistribution. It
also needs re-balancing to make things fair after interrupts are turned
back on for a CPU.
 1.39 13-Feb-2009  bouyer Fix printf format for 64bit paddr_t on i386
 1.38 03-Jul-2008  drochner branches: 1.38.4; 1.38.6; 1.38.10; 1.38.14;
split device/softc for ioapic
 1.37 03-Jul-2008  drochner Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.36 11-May-2008  ad branches: 1.36.2;
Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.35 28-Apr-2008  martin branches: 1.35.2;
Remove clause 3 and 4 from TNF licenses
 1.34 25-Apr-2008  christos branches: 1.34.2;
minor restructuring.
 1.33 18-Apr-2008  cegger branches: 1.33.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.
 1.32 16-Apr-2008  cegger - use aprint_*_dev and device_xname
- use POSIX integer types
 1.31 24-Jan-2008  jmcneill branches: 1.31.6;
In ioapic_reenable, don't try to remap the apic id if it is already correct.
 1.30 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.29 09-Dec-2007  jmcneill branches: 1.29.2;
Merge jmcneill-pm branch.
 1.28 02-Dec-2007  jmcneill branches: 1.28.2; 1.28.4;
SILENCE! I kill you!
 1.27 01-Dec-2007  jmcneill aprintify
 1.26 13-Nov-2007  joerg Force all interrupts to notify the primary CPU's APIC by default,
independent of what the BIOS programmed the IOAPIC for.
 1.25 17-Oct-2007  joerg branches: 1.25.2;
Add ioapic_dump_raw, which dumps the full IOAPIC register set.
 1.24 17-Oct-2007  garbled Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.23 10-Oct-2007  joerg branches: 1.23.2;
Install the default entries for the non-ISA interrupts as masked as
intended. Report by Christoph Egger.
 1.22 07-Oct-2007  joerg Merge intr.c (1.29.8.2) and ioapic.c (1.19.8.5) changes from jmcneill-pm:

Always write entries to all IOAPIC pins. The first 16 pins are
treated as ISA IRQs by default, the others like PCI IRQs. This avoids
inconsistencies based on incomplete BIOS setups. This resulted in early
ACPI SCI notifications to be lost, effectively breaking the Embedded
Controller on cold start on many notebooks.

Don't special case the IOAPIC setup between ioapic_attach and
ioapic_enable, always setup the correct redirections. Depend on
splhigh/disable_intr to stop interrupts and don't keep them masked in
the IOAPIC. This avoids unacknowledged edge interrupts and fixes the problem
of a broken PS/2 keyboard when hitting keys during early boot.
 1.21 06-Oct-2007  joerg Merge from jmcneill-pm: Close a small race in the IOAPIC setup.
When changing the redirection entry for an interrupt, write the
high 32bit first. The low 32bit contain the mask bit and removing
that before setting the destination ID can lead to lost interrupts.
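
The ordering described in the entry above can be sketched against a simple in-memory model. This is not the ioapic.c code: struct redir_entry, write_hi(), write_lo() and redir_update() are hypothetical names for illustration. The model assumes the usual I/O APIC layout, with the mask bit at bit 16 of the low word and the destination ID in bits 24..31 of the high word.

```c
#include <assert.h>
#include <stdint.h>

#define REDIR_MASK	(1u << 16)	/* mask bit lives in the low word */

/* Hypothetical model of one redirection entry, not real hardware. */
struct redir_entry {
	uint32_t lo;			/* vector, trigger mode, mask bit */
	uint32_t hi;			/* destination APIC ID, bits 24..31 */
	int	 writes;		/* number of register writes so far */
	int	 hi_written_first;	/* set if the first write hit hi */
};

static void
write_hi(struct redir_entry *e, uint32_t v)
{
	if (e->writes++ == 0)
		e->hi_written_first = 1;
	e->hi = v;
}

static void
write_lo(struct redir_entry *e, uint32_t v)
{
	e->writes++;
	e->lo = v;
}

/*
 * Program the destination first; only then write the low word, which
 * clears the mask bit and lets interrupts through to a valid target.
 */
static void
redir_update(struct redir_entry *e, uint8_t dest_apic_id, uint32_t lo)
{
	write_hi(e, (uint32_t)dest_apic_id << 24);
	write_lo(e, lo & ~REDIR_MASK);
}
```

Writing the low word first would unmask the pin while the high word still held a stale destination, which is the window the entry describes.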
 1.20 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.19 17-May-2007  yamt branches: 1.19.8; 1.19.10; 1.19.12;
merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.18 05-Mar-2007  drochner branches: 1.18.2; 1.18.4; 1.18.10;
clean up how cpus and ioapics are attached at the mainbus:
Separate "cpubus" and "ioapicbus" -- while they share a common "address
space" (the apic id), the kernel doesn't use this fact. There are different
data passed to cpus and apics, which caused some ugly polymorphism. This
also saves the special "submatch" functions needed to distinguish cpus
and ioapics for autoconf. (And it means that "apid" locators wired
in the kernel configuration are now honored; this allows one to dumb down
an mp box to singleprocessor by userconfig.)
Print "apid" locators in the buses "print" function "as everyone does",
so the per-port cpu drivers don't need to do it.
Being here, constify "struct cpu_functions" and g/c the unused MP_PICMODE
flag.
 1.17 09-Feb-2007  ad branches: 1.17.2;
Merge newlock2 to head.
 1.16 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.15 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.14 28-Sep-2006  bouyer - make it possible to have ACPI without IOAPIC and/or LAPIC
- make it possible for machine-specific code to provide custom R/W routines
in its i82093*.h headers
- always initialize sc->sc_pins[pin], even in the !ioapic_cold case.
No objections on port-i386 and port-amd64.
 1.13 04-Jul-2006  christos branches: 1.13.4; 1.13.6;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.12 24-Dec-2005  perry branches: 1.12.4; 1.12.8; 1.12.16;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.11 11-Dec-2005  christos merge ktrace-lwp.
 1.10 29-May-2005  christos branches: 1.10.2;
Sprinkle const.
 1.9 23-Aug-2004  fvdl Keep ioapic in the correct order in the global linked list that stores
them. Fixes cases where the ACPI SCI int has to be guessed, because there
are multiple ioapics in the system.
 1.8 13-Feb-2004  wiz Uppercase CPU, plural is CPUs.
 1.7 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.6 15-May-2003  fvdl branches: 1.6.2;
Try a little harder to find PCI buses in the MPACPI code, in a (probably
futile) attempt to get quirky ACPI implementations going.

Work around a problem with quirky MP tables for ioapic interrupt routing.
 1.5 11-May-2003  fvdl Add a function that dumps ioapic redir state, for our debugging pleasure.
 1.4 04-May-2003  fvdl Block level-triggered interrupts at the ioapic if they are deferred.
Avoids interrupt storms seen on some systems. Many thanks to
Stoned Elipot for testing.
 1.3 01-Apr-2003  thorpej Use PAGE_SIZE rather than NBPG.
 1.2 04-Mar-2003  fvdl Use read_psl and write_psl.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.6.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.6.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.6.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.6.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.6.2.1 03-Aug-2004  skrll Sync with HEAD
 1.10.2.9 04-Feb-2008  yamt sync with head.
 1.10.2.8 21-Jan-2008  yamt sync with head
 1.10.2.7 07-Dec-2007  yamt sync with head
 1.10.2.6 15-Nov-2007  yamt sync with head.
 1.10.2.5 27-Oct-2007  yamt sync with head.
 1.10.2.4 03-Sep-2007  yamt sync with head.
 1.10.2.3 26-Feb-2007  yamt sync with head.
 1.10.2.2 30-Dec-2006  yamt sync with head.
 1.10.2.1 21-Jun-2006  yamt sync with head.
 1.12.16.1 13-Jul-2006  gdamore Merge from HEAD.
 1.12.8.1 11-Aug-2006  yamt sync with head
 1.12.4.1 09-Sep-2006  rpaulo sync with head
 1.13.6.2 10-Dec-2006  yamt sync with head.
 1.13.6.1 22-Oct-2006  yamt sync with head
 1.13.4.2 06-Feb-2007  ad Quieten noisy boot messages.
 1.13.4.1 18-Nov-2006  ad Sync with head.
 1.17.2.2 16-Apr-2007  ad Don't panic if the pic lock is held.
 1.17.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.18.10.3 16-Oct-2007  garbled Sync with HEAD
 1.18.10.2 03-Oct-2007  garbled Sync with HEAD
 1.18.10.1 22-May-2007  matt Update to HEAD.
 1.18.4.1 11-Jul-2007  mjf Sync with head.
 1.18.2.5 03-Dec-2007  ad Sync with HEAD.
 1.18.2.4 23-Oct-2007  ad Sync with head.
 1.18.2.3 10-Oct-2007  rmind Sync with HEAD.
 1.18.2.2 09-Oct-2007  ad Sync with head.
 1.18.2.1 29-Apr-2007  ad Replace another simplelock.
 1.19.12.3 18-Oct-2007  yamt sync with head.
 1.19.12.2 14-Oct-2007  yamt sync with head.
 1.19.12.1 06-Oct-2007  yamt sync with head.
 1.19.10.3 23-Mar-2008  matt sync with HEAD
 1.19.10.2 09-Jan-2008  matt sync with HEAD
 1.19.10.1 06-Nov-2007  matt sync with HEAD
 1.19.8.16 08-Dec-2007  jmcneill Rename pnp(9) -> pmf(9), as requested by many.
 1.19.8.15 02-Dec-2007  jmcneill Sync with HEAD.
 1.19.8.14 01-Dec-2007  jmcneill Sync with HEAD.
 1.19.8.13 14-Nov-2007  joerg Sync with HEAD.
 1.19.8.12 06-Nov-2007  joerg Refactor PNP API:
- Make suspend/resume directly a device functionality. It consists of
three layers (class logic, device logic, bus logic), all of them being
optional. This replaces D0/D3 transitions.
- device_is_active returns true if the device was not disabled and was
not suspended (even partially), device_is_enabled returns true if the
device was enabled.
- Change pnp_global_transition into pnp_system_suspend and
pnp_system_resume. Before running any suspend/resume handlers, check
that all currently attached devices support power management and bail
out otherwise. The latter is not done for the shutdown/panic case.
- Make the former bus-specific generic network handlers a class handler.
- Make PNP messages like volume up/down/toggle PNP events. Each device
can register what events they are interested in and whether the handler
should be global or not.
- Introduce device_active API for devices to mark themselves in use from
either the system or the device. Use this to implement the idle handling
for audio and input devices. This is intended to replace most ad-hoc
watchdogs as well.
- Fix some situations in which audio resume would lose mixer settings.
- Make USB host controllers better deal with suspend in the light of
shared interrupts.
- Flush filesystem cache on suspend.
- Flush disk caches on suspend. Put ATA disks into standby on suspend as
well.
- Adapt drivers to use the new PNP API.
- Fix a critical bug in the generic cardbus layer that made D0->D3
break.
- Fix ral(4) to set if_stop.
- Convert cbb(4) to the new PNP API.
- Apply the PCI Express SCI fix on resume again.
 1.19.8.11 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.19.8.10 10-Oct-2007  joerg Install the default entries for the non-ISA interrupts as masked as
intended. Report by Christoph Egger.
 1.19.8.9 07-Oct-2007  joerg Sync with HEAD.
 1.19.8.8 02-Oct-2007  joerg Sync with HEAD.
 1.19.8.7 01-Oct-2007  joerg Reorder the writes of the IOAPIC redirection register. Writing the low
half first breaks if there is already a level interrupt waiting and the
destination in the high part is not valid yet.

Ensure that on ACPI resume, the possibly changed APIC IDs are written
back to match what the rest of the world expects.
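
The write ordering described in 1.19.8.7 can be sketched as follows. This is a minimal illustration against a simulated register file, not the driver's code: `struct fake_ioapic`, `fake_ioapic_write()`, `redir_program()` and the `REDIR_LO`/`REDIR_HI` macros are hypothetical names. The register indices assume the usual IOAPIC layout in which the redirection table starts at index 0x10 with two 32-bit words per pin.

```c
#include <stdint.h>

/* Hypothetical stand-in for the IOAPIC indirect register window. */
struct fake_ioapic {
	uint32_t regs[64];
	int      write_log[8];	/* order in which registers were written */
	int      nwrites;
};

/* Assumed layout: redirection table at 0x10, low word then high word. */
#define REDIR_LO(pin)	(0x10 + 2 * (pin))
#define REDIR_HI(pin)	(0x11 + 2 * (pin))

static void
fake_ioapic_write(struct fake_ioapic *io, int reg, uint32_t val)
{
	io->regs[reg] = val;
	if (io->nwrites < 8)
		io->write_log[io->nwrites++] = reg;
}

/*
 * Program one redirection entry. The high half (destination) goes
 * first, so that the destination is already valid when the low half
 * (vector, trigger mode, mask) takes effect.
 */
static void
redir_program(struct fake_ioapic *io, int pin, uint32_t lo, uint32_t hi)
{
	fake_ioapic_write(io, REDIR_HI(pin), hi);
	fake_ioapic_write(io, REDIR_LO(pin), lo);
}
```
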
 1.19.8.6 30-Sep-2007  joerg Add a second function ioapic_reenable that restores all vectors.
 1.19.8.5 30-Sep-2007  joerg Change ACPI and IOAPIC initialisation to better deal with early
interrupts.

(a) Split the ACPI subsystem initialisation into the hardware init and
ACPI enabling on the one side and the event and SCI setup on the other
side. Process the ACPI interrupt tables in between. Strictly speaking,
this is a violation of the ACPI specs as the switch to APIC mode requires
evaluation of an ACPI object and that could depend on the SCI. In
practice, the SCI never worked at this point and before the removal of
the deferred setup it wasn't even created.

(b) Always write entries to all IOAPIC pins. The first 16 pins are
treated as ISA IRQs by default, the others like PCI IRQs. This avoids
inconsistencies based on incomplete BIOS setups. This resulted in early
ACPI SCI notifications to be lost, effectively breaking the Embedded
Controller on cold start on many notebooks.

Don't special case the IOAPIC setup between ioapic_attach and
ioapic_enable, always setup the correct redirections. Depend on
splhigh/disable_intr to stop interrupts and don't keep them masked in the
IOAPIC. This avoids unacknowledged edge interrupts and fixes the problem
of a broken PS/2 keyboard when hitting keys during early boot.
 1.19.8.4 07-Sep-2007  jmcneill Remove debug printf that spits out the apicbase.
 1.19.8.3 05-Aug-2007  jmcneill Certain devices either don't require a power handler, or are restored
on resume outside of the pnp power management framework. For such devices,
introduce the null power handler, pnp_generic_power.
 1.19.8.2 05-Aug-2007  jmcneill No need to be so loud on resume.
 1.19.8.1 03-Aug-2007  jmcneill Pull in power management changes from private branch.
 1.23.2.3 18-Nov-2007  bouyer Sync with HEAD
 1.23.2.2 25-Oct-2007  bouyer Sync with HEAD.
 1.23.2.1 17-Oct-2007  bouyer amd64 (aka x86-64) support for Xen. Based on the OpenBSD port done by
Mathieu Ropert in 2006.
DomU-only for now. An INSTALL_XEN3_DOMU kernel with a ramdisk will boot to
sysinst if you're lucky. Often it panics because a runnable LWP has
a NULL stack (really, it's all of l->l_addr which has been zeroed out
while the process was on the queue!)
TODO:
- bug fixes :)
- Most of the xpq_* functions should be shared with xen/i386
- The xen/i386 assembly bootstrap code should be replaced with the C
version in xenamd64/amd64/xpmap.c
- see if a config(5) trick could allow merging xenamd64 back to xen or amd64.
 1.25.2.4 18-Feb-2008  mjf Sync with HEAD.
 1.25.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.25.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.25.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.28.4.1 11-Dec-2007  yamt sync with head.
 1.28.2.1 26-Dec-2007  ad Sync with head.
 1.29.2.1 08-Jan-2008  bouyer Sync with HEAD
 1.31.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.31.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.33.2.1 18-May-2008  yamt sync with head.
 1.34.2.4 19-Aug-2009  yamt sync with head.
 1.34.2.3 20-Jun-2009  yamt sync with head
 1.34.2.2 04-May-2009  yamt sync with head.
 1.34.2.1 16-May-2008  yamt sync with head.
 1.35.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.35.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.36.2.1 03-Jul-2008  simonb Sync with head.
 1.38.14.1 21-Apr-2010  matt sync to netbsd-5
 1.38.10.4 02-May-2011  jym Sync with head.
 1.38.10.3 01-Nov-2009  jym Sync with HEAD.
 1.38.10.2 31-May-2009  jym Sync with HEAD.
 1.38.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.38.6.2 21-Apr-2012  riz Pull up following revision(s) (requested by jakllsch in ticket #1747):
sys/arch/x86/x86/ioapic.c: revision 1.46
sys/arch/x86/x86/ioapic.c: revision 1.47
Mask all i8259 interrupts in ioapic_enable().
Should fix PR kern/45160.
Need i8259.h for previous.
 1.38.6.1 29-Sep-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1040):
sys/arch/x86/x86/ioapic.c: revision 1.39
sys/arch/x86/x86/mpbios.c: revision 1.53
Fix printf format for 64bit paddr_t on i386
 1.38.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.38.4.1 03-Mar-2009  skrll Sync with HEAD.
 1.44.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.44.4.1 21-Apr-2011  rmind sync with head
 1.45.8.1 18-Feb-2012  mrg merge to -current.
 1.45.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.45.4.1 17-Apr-2012  yamt sync with head
 1.47.6.2 03-Dec-2017  jdolecek update from HEAD
 1.47.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.48.16.1 03-Jan-2018  snj Pull up following revision(s) (requested by nakayama in ticket #1527):
sys/arch/amd64/include/i82093reg.h: revision 1.9
sys/arch/i386/include/i82093reg.h: revision 1.11
sys/arch/x86/x86/ioapic.c: revision 1.54
Don't write a 1 to the read only RIRR bit in the IOAPIC redirection
register to fix "tlp0: filter setup and transmit timeout" observed
on Hyper-V VMs with the Legacy Network Adapter.
From OpenBSD via PR kern/49323:
https://marc.info/?l=openbsd-cvs&m=146718035432599&w=2
 1.48.12.1 03-Jan-2018  snj Pull up following revision(s) (requested by nakayama in ticket #1527):
sys/arch/amd64/include/i82093reg.h: revision 1.9
sys/arch/i386/include/i82093reg.h: revision 1.11
sys/arch/x86/x86/ioapic.c: revision 1.54
Don't write a 1 to the read only RIRR bit in the IOAPIC redirection
register to fix "tlp0: filter setup and transmit timeout" observed
on Hyper-V VMs with the Legacy Network Adapter.
From OpenBSD via PR kern/49323:
https://marc.info/?l=openbsd-cvs&m=146718035432599&w=2
 1.48.10.2 22-Sep-2015  skrll Sync with HEAD
 1.48.10.1 06-Jun-2015  skrll Sync with HEAD
 1.48.8.1 03-Jan-2018  snj Pull up following revision(s) (requested by nakayama in ticket #1527):
sys/arch/amd64/include/i82093reg.h: revision 1.9
sys/arch/i386/include/i82093reg.h: revision 1.11
sys/arch/x86/x86/ioapic.c: revision 1.54
Don't write a 1 to the read only RIRR bit in the IOAPIC redirection
register to fix "tlp0: filter setup and transmit timeout" observed
on Hyper-V VMs with the Legacy Network Adapter.
From OpenBSD via PR kern/49323:
https://marc.info/?l=openbsd-cvs&m=146718035432599&w=2
 1.52.10.2 10-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1769):

sys/arch/x86/x86/ioapic.c: revision 1.66
sys/arch/x86/include/i82093reg.h: revision 1.7

Print detail about misconfigured APIC ID.

IOAPIC_ID_MASK is 8 bits these days. Fixes PR kern/54276.
 1.52.10.1 21-Nov-2017  martin Pull up following revision(s) (requested by nakayama in ticket #359):
sys/arch/amd64/include/i82093reg.h: revision 1.9
sys/arch/x86/x86/ioapic.c: revision 1.54
sys/arch/i386/include/i82093reg.h: revision 1.11
Don't write a 1 to the read only RIRR bit in the IOAPIC redirection
register to fix "tlp0: filter setup and transmit timeout" observed
on Hyper-V VMs with the Legacy Network Adapter.
From OpenBSD via PR kern/49323:
https://marc.info/?l=openbsd-cvs&m=146718035432599&w=2
Modified files:
sys/arch/amd64/amd64: ioapic.c
sys/arch/amd64/include: i82093reg.h
Log message:
Don't write a 1 to the RIRR bit in the IOAPIC redirection register. This bit
is R/O, and although it should not matter what value is written there,
Hyper-V's emulated IOAPIC interprets a write of 1 in some unexpected way and
subsequently blocks interrupt delivery. This primarily manifests itself as
de(4) timeouts when using Hyper-V VMs with the "Legacy Network Adapter"
interface.
Tested both amd64 and i386 on Client Hyper-V on Windows 10.
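
The RIRR fix described in the pull-ups above amounts to clearing a read-only bit before writing a redirection word back. A minimal sketch, assuming the Remote IRR bit is bit 14 of the low redirection word; `REDLO_RIRR` and `redlo_for_write()` are hypothetical names, not taken from the committed change:

```c
#include <stdint.h>

/* Assumed position of the read-only Remote IRR bit (bit 14). */
#define REDLO_RIRR	0x00004000u

/*
 * Return the value that is safe to write back to the low redirection
 * word: whatever was read or computed, with the read-only RIRR bit
 * forced to 0.
 */
static uint32_t
redlo_for_write(uint32_t redlo)
{
	return redlo & ~REDLO_RIRR;
}
```

A read-modify-write path would pass the value it read through `redlo_for_write()` before storing it, so a set RIRR bit is never echoed back to the emulated device.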
 1.56.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.56.4.1 10-Jun-2019  christos Sync with HEAD
 1.56.2.1 20-Oct-2018  pgoyette Sync with head
 1.63.8.1 19-Apr-2020  bouyer Add per-PIC callbacks for interrupt_get_devname(), interrupt_get_assigned()
and interrupt_get_count(). Implement Xen-specific callbacks for
PIC_XEN and use the x86 one for others.
In event_set_handler(), call intr_allocate_io_intrsource() so that
events appear in interrupt list (intrctl list).
 1.63.2.1 10-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1536):

sys/arch/x86/x86/ioapic.c: revision 1.66
sys/arch/x86/include/i82093reg.h: revision 1.7

Print detail about misconfigured APIC ID.

IOAPIC_ID_MASK is 8 bits these days. Fixes PR kern/54276.
 1.30 01-Dec-2019  ad Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
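
The idea of giving frequently written items their own cache lines can be sketched with C11 alignment specifiers. This is an illustration only: the 64-byte line size, `struct cpu_hot` and its member names are assumptions, and the kernel itself uses its own alignment macros and structure layout.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE	64	/* assumed cache line size */

/*
 * Hypothetical per-CPU structure. Each member below is written by
 * different CPUs at different times, so each one starts on its own
 * cache line and a write to one does not invalidate the other.
 */
struct cpu_hot {
	alignas(CACHE_LINE) uint32_t ipi_pending;	/* set by remote CPUs */
	alignas(CACHE_LINE) int      want_resched;	/* set by the scheduler */
};
```

The cost is padding: the structure grows to a multiple of the line size, which is why only items with a measured sharing problem are usually separated this way.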
 1.29 23-Nov-2019  ad cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
 1.28 12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.27 08-Feb-2017  maxv branches: 1.27.14;
Remove gdt_reload_cpu. GDTR takes a VA as base, and in our x86
implementation this VA is per-cpu and does not change; there is therefore
no need to remotely reload GDTR.
 1.26 20-Jul-2014  uebayasi branches: 1.26.4; 1.26.8; 1.26.12;
KNF.
 1.25 20-Jul-2014  uebayasi ipifunc[]: Comment IPI constant names for grep'ability. Constify.
 1.24 19-May-2014  rmind Implement MI IPI interface with cross-call support.
 1.23 19-Feb-2014  dsl branches: 1.23.2;
Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.
 1.22 11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.21 26-Jan-2014  dsl Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
 1.20 01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.19 23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.18 22-Jun-2010  rmind branches: 1.18.6; 1.18.8; 1.18.18; 1.18.22;
Implement high priority (XC_HIGHPRI) xcall(9) mechanism - a facility
to execute functions from software interrupt context, at SOFTINT_CLOCK.
Functions must be lightweight. Will be used for passive serialization.

OK ad@.
 1.17 25-Apr-2010  ad Nothing uses x86_multicast_ipi() right now and it complicates support
for many CPUs, so remove it.
 1.16 05-Oct-2009  rmind branches: 1.16.2; 1.16.4;
Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.15 18-Aug-2009  jmcneill Switch to ACPICA 20090730, and update for API changes.
 1.14 30-Mar-2009  rmind Merge i386 and amd64 ipifuncs.c into x86. No functional changes intended.
XXX: fpu #ifdefs are ugly (should be revisited at some point).
 1.13 11-May-2008  ad branches: 1.13.6; 1.13.12;
Use ci_cpumask.
 1.12 11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.11 28-Apr-2008  martin branches: 1.11.2;
Remove clause 3 and 4 from TNF licenses
 1.10 25-Apr-2008  ad branches: 1.10.2;
Include null IPI functions if !MULTIPROCESSOR.
 1.9 16-Apr-2008  cegger branches: 1.9.2;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.8 28-Nov-2007  ad branches: 1.8.14;
Use the new atomic ops.
 1.7 11-Dec-2005  christos branches: 1.7.30; 1.7.36; 1.7.48; 1.7.50; 1.7.56;
merge ktrace-lwp.
 1.6 13-Jan-2005  fvdl branches: 1.6.10;
* Wrap IPI sending in splclock(), since an interrupt at IPL_CLOCK or lower
may cause IPIs.
* Make broadcast IPIs go through x86_ipi() as well, so that they wait for
the APIC to be ready too.

From Stephan Uphoff.
 1.5 13-Feb-2004  wiz Uppercase CPU, plural is CPUs.
 1.4 26-Oct-2003  yamt use ffs() rather than handcrafted one for ipi bit search.
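
A bit search built on ffs() might look like the sketch below. It is not the committed code: `dispatch_pending()` and its arguments are hypothetical, and the loop records bit numbers instead of calling IPI handlers so that it can be run standalone.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

/*
 * Walk the set bits of a pending mask from lowest to highest.
 * ffs() returns the 1-based index of the lowest set bit, or 0 when
 * no bit is set. Each bit found is cleared and its 0-based number
 * is stored in order[]. Returns the number of bits recorded.
 */
static int
dispatch_pending(uint32_t pending, int *order, int max)
{
	int n = 0, bit;

	while (n < max && (bit = ffs((int)pending)) != 0) {
		bit--;				/* convert to 0-based */
		pending &= ~(UINT32_C(1) << bit);
		order[n++] = bit;
	}
	return n;
}
```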
 1.3 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.2 01-Mar-2003  fvdl branches: 1.2.2;
Remove accidentally enabled debug printf.
From Enami.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.2.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.6.10.1 07-Dec-2007  yamt sync with head
 1.7.56.1 08-Dec-2007  mjf Sync with HEAD.
 1.7.50.1 09-Jan-2008  matt sync with HEAD
 1.7.48.1 03-Dec-2007  joerg Sync with HEAD.
 1.7.36.1 18-Apr-2007  thorpej Convert i386 and amd64 to the new atomic ops API.
 1.7.30.1 03-Dec-2007  ad Sync with HEAD.
 1.8.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.9.2.1 18-May-2008  yamt sync with head.
 1.10.2.5 11-Aug-2010  yamt sync with head.
 1.10.2.4 11-Mar-2010  yamt sync with head
 1.10.2.3 19-Aug-2009  yamt sync with head.
 1.10.2.2 04-May-2009  yamt sync with head.
 1.10.2.1 16-May-2008  yamt sync with head.
 1.11.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.13.12.3 24-Oct-2010  jym Sync with HEAD
 1.13.12.2 01-Nov-2009  jym Sync with HEAD.
 1.13.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.13.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.16.4.2 03-Jul-2010  rmind sync with head
 1.16.4.1 30-May-2010  rmind sync with head
 1.16.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.16.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.18.22.1 18-May-2014  rmind sync with head
 1.18.18.2 03-Dec-2017  jdolecek update from HEAD
 1.18.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.18.8.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.18.6.2 23-Jul-2011  cherry Remove the bogus TLB ipi wrapper code. We don't use it in xen anyway.
This syncs back, with -current.
 1.18.6.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.23.2.1 10-Aug-2014  tls Rebase.
 1.26.12.1 21-Apr-2017  bouyer Sync with HEAD
 1.26.8.1 20-Mar-2017  pgoyette Sync with HEAD
 1.26.4.1 28-Aug-2017  skrll Sync with HEAD
 1.27.14.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.27.14.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.71 25-Dec-2018  mlelstv Make ipmi driver available to other platforms.
Add ACPI attachment.
 1.70 17-Dec-2018  christos Back to using aprint_error() and get more info about the error so we can
figure out why we can't map the registers.
 1.69 17-Dec-2018  gson Don't call aprint_error_dev() with a NULL dev. Fixes PR port-amd64/53789.
 1.68 01-Dec-2018  msaitoh Use DVF_ATTACH_INPROGRESS.
 1.67 14-Oct-2018  jdolecek remove M_CANFAIL flag for malloc(9) - it was completely ignored, so had
actually no effect
 1.66 22-Jun-2017  joerg branches: 1.66.4; 1.66.6;
Use a proper format string.
 1.65 21-Jun-2017  christos knf, fix more error and debugging messages.
 1.64 07-Jul-2016  msaitoh branches: 1.64.10;
KNF. Remove extra spaces. No functional change.
 1.63 03-Apr-2016  mlelstv Only fix up IPMI on ProLiant Microserver when address is set.
Don't assume a specific preconfigured address, just clear the lower bits.
 1.62 28-Aug-2015  joerg Cast to uint64_t first in case the input is negative.
 1.61 13-Apr-2015  riastradh Convert arch/x86 to use <sys/rnd*.h>. Omit needless includes.
 1.60 29-Dec-2014  mlelstv Avoid NULL pointer dereference if SMBIOS key "system-product" does not
exist.
 1.59 22-Sep-2014  nat branches: 1.59.2;
Make remote access cards on HP ProLiant microservers N36L,
N40L and N54L work with ipmi(4).

Addresses PR 48233.

This commit was approved by christos@
 1.58 21-Sep-2014  christos fix leak.
 1.57 10-Aug-2014  tls branches: 1.57.2;
Merge tls-earlyentropy branch into HEAD.
 1.56 17-Oct-2013  christos branches: 1.56.2;
__USE a debugging variable
 1.55 12-Aug-2013  yamt fix validness check of sensor value

this change is intended to mirror what ipmitool does.
(their macros for these bits are IS_READING_UNAVAILABLE and
IS_SCANNING_DISABLED.)

see also:
second-gen-interface-spec-v2-rev1-4
Table 35-15, Get Sensor Reading Command

might fix PR/46833 from Francois Tigeot

reviewed by Masanobu SAITOH and Tom Ivar Helbekkmo
tested by Tom Ivar Helbekkmo
 1.54 19-Mar-2013  msaitoh branches: 1.54.6;
KNF a bit.
 1.53 04-Apr-2012  njoly branches: 1.53.2;
For 1's and 2's complement sensor data, convert unsigned raw data to a
signed type before proceeding any computation. Fix handling of
negative temperatures that can be set for critmin/warnmin limits.
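
The conversion described in 1.53 can be illustrated for 8-bit raw readings. This is a sketch under assumptions: the enum and `raw_to_signed()` are hypothetical names, and the raw value is assumed to be a single byte whose top bit carries the sign in both complement formats.

```c
#include <stdint.h>

/* Hypothetical tags for the raw data format of a sensor reading. */
enum raw_format { RAW_UNSIGNED, RAW_ONES_COMPL, RAW_TWOS_COMPL };

/*
 * Convert an unsigned raw byte to a signed value before any further
 * computation (scaling, offsets) is applied to it.
 */
static int
raw_to_signed(uint8_t raw, enum raw_format fmt)
{
	switch (fmt) {
	case RAW_TWOS_COMPL:
		/* 0x80..0xff map to -128..-1 */
		return (raw & 0x80) ? (int)raw - 256 : (int)raw;
	case RAW_ONES_COMPL:
		/* negative values are the bitwise inverse of the magnitude */
		return (raw & 0x80) ? -(int)(uint8_t)~raw : (int)raw;
	default:
		return (int)raw;
	}
}
```

The same raw byte yields different values depending on the format, which is why the conversion has to happen before limits such as critmin/warnmin are scaled.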
 1.52 02-Feb-2012  tls branches: 1.52.2;
Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.51 10-Jan-2012  njoly Call aprint_naive for quiet boot message.
 1.50 11-Aug-2010  pgoyette branches: 1.50.8; 1.50.12;
Keep condvar wmesg within 8-char limit.
 1.49 01-Aug-2010  mlelstv For sensors with inverted value (1/x function), exchange lower and upper
limits so that {warn,crit}min < curval < {warn,crit}max.
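
Why the limits need exchanging for a 1/x sensor: the conversion is decreasing, so the smaller raw limit converts to the larger value. A minimal integer sketch, with hypothetical names (`struct limits`, `conv_inverse()`, `convert_limits()`) and a made-up scale factor `k`; raw inputs are assumed to be nonzero:

```c
#include <stdint.h>

struct limits {
	int32_t min;
	int32_t max;
};

/* Hypothetical inverse conversion: value = k / raw (raw must be nonzero). */
static int32_t
conv_inverse(int32_t raw, int32_t k)
{
	return k / raw;
}

/*
 * Convert a pair of raw limits. Because the conversion is decreasing,
 * the converted values come out in the opposite order and are
 * exchanged so that min < max holds afterwards.
 */
static struct limits
convert_limits(int32_t raw_lo, int32_t raw_hi, int32_t k)
{
	struct limits l = { conv_inverse(raw_lo, k), conv_inverse(raw_hi, k) };

	if (l.min > l.max) {
		int32_t tmp = l.min;
		l.min = l.max;
		l.max = tmp;
	}
	return l;
}
```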
 1.48 01-Aug-2010  mlelstv sc_cmd_mtx protects a command sequence, no longer abuse it for delays.

Initialize mutexes and condition variables in attach and not in the
asynchronously started kernel thread.

Increase BMC spin timeout from 5ms to 15ms, this is necessary to detect
the BMC in a HP ML110G4 reliably.

Implement non-linear sensors as defined in IPMIv2.0 with some crude
32.32 fixed point arithmetic. This adds some small errors as logarithm
and power functions are only approximated.

Fix sensor index mapping so that sensor limits are computed correctly.
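
For readers unfamiliar with the 32.32 format mentioned in 1.48: a value is stored in 64 bits, with the upper 32 bits as the integer part and the lower 32 bits as the fraction. The sketch below shows an unsigned multiply in that format. It is an illustration only, with hypothetical names (`fix32_32`, `fix_from_parts()`, `fix_mul()`); it is not the driver's arithmetic and ignores sign and overflow handling.

```c
#include <stdint.h>

typedef uint64_t fix32_32;	/* hypothetical: 32 integer bits, 32 fraction bits */

#define FIX_ONE	(UINT64_C(1) << 32)

static fix32_32
fix_from_parts(uint32_t whole, uint32_t frac)
{
	return ((uint64_t)whole << 32) | frac;
}

/*
 * Multiply two unsigned 32.32 values without a 128-bit type by
 * splitting each operand into high and low halves. The lowest
 * partial product is truncated, which is one source of the small
 * errors such arithmetic introduces.
 */
static fix32_32
fix_mul(fix32_32 a, fix32_32 b)
{
	uint64_t ah = a >> 32, al = a & 0xffffffffu;
	uint64_t bh = b >> 32, bl = b & 0xffffffffu;

	return ((ah * bh) << 32) + ah * bl + al * bh + ((al * bl) >> 32);
}
```

For example, 1.5 is `fix_from_parts(1, 0x80000000)` and 2.25 is `fix_from_parts(2, 0x40000000)`; their product, 3.375, is `fix_from_parts(3, 0x60000000)`.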
 1.47 17-Jul-2010  pgoyette Register ipmi(4) with power management subsystem so we might have a chance
of suspending. Suspending will still be denied if the watchdog is active.

As discussed on tech-kern@

XXX The pmf handlers for this and all other watchdogs should be factored
XXX out into a common handler for a generic wdog(4) pseudo-device, but
XXX that's left for the future.
 1.46 10-Apr-2010  pgoyette Save initial, boot-time limit values, and restore them upon request
from sysmon_envsys(9).
 1.45 22-Mar-2010  dyoung A lot of good it does to printf() a bus_space_tag_t. Don't do it.
 1.44 14-Mar-2010  pgoyette branches: 1.44.2;
Remove setting of the edata->monitor since that member no longer exists.
 1.43 14-Feb-2010  pgoyette Adapt to changes in sysmon's limit structure.
 1.42 31-Jan-2010  mlelstv branches: 1.42.2;
Release buffer in case a receive failed.
 1.41 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.40 20-Jul-2009  dyoung In ipmi_match(), initialize the condition variable sc_cmd_sleep.
Fixes a bug in previous, exposed by

# drvctl -d ipmi0
# drvctl -r -a ipmibus mainbus0
*lockdebug panic here*
 1.39 20-Jul-2009  dyoung Overhaul synchronization in ipmi(4): synchronize all access to
device registers with a mutex. Convert tsleep/wakeup calls to
cv_wait/cv_signal.

Do not repeatedly malloc/free tiny buffers for sending/receiving
commands, but reserve a command buffer in the softc.

Tickle the watchdog in the sensors-refreshing thread.

I am fairly certain that after the device is attached, every register
access happens in the sensors-refreshing thread. Moreover, no
software interrupt touches any register, now. So I may get rid of
the mutex that protects register accesses, sc_cmd_mtx.
 1.38 11-Jul-2009  pgoyette Store the limit values directly in the driver-private sensor data since
we don't have access to sysmon_envsys(8)'s copy at refresh time. (The
refresh is driven completely within the driver, and sysmon is uninvolved.)

Resolves unexpected alarms (as reported by David Young) such as over-limit
alarms on fan sensors which have only lower limit values.
 1.37 09-Jul-2009  pgoyette Don't extract upper/lower limit values if the values are not valid.

Correct comparison of cur_value against lower-limits.
 1.36 29-Jun-2009  pgoyette Adapt to new features available in sysmon_envsys:

1) expose the built-in limits to user-land (via envstat(8)), and
2) allow user-specified limits to override the built-in limits.

No comments received from current-users@ over 2-week period.
 1.35 01-Jun-2009  pgoyette Replace a flag that was accidentally removed.
 1.34 01-Jun-2009  pgoyette Since we no longer have individual events for each sensor value limit,
we don't need individual flag bits. Clean up extra bit definitions.
Bump kernel version - welcome to 5.99.13
 1.33 24-Apr-2009  ad - Attach via the kthread so boot is not so slow on some systems with IPMI.
- NOWAIT -> WAITOK
 1.32 07-Apr-2009  dyoung When ipmi0 detaches, free all of the ipmi_sensor's on the (global!)
ipmi_sensor_list.
 1.31 07-Apr-2009  dyoung In ipmi_detach(), don't sysmon_envsys_destroy(), but just _unregister():
_unregister() calls _destroy().
 1.30 07-Apr-2009  dyoung Add a device-detachment hook for ipmi(4).
 1.29 22-Feb-2009  dholland Improve some cryptic warning messages; from a patch attached to PR 38019
by Greg A. Woods, with a couple adjustments. Compile-tested only, but should
not be able to break anything.
 1.28 20-Dec-2008  taca branches: 1.28.2;
Change max retry time to 90 seconds from 5 seconds.
It is processed in background to detect ipmi.

Now my ML115 G1 detects ipmi as NetBSD 4_STABLE.

Discussed with Matthias Scheler (tron) with private mail
and approved by him.
 1.27 15-Dec-2008  tron Keep trying to attach ipmi(4) in the background for five seconds.
NetBSD now detects the IPMI support in a HP Proliant ML110 G4 again.
This fixes PR kern/40065 by myself.
 1.26 19-Nov-2008  ad Fix sloppy device_private conversion by cegger@ that prevented systems
with IPMI from booting for the last two weeks.
 1.25 03-Nov-2008  christos If ipmi failed to attach we would crash because we would end up using callouts
while cold. If cold, wait 10 times longer, and if we spinout fail instead of
trying to poll. Makes my machine boot again.
 1.24 03-Nov-2008  cegger unbreak previous. this change wasn't intended.
 1.23 03-Nov-2008  cegger The functions called from ipmi_match use the DEVNAME macro. But the softc is allocated on the stack and the accessed sc_dev member is not initialized.

Initialize the sc_dev.dv_xname in ipmi_match, which is enough to make DEVNAME work. Finally this also allows the device_t/softc split.
 1.22 03-Nov-2008  cegger ipmi_match: remove one indentation level
 1.21 30-Oct-2008  cegger branches: 1.21.2;
prepare for device_t/softc split, but actually don't do it: ipmi_match() wants to access sc_dev before we have chance to initialize it
 1.20 23-Sep-2008  ad branches: 1.20.2;
Speed up ipmi attach a bit, although boot times on my workstation still suck:

before 18s
after 14s
without ipmi 8s
 1.19 08-Sep-2008  pgoyette Separate checking of sensor value vs threshold/limit value into two
routines, so we can distinguish between an over-limit vs under-limit
condition. Set sensor state appropriately based on which threshold
is exceeded.

To do: come up with a means of detecting non-existent fans vs broken
fans. Currently, both report a valid value of "0 RPM" at least on
some platforms.

OK garbled@
Tested by simonb@
 1.18 17-Apr-2008  cegger branches: 1.18.4; 1.18.6; 1.18.10;
Add missing bracket. Fixes build for i386 ALL kernel.
 1.17 16-Apr-2008  cegger - use aprint_*_dev and device_xname
- use POSIX integer types
 1.16 04-Jan-2008  ad branches: 1.16.6;
sys/lock.h isn't needed here.
 1.15 16-Nov-2007  xtraeme branches: 1.15.6;
Extend the envsys2 API (one more time, sorry) as defined in:

http://mail-index.netbsd.org/tech-kern/2007/11/09/0001.html

sysmon_envsys_create() and sysmon_envsys_destroy() were added to
create/destroy sysmon_envsys objects (and its TAILQ/LIST for sensors/events).

sysmon_envsys_sensor_attach() and sysmon_envsys_sensor_detach() were
added to attach/detach sensors to a specified sysmon_envsys device.

The events framework is now per device and configurable via the
ENVSYS_SETDICTIONARY ioctl or /etc/envsys.conf and envstat(8).

Update all users and documentation to reflect these changes.
 1.14 17-Oct-2007  garbled branches: 1.14.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.13 23-Sep-2007  bouyer branches: 1.13.2;
Don't attach ipmi if GETID failed. From Nicolas Joly.
 1.12 13-Aug-2007  briggs branches: 1.12.2; 1.12.4;
Check for duplicate sensor names in the IPMI table. If a duplicate name
is found, try to make it unique by appending a count (1-99) to the sensor
description (truncating, if necessary). This takes my Dell PowerEdge 1800
from:
Temp: 40.000 degC
VRD 1 Temp: 35.000 degC
VRD 0 Temp: 39.000 degC
Planar Temp: 35.000 degC
Ambient Temp: 20.000 degC
Fan 2: 1500 RPM
Fan 1: 1425 RPM
CMOS Battery: 3.057 V
Intrusion: ON
Status : ON

to:
Temp3: 40.000 degC
Temp2: 40.000 degC
VRD 1 Temp: 35.000 degC
VRD 0 Temp: 39.000 degC
Planar Temp: 35.000 degC
Ambient Temp: 20.000 degC
Temp1: 41.000 degC
Temp: 43.000 degC
Fan 2: 1500 RPM
Fan 1: 1425 RPM
CMOS Battery: 3.057 V
Intrusion: ON
Status 1: ON
Status : ON
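
The renaming scheme described above can be sketched as follows. This is only an illustration, not the driver's code: `SENSOR_NAME_MAX`, `name_in_use()` and `make_unique_name()` are hypothetical names, and the 16-byte buffer size is an assumption chosen to keep the example small.

```c
#include <stdio.h>
#include <string.h>

#define SENSOR_NAME_MAX 16	/* hypothetical buffer size, including NUL */

/* Return 1 if name already appears among the first n entries of names. */
static int
name_in_use(char names[][SENSOR_NAME_MAX], size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Make name unique with respect to names[0..n-1] by appending a count
 * from 1 to 99, truncating the base when the suffix would not fit.
 * name must point to a buffer of at least SENSOR_NAME_MAX bytes.
 * Returns 0 on success, -1 if all 99 suffixes are already taken.
 */
static int
make_unique_name(char names[][SENSOR_NAME_MAX], size_t n, char *name)
{
	char base[SENSOR_NAME_MAX], cand[SENSOR_NAME_MAX];

	if (!name_in_use(names, n, name))
		return 0;

	strncpy(base, name, sizeof(base) - 1);
	base[sizeof(base) - 1] = '\0';

	for (int i = 1; i <= 99; i++) {
		char suffix[8];
		size_t slen, keep;

		snprintf(suffix, sizeof(suffix), "%d", i);
		slen = strlen(suffix);
		keep = strlen(base);
		/* Truncate the base so that base + suffix fits the buffer. */
		if (keep + slen > SENSOR_NAME_MAX - 1)
			keep = SENSOR_NAME_MAX - 1 - slen;
		snprintf(cand, sizeof(cand), "%.*s%s", (int)keep, base, suffix);
		if (!name_in_use(names, n, cand)) {
			strcpy(name, cand);
			return 0;
		}
	}
	return -1;
}
```

With "Temp" and "Temp1" already registered, a second "Temp" would come back as "Temp2" under this sketch; a name that already fills the buffer loses trailing characters to make room for the count.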
 1.11 09-Jul-2007  ad branches: 1.11.4; 1.11.8;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.10 04-Jul-2007  bouyer Start with all sensors in ENVSYS_SINVALID state, and switch to ENVSYS_SVALID
(or other, depending on result) once the sensor has been read.
This way envstat(8) won't show sensors which do not yet have their correct value.
 1.9 03-Jul-2007  xtraeme ipmi_sensor_status: if state is ok return ENVSYS_SVALID and not
ENVSYS_WARN_OK, the latter is deprecated on envsys2.
 1.8 03-Jul-2007  briggs Use PRIx64 for a 64-bit quantity instead of llx in a debug print.
 1.7 02-Jul-2007  xtraeme - On Intrusion sensors, if something is not ok return a critical event.
- On Power Supply sensors:
* if power supply is not installed, return a critical event.
* if power supply is installed but not powered on, return a
warnover event.
 1.6 01-Jul-2007  xtraeme Imported envsys 2, a brief description of the new features:
(Part 2: drivers)

* Support for detachable sensors.
* Cleaned up the API for simplicity and efficiency.
* Ability to send capacity/critical/warning events to powerd(8).
* Adapted all the code to the new locking order.
* Compatibility with the old envsys API: the ENVSYS_GTREINFO
and ENVSYS_GTREDATA ioctl(2)s are supported.
* Added support for a 'dictionary based communication channel' between
sysmon_power(9) and powerd(8), that means there is no 32 bytes event
size restriction anymore.
* Binary compatibility with old envstat(8) and powerd(8) via COMPAT_40.
* All drivers with the n^2 gtredata bug were fixed, PR kern/36226.

Tested by:

blymn: smsc(4).
bouyer: ipmi(4), mfi(4).
kefren: ug(4).
njoly: viaenv(4), adt7463.c.
riz: owtemp(4).
xtraeme: acpiacad(4), acpibat(4), acpitz(4), aiboost(4), it(4), lm(4).
 1.5 15-Feb-2007  ad branches: 1.5.6; 1.5.8; 1.5.14;
Replace some uses of lockmgr() / simplelocks.
 1.4 16-Nov-2006  christos branches: 1.4.2; 1.4.4; 1.4.6; 1.4.8; 1.4.10;
__unused removal on arguments; approved by core.
 1.3 10-Nov-2006  christos convert variable allocation to constant.
 1.2 12-Oct-2006  christos branches: 1.2.2;
- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.1 01-Oct-2006  bouyer Add ipmi(4) driver, from OpenBSD. This requires SMBios support, so add
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible through envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
 1.2.2.3 10-Dec-2006  yamt sync with head.
 1.2.2.2 22-Oct-2006  yamt sync with head
 1.2.2.1 12-Oct-2006  yamt file ipmi.c was added on branch yamt-splraiseipl on 2006-10-22 06:05:16 +0000
 1.4.10.2 03-Jun-2008  skrll Sync with netbsd-4.
 1.4.10.1 30-Sep-2007  wrstuden Catch up on netbsd-4 as of a few days ago.
 1.4.8.3 15-Oct-2007  riz Pull up following revision(s) (requested by bouyer in ticket #1847):
sys/arch/x86/x86/ipmi.c: revision 1.13
Don't attach ipmi if GETID failed. From Nicolas Joly.
 1.4.8.2 08-Jan-2007  ghen Pull up following revision(s) (requested by bouyer in ticket #1621):
sys/arch/i386/conf/GENERIC: revision 1.787 via patch
share/man/man4/Makefile: revision 1.407 via patch
distrib/sets/lists/man/mi: revision 1.936 via patch
share/man/man4/ipmi.4: revision 1.1 via patch
sys/arch/i386/i386/bios32.c: revision 1.11 via patch
sys/dev/DEVNAMES: revision 1.221 via patch
sys/arch/x86/x86/ipmi.c: revision 1.1 via patch
sys/arch/i386/i386/mainbus.c: revision 1.65 via patch
sys/arch/x86/include/smbiosvar.h: revision 1.1 via patch
sys/arch/x86/include/ipmivar.h: revision 1.1 via patch
sys/arch/x86/conf/files.x86: revision 1.20 via patch
sys/arch/i386/conf/files.i386: revision 1.293 via patch
Add ipmi(4) driver, from OpenBSD. This requires SMBios support, so add
SMBios detection and mapping to bios32.c, also from OpenBSD (for now this
is only compiled in if ipmi(4) is configured). The sensors and watchdog are
accessible through envsys(4).
Works on i386; some work is needed on amd64 to access the BIOS. It would
eventually work on Xen if the SMBios is accessible (to be tested).
Add manpage for new ipmi driver.
Claim ipmi.
 1.4.8.1 16-Nov-2006  ghen file ipmi.c was added on branch netbsd-3 on 2007-01-08 16:36:20 +0000
 1.4.6.7 21-Jan-2008  yamt sync with head
 1.4.6.6 07-Dec-2007  yamt sync with head
 1.4.6.5 27-Oct-2007  yamt sync with head.
 1.4.6.4 03-Sep-2007  yamt sync with head.
 1.4.6.3 26-Feb-2007  yamt sync with head.
 1.4.6.2 30-Dec-2006  yamt sync with head.
 1.4.6.1 16-Nov-2006  yamt file ipmi.c was added on branch yamt-lazymbuf on 2006-12-30 20:47:22 +0000
 1.4.4.2 19-Dec-2007  ghen Pull up following revision(s) (requested by briggs in ticket #989):
sys/arch/amd64/conf/GENERIC: revision 1.151
sys/arch/x86/x86/ipmi.c: revision 1.12
sys/dev/DEVNAMES: revision 1.228
sys/arch/amd64/amd64/bios32.c: revision 1.6
sys/arch/x86/x86/ipmi.c: revision 1.8
sys/arch/amd64/conf/files.amd64: revision 1.39 via patch
sys/arch/amd64/amd64/mainbus.c: revision 1.17
Use PRIx64 for a 64-bit quantity instead of llx in a debug print.
Add (commented-out) support for IPMI on amd64--pretty much copied straight
from i386.
Check for duplicate sensor names in the IPMI table. If a duplicate name
is found, try to make it unique by appending a count (1-99) to the sensor
description (truncating, if necessary).
 1.4.4.1 27-Sep-2007  xtraeme Pull up following revision(s) (requested by bouyer in ticket #898):
sys/arch/x86/x86/ipmi.c: revision 1.13 (via patch)

Don't attach ipmi if GETID failed. From Nicolas Joly.
 1.4.2.2 18-Nov-2006  ad Sync with head.
 1.4.2.1 16-Nov-2006  ad file ipmi.c was added on branch newlock2 on 2006-11-18 21:29:39 +0000
 1.5.14.1 03-Oct-2007  garbled Sync with HEAD
 1.5.8.1 11-Jul-2007  mjf Sync with head.
 1.5.6.8 03-Dec-2007  ad Sync with HEAD.
 1.5.6.7 09-Oct-2007  ad Sync with head.
 1.5.6.6 20-Aug-2007  ad Sync with HEAD.
 1.5.6.5 15-Jul-2007  ad Sync with head.
 1.5.6.4 01-Jul-2007  ad Adapt to callout API change.
 1.5.6.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.5.6.2 10-Apr-2007  ad Nuke the deferred kthread creation stuff, as it's no longer needed.
Pointed out by thorpej@.
 1.5.6.1 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.11.8.3 21-Nov-2007  joerg Sync with HEAD.
 1.11.8.2 02-Oct-2007  joerg Sync with HEAD.
 1.11.8.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.11.4.1 15-Aug-2007  skrll Sync with HEAD.
 1.12.4.1 06-Oct-2007  yamt sync with head.
 1.12.2.2 09-Jan-2008  matt sync with HEAD
 1.12.2.1 06-Nov-2007  matt sync with HEAD
 1.13.2.1 18-Nov-2007  bouyer Sync with HEAD
 1.14.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.14.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.15.6.1 08-Jan-2008  bouyer Sync with HEAD
 1.16.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.16.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.16.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.18.10.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.18.10.1 19-Oct-2008  haad Sync with HEAD.
 1.18.6.2 10-Oct-2008  skrll Sync with HEAD.
 1.18.6.1 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.18.4.7 09-Oct-2010  yamt sync with head
 1.18.4.6 11-Aug-2010  yamt sync with head.
 1.18.4.5 11-Mar-2010  yamt sync with head
 1.18.4.4 19-Aug-2009  yamt sync with head.
 1.18.4.3 18-Jul-2009  yamt sync with head.
 1.18.4.2 20-Jun-2009  yamt sync with head
 1.18.4.1 04-May-2009  yamt sync with head.
 1.20.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.20.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.20.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.21.2.7 23-Dec-2008  snj Pull up following revision(s) (requested by taca in ticket #197):
sys/arch/x86/x86/ipmi.c: revision 1.28
Change max retry time to 90 seconds from 5 seconds.
It is processed in background to detect ipmi.
Now my ML115 G1 detects ipmi as NetBSD 4_STABLE.
Discussed with Matthias Scheler (tron) with private mail
and approved by him.
 1.21.2.6 18-Dec-2008  snj Pull up following revision(s) (requested by tron in ticket #191):
sys/arch/x86/x86/ipmi.c: revision 1.27
Keep trying to attach ipmi(4) in the background for five seconds.
NetBSD now detects the IPMI support in a HP Proliant ML110 G4 again.
This fixes PR kern/40065 by myself.
 1.21.2.5 20-Nov-2008  snj Pull up following revision(s) (requested by ad in ticket #94):
sys/arch/x86/x86/ipmi.c: revision 1.26
Fix sloppy device_private conversion by cegger@ that prevented systems
with IPMI from booting for the last two weeks.
 1.21.2.4 06-Nov-2008  snj Pull up following revision(s) (requested by cegger in ticket #10):
sys/arch/x86/x86/ipmi.c: revision 1.25
If ipmi failed to attach we would crash because we would end up using
callouts while cold. If cold, wait 10 times longer, and if we spin out, fail
instead of trying to poll. Makes my machine boot again.
 1.21.2.3 06-Nov-2008  snj Pull up following revision(s) (requested by cegger in ticket #10):
sys/arch/x86/x86/ipmi.c: revision 1.24
unbreak previous. this change wasn't intended.
 1.21.2.2 06-Nov-2008  snj Pull up following revision(s) (requested by cegger in ticket #10):
sys/arch/x86/x86/ipmi.c: revision 1.23
sys/arch/x86/include/ipmivar.h: revision 1.9
The functions called from ipmi_match use the DEVNAME macro. But the
softc is allocated on the stack and the accessed sc_dev member is not
initialized.
Initialize the sc_dev.dv_xname in ipmi_match, which is enough to make
DEVNAME work. Finally this also allows the device_t/softc split.
 1.21.2.1 06-Nov-2008  snj Pull up following revision(s) (requested by cegger in ticket #10):
sys/arch/x86/x86/ipmi.c: revision 1.22
ipmi_match: remove one indentation level
 1.28.2.4 24-Oct-2010  jym Sync with HEAD
 1.28.2.3 01-Nov-2009  jym Sync with HEAD.
 1.28.2.2 23-Jul-2009  jym Sync with HEAD.
 1.28.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.42.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.42.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.44.2.2 05-Mar-2011  rmind sync with head
 1.44.2.1 30-May-2010  rmind sync with head
 1.50.12.2 05-Apr-2012  mrg sync to latest -current.
 1.50.12.1 18-Feb-2012  mrg merge to -current.
 1.50.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.50.8.1 17-Apr-2012  yamt sync with head
 1.52.2.1 12-Apr-2012  riz Pull up following revision(s) (requested by njoly in ticket #176):
sys/arch/x86/x86/ipmi.c: revision 1.53
For 1's and 2's complement sensor data, convert unsigned raw data to a
signed type before proceeding any computation. Fix handling of
negative temperatures that can be set for critmin/warnmin limits.
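
A minimal sketch of the conversion described above, assuming an 8-bit raw reading. The enum and function names are hypothetical and are not taken from ipmi.c.

```c
#include <stdint.h>

/* Hypothetical tags for how a raw sensor byte is encoded. */
enum raw_format {
	RAW_UNSIGNED,
	RAW_ONES_COMPLEMENT,
	RAW_TWOS_COMPLEMENT
};

/*
 * Convert an 8-bit raw reading to a signed int before any further
 * computation, so that negative values (for example critmin/warnmin
 * temperature limits) are not treated as large positive numbers.
 */
static int
raw_to_signed(uint8_t raw, enum raw_format fmt)
{
	switch (fmt) {
	case RAW_ONES_COMPLEMENT:
		/* Sign bit set: magnitude is the bitwise complement. */
		return (raw & 0x80) ? -(int)(uint8_t)~raw : (int)raw;
	case RAW_TWOS_COMPLEMENT:
		/* Sign bit set: subtract 2^8 to recover the negative value. */
		return (raw & 0x80) ? (int)raw - 256 : (int)raw;
	default:
		return (int)raw;
	}
}
```

Under these assumptions a 2's complement raw value of 0xEC becomes -20, while the same byte read as unsigned stays 236.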
 1.53.2.3 03-Dec-2017  jdolecek update from HEAD
 1.53.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.53.2.1 23-Jun-2013  tls resync from head
 1.54.6.2 18-May-2014  rmind sync with head
 1.54.6.1 28-Aug-2013  rmind sync with head
 1.56.2.1 07-Apr-2014  tls Be a little more clear and consistent about harvesting entropy from devices:

1) deprecate RND_FLAG_NO_ESTIMATE

2) define RND_FLAG_COLLECT_TIME, RND_FLAG_COLLECT_VALUE

3) define RND_FLAG_ESTIMATE_TIME, RND_FLAG_ESTIMATE_VALUE

4) define RND_FLAG_DEFAULT: RND_FLAG_COLLECT_TIME|
RND_FLAG_COLLECT_VALUE|RND_FLAG_ESTIMATE_TIME

5) Make entropy harvesting from environmental sensors a little more generic
and remove it from individual sensor drivers.

6) Remove individual open-coded delta-estimators for values from a few
places in the tree (uvm, environmental drivers).

7) 0 -> RND_FLAG_DEFAULT, actually gather entropy from various drivers
that had stubbed out code, other minor cleanups.
 1.57.2.1 10-Nov-2014  snj Pull up following revision(s) (requested by maxv in ticket #195):
sys/arch/arm/iomd/iomd_irqhandler.c: revision 1.21
sys/arch/arm/ofw/ofw_irqhandler.c: revision 1.21
sys/arch/atari/atari/intr.c: revision 1.24-1.25
sys/arch/ews4800mips/sbd/fb_sbdio.c: revision 1.14
sys/arch/hpcmips/tx/tx39icu.c: revision 1.34
sys/arch/shark/isa/isa_irqhandler.c: revision 1.27
sys/arch/sparc/sparc/machdep.c: revision 1.327
sys/arch/sparc64/dev/psycho.c: revision 1.119
sys/arch/sparc64/dev/schizo.c: revision 1.32
sys/arch/sparc64/sparc64/machdep.c: revision 1.279
sys/arch/sun68k/sun68k/bus.c: revision 1.22
sys/arch/x86/x86/ipmi.c: revision 1.58
sys/arch/xen/xen/privcmd.c: revision 1.46-1.49
Fix several memory leaks.
 1.59.2.5 28-Aug-2017  skrll Sync with HEAD
 1.59.2.4 22-Apr-2016  skrll Sync with HEAD
 1.59.2.3 22-Sep-2015  skrll Sync with HEAD
 1.59.2.2 06-Jun-2015  skrll Sync with HEAD
 1.59.2.1 06-Apr-2015  skrll Sync with HEAD
 1.64.10.1 18-Aug-2020  martin Pull up following revision(s) (requested by nonaka in ticket #1597):

sys/dev/ipmi.c: revision 1.5
(applied to sys/arch/x86/x86/ipmi.c)

ipmi(4): Fixed a bug where an incorrect condition was reported.

When the value obtained from the sensor was below the critical lower
limit, it was reported as being below the warning lower limit instead.
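
One way to avoid this class of bug is to test the critical limits before the warning limits. The following is a sketch only: the state names, the `struct limits` layout and `classify()` are hypothetical and do not mirror the envsys constants or the driver's actual code.

```c
/* Hypothetical sensor states; not the ENVSYS_* constants. */
enum sensor_state {
	STATE_OK,
	STATE_WARN_UNDER,
	STATE_CRIT_UNDER,
	STATE_WARN_OVER,
	STATE_CRIT_OVER
};

/* Hypothetical limit set, with crit_min <= warn_min <= warn_max <= crit_max. */
struct limits {
	int crit_min;
	int warn_min;
	int warn_max;
	int crit_max;
};

/*
 * Classify a reading. The critical limits are checked first, so a value
 * below crit_min is never reported as merely below warn_min.
 */
static enum sensor_state
classify(int value, const struct limits *l)
{
	if (value < l->crit_min)
		return STATE_CRIT_UNDER;
	if (value > l->crit_max)
		return STATE_CRIT_OVER;
	if (value < l->warn_min)
		return STATE_WARN_UNDER;
	if (value > l->warn_max)
		return STATE_WARN_OVER;
	return STATE_OK;
}
```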
 1.66.6.1 10-Jun-2019  christos Sync with HEAD
 1.66.4.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.66.4.1 20-Oct-2018  pgoyette Sync with head
 1.6 22-Aug-2022  hannken Sprinkle "#include <machine/pmap_private.h>", kernel ALL/amd64
compiles again.
 1.5 21-Apr-2019  maxv Rename the PTE bits.
 1.4 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.3 17-Sep-2017  maxv branches: 1.3.2; 1.3.6;
Remove the second argument from USERMODE and KERNELMODE, it is unused
now that we don't have vm86 anymore.
 1.2 15-Aug-2017  maxv branches: 1.2.2;
style
 1.1 15-Aug-2017  maxv Merge into x86/.
 1.2.2.2 28-Aug-2017  skrll Sync with HEAD
 1.2.2.1 15-Aug-2017  skrll file kgdb_machdep.c was added on branch nick-nhusb on 2017-08-28 17:51:56 +0000
 1.3.6.1 10-Jun-2019  christos Sync with HEAD
 1.3.2.2 03-Dec-2017  jdolecek update from HEAD
 1.3.2.1 17-Sep-2017  jdolecek file kgdb_machdep.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.93 02-May-2025  imil Add support for CPUID leaf 0x40000010 to detect TSC and LAPIC frequency on
hypervisors implementing the VMware-defined interface

This change enables virtual machines to obtain TSC and LAPIC frequency
information directly from the hypervisor via CPUID leaf 0x40000010, avoiding
the need for runtime calibration, thus reducing boot time in supported
environments.

Tested on GENERIC and MICROVM kernels, QEMU/KVM and QEMU/NVMM (current and
10.1), Intel and AMD CPUs, NetBSD/amd64 and i386.
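
A sketch of how the values from this leaf might be decoded. It assumes the layout commonly described for the VMware-defined interface (EAX holds the TSC frequency in kHz, EBX the LAPIC timer frequency in kHz) and that the maximum hypervisor leaf was read from CPUID leaf 0x40000000; neither detail is stated in this log. `struct hv_timing` and `hv_decode_timing()` are hypothetical names, and the register values are passed in so the sketch does not execute CPUID itself.

```c
#include <stdint.h>

#define HV_TIMING_LEAF	0x40000010u

/* Hypothetical container for the decoded frequencies, in Hz. */
struct hv_timing {
	uint64_t tsc_hz;
	uint64_t lapic_hz;
};

/*
 * Decode the timing leaf. max_hv_leaf is the highest hypervisor leaf
 * the host advertises; eax and ebx are the register values returned
 * for HV_TIMING_LEAF (assumed to be in kHz).
 * Returns 0 on success, -1 if the leaf is absent or reports zero, in
 * which case the caller would fall back to runtime calibration.
 */
static int
hv_decode_timing(uint32_t max_hv_leaf, uint32_t eax, uint32_t ebx,
    struct hv_timing *out)
{
	if (max_hv_leaf < HV_TIMING_LEAF || eax == 0 || ebx == 0)
		return -1;
	out->tsc_hz = (uint64_t)eax * 1000;
	out->lapic_hz = (uint64_t)ebx * 1000;
	return 0;
}
```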
 1.92 06-Mar-2025  imil Revert VMware-compatible TSC and LAPIC frequency detection.
 1.91 06-Mar-2025  imil Add support for CPUID leaf 0x40000010, which enables VMware-compatible TSC
and LAPIC frequency detection for virtual machines.
 1.90 25-Feb-2024  andvar branches: 1.90.2;
s/asynchronious/asynchronous/ in comment.
 1.89 07-Sep-2022  knakahara NetBSD/x86: Raise the number of interrupt sources per CPU from 32 to 56.

There has been no objection for three years.
https://mail-index.netbsd.org/port-amd64/2019/09/22/msg003012.html
Implemented by nonaka@n.o, updated by me.
 1.88 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.87 26-Apr-2022  msaitoh Fix typo. No functional change.
 1.86 07-Oct-2021  msaitoh KNF. No functional change.
 1.85 27-Oct-2020  ryo Move vmt(4) from MD to MI, and add vmt support on aarch64. Tested on ESXi-Arm Fling.

- move from sys/arch/x86/x86/{vmt.c,vmtreg.h,vmtvar.h} to sys/dev/vmt/{vmt_subr.c,vmtreg.h,vmtvar.h},
and split the attach part of the cpufeaturebus and fdt
- add aarch64 vmware backdoor op
- add include guard to vmt{reg,var}.h
- There is still some little-endian dependency; it needs to be fixed in order to work properly on aarch64eb
 1.84 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.83 29-May-2020  rin For struct timecounter, use C99 initializers.
Compile tested. No functional changes intended.
 1.82 21-May-2020  ad Fix merge error
 1.81 21-May-2020  ad - Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.
 1.80 20-May-2020  msaitoh Temporarily go back to lapic_initclocks() from lapic_reset() to avoid a compile
error.
 1.79 19-May-2020  ad lapic_delay() disable preemption since the state is very CPU dependent.
 1.78 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.77 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.76 01-Dec-2019  maxv branches: 1.76.6;
localify
 1.75 14-Jun-2019  msaitoh - Dump LAPIC and I/O APIC correctly.
- Don't print redirect target on LAPIC.
- Fix DEST_MASK:
- DEST_MASK is not 1 bit but 2 bits.
- Add missing "\0"s to print decoded name correctly.
- Support both LAPIC and I/O APIC correctly in apic_format_redir().
- Improve output of some bits using snprintb()'s "F\B\1" and ":\V".
 1.74 14-Jun-2019  msaitoh No functional change:
- Rename macros:
- ICR, LVT and MSIDATA can share the bit definitions. Remove redundant
definitions and use the common macros.
- Consistently use LAPIC_LVT_ for all local vector table's macro names.
- Use __BITS().
- Add definition for TSC-deadline (LAPIC_LVT_TMM_TSCDLT).
 1.73 13-Jun-2019  msaitoh lapic_dump(): Print CMCI and thermal local vector table, too.
 1.72 13-Jun-2019  msaitoh No functional change:
- Simplify some code for readability.
- KNF a little.
 1.71 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.70 17-Feb-2019  nonaka PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.
 1.69 15-Feb-2019  nonaka Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.
 1.68 16-Dec-2018  jdolecek use ci_ipending instead of ci_istate.ipending, NFC
 1.67 23-Sep-2018  cherry Make XEN use the same api as native, for idt vector allocation
and registration.

lidt() placed in xenfunc() at maxv@'s suggestion.

There should be no functional change due to this commit.

Tested on amd64 native and XEN.
 1.66 03-Apr-2018  christos branches: 1.66.2;
Rename the DDB IPI IDT vectors for consistency. ok maxv@
 1.65 26-Nov-2017  maxv branches: 1.65.2;
Remove unused variables.
 1.64 23-Nov-2017  jmcneill Add a workaround for local APIC timers running under KVM. It seems these
timers don't reload the current-count register in periodic mode when it
reaches 0, so we need to detect this condition and reload it ourselves.

XXX pullup
 1.63 04-Nov-2017  maxv Fix stack overflow, found when testing a new feature.
 1.62 15-Aug-2017  maxv Rename intrddb -> intrddbipi, like i386.
 1.61 11-Aug-2017  maxv Fix a bug introduced in r1.55, this should be LAPIC_BASE.
 1.60 13-Jul-2017  nonaka PR/52266: Before accessing MSR[APICBASE], check whether the APIC is present.
 1.59 08-Jul-2017  nonaka PR/52266: use rdmsr_safe(9) instead of rdmsr(9) for old machine.

tested by simonb@
 1.58 23-May-2017  nonaka branches: 1.58.2;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.57 23-May-2017  nonaka whitespace
 1.56 22-Apr-2017  nonaka use CR8 instead of LAPIC Task Priority register on x86-64.
 1.55 22-Apr-2017  nonaka move LAPIC_MSR* to specialreg.h.
 1.54 25-Nov-2016  maxv Move the virtual address of the LAPIC page out of the data segment on amd64
and i386. The old design was error-prone, and it didn't allow us to map the
data segment with large pages.

Now, the VA is allocated dynamically in the pmap bootstrap code, and entered
manually later. We go from using &local_apic to using *local_apic_va, and we
therefore need one more level of indirection in the asm code.

Discussed on tech-kern.
 1.53 15-Oct-2016  maxv Instead of setting the TPR to the value that was in the data segment, set
zero directly. On amd64, the data version of lapic_tpr is not explicitly
initialized.
 1.52 25-Jul-2016  maxv The L1 entry of the first page of the data segment is overwritten for the
LAPIC page, and set as RWX+PG_N. The LAPIC pa is fixed, and its va resides
in the data segment. Because of this error-prone design, the kernel image
map is not linear, and I first thought it was a bug (as I vaguely said in
PR/51148). Using large pages for the data segment is therefore wrong, since
the first page does not actually belong to the data segment (even if its va
is in the range). This bug is not triggered currently, since local_apic is
not large-page-aligned.

We will certainly have to allocate a va dynamically instead of using the
first page of data; but for now, disable large pages on the data segment,
and map the LAPIC as RW.

This is the last x86-specific RWX page.
 1.51 27-Jul-2015  msaitoh branches: 1.51.2;
KNF.
 1.50 17-Jul-2015  msaitoh KNF. No functional change.
 1.49 15-Jul-2015  msaitoh - Add lapic_dump() to print lapic's setting.
- Add mpacpi_dump() to dump mp_intrs[].
 1.48 18-May-2015  msaitoh Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.
 1.47 15-Nov-2013  msaitoh branches: 1.47.4; 1.47.6;
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
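
The difference between the base fields and the display values can be sketched as follows. The `SIG_*` macros and the two helper functions are hypothetical stand-ins written for this illustration, not the definitions from the NetBSD headers; they follow the usual x86 convention that the extended family is added only when the base family is 0xf, and that the extended model is used only when the base family is 0x6 or 0xf.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature (hypothetical names). */
#define SIG_STEPPING(sig)	((sig) & 0xf)
#define SIG_BASEMODEL(sig)	(((sig) >> 4) & 0xf)
#define SIG_BASEFAMILY(sig)	(((sig) >> 8) & 0xf)
#define SIG_EXTMODEL(sig)	(((sig) >> 16) & 0xf)
#define SIG_EXTFAMILY(sig)	(((sig) >> 20) & 0xff)

/* Display family: base family, plus extended family if base is 0xf. */
static uint32_t
display_family(uint32_t sig)
{
	uint32_t fam = SIG_BASEFAMILY(sig);

	if (fam == 0xf)
		fam += SIG_EXTFAMILY(sig);
	return fam;
}

/* Display model: extended model forms the high nibble for families 6 and 0xf. */
static uint32_t
display_model(uint32_t sig)
{
	uint32_t fam = SIG_BASEFAMILY(sig);
	uint32_t model = SIG_BASEMODEL(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= SIG_EXTMODEL(sig) << 4;
	return model;
}
```

For a signature of 0x000306A9 this yields family 6, model 0x3A and stepping 9; for 0x00800F11 it yields family 0x17, model 0x01 and stepping 1.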
 1.46 12-Jun-2011  rmind branches: 1.46.2; 1.46.12; 1.46.16;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.45 18-May-2011  drochner branches: 1.45.2;
remove stale declarations / empty function
 1.44 21-Nov-2009  rmind branches: 1.44.4; 1.44.6;
Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.43 29-Jan-2009  joerg branches: 1.43.2;
rtclock_tval is defined as u_long in isa/clock.c, match.
 1.42 03-Jul-2008  drochner branches: 1.42.4;
Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.41 21-May-2008  ad branches: 1.41.2;
aprint_debug for the lapic ESR reports during startup.
 1.40 13-May-2008  ad Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.
 1.39 12-May-2008  ad - lapic_map: if we have an APIC MSR, ignore the supplied address and ask the
hardware where it is mapped. At least one ACPI implementation seems to lie
about the physical address of the lapic.

- lapic_initclocks: be paranoid and issue an EOI.
 1.38 11-May-2008  ad splclock -> splhigh
 1.37 11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.36 10-May-2008  ad Remove tsc debugging code.
 1.35 28-Apr-2008  martin branches: 1.35.2;
Remove clause 3 and 4 from TNF licenses
 1.34 16-Apr-2008  cegger branches: 1.34.2; 1.34.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.33 25-Jan-2008  xtraeme branches: 1.33.6;
Some indentation for a few printfs that weren't respecting 80 chars
per line.
 1.32 25-Jan-2008  joerg Simplify the calibration code a great deal by just waiting around 100ms
polling the i8254 as reference and counting the cycles with that.
Disable interrupts. This should be relatively stable even in the light
of SMIs as long as they happen in the middle of the loop. This fixes
long delays during boot.

If greater precision for the calibration is desired, a second run should
be done at a different time when the HPET or ACPI timer is present. Both
provide much faster access (less jitter) and a higher frequency.
 1.31 23-Jan-2008  joerg Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.
 1.30 26-Dec-2007  yamt - share idt entry allocation code among x86.
- introduce a function to reserve an idt entry and use it instead of
manipulating idt_allocmap directly.
- rename idt to xen_idt for amd64 xen. add missing #ifdef XEN.
 1.29 09-Dec-2007  jmcneill branches: 1.29.2;
Merge jmcneill-pm branch.
 1.28 03-Dec-2007  joerg branches: 1.28.2;
Add a CPU local timer based on the LAPIC. This is consistently faster
than TSC, but doesn't suffer from SpeedStep as TSC does.

The default quality is higher than HPET for UP, but -100 for
MULTIPROCESSOR as it needs CPU local state which doesn't exist yet.
 1.27 14-Nov-2007  joerg branches: 1.27.2;
Merge from jmcneill-pm:
Add some more defines from the spec. Remove some old ones not
existing in the current Intel Architecture Guide. Use some more
understandable names.

ANSIfy and use uintXX_t to hurt my eyes less.

Further improve readability by exploiting __HAVE_TIMECOUNTER as
invariance on x86 platforms.
 1.26 14-Nov-2007  ad Use i8254_delay().
 1.25 26-Oct-2007  joerg branches: 1.25.2;
Match delay/DELAY on x86 with delay(9). It takes an unsigned int as
argument. Use this and replace the inline assembly (mul + div using the
64bit intermediate result) with normal 32bit multiplication and
division. The compiler can turn the division into a multiplication and
shift, making it even cheaper than the original assembly. For extremely
long delays, just use 64bit arithmetic.
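A minimal sketch of the conversion described in revision 1.25, assuming a 32-bit unsigned int and the nominal i8254 rate of 1193182 Hz. The function name sketch_usec_to_ticks() is hypothetical and the code is not taken from the kernel; it only shows the split between the cheap 32-bit path and the 64-bit fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal i8254 input clock in Hz (assumed). */
#define SKETCH_TIMER_FREQ_HZ 1193182u

/* Convert a delay in microseconds to timer ticks. */
static uint64_t
sketch_usec_to_ticks(unsigned int usec)
{
	uint64_t ticks;

	if (usec <= UINT32_MAX / SKETCH_TIMER_FREQ_HZ) {
		/* Fast path: the 32-bit product cannot overflow. */
		ticks = (uint32_t)usec * SKETCH_TIMER_FREQ_HZ / 1000000u;
	} else {
		/* Very long delays: widen to 64 bits before multiplying. */
		ticks = (uint64_t)usec * SKETCH_TIMER_FREQ_HZ / 1000000u;
	}
	/* The timer runs faster than 1 MHz, so ticks never fall below usec. */
	assert(ticks >= usec);
	return ticks;
}
```

Because the divisor is a compile-time constant, a compiler is free to replace the division with a multiplication and shift, which is the effect the commit message relies on.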
 1.24 17-Oct-2007  garbled Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.23 26-Sep-2007  ad branches: 1.23.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.22 29-Aug-2007  ad branches: 1.22.2;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.21 07-Aug-2007  ad branches: 1.21.2;
x86_ipi: don't wait for the IPI to go, unless DIAGNOSTIC. If it doesn't go,
the system is going to fail regardless.
 1.20 09-Feb-2007  ad branches: 1.20.6; 1.20.14; 1.20.18; 1.20.22;
Merge newlock2 to head.
 1.19 08-Dec-2006  yamt - pass intrframe by-pointer, not by-value.
- make i386 and xen use per-cpu interrupt stack.

xen part is reviewed by Manuel Bouyer.
 1.18 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.17 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.16 07-Jun-2006  kardel branches: 1.16.6; 1.16.8;
add timecounter support (from branch simonb-timecounters)
 1.15 04-Jan-2006  rpaulo branches: 1.15.2; 1.15.4; 1.15.6; 1.15.12;
Kill __P.
 1.14 24-Dec-2005  perry branches: 1.14.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.13 11-Dec-2005  christos merge ktrace-lwp.
 1.12 29-May-2005  christos branches: 1.12.2;
avoid variable shadowing.
 1.11 13-Jan-2005  fvdl * Wrap IPI sending in splclock(), since an interrupt at IPL_CLOCK or lower
may cause IPIs.
* Make broadcast IPIs go through x86_ipi() as well, so that they wait for
the APIC to be ready too.

From Stephan Uphoff.
 1.10 01-Jul-2004  yamt {i8254,lapic}_initclocks: try to be more precise using fixtick.
 1.9 30-Jun-2004  kochi fix a duplicate member in designated initializers, which was a bug
introduced in rev 1.5.
pointed out by Andreas Gustafsson.
 1.8 05-Jun-2004  yamt - introduce a function, i82489_icr_wait, which waits for
LAPIC_DLSTAT_BUSY cleared, and use it where appropriate.
- panic if lapic's busy too long and DIAGNOSTIC.
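The bounded wait introduced in revision 1.8 can be illustrated with the sketch below. It is not the i82489_icr_wait() implementation: the register is passed in as a plain pointer so that the example runs in user space, the names are hypothetical, and the busy bit is assumed to be bit 12 of the low ICR word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed delivery-status (busy) bit in the low ICR word. */
#define SKETCH_DLSTAT_BUSY 0x00001000u

/*
 * Spin until the busy bit clears or max_spins polls have been made.
 * Returns true if the register became idle in time, false otherwise.
 */
static bool
sketch_icr_wait(const volatile uint32_t *icr, unsigned int max_spins)
{
	assert(icr != NULL);
	for (unsigned int i = 0; i < max_spins; i++) {
		if ((*icr & SKETCH_DLSTAT_BUSY) == 0)
			return true;
		/* A kernel would issue the PAUSE instruction here. */
	}
	return false;
}
```

A caller built with DIAGNOSTIC could panic on a false return, while other builds would simply carry on.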
 1.7 12-May-2004  yamt x86_ipi: call x86_pause() in busy loops.
 1.6 30-Apr-2004  toshii Compile TSC support code when __x86_64__ is defined.
 1.5 10-Apr-2004  kochi use designated initializer for struct pic initializers.
just for readability.
 1.4 13-Feb-2004  wiz branches: 1.4.2;
Uppercase CPU, plural is CPUs.
 1.3 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.2 08-May-2003  fvdl branches: 1.2.2;
Add x86_pause() inline function, containing the "pause" instruction
for i386, and nothing for amd64. Sprinkle it in various spinloops,
as recommended by Intel.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.2.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.4.2.1 06-May-2004  jmc Pullup rev 1.6 (requested by toshii in ticket #254)

Compile TSC support code when __x86_64__ is defined.
 1.12.2.9 04-Feb-2008  yamt sync with head.
 1.12.2.8 21-Jan-2008  yamt sync with head
 1.12.2.7 07-Dec-2007  yamt sync with head
 1.12.2.6 15-Nov-2007  yamt sync with head.
 1.12.2.5 27-Oct-2007  yamt sync with head.
 1.12.2.4 03-Sep-2007  yamt sync with head.
 1.12.2.3 26-Feb-2007  yamt sync with head.
 1.12.2.2 30-Dec-2006  yamt sync with head.
 1.12.2.1 21-Jun-2006  yamt sync with head.
 1.14.2.1 15-Jan-2006  yamt sync with head.
 1.15.12.1 19-Jun-2006  chap Sync with head.
 1.15.6.1 26-Jun-2006  yamt sync with head.
 1.15.4.3 30-Apr-2006  kardel - initialize rtclock (i8254) appropriately
- add lapic late interrupt detection debug code
 1.15.4.2 28-Feb-2006  kardel disable i8254 reloading when using the lapic timer for interrupts
reverse __HAVE_TIMECOUNTER test for microset code
 1.15.4.1 04-Feb-2006  simonb For timecounters, we don't need to do the microset thing, nor set up
variables used by old NTP code.
 1.15.2.1 09-Sep-2006  rpaulo sync with head
 1.16.8.2 10-Dec-2006  yamt sync with head.
 1.16.8.1 22-Oct-2006  yamt sync with head
 1.16.6.3 06-Feb-2007  ad Quieten noisy boot messages.
 1.16.6.2 12-Jan-2007  ad Sync with head.
 1.16.6.1 18-Nov-2006  ad Sync with head.
 1.20.22.11 09-Dec-2007  jmcneill Sync with HEAD.
 1.20.22.10 14-Nov-2007  joerg Sync with HEAD.
 1.20.22.9 14-Nov-2007  joerg GC lapic_state.
 1.20.22.8 28-Oct-2007  joerg Sync with HEAD.
 1.20.22.7 02-Oct-2007  joerg Sync with HEAD.
 1.20.22.6 06-Sep-2007  joerg Add some more defines from the spec. Remove some old ones not
existing in the current Intel Architecture Guide. Use some more
understandable names.
 1.20.22.5 06-Sep-2007  joerg Further improve readability by exploiting __HAVE_TIMECOUNTER as
invariance on x86 platforms.
 1.20.22.4 05-Sep-2007  joerg ANSIfy and use uintXX_t to hurt my eyes less.
 1.20.22.3 03-Sep-2007  jmcneill Sync with HEAD.
 1.20.22.2 09-Aug-2007  jmcneill Sync with HEAD.
 1.20.22.1 03-Aug-2007  jmcneill Pull in power management changes from private branch.
 1.20.18.2 03-Sep-2007  skrll Sync with HEAD.
 1.20.18.1 15-Aug-2007  skrll Sync with HEAD.
 1.20.14.1 03-Oct-2007  garbled Sync with HEAD
 1.20.6.4 03-Dec-2007  ad Sync with HEAD.
 1.20.6.3 09-Oct-2007  ad Sync with head.
 1.20.6.2 20-Aug-2007  ad Sync with HEAD.
 1.20.6.1 29-Jul-2007  ad - When zeroing/copying pages, use SSE2 movnti to avoid polluting the cache.
- By default, align assembly routines on 32-byte starting boundaries.
- There are now 8 interrupt priority levels, half of which are softints.
Update intrdefs.h to match.
- Always clear/set spinlock words - removes lots of ifdefs.
- Remove the horrible ci_self150 hack that I introduced.
- Overhaul how TLB shootdown is performed. Inspired by a similar change in
OpenBSD but implemented quite differently. This should be a lot faster
but I have not benchmarked it yet.
 1.21.2.3 23-Mar-2008  matt sync with HEAD
 1.21.2.2 09-Jan-2008  matt sync with HEAD
 1.21.2.1 06-Nov-2007  matt sync with HEAD
 1.22.2.1 06-Oct-2007  yamt sync with head.
 1.23.2.2 18-Nov-2007  bouyer Sync with HEAD
 1.23.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.25.2.4 18-Feb-2008  mjf Sync with HEAD.
 1.25.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.25.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.25.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.27.2.2 26-Dec-2007  ad Sync with head.
 1.27.2.1 08-Dec-2007  ad Sync with head.
 1.28.2.1 11-Dec-2007  yamt sync with head.
 1.29.2.1 02-Jan-2008  bouyer Sync with HEAD
 1.33.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.33.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.34.4.3 11-Mar-2010  yamt sync with head
 1.34.4.2 04-May-2009  yamt sync with head.
 1.34.4.1 16-May-2008  yamt sync with head.
 1.34.2.2 04-Jun-2008  yamt sync with head
 1.34.2.1 18-May-2008  yamt sync with head.
 1.35.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.35.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.41.2.1 03-Jul-2008  simonb Sync with head.
 1.42.4.1 03-Mar-2009  skrll Sync with HEAD.
 1.43.2.2 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.43.2.1 24-Oct-2010  jym Sync with HEAD
 1.44.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.44.4.2 31-May-2011  rmind sync with head
 1.44.4.1 26-Apr-2010  rmind Apply renovated patch to significantly reduce TLB shootdowns in x86 pmap,
also provide TLBSTATS option to measure and track TLB shootdowns. Details:

http://mail-index.netbsd.org/port-i386/2009/01/11/msg001018.html

Patch from Andrew Doran, proposed on tech-x86 [sic], in January 2009.

XXX: amd64 and xen are not done yet; work in progress.
 1.45.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.46.16.1 18-May-2014  rmind sync with head
 1.46.12.2 03-Dec-2017  jdolecek update from HEAD
 1.46.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.46.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.47.6.5 28-Aug-2017  skrll Sync with HEAD
 1.47.6.4 05-Dec-2016  skrll Sync with HEAD
 1.47.6.3 05-Oct-2016  skrll Sync with HEAD
 1.47.6.2 22-Sep-2015  skrll Sync with HEAD
 1.47.6.1 06-Jun-2015  skrll Sync with HEAD
 1.47.4.2 11-Aug-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #948):
sys/arch/x86/x86/mpacpi.c: revisions 1.99, 1.100
sys/arch/x86/x86/lapic.c: revision 1.49
- Add lapic_dump() to print lapic's setting.
- Add mpacpi_dump() to dump mp_intrs[].
--
Configure ioapic before lapic because lapic(lapic_set_lvt()) checks the
existence of ioapic. This change fixes a problem that some machines hang
after attaching ehci (shortly after writing EHCI_USBINTR to enable interrupt).
Even though cold == 1, LAPIC_LVINT0 was not set as masked. Perhaps that is the
cause of the problem.
This problem was observed on SuperMicro X10SLX-F, X10SDV-TLN4F and
Shuttle DS57U without wm(4) driver.
 1.47.4.1 22-May-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #795):
sys/arch/x86/x86/lapic.c: revision 1.48
Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.
 1.51.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.51.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.51.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.51.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.58.2.6 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.58.2.5 05-Apr-2018  martin Pull up following revision(s) (requested by christos in ticket #696):

sys/arch/amd64/amd64/vector.S: revision 1.62 (patch)
sys/arch/x86/include/intr.h: revision 1.55
sys/arch/i386/i386/vector.S: revision 1.77
sys/arch/i386/i386/db_interface.c: revision 1.82 (patch)
sys/arch/amd64/amd64/spl.S: revision 1.34 (patch)
sys/arch/amd64/amd64/db_interface.c: revision 1.33 (patch)
sys/arch/x86/x86/intr.c: revision 1.125
sys/arch/i386/i386/spl.S: revision 1.43 (patch)
sys/arch/i386/i386/machdep.c: revision 1.805 (patch)
sys/arch/x86/x86/lapic.c: revision 1.66 (patch)

Rename the DDB IPI IDT vectors for consistency. ok maxv@

Rename Xpreempt{recurse,resume} -> X{recurse,resume}_preempt so that
they fit the pattern. Also the debugger trap sniffer matches them
without adding special entries...

XXX: pullup-8.
 1.58.2.4 30-Nov-2017  martin Pull up following revision(s) (requested by maxv in ticket #403):
sys/arch/x86/x86/lapic.c: revision 1.63
Fix stack overflow, found when testing a new feature.
 1.58.2.3 30-Nov-2017  martin Pull up following revision(s) (requested by maxv in ticket #402):
sys/arch/x86/x86/lapic.c: revision 1.61
Fix a bug introduced in r1.55, this should be LAPIC_BASE.
 1.58.2.2 14-Jul-2017  martin Pull up following revision(s) (requested by nonaka in ticket #135):
sys/arch/x86/x86/lapic.c: revision 1.60
PR/52266: Before accessing MSR[APICBASE], check whether an APIC is present.
 1.58.2.1 10-Jul-2017  martin Pull up following revision(s) (requested by nonaka in ticket #110):
sys/arch/x86/x86/lapic.c: revision 1.59
PR/52266: use rdmsr_safe(9) instead of rdmsr(9) for old machine.
tested by simonb@
 1.65.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.65.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.65.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.66.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.66.2.1 10-Jun-2019  christos Sync with HEAD
 1.76.6.4 19-Apr-2020  bouyer Add per-PIC callbacks for interrupt_get_devname(), interrupt_get_assigned()
and interrupt_get_count(). Implement Xen-specific callbacks for
PIC_XEN and use the x86 one for others.
In event_set_handler(), call intr_allocate_io_intrsource() so that
events appear in the interrupt list (intrctl list).
 1.76.6.3 18-Apr-2020  bouyer Centralize initialisations of delay_func and initclock_func
in x86_machdep.c and export from <x86/machdep.h>
Introduce a x86_dummy_initclock() and a x86_cpu_initclock_func pointer,
to be used later for Xen HVM native clock support.
rename rtclock_tval to x86_rtclock_tval and export from <x86/machdep.h>,
for the benefit of lapic.c
 1.76.6.2 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.76.6.1 08-Apr-2020  bouyer Remove VM_GUEST_XEN and define only Xen subtypes:
VM_GUEST_XENPV
VM_GUEST_XENPVH
VM_GUEST_XENHVM
VM_GUEST_XENPVHVM

Set vm_guest in the start routine, if it is hypervisor-specific (e.g. Xen PV).
If vm_guest was not set early and we detect Xen in identify_hypervisor(),
assume it is VM_GUEST_XENHVM. Refine to VM_GUEST_XENPVHVM in
hypervisor_match().
 1.90.2.1 02-Aug-2025  perseant Sync with HEAD
 1.12 07-Oct-2021  msaitoh KNF. No functional change.
 1.11 19-Feb-2012  rmind Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.
 1.10 07-Jul-2010  chs branches: 1.10.8; 1.10.12;
many changes for COMPAT_LINUX:
- update the linux syscall table for each platform.
- support new-style (NPTL) linux pthreads on all platforms.
clone() with CLONE_THREAD uses 1 process with many LWPs
instead of separate processes.
- move the contents of sys__lwp_setprivate() into a new
lwp_setprivate() and use that everywhere.
- update linux_release[] and linux32_release[] to "2.6.18".
- adjust placement of emul fork/exec/exit hooks as needed
and adjust other emul code to match.
- convert all struct emul definitions to use named initializers.
- change the pid allocator to allow multiple pids to refer to the same proc.
- remove a few fields from struct proc that are no longer needed.
- disable the non-functional "vdso" code in linux32/amd64,
glibc works fine without it.
- fix a race in the futex code where we could miss a wakeup after
a requeue operation.
- redo futex locking to be a little more efficient.
 1.9 13-Jan-2010  njoly branches: 1.9.2; 1.9.4;
Use __arraycount instead of locally defined macro.
 1.8 21-Nov-2009  rmind Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.7 21-Oct-2008  ad branches: 1.7.8;
Undo revivesa damage to userret().
 1.6 15-Oct-2008  wrstuden Merge wrstuden-revivesa into HEAD.
 1.5 28-Apr-2008  martin branches: 1.5.2; 1.5.6;
Remove clause 3 and 4 from TNF licenses
 1.4 09-Feb-2007  ad branches: 1.4.44; 1.4.46; 1.4.48;
Merge newlock2 to head.
 1.3 17-Mar-2006  erh branches: 1.3.8;
Fix Coverity issue 1471. Shouldn't actually be problem, but the array
bounds checks to figure out which signal to issue were off by one.
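The class of off-by-one error fixed in revision 1.3 can be shown with a small table lookup. The table contents and the name sketch_signal_for_trap() are hypothetical and do not reflect the actual mapping in linux_trap.c; the point is only the form of the bounds check.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical trap-number-to-signal table. */
static const int sketch_trap_signal[] = { SIGFPE, SIGILL, SIGSEGV };

#define SKETCH_ARRAYCOUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Map a trap number to a signal, falling back to SIGSEGV when unknown. */
static int
sketch_signal_for_trap(size_t trapno)
{
	/* "<" is required here; "<=" would read one element past the end. */
	if (trapno < SKETCH_ARRAYCOUNT(sketch_trap_signal))
		return sketch_trap_signal[trapno];
	return SIGSEGV;
}
```

Deriving the limit from the array itself, rather than from a separately maintained constant, keeps the check correct when entries are added.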
 1.2 11-Dec-2005  christos branches: 1.2.4; 1.2.6; 1.2.8; 1.2.10; 1.2.12;
merge ktrace-lwp.
 1.1 15-May-2005  fvdl branches: 1.1.2; 1.1.8;
Move linux_trap.c from sys/arch/i386/i386 to sys/arch/x86/x86, and share
it. Remove the amd64 linux_trap.c (which was just a stub with a printf
anyway).
 1.1.8.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.8.1 15-May-2005  skrll file linux_trap.c was added on branch ktrace-lwp on 2005-11-10 14:00:20 +0000
 1.1.2.2 26-Feb-2007  yamt sync with head.
 1.1.2.1 21-Jun-2006  yamt sync with head.
 1.2.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.2.10.1 19-Apr-2006  elad sync with head - hopefully this will work
 1.2.8.1 01-Apr-2006  yamt sync with head.
 1.2.6.1 22-Apr-2006  simonb Sync with head.
 1.2.4.1 09-Sep-2006  rpaulo sync with head
 1.3.8.2 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.3.8.1 24-Oct-2006  ad Compile fixes
 1.4.48.4 11-Aug-2010  yamt sync with head.
 1.4.48.3 11-Mar-2010  yamt sync with head
 1.4.48.2 04-May-2009  yamt sync with head.
 1.4.48.1 16-May-2008  yamt sync with head.
 1.4.46.1 18-May-2008  yamt sync with head.
 1.4.44.2 17-Jan-2009  mjf Sync with HEAD.
 1.4.44.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.6.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.5.6.1 19-Oct-2008  haad Sync with HEAD.
 1.5.2.2 23-May-2008  wrstuden Re-add userret hook. Add a new define, SA_NO_USERRET, which
indicates that upcall support should NOT be included. Add this
for all non-netbsd emulations. They will never be SA apps, so
let's make the invariant pretty blatant.

NetBSD code should include both sys/sa.h and sys/savar.h.
 1.5.2.1 10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.7.8.1 24-Oct-2010  jym Sync with HEAD
 1.9.4.1 05-Mar-2011  rmind sync with head
 1.9.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.10.12.1 24-Feb-2012  mrg sync to -current.
 1.10.8.1 17-Apr-2012  yamt sync with head
 1.8 07-Nov-2007  ad __cpu_simple_locks really should be simple, otherwise they can cause
problems for e.g. profiling.
 1.7 09-Feb-2007  ad branches: 1.7.12; 1.7.22; 1.7.24; 1.7.28; 1.7.30;
Merge newlock2 to head.
 1.6 11-Dec-2005  christos branches: 1.6.20;
merge ktrace-lwp.
 1.5 29-May-2005  christos branches: 1.5.2;
avoid variable shadowing.
 1.4 31-Oct-2004  yamt use __insn_barrier rather than homegrown equivalents.
 1.3 26-Oct-2003  yamt issue PAUSE in the debug version of __cpu_simple_lock as well.
 1.2 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.1 01-Mar-2003  fvdl branches: 1.1.2;
lock_machdep.c moved here from arch/i386/i386.
 1.1.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.4 02-Nov-2004  skrll Sync with HEAD.
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.2.2 15-Nov-2007  yamt sync with head.
 1.5.2.1 26-Feb-2007  yamt sync with head.
 1.6.20.1 29-Dec-2006  ad Checkpoint work in progress.
 1.7.30.1 19-Nov-2007  mjf Sync with HEAD.
 1.7.28.1 13-Nov-2007  bouyer Sync with HEAD
 1.7.24.1 08-Nov-2007  matt sync with -HEAD
 1.7.22.1 11-Nov-2007  joerg Sync with HEAD.
 1.7.12.1 18-Apr-2007  thorpej Convert i386 and amd64 to the new atomic ops API.
 1.7 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.6 24-Apr-2021  thorpej branches: 1.6.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
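The tag-value calling convention described in the thorpej-cfargs merge can be sketched with a small variadic parser. This is an illustration only: the sketch_ names, the two tags shown, and the result structure are hypothetical and are not the autoconf(9) interface.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; SKETCH_CFARG_EOL terminates the list. */
enum {
	SKETCH_CFARG_EOL = 0,
	SKETCH_CFARG_IATTR,	/* followed by a const char * */
	SKETCH_CFARG_LOCATORS	/* followed by a const int * */
};

struct sketch_cfargs {
	const char *iattr;
	const int *locators;
};

/* Collect tag-value pairs until the sentinel tag is seen. */
static struct sketch_cfargs
sketch_cfargs_parse(int tag, ...)
{
	struct sketch_cfargs out = { NULL, NULL };
	va_list ap;

	va_start(ap, tag);
	while (tag != SKETCH_CFARG_EOL) {
		switch (tag) {
		case SKETCH_CFARG_IATTR:
			out.iattr = va_arg(ap, const char *);
			break;
		case SKETCH_CFARG_LOCATORS:
			out.locators = va_arg(ap, const int *);
			break;
		default:
			/* Unknown tag: the value type is unknown, so stop. */
			assert(0 && "unknown tag");
			va_end(ap);
			return out;
		}
		tag = va_arg(ap, int);
	}
	va_end(ap);
	return out;
}
```

A caller passes only the tags it needs, for example sketch_cfargs_parse(SKETCH_CFARG_IATTR, "pcibus", SKETCH_CFARG_EOL), so new tags can be added later without changing existing call sites.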
 1.5 01-Aug-2020  jdolecek branches: 1.5.4;
reorder includes to pull __HAVE_PCI_MSI_MSIX properly via
<x86/pci_machdep_common.h>
 1.4 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.3 14-Feb-2019  cherry branches: 1.3.4; 1.3.12;
Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.2 22-Dec-2018  maxv branches: 1.2.2;
Style, once again.
 1.1 22-Dec-2018  cherry This change modifies the mainbus(4) entry point for all x86 sub-archs
in the following way:

i) It provides a unified entry point in
x86/x86/mainbus.c:mainbus_attach()
ii) It carves out the preliminary bus attachment sequence that is
common to all sub-archs into
x86/x86/mainbus.c: x86_cpubus_attach()
iii) It consolidates the remaining pathways as internal callee
functions so that these may be called piecemeal if required. A
special usecase of this is XEN PVHVM which may need to call the
native configure path, the xen configure path, or both.
iv) It moves the driver private data structures from
i386/i386_mainbus.c to an x86/ level one. This allows for other
sub-archs to do similar, if needed. (They do not at the moment).
v) For dom0 kernels, it enables 'acpi0 at mainbus?' and
'acpi0 at hypervisorbus'. This serves two purposes:
a) To demonstrate the possibility of dynamic configuration tree
traversal ordering changes.
b) To allow for the common acpi_check(self, "acpibus") call in
x86/mainbus.c to not barf when it is called from the dom0 attach
path. We allow for the acpi0 device to be a child of mainbus with
the changes to amd64/conf/XEN3_DOM0 and i386/conf/XEN3PAE_DOM0
without actually probing further in the code. This path will later
be pursued in a PVHVM boot codepath.

There should be no operative changes with this change. If there are,
please complain loudly.
 1.2.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.2.2.1 22-Dec-2018  pgoyette file mainbus.c was added on branch pgoyette-compat on 2018-12-26 14:01:45 +0000
 1.3.12.3 18-Apr-2020  bouyer Add PVHVM multiprocessor support:
We need the hypervisor to be set up before CPUs attach.
Move hypervisor setup to a new function xen_hvm_init(), called at the
beginning of mainbus_attach(). This function searches the cfdata[] array
to see if the hypervisor device is enabled (so you can disable PV
support with
disable hypervisor
from userconf).
For HVM, ci_cpuid doesn't match the virtual CPU index needed by Xen.
Introduce ci_vcpuid to cpu_info. Introduce xen_hvm_init_cpu(), to be
called for each CPU in its context, which initializes ci_vcpuid and
ci_vcpu, and sets up the event callback.
Change Xen code to use ci_vcpuid.

Do not call lapic_calibrate_timer() for VM_GUEST_XENPVHVM, we will use
Xen timers.

Don't call lapic_initclocks() from cpu_hatch(); instead set
x86_cpu_initclock_func to lapic_initclocks() in lapic_calibrate_timer(),
and call *(x86_cpu_initclock_func)() from cpu_hatch().
Also call x86_cpu_initclock_func from cpu_attach() for the boot CPU.
As x86_cpu_initclock_func is called for all CPUs, x86_initclock_func can
be a NOP for lapic timer.

Reorganize Xen code for x86_initclock_func/x86_cpu_initclock_func.
Move x86_cpu_idle_xen() to hypervisor_machdep.c
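The per-CPU clock hook described in revision 1.3.12.3 amounts to a function pointer that defaults to a no-op and is redirected once calibration has run. The sketch below models that flow in user space; every sketch_ name is hypothetical and the counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many CPUs had their LAPIC clock initialised. */
static int sketch_lapic_clock_inits;

/* Default hook: nothing to do until a clock source has been chosen. */
static void
sketch_dummy_initclock(void)
{
}

/* Stand-in for the LAPIC per-CPU clock initialisation. */
static void
sketch_lapic_initclocks(void)
{
	sketch_lapic_clock_inits++;
}

/* The per-CPU hook, called once on every CPU. */
static void (*sketch_cpu_initclock_func)(void) = sketch_dummy_initclock;

/* Calibration selects the LAPIC timer by installing its init routine. */
static void
sketch_lapic_calibrate(void)
{
	sketch_cpu_initclock_func = sketch_lapic_initclocks;
}

/* Each CPU calls through the hook instead of naming the LAPIC directly. */
static void
sketch_cpu_hatch(void)
{
	assert(sketch_cpu_initclock_func != NULL);
	(*sketch_cpu_initclock_func)();
}
```

With this indirection, a guest that skips LAPIC calibration (such as the Xen PVHVM case in the commit message) could install a different routine without the hatch path changing.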
 1.3.12.2 16-Apr-2020  bouyer Reorganise sources to make it possible to include Xen PVHVM support in
native kernels. Among others:
- move xen/include/amd64/hypercall.h to amd64/include/xen and
xen/include/i386/hypercall.h to i386/include/xen
- exclude some native files from the build for xenpv
- add xen to "machine" config statement for amd64 and i386
- split arch/xen/conf/files.xen to arch/xen/conf/files.xen (for pv drivers)
and arch/xen/conf/files.xen.pv (for full pv support)
- add GENERIC_XENHVM kernel config which includes GENERIC and add Xen PV
drivers.
 1.3.12.1 11-Apr-2020  bouyer Attach hypervisor earlier, so that ISA/PCI emulated devices are disabled
before we probe them.
 1.3.4.2 10-Jun-2019  christos Sync with HEAD
 1.3.4.1 14-Feb-2019  christos file mainbus.c was added on branch phil-wifi on 2019-06-10 22:06:54 +0000
 1.5.4.1 02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.6.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.7 07-Oct-2021  msaitoh KNF. No functional change.
 1.6 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.5 24-Apr-2021  thorpej branches: 1.5.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
 1.4 24-Dec-2018  cherry branches: 1.4.14;
Towards bifurcating XEN and native interrupt related functions,
this is a preliminary cleanup sweep.

Move functions related to MP bus probe and scanning to x86/mp.c

Move generic platform pic search function to x86/x86_machdep.c
 1.3 01-Jul-2011  dyoung branches: 1.3.52; 1.3.54;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.2 13-Jun-2009  tsutsui Apply fixes from jmcneill@ for PR port-i386/38729
(ACPI kernel booted under qemu cannot detect devices):
- make MP SCANPCI function for ACPI_SCANPCI and MPBIOS_SCANPCI
return a number of attached PCI busses
- if no valid PCI busses are attached in the MP SCANPCI function,
try to probe and attach pci0 at mainbus as well as kernels
with no SCANPCI options

"Feel free to check it in" from jmcneill@.
Tested in pkgsrc qemu-0.9.1 (both i386 and x86_64) on NetBSD/i386.

Note that jmcneill's original patch was posted in March:
http://mail-index.NetBSD.org/port-i386/2009/03/24/msg001281.html
and I also applied it to amd64:
http://mail-index.NetBSD.org/port-i386/2009/03/24/msg001283.html
but x86 MP attach functions were reorganized by dyoung@ in April:
http://mail-index.NetBSD.org/source-changes/2009/04/17/msg219992.html
so I've modified the original patches to adapt to the changes.
(mpacpi_scan_pci() and mpbios_scan_pci() have been merged into
common mp_pci_scan() in new arch/x86/x86/mp.c)
For netbsd-5 and netbsd-5-0 branches, the original patches should be
applied cleanly, and they have been tested by abs@ on a selection of
i386 boxes and in qemu.
 1.1 17-Apr-2009  dyoung branches: 1.1.2; 1.1.4; 1.1.6;
Introduce sys/arch/x86/x86/mp.c for common x86 MP configuration code.
mpacpi_scan_pci() and mpbios_scan_pci() are identical code, so replace
them with mp_pci_scan().

Introduce mp_pci_childdetached(), which helps us to detach root PCI
buses that were enumerated either by MP BIOS or by ACPI.

Let us detach and re-attach PCI buses from mainbus0 on i386. This is
necessarily a work-in-progress, because testing detach and re-attach
is very difficult: to detach and re-attach the entire PCI tree on most
x86 computers that I own is not possible because some essential device
attaches under the PCI subtree: the console, com0, NIC, or storage
controller always attaches in the PCI tree.
 1.1.6.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.1.6.4 01-Nov-2009  jym Sync with HEAD.
 1.1.6.3 23-Jul-2009  jym Sync with HEAD.
 1.1.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.6.1 17-Apr-2009  jym file mp.c was added on branch jym-xensuspend on 2009-05-13 17:18:45 +0000
 1.1.4.3 20-Jun-2009  yamt sync with head
 1.1.4.2 04-May-2009  yamt sync with head.
 1.1.4.1 17-Apr-2009  yamt file mp.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:11 +0000
 1.1.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.1.2.1 17-Apr-2009  skrll file mp.c was added on branch nick-hppapmap on 2009-04-28 07:34:57 +0000
 1.3.54.1 10-Jun-2019  christos Sync with HEAD
 1.3.52.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.4.14.1 02-Apr-2021  thorpej config_found_ia() -> config_found() w/ CFARG_IATTR.
 1.5.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.112 06-Oct-2025  riastradh x86: Wire up PCI resource manager if enabled.

Enable in your kernel config with `options PCI_RESOURCE'.

Adapted from a patch by mlelstv@.

PR port-amd64/59118: Thinkpad T495s - iwm PCI BAR is zero
 1.111 30-Sep-2024  bouyer branches: 1.111.2;
Remove check (x2apic->LocalApicId <= 0xff) in mpacpi_config_cpu(),
the ACPI spec mentions this for compatibility with "legacy OSes" but
doesn't explicitly forbid it (AFAIK). This makes a recent Dell
poweredge R750 boot to the installer.
See
https://mail-index.netbsd.org/port-amd64/2023/12/30/msg003666.html
and
https://mail-index.netbsd.org/port-amd64/2024/09/28/msg003695.html

It should help for PR kern/57737
 1.110 24-Mar-2023  bouyer branches: 1.110.6;
mpacpi_config_cpu(): Xen with a PVH dom0 reports x2apic->LocalApicId
below 0xff, which causes a panic later because no CPUs are attached.
Accept the bogus LocalApicId value for VM_GUEST_XENPVH.
 1.109 22-Jan-2022  thorpej branches: 1.109.4;
Change the devhandle_from_*() functions to also take a "super handle",
from which the newly created handle will inherit its implementation.
The root implementation for a new handle type is used if an invalid
"super handle" is passed.
 1.108 07-Oct-2021  msaitoh KNF. No functional change.
 1.107 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.106 12-May-2021  thorpej branches: 1.106.4;
In mpacpi_pci_attach_hook(), set the device handle of the PCI bus instance
to the associated ACPI handle if a device handle is not already set.

XXX This is a mess. Sure would be nice if it looked / worked more like
XXX the ARM code.
 1.105 24-Apr-2021  thorpej branches: 1.105.2; 1.105.4;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instances. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
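
The tag-value calling convention described above can be sketched in plain C.
This is only an illustration of the pattern (a variadic list of tag-value
pairs ended by a sentinel); the DEMO_* tags, struct demo_args and
demo_parse() are hypothetical names invented for this sketch and are not the
kernel's CFARG_* definitions or its config_found() implementation.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags for this sketch only; not the kernel's CFARG_* values. */
enum demo_tag {
	DEMO_EOL = 0,		/* sentinel: ends the argument list */
	DEMO_IATTR,		/* value: const char * (interface attribute) */
	DEMO_LOCATORS		/* value: const int * (locators array) */
};

/* Collected arguments; anything not passed stays NULL. */
struct demo_args {
	const char *iattr;
	const int *locators;
};

/*
 * Walk the tag-value pairs until the sentinel is seen.
 * Returns 0 on success, -1 if an unknown tag is encountered.
 */
static int
demo_parse(struct demo_args *out, ...)
{
	va_list ap;
	int tag, error = 0;

	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, out);
	while (error == 0 && (tag = va_arg(ap, int)) != DEMO_EOL) {
		switch (tag) {
		case DEMO_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case DEMO_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			error = -1;	/* unknown tag: stop parsing */
			break;
		}
	}
	va_end(ap);
	return error;
}
```

A caller passes only the pairs it needs, for example
demo_parse(&args, DEMO_IATTR, "pcibus", DEMO_EOL), which is what lets a
single entry point replace several fixed-argument variants.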
 1.104 17-Jan-2020  jmcneill branches: 1.104.8;
Add support for Arm N1 SDP PCIe host controller.

The N1 SDP has a few bugs that we need to work around:
- PCIe root port config space lives in a non-standard location.
- Access to PCIe config space of devices that do not exist results in
a sync SError. Firmware creates a "known devices" table at a fixed
physical address that we use to filter PCI conf access to only known
devices.

This change splits the Arm ACPI PCI quirks into separate files for each
host controller, and allows per-segment quirks to be applied.

These changes exposed some bugs in the MI ACPI layer related to
multi-segment support. The MI ACPI PCI code was using a shared PCI
chipset tag to access devices, and these accesses can happen before our
PCI host bridge drivers are attached! The global chipset tag is now gone,
and an MD callback can provide a custom tag on a per-segment basis.
 1.103 01-Jun-2017  chs branches: 1.103.10; 1.103.16;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.102 07-Jul-2016  msaitoh KNF. Remove extra spaces. No functional change.
 1.101 17-Jul-2015  msaitoh KNF. No functional change.
 1.100 15-Jul-2015  msaitoh Configure ioapic before lapic because lapic(lapic_set_lvt()) checks the
existence of ioapic. This change fixes a problem that some machines hang
after attaching ehci (shortly after writing EHCI_USBINTR to enable interrupts).
Even though cold == 1, LAPIC_LVINT0 was not set as masked. Perhaps that is the
cause of the problem.

This problem was observed on SuperMicro X10SLX-F, X10SDV-TLN4F and
Shuttle DS57U without wm(4) driver.
 1.99 15-Jul-2015  msaitoh - Add lapic_dump() to print lapic's setting.
- Add mpacpi_dump() to dump mp_intrs[].
 1.98 22-Jun-2015  msaitoh Fix wrong output in mpacpi_pci_foundbus() with MPVERBOSE. Assign
values before printing them.
 1.97 25-Mar-2013  chs branches: 1.97.10; 1.97.12;
redo the ACPI interrupt handler setup again, this time handling
MADT overrides that change the pin as well as the polarity.
fixes PR 47648.
 1.96 03-Oct-2012  chs as a workaround for PR 47016, call ioapic_reenable() at the end of
ACPI interrupt routing to fix the settings for the SCI interrupt.
the problem is that after my recent changes, the SCI handler is
installed before the MADT info is parsed, so we don't know what
polarity it should have. the real fix for this will be to rearrange
the ACPI initialization so that everything is done in a more sensible
order, but that will take some more time.
 1.95 23-Sep-2012  chs locate PCI buses and determine their bus numbers using the info
previously extracted from ACPICA rather than trying to figure it out again.
allow PCI buses that don't have a _PRT method.
 1.94 27-Apr-2012  jruoho branches: 1.94.2;
Revert previous. Revision 1.79 was right; Qemu does not implement _PIC.
 1.93 26-Apr-2012  jruoho Based on PR kern/44069, revert revision 1.79.

XXX: The actual problem related to Qemu/KVM is yet to be determined.
 1.92 01-Jul-2011  dyoung branches: 1.92.2; 1.92.8;
#include <sys/bus.h> instead of <machine/bus.h>.
 1.91 05-Apr-2011  pgoyette Display a warning message if an attempt is made to process interrupt
routing for a bus that has previously been processed.

From PR kern/43570 - doesn't fix the problem but at least lets you
know it exists.
 1.90 16-Mar-2011  dholland Fix build with no pchb. From Aran Clauson in PR 44720.
 1.89 07-Aug-2010  jruoho branches: 1.89.2;
Reorganize: also the APIC tables will be dumped in ACPIVERBOSE, and the
callback functions will be modified to be suitable also with other tables.
 1.88 04-Aug-2010  jruoho Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.
 1.87 27-Apr-2010  jruoho Clean up <dev/acpi/acpireg.h>. While documenting the control methods is an
admirable goal, it is pretty much mission impossible; the specifications are
nearly a thousand pages each and the number of methods runs into the hundreds.

In addition, use ACPICA's native constants from <actypes.h> when possible.
Also move ACPI_STA_OK from "mpacpi.c" to <dev/acpi/acpireg.h> to simplify
the evaluation of device status.
 1.86 14-Apr-2010  jruoho UINT32 -> uint32_t; UINT8 -> uint8_t.
 1.85 08-Apr-2010  jruoho ACPICA 20091112:

Implemented a post-order callback to AcpiWalkNamespace. The existing
interface only has a pre-order callback. This change adds an
additional parameter for a post-order callback which will be more
useful for bus scans. ACPICA BZ 779. Lin Ming. Updated the ACPICA
Programmer Reference.

We will use the old "pre-order callback" for the time being.
 1.84 09-Jan-2010  cegger branches: 1.84.2; 1.84.4;
add x2apic support.
patch presented on current-users@, port-i386@ and port-amd64@ on 2009-12-22

No comments.
 1.83 05-Jan-2010  jruoho Put back the evaluation of the return value from mpacpi_get_bbn().

Break reported by njoly@. Thanks!
 1.82 05-Jan-2010  jruoho Use acpi_eval_set_integer() to simplify code. No functional change intended.

ok pgoyette@, jmcneill@
 1.81 05-Jan-2010  jruoho Fix several possible memory leaks in mpacpi_derive_bus().

ok pgoyette@, jmcneill@
 1.80 05-Jan-2010  mbalmer One semicolon only (;; -> ;)
 1.79 04-Nov-2009  toshii Don't return an error if the _PIC method isn't found.
It's an optional method and not found in kvm/qemu.
 1.78 16-Sep-2009  mlelstv Allow for 'options ACPI_DEBUG' by providing module declarations
and using memory allocation macros instead of calling AcpiOs* stubs
directly.
 1.77 18-Aug-2009  jmcneill Switch to ACPICA 20090730, and update for API changes.
 1.76 17-Apr-2009  dyoung Introduce sys/arch/x86/x86/mp.c for common x86 MP configuration code.
mpacpi_scan_pci() and mpbios_scan_pci() are identical code, so replace
them with mp_pci_scan().

Introduce mp_pci_childdetached(), which helps us to detach root PCI
buses that were enumerated either by MP BIOS or by ACPI.

Let us detach and re-attach PCI buses from mainbus0 on i386. This is
necessarily a work-in-progress, because testing detach and re-attach
is very difficult: to detach and re-attach the entire PCI tree on most
x86 computers that I own is not possible because some essential device
attaches under the PCI subtree: the console, com0, NIC, or storage
controller always attaches in the PCI tree.
 1.75 14-Jan-2009  cegger branches: 1.75.2;
use KM_SLEEP per request from ad@
 1.74 12-Jan-2009  sborrill Return ENOENT instead of panicking when irq doesn't equal line
(mpacpi_findintr_linkdev: irq mismatch). This doesn't fix the cause of
kern/38540, but stops the bogus panic. It's pretty definite that the device
with the mismatched irq will not function.
 1.73 23-Dec-2008  cegger move from malloc to kmem
 1.72 16-Dec-2008  christos replace bitmask_snprintf(9) with snprintb(3)
 1.71 09-Nov-2008  cegger struct device * -> device_t
 1.70 09-Nov-2008  cegger Nuke last parameter from mpacpi_scan_apics() and mpbios_scan().
It is unused.
 1.69 26-Aug-2008  cegger branches: 1.69.2; 1.69.4;
beautify dmesg with MPVERBOSE:

don't print an empty line.
 1.68 31-Jul-2008  joerg machdep.acpi_vbios_reset = 2 --> vga_pci_resume will use x86emu to do a
POST when options VGA_POST is present.
 1.67 21-Jul-2008  cegger beautify dmesg with MPVERBOSE.
before:

pci0 at hypervisor0 bus 0: configuration mode 1hypervisor0: added to list as bus 0

pchb0 at pci0 dev 0 function 0

now:

pci0 at hypervisor0 bus 0: configuration mode 1
hypervisor0: added to list as bus 0
pchb0 at pci0 dev 0 function 0
 1.66 03-Jul-2008  drochner branches: 1.66.2;
Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.65 25-Jun-2008  joerg Mask the interrupt pin in the other places as well as reminded by
Jared.
 1.64 25-Jun-2008  joerg Mask the higher bits of the interrupt pin extracted from the _PRT.
Alan Barrett reported a system in PR 38959 that (incorrectly) uses the
higher bits and which resulted in a bad table being built.
 1.63 06-Jun-2008  joerg branches: 1.63.2;
Explicitly recognize the PNP ID of PCI-X bridges. This is normally
redundant as DSDTs should provide _CID for it.
 1.62 04-Jun-2008  joerg Add back break to fix PCI bridge traversal as reported by various users.
 1.61 03-Jun-2008  joerg Make the logic for _BBN overrides less aggressive. When mpacpi_get_bbn
failed and the current goal is to enumerate all PCI bus and this is the
first PCI host bridge, just assume it is bus 0 and ignore the error.
When querying the bus number, assume that the system panicked earlier if
an error happened and this is not the first/only PCI host bridge and
override the BBN as 0 in that case.
 1.60 01-Jun-2008  joerg When a PCI host bridge description in the DSDT has a missing _BBN or the
_BBN is 0, check if the _ADR field is also 0. If it is, assume that the
_BBN really should be 0. Otherwise, try to extract the _BBN from the
bridge itself using pchb logic and panic only if that fails as well.
Reported and tested by Martin Husemann as interrupt issue.
 1.59 01-Jun-2008  joerg When building the ACPI PCI Interrupt Table, check for duplicate entries
and drop all but the first. This is the behaviour Windows seems to
implement and some BIOSes depend on that due to broken dups.

This should fix PR 37001.
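
The "keep the first, drop later duplicates" rule can be sketched as follows.
This is a minimal illustration, not the mpacpi.c code: struct demo_prt_entry
and demo_drop_dups() are hypothetical, and treating two entries as duplicates
when they share the same device and pin is an assumption made for the sketch.

```c
#include <stddef.h>

/* Hypothetical, simplified routing entry for this sketch. */
struct demo_prt_entry {
	unsigned dev;	/* PCI device number */
	unsigned pin;	/* interrupt pin */
	unsigned irq;	/* routed interrupt */
};

/*
 * Compact the table in place so that only the first entry for each
 * (dev, pin) pair survives. Returns the number of entries kept.
 */
static size_t
demo_drop_dups(struct demo_prt_entry *e, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Look for an earlier entry with the same key. */
		for (j = 0; j < kept; j++) {
			if (e[j].dev == e[i].dev && e[j].pin == e[i].pin)
				break;
		}
		if (j == kept)		/* no earlier match: keep it */
			e[kept++] = e[i];
	}
	return kept;
}
```

The quadratic scan is deliberate: routing tables are small, and preserving
the original order makes "first entry wins" easy to verify.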
 1.58 26-Apr-2008  darcy branches: 1.58.2; 1.58.4;
Add a little more detail when verbosity is requested.
 1.57 16-Apr-2008  cegger branches: 1.57.2;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.56 12-Dec-2007  jmcneill branches: 1.56.6;
Try not to pass garbage to pci_make_tag; workaround for odd ACPI DSDTs.
Fixes kern/37527.
 1.55 09-Dec-2007  jmcneill branches: 1.55.2;
Merge jmcneill-pm branch.
 1.54 01-Dec-2007  jmcneill branches: 1.54.2; 1.54.4;
aprintify
 1.53 24-Oct-2007  joerg branches: 1.53.2;
Remove code that was never meant to hit the tree in the first place.
 1.52 17-Oct-2007  garbled Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.51 10-Oct-2007  joerg branches: 1.51.2;
Install the default entries for the non-ISA interrupts as masked as
intended. Report by Christoph Egger.
 1.50 06-Oct-2007  joerg Merge from mpacpi.h 1.4.32.1, acpi_machdep.c 1.13.22.5 and
mpacpi.c 1.48.12.2 from jmcneill-pm:

Don't process the MADT and modify the interrupt config at one moment and
later trying to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.

Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
 1.49 10-Aug-2007  joerg branches: 1.49.2; 1.49.4;
Print the polarity and trigger flags as well. Can help with debugging
on fancy notebooks.
 1.48 10-Apr-2007  bouyer branches: 1.48.4; 1.48.8; 1.48.12;
Fix previous: don't AcpiOsFree() twice if the device is valid.
 1.47 08-Apr-2007  bouyer Properly skip inactive devices; avoids a panic in pci_make_tag() later.
Thanks to cube@ for the idea.
An ACPI kernel can now boot on a poweredge 2950.
 1.46 05-Mar-2007  drochner branches: 1.46.2; 1.46.4;
clean up how cpus and ioapics are attached at the mainbus:
Separate "cpubus" and "ioapicbus" -- while they share a common "address
space" (the apic id), the kernel doesn't use this fact. There are different
data passed to cpus and apics, which caused some ugly polymorphism. This
also saves the special "submatch" functions needed to distinguish cpus
and ioapics for autoconf. (And it makes that "apid" locators wired
in the kernel configuration are honored now; this allows one to dumb down
an mp box to singleprocessor by userconfig.)
Print "apid" locators in the buses "print" function "as everyone does",
so the per-port cpu drivers don't need to do it.
Being here, constify "struct cpu_functions" and g/c the unused MP_PICMODE
flag.
 1.45 15-Feb-2007  ad branches: 1.45.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.
 1.44 16-Nov-2006  christos branches: 1.44.2;
__unused removal on arguments; approved by core.
 1.43 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.42 29-Sep-2006  martin If using NLAPIC, we better include lapic.h. Pointed out by Kurt Schreiner
on current-users.
 1.41 28-Sep-2006  bouyer - make it possible to have ACPI without IOAPIC and/or LAPIC
- make it possible for machine-specific code to provide custom R/W routines
in its i82093*.h headers
- always initialize sc->sc_pins[pin], even in the !ioapic_cold case.
No objections on port-i386 and port-amd64.
 1.40 23-Sep-2006  fvdl While the low-level trigger and polarity values are the same for
ACPI and MPS, the ACPICA values are different. Convert them,
so that we get the right values into the ioapic.
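
A conversion of this kind might look like the sketch below. The DEMO_*
constants are defined locally for illustration: the ACPI-side values are
assumed to mirror ACPICA's trigger and polarity encodings, and the MPS-side
values follow the two-bit polarity and trigger fields of the MP
specification's interrupt flags. Check both against the real headers before
relying on them; demo_acpi_to_mps_flags() is a hypothetical helper, not the
function in mpacpi.c.

```c
/* Assumed ACPICA-style encodings (illustrative, defined locally). */
enum { DEMO_ACPI_LEVEL = 0, DEMO_ACPI_EDGE = 1 };
enum { DEMO_ACPI_ACTIVE_HIGH = 0, DEMO_ACPI_ACTIVE_LOW = 1 };

/* MPS-style interrupt flags: polarity in bits 0-1, trigger in bits 2-3. */
#define DEMO_MPS_POL_HIGH	0x1u
#define DEMO_MPS_POL_LOW	0x3u
#define DEMO_MPS_TRIG_EDGE	(0x1u << 2)
#define DEMO_MPS_TRIG_LEVEL	(0x3u << 2)

/* Translate ACPICA-style trigger/polarity into MPS-style flags. */
static unsigned
demo_acpi_to_mps_flags(int trigger, int polarity)
{
	unsigned flags = 0;

	flags |= (polarity == DEMO_ACPI_ACTIVE_LOW) ?
	    DEMO_MPS_POL_LOW : DEMO_MPS_POL_HIGH;
	flags |= (trigger == DEMO_ACPI_LEVEL) ?
	    DEMO_MPS_TRIG_LEVEL : DEMO_MPS_TRIG_EDGE;
	return flags;
}
```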
 1.39 23-Sep-2006  fvdl Check for the bad irq0 override quirk.
 1.38 12-Aug-2006  fvdl branches: 1.38.2; 1.38.4;
Record the ACPI global int in the interrupt structure for ISA interrupt
overrided (e.g. the SCI interrupt), so that it may be found correctly by
the ACPI interrupt establish function, should the number be different
from the original source.
 1.37 20-Jul-2006  kochi eliminate bogus acpi debug #define symbols
 1.36 04-Jul-2006  christos Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.35 11-Dec-2005  christos branches: 1.35.4; 1.35.8; 1.35.16;
merge ktrace-lwp.
 1.34 26-Aug-2005  drochner s/locdesc_t/int/g
 1.33 29-May-2005  christos branches: 1.33.2;
Sprinkle const.
 1.32 21-Dec-2004  fvdl Use fixed mode, not lopri, for delivering IO interrupts. Suggested by
Peter O'Kane. Fixes interrupt problems on some Xeon systems.
 1.31 29-Nov-2004  ws We just checked that the parent is the root, not current.
So we better determine the bus number of this parent node.
Now, MPACPI on my Opteron board finally correctly determines its
PCI/AGP busses even without the help of the AMD64 Address Map
support implemented in my local tree.
 1.30 30-Aug-2004  drochner Phase out the use of a string as first "attach args" member to control
which bustype should be attached with a specific call to config_found()
(from a "mainbus" or a bus bridge).
Do it for isa/eisa/mca and pci/agp for now. These buses all attach to
an mi interface attribute "isabus", "eisabus" etc., and the autoconf
framework now allows to specify an interface attribute on config_found()
and config_search(), which limits the search of matching config data
to these which attach to that specific attribute.
So we basically have to call config_found_ia(..., "foobus", ...) where
such a bus is attached.
As a consequence, where a "mainbus" or alike also attaches other
devices (eg CPUs) which do not attach to a specific attribute yet,
we need at least pass an attribute name (different from "foobus") so
that the foo bus is not found at these places. This made some minor
changes necessary which are not obviously related to the mentioned buses.
 1.29 23-May-2004  kochi prevent panic for machines without any ACPI MADT table.
 1.28 21-May-2004  kochi Fix panic / bogus PCI bus detection.
 1.27 21-May-2004  kochi Clean up variable usage.
 1.26 21-May-2004  kochi Make sure we don't use the same bus number for PCI and ISA.
 1.25 21-May-2004  kochi Back out bogus node check of revision 1.22.
This check is not necessary.
 1.24 21-May-2004  kochi add some comments, make local variables/functions static and some style fix.
 1.23 25-Apr-2004  tron Make this compile without ACPI_DEBUG again.
 1.22 25-Apr-2004  christos make this compile with ACPI_DEBUG again.
 1.21 22-Apr-2004  skd 1) Skip over bogus device nodes, prevents a panic in pci_make_tag.
2) Clarify a printf.
 1.20 10-Apr-2004  kochi whitespace nit
 1.19 24-Mar-2004  martin branches: 1.19.2;
Make it compile (int -> ACPI_INTEGER)
 1.18 23-Mar-2004  kochi Don't use ACPI CA internal functions
 1.17 13-Nov-2003  fvdl Remove leftover debugging printf.
 1.16 31-Oct-2003  fvdl Catch up with the new acpica code.
 1.15 30-Oct-2003  fvdl * keep track of PCI buses that aren't known by firmware, but are found
by NetBSD
* use this info in intr_find_mpmapping
* get rid of the last argument to intr_find_mpmapping, it was redundant
 1.14 21-Oct-2003  fvdl If a bus has not been configured by MPBIOS/ACPI, and the attach hook
for it is called, mark it as configured.
 1.13 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
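
The parent-bus fallback can be sketched as below, assuming the conventional
PCI-to-PCI bridge swizzle (pins numbered 1..4 for INTA#..INTD#, rotated by
the device number). struct demo_bus and both functions are hypothetical names
for this sketch; whether the committed code uses exactly this arithmetic is
an assumption.

```c
#include <stddef.h>

/* Hypothetical bus descriptor for this sketch. */
struct demo_bus {
	const struct demo_bus *parent;	/* NULL at the root bus */
	int bridge_dev;			/* bridge's device number on the parent */
	int has_map;			/* nonzero if firmware supplied mappings */
};

/* Conventional swizzle: rotate the 1-based pin by the device number. */
static int
demo_swizzle_pin(int pin, int dev)
{
	return ((pin - 1 + dev) % 4) + 1;
}

/*
 * Walk towards the root until a bus with mappings is found. At each step
 * the pin is swizzled with the current device number, and the device
 * number becomes that of the bridge on the parent bus.
 * Returns the mapped bus, or NULL if none exists.
 */
static const struct demo_bus *
demo_find_mapped_bus(const struct demo_bus *bus, int *dev, int *pin)
{
	while (bus != NULL && !bus->has_map) {
		*pin = demo_swizzle_pin(*pin, *dev);
		*dev = bus->bridge_dev;
		bus = bus->parent;
	}
	return bus;
}
```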
 1.12 09-Oct-2003  fvdl Allow probing of CPUs only by ACPI, so that MPBIOS can still do interrupt
mapping should ACPI have a quirk. From Christos. One change by me: make
sure that lapic_boot_init doesn't get called twice, otherwise the
cpu_info entry for the CPU with id 0 gets zapped.
 1.11 07-Oct-2003  fvdl Backout previous for now, it breaks second CPU spinup. It'll be back later.
 1.10 07-Oct-2003  fvdl Changes from Christos to fall back to MPBIOS for interrupt probing
if MPACPI fails, so that MPACPI can be used to only probe CPUs
if needed.
 1.9 06-Sep-2003  fvdl When establishing the ACPI SCI, make sure it's always active low (as well
as level-triggered). Do this by changing the MP config entry that was
set up for the interrupt. Do not change anything if there was an ACPI
interrupt source override, assume that this contains the correct
information already.
 1.8 22-Jul-2003  simonb Use local APIC id to determine boot CPU.

Fixes PR kern/20690 from Jaromir Dolecek. Fix from fvdl.
 1.7 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.6 01-Jun-2003  fvdl branches: 1.6.2;
mpb_name may not be set for a bus, since it's possible a PCI bus
doesn't show up when looking at ACPI, but is found on a ppb. So
check if it's NULL before doing a strcmp on it.

From Takayoshi Kochi.
 1.5 29-May-2003  fvdl Add the options MPBIOS_SCANPCI and MPACPI_SCANPCI to configure PCI roots
with the MPBIOS/ACPI bus information, by walking through the buses, and
descending down every bus that hasn't been marked configured yet.
 1.4 15-May-2003  fvdl Don't start the process of scanning CPUs and I/O APICs (with interrupt
routing to follow later) if the ACPI implementation is marked as
having a quirky PCI bus/interrupt configuration. If MPBIOS is also
defined, it'll do the job instead.
 1.3 15-May-2003  fvdl Try a little harder to find PCI buses in the MPACPI code, in a (probably
futile) attempt to get quirky ACPI implementations going.

Work around a problem with quirky MP tables for ioapic interrupt routing.
 1.2 11-May-2003  fvdl Remove machine/cputypes include.
 1.1 11-May-2003  fvdl Moved here from sys/arch/i386/i386
 1.6.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.6.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.6.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.6.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.6.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.6.2.2 03-Sep-2004  skrll Sync with HEAD
 1.6.2.1 03-Aug-2004  skrll Sync with HEAD
 1.19.2.1 01-Jun-2004  jmc branches: 1.19.2.1.2;
Pullup rev 1.20-1.29 (requested by kochi in ticket #427)

Lots of fixes to prevent panics on HT motherboards
 1.19.2.1.2.1 15-Apr-2005  tron Pull up revision 1.32 (requested by fvdl in ticket #1073):
Use fixed mode, not lopri, for delivering IO interrupts. Suggested by
Peter O'Kane. Fixes interrupt problems on some Xeon systems.
 1.33.2.7 21-Jan-2008  yamt sync with head
 1.33.2.6 07-Dec-2007  yamt sync with head
 1.33.2.5 27-Oct-2007  yamt sync with head.
 1.33.2.4 03-Sep-2007  yamt sync with head.
 1.33.2.3 26-Feb-2007  yamt sync with head.
 1.33.2.2 30-Dec-2006  yamt sync with head.
 1.33.2.1 21-Jun-2006  yamt sync with head.
 1.35.16.1 13-Jul-2006  gdamore Merge from HEAD.
 1.35.8.2 03-Sep-2006  yamt sync with head.
 1.35.8.1 11-Aug-2006  yamt sync with head
 1.35.4.1 09-Sep-2006  rpaulo sync with head
 1.38.4.2 10-Dec-2006  yamt sync with head.
 1.38.4.1 22-Oct-2006  yamt sync with head
 1.38.2.1 18-Nov-2006  ad Sync with head.
 1.44.2.3 21-Jun-2009  bouyer Pull up following revision(s) (requested by tsutsui in ticket #1327):
sys/arch/amd64/amd64/mainbus.c: revision 1.28 via patch
sys/arch/x86/x86/mp.c: revision 1.2 via patch
sys/arch/i386/i386/mainbus.c: revision 1.85 via patch
sys/arch/x86/x86/mpacpi.c: patch
sys/arch/x86/x86/mpbios.c: patch
Apply fixes from jmcneill@ for PR port-i386/38729
(ACPI kernel booted under qemu cannot detect devices):
- make MP SCANPCI function for ACPI_SCANPCI and MPBIOS_SCANPCI
return a number of attached PCI busses
- if no valid PCI busses are attached in the MP SCANPCI function,
try to probe and attach pci0 at mainbus as well as kernels
with no SCANPCI options
"Feel free to check it in" from jmcneill@.
Tested in pkgsrc qemu-0.9.1 (both i386 and x86_64) on NetBSD/i386.
Note that jmcneill's original patch was posted in March:
http://mail-index.NetBSD.org/port-i386/2009/03/24/msg001281.html
and I also applied it to amd64:
http://mail-index.NetBSD.org/port-i386/2009/03/24/msg001283.html
but x86 MP attach functions were reorganized by dyoung@ in April:
http://mail-index.NetBSD.org/source-changes/2009/04/17/msg219992.html
so I've modified the original patches to adapt to the changes.
(mpacpi_scan_pci() and mpbios_scan_pci() have been merged into
common mp_pci_scan() in new arch/x86/x86/mp.c)
For netbsd-5 and netbsd-5-0 branches, the original patches should be
applied cleanly, and they have been tested by abs@ on a selection of
i386 boxes and in qemu.
 1.44.2.2 14-Oct-2007  xtraeme Pull up following revision(s) (requested by joerg in ticket #925):
sys/arch/x86/x86/mpacpi.c: revision 1.50
sys/arch/x86/include/mpacpi.h: revision 1.5
sys/arch/x86/x86/acpi_machdep.c: revision 1.16

Merge from mpacpi.h 1.4.32.1, acpi_machdep.c 1.13.22.5 and
mpacpi.c 1.48.12.2 from jmcneill-pm:

Don't process the MADT and modify the interrupt config at one moment and
later trying to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.
Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
 1.44.2.1 28-Apr-2007  snj branches: 1.44.2.1.2;
Pull up following revision(s) (requested by bouyer in ticket #565):
sys/arch/x86/x86/mpacpi.c: revision 1.47
sys/arch/x86/x86/mpacpi.c: revision 1.48
Properly skip inactive devices; avoids a panic in pci_make_tag() later.
Thanks to cube@ for the idea.
An ACPI kernel can now boot on a poweredge 2950.
Fix previous: don't AcpiOsFree() twice if the device is valid.
 1.44.2.1.2.1 29-Oct-2007  wrstuden Catch up with 4.0 RC3
 1.45.2.2 15-Apr-2007  yamt sync with head.
 1.45.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.46.4.1 11-Jul-2007  mjf Sync with head.
 1.46.2.5 03-Dec-2007  ad Sync with HEAD.
 1.46.2.4 12-Oct-2007  ad Sync with head.
 1.46.2.3 09-Oct-2007  ad Sync with head.
 1.46.2.2 20-Aug-2007  ad Sync with HEAD.
 1.46.2.1 10-Apr-2007  ad Sync with head.
 1.48.12.5 01-Dec-2007  jmcneill Sync with HEAD.
 1.48.12.4 07-Oct-2007  joerg Sync with HEAD.
 1.48.12.3 02-Oct-2007  jmcneill Update to ACPI-CA 20070320
 1.48.12.2 02-Oct-2007  joerg Don't process the MADT and modify the interrupt config at one moment and
later trying to figure out if an entry was overridden and matches the
ACPI SCI. This is brain-dead and breaks in various situations.

Just check for each ISA override entry, if it matches the SCI. If it
does, remember it and use it for the interrupt setup. If there's no such
override assume that it is not changed, but override the polarity and
level from ISA settings to PCI settings.
 1.48.12.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.48.8.1 15-Aug-2007  skrll Sync with HEAD.
 1.48.4.2 16-Oct-2007  garbled Sync with HEAD
 1.48.4.1 03-Oct-2007  garbled Sync with HEAD
 1.49.4.1 14-Oct-2007  yamt sync with head.
 1.49.2.2 09-Jan-2008  matt sync with HEAD
 1.49.2.1 06-Nov-2007  matt sync with HEAD
 1.51.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.53.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.53.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.54.4.2 13-Dec-2007  yamt sync with head.
 1.54.4.1 11-Dec-2007  yamt sync with head.
 1.54.2.1 26-Dec-2007  ad Sync with head.
 1.55.2.1 13-Dec-2007  bouyer Sync with HEAD
 1.56.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.56.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.56.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.56.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.56.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.57.2.3 17-Jun-2008  yamt sync with head.
 1.57.2.2 04-Jun-2008  yamt sync with head
 1.57.2.1 18-May-2008  yamt sync with head.
 1.58.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.58.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.58.2.4 11-Aug-2010  yamt sync with head.
 1.58.2.3 11-Mar-2010  yamt sync with head
 1.58.2.2 19-Aug-2009  yamt sync with head.
 1.58.2.1 04-May-2009  yamt sync with head.
 1.63.2.3 28-Jul-2008  simonb Sync with head.
 1.63.2.2 03-Jul-2008  simonb Sync with head.
 1.63.2.1 27-Jun-2008  simonb Sync with head.
 1.66.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.66.2.1 19-Oct-2008  haad Sync with HEAD.
 1.69.4.2 19-Jun-2009  snj Pull up following revision(s) (requested by tsutsui in ticket #819):
sys/arch/amd64/amd64/mainbus.c: revision 1.28 via patch
sys/arch/i386/i386/mainbus.c: revision 1.85 via patch
sys/arch/x86/x86/mpacpi.c: patch
sys/arch/x86/x86/mpbios.c: patch
Apply fixes from jmcneill@ for PR port-i386/38729
(ACPI kernel booted under qemu cannot detect devices):
- make MP SCANPCI function for ACPI_SCANPCI and MPBIOS_SCANPCI
return a number of attached PCI busses
- if no valid PCI busses are attached in the MP SCANPCI function,
try to probe and attach pci0 at mainbus as well as kernels
with no SCANPCI options
 1.69.4.1 16-Jan-2009  bouyer branches: 1.69.4.1.4;
Pull up following revision(s) (requested by sborrill in ticket #257):
sys/arch/x86/x86/mpacpi.c: revision 1.74
Return ENOENT instead of panicking when irq doesn't equal line
(mpacpi_findintr_linkdev: irq mismatch). This doesn't fix the cause of
kern/38540, but stops the bogus panic. It's pretty definite that the device
with the mismatched irq will not function.
 1.69.4.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.69.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.69.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.75.2.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.75.2.5 02-May-2011  jym Sync with head.
 1.75.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.75.2.3 24-Oct-2010  jym Sync with HEAD
 1.75.2.2 01-Nov-2009  jym Sync with HEAD.
 1.75.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.84.4.3 21-Apr-2011  rmind sync with head
 1.84.4.2 05-Mar-2011  rmind sync with head
 1.84.4.1 30-May-2010  rmind sync with head
 1.84.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.84.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.89.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.92.8.2 31-Mar-2013  riz Pull up following revision(s) (requested by chs in ticket #855):
sys/arch/x86/acpi/acpi_machdep.c: revision 1.5
sys/arch/x86/acpi/acpi_machdep.c: revision 1.6
sys/arch/x86/x86/mpacpi.c: revision 1.97
redo the ACPI interrupt handler setup again, this time handling
MADT overrides that change the pin as well as the polarity.
fixes PR 47648.
yet more fixes for PR 47648 / PR 47016:
when using a temporary mp_intr_map, initialize the "flags" field
as well as "redir" since apic_set_redir() uses both. fix how
the flags field is changed when applying an override, the trigger
and polarity sub-fields aren't just one bit like they are in redir.
 1.92.8.1 22-Nov-2012  riz Pull up following revision(s) (requested by chs in ticket #683):
sys/arch/ia64/include/acpi_machdep.h: revision 1.6
sys/arch/x86/include/acpi_machdep.h: revision 1.11
sys/dev/acpi/acpi.c: revision 1.255
sys/arch/x86/acpi/acpi_machdep.c: revision 1.4
sys/arch/x86/x86/mpacpi.c: revision 1.95
sys/arch/x86/x86/mpacpi.c: revision 1.96
sys/arch/ia64/acpi/acpi_machdep.c: revision 1.6
locate PCI buses and determine their bus numbers using the info
previously extracted from ACPICA rather than trying to figure it out again.
allow PCI buses that don't have a _PRT method.
as a workaround for PR 47016, call ioapic_reenable() at the end of
ACPI interrupt routing to fix the settings for the SCI interrupt.
the problem is that after my recent changes, the SCI handler is
installed before the MADT info is parsed, so we don't know what
polarity it should have. the real fix for this will be to rearrange
the ACPI initialization so that everything is done in a more sensible
order, but that will take some more time.
 1.92.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.92.2.1 30-Oct-2012  yamt sync with head
 1.94.2.3 03-Dec-2017  jdolecek update from HEAD
 1.94.2.2 23-Jun-2013  tls resync from head
 1.94.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.97.12.3 28-Aug-2017  skrll Sync with HEAD
 1.97.12.2 09-Jul-2016  skrll Sync with HEAD
 1.97.12.1 22-Sep-2015  skrll Sync with HEAD
 1.97.10.2 11-Aug-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #947):
sys/arch/x86/x86/mpacpi.c: revision 1.98
Fix wrong output in mpacpi_pci_foundbus() with MPVERBOSE. Assign
values before printing them.
 1.97.10.1 11-Aug-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #948):
sys/arch/x86/x86/mpacpi.c: revisions 1.99, 1.100
sys/arch/x86/x86/lapic.c: revision 1.49
- Add lapic_dump() to print lapic's setting.
- Add mpacpi_dump() to dump mp_intrs[].
--
Configure ioapic before lapic because lapic(lapic_set_lvt()) checks the
existence of ioapic. This change fixes a problem that some machines hang
after attaching ehci (shortly after writing EHCI_USBINTR to enable interrupt).
Even though cold == 1, LAPIC_LVINT0 was not set as masked. Perhaps it's the
reason of the problem.
This problem was observed on SuperMicro X10SLX-F, X10SDV-TLN4F and
Shuttle DS57U without wm(4) driver.
 1.103.16.1 17-Jan-2020  ad Sync with head.
 1.103.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.104.8.1 22-Mar-2021  thorpej Mechanical conversion of config_found_sm_loc() -> config_found().
CFARG_IATTR usage needs to be audited.
 1.105.4.1 31-May-2021  cjep sync with head
 1.105.2.1 13-May-2021  thorpej Sync with HEAD.
 1.106.4.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.109.4.2 03-Oct-2024  martin Pull up following revision(s) (requested by bouyer in ticket #927):

sys/arch/x86/x86/mpacpi.c: revision 1.111

Remove check (x2apic->LocalApicId <= 0xff) in mpacpi_config_cpu(),
the ACPI spec mentions this for compatibility with "legacy OSes" but
doesn't explicitly forbid it (AFAIK). This makes a recent Dell
poweredge R750 boot to the installer.

See
https://mail-index.netbsd.org/port-amd64/2023/12/30/msg003666.html
and
https://mail-index.netbsd.org/port-amd64/2024/09/28/msg003695.html

It should help for PR kern/57737
 1.109.4.1 30-Mar-2023  martin Pull up following revision(s) (requested by bouyer in ticket #130):

sys/arch/x86/x86/mpacpi.c: revision 1.110

mpacpi_config_cpu(): Xen with a PVH dom0 reports x2apic->LocalApicId
below 0xff, which causes a panic later because no CPUs are attached.

Accept the bogus LocalApicId value for VM_GUEST_XENPVH.
 1.110.6.1 02-Aug-2025  perseant Sync with HEAD
 1.111.2.1 20-Oct-2025  martin Pull up following revision(s) (requested by riastradh in ticket #66):

sys/arch/x86/include/mpacpi.h: revision 1.12
sys/arch/x86/x86/mpacpi.c: revision 1.112
sys/arch/amd64/conf/ALL: revision 1.194
sys/arch/i386/conf/ALL: revision 1.524
sys/arch/x86/acpi/acpi_machdep.c: revision 1.40
sys/arch/i386/conf/GENERIC: revision 1.1261
sys/dev/acpi/acpi_mcfg.h: revision 1.6
sys/arch/amd64/conf/GENERIC: revision 1.618

x86: Wire up PCI resource manager if enabled.

Enable in your kernel config with `options PCI_RESOURCE'.

Adapted from a patch by mlelstv@.
PR port-amd64/59118: Thinkpad T495s - iwm PCI BAR is zero
 1.72 13-Jan-2025  imil Firecracker and qemu/microvm in MMIO mode don't have ACPI; instead
they rely on MP tables, but with those the IOAPIC was not detected.
This patch fixes it by adding a Linux-specific behavior: count
the right number of entries and then find the IOAPIC entry.
These bugs were found by Colin Percival and described here
https://www.usenix.org/publications/loginonline/freebsd-firecracker

/!\ This needs a new kernel option: MPTABLE_LINUX_BUG_COMPAT
 1.71 07-Oct-2021  msaitoh branches: 1.71.10;
KNF. No functional change.
 1.70 07-Aug-2021  thorpej Merge thorpej-cfargs2.
 1.69 24-Apr-2021  thorpej branches: 1.69.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
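
The "single call that takes a variadic list of tag-value arguments" described in revision 1.69 can be illustrated with a small stand-alone sketch. Everything here is hypothetical: demo_parse(), struct demo_args and the DEMO_* tags only mimic the shape of a sentinel-terminated tag-value list and are not the kernel's config_found() or CFARG_* definitions.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags, loosely modelled on the cfargs listed above. */
enum demo_tag {
	DEMO_EOL = 0,		/* sentinel: ends the list */
	DEMO_IATTR,		/* followed by: const char * */
	DEMO_LOCATORS,		/* followed by: const int * */
};

struct demo_args {
	const char *iattr;
	const int *locators;
};

/*
 * Consume (tag, value) pairs until the sentinel is seen.
 * Returns 0 on success, -1 if an unknown tag is encountered.
 */
static int
demo_parse(struct demo_args *out, ...)
{
	va_list ap;
	int tag, error = 0;

	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, out);
	while (error == 0 && (tag = va_arg(ap, int)) != DEMO_EOL) {
		switch (tag) {
		case DEMO_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case DEMO_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			error = -1;	/* unknown tag: stop parsing */
			break;
		}
	}
	va_end(ap);
	return error;
}
```

A caller passes only the pairs it needs, for example demo_parse(&a, DEMO_IATTR, (const char *)"pcibus", DEMO_EOL); new tags can be added without touching existing call sites, which is presumably the extensibility the commit message refers to.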
 1.68 08-Jul-2018  kamil branches: 1.68.14;
Revert previous

Misalignment access handling patches are now discussed on tech-kern.

Requested by <martin> and <christos>.
 1.67 07-Jul-2018  kamil Remove unaligned access to mpbios_page[]

Replace unaligned pointer dereference with a more portable construct that
is free from Undefined Behavior semantics.

sys/arch/x86/x86/mpbios.c:308:11, load of misaligned address 0xffff800031c7a413 for type 'const __uint16_t' which requires 2 byte alignment

Detected with Kernel Undefined Behavior Sanitizer
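
The log entry for 1.67 does not spell out the "more portable construct". A minimal sketch of one common approach, assuming the 16-bit field is copied out with memcpy() instead of being loaded through a cast pointer; read_u16_unaligned() is a hypothetical name, not a function from mpbios.c:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fetch a 16-bit value from an arbitrary byte
 * offset without dereferencing a misaligned uint16_t pointer.
 */
static uint16_t
read_u16_unaligned(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));	/* byte-wise copy, no alignment requirement */
	return v;
}
```

The value is returned in host byte order, exactly as a direct load would have produced it; compilers typically turn the memcpy() into a single load on architectures that tolerate unaligned access.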
 1.66 23-May-2017  nonaka branches: 1.66.8; 1.66.10;
x86: No ioapic_softc.sc_apicid is used anymore. Use ioapic_softc.sc_pic.pic_apicid.
 1.65 17-Jul-2015  msaitoh KNF. No functional change.
 1.64 10-Jul-2015  msaitoh Fix a problem that "Disable ACPI" doesn't work (PCI interrupts don't occur)
on some machines.

On some machines' MPBIOS table, dest APIC IDs for PCI interrupts are not
IOAPIC's APIC ID. If we couldn't find an IOAPIC with ioapic_find(id), retry
with ioapic_find_bybase(pin). Tested with SuperMicro X10SLX-F.
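
The lookup-with-fallback described in revision 1.64 might look roughly like the following sketch. The table, the demo_* names and the interpretation of "by base" as "the IOAPIC whose pin range contains this global pin number" are assumptions made for illustration; they are not copied from ioapic.c or mpbios.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for an attached IOAPIC. */
struct demo_ioapic {
	int apicid;	/* APIC ID the IOAPIC reports */
	int vecbase;	/* first global pin number it handles */
	int npins;	/* number of input pins */
};

static struct demo_ioapic demo_ioapics[] = {
	{ .apicid = 8, .vecbase = 0,  .npins = 24 },
	{ .apicid = 9, .vecbase = 24, .npins = 24 },
};
#define DEMO_NIOAPIC (sizeof(demo_ioapics) / sizeof(demo_ioapics[0]))

/* Primary lookup: match on APIC ID. */
static struct demo_ioapic *
demo_find(int id)
{
	for (size_t i = 0; i < DEMO_NIOAPIC; i++)
		if (demo_ioapics[i].apicid == id)
			return &demo_ioapics[i];
	return NULL;
}

/* Fallback lookup: match on the pin range starting at vecbase. */
static struct demo_ioapic *
demo_find_bybase(int pin)
{
	for (size_t i = 0; i < DEMO_NIOAPIC; i++) {
		struct demo_ioapic *sc = &demo_ioapics[i];

		if (pin >= sc->vecbase && pin < sc->vecbase + sc->npins)
			return sc;
	}
	return NULL;
}

/* Try the destination APIC ID first; retry by pin if that fails. */
static struct demo_ioapic *
demo_lookup(int id, int pin)
{
	struct demo_ioapic *sc = demo_find(id);

	if (sc == NULL)
		sc = demo_find_bybase(pin);
	return sc;
}
```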
 1.63 24-Jun-2015  msaitoh No functional change:
- Fix typo.
- KNF a bit.
 1.62 06-Nov-2013  mrg branches: 1.62.6;
gcc 4.8 issues:
- avoid running over the end of an array (this is a real bug, but
i didn't really look closely at what memory is clobbered. it
may not actually matter.)
- move variables inside their #if usage.
 1.61 21-Aug-2013  christos Use the default mp definition tables for ancient machines. From Felix
Deichmann.
 1.60 27-Nov-2012  jakllsch branches: 1.60.2;
Whitespace.
 1.59 17-Oct-2012  dyoung Quiet down autoconfiguration by changing some printf() calls to
aprint_normal() calls.
 1.58 04-Aug-2010  jruoho branches: 1.58.8; 1.58.18;
Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.
 1.57 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.56 07-Nov-2009  cegger branches: 1.56.2; 1.56.4;
Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.55 18-Aug-2009  jmcneill Switch to ACPICA 20090730, and update for API changes.
 1.54 17-Apr-2009  dyoung Introduce sys/arch/x86/x86/mp.c for common x86 MP configuration code.
mpacpi_scan_pci() and mpbios_scan_pci() are identical code, so replace
them with mp_pci_scan().

Introduce mp_pci_childdetached(), which helps us to detach root PCI
buses that were enumerated either by MP BIOS or by ACPI.

Let us detach and re-attach PCI buses from mainbus0 on i386. This is
necessarily a work-in-progress, because testing detach and re-attach
is very difficult: to detach and re-attach the entire PCI tree on most
x86 computers that I own is not possible because some essential device
attaches under the PCI subtree: the console, com0, NIC, or storage
controller always attaches in the PCI tree.
 1.53 13-Feb-2009  bouyer Fix printf format for 64bit paddr_t on i386
 1.52 14-Jan-2009  cegger branches: 1.52.2;
use KM_SLEEP per request from ad@
 1.51 23-Dec-2008  cegger move from malloc to kmem
 1.50 16-Dec-2008  christos replace bitmask_snprintf(9) with snprintb(3)
 1.49 09-Nov-2008  cegger struct device * -> device_t
 1.48 09-Nov-2008  cegger Nuke last parameter from mpacpi_scan_apics() and mpbios_scan().
It is unused.
 1.47 26-Aug-2008  cegger branches: 1.47.2; 1.47.4; 1.47.8;
beautify dmesg with MPVERBOSE:

don't print an empty line.
 1.46 21-Jul-2008  cegger beautify dmesg with MPVERBOSE.
before:

pci0 at hypervisor0 bus 0: configuration mode 1hypervisor0: added to list as bus 0

pchb0 at pci0 dev 0 function 0

now:

pci0 at hypervisor0 bus 0: configuration mode 1
hypervisor0: added to list as bus 0
pchb0 at pci0 dev 0 function 0
 1.45 03-Jul-2008  drochner branches: 1.45.2;
split device/softc for ioapic
 1.44 03-Jul-2008  drochner Remove "struct device" from "struct pic", where it was only real
for ioapics and faked up for others. Add it to "struct ioapic_softc"
for now, until device/softc get split.
This required all typecasts between "struct pic" and "struct ioapic_softc"
to be replaced, I hope I got them all.
functionally tested on i386, compile-tested on xen, untested on amd64
 1.43 30-Apr-2008  ad branches: 1.43.2; 1.43.4;
If MP is disabled at the boot prompt, then don't use MPBIOS. When ACPI
is also disabled, this completely avoids using ioapics.
 1.42 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.41 16-Apr-2008  cegger branches: 1.41.2; 1.41.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.40 01-Dec-2007  ad branches: 1.40.14;
Shh
 1.39 17-Oct-2007  garbled branches: 1.39.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.38 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.37 07-Aug-2007  ad branches: 1.37.2; 1.37.4;
Add a couple more calls to pmap_update().
 1.36 28-Apr-2007  christos branches: 1.36.2; 1.36.6; 1.36.10;
Fix compilation when NIOAPIC == 0
 1.35 05-Mar-2007  drochner branches: 1.35.2; 1.35.4;
clean up how cpus and ioapics are attached at the mainbus:
Separate "cpubus" and "ioapicbus" -- while they share a common "address
space" (the apic id), the kernel doesn't use this fact. There are different
data passed to cpus and apics, which caused some ugly polymorphism. This
also saves the special "submatch" functions needed to distinguish cpus
and ioapics for autoconf. (And it makes that "apid" locators wired
in the kernel configuration are honored now; this allows one to dumb down
an mp box to singleprocessor by userconfig.)
Print "apid" locators in the buses "print" function "as everyone does",
so the per-port cpu drivers don't need to do it.
Being here, constify "struct cpu_functions" and g/c the unused MP_PICMODE
flag.
 1.34 15-Feb-2007  ad branches: 1.34.2;
Count the number of CPUs at boot and stash in 'ncpu'. Eventually should
have each CPU register at attach, so we can figure out the topology for
the scheduler.
 1.33 26-Jan-2007  rpaulo Don't panic with "lazy bum". I have a machine that can boot multiuser
and run on SMP with this panic commented out.
No replies on tech-kern about this.
 1.32 16-Nov-2006  christos branches: 1.32.2;
__unused removal on arguments; approved by core.
 1.31 12-Oct-2006  dogcow de-__Pify, ANSIfy, and add __unused where necessary.
 1.30 06-Oct-2006  dogcow add initializers so gcc stops whining.
 1.29 28-Sep-2006  bouyer - make it possible to have ACPI without IOAPIC and/or LAPIC
- make it possible for machine-specific code to provide custom R/W routines
in its i82093*.h headers
- always initialize sc->sc_pins[pin], even in the !ioapic_cold case.
No objections on port-i386 and port-amd64.
 1.28 04-Jul-2006  christos branches: 1.28.4; 1.28.6;
Apply fvdl's acpi pci interrupt configuration code.
- MPACPI is no more.
- MPACPI_SCANPCI -> ACPI_SCANPCI
 1.27 11-Dec-2005  christos branches: 1.27.4; 1.27.8; 1.27.16;
merge ktrace-lwp.
 1.26 26-Aug-2005  drochner s/locdesc_t/int/g
 1.25 01-Jun-2005  blymn branches: 1.25.2;
Fix a couple of sloppy casts
Convert u_int to uint
 1.24 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.23 21-Dec-2004  fvdl branches: 1.23.2; 1.23.4;
Use fixed mode, not lopri, for delivering IO interrupts. Suggested by
Peter O'Kane. Fixes interrupt problems on some Xeon systems.
 1.22 30-Aug-2004  drochner Phase out the use of a string as first "attach args" member to control
which bustype should be attached with a specific call to config_found()
(from a "mainbus" or a bus bridge).
Do it for isa/eisa/mca and pci/agp for now. These buses all attach to
an mi interface attribute "isabus", "eisabus" etc., and the autoconf
framework now allows specifying an interface attribute on config_found()
and config_search(), which limits the search for matching config data
to those which attach to that specific attribute.
So we basically have to call config_found_ia(..., "foobus", ...) where
such a bus is attached.
As a consequence, where a "mainbus" or alike also attaches other
devices (eg CPUs) which do not attach to a specific attribute yet,
we need to at least pass an attribute name (different from "foobus") so
that the foo bus is not found at these places. This made some minor
changes necessary which are not obviously related to the mentioned buses.
 1.21 05-May-2004  kochi Fix comment (mp_nbusses -> mp_nbus)
 1.20 03-May-2004  kochi use M_ZERO for malloc
 1.19 30-Oct-2003  fvdl branches: 1.19.4;
* keep track of PCI buses that aren't known by firmware, but are found
by NetBSD
* use this info in intr_find_mpmapping
* get rid of the last argument to intr_find_mpmapping, it was redundant
 1.18 29-Oct-2003  mycroft Move a panic() to a different location, and eliminate a bogus initializer.
 1.17 27-Oct-2003  lukem appease gcc's uninitialised variable detection
 1.16 21-Oct-2003  fvdl If a bus has not been configured by MPBIOS/ACPI, and the attach hook
for it is called, mark it as configured.
 1.15 16-Oct-2003  fvdl Add hooks and structures to allow the MP table intr mapping code a
better shot at finding a mapping. For PCI interrupts, if a bus
has no mappings, try its parent, with the swizzled pin, and the
bridge's device number.
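
Revision 1.15 mentions retrying on the parent bus "with the swizzled pin" but does not give the formula. A minimal sketch of the conventional PCI-to-PCI bridge swizzle, assuming pins are numbered 1..4 for INTA#..INTD#; pci_swizzle_pin() is a hypothetical name, and whether the MP table code uses exactly this expression is an assumption:

```c
/*
 * Conventional PCI-to-PCI bridge interrupt swizzle: the pin seen on
 * the parent bus is the child's pin rotated by the child's device
 * number on the secondary bus. Pins are 1..4 (INTA#..INTD#).
 */
static int
pci_swizzle_pin(int pin, int dev)
{
	return ((pin - 1 + dev) % 4) + 1;
}
```

For example, INTA# (pin 1) of device 1 behind a bridge appears as INTB# (pin 2) on the parent bus, and INTD# (pin 4) of the same device wraps around to INTA# (pin 1). The lookup on the parent bus would then use this pin together with the bridge's own device number, as the log entry describes.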
 1.14 09-Oct-2003  fvdl Allow probing of CPUs only by ACPI, so that MPBIOS can still do interrupt
mapping should ACPI have a quirk. From Christos. One change by me: make
sure that lapic_boot_init doesn't get called twice, otherwise the
cpu_info entry for the CPU with id 0 gets zapped.
 1.13 07-Oct-2003  fvdl Backout previous for now, it breaks second CPU spinup. It'll be back later.
 1.12 07-Oct-2003  fvdl Changes from Christos to fall back to MPBIOS for interrupt probing
if MPACPI fails, so that MPACPI can be used to only probe CPUs
if needed.
 1.11 06-Sep-2003  fvdl When establishing the ACPI SCI, make sure it's always active low (as well
as level-triggered). Do this by changing the MP config entry that was
set up for the interrupt. Do not change anything if there was an ACPI
interrupt source override, assume that this contains the correct
information already.
 1.10 14-Jul-2003  lukem add __KERNEL_RCSID()
 1.9 01-Jun-2003  fvdl branches: 1.9.2;
mpb_name may not be set for a bus, since it's possible a PCI bus
doesn't show up when looking at ACPI, but is found on a ppb. So
check if it's NULL before doing a strcmp on it.

From Takayoshi Kochi.
 1.8 29-May-2003  fvdl Add the options MPBIOS_SCANPCI and MPACPI_SCANPCI to configure PCI roots
with the MPBIOS/ACPI bus information, by walking through the buses, and
descending down every bus that hasn't been marked configured yet.
 1.7 15-May-2003  fvdl Postpone the ioapic_ih assignment a bit, since the pin number may have
been corrected in a workaround in the meantime.
 1.6 15-May-2003  fvdl Try a little harder to find PCI buses in the MPACPI code, in a (probably
futile) attempt to get quirky ACPI implementations going.

Work around a problem with quirky MP tables for ioapic interrupt routing.
 1.5 11-May-2003  fvdl Initialize the global int number to -1 (the MPBIOS code doesn't set it
anywhere else; MPACPI does use it).
 1.4 01-Apr-2003  thorpej Use PAGE_SIZE rather than NBPG.
 1.3 04-Mar-2003  fvdl ioapic address is not actually a pointer, initialize it as uint32_t
 1.2 04-Mar-2003  fvdl Make EISA support conditional (on by default on i386).
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.9.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.9.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.9.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.9.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.9.2.2 03-Sep-2004  skrll Sync with HEAD
 1.9.2.1 03-Aug-2004  skrll Sync with HEAD
 1.19.4.1 15-Apr-2005  tron Pull up revision 1.23 (requested by fvdl in ticket #1073):
Use fixed mode, not lopri, for delivering IO interrupts. Suggested by
Peter O'Kane. Fixes interrupt problems on some Xeon systems.
 1.23.4.1 25-Jan-2005  yamt - convert i386 to new apis.
- remove a pmap bootstrap kludge, which is no longer needed.
 1.23.2.1 29-Apr-2005  kent sync with -current
 1.25.2.6 07-Dec-2007  yamt sync with head
 1.25.2.5 27-Oct-2007  yamt sync with head.
 1.25.2.4 03-Sep-2007  yamt sync with head.
 1.25.2.3 26-Feb-2007  yamt sync with head.
 1.25.2.2 30-Dec-2006  yamt sync with head.
 1.25.2.1 21-Jun-2006  yamt sync with head.
 1.27.16.1 13-Jul-2006  gdamore Merge from HEAD.
 1.27.8.1 11-Aug-2006  yamt sync with head
 1.27.4.1 09-Sep-2006  rpaulo sync with head
 1.28.6.2 10-Dec-2006  yamt sync with head.
 1.28.6.1 22-Oct-2006  yamt sync with head
 1.28.4.2 01-Feb-2007  ad Sync with head.
 1.28.4.1 18-Nov-2006  ad Sync with head.
 1.32.2.1 21-Jun-2009  bouyer Pull up following revision(s) (requested by tsutsui in ticket #1327):
sys/arch/amd64/amd64/mainbus.c: revision 1.28 via patch
sys/arch/x86/x86/mp.c: revision 1.2 via patch
sys/arch/i386/i386/mainbus.c: revision 1.85 via patch
sys/arch/x86/x86/mpacpi.c: patch
sys/arch/x86/x86/mpbios.c: patch
Apply fixes from jmcneill@ for PR port-i386/38729
(ACPI kernel booted under qemu cannot detect devices):
- make MP SCANPCI function for ACPI_SCANPCI and MPBIOS_SCANPCI
return a number of attached PCI busses
- if no valid PCI busses are attached in the MP SCANPCI function,
try to probe and attach pci0 at mainbus as well as kernels
with no SCANPCI options
"Feel free to check it in" from jmcneill@.
Tested in pkgsrc qemu-0.9.1 (both i386 and x86_64) on NetBSD/i386.
Note original jmcneill's patch was posted in March:
http://mail-index.NetBSD.org/port-i386/2009/03/24/msg001281.html
and I also applied it to amd64:
http://mail-index.NetBSD.org/port-i386/2009/03/24/msg001283.html
but x86 MP attach functions have been reorganized by dyoung@ in April:
http://mail-index.NetBSD.org/source-changes/2009/04/17/msg219992.html
so I've modified the original patches to adapt the changes.
(mpacpi_scan_pci() and mpbios_scan_pci() have been merged into
common mp_pci_scan() in new arch/x86/x86/mp.c)
For netbsd-5 and netbsd-5-0 branches, the original patches should be
applied cleanly, and they have been tested by abs@ on a selection of
i386 boxes and in qemu.
 1.34.2.2 07-May-2007  yamt sync with head.
 1.34.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.35.4.1 11-Jul-2007  mjf Sync with head.
 1.35.2.4 03-Dec-2007  ad Sync with HEAD.
 1.35.2.3 09-Oct-2007  ad Sync with head.
 1.35.2.2 20-Aug-2007  ad Sync with HEAD.
 1.35.2.1 27-May-2007  ad Sync with head.
 1.36.10.3 03-Dec-2007  joerg Sync with HEAD.
 1.36.10.2 02-Oct-2007  joerg Sync with HEAD.
 1.36.10.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.36.6.1 15-Aug-2007  skrll Sync with HEAD.
 1.36.2.1 03-Oct-2007  garbled Sync with HEAD
 1.37.4.1 06-Oct-2007  yamt sync with head.
 1.37.2.2 09-Jan-2008  matt sync with HEAD
 1.37.2.1 06-Nov-2007  matt sync with HEAD
 1.39.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.40.14.3 17-Jan-2009  mjf Sync with HEAD.
 1.40.14.2 28-Sep-2008  mjf Sync with HEAD.
 1.40.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.41.4.5 11-Aug-2010  yamt sync with head.
 1.41.4.4 11-Mar-2010  yamt sync with head
 1.41.4.3 19-Aug-2009  yamt sync with head.
 1.41.4.2 04-May-2009  yamt sync with head.
 1.41.4.1 16-May-2008  yamt sync with head.
 1.41.2.1 18-May-2008  yamt sync with head.
 1.43.4.2 28-Jul-2008  simonb Sync with head.
 1.43.4.1 03-Jul-2008  simonb Sync with head.
 1.43.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.45.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.45.2.1 19-Oct-2008  haad Sync with HEAD.
 1.47.8.1 21-Apr-2010  matt sync to netbsd-5
 1.47.4.2 29-Sep-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1040):
sys/arch/x86/x86/ioapic.c: revision 1.39
sys/arch/x86/x86/mpbios.c: revision 1.53
Fix printf format for 64bit paddr_t on i386
 1.47.4.1 19-Jun-2009  snj Pull up following revision(s) (requested by tsutsui in ticket #819):
sys/arch/amd64/amd64/mainbus.c: revision 1.28 via patch
sys/arch/i386/i386/mainbus.c: revision 1.85 via patch
sys/arch/x86/x86/mpacpi.c: patch
sys/arch/x86/x86/mpbios.c: patch
Apply fixes from jmcneill@ for PR port-i386/38729
(ACPI kernel booted under qemu cannot detect devices):
- make MP SCANPCI function for ACPI_SCANPCI and MPBIOS_SCANPCI
return a number of attached PCI busses
- if no valid PCI busses are attached in the MP SCANPCI function,
try to probe and attach pci0 at mainbus as well as kernels
with no SCANPCI options
 1.47.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.47.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.47.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.52.2.3 24-Oct-2010  jym Sync with HEAD
 1.52.2.2 01-Nov-2009  jym Sync with HEAD.
 1.52.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.56.4.2 05-Mar-2011  rmind sync with head
 1.56.4.1 30-May-2010  rmind sync with head
 1.56.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.56.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.58.18.4 03-Dec-2017  jdolecek update from HEAD
 1.58.18.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.58.18.2 25-Feb-2013  tls resync with head
 1.58.18.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.58.8.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.58.8.2 16-Jan-2013  yamt sync with (a bit old) head
 1.58.8.1 30-Oct-2012  yamt sync with head
 1.60.2.2 18-May-2014  rmind sync with head
 1.60.2.1 28-Aug-2013  rmind sync with head
 1.62.6.2 28-Aug-2017  skrll Sync with HEAD
 1.62.6.1 22-Sep-2015  skrll Sync with HEAD
 1.66.10.1 10-Jun-2019  christos Sync with HEAD
 1.66.8.1 28-Jul-2018  pgoyette Sync with HEAD
 1.68.14.1 22-Mar-2021  thorpej Mechanical conversion of config_found_sm_loc() -> config_found().
CFARG_IATTR usage needs to be audited.
 1.69.8.1 04-Aug-2021  thorpej Adapt to CFARGS().
 1.71.10.1 02-Aug-2025  perseant Sync with HEAD
 1.16 05-Oct-2009  rmind Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.15 04-Jan-2008  ad branches: 1.15.10; 1.15.24;
Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.14 04-Jan-2008  ad sys/lock.h isn't needed here.
 1.13 09-Dec-2007  ad branches: 1.13.2;
Use sys/atomic instead of __asm().
 1.12 09-Dec-2007  ad Use the new memory barriers.
 1.11 17-Oct-2007  garbled branches: 1.11.2; 1.11.4; 1.11.6;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to an evbppc target.
 1.10 06-Oct-2007  xtraeme Use a two clause license for all the code I contributed.

The envsys code will be changed later.
 1.9 15-May-2007  xtraeme branches: 1.9.2; 1.9.10; 1.9.12; 1.9.14; 1.9.16;
cosmetic: use a single line for the global vars of same type.
 1.8 25-Mar-2007  xtraeme branches: 1.8.2; 1.8.6; 1.8.8;
Explicitly initialize msr.
 1.7 25-Mar-2007  xtraeme typo.
 1.6 25-Mar-2007  xtraeme Add another member to struct cpu_msr_broadcast, msr_read that will
enable the rdmsr call in msr_write_ipi(), so that when it's not
defined we don't read it before writing; disabled in powernow_k8
and enabled in the others.
 1.5 21-Mar-2007  xtraeme branches: 1.5.2;
typo
 1.4 21-Mar-2007  xtraeme - Remove ci_msr_rvalue, it's not useful anymore as yamt@ pointed out.
- Completely remove debug from msr_ipifuncs; it's now known to work.
 1.3 21-Mar-2007  xtraeme Disable debug.
 1.2 21-Mar-2007  xtraeme Remove the MSR read IPI handler, there won't be any driver that will
use it, and we can see if the values are ok in the CPUs in the write
operation.

Suggested by YAMAMOTO Takashi.
 1.1 20-Mar-2007  xtraeme MSR read and write IPI handlers for x86. A MSR will be read or written
in all CPUs available in the system. This adds another member
to struct cpu_info, ci_msr_rvalue; it will contain the value of the MSR
in a previous operation.

Tested with clockmod in UP and SMP by me, tested with est in SMP
by Daniel Carosone and Michael Van Elst.

Ok'ed by Andrew Doran and Matthew R. Green.
 1.5.2.4 17-May-2007  yamt sync with head.
 1.5.2.3 15-Apr-2007  yamt sync with head.
 1.5.2.2 24-Mar-2007  yamt sync with head.
 1.5.2.1 21-Mar-2007  yamt file msr_ipifuncs.c was added on branch yamt-idlelwp on 2007-03-24 14:55:06 +0000
 1.8.8.2 16-Oct-2007  garbled Sync with HEAD
 1.8.8.1 22-May-2007  matt Update to HEAD.
 1.8.6.2 20-Apr-2007  bouyer Pull up following revision(s) (requested by mlelstv in ticket #575):
sys/arch/i386/i386/est.c sync with 1.37
sys/arch/i386/i386/ipifuncs.c sync with 1.16
sys/arch/x86/include/cpu_msr.h sync with 1.4
sys/arch/x86/include/intrdefs.h sync with 1.8
sys/arch/x86/include/powernow.h sync with 1.9
sys/arch/x86/x86/powernow_k8.c sync with 1.20
sys/arch/x86/x86/msr_ipifuncs.c sync with 1.8
sys/arch/amd64/amd64/ipifuncs.c sync with 1.9
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/i386/machdep.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/x86/x86_machdep.c patch
sys/arch/amd64/amd64/machdep.c patch
Add MSR write IPI handler for x86. Use it and the RUN_ONCE framework
to make est and powernow drivers work properly with SMP.
 1.8.6.1 25-Mar-2007  bouyer file msr_ipifuncs.c was added on branch netbsd-4 on 2007-04-20 20:31:27 +0000
 1.8.2.4 09-Oct-2007  ad Sync with head.
 1.8.2.3 27-May-2007  ad Sync with head.
 1.8.2.2 10-Apr-2007  ad Sync with head.
 1.8.2.1 25-Mar-2007  ad file msr_ipifuncs.c was added on branch vmlocking on 2007-04-10 13:22:46 +0000
 1.9.16.1 14-Oct-2007  yamt sync with head.
 1.9.14.4 21-Jan-2008  yamt sync with head
 1.9.14.3 27-Oct-2007  yamt sync with head.
 1.9.14.2 03-Sep-2007  yamt sync with head.
 1.9.14.1 15-May-2007  yamt file msr_ipifuncs.c was added on branch yamt-lazymbuf on 2007-09-03 14:31:28 +0000
 1.9.12.2 09-Jan-2008  matt sync with HEAD
 1.9.12.1 06-Nov-2007  matt sync with HEAD
 1.9.10.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.9.10.1 07-Oct-2007  joerg Sync with HEAD.
 1.9.2.2 11-Jul-2007  mjf Sync with head.
 1.9.2.1 15-May-2007  mjf file msr_ipifuncs.c was added on branch mjf-ufs-trans on 2007-07-11 20:03:24 +0000
 1.11.6.1 11-Dec-2007  yamt sync with head.
 1.11.4.1 26-Dec-2007  ad Sync with head.
 1.11.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.13.2.1 08-Jan-2008  bouyer Sync with HEAD
 1.15.24.1 01-Nov-2009  jym Sync with HEAD.
 1.15.10.1 11-Mar-2010  yamt sync with head
 1.32 07-Oct-2021  msaitoh KNF. No functional change.
 1.31 31-Jan-2020  maxv constify
 1.30 04-Mar-2018  jdolecek branches: 1.30.4; 1.30.10;
use tlbflush() instead of writing to %cr3, so it's more clear what code does
 1.29 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.28 29-May-2014  plunky branches: 1.28.4;
\%s is not an escape sequence, and we want %s
 1.27 22-Apr-2012  rmind branches: 1.27.2; 1.27.12;
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.26 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.25 15-Dec-2011  abs branches: 1.25.2;
Increase MTRR_I686_NVAR_MAX from 8 to 16. Avoids
"FIXME: more than 8 MTRRs (10)" message on booting Thinkpad W520 and
similar. While here replace a magic number with MTRR_I686_NVAR_MAX * 2
 1.24 18-Jan-2011  jmcneill branches: 1.24.6; 1.24.10;
- fix an off-by-one that disallowed adjacent mappings with conflicting
types from being created
- only allow MTRR_TYPE_WC mappings if the processor supports it
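Note on the off-by-one above: a minimal sketch, not the code from mtrr_i686.c. It assumes ranges are described as base plus length (the names struct range and range_overlaps are hypothetical) and shows a half-open overlap test under which two ranges that merely touch are not treated as conflicting. Address wrap-around is ignored for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range descriptor: covers [base, base + len). */
struct range {
	uint64_t base;
	uint64_t len;
};

/*
 * Half-open overlap test. Using "<" rather than "<=" on both sides is
 * what lets a range ending at X sit next to a range starting at X.
 */
bool
range_overlaps(const struct range *a, const struct range *b)
{
	return a->base < b->base + b->len && b->base < a->base + a->len;
}
```

With this test, [0x1000, 0x2000) and [0x2000, 0x3000) are adjacent and do not overlap, while a range starting at 0x1fff overlaps both.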
 1.23 12-Jan-2011  jmcneill branches: 1.23.2;
Handle overlapping variable-range MTRRs following the rules defined in
section 11.11.4.1 "MTRR Precedences" of the Intel(R) 64 and IA-32
Architectures Software Developer's Manual, Volume 3A.
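Note on the precedence rules referenced above: a minimal sketch of how the effective memory type of two overlapping variable ranges can be resolved, following the rules as stated in the Intel manual (identical types win, UC beats everything, WT beats WB, anything else is undefined). The enum and function names are hypothetical and are not taken from mtrr_i686.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory type encodings used in the MTRR registers. */
enum mtrr_type {
	MT_UC = 0,	/* uncacheable */
	MT_WC = 1,	/* write combining */
	MT_WT = 4,	/* write through */
	MT_WP = 5,	/* write protected */
	MT_WB = 6	/* write back */
};

/*
 * Resolve the effective type where two variable ranges overlap.
 * Returns true and stores the result in *out when the combination is
 * defined; returns false when the architecture leaves it undefined.
 */
bool
mtrr_overlap_type(enum mtrr_type a, enum mtrr_type b, enum mtrr_type *out)
{
	if (a == b) {
		*out = a;		/* identical types: use that type */
		return true;
	}
	if (a == MT_UC || b == MT_UC) {
		*out = MT_UC;		/* UC takes precedence over any type */
		return true;
	}
	if ((a == MT_WT && b == MT_WB) || (a == MT_WB && b == MT_WT)) {
		*out = MT_WT;		/* WT takes precedence over WB */
		return true;
	}
	return false;			/* all other overlaps are undefined */
}
```

A caller would reject a new mapping when the function returns false, and otherwise accept it knowing which type the hardware will apply in the overlapping region.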
 1.22 08-Jul-2010  cegger use __arraycount
 1.21 17-Jun-2010  mrg when complaining we don't support this many MTRR's, say how many there are.
 1.20 21-Nov-2009  rmind branches: 1.20.2; 1.20.4;
Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.19 13-Oct-2008  sborrill branches: 1.19.4; 1.19.8;
Print actual maximum amount of MTRRs configured
 1.18 01-Jul-2008  mrg branches: 1.18.2;
hack around PR#38480:

- rename MTRR_I686_NVAR to MTRR_I686_NVAR_MAX, still set to 8
- store mtrr VCNT value into i686_mtrr_vcnt. if it is less than 8,
zero out the relevant parts of mtrr_raw[].msraddr
- replace all usage of MTRR_I686_NVAR with either i686_mtrr_vcnt or
with MTRR_I686_NVAR_MAX as appropriate
- in i686_mtrr_reload() and mtrr_init_first() don't use mtrr_raw[]
addresses of 0

still needs a bunch of reworking to handle VCNT > 8 case.
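Note on the VCNT handling above: a minimal sketch under stated assumptions, not the committed code. VCNT is the low 8 bits of the MTRR capability MSR, and each variable range uses a base/mask MSR pair starting at 0x200/0x201. The table layout (struct raw_entry), NVAR_MAX and setup_var_table are hypothetical stand-ins for mtrr_raw[] and MTRR_I686_NVAR_MAX.

```c
#include <stdint.h>

#define NVAR_MAX		8	/* table capacity assumed for this sketch */
#define MTRRCAP_VCNT(cap)	((unsigned)((cap) & 0xff))	/* bits 7:0 */

/* Hypothetical table entry: one MSR address and its cached value. */
struct raw_entry {
	uint32_t msraddr;
	uint64_t msrval;
};

/*
 * Fill in the MSR addresses for the variable ranges the CPU reports,
 * zero the addresses of the unused slots so later code can skip them,
 * and return the number of usable ranges.
 */
unsigned
setup_var_table(struct raw_entry tbl[NVAR_MAX * 2], uint64_t mtrrcap)
{
	unsigned vcnt = MTRRCAP_VCNT(mtrrcap);
	unsigned i;

	if (vcnt > NVAR_MAX)
		vcnt = NVAR_MAX;	/* more ranges than the table holds */

	for (i = 0; i < NVAR_MAX; i++) {
		int present = i < vcnt;

		tbl[2 * i].msraddr = present ? 0x200 + 2 * i : 0;	/* base */
		tbl[2 * i + 1].msraddr = present ? 0x201 + 2 * i : 0;	/* mask */
	}
	return vcnt;
}
```

Code that reloads the registers would then skip any entry whose msraddr is 0, which matches the "don't use addresses of 0" item in the log message.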
 1.17 12-May-2008  ad branches: 1.17.2;
- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().
 1.16 28-Apr-2008  martin branches: 1.16.2;
Remove clause 3 and 4 from TNF licenses
 1.15 16-Apr-2008  cegger branches: 1.15.2; 1.15.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.14 04-Jan-2008  ad branches: 1.14.6;
sys/lock.h isn't needed here.
 1.13 28-Nov-2007  ad branches: 1.13.6;
Use the new atomic ops.
 1.12 19-Oct-2007  pavel branches: 1.12.2;
The control registers (notably CR3 and CR4) are 64-bit on amd64 (see
"AMD64 Architecture Programmer's Manual"). Declare the variables
holding them as vaddr_t, otherwise the upper bits are lost.

(CR0 is actually 64-bit too, but the upper bits are unused, so I am
not changing it now.)

Should fix the reboot caused by X11. From Arto Huusko in
PR port-amd64/37043.
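Note on the truncation described above: a minimal userland sketch of why a 32-bit variable cannot hold a 64-bit control register value across a save/restore. The function names are hypothetical; uint64_t stands in for vaddr_t on amd64.

```c
#include <stdint.h>

/* Save the register value in a 32-bit variable: bits 63:32 are dropped. */
uint64_t
save_restore_32(uint64_t reg)
{
	uint32_t saved = (uint32_t)reg;

	return saved;
}

/* Save the register value in a 64-bit variable: nothing is lost. */
uint64_t
save_restore_64(uint64_t reg)
{
	uint64_t saved = reg;

	return saved;
}
```

For a page directory located above 4GB, for example 0x123456000, the 32-bit variant hands back 0x23456000, so writing it back to CR3 would point the CPU at the wrong page tables.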
 1.11 17-Oct-2007  garbled Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to an evbppc target.
 1.10 26-Sep-2007  ad branches: 1.10.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.9 20-Mar-2007  drochner branches: 1.9.2; 1.9.4; 1.9.12; 1.9.14; 1.9.16;
Import DRM drivers, brought into shape by Yorick Hardy, posted to tech-x11.
Minor modifications by me:
-use an mi device major number
-(coarsely) divided into pci card specific and less specific parts, moved
the latter to dev/drm
-renamed autoconf attributes to reflect this
Todo:
-adapt all card frontends but i915 to drm include file location
-review the mtrr change
-make the change to agp_i810.c coexist with the fix for buggy VESA
BIOSes which is commented out temporarily
-RCS IDs etc style stuff
-LKM support (rescan support for vga)
-test
 1.8 16-Nov-2006  christos branches: 1.8.2; 1.8.4; 1.8.8; 1.8.10; 1.8.12; 1.8.14;
__unused removal on arguments; approved by core.
 1.7 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.6 02-Sep-2006  christos branches: 1.6.2; 1.6.4;
Add missing initializers
 1.5 27-Mar-2006  bouyer Add a comment saying why p can't be NULL here. Coverity ID 764.
 1.4 11-Dec-2005  christos branches: 1.4.4; 1.4.6; 1.4.8; 1.4.10; 1.4.12;
merge ktrace-lwp.
 1.3 01-Nov-2003  jdolecek branches: 1.3.16;
avoid strong words; use 'screw' instead
 1.2 03-Mar-2003  fvdl branches: 1.2.2;
Use unsigned long long to print msr values.
 1.1 26-Feb-2003  fvdl Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.2.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.2.1 03-Aug-2004  skrll Sync with HEAD
 1.3.16.6 21-Jan-2008  yamt sync with head
 1.3.16.5 07-Dec-2007  yamt sync with head
 1.3.16.4 27-Oct-2007  yamt sync with head.
 1.3.16.3 03-Sep-2007  yamt sync with head.
 1.3.16.2 30-Dec-2006  yamt sync with head.
 1.3.16.1 21-Jun-2006  yamt sync with head.
 1.4.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.4.10.1 19-Apr-2006  elad sync with head - hopefully this will work
 1.4.8.2 03-Sep-2006  yamt sync with head.
 1.4.8.1 01-Apr-2006  yamt sync with head.
 1.4.6.1 22-Apr-2006  simonb Sync with head.
 1.4.4.1 09-Sep-2006  rpaulo sync with head
 1.6.4.2 10-Dec-2006  yamt sync with head.
 1.6.4.1 22-Oct-2006  yamt sync with head
 1.6.2.1 18-Nov-2006  ad Sync with head.
 1.8.14.1 06-Jan-2008  wrstuden Catch up to netbsd-4.0 release.
 1.8.12.1 29-Mar-2007  reinoud Pullup to -current
 1.8.10.1 11-Jul-2007  mjf Sync with head.
 1.8.8.4 03-Dec-2007  ad Sync with HEAD.
 1.8.8.3 03-Dec-2007  ad Sync with HEAD.
 1.8.8.2 09-Oct-2007  ad Sync with head.
 1.8.8.1 10-Apr-2007  ad Sync with head.
 1.8.4.1 24-Mar-2007  yamt sync with head.
 1.8.2.4 18-Nov-2008  bouyer Pull up following revision(s) (requested by sborrill in ticket #1173):
sys/arch/x86/include/mtrr.h: revision 1.4
sys/arch/amd64/amd64/netbsd32_machdep.c: revision 1.54
sys/arch/x86/x86/mtrr_i686.c: revision 1.18
hack around PR#38480:
- rename MTRR_I686_NVAR to MTRR_I686_NVAR_MAX, still set to 8
- store mtrr VCNT value into i686_mtrr_vcnt. if it is less than 8,
zero out the relevant parts of mtrr_raw[].msraddr
- replace all usage of MTRR_I686_NVAR with either i686_mtrr_vcnt or
with MTRR_I686_NVAR_MAX as appropriate
- in i686_mtrr_reload() and mtrr_init_first() don't use mtrr_raw[]
addresses of 0
still needs a bunch of reworking to handle VCNT > 8 case.
Ensure optional MTRR sections are built if MTRR is enabled (missing
Fix build due to changes in revision 1.4 of sys/arch/x86/include/mtrr.h
 1.8.2.3 23-Aug-2008  bouyer Back out ticket #1173, it breaks the build of amd64 kernels.
 1.8.2.2 20-Aug-2008  bouyer Pull up following revision(s) (requested by sborrill in ticket #1173):
sys/arch/x86/include/mtrr.h: revision 1.4
sys/arch/x86/x86/mtrr_i686.c: revision 1.18
hack around PR#38480:
- rename MTRR_I686_NVAR to MTRR_I686_NVAR_MAX, still set to 8
- store mtrr VCNT value into i686_mtrr_vcnt. if it is less than 8,
zero out the relevant parts of mtrr_raw[].msraddr
- replace all usage of MTRR_I686_NVAR with either i686_mtrr_vcnt or
with MTRR_I686_NVAR_MAX as appropriate
- in i686_mtrr_reload() and mtrr_init_first() don't use mtrr_raw[]
addresses of 0
still needs a bunch of reworking to handle VCNT > 8 case.
 1.8.2.1 29-Oct-2007  liamjfoy Pull up following revision(s) (requested by pavel in ticket #959):
sys/arch/x86/x86/mtrr_i686.c: revision 1.12
The control registers (notably CR3 and CR4) are 64-bit on amd64 (see
"AMD64 Architecture Programmer's Manual"). Declare the variables
holding them as vaddr_t, otherwise the upper bits are lost.
(CR0 is actually 64-bit too, but the upper bits are unused, so I am
not changing it now.)
Should fix the reboot caused by X11. From Arto Huusko in
PR port-amd64/37043.
 1.9.16.1 06-Oct-2007  yamt sync with head.
 1.9.14.2 09-Jan-2008  matt sync with HEAD
 1.9.14.1 06-Nov-2007  matt sync with HEAD
 1.9.12.3 03-Dec-2007  joerg Sync with HEAD.
 1.9.12.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.9.12.1 02-Oct-2007  joerg Sync with HEAD.
 1.9.4.1 03-Oct-2007  garbled Sync with HEAD
 1.9.2.1 18-Apr-2007  thorpej Convert i386 and amd64 to the new atomic ops API.
 1.10.2.1 25-Oct-2007  bouyer Sync with HEAD.
 1.12.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.12.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.13.6.1 08-Jan-2008  bouyer Sync with HEAD
 1.14.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.14.6.2 02-Jul-2008  mjf Sync with HEAD.
 1.14.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.15.4.4 11-Aug-2010  yamt sync with head.
 1.15.4.3 11-Mar-2010  yamt sync with head
 1.15.4.2 04-May-2009  yamt sync with head.
 1.15.4.1 16-May-2008  yamt sync with head.
 1.15.2.1 18-May-2008  yamt sync with head.
 1.16.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.16.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.17.2.1 03-Jul-2008  simonb Sync with head.
 1.18.2.1 19-Oct-2008  haad Sync with HEAD.
 1.19.8.2 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.19.8.1 24-Oct-2010  jym Sync with HEAD
 1.19.4.2 19-Jun-2013  bouyer Pull up following revision(s) (requested by msaitoh in ticket #1847):
sys/arch/x86/include/mtrr.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.25
sys/arch/x86/include/specialreg.h: revision 1.55
Increase MTRR_I686_NVAR_MAX from 8 to 16. Avoids
"FIXME: more than 8 MTRRs (10)" message on booting Thinkpad W520 and
similar. While here replace a magic number with MTRR_I686_NVAR_MAX * 2
 1.19.4.1 16-Feb-2011  bouyer Pull up following revision(s) (requested by jmcneill in ticket #1547):
sys/arch/x86/x86/mtrr_i686.c: revisions 1.23, 1.24
allows overlapping uncached and cached mappings
corrects an off-by-one which prevents adjacent mappings from being setup.
 1.20.4.2 05-Mar-2011  rmind sync with head
 1.20.4.1 03-Jul-2010  rmind sync with head
 1.20.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.23.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.24.10.2 29-Apr-2012  mrg sync to latest -current.
 1.24.10.1 18-Feb-2012  mrg merge to -current.
 1.24.6.2 23-May-2012  yamt sync with head.
 1.24.6.1 17-Apr-2012  yamt sync with head
 1.25.2.1 09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.27.12.1 10-Aug-2014  tls Rebase.
 1.27.2.2 03-Dec-2017  jdolecek update from HEAD
 1.27.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.28.4.1 28-Aug-2017  skrll Sync with HEAD
 1.30.10.1 29-Feb-2020  ad Sync with head.
 1.30.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.8 07-Oct-2021  msaitoh KNF. No functional change.
 1.7 03-Oct-2021  fcambus Fix typo when erroring out on unknown ELF machine type.
 1.6 25-Jun-2020  jdolecek rearrange code to remove need for the scratch space variable, simply put
the data to destination bootinfo buffer directly

XXX compile tested only, needs confirmation that it still works
 1.5 24-Jun-2020  jdolecek don't try allocating 16KB of scratch space on stack

it's too early for kmem_alloc(), so use static variable in BSS; it's used
post reloc, so don't need to use the RELOC() macros

XXX compile-tested only on i386
 1.4 30-Jan-2020  manu branches: 1.4.6;
Insert memory map with its real size, not the maximum possible.
 1.3 10-Dec-2019  manu branches: 1.3.2;
Add multiboot 2 support to amd64 kernel
 1.2 18-Oct-2019  hannken Make compile with "options DEBUG".
 1.1 18-Oct-2019  manu Multiboot2 kernel support for i386

That implementation works either with BIOS or UEFI bootstrap

This requires the following kernel changes:

Add UEFI boot services and I/O method prototypes
src/sys/arch/x86/include/efi.h 1.8 - 1.9

Fix EFI system table mapping in virtual space
src/sys/arch/x86/x86/efi.c 1.19 - 1.20

Make sure no bioscall is issued when booting off UEFI system
src/sys/arch/i386/i386/machdep.c 1.821 - 1.822
src/sys/arch/i386/pci/piixpcib.c 1.22 - 1.23

And the following bootstrap changes:

Add kernel symbols for multiboot1
src/sys/arch/i386/stand/lib/exec_multiboot1.c 1.2 - 1.3
src/sys/arch/i386/stand/lib/libi386.h 1.45 - 1.47

Fix kernel symbols for multiboot2
src/sys/arch/i386/stand/lib/exec_multiboot2.c 1.2 - 1.3
 1.3.2.1 29-Feb-2020  ad Sync with head.
 1.4.6.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.4.6.1 30-Jan-2020  martin file multiboot2.c was added on branch phil-wifi on 2020-04-13 08:04:11 +0000
 1.6 15-May-2022  riastradh x86: Use atomic_store_release/atomic_load_consume for nmi_handlers.

Simplifies things a bit. No functional change intended.
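Note on the release/consume pattern above: a minimal userland sketch using C11 <stdatomic.h> as a stand-in for the kernel's atomic_store_release() and atomic_load_consume(). It is not the code from nmi.c. All names are hypothetical, writers are assumed to be serialized by the caller, removal is not shown, and memory_order_acquire is used where the kernel uses consume.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical handler record, linked into a singly linked list. */
struct handler {
	int (*func)(void *);
	void *arg;
	struct handler *next;
};

/* List head; readers load it without taking any lock. */
_Atomic(struct handler *) handlers = NULL;

/*
 * Writer side: initialize the record completely, then publish it with
 * a release store so readers that see the pointer also see its fields.
 */
void
handler_publish(struct handler *h)
{
	h->next = atomic_load_explicit(&handlers, memory_order_relaxed);
	atomic_store_explicit(&handlers, h, memory_order_release);
}

/* Reader side: load the head once, then walk the immutable links. */
int
handlers_dispatch(void)
{
	struct handler *h;
	int handled = 0;

	for (h = atomic_load_explicit(&handlers, memory_order_acquire);
	    h != NULL; h = h->next)
		handled |= h->func(h->arg);
	return handled;
}

/* Example handler: counts how often it was called. */
int
handler_count(void *arg)
{
	++*(int *)arg;
	return 1;
}
```

The ordering is carried by the store/load pair itself, so no separate memory barrier call is needed on either side, which is the simplification the log message refers to.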
 1.5 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.4 26-Nov-2013  rmind branches: 1.4.6;
Switch XC_HIGHPRI to run at IPL_SOFTSERIAL i.e. the highest software level.
Adjust pcu(9) to this xcall(9) change. This may fix the problems after
x86 FPU was converted to use PCU, since it avoids heavy contention at the
lower levels (particularly, IPL_SOFTNET). This is a good illustration why
software interrupts should generally avoid any blocking on locks.
 1.3 12-Oct-2011  yamt branches: 1.3.2; 1.3.12; 1.3.16;
- (ab)use pserialize instead of home-grown one
- add an explicit membar
 1.2 24-Feb-2009  yamt branches: 1.2.2; 1.2.4; 1.2.6;
nmi_disestablish: fix an inverted condition. pointed out by ad@.
 1.1 24-Feb-2009  yamt - rewrite x86 nmi dispatcher so that establish and disestablish are safe
on a running system.
- adapt existing users of the api. (elan)
- adapt tprof_pmi driver to use the api.
 1.2.6.3 01-Nov-2009  jym Sync with HEAD.
 1.2.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.2.6.1 24-Feb-2009  jym file nmi.c was added on branch jym-xensuspend on 2009-05-13 17:18:45 +0000
 1.2.4.2 04-May-2009  yamt sync with head.
 1.2.4.1 24-Feb-2009  yamt file nmi.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:11 +0000
 1.2.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.2.2.1 24-Feb-2009  skrll file nmi.c was added on branch nick-hppapmap on 2009-03-03 18:29:37 +0000
 1.3.16.1 18-May-2014  rmind sync with head
 1.3.12.2 03-Dec-2017  jdolecek update from HEAD
 1.3.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.6.1 28-Aug-2017  skrll Sync with HEAD
 1.5 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.4 27-Mar-2014  christos branches: 1.4.6;
correct/add protection against snprintf overflow.
 1.3 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
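Note on the display family and model above: a minimal sketch of the computation, following the Intel convention for the CPUID leaf 1 EAX signature. The SIG_* macros and sig_* functions are hypothetical names chosen for this sketch; they are not the CPUID_TO_* macros from specialreg.h and may differ from them in detail.

```c
#include <stdint.h>

/* Field extraction from the CPUID leaf 1 EAX signature. */
#define SIG_STEPPING(e)		((e) & 0xf)
#define SIG_BASEMODEL(e)	(((e) >> 4) & 0xf)
#define SIG_BASEFAMILY(e)	(((e) >> 8) & 0xf)
#define SIG_EXTMODEL(e)		(((e) >> 16) & 0xf)
#define SIG_EXTFAMILY(e)	(((e) >> 20) & 0xff)

/* Display family: the extended field only counts when the base is 0xf. */
uint32_t
sig_family(uint32_t eax)
{
	uint32_t family = SIG_BASEFAMILY(eax);

	if (family == 0xf)
		family += SIG_EXTFAMILY(eax);
	return family;
}

/* Display model: the extended field is used for base family 6 and 0xf. */
uint32_t
sig_model(uint32_t eax)
{
	uint32_t family = SIG_BASEFAMILY(eax);
	uint32_t model = SIG_BASEMODEL(eax);

	if (family == 0x6 || family == 0xf)
		model |= SIG_EXTMODEL(eax) << 4;
	return model;
}
```

For the signature 0x000306a9 this yields family 6, model 0x3a, stepping 9. Keeping the base-only and extended-only accessors separate from the display accessors is what avoids the duplicated shift-and-add logic in each caller.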
 1.2 02-Jun-2012  dsl branches: 1.2.2; 1.2.4;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.1 04-Mar-2011  jruoho branches: 1.1.2; 1.1.4; 1.1.6; 1.1.10; 1.1.12;
Move INTEL_ONDEMAND_CLOCKMOD -- or odcm(4) -- to the cpufeaturebus.
 1.1.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.12.1 30-Oct-2012  yamt sync with head
 1.1.10.2 06-Jun-2011  jruoho Sync with HEAD.
 1.1.10.1 04-Mar-2011  jruoho file odcm.c was added on branch jruoho-x86intr on 2011-06-06 09:07:08 +0000
 1.1.6.2 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.1.6.1 04-Mar-2011  jym file odcm.c was added on branch jym-xensuspend on 2011-03-28 23:04:53 +0000
 1.1.4.2 05-Mar-2011  rmind sync with head
 1.1.4.1 04-Mar-2011  rmind file odcm.c was added on branch rmind-uvmplock on 2011-03-05 20:52:31 +0000
 1.1.2.2 05-Mar-2011  bouyer Sync with HEAD
 1.1.2.1 04-Mar-2011  bouyer file odcm.c was added on branch bouyer-quota2 on 2011-03-05 15:10:10 +0000
 1.2.4.1 18-May-2014  rmind sync with head
 1.2.2.2 03-Dec-2017  jdolecek update from HEAD
 1.2.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.6.1 28-Aug-2017  skrll Sync with HEAD
 1.2 15-Mar-2007  xtraeme Ok... there were people really angry with this, backing it out.
 1.1 15-Mar-2007  xtraeme Add a driver for the Pentium 4 and later models with feature TM
(Thermal Monitor).

This driver will throttle the CPU clock modulation, saving some
power, also known as ODCM (On-Demand Clock Modulation).

The processor can change from 12.5% to 100% (there are two errata,
so two levels might be skipped in the worst case).

If supported, you'll see the following sysctl sub-tree:

machdep.p4tcc.throttling.target: CPU Clock throttling state (0 = lowest, 7 = highest)
machdep.p4tcc.throttling.current: current CPU throttling state
machdep.p4tcc.throttling.available: list of CPU Clock throttling states

machdep.p4tcc.throttling.target = 2
machdep.p4tcc.throttling.current = 2
machdep.p4tcc.throttling.available = 7 6 5 4 3 2

Adapted from OpenBSD/FreeBSD.
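Note on the throttling states above: a minimal sketch of the arithmetic, assuming the eight states map linearly onto the 12.5% to 100% range given in the log message (state 0 is the lowest duty cycle, state 7 is full speed). The function name is hypothetical and this is not code from the driver.

```c
/*
 * Map a throttling state (0..7) to a duty cycle in tenths of a percent,
 * so 125 means 12.5% and 1000 means 100%. Returns -1 for an invalid state.
 */
int
throttle_state_to_permille(int state)
{
	if (state < 0 || state > 7)
		return -1;
	return (state + 1) * 125;
}
```

Under this assumption, the sample target of 2 shown above would correspond to a 37.5% duty cycle.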
 1.53 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.52 20-Aug-2022  riastradh x86: Split bootspace out of x86/pmap.h into new x86/bootspace.h.
 1.51 30-Jul-2022  riastradh x86: Eliminate mfence hotpatch for membar_sync.

The more-compatible LOCK ADD $0,-N(%rsp) turns out to be cheaper
than MFENCE anyway. Let's save some space and maintenance and rip
out the hotpatching for it.
 1.50 09-Apr-2022  riastradh x86: Every load is a load-acquire, so membar_consumer is a noop.

lfence is only needed for MD logic, such as operations on I/O memory
rather than normal cacheable memory, or special instructions like
RDTSC -- never for MI synchronization between threads/CPUs. No need
for hot-patching to do lfence here.

(The x86_lfence function might reasonably be patched on i386 to do
lfence for MD logic, but it isn't now and this doesn't change that.)
 1.49 07-May-2020  maxv Fix LOCKDEBUG compilation on i386.
 1.48 02-May-2020  maxv Remove the D bit as part of the hotpatch cleanup procedure.
 1.47 02-May-2020  maxv Modify the hotpatch mechanism, in order to make it much less ROP-friendly.

Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.

- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling in a separate x86_hotpatch_cleanup() function,
that gets called after we closed the window.

The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).
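Note on the descriptor scheme above: a minimal userland sketch of the idea, not the x86_hotpatch() implementation. A read-only table names each patch site and lists at most two read-only templates, and the apply routine takes only a name and a selector, so the caller can supply neither a destination pointer nor bytes of its own. The structure layout, table contents and function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: one patch site and its permitted templates. */
struct patch_desc {
	const char *name;
	uint8_t *dst;			/* patchable site */
	size_t size;			/* site and template size */
	const uint8_t *srcs[2];		/* at most two templates */
	size_t nsrc;
};

/* Example site (three NOPs) and its two templates. */
uint8_t site_fence[3] = { 0x90, 0x90, 0x90 };
const uint8_t tmpl_nop[3] = { 0x90, 0x90, 0x90 };	/* nop; nop; nop */
const uint8_t tmpl_lfence[3] = { 0x0f, 0xae, 0xe8 };	/* lfence */

/* The table itself is const, so descriptors cannot be forged at run time. */
const struct patch_desc patch_table[] = {
	{ "fence_site", site_fence, sizeof(site_fence),
	    { tmpl_nop, tmpl_lfence }, 2 },
};

/* Resolve the descriptor by name and the template by selector, then copy. */
int
patch_apply(const char *name, unsigned sel)
{
	size_t i;

	for (i = 0; i < sizeof(patch_table) / sizeof(patch_table[0]); i++) {
		const struct patch_desc *d = &patch_table[i];

		if (strcmp(d->name, name) != 0)
			continue;
		if (sel >= d->nsrc)
			return -1;	/* no such template for this site */
		memcpy(d->dst, d->srcs[sel], d->size);
		return 0;
	}
	return -1;			/* unknown site */
}
```

In the kernel the copy would happen inside the hard-coded patch window; the sketch only shows the lookup discipline that keeps arbitrary pointers and bytes out of that window.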

With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.

The only ~problem there is, is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However dtrace is only in a module, that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
 1.46 01-May-2020  maxv Switch the rest of i386 to the x86_hotpatch mechanism.
 1.45 01-May-2020  maxv Use absolute jumps, and drop the PC-relative patching. We want exact
templates.
 1.44 01-May-2020  maxv Use the hotpatch framework when patching _atomic_cas_64.
 1.43 30-Apr-2020  maxv Switch to templates.
 1.42 26-Apr-2020  maxv Use the hotpatch framework for LFENCE/MFENCE.
 1.41 26-Apr-2020  maxv Drop the hardcoded array, use the hotpatch section.
 1.40 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.39 10-Apr-2020  bouyer Revert, wrong branch
 1.38 10-Apr-2020  bouyer Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.
 1.37 18-Sep-2019  kamil branches: 1.37.6;
Switch to __noubsan in x86_hotpatch()
 1.36 28-May-2019  kamil Disable sanitizer instrumentation in x86_hotpatch()

Local variables have empty (0-sized), unknown alignment to UBSan.
This is hard to workaround without mutating the code too much.
 1.35 14-Jul-2018  maxv Remove ifdef GPROF.
 1.34 13-Mar-2018  maxv branches: 1.34.2;
Fix wrong order; first enable WP, then enable interrupts. Otherwise we
might get an interrupt before re-enabling WP, and be rescheduled as a
result. In practice it never happens, because the previous PSL always
has interrupts disabled too.
 1.33 22-Feb-2018  maxv branches: 1.33.2;
Improve the SVS initialization.

Declare x86_patch_window_open() and x86_patch_window_close(), and globalify
x86_hotpatch().

Introduce svs_enable() in x86/svs.c, that does the SVS hotpatching.

Change svs_init() to take a bool. This function gets called twice; early
when the system just booted (and nothing is initialized), later when at
least pmap_kernel has been initialized.
 1.32 22-Feb-2018  maxv Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.
 1.31 27-Jan-2018  maxv Add SMAP support for i386.
 1.30 07-Jan-2018  christos make this compile w/o LOCKDEBUG
 1.29 07-Jan-2018  maxv Switch x86_retpatch[] -> HOTPATCH().
 1.28 07-Jan-2018  maxv Fix previous - atomic_lockpatch[] is still there.
 1.27 07-Jan-2018  maxv Switch x86_lockpatch[] -> HOTPATCH().
 1.26 07-Jan-2018  maxv Implement a real hotpatch feature.

Define a HOTPATCH() macro, that puts a label and additional information
in the new .rodata.hotpatch kernel section. In patch.c, scan the section
and patch what needs to be. Now it is possible to hotpatch the content of
a macro.

SMAP is switched to use this new system; this saves a call+ret in each
kernel entry/exit point.

Many other operating systems do the same.
 1.25 07-Jan-2018  maxv Give patchbytes an array.
 1.24 27-Oct-2017  riastradh Add comment explaining why membar_producer is not sfence.

On x86, ordinary non-temporal stores are always issued in program
order to main memory and to other CPUs.
 1.23 17-Oct-2017  maxv Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.
 1.22 15-Nov-2013  msaitoh branches: 1.22.22;
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
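
As a rough illustration of the distinction between the base, extended and display fields, the following sketch decodes a CPUID leaf 1 signature. It assumes the usual x86 convention (the extended family is added only when the base family is 0xF, and the extended model is prepended only when the base family is 0x6 or 0xF); the lowercase function names are hypothetical and are not the macros from the commit.

```c
#include <stdint.h>

/* Raw fields of the CPUID.1:EAX signature. */
static inline uint32_t cpuid_stepping(uint32_t sig)    { return sig & 0xf; }
static inline uint32_t cpuid_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t cpuid_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t cpuid_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t cpuid_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: extended family is added only for base family 0xF. */
static inline uint32_t
cpuid_family(uint32_t sig)
{
	uint32_t base = cpuid_base_family(sig);

	return (base == 0xf) ? base + cpuid_ext_family(sig) : base;
}

/* Display model: extended model forms the high nibble for families 6 and 0xF. */
static inline uint32_t
cpuid_model(uint32_t sig)
{
	uint32_t base = cpuid_base_family(sig);
	uint32_t model = cpuid_base_model(sig);

	if (base == 0x6 || base == 0xf)
		model |= cpuid_ext_model(sig) << 4;
	return model;
}
```

Code that compares against only the base fields can misidentify newer CPUs, which is presumably the kind of bug the display-value macros are meant to prevent.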
 1.21 18-Apr-2010  jym branches: 1.21.8; 1.21.18; 1.21.22;
This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.20 03-Nov-2009  dyoung branches: 1.20.2; 1.20.4;
Gracelessly bracket #include "opt_spldebug.h" with #ifdef i386.
Should fix the amd64 kernel-build failure that Andreas Wrede
reported.
 1.19 03-Nov-2009  dyoung Add a kernel configuration flag, SPLDEBUG, that activates a per-CPU log
of transitions to IPL_HIGH from lower IPLs. SPLDEBUG is only available
on i386 and Xen kernels, today.

'options SPLDEBUG' adds instrumentation to spllower() and splraise() as
well as routines to start/stop debugging and to record IPL transitions:
spldebug_start(), spldebug_stop(), spldebug_raise(), spldebug_lower().
 1.18 24-Apr-2009  ad A workaround for a bug with some Opteron revisions where locked operations
sometimes do not serve as memory barriers, allowing memory references to
bleed outside of critical sections. It's possible that this is the
reason for pkgbuild's longstanding crashiness.

This is not complete (atomic ops need some work too).
 1.17 02-Apr-2009  enami So that profile kernel runs again,
- Adjust the size of functions used to patch.
- Fix the jump offset of mcount call when patching functions.

Approved by Andrew Doran.
 1.16 17-Feb-2009  ad Repair x86_patch to install optimized routines.
Pointed out by enami@.
 1.15 19-Dec-2008  ad branches: 1.15.2;
PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.14 08-Sep-2008  gmcgarry branches: 1.14.2; 1.14.4;
Replace most gcc-specific __attribute__ uses with BSD-style sys/cdef.h
preprocessor macros.
 1.13 30-Apr-2008  ad branches: 1.13.2; 1.13.6;
PR kern/38537 __HAVE_PREEMPTION requires MULTIPROCESSOR

Don't patch out the kernel_lock functions if ncpu == 1. We need it for
preemption, and it's used much less frequently than before.
 1.12 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.11 20-Dec-2007  ad branches: 1.11.6; 1.11.8; 1.11.10;
- Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.10 20-Dec-2007  ad 64-bit atomic ops for i386.
 1.9 28-Nov-2007  ad branches: 1.9.2; 1.9.6;
x86_mb_nop is now unused.
 1.8 28-Nov-2007  ad Hook in the atomic ops from libkern.
 1.7 13-Nov-2007  ad When running uniprocessor, patch _kernel_lock() and _kernel_unlock() to
do nothing more than "nop; ret".
 1.6 10-Nov-2007  ad - When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
 1.5 17-Oct-2007  garbled branches: 1.5.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.4 26-Sep-2007  ad branches: 1.4.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.3 17-May-2007  yamt branches: 1.3.8; 1.3.10; 1.3.12;
merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.2 09-Feb-2007  ad branches: 1.2.2; 1.2.4; 1.2.8; 1.2.10; 1.2.14; 1.2.16;
Merge newlock2 to head.
 1.1 27-Jan-2007  ad branches: 1.1.2;
file patch.c was initially added on branch newlock2.
 1.1.2.4 02-Feb-2007  ad - Define memory barrier ops in lock_stubs.S.
- If lfence/mfence are available, patch them in at boot.
- Patch to a no-op if !MULTIPROCESSOR. XXX Should be determined at runtime.
 1.1.2.3 01-Feb-2007  ad - Don't patch stuff if it's a profiling kernel.
- Move Xspllower label to show up better in profile output.
 1.1.2.2 27-Jan-2007  ad Rename some functions to better describe what they do.
 1.1.2.1 27-Jan-2007  ad If running on a PPro or later, at boot patch in versions of spllower() and
similar that use cmpxchg8b instead of cli/sti. Cuts the clock cycles for
splx() by a factor of ~6 on the P4, and ~3 on the PIII when bracketed by
serializing instructions (and hopefully more when not).
 1.2.16.2 03-Oct-2007  garbled Sync with HEAD
 1.2.16.1 22-May-2007  matt Update to HEAD.
 1.2.14.2 17-Apr-2007  thorpej New atomic op hot-patches for amd64.
 1.2.14.1 17-Apr-2007  thorpej Hot-patch the atomic ops that need it on x86 (32-bit).
 1.2.10.1 11-Jul-2007  mjf Sync with head.
 1.2.8.3 03-Dec-2007  ad Sync with HEAD.
 1.2.8.2 09-Oct-2007  ad Sync with head.
 1.2.8.1 27-May-2007  ad Sync with head.
 1.2.4.7 21-Jan-2008  yamt sync with head
 1.2.4.6 07-Dec-2007  yamt sync with head
 1.2.4.5 15-Nov-2007  yamt sync with head.
 1.2.4.4 27-Oct-2007  yamt sync with head.
 1.2.4.3 03-Sep-2007  yamt sync with head.
 1.2.4.2 26-Feb-2007  yamt sync with head.
 1.2.4.1 09-Feb-2007  yamt file patch.c was added on branch yamt-lazymbuf on 2007-02-26 09:08:52 +0000
 1.2.2.1 24-Mar-2007  rmind Checkpoint:
- Abstract for per-CPU locking of runqueues.
As a workaround for SCHED_4BSD global runqueue, covered by sched_mutex,
spc_mutex is a pointer for now. After making SCHED_4BSD runqueues
per-CPU, it will become a storage mutex.
- suspendsched: Locking is not necessary for cpu_need_resched().
- Remove mutex_spin_exit() prototype in patch.c and LOCK_ASSERT() check
in runqueue_nextlwp() in sched_4bsd.c to make them compile again.
 1.3.12.1 06-Oct-2007  yamt sync with head.
 1.3.10.2 09-Jan-2008  matt sync with HEAD
 1.3.10.1 06-Nov-2007  matt sync with HEAD
 1.3.8.4 03-Dec-2007  joerg Sync with HEAD.
 1.3.8.3 14-Nov-2007  joerg Sync with HEAD.
 1.3.8.2 11-Nov-2007  joerg Sync with HEAD.
 1.3.8.1 02-Oct-2007  joerg Sync with HEAD.
 1.4.2.2 18-Nov-2007  bouyer Sync with HEAD
 1.4.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.5.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.5.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.5.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.9.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.9.2.1 26-Dec-2007  ad Sync with head.
 1.11.10.4 11-Aug-2010  yamt sync with head.
 1.11.10.3 11-Mar-2010  yamt sync with head
 1.11.10.2 04-May-2009  yamt sync with head.
 1.11.10.1 16-May-2008  yamt sync with head.
 1.11.8.1 18-May-2008  yamt sync with head.
 1.11.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.11.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.11.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.6.1 19-Oct-2008  haad Sync with HEAD.
 1.13.2.1 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.14.4.4 13-May-2009  snj Pull up following revision(s) (requested by ad in ticket #725):
sys/arch/amd64/amd64/lock_stubs.S: revision 1.22
sys/arch/i386/i386/lock_stubs.S: revision 1.23
sys/arch/x86/x86/patch.c: revision 1.18
A workaround for a bug with some Opteron revisions where locked operations
sometimes do not serve as memory barriers, allowing memory references to
bleed outside of critical sections. It is possible that this is the
reason for pkgbuild's longstanding crashiness.
This is not complete (atomic ops need some work too).
 1.14.4.3 03-Apr-2009  snj branches: 1.14.4.3.2;
Pull up following revision(s) (requested by enami in ticket #645):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.17
sys/arch/amd64/amd64/spl.S: revision 1.21
sys/arch/x86/x86/patch.c: revision 1.17
So that profile kernel runs again,
- Adjust the size of functions used to patch.
- Fix the jump offset of mcount call when patching functions.
Approved by Andrew Doran.
 1.14.4.2 19-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #471):
sys/arch/x86/x86/patch.c: revision 1.16
Repair x86_patch to install optimized routines.
Pointed out by enami@.
 1.14.4.1 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #343):
common/lib/libc/arch/i386/atomic/atomic.S: revision 1.14
sys/arch/x86/include/cpufunc.h: revision 1.9
sys/arch/x86/x86/identcpu.c: revision 1.12
sys/arch/x86/x86/cpu.c: revision 1.60
sys/arch/x86/x86/patch.c: revision 1.15
PR kern/40213 my i386 machine can't boot because of tsc
- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.
- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
 1.14.4.3.2.1 13-May-2009  snj branches: 1.14.4.3.2.1.2;
Pull up following revision(s) (requested by ad in ticket #725):
sys/arch/amd64/amd64/lock_stubs.S: revision 1.22
sys/arch/i386/i386/lock_stubs.S: revision 1.23
sys/arch/x86/x86/patch.c: revision 1.18
A workaround for a bug with some Opteron revisions where locked operations
sometimes do not serve as memory barriers, allowing memory references to
bleed outside of critical sections. It's possible that this is the
reason for pkgbuild's longstanding crashiness.
This is not complete (atomic ops need some work too).
 1.14.4.3.2.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.14.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.14.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.14.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.15.2.3 24-Oct-2010  jym Sync with HEAD
 1.15.2.2 01-Nov-2009  jym Sync with HEAD.
 1.15.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.20.4.1 30-May-2010  rmind sync with head
 1.20.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.21.22.1 18-May-2014  rmind sync with head
 1.21.18.2 03-Dec-2017  jdolecek update from HEAD
 1.21.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.21.8.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.22.22.2 14-Apr-2018  martin Pullup the following revisions via patch, requested by maxv in ticket #748:

sys/arch/amd64/amd64/copy.S 1.29 (adapted, via patch)
sys/arch/amd64/amd64/amd64_trap.S 1.16,1.19 (partial) (via patch)
sys/arch/amd64/amd64/trap.c 1.102,1.106 (partial),1.110 (via patch)
sys/arch/amd64/include/frameasm.h 1.22,1.24 (via patch)
sys/arch/x86/x86/cpu.c 1.137 (via patch)
sys/arch/x86/x86/patch.c 1.23,1.26 (partial) (via patch)

Backport of SMAP support.
 1.22.22.1 06-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #603:

amd64/conf/kern.ldscript 1.25 (patch)
amd64/conf/kern.ldscript.Xen 1.14 (patch)
i386/conf/kern.ldscript 1.21 (patch)
i386/conf/kern.ldscript.Xen 1.15 (patch)
x86/include/cpufunc.h 1.24 (patch)
x86/x86/patch.c 1.25 (partial) 1.26 (partial)

Backport x86_hotpatch.
 1.33.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.33.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.34.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.34.2.1 10-Jun-2019  christos Sync with HEAD
 1.37.6.3 15-Apr-2020  bouyer On amd64, always use the cmpxchg8b version of spllower. All x86_64 hosts should
have it and we already rely on it in lock stubs.
On i386, always use i686_mutex_spin_exit and cx8_spllower for Xen;
Xen doesn't run on CPUs lacking the required instructions anyway.
Skip x86_patch only for XENPV, and adjust for changes in assembly functions.
Tested on Xen PV and PVHVM, and on bare metal core i5.
 1.37.6.2 14-Apr-2020  bouyer Always patch spllower with cx8_spllower; it works fine for Xen now
Include x86/x86/patch.c if !xenpv
While there, defopt XENPV
 1.37.6.1 10-Apr-2020  bouyer Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.
 1.18 21-Jul-2021  jmcneill x86's platform.c no longer has any x86 specific code in it, so move it to
dev/smbios_platform.c to let other ports use it
 1.17 21-Jul-2021  jmcneill Separate MI smbios interface from MD specific code.
 1.16 25-Dec-2018  mlelstv branches: 1.16.16;
Expose more DMI variables via sysctl.
 1.15 26-Mar-2014  christos branches: 1.15.28; 1.15.30;
kill sprintf
 1.14 08-Dec-2012  kiyohara branches: 1.14.2;
#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.
 1.13 14-Nov-2011  jmcneill branches: 1.13.10;
add machdep.dmi.bios-date
 1.12 14-Nov-2011  jmcneill add a machdep.dmi sysctl tree with the following read-only keys:
system-vendor, system-product, system-version, system-serial, system-uuid
bios-vendor, bios-version
board-vendor, board-product, board-version, board-serial
the *-serial and *-uuid keys are marked with CTLFLAG_PRIVATE

a few of the pmf platform key names changed so update callers to match
 1.11 18-Jan-2011  jmmv branches: 1.11.6;
Amend previous to be more accurate in platform_add_date by using the epoch:
* Years in the [70,99] range are considered to be in 1900.
* Years in the [0,69] range are considered to be in 2000.

I don't think we may have hit any machine where the previous numbers were
a problem, but these seem to be the "correct" ones.

From christos@.
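
The epoch-based rule from this revision can be sketched as a small helper. The function name is hypothetical and this is not the code from platform_add_date; it only restates the two ranges listed above.

```c
/*
 * Expand a two-digit year using 1970 as the pivot:
 * [70,99] maps to 1970..1999 and [0,69] maps to 2000..2069.
 * Returns -1 for values outside [0,99].
 */
int
demo_expand_two_digit_year(int yy)
{
	if (yy < 0 || yy > 99)
		return -1;
	return (yy >= 70) ? 1900 + yy : 2000 + yy;
}
```

With this rule a two-digit BIOS year of 07 expands to 2007, matching the MacBookPro2,2 case that motivated revision 1.10.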
 1.10 17-Jan-2011  jmmv Fix year correction in platform_add_date so that:
* Years in the [90,99] range are considered to be in 1900.
* Years in the [0,89] range are considered to be in 2000.

This makes my MacBookPro2,2 be recognized as from 2007 instead of 1907, which
in turn lets ACPI (and many other things!) work.

Fix proposed by jmcneill@ as an alternative to my workaround in acpi_quirks.c
sent to port-i386@.
 1.9 06-Sep-2010  jmcneill branches: 1.9.2;
Add support for blacklisting ACPI BIOS implementations by year. By default,
don't use ACPI on BIOS which advertise release years <= 2000. This
can be changed by setting option ACPI_BLACKLIST_YEAR=0 or by setting
acpi_force_load=1.
 1.8 17-Feb-2009  ad branches: 1.8.2; 1.8.4;
Adjust previous:

Output platform info with aprint_verbose(), so it shows up in dmesg output.
It's useful for bug reports.
 1.7 17-Feb-2009  jmcneill Make platform_print use aprint_debug
 1.6 18-Dec-2008  cegger branches: 1.6.2;
remove unused malloc.h
 1.5 05-May-2008  jmcneill branches: 1.5.8;
Use 2-clause license.
 1.4 30-Mar-2008  ad branches: 1.4.2; 1.4.4;
If SMBIOS is present and there seems to be good expansion slot info,
note the number of ISA compatible slots.
 1.3 09-Dec-2007  xtraeme branches: 1.3.6; 1.3.8; 1.3.14;
Remove useless returns at the end of void functions.
 1.2 09-Dec-2007  jmcneill Merge jmcneill-pm branch.
 1.1 03-Aug-2007  jmcneill branches: 1.1.2; 1.1.8; 1.1.10; 1.1.12;
file platform.c was initially added on branch jmcneill-pm.
 1.1.12.1 11-Dec-2007  yamt sync with head.
 1.1.10.1 26-Dec-2007  ad Sync with head.
 1.1.8.1 27-Dec-2007  mjf Sync with HEAD.
 1.1.2.2 08-Dec-2007  jmcneill Rename pnp(9) -> pmf(9), as requested by many.
 1.1.2.1 03-Aug-2007  jmcneill Pull in power management changes from private branch.
 1.3.14.3 17-Jan-2009  mjf Sync with HEAD.
 1.3.14.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.14.1 03-Apr-2008  mjf Sync with HEAD.
 1.3.8.2 21-Jan-2008  yamt sync with head
 1.3.8.1 09-Dec-2007  yamt file platform.c was added on branch yamt-lazymbuf on 2008-01-21 09:40:17 +0000
 1.3.6.2 09-Jan-2008  matt sync with HEAD
 1.3.6.1 09-Dec-2007  matt file platform.c was added on branch matt-armv6 on 2008-01-09 01:49:58 +0000
 1.4.4.3 09-Oct-2010  yamt sync with head
 1.4.4.2 04-May-2009  yamt sync with head.
 1.4.4.1 16-May-2008  yamt sync with head.
 1.4.2.1 18-May-2008  yamt sync with head.
 1.5.8.2 03-Mar-2009  skrll Sync with HEAD.
 1.5.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.6.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.6.2.3 24-Oct-2010  jym Sync with HEAD
 1.6.2.2 01-Nov-2009  jym Sync with HEAD.
 1.6.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.8.4.1 05-Mar-2011  rmind sync with head
 1.8.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.9.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.11.6.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.11.6.2 16-Jan-2013  yamt sync with (a bit old) head
 1.11.6.1 17-Apr-2012  yamt sync with head
 1.13.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.13.10.1 25-Feb-2013  tls resync with head
 1.14.2.1 18-May-2014  rmind sync with head
 1.15.30.1 10-Jun-2019  christos Sync with HEAD
 1.15.28.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.16.16.1 01-Aug-2021  thorpej Sync with HEAD.
 1.428 03-Sep-2025  bouyer We need to allocate HYPERVISOR_shared_info and HYPERVISOR_shared_info_pa
only for VM_GUEST_XENHVM, not in all but VM_GUEST_XENPVH cases.
 1.427 08-Oct-2024  riastradh x86/pmap: Use UVM_KMF_WAITVA to ensure pmap_pdp_alloc never fails.

This is used as the backing page allocator for pmap_pdp_pool, and
pmap_ctor assumes that PR_WAITOK allocations from it don't fail and
unconditionally writes to the resulting kva, which if null leads
nowhere good.

It is unclear to me why uvm_km_alloc can accept any combination of
the options UVM_KMF_NOWAIT and UVM_KMF_WAITVA. It seems to me that
at least one should be required (and they should be exclusive), and
any other use should trip an assertion.

PR kern/58666: panic: lock error: Reader / writer lock:
rw_vector_enter,357: locking against myself
 1.426 04-Oct-2023  ad branches: 1.426.6;
Eliminate l->l_ncsw and l->l_nivcsw. From memory, I think they were added
before we had per-LWP struct rusage; the same is now tracked there.
 1.425 26-Jul-2023  riastradh x86/pmap: Print quantities in failed assertions in pmap_load.
 1.424 16-Jul-2023  riastradh x86: Sprinkle extensive commentary about %fs/%gs initialization.

Plus some other side quests like the three-stage GDT metamorphosis
lifecycle.

No functional change intended.
 1.423 24-Sep-2022  riastradh branches: 1.423.4;
x86/pmap: Convert conditional to assertion.

pmap_kernel should never have va < VM_MAXUSER_ADDRESS entered.
 1.422 24-Sep-2022  riastradh x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.
 1.421 31-Aug-2022  bouyer Work in progress on dom0 PVH support: ioctl support for tools.
Basically, in PVH mode (where XENFEAT_auto_translated_physmap is enabled),
the hypervisor will not map foreign resources in our virtual address
space for us. Instead, we have to pass it an address in our physical
address space (but not mapped to some RAM) where the resource will show up
and then enter this PA in our page table.

For this, introduce xenmem_* which manage the PA space. In PVH mode this
is just allocated from the iomem_ex extent.

With this, I can start a PV domU, and the guest's kernel boots (and
the console works). It hangs because the backend driver can't map the
frontend resources (yet).

Note that, per https://xenbits.xen.org/docs/unstable/support-matrix.html,
dom0 PVH support is still considered experimental by Xen.
 1.420 20-Aug-2022  riastradh x86: Move definition of struct pmap to pmap_private.h.

This makes pmap_resident_count and pmap_wired_count out-of-line
functions instead of inline. No functional change intended
otherwise.
 1.419 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.418 20-Aug-2022  riastradh x86: Move pl*_i, pl_i_roundup, and ptp_va2o out of x86/pmap.h.

- pl[1-4]_i -> x86/pte.h
- pl_i, pl_i_roundup, ptp_va2o -> x86/pmap.c
 1.417 20-Aug-2022  riastradh x86: Split bootspace out of x86/pmap.h into new x86/bootspace.h.
 1.416 20-Aug-2022  riastradh x86: Move page attribute table bits to x86/pat.h.
 1.415 13-May-2022  riastradh x86/pmap: Feed entropy_extract output through nist_hash_drbg.

The entropy pool algorithm is NOT designed to provide backtracking
resistance on its own -- it MUST be combined with a PRNG/DRBG that
provides that.

The only reason we use entropy_extract here is that cprng(9) is not
available yet (which in turn is because kmem and other basic kernel
facilities aren't available yet), but nist_hash_drbg doesn't have any
initialization order requirements, so we'll just use it directly.
 1.414 07-May-2022  bouyer return after calling xen_pagezero(), don't fall back to the legacy
pmap_zero_page() method.
This should only affect performance.
 1.413 02-Jan-2022  andvar fix few more typos in comments.
 1.412 07-Oct-2021  msaitoh KNF. No functional change.
 1.411 02-Aug-2021  andvar fix various typos in comments and log messages.
 1.410 17-Apr-2021  bouyer Make pat_init() a NOOP on XENPV; it causes a trap with Xen 4.15
 1.409 06-Feb-2021  jdolecek use __builtin_assume_aligned() on places where the memset() or memcpy()
parameters are known to be PAGE_SIZE-aligned, so compiler doesn't need
to emit atrocious alignment check code when inlining them

this particularly improves pmap_zero_page() and pmap_copy_page(),
which are now reduced to only 'rep stosq', and close to what a hand-written
assembly would do

Note: on CPUs supporting ERMS, 'rep stosb' would still be slightly faster, but
this is a solid stop-gap improvement

suggested by Mateusz Guzik in:
http://mail-index.netbsd.org/tech-kern/2020/07/19/msg026620.html
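
A minimal userland sketch of the technique, assuming a GCC- or Clang-compatible compiler (which provide __builtin_assume_aligned) and a 4096-byte page. The names demo_zero_page, demo_copy_page, demo_page_is_zero and DEMO_PAGE_SIZE are hypothetical, and whether the compiler actually reduces the memset() to a single 'rep stosq' depends on the compiler version and flags.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* Page-aligned demo buffers standing in for mapped pages. */
_Alignas(DEMO_PAGE_SIZE) unsigned char demo_page_a[DEMO_PAGE_SIZE];
_Alignas(DEMO_PAGE_SIZE) unsigned char demo_page_b[DEMO_PAGE_SIZE];

/* Zero one page; the caller must pass a page-aligned address. */
void
demo_zero_page(void *va)
{
	/* Promise the alignment so the inlined memset can skip its checks. */
	void *p = __builtin_assume_aligned(va, DEMO_PAGE_SIZE);

	memset(p, 0, DEMO_PAGE_SIZE);
}

/* Copy one page; both addresses must be page-aligned. */
void
demo_copy_page(void *dst, const void *src)
{
	void *d = __builtin_assume_aligned(dst, DEMO_PAGE_SIZE);
	const void *s = __builtin_assume_aligned(src, DEMO_PAGE_SIZE);

	memcpy(d, s, DEMO_PAGE_SIZE);
}

/* Helper for checking results: returns 1 if the whole page is zero. */
int
demo_page_is_zero(const void *va)
{
	const unsigned char *p = va;

	for (size_t i = 0; i < DEMO_PAGE_SIZE; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

The builtin is only a hint: passing a misaligned pointer is undefined behavior, so it belongs only where alignment is guaranteed by construction, as it is for whole pages.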
 1.408 30-Nov-2020  bouyer Work in progress on dom0 PVH support. kernel boots and xl info works,
but we can't start a domU yet.
 1.407 06-Sep-2020  riastradh branches: 1.407.2;
Fix fallout from previous uvm.h cleanup.

- pmap(9) needs uvm/uvm_extern.h.

- x86/pmap.h is not usable on its own; it is only usable if included
via uvm/uvm_extern.h (-> uvm/uvm_pmap.h -> machine/pmap.h).

- Make nvmm.h and nvmm_internal.h standalone.
 1.406 02-Sep-2020  bouyer pmap_enter_gnt():
An empty PTP has a wire_count of 1, so KASSERT > 1 if we're sure we have
at least one entry.
 1.405 02-Sep-2020  bouyer pmap_enter_gnt(): call pmap_free_ptp() if needed. We can have a 0 wire count
if we had an old mapping and grant map hypercall failed, and this was the
only page in this ptp.
while there remove ptp != NULL checks for gnt operations: we always have
a ptp here.
 1.404 01-Sep-2020  bouyer Fix braino in pmap_find_gnt(), really return the gnt entry covering the range
and not one that starts just after.
Fixes a KASSERT in pmap_remove_gnt().
 1.403 04-Aug-2020  skrll Trailing whitespace
 1.402 04-Aug-2020  skrll typo in comment
 1.401 19-Jul-2020  maxv we're already in an #ifdef USER_LDT block, so no need to #ifdef again
 1.400 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by the following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

Also added a kernel option named PCPU_IDT to enable the feature.
 1.399 14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.398 03-Jun-2020  ad Revert most of 1.396 and go back to using memset()/memcpy().
Do not restore pageidlezero stuff though.
 1.397 29-May-2020  ad Reported-by: syzbot+fd9be59aa613bbf4eba8@syzkaller.appspotmail.com
Reported-by: syzbot+15dd4dbac6ed159faa4a@syzkaller.appspotmail.com
Reported-by: syzbot+38fa02d3b0e46e57c156@syzkaller.appspotmail.com

pmap_remove_all(): need to drain PV pages only after the PTEs are unmapped,
otherwise there can be a context switch with them mapped in. XXX amd64
should use the direct map.
 1.396 27-May-2020  ad - Add a couple of wrapper functions around STOS and MOVS and use them to zero
and copy PTEs in preference to memset()/memcpy().

- Remove related SSE / pageidlezero stuff.
 1.395 27-May-2020  ad Reported-by: syzbot+c1770938bb3fa7c085f2@syzkaller.appspotmail.com
Reported-by: syzbot+ae26209c7d7f06e0b29f@syzkaller.appspotmail.com

Can't defer freeing PV entries for the kernel's pmap until pmap_update(),
as that means taking locks and potentially recursing, and pmap_update()
for the kernel is used in all sorts of sensitive places.
 1.394 26-May-2020  bouyer Adjust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()
Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which causes xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
Make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt(), which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grant entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().
 1.393 19-May-2020  ad Comment
 1.392 15-May-2020  ad PTP pages are zeroed before free again.
 1.391 15-May-2020  ad Reported-by: syzbot+0f38e4aed17c14cf0af8@syzkaller.appspotmail.com
Reported-by: syzbot+c1770938bb3fa7c085f2@syzkaller.appspotmail.com
Reported-by: syzbot+92ca248f1137c4b345d7@syzkaller.appspotmail.com
Reported-by: syzbot+acfd688740461f7edf2f@syzkaller.appspotmail.com

Be careful with pmap_lock in pmap_update(). It can happen that pmap_kernel
has work pending that gets noticed in interrupt context, before process
context has a chance to deal with it.
 1.390 15-May-2020  ad PR kern/55268: tmpfs is slow

pmap_clear_attrs(): if a brand new page with no mappings just zap pp_attrs.
 1.389 08-May-2020  riastradh Factor randomization out of slotspace_rand.

slotspace_rand becomes deterministic; the randomization moves into
the callers instead. Why?

There are two callers of slotspace_rand:

- x86/pmap.c pmap_bootstrap
- amd64/amd64.c init_slotspace

When the randomization was introduced, it used an x86-only
`cpu_earlyrng' abstraction that would hash rdseed/rdrand and rdtsc
output together. Except init_slotspace ran before cpu_probe, so
cpu_feature was not yet filled out, so during init_slotspace, the
only randomization was rdtsc.

In the course of the recent entropy overhaul, I replaced cpu_earlyrng
by entropy_extract, and moved cpu_init_rng much earlier -- but still
after cpu_probe -- in order to reduce the number of abstractions
lying around and the number of copies of rdrand/rdseed logic. In so
doing I added some annoying complication (see curcpu_available) to
kern_entropy.c to make it work early enough for init_slotspace, and
dropped the rdtsc.

For pmap_bootstrap that didn't substantively change anything. But
for init_slotspace, it removed the only randomization. To mitigate
this, this commit pulls the randomization out of slotspace_rand into
pmap_bootstrap and init_slotspace, so that

(a) init_slotspace can use rdtsc and a little private entropy pool in
order to restore the prior (weak) randomization it had, and

(b) pmap_bootstrap, which runs a little bit later, can continue to
use entropy_extract normally and get rdrand/rdseed too.

A subsequent commit will move cpu_init_rng just a wee bit later,
after cpu_init_msrs, so the kern_entropy.c complications can go away.
Perhaps someone else more wizardly with x86 can find a way to make
init_slotspace run a little later too, after cpu_probe and after
cpu_init_msrs and after cpu_rng_init, but I am not that wizardly.
 1.388 05-May-2020  bouyer Make DOM0OPS build for PVH/PVHVM too
 1.387 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.386 02-May-2020  maxv Call kasan_early_init earlier, to unbreak KASAN after the recent RNG
changes. Will also prevent further trouble.
 1.385 30-Apr-2020  riastradh Simplify Intel RDRAND/RDSEED and VIA C3 RNG API.

Push it all into MD x86 code to keep it simpler, until we have other
examples on other CPUs. Simplify RDSEED-to-RDRAND fallback.
Eliminate cpu_earlyrng in favour of just using entropy_extract, which
is available early now.
 1.384 28-Apr-2020  jmcneill Detect PAT on the boot processor before cpu0 attaches so the early genfb
attach code can map the framebuffer with write combining.
 1.383 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.382 24-Apr-2020  maxv Give the ldt a fixed size of one page (512 slots), and drop the variable-
sized mechanism that was too complex.

This fixes a race between USER_LDT and SVS: during context switches, the
way SVS installs the new ldt relies on the ldt pointer AND the ldt size,
but both cannot be accessed atomically at the same time.
 1.381 05-Apr-2020  ad branches: 1.381.2;
Allocate PV entries in PAGE_SIZE chunks, and cache partially allocated PV
pages with the pmap. Worth about 2-3% sys time on build.sh for me.
 1.380 22-Mar-2020  ad x86 pmap:

- Give pmap_remove_all() its own version of pmap_remove_ptes() that on native
x86 does the bare minimum needed to clear out PTPs. Cuts ~4% sys time on
'build.sh release' for me.

- pmap_sync_pv(): there's no need to issue a redundant TLB shootdown. The
caller waits for the competing operation to finish.

- Bring 'options TLBSTATS' up to date.
 1.379 20-Mar-2020  ad - pmap_extract(): This needs to take the pmap's lock, to allow for
concurrent removal of pages (a new requirement).

- pmap_remove_pv(): Keep hold time of pp_lock as short as possible.

- pmap_get_ptp(): Don't re-init struct pmap_page for PD PTPs. Would
have no ill effects but is wrong regardless.
 1.378 19-Mar-2020  ad PR port-amd64/55083 (assertion "pmap->pm_stats.resident_count == PDP_SIZE" failed)

Reported-by: syzbot+2c1e17352173a60eec23@syzkaller.appspotmail.com

Don't screw up resident_count in failure path.
 1.377 18-Mar-2020  ad Pacify assertion in a failure path.

Reported-by: syzbot+e666891e2bc5caee14d8@syzkaller.appspotmail.com
 1.376 17-Mar-2020  ad - Change some expensive checks DEBUG -> DIAGNOSTIC.
- Mark some small functions inline.
- Add an assertion.
 1.375 17-Mar-2020  ad - pmap_enter(): under low memory conditions, if PTP allocation succeeded and
then PV entry allocation failed, PTP pages were being freed without their
struct pmap_page being reset back to the non-PTP setup, which then caused
havoc with pmap_page_removed(). Fix it.

- pmap_enter_pv(): don't do the PV check if memory allocation failed.

Reported-by: syzbot+d9b42238107c155ca0cd@syzkaller.appspotmail.com
Reported-by: syzbot+80cf4850dc1cf29901dc@syzkaller.appspotmail.com
 1.374 17-Mar-2020  ad Hallelujah, the bug has been found. Resurrect prior changes, to be fixed
with following commit.
 1.373 17-Mar-2020  ad Back out the recent pmap changes until I can figure out what is going on
with pmap_page_remove() (to pmap.c rev 1.365).
 1.372 17-Mar-2020  ad - Add more assertions.

- Range clipping for pmap_remove(): only need to keep track of the lowest VA
in PTP, as ptp->wire_count provides an upper bound. D'oh. Move set of
range to where there is already a writeback to the PTP.

- pmap_pp_remove(): panic if pmap_sync_pv() returns an error, because it means
something has gone very wrong. The PTE should not change here since the
pmap is locked.

- pmap_pp_clear_attrs(): wait for the competing V->P operation by acquiring
and releasing the pmap's lock, rather than busy looping.

- pmap_test_attrs(): this needs to wait for any competing operations,
otherwise it could return without all necessary updates reflected in
pp_attrs.

- pmap_enter(): fix cut-n-paste screwup in an error path for Xen.
 1.371 17-Mar-2020  ad Add a bunch of assertions.
 1.370 15-Mar-2020  ad Fix a comment.
 1.369 15-Mar-2020  ad - pmap_enter(): Remove cosmetic differences between the EPT & native cases.
Remove old code to free PVEs that should not be there that caused panics
(merge error moving between source trees on my part).

- pmap_destroy(): pmap_remove_all() doesn't work for EPT yet, so need to catch
up on deferred PTP frees manually in the EPT case.

- pp_embedded: Remove it. It's one more variable to go wrong and another
store to be made. Just check for non-zero PTP pointer & non-zero VA
instead.
 1.368 15-Mar-2020  ad pmap_enter(): look directly in the tree for old PVE when installing an
unmanaged mapping, because there is no existing pmap_page to check in
the shortcut path (it traps).

pv_pte_next(): don't assert pp_embedded because it could have been removed
(during pmap_pp_remove()).
 1.367 14-Mar-2020  ad Re: kern/55071 (Panic shortly after running X11 due to kernel diagnostic assertion "mutex_owned(&pp->pp_lock)")

pmap_pp_remove(): get rid of a "goto" to make it clearer what's going on.
 1.366 14-Mar-2020  ad PR kern/55071 (Panic shortly after running X11 due to kernel diagnostic assertion "mutex_owned(&pp->pp_lock)")

- Fix a locking bug in pmap_pp_clear_attrs() and in pmap_pp_remove() do the
TLB shootdown while still holding the target pmap's lock.

Also:

- Finish PV list locking for x86 & update comments around same.

- Keep track of the min/max index of PTEs inserted into each PTP, and use
that to clip ranges of VAs passed to pmap_remove_ptes().

- Based on the above, implement a pmap_remove_all() for x86 that clears out
the pmap in a single pass. Makes exit() / fork() much cheaper.
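The min/max tracking mentioned in 1.366 can be illustrated with a small sketch. It works on PTE indices within one PTP rather than on VAs, and every name in it (sketch_ptp, sketch_ptp_note, sketch_ptp_clip) is hypothetical; it only shows the idea of clipping a removal range to the window that can contain entries.

```c
#include <assert.h>

#define SKETCH_NPTE 512	/* PTEs per page-table page on amd64 */

/* Per-PTP record of the lowest and highest index ever populated. */
struct sketch_ptp {
	int min_idx;	/* SKETCH_NPTE while the PTP is empty */
	int max_idx;	/* -1 while the PTP is empty */
};

static void
sketch_ptp_init(struct sketch_ptp *ptp)
{
	ptp->min_idx = SKETCH_NPTE;
	ptp->max_idx = -1;
}

/* Record that a PTE was inserted at index idx. */
static void
sketch_ptp_note(struct sketch_ptp *ptp, int idx)
{
	assert(idx >= 0 && idx < SKETCH_NPTE);
	if (idx < ptp->min_idx)
		ptp->min_idx = idx;
	if (idx > ptp->max_idx)
		ptp->max_idx = idx;
}

/*
 * Clip the half-open index range [*start, *end) to the populated
 * window. Returns 0 if no entry in the range can be populated.
 */
static int
sketch_ptp_clip(const struct sketch_ptp *ptp, int *start, int *end)
{
	if (*start < ptp->min_idx)
		*start = ptp->min_idx;
	if (*end > ptp->max_idx + 1)
		*end = ptp->max_idx + 1;
	return *start < *end;
}
```

A caller asked to remove the whole PTP range would then scan only the clipped window instead of all 512 slots.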
 1.365 14-Mar-2020  ad pmap_remove_all(): Return a boolean value to indicate the behaviour. If
true, all mappings have been removed, the pmap is totally cleared out, and
UVM can then avoid doing the work to call pmap_remove() for each map entry.
If false, either nothing has been done, or some helpful arch-specific voodoo
has taken place.
 1.364 14-Mar-2020  maxv On amd64, mark the whole tree as NX. No real functional change, just to
prevent possible future surprises, and to make it a little harder to map
executable pages in ROP chains.
 1.363 10-Mar-2020  ad - pmap_check_inuse() is expensive so make it DEBUG not DIAGNOSTIC.

- Put PV locking back in place with only a minor performance impact.
pmap_enter() still needs more work - it's not easy to satisfy all the
competing requirements so I'll do that with another change.

- Use pmap_find_ptp() (lookup only) in preference to pmap_get_ptp() (alloc).
Make pm_ptphint indexed by VA not PA. Replace the per-pmap radixtree for
dynamic PV entries with a per-PTP rbtree. Cuts system time during kernel
build by ~10% for me.
 1.362 04-Mar-2020  ad pmap_enter(): ditch pv_entry if unmanaged. Shouldn't happen I think, but
do for the sake of correctness.
 1.361 01-Mar-2020  ad - Give pmap uvm_objects an empty pagerops to avoid special casing in UVM.
(This use of uvm_object causes a disproportionate amount of work.)

- Undo the pmap_destroy()/pmap_delref() split. I misunderstood the flow of
control, and there's no need for this.

- For pmap_remove_pv(), always look up the pv_entry in advance as those
calls will need to be covered by lock again soon.
 1.360 29-Feb-2020  ad PR kern/55033: kernel panics when starting X

Remove the uvm_page_owner_locked_p() assertions in the x86 pmap. The DRM
code doesn't follow the locking protocol (it's OK though, since pages aren't
changing identity) and having thought about it more we're most likely going
to have to do full PV locking to make progress on concurrent fault handling,
ergo assertions not so important.
 1.359 23-Feb-2020  ad The PV locking changes are expensive and not needed yet, so back them
out for the moment. I want to find a cheaper approach.
 1.358 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.357 21-Feb-2020  maxv In pmap_changeprot_local(), drop the dirty bit along with the write bit.
 1.356 21-Feb-2020  maxv Add comments.
 1.355 12-Jan-2020  ad x86 pmap:

- It turns out that every page the pmap frees is necessarily zeroed. Tell
the VM system about this and use the pmap as a source of pre-zeroed pages.

- Redo deferred freeing of PTPs more elegantly, including the integration with
pmap_remove_all(). This fixes problems with nvmm, and possibly also a crash
discovered during fuzzing.

Reported-by: syzbot+a97186518c84f1d85c0c@syzkaller.appspotmail.com
 1.354 07-Jan-2020  ad branches: 1.354.2;
pmap_ept_enter(): PTE -> EPT in two places.
 1.353 04-Jan-2020  ad x86 pmap improvements, reducing system time during a build by about 15% on
my test machine:

- Replace the global pv_hash with a per-pmap record of dynamically allocated
pv entries. The data structure used for this can be changed easily, and
has no special concurrency requirements. For now go with radixtree.

- Change pmap_pdp_cache back into a pool; cache the page directory with the
pmap, and avoid contention on pmaps_lock by adjusting the global list in
the pool_cache ctor & dtor. Align struct pmap and its lock, and update
some comments.

- Simplify pv_entry lists slightly. Allow both PP_EMBEDDED and dynamically
allocated entries to co-exist on a single page. This adds a pointer to
struct vm_page on x86, but shrinks pv_entry to 32 bytes (which also gets
it nicely aligned).

- More elegantly solve the chicken-and-egg problem introduced into the pmap
with radixtree lookup for pages, where we need PTEs mapped and page
allocations to happen under a single hold of the pmap's lock. While here
undo some cut-n-paste.

- Don't adjust pmap_kernel's stats with atomics, because its mutex is now
held in the places the stats are changed.
 1.352 02-Jan-2020  ad Back the pv_hash stuff out. Now seeing errors from ATOMIC_*.
For another day.
 1.351 02-Jan-2020  ad Remove unused argument to pmap_remove_pv().
 1.350 02-Jan-2020  ad Replace the pv_hash_locks with atomic ops.

Leave the hash table at the same size for now: with the hash table size
doubled, system time for a build drops 10-15%, but user time starts to rise
suspiciously, presumably because the cache is wrecked. Need to try another
data structure.
 1.349 31-Dec-2019  ad Rename uvm_page_locked_p() -> uvm_page_owner_locked_p()
 1.348 22-Dec-2019  ad pmap_get_ptp(): the uvm_pagefree() call in the failure case can block too.
Pacify the assertion in pmap_unmap_ptes().

XXX Revisit and solve this chicken-and-egg problem in a more elegant way.

Reported-by: syzbot+24967905b8d17344581c@syzkaller.appspotmail.com
 1.347 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.346 16-Dec-2019  ad pmap_unmap_ptes(): ci_want_pmapload isn't dependent on TLB state.
 1.345 15-Dec-2019  ad - Share common code between pmap_load() and pmap_map_ptes().
- Make pmap_map_ptes() better tolerate recovery from blocking.
 1.344 15-Dec-2019  ad uvm_pagerealloc() can now block because of radixtree manipulation, so defer
freeing PTPs until pmap_unmap_ptes(), where we still have the pmap locked
but can finally tolerate context switches again.

To be revisited soon: pmap_map_ptes() seems broken WRT other pmap load.

Reported-by: syzbot+689fb7dab41abff8e75a@syzkaller.appspotmail.com
Reported-by: syzbot+3e7bbf37d37d451b25d7@syzkaller.appspotmail.com
 1.343 08-Dec-2019  ad Merge x86 pmap changes from yamt-pagecache:

- Deal better with the multi-level pmap object locking kludge.
- Handle uvm_pagealloc() being able to block.
 1.342 03-Dec-2019  riastradh Use __insn_barrier to enforce ordering in l_ncsw loops.

(Only need ordering observable by interruption, not by other CPUs.)
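The retry pattern behind 1.342 can be sketched as below. This is an illustration under assumptions: sketch_insn_barrier() stands in for NetBSD's __insn_barrier() as a compiler-only barrier, and struct sketch_lwp with its ncsw and cpu fields is a hypothetical stand-in for the real lwp structure.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier; no CPU fence is emitted. */
#define sketch_insn_barrier()	__asm__ volatile("" ::: "memory")

struct sketch_lwp {
	unsigned long ncsw;	/* context-switch counter (stand-in for l_ncsw) */
	int cpu;		/* CPU the thread is currently running on */
};

/* Return a CPU id that was sampled with no context switch in between. */
static int
sketch_stable_cpu(struct sketch_lwp *l)
{
	unsigned long ncsw;
	int cpu;

	assert(l != NULL);
	do {
		ncsw = l->ncsw;
		sketch_insn_barrier();	/* read the counter before the work */
		cpu = l->cpu;
		sketch_insn_barrier();	/* finish the work before re-checking */
	} while (ncsw != l->ncsw);

	return cpu;
}
```

The barriers only stop the compiler from reordering or caching the loads, which matches the log's remark that the ordering has to be observable by interruption on the same CPU, not by other CPUs.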
 1.341 16-Nov-2019  maxv Add a NULL check on the structure pointer, not to retrieve its first field
if it is NULL. The previous code was not buggy strictly speaking. This
change probably doesn't change anything, except removing assumptions in the
compiler optimization passes, which probably doesn't change anything either in
this case.

Reported-by: syzbot+110b29c1973f38a38026@syzkaller.appspotmail.com
 1.340 14-Nov-2019  maxv Mark several kASan functions with __nothing, to avoid annoying #ifdefs.
Same as kCSan and kMSan.
 1.339 14-Nov-2019  maxv Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is substantial, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.
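The packing of a type code and a compressed pointer into one uint32_t can be sketched as follows. The log does not give the bit layout, so the 3-bit/29-bit split, the base-relative compression, and all sk_ names here are assumptions made purely for illustration, not the kMSan encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 3-bit type code above a 29-bit compressed pointer. */
#define SK_ORIG_PTR_BITS	29
#define SK_ORIG_PTR_MASK	((UINT32_C(1) << SK_ORIG_PTR_BITS) - 1)

enum sk_orig_type { SK_ORIG_STACK = 0, SK_ORIG_POOL = 1, SK_ORIG_KMEM = 2 };

/* Pack a type code and a pointer, compressed as an offset from base. */
static uint32_t
sk_orig_encode(enum sk_orig_type type, uintptr_t ptr, uintptr_t base)
{
	uintptr_t off = ptr - base;

	/* The offset must fit in the pointer field. */
	assert(ptr >= base && off <= SK_ORIG_PTR_MASK);
	return ((uint32_t)type << SK_ORIG_PTR_BITS) | (uint32_t)off;
}

/* Extract the type code from a packed cell. */
static enum sk_orig_type
sk_orig_get_type(uint32_t orig)
{
	return (enum sk_orig_type)(orig >> SK_ORIG_PTR_BITS);
}

/* Recover the full pointer from a packed cell and the same base. */
static uintptr_t
sk_orig_get_ptr(uint32_t orig, uintptr_t base)
{
	return base + (orig & SK_ORIG_PTR_MASK);
}
```

Because both pieces live in the cell itself, no side table is needed to map a cell back to its origin, which is the property the log describes.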

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
 1.338 13-Nov-2019  maxv Rename:
PP_ATTRS_M -> PP_ATTRS_D
PP_ATTRS_U -> PP_ATTRS_A
For consistency.
 1.337 30-Oct-2019  maxv Switch to new PTE bits.
 1.336 05-Oct-2019  maxv Switch to the new PTE naming:

PG_PVLIST -> PTE_PVLIST
PG_W -> PTE_WIRED
PG_FRAME -> PTE_FRAME

No functional change.
 1.335 07-Aug-2019  maxv Add support for USER_LDT in SVS. This allows us to have both enabled at
the same time.

We allocate an LDT for each CPU in the GDT and map an area for it, in
addition to the default LDT already present. In context switches between
different processes, we choose between the default or the per-cpu LDT
selector: if the user set specific LDT entries, we memcpy them to the
per-cpu LDT and load the per-cpu selector.

Tested by Naveen Narayanan (with Wine on amd64).
 1.334 01-Jun-2019  maxv branches: 1.334.2;
Fix two bugs in pmap_write_protect():

* The mask should be ~PAGE_MASK, not PTE_FRAME. PTE_FRAME eliminates the
higher bits, and that's not wanted.
* The computation of tva is incorrect: if the VA is in kernel space we
must take the canonical hole into account, and here we were not.

We've had these bugs basically forever. It meant that uvm_km_protect()
would never flush the correct VA, and a stale TLB entry would persist.

Fixes PR/54257. Since I added PCID support we execute invpcid in invlpg(),
and invpcid triggers a #GP if the address is non-canonical, contrary to
invlpg. The wrong computation of the VA during a modload happened to hit
the canonical hole.
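The first bug fixed in 1.334 can be reproduced with plain integer arithmetic. The sketch below assumes the usual amd64 values (4KB pages, a PTE frame mask covering bits 12 to 51, 48-bit canonical addresses); the sketch_ names are hypothetical, and sketch_flush() merely models an instruction that rejects non-canonical addresses.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_MASK	UINT64_C(0x0000000000000fff)
#define SKETCH_PTE_FRAME	UINT64_C(0x000ffffffffff000)

/* True if bits 63..47 are all equal (48-bit canonical form). */
static int
sketch_is_canonical(uint64_t va)
{
	uint64_t top = va >> 47;	/* the 17 bits from 47 to 63 */

	return top == 0 || top == UINT64_C(0x1ffff);
}

/* Buggy truncation: also strips the sign-extension bits of the VA. */
static uint64_t
sketch_trunc_wrong(uint64_t va)
{
	return va & SKETCH_PTE_FRAME;
}

/* Fixed truncation: clears only the page offset. */
static uint64_t
sketch_trunc_right(uint64_t va)
{
	return va & ~SKETCH_PAGE_MASK;
}

/* Model of a flush that, like invpcid, faults on non-canonical input. */
static void
sketch_flush(uint64_t va)
{
	assert(sketch_is_canonical(va));
}
```

For a kernel address such as 0xffffffff80123456, the buggy mask yields 0x000fffff80123000, which lies in the canonical hole, while the fixed mask yields 0xffffffff80123000.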
 1.333 27-May-2019  maxv Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.
 1.332 27-May-2019  maxv Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.
 1.331 12-Mar-2019  gson Add missing space in "wiring for pmap .. did not change" message
 1.330 10-Mar-2019  maxv Two changes:

* Allow large pages to be passed in pmap_pdes_valid, this happens under
DDB when it reads RIP (.text), called via pmap_extract.

* Invert a branch in pmap_extract, so that 'l_cpu' is not touched if we're
dealing with the kernel pmap.

This fixes 'boot -d'.
 1.329 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.328 07-Mar-2019  maxv Drop PG_RO, PG_KR and PG_PROT, they are useless and create confusion.
 1.327 23-Feb-2019  maxv Move PATENTRY into pmap.h, will be used outside.
 1.326 23-Feb-2019  maxv Add support for CPUs that don't have the EPT_{A,D} bits.

On such CPUs, these bits are ignored by the hardware. We don't care about
setting them, however, we must always assume they are set. Modify the pmap
code to do that.

While here, in pmap_ept_remove_pte, don't flush the TLB when it's not
needed.

Tested on an old Intel Celeron.
 1.325 21-Feb-2019  maxv Remove wrong KASSERT in EPT, and reorder the code to reduce duplication.
 1.324 18-Feb-2019  maxv Fix stupid mistake, I didn't reflect correctly the behavior of pmap_sync_pv
in the EPT callback, 'optep' can be NULL.
 1.323 14-Feb-2019  cherry Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.322 13-Feb-2019  maxv Add the EPT pmap code, used by Intel-VMX.

The idea is that under NVMM, we don't want to implement the hypervisor page
tables manually in NVMM directly, because we want pageable guests; that is,
we want to allow UVM to unmap guest pages when the host comes under
pressure.

Contrary to AMD-SVM, Intel-VMX uses a different set of PTE bits from
native, and this has three important consequences:

- We can't use the native PTE bits, so each time we want to modify the
page tables, we need to know whether we're dealing with a native pmap
or an EPT pmap. This is accomplished with callbacks, that handle
everything PTE-related.

- There is no recursive slot possible, so we can't use pmap_map_ptes().
Rather, we walk down the EPT trees via the direct map, and that's
actually a lot simpler (and probably faster too...).

- The kernel is never mapped in an EPT pmap. An EPT pmap cannot be loaded
on the host. This has two sub-consequences: at creation time we must
zero out all of the top-level PTEs, and at destruction time we force
the page out of the pool cache and into the pool, to ensure that a next
allocation will invoke pmap_pdp_ctor() to create a native pmap and not
recycle some stale EPT entries.

To create an EPT pmap, the caller must invoke pmap_ept_transform() on a
newly-allocated native pmap. And that's about it, from then on the EPT
callbacks will be invoked, and the pmap can be destroyed via the usual
pmap_destroy(). The TLB shootdown callback is not initialized however,
it is the responsibility of the hypervisor (NVMM) to set it.

There are some twisted cases that we need to handle. For example if
pmap_is_referenced() is called on a physical page that is entered both by
a native pmap and by an EPT pmap, we take the Accessed bits from the
two pmaps using different PTE sets in each case, and combine them into a
generic PP_ATTRS_U flag (that does not depend on the pmap type).

Given that the EPT layout is a 4-Level tree with the same address space as
native x86_64, we allow ourselves to use a few native macros in EPT, such
as pmap_pa2pte(), rather than re-defining them with "ept" in the name.

Even though this EPT code is rather complex, it is not too intrusive: just
a few callbacks in a few pmap functions, predicted-false to give priority
to native. So this comes with no messy #ifdef or performance cost.
 1.321 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.320 02-Feb-2019  cherry Switch NetBSD/xen to use XEN api tag RELEASE-4.11.1

The headers for this api are in sys/external/mit/xen-include-public/dist/
 1.319 01-Feb-2019  maxv Add the remaining pmap callbacks, will be used by NVMM-VMX.
 1.318 01-Feb-2019  maxv Change the format of the pp_attrs field: instead of using PTE bits
directly, use abstracted bits that are converted from/to PTE bits when
needed (in pmap_sync_pv).

This allows us to use the same pp_attrs for pmaps that have PTE bits at
different locations.
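The conversion idea in 1.318 can be sketched as below. The abstract attribute names (SK_ATTR_*) and function names are hypothetical; the bit positions are the architectural ones for native x86 PTEs (W = bit 1, A = bit 5, D = bit 6) and Intel EPT entries (W = bit 1, A = bit 8, D = bit 9).

```c
#include <assert.h>
#include <stdint.h>

/* Abstract per-page attribute bits, independent of the pmap type. */
#define SK_ATTR_ACCESSED	0x01u
#define SK_ATTR_DIRTY		0x02u
#define SK_ATTR_WRITABLE	0x04u
#define SK_ATTR_ALL	(SK_ATTR_ACCESSED | SK_ATTR_DIRTY | SK_ATTR_WRITABLE)

/* Native x86 PTE bits. */
#define SK_PTE_W	(UINT64_C(1) << 1)
#define SK_PTE_A	(UINT64_C(1) << 5)
#define SK_PTE_D	(UINT64_C(1) << 6)

/* Intel EPT entry bits. */
#define SK_EPT_W	(UINT64_C(1) << 1)
#define SK_EPT_A	(UINT64_C(1) << 8)
#define SK_EPT_D	(UINT64_C(1) << 9)

/* Translate native PTE bits into abstract attributes. */
static unsigned
sk_native_to_attrs(uint64_t pte)
{
	unsigned attrs = 0;

	if (pte & SK_PTE_A) attrs |= SK_ATTR_ACCESSED;
	if (pte & SK_PTE_D) attrs |= SK_ATTR_DIRTY;
	if (pte & SK_PTE_W) attrs |= SK_ATTR_WRITABLE;
	return attrs;
}

/* Translate EPT entry bits into the same abstract attributes. */
static unsigned
sk_ept_to_attrs(uint64_t epte)
{
	unsigned attrs = 0;

	if (epte & SK_EPT_A) attrs |= SK_ATTR_ACCESSED;
	if (epte & SK_EPT_D) attrs |= SK_ATTR_DIRTY;
	if (epte & SK_EPT_W) attrs |= SK_ATTR_WRITABLE;
	return attrs;
}

/* Translate abstract attributes back into native PTE bits. */
static uint64_t
sk_attrs_to_native(unsigned attrs)
{
	uint64_t pte = 0;

	assert((attrs & ~SK_ATTR_ALL) == 0);	/* reject unknown bits */
	if (attrs & SK_ATTR_ACCESSED) pte |= SK_PTE_A;
	if (attrs & SK_ATTR_DIRTY) pte |= SK_PTE_D;
	if (attrs & SK_ATTR_WRITABLE) pte |= SK_PTE_W;
	return pte;
}
```

With this shape, attributes gathered from a native mapping and from an EPT mapping of the same physical page can simply be OR-ed together.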
 1.317 31-Jan-2019  maxv Move some code into a separate function, no functional change.
 1.316 17-Jan-2019  maxv Simplify pmap_sync_pv: just pass a pa, and build the pte inside.
 1.315 17-Dec-2018  maxv Remove dead checks, they were already pointless when I fixed them a few
years ago, and now they are wrong because the PTE space is randomized.
 1.314 17-Dec-2018  maxv Add two pmap fields, will be used by NVMM-VMX. Also apply a few cosmetic
changes.
 1.313 07-Dec-2018  maxv Add an option to have a static kernel memory layout. This option is
disabled by default - that is to say, KASLR remains enabled by default.
 1.312 19-Nov-2018  maxv Introduce pl_pi, will be used soon.
 1.311 19-Nov-2018  maxv Rename 'mask' -> 'frame', we will use the real 'mask' soon.
 1.310 07-Nov-2018  maxv Add two pmap fields, will be used by NVMM.
 1.309 31-Oct-2018  maxv Move the MI parts of KASAN into kern/subr_asan.c. This file includes
machine/asan.h, which contains the MD functions. We use an include rather
than a plain C file, because we want GCC to optimize/inline some functions
into one single block.

The amd64 MD parts of KASAN are moved accordingly.

The naming convention we use is:

kasan_*
    a generic kasan object, declared in subr_asan.c
kasan_md_*
    an MD kasan object, declared in machine/asan.h, and used
    in subr_asan.c
__md_*
    an MD object, declared in machine/asan.h, and not used
    outside

Overall this makes it easier to add KASAN support on more architectures.

Discussed with several people.
 1.308 29-Sep-2018  cherry For i386 XEN3PAE_DOM0, use the "native" idt registration
infrastructure by removing the #ifndef XEN clause.

This will hopefully be the last commit to "fix" boot
breakage of XEN3PAE_DOM0

Thanks to bouyer@ for focused bug reports with

# xl dmesg
and relevant ddb> bt
 1.307 29-Aug-2018  maxv clean up a little
 1.306 29-Aug-2018  maxv Simplify the ASLR stuff, we don't care about resizable areas now, and it
makes the code more complicated for no good reason.
 1.305 22-Aug-2018  maxv Add support for monitoring the stack with kASan. This allows us to detect
illegal memory accesses occurring there.

The compiler inlines a piece of code in each function that adds redzones
around the local variables and poisons them. The illegal accesses are then
detected using the usual kASan machinery.

The stack size is doubled, from 4 pages to 8 pages.

Several boot functions are marked with the __noasan flag, to prevent the
compiler from adding redzones in them (because we haven't yet initialized
kASan). The kasan_early_init function is called early at boot time to
quickly create the shadow for the current stack; after this is done, we
don't need __noasan anymore in the boot path.

We pass -fasan-shadow-offset=0xDFFF900000000000, because the compiler
wants to do
    shad = shadow-offset + (addr >> 3)
and we do, in kasan_addr_to_shad
    shad = KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3)
hence
    shad = KASAN_SHADOW_START + (addr >> 3) - (CANONICAL_BASE >> 3)
         = [KASAN_SHADOW_START - (CANONICAL_BASE >> 3)] + (addr >> 3)
implies
    shadow-offset = KASAN_SHADOW_START - (CANONICAL_BASE >> 3)
                  = 0xFFFF800000000000 - (0xFFFF800000000000 >> 3)
                  = 0xDFFF900000000000
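The derivation above can be checked mechanically. The sketch below uses the two constants quoted in the log and verifies that the kernel's form and the compiler's form agree for addresses at or above CANONICAL_BASE; the sk_ names are hypothetical and the code is a check of the arithmetic, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CANONICAL_BASE	UINT64_C(0xFFFF800000000000)
#define SK_KASAN_SHADOW_START	UINT64_C(0xFFFF800000000000)
#define SK_SHADOW_OFFSET \
	(SK_KASAN_SHADOW_START - (SK_CANONICAL_BASE >> 3))

/* Kernel form: offset from the canonical base, then scale by 8. */
static uint64_t
sk_shad_kernel(uint64_t addr)
{
	assert(addr >= SK_CANONICAL_BASE);
	return SK_KASAN_SHADOW_START + ((addr - SK_CANONICAL_BASE) >> 3);
}

/* Compiler form: a single constant added to the scaled address. */
static uint64_t
sk_shad_compiler(uint64_t addr)
{
	return SK_SHADOW_OFFSET + (addr >> 3);
}
```

The two forms agree because CANONICAL_BASE is a multiple of 8, so the shift distributes over the subtraction without losing low bits.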

In UVM, we add a kasan_free (that is not preceded by a kasan_alloc). We
don't add poisoned redzones ourselves, but all the functions we execute
do, so we need to manually clear the poison before freeing the stack.

With the help of Kamil for the makefile stuff.
 1.304 20-Aug-2018  maxv Add support for kASan on amd64. Written by me, with some parts inspired
by Siddharth Muralee's initial work. This feature can detect several
kinds of memory bugs, and it's an excellent feature.

It can be enabled by uncommenting these three lines in GENERIC:

#makeoptions KASAN=1 # Kernel Address Sanitizer
#options KASAN
#no options SVS

The kernel is compiled without SVS, without DMAP and without PCPU area.
A shadow area is created at boot time, and it can cover the upper 128TB
of the address space. This area is populated gradually as we allocate
memory. With this design the memory consumption is kept at its lowest
level.

The compiler calls the __asan_* functions each time a memory access is
done. We verify whether this access is legal by looking at the shadow
area.

We declare our own special memcpy/memset/etc functions, because the
compiler's builtins don't add the __asan_* instrumentation.

Initially all the mappings are marked as valid. During dynamic
allocations, we add a redzone, which we mark as invalid. Any access on
it will trigger a kASan error message. Additionally, the compiler adds
a redzone on global variables, and we mark these redzones as invalid too.
The illegal-access detection works with a 1-byte granularity.
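
To illustrate how a shadow area can give 1-byte granularity, here is a toy C model. It assumes the shadow-byte encoding commonly used by address sanitizers (0 = all 8 bytes valid, 1..7 = only the first k bytes valid, negative = poisoned); the message does not state that the kernel uses exactly this encoding, and every name below is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE	8	/* one shadow byte covers 8 bytes */
#define ARENA_SIZE	64	/* size of the toy "address space" */

/* Zero-initialized: everything starts out valid, as in the message. */
static int8_t toy_shadow[ARENA_SIZE / SHADOW_SCALE];

/* Mark [off, off+len) valid; off must be a multiple of 8. */
static void
toy_mark_valid(size_t off, size_t len)
{
	size_t i = off / SHADOW_SCALE;

	for (; len >= SHADOW_SCALE; len -= SHADOW_SCALE)
		toy_shadow[i++] = 0;
	if (len > 0)
		toy_shadow[i] = (int8_t)len;	/* partial cell */
}

/* Poison a redzone; off and len must be multiples of 8. */
static void
toy_poison(size_t off, size_t len)
{
	for (size_t i = off / SHADOW_SCALE; i < (off + len) / SHADOW_SCALE; i++)
		toy_shadow[i] = -1;
}

/* Check every byte of an access against the shadow. */
static bool
toy_access_ok(size_t off, size_t len)
{
	for (size_t a = off; a < off + len; a++) {
		int8_t s = toy_shadow[a / SHADOW_SCALE];

		if (s < 0)
			return false;
		if (s > 0 && (a % SHADOW_SCALE) >= (size_t)s)
			return false;
	}
	return true;
}
```

In this model, a 13-byte allocation at offset 16 followed by an 8-byte redzone at offset 32 accepts a 13-byte access and rejects a 14-byte one.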

For now, we cover three areas:

- global variables
- kmem_alloc-ated areas
- malloc-ated areas

More will come, but that's a good start.
 1.303 18-Aug-2018  maxv Simplify the conditions. Fixes compilation of native amd64 without direct
map.
 1.302 12-Aug-2018  maxv More ASLR: randomize the location of the PTE area. The PTE slot is not
created in locore anymore, but a little later; by using the already
entered L4 page, rather than the recursive slot itself (which doesn't
exist yet).

In the prekern we still map the slot - the prekern behaves as an external
locore -, because we need it as part of the randomization/relocation
work. The kernel then removes this slot, and regenerates a randomized
one.

Tested on GENERIC and GENERIC_KASLR, Xen doesn't have it and dom0 still
boots fine.
 1.301 12-Aug-2018  maxv Move the PTE area from slot 255 to slot 509. I've never understood why we
put it on 255; the "kernel" half of the VM space begins on slot 256, so
if anything, the PTE area should have been above it, not below.

Virtually extend the user slots in slotspace, because we don't want
(randomized) kernel mappings to land on slot 255.

The prekern is updated accordingly.

Tested on GENERIC, GENERIC_KASLR and XEN3_DOM0.
 1.300 12-Aug-2018  maxv Introduce PDIR_SLOT_USERLIM, which indicates the limit of the user slots.
Use it instead of PDIR_SLOT_PTE when we just want to iterate over the
user slots. Also use it in SVS, I had hardcoded 255 because there was no
proper define (which there now is).
 1.299 12-Aug-2018  maxv Reduce the minefield: zero out the pdir only once, at the beginning of
the function. This eliminates one assumption on the order of the VM
areas.
 1.298 12-Aug-2018  maxv Randomize the main memory on Xen, same as native. Tested on amd64-dom0.
 1.297 12-Aug-2018  maxv Take the last area into account, there is a hole before it.
 1.296 12-Aug-2018  maxv More ASLR: randomize the kernel main memory. VM_MIN_KERNEL_ADDRESS becomes
variable, and its location is chosen at boot time. There is room for
improvement, since for now we ask for an alignment of NBPD_L4.

This is enabled by default in GENERIC, but not in Xen. Tested extensively
on GENERIC and GENERIC_KASLR, XEN3_DOM0 still boots fine.
 1.295 26-Jul-2018  maxv Remove the non-PAE-i386 code of Xen. The branches are reordered so that
__x86_64__ comes first, eg:

#if defined(PAE)
/* i386+PAE */
#elif defined(__x86_64__)
/* amd64 */
#else
/* i386 */
#endif

becomes

#ifdef __x86_64__
/* amd64 */
#else
/* i386+PAE */
#endif

Tested on i386pae-domU and amd64-dom0.
 1.294 26-Jul-2018  maxv Remove useless/outdated comments. No functional change.
 1.293 21-Jul-2018  maxv I realized the changes I made broke the !aslr conf, so enable aslr by
default now rather than later (and rather than adding more ifdefs).

Now the location of the direct map is randomized at boot time in GENERIC.
 1.292 21-Jul-2018  maxv More ASLR. Randomize the location of the direct map at boot time on amd64.
This doesn't need "options KASLR" and works on GENERIC. Will soon be
enabled by default.

The location of the areas is abstracted in a slotspace structure. Ideally
we should always use this structure when touching the L4 slots, instead of
the current cocktail of global variables and constants.

machdep initializes the structure with the default values, and we then
randomize its dmap entry. Ideally machdep should randomize everything at
once, but in the case of the direct map its size is determined a little
later in the boot procedure, so we're forced to randomize its location
later too.
 1.291 20-Jun-2018  maxv branches: 1.291.2;
Use PMAP_DIRECT_UNMAP.
 1.290 19-May-2018  jdolecek Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability issues and limited, complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.289 04-Mar-2018  jdolecek branches: 1.289.2;
adjust the pmap_check_inuse() fix to avoid #ifdef
 1.288 04-Mar-2018  kre Declare our variables before using them, fixes DIAGNOSTIC amd64 XEN builds.
 1.287 04-Mar-2018  jdolecek reduce indentation in pmap_activate(), NFCI
 1.286 04-Mar-2018  jdolecek move DIAGNOSTIC code for checking in-use pmap from pmap_destroy()
to a separate function for readability
 1.285 04-Mar-2018  jdolecek dedup pmap_pdirpa() KASSERT() in pmap_reactivate(), pmap_load(), and
pmap_deactivate()
 1.284 04-Mar-2018  jdolecek dedup code around pmap_reactivate() - do the actual TLB flush also
 1.283 04-Mar-2018  jdolecek drop pmap_update_2pg(), just call pmap_update_pg() separately for each
 1.282 01-Mar-2018  maxv Replace PG_G by pmap_pg_g, for the sake of removing references to the
former. No functional change since pmap_pg_g = PG_G.
 1.281 18-Feb-2018  maxv Add svs_enabled, which defaults to 'true' when SVS is compiled (no dynamic
detection yet).
 1.280 17-Feb-2018  maxv Add svs_init. This is where we will detect the CPU and decide whether
to turn SVS on or not.

Add svs_pgg_update to dynamically add/remove PG_G from all the kernel
pages. Use it now.
 1.279 20-Jan-2018  maxv Mmh, restore PG_G on the direct map, we still want that in the non-SVS
case.
 1.278 07-Jan-2018  maxv Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.
 1.277 05-Jan-2018  martin Mark L1e_idx as __diagused, it is only referenced in a KASSERT.
 1.276 05-Jan-2018  maxv Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved to L4 slot 384, which means address
0xffffc00000000000.
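
The slot-to-address relation can be sketched in standalone C as follows. It assumes the usual amd64 4-level layout (512 L4 slots of 2^39 bytes each, 48-bit canonical addresses); the function name is hypothetical and this is not the kernel's own macro.

```c
#include <stdint.h>

#define L4_SHIFT	39			/* each L4 slot spans 512GB */
#define NBPD_L4		(1ULL << L4_SHIFT)

/* Base virtual address of an L4 slot, sign-extended from bit 47. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	if (va & (1ULL << 47))
		va |= 0xFFFF000000000000ULL;	/* canonical upper half */
	return va;
}
```

With this, slot 384 yields 0xffffc00000000000 and slot 256 yields 0xffff800000000000, the start of the kernel half.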

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
 1.275 04-Jan-2018  maxv Allocate the TSS area dynamically. This way cpu_info and cpu_tss can be
put in separate pages.
 1.274 04-Jan-2018  maxv Group the different TSSes into a cpu_tss structure. And pack this
structure to make sure there is no padding between 'tss' and 'iomap'.
 1.273 03-Jan-2018  maxv style
 1.272 03-Jan-2018  maxv simplify
 1.271 31-Dec-2017  maxv Ah, finally found you. Fix two bugs in pmap_remap_largepages(), that
could cause KASLR kernels to crash early during the boot procedure.

pmap_remap_largepages assumes that the kernel is far from the end of
the VM space, but this assumption does not hold with KASLR, since the
kernel sections are allowed to reside in the very last page of the VM
space.

Doing +NBPD_L2 or roundup() in such cases caused an integer overflow,
which caused a page fault when touching &L2_BASE, which in turn caused
an immediate CPU reset and a reboot.
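
The wraparound can be reproduced in a few lines of standalone C. The helper below has the same shape as the classic roundup() macro but is written here as a hypothetical function; the checked variant is only one possible way to detect the overflow, not necessarily what the fix does.

```c
#include <stdint.h>

#define NBPD_L2	(1ULL << 21)	/* 2MB large page */

/* Same arithmetic as the classic roundup() macro. */
static uint64_t
roundup_u64(uint64_t x, uint64_t y)
{
	return ((x + (y - 1)) / y) * y;
}

/* Overflow-aware variant: returns -1 instead of wrapping around. */
static int
roundup_checked(uint64_t x, uint64_t y, uint64_t *out)
{
	if (x > UINT64_MAX - (y - 1))
		return -1;
	*out = ((x + (y - 1)) / y) * y;
	return 0;
}
```

For an address in the last 2MB of the 64-bit space, such as 0xFFFFFFFFFFF00000, the unchecked version wraps around and returns 0.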

Took me a while to reproduce and debug this issue.
 1.270 28-Dec-2017  maxv Use variables in PMAP_DIRECT_*, so that the location of the direct map can
change.
 1.269 28-Dec-2017  maxv Eliminate the assumption that the beginning of the direct map is aligned
to NBPD_L4 and NBPD_L3. It won't be once we randomize its location.
 1.268 28-Dec-2017  maxv Downgrade the direct map from 1GB superpages to 2MB large pages, and
simplify. Then, map the "head" region and the kernel segments as RO instead
of RW, to kill the last place that has .text mapped as writable. It will
also allow for a greater number of possibilities when we will randomize
the direct map.

While it is true that this change theoretically reduces performance a bit,
we are more interested in correctness.
 1.267 22-Nov-2017  christos Avoid NPE.
 1.266 20-Nov-2017  chs In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
 1.265 15-Nov-2017  maxv Support large pages on KASLR kernels, in a way that does not reduce
randomness, but on the contrary that increases it.

The size of the kernel sub-blocks is changed to be 1MB. This produces a
kernel with sections that are always < 2MB in size, that can fit a large
page.

Each section is put in a 2MB physical chunk. In this chunk, there is a
padding of approximately 1MB. The prekern uses a random offset aligned to
sh_addralign, to shift the section in physical memory.
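
As a rough illustration only, the shift could be computed along these lines. The message does not give the prekern's actual algorithm, so the function, its parameters and the way the random value is reduced are all assumptions; the sketch only preserves the two stated properties (the offset is aligned to sh_addralign and the section stays inside its 2MB chunk).

```c
#include <stdint.h>

#define CHUNK_SIZE	(2ULL << 20)	/* 2MB physical chunk */

/*
 * Hypothetical helper: pick an offset inside the chunk padding.
 * secsize is the section size, align stands in for sh_addralign,
 * rnd is a random value supplied by the caller.
 */
static uint64_t
section_shift(uint64_t secsize, uint64_t align, uint64_t rnd)
{
	uint64_t room;

	if (secsize >= CHUNK_SIZE || align == 0)
		return 0;			/* no padding to play with */
	room = CHUNK_SIZE - secsize;		/* bytes of padding */
	return (rnd % (room + 1)) / align * align;	/* round down to align */
}
```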

For example, physical memory layout created by the bootloader for .text.4
and .rodata.0:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|+---------------+                  |+---------------+                  |
||    .text.4    |       PAD        ||   .rodata.0   |       PAD        |
|+---------------+                  |+---------------+                  |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
PA                               PA+2MB                            PA+4MB

Then, physical memory layout, after having been shifted by the prekern:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
| P   +---------------+             |       +---------------+           |
| A   |    .text.4    |     PAD     |  PAD  |   .rodata.0   |    PAD    |
| D   +---------------+             |       +---------------+           |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
PA                               PA+2MB                            PA+4MB

The kernel maps these 2MB physical chunks with 2MB large pages. Therefore,
randomness is enforced at both the virtual and physical levels, and the
resulting entropy is higher than that of our implementation until now.

The padding around the section is filled by the prekern. Not to consume
too much memory, the sections that are smaller than PAGE_SIZE are mapped
with normal pages - because there is no point in optimizing them. In these
normal pages, the same shift is applied.

This change has two additional advantages: (a) the cache attacks based on
the TLB are mostly mitigated, because even if you are able to determine
that a given page-aligned range is mapped as executable you don't know
where exactly within that range the section actually begins, and (b) given
that we are slightly randomizing the physical layout we are making some
rare physical attacks more difficult to conduct.

NOTE: after this change you need to update GENERIC_KASLR / prekern /
bootloader.
 1.264 11-Nov-2017  maxv Modify the layout of the bootspace structure, in such a way that it can
contain several kernel segments of the same type (eg several .text
segments). Some parts are still a bit messy but will be cleaned up soon.

I cannot compile-test this change on i386, but it seems fine enough.

NOTE: you need to rebuild and reinstall a new prekern after this change.
 1.263 29-Oct-2017  maxv Add a fifth region, called "head". On kaslr kernels it contains the ELF
Header and the ELF Section Headers. On normal kernels it is empty (the
headers are in the "boot" region).

Note: if you're using GENERIC_KASLR, you also need to rebuild the prekern.
 1.262 08-Oct-2017  maxv Use roundup instead. Otherwise some (userland) pages could get mapped in
the text large pages. We were using rounddown to improve performance on i386
(mapping the text with large pages even if it was not aligned). But we're
in a state where correctness matters more than performance - the correct way
to get performance here is to align .text to 4MB.
 1.261 08-Oct-2017  maxv KASLR: add workarounds to compute the bootinfo VAs (use the direct map),
and don't use large pages yet. Both will be fixed later.
 1.260 30-Sep-2017  maxv use bootspace (this branch is never taken)
 1.259 30-Sep-2017  maxv Declare pmap_remap_global, and map the four regions independently with
bootspace.
 1.258 30-Sep-2017  maxv Add a bootspace structure. It describes the physical and virtual space
layout created by the early kernel bootstrap code. Start using it, and
eliminate several references to KERNBASE and other global symbols. While
here clean up xen-i386, it's really tiring.
 1.257 12-Sep-2017  mrg minor KNF.
 1.256 28-Jul-2017  riastradh #if DIAGNOSTIC panic ---> KASSERTMSG
 1.255 22-Jul-2017  maxv Initialize these kpm fields in pmap_bootstrap.
 1.254 25-Jun-2017  bouyer Xen/i386PAE is special, in that top-level entries are not in per-pmap
tables but per-CPU pages. pmap_alloc_level() takes care of making new
entries active when the kernel pmap is updated, so always use pmap_kernel()
in this case too.
 1.253 25-Jun-2017  bouyer Page tables are not writable under Xen, so we can't memcpy() to them.
Rewrite to do the copy using pmap_pte_set() in the Xen case.
 1.252 23-Jun-2017  jdolecek remove panicstr KASSERT() in pmap_kremove_local() - a kernel dump can legitimately
be invoked without a panic too - via reboot -d

fixes PR kern/49610 by Manuel Bouyer
 1.251 15-Jun-2017  christos tidy up printf/kasserts; no functional change.
 1.250 15-Jun-2017  maxv Fix a subtle but important bug in pmap_growkernel. When adding new toplevel
slots to pmap_kernel, we are implicitly using the recursive slot; but this
slot is in the active pmap, which may not be pmap_kernel. Therefore, adding
L4 slots is fine in itself, but when adding L3 slots the kernel faults
since the L4 slots that were just added are not active on the cpu.

So far this has never been triggered, because the current va limit makes it
impossible to add a new L4 slot, and i386 only has one level so the kernel
cannot fault in a lower level.

Now the tree is grown in the current pmap (cpm), copied into pmap_kernel,
and propagated in the other pmaps as expected.

Note that we're using CPUF_PRESENT, because this function may be called
early, before cpu0 is attached. It does add to the current mess in the
cpu attach code, so it will probably have to be revisited later.
 1.249 15-Jun-2017  maxv Mmh, correctly handle the physmem % lvl == 0 case. Don't know how I didn't
see this in the first place.
 1.248 15-Jun-2017  maxv Limit the size of the direct map with a 2MB granularity (instead of 1GB).
This way if there's a computation error somewhere we will fault earlier
instead of letting the cpu access non-present physmem - which may cause
some bizarre behavior.
 1.247 15-Jun-2017  maxv Reorder these loops to reduce the number of enter->flush. I figured out
yesterday that this has a clear impact: a system with 16TB of hard-coded
ram has a 4-second black screen when booting. Now we're down to < 0.5s.

It could be optimized more, but verily I don't have a machine with P1GB
right now.
 1.246 14-Jun-2017  maxv Give the direct map 32 slots (16TB of va). This matches MAXPHYSMEM, in
such a way that the direct map is no longer the limiting factor for high
memory systems.
 1.245 24-Mar-2017  maxv branches: 1.245.6;
Don't forget to flush the xpq queue, otherwise shit may happen.
 1.244 23-Mar-2017  maxv Remove PG_k completely.
 1.243 15-Mar-2017  maxv Add a comment to answer a question regarding privilege separation when
modifying a PTE from an active page tree. The question is from Manuel
Bouyer, and the answer is from me.
 1.242 09-Mar-2017  chs in pmap_get_ptp(), if we need to allocate multiple new ptp levels
and succeed in allocating some pages but fail to get them all,
free any ptps we did allocate before returning.
also, only consume kernel-reserve pages if pmap_enter()
is called without PMAP_CANFAIL set, to help avoid deadlocking
during high memory pressure.
 1.241 05-Mar-2017  maxv Should be PG_k, doesn't change anything.
 1.240 11-Feb-2017  maxv Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
 1.239 02-Feb-2017  maxv The first va should depend on the text offset, not the kernel base. Use
rounddown. Note: this value is still wrong, it should be roundup. But
that's another issue that will be fixed in amd64 soon.
 1.238 02-Feb-2017  maxv Use __read_mostly on these variables, to reduce the probability of false
sharing.
 1.237 22-Jan-2017  maxv Put pmap_pg_nx into the dummy Xen page. While here, do some KNF and
localify a bit.
 1.236 06-Jan-2017  maxv branches: 1.236.2;
Rename a few things
 1.235 22-Dec-2016  bouyer Xen doesn't need lapic so don't allocate a lapic VA/PA for Xen.
As a side effect this makes XEN3PAE boot again but I don't know why ...
 1.234 20-Dec-2016  maxv When the i386 port was designed, the bootstrap code needed little physical
memory, and taking it below the kernel image was fine: we had 160 free
pages, and never allocated more than 20. With amd64 however, we create a
direct map, and for this map we need a number of page table pages that is
mostly proportionate to the number of physical addresses available, which
implies that these 160 free pages may not be enough.

In particular, if the CPU does not support 1GB superpages, each 1GB chunk
of physical memory needs a 4k page in the direct map, which means that if
a machine has 160GB of ram, the bootstrap code allocates more than 160
pages, thereby overwriting the I/O mem area. If we push a little further,
if a machine has 512GB of ram, we allocate ~525 pages, and start
overwriting the kernel text, causing the system to go crazy at boot time.
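
The page counts can be estimated with a small standalone C sketch. It assumes 2MB large pages for the direct map (one 4K L2 page per 1GB, one L3 page per 512GB) and hypothetical helper names; it does not try to reproduce the exact ~525 figure, which presumably includes other bootstrap allocations.

```c
#include <stdint.h>

#define GB	(1ULL << 30)

/* One 4K L2 page holds 512 entries of 2MB, i.e. maps 1GB. */
static uint64_t
dmap_l2_pages(uint64_t physmem_bytes)
{
	return (physmem_bytes + GB - 1) / GB;
}

/* One 4K L3 page holds 512 entries of 1GB, i.e. maps 512GB. */
static uint64_t
dmap_l3_pages(uint64_t physmem_bytes)
{
	return (physmem_bytes + 512 * GB - 1) / (512 * GB);
}
```

For 160GB of RAM this already gives 160 L2 pages before counting the L3 page, which is why a 160-page budget cannot be enough.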

Fix this by moving the physical allocation area from below the kernel to above
it. avail_start is now beyond the kernel, and lowmem_rsvd indicates the
reserved low-memory pages. The area [lowmem_rsvd; IOM_BEGIN[ is
internalized into UVM, so there is no pa loss.

The only limit now is the pa of LAPIC, which is located at ~4GB of memory,
so it is perfectly fine.

This change theoretically adds va support for 512GB of ram; and it is a
prerequisite if we want to support more memory anyway.
 1.233 17-Dec-2016  maxv Use pmap_bootstrap_valloc and simplify. By the way, I think the cache
stuff is wrong, since the pte is not necessarily aligned to 64 bytes, so
nothing guarantees there is no false sharing.
 1.232 16-Dec-2016  maxv The way the xen dummy page is taken care of makes absolutely no sense at
all, with magic offsets here and there in different layers of the system.
It is just blind luck that everything has always worked as expected so
far.

Due to this wrong design we have a problem now: we allocate one physical
page for lapic, and it happens to overlap with the dummy page, which
causes the system to crash.

Fix this by keeping the dummy va directly in a variable instead of magic
offsets. The asm locore now increments the first pa to hide the dummy page
from machdep and pmap.
 1.231 13-Dec-2016  kamil Tear down KSTACK_CHECK_DR0, an i386-only feature to detect stack overflow

This feature was intended to detect stack overflow with CPU Debug Registers
(x86). It was never ported to other ports, not even amd64, and should be
adapted for SMP...

Currently there might be better ways to detect stack overflows like page
mapping protection. Since the number of Debug Registers is restricted
(4 on x86), tear it down completely.

This interface introduced helper functions for Debug Registers, they will
be replaced with the new <x86/dbregs.h> interface.

KSTACK_CHECK_DR0 was disabled by default and won't affect ordinary users.

Sponsored by <The NetBSD Foundation>
 1.230 11-Dec-2016  maxv Kenter local_apic_va to a fake physical page, because our x86
implementation expects this va to be valid even if no lapic is present;
which probably is a bug in itself, but let's just reproduce the old
behavior and rehide that bug.
 1.229 25-Nov-2016  maxv Remove this comment and allow the beginning of .data to be mapped with
large pages. The issue is fixed, the lapic va is dynamically allocated
now.
 1.228 25-Nov-2016  maxv Move the virtual address of the LAPIC page out of the data segment on amd64
and i386. The old design was error-prone, and it didn't allow us to map the
data segment with large pages.

Now, the VA is allocated dynamically in the pmap bootstrap code, and entered
manually later. We go from using &local_apic to using *local_apic_va, and we
therefore need one more level of indirection in the asm code.

Discussed on tech-kern.
 1.227 17-Nov-2016  maxv Unmap tmpva once we are done using it, not to pollute the page tree.
 1.226 17-Nov-2016  maxv Remap the pages with G until kern_end, and not just the preloaded modules.
This way the bootstrap tables, proc0's stack and the I/O mem area don't get
flushed each time userland needs a TLB shootdown.
 1.225 11-Nov-2016  maxv Rename xen_pmap_bootstrap to xen_locore, it really has nothing to do with
pmap and is just a C version of what amd64 and i386 do in asm.
 1.224 11-Nov-2016  maxv Update the pmap only once
 1.223 08-Nov-2016  christos PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.222 24-Sep-2016  dholland LDT handling fixes:
- add missing membar_store_store ("membar_producer") when setting a
new ldt;
- use UVM_KMF_WAITVA when allocating space for a new ldt instead of
crashing if uvm_km_alloc fails;
- if uvm_km_alloc fails in pmap_fork, bail instead of crashing;
- clarify what else is going on in pmap_fork;
- don't uvm_km_free while holding a mutex.
 1.221 27-Aug-2016  maxv Map the boot IDT, GDT and LDT in three different pages on x86. It is much
better this way, and it reduces the diff between x86 and Xen. Also, zero
them properly, otherwise we might end up with garbage in several slots.
 1.220 19-Aug-2016  maxv Switch the XXXCDC to panics. Normally it should never be triggered, since
the kernel space is above the PTE space, and the user space is below it.
Any attempt to write or remove this area should be blocked by UVM earlier.
 1.219 19-Aug-2016  maxv Rename new_pve2 -> new_sparepve, makes it less bizarre.
 1.218 27-Jul-2016  maxv Re-enable large pages on the data segment, but don't map the first page,
and add a comment to explain why. We will have to move the LAPIC VA.

The large page support is technically the same as before my last commit,
since in practice, the first page of .data is never mapped with large
pages.
 1.217 25-Jul-2016  maxv The L1 entry of the first page of the data segment is overwritten for the
LAPIC page, and set as RWX+PG_N. The LAPIC pa is fixed, and its va resides
in the data segment. Because of this error-prone design, the kernel image
map is not linear, and I first thought it was a bug (as I vaguely said in
PR/51148). Using large pages for the data segment is therefore wrong, since
the first page does not actually belong to the data segment (even if its va
is in the range). This bug is not triggered currently, since local_apic is
not large-page-aligned.

We will certainly have to allocate a va dynamically instead of using the
first page of data; but for now, disable large pages on the data segment,
and map the LAPIC as RW.

This is the last x86-specific RWX page.
 1.216 22-Jul-2016  maxv Remove pmap_prealloc_lowmem_ptps on amd64. This function creates levels in
the page tree so that the first 2MB of virtual memory can be kentered in
L1.

Strictly speaking, the kernel should never kenter a virtual page below
VM_MIN_KERNEL_ADDRESS, because then it wouldn't be available in userland.
It used to need the first 2MB in order to map the CPU trampoline and the
initial VAs used by the bootstrap code. Now, the CPU trampoline VA is
allocated with uvm_km_alloc and the VAs used by the bootstrap code are
allocated with pmap_bootstrap_valloc, and in either case the resulting VA
is above VM_MIN_KERNEL_ADDRESS.

The low levels in the page tree are therefore unused. By removing this
function, we are making sure no one will be tempted to map an area below
VM_MIN_KERNEL_ADDRESS in kernel mode, and particularly, we are making sure
NULL cannot be kentered.

In short, there is no way to map NULL in kernel mode anymore.
 1.215 22-Jul-2016  maxv Simplify pmap_alloc_level. It is designed to work only with normal_pdes and
PTP_LEVELS, so don't pass them as argument. While here, explain what we are
doing.
 1.214 22-Jul-2016  maxv Unused.
 1.213 20-Jul-2016  maxv There is a huge bug in the way a uvm_map_protect is processed on x86.

When mprotecting a page, the kernel updates the uvm protection associated
with the page, and then gives control to the x86 pmap which splits the
procedure in two: if we are restricting the permissions it updates the page
tree right away, and if we are increasing the permissions it just waits for
the page to fault.

In the first case, it forgets to take care of the X permission. Which means
that if we allocate an executable page, it is impossible to remove the X
permission on it, this being true regardless of whether the mprotect call
comes from the kernel or from userland. It is not possible to make sure the
page is non executable either, since the only holder of the permission
information is uvm, and no track is kept at the pmap level of the actual
permissions enforced. In short, the kernel believes the page is non
executable, while the cpu knows it is.

Fix this by properly taking care of the !VM_PROT_EXECUTE case. Since the
bit manipulation is a little tricky we use two vars: bit_rem (remove) and
bit_put.
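
A minimal standalone sketch of the two-variable approach follows. Only the names bit_rem and bit_put come from this message; the function is hypothetical, and the constants mirror the x86 PTE layout (bit 1 = writable, bit 63 = no-execute) and the usual VM_PROT_* values rather than being copied from the kernel headers.

```c
#include <stdint.h>

typedef uint64_t pt_entry_t;

#define PG_RW	0x0000000000000002ULL	/* PTE bit 1: writable */
#define PG_NX	0x8000000000000000ULL	/* PTE bit 63: no-execute */

#define VM_PROT_READ	0x1
#define VM_PROT_WRITE	0x2
#define VM_PROT_EXECUTE	0x4

/* Compute the new PTE when permissions are being restricted. */
static pt_entry_t
pte_restrict(pt_entry_t opte, int prot)
{
	pt_entry_t bit_rem = 0, bit_put = 0;

	if (!(prot & VM_PROT_WRITE))
		bit_rem = PG_RW;	/* losing W: clear a bit */
	if (!(prot & VM_PROT_EXECUTE))
		bit_put = PG_NX;	/* losing X: set a bit */

	return (opte & ~bit_rem) | bit_put;
}
```

The asymmetry is the tricky part: removing write permission clears a bit, while removing execute permission sets one, hence the separate remove and put masks.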
 1.212 19-Jul-2016  maxv This loop makes no sense at all.
 1.211 11-Jul-2016  maxv branches: 1.211.2;
KNF and simplify a little.
 1.210 09-Jul-2016  maxv Simplify pmap_get_physpage.
 1.209 09-Jul-2016  maxv Use pmap_bootstrap_palloc.
 1.208 09-Jul-2016  maxv When a user pmap is created, it is populated with the higher kernel
slots, which become accessible upon kernel entry (syscall, cpu switch,
or whatever). Put the NOX bit in the user recursive slot, so the whole
tree does not appear as executable in kernel mode.

This is already what is done in the kernel pmap.
 1.207 09-Jul-2016  maxv KNF this function a little
 1.206 01-Jul-2016  maxv There is no direct map on i386, and therefore we always need to use
temporary VAs and PTEs when mapping an area. These temporary VAs don't
need to be executable. Put the NOX bit on them.
 1.205 01-Jul-2016  maxv Surprisingly enough, the kernel expects the CPU to support large pages
when creating the direct map on amd64. Therefore, the amd64 CPUs that do
not support large pages basically don't work on NetBSD.

It looks like it has always been this way; add a KASSERT to panic
properly in case we come across one of these CPUs.
 1.204 01-Jul-2016  maxv KNF a little, remove some stupid comments, and add some when needed.
 1.203 01-Jul-2016  maxv We use only one L4 slot for the direct map, which means that we cannot
map more than 512GB. Panic properly if this limit is reached.
 1.202 01-Jul-2016  maxv Use pmap_bootstrap_valloc and pmap_bootstrap_palloc under XEN at least
once, for these not to appear as unused functions (not tested, but I
guess).
 1.201 01-Jul-2016  maxv Create the direct map in a separate function. While here, add some
comments to explain what we are doing. No functional change.
 1.200 01-Jul-2016  maxv Introduce pmap_bootstrap_valloc and pmap_bootstrap_palloc, that are used
to allocate a virtual/physical address before the VM system has been set
up.

Start using it.
 1.199 01-Jul-2016  maxv Put the code in charge of remapping the kernel segments with large pages
into another function. No functional change.
 1.198 01-Jul-2016  maxv Define pmap_pg_nx globally. Will be used soon.
 1.197 01-Jul-2016  maxv Remove this area (unused).
 1.196 21-May-2016  maxv There is an issue in the way the direct map is set up on amd64.

When allocating memory, the kernel allocates physical pages and virtual
addresses for these pages. In order to optimize allocations smaller
than PAGE_SIZE, uvm_km_kmem_alloc can allocate a single physical page
and take its virtual address in the direct map in high virtual memory.
This direct map is set up at boot time, its PTEs do not change, and
therefore they don't need to be kentered. These high virtual PTEs being
constant, the permissions of the areas they point to are fixed at boot
time and cannot change.

The problem is that at boot time, they are created with RWX permissions.
Therefore, allocations smaller than PAGE_SIZE in the kernel heap are all
executable: mbufs, pnbufs, small kmem allocations, etc.

Fix this by setting the NOX bit in the direct map pages at boot time. We
also set the NOX bit in the temporary tmpva, since it does not need to
be executable either.

This also makes the U-area non executable on amd64.
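The fix described above amounts to one extra bit in each direct-map entry. A rough sketch, assuming the architectural x86-64 PTE bit positions; the PTE_* names and both helpers are hypothetical and do not reproduce the kernel's own macros:

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 page-table entry bits as defined by the architecture. */
#define PTE_V	(UINT64_C(1) << 0)	/* present */
#define PTE_RW	(UINT64_C(1) << 1)	/* writable */
#define PTE_PS	(UINT64_C(1) << 7)	/* large page (in an L2 or L3 entry) */
#define PTE_NX	(UINT64_C(1) << 63)	/* no-execute; requires EFER.NXE */

/* Hypothetical helper: build a 2MB direct-map entry for physical address 'pa'. */
static uint64_t
dmap_pte(uint64_t pa, int cpu_has_nx)
{
	assert((pa & ((UINT64_C(1) << 21) - 1)) == 0);	/* 2MB aligned */

	uint64_t pte = pa | PTE_V | PTE_RW | PTE_PS;
	if (cpu_has_nx)
		pte |= PTE_NX;	/* RW instead of RWX */
	return pte;
}

/* Nonzero if instruction fetches through this entry are allowed. */
static int
pte_is_executable(uint64_t pte)
{
	return (pte & PTE_V) != 0 && (pte & PTE_NX) == 0;
}
```

Because the direct-map entries never change after boot, setting the bit once at creation time is enough to cover every later allocation that is served from the direct map.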
 1.195 15-May-2016  maxv Explicitly mention MP_TRAMPOLINE in these comments, so that NXR links them.
 1.194 14-May-2016  maxv The NOX bit on large pages does not need to be amd64-specific anymore.
The i386 secondary CPUs can now properly handle it.
 1.193 13-May-2016  maxv Actually, make the NOX part amd64-specific. The secondary CPUs bug is not
yet fixed on i386.
 1.192 13-May-2016  maxv Remap the rodata and data+bss segments with large pages on x86. There still
is a bug in the way the text segment is mapped, but I'll see later.
 1.191 12-May-2016  maxv Split the {text+rodata} chunk in two separate chunks on x86. The
rodata segment now loses the large page optimization, gets mapped inside
the data segment, and therefore becomes RWX. It may break the build on
Xen.
 1.190 26-Jan-2016  hannken Operation pmap_pp_clear_attrs() may remove the "used" attribute from a page
that is still cached in the TLB of other CPUs.

Call pmap_tlb_shootnow() here before enabling preemption to clear the
TLB entries on other CPUs.

Should prevent tmpfs data corruption under load.

Ok: Chuck Silvers
 1.189 11-Nov-2015  skrll Split out the pmap_pv_track stuff for use by others.

Discussed with riastradh@
 1.188 03-Apr-2015  riastradh Implement pmap_pv(9) for x86 for P->V tracking of unmanaged pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
 1.187 27-Nov-2014  bouyer branches: 1.187.2;
Revert sys/arch/x86/x86/pmap.c 1.185; a CPU needs to get pmap updates,
especially for pmap_kernel(), as soon as it is up.
Instead move all pmap-related cpu_info initialisations, including
initializing ci_kpm_mtx, in cpu_attach_common() from cpu_init()
(ci_pmap and ci_tlbstate are already initialized in cpu_attach_common()).
 1.186 27-Nov-2014  uebayasi Consistently use kpreempt_*() outside scheduler path.
 1.185 22-Nov-2014  cherry Wait until all cpus are online, before using the codepath
which syncs all per-cpu cached copies of L4 tables.
(On both x86_64 and i386/pae)

This avoids an early bootup mutex use before mutex init situation.

Reported by manu@
 1.184 14-Oct-2014  bouyer Add a missing || defined(XEN); its absence caused Xen non-DIAGNOSTIC kernels
to panic at boot.
 1.183 14-Jun-2014  pgoyette branches: 1.183.2;
Check hypervisor version before trying to call xen_copypage() or
xen_pagezero(). Fixes recent issue encountered running a -current
kernel on a pre-3.4 hypervisor.

OK cherry@
 1.182 06-May-2014  cherry Use the hypervisor to copy/zero pages. This saves us the extra overheads
of setting up temporary kernel mapping/unmapping.

riz@ reports savings of about 2s on a 120s kernel build.
 1.181 06-Nov-2013  mrg branches: 1.181.2;
gcc 4.8 issues:
- avoid running over the end of an array (this is a real bug, but
i didn't really look closely at what memory is clobbered. it
may not actually matter.)
- move variables inside their #if usage.
 1.180 05-Oct-2013  rmind Remove some unused variables.
 1.179 13-Nov-2012  chs branches: 1.179.2;
add a pmap_kremove_local() that doesn't do TLB invalidations
on other CPUs. this is only intended for use while writing
kernel crash dumps. remove unused pmap_map().
 1.178 15-Jun-2012  yamt branches: 1.178.2;
emap: reduce the number of atomic ops.
 1.177 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.176 25-Feb-2012  cherry Revert previous since it does a redundant xpq queue flush.
xen_bcast_invlpg() flushes the queue for us.
 1.175 25-Feb-2012  cherry (xen) Flush the xpq before broadcasting page invalidate.
 1.174 25-Feb-2012  cherry Use pmap_pte_xxx() functions instead of xen specific pte update
functions.

No functional changes.
 1.173 24-Feb-2012  cherry kernel page attribute change should be reflected on all cpus, since
the page is going to be released after the _dtor() hook is called.
 1.172 24-Feb-2012  cherry Revert previous
 1.171 24-Feb-2012  cherry kernel page attribute change should be reflected on all cpus, since
the page is going to be released after the _dtor() hook is called.
 1.170 23-Feb-2012  bouyer On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this includes the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause one of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediately or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.

May fix PR port-xen/38699
 1.169 21-Feb-2012  rmind pmap_kenter_pa: always print about already present mapping.
pmap_bootstrap: add comments.
 1.168 21-Feb-2012  bouyer Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.167 21-Feb-2012  bouyer Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.

Makes LOCKDEBUG kernels boot again
 1.166 20-Feb-2012  bouyer - Make pmap_write_protect() work with pmap_kernel() too ((va & L2_FRAME)
strips the high bits of a LP64 address)
- use pmap_protect() in pmap_pdp_ctor() to remap the PDP read-only instead
of (ab)using pmap_kenter_pa(). No more "mapping already present" on
console with DIAGNOSTIC kernels
- make sure to zero the whole PDP (NTOPLEVEL_PDES doesn't include
high-level entries on i386 and i386PAE, reserved by Xen). Not sure
how it has worked before
- remove an always-true test (&& pmap != pmap_kernel(); we KASSERT that
at the function entry).
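The first item in the list above can be illustrated with plain integer arithmetic: a mask that keeps only the page-table index bits also discards the sign-extension bits that make a kernel address canonical. A small sketch, assuming the usual amd64 mask values; both helper functions are hypothetical, and the second shows one possible way to truncate safely rather than the change that was actually committed:

```c
#include <stdint.h>

/* Assumed amd64 index masks: 9 bits per level. */
#define L2_MASK		UINT64_C(0x000000003fe00000)
#define L3_MASK		UINT64_C(0x0000007fc0000000)
#define L4_MASK		UINT64_C(0x0000ff8000000000)
#define L2_FRAME	(L4_MASK | L3_MASK | L2_MASK)	/* bits 21..47 only */

/* Truncating with the frame mask alone drops bits 48..63. */
static uint64_t
trunc_l2_naive(uint64_t va)
{
	return va & L2_FRAME;
}

/* Clearing only the 21 offset bits keeps the address canonical. */
static uint64_t
trunc_l2_canonical(uint64_t va)
{
	return va & ~((UINT64_C(1) << 21) - 1);
}
```

For a kernel address such as 0xffffffff80234567, the naive form yields 0x0000ffff80200000, which is no longer a valid kernel virtual address, while the second form yields 0xffffffff80200000.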
 1.165 17-Feb-2012  bouyer Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
 1.164 11-Feb-2012  chs branches: 1.164.2;
zero out the unused parts of direct-map page-table pages.
these seem to cause problems even though they aren't actually referenced,
probably due to speculative execution.
fixes PR 45892.
 1.163 01-Feb-2012  cherry amd64/Xen doesn't require special treatment for pmap_is_curpmap(),
since cpu_load_pmap() ensures that the linear map is in place for the
kernel. This emulates normal shared kernel mappings, except for the
recursive mapping of the PDP_BASE, which will point to the per-cpu
pdir, which will be a copy of the pmap_kernel()->pm_pdir; instead of
the user pmap->pm_pdir.
 1.162 30-Jan-2012  cherry Remove obsolete comment
 1.161 30-Jan-2012  cherry On xen, prevent cached PDP objects from being reused by pool_cache(9),
since they are "pinned" on the hypervisor and thus R/O for the domU.

Separately, after every per-cpu pmap pdir entry reset, make sure that
the corresponding L3 PTP pointers are flushed from all cpus with the
pmap loaded. Enforce this immediately via pmap_tlb_shootnow();
 1.160 29-Jan-2012  drochner don't mess with the PDP pool cache before it is initialized,
prevents at least LOCKDEBUG panics
 1.159 29-Jan-2012  cherry remove obsolete comment
 1.158 29-Jan-2012  cherry Remove apte related shootdowns.
 1.157 28-Jan-2012  cherry Fix pae xen build.
 1.156 28-Jan-2012  cherry stop using alternate pde mapping in xen pmap
 1.155 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.154 22-Jan-2012  cherry Do not clobber pmap_kernel()'s pdir unnecessarily while syncing per-cpu pdirs
 1.153 09-Jan-2012  cherry Harden cross-cpu L3 sync - avoid optimisations that may race.
Update ci->ci_kpm_pdir from user pmap, not global pmap_kernel() entry which may get clobbered by other CPUs.
XXX: Look into why we use pmap_kernel() userspace entries at all.
 1.152 09-Jan-2012  cherry While freeing ptps, remove stale L3 frame entries on all CPUS the pmap is loaded on, not just the current one.
 1.151 09-Jan-2012  cherry revert previous commit. DIAGNOSTIC should only do strict checks, not muffle current ones
 1.150 06-Jan-2012  cherry Address those pesky DIAGNOSTIC messages.
Take a performance hit at fork() for not DTRT.
Note: Only applicable for kernels built with "options DIAGNOSTIC"
 1.149 30-Dec-2011  cherry Move the per-cpu l3 page allocation code to a separate MD function. Avoids code duplication for xen PAE
 1.148 30-Dec-2011  cherry per-cpu shadow directory pages should be updated locally via cross-calls. Do this.
 1.147 09-Dec-2011  chs only use PG_G on leaf PTEs.
go back to tlbflush(), all the global entries
that we create in pmap_bootstrap() are permanent.
 1.146 08-Dec-2011  rmind pmap_bootstrap: use tlbflushg(), not tlbflush(), since we use global pages.
 1.145 08-Dec-2011  chs allow building without direct-map support in non-XEN kernels.
 1.144 07-Dec-2011  cegger switch from xen3-public to xen-public.
 1.143 04-Dec-2011  chs map all of physical memory using large pages.
ported from openbsd years ago by Murray Armfield,
updated for changes since then by me.
 1.142 20-Nov-2011  jym branches: 1.142.2;
Expose pmap_pdp_cache publicly to x86/xen pmap. Provide suspend/resume
callbacks for Xen pmap.

Turn static internal callbacks of pmap_pdp_cache.

XXX the implementation of pool_cache_invalidate(9) is still wrong, and
IMHO this needs fixing before -6. See
http://mail-index.netbsd.org/tech-kern/2011/11/18/msg011924.html
 1.141 08-Nov-2011  cherry Expose the PG_k #define pt/pd bit to both xen and "baremetal" x86. This is required, since kernel pages are mapped with user permissions in XEN/amd64 since the VM kernel runs in ring3. Since XEN/i386 (including PAE) runs in ring1, supervisor mode is appropriate for these ports. We need to share this since the pmap implementation is still shared. Once the xen implementation is sufficiently independent of the x86 one, this can be made private to xen/include/xenpmap.h
 1.140 08-Nov-2011  njoly Fix build.
 1.139 06-Nov-2011  christos make this compile again.
 1.138 06-Nov-2011  cherry [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware.
 1.137 18-Oct-2011  jym branches: 1.137.2;
Move Xen specific functions out of x86 native pmap to xen_pmap.c.

Provide a wrapper to trigger pmap pool_cache(9) invalidations without
exposing the caches to outside world.
 1.136 18-Oct-2011  jym Make "pmaps" (list of non-kernel pmaps) and "pmaps_lock" externally
visible. Required by pmap MD code that could reside in other
files, notably Xen's pmap.
 1.135 11-Oct-2011  yamt add comments
 1.134 11-Oct-2011  yamt sprinkle __read_mostly
 1.133 10-Oct-2011  christos We don't have printk. Use printf_nolog, and pass the function name.
 1.132 27-Sep-2011  jym Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
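The idea of an assertion macro that takes a format string plus optional arguments can be sketched in portable C99. This is a user-space illustration only: MY_ASSERTMSG, assert_fail() and the last_panic buffer are hypothetical stand-ins, and the kernel's real macros and panic path are not reproduced here.

```c
#include <stdarg.h>
#include <stdio.h>

static char last_panic[256];	/* stand-in for the real panic(9) path */

/* Format "assertion "<expr>" failed: <message>" into last_panic. */
static void
assert_fail(const char *expr, const char *fmt, ...)
{
	va_list ap;
	int n;

	n = snprintf(last_panic, sizeof(last_panic),
	    "assertion \"%s\" failed: ", expr);
	if (n < 0 || (size_t)n >= sizeof(last_panic))
		return;
	va_start(ap, fmt);
	vsnprintf(last_panic + n, sizeof(last_panic) - (size_t)n, fmt, ap);
	va_end(ap);
}

/*
 * The format string travels inside __VA_ARGS__, so the arguments after
 * it are optional without relying on compiler extensions.
 */
#define MY_ASSERTMSG(e, ...)						\
	((e) ? (void)0 : assert_fail(#e, __VA_ARGS__))
```

A call such as MY_ASSERTMSG(3 < 2, "x=%d", 3) leaves the text `assertion "3 < 2" failed: x=3` in last_panic, while a true condition leaves the buffer untouched.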
 1.131 25-Sep-2011  jym Fix a small typo in comment: pmaps_lock is the lock that keeps all
pmaps in sync for kernel mappings (including when they are obtained from
pool caches).
 1.130 20-Sep-2011  jym Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4))
in two parts: suspend and resume, and hook them to pmf(9).
- modify pmap so that Xen hypervisor does not cry out loud in case
it finds "unexpected" recursive memory mappings
- provide a sysctl(7), machdep.xen.suspend, to command suspend from
userland via powerd(8). Note: a suspend can only be handled correctly
when dom0 requested it, so provide a mechanism that will prevent
kernel to blindly validate user's commands

The code is still in experimental state, use at your own risk: restore
can corrupt backend communications rings; this can completely thrash
dom0 as it will loop at a high interrupt level trying to honor
all domU requests.

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.

XXX secondary CPUs are not suspended, I will write the handlers
in sync with cherry's Xen MP work.

Tested under i386 and amd64, bear in mind ring corruption though.

No build break expected, GENERICs and XEN* kernels should be fine.
./build.sh distribution still running. In any case: sorry if it does
break for you, contact me directly for reports.
 1.129 28-Aug-2011  dyoung Use __strict_weak_alias().
 1.128 14-Aug-2011  rmind Convert few panic() uses to asserts, reduce the scope of variable use.
No functional changes.
 1.127 05-Jul-2011  yamt unrelax an assertion
 1.126 24-Jun-2011  yamt pmap_map_ptes: fix a bug introduced by rmind-uvmplock merge
 1.125 23-Jun-2011  rmind pmap_map_ptes: use cpu_load_pmap() to handle i386 PAE case.
Spotted by cherry@
 1.124 18-Jun-2011  rmind pmap_page_remove: perform TLB shootdown, as it is not caller's responsibility
to perform pmap_update() according to the interface. Might want to revisit.

Should fix recently reported tmpfs problems. Thanks to enami@ and hannken@!
 1.123 13-Jun-2011  tls Fix Xen kernel builds (pmap_is_curpmap can't be static)
 1.122 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.121 01-May-2011  jym branches: 1.121.2;
- Rather use pmap_pte_set() when modifying a PTE.

- Call pmap_pte_flush() when issuing PTE modifications. While it is a NOOP
for native x86, it is not for Xen. It will flush all operations that are
possibly waiting in the queue, like xpq_queue_pte_update(). Do it for
each level, this function is only called at boot time and is not
performance critical.

While here:
- No need to cast early_zerop to void with memset().
- Move common variables out of the #ifdef's.
- KNF
 1.120 27-Apr-2011  plunky drop 'inline' here, to avoid C99 vs GNU differences
 1.119 14-Apr-2011  yamt don't bother to register kernel ptp to uvm_object. from yamt-vmem branch.
 1.118 11-Feb-2011  jmcneill add bus_space_mmap support for BUS_SPACE_MAP_PREFETCHABLE, ok matt@
 1.117 10-Feb-2011  jym Use only one function to pin pages with Xen, and provide macros to
call it for different levels (L1 => L4).

Replace all calls to xpq_queue_pin_table(...) in MD code with these new
functions, with proper #ifdef'ing depending on $MACHINE.

Rationale:
- only one function to modify for logging
- pushes responsibility to caller for choosing the proper pin level, rather
than Xen internal functions; this makes the pin level explicit rather than
implicit.

Boot tested for dom0 i386/amd64, PAE included. No functional change intended.
 1.116 05-Feb-2011  yamt assertions
 1.115 01-Feb-2011  chuck remove no-longer-valid wustl email address for me.
no functional change with this commit.
 1.114 01-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
remove no-longer-valid wustl email address for me.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.113 24-Jul-2010  jym branches: 1.113.2; 1.113.4;
Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
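The sizes quoted above are simply powers of two, and they also show why a 32-bit variable can no longer hold every physical address. A minimal sketch of that arithmetic; addr_space_bytes() and fits_in_32() are illustrative names, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an address space that is 'bits' wide (bits < 64). */
static uint64_t
addr_space_bytes(unsigned bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}

/* Nonzero if 'pa' survives a round trip through a 32-bit variable. */
static int
fits_in_32(uint64_t pa)
{
	return (uint64_t)(uint32_t)pa == pa;
}
```

addr_space_bytes(36) gives 64 GiB and addr_space_bytes(32) gives 4 GiB; any physical address at or above 4 GiB fails fits_in_32(), which is why physical addresses are handled as 64-bit quantities under PAE.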

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
 1.112 15-Jul-2010  jym Check the virtual address 'va' for each PDIR_SLOT_PTE entry. PDP_SIZE
is 4 with PAE (Xen only currently), 1 otherwise: loop should be unrolled
when PDP_SIZE is 1.

pmap_alloc_level() is used by pmap_growkernel(), the PDE is a kernel
mapping: mark it so with PG_k. While here, use pmap_pa2pte() for physical
address 'pa'.

No functional change.
 1.111 07-Jul-2010  chs add the guts of TLS support on amd64. based on joerg's patch,
reworked by me to support 32-bit processes as well.
we now keep %fs and %gs loaded with the user values
while in the kernel, which means we don't need to
reload them when returning to user mode.
 1.110 06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
 1.109 10-May-2010  dyoung Provide pmap_enter_ma(), pmap_extract_ma(), pmap_kenter_ma() in all x86
kernels, and use them in the bus_space(9) implementation instead of ugly
Xen #ifdef-age. In a non-Xen kernel, the _ma() functions either call or
alias the equivalent _pa() functions.

Reviewed on port-xen@netbsd.org and port-i386@netbsd.org. Passes
rmind@'s and bouyer@'s inspection. Tested on i386 and on Xen DOMU /
DOM0.
 1.108 04-May-2010  jym Enable the NX bit feature for Xen i386pae and amd64 kernels.

Tested with Xen 3.1 and Xen 3.3, dom0 and domU, by bouyer@ and jym@.

Ok bouyer@.
 1.107 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.106 31-Mar-2010  ad KNF FTW
 1.105 26-Feb-2010  jym branches: 1.105.2;
Fixes regarding paddr_t/pd_entry_t types in MD x86 code, exposed by PAE:

- NBPD_* macros are set to the types that better match their architecture
(UL for i386 and amd64, ULL for i386 PAE) - will revisit when paddr_t is
set to 64 bits for i386 non-PAE.

- type fixes in printf/printk messages (Use PRIxPADDR when printing paddr_t
values, instead of %lx - paddr_t/pd_entry_t being 64 bits with PAE)

- remove casts that are no longer needed now that Xen2 support has been dropped

Some fixes are from jmorse@ patches for PAE.

Compile + tested for i386 GENERIC and XEN3 kernels. Only compile tested for
amd64.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html
 1.104 16-Feb-2010  jym - re-factor code in pmap_map_ptes() slightly, and make it PAE-ready for
native i386 by using PDP_SIZE

- introduce pmap_unmap_apdp(), used to clear the APDP entries in PD, and
replace the relevant code parts with this function.

Comes from Jeremy Morse's patch for i386 PAE support. Adjustments by me.
 1.103 12-Feb-2010  jym Starting with Xen 3 API, MMU_EXTENDED_COMMAND (tlb flush, cache flush, page
pinning/unpinning, set_ldt, invlpg) operations cannot be queued in the
xpq_queue[] any more, as they use their own specific hypercall, mmuext_op().

Their associated xpq_queue_*() functions already call xpq_flush_queue()
before issuing the mmuext_op() hypercall, which makes these xpq_flush_queue()
calls not necessary.

Rapidly discussed with bouyer@ in private mail. XEN3_DOM0/XEN3PAE_DOM0 tested
through a build.sh release, amd64 was only compile tested. No regression
expected.
 1.102 10-Feb-2010  jym To properly account for the total number of pages allocated for PDP, use
PDP_SIZE, as PAE (i386) requires 4 pages instead of 1.
 1.101 09-Feb-2010  jym Fix typos in comments.
 1.100 31-Jan-2010  hubertf branches: 1.100.2;
Replace more printfs with aprint_normal / aprint_verbose
Makes "boot -z" go mostly silent for me.
 1.99 10-Jan-2010  jym Simplify pmap_init_tmp_pgtbl() a bit.

The first level of the temporary page mappings are also done in the first
iteration of the loop below, so no need to do it before.

ok by joerg@ in private mail.
 1.98 25-Nov-2009  rmind Remove IPL_LPT and IPL_IPI aliases, use the actual IPLs.
Fix some broken comments.
 1.97 21-Nov-2009  rmind Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.96 07-Nov-2009  cegger Implement pmap_kenter_pa(9) new flag argument in x86.
Make x86 bus_space(9) using it to eliminate an extra TLB flush.
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
Thanks to Martin Husemann for spotting copy&pasto errors in the original patch version.
 1.95 07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.94 22-Oct-2009  rmind Simplify pmap_remove() a little by avoiding pmap_do_remove() layer, since
possibility to skip wired mappings is not needed anymore. Apart from that,
no functional differences are intended.
 1.93 21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.92 19-Oct-2009  bouyer Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.91 17-Aug-2009  thorpej pmap_page_remove(), pmap_test_attrs(), pmap_clear_attrs(): We're passed in
a vm_page, so there is little point in the DIAGNOSTIC test to see that we
have been passed a managed page.
 1.90 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.89 23-Jul-2009  jym Fix typos in comments and __PRINTKs.
 1.88 19-Jul-2009  rmind pmap_emap_sync: add an argument, and do not perform pmap_load() during
context switch (pmap_destroy() path seems to be unsafe), instead just
perform tlbflush(). Slightly inefficient, but good enough for now.
 1.87 19-Jul-2009  yamt remove unnecessary casts.
 1.86 28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
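The generation-number idea can be sketched in a few lines: creating an ephemeral mapping publishes a new global generation, every TLB flush a CPU performs for any reason records the generation it has caught up to, and a consumer flushes only if its CPU is still behind. This is a single-threaded illustration with hypothetical names and no atomics or memory barriers; it does not reproduce the committed implementation.

```c
#include <stdint.h>

#define NCPU	4	/* illustrative CPU count */

static uint32_t emap_global_gen;	/* bumped by each producer */
static uint32_t emap_cpu_gen[NCPU];	/* generation seen at each CPU's last flush */
static unsigned tlb_flushes[NCPU];	/* flush counter, for demonstration */

/* Producer: publish a new generation instead of interrupting other CPUs. */
static uint32_t
emap_produce(void)
{
	return ++emap_global_gen;
}

/* Called whenever a CPU flushes its TLB, e.g. as a side effect of a context switch. */
static void
emap_note_flush(int cpu)
{
	emap_cpu_gen[cpu] = emap_global_gen;
	tlb_flushes[cpu]++;
}

/* Consumer: flush only if this CPU has not flushed since 'gen' was produced. */
static void
emap_consume(int cpu, uint32_t gen)
{
	if ((int32_t)(emap_cpu_gen[cpu] - gen) < 0)
		emap_note_flush(cpu);
}
```

If another thread's activity has already flushed the CPU after the mapping was produced, emap_consume() does nothing, which is where the saving comes from.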
 1.85 23-Apr-2009  cegger pool uses signed int for flags.
undo the int -> u_int change for pmap_pdp_alloc to unbreak the PAE build.
 1.84 21-Apr-2009  cegger change pmap flags argument from int to u_int.
discussed with christos@ on source-changes-d@
 1.83 18-Apr-2009  cegger Introduce PMAP_NOCACHE as first PMAP MD bit in x86. Make use of it in pmap_enter().
This saves one extra TLB flush when mapping dma-safe memory.
Presented on tech-kern@, port-i386@ and port-amd64@
ok ad@
 1.82 21-Mar-2009  ad Add 2 event counters:

"x86", "io bitmap copy"
"x86", "ldt sync"
 1.81 21-Mar-2009  ad Correction to previous.
 1.80 21-Mar-2009  ad PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash

Fix numerous problems:

1. LDT updates are not atomic.

2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. A system with a high maxprocs can be panicked.

3. LDTR can be leaked over context switch.

4. GDT slot allocations can race, giving the same LDT slot to two procs.

5. Incomplete interrupt/trap frames can be stacked.

6. In some rare cases segment faults are not handled correctly.
 1.79 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.78 17-Feb-2009  cegger nuke unused global variable
 1.77 18-Dec-2008  cegger branches: 1.77.2;
remove unused malloc.h
 1.76 10-Dec-2008  pooka Make kernel_pmap_ptr a const. Requested by steve_martin.
 1.75 09-Dec-2008  pooka Make pmap_kernel() a MI macro for struct pmap *kernel_pmap_ptr,
which is now the "API" provided by the pmap module. pmap_kernel()
remains as the syntactic sugar.

Bonus cosmetics round: move all the pmap_t pointer typedefs into
uvm_pmap.h.

Thanks to Greg Oster for providing cpu muscle for doing test builds.
 1.74 25-Oct-2008  yamt branches: 1.74.2; 1.74.4;
pmap_page_remove: remove an unnecessary initialization.
 1.73 16-Jul-2008  drochner add a KASSERT to check the protection bits before using as array index
 1.72 16-Jul-2008  chs in pmap_map(), use pmap_kenter_pa() instead of pmap_enter()
so that we don't need to allocate memory to create the mapping.
this should help with getting crash dumps more reliably.
 1.71 24-Jun-2008  gmcgarry branches: 1.71.2;
Move #ifdef/#endif outside macro arguments. Fixes compile with pcc.
 1.70 23-Jun-2008  reinoud Fix pmap.c compilation issues when PMAP_FORK is defined. The locking code was not adapted to the indexing of pmap->pm_obj[]
 1.69 16-Jun-2008  ad Make pmap_extract() lockless. Reviewed by chs@ who reports a ~1% reduction
in system time during a kernel build on a quad core amd64 system.
 1.68 06-Jun-2008  cegger branches: 1.68.2;
make this build for xen
 1.67 05-Jun-2008  ad pmap_remove_all() for x86. Also, always defer freeing ptps to pmap_update().
There may be a better way to do this, but for now this is simple and avoids
potential bugs.

Proposed on tech-kern and discussed with chs@.
 1.66 02-Jun-2008  drochner make the kernel survive its own KASSERTs (on i386)
 1.65 02-Jun-2008  ad - Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.
 1.64 28-May-2008  ad Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.
 1.63 25-May-2008  chs remove unused macros.
 1.62 20-May-2008  dogcow due to changes in KERN_UNLOCK_ALL, now always define hold_count.
 1.61 11-May-2008  ad Disable preemption across LDT mods.
 1.60 11-May-2008  ad Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.
 1.59 03-May-2008  ad branches: 1.59.2;
Back out previous which was not thought through properly.
 1.58 03-May-2008  ad Implement pmap_remove_all().
 1.57 02-May-2008  ad - Give x86 BIOS boot the ability to load new style modules and pass them
into the kernel. Based on a patch by jmcneill@, with many fixes and
improvements by me.

- Put MEMORY_DISK_DYNAMIC and MODULAR into the GENERIC kernels, so that
you can load miniroot.kmod from the boot blocks and boot into the
installer!
 1.56 29-Apr-2008  ad Oops... EVCNT_TYPE_IPI -> EVCNT_TYPE_INTR.
 1.55 28-Apr-2008  ad Don't count many items as EVCNT_TYPE_INTR because they clutter up the
systat vmstat display.
 1.54 28-Apr-2008  ad Stray kpreempt_disable with no matching kpreempt_enable.
 1.53 27-Apr-2008  ad Sprinkle more assertions / preemption paranoia.
 1.52 27-Apr-2008  ad branches: 1.52.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.
 1.51 18-Apr-2008  cegger Make this build on Xen and MULTIPROCESSOR.
OK bouyer
 1.50 16-Apr-2008  cegger branches: 1.50.2;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.49 16-Feb-2008  bouyer branches: 1.49.6;
Minor optimisation: for Xen amd64, copy to kernel map only the entries from
user map that are needed.
 1.48 07-Feb-2008  yamt constify.
 1.47 31-Jan-2008  wiz Fix typo in comment.
 1.46 30-Jan-2008  yamt pmap_page_remove: add a reference to the pmap earlier and comment why.
 1.45 30-Jan-2008  yamt pmap_remove_ptes, pmap_remove_pte: fix a panic seen on pkgbuild,
reported by S.P.Zeidler.
after recent locking changes, these assertions are no longer true.
 1.44 28-Jan-2008  yamt save a word in pv_entry by making pv_hash SLIST.

although this can slow down pmap_sync_pv if hash lists get long,
we should keep them short anyway.
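The space saving described above can be sketched as follows. This is a minimal illustration, assuming the hash chain was previously a doubly-linked LIST (the log does not say so explicitly); `demo_pv_list`, `demo_pv_slist` and `demo_pv_bytes_saved()` are hypothetical names, not the actual pv_entry layout.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical entry chained with a doubly-linked LIST: two link pointers. */
struct demo_pv_list {
	LIST_ENTRY(demo_pv_list) link;
	unsigned long va;
};

/* The same hypothetical entry chained with an SLIST: one link pointer. */
struct demo_pv_slist {
	SLIST_ENTRY(demo_pv_slist) link;
	unsigned long va;
};

/* Bytes saved per entry by switching to the singly-linked form. */
static size_t
demo_pv_bytes_saved(void)
{
	/* The singly-linked entry can never be the larger of the two. */
	assert(sizeof(struct demo_pv_list) > sizeof(struct demo_pv_slist));
	return sizeof(struct demo_pv_list) - sizeof(struct demo_pv_slist);
}
```

The trade-off is the one the commit message names: removing an arbitrary element from an SLIST requires walking the chain from its head, so the saving only pays off while the hash chains stay short.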
 1.43 24-Jan-2008  bouyer Fix bad cut'n'paste from bouyer-xeni386, restore use of pool_cache_put()
instead of pool_cache_destruct_object(). Pointed out by yamt@
 1.42 23-Jan-2008  bouyer Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386
(domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU
and INSTALL_XEN3PAE_DOMU kernel config files.

See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works.
In short, we still handle it as a 2-level MMU, with the second level page
directory being 4 pages in size. pmap switching is done by switching the
L2 pages in the L3 entries, instead of loading %cr3. This is almost required
by Xen, which handles the last L2 page (the one mapping 0xc0000000 - 0xffffffff)
in a very special way. But this approach should also work for native PAE
support if ever supported (in fact, the pmap should almost support native
PAE, what's missing is bootstrap code in locore.S).
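The "2-level MMU with a 4-page L2" view can be sketched with plain index arithmetic. This is only an illustration under the standard PAE layout (8-byte entries, 512 per page, each L2 entry mapping 2 MB); the `DEMO_*` constants and `demo_*` helpers are hypothetical and are not the macros used in pte.h or pmap.h.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_L2_SHIFT     21   /* each PAE L2 entry maps 2 MB */
#define DEMO_L2_PER_PAGE  512  /* 4096-byte page / 8-byte entry */
#define DEMO_L2_PAGES     4    /* one L2 page per L3 entry */

/* Index into the four L2 pages viewed as one flat 2048-entry directory. */
static unsigned
demo_l2_index(uint32_t va)
{
	return va >> DEMO_L2_SHIFT;
}

/* Which of the four L2 pages (equivalently, which L3 entry) covers va. */
static unsigned
demo_l3_slot(uint32_t va)
{
	unsigned slot = demo_l2_index(va) / DEMO_L2_PER_PAGE;

	assert(slot < DEMO_L2_PAGES);
	return slot;
}
```

Under these assumptions the kernel range starting at 0xc0000000 falls entirely in slot 3, which matches the "last L2 page" that the message says Xen treats specially.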
 1.41 20-Jan-2008  yamt pmap_enter:
- when overwriting an existing mapping for the same page,
inherit R/M bits so that they won't be lost.
- xen: don't leave ptp on error.
 1.40 20-Jan-2008  yamt pmap_write_protect: remove an unnecessary volatile.
 1.39 20-Jan-2008  yamt - rewrite P->V tracking.
- use a hash rather than SPLAY trees.
SPLAY tree is a wrong algorithm to use here.
will be revisited if it slows down anything other than
micro-benchmarks.
- optimize the single mapping case (it's a common case) by
embedding an entry into mdpage.
- don't keep a pmap pointer as it can be obtained from ptp.
(discussed on port-i386 some years ago.)
ideally, a single paddr_t should be enough to describe a pte.
but it needs some more thoughts as it can increase computational
costs.
- pmap_enter: simplify and fix races with pmap_sync_pv.
- don't bother to lock pm_obj[i] where i > 0, unless DIAGNOSTIC.
- kill mp_link to save space.
- add many KASSERTs.
 1.38 20-Jan-2008  yamt pmap_write_protect: fix an assumption in the previous.
a pte can have PG_M even if it currently doesn't have PG_RW.
 1.37 20-Jan-2008  yamt pmap_write_protect: remove a redundant tlb shootdown.
 1.36 18-Jan-2008  yamt remove stale comments
 1.35 18-Jan-2008  yamt pmap_page_remove: don't bother to find the MIN node where any node is fine.
 1.34 17-Jan-2008  yamt pmap_sync_pv, pmap_map_pte: use PTE_BASE if possible.
 1.33 17-Jan-2008  yamt pmap_sync_pv:
- fix "need_shootdown" in the case of retry.
- add some comments.
 1.32 17-Jan-2008  yamt pmap_clear_attrs: remove a redundant assignment
 1.31 17-Jan-2008  yamt - reduce the number of atomic ops in some cases.
- reduce code duplications.
- fix the "pte == 0" assertion failure in pmap_tlb_shootdown.
 1.30 15-Jan-2008  yamt update comments
 1.29 15-Jan-2008  yamt pmap_unmap_ptp: save some hypercalls for xen.
 1.28 15-Jan-2008  yamt - use cpu-local temporary mapping for P->V operations and kill pmap_main_lock.
fixes a part of PR/37193 from Andrew Doran.
- use pmap_stats_update_bypte in some more places.

some comments and fixes from Andrew Doran and Manuel Bouyer.
 1.27 13-Jan-2008  yamt pmap_extract_ma: fix a missing pmap_unmap_ptes.
 1.26 13-Jan-2008  yamt add functions to update pm_stats and use them in some places.
for pmap_kernel(), use atomic ops to update stats because pmap_map_ptes
doesn't lock the pmap.
 1.25 12-Jan-2008  yamt revert a whitespace change which is against KNF.
 1.24 11-Jan-2008  bouyer Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the
branch is still active and will see i386PAE support development).
Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c
pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts,
the build will find them via the xen-ma/machine link.
 1.23 09-Jan-2008  yamt remove now unused variable.
 1.22 08-Jan-2008  yamt kill unused PMF_USER_RELOAD.
 1.21 08-Jan-2008  yamt pmap_map_ptes: fix an access-after-free bug.
add a reference on ci->ci_pmap before acquiring locks, so that it won't
disappear while we are blocking on the locks.
 1.20 04-Jan-2008  yamt i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.
 1.19 02-Jan-2008  yamt pmap_extract_ma:
- TRUE/FALSE -> true/false
- (P == FALSE) -> (!P)
 1.18 02-Jan-2008  yamt make pmap_pv_cache static.
 1.17 02-Jan-2008  yamt g/c pv_page stuffs.
 1.16 02-Jan-2008  ad Merge vmlocking2 to head.
 1.15 20-Dec-2007  ad - Make __cpu_simple_lock and similar real functions and patch at runtime.
- Remove old x86 atomic ops.
- Drop text alignment back to 16 on i386 (really, this time).
- Minor cleanup.
 1.14 13-Dec-2007  bouyer Reorder some operations for better handling of failures in
xpq_update_foreign(). Note that this also affects native operations, but it
shouldn't cause problems even for SMP systems. Proposed on port-amd64@ and
port-i386@

Don't invalidate the recursive PTE entry in user pmap when switching from
kernel to userland on Xen/amd64. This effectively means that a userland
process can read its own page tables (no write, of course) on Xen/amd64, but
it shouldn't cause a security issue (discussed on tech-kern@ some time ago).
This makes NetBSD Xen/amd64 more than 10x faster
building pkgsrc/pkgtools/digest
 1.13 09-Dec-2007  jmcneill branches: 1.13.2;
Merge jmcneill-pm branch.
 1.12 09-Dec-2007  ad Minor correction to previous.
 1.11 09-Dec-2007  ad - pmap_reactivate: save an atomic op if possible.
- pmap_tlb_shootdown: use atomic ops instead of _lock_cas().
 1.10 28-Nov-2007  ad branches: 1.10.2; 1.10.4;
Use the new atomic ops.
 1.9 23-Nov-2007  bouyer Put back part of revision 1.1.4.8: the L4 page has to be pinned as L4
before being mapped by APDP_PDE.
 1.8 22-Nov-2007  bouyer Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.7 18-Nov-2007  ad #include <sys/atomic.h> instead of local prototypes.
 1.6 14-Nov-2007  ad - Remove I486_CPU, I586_CPU, I686_CPU options. They buy us nothing and
clutter the code significantly.
- Remove pccons.
 1.5 11-Nov-2007  ad pmap_load: pmap_reference() can no longer block.
 1.4 10-Nov-2007  ad - When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
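The lock-free reference counting mentioned in the list above can be sketched with C11 atomics. This is a userland illustration only: `struct demo_pmap` and the `demo_pmap_*` functions are hypothetical, and the kernel uses its own atomic ops rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pmap that carries only a reference count. */
struct demo_pmap {
	atomic_uint refcnt;
};

/* A new pmap starts with one reference held by its creator. */
static void
demo_pmap_init(struct demo_pmap *pm)
{
	atomic_init(&pm->refcnt, 1);
}

/* Take a reference without a lock; the caller must already hold one. */
static void
demo_pmap_reference(struct demo_pmap *pm)
{
	unsigned old = atomic_fetch_add(&pm->refcnt, 1);

	assert(old > 0);
	(void)old;
}

/* Drop a reference; returns true if this was the last one. */
static bool
demo_pmap_release(struct demo_pmap *pm)
{
	unsigned old = atomic_fetch_sub(&pm->refcnt, 1);

	assert(old > 0);
	return old == 1;
}
```

Because neither operation can sleep, a caller in this style never blocks while taking a reference, which is consistent with the later note that pmap_reference() can no longer block.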
 1.3 07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.2 18-Oct-2007  yamt branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8; 1.2.10;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.
 1.1 29-Sep-2007  yamt branches: 1.1.2; 1.1.4;
file pmap.c was initially added on branch yamt-x86pmap.
 1.1.4.9 18-Nov-2007  bouyer Sync with HEAD
 1.1.4.8 18-Nov-2007  bouyer Don't set the Xen-reserved L4 entries in pmap_map_ptes(), it's not needed.
Xen doesn't allow unpinning/repinning a page table that maps pages from other
domains (some pmaps have this in dom0), so use xpq_queue_pte_update()
to set the PDIR_SLOT_PTE L4 entry instead of unpin/map RW the L4 PD to modify
it.
 1.1.4.7 16-Nov-2007  bouyer Initial domain0 support for xenamd64. The kernel boots multiuser, but
xen tools have not been tried yet.
In this process, cleanup some more the page table bootstrap, and properly
handle event counters for soft interrupts.
 1.1.4.6 14-Nov-2007  bouyer pmap_pdp_dtor(): invalidate the tlb entry after making the page R/W.
 1.1.4.5 13-Nov-2007  bouyer On Xen, pmap_pdp_ctor() doesn't map the kernel. Put in a dummy entry for
the last kernel entry, so that pmap_create() won't enter an infinite loop.
A xenamd64 kernel boots again.
 1.1.4.4 13-Nov-2007  bouyer Sync with HEAD
 1.1.4.3 26-Oct-2007  bouyer Remove unneeded tlbflush() in #ifdef XEN code
 1.1.4.2 25-Oct-2007  bouyer Finish sync with HEAD. Especially use the new x86 pmap for xenamd64.
For this:
- rename pmap_pte_set() to pmap_pte_testset()
- make pmap_pte_set() a function or macro for non-atomic PTE write
- define and use pmap_pa2pte()/pmap_pte2pa() to read/write PTE entries
- define pmap_pte_flush() which is a nop in x86 case, and flush the
MMUops queue in the Xen case
 1.1.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.1.2.22 18-Oct-2007  yamt reduce #ifdef.
 1.1.2.21 14-Oct-2007  yamt move pl_i_roundup to a header.
 1.1.2.20 14-Oct-2007  yamt sync with head. (crit_enter/crit_exit are now available.)
 1.1.2.19 08-Oct-2007  yamt pmap_bootstrap: fix off-by-one.
 1.1.2.18 07-Oct-2007  yamt read_psl -> x86_read_psl
 1.1.2.17 04-Oct-2007  yamt remove LARGEPAGES option. always use large pages if available.
 1.1.2.16 04-Oct-2007  yamt - move etext before rodata. define __data_start at the start of
.data section and use it instead of etext where appropriate.
- put .rodata.* into .rodata section as well.
- pmap_bootstrap: don't assume NBPD_L2 alignment.
- pmap_bootstrap: if DEBUG, print how many large pages and normal pages are
used to map kernel text.
 1.1.2.15 04-Oct-2007  yamt revert debug code slipped in with the previous.
 1.1.2.14 01-Oct-2007  yamt - fix LARGEPAGES.
- add amd64 linker script for it.
 1.1.2.13 30-Sep-2007  yamt pmap_do_remove: don't iterate each leaf ptes.
it should make pmap_collect not too slow.
 1.1.2.12 30-Sep-2007  yamt change (foo == false) to (!foo).
 1.1.2.11 30-Sep-2007  yamt implement deferred pmap switching for amd64, and make amd64 use
x86 shared pmap code. it makes several i386 pmap improvements available
to amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
 1.1.2.10 30-Sep-2007  yamt fix a comment.
 1.1.2.9 30-Sep-2007  yamt pmap_bootstrap: "pte++" for consistency. no functional changes are intended.
 1.1.2.8 30-Sep-2007  yamt pmap_bootstrap: use UVM_OBJ_INIT.
 1.1.2.7 30-Sep-2007  yamt fix pmap_growkernel for amd64.
 1.1.2.6 30-Sep-2007  yamt pmap_activate: deal with PCB_GS64 and PCB_FS64 if __x86_64__.
 1.1.2.5 30-Sep-2007  yamt pmap_extract: fix problems with page frames whose upper bits 0.
 1.1.2.4 29-Sep-2007  yamt pmap_bootstrap: fix a problem with 64-bit paddr_t.
 1.1.2.3 29-Sep-2007  yamt pull pmap_changeprot_local and pmap_prealloc_lowmem_ptps
from amd64/amd64/pmap.c
 1.1.2.2 29-Sep-2007  yamt make this compilable on amd64.
 1.1.2.1 29-Sep-2007  yamt move i386/i386/pmap.c to x86/x86/pmap.c.
 1.2.10.5 23-Mar-2008  matt sync with HEAD
 1.2.10.4 09-Jan-2008  matt sync with HEAD
 1.2.10.3 08-Nov-2007  matt sync with -HEAD
 1.2.10.2 06-Nov-2007  matt sync with HEAD
 1.2.10.1 18-Oct-2007  matt file pmap.c was added on branch matt-armv6 on 2007-11-06 23:23:53 +0000
 1.2.8.4 18-Feb-2008  mjf Sync with HEAD.
 1.2.8.3 27-Dec-2007  mjf Sync with HEAD.
 1.2.8.2 08-Dec-2007  mjf Sync with HEAD.
 1.2.8.1 19-Nov-2007  mjf Sync with HEAD.
 1.2.6.8 27-Feb-2008  yamt sync with head.
 1.2.6.7 11-Feb-2008  yamt sync with head.
 1.2.6.6 04-Feb-2008  yamt sync with head.
 1.2.6.5 21-Jan-2008  yamt sync with head
 1.2.6.4 07-Dec-2007  yamt sync with head
 1.2.6.3 15-Nov-2007  yamt sync with head.
 1.2.6.2 27-Oct-2007  yamt sync with head.
 1.2.6.1 18-Oct-2007  yamt file pmap.c was added on branch yamt-lazymbuf on 2007-10-27 11:29:04 +0000
 1.2.4.7 09-Dec-2007  jmcneill Sync with HEAD.
 1.2.4.6 03-Dec-2007  joerg Sync with HEAD.
 1.2.4.5 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.4.4 21-Nov-2007  joerg Sync with HEAD.
 1.2.4.3 11-Nov-2007  joerg Sync with HEAD.
 1.2.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.2.4.1 18-Oct-2007  joerg file pmap.c was added on branch jmcneill-pm on 2007-10-26 15:43:46 +0000
 1.2.2.5 03-Dec-2007  ad Sync with HEAD.
 1.2.2.4 03-Dec-2007  ad Sync with HEAD.
 1.2.2.3 24-Oct-2007  ad Use a pool_cache to allocate pv entries. PR port-i386/37193.
 1.2.2.2 23-Oct-2007  ad Sync with head.
 1.2.2.1 18-Oct-2007  ad file pmap.c was added on branch vmlocking on 2007-10-23 20:36:41 +0000
 1.10.4.3 11-Dec-2007  yamt sync with head.
 1.10.4.2 10-Dec-2007  yamt make pmap_growkernel interrupt-safe.
 1.10.4.1 10-Dec-2007  yamt pmap_get_physpage: don't bother to associate kernel ptps to uvm objects.
 1.10.2.2 26-Dec-2007  ad Sync with head.
 1.10.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.13.2.21 23-Jan-2008  bouyer Sync with HEAD.
 1.13.2.20 20-Jan-2008  bouyer Sync with HEAD
 1.13.2.19 20-Jan-2008  bouyer Remove debug printk()
 1.13.2.18 20-Jan-2008  bouyer Really make non-PAE kernels build.
 1.13.2.17 20-Jan-2008  bouyer Make native kernel build again.
 1.13.2.16 19-Jan-2008  bouyer Switch back to a pool cache for PAE pdp, using a custom pool_allocator
in this case (items of 4 * PAGE_SIZE). thanks to tls@ for the hint.
 1.13.2.15 19-Jan-2008  bouyer Sync with HEAD
 1.13.2.14 18-Jan-2008  bouyer Make non-PAE kernels build again.
 1.13.2.13 18-Jan-2008  bouyer Fix APDP handling. A XEN i386PAE kernel now boots multiuser
 1.13.2.12 17-Jan-2008  bouyer - Fix L2_SLOT_APTE value (not sure how I got this value but it was definitely
wrong)
- Use a global variable for the PAE L3 page addresses, so that pmap.c can get them
from the bootstrap code
- Extend the size of our virtual PDP from 3 to 4 pages, so that pmap->pm_pdir[]
is contiguous for the whole VA range. The last page is a shadow of
the kernel's real PDP (L3[3]).
- make pm_pdirpa an array of 4 paddr_t if using PAE. introduce a
pmap_pdirpa macro to get the physical address of a given PD entry.
- fix pmap_map_pte

The kernel now boots single-user. fsck will cause a kernel fault in
pmap_pdes_invalid() on exit.
 1.13.2.11 15-Jan-2008  bouyer Snapshot of work in progress: a Xen i386PAE kernel boots and starts init
on an amd64 dom0, but panics when init forks.
This code needs a lot of cleanup, and the pmap handling is minimal to
allow init to start. It's a proof of concept of how PAE on Xen can work.

For a PAE guest, the Xen MMU handling differs in some significant ways
from the i386 or amd64 Xen.
The L3 page has only 4 entries, the last one mapping 0xc0000000->0xffffffff
(which happens to be our kernel VM range, that's cool). The L2 page
pointed to by this last entry is handled specially by Xen because it
contains some Xen private mapping, including a recursive mapping. So this
page can only be pointed to by exactly one L3 entry, and nothing else
(it can't be part of a recursive mapping for example). In addition, it
would waste too much VA space to do recursive mapping at the L3 level.

We do pmap switching at the L3 level, instead of doing it through %cr3.
%cr3 is static, as is L3[3] which contains only kernel mappings.
pmap_load() does pmap switching through the first 3 entries of L3.

PTE mapping is done through 4 contiguous L2 entries; the last one pointing
to a shadow of L3[3]. This way we can consider we have a 2-level VM system,
but with the L2 being 4 pages in size instead of one. The plx_i()
macros can be used with it to access the PTE without changes.

This can be reused as is for native PAE support (without the L3[3] shadow
which wouldn't be needed here)
 1.13.2.10 13-Jan-2008  bouyer Now that addr is a paddr_t, increment it by sizeof(pd_entry_t)
 1.13.2.9 13-Jan-2008  bouyer Work in progress on xeni386 PAE support:
Make xeni386 build with a 64bit paddr_t. For this vaddr_t vs paddr_t vs
pointers usages had to be clarified.
If 'options PAE' is present in a Xen3 kernel, switch paddr_t, pd_entry_t
and pt_entry_t to 64bits, and add the PAE entry in the __xen_guest ELF section.
 1.13.2.8 10-Jan-2008  bouyer Sync with HEAD
 1.13.2.7 09-Jan-2008  bouyer Merge xen bits to i386/i386/gdt.c. Convert remaining uses of PTE_* macros to
pmap_pte_* macros/inlines.
Fix think-o in pmap.c for native i386.
 1.13.2.6 08-Jan-2008  bouyer Sync with HEAD
 1.13.2.5 02-Jan-2008  bouyer Sync with HEAD
 1.13.2.4 13-Dec-2007  bouyer Sync with HEAD
 1.13.2.3 13-Dec-2007  bouyer - make amd64 XEN3 kernels build again
- pin the pdp pages in the PDP cache constructor, and unpin them in the
destructor. garbage-collect PMF_USER_XPIN.
 1.13.2.2 13-Dec-2007  bouyer Make it work for XEN2
 1.13.2.1 11-Dec-2007  bouyer Switch i386 to x86/x86/pmap.c
 1.49.6.7 17-Jan-2009  mjf Sync with HEAD.
 1.49.6.6 28-Sep-2008  mjf Sync with HEAD.
 1.49.6.5 29-Jun-2008  mjf Sync with HEAD.
 1.49.6.4 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.49.6.3 03-Jun-2008  mjf I accidentally committed a local patch. Revert it.
 1.49.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.49.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.50.2.3 17-Jun-2008  yamt sync with head.
 1.50.2.2 04-Jun-2008  yamt sync with head
 1.50.2.1 18-May-2008  yamt sync with head.
 1.52.2.6 11-Aug-2010  yamt sync with head.
 1.52.2.5 11-Mar-2010  yamt sync with head
 1.52.2.4 19-Aug-2009  yamt sync with head.
 1.52.2.3 18-Jul-2009  yamt sync with head.
 1.52.2.2 04-May-2009  yamt sync with head.
 1.52.2.1 16-May-2008  yamt sync with head.
 1.59.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.59.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.68.2.3 18-Jul-2008  simonb Sync with head.
 1.68.2.2 27-Jun-2008  simonb Sync with head.
 1.68.2.1 18-Jun-2008  simonb Sync with head.
 1.71.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.71.2.1 19-Oct-2008  haad Sync with HEAD.
 1.74.4.4 24-Feb-2012  sborrill Pull up the following revisions(s) (requested by bouyer in ticket #1729):
sys/arch/x86/x86/pmap.c: revision 1.170 via patch
sys/arch/xen/x86/x86_xpmap.c: revision 1.40 via patch

Fix random kernel panic on domains with large memory.
May fix PR port-xen/38699
 1.74.4.3 22-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.74.4.2 14-Feb-2010  bouyer Pull up following revision(s) (requested by hubertf in ticket #1290):
sys/kern/kern_ksyms.c: revision 1.53
sys/dev/pci/agp_via.c: revision 1.18
sys/netipsec/key.c: revision 1.63
sys/arch/x86/x86/x86_autoconf.c: revision 1.49
sys/kern/init_main.c: revision 1.415
sys/kern/cnmagic.c: revision 1.11
sys/netipsec/ipsec.c: revision 1.47
sys/arch/x86/x86/pmap.c: revision 1.100
sys/netkey/key.c: revision 1.176
Replace more printfs with aprint_normal / aprint_verbose
Makes "boot -z" go mostly silent for me.
 1.74.4.1 04-Apr-2009  snj branches: 1.74.4.1.2; 1.74.4.1.4;
Pull up following revision(s) (requested by ad in ticket #656):
sys/arch/amd64/amd64/gdt.c: revision 1.21 via patch
sys/arch/amd64/amd64/machdep.c: revision 1.129 via patch
sys/arch/i386/i386/gdt.c: revision 1.47 via patch
sys/arch/i386/i386/kvm86.c: revision 1.17 via patch
sys/arch/i386/i386/locore.S: revision 1.85 via patch
sys/arch/i386/i386/machdep.c: revision 1.666 via patch
sys/arch/i386/i386/vector.S: revision 1.45 via patch
sys/arch/i386/include/pcb.h: revision 1.47 via patch
sys/arch/x86/include/pmap.h: revision 1.22 via patch
sys/arch/x86/include/sysarch.h: revision 1.8 via patch
sys/arch/x86/x86/pmap.c: revision 1.80 via patch
sys/arch/x86/x86/sys_machdep.c: revision 1.17 via patch
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.143 via patch
sys/kern/init_main.c: revision 1.384 via patch
PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash
Fix numerous problems:
1. LDT updates are not atomic.
2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. A system with high maxprocs can be panicked.
3. LDTR can be leaked over context switch.
4. GDT slot allocations can race, giving the same LDT slot to two procs.
5. Incomplete interrupt/trap frames can be stacked.
6. In some rare cases segment faults are not handled correctly.
 1.74.4.1.4.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.74.4.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.74.4.1.2.1 23-Apr-2010  snj Apply patch (requested by jym in ticket #1380):
Fix the NX regression issue observed on amd64 kernels, where per-page
execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).
 1.74.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.74.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.74.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.77.2.13 17-Sep-2011  jym Fix comment, as noted by cherry@.
 1.77.2.12 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.77.2.11 07-May-2011  jym Fix the (recurring) problem with APDPs and Xen: unmap all of them at
suspend, and let MD parts remap them when they are needed.

The issue is not PAE specific, this can be triggered with i386 and amd64.
Xen evaluates mappings in a lazy fashion, and it can incorrectly detect
recursive ones when they are pointing to inactive pmaps.

Move a comment that explains the L2 shadow page unmapping code closer to
the associated function, it makes more sense.

Now, you can save/suspend all kinds of NetBSD domUs, with xbd(4) and
xennet(4) devices. Remaining bugs are in xbd(4) and xennet(4) resuming,
where the mappings have to be updated before issuing more I/Os. More
Linux code reading I guess... Stay tuned.

XXX (note to myself): move away from the machdep.sleep_state sysctl.
 1.77.2.10 02-May-2011  jym Sync with head.
 1.77.2.9 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.77.2.8 24-Oct-2010  jym Sync with HEAD
 1.77.2.7 01-Nov-2009  jym - Upgrade suspend/resume code to comply with Xen2 removal.
- Add support for PAE domUs suspend/resume.
- Fix an issue regarding initialization of the xbd ring I/O that could end
badly during resume, with invalid block operations submitted to dom0 backend.

NetBSD supports PAE under x86_32 by considering the L2 page as being
4 pages long instead of 1.

Xen validates the page types during resume. Sadly, the hypervisor handles
alternative recursive mappings (== PG/PD entries pointing to pages other
than self) inadequately, which could lead to incorrect page pinning.

As a result, the important change with this patch is to clear these alternative
mappings during suspend, and reset them back to their former self upon
resume. For PAE, approx. all 4 PDIR_SLOT_PTEs could be considered as
alternative recursive mappings.

See comments in pmap.c for further details.

Now, let the testing and bug hunting begin.
 1.77.2.6 01-Nov-2009  jym Sync with HEAD.
 1.77.2.5 24-Jul-2009  jym - rework the page pinning API, so that now a function is provided for
each level of indirection encountered during virtual memory translations. Update
pmap accordingly. Pinning looks cleaner that way, and it offers the possibility
to pin lower level pages if necessary (NetBSD does not do it currently).

- some fixes and comments to explain how page validation/invalidation take
place during save/restore/migrate under Xen. L2 shadow entries from PAE are now
handled, so basically, suspend/resume works with PAE.

- fixes an issue reported by Christoph (cegger@) for xencons suspend/resume
in dom0.

TODO:

- PAE save/restore is currently limited to single-user only, multi-user
support requires modifications in PAE pmap that should be discussed first. See
the comments about the L2 shadow pages cached in pmap_pdp_cache in this commit.

- grant table bug is still there; do not use the kernels of this branch
to test suspend/resume, unless you want to experience bad crashes in dom0,
and push the big red button.

Now there is light at the end of the tunnel :)

Note: XEN2 kernels will neither build nor work with this branch.
 1.77.2.4 23-Jul-2009  jym Sync with HEAD.
 1.77.2.3 31-May-2009  jym Modifications for the Xen suspend/migrate/resume branch:

- introduce xenbus_device_{suspend,resume}() functions. These are routines
used to suspend/resume MI parts of the Xenbus device interfaces, like updating
frontend/backend devices' paths found in XenStore.

- introduce HYPERVISOR_sysctl(), a hypercall used only by Xentools to obtain
information from hypervisor (listing VMs, printing console, etc.). I use it
to query xenconsole from ddb(), as a last resort in case of a panic() in
dom0 (xm being not available). Currently unused in the branch; could be, if
requested.

- disable the rwlock(9) used to protect code that could use transient MFNs.
It could trigger nasty context switches in places where it should not.

- fix some bugs in the xennet/xbd suspend/resume pmf(9) handlers.

- following XenSource's design, talk_to_otherend() is now called
watch_otherend(), and free_otherend_details() is used by Xenbus device
suspend/resume routines.

- some slight modifications in pmap regarding APDP. Introduce an inline
function (pmap_unmap_apdp_pde()) that clears APDP entry for the current pmap.

- similarly, implement pmap_unmap_all_apdp_pdes() that iterates through all
pmaps and tears down APDP, as Xen does not handle them properly.

TODO/XXX:

- pmap_unmap_apdp_pde() does not handle APDP shadow entry of PAE. It will,
once I figure out how PAE uses it.

- revisit the pmap locking issue regarding transient MFNs. As NetBSD does not
use kernel preemption and MP for Xen, this could be skipped momentarily. See
http://mail-index.netbsd.org/port-xen/2009/04/27/msg004903.html for details.

- fix a bug regarding grant tables which could technically DoS a dom0 if
ridiculously high consumer/producer indexes are passed down in the ring during
a resume.

All in all, once the grant table index issue and APDP PAE are fixed, next step
is to torture test this branch.

Tested under i386 PAE and non-PAE, Xen3 dom0 and domU. amd64 is only compile
tested.
 1.77.2.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.77.2.1 09-Feb-2009  jym Initial code for xen save/restore/migrate facilities.

- split the attach code of frontends into two halves: one that is only needed
during autoconf(9) attach/detach phases, and one used at each save/restore
of device state (between suspend and resume).

Applies to hypervisor, xencons, xenbus, xbd, and xennet.

- add a rwlock(9) ("ptom_lock") to protect the different parts in the kernel
that manipulate MFNs (which could change between a suspend and a resume,
without the kernel noticing it). Parts that require MFNs acquire a reader lock,
while suspend code will acquire a writer lock to ensure that no other parts
of the kernel still use MFNs.

- integrate the suspend code with sysmon.

- various things in pmap(9), and clock.

TODO:
- factorize code a bit more inside frontends drivers.
- remove all alternative recursive (APDP_PDE) mappings found in PD/PT during
suspend, as Xen does not support them.
- abstract the ptom_lock locking, it is only required when kernel preemption
is enabled, or on MP systems.

Current code works mostly. You may experience difficulties in some corner
cases (dom0 warnings about xennet interface errors, and Xen tools failing to
validate NetBSD's alternative pmaps).
 1.100.2.6 31-Oct-2010  uebayasi We already have a flag PMAP_NOCACHE. s/PMAP_UNMANAGED/PMAP_NOCACHE/.
Pointed out by Chuck Silvers, thanks.
 1.100.2.5 30-Oct-2010  uebayasi Implement pmap_physload_device(9) to replace xmd(4) MD backend.
Implement pmap_mmap(9) and use it from mem(4) and xmd(4).
 1.100.2.4 17-Aug-2010  uebayasi Sync with HEAD.
 1.100.2.3 30-Apr-2010  uebayasi Sync with HEAD.
 1.100.2.2 27-Apr-2010  uebayasi Support PMAP_UNMANAGED in some pmaps.

(Others should be converted eventually, but no problem while managed
device page is not used.)
 1.100.2.1 25-Feb-2010  uebayasi pg->mdpage -> VM_PAGE_TO_MD(pg)
 1.105.2.16 31-May-2011  rmind sync with head
 1.105.2.15 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.105.2.14 21-Apr-2011  rmind sync with head
 1.105.2.13 17-Mar-2011  rmind - Fix tlbflushg() to behave like tlbflush(), if page global extension (PGE)
is not (yet) enabled. This fixes the issue of stale TLB entry, experienced
early on boot, when PGE is not yet set on primary CPU.
- Rewrite i386/amd64 TLB interrupt handlers in C (only stubs are in assembly),
which simplifies and unifies (under x86) code, plus fixes a few bugs.
- cpu_attach: remove assignment to cpus_running, as primary CPU might not be
attached first, which causes reset (and thus missed secondary CPUs).
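The tlbflushg() item above can be pictured with a small userspace model. This is only a sketch under stated assumptions: the `model_*` names, the four-slot TLB, and the demo helper are hypothetical and not taken from the NetBSD sources; only the CR4.PGE bit position is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_PGE 0x80u          /* CR4 bit 7: page global enable */
#define MODEL_TLB_SLOTS 4      /* hypothetical, tiny TLB */

struct model_tlb_entry {
	bool valid;
	bool global;
};

struct model_cpu {
	uint32_t cr4;
	struct model_tlb_entry tlb[MODEL_TLB_SLOTS];
};

/* Cache a translation; the global attribute only takes effect with PGE on. */
static void
model_tlb_insert(struct model_cpu *c, size_t slot, bool want_global)
{
	assert(slot < MODEL_TLB_SLOTS);
	c->tlb[slot].valid = true;
	c->tlb[slot].global = want_global && (c->cr4 & CR4_PGE) != 0;
}

/* Models a %cr3 reload: every non-global entry is dropped. */
static void
model_tlbflush(struct model_cpu *c)
{
	for (size_t i = 0; i < MODEL_TLB_SLOTS; i++)
		if (!c->tlb[i].global)
			c->tlb[i].valid = false;
}

/* Models tlbflushg(): without PGE, fall back to the plain flush. */
static void
model_tlbflushg(struct model_cpu *c)
{
	if ((c->cr4 & CR4_PGE) == 0) {
		model_tlbflush(c);
		return;
	}
	/* Models toggling CR4.PGE, which drops global entries as well. */
	for (size_t i = 0; i < MODEL_TLB_SLOTS; i++)
		c->tlb[i].valid = false;
}

/* Insert one global-requested and one normal entry, flush, count survivors. */
static size_t
model_tlb_demo(uint32_t cr4, bool use_flushg)
{
	struct model_cpu c = { .cr4 = cr4 };
	size_t left = 0;

	model_tlb_insert(&c, 0, true);
	model_tlb_insert(&c, 1, false);
	if (use_flushg)
		model_tlbflushg(&c);
	else
		model_tlbflush(&c);
	for (size_t i = 0; i < MODEL_TLB_SLOTS; i++)
		left += c.tlb[i].valid ? 1 : 0;
	return left;
}
```

In this model, `model_tlb_demo(0, true)` leaves no entries behind, which is the early-boot case the commit message describes: with PGE still off, the global flush has to be at least as thorough as the plain one.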
 1.105.2.12 08-Mar-2011  rmind pmap_deactivate: improve comment.
 1.105.2.11 08-Mar-2011  rmind - pmap_remove_ptes: simplify by removing the duplicate code and re-using
equivalent functionality in pmap_remove_pte().
- pmap_remove_pte: fix assert to allow page owner lock to be unacquired
if pmap is pmap_kernel(); relevant for UVM_KMF_PAGEABLE memory case.
 1.105.2.10 05-Mar-2011  rmind sync with head
 1.105.2.9 31-May-2010  rmind - Split off Xen versions of pmap_map_ptes/pmap_unmap_ptes into Xen pmap,
also move pmap_apte_flush() with pmap_unmap_apdp() there.
- Make Xen buildable.
 1.105.2.8 30-May-2010  rmind sync with head
 1.105.2.7 26-May-2010  rmind pmap_map_ptes: handle emap on TLB flush.
 1.105.2.6 26-May-2010  rmind Split x86 TLB shootdown code into a separate file.
Code part is under TNF license, as per pmap.c 1.105.2.4 revision.
 1.105.2.5 26-Apr-2010  rmind Partly rewrite amd64 TLB shootdown handler for the changes in x86 pmap.
At this point, branch seems to pass preliminary stress tests on amd64.
 1.105.2.4 26-Apr-2010  rmind Apply renovated patch to significantly reduce TLB shootdowns in x86 pmap,
also provide TLBSTATS option to measure and track TLB shootdowns. Details:

http://mail-index.netbsd.org/port-i386/2009/01/11/msg001018.html

Patch from Andrew Doran, proposed on tech-x86 [sic], in January 2009.

XXX: amd64 and xen are not done yet; work in progress.
 1.105.2.3 25-Apr-2010  rmind - Drop vmmap and its reserved page on hp700, sparc and x86.
- mm_init: use UVM_KMF_WAITVA when allocating a VA.
 1.105.2.2 25-Apr-2010  rmind Drop per-"MD page" (i.e. struct pmap_page) locking i.e. pp_lock/pp_unlock
and rely on locking provided by upper layer, UVM. Sprinkle asserts.
 1.105.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.113.4.2 17-Feb-2011  bouyer Sync with HEAD
 1.113.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.113.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.121.2.8 20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.121.2.7 09-Sep-2011  cherry make #define PG_k visible on all xen archs
 1.121.2.6 20-Aug-2011  cherry PAE MP support (preliminary), amd64 per-cpu L4 model redesigned, i386 pmap_pa_start/end fixup
 1.121.2.5 17-Aug-2011  cherry Pullup relevant changes from -current
 1.121.2.4 31-Jul-2011  cherry grow MP support for i386. boots to single user
 1.121.2.3 16-Jul-2011  cherry Introduce a per-cpu "shadow" for pmap_kernel()'s L4 page
 1.121.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.121.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.137.2.12 22-May-2014  yamt g/c a write-only variable.
 1.137.2.11 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.137.2.10 16-Jan-2013  yamt sync with (a bit old) head
 1.137.2.9 02-Nov-2012  yamt remove a debug printf
 1.137.2.8 30-Oct-2012  yamt sync with head
 1.137.2.7 23-May-2012  yamt sync with head.
 1.137.2.6 17-Apr-2012  yamt sync with head
 1.137.2.5 24-Nov-2011  yamt share a lock among pmap uobjs
 1.137.2.4 18-Nov-2011  yamt share a lock among pmap uobjs
 1.137.2.3 10-Nov-2011  yamt remove uobj->memq
 1.137.2.2 10-Nov-2011  yamt sync with head
 1.137.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.142.2.4 29-Apr-2012  mrg sync to latest -current.
 1.142.2.3 04-Mar-2012  mrg sync to latest -current.
 1.142.2.2 24-Feb-2012  mrg sync to -current.
 1.142.2.1 18-Feb-2012  mrg merge to -current.
 1.164.2.6 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1441):
sys/arch/x86/x86/pmap.c: revision 1.241 via patch
sys/arch/x86/include/pmap.h: revision 1.63 via patch
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
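The classification rule described in this message can be written down as a tiny model. It is a sketch, not Xen's code: the `model_*` functions are hypothetical, and only the bit values (valid, user/supervisor, global) are the architectural x86 PTE bits.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 PTE bits. */
#define PG_V 0x001ULL  /* valid (present) */
#define PG_u 0x004ULL  /* user/supervisor: set means user-accessible */
#define PG_G 0x100ULL  /* global: entry survives a non-global TLB flush */

/*
 * Hypothetical model of the hypervisor-side rule from the message:
 * a PTE that already carries PG_u is taken for a userland page and is
 * made global; any other PTE gets PG_u added but stays local.
 */
static uint64_t
model_xen_adjust_pte(uint64_t pte)
{
	assert(pte & PG_V);	/* the model only handles valid entries */
	if (pte & PG_u)
		return pte | PG_G;
	return pte | PG_u;
}

/* Nonzero if the entry would stay in the TLB on return to userland. */
static int
model_survives_flush(uint64_t pte)
{
	return (model_xen_adjust_pte(pte) & PG_G) != 0;
}
```

A kernel PTE built with PG_u set (the pre-fix behaviour) ends up global in this model and so survives the flush, while the same PTE without PG_u stays local and is flushed.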
 1.164.2.5 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1365):
sys/arch/x86/x86/pmap.c: revision 1.190
Operation pmap_pp_clear_attrs() may remove the "used" attribute from a page
that is still cached in the TLB of other CPUs.
Call pmap_tlb_shootnow() here before enabling preemption to clear the
TLB entries on other CPUs.
Should prevent tmpfs data corruption under load.
Ok: Chuck Silvers
 1.164.2.4 09-May-2012  riz branches: 1.164.2.4.4; 1.164.2.4.6;
Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
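The move from hardcoded CPU bitmasks to a CPU-set abstraction, and the "merge rather than overwrite" fix in pmap_tlb_shootdown, can be illustrated with a minimal bitset. This is a hypothetical stand-in written for illustration; it is not the kcpuset(9) implementation and none of the `model_*` names exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAXCPUS 256                    /* matches the amd64 default named above */
#define MODEL_WORDS   (MODEL_MAXCPUS / 32)

/* A set of CPUs wider than any single machine word. */
struct model_cpuset {
	uint32_t bits[MODEL_WORDS];
};

static void
model_cpuset_zero(struct model_cpuset *s)
{
	memset(s, 0, sizeof(*s));
}

static void
model_cpuset_set(struct model_cpuset *s, unsigned cpu)
{
	assert(cpu < MODEL_MAXCPUS);
	s->bits[cpu / 32] |= UINT32_C(1) << (cpu % 32);
}

static bool
model_cpuset_isset(const struct model_cpuset *s, unsigned cpu)
{
	assert(cpu < MODEL_MAXCPUS);
	return (s->bits[cpu / 32] & (UINT32_C(1) << (cpu % 32))) != 0;
}

/* Merge (bitwise OR) src into dst instead of overwriting dst. */
static void
model_cpuset_merge(struct model_cpuset *dst, const struct model_cpuset *src)
{
	for (unsigned i = 0; i < MODEL_WORDS; i++)
		dst->bits[i] |= src->bits[i];
}

/* Pending target already holds CPU a; a later request adds CPU b. */
static bool
model_cpuset_demo(unsigned a, unsigned b)
{
	struct model_cpuset pending, request;

	model_cpuset_zero(&pending);
	model_cpuset_zero(&request);
	model_cpuset_set(&pending, a);
	model_cpuset_set(&request, b);
	model_cpuset_merge(&pending, &request);
	return model_cpuset_isset(&pending, a) && model_cpuset_isset(&pending, b);
}
```

With a merge, a CPU recorded by an earlier request (say CPU 3) is still targeted after a later request adds CPU 200; an overwrite would have dropped it, and a single 32-bit or 64-bit mask could not have represented CPU 200 at all.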
 1.164.2.3 23-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #39):
sys/arch/x86/x86/pmap.c: revision 1.170
sys/arch/xen/x86/x86_xpmap.c: revision 1.40
On Xen, there is variable-sized Xen data after the kernel's text+data+bss
(this includes the physical->machine table).
(vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
on a domU with lots of RAM (more than 4GB) (so large
xpmap_phys_to_machine_mapping table) this can point to some of Xen's data
setup at bootstrap (either the xpmap_phys_to_machine_mapping table,
some page shared with the hypervisor, or our kernel page table). Using it for
early_zerop will cause one of these pages to be unmapped after bootstrap.
This will cause a kernel page fault for the domU, either immediately or
eventually much later, depending on where early_zerop points to.
To fix this, account for early_zerop when building the bootstrap pages,
and its VA from here.
May fix PR port-xen/38699
 1.164.2.2 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #31):
sys/arch/x86/x86/pmap.c: revision 1.166
sys/arch/xen/x86/cpu.c: revision 1.83
- Make pmap_write_protect() work with pmap_kernel() too ((va & L2_FRAME)
strips the high bits of a LP64 address)
- use pmap_protect() in pmap_pdp_ctor() to remap the PDP read-only instead
of (ab)using pmap_kenter_pa(). No more "mapping already present" on
console with DIAGNOSTIC kernels
- make sure to zero the whole PDP (NTOPLEVEL_PDES doesn't include
high-level entries on i386 and i386PAE, reserved by Xen). Not sure
how it has worked before
- remove an always-true test (&& pmap != pmap_kernel(); we KASSERT that
at the function entry).
Use pmap_protect() instead of pmap_kenter_pa() to remap R/O an existing
page. This gets rid of the last "mapping already present" warnings.
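The "(va & L2_FRAME) strips the high bits of a LP64 address" remark in the first item of this pull-up can be shown with plain integer arithmetic. The mask values below are assumptions chosen to resemble a 48-bit, 4-level layout with 2 MB L2 blocks; they are not copied from the source tree, and the log does not show how the expression was actually rewritten.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative masks (hypothetical names). */
#define MODEL_L2_FRAME UINT64_C(0x0000ffffffe00000)  /* translated bits only */
#define MODEL_L2_ALIGN UINT64_C(0xffffffffffe00000)  /* ~(2 MB - 1) */

/* Rounds down to the 2 MB block but also drops the sign-extension bits. */
static uint64_t
model_block_start_lossy(uint64_t va)
{
	return va & MODEL_L2_FRAME;
}

/* Rounds down to the 2 MB block and keeps the full 64-bit address. */
static uint64_t
model_block_start_exact(uint64_t va)
{
	return va & MODEL_L2_ALIGN;
}
```

For a user address the two functions agree, because the top 16 bits are zero anyway. For a kernel address such as 0xffffffff80201000 the lossy form yields 0x0000ffff80200000, which is no longer a kernel address, so any range computed from it misses the pages it was meant to cover.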
 1.164.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1) the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2) once 1) above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3) pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small header reorganisation.
To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
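The ordering and batching described in this part of the message (remove the mappings first, then free the pages, at most 16 at a time) can be sketched as follows. The helpers are hypothetical stand-ins for pmap_kremove() and uvm_pagefree(); the page structure and the demo driver exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BATCH 16      /* batch size named in the message */
#define MODEL_NPAGES 64     /* hypothetical demo capacity */

struct model_page {
	int mapped;
	int freed;
};

/* Stand-in for removing the kernel mapping of one page. */
static void
model_unmap(struct model_page *pg)
{
	pg->mapped = 0;
}

/* Stand-in for returning a page to the free pool. */
static void
model_free(struct model_page *pg)
{
	assert(!pg->mapped);	/* never free a page that is still mapped */
	pg->freed = 1;
}

/* Process pages in batches: unmap the whole batch, then free it. */
static size_t
model_pgremove(struct model_page *pages, size_t npages)
{
	size_t done = 0, batches = 0;

	while (done < npages) {
		size_t n = npages - done;

		if (n > MODEL_BATCH)
			n = MODEL_BATCH;
		for (size_t i = 0; i < n; i++)
			model_unmap(&pages[done + i]);
		for (size_t i = 0; i < n; i++)
			model_free(&pages[done + i]);
		done += n;
		batches++;
	}
	return batches;
}

/* Build npages mapped pages, remove them, and report the batch count. */
static size_t
model_pgremove_demo(size_t npages)
{
	static struct model_page pages[MODEL_NPAGES];

	assert(npages <= MODEL_NPAGES);
	for (size_t i = 0; i < npages; i++) {
		pages[i].mapped = 1;
		pages[i].freed = 0;
	}
	size_t batches = model_pgremove(pages, npages);
	for (size_t i = 0; i < npages; i++)
		assert(pages[i].freed && !pages[i].mapped);
	return batches;
}
```

The assertion in `model_free()` encodes the invariant the commit is after: no page reaches the free pool while a writable mapping of it could still exist.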
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.164.2.4.6.2 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1441):
sys/arch/x86/x86/pmap.c: revision 1.241 via patch
sys/arch/x86/include/pmap.h: revision 1.63 via patch
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.164.2.4.6.1 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1365):
sys/arch/x86/x86/pmap.c: revision 1.190
Operation pmap_pp_clear_attrs() may remove the "used" attribute from a page
that is still cached in the TLB of other CPUs.
Call pmap_tlb_shootnow() here before enabling preemption to clear the
TLB entries on other CPUs.
Should prevent tmpfs data corruption under load.
Ok: Chuck Silvers
 1.164.2.4.4.2 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1441):
sys/arch/x86/x86/pmap.c: revision 1.241 via patch
sys/arch/x86/include/pmap.h: revision 1.63 via patch
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.164.2.4.4.1 14-Jul-2016  snj Pull up following revision(s) (requested by hannken in ticket #1365):
sys/arch/x86/x86/pmap.c: revision 1.190
Operation pmap_pp_clear_attrs() may remove the "used" attribute from a page
that is still cached in the TLB of other CPUs.
Call pmap_tlb_shootnow() here before enabling preemption to clear the
TLB entries on other CPUs.
Should prevent tmpfs data corruption under load.
Ok: Chuck Silvers
 1.178.2.3 03-Dec-2017  jdolecek update from HEAD
 1.178.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.178.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.179.2.1 18-May-2014  rmind sync with head
 1.181.2.1 10-Aug-2014  tls Rebase.
 1.183.2.7 03-Jan-2018  snj Apply patch (requested by maxv in ticket #1531):
amd64: Make the direct map non executable.
 1.183.2.6 06-Mar-2017  snj branches: 1.183.2.6.2;
Pull up following revision(s) (requested by bouyer in ticket #1388):
sys/arch/x86/x86/pmap.c: revision 1.241
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.183.2.5 18-Dec-2016  snj Pull up following revision(s) (requested by riastradh in ticket #1316):
sys/arch/x86/x86/pmap.c: revision 1.223
sys/arch/x86/x86/vm_machdep.c: revision 1.26
sys/arch/x86/include/pmap.h: revision 1.61
PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.183.2.4 26-Feb-2016  snj branches: 1.183.2.4.2;
Pull up following revision(s) (requested by hannken in ticket #1100):
sys/arch/x86/x86/pmap.c: revision 1.190
Operation pmap_pp_clear_attrs() may remove the "used" attribute from a page
that is still cached in the TLB of other CPUs.
Call pmap_tlb_shootnow() here before enabling preemption to clear the
TLB entries on other CPUs.
Should prevent tmpfs data corruption under load.
Ok: Chuck Silvers
 1.183.2.3 12-Feb-2016  snj Pull up following revision(s) (requested by riastradh in ticket #1115):
sys/arch/x86/x86/pmap.c: patch
Use IPL_NONE for pserialized lock. Assert sleepable. (OOPS.)
 1.183.2.2 23-Apr-2015  snj branches: 1.183.2.2.2;
Pull up following revision(s) (requested by mrg in ticket #718):
sys/arch/x86/include/pmap.h: revision 1.56
sys/arch/x86/x86/pmap.c: revision 1.188
sys/dev/pci/agp_amd64.c: revision 1.8
sys/dev/pci/agp_i810.c: revision 1.118
sys/external/bsd/drm2/dist/drm/i915/i915_dma.c: revision 1.16
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: revision 1.29
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_agp.c: revision 1.3
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_ttm.c: revision 1.4
sys/external/bsd/drm2/dist/drm/radeon/atombios_crtc.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_agp.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_display.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_legacy_crtc.c: revision 1.2
sys/external/bsd/drm2/dist/drm/radeon/radeon_object.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_ttm.c: revision 1.7
sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c: revisions 1.7-1.10
sys/external/bsd/drm2/dist/drm/ttm/ttm_bo_util.c: revision 1.5
sys/external/bsd/drm2/i915drm/intelfb.c: revision 1.13
sys/external/bsd/drm2/include/drm/drm_wait_netbsd.h: revisions 1.12, 1.13
sys/external/bsd/drm2/include/linux/mm.h: revision 1.5
sys/external/bsd/drm2/include/linux/pci.h: revisions 1.16, 1.17
sys/external/bsd/drm2/nouveau/nouveaufb.c: revision 1.2
sys/external/bsd/drm2/radeon/radeon_pci.c: revisions 1.8, 1.9
sys/uvm/uvm_init.c: revision 1.46
Hack against the blank console problem:
Leave the CLUT alone on ancient cards. At least this leaves us with a
semi working console (red and blue are flipped). Leave an example of what
seems to be happening but disable it because colors are better than 444 bit
greyscale.
--
Initialize P->V tracking for unmanaged device pages in uvm_init.

Conditional on __HAVE_PMAP_PV_TRACK until we add it to all pmaps.

MI part of pmap_pv(9) change proposed on tech-kern:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
--
Implement pmap_pv(9) for x86 for P->V tracking of unmanaged pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
--
Use pmap_pv(9) to remove mappings of Intel graphics aperture pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html

Further background at:

https://mail-index.netbsd.org/tech-kern/2014/07/23/msg017392.html
--
Use pmap_pv(9) to remove mappings of device pages in TTM.

Adapt nouveau and radeon to do pmap_pv_track for their device pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html

Further background at:

https://mail-index.netbsd.org/tech-kern/2014/07/23/msg017392.html
--
Fix error branches in agp_amd64.c.

- agp_generic_detach always.
- Free asc if it was allocated. (Found by Brainy, noted by maxv@.)
- Free the GATT if it was allocated.
--
pmf_device_register returns false on failure, not true
--
In DRM_SPIN_WAIT_ON, don't stop after waiting only one tick.

Continue the loop to recheck the condition and count the whole
duration.
--
Don't use the video BIOS memory as an i915 flush page!
--
Don't let anyone else allocate the video BIOS either.
--
Missed a zero: it's 0x100000, not 0x10000.
--
Don't reserve if atomic -- caller must have pre-pinned the buffer.
--
Don't reserve if atomic -- caller must have pre-pinned the buffer.
--
almost add radeondrmkms suspend/resume support. it unfortunately doesn't work.
--
Need the page's uvm object lock to do pmap_page_protect.
--
Use KASSERTMSG to show bad base/offset.
--
KASSERT about page-alignment on initialization too.
--
Don't break when hardclock_ticks wraps around.

Since we now only count time spent in wait, rather than determining
the end time and checking whether we've passed it, timeouts might be
marginally longer in effect. Unlikely to be an issue.
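The difference between checking against a precomputed end time and counting the time actually waited can be shown with a wrapping 32-bit tick counter. This is a sketch under assumptions: the function names are hypothetical, and the counter is modelled as `uint32_t` rather than whatever type hardclock_ticks has in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ticks elapsed between two readings; unsigned subtraction wraps correctly. */
static uint32_t
model_ticks_elapsed(uint32_t start, uint32_t now)
{
	return (uint32_t)(now - start);
}

/* Fragile form: compute an end time, then compare against it. */
static bool
model_deadline_passed_naive(uint32_t start, uint32_t timeout, uint32_t now)
{
	uint32_t end = (uint32_t)(start + timeout);	/* may wrap to a small value */

	return now >= end;
}

/* Robust form: compare the time actually waited with the budget. */
static bool
model_timed_out(uint32_t start, uint32_t timeout, uint32_t now)
{
	return model_ticks_elapsed(start, now) >= timeout;
}
```

With a start of 0xfffffff0 and a budget of 0x20 ticks, the naive check already reports a timeout one tick later because the end time wrapped to 0x10, while the elapsed-time check only fires once 0x20 ticks have really passed.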
--
Remove broken drm2 vm_mmap stub. Can't possibly have ever worked.
--
apply some of the additional changes from Arto Huusko in PR#49645:
- call pmf_device_deregister on detach.

i've kept the "resume = true" for radeon_resume_kms() call as it
seems to work for me (indeed, code inspection shows it is unused
on netbsd :-)

my old nforce4 box that can resume old drm (or could, last i tried
several years ago) while X and GL apps were running, can at least
survive a resume if X hasn't started. my one attempt so far with
X exited, but having run, did not work.
--
First attempt to make ttm_buffer_object_transfer less bogus.
--
Make sure mem.bus.is_iomem is initialized. PR 49833
 1.183.2.1 14-Oct-2014  martin Pull up following revision(s) (requested by bouyer in ticket #140):
sys/arch/x86/x86/pmap.c: revision 1.184
Add a missing || defined(XEN), the lack of which caused Xen non-DIAGNOSTIC
kernels to panic at boot.
 1.183.2.6.2.1 03-Jan-2018  snj Apply patch (requested by maxv in ticket #1531):
amd64: Make the direct map non executable.
 1.183.2.4.2.2 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.183.2.4.2.1 18-Jan-2017  skrll Sync with netbsd-5
 1.183.2.2.2.4 03-Jan-2018  snj Apply patch (requested by maxv in ticket #1531):
amd64: Make the direct map non executable.
 1.183.2.2.2.3 06-Mar-2017  snj Pull up following revision(s) (requested by bouyer in ticket #1388):
sys/arch/x86/include/pmap.h: revision 1.63 via patch
sys/arch/x86/x86/pmap.c: revision 1.241 via patch
Should be PG_k, doesn't change anything.
--
Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege
separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the
separation is guaranteed by the hypervisor - each syscall/trap is
intercepted by Xen and sent manually to the kernel. Before that, the
hypervisor modifies the page tables so that the kernel becomes accessible.
Later, when returning to userland, the hypervisor removes the kernel pages
and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages
flushed Xen marks the userland pages as global, while keeping the kernel
ones as local. This way, when returning to userland, only the kernel pages
get flushed - which makes sense since they are the only ones that got
removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the
PTE; if a page has this bit then Xen tags it as global, otherwise Xen
manually adds the bit but keeps the page as local. The thing is, since we
set PG_u in the kernel pages, Xen believes our kernel pages are in fact
userland pages, so it marks them as global. Therefore, when returning to
userland, the kernel pages indeed get removed from the page tree, but are
not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window
where it can read/write to the last kernel pages accessed, which is enough
to completely escalate privileges: the sysent structure systematically gets
read when performing a syscall, and chances are that it will still be
cached in the TLB. Userland can then use this to patch a chosen syscall,
make it point to a userland function, retrieve %gs and compute the address
of its credentials, and finally grant itself root privileges.
 1.183.2.2.2.2 18-Dec-2016  snj Pull up following revision(s) (requested by riastradh in ticket #1316):
sys/arch/x86/x86/pmap.c: revision 1.223
sys/arch/x86/x86/vm_machdep.c: revision 1.26
sys/arch/x86/include/pmap.h: revision 1.61
PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.183.2.2.2.1 26-Feb-2016  snj Pull up following revision(s) (requested by hannken in ticket #1100):
sys/arch/x86/x86/pmap.c: revision 1.190
Operation pmap_pp_clear_attrs() may remove the "used" attribute from a page
that is still cached in the TLB of other CPUs.
Call pmap_tlb_shootnow() here before enabling preemption to clear the
TLB entries on other CPUs.
Should prevent tmpfs data corruption under load.
Ok: Chuck Silvers
 1.187.2.9 28-Aug-2017  skrll Sync with HEAD
 1.187.2.8 05-Feb-2017  skrll Sync with HEAD
 1.187.2.7 05-Dec-2016  skrll Sync with HEAD
 1.187.2.6 05-Oct-2016  skrll Sync with HEAD
 1.187.2.5 09-Jul-2016  skrll Sync with HEAD
 1.187.2.4 29-May-2016  skrll Sync with HEAD
 1.187.2.3 19-Mar-2016  skrll Sync with HEAD
 1.187.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.187.2.1 06-Apr-2015  skrll Sync with HEAD
 1.211.2.7 26-Apr-2017  pgoyette Sync with HEAD
 1.211.2.6 20-Mar-2017  pgoyette Sync with HEAD
 1.211.2.5 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.211.2.4 04-Nov-2016  pgoyette Sync with HEAD
 1.211.2.3 06-Aug-2016  pgoyette Resolve $NetBSD$ conflict
 1.211.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.211.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.236.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.245.6.7 08-Dec-2021  martin Pull up the following, requested by msaitoh in ticket #1720:

sys/arch/x86/include/specialreg.h 1.146, 1.171,
1.173-1.178 via patch
sys/arch/x86/x86/identcpu.c 1.106, 1.117,
1.122 via patch
sys/arch/x86/x86/pmap.c patch
sys/external/bsd/drm2/drm/drm_cache.c 1.14
usr.sbin/cpuctl/arch/i386.c 1.114-1.117


- Add PT, PKRU, HDC, LA57, PKE, PKS, CET, CET_U, CET_S, HWP, KL,
AVX512_BF16, TME_EN and PCONFIG.
- Rename some macros to match the x86 specification and the other OSes.
- Print CPUID 0x80000008 %ebx on Intel, too.
- Print CPUID leaf 7 subleaf 1.
- Identify Tiger Lake, 3rd gen Xeon Scalable (Ice Lake), Elkhart Lake
and Jasper Lake.
- Remove a few unused MSRs.
- Add comment.
- KNF. Whitespace fix.
 1.245.6.6 22-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #652:

sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch)
sys/arch/amd64/amd64/db_machdep.c 1.6 (patch)
sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch)
sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch)
sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch)
sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch)
sys/arch/amd64/amd64/vector.S upto 1.61 (partial, patch)
sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch)
sys/arch/amd64/conf/kern.ldscript 1.26 (patch)
sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch)
sys/arch/amd64/include/param.h 1.25 (patch)
sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch)
sys/arch/x86/conf/files.x86 1.91,1.93 (patch)
sys/arch/x86/include/cpu.h 1.88,1.89 (patch)
sys/arch/x86/include/pmap.h 1.75 (patch)
sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch)
sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch)
sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch)
sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch)
sys/arch/x86/x86/svs.c 1.1-1.14
sys/arch/xen/conf/files.compat 1.30 (patch)

Backport SVS. Not enabled yet.
 1.245.6.5 16-Mar-2018  martin Pull up the following revisions (via patch), requested by maxv in #635:

sys/arch/amd64/amd64/gdt.c 1.39-1.45 (patch)
sys/arch/amd64/amd64/machdep.c 1.284,1.287,1.288 (patch)
sys/arch/amd64/include/param.h 1.23 (patch)
sys/arch/amd64/include/types.h 1.53 (patch)
sys/arch/x86/include/cpu.h 1.87 (patch)
sys/arch/x86/include/pmap.h 1.73,1.74 (patch)
sys/arch/x86/x86/cpu.c 1.142 (patch)
sys/arch/x86/x86/intr.c 1.117 (partial),1.120 (patch)
sys/arch/x86/x86/pmap.c 1.276 (patch)

Initialize ist0 in cpu_init_tss.
Backport __HAVE_PCPU_AREA.
 1.245.6.4 13-Mar-2018  martin Pullup the following revisions via patch, requested by maxv in ticket #629:

sys/arch/amd64/amd64/genassym.cf 1.63,1.64
sys/arch/amd64/amd64/locore.S 1.144
sys/arch/amd64/amd64/machdep.c 1.281-1.283
sys/arch/i386/i386/genassym.cf 1.105-1.106
sys/arch/i386/i386/locore.S 1.155
sys/arch/i386/i386/machdep.c 1.802 (adapted),1.803
sys/arch/x86/include/cpu.h 1.85
sys/arch/x86/x86/intr.c 1.115-1.116
sys/arch/x86/x86/pmap.c 1.275
sys/arch/x86/x86/sys_machdep.c 1.45
sys/arch/xen/x86/cpu.c 1.117

Stop sharing the double-fault stack.
Merge the TSS structures into one single cpu_tss structure, and
allocate it dynamically.
 1.245.6.3 06-Mar-2018  martin Also pull up r1.267, requested by mrg in ticket #593: avoid a NULL pointer
deref and simplify.
 1.245.6.2 27-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #593):
sys/dev/marvell/mvxpsec.c: revision 1.2
sys/arch/m68k/m68k/pmap_motorola.c: revision 1.70
sys/opencrypto/crypto.c: revision 1.102
sys/arch/sparc64/sparc64/pmap.c: revision 1.308
sys/ufs/chfs/chfs_malloc.c: revision 1.5
sys/arch/powerpc/oea/pmap.c: revision 1.95
sys/sys/pool.h: revision 1.80,1.82
sys/kern/subr_pool.c: revision 1.209-1.216,1.219-1.220
sys/arch/alpha/alpha/pmap.c: revision 1.262
sys/kern/uipc_mbuf.c: revision 1.173
sys/uvm/uvm_fault.c: revision 1.202
sys/sys/mbuf.h: revision 1.172
sys/kern/subr_extent.c: revision 1.86
sys/arch/x86/x86/pmap.c: revision 1.266 (via patch)
sys/dev/dtv/dtv_scatter.c: revision 1.4

Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.

Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.

This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.

Define the new flag too for previous commit.

pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.

Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.

Since pr_lock is now used to wait for two things (PR_GROWING and
PR_WANTED) we need to loop for the condition we wanted.
make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
Handle the ERESTART case from pool_grow()

don't pass 0 to the pool flags
Guess pool_cache_get(pc, 0) means PR_WAITOK here.
Earlier on in the same context we use kmem_alloc(sz, KM_SLEEP).

use PR_WAITOK everywhere.
use PR_NOWAIT.

Don't use 0 for PR_NOWAIT

use PR_NOWAIT instead of 0

panic ex nihilo -- PR_NOWAITing for zerot

Add assertions that either PR_WAITOK or PR_NOWAIT are set.
- fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number
of pages; this avoids the issue of returning an ERESTART when we get to 0,
and is more correct.
- simplify the pool_grow code, and don't wakeup things if we ENOMEM.

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
hopefully work around the irregular "fork fails in init" problem.
if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.

implement the busy-wait by unlocking and relocking this pool's
mutex and returning ERESTART. other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.
ok chs@ riastradh@

Don't release the lock in the PR_NOWAIT allocation. Move flags setting
after acquiring the mutex. (from Tobias Nygren)
apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. If we are replacing an existing mapping,
reuse the pv structure where possible.

This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail
when replacing an existing mapping with the first mapping of a new page,
which is an unintended consequence of the changes from the rmind-uvmplock
branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706 on the remaining platforms where
this problem existed.
 1.245.6.1 05-Jul-2017  snj Pull up following revision(s) (requested by jdolecek in ticket #98):
sys/arch/x86/x86/pmap.c: revision 1.252
remove panicstr KASSERT() in pmap_kremove_local(): a kernel dump can
legitimately be invoked without a panic, via reboot -d
fixes PR kern/49610 by Manuel Bouyer
 1.289.2.8 18-Jan-2019  pgoyette Synch with HEAD
 1.289.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.289.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.289.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.289.2.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.289.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.289.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.289.2.1 21-May-2018  pgoyette Sync with HEAD
 1.291.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.291.2.1 10-Jun-2019  christos Sync with HEAD
 1.334.2.8 18-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1924):

sys/arch/x86/x86/pmap.c: revision 1.427

x86/pmap: Use UVM_KMF_WAITVA to ensure pmap_pdp_alloc never fails.

This is used as the backing page allocator for pmap_pdp_pool, and
pmap_ctor assumes that PR_WAITOK allocations from it don't fail and
unconditionally writes to the resulting kva, which if null leads
nowhere good.

It is unclear to me why uvm_km_alloc can accept any combination of
the options UVM_KMF_NOWAIT and UVM_KMF_WAITVA. It seems to me that
at least one should be required (and they should be exclusive), and
any other use should trip an assertion.

PR kern/58666: panic: lock error: Reader / writer lock:
rw_vector_enter,357: locking against myself
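The suggestion in this message, that exactly one of the two wait options should be required, could be expressed as a check like the one below. This is only a sketch: the `SK_` flag values and the helper name are hypothetical and are not the real UVM_KMF_NOWAIT / UVM_KMF_WAITVA definitions, and uvm_km_alloc does not necessarily contain such a check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two wait options; values are made up. */
#define SK_KMF_NOWAIT 0x1u
#define SK_KMF_WAITVA 0x2u

/* True iff exactly one of the two options is set (required and exclusive). */
static bool
sk_wait_flags_valid(unsigned flags)
{
	bool nowait = (flags & SK_KMF_NOWAIT) != 0;
	bool waitva = (flags & SK_KMF_WAITVA) != 0;

	return nowait != waitva;
}
```

An allocator following that proposal would assert `sk_wait_flags_valid(flags)` on entry, so that passing neither option or both would trip immediately.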
 1.334.2.7 13-May-2022  martin Pull up following revision(s) (requested by bouyer in ticket #1443):

sys/arch/x86/x86/pmap.c: revision 1.414

return after calling xen_pagezero(), don't fall back to the legacy
pmap_zero_page() method.

This should only affect performance.
 1.334.2.6 03-Sep-2021  martin Pull up following revision(s) (requested by manu in ticket #1341):

sys/arch/x86/x86/pmap.c: revision 1.410

Make pat_init() a NOOP on XENPV; it causes a trap with Xen 4.15
 1.334.2.5 03-Sep-2020  martin Apply patch, requested by bouyer in ticket #1075:

sys/arch/x86/x86/pmap.c (apply patch)

Fix double count on ptp entries in pmap_enter_gnt(), which causes a KASSERT
at pmap_destroy() time. Call pmap_free_ptp() if needed. We can have a 0 wire
count if we had an old mapping and grant map hypercall failed,
and this was the only page in this ptp.

while there remove ptp != NULL checks for gnt operations: we always have
a ptp here.
 1.334.2.4 02-Sep-2020  martin Pull up following revision(s) (requested by bouyer in ticket #1073):

sys/arch/x86/x86/pmap.c: revision 1.404

Fix braino in pmap_find_gnt(), really return the gnt entry covering the range
and not one that starts just after.

Fixes a KASSERT in pmap_remove_gnt().
 1.334.2.3 07-Jun-2020  martin Apply patch, requested by bouyer in ticket #941:

sys/arch/x86/x86/pmap.c (apply patch)

Fix Xen dom0 kernel build with options DIAGNOSTIC.
 1.334.2.2 31-May-2020  martin Pull up following revision(s) (requested by bouyer in ticket #935):

sys/arch/xen/x86/x86_xpmap.c: revision 1.89
sys/arch/x86/include/pmap.h: revision 1.121
sys/arch/xen/xen/privcmd.c: revision 1.58
sys/external/mit/xen-include-public/dist/xen/include/public/memory.h: revision 1.2
sys/arch/xen/include/xenpmap.h: revision 1.44
sys/arch/xen/include/xenio.h: revision 1.12
sys/arch/x86/x86/pmap.c: revision 1.394
(all via patch)

Adjust pmap_enter_ma() for upcoming new Xen privcmd ioctl:
pass flags to xpq_update_foreign()

Introduce a pmap MD flag: PMAP_MD_XEN_NOTR, which causes xpq_update_foreign()
to use the MMU_PT_UPDATE_NO_TRANSLATE flag.
make xpq_update_foreign() return the raw Xen error. This will cause
pmap_enter_ma() to return a negative error number in this case, but the
only user of this code path is privcmd.c and it can deal with it.

Add pmap_enter_gnt(), which maps a set of Xen grant entries at the
specified va in the specified pmap. Use the hooks implemented for EPT to
keep track of mapped grant entries in the pmap, and unmap them
when pmap_remove() is called. This requires pmap_remove() to be split
into a pmap_remove_locked(), to be called from pmap_remove_gnt().

Implement new ioctls, needed by Xen 4.13:
IOCTL_PRIVCMD_MMAPBATCH_V2
IOCTL_PRIVCMD_MMAP_RESOURCE
IOCTL_GNTDEV_MMAP_GRANT_REF
IOCTL_GNTDEV_ALLOC_GRANT_REF

Always enable declarations needed by privcmd.c
 1.334.2.1 29-Apr-2020  martin Pull up following revision(s) (requested by jmcneill in ticket #868):

sys/arch/x86/x86/pmap.c: revision 1.384
sys/arch/amd64/amd64/machdep.c: revision 1.349

Detect PAT on the boot processor before cpu0 attaches so the early genfb
attach code can map the framebuffer with write combining.
 1.354.2.2 29-Feb-2020  ad Sync with head.
 1.354.2.1 17-Jan-2020  ad Sync with head.
 1.381.2.2 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.381.2.1 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.407.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.407.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.423.4.1 18-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1008):

sys/arch/x86/x86/pmap.c: revision 1.427

x86/pmap: Use UVM_KMF_WAITVA to ensure pmap_pdp_alloc never fails.

This is used as the backing page allocator for pmap_pdp_pool, and
pmap_ctor assumes that PR_WAITOK allocations from it don't fail and
unconditionally writes to the resulting kva, which if null leads
nowhere good.

It is unclear to me why uvm_km_alloc can accept any combination of
the options UVM_KMF_NOWAIT and UVM_KMF_WAITVA. It seems to me that
at least one should be required (and they should be exclusive), and
any other use should trip an assertion.

PR kern/58666: panic: lock error: Reader / writer lock:
rw_vector_enter,357: locking against myself
 1.426.6.1 02-Aug-2025  perseant Sync with HEAD
 1.9 22-Jan-2018  jdolecek rename sys/arch/x86/x86/pmap_tlb.c to sys/arch/x86/x86/x86_tlb.c, so that
x86 can eventually use uvm/pmap/pmap_tlb.c; step to future PCID support
 1.8 13-Nov-2016  maxv Explain why this is the right value, otherwise someone (like me) could be
tempted to increase it. The invlpg part is from rmind, the statistical from
me.
 1.7 24-Jul-2015  hannken branches: 1.7.2;
Operation pmap_tlb_processpacket() uses x86_ipi(.., LAPIC_DEST_ALLEXCL, ...)
when cpuset "target" equals "kcpuset_running". During boot, while some CPUs
are not running yet, this will result in more IPI interrupts than expected
and "pmap_tlb_pendcount" related KASSERTs fire.

Compare the cpuset "target" against "kcpuset_attached", as this set represents
the CPUs LAPIC_DEST_ALLEXCL will notify.

Should fix PR port-amd64/47437
 1.6 21-Apr-2012  rmind branches: 1.6.2; 1.6.16;
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
 1.5 20-Apr-2012  rmind - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
 1.4 04-Dec-2011  cherry branches: 1.4.2; 1.4.4;
Split out the cross-CPU tlb flushing code between XEN and non-XEN.
x86 tlb flushing is asynchronous and uses x86_ipi()
XEN tlb flushing uses synchronous hypercalls.
 1.3 15-Jun-2011  rmind branches: 1.3.2; 1.3.4; 1.3.6;
Few XEN fixes:
- cpu_load_pmap: perform tlbflush() after xen_set_user_pgd().
- xen_pmap_bootstrap: perform xpq_queue_tlb_flush() in the end.
- pmap_tlb_shootdown: do not check PG_G for Xen.
 1.2 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.1 26-May-2010  rmind branches: 1.1.2;
file pmap_tlb.c was initially added on branch rmind-uvmplock.
 1.1.2.5 17-Mar-2011  rmind - Fix tlbflushg() to behave like tlbflush(), if page global extension (PGE)
is not (yet) enabled. This fixes a stale TLB entry issue, experienced
early on boot, when PGE is not yet set on primary CPU.
- Rewrite i386/amd64 TLB interrupt handlers in C (only stubs are in assembly),
which simplifies and unifies (under x86) code, plus fixes a few bugs.
- cpu_attach: remove assignment to cpus_running, as primary CPU might not be
attached first, which causes reset (and thus missed secondary CPUs).
 1.1.2.4 08-Mar-2011  rmind - pmap_tlb_shootdown: fix a bug where the state for a full TLB flush
could be reverted to single page invalidation(s).
- pmap_tlb_init: clear pmap_tlb_packet and pmap_tlb_mailbox structs.
 1.1.2.3 02-Jul-2010  rmind pmap_tlb_shootdown: add assert demonstrating assumption.
 1.1.2.2 31-May-2010  rmind - Split off Xen versions of pmap_map_ptes/pmap_unmap_ptes into Xen pmap,
also move pmap_apte_flush() with pmap_unmap_apdp() there.
- Make Xen buildable.
 1.1.2.1 26-May-2010  rmind Split x86 TLB shootdown code into a separate file.
Code part is under TNF license, as per pmap.c 1.105.2.4 revision.
 1.3.6.2 23-May-2012  yamt sync with head.
 1.3.6.1 17-Apr-2012  yamt sync with head
 1.3.4.2 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.3.4.1 15-Jun-2011  jym file pmap_tlb.c was added on branch jym-xensuspend on 2011-08-27 15:59:49 +0000
 1.3.2.5 20-Sep-2011  cherry Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
 1.3.2.4 31-Jul-2011  cherry Oops. remove spurious "#undef MULTIPROCESSOR"
 1.3.2.3 31-Jul-2011  cherry grow MP support for i386. boots to single user
 1.3.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.3.2.1 15-Jun-2011  cherry file pmap_tlb.c was added on branch cherry-xenmp on 2011-06-23 14:19:49 +0000
 1.4.4.1 09-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge
like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running.
Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR
registers available (i686_mtrr_vcnt == 0).
 1.4.2.1 29-Apr-2012  mrg sync to latest -current.
 1.6.16.2 05-Dec-2016  skrll Sync with HEAD
 1.6.16.1 22-Sep-2015  skrll Sync with HEAD
 1.6.2.1 03-Dec-2017  jdolecek update from HEAD
 1.7.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.13 13-Jul-2018  maxv Remove the X86PMC code I had written, replaced by tprof. Many defines
become unused in specialreg.h, so remove them. We don't want to add
defines all the time, there are countless PMCs on many generations, and
it's better to just inline the event/unit values.
 1.12 12-Jul-2018  maxv Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.
 1.11 07-Aug-2017  maxv branches: 1.11.2; 1.11.4; 1.11.6; 1.11.8;
Fix GCC warning on NET4501, PR/52451.
 1.10 12-Jul-2017  maxv Properly handle overflows, and take them into account in userland.
 1.9 12-Jul-2017  maxv include opt_pmc.h
 1.8 14-Jun-2017  maxv Make the PMC syscalls privileged.
 1.7 23-May-2017  nonaka branches: 1.7.2;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.6 18-Apr-2017  maya branches: 1.6.2;
switch pmc_ncounters to unsigned int.

matches userland pmc, makes it clear to static analyzers that if the loop
in pmc_nmi (for (i = 0; i < pmc_ncounters; i++) ) is not entered, then the
condition i == pmc_ncounters (== 0) is satisfied and no null derefs occur

this change only helps analyzers read the code, null deref was not possible
before.
 1.5 24-Mar-2017  maxv Handle counter overflows, and sample with 500000 events per interrupt.
It's a pre-requisite for real sampling. Overflows are not yet displayed by
pmc(1), but will be soon.
 1.4 24-Mar-2017  maxv Drop support for 586 PMCs; the detection is broken, and I'm not sure the
code even works. No one has ever cared about this anyway, and we won't
maintain it.

While here, fix the mask on the counter - K7 and F10H have 48bit counters.
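The counter mask mentioned here can be illustrated as follows. This is a minimal sketch, assuming only the 48-bit counter width stated in the message; the macro and function names are hypothetical and are not taken from pmc.c.

```c
#include <assert.h>
#include <stdint.h>

/* Counter width as stated in the commit message (K7 and F10H). */
#define SK_PMC_COUNTER_BITS 48

/* Keep only the low 48 bits of a raw 64-bit counter register read. */
static uint64_t
sk_pmc_counter_value(uint64_t raw)
{
	return raw & ((UINT64_C(1) << SK_PMC_COUNTER_BITS) - 1);
}
```

With a mask narrower or wider than the hardware counter, values near wrap-around would be misreported, which is why the width matters once overflows are being tracked.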
 1.3 11-Mar-2017  maxv branches: 1.3.2;
Mmh, remove a debug printf I mistakenly added in my previous commit
 1.2 11-Mar-2017  maxv Add the AMD 10h family, with additional events that I believe are useful,
the DTLB misses on large pages for example.

While here, remove a few K7 flags that do not actually exist on K7 (there
must have been a confusion between K7 and K8); and make the 'pmc list'
command a little more user-friendly.
 1.1 10-Mar-2017  maxv Move pmc.c into x86/, it can be shared with amd64.
 1.3.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.3.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.3.2.1 11-Mar-2017  pgoyette file pmc.c was added on branch pgoyette-localcount on 2017-03-20 06:57:22 +0000
 1.6.2.2 21-Apr-2017  bouyer Sync with HEAD
 1.6.2.1 18-Apr-2017  bouyer file pmc.c was added on branch bouyer-socketcan on 2017-04-21 16:53:39 +0000
 1.7.2.2 25-Aug-2017  snj Pull up following revision(s) (requested by jdolecek in ticket #225):
sys/arch/x86/x86/pmc.c: revision 1.11
Fix GCC warning on NET4501, PR/52451.
 1.7.2.1 01-Aug-2017  snj Pull up following revision(s) (requested by maxv in ticket #164):
distrib/sets/lists/base/md.amd64: revision 1.269
distrib/sets/lists/debug/md.amd64: revision 1.97
sys/arch/amd64/conf/GENERIC: revision 1.460
sys/arch/amd64/conf/files.amd64: revision 1.89
sys/arch/i386/conf/GENERIC: revision 1.1157
sys/arch/i386/conf/files.i386: revision 1.379
sys/arch/i386/i386/i386_trap.S: revision 1.7-1.8
sys/arch/i386/include/frameasm.h: revision 1.16
sys/arch/x86/include/sysarch.h: revision 1.12
sys/arch/x86/x86/pmc.c: revision 1.8-1.10
sys/arch/x86/x86/sys_machdep.c: revision 1.36
sys/arch/xen/conf/files.compat: revision 1.26
sys/secmodel/suser/secmodel_suser.c: revision 1.43
sys/sys/kauth.h: revision 1.74
usr.bin/pmc/Makefile: revision 1.5
usr.bin/pmc/pmc.1: revision 1.12-1.13
usr.bin/pmc/pmc.c: revision 1.24-1.25
style
--
style
--
Disable interrupts for T_NMI (inline calltrap). Note that there's still a
way to evade the NMI mode here, if a segment register faults in
INTRFASTEXIT; but we don't care. I didn't test this change, but it seems
fine enough.
--
Make the PMC syscalls privileged.
--
Check argc, and add a message.
--
include opt_pmc.h
--
Build the pmc tool on amd64.
--
Properly handle overflows, and take them into account in userland.
--
Update.
--
Enable PMCs by default.
--
Sort sections. Fix macro usage.
 1.11.8.1 10-Jun-2019  christos Sync with HEAD
 1.11.6.1 28-Jul-2018  pgoyette Sync with HEAD
 1.11.4.2 03-Dec-2017  jdolecek update from HEAD
 1.11.4.1 07-Aug-2017  jdolecek file pmc.c was added on branch tls-maxphys on 2017-12-03 11:36:50 +0000
 1.11.2.2 28-Aug-2017  skrll Sync with HEAD
 1.11.2.1 07-Aug-2017  skrll file pmc.c was added on branch nick-nhusb on 2017-08-28 17:51:56 +0000
 1.12 07-Oct-2021  msaitoh KNF. No functional change.
 1.11 25-Oct-2020  nia Normalize some machine dependent CPU frequency sysctl variables.

This moves machdep.*.frequency.* to machdep.cpu.frequency.*.

This was proposed on tech-kern some time ago. The intention is to allow
third-party tools such as estd and conky to more easily and reliably
fetch or modify the current CPU frequency without iterating through
various machine-dependent variables to check their presence.
 1.10 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.9 30-Mar-2016  christos PR/51016: David Binderman: comment out pointless code.
 1.8 15-Nov-2013  msaitoh branches: 1.8.6;
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
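The difference between the base, extended and display values can be sketched as below. This follows the conventional decoding of CPUID leaf 1 %eax documented by Intel and AMD; the `sk_` function names are hypothetical, and the real macros in the NetBSD headers may be written differently.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of CPUID leaf 1 %eax (hypothetical helper names). */
static unsigned sk_base_model(uint32_t eax)  { return (eax >> 4) & 0xf; }
static unsigned sk_base_family(uint32_t eax) { return (eax >> 8) & 0xf; }
static unsigned sk_ext_model(uint32_t eax)   { return (eax >> 16) & 0xf; }
static unsigned sk_ext_family(uint32_t eax)  { return (eax >> 20) & 0xff; }

/* Display family: the extended field is added only when the base field is 0xf. */
static unsigned
sk_display_family(uint32_t eax)
{
	unsigned family = sk_base_family(eax);

	return family == 0xf ? family + sk_ext_family(eax) : family;
}

/* Display model: the extended field forms the high nibble for families 6 and 0xf. */
static unsigned
sk_display_model(uint32_t eax)
{
	unsigned family = sk_base_family(eax);
	unsigned model = sk_base_model(eax);

	if (family == 0x6 || family == 0xf)
		model |= sk_ext_model(eax) << 4;
	return model;
}
```

For example, a signature of 0x000306a9 decodes to display family 6 and display model 0x3a under this rule, whereas the base model field alone would give 0xa.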
 1.7 27-Oct-2012  joerg branches: 1.7.2;
Fix uninitialised variable warning from clang by removing the variable
used in the first place.
 1.6 18-Jul-2012  joerg branches: 1.6.2;
Remove unused variable.
 1.5 02-Jun-2012  dsl Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.4 29-Oct-2011  jnemeth branches: 1.4.2;
Don't run off the beginning of an array from Maurizio Lombardi.
 1.3 04-Mar-2011  jruoho branches: 1.3.2; 1.3.4; 1.3.6; 1.3.10;
Raise the return value of the match-function of est(4) and powernow(4).
The assigned priorities are now: 10 for acpicpu(4), 5 for est(4) and
powernow(4), and 1 for odcm(4). These are used to pick the preferred driver.
 1.2 24-Feb-2011  jruoho Fix autoconf(9) of cpufeaturebus.
 1.1 24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.3.10.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.10.1 04-Mar-2011  jruoho file powernow.c was added on branch jruoho-x86intr on 2011-06-06 09:07:09 +0000
 1.3.6.2 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.3.6.1 04-Mar-2011  jym file powernow.c was added on branch jym-xensuspend on 2011-03-28 23:04:53 +0000
 1.3.4.2 05-Mar-2011  rmind sync with head
 1.3.4.1 04-Mar-2011  rmind file powernow.c was added on branch rmind-uvmplock on 2011-03-05 20:52:31 +0000
 1.3.2.2 05-Mar-2011  bouyer Sync with HEAD
 1.3.2.1 04-Mar-2011  bouyer file powernow.c was added on branch bouyer-quota2 on 2011-03-05 15:10:10 +0000
 1.4.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.2.1 30-Oct-2012  yamt sync with head
 1.6.2.3 03-Dec-2017  jdolecek update from HEAD
 1.6.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.7.2.1 18-May-2014  rmind sync with head
 1.8.6.2 28-Aug-2017  skrll Sync with HEAD
 1.8.6.1 22-Apr-2016  skrll Sync with HEAD
 1.12 24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.11 21-May-2008  ad branches: 1.11.12; 1.11.20; 1.11.26; 1.11.28;
cpuctl shows the power management features.
 1.10 11-May-2008  cegger aprint_normal -> aprint_normal_dev
sizeof line -> sizeof(line)
 1.9 29-Apr-2008  martin branches: 1.9.2;
Convert to new 2 clause license
 1.8 04-Jan-2008  christos branches: 1.8.6; 1.8.8; 1.8.10;
add missing includes
 1.7 17-Oct-2007  garbled branches: 1.7.2; 1.7.8;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.6 26-Sep-2007  ad x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.5 10-Sep-2007  cube branches: 1.5.2;
Remove 3rd clause and my name from all the licences which were only in my
name.
 1.4 08-Oct-2006  cube branches: 1.4.4; 1.4.10; 1.4.18; 1.4.28; 1.4.30;
Fix thinko about CPUID(0x80000000).
 1.3 04-Oct-2006  cube Rework the way PowerNow! and Cool'n'Quiet features are detected and
displayed, to make the code much simpler and easier to follow. Also, use
bitmask_printf() to make output consistent with other stuff. Use
CPUID2FAMILY() where appropriate.
 1.2 07-Aug-2006  xtraeme branches: 1.2.4; 1.2.6; 1.2.8; 1.2.10;
* Do not change struct powernow_pst_s (I added another member in my
previous patch) and this MUST be of that size, otherwise the tables
won't be found.

* powernow_k8.c moved into x86/x86, it should work on both i386 and amd64.

* Added more DPRINTFs needed to find the first problem.

* Create "machdep.powernow.frequency" again, I can't remember why I
removed frequency... it should work with estd now.

* Do not try to call k[78]_powernow_init() if cpu is not AMD (thanks
to christos).

And more things I can't remember, but this time it will work in
Athlon 64 cpus and it won't crash in EM64T cpus.
 1.1 06-Aug-2006  xtraeme AMD PowerNow!/Cool`n'Quiet driver for NetBSD/amd64,
adapted from OpenBSD.

Tested on a few machines:

http://bigbird.dohd.org:3021/NetBSD/dmesg
http://www.bsd.org.il/netbsd/acpi/dmesg

Thanks to cube, elad and others for testing and fixes.

Enabled by default on GENERIC.
 1.2.10.1 22-Oct-2006  yamt sync with head
 1.2.8.2 09-Sep-2006  rpaulo sync with head
 1.2.8.1 07-Aug-2006  rpaulo file powernow_common.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.2.6.1 18-Nov-2006  ad Sync with head.
 1.2.4.2 11-Aug-2006  yamt sync with head
 1.2.4.1 07-Aug-2006  yamt file powernow_common.c was added on branch yamt-pdpolicy on 2006-08-11 15:43:16 +0000
 1.4.30.2 09-Jan-2008  matt sync with HEAD
 1.4.30.1 06-Nov-2007  matt sync with HEAD
 1.4.28.1 02-Oct-2007  joerg Sync with HEAD.
 1.4.18.1 03-Oct-2007  garbled Sync with HEAD
 1.4.10.1 09-Oct-2007  ad Sync with head.
 1.4.4.4 21-Jan-2008  yamt sync with head
 1.4.4.3 27-Oct-2007  yamt sync with head.
 1.4.4.2 30-Dec-2006  yamt sync with head.
 1.4.4.1 08-Oct-2006  yamt file powernow_common.c was added on branch yamt-lazymbuf on 2006-12-30 20:47:22 +0000
 1.5.2.1 06-Oct-2007  yamt sync with head.
 1.7.8.1 08-Jan-2008  bouyer Sync with HEAD
 1.7.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.8.10.2 04-May-2009  yamt sync with head.
 1.8.10.1 16-May-2008  yamt sync with head.
 1.8.8.2 04-Jun-2008  yamt sync with head
 1.8.8.1 18-May-2008  yamt sync with head.
 1.8.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.9.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.11.28.1 05-Mar-2011  bouyer Sync with HEAD
 1.11.26.1 06-Jun-2011  jruoho Sync with HEAD.
 1.11.20.1 05-Mar-2011  rmind sync with head
 1.11.12.1 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.29 24-Feb-2011  jruoho Move PowerNow! to the cpufeaturebus.
 1.28 20-Aug-2010  jruoho branches: 1.28.2; 1.28.4;
Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.
 1.27 19-Aug-2010  jruoho Add sysctl-glue for interaction with the acpicpu(4).
 1.26 05-Oct-2009  rmind branches: 1.26.4;
Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.25 23-Aug-2009  ahoka Typo fix: Mhz -> MHz

No functional change intended.
 1.24 12-Nov-2008  ad branches: 1.24.4;
Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.23 11-May-2008  cegger branches: 1.23.4; 1.23.6;
remove one indent level. No functional change.
 1.22 28-Apr-2008  martin branches: 1.22.2;
Remove clause 3 and 4 from TNF licenses
 1.21 16-Apr-2008  cegger branches: 1.21.2; 1.21.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.20 05-Apr-2007  xtraeme branches: 1.20.34;
k8_powernow_destroy: free k8pnow_current_state rather than cstate,
and remove the global cstate struct.
 1.19 04-Apr-2007  rmind k8_powernow_init_main: Initialize k8pnow_current_state.
 1.18 03-Apr-2007  xtraeme Initialize msr_read explicitly to avoid a hang, from FUKUMOTO Atsushi.
 1.17 24-Mar-2007  xtraeme * Remove the WRITE_FIDVID macro from powernow.h and use it in the
powernow_k8 driver (much better than undeffing and write it again).
* Fix the WRITE_FIDVID macro, I changed it to use the third argument
for the bitmask, but it's not correct.

Last change should fix the problem reported by FUKUMOTO Atsushi.
 1.16 21-Mar-2007  xtraeme static'ify.
 1.15 21-Mar-2007  xtraeme Remove the MSR read IPI handler, there won't be any driver that will
use it, and we can see if the values are ok in the CPUs in the write
operation.

Suggested by YAMAMOTO Takashi.
 1.14 20-Mar-2007  xtraeme Use the new MSR IPI handlers to make it work properly with SMP.
 1.13 18-Mar-2007  xtraeme Forgot to initialize cstate, make it global and static. Fixes
build problem with the LKM.
 1.12 18-Mar-2007  xtraeme Fix previous, sync prototypes and missing curcpu().
 1.11 18-Mar-2007  xtraeme Don't write same code when there's an error, just use the goto
statement.
 1.10 18-Mar-2007  xtraeme Fix mem leak in k8_powernow_destroy, when it's called multiple times.
Found by mrg@.

Also, make sure they have data before trying to free them.
 1.9 18-Mar-2007  xtraeme There's no need to run est_init or k8_powernow_init on each CPU.
Just run it once (in the first cpu probed) with the RUN_ONCE(9)
framework.

Change the argument of est_init and k8_powernow_init to void, we don't
need cpu_info * anymore.

Suggested by tls@ and mrg@.
 1.8 18-Mar-2007  xtraeme Change k8_powernow_init to accept a struct cpu_info * as argument,
so that in the informative messages it prints the correct cpu
and not curcpu().

This fixes the first part of PR kern/35676.
 1.7 03-Sep-2006  christos branches: 1.7.4; 1.7.8; 1.7.10; 1.7.12; 1.7.16; 1.7.18; 1.7.20;
avoid empty else.
 1.6 02-Sep-2006  xtraeme - Remove k8pnow_read_pending_wait() and use a macro that _always_
will wait for the bit pending wait to be cleared. When this bit is
cleared the CPU is ready to enter to new processor state. [1]
- Remove useless comments.
- Sync boot messages with est.c

[1] Macro taken from FreeBSD.

And new changes tested by elad.
 1.5 27-Aug-2006  xtraeme Update powernow module with POWERNOW_K7 and POWERNOW_K8 support.
Works fine on amd64 cpus running in 32-bit mode.

Tested by Joel Carnat.
 1.4 26-Aug-2006  xtraeme Be a little less aggressive in declaring the change pending bit stuck;
increasing the number of retries by two orders of magnitude won't affect
most systems but will make transitions smoother on marginal ones.

From gwk@openbsd.
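
The retry idea can be sketched as a bounded polling loop. Everything here is an assumption made for illustration: the position of the pending bit, the retry count, and the callback used in place of a real MSR read. None of it is taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PENDING_BIT	(UINT64_C(1) << 31)	/* assumed bit position */
#define SKETCH_MAX_RETRIES	100000u			/* assumed retry budget */

/* Stand-in for reading the status register; ctx is caller-defined. */
typedef uint64_t (*sketch_status_reader)(void *ctx);

/*
 * Poll until the pending bit clears or the retry budget runs out.
 * Returns true when the bit cleared, false when it looks stuck.
 */
static bool
sketch_wait_pending_clear(sketch_status_reader read_status, void *ctx,
    unsigned int max_retries)
{
	for (unsigned int i = 0; i < max_retries; i++) {
		if ((read_status(ctx) & SKETCH_PENDING_BIT) == 0)
			return true;
	}
	return false;
}

/* Fake reader for experiments: reports "pending" for *ctx more reads. */
static uint64_t
sketch_fake_status(void *ctx)
{
	unsigned int *left = ctx;

	if (*left > 0) {
		(*left)--;
		return SKETCH_PENDING_BIT;
	}
	return 0;
}
```

A larger budget only costs time on hardware where the bit really is stuck; a slow but working CPU simply takes more iterations before the loop returns true.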
 1.3 10-Aug-2006  xtraeme branches: 1.3.2;
Update license. I've been talking with Martin Vegiard (original author)
and he wanted to move his license to TNF, make it so.
 1.2 08-Aug-2006  xtraeme Attach the "available" node under "frequency". Thanks to Robert Swindells.
 1.1 07-Aug-2006  xtraeme branches: 1.1.2;
* Do not change struct powernow_pst_s (I added another member in my
previous patch) and this MUST be of that size, otherwise the tables
won't be found.

* powernow_k8.c moved into x86/x86, it should work on both i386 and amd64.

* Added more DPRINTFs needed to find the first problem.

* Create "machdep.powernow.frequency" again, I can't remember why I
removed frequency... it should work with estd now.

* Do not try to call k[78]_powernow_init() if cpu is not AMD (thanks
to christos).

And more things I can't remember, but this time it will work in
Athlon 64 cpus and it won't crash in EM64T cpus.
 1.1.2.5 02-Sep-2006  tron Pull up following revision(s) (requested by xtraeme in ticket #73):
sys/arch/x86/x86/powernow_k8.c: revision 1.6
- Remove k8pnow_read_pending_wait() and use a macro that _always_
will wait for the bit pending wait to be cleared. When this bit is
cleared the CPU is ready to enter to new processor state. [1]
- Remove useless comments.
- Sync boot messages with est.c
[1] Macro taken from FreeBSD.
And new changes tested by elad.
 1.1.2.4 02-Sep-2006  tron Pull up following revision(s) (requested by xtraeme in ticket #73):
sys/arch/x86/x86/powernow_k8.c: revision 1.4
Be a little less aggressive in declaring the change pending bit stuck;
increasing the number of retries by two orders of magnitude won't affect
most systems but will make transitions smoother on marginal ones.
From gwk@openbsd.
 1.1.2.3 30-Aug-2006  tron Pull up following revision(s) (requested by xtraeme in ticket #74):
sys/lkm/arch/i386/powernow/Makefile: revision 1.3
sys/arch/x86/x86/powernow_k8.c: revision 1.5
sys/arch/x86/include/powernow.h: revision 1.5
sys/lkm/arch/i386/powernow/lkminit_powernow.c: revision 1.6
Update powernow module with POWERNOW_K7 and POWERNOW_K8 support.
Works fine on amd64 cpus running in 32-bit mode.
Tested by Joel Carnat.
 1.1.2.2 14-Aug-2006  ghen Pull up following revision(s) (requested by xtraeme in ticket #21):
sys/arch/i386/i386/powernow_k7.c: revision 1.15
sys/arch/i386/i386/powernow_k7.c: revision 1.16
sys/arch/x86/x86/powernow_k8.c: revision 1.3
Update license. I've been talking with Martin Vegiard (original author)
and he wanted to move his license to TNF, make it so.
* Skip duplicated freq values (they show up with different fid/vid).
* Fix cstate->fsb before calling k7pnow_states(), we need to use CPU
MHz value like openbsd does.
Tested by Rhialto.
 1.1.2.1 11-Aug-2006  riz Pull up following revision(s) (requested by xtraeme in ticket #9):
sys/arch/i386/i386/powernow_k7.c: revision 1.14
sys/arch/x86/x86/powernow_k8.c: revision 1.2
Attach the "available" node under "frequency". Thanks to Robert Swindells.
 1.3.2.3 03-Sep-2006  yamt sync with head.
 1.3.2.2 11-Aug-2006  yamt sync with head
 1.3.2.1 10-Aug-2006  yamt file powernow_k8.c was added on branch yamt-pdpolicy on 2006-08-11 15:43:16 +0000
 1.7.20.1 29-Mar-2007  reinoud Pullup to -current
 1.7.18.1 11-Jul-2007  mjf Sync with head.
 1.7.16.1 10-Apr-2007  ad Sync with head.
 1.7.12.2 15-Apr-2007  yamt sync with head.
 1.7.12.1 24-Mar-2007  yamt sync with head.
 1.7.10.3 03-Sep-2007  yamt sync with head.
 1.7.10.2 30-Dec-2006  yamt sync with head.
 1.7.10.1 03-Sep-2006  yamt file powernow_k8.c was added on branch yamt-lazymbuf on 2006-12-30 20:47:22 +0000
 1.7.8.1 20-Apr-2007  bouyer Pull up following revision(s) (requested by mlelstv in ticket #575):
sys/arch/i386/i386/est.c sync with 1.37
sys/arch/i386/i386/ipifuncs.c sync with 1.16
sys/arch/x86/include/cpu_msr.h sync with 1.4
sys/arch/x86/include/intrdefs.h sync with 1.8
sys/arch/x86/include/powernow.h sync with 1.9
sys/arch/x86/x86/powernow_k8.c sync with 1.20
sys/arch/x86/x86/msr_ipifuncs.c sync with 1.8
sys/arch/amd64/amd64/ipifuncs.c sync with 1.9
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/i386/machdep.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/x86/x86_machdep.c patch
sys/arch/amd64/amd64/machdep.c patch
Add MSR write IPI handler for x86. Use it and the RUN_ONCE framework
to make est and powernow drivers work properly with SMP.
 1.7.4.2 09-Sep-2006  rpaulo sync with head
 1.7.4.1 03-Sep-2006  rpaulo file powernow_k8.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.20.34.2 17-Jan-2009  mjf Sync with HEAD.
 1.20.34.1 02-Jun-2008  mjf Sync with HEAD.
 1.21.4.4 11-Mar-2010  yamt sync with head
 1.21.4.3 16-Sep-2009  yamt sync with head
 1.21.4.2 04-May-2009  yamt sync with head.
 1.21.4.1 16-May-2008  yamt sync with head.
 1.21.2.1 18-May-2008  yamt sync with head.
 1.22.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.23.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.23.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.24.4.3 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.24.4.2 24-Oct-2010  jym Sync with HEAD
 1.24.4.1 01-Nov-2009  jym Sync with HEAD.
 1.26.4.1 05-Mar-2011  rmind sync with head
 1.28.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.28.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.49 06-Oct-2024  msaitoh Add AMD svsm bit for x86's /proc/cpuinfo
 1.48 07-Aug-2023  msaitoh branches: 1.48.6;
Update /proc/cpuinfo.

- Move "ssbd" to an unused Linux mapping.
- Update unused Linux mappings.
 1.47 11-Apr-2023  msaitoh Add Intel lam and AMD vnmi.
 1.46 30-Dec-2022  msaitoh Add x2avic. Modify comment.
 1.45 20-Jun-2022  msaitoh branches: 1.45.4;
Add tdx_guest, brs, hfi, ibt, amx_bf16, amx_tile and amx_int8.
 1.44 31-Jan-2022  msaitoh Fix procfs_machdep.c rev. 1.143. Print CPUID 0x00000007:1 %eax correctly.
 1.43 14-Jan-2022  msaitoh Update for cpuid flags:

- The table 11 was changed from CPUID 0x0f leaf 0 %edx to a Linux mapping.
- The table 12 was changed from CPUID 0x0f leaf 1 %edx to CPUID 0x07 leaf 1
%edx. Print avx_vnni and avx512_bf16.
- Print cppc, enqcmd and arch_lbr.
- Modify linux mapping. Not used on NetBSD.
 1.42 07-Oct-2021  msaitoh KNF. No functional change.
 1.41 10-Jul-2021  msaitoh Add v_spec_ctrl, avx512_fp16, sme, sev and sev_es. Tested by nonaka@.
 1.40 30-Nov-2020  msaitoh branches: 1.40.4;
Add sgx, sgx_lc, serialize and tsxldtrk.
 1.39 25-Apr-2020  bouyer branches: 1.39.2;
Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.38 24-Apr-2020  msaitoh Lowercase ppin.
 1.37 24-Apr-2020  msaitoh Add AMD protected processor identification number (PPIN).
 1.36 01-Apr-2020  msaitoh branches: 1.36.2;
Add AVX512_VP2INTERSECT, SERIALIZE and TSXLDTRK(TSX suspend load addr tracking)
 1.35 17-Jan-2020  msaitoh Add Fast Short Rep Mov(fsrm).
 1.34 17-Oct-2019  msaitoh branches: 1.34.2;
Add rdpru.
 1.33 24-Jul-2019  msaitoh branches: 1.33.2;
Add avx512ifma, cqm_mbm_total, cqm_mbm_local and waitpkg
 1.32 28-May-2019  kamil Avoid the 1<<31 construct

Shift unsigned int rather than signed one.

Detected with kUBSan when reading /proc/cpuinfo.
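
A minimal sketch of the point being made: with a 32-bit int, 1 << 31 shifts a signed value into the sign bit, which is undefined behaviour in C, while shifting an unsigned constant is well defined. The sketch_* helper names are hypothetical.

```c
#include <stdint.h>

/* Mask for bit n (0..31) of a 32-bit feature register; unsigned shift. */
static inline uint32_t
sketch_feature_mask(unsigned int n)
{
	return UINT32_C(1) << n;
}

/* Non-zero when bit n is set in the register value. */
static inline int
sketch_has_feature(uint32_t reg, unsigned int n)
{
	return (reg & sketch_feature_mask(n)) != 0;
}
```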
 1.31 16-May-2019  msaitoh Revert rev. 1.29. Use current cpuid 7 edx value to print.
 1.30 16-May-2019  msaitoh Add md_clear.
 1.29 16-May-2019  msaitoh Use ci_feat_val[7] instead of directly getting cpuid 7 edx.
 1.28 18-Feb-2019  msaitoh - Add wbnoinvd, virt_ssbd, tme, cldemote, movdiri, movdir64b and pconfig.
- Move AMD 0x80000008 ebx's ibpb, ibrs and stibp to x86_features[8] linux
mapping.
 1.27 06-Jan-2019  christos restore original now that weak symbols are gone
 1.26 05-Jan-2019  christos Comment out rcr0 use until the weak symbol mess is undone.
 1.25 15-Nov-2018  msaitoh - I misread ci_acpiid as ci_apicid... LAPIC ID is in ci_cpuid.
Print it correctly.
- ci_initapicid(Initial APIC ID) is uint32_t, so use %u.
 1.24 20-Aug-2018  msaitoh OK'd by maxv:
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
 1.23 23-May-2018  msaitoh branches: 1.23.2;
Add SSBD bit for Intel.
 1.22 05-Mar-2018  msaitoh branches: 1.22.2;
- Add AMD CPUID leaf 0x80000008 ebx's xsaveerptr, ibpb, ibrs, stibp.
- Add Intel CPUID leaf 7 ebx's umip, avx512_vbmi2, gfni, vaes, vpclmulqdq,
avx512_vnni and avx512_bitalg.
- Add Intel CPUID leaf 7 edx's avx512_4vnniw, avx512_4fmaps and
arch_capabilities.
 1.21 10-Jan-2018  msaitoh Print intel_pt in /proc/cpuinfo.
 1.20 10-Oct-2017  msaitoh Fix the location of AMD's smca(Scalable MCA) bit. Thanks Yasushi Oshima for
finding this bug.
 1.19 09-Oct-2017  maya GC i386_fpu_present. no FPU x86 is not supported.

Also delete newly unused send_sigill
 1.18 05-Oct-2017  msaitoh - Use per cpu ci->ci_max_cpuid instead of global "cpuid_level" variable.
- Print AMD specific cpuid leafs:
0x80000008 ebx
0x8000000a edx
0x80000007 ebx
 1.17 28-Sep-2017  msaitoh Print the following cpuid bits:

0x0000000d:1 eax (xsaveopt, xsavec, xgetbv1, xsaves)
0x0000000f:0 edx (cqm_llc)
0x0000000f:1 edx (cqm_occup_llc)
0x00000006 eax (dtherm, ida, arat, pln, pts, hwp, hwp_notify,
hwp_act_window, hwp_epp, hwp_pkg_req)
 1.16 28-Aug-2017  msaitoh Check buffer length correctly so as not to print a garbage character.
Fixes PR#52352 reported by Yasushi Oshima.
 1.15 15-May-2017  msaitoh branches: 1.15.2;
- Print 0x00000007:0 ecx leaf bits.
- Don't print fdiv_bug on amd64.
- Print APIC ID, Initial APIC ID and clflush size.
 1.14 08-Dec-2016  msaitoh branches: 1.14.6;
- Remove "pcommit".
- Add "rdt_a".
 1.13 08-Aug-2016  msaitoh - Update VIA/Cyrix/Centaur-defined bits. Part of PR#39950
- Fix comment. x86_features[4] is not 0x80000001 but 0x00000001
- Update comment
 1.12 27-Apr-2016  msaitoh branches: 1.12.2;
Take some changes from the Linux's latest x86/include/asm/cpufeatures.h.
- Add ptsc, avx512dq, avx512bw and avx512vl
- Remove some Linux mappings.
 1.11 12-Feb-2016  msaitoh Fix typo in comment.
 1.10 18-Jan-2016  msaitoh Add comments. Fix comments. No functional change.
 1.9 13-Jan-2016  msaitoh Use CPUID_TO_*() macros. This change fixes a bug where /proc/cpuinfo's CPU model
was incorrect on many newer CPUs and CPU family was incorrect on some AMD
machines.
 1.8 13-Jan-2016  msaitoh PR#49246 "x86/x86/procfs_machdep.c (/proc/cpuinfo) is very old" related change
- Decode NetBSD's ci_feat_val[0-5]. The output order of the bits is the same as
linux. Before this commit, only ci_feat_val[0] was decoded.
- Linux defined feature words and some others are not decoded yet.
- procfs_getonecpufeatures() will be rewritten when all of linux entries are
decoded.
 1.7 16-Apr-2015  njoly Always output 2 digits for the cpu frequency decimal part.
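
For illustration, a fixed two-digit decimal part can be produced with a zero-padded conversion. This is a sketch with a hypothetical function name and an assumed input in Hz, not the code from procfs_machdep.c.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a frequency given in Hz as "MHz.hh". The "%02" conversion
 * pads the fractional part, so 2400.05 is not printed as 2400.5.
 */
static int
sketch_format_mhz(char *buf, size_t len, uint64_t hz)
{
	uint64_t mhz = hz / 1000000;
	uint64_t frac = (hz % 1000000) / 10000;	/* hundredths of a MHz */

	return snprintf(buf, len, "%" PRIu64 ".%02" PRIu64, mhz, frac);
}
```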
 1.6 05-Apr-2014  christos branches: 1.6.4; 1.6.6;
make this compute the needed size instead of bailing.
 1.5 27-Mar-2014  christos correct/add protection against snprintf overflow.
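
The two revisions above concern guarding against snprintf overflow and reporting the size that is needed instead of giving up. One common way to do both is sketched below. The function name and the output format are hypothetical; this is not the code from the file.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Append one "name: value" line at offset off. The return value is the
 * total length needed so far, even when buf is too small, so a caller
 * can compare it against bufsize and retry with a larger buffer.
 */
static size_t
sketch_append_field(char *buf, size_t bufsize, size_t off,
    const char *name, unsigned int value)
{
	int n;

	if (off < bufsize)
		n = snprintf(buf + off, bufsize - off, "%s\t: %u\n", name, value);
	else
		n = snprintf(NULL, 0, "%s\t: %u\n", name, value);	/* measure only */

	return (n < 0) ? off : off + (size_t)n;
}
```

Because snprintf returns the length it would have written, the running total keeps growing past the end of the buffer without any write landing outside it.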
 1.4 24-Mar-2014  christos use cpu_{g,s}etmodel
 1.3 12-Feb-2014  dsl Change i386 to use x86/fpu.c instead of i386/isa/npx.c
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
 1.2 02-Feb-2014  dsl Minor fpu initialisation cleanups:
Set default CR0 so that the FPU is enabled (unset CR0_EM) and initialise
i386_fpu_present to 1.
No need to call the npx trap indirectly, rename to fpunda() to match amd64.
Remove the i386_fpu_exception variable and sysctl (It used to indicate
which irq was used for fpu exceptions, but we only support 'internal'
now). Hopefully no one cares.
fpuinit() now only needs to clear TS before the fninit(). Apart from the
checks for 486SX and the 'fdiv bug' this matches the amd64 version.
Exclude fpuinit() from XEN kernels, they don't call it - which rather begs
the question as to whether it is needed at all!
 1.1 08-Jul-2010  rmind branches: 1.1.2; 1.1.4; 1.1.6; 1.1.12; 1.1.16; 1.1.26; 1.1.30;
Unify i386 and amd64 procfs MD code into x86.
 1.1.30.1 18-May-2014  rmind sync with head
 1.1.26.2 03-Dec-2017  jdolecek update from HEAD
 1.1.26.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.16.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.12.2 05-Mar-2011  rmind sync with head
 1.1.12.1 08-Jul-2010  rmind file procfs_machdep.c was added on branch rmind-uvmplock on 2011-03-05 20:52:31 +0000
 1.1.6.2 24-Oct-2010  jym Sync with HEAD
 1.1.6.1 08-Jul-2010  jym file procfs_machdep.c was added on branch jym-xensuspend on 2010-10-24 22:48:19 +0000
 1.1.4.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.1.4.1 08-Jul-2010  uebayasi file procfs_machdep.c was added on branch uebayasi-xip on 2010-08-17 06:45:34 +0000
 1.1.2.2 11-Aug-2010  yamt sync with head.
 1.1.2.1 08-Jul-2010  yamt file procfs_machdep.c was added on branch yamt-nfs-mp on 2010-08-11 22:52:58 +0000
 1.6.6.6 28-Aug-2017  skrll Sync with HEAD
 1.6.6.5 05-Feb-2017  skrll Sync with HEAD
 1.6.6.4 05-Oct-2016  skrll Sync with HEAD
 1.6.6.3 29-May-2016  skrll Sync with HEAD
 1.6.6.2 19-Mar-2016  skrll Sync with HEAD
 1.6.6.1 06-Jun-2015  skrll Sync with HEAD
 1.6.4.4 18-Nov-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1650):

sys/arch/x86/x86/procfs_machdep.c: revision 1.25

- I misread ci_acpiid as ci_apicid... LAPIC ID is in ci_cpuid.
Print it correctly.

- ci_initapicid(Initial APIC ID) is uint32_t, so use %u.
 1.6.4.3 11-Sep-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #1505):
sys/arch/x86/x86/procfs_machdep.c: 1.15-1.16
- Print 0x00000007:0 ecx leaf bits.
- Don't print fdiv_bug on amd64.
- Print APIC ID, Initial APIC ID and clflush size.
--
Check buffer length correctly so as not to print a garbage character.
Fixes PR#52352 reported by Yasushi Oshima.
 1.6.4.2 08-Dec-2016  snj Pull up following revision(s) (requested by msaitoh in ticket #1293):
sys/arch/x86/x86/procfs_machdep.c: revisions 1.12-1.14
Update for x86 /proc/cpuinfo:
- Add ptsc, avx512dq, avx512bw, avx512vl and rdt_a.
- Update VIA/Cyrix/Centaur-defined bits. Part of PR#39950.
- Remove pcommit.
- Update some Linux mapping unused in /proc/cpuinfo.
 1.6.4.1 06-Mar-2016  martin branches: 1.6.4.1.2;
Pull up the following revisions, requested by msaitoh in ticket #1119:

sys/arch/x86/x86/procfs_machdep.c 1.7-1.11

x86's /proc/cpuinfo fixes:
- Always output 2 digits for the cpu frequency decimal part.
- Update x86's feature bits in /proc/cpuinfo (PR#49246).
- Fix a bug that /proc/cpuinfo's CPU model was incorrect on many newer
CPUs and CPU family was incorrect on some AMD CPUs.
- Add comment. Fix comment.
 1.6.4.1.2.1 18-Jan-2017  skrll Sync with netbsd-5
 1.12.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.14.6.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.15.2.16 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1830):

sys/arch/x86/x86/procfs_machdep.c: revision 1.47

Add Intel lam and AMD vnmi.
 1.15.2.15 23-Jan-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1788):

sys/arch/x86/x86/procfs_machdep.c: revision 1.46

Add x2avic. Modify comment.
 1.15.2.14 16-Sep-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1766):

sys/arch/x86/x86/procfs_machdep.c: revision 1.45

Add tdx_guest, brs, hfi, ibt, amx_bf16, amx_tile and amx_int8.
 1.15.2.13 31-Jan-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1733):

sys/arch/x86/x86/procfs_machdep.c: revision 1.43
sys/arch/x86/x86/procfs_machdep.c: revision 1.44

Update for cpuid flags:
- The table 11 was changed from CPUID 0x0f leaf 0 %edx to a Linux mapping.
- The table 12 was changed from CPUID 0x0f leaf 1 %edx to CPUID 0x07 leaf 1
%edx. Print avx_vnni and avx512_bf16.
- Print cppc, enqcmd and arch_lbr.
- Modify linux mapping. Not used on NetBSD.

Fix procfs_machdep.c rev. 1.143. Print CPUID 0x00000007:1 %eax correctly.
 1.15.2.12 03-Dec-2021  martin Pull up the following revisions, requested by msaitoh in ticket #1715:

sys/arch/x86/x86/procfs_machdep.c 1.40-1.42

- Add v_spec_ctrl, avx512_fp16, sme, sev, sev_es, sgx, sgx_lc,
serialize and tsxldtrk.
- Whitespace fix.
 1.15.2.11 20-Jul-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #1581):

sys/arch/x86/x86/procfs_machdep.c: revision 1.37
sys/arch/x86/x86/procfs_machdep.c: revision 1.38

Add AMD protected processor identification number (PPIN).

Lowercase ppin.
 1.15.2.10 15-Apr-2020  martin Pull up the following, requested by msaitoh in ticket #1530:

sys/arch/x86/x86/procfs_machdep.c 1.33-1.36
sys/arch/x86/x86/tsc.c 1.40
sys/arch/x86/x86/specialreg.h 1.159-1.161
usr.sbin/cpuctl/arch/i386.c 1.109-1.110 via patch

- Print avx512ifma, cqm_mbm_total, cqm_mbm_local, waitpkg, rdpru,
Fast Short Rep Mov(fsrm), AVX512_VP2INTERSECT, SERIALIZE and
TSXLDTRK.
- Rename CPUID Fn8000_0007 %edx bit 8 from "TSC" to "ITSC"
(Invariant TSC) to avoid confusion.
- Print CPUID 0x80000007 %edx on both Intel and AMD.
- Remove ci_max_ext_cpuid from usr.sbin/cpuctl/arch/i386.c because it's
the same as ci_cpuid_extlevel.
- Use unsigned to avoid undefined behavior in procfs_getonefeatreg().
 1.15.2.9 29-May-2019  martin Pullup the following, requested by msaitoh in ticket #1270:

sys/arch/x86/include/specialreg.h 1.143, 1.145 via patch
sys/arch/x86/x86/procfs_machdep.c 1.30

Add TSX_FORCE_ABORT related definitions.
Add cpuid7 edx bit 10 "MD_CLEAR".
 1.15.2.8 07-Mar-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1204):

sys/arch/x86/x86/procfs_machdep.c: revision 1.28

- Add wbnoinvd, virt_ssbd, tme, cldemote, movdiri, movdir64b and pconfig.
- Move AMD 0x80000008 ebx's ibpb, ibrs and stibp to x86_features[8] linux
mapping.
 1.15.2.7 18-Nov-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1094):

sys/arch/x86/x86/procfs_machdep.c: revision 1.25

- I misread ci_acpiid as ci_apicid... LAPIC ID is in ci_cpuid.
Print it correctly.

- ci_initapicid(Initial APIC ID) is uint32_t, so use %u.
 1.15.2.6 23-Sep-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1026):

sys/arch/x86/x86/procfs_machdep.c: revision 1.24
sys/arch/x86/include/specialreg.h: revision 1.130

OK'd by maxv:
- Add cpuid 7 edx L1D_FLUSH bit.
- Add IA32_ARCH_SKIP_L1DFL_VMENTRY bit.
- Add IA32_FLUSH_CMD MSR.
 1.15.2.5 09-Jun-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #867):

sys/arch/x86/x86/procfs_machdep.c: revision 1.23

Add SSBD bit for Intel.
 1.15.2.4 16-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #634):
sys/arch/x86/x86/procfs_machdep.c: revision 1.22
- Add AMD CPUID leaf 0x80000008 ebx's xsaveerptr, ibpb, ibrs, stibp.
- Add Intel CPUID leaf 7 ebx's umip, avx512_vbmi2, gfni, vaes, vpclmulqdq,
avx512_vnni and avx512_bitalg.
- Add Intel CPUID leaf 7 edx's avx512_4vnniw, avx512_4fmaps and
arch_capabilities.
 1.15.2.3 13-Jan-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #492):
sys/arch/x86/x86/procfs_machdep.c: revision 1.21
Print intel_pt in /proc/cpuinfo.
 1.15.2.2 21-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #367):
sys/arch/x86/x86/procfs_machdep.c: revision 1.20
sys/arch/x86/x86/procfs_machdep.c: revision 1.17
sys/arch/x86/x86/procfs_machdep.c: revision 1.18
Print the following cpuid bits:
0x0000000d:1 eax (xsaveopt, xsavec, xgetbv1, xsaves)
0x0000000f:0 edx (cqm_llc)
0x0000000f:1 edx (cqm_occup_llc)
0x00000006 eax (dtherm, ida, arat, pln, pts, hwp, hwp_notify,
hwp_act_window, hwp_epp, hwp_pkg_req)
- Use per cpu ci->ci_max_cpuid instead of global "cpuid_level" variable.
- Print AMD specific cpuid leafs:
0x80000008 ebx
0x8000000a edx
0x80000007 ebx
Fix the location of AMD's smca(Scalable MCA) bit. Thanks Yasushi Oshima for
finding this bug.
 1.15.2.1 31-Aug-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #247):
sys/arch/x86/x86/procfs_machdep.c: revision 1.16
Check buffer length correctly so as not to print a garbage character.
Fixes PR#52352 reported by Yasushi Oshima.
 1.22.2.3 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.22.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.22.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.23.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.23.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.23.2.1 10-Jun-2019  christos Sync with HEAD
 1.33.2.8 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1649):

sys/arch/x86/x86/procfs_machdep.c: revision 1.47

Add Intel lam and AMD vnmi.
 1.33.2.7 23-Jan-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #1571):

sys/arch/x86/x86/procfs_machdep.c: revision 1.46

Add x2avic. Modify comment.
 1.33.2.6 16-Sep-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1526):

sys/arch/x86/x86/procfs_machdep.c: revision 1.45

Add tdx_guest, brs, hfi, ibt, amx_bf16, amx_tile and amx_int8.
 1.33.2.5 31-Jan-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1419):

sys/arch/x86/x86/procfs_machdep.c: revision 1.43
sys/arch/x86/x86/procfs_machdep.c: revision 1.44

Update for cpuid flags:
- The table 11 was changed from CPUID 0x0f leaf 0 %edx to a Linux mapping.
- The table 12 was changed from CPUID 0x0f leaf 1 %edx to CPUID 0x07 leaf 1
%edx. Print avx_vnni and avx512_bf16.
- Print cppc, enqcmd and arch_lbr.
- Modify Linux mapping. Not used on NetBSD.

Fix procfs_machdep.c rev. 1.143. Print CPUID 0x00000007:1 %eax correctly.
 1.33.2.4 03-Dec-2021  martin Pull up the following revisions, requested by msaitoh in ticket #1385:

sys/arch/x86/x86/procfs_machdep.c 1.40-1.42

- Add v_spec_ctrl, avx512_fp16, sme, sev, sev_es, sgx, sgx_lc,
serialize and tsxldtrk.
- Whitespace fix.
 1.33.2.3 10-Jul-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #993):

sys/arch/x86/x86/procfs_machdep.c: revision 1.37
sys/arch/x86/x86/procfs_machdep.c: revision 1.38

Add AMD protected processor identification number (PPIN).

Lowercase ppin.
 1.33.2.2 14-Apr-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #833):

usr.sbin/cpuctl/arch/i386.c: revision 1.109
sys/arch/x86/include/specialreg.h: revision 1.159
usr.sbin/cpuctl/arch/i386.c: revision 1.110
sys/arch/x86/include/specialreg.h: revision 1.160
sys/arch/x86/include/specialreg.h: revision 1.161
sys/arch/x86/x86/tsc.c: revision 1.40
sys/arch/x86/x86/procfs_machdep.c: revision 1.35
sys/arch/x86/x86/procfs_machdep.c: revision 1.36

Add Fast Short Rep Mov (fsrm).

Add AVX512_VP2INTERSECT, SERIALIZE and TSXLDTRK (TSX suspend load addr tracking)

CPUID Fn00000001 %edx bit 8 is printed as "TSC", so rename CPUID Fn8000_0007
%edx bit 8 from "TSC" to "ITSC" (Invariant TSC) to avoid confusion.

Rename CPUID_APM_TSC to CPUID_APM_ITSC. No functional change.

Remove ci_max_ext_cpuid because it's the same as ci_cpuid_extlevel.

Print CPUID 0x80000007 %edx on both Intel and AMD.
 1.33.2.1 17-Oct-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #344):

sys/arch/x86/include/specialreg.h: revision 1.154
sys/arch/x86/include/specialreg.h: revision 1.155
usr.sbin/cpuctl/arch/i386.c: revision 1.107
sys/arch/x86/x86/procfs_machdep.c: revision 1.34

- Add definitions of AMD's CPUID Fn8000_001f Encrypted Memory features.
- Add definition of AMD's CPUID Fn8000_000a %edx bit 11 "GMET".
- Define CPUID_AMD_SVM_PFThreshold correctly.
- Modify comment a bit for consistency.

Fix AMD Fn8000_001f %eax bit 0's name.

Add rdpru.
 1.34.2.1 17-Jan-2020  ad Sync with head.
 1.36.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.39.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.40.4.1 01-Aug-2021  thorpej Sync with HEAD.
 1.45.4.2 21-Jun-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #203):

sys/arch/x86/x86/procfs_machdep.c: revision 1.47

Add Intel lam and AMD vnmi.
 1.45.4.1 23-Jan-2023  martin Pull up following revision(s) (requested by msaitoh in ticket #54):

sys/arch/x86/x86/procfs_machdep.c: revision 1.46

Add x2avic. Modify comment.
 1.48.6.1 02-Aug-2025  perseant Sync with HEAD
 1.4 03-Dec-2007  ad Interrupt handling changes, in discussion since February:

- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
 1.3 17-Oct-2007  garbled branches: 1.3.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.2 17-May-2007  yamt branches: 1.2.8; 1.2.10;
merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.1 26-Feb-2003  fvdl branches: 1.1.18; 1.1.56; 1.1.60; 1.1.62; 1.1.68;
Move some files out of i386 into x86, so that they can be shared with
other ports.
 1.1.68.1 22-May-2007  matt Update to HEAD.
 1.1.62.1 11-Jul-2007  mjf Sync with head.
 1.1.60.2 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.1.60.1 27-May-2007  ad Sync with head.
 1.1.56.1 23-Mar-2007  ad - Decouple intr.h from cpu.h.
- Define splraise in spl.S. As a side effect it becomes "preemption safe".
- Make softintr_schedule a function in softintr.c.
- Make softintr a function in spl.S, and remove the unneeded lock prefix.
 1.1.18.2 07-Dec-2007  yamt sync with head
 1.1.18.1 03-Sep-2007  yamt sync with head.
 1.2.10.2 23-Mar-2008  matt sync with HEAD
 1.2.10.1 06-Nov-2007  matt sync with HEAD
 1.2.8.1 09-Dec-2007  jmcneill Sync with HEAD.
 1.3.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.36 07-Oct-2021  msaitoh KNF. No functional change.
 1.35 02-May-2020  maxv Modify the hotpatch mechanism, in order to make it much less ROP-friendly.

Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.

- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling into a separate x86_hotpatch_cleanup() function,
which gets called after we have closed the window.

The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).

With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.

The only ~problem there is, is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However dtrace is only in a module, that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
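
As a rough illustration of the descriptor scheme described in revision 1.35
above, the following Python model resolves a read-only descriptor by name and
picks one of at most two read-only sources by selector. It is a sketch, not the
kernel code: `Descriptor`, `HOTPATCH_TABLE`, `hotpatch`, the destination
offsets and the byte values are all hypothetical. Only the shape (name plus
selector, a destination fixed in advance, at most two templates per
destination) is taken from the log message.

```python
from types import MappingProxyType
from typing import NamedTuple, Tuple


class Descriptor(NamedTuple):
    """Hypothetical read-only hotpatch descriptor."""
    name: str
    dest: int                    # destination offset, fixed ahead of time
    sources: Tuple[bytes, ...]   # at most two read-only templates


def make_table(*descs: Descriptor) -> MappingProxyType:
    """Build a read-only name -> descriptor mapping (stand-in for a link set)."""
    for d in descs:
        if not 1 <= len(d.sources) <= 2:
            raise ValueError("a descriptor references at most two sources")
    return MappingProxyType({d.name: d for d in descs})


# Illustrative contents only.
HOTPATCH_TABLE = make_table(
    Descriptor("example_nop", 0, (b"\x90\x90\x90",)),
    Descriptor("example_toggle", 4, (b"\x90\x90", b"\xeb\x00")),
)


def hotpatch(text: bytearray, name: str, sel: int) -> None:
    """Resolve the descriptor from the name and the source from the selector,
    then copy the chosen template to the descriptor's fixed destination."""
    desc = HOTPATCH_TABLE.get(name)
    if desc is None or not 0 <= sel < len(desc.sources):
        raise LookupError("unknown hotpatch name or selector")
    src = desc.sources[sel]
    text[desc.dest:desc.dest + len(src)] = src
```

The caller supplies neither a destination pointer nor the bytes to write, which
is the property the commit message emphasizes.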
 1.34 21-Feb-2020  joerg Explicitly cast pointers to uintptr_t before casting to enums. They are
not necessarily the same size. Don't cast pointers to bool, check for
NULL instead.
 1.33 31-Jan-2020  maxv 'oldlwp' is never NULL now, so remove the NULL checks.
 1.32 12-Dec-2019  maxv branches: 1.32.2;
Check CPUID.IBRS in addition to ARCH_CAP.IBRS_ALL. For clarity, and also
because VirtualBox clears the former but forgets to clear the latter (which
makes us hit a #GP on RDMSR).
 1.31 12-Nov-2019  maxv Mitigation for CVE-2019-11135: TSX Asynchronous Abort (TAA).

Two sysctls are added:

machdep.taa.mitigated = {0/1} user-settable
machdep.taa.method = {string} constructed by the kernel

There are two cases:

(1) If the CPU is affected by MDS, then the MDS mitigation will also
mitigate TAA, and we have nothing else to do. We make the 'mitigated' leaf
read-only, and force:
machdep.taa.mitigated = machdep.mds.mitigated
machdep.taa.method = [MDS]
The kernel already enables the MDS mitigation by default.

(2) If the CPU is not affected by MDS but is affected by TAA, then we use
the new TSX_CTRL MSR to disable RTM. This MSR is provided via a microcode
update, now available on the Intel website. The kernel will automatically
enable the TAA mitigation if the updated microcode is present. If the new
microcode is not present, the user can load it via cpuctl, and set
machdep.taa.mitigated=1.
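
The two cases in revision 1.31 above can be summarized by a small decision
function. This is a sketch under stated assumptions, not the kernel logic: the
function and field names are hypothetical, the "[TSX_CTRL]" and "(none)" method
strings are placeholders (only "[MDS]" appears in the log message), and the
behavior for a CPU affected by neither issue is an assumption.

```python
from typing import NamedTuple


class TaaState(NamedTuple):
    mitigated: bool   # value of machdep.taa.mitigated
    method: str       # value of machdep.taa.method
    writable: bool    # whether the 'mitigated' leaf is user-settable


def taa_state(mds_affected: bool, mds_mitigated: bool,
              taa_affected: bool, has_tsx_ctrl: bool) -> TaaState:
    if mds_affected:
        # Case (1): the MDS mitigation also covers TAA. The leaf is
        # read-only and mirrors machdep.mds.mitigated.
        return TaaState(mds_mitigated, "[MDS]", False)
    if taa_affected:
        # Case (2): disable RTM through the TSX_CTRL MSR, which is only
        # present with updated microcode. Without it the user can load the
        # microcode and then set the leaf to 1.
        if has_tsx_ctrl:
            return TaaState(True, "[TSX_CTRL]", True)    # placeholder label
        return TaaState(False, "(none)", True)           # placeholder label
    # Not covered by the log message; assumed here for completeness.
    return TaaState(False, "(none)", False)
```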
 1.30 30-Aug-2019  msaitoh Use macro.
 1.29 01-Jun-2019  maxv branches: 1.29.2;
Mmh, check the highest leaf before calling x86_cpuid(), otherwise on old
CPUs we might be getting garbage. While here fix a typo.

Likely fixes PR/54256.
 1.28 18-May-2019  maxv Use XC_HIGHPRI for SpectreV2 to reduce the CPU downtime. We already do this
for MDS.
 1.27 14-May-2019  maxv Mitigation for INTEL-SA-00233: Microarchitectural Data Sampling (MDS).

It requires a microcode update, now available on the Intel website. The
microcode modifies the behavior of the VERW instruction, and makes it flush
internal CPU buffers. We hotpatch the return-to-userland path to add VERW.

Two sysctls are added:

machdep.mds.mitigated = {0/1} user-settable
machdep.mds.method = {string} constructed by the kernel

The kernel will automatically enable the mitigation if the updated
microcode is present. If the new microcode is not present, the user can
load it via cpuctl, and set machdep.mds.mitigated=1.
 1.26 27-Apr-2019  maxv Add support for EnhancedIBRS, a more performant mitigation for SpectreV2,
available on future CPUs (or maybe they already exist now...).
 1.25 23-Mar-2019  maxv In fact, xc_broadcast also applies to offline CPUs, so we don't need to
make sure each CPU is online. Remove the checks, I suspect they weren't
totally correct by the way.
 1.24 27-Jan-2019  dholland fix duplicated chunk from merge
 1.23 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.22 22-Dec-2018  maxv In the end, disable the supposed architectural SpectreV2 mitigation on
AMD f12h and f16h. The SDMs of these CPUs haven't been updated since, and
we shouldn't assume the position of the bits, we just can't know where
they are.

Initially I included f12h and f16h because f10h is actually documented
to have a bit to disable the indirect branch predictor, and there were
patches available in SuSE and CentOS that were treating f10h/f12h/f16h
all the same. Knowing that SuSE has ties with AMD, it seemed safe to
assume that these patches were correct and that f12h and f16h could
indeed be treated the same way as f10h.

But these patches have now disappeared, and the main Linux branch
doesn't have them, without clear explanation. Therefore, I prefer to
roll-back.
 1.21 22-Dec-2018  maxv Add AMD_SSB_NO, so that we explicitly say that an AMD CPU is not affected
when it's not affected.
 1.20 22-Dec-2018  maxv If the CPU is not vulnerable to SpectreV4, say it in the sysctl by default.
Apply some minor style while here.
 1.19 28-May-2018  maxv branches: 1.19.2; 1.19.4;
Mmh, don't automatically set enabled=1 for SpectreV4, the actual mitigation
is not yet applied by default. Just so people can test.
 1.18 22-May-2018  maxv Extend the AMD NONARCH method to family 17h. The AMD spec states that for
17h care must be taken when handling sibling threads.

The concern is that if we have a protected two-thread process running on
two siblings, and context switch one thread to another unprotected thread,
disabling the SSB protection on one logical core will disable SSB on its
sibling too (which is still running the protected thread).

All of that doesn't matter to us, because the SSB value we set is
system-wide, not per-process.
 1.17 22-May-2018  maxv Simplify the sysctl handlers.
 1.16 22-May-2018  maxv Forgot switch cases for AMD.
 1.15 22-May-2018  maxv Implement a mitigation for SpectreV4 on AMD families 15h and 16h. We use
a non-architectural MSR. This MSR is also available on 17h, but there SMT
is involved, and it needs more investigation.

Not tested (I have only 10h).
 1.14 22-May-2018  maxv Several changes:

- Move the sysctl initialization code into spectre.c. This way each
variable is local. Rename the variables, use shorter names.

- Use mitigation methods for SpectreV4, like SpectreV2. There are
several available on AMD (that we don't support yet). Add a "method"
leaf.

- Make SSB_NO a mitigation method by itself. This way we report as
"mitigated" a CPU that is not affected by SpectreV4. In this case,
of course, the user can't enable/disable the mitigation. Drop the
"affected" sysctl leaf.
 1.13 22-May-2018  maxv Clarify the parameters for the SpectreV2 mitigation.

Add:
machdep.spectre_v2.swmitigated
Rename:
machdep.spectre_v2.mitigated -> machdep.spectre_v2.hwmitigated

Change the method string, to combine both the hardware and software
mitigations. swmitigated is set at compile time, hwmitigated can be
set by the user.

Examples:

spectre_v2.swmitigated = 1
spectre_v2.hwmitigated = 0
spectre_v2.method = [GCC retpoline]

spectre_v2.swmitigated = 0
spectre_v2.hwmitigated = 0
spectre_v2.method = (none)

spectre_v2.swmitigated = 1
spectre_v2.hwmitigated = 1
spectre_v2.method = [GCC retpoline] + [Intel IBRS]
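
The way the method string combines the two mitigations in the examples above
can be sketched as follows. The function name and its parameters are
hypothetical; the output format is inferred from the three examples in the log
message and is not taken from the kernel source.

```python
from typing import Optional


def spectre_v2_method(sw: Optional[str], hw: Optional[str]) -> str:
    """Join the active software and hardware method names as "[a] + [b]",
    or return "(none)" when neither mitigation is active."""
    parts = [f"[{name}]" for name in (sw, hw) if name]
    return " + ".join(parts) if parts else "(none)"
```

For instance, `spectre_v2_method("GCC retpoline", "Intel IBRS")` corresponds to
the third example, with both swmitigated and hwmitigated set to 1.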
 1.12 22-May-2018  maxv Mitigation for SpectreV4, based on SSBD. The following sysctl branches
are added:

machdep.spectre_v4.mitigated = {0/1} user-settable
machdep.spectre_v4.affected = {0/1} set by the kernel

The mitigation is not enabled by default yet. It is not tested either,
because no microcode update has been published yet.

On current CPUs a microcode/bios update must be applied for SSBD to be
available. The user can then set mitigated=1. Even with an update applied
the kernel will set affected=1.

On future CPUs, where the problem will presumably be fixed by default,
the CPU will report SSB_NO, and the kernel will set affected=0. In this
case we also have mitigated=0, but the mitigation is not needed.

For now the feature is system-wide. Perhaps we will want a more
fine-grained, per-process approach in the future.
 1.11 22-May-2018  maxv Reorder and rename, to make the code less SpectreV2-specific.
 1.10 05-Apr-2018  maxv Set the "method" string at boot time too.
 1.9 04-Apr-2018  maxv Add machdep.spectre_v2.method, a string that tells which method is
active.
 1.8 04-Apr-2018  maxv Enable the SpectreV2 mitigation by default at boot time.
 1.7 31-Mar-2018  maxv Reorganize to simplify.
 1.6 31-Mar-2018  maxv Add #ifdef, for i386 not to panic.
 1.5 29-Mar-2018  maxv branches: 1.5.2;
Allow IBRS to be disabled dynamically.
 1.4 29-Mar-2018  maxv Fix sysctl type, should be bool.
 1.3 28-Mar-2018  maxv oldlwp can be NULL, so ensure it isn't.
 1.2 28-Mar-2018  maxv Add the IBRS mitigation for SpectreV2 on amd64.

Different operations are performed during context transitions:

user->kernel: IBRS <- 1
kernel->user: IBRS <- 0

And during context switches:

user->user: IBPB <- 0
kernel->user: IBPB <- 0
[user->kernel:IBPB <- 0 this one may not be needed]

We use two macros, IBRS_ENTER and IBRS_LEAVE, to set the IBRS bit. The
thing is hotpatched for better performance, like SVS.

The idea is that IBRS is a "privileged" bit, which is set to 1 in kernel
mode and 0 in user mode. To protect the branch predictor between user
processes (which are of the same privilege), we use the IBPB barrier.

The Intel manual also talks about (MWAIT/HLT)+HyperThreading, and says
that when using either of the two instructions IBRS must be disabled for
better performance on the core. I'm not totally sure about this part, so
I'm not adding it now.

IBRS is available only when the Intel microcode update is applied. The
mitigation must be enabled manually with machdep.spectreV2.mitigated.

Tested by msaitoh a week ago (but I adapted a few things since). Probably
more changes to come.
 1.1 28-Mar-2018  maxv Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.
 1.5.2.5 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.5.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.5.2.3 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.5.2.2 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.5.2.1 29-Mar-2018  pgoyette file spectre.c was added on branch pgoyette-compat on 2018-03-30 06:20:13 +0000
 1.19.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.19.4.1 10-Jun-2019  christos Sync with HEAD
 1.19.2.5 12-Nov-2019  martin Pull up following revision(s) (requested by maxv in ticket #1433):

sys/arch/x86/include/specialreg.h: revision 1.157
sys/arch/x86/x86/spectre.c: revision 1.31

Mitigation for CVE-2019-11135: TSX Asynchronous Abort (TAA).

Two sysctls are added:
machdep.taa.mitigated = {0/1} user-settable
machdep.taa.method = {string} constructed by the kernel

There are two cases:

(1) If the CPU is affected by MDS, then the MDS mitigation will also
mitigate TAA, and we have nothing else to do. We make the 'mitigated' leaf
read-only, and force:

machdep.taa.mitigated = machdep.mds.mitigated
machdep.taa.method = [MDS]

The kernel already enables the MDS mitigation by default.

(2) If the CPU is not affected by MDS but is affected by TAA, then we use
the new TSX_CTRL MSR to disable RTM. This MSR is provided via a microcode
update, now available on the Intel website. The kernel will automatically
enable the TAA mitigation if the updated microcode is present. If the new
microcode is not present, the user can load it via cpuctl, and set
machdep.taa.mitigated=1.
 1.19.2.4 02-Jun-2019  martin Pull up following revision(s) (requested by maxv in ticket #1279):

sys/arch/x86/x86/spectre.c: revision 1.29

Mmh, check the highest leaf before calling x86_cpuid(), otherwise on old
CPUs we might be getting garbage. While here fix a typo.

Likely fixes PR/54256.
 1.19.2.3 14-May-2019  martin Pull up following revision(s) (requested by maxv in ticket #1269):

sys/arch/amd64/amd64/locore.S: revision 1.181 (adapted)
sys/arch/amd64/amd64/amd64_trap.S: revision 1.47 (adapted)
sys/arch/x86/include/specialreg.h: revision 1.144 (adapted)
sys/arch/amd64/include/frameasm.h: revision 1.43 (adapted)
sys/arch/x86/x86/spectre.c: revision 1.27 (adapted)

Mitigation for INTEL-SA-00233: Microarchitectural Data Sampling (MDS).
It requires a microcode update, now available on the Intel website. The
microcode modifies the behavior of the VERW instruction, and makes it flush
internal CPU buffers. We hotpatch the return-to-userland path to add VERW.

Two sysctls are added:

machdep.mds.mitigated = {0/1} user-settable
machdep.mds.method = {string} constructed by the kernel

The kernel will automatically enable the mitigation if the updated
microcode is present. If the new microcode is not present, the user can
load it via cpuctl, and set machdep.mds.mitigated=1.
 1.19.2.2 09-Jun-2018  martin Pullup the following revisions, requested by maxv in ticket #865:

sys/arch/amd64/amd64/machdep.c 1.303 (patch)
sys/arch/amd64/conf/GENERIC 1.492 (patch)
sys/arch/amd64/conf/files.amd64 1.103 (patch)
sys/arch/i386/i386/machdep.c 1.806 (patch)
sys/arch/i386/conf/GENERIC 1.1179 (patch)
sys/arch/i386/conf/files.i386 1.393 (patch)
sys/arch/x86/include/cpu.h 1.91 (patch)
sys/arch/x86/include/specialreg.h up to 1.126 (patch)
sys/arch/x86/x86/x86_machdep.c up to 1.115 (patch, adapted)
sys/arch/x86/x86/spectre.c up to 1.19 (patch, adapted,
no IBRS,
SpectreV2 mitigations not
enabled by default)

Backport the hardware SpectreV2 and SpectreV4 mitigations.
 1.19.2.1 28-May-2018  martin file spectre.c was added on branch netbsd-8 on 2018-06-09 15:12:21 +0000
 1.29.2.3 14-Dec-2019  martin Pull up following revision(s) (requested by maxv in ticket #550):

sys/arch/x86/x86/spectre.c: revision 1.32

Check CPUID.IBRS in addition to ARCH_CAP.IBRS_ALL. For clarity, and also
because VirtualBox clears the former but forgets to clear the latter (which
makes us hit a #GP on RDMSR).
 1.29.2.2 12-Nov-2019  martin Pull up following revision(s) (requested by maxv in ticket #419):

sys/arch/x86/include/specialreg.h: revision 1.157
sys/arch/x86/x86/spectre.c: revision 1.31

Mitigation for CVE-2019-11135: TSX Asynchronous Abort (TAA).

Two sysctls are added:
machdep.taa.mitigated = {0/1} user-settable
machdep.taa.method = {string} constructed by the kernel

There are two cases:

(1) If the CPU is affected by MDS, then the MDS mitigation will also
mitigate TAA, and we have nothing else to do. We make the 'mitigated' leaf
read-only, and force:

machdep.taa.mitigated = machdep.mds.mitigated
machdep.taa.method = [MDS]

The kernel already enables the MDS mitigation by default.

(2) If the CPU is not affected by MDS but is affected by TAA, then we use
the new TSX_CTRL MSR to disable RTM. This MSR is provided via a microcode
update, now available on the Intel website. The kernel will automatically
enable the TAA mitigation if the updated microcode is present. If the new
microcode is not present, the user can load it via cpuctl, and set
machdep.taa.mitigated=1.
 1.29.2.1 26-Sep-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #241):

sys/arch/x86/include/specialreg.h: revision 1.152
sys/arch/x86/include/specialreg.h: revision 1.153
usr.sbin/cpuctl/arch/i386.c: revision 1.105
sys/arch/x86/x86/spectre.c: revision 1.30
sys/arch/x86/include/specialreg.h: revision 1.151

Add definitions of AMD's CPUID Fn8000_0008 %ebx.
Decode AMD's CPUID Fn8000_0008 %ebx.
Use macro.
Add MCOMMIT instruction.
Define CPUID_CAPEX_FLAGS's bit 10 correctly.
 1.32.2.1 29-Feb-2020  ad Sync with head.
 1.42 24-Sep-2022  riastradh x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.
 1.41 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.40 07-Oct-2021  msaitoh KNF. No functional change.
 1.39 19-Jul-2020  maxv Revert most of ad's movs/stos change. Instead do something a lot simpler:
declare svs_quad_copy(), used by SVS only, with no need for instrumentation,
because SVS is disabled when sanitizers are on.
 1.38 14-Jul-2020  yamaguchi Introduce per-cpu IDTs

This is realized by the following modifications:
- Add IDT pages and their allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.
 1.37 27-May-2020  ad svs_pdir_switch(): Use MOVS to copy the PTES.
 1.36 27-May-2020  ad svs_pmap_sync(): Fast-path the curcpu case. Could be improved further
with a kcpuset iterator thing.
 1.35 02-May-2020  maxv Modify the hotpatch mechanism, in order to make it much less ROP-friendly.

Currently x86_patch_window_open is a big problem, because it is a perfect
function to inject/modify executable code with ROP.

- Remove x86_patch_window_open(), along with its x86_patch_window_close()
counterpart.
- Introduce a read-only link-set of hotpatch descriptor structures,
which reference a maximum of two read-only hotpatch sources.
- Modify x86_hotpatch() to open a window and call the new
x86_hotpatch_apply() function in a hard-coded manner.
- Modify x86_hotpatch() to take a name and a selector, and have
x86_hotpatch_apply() resolve the descriptor from the name and the
source from the selector, before hotpatching.
- Move the error handling into a separate x86_hotpatch_cleanup() function,
which gets called after we have closed the window.

The resulting implementation is a bit complex and non-obvious. But it
gains the following properties: the code executed in the hotpatch window
is strictly hard-coded (no callback and no possibility to execute your own
code in the window) and the pointers this code accesses are strictly
read-only (no possibility to forge pointers to hotpatch an area that was
not designated as hotpatchable at compile-time, and no possibility to
choose what bytes to write other than the maximum of two read-only
templates that were designated as valid for the given destination at
compile-time).

With current CPUs this slightly improves a situation that is already
pretty bad by definition on x86. Assuming CET however, this change closes
a big hole and is kinda great.

The only ~problem there is, is that dtrace-fbt tries to hotpatch random
places with random bytes, and there is just no way to make it safe.
However dtrace is only in a module, that is rarely used and never compiled
into the kernel, so it's not a big problem; add a shitty & vulnerable
independent hotpatch window in it, and leave big XXXs. It looks like fbt
is going to collapse soon anyway.
 1.34 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.33 24-Apr-2020  maxv Give the ldt a fixed size of one page (512 slots), and drop the variable-
sized mechanism that was too complex.

This fixes a race between USER_LDT and SVS: during context switches, the
way SVS installs the new ldt relies on the ldt pointer AND the ldt size,
but both cannot be accessed atomically at the same time.
 1.32 31-Jan-2020  maxv branches: 1.32.4;
'oldlwp' is never NULL now, so remove the NULL checks.
 1.31 08-Dec-2019  ad branches: 1.31.2;
Merge x86 pmap changes from yamt-pagecache:

- Deal better with the multi-level pmap object locking kludge.
- Handle uvm_pagealloc() being able to block.
 1.30 07-Aug-2019  maxv Add support for USER_LDT in SVS. This allows us to have both enabled at
the same time.

We allocate an LDT for each CPU in the GDT and map an area for it, in
addition to the default LDT already present. In context switches between
different processes, we choose between the default or the per-cpu LDT
selector: if the user set specific LDT entries, we memcpy them to the
per-cpu LDT and load the per-cpu selector.

Tested by Naveen Narayanan (with Wine on amd64).
 1.29 29-May-2019  maxv Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.
 1.28 27-May-2019  maxv Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.
 1.27 27-May-2019  maxv Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.
 1.26 15-May-2019  maxv Change the way SVS is disabled. Now you have to pass "boot -3" from the
bootloader. The machdep.svs.enabled sysctl becomes read-only, and just
indicates whether SVS is enabled.

Sent on port-amd64@.
 1.25 21-Apr-2019  maxv Rename the PTE bits.
 1.24 23-Mar-2019  maxv In fact, xc_broadcast also applies to offline CPUs, so we don't need to
make sure each CPU is online. Remove the checks, I suspect they weren't
totally correct by the way.
 1.23 09-Mar-2019  maxv Start replacing the x86 PTE bits.
 1.22 06-Dec-2018  maxv Simplify, use _pi instead of modulos, no real functional change.
 1.21 19-Nov-2018  maxv Rename 'mask' -> 'frame', we will use the real 'mask' soon.
 1.20 12-Aug-2018  maxv Introduce PDIR_SLOT_USERLIM, which indicates the limit of the user slots.
Use it instead of PDIR_SLOT_PTE when we just want to iterate over the
user slots. Also use it in SVS, I had hardcoded 255 because there was no
proper define (which there now is).
 1.19 12-Jul-2018  maxv Handle NMIs correctly when SVS is enabled. We store the kernel's CR3 at the
top of the NMI stack, and we unconditionally switch to it, because we don't
know with which page tables we received the NMI. Hotpatch the whole thing as
usual.

This restores the ability to use PMCs on Intel CPUs.
 1.18 26-Apr-2018  alnsn branches: 1.18.2;
Add KAUTH_MACHDEP_SVS_DISABLE and add support to secmodel_securelevel(9).

Disabling SVS is denied at securelevel 1 and above.
 1.17 30-Mar-2018  maxv Improve the detection. Future generations of Intel CPUs will have a bit to
say they are not affected by Meltdown.
 1.16 29-Mar-2018  maxv Use EOPNOTSUPP instead of EINVAL.
 1.15 29-Mar-2018  maxv Fix sysctl type, should be bool.
 1.14 13-Mar-2018  maxv branches: 1.14.2;
Mmh, add a missing x86_disable_intr(). My intention there was to ensure
interrupts were disabled before the barriers.
 1.13 01-Mar-2018  maxv branches: 1.13.2;
Remove these two KASSERTs. Thinking about it, they may fire when the user
enters "sysctl -w machdep.svs.enabled=0", if the xcall is received between
the 'svs_enabled' check in the caller and the same check in these KASSERTs.

In such a case we perform an SVS operation with svs_enabled set to false,
but that's intentional: after it is done svs_pmap_sync and svs_lwp_switch
won't be called anymore, the pdir synchronization is dropped.

Having said that, I didn't see these KASSERTs getting triggered.
 1.12 25-Feb-2018  maxv Remove the first entry from the todo list, it's handled properly now.
 1.11 24-Feb-2018  maxv Fix one thing in the documentation, I meant to say only SVS_UTLS.
 1.10 24-Feb-2018  maxv Document SVS. Also, remove an entry from the todo list.
 1.9 23-Feb-2018  maxv Add a new entry in the TODO list.
 1.8 22-Feb-2018  maxv Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.
 1.7 22-Feb-2018  maxv Ensure the CPUs are all online. We take cpu_lock, so nobody can go offline
in the meantime.
 1.6 22-Feb-2018  maxv Make the machdep.svs_enabled sysctl writable, and add the kernel code
needed to disable SVS at runtime.

We set 'svs_enabled' to false, and hotpatch the kernel entry/exit points
to eliminate the context switch code.

We need to make sure there is no remote CPU that is executing the code we
are hotpatching. So we use two barriers:

* After the first one each CPU is guaranteed to be executing in
svs_disable_cpu with interrupts disabled (this way it can't leave this
place).

* After the second one it is guaranteed that SVS is disabled, so we flush
the cache, enable interrupts and continue execution normally.

Between the two barriers, cpu0 will disable SVS (svs_enabled=false and
hotpatch), and each CPU will restore the generic syscall entry point.

Three notes:

* We should call svs_pgg_update(true) afterwards, to put back PG_G on
the kernel pages (for better performance). This will be done in another
commit.

* The fact that we disable interrupts does not prevent us from receiving
an NMI, and it would be problematic. So we need to add some code to
verify that PMCs are disabled before hotpatching. This will be done
in another commit.

* In svs_disable() we expect each CPU to be online. We need to add a
check to make sure they indeed are.

The sysctl allows only a 1->0 transition. There is no point in doing 0->1
transitions anyway, and it would be complicated to implement because we
need to re-synchronize the CPU user page tables with the current ones (we
lost track of them in the last 1->0 transition).
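The one-way rule described in this commit can be sketched as a small write handler. This is only an illustration: the names `svs_enabled_flag` and `svs_sysctl_write` are hypothetical, the error code is an assumption, and the real handler also performs the barrier and hotpatch work described above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's 'svs_enabled' variable. */
bool svs_enabled_flag = true;

/*
 * Sketch of a write handler that accepts only the 1->0 transition.
 * Returns 0 on success or an errno value. The choice of EOPNOTSUPP
 * is an assumption made for this example.
 */
int
svs_sysctl_write(bool requested)
{
	if (requested == svs_enabled_flag)
		return 0;		/* no change requested */
	if (requested)
		return EOPNOTSUPP;	/* 0->1: user page tables are stale */
	svs_enabled_flag = false;	/* 1->0: real code would hotpatch here */
	return 0;
}
```

Once the flag has been cleared, every later request to set it again is refused, which mirrors the reasoning above about having lost track of the user page tables.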
 1.5 22-Feb-2018  maxv Improve the SVS initialization.

Declare x86_patch_window_open() and x86_patch_window_close(), and globalify
x86_hotpatch().

Introduce svs_enable() in x86/svs.c, that does the SVS hotpatching.

Change svs_init() to take a bool. This function gets called twice: early,
when the system has just booted (and nothing is initialized), and later,
once at least pmap_kernel has been initialized.
 1.4 22-Feb-2018  maxv Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code are exactly the same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (i.e. non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.
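The placeholder-and-patch idea in this commit can be sketched in user space on a plain byte buffer. The function names below are hypothetical, and the sketch leaves out everything the kernel must also handle (write protection, cache flushing, other CPUs). It relies only on two x86 encodings: 0xEB followed by an 8-bit displacement for a short jump, and 0xCC for int3.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_REL8	0xEB	/* x86 short jump, then an 8-bit displacement */
#define OP_INT3		0xCC	/* x86 breakpoint instruction */

/*
 * Fill 'buf' with a placeholder: a short jump over the rest of the
 * window, followed by int3 padding. Returns -1 if the window is too
 * small or too large for a rel8 forward jump.
 */
int
patch_window_init(uint8_t *buf, size_t len)
{
	if (len < 2 || len - 2 > 127)
		return -1;
	buf[0] = OP_JMP_REL8;
	buf[1] = (uint8_t)(len - 2);	/* displacement counts from after the jmp */
	memset(buf + 2, OP_INT3, len - 2);
	return 0;
}

/*
 * Overwrite the placeholder with the real content. The two chunks
 * must be exactly the same size, otherwise nothing is written.
 */
int
patch_window_apply(uint8_t *buf, size_t len, const uint8_t *code,
    size_t codelen)
{
	if (codelen != len)
		return -1;
	memcpy(buf, code, len);
	return 0;
}
```

Refusing a size mismatch is the point of the design: the surrounding code keeps its layout, so the unpatched path costs one jump and the patched path costs nothing extra.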
 1.3 18-Feb-2018  maxv Add svs_enabled, which defaults to 'true' when SVS is compiled (no dynamic
detection yet).
 1.2 17-Feb-2018  maxv Add svs_init. This is where we will detect the CPU and decide whether
to turn SVS on or not.

Add svs_pgg_update to dynamically add/remove PG_G from all the kernel
pages. Use it now.
 1.1 11-Feb-2018  maxv Move SVS into x86/svs.c
 1.13.2.8 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.13.2.7 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.13.2.6 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.13.2.5 28-Jul-2018  pgoyette Sync with HEAD
 1.13.2.4 02-May-2018  pgoyette Synch with HEAD
 1.13.2.3 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.13.2.2 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.13.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.14.2.4 05-May-2018  martin Pull up following revision(s) (requested by alnsn in ticket #790):

share/man/man9/secmodel_securelevel.9: revision 1.16
sys/secmodel/suser/secmodel_suser.c: revision 1.44
sys/secmodel/securelevel/secmodel_securelevel.c: revision 1.31
sys/sys/kauth.h: revision 1.76
sys/arch/x86/x86/svs.c: revision 1.18

Add KAUTH_MACHDEP_SVS_DISABLE and add support to secmodel_securelevel(9).
Disabling SVS is denied at securelevel 1 and above.

Add SVS. It may not be disabled at securelevel 1 and above.
 1.14.2.3 02-Apr-2018  martin Pull up the following revisions, requested by maxv in ticket #683:

sys/arch/x86/x86/svs.c 1.15-1.17

Fix sysctl type, should be bool.

Use EOPNOTSUPP instead of EINVAL.

Improve the detection. Future generations of Intel CPUs will have a bit to
say they are not affected by Meltdown.
 1.14.2.2 22-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #652:

sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch)
sys/arch/amd64/amd64/db_machdep.c 1.6 (patch)
sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch)
sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch)
sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch)
sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch)
sys/arch/amd64/amd64/amd64/vector.S upto 1.61 (partial, patch)
sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch)
sys/arch/amd64/conf/kern.ldscript 1.26 (patch)
sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch)
sys/arch/amd64/include/param.h 1.25 (patch)
sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch)
sys/arch/x86/conf/files.x86 1.91,1.93 (patch)
sys/arch/x86/include/cpu.h 1.88,1.89 (patch)
sys/arch/x86/include/pmap.h 1.75 (patch)
sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch)
sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch)
sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch)
sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch)
sys/arch/x86/x86/svs.c 1.1-1.14
sys/arch/xen/conf/files.compat 1.30 (patch)

Backport SVS. Not enabled yet.
 1.14.2.1 13-Mar-2018  martin file svs.c was added on branch netbsd-8 on 2018-03-22 16:59:04 +0000
 1.18.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.18.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.18.2.1 10-Jun-2019  christos Sync with HEAD
 1.31.2.1 29-Feb-2020  ad Sync with head.
 1.32.4.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.58 20-Aug-2022  riastradh x86: Move definition of struct pmap to pmap_private.h.

This makes pmap_resident_count and pmap_wired_count out-of-line
functions instead of inline. No functional change intended
otherwise.
 1.57 07-Oct-2021  msaitoh KNF. No functional change.
 1.56 19-Jun-2020  maxv localify
 1.55 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.54 24-Apr-2020  maxv Give the ldt a fixed size of one page (512 slots), and drop the variable-
sized mechanism that was too complex.

This fixes a race between USER_LDT and SVS: during context switches, the
way SVS installs the new ldt relies on the ldt pointer AND the ldt size,
but both cannot be accessed atomically at the same time.
 1.53 21-Apr-2020  jdolecek two more files to convert to newer HYPERVISOR_physdev_op() interface
 1.52 10-Nov-2019  chs branches: 1.52.6;
in many device attach paths, allocate memory with M_WAITOK instead of M_NOWAIT
and remove code to handle failures that can no longer happen.
 1.51 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.50 08-Nov-2018  maxv Simplify the ifdefs, and error out if XEN and USER_LDT are both defined.
 1.49 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
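The truncation this commit guards against can be shown in a few lines. The sketch assumes a platform where unsigned int is 32 bits; `clamp_with_uimin` and `clamp_with_macro` are hypothetical helpers written for the example, and `uimin` is reimplemented here rather than taken from libkern.

```c
#include <stdint.h>

/* Macro form: no truncation, but each argument may be evaluated twice. */
#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Function form defined on unsigned int, as the commit describes. */
unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* The 64-bit arguments are implicitly converted to unsigned int here. */
uint64_t
clamp_with_uimin(uint64_t want, uint64_t limit)
{
	return uimin(want, limit);
}

/* The macro compares the full 64-bit values. */
uint64_t
clamp_with_macro(uint64_t want, uint64_t limit)
{
	return MIN(want, limit);
}
```

With a limit of 0x100000001 and a request of 5, the macro version returns 5, while the function version first narrows the limit to 1 and returns that instead. The compiler accepts both without complaint by default, which is why the rename makes the width visible in the name.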
 1.48 13-Jul-2018  maxv Remove the X86PMC code I had written, replaced by tprof. Many defines
become unused in specialreg.h, so remove them. We don't want to add
defines all the time, there are countless PMCs on many generations, and
it's better to just inline the event/unit values.
 1.47 12-Jul-2018  maxv Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.
 1.46 04-Jan-2018  maxv branches: 1.46.2; 1.46.4;
Declare IOMAP_VALIDOFF, not to use ci_tss pointers.
 1.45 04-Jan-2018  maxv Allocate the TSS area dynamically. This way cpu_info and cpu_tss can be
put in separate pages.
 1.44 04-Jan-2018  maxv Group the different TSSes into a cpu_tss structure. And pack this
structure to make sure there is no padding between 'tss' and 'iomap'.
 1.43 21-Oct-2017  maxv Forbid 64bit entries. That's it, now we support USER_LDT on amd64.
 1.42 21-Oct-2017  maxv Improve our segregs model. Pass 3/3.

Treat %gs the same way we treat %ds/%es/%fs: restore it in INTRFASTEXIT
on 32bit LWPs.

On Xen, however, its behavior does not change, because we need to do a
hypercall before INTR_RESTORE_GPRS, and that's too complicated for now.

As a side effect, this change fixes a bug in the ACPI wakeup code; %fs/%gs
were not restored on 32bit LWPs, and chances are they would segfault
shortly afterwards.

Support for USER_LDT on amd64 is almost complete now.
 1.41 19-Oct-2017  maxv Improve our segregs model. Pass 2/3.

Treat %fs the same way we treat %ds and %es. For a new 32bit LWP %fs is
set to GUDATA32_SEL, and always updated in INTRFASTEXIT.

This solves an important issue we had until now: we couldn't handle the
faults generated by the "movw $val,%fs" instructions, because they were
deep into the kernel context. Now %fs can fault only in INTRFASTEXIT,
which is safe.

Note that it also fixes a bug I believe affected the kernel: on AMD CPUs,
setting %fs to zero does not flush the internal register state, and
therefore we could leak the %fs base address when context-switching. This
being said, I couldn't trigger the issue on the AMD cpu I have. Whatever,
it's fixed now, since we first set %fs to GUDATA32 - which does flush the
register state.
 1.40 15-Oct-2017  maxv Remove this #undef on native amd64, but keep it on Xen.
 1.39 15-Oct-2017  maxv Add setusergs on Xen, and simplify.
 1.38 30-Aug-2017  maxv Don't allow userland to create 286/386 call gates anymore - they are not
used by Wine. While here, don't allow it to overwrite the static entries
either, don't allow unknown entry types, remove LDT_DEBUG, and style.
 1.37 12-Aug-2017  maxv Remove vm86.

Pass 3.
 1.36 12-Jul-2017  maxv include opt_pmc.h
 1.35 10-Mar-2017  maxv branches: 1.35.6;
PMCs for amd64 - still disabled, like i386.
 1.34 18-Feb-2017  maxv There is currently an ugly mix between the PERFCTRS subsystem (MI), and
i386's own PMC interface (MD). Stop using PERFCTRS and use PMC instead.
While here remove some unused flags, which are wrong on the latest CPUs
anyway.
 1.33 17-Feb-2017  maxv Support PMCs on multi-processor systems. Still several things to fix, but
at least it works a little. Will be improved and moved into x86/ soon.
 1.32 14-Feb-2017  maxv Add most of my USER_LDT code for amd64, but disable it and put a comment
about why Wine still does not work.

Nothing changes, but at least it is a step forward.
 1.31 05-Feb-2017  maxv Rename ldt->ldtstore and gdt->gdtstore on i386. It reduces the diff with
amd64, and makes it easier to track down these variables on nxr - 'ldt'
and 'gdt' being common keywords.
 1.30 24-Sep-2016  dholland branches: 1.30.2;
LDT handling fixes:
- add missing membar_store_store ("membar_producer") when setting a
new ldt;
- use UVM_KMF_WAITVA when allocating space for a new ldt instead of
crashing if uvm_km_alloc fails;
- if uvm_km_alloc fails in pmap_fork, bail instead of crashing;
- clarify what else is going on in pmap_fork;
- don't uvm_km_free while holding a mutex.
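The first item, ordering the stores that fill a new LDT before the store that makes it visible, can be sketched with C11 atomics. This is a user-space analogy under stated assumptions: a release store stands in for the kernel's membar_producer, and `struct ldt_sketch`, `ldt_publish` and `ldt_current` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LDT_SKETCH_SLOTS 4	/* arbitrary small size for the example */

struct ldt_sketch {
	uint64_t slot[LDT_SKETCH_SLOTS];
};

/* Pointer that readers follow; starts out with no private LDT. */
static _Atomic(struct ldt_sketch *) current_ldt;

/*
 * Fill in the new table first, then publish the pointer. The release
 * ordering keeps the slot stores from being reordered after the
 * pointer store, so a reader never sees a half-initialized table.
 */
void
ldt_publish(struct ldt_sketch *newldt, uint64_t fill)
{
	for (size_t i = 0; i < LDT_SKETCH_SLOTS; i++)
		newldt->slot[i] = fill;
	atomic_store_explicit(&current_ldt, newldt, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
struct ldt_sketch *
ldt_current(void)
{
	return atomic_load_explicit(&current_ldt, memory_order_acquire);
}
```

Without the ordering, another CPU could observe the new pointer before the contents it points to, which is the kind of bug a missing store barrier allows.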
 1.29 23-Oct-2015  christos branches: 1.29.2;
fix broken error handling; error was used uninitialized. Changing the
compilation flags broke all threaded programs for me.
XXX: pullup-7
 1.28 28-Jun-2014  dholland branches: 1.28.2; 1.28.4; 1.28.6;
If we're going to use just the name of the dying function as a panic
string, it should at least be the name of the *right* function. ish.
 1.27 20-Mar-2014  christos branches: 1.27.2;
need compat_netbsd.h
 1.26 04-Oct-2012  dsl branches: 1.26.2;
Remove references to VM86 from the amd64 kernel configs.
VM86 mode isn't supported while in long mode.
 1.25 10-Oct-2011  jakllsch branches: 1.25.2; 1.25.8; 1.25.12; 1.25.14; 1.25.16;
x86_print_ldt() is only used in the USER_LDT && LDT_DEBUG case.
 1.24 07-Jul-2010  chs implement cpu_lwp_setprivate() on several platforms.
 1.23 23-Apr-2010  joerg Use struct segment_descriptor for pcb_fsd and pcb_gsd instead of int[2].
 1.22 21-Nov-2009  rmind branches: 1.22.2; 1.22.4;
Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.21 11-Nov-2009  yamt x86_get_sdbase: copyout to a correct address.
 1.20 29-Jul-2009  cegger remove Xen2 support.
ok bouyer@
 1.19 17-May-2009  bouyer on Xen the GDT has to be updated through HYPERVISOR_update_descriptor().
Export i386/i386/gdt.c:update_descriptor() and use it in x86_set_sdbase(),
as a direct write to the GDT will cause a kernel trap.
Fix PR port-xen/41401.
 1.18 29-Mar-2009  ad _lwp_setprivate: provide the value to MD code if a hook is present.

This will be used to support TLS. The MD method must match the ELF TLS spec
for that CPU architecture (if there is a spec).

At this time it is only implemented for i386, where it means setting the
per-thread base address for %gs. Please implement this for your platform!
 1.17 21-Mar-2009  ad PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash

Fix numerous problems:

1. LDT updates are not atomic.

2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. System with high maxprocs can be panicked.

3. LDTR can be leaked over context switch.

4. GDT slot allocations can race, giving the same LDT slot to two procs.

5. Incomplete interrupt/trap frames can be stacked.

6. In some rare cases segment faults are not handled correctly.
 1.16 19-Nov-2008  ad branches: 1.16.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
 1.15 11-May-2008  ad branches: 1.15.4; 1.15.6; 1.15.8;
Disable preemption over LDT modifications.
 1.14 28-Apr-2008  martin branches: 1.14.2;
Remove clause 3 and 4 from TNF licenses
 1.13 27-Apr-2008  ad branches: 1.13.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.
 1.12 21-Apr-2008  ad Make ntp, pmc, reboot, sysarch, time syscalls MPSAFE.
 1.11 28-Jan-2008  ad branches: 1.11.6; 1.11.8;
Enable locking that was stubbed out.
 1.10 05-Jan-2008  yamt - make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.
 1.9 04-Jan-2008  yamt i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.
 1.8 20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
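The signature change in this commit can be illustrated with a made-up syscall. Everything specific here is hypothetical: `foo_args` and its fields, the two function names, and `reg_t`, which stands in for the kernel's register_t so the sketch compiles outside the kernel.

```c
#include <stddef.h>

typedef long reg_t;	/* stand-in for the kernel's register_t */
struct lwp;		/* opaque for the purposes of this sketch */

/* Hypothetical argument structure for a syscall named "foo". */
struct foo_args {
	int fd;
	int flags;
};

/* Old style: an untyped pointer that each function converts itself. */
int
foo_old(struct lwp *l, void *v, reg_t *retval)
{
	struct foo_args *uap = v;

	(void)l;
	*retval = uap->fd + uap->flags;
	return 0;
}

/* New style: typed and const, so a write through 'uap' will not compile. */
int
foo_new(struct lwp *l, const struct foo_args *uap, reg_t *retval)
{
	(void)l;
	*retval = uap->fd + uap->flags;
	return 0;
}
```

The const qualifier is what forces the compat fixes mentioned above: code that used to scribble on its argument block has to build a correctly typed copy before calling the next routine.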
 1.7 22-Nov-2007  bouyer branches: 1.7.2; 1.7.6;
Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support
to NetBSD/Xen, both Dom0 and DomU.
 1.6 10-Nov-2007  ad fsbase/gsbase:

- Fix a few bugs with it, in particular fork/exec handling.
- Store the descriptors in the PCB, not in the LWP.
 1.5 10-Nov-2007  ad - When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
 1.4 17-Oct-2007  garbled branches: 1.4.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too many to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to an evbppc target.
 1.3 29-Aug-2007  ad branches: 1.3.2; 1.3.6;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.2 23-Jun-2007  dsl branches: 1.2.2; 1.2.6; 1.2.10; 1.2.12;
Split x86_set/get_ldt() so they are callable with kernel buffers.
For linux emulation code.
 1.1 16-Apr-2007  ad branches: 1.1.2; 1.1.4; 1.1.6;
Share the sysarch stuff between the x86 ports. PR kern/36046.
 1.1.6.7 03-Dec-2007  ad Sync with HEAD.
 1.1.6.6 09-Oct-2007  ad Sync with head.
 1.1.6.5 15-Jul-2007  ad Sync with head.
 1.1.6.4 15-Jul-2007  ad Sync with head.
 1.1.6.3 09-Jun-2007  ad Sync with head.
 1.1.6.2 09-Jun-2007  ad Sync with head.
 1.1.6.1 16-Apr-2007  ad file sys_machdep.c was added on branch vmlocking on 2007-06-09 21:37:06 +0000
 1.1.4.2 07-May-2007  yamt sync with head.
 1.1.4.1 16-Apr-2007  yamt file sys_machdep.c was added on branch yamt-idlelwp on 2007-05-07 10:55:05 +0000
 1.1.2.2 03-Oct-2007  garbled Sync with HEAD
 1.1.2.1 26-Jun-2007  garbled Sync with HEAD.
 1.2.12.3 23-Mar-2008  matt sync with HEAD
 1.2.12.2 09-Jan-2008  matt sync with HEAD
 1.2.12.1 06-Nov-2007  matt sync with HEAD
 1.2.10.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.2.10.2 11-Nov-2007  joerg Sync with HEAD.
 1.2.10.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.2.6.1 03-Sep-2007  skrll Sync with HEAD.
 1.2.2.2 11-Jul-2007  mjf Sync with head.
 1.2.2.1 23-Jun-2007  mjf file sys_machdep.c was added on branch mjf-ufs-trans on 2007-07-11 20:03:26 +0000
 1.3.6.2 13-Nov-2007  bouyer Sync with HEAD
 1.3.6.1 17-Oct-2007  bouyer amd64 (aka x86-64) support for Xen. Based on the OpenBSD port done by
Mathieu Ropert in 2006.
DomU-only for now. An INSTALL_XEN3_DOMU kernel with a ramdisk will boot to
sysinst if you're lucky. Often it panics because a runnable LWP has
a NULL stack (really, it's all of l->l_addr which has been zeroed out
while the process was on the queue !)
TODO:
- bug fixes :)
- Most of the xpq_* functions should be shared with xen/i386
- The xen/i386 assembly bootstrap code should be replaced with the C
version in xenamd64/amd64/xpmap.c
- see if a config(5) trick could allow merging xenamd64 back into xen or amd64.
 1.3.2.6 04-Feb-2008  yamt sync with head.
 1.3.2.5 21-Jan-2008  yamt sync with head
 1.3.2.4 07-Dec-2007  yamt sync with head
 1.3.2.3 15-Nov-2007  yamt sync with head.
 1.3.2.2 03-Sep-2007  yamt sync with head.
 1.3.2.1 29-Aug-2007  yamt file sys_machdep.c was added on branch yamt-lazymbuf on 2007-09-03 14:31:29 +0000
 1.4.2.4 18-Feb-2008  mjf Sync with HEAD.
 1.4.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.4.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.4.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.7.6.2 08-Jan-2008  bouyer Sync with HEAD
 1.7.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.7.2.1 26-Dec-2007  ad Sync with head.
 1.11.8.1 18-May-2008  yamt sync with head.
 1.11.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.11.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.2.6 11-Aug-2010  yamt sync with head.
 1.13.2.5 11-Mar-2010  yamt sync with head
 1.13.2.4 19-Aug-2009  yamt sync with head.
 1.13.2.3 20-Jun-2009  yamt sync with head
 1.13.2.2 04-May-2009  yamt sync with head.
 1.13.2.1 16-May-2008  yamt sync with head.
 1.14.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.15.8.1 04-Apr-2009  snj Pull up following revision(s) (requested by ad in ticket #656):
sys/arch/amd64/amd64/gdt.c: revision 1.21 via patch
sys/arch/amd64/amd64/machdep.c: revision 1.129 via patch
sys/arch/i386/i386/gdt.c: revision 1.47 via patch
sys/arch/i386/i386/kvm86.c: revision 1.17 via patch
sys/arch/i386/i386/locore.S: revision 1.85 via patch
sys/arch/i386/i386/machdep.c: revision 1.666 via patch
sys/arch/i386/i386/vector.S: revision 1.45 via patch
sys/arch/i386/include/pcb.h: revision 1.47 via patch
sys/arch/x86/include/pmap.h: revision 1.22 via patch
sys/arch/x86/include/sysarch.h: revision 1.8 via patch
sys/arch/x86/x86/pmap.c: revision 1.80 via patch
sys/arch/x86/x86/sys_machdep.c: revision 1.17 via patch
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.143 via patch
sys/kern/init_main.c: revision 1.384 via patch
PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash
Fix numerous problems:
1. LDT updates are not atomic.
2. Number of processes running with private LDTs and/or I/O bitmaps
is not capped. System with high maxprocs can be panicked.
3. LDTR can be leaked over context switch.
4. GDT slot allocations can race, giving the same LDT slot to two procs.
5. Incomplete interrupt/trap frames can be stacked.
6. In some rare cases segment faults are not handled correctly.
 1.15.6.2 28-Apr-2009  skrll Sync with HEAD.
 1.15.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.15.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.16.4.4 24-Oct-2010  jym Sync with HEAD
 1.16.4.3 01-Nov-2009  jym Sync with HEAD.
 1.16.4.2 31-May-2009  jym Sync with HEAD.
 1.16.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.22.4.2 05-Mar-2011  rmind sync with head
 1.22.4.1 30-May-2010  rmind sync with head
 1.22.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.22.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.25.16.1 15-Nov-2015  bouyer Pull up following revision(s) (requested by christos in ticket #1341):
sys/arch/x86/x86/sys_machdep.c: revision 1.29
fix broken error handling; error was used uninitialized. Changing the
compilation flags broke all threaded programs for me.
XXX: pullup-7
 1.25.14.1 15-Nov-2015  bouyer Pull up following revision(s) (requested by christos in ticket #1341):
sys/arch/x86/x86/sys_machdep.c: revision 1.29
fix broken error handling; error was used uninitialized. Changing the
compilation flags broke all threaded programs for me.
XXX: pullup-7
 1.25.12.3 03-Dec-2017  jdolecek update from HEAD
 1.25.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.25.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.25.8.1 15-Nov-2015  bouyer Pull up following revision(s) (requested by christos in ticket #1341):
sys/arch/x86/x86/sys_machdep.c: revision 1.29
fix broken error handling; error was used uninitialized. Changing the
compilation flags broke all threaded programs for me.
XXX: pullup-7
 1.25.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.25.2.1 30-Oct-2012  yamt sync with head
 1.26.2.1 18-May-2014  rmind sync with head
 1.27.2.1 10-Aug-2014  tls Rebase.
 1.28.6.1 08-Nov-2015  riz Pull up following revision(s) (requested by christos in ticket #1013):
sys/arch/x86/x86/sys_machdep.c: revision 1.29
fix broken error handling; error was used uninitialized. Changing the
compilation flags broke all threaded programs for me.
XXX: pullup-7
 1.28.4.3 28-Aug-2017  skrll Sync with HEAD
 1.28.4.2 05-Oct-2016  skrll Sync with HEAD
 1.28.4.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.28.2.1 08-Nov-2015  riz Pull up following revision(s) (requested by christos in ticket #1013):
sys/arch/x86/x86/sys_machdep.c: revision 1.29
fix broken error handling; error was used uninitialized. Changing the
compilation flags broke all threaded programs for me.
XXX: pullup-7
 1.29.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.29.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.30.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.35.6.3 13-Mar-2018  martin Pullup the following revisions via patch, requested by maxv in ticket #629:

sys/arch/amd64/amd64/genassym.cf 1.63,1.64
sys/arch/amd64/amd64/locore.S 1.144
sys/arch/amd64/amd64/machdep.c 1.281-1.283
sys/arch/i386/i386/genassym.cf 1.105-1.106
sys/arch/i386/i386/locore.S 1.155
sys/arch/i386/i386/machdep.c 1.802 (adapted),1.803
sys/arch/x86/include/cpu.h 1.85
sys/arch/x86/x86/intr.c 1.115-1.116
sys/arch/x86/x86/pmap.c 1.275
sys/arch/x86/x86/sys_machdep.c 1.45
sys/arch/xen/x86/cpu.c 1.117

Stop sharing the double-fault stack.
Merge the TSS structures into one single cpu_tss structure, and
allocate it dynamically.
 1.35.6.2 09-Sep-2017  snj Pull up following revision(s) (requested by maxv in ticket #258):
sys/arch/amd64/conf/ALL: 1.68
sys/arch/i386/conf/ALL: 1.428
sys/arch/i386/i386/i386_trap.S: 1.12
sys/arch/i386/i386/locore.S: 1.149-1.150
sys/arch/x86/x86/sys_machdep.c: 1.38
Remove undocumented hack.
--
Switch to the temporary stack right away when booted via multiboot. GRUB
happens to give a correct stack, but it is not guaranteed by the spec. This
temporary stack will be reset later, which is fine.
Fixes PR/50245.
--
Pfff, use %ss and not %ds. The latter is controlled by userland, the former
contains the kernel value (flat); FreeBSD fixed this too a few weeks ago.
As I said earlier, this dtrace code is complete bullshit.
--
Don't allow userland to create 286/386 call gates anymore - they are not
used by Wine. While here, don't allow it to overwrite the static entries
either, don't allow unknown entry types, remove LDT_DEBUG, and style.
 1.35.6.1 01-Aug-2017  snj Pull up following revision(s) (requested by maxv in ticket #164):
distrib/sets/lists/base/md.amd64: revision 1.269
distrib/sets/lists/debug/md.amd64: revision 1.97
sys/arch/amd64/conf/GENERIC: revision 1.460
sys/arch/amd64/conf/files.amd64: revision 1.89
sys/arch/i386/conf/GENERIC: revision 1.1157
sys/arch/i386/conf/files.i386: revision 1.379
sys/arch/i386/i386/i386_trap.S: revision 1.7-1.8
sys/arch/i386/include/frameasm.h: revision 1.16
sys/arch/x86/include/sysarch.h: revision 1.12
sys/arch/x86/x86/pmc.c: revision 1.8-1.10
sys/arch/x86/x86/sys_machdep.c: revision 1.36
sys/arch/xen/conf/files.compat: revision 1.26
sys/secmodel/suser/secmodel_suser.c: revision 1.43
sys/sys/kauth.h: revision 1.74
usr.bin/pmc/Makefile: revision 1.5
usr.bin/pmc/pmc.1: revision 1.12-1.13
usr.bin/pmc/pmc.c: revision 1.24-1.25
style
--
style
--
Disable interrupts for T_NMI (inline calltrap). Note that there's still a
way to evade the NMI mode here, if a segment register faults in
INTRFASTEXIT; but we don't care. I didn't test this change, but it seems
fine enough.
--
Make the PMC syscalls privileged.
--
Check argc, and add a message.
--
include opt_pmc.h
--
Build the pmc tool on amd64.
--
Properly handle overflows, and take them into account in userland.
--
Update.
--
Enable PMCs by default.
--
Sort sections. Fix macro usage.
 1.46.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.46.4.1 10-Jun-2019  christos Sync with HEAD
 1.46.2.3 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.46.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.46.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.52.6.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.22 05-Oct-2023  ad Arrange to update cached LWP credentials in userret() rather than during
syscall/trap entry, eliminating a test+branch on every syscall/trap.

This wasn't possible in the 3.99.x timeframe when l->l_cred came about
because there wasn't a reliable/timely way to force an ONPROC LWP running on
a remote CPU into the kernel (which is just about the only new thing in
this scheme).
 1.21 17-Mar-2022  riastradh x86: Revert previous syscall biglock slippage attribution.

The attribution in userret is good enough as is, because the stack
trace on panic shows the syscall number in the trap frame, so no need
to put extra cost in the syscall entry logic even under DIAGNOSTIC.
 1.20 12-Mar-2022  riastradh x86: Provide better attribution for syscall biglock slippage.
 1.19 07-Oct-2021  msaitoh KNF. No functional change.
 1.18 06-Apr-2019  kamil Centralized shared part of child_return() into MI part

Add a new function md_child_return() for MD specific bits only.

New child_return() is now part of MI and central code that handles
uniformly tracing code (KTR and ptrace(2)).

Synchronize value passed to ktrsysret() among ports to SYS_fork. This is
a traditional value and accessing p_lflag to check for PL_PPWAIT shall
use locking against proc_lock. Returning SYS_fork vs SYS_vfork still isn't
correct enough as there are more entry points to forking code. Instead of
making it too good, just settle with plain SYS_fork for all ports.
 1.17 03-Apr-2019  kamil Rework the fork(2)/vfork(2) event signalling under ptrace(2)

Remove the constraint of SIGTRAP event being maskable by a tracee.

Now all SIGTRAP TRAP_CHLD events are delivered to debugger.

This code touches MD specific logic and the child_return routine.
It's an intermediate step with room for refactoring in the future and
right now the least invasive approach. This allows asserting the expected
behavior in the already existing ATF tests and making the code prettier
in the future while keeping the same semantics. There is probably a need
for an MI wrapper of child_return for functionality shared between ports.
 1.16 12-Aug-2017  maxv branches: 1.16.4;
Remove vm86.

Pass 3.
 1.15 31-Mar-2017  martin PR kern/52117: move stop code for debugged children after fork into MI code.
XXX we might want to revisit this when handling the same event for vfork
better.
 1.14 07-Jul-2016  msaitoh branches: 1.14.2; 1.14.4;
KNF. Remove extra spaces. No functional change.
 1.13 26-Oct-2014  christos branches: 1.13.2;
dtrace expects a globally accessible syscall symbol.
 1.12 26-Jun-2013  matt Use sy_invoke
 1.11 10-Jul-2012  dsl branches: 1.11.2;
Revert the rest of rev 1.6
 1.10 19-Feb-2012  rmind Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.
 1.9 11-Feb-2012  martin Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.
 1.8 05-Jan-2012  reinoud Remove unused variable i accidentally left standing
 1.7 05-Jan-2012  reinoud Oops, forgot to revert this patch too... thanks Greg for finding it :-/
 1.6 20-Dec-2011  reinoud Part 2 - x86 implementation of MAP_NOSYSCALLS

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture,
MAP_NOSYSCALLS is simply ignored with virtually no CPU cost.
 1.5 04-Sep-2011  christos branches: 1.5.2; 1.5.6;
Remove code that was used to avoid register spills. setcontext(2) can change
the registers, so re-fetching will produce the wrong result for trace_exit().
 1.4 02-Sep-2011  christos If the process is traced, resulting from a PTRACE_FORK inherited setting,
stop it right now.

XXX[1]: Cannot make this MI, because I cannot wrap child_return because there
is MD code that checks fun == child_return. I think it is better to have an
mi child_return() and add a cpu_child_return()?
XXX[2]: Why do we need to stop so early? Perhaps stopping just after exec
is better?
 1.3 21-Nov-2009  rmind Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.2 21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.1 16-Apr-2009  rmind branches: 1.1.2; 1.1.4; 1.1.6;
- Add macros to handle (some) trapframe registers for common x86 code.
- Merge i386 and amd64 syscall.c into x86. No functional changes intended.

Proposed on (port-i386 & port-amd64). Unfortunately, I cannot merge these
lists into the single port-x86. :(
 1.1.6.4 24-Oct-2010  jym Sync with HEAD
 1.1.6.3 01-Nov-2009  jym Sync with HEAD.
 1.1.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.6.1 16-Apr-2009  jym file syscall.c was added on branch jym-xensuspend on 2009-05-13 17:18:45 +0000
 1.1.4.3 11-Mar-2010  yamt sync with head
 1.1.4.2 04-May-2009  yamt sync with head.
 1.1.4.1 16-Apr-2009  yamt file syscall.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:11 +0000
 1.1.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.1.2.1 16-Apr-2009  skrll file syscall.c was added on branch nick-hppapmap on 2009-04-28 07:34:57 +0000
 1.5.6.2 24-Feb-2012  mrg sync to -current.
 1.5.6.1 18-Feb-2012  mrg merge to -current.
 1.5.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.2.2 30-Oct-2012  yamt sync with head
 1.5.2.1 17-Apr-2012  yamt sync with head
 1.11.2.2 03-Dec-2017  jdolecek update from HEAD
 1.11.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.13.2.2 28-Aug-2017  skrll Sync with HEAD
 1.13.2.1 09-Jul-2016  skrll Sync with HEAD
 1.14.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.14.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.16.4.1 10-Jun-2019  christos Sync with HEAD
 1.10 16-Jul-2018  maxv Move
arch/x86/x86/tprof_pmi.c
arch/x86/x86/tprof_amdpmi.c
into
dev/tprof/tprof_x86_intel.c
dev/tprof/tprof_x86_amd.c
 1.9 15-Jul-2018  maxv Remove unused x86/include/tprof.h, there should be no need for this kind
of includes.
 1.8 13-Jul-2018  maxv Revamp tprof.

Rewrite the Intel backend to use the generic PMC interface, which is
available on all Intel CPUs. Synchronize the AMD backend with the new
interface.

The kernel identifies the PMC interface, and gives its id to userland.
Userland then queries the events itself (via cpuid etc). These events
depend on the PMC interface.

The tprof utility is rewritten to allow the user to choose which event
to count (which was not possible until now, the event was hardcoded in
the backend). The command line format is based on usr.bin/pmc, eg:

tprof -e llc-misses:k -o output sleep 20

The man page is updated too, but the arguments will likely change soon
anyway so it doesn't matter a lot.

The tprof utility has three tables:

Intel Architectural Version 1
Intel Skylake/Kabylake
AMD Family 10h

A CPU can support a combination of tables. For example Kabylake has
Intel-Architectural-Version-1 and its own Intel-Kabylake table.

For now the Intel Skylake/Kabylake table contains only one event, just
to demonstrate that the combination of tables works. Tested on an
Intel Core i5 Kabylake.

The code for AMD Family 10h is taken from the code I had written for
usr.bin/pmc. I haven't tested it yet, but it's the same as pmc(1), so
I guess it works as-is.

The whole thing is written in such a way that (I think) it is not
complicated to add more CPU models, and more architectures (other than
x86).
 1.7 23-May-2017  nonaka branches: 1.7.8; 1.7.10;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.6 11-Feb-2017  maxv As the XXX implicitly suggests, this line is wrong. Many other families
support PMCs (like my 10h amd). While here, put a warning in a comment.
 1.5 31-Jan-2017  maxv Update the URLs, and add the DC_refills_ flags (from the spec, not present
on my cpu).
 1.4 15-Nov-2013  msaitoh branches: 1.4.6; 1.4.10; 1.4.14;
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
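
The split between base, extended and display values described above can be sketched as follows. This is only an illustration of the CPUID leaf 1 EAX signature layout as documented by Intel and AMD; the sig_* helper names are hypothetical and are not the NetBSD macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature (hypothetical helper names). */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when the base field is 0xf. */
static inline uint32_t
sig_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/* Display model: the extended field is the high nibble for base family 6 or 0xf. */
static inline uint32_t
sig_model(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	return (base == 0x6 || base == 0xf) ?
	    model | (sig_extmodel(sig) << 4) : model;
}

/* Usage: the signature 0x000906e9 decodes to family 6, model 0x9e, stepping 9. */
void
sig_example(void)
{
	assert(sig_family(0x000906e9) == 0x6);
	assert(sig_model(0x000906e9) == 0x9e);
	assert(sig_stepping(0x000906e9) == 0x9);
}
```

Code that compares against a family number such as 10h needs the display value; the base field alone reads 0xf for every such CPU.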
 1.3 05-Feb-2011  yamt branches: 1.3.4; 1.3.14; 1.3.18;
tprof: record pid and userland events.
 1.2 13-Mar-2009  yamt branches: 1.2.2; 1.2.4; 1.2.6; 1.2.10; 1.2.12; 1.2.14;
tprof_amdpmi_start_cpu: PESR_COUNTER_MASK=0 for simplicity.
(my understanding of the value is that 0 and 1 mean the same thing.)
 1.1 12-Mar-2009  yamt a tprof backend which uses amd perfctr interrupt.
 1.2.14.1 08-Feb-2011  bouyer Sync with HEAD
 1.2.12.1 06-Jun-2011  jruoho Sync with HEAD.
 1.2.10.1 05-Mar-2011  rmind sync with head
 1.2.6.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.2.6.3 01-Nov-2009  jym Sync with HEAD.
 1.2.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.2.6.1 13-Mar-2009  jym file tprof_amdpmi.c was added on branch jym-xensuspend on 2009-05-13 17:18:45 +0000
 1.2.4.2 04-May-2009  yamt sync with head.
 1.2.4.1 13-Mar-2009  yamt file tprof_amdpmi.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:11 +0000
 1.2.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.2.2.1 13-Mar-2009  skrll file tprof_amdpmi.c was added on branch nick-hppapmap on 2009-04-28 07:34:57 +0000
 1.3.18.1 18-May-2014  rmind sync with head
 1.3.14.2 03-Dec-2017  jdolecek update from HEAD
 1.3.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.3.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.14.1 21-Apr-2017  bouyer Sync with HEAD
 1.4.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.4.6.2 28-Aug-2017  skrll Sync with HEAD
 1.4.6.1 05-Feb-2017  skrll Sync with HEAD
 1.7.10.1 10-Jun-2019  christos Sync with HEAD
 1.7.8.1 28-Jul-2018  pgoyette Sync with HEAD
 1.17 16-Jul-2018  maxv Move
arch/x86/x86/tprof_pmi.c
arch/x86/x86/tprof_amdpmi.c
into
dev/tprof/tprof_x86_intel.c
dev/tprof/tprof_x86_amd.c
 1.16 15-Jul-2018  maxv Remove unused x86/include/tprof.h, there should be no need for this kind
of includes.
 1.15 13-Jul-2018  maxv Revamp tprof.

Rewrite the Intel backend to use the generic PMC interface, which is
available on all Intel CPUs. Synchronize the AMD backend with the new
interface.

The kernel identifies the PMC interface, and gives its id to userland.
Userland then queries the events itself (via cpuid etc). These events
depend on the PMC interface.

The tprof utility is rewritten to allow the user to choose which event
to count (which was not possible until now, the event was hardcoded in
the backend). The command line format is based on usr.bin/pmc, eg:

tprof -e llc-misses:k -o output sleep 20

The man page is updated too, but the arguments will likely change soon
anyway so it doesn't matter a lot.

The tprof utility has three tables:

Intel Architectural Version 1
Intel Skylake/Kabylake
AMD Family 10h

A CPU can support a combination of tables. For example Kabylake has
Intel-Architectural-Version-1 and its own Intel-Kabylake table.

For now the Intel Skylake/Kabylake table contains only one event, just
to demonstrate that the combination of tables works. Tested on an
Intel Core i5 Kabylake.

The code for AMD Family 10h is taken from the code I had written for
usr.bin/pmc. I haven't tested it yet, but it's the same as pmc(1), so
I guess it works as-is.

The whole thing is written in such a way that (I think) it is not
complicated to add more CPU models, and more architectures (other than
x86).
 1.14 23-May-2017  nonaka branches: 1.14.8; 1.14.10;
x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.
 1.13 15-Nov-2013  msaitoh branches: 1.13.6;
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
 1.12 05-Feb-2011  yamt branches: 1.12.4; 1.12.14; 1.12.18;
tprof: record pid and userland events.
 1.11 09-May-2010  rmind branches: 1.11.2; 1.11.4;
Drop x86 MD package/core/smt IDs and use MI.
 1.10 26-Mar-2009  dyoung branches: 1.10.2; 1.10.4;
This only got the definition of device_xname() by chance, so explicitly
#include <sys/device.h>.
 1.9 13-Mar-2009  yamt tprof_pmi_start_cpu: replace magic numbers with a macro.
 1.8 12-Mar-2009  yamt s/__amd64__/__x86_64__/ as it's processor dependent.
suggested by matthew green on source-changes-d@.
 1.7 12-Mar-2009  yamt test a correct macro. amd64 -> __amd64__
 1.6 11-Mar-2009  yamt fix breakage where db_regs_t != trapframe.
the problem pointed out by Martin Husemann on tech-kern@.
 1.5 10-Mar-2009  yamt - adapt to MODULAR.
- some preparations to have more backends.
- add some comments.
 1.4 24-Feb-2009  yamt - rewrite x86 nmi dispatcher so that establish and disestablish are safe
on a running system.
- adapt existing users of the api. (elan)
- adapt tprof_pmi driver to use the api.
 1.3 11-May-2008  yamt branches: 1.3.6; 1.3.12;
tprof_backend_estimate_freq: ci_tsc_freq -> ci_data.cpu_cc_freq
 1.2 04-Jan-2008  yamt branches: 1.2.2; 1.2.4; 1.2.6; 1.2.12; 1.2.14; 1.2.16; 1.2.18;
use device_xname.
 1.1 01-Jan-2008  yamt branches: 1.1.2;
a simple performance monitor based profiler, inspired by Linux oprofile.
 1.1.2.3 08-Jan-2008  bouyer Sync with HEAD
 1.1.2.2 02-Jan-2008  bouyer Sync with HEAD
 1.1.2.1 01-Jan-2008  bouyer file tprof_pmi.c was added on branch bouyer-xeni386 on 2008-01-02 21:51:26 +0000
 1.2.18.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.2.16.3 11-Aug-2010  yamt sync with head.
 1.2.16.2 04-May-2009  yamt sync with head.
 1.2.16.1 16-May-2008  yamt sync with head.
 1.2.14.1 18-May-2008  yamt sync with head.
 1.2.12.1 02-Jun-2008  mjf Sync with HEAD.
 1.2.6.2 18-Feb-2008  mjf Sync with HEAD.
 1.2.6.1 04-Jan-2008  mjf file tprof_pmi.c was added on branch mjf-devfs on 2008-02-18 21:05:17 +0000
 1.2.4.2 21-Jan-2008  yamt sync with head
 1.2.4.1 04-Jan-2008  yamt file tprof_pmi.c was added on branch yamt-lazymbuf on 2008-01-21 09:40:18 +0000
 1.2.2.2 09-Jan-2008  matt sync with HEAD
 1.2.2.1 04-Jan-2008  matt file tprof_pmi.c was added on branch matt-armv6 on 2008-01-09 01:50:00 +0000
 1.3.12.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.3.12.3 24-Oct-2010  jym Sync with HEAD
 1.3.12.2 01-Nov-2009  jym Sync with HEAD.
 1.3.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.3.6.2 28-Apr-2009  skrll Sync with HEAD.
 1.3.6.1 03-Mar-2009  skrll Sync with HEAD.
 1.10.4.2 05-Mar-2011  rmind sync with head
 1.10.4.1 30-May-2010  rmind sync with head
 1.10.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.11.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.11.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.12.18.1 18-May-2014  rmind sync with head
 1.12.14.2 03-Dec-2017  jdolecek update from HEAD
 1.12.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.13.6.1 28-Aug-2017  skrll Sync with HEAD
 1.14.10.1 10-Jun-2019  christos Sync with HEAD
 1.14.8.1 28-Jul-2018  pgoyette Sync with HEAD
 1.63 08-May-2025  imil Rename BOOTCYCLETIME kernel option and subsequent files to BOOT_DURATION
 1.62 06-May-2025  imil Add BOOTCYCLETIME option to print kernel boot time

Introduce a new kernel option, BOOTCYCLETIME, which will print
the time taken for the kernel to boot on (for now) amd64 and i386
architectures.
 1.61 03-Oct-2024  riastradh x86/tsc.c: Fix comment indentation.

No functional change intended.
 1.60 19-Feb-2024  mrg branches: 1.60.2;
remove unintended printf() in previous. (thx dh)
 1.59 19-Feb-2024  mrg make TSC get a quality of -100 on AMD Family 15h and 16h

this should "fix" PR#56322 and is known as AMD errata
"778: Processor Core Time Stamp Counters May Experience Drift"
 1.58 09-Sep-2023  ad tsc_get_timecount(): cover the backwards check by DIAGNOSTIC since it has
proven the point by now.
 1.57 15-Oct-2021  jmcneill branches: 1.57.4;
Fix typo in comment: "porniters" -> "pointers"
 1.56 02-Jun-2021  nia when warning about TSC going backwards, provide advice to the sysadmin.
 1.55 01-Jun-2021  riastradh x86: Reset cached tsc in every lwp to 0 on suspend/resume.

This avoids spuriously warning about tsc going backwards, which is to
be expected after a suspend/resume cycle.
 1.54 19-Feb-2021  christos branches: 1.54.4;
Penalize TSC on VirtualBox because it is not accurate enough.
 1.53 17-Feb-2021  rillig x86/tsc: fix double space in warning about TSC going backwards
 1.52 15-Jun-2020  riastradh branches: 1.52.2;
Nix trailing whitespace.
 1.51 15-Jun-2020  msaitoh Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc,
but it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is higher than that of rdtsc, so I suspect the
difference in the results comes from that.
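
A userland sketch of the fenced read described above, assuming GCC or Clang on x86 with SSE2 available (always the case on amd64). The rdtsc_fenced() name and the tsc_fence selector are hypothetical; this is not the kernel code, and the cpuid fallback for CPUs without SSE2 is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>	/* GCC/Clang on x86: __rdtsc(), _mm_lfence(), _mm_mfence() */

/* Hypothetical selector: the log reports mfence as best on AMD, lfence on Intel. */
enum tsc_fence { TSC_FENCE_LFENCE, TSC_FENCE_MFENCE };

/* Issue the chosen fence, then read the TSC. */
static inline uint64_t
rdtsc_fenced(enum tsc_fence kind)
{
	if (kind == TSC_FENCE_MFENCE)
		_mm_mfence();
	else
		_mm_lfence();
	return __rdtsc();
}

/* Usage: a fenced read returns the current (non-zero) cycle count. */
void
rdtsc_example(void)
{
	uint64_t now = rdtsc_fenced(TSC_FENCE_LFENCE);

	assert(now != 0);
}
```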
 1.50 14-Jun-2020  ad tsc_get_timecount(): disable the "clock goes backwards" check on i386 for
the moment since it requires 64-bit store to be atomic because of nesting
via interrupt.
 1.49 13-Jun-2020  ad Print a rate limited warning if the TSC timecounter goes backwards from the
viewpoint of any single LWP.
 1.48 27-May-2020  ad tsc_delay(): use tsc_freq in preference to cpu_frequency().
 1.47 20-May-2020  ad The boot CPU suffers a cache miss during TSC sync, before RDTSC. Make the
secondary CPU take a miss as well to try and delay it an equal amount.
 1.46 19-May-2020  ad Ignore x86_delay, for xen
 1.45 19-May-2020  ad If the TSC timecounter is good then use the TSC for DELAY() too.
 1.44 08-May-2020  ad Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
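
The "measure many times and keep the lowest value" step can be sketched as below. The helper names are hypothetical, and the log does not say whether the threshold is applied to signed values or to magnitudes; this sketch assumes magnitudes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKEW_THRESHOLD	1000	/* clock cycles, the error threshold named in the log */

/* Return the lowest of n measurements (n must be at least 1). */
static inline int64_t
lowest_sample(const int64_t *samples, size_t n)
{
	int64_t best;
	size_t i;

	assert(n > 0);
	best = samples[0];
	for (i = 1; i < n; i++) {
		if (samples[i] < best)
			best = samples[i];
	}
	return best;
}

/* Assumption: a value counts as acceptable when its magnitude is within the threshold. */
static inline int
within_threshold(int64_t cycles)
{
	return cycles >= -SKEW_THRESHOLD && cycles <= SKEW_THRESHOLD;
}
```

Taking the minimum rather than the average discards samples that were inflated by cache misses or bus traffic, which only ever add delay.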
 1.43 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.42 23-Apr-2020  ad When computing TSC skew make 8 measurements and use the average.
 1.41 21-Apr-2020  msaitoh Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. On some
processors the TSC/core crystal clock ratio can be read but the core crystal
clock frequency cannot. The Intel SDM gives the values for some processors.
- This also required changing lapic_per_second to keep the LAPIC timer correct.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
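
The leaf 0x15 arithmetic can be sketched as below: EAX holds the ratio denominator, EBX the numerator and ECX the core crystal clock in Hz (0 when the CPU does not report it). The function name is hypothetical, and the table lookup the message mentions for CPUs that report no crystal frequency is not shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * TSC frequency in Hz from CPUID leaf 0x15 register values, or 0 when the
 * leaf does not carry enough information (any of the three values is 0).
 */
static inline uint64_t
tsc_freq_from_cpuid15(uint32_t eax_denom, uint32_t ebx_numer,
    uint32_t ecx_crystal_hz)
{
	if (eax_denom == 0 || ebx_numer == 0 || ecx_crystal_hz == 0)
		return 0;
	/* Widen before multiplying so the product cannot overflow 32 bits. */
	return (uint64_t)ecx_crystal_hz * ebx_numer / eax_denom;
}

/* Usage with made-up values: a 24 MHz crystal and a 176/2 ratio give 2.112 GHz. */
void
tsc_freq_example(void)
{
	assert(tsc_freq_from_cpuid15(2, 176, 24000000) == 2112000000ULL);
}
```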
 1.40 06-Apr-2020  msaitoh branches: 1.40.2;
Rename CPUID_APM_TSC to CPUID_APM_ITSC. No functional change.
 1.39 03-Apr-2020  knakahara Fix TSC drift being wrongly observed almost every time.

The "TSC drift" in tsc_tc_init() means the cpu_cc_skew delta between the
first measurement (in cpu_start_secondary) and the second measurement
(in cpu_boot_secondary); that is, the TSC drift is expected to be
almost zero. However, the second measurement in the current implementation
accidentally has an extra cpu_cc_skew added, so the current delta value is
wrongly almost equal to cpu_cc_skew.

tsc_sync_bp and tsc_sync_ap should use rdtsc() to get raw values.

Advised by nonaka@n.o, thanks.
 1.38 21-Feb-2020  joerg Explicitly cast pointers to uintptr_t before casting to enums. They are
not necessarily the same size. Don't cast pointers to bool, check for
NULL instead.
 1.37 02-Oct-2017  maxv branches: 1.37.4; 1.37.8; 1.37.10;
Add a machdep.tsc_user_enable sysctl, to enable/disable the rdtsc
instruction in usermode. It defaults to enabled.
 1.36 18-Dec-2013  msaitoh branches: 1.36.22;
Fix comment.
 1.35 11-Dec-2013  msaitoh Make new function named tsc_is_invariant() to avoid code duplication.
The behavior of acpicpu_md_flags() will change on some CPUs because
the detecting code of invariant TSC is replaced with newer code.
 1.34 08-Dec-2013  msaitoh Update invariant TSC detect code from both Intel and AMD documents.
The best way to check whether the TSC counter is invariant or not is to check
CPUID 80000007.
 1.33 15-Nov-2013  msaitoh Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
 1.32 02-Jul-2013  christos - remove unused variable
- call rdmsr() twice to avoid the 5.4210108624275222e-18% probability that
rdmsr() returns 0.
From dsl@
 1.31 27-Jun-2013  christos branches: 1.31.2;
detect a bad msr tsc and don't use it.
 1.30 08-Aug-2011  jmcneill branches: 1.30.2; 1.30.12;
revert previous
 1.29 08-Aug-2011  jmcneill If the USE_PLATFORM_CLOCK flag is set in the FADT, it indicates that OSPM
should use a platform provided timer (either HPET or the PM timer). A
platform may set this flag if internal processor clock(s) cannot provide
consistent monotonically non-decreasing counters. Set TSC quality to -100
if this flag is set.
 1.28 02-Feb-2011  bouyer Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
 1.27 21-Aug-2010  jruoho branches: 1.27.2; 1.27.4;
Use a constant from <x86/specialreg.h>.
 1.26 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.25 27-Mar-2009  drochner branches: 1.25.2; 1.25.4;
Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)
 1.24 19-Dec-2008  ad branches: 1.24.2;
Bah, re-apply. I still see errors without it.
 1.23 19-Dec-2008  ad Back out previous. atomic_cas_64() doesn't work during early boot. I'll
fix that instead.
 1.22 15-Dec-2008  ad More paranoia.
 1.21 25-Nov-2008  ad A fix for failed TSC calibration with the message "ERROR: %lld cycle TSC
drift observed".

On i386, cpu_cc_skew was written out as two 32-bit words. If unlucky, the
boot processor could read the whole 64-bit value after only 32-bits of the
update were written back to main memory.
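
The torn update described above is the general hazard of publishing a 64-bit value with two 32-bit stores. The log does not show the actual fix; the following is only a C11 sketch of a tear-free publish and read, with hypothetical names, and on i386 it may need compiler or library support for 64-bit atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU skew value discussed above. */
static _Atomic int64_t skew_cycles;

/* Writer: the whole 64-bit value becomes visible at once. */
static inline void
publish_skew(int64_t v)
{
	atomic_store_explicit(&skew_cycles, v, memory_order_release);
}

/* Reader: never observes a mix of old and new 32-bit halves. */
static inline int64_t
read_skew(void)
{
	return atomic_load_explicit(&skew_cycles, memory_order_acquire);
}

/* Usage: a value that spans both halves round-trips intact. */
void
skew_example(void)
{
	publish_skew(-0x123456789LL);
	assert(read_skew() == -0x123456789LL);
}
```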
 1.20 31-Aug-2008  fvdl branches: 1.20.2; 1.20.4;
If tsc_freq is 0 (probably due to bad virtualization, as currently
seen under VirtualBox), don't try to use TSC as a timecounter source
to avoid trouble. Matches the FreeBSD behavior.
 1.19 11-May-2008  ad branches: 1.19.4;
Fix a potential hang during skew detection (not observed).
 1.18 10-May-2008  ad Assume that TSC is stable on P-II and P-III Xeons, since systems with those
CPUs are likely to have a TSC-friendly configuration.
 1.17 10-May-2008  ad TSC should also be enabled for intel f03.
 1.16 10-May-2008  ad Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.
 1.15 28-Apr-2008  martin branches: 1.15.2;
Remove clause 3 and 4 from TNF licenses
 1.14 27-Apr-2008  ad Make preemption safe.
 1.13 16-Apr-2008  cegger branches: 1.13.2; 1.13.4;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.12 10-Mar-2008  ad Implement an optimized, preemption-safe asm version of tsc_get_timecount().
The C version needs work to be preemption safe. Cuts the clock cycles
for microtime() from 950 down to 300 on a Pentium D.
 1.11 14-Nov-2007  ad branches: 1.11.10; 1.11.14;
Use i8254_delay().
 1.10 16-Nov-2006  christos branches: 1.10.8; 1.10.26; 1.10.28; 1.10.32; 1.10.34;
__unused removal on arguments; approved by core.
 1.9 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.8 23-Sep-2006  xtraeme Protect opt_powernow_k7.h with ifdef i386, I forgot that it's only
available on i386...
 1.7 23-Sep-2006  xtraeme Do not use the TSC in POWERNOW_K[78] case (same behaviour as
enhanced speedstep), closes PR port-amd64/34550.
 1.6 03-Sep-2006  christos branches: 1.6.2; 1.6.4; 1.6.6;
add missing initializers
 1.5 13-Jun-2006  dogcow branches: 1.5.4; 1.5.6; 1.5.8;
...and since amd64 doesn't actually have enhanced speedstep (yet!), only
include the opt file on i386.
 1.4 13-Jun-2006  dogcow we need opt_enhanced_speedstep.h to get whether ENHANCED_SPEEDSTEP is defined.
 1.3 10-Jun-2006  kardel Don't pick TSC by default on systems that have ENHANCED_SPEEDSTEP
compiled in. Many TSCs out there are sensitive to CPU frequency
changes. On these platforms we need to use other fixed frequency
timers (e.g. ACPI_PM_TIMER). Maybe we should add detection code
here for whether the TSC is sensitive to CPU frequency changes.
 1.2 07-Jun-2006  kardel add timecounter support (from branch simonb-timecounters)
 1.1 30-Apr-2006  kardel branches: 1.1.2;
file tsc.c was initially added on branch simonb-timecounters.
 1.1.2.1 30-Apr-2006  kardel - provide TSC for UNI and MP systems
(shared between i386 and amd64)
 1.5.8.3 03-Sep-2006  yamt sync with head.
 1.5.8.2 26-Jun-2006  yamt sync with head.
 1.5.8.1 13-Jun-2006  yamt file tsc.c was added on branch yamt-pdpolicy on 2006-06-26 12:45:40 +0000
 1.5.6.5 17-Mar-2008  yamt sync with head.
 1.5.6.4 15-Nov-2007  yamt sync with head.
 1.5.6.3 30-Dec-2006  yamt sync with head.
 1.5.6.2 21-Jun-2006  yamt sync with head.
 1.5.6.1 13-Jun-2006  yamt file tsc.c was added on branch yamt-lazymbuf on 2006-06-21 14:58:06 +0000
 1.5.4.2 19-Jun-2006  chap Sync with head.
 1.5.4.1 13-Jun-2006  chap file tsc.c was added on branch chap-midi on 2006-06-19 03:45:15 +0000
 1.6.6.2 10-Dec-2006  yamt sync with head.
 1.6.6.1 22-Oct-2006  yamt sync with head
 1.6.4.2 09-Sep-2006  rpaulo sync with head
 1.6.4.1 03-Sep-2006  rpaulo file tsc.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.6.2.1 18-Nov-2006  ad Sync with head.
 1.10.34.1 19-Nov-2007  mjf Sync with HEAD.
 1.10.32.1 18-Nov-2007  bouyer Sync with HEAD
 1.10.28.2 23-Mar-2008  matt sync with HEAD
 1.10.28.1 09-Jan-2008  matt sync with HEAD
 1.10.26.1 14-Nov-2007  joerg Sync with HEAD.
 1.10.8.1 03-Dec-2007  ad Sync with HEAD.
 1.11.14.4 17-Jan-2009  mjf Sync with HEAD.
 1.11.14.3 28-Sep-2008  mjf Sync with HEAD.
 1.11.14.2 02-Jun-2008  mjf Sync with HEAD.
 1.11.14.1 03-Apr-2008  mjf Sync with HEAD.
 1.11.10.1 24-Mar-2008  keiichi sync with head.
 1.13.4.5 09-Oct-2010  yamt sync with head
 1.13.4.4 11-Aug-2010  yamt sync with head.
 1.13.4.3 04-May-2009  yamt fix a merge botch.
 1.13.4.2 04-May-2009  yamt sync with head.
 1.13.4.1 16-May-2008  yamt sync with head.
 1.13.2.1 18-May-2008  yamt sync with head.
 1.15.2.3 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.15.2.2 24-Jun-2008  wrstuden Hand-merge files that didn't merge right in recent sync w/ current.
 1.15.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.19.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.19.4.1 19-Oct-2008  haad Sync with HEAD.
 1.20.4.4 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #362):
sys/arch/x86/x86/tsc.c: revision 1.24
Bah, re-apply. I still see errors without it.
 1.20.4.3 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #362):
sys/arch/x86/x86/tsc.c: revision 1.23
Back out previous. atomic_cas_64() doesn't work during early boot. I'll
fix that instead.
 1.20.4.2 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #362):
sys/arch/x86/x86/tsc.c: revision 1.22
More paranoia.
 1.20.4.1 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #362):
sys/arch/x86/x86/tsc.c: revision 1.21
A fix for failed TSC calibration with the message "ERROR: %lld cycle TSC
drift observed".
On i386, cpu_cc_skew was written out as two 32-bit words. If unlucky, the
boot processor could read the whole 64-bit value after only 32-bits of the
update were written back to main memory.
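
The torn update described above can be illustrated in userland C. This is only a sketch under stated assumptions: `torn_after_low_half`, `skew_sketch`, `publish_skew` and `read_skew` are hypothetical names, and C11 `<stdatomic.h>` stands in for whatever kernel primitive the real fix used. The first function shows what a reader could observe after only the low 32-bit word has landed; the last two show a single indivisible 64-bit store and load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Value a reader sees if it samples a 64-bit variable after only the
 * low 32-bit word of a two-word store has been written: the new low
 * half combined with the stale high half.
 */
static uint64_t
torn_after_low_half(uint64_t old_val, uint64_t new_val)
{
	return (old_val & 0xffffffff00000000ULL) |
	    (new_val & 0x00000000ffffffffULL);
}

/* Hypothetical stand-in for a per-CPU skew value shared between CPUs. */
static _Atomic int64_t skew_sketch;

/* Publish the whole 64-bit value in one indivisible store. */
static void
publish_skew(int64_t v)
{
	atomic_store_explicit(&skew_sketch, v, memory_order_release);
}

/* Read the whole 64-bit value in one indivisible load. */
static int64_t
read_skew(void)
{
	return atomic_load_explicit(&skew_sketch, memory_order_acquire);
}
```

For example, going from 0x00000000ffffffff to 0x0000000100000000 passes through 0 if the low word is written first, which is neither the old nor the new value. On 32-bit targets a 64-bit `_Atomic` may need extra library support at link time.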
 1.20.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.20.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.24.2.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.24.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.24.2.3 24-Oct-2010  jym Sync with HEAD
 1.24.2.2 01-Nov-2009  jym Sync with HEAD.
 1.24.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.25.4.2 05-Mar-2011  rmind sync with head
 1.25.4.1 30-May-2010  rmind sync with head
 1.25.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.25.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.27.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.27.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.30.12.2 03-Dec-2017  jdolecek update from HEAD
 1.30.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.30.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.31.2.2 18-May-2014  rmind sync with head
 1.31.2.1 28-Aug-2013  rmind sync with head
 1.36.22.2 05-Aug-2020  martin Pull up the following revisions, requested by msaitoh in ticket #1593:

sys/arch/x86/conf/files.x86 1.108
sys/arch/x86/include/apicvar.h 1.7 via patch
sys/arch/x86/include/cpu.h 1.121
sys/arch/x86/x86/cpu.c 1.185 via patch
sys/arch/x86/x86/hyperv.c 1.7
sys/arch/x86/x86/tsc.c 1.41
sys/arch/xen/conf/files.xen 1.181

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.36.22.1 15-Apr-2020  martin Pull up the following, requested by msaitoh in ticket #1530:

sys/arch/x86/x86/procfs_machdep.c 1.33-1.36
sys/arch/x86/x86/tsc.c 1.40
sys/arch/x86/x86/specialreg.h 1.159-1.161
usr.sbin/cpuctl/arch/i386.c 1.109-1.110 via patch

- Print avx512ifma, cqm_mbm_total, cqm_mbm_local, waitpkg, rdpru,
Fast Short Rep Mov(fsrm), AVX512_VP2INTERSECT, SERIALIZE and
TSXLDTRK.
- Rename CPUID Fn8000_0007 %edx bit 8 from "TSC" to "ITSC"
(Invariant TSC) to avoid confusion.
- Print CPUID 0x80000007 %edx on both Intel and AMD.
- Remove ci_max_ext_cpuid from usr.sbin/cpuctl/arch/i386.c because it's
the same as ci_cpuid_extlevel.
- Use unsigned to avoid undefined behavior in procfs_getonefeatreg().
 1.37.10.1 29-Feb-2020  ad Sync with head.
 1.37.8.2 15-Jul-2020  martin Pull up the following, requested by msaitoh in ticket #1015

sys/arch/x86/conf/files.x86 1.108 (via patch)
sys/arch/x86/include/apicvar.h 1.7 (via patch)
sys/arch/x86/include/cpu.h 1.121 (via patch)
sys/arch/x86/x86/cpu.c 1.185 (via patch)
sys/arch/x86/x86/hyperv.c 1.7 (via patch)
sys/arch/x86/x86/tsc.c 1.41 (via patch)
sys/arch/xen/conf/files.xen 1.181 (via patch)

Get TSC frequency from CPUID 0x15 and/or 0x16 if it's available.
This change fixes a problem that newer Intel processors' timer
counts very slowly.
 1.37.8.1 14-Apr-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #833):

usr.sbin/cpuctl/arch/i386.c: revision 1.109
sys/arch/x86/include/specialreg.h: revision 1.159
usr.sbin/cpuctl/arch/i386.c: revision 1.110
sys/arch/x86/include/specialreg.h: revision 1.160
sys/arch/x86/include/specialreg.h: revision 1.161
sys/arch/x86/x86/tsc.c: revision 1.40
sys/arch/x86/x86/procfs_machdep.c: revision 1.35
sys/arch/x86/x86/procfs_machdep.c: revision 1.36

Add Fast Short Rep Mov(fsrm).

Add AVX512_VP2INTERSECT, SERIALIZE and TSXLDTRK(TSX suspend load addr tracking)

CPUID Fn00000001 %edx bit 8 is printed as "TSC", so rename CPUID Fn8000_0007
%edx bit 8 from "TSC" to "ITSC" (Invariant TSC) to avoid confusion.

Rename CPUID_APM_TSC to CPUID_APM_ITSC. No functional change.

Remove ci_max_ext_cpuid because it's the same as ci_cpuid_extlevel.

Print CPUID 0x80000007 %edx on both Intel and AMD.
 1.37.4.2 21-Apr-2020  martin Sync with HEAD
 1.37.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.40.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.52.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.54.4.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.57.4.1 02-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #915):

sys/arch/x86/x86/tsc.c: revision 1.59
sys/arch/x86/x86/tsc.c: revision 1.60

make TSC get a quality of -100 on AMD Family 15h and 16h
this should "fix" PR#56322 and is known as AMD errata
"778: Processor Core Time Stamp Counters May Experience Drift"

remove unintended printf() in previous. (thx dh)
 1.60.2.1 02-Aug-2025  perseant Sync with HEAD
 1.8 01-Jun-2021  riastradh x86: Reset cached tsc in every lwp to 0 on suspend/resume.

This avoids spuriously warning about tsc going backwards, which is to
be expected after a suspend/resume cycle.
 1.7 15-Jun-2020  msaitoh branches: 1.7.6;
Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and gave a big improvement, but
there is still room for more. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
in the result comes from it.
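
The selection policy in the entry above (mfence on AMD, lfence on Intel, cpuid when SSE2 is absent) can be sketched in userland C. This is not the kernel's code: `pick_tsc_fence`, `read_tsc_fenced` and the `tsc_fence` enum are hypothetical names, and the sketch assumes GCC or Clang, whose `<x86intrin.h>` and `<cpuid.h>` provide `__rdtsc()`, `_mm_lfence()`, `_mm_mfence()` and `__get_cpuid()`. On non-x86 targets the read simply returns 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>
#define TSC_SKETCH_X86 1
#endif

enum tsc_fence { TSC_FENCE_CPUID, TSC_FENCE_LFENCE, TSC_FENCE_MFENCE };

/* Choose the serializing instruction as the log entry describes. */
static enum tsc_fence
pick_tsc_fence(bool is_amd, bool has_sse2)
{
	if (!has_sse2)
		return TSC_FENCE_CPUID;	/* lfence/mfence need SSE2 */
	return is_amd ? TSC_FENCE_MFENCE : TSC_FENCE_LFENCE;
}

/* Issue the chosen fence, then read the TSC. */
static uint64_t
read_tsc_fenced(enum tsc_fence fence)
{
#if defined(TSC_SKETCH_X86)
	unsigned int a, b, c, d;

	switch (fence) {
#if defined(__SSE2__)
	case TSC_FENCE_LFENCE:
		_mm_lfence();
		break;
	case TSC_FENCE_MFENCE:
		_mm_mfence();
		break;
#endif
	default:
		/* cpuid is serializing; its output is ignored here. */
		(void)__get_cpuid(0, &a, &b, &c, &d);
		break;
	}
	return __rdtsc();
#else
	(void)fence;
	return 0;	/* no TSC on this architecture */
#endif
}
```

A caller would evaluate `pick_tsc_fence()` once at startup and pass the result to every `read_tsc_fenced()` call. The sketch assumes rdtsc is permitted in user mode.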
 1.6 02-Oct-2017  maxv Add a machdep.tsc_user_enable sysctl, to enable/disable the rdtsc
instruction in usermode. It defaults to enabled.
 1.5 11-Dec-2013  msaitoh Make new function named tsc_is_invariant() to avoid code duplication.
The behavior of acpicpu_md_flags() will change on some CPUs because
the detecting code of invariant TSC is replaced with newer code.
 1.4 10-May-2008  ad branches: 1.4.32; 1.4.42; 1.4.48;
Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.
 1.3 28-Apr-2008  martin branches: 1.3.2;
Remove clause 3 and 4 from TNF licenses
 1.2 07-Jun-2006  kardel branches: 1.2.4; 1.2.6; 1.2.8; 1.2.14; 1.2.68; 1.2.70; 1.2.72;
add timecounter support (from branch simonb-timecounters)
 1.1 30-Apr-2006  kardel branches: 1.1.2;
file tsc.h was initially added on branch simonb-timecounters.
 1.1.2.1 30-Apr-2006  kardel - provide TSC for UNI and MP systems
(shared between i386 and amd64)
 1.2.72.1 16-May-2008  yamt sync with head.
 1.2.70.1 18-May-2008  yamt sync with head.
 1.2.68.1 02-Jun-2008  mjf Sync with HEAD.
 1.2.14.2 09-Sep-2006  rpaulo sync with head
 1.2.14.1 07-Jun-2006  rpaulo file tsc.h was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:44:49 +0000
 1.2.8.2 26-Jun-2006  yamt sync with head.
 1.2.8.1 07-Jun-2006  yamt file tsc.h was added on branch yamt-pdpolicy on 2006-06-26 12:45:40 +0000
 1.2.6.2 21-Jun-2006  yamt sync with head.
 1.2.6.1 07-Jun-2006  yamt file tsc.h was added on branch yamt-lazymbuf on 2006-06-21 14:58:06 +0000
 1.2.4.2 19-Jun-2006  chap Sync with head.
 1.2.4.1 07-Jun-2006  chap file tsc.h was added on branch chap-midi on 2006-06-19 03:45:15 +0000
 1.3.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.4.48.1 18-May-2014  rmind sync with head
 1.4.42.2 03-Dec-2017  jdolecek update from HEAD
 1.4.42.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.32.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.18 12-Feb-2011  jmcneill Don't rely on vga_post_call being called before vga_post_set_vbe
 1.17 12-Nov-2010  uebayasi branches: 1.17.2; 1.17.4;
Pull in uvm/uvm.h where UVM's page level interface is used.
 1.16 03-Oct-2010  rmind vga_post_init: fix a bug and memleak in error path.
 1.15 28-Jun-2010  rmind Add missing pmap_update() in vga_post_init(), remove wrong pmap_kremove()
in error path, and fix pmap_update() in vga_post_set_vbe().
 1.14 07-Nov-2009  cegger branches: 1.14.2; 1.14.4;
Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.13 24-Aug-2009  jmcneill Add vga_post_set_vbe for setting video mode.
 1.12 15-Mar-2009  cegger ansify function definitions
 1.11 08-Sep-2008  joerg branches: 1.11.2; 1.11.8;
Make the amount of backing memory a macro to simplify changing it.
 1.10 04-Jun-2008  ad branches: 1.10.4;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.9 29-Mar-2008  jmcneill branches: 1.9.2; 1.9.4; 1.9.6;
Add RCSID to top of file.
 1.8 13-Mar-2008  drochner if "ddb_vgapost" is called but its data structures were not initialized,
print a message, suggested by joerg
(the message might be invisible if the console is in X mode)
 1.7 12-Mar-2008  drochner -add a function to vga_post which can be called from DDB to get a
usable VGA console ("call ddb_vgapost")
-allow to switch from/to screens occupied by an X server if the graphics
device is console and in polling mode (ie DDB)
This together allows to get a DDB session on a VGA console if the
system crashed while X11 was running.
As long as the protocol to tell X servers about virtual screen switches
is as primitive as it is, it is unsafe to restart an X session afterwards.
So this is basically for crash analysis.
 1.6 15-Jan-2008  drochner branches: 1.6.2; 1.6.4; 1.6.6; 1.6.10;
vga_post_init(): allow the memory mapped to the emulator VM to be
physically non-contiguous, and do some cosmetics
approved by joerg
 1.5 15-Jan-2008  joerg Revert last and explicitly cast to paddr_t instead to fix the warning.
 1.4 15-Jan-2008  martin avoid warning on a too large constant
 1.3 14-Jan-2008  xtraeme Missing __KERNEL_RCSID().
 1.2 14-Jan-2008  joerg Allocate 64KB as low memory for the BIOS to write to. Copy the interrupt
table and the BIOS data area before each POST. Allows the Dell of Martin
Husemann to use VGA_POST.
 1.1 25-Dec-2007  joerg branches: 1.1.2; 1.1.4; 1.1.6;
Add initial version of calling VGA POST from vga_resume. This is the
equivalent to "vbetool post" using x86emu in the kernel.
 1.1.6.3 23-Mar-2008  matt sync with HEAD
 1.1.6.2 09-Jan-2008  matt sync with HEAD
 1.1.6.1 25-Dec-2007  matt file vga_post.c was added on branch matt-armv6 on 2008-01-09 01:50:00 +0000
 1.1.4.3 19-Jan-2008  bouyer Sync with HEAD
 1.1.4.2 02-Jan-2008  bouyer Sync with HEAD
 1.1.4.1 25-Dec-2007  bouyer file vga_post.c was added on branch bouyer-xeni386 on 2008-01-02 21:51:28 +0000
 1.1.2.2 26-Dec-2007  ad Sync with head.
 1.1.2.1 25-Dec-2007  ad file vga_post.c was added on branch vmlocking2 on 2007-12-26 19:17:18 +0000
 1.6.10.3 28-Sep-2008  mjf Sync with HEAD.
 1.6.10.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.6.10.1 03-Apr-2008  mjf Sync with HEAD.
 1.6.6.1 24-Mar-2008  keiichi sync with head.
 1.6.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.6.4.1 15-Jan-2008  mjf file vga_post.c was added on branch mjf-devfs on 2008-02-18 21:05:17 +0000
 1.6.2.3 17-Mar-2008  yamt sync with head.
 1.6.2.2 21-Jan-2008  yamt sync with head
 1.6.2.1 15-Jan-2008  yamt file vga_post.c was added on branch yamt-lazymbuf on 2008-01-21 09:40:18 +0000
 1.9.6.2 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.9.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.9.4.5 09-Oct-2010  yamt sync with head
 1.9.4.4 11-Aug-2010  yamt sync with head.
 1.9.4.3 11-Mar-2010  yamt sync with head
 1.9.4.2 16-Sep-2009  yamt sync with head
 1.9.4.1 04-May-2009  yamt sync with head.
 1.9.2.1 17-Jun-2008  yamt sync with head.
 1.10.4.1 19-Oct-2008  haad Sync with HEAD.
 1.11.8.5 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.11.8.4 10-Jan-2011  jym Sync with HEAD
 1.11.8.3 24-Oct-2010  jym Sync with HEAD
 1.11.8.2 01-Nov-2009  jym Sync with HEAD.
 1.11.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.14.4.2 05-Mar-2011  rmind sync with head
 1.14.4.1 03-Jul-2010  rmind sync with head
 1.14.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.14.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.17.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.17.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.35 22-May-2022  riastradh opencrypto: Make freesession callback return void.

No functional change intended: all drivers already return zero
unconditionally.
 1.34 22-May-2022  riastradh padlock(4): Prune dead branches. Assert session id validity.
 1.33 22-May-2022  riastradh padlock(4): Return zero, not error, if we've issued crypto_done.
 1.32 22-May-2022  andvar fix various small typos, mainly in comments.
 1.31 29-Jun-2020  riastradh Make padlock(4) compile on amd64.
 1.30 29-Jun-2020  riastradh padlock(4): Remove legacy rijndael API use.

This doesn't actually need to compute AES -- it just needs the
standard AES key schedule, so use the BearSSL constant-time key
schedule implementation.

XXX Compile-tested only.
XXX The byte-order business here seems highly questionable.
 1.29 14-Jun-2020  riastradh padlock(4): Don't use prev msg's last block as IV for next msg in CBC.

This violates the security contract of the CBC construction, which
requires that the IV be unpredictable in advance; an adaptive adversary
can exploit this to verify plaintext guesses.

XXX Compile-tested only.
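
The requirement stated above, that each CBC message gets an IV that cannot be predicted in advance, can be sketched in userland C. This is an illustration only: `fresh_cbc_iv` and `AES_BLOCK_LEN` are hypothetical names, and reading `/dev/urandom` is an assumption standing in for whatever in-kernel random source the driver actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define AES_BLOCK_LEN 16	/* AES block size in bytes */

/*
 * Fill iv[] with fresh random bytes for one CBC message.
 * Returns 0 on success, -1 if the random source is unavailable.
 * Call once per message; never reuse the previous ciphertext block.
 */
static int
fresh_cbc_iv(uint8_t iv[AES_BLOCK_LEN])
{
	FILE *f = fopen("/dev/urandom", "rb");
	size_t n;

	if (f == NULL)
		return -1;
	n = fread(iv, 1, AES_BLOCK_LEN, f);
	fclose(f);
	return n == AES_BLOCK_LEN ? 0 : -1;
}
```

Chaining the last ciphertext block of one message into the next as its IV would let an observer know the next IV before choosing a plaintext, which is the weakness the entry describes.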
 1.28 07-Mar-2020  maya Fold constant. err is always 0, so switch to return 0;
 1.27 07-Mar-2020  fcambus Return error values directly where appropriate, instead of using the err
variable.
 1.26 14-Jul-2018  maxv Add splhigh() around the FPU code, we don't want to be preempted in the
middle, this could corrupt the FPU state and trigger undefined behavior.

Intentionally use splhigh and not kpreempt_disable, to match the generic
x86 FPU code.

Compile-tested only (I don't have VIA).

Found by Maya almost a year ago.
 1.25 27-Feb-2016  tls branches: 1.25.16; 1.25.18;
Remove callout-based RNG support in VIA crypto driver; add VIA RNG backend for cpu_rng.
 1.24 13-Apr-2015  riastradh Convert arch/x86 to use <sys/rnd*.h>. Omit needless includes.
 1.23 16-Nov-2014  ozaki-r branches: 1.23.2;
Replace callout_stop with callout_halt

In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.

Discussed with martin@ and riastradh@.
 1.22 10-Aug-2014  tls branches: 1.22.2;
Merge tls-earlyentropy branch into HEAD.
 1.21 02-Feb-2012  tls branches: 1.21.2; 1.21.6; 1.21.20;
Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.20 17-Jan-2012  jakllsch In addition to %[er]ax, rep xstore-rng also clobbers %[er]cx and %[er]di.
As such, mark them as outputs, as is done in the VIA Padlock example code.
Additionally, let's assume that VIAC3_RNG_BUFSIZ is in bytes and not DWords.
Furthermore assume that there are not 1 but NBBY bits of entropy per byte.

Fixes PR kern/45847 for me.
 1.19 17-Jan-2012  jakllsch leading whitespace too!
 1.18 17-Jan-2012  jakllsch drop trailing whitespace
 1.17 28-Nov-2011  tls branches: 1.17.2;
Fix one last dangling use of arc4randbytes().
 1.16 19-Nov-2011  tls First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.15 24-May-2011  drochner branches: 1.15.4;
move the "context size" struct member (which is a pure software
implementation thing) from the abstract xform descriptor to
the cryptosoft implementation part -- for sanity, and now clients
of opencrypto don't depend on headers of cipher implementations anymore
 1.14 19-Feb-2011  jmcneill modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module
 1.13 22-Apr-2010  jym branches: 1.13.2; 1.13.4;
Uses cpu_feature, so include <machine/cpuvar.h>
 1.12 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.11 01-Apr-2009  tls branches: 1.11.2; 1.11.4;
Fix probe for VIA C3 and successors -- these are CPU family 6, not 5.
The broken probe was causing the VIA padlock driver to never attach!
Now we can see that its AES appears to be broken -- it makes FAST_IPSEC
ESP not work, on systems where it works fine with cryptosoft.

Rework code to detect and (if necessary) enable VIA crypto and RNG.
Add RNG support to VIA padlock driver. In the process, have a quick
go at debugging the AES support but no luck thus far.
 1.10 17-Dec-2008  cegger branches: 1.10.2;
kill MALLOC and FREE macros.
 1.9 16-Apr-2008  cegger branches: 1.9.4; 1.9.12; 1.9.14; 1.9.20;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.8 02-Feb-2008  tls branches: 1.8.6;
From Darran Hunt at Coyote Point: don't truncate HMAC to 96 bits unless
actually asked to.

Fixed in FreeBSD a while ago, discussed on tech-kern and tech-crypto.
 1.7 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.6 17-Oct-2007  garbled branches: 1.6.2; 1.6.8;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to an evbppc target.
 1.5 03-Jul-2007  christos branches: 1.5.10;
Support for VIA Esther (From FreeBSD)
 1.4 21-Mar-2007  xtraeme branches: 1.4.4;
Add missing $ in the RCS ID.
 1.3 11-Mar-2007  christos branches: 1.3.2; 1.3.4;
more caddr_t lossage
 1.2 04-Mar-2007  christos branches: 1.2.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.1 17-Feb-2007  daniel branches: 1.1.2; 1.1.4;
Add an opencrypto provider for the AES xcrypt instructions found on VIA
C5P and later cores (also known as 'ACE', which is part of the VIA PadLock
security engine). Ported from OpenBSD.

Reviewed on tech-crypto and port-i386, no objections to committing this.
 1.1.4.5 04-Feb-2008  yamt sync with head.
 1.1.4.4 21-Jan-2008  yamt sync with head
 1.1.4.3 03-Sep-2007  yamt sync with head.
 1.1.4.2 26-Feb-2007  yamt sync with head.
 1.1.4.1 17-Feb-2007  yamt file via_padlock.c was added on branch yamt-lazymbuf on 2007-02-26 09:08:52 +0000
 1.1.2.3 24-Mar-2007  yamt sync with head.
 1.1.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.1.2.1 17-Feb-2007  rmind file via_padlock.c was added on branch yamt-idlelwp on 2007-03-12 05:51:47 +0000
 1.2.2.3 15-Jul-2007  ad Sync with head.
 1.2.2.2 10-Apr-2007  ad Sync with head.
 1.2.2.1 13-Mar-2007  ad Sync with head.
 1.3.4.1 29-Mar-2007  reinoud Pullup to -current
 1.3.2.1 11-Jul-2007  mjf Sync with head.
 1.4.4.1 03-Oct-2007  garbled Sync with HEAD
 1.5.10.3 23-Mar-2008  matt sync with HEAD
 1.5.10.2 09-Jan-2008  matt sync with HEAD
 1.5.10.1 06-Nov-2007  matt sync with HEAD
 1.6.8.1 08-Jan-2008  bouyer Sync with HEAD
 1.6.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.8.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.8.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.9.20.1 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.9.14.1 20-May-2010  snj Apply patch (requested by sborrill in ticket #1404):
Fix build of the i386 ALL kernel.
 1.9.12.2 28-Apr-2009  skrll Sync with HEAD.
 1.9.12.1 19-Jan-2009  skrll Sync with HEAD.
 1.9.4.2 11-Aug-2010  yamt sync with head.
 1.9.4.1 04-May-2009  yamt sync with head.
 1.10.2.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.10.2.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so the current implementation force-flushes them on suspend. It's not
really needed.
 1.10.2.3 24-Oct-2010  jym Sync with HEAD
 1.10.2.2 01-Nov-2009  jym Sync with HEAD.
 1.10.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.4.3 31-May-2011  rmind sync with head
 1.11.4.2 05-Mar-2011  rmind sync with head
 1.11.4.1 30-May-2010  rmind sync with head
 1.11.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.13.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.13.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.15.4.1 17-Apr-2012  yamt sync with head
 1.17.2.1 18-Feb-2012  mrg merge to -current.
 1.21.20.1 07-Apr-2014  tls Be a little more clear and consistent about harvesting entropy from devices:

1) deprecate RND_FLAG_NO_ESTIMATE

2) define RND_FLAG_COLLECT_TIME, RND_FLAG_COLLECT_VALUE

3) define RND_FLAG_ESTIMATE_TIME, RND_FLAG_ESTIMATE_VALUE

4) define RND_FLAG_DEFAULT: RND_FLAG_COLLECT_TIME|
RND_FLAG_COLLECT_VALUE|RND_FLAG_ESTIMATE_TIME

5) Make entropy harvesting from environmental sensors a little more generic
and remove it from individual sensor drivers.

6) Remove individual open-coded delta-estimators for values from a few
places in the tree (uvm, environmental drivers).

7) 0 -> RND_FLAG_DEFAULT, actually gather entropy from various drivers
that had stubbed out code, other minor cleanups.
 1.21.6.2 03-Dec-2017  jdolecek update from HEAD
 1.21.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.21.2.1 07-Dec-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #1201):
sys/kern/kern_ktrace.c: revision 1.166
sys/dev/isa/aps.c: revision 1.16
sys/dev/sysmon/sysmonvar.h: revision 1.45
sys/dev/ir/irframe_tty.c: revision 1.60
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.111-1.112 (patch)
sys/dev/pci/pccbb.c: revision 1.207
sys/dev/wscons/wskbd.c: revision 1.135
sys/dev/usb/ohci.c: revision 1.254
sys/net/if_ecosubr.c: revision 1.41
sys/dev/pcmcia/btbc.c: revision 1.17
sys/arch/x86/x86/via_padlock.c: revision 1.23
sys/dev/sdmmc/sdmmc.c: revision 1.23 (patch)
sys/dev/bluetooth/btkbd.c: revision 1.17
sys/dev/bluetooth/bcsp.c: revision 1.25
sys/arch/x86/pci/fwhrng.c: revision 1.8
sys/dev/ic/nslm7x.c: revision 1.61
share/man/man9/callout.9: revision 1.28 (patch)

Replace callout_stop with callout_halt and ensure the callout
is not running before destroying it.
 1.22.2.1 01-Dec-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #279):
sys/kern/kern_ktrace.c: revision 1.166
sys/dev/isa/aps.c: revision 1.16
sys/dev/sysmon/sysmonvar.h: revision 1.45
sys/dev/ir/irframe_tty.c: revision 1.60
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.111
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.112
sys/dev/pci/pccbb.c: revision 1.207
sys/dev/wscons/wskbd.c: revision 1.135
sys/dev/usb/ohci.c: revision 1.254
sys/net/if_ecosubr.c: revision 1.41
sys/dev/pcmcia/btbc.c: revision 1.17
sys/arch/x86/x86/via_padlock.c: revision 1.23
sys/dev/sdmmc/sdmmc.c: revision 1.23
sys/dev/bluetooth/btkbd.c: revision 1.17
sys/dev/bluetooth/bcsp.c: revision 1.25
sys/arch/x86/pci/fwhrng.c: revision 1.8
sys/dev/ic/nslm7x.c: revision 1.61
share/man/man9/callout.9: revision 1.28
Replace callout_stop with callout_halt
In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.
Discussed with martin@ and riastradh@.
Make it clear that we should use not callout_stop but callout_halt
before callout_destroy
Replace callout_stop with callout_halt
In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.
In this case, we need to pass an interlock to callout_halt to wait for
the callout complete.
Reviewed by riastradh@.
Kill sme_callout_mtx and use sme_mtx instead
We can use sme_mtx for the callout as well. Actually we should do so
because sme_events_list and some other data that are touched in the
callout should be protected by sme_mtx, not sme_callout_mtx.
Discussed with riastradh@ in
http://mail-index.netbsd.org/tech-kern/2014/11/11/msg017956.html
Replace callout_stop with callout_halt
In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.
In this case, we need to pass an interlock to callout_halt to wait for
the callout complete. And also we make sure that SME_CALLOUT_INITIALIZED
is unset before calling callout_halt to prevent the callout from calling
callout_schedule. This is the same as what we did in sys/netinet6/mld6.c@1.61.
Reviewed by riastradh@.
 1.23.2.2 19-Mar-2016  skrll Sync with HEAD
 1.23.2.1 06-Jun-2015  skrll Sync with HEAD
 1.25.18.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.25.18.1 10-Jun-2019  christos Sync with HEAD
 1.25.16.1 28-Jul-2018  pgoyette Sync with HEAD
 1.11 30-Apr-2024  andvar viac7temp(4): rewrite temperature sensor to read value from MSR instead of using
documented cpuid instruction and eax register.

This approach is adapted from linux via-cputemp.c, no official documentation is
currently available. However, msr value seems to work on all tested CPUs while
documented cpuid instruction typically reports 0, even for my C7-D CPU.
msr value seems to have temperature in Celsius in lower 24-bits without fraction
(thus "msr & 0xffffff;" is used).

Tested on my personal systems based on CPUs below (i386 and amd64):
C7-D 1.6GHz (i386 only), Nano X2 L4350E, Nano X2 U4300, U2300 Nano, KX-U6580.
Also got one response via email which was based on Nano X2 L4050 (VE-900).
Nano reports independent values for each core.
KX-U6580 seems to show the same value for all cores but more testing is needed.

Since it works on amd64 capable CPUs, adding driver to GENERIC kernel config.
Also moving viac7temp man page to x86 instead of i386 (with updates).
In theory the change should add support for all VIA Nano CPUs and Zhaoxin CPUs
at least up to KX-6000(G) series.

In the future I may need to introduce amd64 kernel module as well.

Plan to pullup to at least netbsd-10.

Patch mainly reviewed by riastradh.
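
The masking step described above can be sketched in C as follows. The function names are hypothetical, and the conversion to microkelvin assumes the sensor framework expects that unit; only the 0xffffff mask comes from the commit message:

```c
#include <assert.h>
#include <stdint.h>

/* Mask from the commit message: the temperature sits in the low 24 bits. */
#define VIA_TEMP_MASK 0xffffffULL

/* Extract whole degrees Celsius from a raw MSR value (hypothetical name). */
static uint32_t
via_msr_to_celsius(uint64_t msr)
{
	return (uint32_t)(msr & VIA_TEMP_MASK);
}

/* Convert whole degrees Celsius to microkelvin (assumed sensor unit). */
static uint64_t
celsius_to_microkelvin(uint32_t celsius)
{
	assert(celsius <= VIA_TEMP_MASK);
	return (uint64_t)celsius * 1000000ULL + 273150000ULL;
}
```

Reading the MSR itself is privileged and is left out of the sketch; only the arithmetic on the value is shown.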
 1.10 13-Apr-2024  andvar viac7temp(4): define module metadata using MODULE() macro and implement
viac7temp_modcmd() to handle module load/unload events.

Fixes PR kern/58148. Look OK by mrg@.

XXX pullup-10, -9, -8
 1.9 07-Oct-2021  msaitoh branches: 1.9.4;
KNF. No functional change.
 1.8 10-Aug-2014  tls branches: 1.8.20; 1.8.32;
Merge tls-earlyentropy branch into HEAD.
 1.7 15-Nov-2013  msaitoh branches: 1.7.2;
Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
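
The difference between the base fields and the display values can be sketched in C. This follows the usual x86 rule for CPUID leaf 1 EAX (extended family is added only when the base family is 0xf; extended model is used only when the base family is 0x6 or 0xf). The function names are hypothetical and are not the macros named above:

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of CPUID leaf 1, register EAX. */
static uint32_t base_family(uint32_t eax) { return (eax >> 8) & 0xf; }
static uint32_t base_model(uint32_t eax)  { return (eax >> 4) & 0xf; }
static uint32_t ext_family(uint32_t eax)  { return (eax >> 20) & 0xff; }
static uint32_t ext_model(uint32_t eax)   { return (eax >> 16) & 0xf; }

/* Display family: the extended field counts only if base family is 0xf. */
static uint32_t
display_family(uint32_t eax)
{
	uint32_t f = base_family(eax);

	return f == 0xf ? f + ext_family(eax) : f;
}

/* Display model: the extended field counts only for base family 0x6 or 0xf. */
static uint32_t
display_model(uint32_t eax)
{
	uint32_t f = base_family(eax);
	uint32_t m = base_model(eax);

	assert(m <= 0xf);
	if (f == 0x6 || f == 0xf)
		m |= ext_model(eax) << 4;
	return m;
}
```

For example, the signature 0x000306a9 decodes to family 6, model 0x3a, and 0x00800f11 decodes to family 0x17, model 1.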
 1.6 20-Jun-2011  pgoyette branches: 1.6.2; 1.6.12; 1.6.16;
Initialize sensor state before registering.
 1.5 24-Feb-2011  jruoho branches: 1.5.2;
Fix autoconf(9) of cpufeaturebus.
 1.4 24-Feb-2011  jruoho Move VIA_C7TEMP to the cpufeaturebus.
 1.3 13-Feb-2011  jmcneill don't pass a NULL ci to xc_unicast
 1.2 14-Mar-2010  pgoyette branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Remove setting of the edata->monitor since that member no longer exists.
 1.1 02-Oct-2009  jmcneill branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
Add support for VIA C7 temperature sensors (options VIA_C7TEMP)
 1.1.8.3 11-Aug-2010  yamt sync with head.
 1.1.8.2 11-Mar-2010  yamt sync with head
 1.1.8.1 02-Oct-2009  yamt file viac7temp.c was added on branch yamt-nfs-mp on 2010-03-11 15:03:09 +0000
 1.1.6.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.1.4.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.1.4.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.1.4.3 24-Oct-2010  jym Sync with HEAD
 1.1.4.2 01-Nov-2009  jym Sync with HEAD.
 1.1.4.1 02-Oct-2009  jym file viac7temp.c was added on branch jym-xensuspend on 2009-11-01 13:58:19 +0000
 1.1.2.2 05-Oct-2009  sborrill Pull up the following revision(s) (requested by jmcneill in ticket #1061):
sys/arch/x86/conf/files.x86: revision 1.53
sys/arch/x86/include/cpuvar.h: revision 1.31
sys/arch/x86/x86/identcpu.c: revision 1.17
sys/arch/x86/x86/viac7temp.c: revision 1.1
sys/arch/i386/conf/ALL: revision 1.218
sys/arch/i386/conf/GENERIC: revision 1.949
Add support for VIA C7 temperature sensors (options VIA_C7TEMP) and enable
in i386 GENERIC kernel.
 1.1.2.1 02-Oct-2009  sborrill file viac7temp.c was added on branch netbsd-5 on 2009-10-05 11:37:14 +0000
 1.2.8.2 05-Mar-2011  bouyer Sync with HEAD
 1.2.8.1 17-Feb-2011  bouyer Sync with HEAD
 1.2.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.2.4.2 21-Apr-2010  matt sync to netbsd-5
 1.2.4.1 14-Mar-2010  matt file viac7temp.c was added on branch matt-nb5-mips64 on 2010-04-21 00:33:46 +0000
 1.2.2.1 05-Mar-2011  rmind sync with head
 1.5.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.6.16.1 18-May-2014  rmind sync with head
 1.6.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.2.1 07-Apr-2014  tls Be a little more clear and consistent about harvesting entropy from devices:

1) deprecate RND_FLAG_NO_ESTIMATE

2) define RND_FLAG_COLLECT_TIME, RND_FLAG_COLLECT_VALUE

3) define RND_FLAG_ESTIMATE_TIME, RND_FLAG_ESTIMATE_VALUE

4) define RND_FLAG_DEFAULT: RND_FLAG_COLLECT_TIME|
RND_FLAG_COLLECT_VALUE|RND_FLAG_ESTIMATE_TIME

5) Make entropy harvesting from environmental sensors a little more generic
and remove it from individual sensor drivers.

6) Remove individual open-coded delta-estimators for values from a few
places in the tree (uvm, environmental drivers).

7) 0 -> RND_FLAG_DEFAULT, actually gather entropy from various drivers
that had stubbed out code, other minor cleanups.
 1.8.32.1 18-Apr-2024  martin Pull up following revision(s) (requested by andvar in ticket #1835):

sys/arch/x86/x86/viac7temp.c: revision 1.10

viac7temp(4): define module metadata using MODULE() macro and implement
viac7temp_modcmd() to handle module load/unload events.

Fixes PR kern/58148. Look OK by mrg@.
 1.8.20.1 18-Apr-2024  martin Pull up following revision(s) (requested by andvar in ticket #1959):

sys/arch/x86/x86/viac7temp.c: revision 1.10

viac7temp(4): define module metadata using MODULE() macro and implement
viac7temp_modcmd() to handle module load/unload events.

Fixes PR kern/58148. Look OK by mrg@.
 1.9.4.2 20-Jun-2024  martin Pull up following revision(s) (requested by andvar in ticket #700):

sys/arch/i386/conf/GENERIC: revision 1.1256
share/man/man4/man4.x86/viac7temp.4: revision 1.1
sys/arch/x86/x86/viac7temp.c: revision 1.11
share/man/man4/man4.x86/Makefile: revision 1.24
share/man/man4/man4.i386/viac7temp.4: file removal
share/man/man4/man4.i386/Makefile: revision 1.81
distrib/sets/lists/man/mi: revision 1.1773
sys/arch/amd64/conf/GENERIC: revision 1.612
sys/arch/amd64/conf/ALL: revision 1.188
sys/arch/i386/conf/ALL: revision 1.519

viac7temp(4): rewrite temperature sensor to read value from MSR instead of using
documented cpuid instruction and eax register.

This approach is adapted from linux via-cputemp.c, no official documentation is
currently available. However, msr value seems to work on all tested CPUs while
documented cpuid instruction typically reports 0, even for my C7-D CPU.
msr value seems to have temperature in Celsius in lower 24-bits without fraction
(thus "msr & 0xffffff;" is used).

Tested on my personal systems based on CPUs below (i386 and amd64):
C7-D 1.6GHz (i386 only), Nano X2 L4350E, Nano X2 U4300, U2300 Nano, KX-U6580.
Also got one response via email which was based on Nano X2 L4050 (VE-900).

Nano reports independent values for each core.

KX-U6580 seems to show the same value for all cores but more testing is needed.

Since it works on amd64 capable CPUs, adding driver to GENERIC kernel config.

Also moving viac7temp man page to x86 instead of i386 (with updates).

In theory the change should add support for all VIA Nano CPUs and Zhaoxin CPUs
at least up to KX-6000(G) series.

In the future I may need to introduce amd64 kernel module as well.

Patch mainly reviewed by riastradh.
 1.9.4.1 18-Apr-2024  martin Pull up following revision(s) (requested by andvar in ticket #662):

sys/arch/x86/x86/viac7temp.c: revision 1.10

viac7temp(4): define module metadata using MODULE() macro and implement
viac7temp_modcmd() to handle module load/unload events.

Fixes PR kern/58148. Look OK by mrg@.
 1.47 24-Apr-2025  riastradh amd64: Allocate FPU save state outside pcb if it's too large.

We have seen x86_fpu_save_size values (CPUID[EAX=0x0d, ECX=0].ECX) as
large as 11008 bytes, notably with Intel AMX TILEDATA's 8192-byte
state.

We only do this for user threads, and only on machines where it's
necessary, to avoid incurring much overhead. There is still a tiny
bit of overhead when saving and restoring the FPU state by using a
pointer indirection instead of arithmetic indirection for access to
struct pcb::pcb_savefpu, but this is probably a drop in the bucket
compared to the memory traffic incurred by the FPU state save/restore
anyway.

For now, these paths are mostly disabled on i386. We could enable
them but it will require either rewriting cpu_uarea_alloc/free for
i386, or adopting a guard page like amd64 does, which might be costly
and so should be undertaken only with some thought and care. And
since Intel AMX instructions only work in 64-bit mode, it's not
likely to be useful on i386.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu

These changes, as a side effect, may fix:

PR kern/57258: kthread_fpu_enter/exit problem

by making sure to allocate an FPU save space that is large enough to
guarantee fpu_kern_enter/leave work safely, instead of just using a
union savefpu object on the stack (which, at 576 bytes, may be too
small on some machines, particularly with AVX512 requiring ~2.5K).
(But we'll have to do some extra work with kthread_fpu_enter/exit_md
-- if we try doing them again on x86 -- to actually allocate the
separate pcb on these machines!)
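
The "pointer indirection instead of arithmetic indirection" trade-off can be illustrated with a toy userland sketch in C. All names here are hypothetical, and the 576-byte threshold is only borrowed from the union savefpu size quoted above; the real pcb limit and allocator may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative threshold only; not the real pcb limit. */
#define EMBEDDED_FPU_BYTES 576

struct toy_pcb {
	unsigned char *savefpu;	/* always points at the live save area */
	_Alignas(64) unsigned char embedded[EMBEDDED_FPU_BYTES];
};

/* Pick the embedded area when it fits, else allocate out of line. */
static int
toy_pcb_init(struct toy_pcb *pcb, size_t save_size)
{
	assert(save_size > 0);
	if (save_size <= sizeof(pcb->embedded)) {
		pcb->savefpu = pcb->embedded;
		return 0;
	}
	/* aligned_alloc wants a size that is a multiple of the alignment. */
	size_t rounded = (save_size + 63) & ~(size_t)63;
	pcb->savefpu = aligned_alloc(64, rounded);
	return pcb->savefpu != NULL ? 0 : -1;
}

/* Release the out-of-line area, if one was allocated. */
static void
toy_pcb_fini(struct toy_pcb *pcb)
{
	if (pcb->savefpu != pcb->embedded)
		free(pcb->savefpu);
	pcb->savefpu = NULL;
}
```

Every access goes through pcb->savefpu, so callers do not need to know which of the two layouts is in use; the cost is one extra pointer load.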
 1.46 06-Oct-2023  skrll branches: 1.46.6;
Convert the l2->l_md.md_astpending assignments into KASSERTs.

l_md is zeroised by lwp_create with

memset(&l2->l_startzero, 0, sizeof(*l2) -
offsetof(lwp_t, l_startzero));
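
The memset idiom quoted above clears everything from a marker member to the end of the structure. A minimal sketch in C with a hypothetical toy structure (the member names are illustrative and are not those of the real lwp_t):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_lwp {
	int keep_id;	/* before the marker: preserved */
	int startzero;	/* marker: cleared from here to the end */
	int astpending;
	int other;
};

/* Zero the tail of the structure, starting at the marker member. */
static void
toy_lwp_clear(struct toy_lwp *l)
{
	assert(l != NULL);
	memset(&l->startzero, 0,
	    sizeof(*l) - offsetof(struct toy_lwp, startzero));
}
```

Because the tail is already zero after this call, a later assignment of 0 to a member in that range is redundant, which is why it can be turned into an assertion.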
 1.45 28-Mar-2021  skrll fix a comment that has been c&p'ed around and not updated
 1.44 30-Nov-2020  msaitoh branches: 1.44.2;
s/ we we / we /
 1.43 03-Jul-2020  maxv branches: 1.43.2;
In cpu_uarea_{alloc,free}:

- My previous change in this file was not correct, kremove does not free
the underlying PA, which caused a very slow leak under memory pressure.
Rework to correctly free the PA.
- Add a second redzone, this time after the stack, to catch several stack
overflows. The main concern is read overflows which leak the heap that
follows the stack.
- UVM_KMF_WAITVA doesn't fail, so remove error check.
- Add KASSERTs.
 1.42 17-Mar-2020  maxv Add a redzone between the pcb and the stack. Sent to port-amd64@.
 1.41 25-Jan-2020  ad cpu_lwp_free() can be called with (l != curlwp) in error paths, so don't
detonate.
 1.40 12-Jan-2020  ad x86 pmap:

- It turns out that every page the pmap frees is necessarily zeroed. Tell
the VM system about this and use the pmap as a source of pre-zeroed pages.

- Redo deferred freeing of PTPs more elegantly, including the integration with
pmap_remove_all(). This fixes problems with nvmm, and possibly also a crash
discovered during fuzzing.

Reported-by: syzbot+a97186518c84f1d85c0c@syzkaller.appspotmail.com
 1.39 18-Oct-2019  maxv branches: 1.39.2;
Remove unused call to savectx().
 1.38 12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.37 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.36 26-Jul-2018  maxv Rework dbregs, to switch the registers during context switches, and not on
each user->kernel transition via userret. Reloads of DR6/DR7 are expensive
on both native and xen.
 1.35 01-Jul-2018  maxv Use a variable-sized memcpy, instead of copying the PCB and then adding
the extra bytes. The PCB embeds the biggest static FPU state, but our
real FPU state may be smaller (FNSAVE), so we don't need to memcpy the
extra unused bytes.
 1.34 19-Jun-2018  maxv branches: 1.34.2;
Explicitly clear l2's pcb_fpcpu when forking.

A context switch (preemption) could occur between

fpusave_lwp(l1, true);
and
memcpy(pcb2, pcb1, sizeof(struct pcb));

In this case, l1's FPU state is re-installed on the current CPU, and
pcb1->pcb_fpcpu becomes non NULL. While it's fine to have l1's state
installed, we don't want to indicate l2's state is installed too.

With lazy fpu this was not a problem, because the context-switch would
not re-install the state, so pcb1->pcb_fpcpu was NULL.

Should fix PR/53383.
 1.33 16-Mar-2018  maxv Remove the __HAVE_CPU_UAREA_ROUTINES code from x86.

It was available only in amd64, and I disabled it a few months ago in
order to support SVS. Regardless of SVS this option was questionable,
since it made stack overflows more difficult to detect.
 1.32 18-Jan-2018  maxv branches: 1.32.2;
Unmap the kernel heap from the user page tables (SVS).

This implementation is optimized and organized in such a way that we
don't need to copy the kernel stack to a safe place during user<->kernel
transitions. We create two VAs that point to the same physical page; one
will be mapped in userland and is offset in order to contain only the
trapframe, the other is mapped in the kernel and maps the entire stack.

Sent on tech-kern@ a week ago.
 1.31 11-Jan-2018  maxv The uarea must always be page-aligned.
 1.30 31-Dec-2017  maxv Fix a huge privilege separation vulnerability in Xen-amd64.

On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL
equals SEL_UPL. While Xen can make a distinction between usermode and
kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL
in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes
to read and write to the CPU ports.

It is easy, then, to completely escalate privileges; by reprogramming the
PIC, by reading the ATA disks, by intercepting the keyboard interrupts
(keylogger), etc.

Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use
the ports but not userland. I didn't test this change on i386, but it
seems fine enough.
 1.29 17-Jun-2017  maxv Check (inside), not (!outside). It explains the two install failures
reported between pmap.h::r1.65 and vmparam.h::r1.40.
 1.28 23-Feb-2017  kamil branches: 1.28.6;
Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface and its usage are modeled after the FreeBSD API.

This replaces the previous watchpoint API. The previous one was introduced
recently in NetBSD-current, and its remnants are removed without any
backward compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event;
a debugger is responsible for detaching this trap with an appropriate
PT_SETDBREGS call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently another interface,
PT_GET_SIGINFO/PT_SET_SIGINFO, is required in netbsd32 compat code in order to
reliably validate PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these values vary across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>
 1.27 15-Dec-2016  kamil branches: 1.27.2;
Add support for hardware assisted watchpoints/breakpoints API in ptrace(2)

Add new ptrace(2) calls:
- PT_COUNT_WATCHPOINTS - count the number of available hardware watchpoints
- PT_READ_WATCHPOINT - read struct ptrace_watchpoint from the kernel state
- PT_WRITE_WATCHPOINT - write new struct ptrace_watchpoint state, this
includes enabling and disabling watchpoints

The ptrace_watchpoint structure contains MI and MD parts:

typedef struct ptrace_watchpoint {
	int pw_index;		/* HW Watchpoint ID (count from 0) */
	lwpid_t pw_lwpid;	/* LWP described */
	struct mdpw pw_md;	/* MD fields */
} ptrace_watchpoint_t;

For example amd64 defines MD as follows:
struct mdpw {
	void *md_address;
	int md_condition;
	int md_length;
};

These calls are protected with the __HAVE_PTRACE_WATCHPOINTS guard.

Tested on amd64, initial support added for i386 and XEN.

Sponsored by <The NetBSD Foundation>
 1.26 08-Nov-2016  christos PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.25 11-Mar-2014  para branches: 1.25.4; 1.25.6; 1.25.8; 1.25.10; 1.25.12;
mark a diagnostic only variable
 1.24 25-Feb-2014  dsl Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
 1.23 20-Feb-2014  dsl Move the amd64 and i386 pcb to the bottom of the uarea, and move the
kernel stack to the top.
Change the pcb layouts so that fpu save area is at the end and is
64byte aligned ready for xsave (saving the ymm registers).
Welcome to 6.99.32
 1.22 15-Feb-2014  dsl Remove all references to MDL_USEDFPU and deferred fpu initialisation.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).
 1.21 11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.20 26-Jan-2014  dsl Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
 1.19 11-Jan-2014  christos Comment about missing stackframe member initialization (Richard Hansen)

I haven't studied the code, but I'm concerned that not initializing
sf->sf_edi could potentially leak a few bytes of information to a new
userspace process.
 1.18 01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.17 23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.16 15-Jul-2012  dsl branches: 1.16.2; 1.16.4;
Rename MDP_IRET to MDL_IRET since it is an lwp flag, not a proc one.
Add an MDL_COMPAT32 flag to the lwp's md_flags, set it for 32bit lwps
and use it to force 'return to user' with iret (as is done when
MDL_IRET is set).
Split the iret/sysret code paths much later.
Remove all the replicated code for 32bit system calls - which was only
needed so that iret was always used.
frameasm.h for XEN contains '#define swapgs', while XEN probably never
needs swapgs, this is likely to be confusing.
Add a SWAPGS which is a nop on XEN and swapgs otherwise.
(I've not yet checked all the swapgs in files that include frameasm.h)
Simple x86 programs still work.
Hijack 6.99.9 kernel bump (needed for compat32 modules)
 1.15 19-Feb-2012  rmind Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.
 1.14 21-Jan-2012  chs branches: 1.14.2; 1.14.6; 1.14.8;
allocate uareas contiguously and access them via the direct map.
 1.13 10-Feb-2011  pooka branches: 1.13.4; 1.13.8;
Make vmapbuf() return success/error and make physio deal with a
failure.
 1.12 05-Feb-2011  yamt cpu_lwp_free2: add assertions
 1.11 18-Jan-2011  matt branches: 1.11.2;
Copy PK_32 to p2->p_flag instead of doing it in the cpu_proc_fork hook.
 1.10 07-Jul-2010  chs branches: 1.10.2;
implement cpu_lwp_setprivate() on several platforms.
 1.9 23-Apr-2010  joerg Use struct segment_descriptor for pcb_fsd and pcb_gsd instead of int[2].
 1.8 29-Nov-2009  rmind branches: 1.8.2; 1.8.4;
Replace l_addr with uvm_lwp_getuarea() in various MD code, mostly cpu_lwp_fork().
 1.7 25-Nov-2009  rmind Disable kstack red-zone for now, while we decide on a nice way to fix it.
 1.6 21-Nov-2009  rmind Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.5 07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.4 29-Oct-2009  yamt fix a typo in a comment.
 1.3 27-Oct-2009  rmind cpu_proc_fork: use pcb1 and pcb2, and thus make routine more readable.
Remove or update outdated comments, add new ones. Clean-up.
 1.2 21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.1 30-Mar-2009  rmind branches: 1.1.2; 1.1.4; 1.1.6;
Merge i386 and amd64 vm_machdep.c into x86. No functional changes intended.
Note: some #ifdefs will be removed with macros.
 1.1.6.5 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.1.6.4 24-Oct-2010  jym Sync with HEAD
 1.1.6.3 01-Nov-2009  jym Sync with HEAD.
 1.1.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.6.1 30-Mar-2009  jym file vm_machdep.c was added on branch jym-xensuspend on 2009-05-13 17:18:45 +0000
 1.1.4.4 11-Aug-2010  yamt sync with head.
 1.1.4.3 11-Mar-2010  yamt sync with head
 1.1.4.2 04-May-2009  yamt sync with head.
 1.1.4.1 30-Mar-2009  yamt file vm_machdep.c was added on branch yamt-nfs-mp on 2009-05-04 08:12:11 +0000
 1.1.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.1.2.1 30-Mar-2009  skrll file vm_machdep.c was added on branch nick-hppapmap on 2009-04-28 07:34:57 +0000
 1.8.4.2 05-Mar-2011  rmind sync with head
 1.8.4.1 30-May-2010  rmind sync with head
 1.8.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.8.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.10.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.11.2.2 17-Feb-2011  bouyer Sync with HEAD
 1.11.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.13.8.2 24-Feb-2012  mrg sync to -current.
 1.13.8.1 18-Feb-2012  mrg merge to -current.
 1.13.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.13.4.2 30-Oct-2012  yamt sync with head
 1.13.4.1 17-Apr-2012  yamt sync with head
 1.14.8.1 19-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1517):
sys/arch/amd64/amd64/machdep.c: 1.280 via patch
sys/arch/amd64/include/segments.h: 1.34 via patch
sys/arch/i386/i386/machdep.c: 1.800
sys/arch/i386/include/segments.h: 1.64
sys/arch/x86/x86/vm_machdep.c: 1.30
Fix a huge privilege separation vulnerability in Xen-amd64.
On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL
equals SEL_UPL. While Xen can make a distinction between usermode and
kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL
in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes
to read and write to the CPU ports.
It is easy, then, to completely escalate privileges; by reprogramming the
PIC, by reading the ATA disks, by intercepting the keyboard interrupts
(keylogger), etc.
Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use
the ports but not userland. I didn't test this change on i386, but it
seems fine enough.
 1.14.6.1 19-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1517):
sys/arch/amd64/amd64/machdep.c: 1.280 via patch
sys/arch/amd64/include/segments.h: 1.34 via patch
sys/arch/i386/i386/machdep.c: 1.800
sys/arch/i386/include/segments.h: 1.64
sys/arch/x86/x86/vm_machdep.c: 1.30
Fix a huge privilege separation vulnerability in Xen-amd64.
On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL
equals SEL_UPL. While Xen can make a distinction between usermode and
kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL
in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes
to read and write to the CPU ports.
It is easy, then, to completely escalate privileges; by reprogramming the
PIC, by reading the ATA disks, by intercepting the keyboard interrupts
(keylogger), etc.
Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use
the ports but not userland. I didn't test this change on i386, but it
seems fine enough.
 1.14.2.1 19-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1517):
sys/arch/amd64/amd64/machdep.c: 1.280 via patch
sys/arch/amd64/include/segments.h: 1.34 via patch
sys/arch/i386/i386/machdep.c: 1.800
sys/arch/i386/include/segments.h: 1.64
sys/arch/x86/x86/vm_machdep.c: 1.30
Fix a huge privilege separation vulnerability in Xen-amd64.
On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL
equals SEL_UPL. While Xen can make a distinction between usermode and
kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL
in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes
to read and write to the CPU ports.
It is easy, then, to completely escalate privileges; by reprogramming the
PIC, by reading the ATA disks, by intercepting the keyboard interrupts
(keylogger), etc.
Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use
the ports but not userland. I didn't test this change on i386, but it
seems fine enough.
 1.16.4.1 18-May-2014  rmind sync with head
 1.16.2.2 03-Dec-2017  jdolecek update from HEAD
 1.16.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.25.12.1 18-Jan-2017  skrll Sync with netbsd-5
 1.25.10.2 20-Mar-2017  pgoyette Sync with HEAD
 1.25.10.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.25.8.2 22-Jan-2018  snj Pull up following revision(s) (requested by maxv in ticket #1550):
sys/arch/amd64/amd64/machdep.c: revision 1.280 via patch
sys/arch/amd64/include/segments.h: revision 1.34 via patch
sys/arch/i386/i386/machdep.c: revision 1.800 via patch
sys/arch/i386/include/segments.h: revision 1.64 via patch
sys/arch/x86/x86/vm_machdep.c: revision 1.30 via patch
Fix a huge privilege separation vulnerability in Xen-amd64.
On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL
equals SEL_UPL. While Xen can make a distinction between usermode and
kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL
in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes
to read and write to the CPU ports.
It is easy, then, to completely escalate privileges; by reprogramming the
PIC, by reading the ATA disks, by intercepting the keyboard interrupts
(keylogger), etc.
Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use
the ports but not userland. I didn't test this change on i386, but it
seems fine enough.
 1.25.8.1 18-Dec-2016  snj Pull up following revision(s) (requested by riastradh in ticket #1316):
sys/arch/x86/x86/pmap.c: revision 1.223
sys/arch/x86/x86/vm_machdep.c: revision 1.26
sys/arch/x86/include/pmap.h: revision 1.61
PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.25.6.3 28-Aug-2017  skrll Sync with HEAD
 1.25.6.2 05-Feb-2017  skrll Sync with HEAD
 1.25.6.1 05-Dec-2016  skrll Sync with HEAD
 1.25.4.2 22-Jan-2018  snj Pull up following revision(s) (requested by maxv in ticket #1550):
sys/arch/amd64/amd64/machdep.c: revision 1.280 via patch
sys/arch/amd64/include/segments.h: revision 1.34 via patch
sys/arch/i386/i386/machdep.c: revision 1.800 via patch
sys/arch/i386/include/segments.h: revision 1.64 via patch
sys/arch/x86/x86/vm_machdep.c: revision 1.30 via patch
Fix a huge privilege separation vulnerability in Xen-amd64.
On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL
equals SEL_UPL. While Xen can make a distinction between usermode and
kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL
in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes
to read and write to the CPU ports.
It is easy, then, to completely escalate privileges; by reprogramming the
PIC, by reading the ATA disks, by intercepting the keyboard interrupts
(keylogger), etc.
Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use
the ports but not userland. I didn't test this change on i386, but it
seems fine enough.
 1.25.4.1 18-Dec-2016  snj branches: 1.25.4.1.2;
Pull up following revision(s) (requested by riastradh in ticket #1316):
sys/arch/x86/x86/pmap.c: revision 1.223
sys/arch/x86/x86/vm_machdep.c: revision 1.26
sys/arch/x86/include/pmap.h: revision 1.61
PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present.
XXX: pullup-7
 1.25.4.1.2.1 22-Jan-2018  snj Pull up following revision(s) (requested by maxv in ticket #1550):
sys/arch/amd64/amd64/machdep.c: revision 1.280 via patch
sys/arch/amd64/include/segments.h: revision 1.34 via patch
sys/arch/i386/i386/machdep.c: revision 1.800 via patch
sys/arch/i386/include/segments.h: revision 1.64 via patch
sys/arch/x86/x86/vm_machdep.c: revision 1.30 via patch
Fix a huge privilege separation vulnerability in Xen-amd64.
On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL
equals SEL_UPL. While Xen can make a distinction between usermode and
kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL
in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes
to read and write to the CPU ports.
It is easy, then, to completely escalate privileges; by reprogramming the
PIC, by reading the ATA disks, by intercepting the keyboard interrupts
(keylogger), etc.
Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use
the ports but not userland. I didn't test this change on i386, but it
seems fine enough.
 1.27.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.28.6.4 23-Jun-2018  martin Pull up the following, via patch, requested by maxv in ticket #897:

sys/arch/amd64/amd64/locore.S 1.166 (patch)
sys/arch/i386/i386/locore.S 1.157 (patch)
sys/arch/x86/include/cpu.h 1.92 (patch)
sys/arch/x86/include/fpu.h 1.9 (patch)
sys/arch/x86/x86/fpu.c 1.33-1.39 (patch)
sys/arch/x86/x86/identcpu.c 1.72 (patch)
sys/arch/x86/x86/vm_machdep.c 1.34 (patch)
sys/arch/x86/x86/x86_machdep.c 1.116,1.117 (patch)

Support eager fpu switch, to work around INTEL-SA-00145.
Provide a sysctl machdep.fpu_eager, which gets automatically
initialized to 1 on affected CPUs.
 1.28.6.3 22-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #652:

sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch)
sys/arch/amd64/amd64/db_machdep.c 1.6 (patch)
sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch)
sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch)
sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch)
sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch)
sys/arch/amd64/amd64/amd64/vector.S upto 1.61 (partial, patch)
sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch)
sys/arch/amd64/conf/kern.ldscript 1.26 (patch)
sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch)
sys/arch/amd64/include/param.h 1.25 (patch)
sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch)
sys/arch/x86/conf/files.x86 1.91,1.93 (patch)
sys/arch/x86/include/cpu.h 1.88,1.89 (patch)
sys/arch/x86/include/pmap.h 1.75 (patch)
sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch)
sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch)
sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch)
sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch)
sys/arch/x86/x86/svs.c 1.1-1.14
sys/arch/xen/conf/files.compat 1.30 (patch)

Backport SVS. Not enabled yet.
 1.28.6.2 17-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #637:

sys/arch/amd64/amd64/process_machdep.c 1.33,1.34,1.35 (patch)
sys/arch/amd64/include/types.h 1.55 (patch)
sys/arch/x86/x86/vm_machdep.c 1.33 (patch)

- Reduce the number of places where segment register faults can
occur.
- Remove __HAVE_CPU_UAREA_ROUTINES.
 1.28.6.1 01-Jan-2018  snj Pull up following revision(s) (requested by maxv in ticket #477):
sys/arch/amd64/amd64/machdep.c: revision 1.280
sys/arch/amd64/include/segments.h: revision 1.34
sys/arch/i386/i386/machdep.c: revision 1.800
sys/arch/i386/include/segments.h: revision 1.64 via patch
sys/arch/x86/x86/vm_machdep.c: revision 1.30
Fix a huge privilege separation vulnerability in Xen-amd64.
On amd64 the kernel runs in ring3, like userland, and therefore SEL_KPL
equals SEL_UPL. While Xen can make a distinction between usermode and
kernelmode in %cs, it can't when it comes to iopl. Since we set SEL_KPL
in iopl, Xen sees SEL_UPL, and allows (unprivileged) userland processes
to read and write to the CPU ports.
It is easy, then, to completely escalate privileges; by reprogramming the
PIC, by reading the ATA disks, by intercepting the keyboard interrupts
(keylogger), etc.
Declare IOPL_KPL, set to 1 on Xen-amd64, which allows the kernel to use
the ports but not userland. I didn't test this change on i386, but it
seems fine enough.
 1.32.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.32.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.32.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.34.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.34.2.1 10-Jun-2019  christos Sync with HEAD
 1.39.2.2 25-Jan-2020  ad Sync with head.
 1.39.2.1 17-Jan-2020  ad Sync with head.
 1.43.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.43.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.44.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.46.6.1 02-Aug-2025  perseant Sync with HEAD
 1.22 09-May-2024  pho port-arm/58194: Resurrect vmt(4) from bitrot

On this architecture vmt(4) used to search for a node "/hypervisor" in the
FDT and probed the VMware hypervisor call only when the node was
found. However, things appear to have changed and VMware no longer provides
the FDT node.

Since vmt(4) doesn't actually need to read anything from FDT, and the
hypervisor call logically resides in virtual CPUs themselves, it would be
better to attach it directly to cpu, just like how it's probed on x86.
 1.21 27-Oct-2020  ryo Move vmt(4) from MD to MI, and add vmt support on aarch64. Tested on ESXi-Arm Fling.

- move from sys/arch/x86/x86/{vmt.c,vmtreg.h,vmtvar.h} to sys/dev/vmt/{vmt_subr.c,vmtreg.h,vmtvar.h},
and split the attach part of the cpufeaturebus and fdt
- add aarch64 vmware backdoor op
- add include guard to vmt{reg,var}.h
- There is still some little-endian dependency, which needs to be fixed for this to work properly on aarch64eb
 1.20 29-Dec-2017  nakayama Add line break.
 1.19 17-Oct-2017  maya Update protocol reverse engineering URL to a working one
only mention it once.

From openbsd by Seth Jackson
 1.18 17-Oct-2017  maya Check that the host supports GET_SPEED as well as GET_VERSION
before deciding vmt_probe has succeeded.

qemu supports GET_VERSION but not the RPC protocol so the probe succeeds
but the attach fails, resulting in "vmt0: failed to open backdoor RPC
channel (TCLO protocol)". All known versions of vmware support GET_SPEED
and no known qemu versions do, so this prevents it from attempting to
attach (and failing) on qemu while still working on vmware.

stop checking vmt_type to avoid having to adapt this code.

- Taken from openbsd
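
The probe rule in this entry can be sketched as follows. This is a minimal illustration, not the driver code: struct host_caps and vmt_probe_sketch are hypothetical names, and the two booleans stand in for whatever the real backdoor calls report for GET_VERSION and GET_SPEED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of what a host answers on the backdoor. */
struct host_caps {
	bool get_version;	/* host answers GET_VERSION */
	bool get_speed;		/* host answers GET_SPEED */
};

/* The probe succeeds only when both commands are supported. */
static bool
vmt_probe_sketch(const struct host_caps *host)
{
	assert(host != NULL);
	if (!host->get_version)
		return false;
	return host->get_speed;
}
```

A host that answers GET_VERSION but not GET_SPEED (the qemu case described above) fails the probe, so attach is never attempted.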
 1.17 01-Jun-2017  chs branches: 1.17.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.16 23-May-2017  nonaka x86: hypervisor detection from FreeBSD for x2APIC support.
 1.15 10-Nov-2016  ozaki-r Fix a breakout of loops

As the comment "find first available ipv4 address" indicates,
if an IP address is found, we need to leave the two nested loops,
a loop for an interface list and a loop for IP addresses of
an interface. However, the original code broke away only from
the inner loop.

The original (wrong) behavior was non-critical, which just
returned a non-first IP address. Unfortunately, after applying
psref, the behavior may call psref_acquire twice to a target
with the same psref object, resulting in a kernel panic eventually.
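
The loop fix described in this entry follows a common pattern, sketched here under assumptions. The types struct addr and struct iface, the constant FAMILY_IPV4, and the function first_ipv4 are hypothetical stand-ins, not the kernel's ifnet structures; the point is only that finding a match must end both loops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an interface and its address list. */
struct addr { int family; unsigned value; };
struct iface { const struct addr *addrs; size_t naddrs; };

#define FAMILY_IPV4 4	/* illustrative constant, not AF_INET */

/* Return the first IPv4 address across all interfaces, or NULL. */
static const struct addr *
first_ipv4(const struct iface *ifs, size_t nifs)
{
	const struct addr *found = NULL;

	assert(ifs != NULL || nifs == 0);
	for (size_t i = 0; i < nifs && found == NULL; i++) {
		for (size_t j = 0; j < ifs[i].naddrs; j++) {
			if (ifs[i].addrs[j].family == FAMILY_IPV4) {
				found = &ifs[i].addrs[j];
				break;	/* leaves the inner loop only */
			}
		}
		/* The outer loop stops via its "found == NULL" condition. */
	}
	return found;
}
```

Without the extra condition on the outer loop, a later interface could overwrite the result, which matches the "non-first IP address" behaviour the message describes.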
 1.14 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.13 07-Jul-2016  ozaki-r branches: 1.13.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.12 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.11 23-Apr-2015  pgoyette Update module dependencies for all the existing modules that depend on sysmon components.
 1.10 25-Jul-2014  ozaki-r branches: 1.10.4;
Use IFADDR_FOREACH for iterating if_addrlist of ifnet
 1.9 17-May-2014  rmind - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
 1.8 16-Mar-2013  jmmv branches: 1.8.6; 1.8.10;
Synchronize the clock periodically in vmt(4).

Add periodic clock synchronization to vmt(4) so that the guest clock
remains synchronized even when the host is suspended (which is a very
typical situation in a laptop).

Do this by default once per minute, but provide a sysctl to tune this
value (machdep.vmt0.clock_sync.period).

Sent to tech-kern@ for review and addressed a couple of issues.
 1.7 21-Oct-2011  jmcneill branches: 1.7.2; 1.7.8; 1.7.12;
synchronize the guest clock with the host on attach and resume
 1.6 20-Oct-2011  jmcneill mark vm_reg members as volatile instead of building this with -O0
 1.5 18-Oct-2011  jmcneill report guest type of "other" for i386 and "other-64" for amd64
 1.4 18-Oct-2011  jmcneill use PRId64 for time_t format
 1.3 18-Oct-2011  jmcneill don't allow module autounload
 1.2 17-Oct-2011  jmcneill handle OS_Resume events by sending a sleep pswitch "release" event
 1.1 17-Oct-2011  jmcneill add a port of the VMware Tools driver vmt(4) from OpenBSD
 1.7.12.3 03-Dec-2017  jdolecek update from HEAD
 1.7.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.12.1 23-Jun-2013  tls resync from head
 1.7.8.1 20-Oct-2013  bouyer Pull up following revision(s) (requested by pettai in ticket #965):
sys/arch/x86/x86/vmt.c: revision 1.8
share/man/man4/man4.x86/vmt.4: revision 1.4
Synchronize the clock periodically in vmt(4).
Add periodic clock synchronization to vmt(4) so that the guest clock
remains synchronized even when the host is suspended (which is a very
typical situation in a laptop).
Do this by default once per minute, but provide a sysctl to tune this
value (machdep.vmt0.clock_sync.period).
Sent to tech-kern@ for review and addressed a couple of issues.
 1.7.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.10.1 10-Aug-2014  tls Rebase.
 1.8.6.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.10.4.6 28-Aug-2017  skrll Sync with HEAD
 1.10.4.5 05-Dec-2016  skrll Sync with HEAD
 1.10.4.4 05-Oct-2016  skrll Sync with HEAD
 1.10.4.3 09-Jul-2016  skrll Sync with HEAD
 1.10.4.2 29-May-2016  skrll Sync with HEAD
 1.10.4.1 06-Jun-2015  skrll Sync with HEAD
 1.13.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.13.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.17.2.2 05-Feb-2018  martin Pull up following revision(s) (requested by nakayama in ticket #532):
sys/arch/x86/x86/vmt.c: revision 1.20
Add line break.
 1.17.2.1 25-Oct-2017  snj Pull up following revision(s) (requested by maya in ticket #325):
sys/arch/x86/x86/vmt.c: revision 1.18
Check that the host supports GET_SPEED as well as GET_VERSION
before deciding vmt_probe has succeeded.
qemu supports GET_VERSION but not the RPC protocol so the probe succeeds
but the attach fails, resulting in "vmt0: failed to open backdoor RPC
channel (TCLO protocol)". All known versions of vmware support GET_SPEED
and no known qemu versions do, so this prevents it from attempting to
attach (and failing) on qemu while still working on vmware.
stop checking vmt_type to avoid having to adapt this code.
- Taken from openbsd
 1.3 27-Oct-2020  ryo Move vmt(4) from MD to MI, and add vmt support on aarch64. Tested on ESXi-Arm Fling.

- move from sys/arch/x86/x86/{vmt.c,vmtreg.h,vmtvar.h} to sys/dev/vmt/{vmt_subr.c,vmtreg.h,vmtvar.h},
and split the attach part of the cpufeaturebus and fdt
- add aarch64 vmware backdoor op
- add include guard to vmt{reg,var}.h
- There is still some little-endian dependency, which needs to be fixed for this to work properly on aarch64eb
 1.2 17-Oct-2017  maya branches: 1.2.2;
Update protocol reverse engineering URL to a working one
only mention it once.

From openbsd by Seth Jackson
 1.1 23-May-2017  nonaka branches: 1.1.6;
x86: hypervisor detection from FreeBSD for x2APIC support.
 1.1.6.2 28-Aug-2017  skrll Sync with HEAD
 1.1.6.1 23-May-2017  skrll file vmtreg.h was added on branch nick-nhusb on 2017-08-28 17:51:56 +0000
 1.2.2.2 03-Dec-2017  jdolecek update from HEAD
 1.2.2.1 17-Oct-2017  jdolecek file vmtreg.h was added on branch tls-maxphys on 2017-12-03 11:36:51 +0000
 1.2 27-Oct-2020  ryo Move vmt(4) from MD to MI, and add vmt support on aarch64. Tested on ESXi-Arm Fling.

- move from sys/arch/x86/x86/{vmt.c,vmtreg.h,vmtvar.h} to sys/dev/vmt/{vmt_subr.c,vmtreg.h,vmtvar.h},
and split the attach part of the cpufeaturebus and fdt
- add aarch64 vmware backdoor op
- add include guard to vmt{reg,var}.h
- There is still some little-endian dependency, which needs to be fixed for this to work properly on aarch64eb
 1.1 23-May-2017  nonaka branches: 1.1.6; 1.1.10;
x86: hypervisor detection from FreeBSD for x2APIC support.
 1.1.10.2 03-Dec-2017  jdolecek update from HEAD
 1.1.10.1 23-May-2017  jdolecek file vmtvar.h was added on branch tls-maxphys on 2017-12-03 11:36:51 +0000
 1.1.6.2 28-Aug-2017  skrll Sync with HEAD
 1.1.6.1 23-May-2017  skrll file vmtvar.h was added on branch nick-nhusb on 2017-08-28 17:51:56 +0000
 1.92 13-Oct-2025  thorpej Use device_{get,set}prop_bool().
 1.91 30-Apr-2025  imil Introduce pvh_boot boolean to identify the real hypervisor when booting in PVH
mode.

As of now, sys/arch/x86/x86/identcpu.c / identify_hypervisor() returns in the
case of vm_guest being VM_GUEST_GENPVH, yet this VM type is not an actual
hypervisor but an information recorded in locore.S to drive boot method.
We need to investigate what type of hypervisor is really running the VM in
order to apply specifics, so instead of relying on vm_guest_is_pvh() which only
checks for VM_GUEST_XENPVH || VM_GUEST_GENPVH, pvh_boot informs on the boot
method while allowing to identify the real hypervisor.

Idea ok'd by bouyer@, tested on Xen domU, Xen dom0 with GENERIC PVH and
qemu GENERIC PVH boot.
 1.90 12-Feb-2025  imil Set a skip_attach_delay property to "true" for com port in virtual machines
to avoid a delay(10000) at attach
 1.89 06-Dec-2024  bouyer Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
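
A helper of this kind could look like the sketch below. It is an illustration under assumptions, not the committed code: the enum is a hypothetical subset with made-up values, and the function takes the guest type as a parameter rather than reading a global.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset; the real enum has more members and other values. */
enum vm_guest_sketch {
	VM_GUEST_NO,
	VM_GUEST_XENPVH,
	VM_GUEST_GENPVH
};

static_assert(VM_GUEST_XENPVH != VM_GUEST_GENPVH, "distinct guest types");

/* True for either PVH flavour, replacing the open-coded comparison. */
static bool
vm_guest_is_pvh_sketch(enum vm_guest_sketch vm_guest)
{
	return vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH;
}
```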
 1.88 02-Dec-2024  bouyer Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.
 1.87 19-Mar-2022  hannken branches: 1.87.4; 1.87.10;
Fix locking after opendisk(), VOP_IOCTL() needs an unlocked vnode,
vn_rdwr() needs flag IO_NODELOCKED.
 1.86 12-Feb-2022  riastradh sys: Fix various abuse of struct device internals.

Will help to make struct device opaque later.
 1.85 07-Oct-2021  msaitoh KNF. No functional change.
 1.84 09-Jul-2020  jdolecek Adapt to proplib api changes
 1.83 07-Jul-2020  thorpej whitelist -> permitlist
 1.82 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.81 28-Apr-2020  bouyer Add xbd to the list of valid disks.
Remove hardcoded root on xbd0 for Xen PVHVM, now that the x86 findroot()
knows about xbd disks.
 1.80 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.79 10-Nov-2019  chs branches: 1.79.6;
in many device attach paths, allocate memory with M_WAITOK instead of M_NOWAIT
and remove code to handle failures that can no longer happen.
 1.78 24-May-2019  nonaka Added drivers for Hyper-V Synthetic Keyboard and Video device.
 1.77 07-Jun-2018  thorpej branches: 1.77.2;
In device_register(), if the device is an "iic" child of "imcsmb",
attach a I2C_PROP_INDIRECT_DEVICE_WHITELIST property that limits
the allowed devices to "spdmem" and "sdtemp". Also set the
I2C_PROP_INDIRECT_PROBE_STRATEGY property to I2C_PROBE_STRATEGY_NONE,
since that controller can't issue any of the "quick" commands.

XXX It would be nice to be able to do this in the imcsmb driver
itself, but the way autoconfiguration works makes that infeasible.
 1.76 09-Nov-2017  christos branches: 1.76.2;
add a "booted_method" string to aid in debugging double boot matches.
 1.75 21-Sep-2016  jmcneill branches: 1.75.8;
Set hw.acpi.sleep.vbios when a non-HW accelerated VGA driver attaches.
If the VGA_POST option is present in the kernel the default value is 2,
otherwise 1. PR kern/50781

Reviewed by: agc, mrg
 1.74 10-May-2015  mlelstv branches: 1.74.2;
Don't report EINVAL errors when searching the bootwedge, this error
is most likely the result of reading beyond the end of the wrong disk.
 1.73 10-May-2015  mlelstv If BTINFO_ROOTDEVICE is set but isn't a device name, then treat it
as a root specification. This allows strings like wedge:wedgename.
 1.72 21-Sep-2014  christos branches: 1.72.2;
remove stray continue.
 1.71 10-Jun-2014  christos branches: 1.71.2;
centralize the double match warning.
 1.70 03-Apr-2014  christos branches: 1.70.2;
- prevent matchbiosdisks from being called twice. This could happen
via raid autoconf calling cpu_rootconf() once and then init main
calling cpu_rootconf() a second time.
- separate booted_device setup into cpu_bootconf(), a new optional function.
This function can be called before raid autoconfiguration to determine
the booted device. This needs to be done before raid autoconfiguration,
otherwise if we are using wedges, the raid will autoconfigure wedges,
and we'll be unable to open the underlying devices later to determine
the booted device.
- fix a debugging comment.
 1.69 02-Apr-2014  christos - tidy up debugging
 1.68 16-May-2013  christos branches: 1.68.2;
Complete the dosparts -> mbrparts conversion. Only x86k now uses dosparts
because it also uses struct dos_partition.
 1.67 28-Apr-2013  christos If we have both wedge and partition info, use the partition info in the
wedge case too. From mlelstv.
XXX: pullup-6
 1.66 29-Dec-2012  christos don't leak a vnode on error
 1.65 29-Jul-2012  mlelstv branches: 1.65.2;
Do not call setroot() from MD code and from MI code, which has
unwanted side effects in the RB_ASKNAME case. This fixes PR/46732.

No longer wrap MD cpu_rootconf(), as hp300 port stores reboot information
as a side effect. Instead call MI rootconf() from MD code which makes
rootconf() now a wrapper to setroot().

Adjust several MD routines to set the global booted_device,booted_partition
variables instead of passing partial information to setroot().

Make cpu_rootconf(9) describe the calling order.
 1.64 13-Jul-2012  christos fix the comparison to determine if a biosdev is a cdrom (from mhitch)
 1.63 10-Jun-2012  mlelstv Make detection of root on wedges (dk(4)) machine independent. Remove
MD code for x86, xen, sparc64.
 1.62 18-Oct-2011  dyoung branches: 1.62.2; 1.62.8;
Factor device_isa_register() and device_pci_register() out of
device_register() and stick the new routines into isa_machdep.c and
pci_machdep.c, respectively.
 1.61 19-Sep-2011  gsutre PR/38356: Minoura Makoto: Use the device's unit (instead of autoconf's)
to match the bootinfo root device.

Fixes multiboot(8) root= option as well as GRUB knetbsd --root option.
 1.60 02-Jul-2011  mrg insert some (uintptr_t) between some casts involving pointer to int.
(they already had casts for the pointer.)
 1.59 08-Mar-2011  macallan if we know the framebuffer's virtual address pass it to the fb driver
 1.58 22-Feb-2011  dholland vga_posth should be inside NPCI > 0; from Jarle Greipsland in PR 43449.
 1.57 12-Feb-2011  jmcneill x86 genfb: when switching back to the console, if vga_post is present use it
to reset the video mode. gives us a chance of survival if the X server
crashes or the video driver fails to restore the console properly.
 1.56 09-Feb-2011  bouyer Fix build when GENFB is not there.
 1.55 09-Feb-2011  jmcneill if genfb is attached, hook into db_trap_callback to switch in and out of
polling mode as necessary
 1.54 08-Feb-2011  jmcneill add a 'setmode' callback to genfb and use it to setup write-combining
MTRRs on x86 whenever switching to WSDISPLAYIO_MODE_EMUL
 1.53 10-Jan-2011  jakllsch branches: 1.53.2; 1.53.4;
When we fail to read a block computing the matching hash,
it's nice to know what device and why.

Also, drop comment that hasn't been valid since 1.12.
 1.52 21-Aug-2010  jmcneill I guess people still attach com & lpt to isa, so don't skip legacy devices.
 1.51 21-Aug-2010  jmcneill If ACPI is active and the FADT reports no legacy devices present, set
the 'no-legacy-devices' property to true on isa0.
 1.50 24-Feb-2010  dyoung branches: 1.50.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
 1.49 31-Jan-2010  hubertf branches: 1.49.2;
Replace more printfs with aprint_normal / aprint_verbose
Makes "boot -z" go mostly silent for me.
 1.48 08-Jan-2010  dyoung Expand PMF_FN_* macros.
 1.47 25-Nov-2009  tron Fix build of kernels without PCI support like "GENERIC_TINY".
 1.46 06-Nov-2009  dyoung Use deviter(9) instead of accessing alldevs directly.
 1.45 21-Sep-2009  macallan set is_console even if we don't have any fb_info so other console drivers
than genfb have a chance of working.
Tested with radeonfb which works fine with an RV280
TODO: figure out how to deal with more than one PCI_CLASS_DISPLAY device in
a halfway sane manner
 1.44 24-Aug-2009  jmcneill Don't reference genfb unless NGENFB > 0
 1.43 24-Aug-2009  jmcneill PR# port-i386/41929: genfb and machdep.acpi_vbios_reset=2 interaction

When acpi_vbios_reset=2, invoke vga_post_call followed by vga_post_set_mode
in the genfb pmf resume handler.
 1.42 24-Aug-2009  jmcneill Paranoia; restore the genfb colour map on resume.
 1.41 03-Aug-2009  dsl Only define x86_genfb_set_mapreg() and found_console when NPCI > 0
Fixes PR/41451
 1.40 04-May-2009  cegger struct cfdata * -> cfdata_t
 1.39 01-May-2009  cegger struct device * -> device_t
 1.38 17-Feb-2009  jmcneill x86_genfb_console_screen is only available if NWSDISPLAY > 0 and
NGENFB > 0, spotted by Geoff Wing.
 1.37 17-Feb-2009  jmcneill Set clear-screen and cursor-row so the transition from the early console
driver and genfb is seamless. While we're here, clear the screen when
we first attach in case the bootloader scribbled on it.
 1.36 16-Feb-2009  jmcneill Kernel-side modifications for framebuffer console support on i386 and amd64.

* New BTINFO_FRAMEBUFFER kernel parameter to pass screen configuration
* Early attach support for framebuffer console
* Pass BTINFO_FRAMEBUFFER parameters to genfb in device_register
* Provide hooks to genfb to set VGA DAC palette in 8bpp mode
 1.35 14-Oct-2008  tsutsui branches: 1.35.2; 1.35.4; 1.35.8; 1.35.12;
If no booted_device is found in find_root(), also check for a CD-ROM boot
using the same strategy the bootloader does. This allows a one-CD system with a cd9660
root file system and mfs (like a restorecd for cobalt) using GENERIC.

No objection on port-i386, and no bad side effect on usual harddisk boot
or installation of GENERIC with miniroot module.
 1.34 16-Apr-2008  cegger branches: 1.34.4; 1.34.10;
- use aprint_*_dev and device_xname
- use POSIX integer types
 1.33 12-Feb-2008  joerg branches: 1.33.6;
Garbage collect the remaining parts of COMPAT_OLDBOOT. The boot loader
support has been removed at least 4 years ago and NetBSD 1.3 is ancient.
 1.32 12-Feb-2008  joerg Introduce device_find_by_xname and device_find_by_driver_unit to replace
alldevs iterations all over src.

Patch discussed with, and improved on suggestions from, cube@.
 1.31 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.30 17-Oct-2007  garbled branches: 1.30.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.29 27-Aug-2007  xtraeme branches: 1.29.2;
Fix typo in rev 1.26: alldev -> alldevs.
 1.28 27-Aug-2007  xtraeme Fix the $NetBSD$ tag in __KERNEL_RCSID().
 1.27 27-Aug-2007  dyoung Use TAILQ_FOREACH().
 1.26 24-Jun-2007  dyoung branches: 1.26.4; 1.26.8;
Extract common code from i386, xen, and sparc64, creating
config_handle_wedges() and read_disk_sectors(). On x86, handle_wedges()
is a thin wrapper for config_handle_wedges(). Share opendisk()
across architectures.

Add kernel code in support of specifying a root partition by wedge
name. E.g., root specifications "wedge:wd0a", "wedge:David's Root
Volume" are possible. (Patches for config(1) coming soon.)

In support of moving disks between architectures (esp. i386 <->
evbmips), I've written a routine convertdisklabel() that ensures
that the raw partition is at RAW_DISK by following these steps:

0 If we have read a disklabel that has a RAW_PART with
p_offset == 0 and p_size != 0, then use that raw partition.

1 If we have read a disklabel that has both partitions 'c'
and 'd', and RAW_PART has p_offset != 0 or p_size == 0,
but the other partition is suitable for a raw partition
(p_offset == 0, p_size != 0), then swap the two partitions
and use the new raw partition.

2 If the architecture's raw partition is 'd', and if there
is no partition 'd', but there is a partition 'c' that
is suitable for a raw partition, then copy partition 'c'
to partition 'd'.

3 Determine the drive's last sector, using either the
d_secperunit the drive reported, or by guessing (0x1fffffff).
If we cannot read the drive's last sector, then fail.

4 If we have read a disklabel that has no partition slot
RAW_PART, then create a partition RAW_PART. Make it span
the whole drive.

5 If there are fewer than MAXPARTITIONS partitions,
then "slide" the unsuitable raw partition RAW_PART, and
subsequent partitions, into partition slots RAW_PART+1
and subsequent slots. Create a raw partition at RAW_PART.
Make it span the whole drive.

The convertdisklabel() procedure can probably stand to be simplified,
but it ought to deal with all but an extraordinarily broken disklabel,
now.

i386: compiled and tested, sparc64: compiled, evbmips: compiled.
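
The first three steps of the procedure above can be sketched in plain C as follows. This is a simplified illustration, not the kernel's convertdisklabel(): the sketch_* types and names are hypothetical stand-ins for struct disklabel, partitions 'c' and 'd' are assumed to sit in slots 2 and 3, and steps 3 to 5 (probing the last sector, creating or sliding partitions) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real disklabel structures. */
#define SKETCH_MAXPARTITIONS 16
#define SKETCH_PART_C 2	/* slot of partition 'c' (assumed) */
#define SKETCH_PART_D 3	/* slot of partition 'd' (assumed) */

struct sketch_partition {
	uint32_t p_offset;	/* starting sector */
	uint32_t p_size;	/* size in sectors */
};

struct sketch_label {
	uint16_t d_npartitions;
	struct sketch_partition d_partitions[SKETCH_MAXPARTITIONS];
};

/* A partition can serve as the raw partition if it starts at 0 and is non-empty. */
static bool
sketch_suitable_raw(const struct sketch_partition *p)
{
	return p->p_offset == 0 && p->p_size != 0;
}

/*
 * Apply steps 0-2 to the label. raw_part must be SKETCH_PART_C or
 * SKETCH_PART_D. Returns the number of the step that applied, or -1 if
 * the label would need the later steps, which this sketch omits.
 */
int
sketch_fix_raw_part(struct sketch_label *lp, int raw_part)
{
	struct sketch_partition *c = &lp->d_partitions[SKETCH_PART_C];
	struct sketch_partition *d = &lp->d_partitions[SKETCH_PART_D];
	struct sketch_partition *raw = &lp->d_partitions[raw_part];
	struct sketch_partition *other = (raw_part == SKETCH_PART_C) ? d : c;

	/* Step 0: the raw slot exists and already looks like a raw partition. */
	if (raw_part < lp->d_npartitions && sketch_suitable_raw(raw))
		return 0;

	/* Step 1: both 'c' and 'd' exist and the other one is suitable: swap. */
	if (lp->d_npartitions > SKETCH_PART_D && sketch_suitable_raw(other)) {
		struct sketch_partition tmp = *raw;
		*raw = *other;
		*other = tmp;
		return 1;
	}

	/* Step 2: raw is 'd', there is no 'd', but 'c' is suitable: copy 'c' to 'd'. */
	if (raw_part == SKETCH_PART_D &&
	    lp->d_npartitions == SKETCH_PART_D &&
	    sketch_suitable_raw(c)) {
		*d = *c;
		lp->d_npartitions = SKETCH_PART_D + 1;
		return 2;
	}

	return -1;
}
```

For a label written on a machine whose raw partition is 'c' and read on one whose raw partition is 'd', the sketch takes the step 1 path and swaps the two slots.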
 1.25 04-Mar-2007  christos branches: 1.25.2; 1.25.4; 1.25.10;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.24 06-Oct-2006  yamt branches: 1.24.2; 1.24.4;
handle_wedges: fix a typo. (NOCREAD -> NOCRED)
 1.23 05-Oct-2006  martin I can not test this right now, but the equivalent change was needed on
sparc64 to make root on raid shutdown cleanly: after using opendisk()
and querying wedges, we need to VOP_CLOSE() the resulting vnode.
 1.22 27-Aug-2006  christos branches: 1.22.2; 1.22.4;
Fix previous thinko.
 1.21 27-Aug-2006  christos Fix reversed test
 1.20 27-Aug-2006  christos Wedges don't have partitions.
 1.19 13-Aug-2006  christos Fix missing initialization of tmpvn; thanks gcc.
 1.18 12-Aug-2006  christos - Check if a disk has wedges, and use the wedge device corresponding to the
root partition, instead of punting. This makes booting work
with traditional disklabel disks and wedge autoconfiguration.
- factor out disk opening code.
 1.17 12-Aug-2006  christos add dk.
 1.16 14-May-2006  elad integrate kauth.
 1.15 29-Mar-2006  thorpej Use device_cfdata().
 1.14 28-Mar-2006  thorpej Use device_unit().
 1.13 26-Feb-2006  thorpej branches: 1.13.2; 1.13.4; 1.13.6;
Fix typo.
 1.12 26-Feb-2006  thorpej Use device_is_a() more.
 1.11 26-Feb-2006  thorpej Use device_is_a().
 1.10 23-Feb-2006  thorpej Use device_parent().
 1.9 21-Feb-2006  thorpej Use device_class() instead of accessing dv_class directly.
 1.8 04-Feb-2006  jmmv Revert yesterday's change that attempted to fix the detection of the
boot device when using a Multiboot boot loader. It couldn't work because
these boot loaders do not pass a checksum of the disk so matchbiosdisk()
cannot really find any matches. I should have gone to sleep before
committing...

Found by xtraeme@.
 1.7 03-Feb-2006  jmmv branches: 1.7.2;
When booting an i386 kernel with Multiboot, properly detect the boot device
by looking it up in the x86_alldisks table (instead of trying to match it
to 'wd*' manually).

In order to do this, move the cpu_rootconf function from x86 common code
to amd64 and i386 specific one. This way, i386 can do an extra step (call
the appropriate Multiboot code) in the appropriate place (after
x86_matchbiosdisks and before findroot()).
 1.6 03-Feb-2006  jmmv Implement support for 'The Multiboot Specification' so that i386 kernels
can be booted directly from Multiboot-compliant boot loaders (e.g. GRUB).
See the added multiboot(8) manual page for more information.

No objections in tech-kern@; only positive comments.
 1.5 11-Dec-2005  christos branches: 1.5.2; 1.5.4;
merge ktrace-lwp.
 1.4 29-May-2005  christos branches: 1.4.2;
avoid variable shadowing.
 1.3 26-Oct-2004  xtraeme branches: 1.3.2;
Fix typo: labe -> label.
 1.2 23-Oct-2004  thorpej Use the new BTINFO_BOOTWEDGE bootinfo to discover the booted disk and
wedge.
 1.1 20-Oct-2004  thorpej Move boot device detection code from i386 and amd64 ports to x86_autoconf.c.
Rename i386_alldisks and x86_64_alldisks to x86_alldisks, adjust other
references to compensate.
 1.3.2.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.2 02-Nov-2004  skrll Sync with HEAD.
 1.3.2.1 26-Oct-2004  skrll file x86_autoconf.c was added on branch ktrace-lwp on 2004-11-02 07:51:06 +0000
 1.4.2.5 27-Feb-2008  yamt sync with head.
 1.4.2.4 07-Dec-2007  yamt sync with head
 1.4.2.3 03-Sep-2007  yamt sync with head.
 1.4.2.2 30-Dec-2006  yamt sync with head.
 1.4.2.1 21-Jun-2006  yamt sync with head.
 1.5.4.1 09-Sep-2006  rpaulo sync with head
 1.5.2.2 01-Mar-2006  yamt sync with head.
 1.5.2.1 18-Feb-2006  yamt sync with head.
 1.7.2.2 01-Jun-2006  kardel Sync with head.
 1.7.2.1 22-Apr-2006  simonb Sync with head.
 1.13.6.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.13.6.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.13.4.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.13.4.1 19-Apr-2006  elad sync with head - hopefully this will work
 1.13.2.3 03-Sep-2006  yamt sync with head.
 1.13.2.2 24-May-2006  yamt sync with head.
 1.13.2.1 01-Apr-2006  yamt sync with head.
 1.22.4.1 22-Oct-2006  yamt sync with head
 1.22.2.1 18-Nov-2006  ad Sync with head.
 1.24.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.24.2.1 05-Nov-2008  snj Pull up following revision(s) (requested by tsutsui in ticket #1220):
sys/arch/x86/x86/x86_autoconf.c: revision 1.35 via patch
If no booted_device is found in find_root(), also check for CD-ROM boot
using the same strategy as the bootloader. This allows a one-CD system with
a cd9660 root file system and mfs (like a restorecd for cobalt) using GENERIC.
No objection on port-i386, and no bad side effect on usual harddisk boot
or installation of GENERIC with miniroot module.
 1.25.10.2 03-Oct-2007  garbled Sync with HEAD
 1.25.10.1 26-Jun-2007  garbled Sync with HEAD.
 1.25.4.1 11-Jul-2007  mjf Sync with head.
 1.25.2.3 03-Dec-2007  ad Sync with HEAD.
 1.25.2.2 09-Oct-2007  ad Sync with head.
 1.25.2.1 15-Jul-2007  ad Sync with head.
 1.26.8.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.26.8.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.26.4.1 03-Sep-2007  skrll Sync with HEAD.
 1.29.2.3 23-Mar-2008  matt sync with HEAD
 1.29.2.2 09-Jan-2008  matt sync with HEAD
 1.29.2.1 06-Nov-2007  matt sync with HEAD
 1.30.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.30.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.33.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.33.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.34.10.1 19-Oct-2008  haad Sync with HEAD.
 1.34.4.6 09-Oct-2010  yamt sync with head
 1.34.4.5 11-Mar-2010  yamt sync with head
 1.34.4.4 16-Sep-2009  yamt sync with head
 1.34.4.3 19-Aug-2009  yamt sync with head.
 1.34.4.2 16-May-2009  yamt sync with head
 1.34.4.1 04-May-2009  yamt sync with head.
 1.35.12.1 21-Apr-2010  matt sync to netbsd-5
 1.35.8.5 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.35.8.4 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.35.8.3 24-Oct-2010  jym Sync with HEAD
 1.35.8.2 01-Nov-2009  jym Sync with HEAD.
 1.35.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.35.4.1 14-Feb-2010  bouyer Pull up following revision(s) (requested by hubertf in ticket #1290):
sys/kern/kern_ksyms.c: revision 1.53
sys/dev/pci/agp_via.c: revision 1.18
sys/netipsec/key.c: revision 1.63
sys/arch/x86/x86/x86_autoconf.c: revision 1.49
sys/kern/init_main.c: revision 1.415
sys/kern/cnmagic.c: revision 1.11
sys/netipsec/ipsec.c: revision 1.47
sys/arch/x86/x86/pmap.c: revision 1.100
sys/netkey/key.c: revision 1.176
Replace more printfs with aprint_normal / aprint_verbose
Makes "boot -z" go mostly silent for me.
 1.35.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.49.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.49.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.50.2.2 21-Apr-2011  rmind sync with head
 1.50.2.1 05-Mar-2011  rmind sync with head
 1.53.4.3 05-Mar-2011  bouyer Sync with HEAD
 1.53.4.2 17-Feb-2011  bouyer Sync with HEAD
 1.53.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.53.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.62.8.4 07-Nov-2013  snj Pull up following revision(s) (requested by msaitoh in ticket #977):
sys/arch/x86/x86/x86_autoconf.c: revision 1.67
If we have both wedge and partition info, use the partition info in the
wedge case too. From mlelstv.
XXX: pullup-6
 1.62.8.3 08-Aug-2012  martin Pull up following revision(s) (requested by mlelstv in ticket #466):
sys/arch/amiga/amiga/autoconf.c: revision 1.113
sys/arch/rs6000/rs6000/autoconf.c: revision 1.4
sys/arch/emips/emips/autoconf.c: revision 1.6
sys/arch/sandpoint/sandpoint/autoconf.c: revision 1.27
sys/arch/evbmips/alchemy/autoconf.c: revision 1.18
sys/arch/sgimips/sgimips/autoconf.c: revision 1.43
sys/arch/atari/atari/autoconf.c: revision 1.63
sys/arch/powerpc/oea/ofw_autoconf.c: revision 1.17
sys/arch/mmeye/mmeye/autoconf.c: revision 1.9
distrib/sets/lists/comp/mi: revision 1.1771
sys/arch/mipsco/mipsco/autoconf.c: revision 1.25
sys/arch/iyonix/iyonix/autoconf.c: revision 1.14
sys/arch/hp300/hp300/autoconf.c: revision 1.100
sys/kern/init_main.c: revision 1.445
sys/arch/pmax/pmax/autoconf.c: revision 1.79
sys/arch/netwinder/netwinder/autoconf.c: revision 1.11
sys/arch/dreamcast/dreamcast/autoconf.c: revision 1.10
sys/arch/ibmnws/ibmnws/autoconf.c: revision 1.12
sys/arch/evbppc/ev64260/autoconf.c: revision 1.17
sys/arch/evbmips/gdium/autoconf.c: revision 1.5
sys/arch/algor/algor/autoconf.c: revision 1.21
share/man/man9/Makefile: revision 1.367
sys/arch/ews4800mips/ews4800mips/autoconf.c: revision 1.9
sys/arch/amigappc/amigappc/autoconf.c: revision 1.5
sys/arch/x86/x86/x86_autoconf.c: revision 1.65
sys/arch/acorn26/acorn26/autoconf.c: revision 1.9
sys/arch/mvmeppc/mvmeppc/autoconf.c: revision 1.13
sys/arch/vax/vax/autoconf.c: revision 1.94
sys/arch/usermode/dev/cpu.c: revision 1.72
sys/arch/evbppc/virtex/autoconf.c: revision 1.5
sys/arch/next68k/next68k/autoconf.c: revision 1.26
sys/arch/mac68k/mac68k/autoconf.c: revision 1.73
sys/arch/ia64/ia64/autoconf.c: revision 1.6
sys/arch/evbppc/obs405/obs405_autoconf.c: revision 1.6
share/man/man9/cpu_rootconf.9: revision 1.7
sys/arch/landisk/landisk/autoconf.c: revision 1.6
sys/arch/evbmips/malta/autoconf.c: revision 1.16
sys/arch/sun3/sun3/autoconf.c: revision 1.76
sys/arch/evbppc/explora/autoconf.c: revision 1.13
sys/arch/sun3/sun3/autoconf.c: revision 1.77
sys/arch/evbmips/loongson/autoconf.c: revision 1.3
sys/arch/evbmips/atheros/autoconf.c: revision 1.11
sys/arch/sparc64/sparc64/autoconf.c: revision 1.188
sys/arch/acorn32/acorn32/autoconf.c: revision 1.18
sys/arch/evbarm/evbarm/autoconf.c: revision 1.13
sys/arch/cobalt/cobalt/autoconf.c: revision 1.30
sys/arch/mvme68k/mvme68k/autoconf.c: revision 1.46
sys/arch/hp700/hp700/autoconf.c: revision 1.48
sys/arch/evbmips/adm5120/autoconf.c: revision 1.5
sys/arch/hpcmips/hpcmips/autoconf.c: revision 1.25
sys/arch/alpha/alpha/autoconf.c: revision 1.52
sys/arch/sparc/sparc/autoconf.c: revision 1.244
sys/arch/evbppc/pmppc/autoconf.c: revision 1.7
sys/arch/bebox/bebox/autoconf.c: revision 1.25
sys/arch/luna68k/luna68k/autoconf.c: revision 1.13
sys/arch/hpcarm/hpcarm/autoconf.c: revision 1.20
sys/arch/evbppc/walnut/autoconf.c: revision 1.21
sys/arch/cesfic/cesfic/autoconf.c: revision 1.26
sys/arch/cats/cats/autoconf.c: revision 1.17
sys/arch/x68k/x68k/autoconf.c: revision 1.67
sys/arch/news68k/news68k/autoconf.c: revision 1.21
sys/arch/arc/arc/autoconf.c: revision 1.34
sys/arch/evbsh3/evbsh3/autoconf.c: revision 1.11
sys/sys/conf.h: revision 1.143
sys/arch/evbmips/rasoc/autoconf.c: revision 1.3
sys/arch/hpcsh/hpcsh/autoconf.c: revision 1.26
sys/arch/sun68k/sun68k/autoconf.c: revision 1.29
sys/arch/evbmips/rmixl/autoconf.c: revision 1.6
sys/arch/zaurus/zaurus/autoconf.c: revision 1.12
sys/arch/xen/x86/autoconf.c: revision 1.15
sys/arch/evbppc/mpc85xx/autoconf.c: revision 1.6
sys/arch/shark/shark/autoconf.c: revision 1.18
sys/arch/prep/prep/autoconf.c: revision 1.25
sys/arch/newsmips/newsmips/autoconf.c: revision 1.36
sys/arch/sbmips/sbmips/autoconf.c: revision 1.8
Do not call setroot() from MD code and from MI code, which has
unwanted side effects in the RB_ASKNAME case. This fixes PR/46732.
No longer wrap MD cpu_rootconf(), as hp300 port stores reboot information
as a side effect. Instead call MI rootconf() from MD code which makes
rootconf() now a wrapper to setroot().
Adjust several MD routines to set the global booted_device,booted_partition
variables instead of passing partial information to setroot().
Make cpu_rootconf(9) describe the calling order.
add rootconf(9) as a link to cpu_rootconf(9)
make this compile again
 1.62.8.2 20-Jul-2012  riz Pull up following revision(s) (requested by mhitch in ticket #429):
sys/arch/x86/x86/x86_autoconf.c: revision 1.64
fix the comparison to determine if a biosdev is a cdrom (from mhitch)
 1.62.8.1 05-Jul-2012  riz Pull up following revision(s) (requested by mlelstv in ticket #402):
sys/dev/vnd.c: revision 1.221
sys/kern/init_main.c: revision 1.443
sys/kern/init_main.c: revision 1.444
sys/dev/dkwedge/dk.c: revision 1.64
sys/arch/x86/x86/x86_autoconf.c: revision 1.63
sys/arch/sparc64/sparc64/autoconf.c: revision 1.187
sys/sys/device.h: revision 1.141
sys/dev/dkwedge/dkwedge_bsdlabel.c: revision 1.17
sys/kern/kern_subr.c: revision 1.213
sys/arch/zaurus/zaurus/autoconf.c: revision 1.11
sys/arch/xen/x86/autoconf.c: revision 1.14
sys/sys/disk.h: revision 1.57
Use the label's packname to create wedge names instead of the classic
device names. Fall back to classic device names when the label has an
empty name or the default name 'fictitious'.
autodiscover wedges
Make detection of root on wedges (dk(4)) machine independent. Remove
MD code for x86, xen, sparc64.
Make detection of root on wedges (dk(4)) machine independent. Remove
MD code for zaurus.
Do not try to find the wedge we booted from if opendisk(booted_device)
failed.
 1.62.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.62.2.2 23-Jan-2013  yamt sync with head
 1.62.2.1 30-Oct-2012  yamt sync with head
 1.65.2.4 03-Dec-2017  jdolecek update from HEAD
 1.65.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.65.2.2 23-Jun-2013  tls resync from head
 1.65.2.1 25-Feb-2013  tls resync with head
 1.68.2.1 18-May-2014  rmind sync with head
 1.70.2.1 10-Aug-2014  tls Rebase.
 1.71.2.1 30-Oct-2014  martin Pull up following revision(s) (requested by maxv in ticket #165):
sys/arch/newsmips/stand/boot/netif_news.c: revision 1.9
sys/arch/mvme68k/stand/installboot/installboot.c: revision 1.19
sys/arch/arm/arm32/pmap.c: revision 1.300
sys/arch/amiga/dev/siop2.c: revision 1.43
sys/arch/amiga/amiga/disksubr.c: revision 1.62
sys/arch/news68k/news68k/bus_space.c: revision 1.13
sys/arch/amiga/dev/siop.c: revision 1.69
sys/arch/x86/x86/x86_autoconf.c: revision 1.72
Remove dead code in various places under arch/.
 1.72.2.2 05-Oct-2016  skrll Sync with HEAD
 1.72.2.1 06-Jun-2015  skrll Sync with HEAD
 1.74.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.75.8.1 12-Jun-2019  martin Pull up following revision(s) (requested by nonaka in ticket #1280):

sys/arch/x86/x86/consinit.c: revision 1.29
sys/dev/hyperv/vmbusvar.h: revision 1.2
sys/dev/hyperv/genfb_vmbusvar.h: revision 1.1
sys/arch/x86/x86/x86_autoconf.c: revision 1.78
sys/arch/x86/x86/identcpu.c: revision 1.91
sys/arch/x86/x86/hyperv.c: revision 1.2
sys/arch/x86/x86/hyperv.c: revision 1.3
sys/arch/x86/x86/hyperv.c: revision 1.4
sys/arch/i386/conf/GENERIC: revision 1.1207
sys/dev/wscons/wsconsio.h: revision 1.123
sys/arch/x86/x86/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.528
sys/dev/hyperv/files.hyperv: revision 1.2
sys/arch/x86/include/autoconf.h: revision 1.6
sys/dev/hyperv/hyperv_common.c: revision 1.2
sys/arch/xen/x86/autoconf.c: revision 1.23
sys/arch/x86/pci/pci_machdep.c: revision 1.86
sys/dev/hyperv/hvkbd.c: revision 1.1
sys/dev/hyperv/hypervvar.h: revision 1.2
sys/dev/acpi/vmbus_acpi.c: revision 1.2
sys/dev/hyperv/vmbus.c: revision 1.3
sys/dev/hyperv/hvkbdvar.h: revision 1.1
sys/dev/hyperv/genfb_vmbus.c: revision 1.1

Added drivers for Hyper-V Synthetic Keyboard and Video device.

Avoid undefined reference to `hyperv_guid_video' without vmbus(4).

Avoid undefined reference to `hyperv_is_gen1' without hyperv(4).

Use efi_probe().
 1.76.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.77.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.77.2.1 10-Jun-2019  christos Sync with HEAD
 1.79.6.1 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.87.10.1 02-Aug-2025  perseant Sync with HEAD
 1.87.4.1 29-Mar-2025  martin Pull up following revision(s) (requested by imil in ticket #1074):

sys/arch/x86/x86/x86_machdep.c: revision 1.155
sys/arch/x86/include/cpu.h: revision 1.137
sys/arch/x86/x86/x86_machdep.c: revision 1.156
sys/arch/x86/include/cpu.h: revision 1.138
sys/arch/x86/x86/consinit.c: revision 1.40
sys/arch/x86/acpi/acpi_machdep.c: revision 1.37
sys/arch/x86/acpi/acpi_machdep.c: revision 1.38
sys/arch/amd64/amd64/machdep.c: revision 1.370
sys/arch/xen/xen/hypervisor.c: revision 1.97
sys/arch/xen/xen/hypervisor.c: revision 1.98
sys/arch/amd64/amd64/genassym.cf: revision 1.98
sys/arch/x86/x86/x86_autoconf.c: revision 1.88
sys/arch/x86/x86/x86_autoconf.c: revision 1.89
sys/arch/amd64/amd64/locore.S: revision 1.226
sys/arch/amd64/amd64/locore.S: revision 1.227
sys/arch/x86/x86/identcpu.c: revision 1.131

Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.

Get one more change from PR kern/57813, needed for non-Xen PVH.

Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
 1.4 11-May-2008  ad Simplify x86 identcpu code, and share between i386/amd64.
 1.3 22-Jan-2008  yamt branches: 1.3.2; 1.3.8; 1.3.10; 1.3.12; 1.3.14;
identifycpu_cpuids: fix ids for multicore cpus.
 1.2 04-Jan-2008  yamt branches: 1.2.2; 1.2.4;
use device_xname.
 1.1 01-Jan-2008  yamt branches: 1.1.2;
try to detect processor resource sharing topologies. ie. package/core/smt IDs.
 1.1.2.4 23-Jan-2008  bouyer Sync with HEAD.
 1.1.2.3 08-Jan-2008  bouyer Sync with HEAD
 1.1.2.2 02-Jan-2008  bouyer Sync with HEAD
 1.1.2.1 01-Jan-2008  bouyer file x86_identcpu.c was added on branch bouyer-xeni386 on 2008-01-02 21:51:28 +0000
 1.2.4.3 04-Feb-2008  yamt sync with head.
 1.2.4.2 21-Jan-2008  yamt sync with head
 1.2.4.1 04-Jan-2008  yamt file x86_identcpu.c was added on branch yamt-lazymbuf on 2008-01-21 09:40:19 +0000
 1.2.2.3 23-Mar-2008  matt sync with HEAD
 1.2.2.2 09-Jan-2008  matt sync with HEAD
 1.2.2.1 04-Jan-2008  matt file x86_identcpu.c was added on branch matt-armv6 on 2008-01-09 01:50:01 +0000
 1.3.14.1 23-Jun-2008  wrstuden Remove files removed on branch. Updating using patch has its
drawbacks. :-)
 1.3.12.1 16-May-2008  yamt sync with head.
 1.3.10.1 17-Jun-2008  yamt fix merge botches
 1.3.8.1 02-Jun-2008  mjf Sync with HEAD.
 1.3.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.3.2.1 22-Jan-2008  mjf file x86_identcpu.c was added on branch mjf-devfs on 2008-02-18 21:05:17 +0000
 1.3 07-Oct-2021  msaitoh KNF. No functional change.
 1.2 21-Jul-2021  jmcneill Separate MI smbios interface from MD specific code.
 1.1 25-Dec-2018  mlelstv branches: 1.1.2; 1.1.6; 1.1.20;
Make ipmi driver available to other platforms.
Add ACPI attachment.
 1.1.20.1 01-Aug-2021  thorpej Sync with HEAD.
 1.1.6.2 10-Jun-2019  christos Sync with HEAD
 1.1.6.1 25-Dec-2018  christos file x86_ipmi.c was added on branch phil-wifi on 2019-06-10 22:06:54 +0000
 1.1.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.1.2.1 25-Dec-2018  pgoyette file x86_ipmi.c was added on branch pgoyette-compat on 2018-12-26 14:01:45 +0000
 1.159 14-Jul-2025  bouyer Fix fallout from
https://mail-index.netbsd.org/source-changes/2025/06/20/msg157251.html
trap() may return with interrupts disabled to avoid preemption
before the iret. But on XENPV i386 it seems that iret doesn't reenable events
(or at least doesn't in some cases). To fix this,
- in i386 calltrap, move a #ifdef XENPV code reenabling interrupts so that
it is always called before returning
- in cpu_kpreempt_exit(), handle calltrap() the same way copyfunc is; that
is reload the pmap if we got preempted here (only for XENPV i386)
 1.158 30-Apr-2025  imil Introduce pvh_boot boolean to identify the real hypervisor when booting in PVH
mode.

As of now, sys/arch/x86/x86/identcpu.c / identify_hypervisor() returns in the
case of vm_guest being VM_GUEST_GENPVH, yet this VM type is not an actual
hypervisor but an information recorded in locore.S to drive boot method.
We need to investigate what type of hypervisor is really running the VM in
order to apply specifics, so instead of relying on vm_guest_is_pvh() which only
checks for VM_GUEST_XENPVH || VM_GUEST_GENPVH, pvh_boot informs on the boot
method while allowing to identify the real hypervisor.

Idea ok'd by bouyer@, tested on Xen domU, Xen dom0 with GENERIC PVH and
qemu GENERIC PVH boot.
 1.157 22-Apr-2025  imil NVMM hypervisor identification, KVM and GenPVH identification fixes

arch/x86/include/cpu.h, arch/x86/x86/identcpu.c: Enable NVMM hypervisor
discovery
arch/x86/x86/identcpu.c: Fix vm_guest_t for KVM in vm_system_products
arch/x86/x86/x86_machdep.c: Add NVMM and GenPVH in vm_guest_name
 1.156 06-Dec-2024  bouyer Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
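
A helper of this kind might look like the sketch below. The enum is a hypothetical subset with illustrative values, and the guest type is passed as a parameter so the example stands alone; the kernel's own definition may differ in both respects.

```c
#include <stdbool.h>

/* Hypothetical subset of guest types; names and values are illustrative only. */
typedef enum {
	SKETCH_VM_GUEST_NO,
	SKETCH_VM_GUEST_XENPV,
	SKETCH_VM_GUEST_XENPVH,
	SKETCH_VM_GUEST_GENPVH
} sketch_vm_guest_t;

/* True for either PVH flavour, replacing the open-coded two-way comparison. */
bool
sketch_vm_guest_is_pvh(sketch_vm_guest_t guest)
{
	return guest == SKETCH_VM_GUEST_XENPVH ||
	    guest == SKETCH_VM_GUEST_GENPVH;
}
```

Centralising the test means a new PVH variant only has to be added in one place.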
 1.155 02-Dec-2024  bouyer Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.
 1.154 04-Oct-2023  ad branches: 1.154.6;
Eliminate l->l_ncsw and l->l_nivcsw. From memory I think they were added
before we had per-LWP struct rusage; the same is now tracked there.
 1.153 23-Dec-2022  bouyer x86_add_cluster() takes the end of the segment, not the size.
Should fix PR port-xen/57121
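
The class of mistake fixed here, passing a size where an end address is expected, can be illustrated with the sketch below. The sketch_* names are hypothetical, and treating the end address as exclusive is an assumption made for the example, not something the commit message states.

```c
#include <stdint.h>

/* Hypothetical record of a physical memory segment. */
struct sketch_cluster {
	uint64_t start;	/* first byte of the segment */
	uint64_t end;	/* one past the last byte (assumed exclusive) */
};

/* Takes the END of the segment, not its size. */
struct sketch_cluster
sketch_add_cluster(uint64_t seg_start, uint64_t seg_end)
{
	struct sketch_cluster c = { seg_start, seg_end };
	return c;
}

/* A caller holding (addr, size) must convert before calling. */
struct sketch_cluster
sketch_add_region(uint64_t addr, uint64_t size)
{
	/* Passing `size` directly would yield a bogus segment. */
	return sketch_add_cluster(addr, addr + size);
}
```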
 1.152 20-Aug-2022  riastradh branches: 1.152.4;
x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.151 20-Aug-2022  riastradh x86: Split bootspace out of x86/pmap.h into new x86/bootspace.h.
 1.150 28-Oct-2021  riastradh x86: Process bootloader rndseed much sooner.
 1.149 07-Oct-2021  msaitoh KNF. No functional change.
 1.148 19-Feb-2021  christos It is not VirtualBo; give some more space.
 1.147 19-Feb-2021  christos add VirtualBox
 1.146 09-Aug-2020  christos branches: 1.146.2;
move lcall sniffer to x86_machdep since xen/pv has its own cpu.c
 1.145 19-Jul-2020  maxv Compile USER_LDT by default, but, put it behind a privileged sysctl that
defaults to disabled. To enable:

# sysctl -w machdep.user_ldt=1
 1.144 04-Jul-2020  chs the x86 xen and non-xen modules are identical,
so remove the unneeded extra copies.
Xen kernels now use the same modules as native kernels.
 1.143 21-May-2020  ad - Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.
 1.142 03-May-2020  bouyer If hvm_start_info has no memmap_entries, fall back to XENMEM_memory_map
hypercall.
 1.141 02-May-2020  bouyer Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().
 1.140 01-May-2020  hannken Use PRIxPADDR for paddr_t to make i386/ALL compile.
 1.139 30-Apr-2020  bouyer Change module path to xen-* only for XENPV
 1.138 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.137 04-Apr-2020  christos branches: 1.137.2;
Infrastructure for putting kernel+modules in /netbsd/kernel and
/netbsd/modules respectively instead of /netbsd and
/stand/<arch>/<version>/modules. This is only supported for x86,
and is turned off by default. To try it, add KERNEL_DIR=yes in your
/mk.conf and install a system from that build.
 1.136 04-Apr-2020  ad Enable MONITOR/MWAIT idle on AMD chips, except some buggy Ryzens.
 1.135 29-Jan-2020  manu Fix startup crashes caused by wrong memory map handling

init_x86_vm() takes the memory map from BIOS and EFI and selects
regions suitable for memory allocation. This involves removing
areas used by the kernel, but the logic missed some corner cases,
which led to possible allocation in regions for which later memory
access would cause a panic.

The typical panic from this bug in GENERIC is at SVS startup:
cpu_svs_init / uvm_pagealloc_strat / pagezero

We fix the bug by adding logic for the missing cases of memory
regions overlapping with the kernel. While there, add more #ifdef'ed
debug output.
 1.134 28-Dec-2019  pgoyette branches: 1.134.2;
#include "opt_xen.h" so we can tell if we're in a XEN kernel. We need
to know this in order to set module_machine correctly, which in turn is
needed to set the module_base path from which modules are loaded and
which provides the value of sysctl(8) variable kern.module.path

Thanks to jnemeth@ for pointing out the problem.
 1.133 03-Dec-2019  riastradh Use __insn_barrier to enforce ordering in l_ncsw loops.

(Only need ordering observable by interruption, not by other CPUs.)
 1.132 03-Dec-2019  hannken Make sure the assignment to "idepth" is done inside the loop to prevent
preemption between loop end and dereference of "l_cpu->ci_depth".
 1.131 03-Dec-2019  hannken Make cpu_intr_p() work with "curlwp->l_cpu == NULL" and
assert "curlwp == &lwp0" in this case.

Prevents crash during early boot with "options LOCKDEBUG".
 1.130 01-Dec-2019  ad Make cpu_intr_p() safe to use anywhere, i.e. outside assertions:

Don't call kpreempt_disable() / kpreempt_enable() to make sure we're not
preempted while using the value of curcpu(). Instead, observe the value of
l_ncsw before and after the check to see if we have been preempted. If
we have been preempted, then we need to retry the read.
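
The retry scheme described above can be sketched in portable C as follows. The structures are hypothetical stand-ins for the kernel's lwp and cpu_info, the convention that an interrupt depth of -1 means "not in an interrupt" is an assumption, and C11 atomic_signal_fence() stands in for a compiler-only barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU state. */
struct sketch_cpu {
	volatile int ci_idepth;	/* assumed: -1 outside interrupts, >= 0 inside */
};

/* Hypothetical stand-in for per-LWP state. */
struct sketch_lwp {
	volatile unsigned long l_ncsw;		/* context switch counter */
	struct sketch_cpu * volatile l_cpu;	/* CPU the LWP is running on */
};

/*
 * Sample the interrupt depth without disabling preemption: read the
 * context switch counter before and after, and retry if it changed,
 * since the sampled CPU may no longer be the one we are running on.
 */
bool
sketch_cpu_intr_p(struct sketch_lwp *l)
{
	unsigned long ncsw;
	int idepth;

	do {
		ncsw = l->l_ncsw;
		/* Compiler-only fence: keep the reads in program order. */
		atomic_signal_fence(memory_order_seq_cst);
		idepth = l->l_cpu->ci_idepth;
		atomic_signal_fence(memory_order_seq_cst);
	} while (ncsw != l->l_ncsw);

	return idepth >= 0;
}
```

The loop only has to be consistent with respect to interruption on the local CPU, which is why a compiler fence is used in the sketch rather than a full memory barrier.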
 1.129 23-Nov-2019  ad cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
 1.128 03-Oct-2019  maxv Remove the LazyFPU code, as posted 5 months ago on port-amd64@.
 1.127 29-May-2019  maxv branches: 1.127.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.
 1.126 19-May-2019  maxv Misc changes in the x86 FPU code. Reduces a future diff. No real functional
change.
 1.125 15-May-2019  maxv Change the way SVS is disabled. Now you have to pass "boot -3" from the
bootloader. The machdep.svs.enabled sysctl becomes read-only, and just
indicates whether SVS is enabled.

Sent on port-amd64@.
 1.124 15-Feb-2019  nonaka Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.
 1.123 14-Feb-2019  cherry Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.
 1.122 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.121 24-Dec-2018  cherry Towards bifurcating XEN and native interrupt related functions,
this is a preliminary cleanup sweep.

Move functions related to MP bus probe and scanning to x86/mp.c

Move generic platform pic search function to x86/x86_machdep.c
 1.120 19-Sep-2018  maxv i386 xen is pae
 1.119 13-Jul-2018  maxv Remove the X86PMC code I had written, replaced by tprof. Many defines
become unused in specialreg.h, so remove them. We don't want to add
defines all the time, there are countless PMCs on many generations, and
it's better to just inline the event/unit values.
 1.118 12-Jul-2018  maxv Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.
 1.117 16-Jun-2018  maxv branches: 1.117.2;
Need IPIs when enabling eager fpu switch, to clear each fpu and get us
started. Otherwise it is possible that the first context switch on one of
the cpus will restore an invalid fpu state in the new lwp, if that lwp
had its fpu state stored on another cpu that didn't have time to do an
fpu save since eager-fpu was enabled.

Use barriers and all the related crap. The point is that we want to
ensure that no context switch occurs between [each fpu is cleared] and
[x86_fpu_eager is set to 'true'].

Also add KASSERTs.
 1.116 14-Jun-2018  maxv Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.

Maybe we also need to clear the FPU in setregs(), not sure about this one.

Can be enabled/disabled via:

machdep.fpu_eager = {0/1}

Not yet turned on automatically on affected CPUs (Intel Family 6).

More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.
 1.115 22-May-2018  maxv Several changes:

- Move the sysctl initialization code into spectre.c. This way each
variable is local. Rename the variables, use shorter names.

- Use mitigation methods for SpectreV4, like SpectreV2. There are
several available on AMD (that we don't support yet). Add a "method"
leaf.

- Make SSB_NO a mitigation method by itself. This way we report as
"mitigated" a CPU that is not affected by SpectreV4. In this case,
of course, the user can't enable/disable the mitigation. Drop the
"affected" sysctl leaf.
 1.114 22-May-2018  maxv Clarify the parameters for the SpectreV2 mitigation.

Add:
machdep.spectre_v2.swmitigated
Rename:
machdep.spectre_v2.mitigated -> machdep.spectre_v2.hwmitigated

Change the method string, to combine both the hardware and software
mitigations. swmitigated is set at compile time, hwmitigated can be
set by the user.

Examples:

spectre_v2.swmitigated = 1
spectre_v2.hwmitigated = 0
spectre_v2.method = [GCC retpoline]

spectre_v2.swmitigated = 0
spectre_v2.hwmitigated = 0
spectre_v2.method = (none)

spectre_v2.swmitigated = 1
spectre_v2.hwmitigated = 1
spectre_v2.method = [GCC retpoline] + [Intel IBRS]
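
The three examples above suggest how the method string is composed from the two flags. A minimal sketch of that composition follows; the label strings are copied from the examples, while the function name v2_method_string and its signature are hypothetical and not taken from spectre.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the spectre_v2.method string from the
 * software (compile-time) and hardware (user-settable) flags.
 * Only the label strings come from the commit message.
 */
static void
v2_method_string(bool swmitigated, bool hwmitigated, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (!swmitigated && !hwmitigated) {
		snprintf(buf, len, "(none)");
		return;
	}
	/* Join the active methods with " + " only when both are present. */
	snprintf(buf, len, "%s%s%s",
	    swmitigated ? "[GCC retpoline]" : "",
	    (swmitigated && hwmitigated) ? " + " : "",
	    hwmitigated ? "[Intel IBRS]" : "");
}
```

The sketch hard-codes one software and one hardware label; a kernel with several hardware methods would presumably pick the label at runtime.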
 1.113 22-May-2018  maxv Mitigation for SpectreV4, based on SSBD. The following sysctl branches
are added:

machdep.spectre_v4.mitigated = {0/1} user-settable
machdep.spectre_v4.affected = {0/1} set by the kernel

The mitigation is not enabled by default yet. It is not tested either,
because no microcode update has been published yet.

On current CPUs a microcode/bios update must be applied for SSBD to be
available. The user can then set mitigated=1. Even with an update applied
the kernel will set affected=1.

On future CPUs, where the problem will presumably be fixed by default,
the CPU will report SSB_NO, and the kernel will set affected=0. In this
case we also have mitigated=0, but the mitigation is not needed.

For now the feature is system-wide. Perhaps we will want a more
fine-grained, per-process approach in the future.
 1.112 22-May-2018  maxv Reorder and rename, to make the code less SpectreV2-specific.
 1.111 04-Apr-2018  maxv Add machdep.spectre_v2.method, a string that tells which method is
active.
 1.110 31-Mar-2018  maxv Rename spectreV2 -> spectre_v2, and introduce spectre_v1 (which defaults
to not-mitigated).

This gives the user an easy way to find out whether the system is
vulnerable:

machdep.spectre_v1.mitigated
machdep.spectre_v2.mitigated

They are also available on i386.
 1.109 14-Mar-2018  maxv Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.
 1.108 23-Feb-2018  maxv branches: 1.108.2;
Change the SVS node, from machdep.svs_enabled to machdep.svs.enabled.
 1.107 22-Feb-2018  maxv Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.
 1.106 22-Feb-2018  maxv Make the machdep.svs_enabled sysctl writable, and add the kernel code
needed to disable SVS at runtime.

We set 'svs_enabled' to false, and hotpatch the kernel entry/exit points
to eliminate the context switch code.

We need to make sure there is no remote CPU that is executing the code we
are hotpatching. So we use two barriers:

* After the first one each CPU is guaranteed to be executing in
svs_disable_cpu with interrupts disabled (this way it can't leave this
place).

* After the second one it is guaranteed that SVS is disabled, so we flush
the cache, enable interrupts and continue execution normally.

Between the two barriers, cpu0 will disable SVS (svs_enabled=false and
hotpatch), and each CPU will restore the generic syscall entry point.

Three notes:

* We should call svs_pgg_update(true) afterwards, to put back PG_G on
the kernel pages (for better performance). This will be done in another
commit.

* The fact that we disable interrupts does not prevent us from receiving
an NMI, and it would be problematic. So we need to add some code to
verify that PMCs are disabled before hotpatching. This will be done
in another commit.

* In svs_disable() we expect each CPU to be online. We need to add a
check to make sure they indeed are.

The sysctl allows only a 1->0 transition. There is no point in doing 0->1
transitions anyway, and it would be complicated to implement because we
need to re-synchronize the CPU user page tables with the current ones (we
lost track of them in the last 1->0 transition).
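
The one-way 1->0 rule can be pictured as a small state check. This is a userland sketch only: the names svs_enabled_sketch and svs_set_enabled_sketch are hypothetical, the EINVAL return value is an assumption, and the real handler additionally performs the barrier and hotpatch steps described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's flag; starts enabled, as after boot on Intel. */
static bool svs_enabled_sketch = true;

/*
 * Hypothetical setter: accept only the 1->0 transition.
 * Writing the current value is treated as a no-op; 0->1 is rejected.
 */
static int
svs_set_enabled_sketch(int newval)
{
	if (newval != 0 && newval != 1)
		return EINVAL;
	if ((newval == 1) == svs_enabled_sketch)
		return 0;		/* no change requested */
	if (newval == 1)
		return EINVAL;		/* 0->1 is not supported */
	svs_enabled_sketch = false;	/* the real code hotpatches here */
	return 0;
}
```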
 1.105 22-Feb-2018  maxv Improve the SVS initialization.

Declare x86_patch_window_open() and x86_patch_window_close(), and globalify
x86_hotpatch().

Introduce svs_enable() in x86/svs.c, that does the SVS hotpatching.

Change svs_init() to take a bool. This function gets called twice: early,
when the system has just booted (and nothing is initialized), and later, when
at least pmap_kernel has been initialized.
 1.104 22-Feb-2018  maxv Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code are of the exact same size. We put int3 (0xCC)
to make sure we never execute there.
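
The placeholder layout can be illustrated on a plain byte array. This is a sketch, assuming a short jump (opcode 0xEB with an 8-bit displacement); make_placeholder and patch_chunk are hypothetical names and operate on ordinary memory, not on kernel text as x86_hotpatch() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_REL8	0xEB	/* short jump, 8-bit displacement */
#define OP_INT3		0xCC	/* breakpoint, traps if ever executed */

/*
 * Fill a chunk of 'len' bytes with "jmp 1f; int3...; 1:".
 * The displacement is relative to the end of the 2-byte jmp,
 * so len - 2 lands exactly past the chunk.
 */
static void
make_placeholder(uint8_t *buf, size_t len)
{
	assert(len >= 2 && len - 2 <= 127);
	buf[0] = OP_JMP_REL8;
	buf[1] = (uint8_t)(len - 2);
	memset(buf + 2, OP_INT3, len - 2);
}

/* Overwrite the placeholder; both chunks must have the same size. */
static void
patch_chunk(uint8_t *dst, size_t dstlen, const uint8_t *src, size_t srclen)
{
	assert(dstlen == srclen);
	memcpy(dst, src, srclen);
}
```

Keeping the sizes identical means no surrounding code has to move when the real instructions are written in.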

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.
 1.103 17-Feb-2018  maxv Add svs_init. This is where we will detect the CPU and decide whether
to turn SVS on or not.

Add svs_pgg_update to dynamically add/remove PG_G from all the kernel
pages. Use it now.
 1.102 23-Nov-2017  kamil Restore removed sysctl(2) x86 entry: fpu_present

Hardcode it to 1 for now on i386 and amd64.

This unbreaks software that used it (e.g. LLDB).

Removal noted by <christos>

PR lib/52756 by myself
 1.101 29-Oct-2017  maxv Add a fifth region, called "head". On kaslr kernels it contains the ELF
Header and the ELF Section Headers. On normal kernels it is empty (the
headers are in the "boot" region).

Note: if you're using GENERIC_KASLR, you also need to rebuild the prekern.
 1.100 23-Oct-2017  maxv Add two XXXs, so that people don't get confused, a fifth region is needed
anyway.
 1.99 22-Oct-2017  maya Add sysctl machdep.bootmethod

Either "UEFI" or "BIOS", to mimic FreeBSD.
 1.98 09-Oct-2017  maya GC i386_fpu_present. x86 without an FPU is not supported.

Also delete newly unused send_sigill
 1.97 08-Oct-2017  maxv KASLR: add workarounds to compute the bootinfo VAs (use the direct map),
and don't use large pages yet. Both will be fixed later.
 1.96 02-Oct-2017  maxv Add a machdep.tsc_user_enable sysctl, to enable/disable the rdtsc
instruction in usermode. It defaults to enabled.
 1.95 30-Sep-2017  maxv use bootspace
 1.94 23-Sep-2017  maxv Make MTRR_GET privileged, the structures are not always zeroed (thereby
leaking information), and beyond that we are not particularly interested
in letting userland know how the kernel uses its MTRRs.
 1.93 14-Jun-2017  maxv Define MAXPHYSMEM globally.
 1.92 14-Jun-2017  maxv Fix a bug introduced in bus_space.c::r1.39. This check too is hard-coded.
Might have had a cumulative effect on PR/52000.
 1.91 14-Apr-2017  kamil branches: 1.91.4;
x86: Export fpu_save, fpu_save_size, xsave_features to dedicated sysctl nodes

Add new defines:
- CPU_FPU_SAVE (15)
int: FPU Instructions layout
* to use this, CPU_OSFXSR must be true
* 0: FSAVE
* 1: FXSAVE
* 2: XSAVE
* 3: XSAVEOPT
- CPU_FPU_SAVE_SIZE (16)
int: FPU Instruction layout size
- CPU_XSAVE_FEATURES (17)
quad: FPU XSAVE features
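
A debugger reading CPU_FPU_SAVE would need to map the integer to a save-area layout. The following sketch shows that mapping in both directions; the numeric values come from the list above, while fpu_save_name and fpu_save_value are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index == value reported by CPU_FPU_SAVE, per the list above. */
static const char *const fpu_save_names[] = {
	"FSAVE", "FXSAVE", "XSAVE", "XSAVEOPT"
};
#define FPU_SAVE_COUNT	(sizeof(fpu_save_names) / sizeof(fpu_save_names[0]))

/* Value -> name; returns "unknown" for anything outside 0..3. */
static const char *
fpu_save_name(int value)
{
	if (value < 0 || (size_t)value >= FPU_SAVE_COUNT)
		return "unknown";
	return fpu_save_names[value];
}

/* Name -> value; returns -1 if the name is not in the table. */
static int
fpu_save_value(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < FPU_SAVE_COUNT; i++) {
		if (strcmp(fpu_save_names[i], name) == 0)
			return (int)i;
	}
	return -1;
}
```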

Bump CPU_MAXID from 15 to 18.

These values were originally prepared to be exported without an ASCIIZ name,
to be used as a handler. They make FPU accessors in a debugger easier to
implement on x86 (PT_SETFPREG, PT_GETFPREG).

This interface handles all supported x86 targets. On older (i386) and
less-featured CPUs, check osfxsr first (OS uses FXSAVE/FXRSTOR).

According to sys/arch/x86/include/cpu.h r.1.65 this was prepared to be
exported beyond simple CTL_CREATE node.

Sponsored by <The NetBSD Foundation>
 1.90 24-Mar-2017  maxv Don't compile PMCs on Xen.
 1.89 14-Feb-2017  nonaka Handle persistent memory. Currently only debug output.
 1.88 14-Feb-2017  nonaka x86: make btinfo_memmap from btinfo_efimemmap to reduce mem_cluster_cnt.

should fix PR/51953.
 1.87 10-Feb-2017  maxv If the segment list is full, print a warning on the console and launch the
system with the available segments.

High memory systems may have more than VM_PHYSSEG_MAX segments; it is
better to truncate the memory and allow the system to work rather than
just panicking. The user can still increase VM_PHYSSEG_MAX (or ask us to).

Fixes issues such as PR/47093.

Note: the warning is logged but does not appear in dmesg, this too needs
to be fixed for the rest of the bootstrap procedure.
 1.86 10-Feb-2017  maxv Use macros instead of hard-coded constants. By the way, I don't think this
code is correct, but whatever.
 1.85 10-Feb-2017  maxv Import iomem_ex locally.
 1.84 09-Feb-2017  nonaka efi_md::md_virt always uses uint64_t.
 1.83 26-Jan-2017  nonaka Fix compile failure on i386 with DEBUG_MEMLOAD.
 1.82 24-Jan-2017  nonaka Initial commit of native amd64 EFI boot loader.
 1.81 10-Jan-2017  cherry branches: 1.81.2;
While reserving memory at boot time via uvm_physseg_unplug(9),
in the case of a disappearing segment (due to a segment sized msgbuf)
make sure segment offsets are read off before the segment disappears.

This should fix some of the boot time hard resets reported on i386
recently.

Thanks to kre@ for pointing this out to me.


 1.80 26-Dec-2016  cherry the i386 and amd64 boot time msgbuf init code is nearly identical.

Unify them into x86/x86_machdep.c:init_x86_msgbuf()

Boot tested on GENERIC (i386, amd64), XEN3_DOM0 (amd64)
 1.79 20-Dec-2016  maxv When the i386 port was designed, the bootstrap code needed little physical
memory, and taking it below the kernel image was fine: we had 160 free
pages, and never allocated more than 20. With amd64 however, we create a
direct map, and for this map we need a number of page table pages that is
mostly proportionate to the number of physical addresses available, which
implies that these 160 free pages may not be enough.

In particular, if the CPU does not support 1GB superpages, each 1GB chunk
of physical memory needs a 4k page in the direct map, which means that if
a machine has 160GB of ram, the bootstrap code allocates more than 160
pages, thereby overwriting the I/O mem area. If we push a little further,
if a machine has 512GB of ram, we allocate ~525 pages, and start
overwriting the kernel text, causing the system to go crazy at boot time.
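
The arithmetic behind these figures can be sketched as follows, assuming the direct map uses 2MB large pages: one 4k L2 page holds 512 entries and so covers 1GB, and one L3 page covers 512GB. The function name dmap_pt_pages is hypothetical, and the count ignores every other bootstrap allocation, which presumably accounts for the gap between this estimate and the ~525 quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define PT_ENTRIES	512	/* entries per 4k page-table page on amd64 */

/*
 * Estimate the page-table pages needed to direct-map 'ram_gb' GB
 * without 1GB superpages: one L2 page per 1GB chunk, plus one L3
 * page per 512GB (rounded up).
 */
static uint64_t
dmap_pt_pages(uint64_t ram_gb)
{
	assert(ram_gb > 0);

	uint64_t l2_pages = ram_gb;
	uint64_t l3_pages = (ram_gb + PT_ENTRIES - 1) / PT_ENTRIES;

	return l2_pages + l3_pages;
}
```

With 160GB the estimate already exceeds the 160 free pages below the kernel image, which matches the overflow described in the message.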

Fix this by moving the physical allocation area from below the kernel to above
it. avail_start is now beyond the kernel, and lowmem_rsvd indicates the
reserved low-memory pages. The area [lowmem_rsvd; IOM_BEGIN[ is
internalized into UVM, so there is no pa loss.

The only limit now is the pa of LAPIC, which is located at ~4GB of memory,
so it is perfectly fine.

This change theoretically adds va support for 512GB of ram; and it is a
prerequisite if we want to support more memory anyway.
 1.78 20-Dec-2016  maxv Depend on KERNTEXTOFF - KERNBASE, not IOM_END, both are equal but the text
address may change in the future.
 1.77 25-Nov-2016  maxv Initialize the module map limits in amd64, not x86.

For the record: normally we could enable this code on Xen, since the
bootstrap layout is globally the same. But there appears to be an issue
in xen_locore, since any kenter in the area after kern_end triggers a
KASSERT because the va is already busy.
 1.76 15-Nov-2016  maxv Initialize kern_end in amd64 instead of x86.
 1.75 01-Aug-2016  maxv This panic is wrong. There could be two consecutive clusters below
avail_start.
 1.74 17-Jul-2016  maxv Simplify x86_add_cluster.
 1.73 16-Jul-2016  maxv KNF, and rename.
 1.72 16-Jul-2016  maxv Simplify the way physical pages are internalized into the VM system on x86.
Only two functions are called now: init_x86_clusters, which initializes the
memory clusters from the bootinfo, and init_x86_vm, which inserts the pages
from the clusters into VM.
 1.71 16-Jul-2016  maxv Introduce x86_load_region(), and explain a little what we are doing.
 1.70 28-Jan-2016  jnemeth branches: 1.70.2;
move #ifdef notyet to encompass all relevant parts
 1.69 28-Jan-2016  christos fix previous commit that ate all 4's, and add aprint_btinfo()
 1.68 28-Jan-2016  christos just whitespace.
 1.67 11-Aug-2014  jnemeth branches: 1.67.4;
Add the infrastructure for MODULAR support for Xen kernels. At
the moment, this can only load very simple modules due to missing
symbols. It is being added at this time to make pullups to the
netbsd-7 branch easier. It is not enabled by default in any kernels.
 1.66 24-Jul-2014  riastradh Add a FIRST1G page freelist to x86, for old graphics devices.
 1.65 12-Jun-2014  riastradh Tweak x86 page freelists and add x86_select_freelist.

- Add 4G freelist to i386 -- there may be higher addresses if PAE.
- Add 64G and 1T freelists to amd64.
- Simplify freelist setup code and condense it into a table.
- Add x86_select_freelist to get a freelist guaranteed to yield
addresses no greater than a prescribed maximum address.

x86_select_freelist takes a uint64_t, not a paddr_t or bus_addr_t, so
that you can pass in, e.g., a 36-bit maximum address without needing
to write conditionals for i386/PAE.
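
A table-driven selection of this kind might look like the sketch below. The freelist identifiers, the table contents and the name select_freelist_sketch are all assumptions made for illustration; only the idea of passing a uint64_t maximum address and getting back a freelist whose pages all lie at or below it comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical freelist identifiers, for illustration only. */
enum { FL_NONE = -1, FL_1T, FL_64G, FL_4G, FL_16M };

/* Each freelist holds pages strictly below 'limit'; largest limit first. */
static const struct {
	uint64_t limit;
	int freelist;
} fl_table[] = {
	{ UINT64_C(1) << 40, FL_1T },
	{ UINT64_C(1) << 36, FL_64G },
	{ UINT64_C(1) << 32, FL_4G },
	{ UINT64_C(1) << 24, FL_16M },
};

/*
 * Return the largest freelist whose highest address (limit - 1) does
 * not exceed 'maxaddr', or FL_NONE if even the smallest is too large.
 */
static int
select_freelist_sketch(uint64_t maxaddr)
{
	for (size_t i = 0; i < sizeof(fl_table) / sizeof(fl_table[0]); i++) {
		if (fl_table[i].limit - 1 <= maxaddr)
			return fl_table[i].freelist;
	}
	return FL_NONE;
}
```

Because the argument is a plain uint64_t, a caller can pass a 36-bit limit such as (1 << 36) - 1 on any port without caring how wide paddr_t is.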

No objections on port-x86:

https://mail-index.netbsd.org/port-i386/2014/05/21/msg003277.html
https://mail-index.netbsd.org/port-amd64/2014/05/21/msg002062.html
 1.64 01-Apr-2014  dsl branches: 1.64.2;
Revert most of the machdep sysctls to 32bit
 1.63 23-Feb-2014  dsl Rename (the recently added) 'x86_xsave_size' to 'x86_fpu_save_size'
and default to 512 (the size of the fxsave structure).
 1.62 23-Feb-2014  dsl Determine whether the cpu supports xsave (and hence AVX).
The result is only written to sysctl nodes at the moment.
I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
Completely common up the i386 and amd64 machdep sysctl creation.
 1.61 05-Oct-2013  rmind Remove some unused variables.
 1.60 31-Aug-2013  jmcneill md_root_setconf also depends on option MEMORY_DISK_DYNAMIC
 1.59 30-Aug-2013  jmcneill Add support for using a raw file-system image as memory disk root with
the x86 bootloader.
 1.58 12-Apr-2013  christos branches: 1.58.4;
de-duplication police arrests sysctl.
 1.57 28-Nov-2011  tls branches: 1.57.8;

Add support for passing saved entropy (random seed file) to the kernel
from the bootloader. This can fix the problem of poor quality keys
for other kernel modules which call arc4random() early in kernel startup
(NFS startup, in particular, causes this).

We continue to rely on the etc/rc.d/random_seed script to save entropy
to the seed file at shutdown and erase the seed file at startup.

Boot loader support implemented only for i386 and amd64 ports for now but
it should be easy for other ports to do the same or similar.
 1.56 13-Aug-2011  christos branches: 1.56.2;
Always provide a meaningful short name for the kobj in the error message,
as well as the function name and the line number, without extra line feeds.
 1.55 11-Aug-2011  cherry Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs
 1.54 10-Aug-2011  cherry xen ipi infrastructure
 1.53 01-Aug-2011  jmcneill x86_reset: use acpi_reset instead of AcpiReset
 1.52 31-Jul-2011  jmcneill x86_reset: If the FADT defines a reset register and ACPI was active, try
to use it to reset the system before attempting any other methods
 1.51 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.50 31-May-2011  dyoung branches: 1.50.2;
Don't use the C preprocessor to configure USERCONF. Instead, either do
or do not link in subr_userconf.c and x86_userconf.c.

Provide no-op stubs for userconf_bootinfo(), userconf_init(), and
userconf_prompt().

Delete all occurrences of #include "opt_userconf.h" as well as USERCONF
and __HAVE_USERCONF_BOOTINFO #ifdef'age.
 1.49 26-May-2011  para typo in comment
 1.48 26-May-2011  para put userconf_bootinfo under option USERCONF, to allow kernels without that option
 1.47 26-May-2011  uebayasi Support userconf(4) command in boot(8)/boot.cfg(5) on i386/amd64.

From jmmv@, no objections seen in the proposed thread:

http://mail-index.netbsd.org/tech-kern/2009/01/22/msg004081.html
 1.46 21-Mar-2011  rmind cpu_need_resched: make AST if no __HAVE_PREEMPTION. Change has no effect
since MP option is mandatory on x86, but makes code more logical.
 1.45 06-Feb-2011  jmcneill - add support for using compressed images as splash images
- retire SPLASHSCREEN_PROGRESS and SPLASHSCREEN_IMAGE options
 1.44 21-Oct-2010  yamt branches: 1.44.2; 1.44.4;
don't forget to call nmi_init.
 1.43 23-Aug-2010  jruoho Other entry points beyond x86_cpu_idle_halt() may use HLT as the
idle-mechanism. Send an IPI also for these in cpu_need_resched().
 1.42 18-Jul-2010  jruoho Merge a driver for ACPI CPUs with basic support for processor power states,
also known as C-states. The code is modular and provides an easy way to add
the remaining functionality later (namely throttling and P-states).

Remarks:

1. Commented out in the GENERICs; more testing exposure is needed.

2. The C3-state is disabled for the time being because it turns off
timers, among them the local APIC timer. This may not be universally
true on all x86 processors; define ACPICPU_ENABLE_C3 to test.

3. The algorithm used to choose a power state may need tuning. When
evaluating the appropriate state, the implementation uses the
previous sleep time as an indicator. Additional hints would include
for example the system load.

Also bus master activity is evaluated when choosing a state. The
usb(4) stack is notorious for such activity even when unused.
Typically it must be disabled in order to reach the C3-state,
but it may also prevent the use of C2.

4. While no extensive empirical measurements have been carried out, the
power savings are somewhere between 1-2 W with C1 and C2, depending
on the processor, firmware, and load. With C3 even up to 4 W can be
saved. The less something ticks, the more power is saved.

ok jmcneill@, joerg@, and discussed with various people.
 1.41 02-Jun-2010  joerg Restore PHYSMEM_MAX* options (hi cegger!)
 1.40 18-Apr-2010  jym This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.
 1.39 09-Feb-2010  jym branches: 1.39.2;
Wrap a comment; add a space after a comma to another (align with next line)
 1.38 09-Jan-2010  cegger branches: 1.38.2;
TOPLIMIT for i386 PAE is 64GB.
 1.37 22-Nov-2009  bouyer For amd64, introduce a third free list distinct from the default free list
for memory between 16M and 4G. On large memory machine, this avoids
the 32bit-accessible memory being eaten by various kernel early allocation,
causing 32bit bus_dma(9) memory allocation to fail at boot time.
Tested on a system with 48GB RAM; based on netbsd-5 patch proposed on
port-amd64 3 days ago.
 1.36 21-Nov-2009  rmind Use lwp_getpcb() on x86 MD code, clean from struct user usage.
 1.35 06-Oct-2009  elad Add a (weak aliased) machdep_init() as a place to do machdep initialization
that can't happen as early as the other init functions as called from
cpu_startup() -- for example, register kauth(9) listeners.

Put unprivileged policy in the x86 code; used by i386, amd64, and xen.
 1.34 05-Oct-2009  rmind Remove X86_IPI_WRITE_MSR (and msr_ipifuncs.c), replace all uses in drivers
with xc_broadcast(). AMD K8 PowerNow driver tested by <jakllsch>, thanks!

Closes PR/37665.
 1.33 05-Aug-2009  jym Add Intel SpeedStep and AMD PowerNow! support in Xen dom0. MSR operations
are now compiled in by default.

Note that MSR support in Xen depends on its version. rdmsr() should always
succeed, but wrmsr() to certain registers can end in a NOOP. In that case,
the error will be logged (see xm dmesg).

Setting CPU frequency (SpeedStep) requires Xen 3.3 with the option
cpufreq="dom0-kernel" passed down to hypervisor during boot.

Compiled and tested for SpeedStep under i386 for XEN3_DOM0 and XEN3PAE_DOM0
by jym@. amd64 was tested by Joel Carnat.

See also http://mail-index.netbsd.org/port-xen/2009/08/02/msg005213.html .

Commit requested by bouyer@.
 1.32 20-Jun-2009  cegger make this build with DEBUG_MEMLOAD in all combinations of 32bit, 32bit PAE and 64bit
 1.31 21-Mar-2009  ad Fix 'boot -z' bogons.
 1.30 13-Feb-2009  apb Use "defopt MODULAR" in sys/conf/files, and #include "opt_modular.h"
in all kernel sources that use the MODULAR option.
Proposed in tech-kern on 18 Jan 2009.
 1.29 27-Jan-2009  christos branches: 1.29.2;
add a couple of include files
 1.28 27-Jan-2009  christos factor out common reset code.
 1.27 15-Dec-2008  cegger cleanup BIOS memmap code:
- get rid of some nested externs
- reduce dependency on global variables
- some preparations for upcoming pmem(9)
 1.26 03-Dec-2008  ad Don't abort pageidlezero unless a realtime thread wants to run.
 1.25 14-Nov-2008  cegger merge BIOS memmap code from i386/i386/machdep.c:init386() and amd64/amd64/machdep.c:init_x86_64 into x86/x86/x86_machdep.c
 1.24 12-Nov-2008  ad Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.23 09-May-2008  joerg branches: 1.23.4; 1.23.6; 1.23.8;
Only check for hlt on !Xen. This needs to be reviewed when Xen gets SMP
support.
 1.22 09-May-2008  joerg Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
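
The selection order described here (Xen first, then MWAIT on non-AMD CPUs, otherwise HLT) can be sketched with a function pointer. All names below are hypothetical stand-ins with empty bodies; the real idle routines and the way CPU features are probed are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*idle_fn)(void);

/* Empty stand-ins for the real idle routines. */
static void idle_xen_sketch(void) { }
static void idle_mwait_sketch(void) { }
static void idle_halt_sketch(void) { }

/* The hook that a cpu_idle() macro would call through. */
static idle_fn idle_hook_sketch;

/* Pick the idle routine in the order given in the commit message. */
static idle_fn
select_idle_sketch(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return idle_xen_sketch;
	if (has_mwait && !is_amd)
		return idle_mwait_sketch;
	return idle_halt_sketch;
}

/* Equivalent of the macro: call whatever routine was selected. */
static void
cpu_idle_sketch(void)
{
	assert(idle_hook_sketch != NULL);
	idle_hook_sketch();
}
```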
 1.21 02-May-2008  ad branches: 1.21.2;
- Give x86 BIOS boot the ability to load new style modules and pass them
into the kernel. Based on a patch by jmcneill@, with many fixes and
improvements by me.

- Put MEMORY_DISK_DYNAMIC and MODULAR into the GENERIC kernels, so that
you can load miniroot.kmod from the boot blocks and boot into the
installer!
 1.20 30-Apr-2008  ad Avoid unneeded AST faults.
 1.19 29-Apr-2008  yamt sprinkle KASSERT(kpreempt_disabled()).
 1.18 29-Apr-2008  yamt make cpu_intr_p preemption safe.
 1.17 28-Apr-2008  ad Add support for kernel preemption to the i386 and amd64 ports. Notes:

- I have seen one isolated panic in the x86 pmap, but otherwise i386
seems stable with preemption enabled.

- amd64 is missing the FPU handling changes and it's not yet safe to
enable it there.

- The usual level for kern.sched.kpreempt_pri will be 128 once enabled
by default. For testing, setting it to 0 helps to shake out bugs.
 1.16 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.15 06-Dec-2007  ad branches: 1.15.12; 1.15.14; 1.15.16;
Share cpu_intr_p() with xen. Why xen has its own intr.c is a mystery.
 1.14 12-Nov-2007  ad branches: 1.14.2;
Merge cpu_need_resched() from vmlocking:

- Always do an aston(), even if not sending an IPI. May help with xine.
- Post asts on cpu_onproc, not ci_curlwp.
 1.13 17-Oct-2007  garbled branches: 1.13.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.12 26-Sep-2007  ad branches: 1.12.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.
 1.11 29-Aug-2007  ad branches: 1.11.2;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.
 1.10 21-Mar-2007  xtraeme branches: 1.10.4; 1.10.8; 1.10.12; 1.10.14;
Don't build msr_ipifuncs on Xen, fixes the build with XEN2_DOM0.
 1.9 20-Mar-2007  xtraeme MSR read and write IPI handlers for x86. A MSR will be read or written
in all CPUs available in the system. This adds another member
to struct cpu_info, ci_msr_rvalue; it will contain the value of the MSR
in a previous operation.

Tested with clockmod in UP and SMP by me, tested with est in SMP
by Daniel Carosone and Michael Van Elst.

Ok'ed by Andrew Doran and Matthew R. Green.
 1.8 01-Mar-2007  yamt branches: 1.8.2; 1.8.4; 1.8.6;
check_pa_acc: don't bother to use KAUTH_MACHDEP_UNMANAGEDMEM
if the address is known.
no functional changes, unless listeners do some kind of logging.
 1.7 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.6 14-Jan-2007  ad branches: 1.6.2;
On second thought, implement x86_pause() as a regular function. The small
delay from the call is useful for spinlock backoff.
 1.5 26-Dec-2006  elad Make machdep scope architecture-agnostic by removing all arch-specific
requests and centralizing them all. The result is that some of these
are not used on some architectures, but the documentation was updated
to reflect that.
 1.4 22-Nov-2006  elad branches: 1.4.2;
Introduce KAUTH_REQ_MACHDEP_{ALPHA,X86}_UNMANAGEDMEM to handle access
to unmanaged memory.

These are the last two securelevel references in the MD code.
 1.3 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.2 30-Oct-2006  elad Move i386/amd64 common code (check_pa_acc()) to x86.

I didn't know what header to put the prototype in, so it's both in
i386/mem.c and amd64/mem.c; probably can be moved later.

Tested on amd64, assumed working on i386. :)

yamt@ okay
 1.1 30-Dec-2005  jmmv branches: 1.1.18; 1.1.22; 1.1.24;
Add a 'struct bootinfo' to represent the bootinfo structure used in the
kernel by x86 platforms (instead of a simple char *). This way, the code
in, e.g., lookup_bootinfo, is a bit easier to understand.

While here, move the lookup_bootinfo function used in x86 platforms (amd64,
i386 and xen) to a common file (x86/x86_machdep.c), as it was exactly the
same in all of them.
 1.1.24.1 10-Dec-2006  yamt sync with head.
 1.1.22.3 01-Feb-2007  ad Sync with head.
 1.1.22.2 12-Jan-2007  ad Sync with head.
 1.1.22.1 18-Nov-2006  ad Sync with head.
 1.1.18.8 07-Dec-2007  yamt sync with head
 1.1.18.7 15-Nov-2007  yamt sync with head.
 1.1.18.6 27-Oct-2007  yamt sync with head.
 1.1.18.5 03-Sep-2007  yamt sync with head.
 1.1.18.4 26-Feb-2007  yamt sync with head.
 1.1.18.3 30-Dec-2006  yamt sync with head.
 1.1.18.2 21-Jun-2006  yamt sync with head.
 1.1.18.1 30-Dec-2005  yamt file x86_machdep.c was added on branch yamt-lazymbuf on 2006-06-21 14:58:06 +0000
 1.4.2.3 22-Apr-2007  bouyer Apply patch to make Xen kernels build again; provided by xtraeme as part of
ticket #575
 1.4.2.2 20-Apr-2007  bouyer Pull up following revision(s) (requested by mlelstv in ticket #575):
sys/arch/i386/i386/est.c sync with 1.37
sys/arch/i386/i386/ipifuncs.c sync with 1.16
sys/arch/x86/include/cpu_msr.h sync with 1.4
sys/arch/x86/include/intrdefs.h sync with 1.8
sys/arch/x86/include/powernow.h sync with 1.9
sys/arch/x86/x86/powernow_k8.c sync with 1.20
sys/arch/x86/x86/msr_ipifuncs.c sync with 1.8
sys/arch/amd64/amd64/ipifuncs.c sync with 1.9
sys/arch/i386/i386/identcpu.c patch
sys/arch/i386/i386/machdep.c patch
sys/arch/i386/include/cpu.h patch
sys/arch/x86/conf/files.x86 patch
sys/arch/x86/x86/x86_machdep.c patch
sys/arch/amd64/amd64/machdep.c patch
Add MSR write IPI handler for x86. Use it and the RUN_ONCE framework
to make est and powernow drivers work properly with SMP.
 1.4.2.1 06-Jan-2007  bouyer Pull up following revision(s) (requested by elad in ticket #316):
share/examples/secmodel/secmodel_example.c: revision 1.10 via patch
sys/arch/i386/i386/sys_machdep.c: revision 1.79
sys/arch/amd64/amd64/netbsd32_machdep.c: revision 1.31
share/man/man9/secmodel_bsd44.9: revision 1.9
sys/arch/vax/vax/mem.c: revision 1.34 via patch
sys/arch/sh3/sh3/mem.c: revision 1.23 via patch
sys/arch/sh5/sh5/mem.c: revision 1.14 via patch
sys/secmodel/bsd44/secmodel_bsd44_suser.c: revision 1.22 via patch
sys/arch/powerpc/powerpc/mem.c: revision 1.27 via patch
sys/arch/x86/x86/x86_machdep.c: revision 1.5
sys/arch/alpha/alpha/machdep.c: revision 1.291
sys/arch/arm/arm32/mem.c: revision 1.17 via patch
sys/secmodel/bsd44/secmodel_bsd44_securelevel.c: revision 1.20
sys/sys/kauth.h: revision 1.29 via patch
sys/arch/amd64/amd64/sys_machdep.c: revision 1.10
share/man/man9/kauth.9: revision 1.43 via patch
sys/arch/xen/i386/sys_machdep.c: revision 1.10
sys/kern/kern_auth.c: revision 1.35
sys/arch/pc532/pc532/mem.c: revision 1.43 via patch
Make machdep scope architecture-agnostic by removing all arch-specific
requests and centralizing them all. The result is that some of these
are not used on some architectures, but the documentation was updated
to reflect that.
 1.6.2.3 24-Mar-2007  yamt sync with head.
 1.6.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.6.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.8.6.1 29-Mar-2007  reinoud Pullup to -current
 1.8.4.1 11-Jul-2007  mjf Sync with head.
 1.8.2.6 03-Dec-2007  ad Sync with HEAD.
 1.8.2.5 03-Dec-2007  ad Sync with HEAD.
 1.8.2.4 01-Nov-2007  ad - Fix interactivity problems under high load. Because soft interrupts
are being stacked on top of regular LWPs, more often than not aston()
was being called on a soft interrupt thread instead of a user thread,
meaning that preemption was not happening on EOI.

- Don't use bool in a couple of data structures. Sub-word writes are not
always atomic and may clobber other fields in the containing word.

- For SCHED_4BSD, make p_estcpu per thread (l_estcpu). Rework how the
dynamic priority level is calculated - it's much better behaved now.

- Kill the l_usrpri/l_priority split now that priorities are no longer
directly assigned by tsleep(). There are three fields describing LWP
priority:

l_priority: Dynamic priority calculated by the scheduler.
This does not change for kernel/realtime threads,
and always stays within the correct band. Eg for
timeshared LWPs it never moves out of the user
priority range. This is basically what l_usrpri
was before.

l_inheritedprio: Lent to the LWP due to priority inheritance
(turnstiles).

l_kpriority: A boolean value set true the first time an LWP
sleeps within the kernel. This indicates that the LWP
should get a priority boost as compensation for blocking.
lwp_eprio() now does the equivalent of sched_kpri() if
the flag is set. The flag is cleared in userret().

- Keep track of scheduling class (OTHER, FIFO, RR) in struct lwp, and use
this to make decisions in a few places where we previously tested for a
kernel thread.

- Partially fix itimers and usr/sys/intr time accounting in the presence
of software interrupts.

- Use kthread_create() to create idle LWPs. Move priority definitions
from the various modules into sys/param.h.

- newlwp -> lwp_create
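
The note above about bool fields and sub-word writes can be illustrated with a
single-threaded simulation. This is only a sketch under stated assumptions: it
models a machine that has no byte store and must update a byte by loading,
modifying and storing the whole containing word. All names are hypothetical and
none of this is taken from the NetBSD sources.

```c
#include <assert.h>
#include <stdint.h>

/* Return `word` with byte number `idx` (0 = least significant) set to `v`. */
static uint32_t
word_set_byte(uint32_t word, unsigned idx, uint8_t v)
{
	assert(idx < 4);
	unsigned shift = idx * 8;
	return (word & ~((uint32_t)0xff << shift)) | ((uint32_t)v << shift);
}

/* Extract byte number `idx` from `word`. */
static uint8_t
word_get_byte(uint32_t word, unsigned idx)
{
	assert(idx < 4);
	return (uint8_t)(word >> (idx * 8));
}

/*
 * Simulate two CPUs updating different bytes of one shared word with
 * load / modify / store sequences that interleave badly.
 * Returns the final value of the shared word.
 */
static uint32_t
simulate_lost_update(uint32_t shared)
{
	uint32_t a = shared;            /* CPU A loads the whole word */
	uint32_t b = shared;            /* CPU B loads the whole word */

	a = word_set_byte(a, 0, 1);     /* A sets its "bool" in byte 0 */
	b = word_set_byte(b, 1, 0x7f);  /* B updates the neighbour in byte 1 */

	shared = b;                     /* B stores first ... */
	shared = a;                     /* ... then A stores its stale copy */
	return shared;
}
```

Starting from a zero word, byte 0 ends up set while the update to byte 1 is
lost. Giving each independently written field its own full word avoids sharing
a containing word in the first place.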
 1.8.2.3 09-Oct-2007  ad Sync with head.
 1.8.2.2 10-Apr-2007  ad Sync with head.
 1.8.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.10.14.2 09-Jan-2008  matt sync with HEAD
 1.10.14.1 06-Nov-2007  matt sync with HEAD
 1.10.12.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.10.12.3 14-Nov-2007  joerg Sync with HEAD.
 1.10.12.2 02-Oct-2007  joerg Sync with HEAD.
 1.10.12.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.10.8.1 03-Sep-2007  skrll Sync with HEAD.
 1.10.4.1 03-Oct-2007  garbled Sync with HEAD
 1.11.2.1 06-Oct-2007  yamt sync with head.
 1.12.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.13.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.13.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.14.2.1 08-Dec-2007  ad Sync with head.
 1.15.16.7 09-Oct-2010  yamt sync with head
 1.15.16.6 11-Aug-2010  yamt sync with head.
 1.15.16.5 11-Mar-2010  yamt sync with head
 1.15.16.4 19-Aug-2009  yamt sync with head.
 1.15.16.3 18-Jul-2009  yamt sync with head.
 1.15.16.2 04-May-2009  yamt sync with head.
 1.15.16.1 16-May-2008  yamt sync with head.
 1.15.14.1 18-May-2008  yamt sync with head.
 1.15.12.2 17-Jan-2009  mjf Sync with HEAD.
 1.15.12.1 02-Jun-2008  mjf Sync with HEAD.
 1.21.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.23.8.4 02-Feb-2009  snj Apply patch (requested by he in ticket #396):
Include dev/isa/isareg.h for IO_KBD.
 1.23.8.3 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #396):
sys/arch/x86/x86/x86_machdep.c: revision 1.29
add a couple of include files
 1.23.8.2 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #396):
sys/arch/amd64/amd64/machdep.c: revision 1.122
sys/arch/i386/i386/machdep.c: revision 1.657
sys/arch/x86/include/cpufunc.h: revision 1.11
sys/arch/x86/x86/x86_machdep.c: revision 1.28
factor out common reset code.
 1.23.8.1 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #345):
sys/arch/amd64/amd64/genassym.cf: revision 1.39
sys/arch/i386/i386/genassym.cf: revision 1.79
sys/arch/i386/i386/locore.S: revision 1.82
sys/arch/x86/x86/x86_machdep.c: revision 1.26
Don't abort pageidlezero unless a realtime thread wants to run.
 1.23.6.3 28-Apr-2009  skrll Sync with HEAD.
 1.23.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.23.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.23.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.29.2.6 27-Aug-2011  jym Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen
work of cherry@.

No regression observed on suspend/restore.
 1.29.2.5 28-Mar-2011  jym Sync with HEAD. TODO before merge:
- shortcut for suspend code in sysmon, when powerd(8) is not running.
Borrow ``xs_watch'' thread context?
- bug hunting in xbd + xennet resume. Rings are currently thrashed upon
resume, so current implementation force flush them on suspend. It's not
really needed.
 1.29.2.4 24-Oct-2010  jym Sync with HEAD
 1.29.2.3 01-Nov-2009  jym Sync with HEAD.
 1.29.2.2 23-Jul-2009  jym Sync with HEAD.
 1.29.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.38.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.38.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.38.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.39.2.7 12-Jun-2011  rmind sync with head
 1.39.2.6 31-May-2011  rmind sync with head
 1.39.2.5 21-Apr-2011  rmind sync with head
 1.39.2.4 05-Mar-2011  rmind sync with head
 1.39.2.3 03-Jul-2010  rmind sync with head
 1.39.2.2 30-May-2010  rmind sync with head
 1.39.2.1 18-Mar-2010  rmind Unify /dev/{mem,kmem,zero,null} implementations in MI code. Based on patch
from Joerg Sonnenberger, proposed on tech-kern@, in February 2008.

Work and depression still in progress.
 1.44.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.44.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.50.2.3 21-Oct-2011  bouyer Make this build without 'options MULTIPROCESSOR'
 1.50.2.2 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.50.2.1 03-Jun-2011  cherry Initial import of xen MP sources, with kernel and userspace tests.
- this is a source preview.
- boots to single user.
- spurious interrupt and pmap related panics are normal
 1.56.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.56.2.1 17-Apr-2012  yamt sync with head
 1.57.8.3 03-Dec-2017  jdolecek update from HEAD
 1.57.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.57.8.1 23-Jun-2013  tls resync from head
 1.58.4.1 18-May-2014  rmind sync with head
 1.64.2.1 10-Aug-2014  tls Rebase.
 1.67.4.5 28-Aug-2017  skrll Sync with HEAD
 1.67.4.4 05-Feb-2017  skrll Sync with HEAD
 1.67.4.3 05-Dec-2016  skrll Sync with HEAD
 1.67.4.2 05-Oct-2016  skrll Sync with HEAD
 1.67.4.1 19-Mar-2016  skrll Sync with HEAD
 1.70.2.5 26-Apr-2017  pgoyette Sync with HEAD
 1.70.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.70.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.70.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.70.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.81.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.91.4.5 09-Mar-2019  martin Pull up following revision(s) via patch (requested by nonaka in ticket #1210):

sys/dev/hyperv/vmbusvar.h: revision 1.1
sys/dev/hyperv/hvs.c: revision 1.1
sys/dev/hyperv/if_hvn.c: revision 1.1
sys/dev/hyperv/vmbusic.c: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.69
sys/arch/x86/isa/clock.c: revision 1.34
sys/arch/x86/include/intrdefs.h: revision 1.22
sys/arch/i386/conf/GENERIC: revision 1.1201
sys/arch/x86/x86/hyperv.c: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.105
sys/arch/x86/x86/x86_machdep.c: revision 1.124
sys/arch/i386/conf/GENERIC: revision 1.1203
sys/arch/amd64/amd64/genassym.cf: revision 1.74
sys/arch/i386/conf/GENERIC: revision 1.1204
sys/arch/amd64/conf/GENERIC: revision 1.520
sys/arch/x86/x86/hypervreg.h: revision 1.1
sys/arch/amd64/amd64/vector.S: revision 1.69
sys/dev/hyperv/hvshutdown.c: revision 1.1
sys/dev/hyperv/hvshutdown.c: revision 1.2
sys/dev/usb/if_urndisreg.h: file removal
sys/arch/x86/x86/cpu.c: revision 1.167
sys/arch/x86/conf/files.x86: revision 1.107
sys/dev/usb/if_urndis.c: revision 1.20
sys/dev/hyperv/vmbusicreg.h: revision 1.1
sys/dev/hyperv/hvheartbeat.c: revision 1.1
sys/dev/hyperv/vmbusicreg.h: revision 1.2
sys/dev/hyperv/hvheartbeat.c: revision 1.2
sys/dev/hyperv/files.hyperv: revision 1.1
sys/dev/ic/rndisreg.h: revision 1.1
sys/arch/i386/i386/genassym.cf: revision 1.111
sys/dev/ic/rndisreg.h: revision 1.2
sys/dev/hyperv/hyperv_common.c: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.1
sys/dev/hyperv/hypervreg.h: revision 1.1
sys/dev/hyperv/hvtimesync.c: revision 1.2
sys/dev/hyperv/vmbusicvar.h: revision 1.1
sys/dev/hyperv/if_hvnreg.h: revision 1.1
sys/arch/x86/x86/lapic.c: revision 1.70
sys/arch/amd64/amd64/vector.S: revision 1.70
sys/dev/ic/ndisreg.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.516
sys/dev/hyperv/hypervvar.h: revision 1.1
sys/arch/amd64/conf/GENERIC: revision 1.518
sys/arch/amd64/conf/GENERIC: revision 1.519
sys/arch/i386/conf/files.i386: revision 1.400
sys/dev/acpi/vmbus_acpi.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.1
sys/dev/hyperv/vmbus.c: revision 1.2
sys/arch/x86/x86/intr.c: revision 1.144
sys/arch/i386/i386/vector.S: revision 1.83
sys/arch/amd64/conf/files.amd64: revision 1.112

separate RNDIS definitions from urndis(4) for use with Hyper-V NetVSC.

-

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.
The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.

-

Add __diagused.

-

PR/53984: Partial revert of modify lapic_calibrate_timer() in lapic.c r1.69.

-

Update Hyper-V related drivers description.

-

Remove unused definition.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
NFCI intended.

-

commented out hvkvp entry.

-

fix typo. pointed out by pgoyette@n.o.

-

Use IDTVEC instead of NENTRY for handle_hyperv_hypercall.

-

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.
 1.91.4.4 23-Jun-2018  martin Pull up the following, via patch, requested by maxv in ticket #897:

sys/arch/amd64/amd64/locore.S 1.166 (patch)
sys/arch/i386/i386/locore.S 1.157 (patch)
sys/arch/x86/include/cpu.h 1.92 (patch)
sys/arch/x86/include/fpu.h 1.9 (patch)
sys/arch/x86/x86/fpu.c 1.33-1.39 (patch)
sys/arch/x86/x86/identcpu.c 1.72 (patch)
sys/arch/x86/x86/vm_machdep.c 1.34 (patch)
sys/arch/x86/x86/x86_machdep.c 1.116,1.117 (patch)

Support eager fpu switch, to work around INTEL-SA-00145.
Provide a sysctl machdep.fpu_eager, which gets automatically
initialized to 1 on affected CPUs.
 1.91.4.3 09-Jun-2018  martin Pullup the following revisions, requested by maxv in ticket #865:

sys/arch/amd64/amd64/machdep.c 1.303 (patch)
sys/arch/amd64/conf/GENERIC 1.492 (patch)
sys/arch/amd64/conf/files.amd64 1.103 (patch)
sys/arch/i386/i386/machdep.c 1.806 (patch)
sys/arch/i386/conf/GENERIC 1.1179 (patch)
sys/arch/i386/conf/files.i386 1.393 (patch)
sys/arch/x86/include/cpu.h 1.91 (patch)
sys/arch/x86/include/specialreg.h upto 1.126 (patch)
sys/arch/x86/x86/x86_machdep.c upto 1.115 (patch, adapted)
sys/arch/x86/x86/spectre.c upto 1.19 (patch, adapted,
no IBRS,
SpectreV2 mitigations not
enabled by default)

Backport the hardware SpectreV2 and SpectreV4 mitigations.
 1.91.4.2 22-Mar-2018  martin Pull up the following revisions, requested by maxv in ticket #652:

sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch)
sys/arch/amd64/amd64/db_machdep.c 1.6 (patch)
sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch)
sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch)
sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch)
sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch)
sys/arch/amd64/amd64/vector.S upto 1.61 (partial, patch)
sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch)
sys/arch/amd64/conf/kern.ldscript 1.26 (patch)
sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch)
sys/arch/amd64/include/param.h 1.25 (patch)
sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch)
sys/arch/x86/conf/files.x86 1.91,1.93 (patch)
sys/arch/x86/include/cpu.h 1.88,1.89 (patch)
sys/arch/x86/include/pmap.h 1.75 (patch)
sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch)
sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch)
sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch)
sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch)
sys/arch/x86/x86/svs.c 1.1-1.14
sys/arch/xen/conf/files.compat 1.30 (patch)

Backport SVS. Not enabled yet.
 1.91.4.1 21-Jun-2017  snj Pull up following revision(s) (requested by maxv in ticket #42):
sys/arch/amd64/conf/kern.ldscript: revision 1.23
sys/arch/x86/x86/x86_machdep.c: revision 1.92
Fix a pretty dumb mistake I made in r1.22: the alignment needs to be in the
bss, otherwise the bootloader will use memory before __kernel_end and give
a wrong start pa to the kernel.
This issue was investigated by Anthony Mallet. Should fix PR/52000.
--
Fix a bug introduced in bus_space.c::r1.39. This check too is hard-coded.
Might have had a cumulative effect on PR/52000.
 1.108.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.108.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.108.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.108.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.108.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.108.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.117.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.117.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.117.2.1 10-Jun-2019  christos Sync with HEAD
 1.127.2.1 29-Dec-2019  martin Pull up following revision(s) (requested by pgoyette in ticket #587):

sys/arch/x86/x86/x86_machdep.c: revision 1.134

to know this in order to set module_machine correctly, which in turn is
needed to set the module_base path from which modules are loaded and
which provides the value of sysctl(8) variable kern.module.path

Thanks to jnemeth@ for pointing out the problem.
 1.134.2.1 29-Feb-2020  ad Sync with head.
 1.137.2.5 18-Apr-2020  bouyer Centralize initialisations of delay_func and initclock_func
in x86_machdep.c and export from <x86/machdep.h>
Introduce a x86_dummy_initclock() and a x86_cpu_initclock_func pointer,
to be used later for Xen HVM native clock support.
rename rtclock_tval to x86_rtclock_tval and export from <x86/machdep.h>,
for the benefit of lapic.c
 1.137.2.4 16-Apr-2020  bouyer More #ifndef XEN -> #ifndef XENPV
 1.137.2.3 11-Apr-2020  bouyer Move softint and preemption-related functions out of x86/x86/intr.c to
its own file, x86/x86/x86_softintr.c
Add x86/x86/x86_softintr.c for native and XenPV
Make sure XenPV also checks ci_ipending, which is used for softints.
Switch XenPV to fast softints and allow kernel preemption.
kpreempt_disable() before calling pmap_changeprot_local()
run xen_wallclock_time() and xen_global_systime_ns() at splsched() to
avoid being interrupted.

XXX amd64 lock stubs are racy for XPENDING
 1.137.2.2 11-Apr-2020  bouyer Remove spaces in machdep.hypervisor, suggested by mlelstv@
 1.137.2.1 08-Apr-2020  bouyer Remove VM_GUEST_XEN and define only Xen subtypes:
VM_GUEST_XENPV
VM_GUEST_XENPVH
VM_GUEST_XENHVM
VM_GUEST_XENPVHVM

Set vm_guest in the start routine, if it is hypervisor-specific (e.g Xen PV).
If vm_guest was not set early and we detect Xen in identify_hypervisor(),
assume it is VM_GUEST_XENHVM. Refine to VM_GUEST_XENPVHVM in
hypervisor_match().
 1.146.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.152.4.2 29-Mar-2025  martin Pull up following revision(s) (requested by imil in ticket #1074):

sys/arch/x86/x86/x86_machdep.c: revision 1.155
sys/arch/x86/include/cpu.h: revision 1.137
sys/arch/x86/x86/x86_machdep.c: revision 1.156
sys/arch/x86/include/cpu.h: revision 1.138
sys/arch/x86/x86/consinit.c: revision 1.40
sys/arch/x86/acpi/acpi_machdep.c: revision 1.37
sys/arch/x86/acpi/acpi_machdep.c: revision 1.38
sys/arch/amd64/amd64/machdep.c: revision 1.370
sys/arch/xen/xen/hypervisor.c: revision 1.97
sys/arch/xen/xen/hypervisor.c: revision 1.98
sys/arch/amd64/amd64/genassym.cf: revision 1.98
sys/arch/x86/x86/x86_autoconf.c: revision 1.88
sys/arch/x86/x86/x86_autoconf.c: revision 1.89
sys/arch/amd64/amd64/locore.S: revision 1.226
sys/arch/amd64/amd64/locore.S: revision 1.227
sys/arch/x86/x86/identcpu.c: revision 1.131

Add support for non-Xen PVH guests to amd64. Patch from
Emile 'iMil' Heitor in PR kern/57813, with some cosmetic tweaks by me.
Tested on bare metal, Xen PV and Xen PVH by me.

Get one more change from PR kern/57813, needed for non-Xen PVH.

Introduce vm_guest_is_pvh() and use it in place of
(vm_guest == VM_GUEST_XENPVH || vm_guest == VM_GUEST_GENPVH)
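
A minimal sketch of what such a predicate could look like follows. The enum is
an illustrative subset with hypothetical SK_ names, and the function takes the
guest type as a parameter so it can be exercised in isolation; the helper named
in the log may instead read a global vm_guest variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of guest types; the names are hypothetical. */
enum vm_guest_sketch {
	SK_VM_GUEST_NO,
	SK_VM_GUEST_XENPV,
	SK_VM_GUEST_XENPVH,
	SK_VM_GUEST_GENPVH,
	SK_VM_GUEST_LAST        /* sentinel, not a real guest type */
};

/* True for either PVH flavour (Xen or generic), false otherwise. */
static bool
vm_guest_is_pvh_sketch(enum vm_guest_sketch g)
{
	assert(g < SK_VM_GUEST_LAST);
	return g == SK_VM_GUEST_XENPVH || g == SK_VM_GUEST_GENPVH;
}
```

Keeping the two-way comparison in one place means a further PVH variant would
only need to be added to the helper, not to every caller.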
 1.152.4.1 24-Dec-2022  martin Pull up following revision(s) (requested by bouyer in ticket #21):

sys/arch/x86/x86/x86_machdep.c: revision 1.153

x86_add_cluster() takes the end of the segment, not the size.

Should fix PR port-xen/57121
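
The class of bug described here (passing a size where an end address is
expected) can be sketched as follows. seg_make() is a hypothetical stand-in for
a function that takes a start and an exclusive end; it is not the signature of
x86_add_cluster() itself.

```c
#include <assert.h>
#include <stdint.h>

/* A physical memory segment described as [start, end). */
struct seg {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical stand-in: the second argument is the END, not the size. */
static struct seg
seg_make(uint64_t start, uint64_t end)
{
	assert(end >= start);
	struct seg s = { start, end };
	return s;
}

/* Length in bytes of a segment. */
static uint64_t
seg_len(struct seg s)
{
	return s.end - s.start;
}
```

For a region starting at 0x100000 with size 0x200000, the correct call is
seg_make(start, start + size). Passing the size as the second argument yields a
segment that ends at 0x200000 and is therefore only half as long as intended.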
 1.154.6.1 02-Aug-2025  perseant Sync with HEAD
 1.4 07-Sep-2022  knakahara NetBSD/x86: Raise the number of interrupt sources per CPU from 32 to 56.

There has been no objection for three years.
https://mail-index.netbsd.org/port-amd64/2019/09/22/msg003012.html
Implemented by nonaka@n.o, updated by me.
 1.3 08-May-2020  ad KNF
 1.2 25-Apr-2020  bouyer Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor
 1.1 11-Apr-2020  bouyer branches: 1.1.2;
file x86_softintr.c was initially added on branch bouyer-xenpvh.
 1.1.2.2 19-Apr-2020  bouyer Add a struct pic * member to struct intrhand.
This will be used for interrupt_get_count()
For Xen replace pic_type with a pointer to the pic, and add a pointer
to intrhand, in struct pintrhand
Make event_set_handler return the pointer to struct intrhand.
Don't allocate a fake intrhand in xen_intr_establish_xname(), use the
one returned by event_set_handler().
 1.1.2.1 11-Apr-2020  bouyer Move softint and preemption-related functions out of x86/x86/intr.c to
its own file, x86/x86/x86_softintr.c
Add x86/x86/x86_softintr.c for native and XenPV
Make sure XenPV also checks ci_ipending, which is used for softints.
Switch XenPV to fast softints and allow kernel preemption.
kpreempt_disable() before calling pmap_changeprot_local()
run xen_wallclock_time() and xen_global_systime_ns() at splsched() to
avoid being interrupted.

XXX amd64 lock stubs are racy for XPENDING
 1.5 21-Sep-2016  jmcneill Set hw.acpi.sleep.vbios when a non-HW accelerated VGA driver attaches.
If the VGA_POST option is present in the kernel the default value is 2,
otherwise 1. PR kern/50781

Reviewed by: agc, mrg
 1.4 19-Oct-2011  dyoung branches: 1.4.12; 1.4.30; 1.4.34;
Create a stub implementation of pci_ranges_infer().
 1.3 18-Oct-2011  dyoung Use the right return types for x86_nullop() and x86_zeroop().
 1.2 18-Oct-2011  dyoung Define some optional routines that will help device_register() to
register ISA & PCI devices. Add stub implementations of the routines.
 1.1 03-Apr-2011  dyoung branches: 1.1.2; 1.1.4; 1.1.8;
Clean up excessive #ifdef'age of NMI trap handling for amd64/i386/xen.
Handle NMI in all Xen kernels.
 1.1.8.2 06-Jun-2011  jruoho Sync with HEAD.
 1.1.8.1 03-Apr-2011  jruoho file x86_stub.c was added on branch jruoho-x86intr on 2011-06-06 09:07:10 +0000
 1.1.4.2 02-May-2011  jym Sync with head.
 1.1.4.1 03-Apr-2011  jym file x86_stub.c was added on branch jym-xensuspend on 2011-05-02 22:49:58 +0000
 1.1.2.2 21-Apr-2011  rmind sync with head
 1.1.2.1 03-Apr-2011  rmind file x86_stub.c was added on branch rmind-uvmplock on 2011-04-21 01:41:33 +0000
 1.4.34.1 04-Nov-2016  pgoyette Sync with HEAD
 1.4.30.1 05-Oct-2016  skrll Sync with HEAD
 1.4.12.1 03-Dec-2017  jdolecek update from HEAD
 1.21 08-Dec-2023  andvar fix triple s typos in comments.
 1.20 20-Aug-2022  riastradh x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.
 1.19 07-Oct-2021  msaitoh KNF. No functional change.
 1.18 22-Mar-2020  ad x86 pmap:

- Give pmap_remove_all() its own version of pmap_remove_ptes() that on native
x86 does the bare minimum needed to clear out PTPs. Cuts ~4% sys time on
'build.sh release' for me.

- pmap_sync_pv(): there's no need to issue a redundant TLB shootdown. The
caller waits for the competing operation to finish.

- Bring 'options TLBSTATS' up to date.
 1.17 23-Feb-2020  ad Adjustment to previous: TP_SET_DONE() was wiping out the VA to shoot,
instead of ORing the flag into the array element. This caused the CPU
initiating the shootdown to occasionally miss an INVLPG.
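
The difference between assigning a flag and ORing it in can be sketched as
below. This assumes a slot that holds a page-aligned virtual address whose low
bits are free for flags; the macro and function names are hypothetical and do
not reproduce the actual TP_SET_DONE() definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_FLAG_MASK ((uintptr_t)0xfff)  /* low bits unused by a page-aligned VA */
#define SLOT_DONE      ((uintptr_t)0x1)    /* hypothetical "done" flag */

/* Buggy variant: plain assignment discards the VA stored in the slot. */
static void
slot_set_done_buggy(uintptr_t *slot)
{
	assert(slot != NULL);
	*slot = SLOT_DONE;
}

/* Fixed variant: OR the flag in so the VA survives. */
static void
slot_set_done(uintptr_t *slot)
{
	assert(slot != NULL);
	*slot |= SLOT_DONE;
}

/* Recover the page-aligned VA from a slot, ignoring the flag bits. */
static uintptr_t
slot_va(uintptr_t slot)
{
	return slot & ~SLOT_FLAG_MASK;
}
```

With the buggy variant the address read back from the slot is zero, which
matches the symptom in the log: the initiating CPU no longer knows which page
to invalidate.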
 1.16 22-Feb-2020  maxv add relaxed atomics, ok ad@ riastradh@
 1.15 15-Jan-2020  ad Push the INVLPG limit for shootdowns up to 16 (for UBC).
 1.14 12-Jan-2020  ad x86 pmap:

- It turns out that every page the pmap frees is necessarily zeroed. Tell
the VM system about this and use the pmap as a source of pre-zeroed pages.

- Redo deferred freeing of PTPs more elegantly, including the integration with
pmap_remove_all(). This fixes problems with nvmm, and possibly also a crash
discovered during fuzzing.

Reported-by: syzbot+a97186518c84f1d85c0c@syzkaller.appspotmail.com
 1.13 16-Dec-2019  ad branches: 1.13.2;
Align the TLB packet precisely on the stack, and do 7 INVLPG since it's
what fits in a single line.
 1.12 02-Dec-2019  pgoyette Fix typo in comment
 1.11 02-Dec-2019  ad Fix a hard hang with Xen MP.
 1.10 22-Nov-2019  ad Minor correction to previous.
 1.9 21-Nov-2019  ad x86 TLB shootdown IPI changes:

- Shave some time off processing.
- Reduce cacheline/bus traffic on systems with many CPUs.
- Reduce time spent at IPL_VM.
 1.8 27-May-2019  maxv Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.
 1.7 21-Apr-2019  maxv Rename the PTE bits.
 1.6 21-Feb-2019  maxv Another locking issue in NVMM: the {svm,vmx}_tlb_flush functions take VCPU
mutexes which can sleep, but their context does not allow it.

Rewrite the TLB handling code to fix that. It becomes a bit complex. In
short, we use a per-VM generation number, which we increase on each TLB
flush, before sending a broadcast IPI to everybody. The IPIs cause a
#VMEXIT of each VCPU, and each VCPU loop will synchronize the per-VM gen
with a per-VCPU copy, and apply the flushes as needed lazily.

The behavior differs between AMD and Intel; in short, on Intel we don't
flush the hTLB (EPT cache) if a context switch of a VCPU occurs, so now,
we need to maintain a kcpuset to know which VCPU's hTLBs are active on
which hCPU. This creates some redundancy on Intel, ie there are cases
where we flush the hTLB several times unnecessarily; but hTLB flushes are
very rare, so there is no real performance regression.

The thing is lock-less and non-blocking, so it solves our problem.
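
The generation-number scheme described in this entry can be sketched with C11
atomics. This is an illustration under assumptions, not the NVMM code: the
structure and function names are hypothetical, the broadcast IPI is omitted,
and a counter stands in for the hardware TLB flush.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct vm_sketch {
	_Atomic uint64_t gen;   /* per-VM generation, bumped per flush request */
};

struct vcpu_sketch {
	uint64_t seen_gen;      /* per-VCPU copy of the last generation applied */
	unsigned flushes;       /* number of flushes this VCPU has performed */
};

/* Request a flush: bump the generation. Real code would then send IPIs. */
static void
vm_request_flush(struct vm_sketch *vm)
{
	atomic_fetch_add_explicit(&vm->gen, 1, memory_order_release);
}

/*
 * Called from the VCPU loop before re-entering the guest.
 * Returns true if a flush was applied.
 */
static bool
vcpu_sync(struct vcpu_sketch *vcpu, struct vm_sketch *vm)
{
	uint64_t cur = atomic_load_explicit(&vm->gen, memory_order_acquire);

	if (vcpu->seen_gen == cur)
		return false;           /* nothing new since the last sync */
	vcpu->flushes++;                /* stand-in for the hardware TLB flush */
	vcpu->seen_gen = cur;
	return true;
}
```

Neither function takes a lock or sleeps, and several requests issued before a
VCPU next syncs collapse into a single flush on that VCPU.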
 1.5 11-Feb-2019  cherry We reorganise definitions for XEN source support as follows:

XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support for XEN in HVM mode.
XENPVH - sources required for support for XEN in PVH mode.
 1.4 06-Jan-2019  maxv Flush the host TLB too when dealing with a guest pmap. The pmap is not
active on the host so the pages aren't cached; but the recursive PTE
entries may have been cached by our pmap code.
 1.3 07-Nov-2018  maxv Add two pmap fields, will be used by NVMM.
 1.2 19-May-2018  jakllsch branches: 1.2.2;
remove more vestiges of uvm_emap_*(), to fix x86 kernel linking
 1.1 22-Jan-2018  jdolecek branches: 1.1.2;
rename sys/arch/x86/x86/pmap_tlb.c to sys/arch/x86/x86/x86_tlb.c, so that
x86 can eventually use uvm/pmap/pmap_tlb.c; step to future PCID support
 1.1.2.3 18-Jan-2019  pgoyette Synch with HEAD
 1.1.2.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.1.2.1 21-May-2018  pgoyette Sync with HEAD
 1.2.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2.2.1 10-Jun-2019  christos Sync with HEAD
 1.13.2.2 29-Feb-2020  ad Sync with head.
 1.13.2.1 17-Jan-2020  ad Sync with head.
 1.1 31-May-2011  dyoung branches: 1.1.4; 1.1.6; 1.1.8;
Don't use the C preprocessor to configure USERCONF. Instead, either do
or do not link in subr_userconf.c and x86_userconf.c.

Provide no-op stubs for userconf_bootinfo(), userconf_init(), and
userconf_prompt().

Delete all occurrences of #include "opt_userconf.h" as well as USERCONF
and __HAVE_USERCONF_BOOTINFO #ifdef'age.
 1.1.8.2 27-Aug-2011  jym Add/remove files, like in HEAD.
 1.1.8.1 31-May-2011  jym file x86_userconf.c was added on branch jym-xensuspend on 2011-08-27 15:59:49 +0000
 1.1.6.2 12-Jun-2011  rmind sync with head
 1.1.6.1 31-May-2011  rmind file x86_userconf.c was added on branch rmind-uvmplock on 2011-06-12 00:24:11 +0000
 1.1.4.2 06-Jun-2011  jruoho Sync with HEAD.
 1.1.4.1 31-May-2011  jruoho file x86_userconf.c was added on branch jruoho-x86intr on 2011-06-06 09:07:10 +0000
 1.1 20-Aug-2010  uebayasi branches: 1.1.2;
file xmd_machdep.c was initially added on branch uebayasi-xip.
 1.1.2.3 30-Oct-2010  uebayasi Implement pmap_physload_device(9) to replace xmd(4) MD backend.
Implement pmap_mmap(9) and use it from mem(4) and xmd(4).
 1.1.2.2 25-Aug-2010  uebayasi PA != VA here, use vtophys() to convert VA to PA.

(I hate x86.)
 1.1.2.1 20-Aug-2010  uebayasi xmd(4) glue for i386. XIP mount panics now.
