History log of /src/sys/dev/nvmm/nvmm_ioctl.h
Revision  Date  Author  Comments
 1.12  08-Sep-2020  maxv nvmm: cosmetic changes

- Style.
- Explicitly include ioccom.h.
 1.11  05-Sep-2020  maxv nvmm: update copyright headers
 1.10  26-Aug-2020  maxv nvmm: slightly clarify
 1.9  28-Oct-2019  maxv Add nram in struct nvmm_ctl_mach_info.
 1.8  23-Oct-2019  maxv Miscellaneous changes in NVMM, to address several inconsistencies and
issues in the libnvmm API.

- Rename NVMM_CAPABILITY_VERSION to NVMM_KERN_VERSION, and check it in
libnvmm. Introduce NVMM_USER_VERSION, for future use.

- In libnvmm, open "/dev/nvmm" as read-only and with O_CLOEXEC. This is to
avoid sharing the VMs with the children if the process forks. In the
NVMM driver, force O_CLOEXEC on open().

- Rename the following things for consistency:
nvmm_exit* -> nvmm_vcpu_exit*
nvmm_event* -> nvmm_vcpu_event*
NVMM_EXIT_* -> NVMM_VCPU_EXIT_*
NVMM_EVENT_INTERRUPT_HW -> NVMM_VCPU_EVENT_INTR
NVMM_EVENT_EXCEPTION -> NVMM_VCPU_EVENT_EXCP
Delete NVMM_EVENT_INTERRUPT_SW, unused already.

- Slightly reorganize the MI/MD definitions, for internal clarity.

- Split NVMM_VCPU_EXIT_MSR in two: NVMM_VCPU_EXIT_{RD,WR}MSR. Also provide
separate u.rdmsr and u.wrmsr fields. This is more consistent with the
other exit reasons.

- Change the types of several variables:
event.type: enum -> u_int
event.vector: uint64_t -> uint8_t
exit.u.*msr.msr: uint64_t -> uint32_t
exit.u.io.type: enum -> bool
exit.u.io.seg: int -> int8_t
cap.arch.mxcsr_mask: uint64_t -> uint32_t
cap.arch.conf_cpuid_maxops: uint64_t -> uint32_t

- Delete NVMM_VCPU_EXIT_MWAIT_COND, it is AMD-only and confusing, and we
already intercept 'monitor' so it is never armed.

- Introduce vmx_exit_insn() for NVMM-Intel, similar to svm_exit_insn().
The 'npc' field wasn't getting filled properly during certain VMEXITs.

- Introduce nvmm_vcpu_configure(). Similar to nvmm_machine_configure(),
but as its name indicates, the configuration is per-VCPU and not per-VM.
Migrate and rename NVMM_MACH_CONF_X86_CPUID to NVMM_VCPU_CONF_CPUID.
This becomes per-VCPU, which makes more sense than per-VM.

- Extend the NVMM_VCPU_CONF_CPUID conf to allow triggering VMEXITs on
specific leaves. Until now we could only mask the leaves. A uint32_t
is added in the structure:
uint32_t mask:1;
uint32_t exit:1;
uint32_t rsvd:30;
The first two bits select the desired behavior on the leaf. Specifying
zero on both resets the leaf to the default behavior. The new
NVMM_VCPU_EXIT_CPUID exit reason is added.
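
The selection rule for the two bits can be sketched as follows. Only the three bitfields are taken from the message above; the struct name, the enum and toy_cpuid_decode() are hypothetical, and rejecting "both bits set" or a non-zero reserved field is an assumption of this sketch, not something the message states.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copy of the three bitfields quoted above; not the real struct. */
struct toy_cpuid_conf {
	uint32_t mask:1;
	uint32_t exit:1;
	uint32_t rsvd:30;
};

/* Hypothetical names for the possible per-leaf behaviors. */
enum toy_cpuid_action {
	TOY_CPUID_DEFAULT,	/* both bits zero: reset to default behavior */
	TOY_CPUID_MASK,		/* mask the leaf */
	TOY_CPUID_EXIT,		/* trigger a VMEXIT on the leaf */
	TOY_CPUID_INVALID	/* assumption: contradictory or reserved bits */
};

enum toy_cpuid_action
toy_cpuid_decode(const struct toy_cpuid_conf *c)
{
	/* Assumption of this sketch: reserved bits must be zero. */
	if (c->rsvd != 0)
		return TOY_CPUID_INVALID;
	/* Assumption of this sketch: mask and exit are mutually exclusive. */
	if (c->mask && c->exit)
		return TOY_CPUID_INVALID;
	if (c->mask)
		return TOY_CPUID_MASK;
	if (c->exit)
		return TOY_CPUID_EXIT;
	return TOY_CPUID_DEFAULT;
}
```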
 1.7  01-May-2019  maxv branches: 1.7.2; 1.7.4;
Use the comm page to inject events, rather than ioctls, and commit them in
vcpu_run. This saves a few syscalls and copyins.

For example on Windows 10, moving the mouse from the left to right sides of
the screen generates ~500 events, which now don't result in syscalls.

The error handling is done in vcpu_run and it is less precise, but this
doesn't matter a lot, and will be solved with future NVMM error codes.
 1.6  28-Apr-2019  maxv Modify the communication layer between the kernel NVMM driver and libnvmm:
introduce a bidirectional "comm page", a page of memory shared between
the kernel and userland, and used to transfer data in and out in a more
performant manner than ioctls.

The comm page contains the VCPU state, plus three flags:

- "wanted": the states the kernel must get/set when requested via ioctls
- "cached": the states that are in the comm page
- "commit": the states the kernel must set in vcpu_run

The idea is to avoid performing expensive syscalls, by using the VCPU
state cached, either explicitly or speculatively, in the comm page. For
example, if the state is cached we do a direct 1->5 with no syscall:

    +---------------------------------------------+
    |                    Qemu                     |
    +---------------------------------------------+
          |                                 ^
          | (0) nvmm_vcpu_getstate          | (6) Done
          |                                 |
          V                                 |
       +---------------------------------------+
       |                libnvmm                |
       +---------------------------------------+
          |   ^          |                  ^
(1) State |   | (2) No   | (3) Ioctl:       | (5) Ok, state
  cached? |   |          |    "please cache |     fetched
          |   |          |     the state"   |
          V   |          |                  |
       +-----------+     |                  |
       | Comm Page |-----+------------------+
       +-----------+     |
             ^           |
(4) "Alright |           V
    babe"    |       +--------+
             +-------| Kernel |
                     +--------+

The main changes in behavior are:

- nvmm_vcpu_getstate(): won't emit a syscall if the state is already
cached in the comm page, will just fetch from the comm page directly
- nvmm_vcpu_setstate(): won't emit a syscall at all, will just cache
the wanted state in the comm page
- nvmm_vcpu_run(): will commit the to-be-set state in the comm page,
as previously requested by nvmm_vcpu_setstate()
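
The three behaviors listed above can be modeled with a small user-space simulation. This is a toy sketch, not the libnvmm or kernel code: every toy_* name, the two state classes and the simulated syscall counter are assumptions made for illustration. Only the roles of the "wanted", "cached" and "commit" flags follow the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy state classes; the real NVMM flags and layouts differ. */
#define TOY_STATE_GPRS	0x1ULL
#define TOY_STATE_SEGS	0x2ULL

struct toy_state {
	uint64_t gprs[4];
	uint64_t segs[4];
};

struct toy_comm_page {
	uint64_t state_wanted;	/* states the kernel must fetch on request */
	uint64_t state_cached;	/* states currently valid in 'state' */
	uint64_t state_commit;	/* states the kernel must set in run */
	struct toy_state state;
};

struct toy_vcpu {
	struct toy_comm_page comm;	/* stands in for the shared page */
	struct toy_state kern;		/* state held on the kernel side */
	unsigned nsyscalls;		/* simulated kernel entries */
};

/* Copy only the state classes selected by 'flags'. */
static void
toy_copy(struct toy_state *dst, const struct toy_state *src, uint64_t flags)
{
	if (flags & TOY_STATE_GPRS)
		memcpy(dst->gprs, src->gprs, sizeof(dst->gprs));
	if (flags & TOY_STATE_SEGS)
		memcpy(dst->segs, src->segs, sizeof(dst->segs));
}

/* Simulated ioctl: the kernel fills the comm page with the wanted states. */
static void
toy_kernel_getstate(struct toy_vcpu *v)
{
	v->nsyscalls++;
	toy_copy(&v->comm.state, &v->kern, v->comm.state_wanted);
	v->comm.state_cached |= v->comm.state_wanted;
	v->comm.state_wanted = 0;
}

/* Enter the kernel only for the states that are not cached yet. */
void
toy_getstate(struct toy_vcpu *v, uint64_t flags, struct toy_state *out)
{
	uint64_t missing = flags & ~v->comm.state_cached;

	if (missing != 0) {
		v->comm.state_wanted = missing;
		toy_kernel_getstate(v);
	}
	toy_copy(out, &v->comm.state, flags);
}

/* No kernel entry: stage the state in the comm page and mark it. */
void
toy_setstate(struct toy_vcpu *v, uint64_t flags, const struct toy_state *in)
{
	toy_copy(&v->comm.state, in, flags);
	v->comm.state_commit |= flags;
	v->comm.state_cached |= flags;
}

/* One kernel entry: commit the staged states, then "run" the guest. */
void
toy_run(struct toy_vcpu *v)
{
	v->nsyscalls++;
	toy_copy(&v->kern, &v->comm.state, v->comm.state_commit);
	v->comm.state_commit = 0;
	/* The guest would run here and may change any state. */
	v->comm.state_cached = 0;
}
```

In this model a second toy_getstate() for the same states costs no kernel entry, and toy_setstate() costs none at all; the staged values reach the kernel-side state only on the next toy_run().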

In addition to this, the kernel NVMM driver is changed to speculatively
cache certain states known to be of interest, so that the future
nvmm_vcpu_getstate() calls made by libnvmm or the emulator will use
the comm page rather than expensive syscalls. For example, if an I/O
VMEXIT occurs, the I/O Assist in libnvmm will want GPRS+SEGS+CRS+MSRS,
and now the kernel caches all of that in the comm page before returning
to userland.

Overall, in a normal run of Windows 10, this saves several million
syscalls. E.g., on a 4-CPU Intel machine with 4 VCPUs, booting the Win10
install ISO goes from taking 1min35 to taking 1min16.

The libnvmm API is not changed, but the ABI is. If we changed the API it
would be possible to save expensive memcpys on libnvmm's side. This will
be avoided in a future version. The comm page can also be extended to
implement future services.
 1.5  10-Apr-2019  maxv Add the NVMM_CTL ioctl, always privileged regardless of the permissions of
/dev/nvmm. We'll use it to provide a way for an admin to control the
registered VMs in the kernel.

Add an associated wrapper in libnvmm.
 1.4  21-Mar-2019  maxv Make it possible for an emulator to set the protection of the guest pages.
For some reason I had initially concluded that it wasn't doable; verily it
is, so let's do it.

The reserved 'flags' argument of nvmm_gpa_map() becomes 'prot' and takes
mmap-like protection codes.
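
As an illustration of what "mmap-like protection codes" means, the sketch below checks an access against a protection value built from the standard PROT_* constants of <sys/mman.h>. The function toy_access_allowed() is hypothetical and is not NVMM code; how the driver actually enforces 'prot' is not described in this message.

```c
#include <assert.h>
#include <sys/mman.h>

/*
 * Hypothetical check, not NVMM code: does a page mapped with protection
 * 'prot' permit every access kind in 'access'? Both arguments are bitwise
 * ORs of PROT_READ, PROT_WRITE and PROT_EXEC.
 */
int
toy_access_allowed(int prot, int access)
{
	return (prot & access) == access;
}
```

For example, a guest page mapped with PROT_READ | PROT_EXEC would allow instruction fetches in this model but not writes.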
 1.3  08-Jan-2019  maxv _IOWR -> _IOW
 1.2  15-Dec-2018  maxv Invert the mapping logic.

Until now, the "owner" of the memory was the guest, and by calling
nvmm_gpa_map(), the virtualizer was creating a view towards the guest
memory.

Qemu expects the contrary: it wants the owner to be the virtualizer, and
nvmm_gpa_map should just create a view from the guest towards the
virtualizer's address space. Under this scheme, it is legal to have two
GPAs that point to the same HVA.

Introduce nvmm_hva_map() and nvmm_hva_unmap(), that map/unmap the HVA into
a dedicated UOBJ. Change nvmm_gpa_map() and nvmm_gpa_unmap() to just
perform an enter into the desired UOBJ.
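
The inverted scheme, in which the virtualizer owns the memory and guest physical ranges are views onto it, can be modeled with a flat translation table. This is a toy sketch under stated assumptions: the toy_* names, the fixed-size table and the linear lookup are all hypothetical and say nothing about how the driver or UVM implements the mapping. It only shows that two GPAs may resolve to the same HVA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS	8

/* One guest-physical range viewed onto host virtual memory. */
struct toy_gpa_range {
	uint64_t gpa;
	uintptr_t hva;
	size_t size;
};

struct toy_machine {
	struct toy_gpa_range maps[TOY_MAX_MAPS];
	size_t nmaps;
};

/* Record a view from [gpa, gpa+size) onto [hva, hva+size). */
int
toy_gpa_map(struct toy_machine *m, uintptr_t hva, uint64_t gpa, size_t size)
{
	if (m->nmaps == TOY_MAX_MAPS)
		return -1;
	m->maps[m->nmaps].gpa = gpa;
	m->maps[m->nmaps].hva = hva;
	m->maps[m->nmaps].size = size;
	m->nmaps++;
	return 0;
}

/* Translate a guest physical address, or return NULL if it is unmapped. */
void *
toy_gpa_to_hva(const struct toy_machine *m, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < m->nmaps; i++) {
		const struct toy_gpa_range *r = &m->maps[i];

		if (gpa >= r->gpa && gpa - r->gpa < r->size)
			return (void *)(r->hva + (uintptr_t)(gpa - r->gpa));
	}
	return NULL;
}
```

Mapping the same host buffer at two guest addresses makes a write through one address visible through the other, which is the aliasing the message says is legal under this scheme.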

With this change in place, all the mapping-related problems in Qemu+NVMM
are fixed.
 1.1  07-Nov-2018  maxv branches: 1.1.2;
Add NVMM - for NetBSD Virtual Machine Monitor -, a kernel driver that
provides support for hardware-accelerated virtualization on NetBSD.

It is made of an MI frontend, to which MD backends can be plugged. One
MD backend is implemented, x86-SVM, for x86 AMD CPUs.

We install

/usr/include/dev/nvmm/nvmm.h
/usr/include/dev/nvmm/nvmm_ioctl.h
/usr/include/dev/nvmm/{arch}/nvmm_{arch}.h

And the kernel module. For now, the only architecture where we do that
is amd64 (arch=x86).

NVMM is not enabled by default in amd64-GENERIC, but is instead easily
modloadable.

Sent to tech-kern@ a month ago. Validated with kASan, and optimized
with tprof.
 1.1.2.4  18-Jan-2019  pgoyette Synch with HEAD
 1.1.2.3  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.1.2.2  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.1.2.1  07-Nov-2018  pgoyette file nvmm_ioctl.h was added on branch pgoyette-compat on 2018-11-26 01:52:31 +0000
 1.7.4.2  29-Aug-2020  martin Pull up following revision(s) (requested by maxv in ticket #1068):

sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.71
sys/dev/nvmm/nvmm.c: revision 1.34
sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.72
sys/dev/nvmm/nvmm.c: revision 1.35
sys/dev/nvmm/nvmm.c: revision 1.36
sys/dev/nvmm/x86/nvmm_x86_svmfunc.S: revision 1.5
sys/dev/nvmm/nvmm.c: revision 1.37
sys/dev/nvmm/x86/nvmm_x86_vmxfunc.S: revision 1.5
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.70
sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.68
sys/dev/nvmm/x86/nvmm_x86.c: revision 1.15
sys/dev/nvmm/nvmm_ioctl.h: revision 1.10

Micro-optimize: use pushq instead of pushw. To avoid LCP stalls and
unaligned stack accesses.

nvmm-x86: also flush the guest TLB when CR4.{PCIDE,SMEP} changes

nvmm: localify a variable that doesn't need to be global

nvmm: use relaxed atomics to read nmachines

nvmm-x86-svm: dedup code

nvmm-x86: hide more CPUID flags, mostly related to perf monitors

nvmm: misc improvements
- use mach->ncpus to get the number of vcpus, now that we have it
- don't forget to decrement mach->ncpus when a machine gets killed
- add more __predict_false()

nvmm-x86-svm: don't forget to intercept INVD
INVD executed in the guest can be dangerous for the host, due to CPU
caches being flushed without write-back.

nvmm: slightly clarify

nvmm: explicitly include atomic.h
 1.7.4.1  10-Nov-2019  martin Pull up following revision(s) (requested by maxv in ticket #405):

usr.sbin/nvmmctl/nvmmctl.8: revision 1.2
lib/libnvmm/libnvmm.3: revision 1.24
sys/dev/nvmm/nvmm.h: revision 1.11
lib/libnvmm/libnvmm.3: revision 1.25
sys/dev/nvmm/x86/nvmm_x86.h: revision 1.16
sys/dev/nvmm/nvmm.h: revision 1.12
sys/dev/nvmm/x86/nvmm_x86.h: revision 1.17
tests/lib/libnvmm/h_mem_assist.c: revision 1.12
sys/dev/nvmm/x86/nvmm_x86.h: revision 1.18
share/mk/bsd.hostprog.mk: revision 1.82
lib/libnvmm/libnvmm.c: revision 1.15
distrib/sets/lists/base/md.amd64: revision 1.281
tests/lib/libnvmm/h_mem_assist.c: revision 1.13
lib/libnvmm/libnvmm.c: revision 1.16
tests/lib/libnvmm/h_mem_assist.c: revision 1.14
lib/libnvmm/libnvmm_x86.c: revision 1.32
lib/libnvmm/libnvmm.c: revision 1.17
tests/lib/libnvmm/h_mem_assist.c: revision 1.15
lib/libnvmm/libnvmm_x86.c: revision 1.33
lib/libnvmm/libnvmm.c: revision 1.18
usr.sbin/nvmmctl/Makefile: revision 1.1
tests/lib/libnvmm/h_mem_assist_asm.S: revision 1.7
tests/lib/libnvmm/h_mem_assist.c: revision 1.16
lib/libnvmm/libnvmm_x86.c: revision 1.34
usr.sbin/nvmmctl/Makefile: revision 1.2
tests/lib/libnvmm/h_mem_assist_asm.S: revision 1.8
tests/lib/libnvmm/h_mem_assist.c: revision 1.17
sys/dev/nvmm/nvmm_internal.h: revision 1.13
lib/libnvmm/libnvmm_x86.c: revision 1.35
lib/libnvmm/libnvmm_x86.c: revision 1.36
usr.sbin/postinstall/postinstall.in: revision 1.8
lib/libnvmm/libnvmm_x86.c: revision 1.37
lib/libnvmm/libnvmm_x86.c: revision 1.38
lib/libnvmm/libnvmm_x86.c: revision 1.39
usr.sbin/Makefile: revision 1.282
lib/libnvmm/nvmm.h: revision 1.13
lib/libnvmm/nvmm.h: revision 1.14
lib/libnvmm/nvmm.h: revision 1.15
sys/dev/nvmm/nvmm.c: revision 1.23
lib/libnvmm/nvmm.h: revision 1.16
sys/dev/nvmm/nvmm.c: revision 1.24
lib/libnvmm/nvmm.h: revision 1.17
sys/dev/nvmm/nvmm.c: revision 1.25
tests/lib/libnvmm/h_io_assist.c: revision 1.9
etc/MAKEDEV.tmpl: revision 1.209
tests/lib/libnvmm/h_io_assist.c: revision 1.10
tests/lib/libnvmm/h_io_assist.c: revision 1.11
etc/group: revision 1.35
distrib/sets/lists/man/mi: revision 1.1660
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.40
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.41
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.42
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.43
sys/dev/nvmm/x86/nvmm_x86_vmx.c: revision 1.44
sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.51
sys/dev/nvmm/nvmm_ioctl.h: revision 1.8
sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.52
sys/dev/nvmm/nvmm_ioctl.h: revision 1.9
sys/dev/nvmm/x86/nvmm_x86_svm.c: revision 1.53
usr.sbin/nvmmctl/nvmmctl.c: revision 1.1
lib/libnvmm/libnvmm.3: revision 1.20
distrib/sets/lists/debug/md.amd64: revision 1.106
lib/libnvmm/libnvmm.3: revision 1.21
lib/libnvmm/libnvmm.3: revision 1.22
usr.sbin/nvmmctl/nvmmctl.8: revision 1.1
lib/libnvmm/libnvmm.3: revision 1.23

Fix incorrect parsing: the R/M field uses a special GPR map when the
address size is 16 bits, regardless of the actual operating mode. With
this special map there can be two registers referenced at once, and
also disp16-only.
Implement this special behavior, and add associated tests. While here
simplify a few things.
With this in place, the Windows 95 installer initializes correctly.
Part of PR/54611.
add missing initializer
Implement XCHG, add associated tests, and add comments to explain. With
this in place the Windows 95 installer completes successfully.
Part of PR/54611.
Improve nvmm_vcpu_dump().
Put back 'default', because llvm apparently doesn't realize that all cases
are covered in the switch.
Miscellaneous changes in NVMM, to address several inconsistencies and
issues in the libnvmm API.
- Rename NVMM_CAPABILITY_VERSION to NVMM_KERN_VERSION, and check it in
libnvmm. Introduce NVMM_USER_VERSION, for future use.
- In libnvmm, open "/dev/nvmm" as read-only and with O_CLOEXEC. This is to
avoid sharing the VMs with the children if the process forks. In the
NVMM driver, force O_CLOEXEC on open().
- Rename the following things for consistency:
nvmm_exit* -> nvmm_vcpu_exit*
nvmm_event* -> nvmm_vcpu_event*
NVMM_EXIT_* -> NVMM_VCPU_EXIT_*
NVMM_EVENT_INTERRUPT_HW -> NVMM_VCPU_EVENT_INTR
NVMM_EVENT_EXCEPTION -> NVMM_VCPU_EVENT_EXCP
Delete NVMM_EVENT_INTERRUPT_SW, unused already.
- Slightly reorganize the MI/MD definitions, for internal clarity.
- Split NVMM_VCPU_EXIT_MSR in two: NVMM_VCPU_EXIT_{RD,WR}MSR. Also provide
separate u.rdmsr and u.wrmsr fields. This is more consistent with the
other exit reasons.
- Change the types of several variables:
event.type: enum -> u_int
event.vector: uint64_t -> uint8_t
exit.u.*msr.msr: uint64_t -> uint32_t
exit.u.io.type: enum -> bool
exit.u.io.seg: int -> int8_t
cap.arch.mxcsr_mask: uint64_t -> uint32_t
cap.arch.conf_cpuid_maxops: uint64_t -> uint32_t
- Delete NVMM_VCPU_EXIT_MWAIT_COND, it is AMD-only and confusing, and we
already intercept 'monitor' so it is never armed.
- Introduce vmx_exit_insn() for NVMM-Intel, similar to svm_exit_insn().
The 'npc' field wasn't getting filled properly during certain VMEXITs.
- Introduce nvmm_vcpu_configure(). Similar to nvmm_machine_configure(),
but as its name indicates, the configuration is per-VCPU and not per-VM.
Migrate and rename NVMM_MACH_CONF_X86_CPUID to NVMM_VCPU_CONF_CPUID.
This becomes per-VCPU, which makes more sense than per-VM.
- Extend the NVMM_VCPU_CONF_CPUID conf to allow triggering VMEXITs on
specific leaves. Until now we could only mask the leaves. A uint32_t
is added in the structure:
uint32_t mask:1;
uint32_t exit:1;
uint32_t rsvd:30;
The first two bits select the desired behavior on the leaf. Specifying
zero on both resets the leaf to the default behavior. The new
NVMM_VCPU_EXIT_CPUID exit reason is added.
Three changes in libnvmm:
- Add 'mach' and 'vcpu' backpointers in the nvmm_io and nvmm_mem
structures.
- Rename 'nvmm_callbacks' to 'nvmm_assist_callbacks'.
- Rename and migrate NVMM_MACH_CONF_CALLBACKS to NVMM_VCPU_CONF_CALLBACKS,
it now becomes per-VCPU.
Update the libnvmm man page:
- Sync the naming with reality.
- Replace "relevant" by "desired" and "virtualizer" by "emulator", closer
to what I meant.
- Add a "VCPU Configuration" section.
- Add a "Machine Ownership" section.
Add the "nvmm" group, and make nvmm_init() public. Sent to tech-kern@ a few
days ago.
Use the new PTE naming, and define CR3_FRAME_* separately. No functional
change.
Add a new VCPU conf option, that allows userland to request VMEXITs after a
TPR change. This is supported on all Intel CPUs, and not-too-old AMD CPUs.
The reason for wanting this option is that certain OSes (like Win10 64bit)
manage interrupt priority in hardware via CR8 directly, and for these OSes,
the emulator may want to sync its internal TPR state on each change.
Add two new fields in cap.arch, to report the conf capabilities. Report TPR
only on Intel for now, not AMD, because I don't have a recent AMD CPU on
which to test.
Mask CPUID leaf 0x0A on Intel, because we don't want the guest to try (and
fail) to probe the PMC MSRs. This avoids "Unexpected WRMSR" warnings in
qemu-nvmm.
Add PCID support in the guests. This speeds up most 64bit guests, because
since Meltdown, everybody uses PCID (including NetBSD).
Change the way root_owner works: consider the calling process as root_owner
not if it has root privileges, but if the /dev/nvmm device was opened with
write permissions. Introduce the undocumented nvmm_root_init() function to
achieve that.
The goal is to simplify the logic and have more granularity, eg if we want
a monitoring agent to access VMs but don't want to give this agent real
root access on the system.
A few changes:
- Use smaller types in struct nvmm_capability.
- Use smaller type for nvmm_io.port.
- Switch exitstate to a compacted structure.
Add nram in struct nvmm_ctl_mach_info.
Add nvmmctl, with two commands for now.
Macro tidyness.
Sort SEE ALSO.
should be fork(2), noticed by wiz
Add debug entry for newly introduced nvmmctl utility.
Annotate a covering switch as such to avoid warnings about missing
returns.
Forgot to put nvmmctl in the "nvmm" group.
Add nvmm group.
 1.7.2.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.7.2.2  10-Jun-2019  christos Sync with HEAD
 1.7.2.1  01-May-2019  christos file nvmm_ioctl.h was added on branch phil-wifi on 2019-06-10 22:07:14 +0000
