History log of /src/sys/arch/x86/include/cpu_extended_state.h
Revision	Date	Author	Comments
1.19	24-Apr-2025	riastradh	amd64: Allocate FPU save state outside pcb if it's too large. We have seen x86_fpu_save_size values (CPUID[EAX=0x0d, ECX=0].ECX) as large as 11008 bytes, notably with Intel AMX TILEDATA's 8192-byte state. We only do this for user threads, and only on machines where it's necessary, to avoid incurring much overhead. There is still a tiny bit of overhead when saving and restoring the FPU state by using a pointer indirection instead of arithmetic indirection for access to struct pcb::pcb_savefpu, but this is probably a drop in the bucket compared to the memory traffic incurred by the FPU state save/restore anyway. For now, these paths are mostly disabled on i386. We could enable them but it will require either rewriting cpu_uarea_alloc/free for i386, or adopting a guard page like amd64 does, which might be costly and so should be undertaken only with some thought and care. And since Intel AMX instructions only work in 64-bit mode, it's not likely to be useful on i386. PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in KVM/Qemu These changes, as a side effect, may fix: PR kern/57258: kthread_fpu_enter/exit problem by making sure to allocate an FPU save space that is large enough to guarantee fpu_kern_enter/leave work safely, instead of just using a union savefpu object on the stack (which, at 576 bytes, may be too small on some machines, particularly with AVX512 requiring ~2.5K). (But we'll have to do some extra work with kthread_fpu_enter/exit_md -- if we try doing them again on x86 -- to actually allocate the separate pcb on these machines!)
1.18	25-Feb-2023	riastradh	branches: 1.18.6; x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use. In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it. While here, zero all the other FPU registers in fpu_kern_enter. In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place. For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
1.17	26-Jun-2019	mgorny	branches: 1.17.28; Implement PT_GETXSTATE and PT_SETXSTATE Introduce two new ptrace() requests: PT_GETXSTATE and PT_SETXSTATE, that provide access to the extended (and extensible) set of FPU registers on amd64 and i386. At the moment, this covers AVX (YMM) and AVX-512 (ZMM, opmask) registers. It can be easily extended to cover further register types without breaking backwards compatibility. PT_GETXSTATE issues the XSAVE instruction with all kernel-supported extended components enabled. The data is copied into 'struct xstate' (which -- unlike the XSAVE area itself -- has stable format and offsets). PT_SETXSTATE issues the XRSTOR instruction to restore the register values from user-provided 'struct xstate'. The function replaces only the specific XSAVE components that are listed in 'xs_rfbm' field, making it possible to issue partial updates. Both syscalls take a 'struct iovec' pointer rather than a direct argument. This requires the caller to explicitly specify the buffer size. As a result, existing code will continue to work correctly when the structure is extended (performing partial reads/updates).
1.16	23-May-2018	maxv	branches: 1.16.2; Clean up the FPU headers.
1.15	08-Nov-2017	maxv	branches: 1.15.2; remove vestige
1.14	31-Oct-2017	maxv	Remove outdated comment.
1.13	31-Oct-2017	maxv	Don't embed our own values in the reserved fields of the XSAVE area, it really is a bad idea. Move them into the PCB.
1.12	31-Oct-2017	maxv	Add xsh_xcomp_bv and fx_zero, and use uint8_t instead.
1.11	10-Aug-2017	maxv	Remove the svr4/ibcs2 fpu flags.
1.10	18-Aug-2016	maxv	KNF and simplify.
1.9	25-Feb-2014	dsl	branches: 1.9.4; 1.9.6; 1.9.10; 1.9.12; Add support for saving the AVX-256 ymm registers during FPU context switches. Add support for the forthcoming AVX-512 registers. Code compiled with -mavx seems to work, but I've not tested context switches with live ymm registers. There is a small cost on fork/exec (a larger area is copied/zeroed), but I don't think the ymm registers are read/written unless they have been used. The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
1.8	18-Feb-2014	dsl	It seems that firefox includes machine/fpu.h on amd64. Add the file back so that the firefox source doesn't have to depend on the version of netbsd it is being compiled for. (The i386 version doesn't play the same games in its SIGFPE handler.)
1.7	15-Feb-2014	dsl	Remove all references to MDL_USEDFPU and deferred fpu initialisation. The cost of zeroing the save area on exec is minimal. This stops the FP registers of a random process being used the first time an lwp uses the fpu. sendsig_siginfo() and get_mcontext() now unconditionally copy the FP registers. I'll remove the double-copy for signal handlers soon. get_mcontext() might have been leaking kernel memory to userspace - and may still do so if i386_use_fxsave is false (short copies).
1.6	13-Feb-2014	dsl	Check the argument types for the fpu asm functions.
1.5	12-Feb-2014	dsl	Change i386 to use x86/fpu.c instead of i386/isa/npx.c This changes the trap10 and trap13 code to call directly into fpu.c, removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c Not all of the code that appeared to handle fpu traps was ever called! Most of the changes just replace the include of machine/npx.h with x86/fpu.h (or remove it entirely).
1.4	09-Feb-2014	dsl	Add compatibility for some userspace code (eg firefox) that seems to look inside the ucontext structure passed to signal handlers to modify the xmm registers. This should make the code compile - I'm not at all sure it works as expected, the interactions between FP and signal handlers aren't at all clear. AFAICT the FP state is saved on the user stack when the handler is called, however the FP trap code can already have done odd things to the FPU....
1.3	08-Feb-2014	dsl	Add bit defs for more of the x87 status register.
1.2	07-Feb-2014	dsl	Convert the amd64 build to use x86/cpu_extended_state.h so that the fpu definitions match those of i386. Mostly just structure and field renames, in addition: 1) process_xmm_to_s87() and process_s87_to_xmm() moved into x86/convert_xmm_s87.c so they can be used by amd64's netbsd32 code. 2) The linux signal code simplified to use a structure copy for the fxsave data - it matches the hardware definition and won't change.
1.1	07-Feb-2014	dsl	Move all the hardware register layout for the x86 cpus into a header that can also be used by amd64. Add in skeleton definitions for XSAVE and AVX. Update some comments to match reality.
1.9.12.2	28-Aug-2017	skrll	Sync with HEAD
1.9.12.1	05-Oct-2016	skrll	Sync with HEAD
1.9.10.3	03-Dec-2017	jdolecek	update from HEAD
1.9.10.2	20-Aug-2014	tls	Rebase to HEAD as of a few days ago.
1.9.10.1	25-Feb-2014	tls	file cpu_extended_state.h was added on branch tls-maxphys on 2014-08-20 00:03:29 +0000
1.9.6.2	22-May-2014	yamt	sync with head. for a reference, the tree before this commit was tagged as yamt-pagecache-tag8. this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
1.9.6.1	25-Feb-2014	yamt	file cpu_extended_state.h was added on branch yamt-pagecache on 2014-05-22 11:40:13 +0000
1.9.4.2	18-May-2014	rmind	sync with head
1.9.4.1	25-Feb-2014	rmind	file cpu_extended_state.h was added on branch rmind-smpnet on 2014-05-18 17:45:30 +0000
1.15.2.1	25-Jun-2018	pgoyette	Sync with HEAD
1.16.2.1	13-Apr-2020	martin	Mostly merge changes from HEAD up to 20200411
1.17.28.1	25-Jul-2023	martin	Pull up following revision(s) (requested by riastradh in ticket #244): sys/arch/x86/x86/fpu.c: revision 1.80 sys/arch/x86/include/cpu_extended_state.h: revision 1.18 x86: Mitigate MXCSR Configuration Dependent Timing in kernel FPU use. In fpu_kern_enter, make sure all the MXCSR exception status bits are set when we start using the FPU, so that instructions which exhibit MCDT are unaffected by it. While here, zero all the other FPU registers in fpu_kern_enter. In principle we could skip this step on future CPUs that fix the MCDT bug, but there's probably not much benefit -- workloads that do a lot of crypto in the kernel are probably better off using kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles in the first place. For details, see: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html
1.18.6.1	02-Aug-2025	perseant	Sync with HEAD
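
Revision 1.18 describes setting every MXCSR exception status bit before kernel FPU use. The following is a minimal sketch of that bit arithmetic only, not the code in fpu.c. It assumes the architectural MXCSR layout (status flags IE, DE, ZE, OE, UE, PE in bits 0 through 5, power-on default 0x1f80). The macro and function names are hypothetical and do not come from cpu_extended_state.h.

```c
#include <stdint.h>

/*
 * Hypothetical names for illustration; not the definitions used in
 * cpu_extended_state.h or fpu.c.
 */

/* Exception status flags IE, DE, ZE, OE, UE, PE live in bits 0..5. */
#define MXCSR_STATUS_MASK	0x003fu

/* Architectural power-on value: all exceptions masked, no flags set. */
#define MXCSR_DEFAULT		0x1f80u

/* Return the given MXCSR value with every exception status bit set. */
static inline uint32_t
mxcsr_set_all_status(uint32_t mxcsr)
{
	return mxcsr | MXCSR_STATUS_MASK;
}

/* Return nonzero if every exception status bit is already set. */
static inline int
mxcsr_all_status_set(uint32_t mxcsr)
{
	return (mxcsr & MXCSR_STATUS_MASK) == MXCSR_STATUS_MASK;
}
```

Under these assumptions, mxcsr_set_all_status(MXCSR_DEFAULT) yields 0x1fbf. Loading such a value into the register itself would still need LDMXCSR or an FXRSTOR/XRSTOR of a prepared save area, which this sketch does not attempt.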
