Home | History | Annotate | Download | only in include
History log of /src/sys/arch/amd64/include/pcb.h
RevisionDateAuthorComments
 1.35  28-Apr-2025  riastradh xen: Stop-gap FPU PCB fix; disable Intel AMX for now.

Since the custom cpu_uarea_alloc/free are disabled under XENPV,
nothing would initialize struct pcb::pcb_savefpu to point either to
struct pcb::pcb_savefpusmall, or to a separately allocated large area
on machines with Intel AMX TILECFG/TILEDATA requiring it. So the
memset in fpu_lwp_fork would crash on null pointer dereference:

[ 1.0000030] uvm_fault(0xffffffff8094a300, 0x0, 2) -> e
[ 1.0000030] fatal page fault in supervisor mode
[ 1.0000030] trap type 6 code 0x2 rip 0xffffffff8062795c cs 0xe030 rflags 0x10202 cr2 0 ilevel 0 rsp 0xffffffff80adad38
[ 1.0000030] curlwp 0xffffffff8078f880 pid 0.0 lowest kstack 0xffffffff80ad62c0
kernel: page fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:memset+0x2c: repe stosq %es:(%rdi)
memset() at netbsd:memset+0x2c
lwp_create() at netbsd:lwp_create+0x2f1
fork1() at netbsd:fork1+0x42c
main() at netbsd:main+0x44f

In order to support Intel AMX TILECFG/TILEDATA, or any other CPU
extensions that increase the XSAVE area beyond what fits in a single
page after struct pcb, we would need to enable the the custom
cpu_uarea_alloc/free. Currently that would imply allocating stack
guard pages (`redzone') under XENPV; if there's some reason the stack
guard pages don't work, we could also push #ifdef XENPV conditionals
into cpu_uarea_alloc/free to cover the guard pages -- to be
considered.

PR kern/59371: Xen domU uvm_fault since FPU state allocation patch

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu
 1.34  24-Apr-2025  kre offsetof() needs <stddef.h> (<sys/stddef.h>)

Include <sys/stddef.h> when offsetof() is to be used.

First step in fixing x86 builds.
 1.33  24-Apr-2025  riastradh amd64: Allocate FPU save state outside pcb if it's too large.

We have seen x86_fpu_save_size values (CPUID[EAX=0x0d, ECX=0].ECX) as
large as 11008 bytes, notably with Intel AMX TILEDATA's 8192-byte
state.

We only do this for user threads, and only on machines where it's
necessary, to avoid incurring much overhead. There is still a tiny
bit of overhead when saving and restoring the FPU state by using a
pointer indirection instead of arithmetic indirection for access to
struct pcb::pcb_savefpu, but this is probably a drop in the bucket
compared to the memory traffic incurred by the FPU state save/restore
anyway.

For now, these paths are mostly disabled on i386. We could enable
them but it will require either rewriting cpu_uarea_alloc/free for
i386, or adopting a guard page like amd64 does, which might be costly
and so should be undertaken only with some thought and care. And
since Intel AMX instructions only work in 64-bit mode, it's not
likely to be useful on i386.

PR port-amd64/57661: Crash when booting on Xeon Silver 4416+ in
KVM/Qemu

These changes, as a side effect, may fix:

PR kern/57258: kthread_fpu_enter/exit problem

by making sure to allocate an FPU save space that is large enough to
guarantee fpu_kern_enter/leave work safely, instead of just using a
union savefpu object on the stack (which, at 576 bytes, may be too
small on some machines, particularly with AVX512 requiring ~2.5K).
(But we'll have to do some extra work with kthread_fpu_enter/exit_md
-- if we try doing them again on x86 -- to actually allocate the
separate pcb on these machines!)
 1.32  17-Mar-2020  maxv Add a redzone between the pcb and the stack. Sent to port-amd64@.
 1.31  12-Oct-2019  christos disable CTASSERT for lint
 1.30  12-Oct-2019  maxv Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.
 1.29  26-Jul-2018  maxv Rework dbregs, to switch the registers during context switches, and not on
each user->kernel transition via userret. Reloads of DR6/DR7 are expensive
on both native and xen.
 1.28  31-Dec-2017  maxv branches: 1.28.2; 1.28.4;
gc unused
 1.27  31-Oct-2017  maxv Don't embed our own values in the reserved fields of the XSAVE area, it
really is a bad idea. Move them into the PCB.
 1.26  23-Feb-2017  kamil Introduce PT_GETDBREGS and PT_SETDBREGS in ptrace(2) on i386 and amd64

This interface is modeled after FreeBSD API with the usage.

This replaced previous watchpoint API. The previous one was introduced
recently in NetBSD-current and remove its spurs without any
backward-compatibility.

Design choices for Debug Register accessors:
- exec() (TRAP_EXEC event) must remove debug registers from LWP
- debug registers are only per-LWP, not per-process globally
- debug registers must not be inherited after (v)forking a process
- debug registers must not be inherited after forking a thread
- a debugger is responsible to set global watchpoints/breakpoints with the
debug registers, to achieve this PTRACE_LWP_CREATE/PTRACE_LWP_EXIT event
monitoring function is designed to be used
- debug register traps must generate SIGTRAP with si_code TRAP_DBREG
- debugger is responsible to retrieve debug register state to distinguish
the exact debug register trap (DR6 is Status Register on x86)
- kernel must not remove debug register traps after triggering a trap event
a debugger is responsible to detach this trap with appropriate PT_SETDBREGS
call (DR7 is Control Register on x86)
- debug registers must not be exposed in mcontext
- userland must not be allowed to set a trap on the kernel

Implementation notes on i386 and amd64:
- the initial state of debug register is retrieved on boot and this value is
stored in a local copy (initdbregs), this value is used to initialize dbreg
context after PT_GETDBREGS
- struct dbregs is stored in pcb as a pointer and by default not initialized
- reserved registers (DR4-DR5, DR9-DR15) are ignored

Further ideas:
- restrict this interface with securelevel

Tested on real hardware i386 (Intel Pentium IV) and amd64 (Intel i7).

This commit enables 390 debug register ATF tests in kernel/arch/x86.
All tests are passing.

This commit does not cover netbsd32 compat code. Currently other interface
PT_GET_SIGINFO/PT_SET_SIGINFO is required in netbsd32 compat code in order to
validate reliably PT_GETDBREGS/PT_SETDBREGS.

This implementation does not cover FreeBSD specific defines in their
<x86/reg.h>: DBREG_DR7_LOCAL_ENABLE, DBREG_DR7_GLOBAL_ENABLE, DBREG_DR7_LEN_1
etc. These values tend to be reinvented by each tracer on its own. GNU
Debugger (GDB) works with NetBSD debug registers after adding this patch:

--- gdb/amd64bsd-nat.c.orig 2016-02-10 03:19:39.000000000 +0000
+++ gdb/amd64bsd-nat.c
@@ -167,6 +167,10 @@ amd64bsd_target (void)

#ifdef HAVE_PT_GETDBREGS

+#ifndef DBREG_DRX
+#define DBREG_DRX(d,x) ((d)->dr[(x)])
+#endif
+
static unsigned long
amd64bsd_dr_get (ptid_t ptid, int regnum)
{


Another reason to stop introducing unpopular defines covering machine
specific register macros is that these value varies across generations of
the same CPU family.

GDB demo:
(gdb) c
Continuing.

Watchpoint 2: traceme

Old value = 0
New value = 16
main (argc=1, argv=0x7f7fff79fe30) at test.c:8
8 printf("traceme=%d\n", traceme);

(Currently the GDB interface is not reliable due to NetBSD support bugs)

Sponsored by <The NetBSD Foundation>
 1.25  20-Feb-2014  dsl branches: 1.25.6; 1.25.10; 1.25.14;
Move the amd64 and i386 pcb to the bottom of the uarea, and move the
kernel stack to the top.
Change the pcb layouts so that fpu save area is at the end and is
64byte aligned ready for xsave (saving the ymm registers).
Welcome to 6.99.32
 1.24  11-Feb-2014  dsl Move sys/arch/amd64/amd64/fpu.c and sys/arch/amd64/include/fpu.h
into sys/arch/x86 in preparation for using the same code for i386.
 1.23  07-Feb-2014  dsl Convert the amd64 build to use x86/cpu_extended_state.h so that the fpu
definitions match those of i386.
Mostly just structure and field renames, in addition:
1) process_xmm_to_s87() and process_s87_to_xmm() moved into
x86/convert_xmm_s87.c so they can be used by amd64's netbsd32 code.
2) The linux signal code simplified to use a structure copy for ths fxsave
data - it matches the hardware definition and won't change.
 1.22  19-Jan-2014  dsl Remove the unused 'struct md_coredump'.
 1.21  11-Dec-2013  dsl Remove the fields that were used to save the i387 fp state on interrupt.
They were written but never read.
Possibly they should be saved for 32 bit processes, but that might be a relic
from real i387 where the fpu was actully asynchronous.
 1.20  01-Dec-2013  christos revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes
 1.19  23-Oct-2013  drochner Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.
 1.18  31-Dec-2012  dsl branches: 1.18.2;
Move the two fields used to save some i387 state on the last fpu trap
into their own sub-structure of the pcb (from 'struct savefpu').
They only (seem) to be used in some code that generates core dumps
for 32bit processes (code that might be broken as well!).
'struct safefpu' is now identical to 'struct fxsave64'. One (or both)
needs extending to support AVX - might need to be dynamically sized.
Removed all the __aligned(16) except for the one in struct pcb itself.
Only the copy used for the fsave instruction need be aligned.
 1.17  07-Jul-2010  chs branches: 1.17.8; 1.17.18;
add the guts of TLS support on amd64. based on joerg's patch,
reworked by me to support 32-bit processes as well.
we now keep %fs and %gs loaded with the user values
while in the kernel, which means we don't need to
reload them when returning to user mode.
 1.16  27-Oct-2009  rmind branches: 1.16.2; 1.16.4;
Make pcb_ldt_sel, in amd64, an unused field. Unlike in i386, it was
missed during clean-up of LDT handling.
 1.15  26-Oct-2008  mrg branches: 1.15.8;
put the contents of these header files around #ifdef __x86_64__, and
#include the <i386/foo.h> in the #else clause, making these files
largely bit-size independant.
 1.14  30-Apr-2008  ad branches: 1.14.6;
lcr0() was changed to take a u_long. pcb_cr0 was a 32-bit signed quantity.
It was being sign extended in cpu_hatch() (CR0_PG is always set), causing
systems to crash and reboot before going multiuser.
 1.13  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.12  16-Apr-2008  cegger branches: 1.12.2; 1.12.4;
use POSIX integer types
 1.11  05-Jan-2008  yamt branches: 1.11.6;
- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.
 1.10  27-Nov-2007  christos branches: 1.10.6;
Shuffle things around so that pcb_savefpu goes back to be aligned in a 16
bit boundary. Noted by Arto Huusko.
 1.9  26-Nov-2007  christos make cr2 64 bits. Requested by fvdl.
 1.8  24-Nov-2007  christos preserve cr2 on pcb for the benefit of linux emulation.
 1.7  17-Oct-2007  garbled branches: 1.7.2;
Merge the ppcoea-renovation branch to HEAD.

This branch was a major cleanup and rototill of many of the various OEA
cpu based PPC ports that focused on sharing as much code as possible
between the various ports to eliminate near-identical copies of files in
every tree. Additionally there is a new PIC system that unifies the
interface to interrupt code for all different OEA ppc arches. The work
for this branch was done by a variety of people, too long to list here.

TODO:
bebox still needs work to complete the transition to -renovation.
ofppc still needs a bunch of work, which I will be looking at.
ev64260 still needs to be renovated
amigappc was not attempted.

NOTES:
pmppc was removed as an arch, and moved to a evbppc target.
 1.6  17-May-2007  yamt branches: 1.6.8; 1.6.10;
merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.5  04-Mar-2007  christos branches: 1.5.2; 1.5.4; 1.5.10;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.4  11-Dec-2005  christos branches: 1.4.26;
merge ktrace-lwp.
 1.3  15-May-2005  fvdl branches: 1.3.2;
Optionally include saving and restoring the 64bit %gs and %fs base register
values in the PCB. Do this in pmap_activate for now (XXX not a good place
for it, but a convenient one).
 1.2  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.1  26-Apr-2003  fvdl branches: 1.1.2;
Rename the x86_64 port to amd64, as this is the actual name used for
the processor family now. x86_64 is kept as the MACHINE_ARCH value,
since it's already widely used (by e.g. the toolchain, etc), and
by other operating systems.
 1.1.2.4  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1  03-Aug-2004  skrll Sync with HEAD
 1.3.2.3  21-Jan-2008  yamt sync with head
 1.3.2.2  07-Dec-2007  yamt sync with head
 1.3.2.1  03-Sep-2007  yamt sync with head.
 1.4.26.2  12-Mar-2007  rmind Sync with HEAD.
 1.4.26.1  03-Mar-2007  yamt adapt amd64.

XXX changes in identcpu.c is minmum for MONITOR.
XXX identcpu.c should be shared with i386.
 1.5.10.1  22-May-2007  matt Update to HEAD.
 1.5.4.1  11-Jul-2007  mjf Sync with head.
 1.5.2.2  03-Dec-2007  ad Sync with HEAD.
 1.5.2.1  27-May-2007  ad Sync with head.
 1.6.10.2  09-Jan-2008  matt sync with HEAD
 1.6.10.1  06-Nov-2007  matt sync with HEAD
 1.6.8.1  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.7.2.2  18-Feb-2008  mjf Sync with HEAD.
 1.7.2.1  08-Dec-2007  mjf Sync with HEAD.
 1.10.6.1  08-Jan-2008  bouyer Sync with HEAD
 1.11.6.2  17-Jan-2009  mjf Sync with HEAD.
 1.11.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.12.4.4  11-Aug-2010  yamt sync with head.
 1.12.4.3  11-Mar-2010  yamt sync with head
 1.12.4.2  04-May-2009  yamt sync with head.
 1.12.4.1  16-May-2008  yamt sync with head.
 1.12.2.1  18-May-2008  yamt sync with head.
 1.14.6.1  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.15.8.2  24-Oct-2010  jym Sync with HEAD
 1.15.8.1  01-Nov-2009  jym Sync with HEAD.
 1.16.4.1  05-Mar-2011  rmind sync with head
 1.16.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.17.18.3  03-Dec-2017  jdolecek update from HEAD
 1.17.18.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.18.1  25-Feb-2013  tls resync with head
 1.17.8.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.17.8.1  23-Jan-2013  yamt sync with head
 1.18.2.1  18-May-2014  rmind sync with head
 1.25.14.1  21-Apr-2017  bouyer Sync with HEAD
 1.25.10.1  20-Mar-2017  pgoyette Sync with HEAD
 1.25.6.1  28-Aug-2017  skrll Sync with HEAD
 1.28.4.3  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.28.4.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.28.4.1  10-Jun-2019  christos Sync with HEAD
 1.28.2.1  28-Jul-2018  pgoyette Sync with HEAD

RSS XML Feed