History log of /src/sys/uvm/uvm_extern.h
Revision  Date  Author  Comments
 1.235  14-Sep-2025  andvar Fix various typos in comments and log message.
 1.234  27-Apr-2025  riastradh posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.
This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race where:

1. we set up the child to share proc0.p_vmspace at first,

2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,

3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.233  26-Feb-2023  skrll branches: 1.233.6;
nkmempages should be size_t
 1.232  31-May-2021  riastradh branches: 1.232.12;
uvm: Make uvm_extern.h (more) self-contained, needs sys/types.h.
 1.231  14-Aug-2020  chs branches: 1.231.6; 1.231.8;
centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.230  14-Jun-2020  ad g/c vm_page_zero_enable
 1.229  13-Jun-2020  ad uvm_pagerealloc(): resurrect the insertion case.
 1.228  11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.227  26-May-2020  kamil Catch up with the usage of struct vmspace::vm_refcnt

Use the dedicated reference counting routines.

Change the type of struct vmspace::vm_refcnt and struct vm_map::ref_count
to volatile.

Remove the unnecessary vm->vm_map.misc_lock locking in process_domem().

Reviewed by <ad>
 1.226  09-May-2020  thorpej Make the uvm_voaddr structure more compact, only occupying 2 pointers
worth of space, by encoding the type in the lower bits of the object
pointer.
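
The compaction described in 1.226 can be illustrated with a minimal sketch of pointer tagging. This is not the kernel's code: the struct name, tag values, mask, and helper names below are all hypothetical, and the sketch assumes object pointers are at least 4-byte aligned so that the two low bits are free to hold the type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tags; the real uvm_voaddr encoding may differ. */
enum voaddr_type { VOADDR_TYPE_UOBJ = 0, VOADDR_TYPE_ANON = 1 };

/* Assumption: objects are at least 4-byte aligned, so 2 low bits are free. */
#define VOADDR_TYPE_MASK ((uintptr_t)0x3)

/* Two pointer-sized words: tagged object pointer plus an offset. */
struct voaddr_sketch {
	uintptr_t object;	/* object pointer with the type in the low bits */
	uintptr_t offset;	/* offset of the byte within that object */
};

static inline void
voaddr_set(struct voaddr_sketch *va, void *obj, enum voaddr_type type,
    uintptr_t offset)
{
	/* The low bits must be clear, or the tag would corrupt the pointer. */
	assert(((uintptr_t)obj & VOADDR_TYPE_MASK) == 0);
	va->object = (uintptr_t)obj | (uintptr_t)type;
	va->offset = offset;
}

static inline enum voaddr_type
voaddr_get_type(const struct voaddr_sketch *va)
{
	return (enum voaddr_type)(va->object & VOADDR_TYPE_MASK);
}

static inline void *
voaddr_get_object(const struct voaddr_sketch *va)
{
	/* Mask the tag off again to recover the original pointer. */
	return (void *)(va->object & ~VOADDR_TYPE_MASK);
}
```

Storing the tag in otherwise unused low bits avoids a separate type field, which is what would keep such a structure at two pointers' worth of space.
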
 1.225  27-Apr-2020  rin Add missing \ to fix build for PMAP_CACHE_VIVT, i.e., ARMv4 and prior.
 1.224  23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.223  18-Apr-2020  thorpej Add an API to get a reference on the identity of an individual byte of
virtual memory, a "virtual object address". This is not a reference to
a physical byte of memory, per se, but a reference to a byte residing
in a page, owned by a unique UVM object (either a uobj or an anon). Two
separate address+address space tuples that reference the same byte in
an object (such as a location in a shared memory segment) will resolve
to equivalent virtual object addresses. Even if the residency status
of the page changes, the virtual object address remains unchanged.

struct uvm_voaddr -- a structure that encapsulates this address reference.

uvm_voaddr_acquire() -- a function to acquire this address reference,
given a vm_map and a vaddr_t.

uvm_voaddr_release() -- a function to release this address reference.

uvm_voaddr_compare() -- a function to compare two such address references.

uvm_voaddr_acquire() resolves the COW status of the object address before
acquiring.

In collaboration with riastradh@ and chs@.
 1.222  22-Mar-2020  ad branches: 1.222.2;
Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.221  23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.220  18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.219  15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.218  31-Dec-2019  ad branches: 1.218.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.217  31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.216  27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.215  21-Dec-2019  ad Add uvm_free(): returns number of free pages in system.
 1.214  16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.213  28-May-2018  chs branches: 1.213.2; 1.213.6;
allow tmpfs files to be larger than 4GB.
 1.212  19-May-2018  jdolecek Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.211  08-May-2018  christos don't store the rssmax in the lwp rusage, it is a per proc property. Instead
utilize an unused field in the vmspace struct to store it. Also conditionalize
on platforms that have pmap statistics available.
 1.210  20-Apr-2018  jdolecek add prot parameter for uvm_emap_enter(), so that it's possible to
enter also read/write mappings
 1.209  20-Apr-2018  jdolecek make ubc_alloc() and ubc_release() static, they should not be used
outside of ubc_uiomove()/ubc_zeropage(); for now mark as noinline
to keep them available as breakpoints
 1.208  15-Dec-2017  maya branches: 1.208.2;
Match locking notes with reality.
misc_lock is used to protect vm_refcnt.

ok chuq
 1.207  02-Dec-2017  mrg add two new members to uvmexp_sysctl{}: bootpages and poolpages.
bootpages is set to the pages allocated via uvm_pageboot_alloc().
poolpages is calculated from the list of pools nr_pages members.

this brings us closer to having a valid total of pages known by
the system, vs actual pages originally managed.

XXX: poolpages needs some handling for PR_RECURSIVE pools still.
 1.206  20-May-2017  chs MAP_FIXED means something different for mremap() than it does for mmap(),
so we cannot use UVM_FLAG_FIXED to specify both behaviors.
keep UVM_FLAG_FIXED with its earlier meaning (prior to my previous change)
of whether to use uvm_map_findspace() to locate space for the new mapping or
to use the hint address that the caller passed in, and add a new flag
UVM_FLAG_UNMAP to indicate that any existing entries in the range should be
unmapped as part of creating the new mapping. the new UVM_FLAG_UNMAP flag
may only be used if UVM_FLAG_FIXED is also specified.
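
The constraint stated at the end of 1.206 (UVM_FLAG_UNMAP is only valid together with UVM_FLAG_FIXED) can be sketched as a flag check. The `SK_` names and bit values below are hypothetical and chosen only for illustration; they are not the kernel's definitions, and the kernel may enforce the rule differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only, not the kernel's. */
#define SK_FLAG_FIXED 0x1u	/* use the caller's hint address as-is */
#define SK_FLAG_UNMAP 0x2u	/* unmap existing entries in the range first */

/* Returns true if the flag combination obeys the rule from the log entry. */
static bool
sk_mapflags_valid(uint32_t flags)
{
	/* UNMAP without FIXED is rejected; every other combination is fine. */
	if ((flags & SK_FLAG_UNMAP) != 0 && (flags & SK_FLAG_FIXED) == 0)
		return false;
	return true;
}

/* Small self-check covering all four combinations of the two flags. */
static void
sk_mapflags_selfcheck(void)
{
	assert(sk_mapflags_valid(0));
	assert(sk_mapflags_valid(SK_FLAG_FIXED));
	assert(sk_mapflags_valid(SK_FLAG_FIXED | SK_FLAG_UNMAP));
	assert(!sk_mapflags_valid(SK_FLAG_UNMAP));
}
```
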
 1.205  17-May-2017  christos snprintb(3) for UVM_FLAGS.
 1.204  06-May-2017  joerg Extend the mmap(2) interface to allow requesting protections for later
use with mprotect(2), but without enabling them immediately.

Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses referencing the same physical
pages. Duplicated mappings can have different effective protections.

Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.

Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.

Improve test cases to ensure correct operation of the changed
interfaces.
 1.203  04-Jan-2017  christos branches: 1.203.6;
don't include uvm_physseg.h for kmem grovellers.
 1.202  02-Jan-2017  cherry Remove a redundant #ifdef _KERNEL/#endif pair.

ok mrg@
 1.201  24-Dec-2016  cherry uvm_extern.h has both a _KERNEL only, and a non _KERNEL only API.

Since we unconditionally expose the uvm_physseg.h API via uvm_extern.h
right now, and since uvm_physseg.h uses a kernel only datatype, viz
psize_t, we restrict exposure of the uvm_physseg.h API to the kernel
only.

This is in conformance with its documentation via uvm_hotplug(9) as a
kernel internal API.
 1.200  22-Dec-2016  cherry Use uvm_physseg.h:uvm_page_physload() instead of uvm_extern.h

For this, include uvm_physseg.h in the build and include tree, make a
cosmetic modification to the prototype for uvm_page_physload().
 1.199  22-Dec-2016  cherry Add a new function called uvm_md_init() that can be called at the
appropriate time in the boot path by MD code.
 1.198  20-Jul-2016  maxv Introduce uvm_km_protect.
 1.197  25-May-2016  christos branches: 1.197.2;
Introduce security.pax.mprotect.ptrace sysctl which can be used to bypass
mprotect settings so that debuggers can write to the text segment of traced
processes so that they can insert breakpoints. Turned off by default.
Ok: chuq (for now)
 1.196  05-Feb-2016  christos PR/50744: NONAKA Kimihiro: Protect more stuff with _KERNEL && _KMEMUSER to
make uvm_extern.h compile standalone again for net-snmp.
 1.195  26-Nov-2015  martin We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.
 1.194  20-Mar-2015  riastradh Comments explaining UBC_* flags.
 1.193  06-Feb-2015  maxv Kill kmeminit().
 1.192  14-Dec-2014  chs add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.191  07-Jul-2014  riastradh branches: 1.191.2; 1.191.4;
Initialize ubchist earlier.
 1.190  22-May-2014  riastradh Add uao_set_pgfl to limit a uvm_aobj's pages to a specified freelist.

Brought up on tech-kern:

https://mail-index.netbsd.org/tech-kern/2014/05/20/msg017095.html
 1.189  21-Feb-2014  skrll branches: 1.189.2;
Remove unnecessary struct simplelock forward declaration.
 1.188  03-Jan-2014  dsl There is no need for uvm_coredump_walkmap() to explicitly pass the proc_t
pointer to the caller's function.
If the code needs the process its address can be placed in the caller's
cookie.
 1.187  03-Jan-2014  dsl Minor changes to the process coredump code.
- Add some extra comments.
- Add some XXX comments because the process state might not be stable.
- Add uvm_coredump_count_segs() to simplify the calling code.
- uvm code now only returns non-empty sections/segments.
- Put the 'iocookie' into the 'cookie' block passed to uvm_coredump_walkmap()
instead of passing it through as an additional parameter.
amd64 can still generate core dumps that gdb can read.
 1.186  01-Jan-2014  dsl Change the type of the 'cookie' that holds the state of the core dump file
from 'void *' to the actual type 'struct coredump_iostate *'.
In most of the code the contents of the structure are still unknown.
This just stops the wrong type of pointer being passed to the 'void *'
parameter.
I hope I've found everything, amd64 GENERIC and i386 GENERIC & ALL compile.
 1.185  14-Nov-2013  martin As discussed on tech-kern: make TOPDOWN-VM runtime selectable per process
(offer MD code or emulations to override it).
 1.184  01-Sep-2012  matt branches: 1.184.2; 1.184.4;
Add a __HAVE_CPU_UAREA_IDLELWP hook so that the MD code can allocate
special UAREAs for idle lwp's.
 1.183  08-Apr-2012  martin Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
 1.182  18-Mar-2012  uebayasi Move base type definitions from uvm_extern.h to uvm_param.h so that
other sources can easily include part of UVM headers without the whole
uvm_extern.h (e.g. sys/vnode.h wants only uvm_object.h).
 1.181  02-Feb-2012  para branches: 1.181.2;
- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)
 1.180  27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.179  05-Jan-2012  reinoud Revert MAP_NOSYSCALLS patch.
 1.178  22-Dec-2011  reinoud Redo uvm_map_setattr() to never fail and remove the possible panic. The
possibility of failure was a C&P error.
 1.177  20-Dec-2011  reinoud Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.
 1.176  01-Sep-2011  matt branches: 1.176.2; 1.176.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contains the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.
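
As a rough illustration of the color matching described in 1.176, the sketch below picks the lowest page-aligned address at or above a hint whose page color equals a requested color in the range 0..colormask. It assumes a page's color is its page number masked with colormask; the page size, the mask value, and all `SK_`/`sk_` names are hypothetical, and machine-dependent code may compute colors differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters: 4 KiB pages and 8 page colors. */
#define SK_PAGE_SHIFT 12u
#define SK_PAGE_SIZE  ((uintptr_t)1 << SK_PAGE_SHIFT)
#define SK_COLORMASK  ((uintptr_t)7)

/* Assumption: a page's color is its page number masked with the colormask. */
static uintptr_t
sk_color_of(uintptr_t va)
{
	return (va >> SK_PAGE_SHIFT) & SK_COLORMASK;
}

/* Lowest page-aligned address >= hint whose color equals 'color'. */
static uintptr_t
sk_color_match(uintptr_t hint, uintptr_t color)
{
	assert(color <= SK_COLORMASK);

	/* Round the hint up to a page boundary. */
	uintptr_t va = (hint + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1);

	/* Pages to skip forward; unsigned wraparound is masked away. */
	uintptr_t delta = (color - sk_color_of(va)) & SK_COLORMASK;

	return va + (delta << SK_PAGE_SHIFT);
}
```

If the requested color is taken from the first user page being mapped, the resulting kernel address has the same color as that page, which is the congruence the entry refers to.
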
 1.175  27-Aug-2011  christos Add an optional pglist argument to uvm_obj_wirepages, to be
filled with the list of pages that were wired.
 1.174  16-Jun-2011  hannken Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. Ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
 1.173  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.172  23-Apr-2011  rmind branches: 1.172.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.171  17-Feb-2011  matt Add support for cpu-specific uarea allocation routines. Allows different
allocation for user and system lwps. MIPS will use this to map uareas of
system lwps using direct-mapped addresses (to reduce the overhead of
switching to kernel threads). ibm4xx could use it to map uareas via direct
mapped addresses and avoid the problem of having the kernel stack not in
the TLB.
 1.170  10-Feb-2011  pooka Make vmapbuf() return success/error and make physio deal with a
failure.
 1.169  02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.168  04-Jan-2011  matt branches: 1.168.2; 1.168.4;
Add better color matching when selecting free pages. KM pages will now be allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), this allows all kernel memory to come from <4GB to reduce
the amount of bounce buffering needed with 32bit DMA devices.
 1.167  20-Dec-2010  matt Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.166  13-Nov-2010  uebayasi Hide uvm/uvm_page.h again to ensure its internal structures are MD.

GENERIC or at least one kernel compile tested for:
acorn26, acorn32, algor, all, alpha, amd64, amiga, amigappc,
arc, bebox, bighill, cats, cobalt, dreamcast, ews4800mips,
hp300, hp700, hpcarm, hpcmips, hpcsh, i386, ibmnws,
integrator, ixm1200, iyonix, landisk, luna68k, mac68k,
macppc, mipsco, mmeye, mvme68k, mvmeppc, netwinder, news68k,
newsmips, next68k, obs266a, ofppc, pmax, pmppc, prep,
rs6000, sandpoint, sbmips, shark, sidebeach, sparc, sparc64,
sun2, sun3, usermode, vax, x68k, zaurus
 1.165  12-Nov-2010  uebayasi Put back uvm_page.h for now. Sorry for mess.
 1.164  12-Nov-2010  uebayasi Abstraction fix; don't pull in physical segment/page definitions
in UVM external API, uvm_extern.h, because most users care only
about virtual memory.

Device drivers use bus_dma(9) to manage physical memory. Device
drivers pull in bus_dma(9) API, bus_dma.h. bus_dma(9) implementations
pull in UVM internal API, uvm.h.

Tested By: Compiling i386 ALL kernel
 1.163  16-Apr-2010  rmind - Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth to move the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.
 1.162  08-Feb-2010  joerg branches: 1.162.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.
 1.161  21-Nov-2009  rmind branches: 1.161.2;
Add uvm_lwp_getuarea() and uvm_lwp_setuarea(). OK matt@.
 1.160  21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.159  18-Aug-2009  yamt whitespace fixes. no functional changes.
 1.158  10-Aug-2009  haad Add uvm_reclaim_hooks support for reclaiming kernel KVA space and memory.
This is used only by zfs where uvm_reclaim hook is added from arc cache.

Oked ad@.
 1.157  05-Aug-2009  pooka kill uvm_aio_biodone1(). only user was lfs and that uses nestiobuf now.
 1.156  05-Aug-2009  pooka add some advice symbols we'll eventually need
 1.155  28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.154  30-Mar-2009  yamt g/c uvm_aiobuf_pool.
 1.153  29-Mar-2009  mrg - add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.152  12-Mar-2009  abs Clarify free_list usage in uvm_page_physload() regarding faster/slower RAM.
Slower RAM should be assigned a higher free_list id.
No functional change to code, just comments and manpage
 1.151  18-Feb-2009  yamt make some functions static.
 1.150  26-Nov-2008  pooka branches: 1.150.4;
Rototill all remaining file systems to use ubc_uiomove() instead
of the ubc_alloc() - uiomove() - ubc_release() dance.
 1.149  31-Oct-2008  christos - allocate 8 pointers on the stack to avoid stack overflow in nfs.
- make that 8 a constant
- remove bogus panic
 1.148  08-Aug-2008  skrll branches: 1.148.2; 1.148.4;
g/c exec_map
 1.147  11-Jul-2008  skrll English improvement in comments.

"seems good to me :)" from yamt.
 1.146  04-Jun-2008  ad branches: 1.146.2; 1.146.4;
- vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.145  29-Feb-2008  yamt branches: 1.145.2; 1.145.4; 1.145.6;
uvm_swap_io: if pagedaemon, don't wait for iobuf.
 1.144  28-Jan-2008  yamt branches: 1.144.2; 1.144.6;
remove a special allocator for uareas, which is no longer necessary.
use pool_cache instead.
 1.143  02-Jan-2008  ad Merge vmlocking2 to head.
 1.142  26-Dec-2007  christos Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.
 1.141  24-Dec-2007  perry Remove __attribute__((__noreturn__)) from things already marked __dead
Found by the department of redundancy department.
 1.140  13-Dec-2007  yamt add ddb "whatis" command. inspired from solaris ::whatis dcmd.
 1.139  05-Dec-2007  yamt branches: 1.139.2; 1.139.4;
g/c uvm_vnp_sync
 1.138  05-Dec-2007  yamt fix UBC_WANT_UNMAP.
- check PMAP_CACHE_VIVT after pulling pmap.h.
- VTEXT -> VI_TEXT.
 1.137  30-Nov-2007  ad branches: 1.137.2;
Make {anon,file,exec}pages unsigned.
 1.136  06-Nov-2007  ad Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.135  18-Aug-2007  ad branches: 1.135.2; 1.135.6; 1.135.8;
Make the uarea cache per-CPU and drain in batches of 4.
 1.134  27-Jul-2007  yamt branches: 1.134.4; 1.134.6;
ubc_uiomove: add an "advice" argument rather than using UVM_ADV_RANDOM blindly.
 1.133  22-Jul-2007  pooka Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
 1.132  17-Jul-2007  joerg branches: 1.132.2;
Add native mremap system call based on the UVM implementation for
Linux compat. Add code to enforce alignment of the new location.
Special thanks to wizd for helping with the man page.
 1.131  09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.130  05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.129  24-Mar-2007  rmind Export uvm_uarea_free() to the rest.
Make things compile again.
 1.128  04-Mar-2007  christos branches: 1.128.2; 1.128.4; 1.128.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.127  22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.126  21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.125  15-Feb-2007  ad branches: 1.125.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).
 1.124  21-Dec-2006  yamt merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.123  07-Dec-2006  elad Back out uvm_is_swap_device().
 1.122  01-Dec-2006  elad branches: 1.122.2;
Introduce uvm_is_swap_device(), to check if the passed struct vnode * is
used as a swap device or not.

Okay mrg@.
 1.121  12-Oct-2006  yamt move some knowledge about vnode into uvm_vnode.c.
 1.120  12-Oct-2006  yamt uobj_wirepages and uobj_unwirepages from Mindaugas. PR/34771.
(commented out in files.uvm for now because there is no user in tree.)

http://mail-index.netbsd.org/tech-kern/2006/09/24/0000.html
http://mail-index.netbsd.org/tech-kern/2006/10/10/0000.html
 1.119  05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.118  15-Sep-2006  yamt branches: 1.118.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.117  01-Sep-2006  cherry branches: 1.117.2;
bumps kernel aobj to 64 bit. \
See: http://mail-index.netbsd.org/tech-kern/2006/03/07/0007.html
 1.116  04-Aug-2006  he Rearrange included headers and/or add include of <sys/types.h> and
<sys/lock.h>, so that the mipsco port can build again, ref.
http://mail-index.netbsd.org/port-mips/2006/08/04/0000.html
Reviewed by thorpej
 1.115  05-Jul-2006  drochner Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to laziness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.
 1.114  19-May-2006  yamt branches: 1.114.2; 1.114.4;
UVM_MAPFLAG: add missing parens.
 1.113  14-May-2006  elad integrate kauth.
 1.112  15-Mar-2006  drochner branches: 1.112.2;
-clean up the interface to uvm_fault: the "fault type" didn't serve
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
 1.111  01-Mar-2006  yamt branches: 1.111.2; 1.111.4;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.110  10-Feb-2006  simonb Make a note that some counters should be 64-bit as they wrap far too
quickly.
 1.109  21-Jan-2006  yamt branches: 1.109.2; 1.109.4;
implement compat_linux mremap.
 1.108  21-Dec-2005  yamt branches: 1.108.2;
make length of inactive queue tunable by sysctl. (vm.inactivepct)
 1.107  29-Nov-2005  yamt merge yamt-readahead branch.
 1.106  01-Sep-2005  yamt branches: 1.106.6;
remove one of the duplicated forward decls. of vmspace. pointed out by Dheeraj S.
 1.105  01-Sep-2005  yamt put back uvm_fault.h for now as it's needed for some ports.
 1.104  27-Aug-2005  yamt don't include uvm_fault.h unnecessarily.
 1.103  10-Jun-2005  matt branches: 1.103.2;
Rework the coredump code to have no explicit knowledge of how coredump
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is no longer random access and is suitable for shipping over a stream.
 1.102  02-Jun-2005  matt When writing coredumps, don't write zero uninstantiated demand-zero pages.
Also, with ELF core dumps, trim trailing zeroes from sections. These two
changes can shrink coredumps by over 50% in size.
 1.101  15-May-2005  yamt remove anon related statistics which are no longer used.
 1.100  01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.99  26-Mar-2005  fvdl Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.
 1.98  13-Jan-2005  yamt branches: 1.98.2; 1.98.4; 1.98.8;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.
 1.97  09-Jan-2005  chs adjust the UBC mapping code to support non-vnode uvm_objects.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
 1.96  01-Jan-2005  yamt in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.
 1.95  01-Jan-2005  yamt introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.
 1.94  01-Jan-2005  yamt for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.93  28-Aug-2004  thorpej Garbage-collect pagemove(); nothing uses it anymore (YAY!!!)
 1.92  04-May-2004  pk Since a `vmspace' always includes a `vm_map' we can re-use vm_map's
reference count lock to also protect the vmspace's reference count.
 1.91  24-Mar-2004  junyoung Nuke __P().
 1.90  14-Mar-2004  jdolecek fix typo in comment
 1.89  13-Feb-2004  yamt when breaking a loan from uobj,
insert the replacement page into the same position
as the original page on the object memq so that
genfs_putpages (and lfs) won't be confused.

noted by Stephan Uphoff (PR/24328)
 1.88  04-Jan-2004  jdolecek Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread
 1.87  18-Dec-2003  pk * Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.
 1.86  18-Dec-2003  pk Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.
 1.85  13-Nov-2003  chs eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
 1.84  11-Aug-2003  pk Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.
 1.83  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.82  29-Jun-2003  fvdl branches: 1.82.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.81  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.80  10-May-2003  thorpej Back out the following change:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.
 1.79  08-May-2003  thorpej Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
thus giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
 1.78  03-May-2003  wiz Misc fixes from jmc@openbsd.
 1.77  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.76  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.75  11-Dec-2002  thorpej Define a UVM_FLAG_NOWAIT, which indicates that we're not allowed
to sleep. Define UVM_KMF_NOWAIT in terms of UVM_FLAG_NOWAIT.

From Manuel Bouyer. Fixes a problem where any mapping with
read protection was created in a "nowait" context, causing
spurious failures.
 1.74  17-Nov-2002  chs change uvm_uarea_alloc() to indicate whether the returned uarea is already
backed by physical pages (ie. because it reused a previously-freed one),
so that we can skip a bunch of useless work in that case.
this fixes the underlying problem behind PR 18543, and also speeds up fork()
quite a bit (eg. 7% on my pc, 1% on my ultra2) when we get a cache hit.
 1.73  22-Sep-2002  chs encapsulate knowledge of uarea allocation in some new functions.
 1.72  15-Sep-2002  chs add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.71  17-May-2002  enami branches: 1.71.2;
Make uvn_findpages return the number of pages found so that the caller can
easily check if all requested pages are found or not.
 1.70  10-Dec-2001  thorpej branches: 1.70.8;
Move the code that walks the process's VM map during a coredump
into uvm_coredump_walkmap(), and use callbacks into the coredump
routine to do something with each section.
 1.69  09-Dec-2001  chs add {anon,file,exec}max as an upper bound on the amount of memory that
will be allocated for the respective usage types when there is contention
for memory.

replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names
and sysctl names.
 1.68  08-Dec-2001  thorpej Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).
 1.67  15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.66  16-Aug-2001  chs branches: 1.66.2;
user maps are always pageable.
 1.65  02-Jun-2001  chs branches: 1.65.2;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.64  26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.63  25-May-2001  chs remove trailing whitespace.
 1.62  02-May-2001  thorpej Support dynamic sizing of the page color bins. We also support
dynamically re-coloring pages; as machine-dependent code discovers
the size of the system's caches, it may call uvm_page_recolor() with
the new number of colors to use. If the new number of colors is
smaller (or equal to) the current number of colors, then uvm_page_recolor()
is a no-op.

The system defaults to one bucket if machine-dependent code does not
initialize uvmexp.ncolors before uvm_page_init() is called.

Note that the number of color bins should be initialized to something
reasonable as early as possible -- for many early memory allocations,
we live with the consequences of the page choice for the lifetime of
the boot.
 1.61  01-May-2001  thorpej Add the number of page colors to uvmexp.
 1.60  29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
 1.59  25-Apr-2001  thorpej pmap_resident_count() always exists. Besides, returning the
value of vm_rssize is pointless -- it is never initialized to
anything other than 0.
 1.58  15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
 1.57  09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
 1.56  06-Feb-2001  eeh branches: 1.56.2;
Specify a process' address space limits for uvmspace_exec().
 1.55  30-Nov-2000  simonb Move uvm_pgcnt_vnode and uvm_pgcnt_anon into uvmexp (as vnodepages and
anonpages), and add vtextpages which is currently unused but will be
used to trace the number of pages used by vtext vnodes.
 1.54  29-Nov-2000  simonb Add a vm.uvmexp2 sysctl that uses an ABI-safe 'struct uvmexp_sysctl'.
 1.53  27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.52  27-Nov-2000  nisimura Introduce uvm_km_valloc_align() and use it to grab process's USPACE
aligned on USPACE boundary in kernel virtual address. It's beneficial
for MIPS R4000's paired TLB entry design.
 1.51  28-Sep-2000  eeh Add support for variable end of user stacks needed to support COMPAT_NETBSD32:

`struct vmspace' has a new field `vm_minsaddr' which is the user TOS.

PS_STRINGS is deprecated in favor of curproc->p_pstr which is derived
from `vm_minsaddr'.

Bump the kernel version number.
 1.50  21-Sep-2000  thorpej Make PMAP_PAGEIDLEZERO() return a boolean value. FALSE indicates
that the page being zero'd was not completed and that page zeroing
should be aborted. This may be used by machine-dependent code doing
slow page access to reduce the latency of running a process that has
become runnable while in the middle of doing a slow page zero.
 1.49  13-Sep-2000  thorpej Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
 1.48  12-Aug-2000  thorpej Don't bother with a trampoline to start the pagedaemon and
reaper threads.
 1.47  01-Aug-2000  wiz Rename VM_INHERIT_* to MAP_INHERIT_* and move them to sys/sys/mman.h as
discussed on tech-kern.
Retire sys/uvm/uvm_inherit.h, update man page for minherit(2).
 1.46  24-Jul-2000  jeffs Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the proper
virtual alignment of the allocated space.
 1.45  27-Jun-2000  mrg move the contents of <vm/vm.h> into <uvm/uvm_extern.h>. <vm/vm.h> is simply
an include of <uvm/uvm_extern.h> now.
 1.44  27-Jun-2000  mrg more vm header file changes:

<vm/vm_extern.h> merged into <uvm/uvm_extern.h>
<vm/vm_page.h> merged into <uvm/uvm_page.h>
<vm/pmap.h> has become <uvm/uvm_pmap.h>

this leaves just <vm/vm.h> in NetBSD.
 1.43  26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.42  08-Jun-2000  thorpej Change UVM_UNLOCK_AND_WAIT() to use ltsleep() (it is now atomic, as
advertised). Garbage-collect uvm_sleep().
 1.41  28-May-2000  thorpej Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created process (which has been placed on a run queue)
before cpu_set_kpc() would be performed.
 1.40  24-Apr-2000  thorpej branches: 1.40.2;
Changes necessary to implement pre-zero'ing of pages in the idle loop:
- Make page free lists have two actual queues: known-zero pages and
pages with unknown contents.
- Implement uvm_pageidlezero(). This function attempts to zero up to
the target number of pages until the target has been reached (currently
target is `all free pages') or until whichqs becomes non-zero (indicating
that a process is ready to run).
- Define a new hook for the pmap module for pre-zero'ing pages. This is
used to zero the pages using uncached access. This allows us to zero
as many pages as we want without polluting the cache.

In order to use this feature, each platform must add the appropriate
glue in their idle loop.
 1.39  10-Apr-2000  thorpej Add UVM_PGA_ZERO which instructs uvm_pagealloc{,_strat}() to return a
zero'd, ! PG_CLEAN page, as if it were uvm_pagezero()'d.
 1.38  26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.37  11-Feb-2000  thorpej Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
 1.36  11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.35  30-Dec-1999  eeh I should have made uvm_page_physload() take paddr_t's instead of vaddr_t's.
Also, add uvm_coredump32().
 1.34  22-Jul-1999  thorpej branches: 1.34.2;
Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.33  17-Jul-1999  thorpej Add a set of "lockflags", which can control the locking behavior
of some functions. Use these flags in uvm_map_pageable() to determine
if the map is locked on entry (replaces an already present boolean_t
argument `islocked'), and if the function should return with the map
still locked.
 1.32  02-Jul-1999  thorpej Bring in additional uvmexp members from chs-ubc2, so that VM stats can
be read no matter which kernel you're running.
 1.31  21-Jun-1999  thorpej Protect prototypes, certain macros, and inlines from userland.
 1.30  18-Jun-1999  thorpej Add the guts of mlockall(MCL_FUTURE). This requires passing a process's
"memlock" resource limit to uvm_mmap(). Update all calls accordingly.
 1.29  17-Jun-1999  thorpej Make uvm_vslock() return the error code from uvm_fault_wire(). All places
which use uvm_vslock() should now test the return value. If it's not
KERN_SUCCESS, wiring the pages failed, so the operation which is using
uvm_vslock() should error out.

XXX We currently just EFAULT a failed uvm_vslock(). We may want to do
more about translating error codes in the future.
 1.28  15-Jun-1999  thorpej Several changes, developed and tested concurrently:
* Provide POSIX 1003.1b mlockall(2) and munlockall(2) system calls.
MCL_CURRENT is presently implemented. MCL_FUTURE is not fully
implemented. Also, the same one-unlock-for-every-lock caveat
currently applies here as it does to mlock(2). This will be
addressed in a future commit.
* Provide the mincore(2) system call, with the same semantics as
Solaris.
* Clean up the error recovery in uvm_map_pageable().
* Fix a bug where a process would hang if attempting to mlock a
zero-fill region where none of the pages in that region are resident.
[ This fix has been submitted for inclusion in 1.4.1 ]
 1.27  26-May-1999  thorpej Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change how these maps are locked, as well.
 1.26  26-May-1999  thorpej Pass an access_type to uvm_vslock().
 1.25  13-May-1999  thorpej Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).
 1.24  11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.23  26-Mar-1999  chs branches: 1.23.2;
add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.22  25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.21  08-Sep-1998  thorpej branches: 1.21.2;
Implement uvm_exit(), which frees VM resources when a process finishes
exiting.
 1.20  28-Aug-1998  thorpej Add a waitok boolean argument to the VM system's pool page allocator backend.
 1.19  13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.18  01-Aug-1998  thorpej We need to be able to specify a uvm_object to the pool page allocator, too.
 1.17  31-Jul-1998  thorpej Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.
 1.16  24-Jul-1998  thorpej branches: 1.16.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.
 1.15  08-Jul-1998  thorpej Add support for multiple memory free lists. There is at least one
default free list, and 0 - N additional free lists, in order of descending
priority.

A new page allocation function, uvm_pagealloc_strat(), has been added,
providing three page allocation strategies:

- normal: high -> low priority free list walk, taking the
page off the first free list that has one.

- only: attempt to allocate a page only from the specified free
list, failing if that free list has none available.

- fallback: if `only' fails, fall back on `normal'.

uvm_pagealloc(...) is provided for normal use (and is a synonym for
uvm_pagealloc_strat(..., UVM_PGA_STRAT_NORMAL, 0); the free list argument
is ignored for the `normal' case).

uvm_page_physload() now specifies which free list the pages will be
loaded onto. This means that some platforms which have multiple physical
memory segments may define additional vm_physsegs if they wish to break
individual physical segments into differing priorities.

Machine-dependent code must define _at least_ the following constants
in <machine/vmparam.h>:

VM_NFREELIST: the number of free lists the system will have

VM_FREELIST_DEFAULT: the default freelist (should always be 0,
but is defined in machdep code so that it's with all of the
other free list-related constants).

Additional free list names may be defined by machine-dependent code, but
they will only be used by machine-dependent code (e.g. for loading the
vm_physsegs).
 1.14  04-Jul-1998  jonathan defopt DDB.
 1.13  09-May-1998  kleink Use size_t to pass the length of the memory region to operate on to chgkprot(),
kernacc(), useracc(), vslock() and vsunlock(); (unsigned) ints are not
adequate on all platforms.
 1.12  30-Apr-1998  thorpej Pass vslock() and vsunlock() a proc *, rather than implicitly operating
on curproc.
 1.11  30-Mar-1998  mycroft Mark scheduler() and uvm_scheduler() as never returning.
 1.10  27-Mar-1998  thorpej Split uvmspace_alloc() into uvmspace_alloc() and uvmspace_init(). The latter
can be used for initializing a pre-allocated vmspace.
 1.9  09-Mar-1998  mrg KNF.
 1.8  10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.7  09-Feb-1998  mrg keep statistics on pageout/pagein, total pages, and total operations.
 1.6  08-Feb-1998  thorpej Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.
 1.5  07-Feb-1998  mrg implement counters for pages paged in/out
 1.4  07-Feb-1998  mrg restore rcsids
 1.3  07-Feb-1998  chs prototype for uvm_map_checkprot() moved here.
add uvmexp fields for pageouts-in-progress and kernel-reserved pages.
 1.2  06-Feb-1998  thorpej RCS ID police.
 1.1  05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.16.2.2  08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.16.2.1  30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.21.2.6  02-Jun-1999  chs add a new uvn_findpages() flag, UFP_NORDONLY,
which means that PG_RDONLY pages should not be returned.
 1.21.2.5  30-May-1999  chs uvm_vnp_setpageblknos() is out, uvm_vnp_asyncget() is in.
 1.21.2.4  30-Apr-1999  chs change ubc_alloc()'s length arg to be a pointer instead of the value.
the pointed-to value is the total desired length on input,
and is updated to the length that will fit in the returned window.
this allows callers of ubc_alloc() to be ignorant of the window size.
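The in/out length convention described in 1.21.2.4 can be sketched as follows. This is only an illustration under stated assumptions: `window_alloc()`, `copy_in_chunks()` and `WIN_SIZE` are hypothetical names and values, not the real ubc_alloc() interface or the real window size, neither of which is given in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window size; the real UBC window size is not stated in the log. */
#define WIN_SIZE 8192u

static char window[WIN_SIZE];

/*
 * Hypothetical stand-in for an allocator with an in/out length argument.
 * On input, *lenp is the total length the caller wants; on return it is
 * the length that actually fits in the returned window.
 */
static void *
window_alloc(size_t offset, size_t *lenp)
{
	size_t winoff, avail;

	assert(lenp != NULL);
	winoff = offset % WIN_SIZE;	/* position inside the window */
	avail = WIN_SIZE - winoff;	/* bytes left before the window ends */
	if (*lenp > avail)
		*lenp = avail;		/* clamp to what fits */
	return window + winoff;
}

/* Caller loop that never refers to WIN_SIZE; returns the number of chunks. */
static unsigned
copy_in_chunks(size_t offset, size_t total)
{
	unsigned chunks = 0;

	while (total > 0) {
		size_t len = total;	/* ask for everything that is left */

		(void)window_alloc(offset, &len);
		offset += len;		/* advance by what actually fit */
		total -= len;
		chunks++;
	}
	return chunks;
}
```

Because the callee clamps the length, the caller simply asks for the remaining total on every iteration and advances by whatever came back.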
 1.21.2.3  09-Apr-1999  chs add decl for aiodone daemon.
 1.21.2.2  25-Feb-1999  chs define UFP_* (uvn_findpages() flags).
add uvm_aiobuf pool stuff.
add new prototypes.
 1.21.2.1  09-Nov-1998  chs initial snapshot. lots left to do.
 1.23.2.1  16-Apr-1999  chs branches: 1.23.2.1.2;
pull up 1.23 -> 1.24:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.23.2.1.2.7  09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.23.2.1.2.6  02-Aug-1999  thorpej Update from trunk.
 1.23.2.1.2.5  11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.23.2.1.2.4  04-Jul-1999  chs adjust protos.
 1.23.2.1.2.3  01-Jul-1999  thorpej Sync w/ -current.
 1.23.2.1.2.2  21-Jun-1999  thorpej Sync w/ -current.
 1.23.2.1.2.1  07-Jun-1999  chs merge everything from chs-ubc branch.
 1.34.2.5  27-Mar-2001  bouyer Sync with HEAD.
 1.34.2.4  12-Mar-2001  bouyer Sync with HEAD.
 1.34.2.3  11-Feb-2001  bouyer Sync with HEAD.
 1.34.2.2  08-Dec-2000  bouyer Sync with HEAD.
 1.34.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.40.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.56.2.11  11-Dec-2002  thorpej Sync with HEAD.
 1.56.2.10  11-Dec-2002  thorpej Sync with HEAD.
 1.56.2.9  18-Oct-2002  nathanw Catch up to -current.
 1.56.2.8  17-Sep-2002  nathanw Catch up to -current.
 1.56.2.7  20-Jun-2002  nathanw Catch up to -current.
 1.56.2.6  08-Jan-2002  nathanw Catch up to -current.
 1.56.2.5  21-Sep-2001  nathanw Catch up to -current.
 1.56.2.4  24-Aug-2001  nathanw Catch up with -current.
 1.56.2.3  21-Jun-2001  nathanw Catch up to -current.
 1.56.2.2  09-Apr-2001  nathanw Catch up with -current.
 1.56.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.65.2.4  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.65.2.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.65.2.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.65.2.1  25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.66.2.1  01-Oct-2001  fvdl Catch up with -current.
 1.70.8.1  30-May-2002  gehenna Catch up with -current.
 1.71.2.1  02-Jun-2003  tron Pull up revision 1.72 (requested by skrll):
add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.82.2.10  11-Dec-2005  christos Sync with head.
 1.82.2.9  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.82.2.8  01-Apr-2005  skrll Sync with HEAD.
 1.82.2.7  17-Jan-2005  skrll Sync with HEAD.
 1.82.2.6  31-Oct-2004  skrll Reduce diff to HEAD.
 1.82.2.5  21-Sep-2004  skrll Fix the sync with head I botched.
 1.82.2.4  18-Sep-2004  skrll Sync with HEAD.
 1.82.2.3  03-Sep-2004  skrll Sync with HEAD
 1.82.2.2  03-Aug-2004  skrll Sync with HEAD
 1.82.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.98.8.1  18-Sep-2005  tron Pull up following revision(s) (requested by fvdl in ticket #798):
sys/compat/sunos/sunos_exec.c: revision 1.47
sys/compat/pecoff/pecoff_emul.c: revision 1.11
sys/arch/sparc64/sparc64/netbsd32_machdep.c: revision 1.45
sys/arch/amd64/amd64/netbsd32_machdep.c: revision 1.12
sys/sys/proc.h: revision 1.198
sys/compat/mach/mach_exec.c: revision 1.56
sys/compat/freebsd/freebsd_exec.c: revision 1.27
sys/arch/sparc64/include/vmparam.h: revision 1.27
sys/kern/kern_resource.c: revision 1.91
sys/compat/netbsd32/netbsd32_netbsd.c: revision 1.88
sys/compat/osf1/osf1_exec.c: revision 1.39
sys/compat/svr4_32/svr4_32_resource.c: revision 1.5
sys/compat/ultrix/ultrix_misc.c: revision 1.99
sys/compat/svr4_32/svr4_32_exec.h: revision 1.9
sys/kern/exec_elf32.c: revision 1.103
sys/compat/aoutm68k/aoutm68k_exec.c: revision 1.19
sys/compat/sunos32/sunos32_exec.c: revision 1.20
sys/compat/hpux/hpux_exec.c: revision 1.46
sys/compat/darwin/darwin_exec.c: revision 1.40
sys/kern/sysv_shm.c: revision 1.83
sys/uvm/uvm_extern.h: revision 1.99
sys/uvm/uvm_mmap.c: revision 1.89
sys/kern/kern_exec.c: revision 1.195
sys/compat/netbsd32/netbsd32.h: revision 1.31
sys/arch/sparc64/sparc64/svr4_32_machdep.c: revision 1.20
sys/compat/svr4/svr4_exec.c: revision 1.56
sys/compat/irix/irix_exec.c: revision 1.41
sys/compat/ibcs2/ibcs2_exec.c: revision 1.63
sys/compat/svr4_32/svr4_32_exec.c: revision 1.16
sys/arch/amd64/include/vmparam.h: revision 1.8
sys/compat/linux/common/linux_exec.c: revision 1.73
Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
 1.98.4.3  26-Mar-2005  yamt sync with head.
 1.98.4.2  25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.98.4.1  25-Jan-2005  yamt remove some compatibility functions.
 1.98.2.1  29-Apr-2005  kent sync with -current
 1.103.2.9  17-Mar-2008  yamt sync with head.
 1.103.2.8  04-Feb-2008  yamt sync with head.
 1.103.2.7  21-Jan-2008  yamt sync with head
 1.103.2.6  07-Dec-2007  yamt sync with head
 1.103.2.5  15-Nov-2007  yamt sync with head.
 1.103.2.4  03-Sep-2007  yamt sync with head.
 1.103.2.3  26-Feb-2007  yamt sync with head.
 1.103.2.2  30-Dec-2006  yamt sync with head.
 1.103.2.1  21-Jun-2006  yamt sync with head.
 1.106.6.2  19-Nov-2005  yamt - as read-ahead context is per-vnode now,
there are fewer reasons to make VOP_READ call uvm_ra_request explicitly.
move it to pager (uvn_get) so that it can handle accesses via mmap as well.
- pass advice to pager via ubc.
- tweak DPRINTF.

XXX can be disturbed by PGO_LOCKED.

XXX it's controversial where it should be done.
(uvm_fault, uvn_get or genfs_getpages.)
 1.106.6.1  17-Nov-2005  yamt comment.
 1.108.2.4  18-Feb-2006  yamt sync with head.
 1.108.2.3  01-Feb-2006  yamt sync with head.
 1.108.2.2  15-Jan-2006  yamt rename VMSPACE_IS_KERNEL to VMSPACE_IS_KERNEL_P. ("predicate")
suggested by Matt Thomas.
 1.108.2.1  31-Dec-2005  yamt - add a function to add a reference to a vmspace.
- add a macro to check if a vmspace belongs to kernel.
 1.109.4.2  01-Jun-2006  kardel Sync with head.
 1.109.4.1  22-Apr-2006  simonb Sync with head.
 1.109.2.1  09-Sep-2006  rpaulo sync with head
 1.111.4.2  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.111.4.1  19-Apr-2006  elad oops - *really* sync to head this time.
 1.111.2.5  03-Sep-2006  yamt sync with head.
 1.111.2.4  11-Aug-2006  yamt sync with head
 1.111.2.3  24-May-2006  yamt sync with head.
 1.111.2.2  01-Apr-2006  yamt sync with head.
 1.111.2.1  05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.112.2.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.114.4.1  13-Jul-2006  gdamore Merge from HEAD.
 1.114.2.2  19-May-2006  yamt UVM_MAPFLAG: add missing parens.
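The log does not say which parentheses were missing from UVM_MAPFLAG. As a general illustration of the hazard, the sketch below uses hypothetical `PACK_UNSAFE`/`PACK_SAFE` macros (not the real UVM_MAPFLAG definition) to show how an unparenthesized macro argument changes the result when the caller passes an expression.

```c
#include <assert.h>

/* Hypothetical flag-packing macros, not the real UVM_MAPFLAG definition. */
#define PACK_UNSAFE(prot, adv)	(prot | adv << 4)	/* arguments not parenthesized */
#define PACK_SAFE(prot, adv)	((prot) | ((adv) << 4))	/* every argument parenthesized */

#define P_READ	0x1

/* Expands to (0x1 | 1 | 2 << 4): only the 2 is shifted. */
static unsigned
unsafe_pack(void)
{
	return PACK_UNSAFE(P_READ, 1 | 2);
}

/* Expands to ((0x1) | ((1 | 2) << 4)): the whole argument is shifted. */
static unsigned
safe_pack(void)
{
	return PACK_SAFE(P_READ, 1 | 2);
}
```

With the argument `1 | 2`, the unsafe form yields 33 and the safe form yields 49, because `<<` binds more tightly than `|`.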
 1.114.2.1  19-May-2006  yamt file uvm_extern.h was added on branch chap-midi on 2006-05-19 15:08:15 +0000
 1.117.2.2  12-Jan-2007  ad Sync with head.
 1.117.2.1  18-Nov-2006  ad Sync with head.
 1.118.2.2  22-Oct-2006  yamt use workqueue for aiodoned.
 1.118.2.1  22-Oct-2006  yamt sync with head
 1.122.2.1  09-Dec-2006  bouyer Pull up following revision(s) (requested by elad in ticket #261):
sys/uvm/uvm_extern.h: revision 1.123
sys/uvm/uvm_swap.c: revision 1.115
share/man/man9/uvm.9: revision 1.79
Back out uvm_is_swap_device().
 1.125.2.3  15-Apr-2007  yamt sync with head.
 1.125.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.125.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.128.6.1  29-Mar-2007  reinoud Pullup to -current
 1.128.4.1  11-Jul-2007  mjf Sync with head.
 1.128.2.7  18-Oct-2007  ad Free uareas back to the uarea cache on the CPU where they were last used.
 1.128.2.6  21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.128.2.5  20-Aug-2007  ad Sync with HEAD.
 1.128.2.4  15-Jul-2007  ad Sync with head.
 1.128.2.3  09-Jun-2007  ad Sync with head.
 1.128.2.2  10-Apr-2007  ad Sync with head.
 1.128.2.1  05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.132.2.2  03-Sep-2007  skrll Sync with HEAD.
 1.132.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.134.6.2  27-Jul-2007  yamt ubc_uiomove: add an "advice" argument rather than using UVM_ADV_RANDOM blindly.
 1.134.6.1  27-Jul-2007  yamt file uvm_extern.h was added on branch matt-mips64 on 2007-07-27 09:50:38 +0000
 1.134.4.4  09-Dec-2007  jmcneill Sync with HEAD.
 1.134.4.3  03-Dec-2007  joerg Sync with HEAD.
 1.134.4.2  06-Nov-2007  joerg Sync with HEAD.
 1.134.4.1  03-Sep-2007  jmcneill Sync with HEAD.
 1.135.8.4  18-Feb-2008  mjf Sync with HEAD.
 1.135.8.3  27-Dec-2007  mjf Sync with HEAD.
 1.135.8.2  08-Dec-2007  mjf Sync with HEAD.
 1.135.8.1  19-Nov-2007  mjf Sync with HEAD.
 1.135.6.1  13-Nov-2007  bouyer Sync with HEAD
 1.135.2.3  23-Mar-2008  matt sync with HEAD
 1.135.2.2  09-Jan-2008  matt sync with HEAD
 1.135.2.1  06-Nov-2007  matt sync with HEAD
 1.137.2.3  26-Dec-2007  ad Sync with head.
 1.137.2.2  08-Dec-2007  ad Sync with head.
 1.137.2.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.139.4.2  02-Jan-2008  bouyer Sync with HEAD
 1.139.4.1  13-Dec-2007  bouyer Sync with HEAD
 1.139.2.2  13-Dec-2007  yamt sync with head.
 1.139.2.1  10-Dec-2007  yamt - separate kernel va allocation (kernel_va_arena) from
in-kernel fault handling (kernel_map).
- add vmem bootstrap code. vmem doesn't rely on malloc anymore.
- make kmem_alloc interrupt-safe.
- kill kmem_map. make malloc a wrapper of kmem_alloc.
 1.144.6.4  17-Jan-2009  mjf Sync with HEAD.
 1.144.6.3  28-Sep-2008  mjf Sync with HEAD.
 1.144.6.2  05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.144.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.144.2.1  24-Mar-2008  keiichi sync with head.
 1.145.6.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.145.6.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.145.4.5  11-Aug-2010  yamt sync with head.
 1.145.4.4  11-Mar-2010  yamt sync with head
 1.145.4.3  19-Aug-2009  yamt sync with head.
 1.145.4.2  18-Jul-2009  yamt sync with head.
 1.145.4.1  04-May-2009  yamt sync with head.
 1.145.2.1  17-Jun-2008  yamt sync with head.
 1.146.4.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.146.4.1  19-Oct-2008  haad Sync with HEAD.
 1.146.2.1  18-Jul-2008  simonb Sync with head.
 1.148.4.2  01-Apr-2009  snj branches: 1.148.4.2.4;
Pull up following revision(s) (requested by mrg in ticket #622):
bin/csh/csh.1: revision 1.46
bin/csh/func.c: revision 1.37
bin/ps/print.c: revision 1.111
bin/ps/ps.c: revision 1.74
bin/sh/miscbltin.c: revision 1.38
bin/sh/sh.1: revision 1.92 via patch
external/bsd/top/dist/machine/m_netbsd.c: revision 1.7
lib/libkvm/kvm_proc.c: revision 1.82
sys/arch/mips/mips/cpu_exec.c: revision 1.55
sys/compat/darwin/darwin_exec.c: revision 1.57
sys/compat/ibcs2/ibcs2_exec.c: revision 1.73
sys/compat/irix/irix_resource.c: revision 1.15
sys/compat/linux/arch/amd64/linux_exec_machdep.c: revision 1.16
sys/compat/linux/arch/i386/linux_exec_machdep.c: revision 1.12
sys/compat/linux/common/linux_limit.h: revision 1.5
sys/compat/osf1/osf1_resource.c: revision 1.14
sys/compat/svr4/svr4_resource.c: revision 1.18
sys/compat/svr4_32/svr4_32_resource.c: revision 1.17
sys/kern/exec_subr.c: revision 1.62
sys/kern/init_sysctl.c: revision 1.160
sys/kern/kern_exec.c: revision 1.288
sys/kern/kern_resource.c: revision 1.151
sys/sys/param.h: patch
sys/sys/resource.h: revision 1.31
sys/sys/sysctl.h: revision 1.184
sys/uvm/uvm_extern.h: revision 1.153
sys/uvm/uvm_glue.c: revision 1.136
sys/uvm/uvm_mmap.c: revision 1.128
usr.bin/systat/ps.c: revision 1.32
- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but it's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
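From userland, the RLIMIT_AS resource added by this pull-up can be inspected with the standard getrlimit(2) call. The following is a minimal sketch; the helper names `get_as_limit()` and `print_as_limit()` are made up for illustration and do not appear in the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Query the address-space limit; returns 0 on success, -1 on failure. */
static int
get_as_limit(struct rlimit *rl)
{
	assert(rl != NULL);
	return getrlimit(RLIMIT_AS, rl);
}

/* Print the soft limit, treating RLIM_INFINITY as "unlimited". */
static void
print_as_limit(void)
{
	struct rlimit rl;

	if (get_as_limit(&rl) != 0) {
		perror("getrlimit");
		return;
	}
	if (rl.rlim_cur == RLIM_INFINITY)
		printf("address space: unlimited\n");
	else
		printf("address space: %llu bytes\n",
		    (unsigned long long)rl.rlim_cur);
}
```

Since the log says the defaults are unlimited, an unmodified process would be expected to take the RLIM_INFINITY branch, though that depends on the limits inherited from the parent.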
 1.148.4.1  02-Nov-2008  snj Pull up following revision(s) (requested by tron in ticket #9):
sys/nfs/nfs_bio.c: revision 1.180
sys/miscfs/genfs/genfs_io.c: revision 1.14
sys/uvm/uvm_extern.h: revision 1.149
- allocate 8 pointers on the stack to avoid stack overflow in nfs.
- make that 8 a constant
- remove bogus panic
 1.148.4.2.4.6  12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.148.4.2.4.5  09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.148.4.2.4.4  03-Jun-2011  matt Restore $NetBSD$
 1.148.4.2.4.3  03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.148.4.2.4.2  25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.148.4.2.4.1  26-Jan-2010  matt Pass hints to uvm_pagealloc* to get it to use the right page color rather
than guess the right page color.
 1.148.2.3  28-Apr-2009  skrll Sync with HEAD.
 1.148.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.148.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.150.4.2  23-Jul-2009  jym Sync with HEAD.
 1.150.4.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.161.2.13  18-Nov-2010  uebayasi Make XIP pager use cdev_mmap() instead of struct vm_physseg.
 1.161.2.12  16-Nov-2010  uebayasi Factor out the part which looks up physical page "identity" from
UVM object, into sys/uvm/uvm_vnode.c:uvn_findpage_xip(). Eventually
this will become a call to cdev UVM object pager.
 1.161.2.11  15-Nov-2010  uebayasi Hide uvm/uvm_page.h here again.
 1.161.2.10  02-Nov-2010  uebayasi Drop the 'paddr_t avail_start' and 'paddr_t avail_end' arguments
from uvm_page_physload_device(9).

Those two arguments are used by uvm_page_physload(9) to specify a
range of physical memory available for general purpose pages (pages
which are linked to freelists). Totally irrelevant to device
segments.
 1.161.2.9  30-Oct-2010  uebayasi Put back #include <uvm/uvm_page.h> for now, to avoid build errors.

This should be removed again later, because exposing page-level
definitions out of UVM is totally unnecessary.
 1.161.2.8  26-Jul-2010  uebayasi After much consideration, rename bus_space_physload_direct(9) back to
bus_space_physload_device(9).

The latter registers a segment as "device pages". "Device pages" are
managed, but not used for general purpose memory. Most typically XIP
pages.
 1.161.2.7  31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.161.2.6  30-Apr-2010  uebayasi Sync with HEAD.
 1.161.2.5  29-Apr-2010  uebayasi "int free_list" (VM_FREELIST_*) is specific to struct vm_page (memory
page). Handle it only in memory physseg parts.

Record device page's properties in struct vm_physseg for future uses.
For example, a framebuffer that is capable of some accelerated bus access
(e.g. write-combining) should register its capability through "int
flags".
 1.161.2.4  28-Apr-2010  uebayasi Initial support of uvm_page_physunload(9) and uvm_page_physunload_device(9).
Note that callers of these functions are responsible to ensure that the
segment is not used.
 1.161.2.3  28-Apr-2010  uebayasi Don't expose uvm_page.h internal for usual uvm(9) users.
 1.161.2.2  27-Apr-2010  uebayasi Forgot to check this in; uvm_page_physload() and
uvm_page_physload_device() now return struct vm_physseg * (which is not
used yet).
 1.161.2.1  23-Feb-2010  uebayasi Introduce uvm_page_physload_device(). This registers a physical address
range of a device, similar to uvm_page_physload() for memories. For now,
this is supposed to be called by MD code. We have to consider the design
when we'll manage mmap'able character devices.

Expose paddr_t -> struct vm_page * conversion function for device pages,
uvm_phys_to_vm_page_device(). This will be called by XIP vnode pager.
Because it knows if a given vnode is a device page (and its physical
address base) or not. Don't look up device segments, but directly make a
cookie.
 1.162.2.8  31-May-2011  rmind sync with head
 1.162.2.7  19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.162.2.6  05-Mar-2011  rmind sync with head
 1.162.2.5  30-May-2010  rmind sync with head
 1.162.2.4  26-Apr-2010  rmind Add ubc_purge() and purge/deassociate any related UBC entries during
object (usually, vnode) destruction. Since locking (and thus object)
is required to enter/remove mappings - object is not allowed anymore
to disappear with any UBC entries left.

From original patch by ad@ with some modifications.
 1.162.2.3  23-Apr-2010  rmind Use consistent naming - uvm_obj_*().
 1.162.2.2  18-Mar-2010  rmind Unify /dev/{mem,kmem,zero,null} implementations in MI code. Based on patch
from Joerg Sonnenberger, proposed on tech-kern@, in February 2008.

Work and depression still in progress.
 1.162.2.1  16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.168.4.3  05-Mar-2011  bouyer Sync with HEAD
 1.168.4.2  17-Feb-2011  bouyer Sync with HEAD
 1.168.4.1  08-Feb-2011  bouyer Sync with HEAD
 1.168.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.172.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.176.6.3  29-Apr-2012  mrg sync to latest -current.
 1.176.6.2  05-Apr-2012  mrg sync to latest -current.
 1.176.6.1  18-Feb-2012  mrg merge to -current.
 1.176.2.11  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.176.2.10  30-Oct-2012  yamt sync with head
 1.176.2.9  17-Apr-2012  yamt sync with head
 1.176.2.8  05-Feb-2012  yamt turn vm.loanread sysctl to a threshold.
 1.176.2.7  11-Jan-2012  yamt create a sysctl knob to turn on/off loaned read.
 1.176.2.6  26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.176.2.5  20-Dec-2011  yamt don't inline uvn_findpages in genfs_io.
 1.176.2.4  20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.176.2.3  14-Nov-2011  yamt might dirty -> possibly dirty
suggested by wiz@
 1.176.2.2  12-Nov-2011  yamt redo the page clean/dirty/unknown accounting separately for file and
anonymous pages
 1.176.2.1  11-Nov-2011  yamt - track the number of clean/dirty/unknown pages in the system.
- g/c PG_MARKER
 1.181.2.1  12-Apr-2012  riz branches: 1.181.2.1.2;
Pull up following revision(s) (requested by martin in ticket #175):
sys/kern/kern_exit.c: revision 1.238
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.4
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.5
sys/uvm/uvm_extern.h: revision 1.183
lib/libc/gen/posix_spawn_fileactions.c: revision 1.2
sys/kern/kern_exec.c: revision 1.348
sys/kern/kern_exec.c: revision 1.349
sys/compat/netbsd32/syscalls.master: revision 1.95
sys/uvm/uvm_glue.c: revision 1.159
sys/uvm/uvm_map.c: revision 1.317
sys/compat/netbsd32/netbsd32.h: revision 1.95
sys/kern/exec_elf.c: revision 1.38
sys/sys/spawn.h: revision 1.2
sys/sys/exec.h: revision 1.135
sys/compat/netbsd32/netbsd32_execve.c: revision 1.34
Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
Fix asynchronous posix_spawn child exit status (and test for it).
 1.181.2.1.2.1  28-Nov-2012  matt Pull from HEAD:
Add a __HAVE_CPU_UAREA_IDLELWP hook so that the MD code can allocate
special UAREAs for idle lwp's.
 1.184.4.1  18-May-2014  rmind sync with head
 1.184.2.2  03-Dec-2017  jdolecek update from HEAD
 1.184.2.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.189.2.1  10-Aug-2014  tls Rebase.
 1.191.4.7  28-Aug-2017  skrll Sync with HEAD
 1.191.4.6  05-Feb-2017  skrll Sync with HEAD
 1.191.4.5  05-Oct-2016  skrll Sync with HEAD
 1.191.4.4  29-May-2016  skrll Sync with HEAD
 1.191.4.3  19-Mar-2016  skrll Sync with HEAD
 1.191.4.2  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.191.4.1  06-Apr-2015  skrll Sync with HEAD
 1.191.2.2  25-Mar-2015  snj Pull up following revision(s) (requested by maxv in ticket #617):
sys/kern/kern_malloc.c: revision 1.144, 1.145
sys/kern/kern_pmf.c: revision 1.37
sys/rump/librump/rumpkern/rump.c: revision 1.316
sys/uvm/uvm_extern.h: revision 1.193
sys/uvm/uvm_km.c: revision 1.139
Don't include <uvm/uvm_extern.h>
--
Kill kmeminit().
--
Remove this MALLOC_DEFINE (M_PMF unused).
 1.191.2.1  31-Dec-2014  snj Pull up following revision(s) (requested by chs in ticket #363):
common/lib/libprop/prop_kern.c: revision 1.18
sys/arch/mac68k/dev/grf_compat.c: revision 1.27
sys/arch/x68k/dev/grf.c: revision 1.45
sys/external/bsd/drm/dist/bsd-core/drm_bufs.c: revision 1.12
sys/external/bsd/drm2/drm/drm_drv.c: revision 1.12
sys/external/bsd/drm2/drm/drm_vm.c: revision 1.6
sys/external/bsd/drm2/include/linux/mm.h: revision 1.4
sys/kern/vfs_vnops.c: revision 1.192 via patch
sys/rump/librump/rumpkern/vm.c: revision 1.160
sys/sys/file.h: revision 1.78 via patch
sys/uvm/uvm_device.c: revision 1.64
sys/uvm/uvm_device.h: revision 1.13
sys/uvm/uvm_extern.h: revision 1.192
sys/uvm/uvm_mmap.c: revision 1.150 via patch
add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.197.2.2  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.197.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.203.6.2  19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.203.6.1  11-May-2017  pgoyette Sync with HEAD
 1.208.2.3  25-Jun-2018  pgoyette Sync with HEAD
 1.208.2.2  21-May-2018  pgoyette Sync with HEAD
 1.208.2.1  22-Apr-2018  pgoyette Sync with HEAD
 1.213.6.1  09-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1947):

sys/uvm/uvm_extern.h: revision 1.234 (via patch)
sys/kern/kern_exec.c: revision 1.528 (via patch)
sys/uvm/uvm_map.c: revision 1.427 (via patch)

posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.

This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race to where:
1. we set up the child to share proc0.p_vmspace at first,
2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,
3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.213.2.2  21-Apr-2020  martin Sync with HEAD
 1.213.2.1  08-Apr-2020  martin Merge changes from current as of 20200406
 1.218.2.2  29-Feb-2020  ad Sync with head.
 1.218.2.1  17-Jan-2020  ad Sync with head.
 1.222.2.2  25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.222.2.1  20-Apr-2020  bouyer Sync with HEAD
 1.231.8.1  31-May-2021  cjep sync with head
 1.231.6.1  17-Jun-2021  thorpej Sync w/ HEAD.
 1.232.12.1  09-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1109):

sys/uvm/uvm_extern.h: revision 1.234
sys/kern/kern_exec.c: revision 1.528
sys/uvm/uvm_map.c: revision 1.427

posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.

This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race to where:
1. we set up the child to share proc0.p_vmspace at first,
2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,
3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.233.6.1  02-Aug-2025  perseant Sync with HEAD
